Today (or maybe yesterday?) Qwen released its Qwen image model, so I thought I’d give it a try because that’s what I do now. I try ALL the new AI toys.
What I had heard is that Qwen is really good at following intricate prompts, as well as getting text right, which a lot of generative AI models really struggle with.
On my system I can’t run the full model, so I had to wait for smart people to quantize it. I wound up getting a 13 GB Q4_M model from city96 on Hugging Face, and my system seems comfortable running that, though it is a bit slow. I’m sure in the days to come the brains will find ways to speed things up.
The picture at the top of this post was generated in response to this prompt:
“A dog is running through a field. He is a golden retriever. He carries a yellow tennis ball in his mouth. A long leather lease trails behind him; he has escaped!
Behind him is a small boy, chasing him. The boy has red hair and freckles and he is wearing a baseball cap.
In the far distance is a baseball diamond where a bunch of children play.
The field is grass with patches of dirt. Dandelions are sprinkled about.
The day is sunny and bright.”
I think it did a pretty good job, and I just noticed I misspelled “leash” but it still got everything correct. It does look a little ‘cartoony’ to me, and that wasn’t something I asked for. This is a pure generation with no LoRAs or filters or anything, and again, I’m sure the community will fill in the blanks.
I tried a few times to get something more photorealistic but had no luck. I did put the text stuff to the test by specifying, in later runs, that the kid’s hat had “Tigers” on it, and it not only put that clearly on the cap but matched it on the shirt, too. On the other hand, now the kid has the leash in his hand.
That’s the way it is with generative art I guess. The model giveth, the model taketh away. Still, I was happy with the results.

So I’d like to see this work faster (and maybe a smaller model would help there) and I’d like to figure out how to make things look more realistic. I tried the usual stuff, like specifying the type of camera and dumping “ultra realistic” into the prompt, but that didn’t seem to help. On the other hand, I’ve only been playing around with the model for like an hour so…we’ll see!
Update: Just for giggles I used basically the same prompt in HiDream on NightCafe and got the image below. It actually did the text better than I expected, though it put it on the shirt and not the hat. And the dog is not loose, nor does it look like a Golden Retriever.
