Testing Qwen’s New Image Models

Today (or maybe yesterday?) Qwen released its Qwen image model, so I thought I’d give it a try because that’s what I do now. I try ALL the new AI toys.

What I had heard is that Qwen is really good at following intricate prompts and at getting text right, which a lot of generative AI models really struggle with.

On my system I can’t run the full model, so I had to wait for smart people to quantize it. I wound up getting a 13 GB Q4_M model from city96 on Hugging Face, and my system seems comfortable running that, though it is a bit slow. I’m sure in the days to come the brains will find ways to speed things up.

The picture at the top of this post was generated in response to this prompt:

“A dog is running through a field. He is a golden retriever. He carries a yellow tennis ball in his mouth. A long leather lease trails behind him; he has escaped!

Behind him is a small boy, chasing him. The boy has red hair and freckles and he is wearing a baseball cap.

In the far distance is a baseball diamond where a bunch of children play.

The field is grass with patches of dirt. Dandelions are sprinkled about.

The day is sunny and bright.”

I think it did a pretty good job. I just noticed I misspelled “leash,” but it still got everything correct. It does look a little ‘cartoony’ to me, and that wasn’t something I asked for. This is a pure generation with no LoRAs or filters or anything and, again, I’m sure the community will fill in the blanks.

I tried a few times to get something more photorealistic but had no luck. I did put the text stuff to the test by specifying, in later runs, that the kid’s hat had “Tigers” on it, and it not only put that clearly on the cap but matched it on the shirt, too. On the other hand, now the kid has the leash in his hand.

That’s the way it is with generative art, I guess. The model giveth, the model taketh away. Still, I was happy with the results.

A boy and his dog

So I’d like to see this work faster (and maybe a smaller model would help there) and I’d like to figure out how to make things look more realistic. I tried the usual stuff like specifying the type of camera and dumping “ultra realistic” into the prompt, but that didn’t seem to help. On the other hand, I’ve only been playing around with the model for like an hour so…we’ll see!
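For anyone who wants to try the same kind of prompt tweaking a bit more systematically, here is a minimal sketch in Python. It only builds the prompt text; it doesn’t talk to any image model. The style cues and the function name `build_variants` are made up for illustration, and whether any of these cues actually changes the output is something you’d have to find out by running them.

```python
# A shortened stand-in for the full prompt used in this post.
BASE_PROMPT = "A dog is running through a field. He is a golden retriever."

# Hypothetical style cues to compare; none of these are known-good settings.
STYLE_CUES = {
    "baseline": "",
    "photo": "Photorealistic photograph.",
    "camera": "Shot on a 35mm DSLR, natural light, shallow depth of field.",
    "anime": "Anime style.",
}


def build_variants(base, cues, position="prefix"):
    """Return {name: prompt}, adding each cue before or after the base prompt."""
    variants = {}
    for name, cue in cues.items():
        if not cue:
            # An empty cue means "leave the prompt alone" (the control case).
            variants[name] = base
        elif position == "prefix":
            variants[name] = f"{cue} {base}"
        else:
            variants[name] = f"{base} {cue}"
    return variants


if __name__ == "__main__":
    for name, prompt in build_variants(BASE_PROMPT, STYLE_CUES).items():
        print(f"[{name}] {prompt}")
```

Each line it prints can be pasted into whatever front end you use. Keeping the seed fixed between runs, if your setup allows it, should make it easier to tell what the cue itself changed.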

Update: Just for giggles I used basically the same prompt in HiDream on NightCafe and got the image below. It actually did the text better than I expected, though it put it on the shirt and not the hat. And the dog is not loose, nor does it look like a golden retriever.

Another kid and another dog

2 thoughts on “Testing Qwen’s New Image Models”

  1. Hmm. The accuracy of the top image is very impressive. It gets pretty much everything exactly right. Haven’t seen too many stick as closely to the prompt as that.

    On the con side, though, as an image it’s really artificial. The dog, the dandelions, the grass and the ground are all close to photo-realistic but the kid chasing is straight out of an animated TV show and so are all the other kids in the background. That gives the whole thing an uncomfortable uncanny valley feel. The second image is all kinds of wrong, from there being far too many kids on the baseball diamond to the chasing kid apparently levitating to the dog having only three legs!

    The third image, though, is really nice. That’s a very good Norman Rockwell pastiche that could be used in all sorts of contexts. It’s super 1950s, too. About the only thing that’s off is that it’s given you daisies instead of dandelions but they look great so I’d take it. I’d also say that is clearly a golden retriever in the aesthetic context of the style of illustration.

    1. PartPurple aka Mrs Nimgimli noticed that all the kids in the background image have red hair, too.

      But yeah I’ve since learned that it really pays attention to style prompts, so maybe if I’d added “photorealistic” it might have helped. I’m told that just adding the word ‘anime’ to the prompt will completely change the style, for example.

      I can’t really run ComfyUI and do anything else at the same time, so I’ll have to wait until tonight to try more prompt tweaks.

      That HiDream model I used on the last one isn’t even a “Pro” model on NightCafe, as far as I recall. That site is such a bargain. I’ve built up 290 credits just by logging in and making something, and once in a while putting a creation on Instagram for a couple of extra credits. Thanks for the tip on that one!
