r/StableDiffusion – Telegram
2512) I got with ZIT for the same prompts:

[This is the only one where a cellphone wasn't visible.](https://preview.redd.it/q1yed942oqag1.png?width=1024&format=png&auto=webp&s=08dd3e5923cd898da8ce1fd40fac332bd1c07bf1)

https://preview.redd.it/jmn9gpgxpqag1.png?width=1024&format=png&auto=webp&s=868c685311515ddcb7e26bd8edff534e7dd63a0e

https://preview.redd.it/3y9jb1p4qqag1.png?width=1024&format=png&auto=webp&s=2fc73a922836bb7a5eb580d3b5e1d8970435911a

[Actually this one might beat Qwen 2512](https://preview.redd.it/khvcl2xbqqag1.png?width=1024&format=png&auto=webp&s=876b9006b7f9ece23235075d3fcc2999d74c21a7)

https://preview.redd.it/an7dojomrqag1.png?width=1024&format=png&auto=webp&s=a3d2faef2c8752c0f226ee5e233b134773f26c03

While ZIT Turbo is great for its small size, it is weaker at prompt adherence than Qwen 2512. Maybe we need a larger model based on ZIT's architecture.


Qwen 2512 is also the first model that handles very complex scenes, including ones with unusual poses:

*A master samurai performing an acrobatic backflip off a galloping horse, frozen in mid-air at the peak of motion. His body is perfectly balanced and tense, armor plates shifting with the movement, silk cords and fabric trailing behind him. The samurai has his bow fully drawn while upside down, muscles taut, eyes locked with absolute focus on his target.*

*Nearby, a powerful tiger sits calmly yet menacingly on the ground, its massive body coiled with latent strength. Its striped fur is illuminated by dramatic light, eyes sharp and unblinking, watching the airborne warrior with predatory intelligence.*

*The scene takes place in a wild, untamed landscape — tall grass bending under the horse’s charge, dust and leaves suspended in the air, the moment stretched in time. The horse continues forward beneath the samurai, muscles straining, mane flowing, captured mid-stride.*

*The composition emphasizes motion and tension: a dynamic diagonal framing, cinematic depth of field, dramatic lighting with strong contrasts, subtle motion blur on the environment but razor-sharp focus on the samurai and the tiger.*

https://preview.redd.it/rsg287gqtqag1.png?width=1328&format=png&auto=webp&s=b1298fbd7031501a6380167ace6bc1944c44771f

All in all, I'd say there is a significant increase in quality between the August 2025 Qwen model and the December 2025 Qwen model. I hope they keep releasing open source models with this trend of improving quality.

For reference, here are the GPT and NBP results for the last image:

https://preview.redd.it/nkwcu2yquqag1.png?width=1024&format=png&auto=webp&s=17b8a1512f94d388f115ec13e8d90a3c80097beb

https://preview.redd.it/ytpwqi4duqag1.png?width=1024&format=png&auto=webp&s=b165618b2cd583d3907310f1c380f55ae676630d

While closed models are still on top, I think the gap is narrowing. At some point it might become too narrow to notice when weighed against the advantages of open models, notably the ability to train specific concepts that this board is very interested in and that usually can't be used with online models.

https://redd.it/1q14unh