so thick it seems to radiate outward, tinting the entire scene in sickly green tones.*
*Outside the cage, in the foreground, stands a mad scientist. He has an eccentric, unkempt appearance: wild, frizzy gray hair sticking in all directions, a long lab coat stained with chemicals, and small round glasses reflecting the glow of the cage. His expression is maniacally focused, a grin half-hidden as he scribbles furiously into a leather-bound notebook. The notebook is filled with incomprehensible diagrams and notes, his pen moving fast as if documenting every second of the experiment. One hand holds the notebook against his hip, while the other moves quickly, writing with obsessive energy.*
*The laboratory itself is cluttered and chaotic: wires snake across the floor, glass beakers bubble with strange liquids, and metallic instruments hum with faint vibrations. The lighting is dramatic, mostly coming from the cage itself and the glowing gas, creating sharp shadows and streaks of green reflected on the scientist’s glasses and lab coat.*
*The atmosphere is oppressive and heavy, like a scene from a gothic science-fiction horror film. The key effect is the visual contrast: the young woman’s fragile form almost lost in the swirling toxic mist, versus the sharp, manic figure of the scientist calmly taking notes as if this cruelty is nothing more than data collection.*
*The overall mood: unsettling, surreal, and cinematic—a blend of realism and nightmarish exaggeration, with the gas obscuring most details, making the viewer struggle to see clearly what happens within the glass cage.*
Original Qwen:
https://preview.redd.it/ggxzu09heqag1.png?width=1080&format=png&auto=webp&s=cb232a684be16adff149150010c573f6f2e8f2a6
Qwen 2512:
https://preview.redd.it/gn53reg7iqag1.png?width=1328&format=png&auto=webp&s=2facaa849dd043d78d18804964793a74b7fe1fff
Again, much better IMHO, though the concept of pouring the gas into the cage still escapes the model. It's a good basis, though: I can see myself photobashing a metal tube running from the one on the left to the outlet in the glass cage, erasing the green fog outside the cage, and running the result through I2I with very low denoise.
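A rough sketch of why a very low denoise keeps a photobashed edit intact, assuming the convention used by diffusers-style img2img pipelines, where `strength` (the denoise value) decides how many of the scheduled steps actually run. The function name and the step counts are made up for illustration, and other UIs may map denoise to steps differently:

```python
from typing import Tuple


def img2img_schedule(num_inference_steps: int, strength: float) -> Tuple[int, int]:
    """Return (first_step_index, steps_actually_run) for an img2img pass.

    Assumed convention (diffusers-style): the input image is noised to the
    level of step `first_step_index`, then only the remaining steps are run.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Number of steps "spent" on the edit, capped at the full schedule.
    init_steps = min(int(num_inference_steps * strength), num_inference_steps)
    # Steps skipped at the start; the input image stands in for their result.
    first_step_index = max(num_inference_steps - init_steps, 0)
    return first_step_index, num_inference_steps - first_step_index


# Hypothetical settings: a 20-step schedule at a few denoise values.
for strength in (0.25, 0.5, 1.0):
    skipped, run = img2img_schedule(20, strength)
    print(f"denoise={strength}: skip {skipped} steps, run {run}")
```

With a denoise of around 0.25 or lower, only the last few steps run, which should be enough to blend a pasted tube into the scene without the model reinventing the composition.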
Prompt #5 : the VHS slasher film cover.
*A cinematic horror movie poster in 1980s slasher style, set in a dark urban alley lit by a single flickering neon sign. In the forefront, a teenage girl in retro-mirror skates looks, freeze mid-motion, her eyes wide mouth and open in a scream. Her outfit is colorful and vintage: striped knee socks, denim shorts, and a T-shirt with bold 80s print. She is dramatically backlit, casting a long shadow across the wet pavement. Towering behind her is the silhouette of a masked killer, wearing a grimy hockey mask that hides his face completely. He wields a long gleaming samurai sword, raised menacingly, the blade catching the light, impaling the girl. On both side of the girl, the wound gushes with blood. The killer's body language is threatening and powerful, while the girl's posture conveys shock and helplessness. The entire composition feels like a horror movie still: mist curling around the street, neon reflections in puddles, posters peeling from walls brick. The colors are highly saturated in 80s horror style — neon pinks, blood reds, sickly greens. At the bottom of the image, bold block letters spell out a fake horror movie title "Horror at Horrorville", though this was a vintage VHS cover.*
Qwen Original:
[This version had no mention of the title due to a human error.](https://preview.redd.it/2i1b0ngjjqag1.png?width=1080&format=png&auto=webp&s=40ef6176256692867fa840ab1a4a72d4ac8bc2ec)
Qwen 2512:
https://preview.redd.it/41frfc9vjqag1.png?width=1328&format=png&auto=webp&s=f908f885dc3c6a8b1987cf7b2cf45dad468752c1
The newer model is better at gore, but it still can't do much in that department. I tried to get it to draw a decapitated orc with its severed neck spewing blood, but it wouldn't.
For reference, here are the best of 16 images I got with ZIT for the same prompts (generating 16 images with ZIT takes approximately as long as generating 4 with Qwen 2512):
[This is the only one where a cellphone wasn't visible.](https://preview.redd.it/q1yed942oqag1.png?width=1024&format=png&auto=webp&s=08dd3e5923cd898da8ce1fd40fac332bd1c07bf1)
https://preview.redd.it/jmn9gpgxpqag1.png?width=1024&format=png&auto=webp&s=868c685311515ddcb7e26bd8edff534e7dd63a0e
https://preview.redd.it/3y9jb1p4qqag1.png?width=1024&format=png&auto=webp&s=2fc73a922836bb7a5eb580d3b5e1d8970435911a
[Actually this one might beat Qwen 2512](https://preview.redd.it/khvcl2xbqqag1.png?width=1024&format=png&auto=webp&s=876b9006b7f9ece23235075d3fcc2999d74c21a7)
https://preview.redd.it/an7dojomrqag1.png?width=1024&format=png&auto=webp&s=a3d2faef2c8752c0f226ee5e233b134773f26c03
While ZIT Turbo is great for its small size, it is less adept at prompt adherence than Qwen 2512. Maybe we need a large model based on ZIT's architecture.
Qwen 2512 is also the first model that handles very complex scenes, such as ones with unusual poses:
*A master samurai performing an acrobatic backflip off a galloping horse, frozen in mid-air at the peak of motion. His body is perfectly balanced and tense, armor plates shifting with the movement, silk cords and fabric trailing behind him. The samurai has his bow fully drawn while upside down, muscles taut, eyes locked with absolute focus on his target.*
*Nearby, a powerful tiger sits calmly yet menacingly on the ground, its massive body coiled with latent strength. Its striped fur is illuminated by dramatic light, eyes sharp and unblinking, watching the airborne warrior with predatory intelligence.*
*The scene takes place in a wild, untamed landscape — tall grass bending under the horse’s charge, dust and leaves suspended in the air, the moment stretched in time. The horse continues forward beneath the samurai, muscles straining, mane flowing, captured mid-stride.*
*The composition emphasizes motion and tension: a dynamic diagonal framing, cinematic depth of field, dramatic lighting with strong contrasts, subtle motion blur on the environment but razor-sharp focus on the samurai and the tiger.*
https://preview.redd.it/rsg287gqtqag1.png?width=1328&format=png&auto=webp&s=b1298fbd7031501a6380167ace6bc1944c44771f
All in all, I'd say there is a significant increase in quality between the August 2025 Qwen model and the December 2025 Qwen model. I hope they keep releasing open-source models and that this trend of improving quality continues.
For reference, here are the GPT and NBP results for that last image:
https://preview.redd.it/nkwcu2yquqag1.png?width=1024&format=png&auto=webp&s=17b8a1512f94d388f115ec13e8d90a3c80097beb
https://preview.redd.it/ytpwqi4duqag1.png?width=1024&format=png&auto=webp&s=b165618b2cd583d3907310f1c380f55ae676630d
While closed models are still on top, I think the gap is narrowing. At some point it might become too narrow to matter compared to the advantages of open models, notably the ability to train specific concepts that this board is very interested in and that usually can't be used with online models.
https://redd.it/1q14unh
@rStableDiffusion