r/StableDiffusion – Telegram
Improvements between Qwen Image and Qwen Image 2512 (mostly)


Hi,


I took a few prompts I had collected to measure the prompt adherence of various models and ran them again with the latest Qwen Image 2512.

TL;DR: in my opinion, there is a clear increase in image quality and prompt adherence.


The images were generated using the recommended 40 steps with euler beta, keeping the best of 4 generations.
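For anyone who wants to rerun this kind of comparison, the settings above could be captured in a small harness like the one below. It is only a sketch under assumptions: `generate` and `score` are hypothetical callbacks (in this post the best image was picked by eye, not by a scoring function), the consecutive-seed scheme is made up for illustration, and "euler beta" is read here as the euler sampler with the beta scheduler.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class RunSettings:
    steps: int = 40           # step count stated in the post
    sampler: str = "euler"    # assumption: "euler beta" = euler sampler ...
    scheduler: str = "beta"   # ... combined with the beta scheduler
    candidates: int = 4       # "best of 4 generations"


def best_of_n(
    prompt: str,
    settings: RunSettings,
    generate: Callable[[str, int, RunSettings], Any],  # hypothetical backend call
    score: Callable[[Any], float],                      # hypothetical rating function
    base_seed: int = 0,
) -> Tuple[int, Any]:
    """Generate `settings.candidates` images and return (seed, image) of the top-scored one."""
    results = []
    for i in range(settings.candidates):
        seed = base_seed + i  # hypothetical seed scheme: consecutive seeds
        image = generate(prompt, seed, settings)
        results.append((score(image), seed, image))
    # Compare on the score only, so images never need to be comparable themselves.
    _, best_seed, best_image = max(results, key=lambda r: r[0])
    return best_seed, best_image
```

Keeping the seed of the winning candidate makes it possible to regenerate the same image later with a different model version.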

Prompt #1: the cyberpunk selfie

*A hyper-detailed, cinematic close-up selfie shot in a cyberpunk megacity environment, framed as if taken with a futuristic augmented-reality smartphone. The composition is tight on three young adults—two women and one man—posing together at arm’s length, their faces illuminated by the neon chaos of the city. The photo should feel gritty, futuristic, and authentic, with ultra-sharp focus on the faces, intricate skin textures, reflections of neon lights, cybernetic implants, and the faint atmospheric haze of rain-damp air. The background should be blurred with bokeh from glowing neon billboards, holograms, and flickering advertisements in colors like electric blue, magenta, and acid green.*

*The first girl, on the left, has warm bronze skin with micro-circuit tattoos faintly glowing along her jawline and temples, like embedded circuitry under the skin. Her eyes are hazel, enhanced with subtle digital overlays, tiny lines of data shimmering across her irises when the light catches them. Her hair is thick, black, and streaked with neon blue highlights, shaved at one side to reveal a chrome-plated neural jack. Her lips curve into a wide smile, showing a small gold tooth cap that reflects the neon light. The faint glint of augmented reality lenses sits over her pupils, giving her gaze a futuristic intensity.*

*The second girl, on the right, has pale porcelain skin with freckles, though some are replaced with delicate clusters of glowing nano-LEDs arranged like constellations across her cheeks. Her face is angular, with sharp cheekbones accentuated by the high-contrast neon lighting. She has emerald-green cybernetic eyes, with a faint circular HUD visible inside, and a subtle lens flare effect in the pupils. Her lips are painted matte black, and a silver septum ring gleams under violet neon light. Her hair is platinum blonde with iridescent streaks, straight and flowing, with strands reflecting holographic advertisements around them. She tilts her head toward the lens with a half-smile that looks playful yet dangerous, her gaze almost predatory.*

*The man, in the center and slightly behind them, has tan skin with a faint metallic sheen at the edges of his jaw where cybernetic plating meets flesh. His steel-gray eyes glow faintly with artificial enhancement, thin veins of light radiating outward like cracks of electricity. A faint scar cuts across his left eyebrow, but it is partially reinforced with a chrome implant. His lips form a confident smirk, a thin trail of smoke curling upward from the glowing tip of a cyber-cig between his fingers. His hair is short, spiked with streaks of neon purple, slightly wet from the drizzle. He wears a black jacket lined with faintly glowing circuitry that pulses like veins of light across his collar.*

*The lighting is moody and saturated with neon: electric pinks, blues, and greens paint their faces in dynamic contrasts. Droplets of rain cling to their skin and hair, catching the neon glow like tiny prisms. Reflections of holographic ads shimmer in their eyes. Subtle lens distortion from the selfie framing makes the faces slightly exaggerated at the edges, adding realism.*

*The mood is rebellious, electric, and hyper-modern, blending candid warmth with the raw edge of a cyberpunk dystopia. Despite the advanced tech, the moment feels intimate: three friends, united in a neon-drenched world of chaos, capturing a fleeting instant of humanity amidst the synthetic glow.*
Original:

https://preview.redd.it/4aecu8809qag1.png?width=1080&format=png&auto=webp&s=51d5b47f7669c3525326d62f20f5d1194aba7429

2512:

https://preview.redd.it/jtknm4k99qag1.png?width=1328&format=png&auto=webp&s=ee011b64288b2fe76809ed2f73471d4f23c3218d

Not only is image quality (and skin) significantly improved, but the model also missed fewer elements from the prompt. Still not perfect, though.



Prompt #2 : the renaissance technosaint

*A grand Renaissance-style oil painting, as if created by a master such as Caravaggio or Raphael, depicting an unexpected modern subject: a hacker wearing a VR headset, portrayed with the solemn majesty of a religious figure. The painting is composed with a dramatic chiaroscuro effect: deep shadows dominate the background while radiant golden light floods the central figure, symbolizing revelation and divine inspiration.*

*The hacker sits at the center of the canvas in three-quarter view, clad in simple dark clothing that contrasts with the rich fabric folds often seen in Renaissance portraits. His hands are placed reverently on an open laptop that resembles an illuminated manuscript. His head is bowed slightly forward, as if in deep contemplation, but his face is obscured by a sleek black VR headset, which gleams with reflected highlights. Despite its modernity, the headset is rendered with the same meticulous brushwork as a polished chalice or crown in a sacred altarpiece.*

*Around the hacker’s head shines a halo of golden light, painted in radiant concentric circles, recalling the divine aureoles of saints. This halo is not traditional but fractured, with angular shards of digital code glowing faintly within the gold, blending Renaissance piety with cybernetic abstraction. The golden light pours downward, illuminating his hands and casting luminous streaks across his laptop, making the device itself appear like a holy relic.*

*The background is dark and architectural, suggesting the stone arches of a cathedral interior, half-lost in shadow. Columns rise in the gloom, while faint silhouettes of angels or allegorical figures appear in the corners, holding scrolls that morph into glowing data streams. The palette is warm and rich: ochres, umbers, deep carmines, and the brilliant gold of divine illumination. Subtle cracks in the painted surface give it the patina of age, as if this sacred image has hung in a chapel for centuries.*

*The style should be authentically Renaissance: textured oil brushstrokes, balanced composition, dramatic use of light and shadow, naturalistic anatomy. Every detail of fabric, skin, and light is rendered with reverence, as though this hacker is a prophet of the digital age. The VR headset, laptop, and digital motifs are integrated seamlessly into the sacred iconography, creating an intentional tension between the ancient style and the modern subject.*

*The mood is sublime, reverent, and paradoxical: a celebration of knowledge and vision, as if technology itself has become a vessel of divine enlightenment. It should feel both anachronistic and harmonious, a painting that could hang in a Renaissance chapel yet unmistakably belongs to the cyber age.*

Original Qwen:

https://preview.redd.it/n5wkmscgaqag1.png?width=1080&format=png&auto=webp&s=e1ab0bc57441e993adf04c285c2fa8fdacea9ada

2512:

https://preview.redd.it/xfkkzl5zaqag1.png?width=1328&format=png&auto=webp&s=740620026e7b7a9ec7d0e718410c12ec44cef60d

We still can't have a decent Renaissance-style VR headset, but it's clearly improved (even though the improved face makes it less Raphaelite in my layman's opinion).


Prompt #3 : Roger Rabbit Santa

*A hyper-realistic, photographic depiction of a luxurious Parisian penthouse living room at night, captured in sharp detail with cinematic lighting. The space is ultra-modern, sleek, and stylish, with floor-to-ceiling glass windows that stretch the entire wall, overlooking the glittering Paris skyline. The Eiffel Tower glows in the distance, its lights shimmering against the night sky. The interior design is minimalist yet opulent: polished marble floors, a low-profile Italian leather sofa in charcoal gray, a glass coffee table with chrome legs, and a suspended designer fireplace with a soft orange flame casting warm reflections across the room. Subtle decorative accents—abstract sculptures, high-end books, and a large contemporary rug in muted tones—anchor the aesthetic.*

*Into this elegant, hyperrealistic scene intrudes something utterly fantastical and deliberately out of place: a cartoonish, classic Santa Claus sneaking across the room on tiptoe. He is rendered in a vintage 1940s–1950s cartoon style, with exaggerated rounded proportions, oversized boots, bright red suit, comically bulging belly, fluffy white beard, and a sack of toys slung over his back. His expression is mischievous yet playful, eyes wide and darting as if he’s been caught in the act. His red suit has bold, flat shading and thick black outlines, making him look undeniably drawn rather than photographed.*

*The contrast between the realistic environment and the cartoony Santa is striking: the polished marble reflects the glow of the fireplace realistically, while Santa casts a simple, flat, 2D-style shadow that doesn’t quite match the physical lighting, enhancing the surreal "Who Framed Roger Rabbit" effect. His hotte (sack of toys) bounces with exaggerated squash-and-stretch animation style, defying the stillness of the photorealistic room.*

*Through the towering glass windows behind him, another whimsical element appears: Santa’s sleigh hovering in mid-air, rendered in the same vintage cartoon style as Santa. The sleigh is pulled by reindeer that flap comically oversized hooves, frozen mid-leap in exaggerated poses, with little puffs of animated smoke trailing behind them. The glowing neon of Paris reflects off the glass, mixing realistically with the flat, cel-shaded cartoon outlines of the sleigh, heightening the uncanny blend of real and drawn worlds.*

*The overall mood is playful and surreal, balancing luxury and absurdity. The image should feel like a carefully staged photograph of a high-end penthouse, interrupted by a cartoon character stepping right into reality. The style contrast must be emphasized: photographic realism in the architecture, textures, and city view, versus cartoon simplicity in Santa and his sleigh. This juxtaposition should create a whimsical tension, evoking the exact “Roger Rabbit effect”: two incompatible realities colliding in one frame, yet blending seamlessly into a single narrative moment.*

Original Qwen:

https://preview.redd.it/od510yzldqag1.png?width=1080&format=png&auto=webp&s=4f776b39cd757963f049b19270b34650c481dea2

Qwen 2512:

https://preview.redd.it/npc8th8udqag1.png?width=1328&format=png&auto=webp&s=06af2e529414c7bd942f5aa7a12501886f05fc54

Finally, a model that can (sometimes) draw Santa's sleigh without putting Santa in it. It's not perfect, mainly because the sleigh is consistently drawn inside the room, but that's not the hardest thing to correct. Santa's shadow still isn't a solid, cartoony one.



Prompt #4:

*A dark, cinematic laboratory interior filled with strange machinery and glowing chemical tanks. At the center of the composition stands a large transparent glass cage, reinforced with metallic frames and covered in faint reflections of flickering overhead lights. Inside the cage is a young blonde woman serving as a test subject from a zombification expermient. Her hair is shoulder-length, messy, and illuminated by the eerie light of the environment. She wears a simple, pale hospital-style gown, clinging slightly to her figure in the damp atmosphere. Her face is partly visible but blurred through the haze, showing a mixture of fear and resignation.*

*From nozzles built into the walls of the cage, a dense green gas hisses and pours out, swirling like toxic smoke. The gas quickly fills the enclosure, its luminescent glow obscuring most of the details inside. Only fragments of the woman’s silhouette are visible through the haze: the outline of her raised hands pressed against the glass, the curve of her shoulders, the pale strands of hair floating in the mist. The gas is so thick it seems to radiate outward, tinting the entire scene in sickly green tones.*

*Outside the cage, in the foreground, stands a mad scientist. He has an eccentric, unkempt appearance: wild, frizzy gray hair sticking in all directions, a long lab coat stained with chemicals, and small round glasses reflecting the glow of the cage. His expression is maniacally focused, a grin half-hidden as he scribbles furiously into a leather-bound notebook. The notebook is filled with incomprehensible diagrams and notes, his pen moving fast as if documenting every second of the experiment. One hand holds the notebook against his hip, while the other moves quickly, writing with obsessive energy.*

*The laboratory itself is cluttered and chaotic: wires snake across the floor, glass beakers bubble with strange liquids, and metallic instruments hum with faint vibrations. The lighting is dramatic, mostly coming from the cage itself and the glowing gas, creating sharp shadows and streaks of green reflected on the scientist’s glasses and lab coat.*

*The atmosphere is oppressive and heavy, like a scene from a gothic science-fiction horror film. The key effect is the visual contrast: the young woman’s fragile form almost lost in the swirling toxic mist, versus the sharp, manic figure of the scientist calmly taking notes as if this cruelty is nothing more than data collection.*

*The overall mood: unsettling, surreal, and cinematic—a blend of realism and nightmarish exaggeration, with the gas obscuring most details, making the viewer struggle to see clearly what happens within the glass cage.*


Original Qwen:

https://preview.redd.it/ggxzu09heqag1.png?width=1080&format=png&auto=webp&s=cb232a684be16adff149150010c573f6f2e8f2a6

Qwen 2512:

https://preview.redd.it/gn53reg7iqag1.png?width=1328&format=png&auto=webp&s=2facaa849dd043d78d18804964793a74b7fe1fff

Again, much better IMHO, though the concept of pouring the gas into the cage still escapes the model. It's a good basis, though: I can see myself photobashing a metal tube running from the one on the left to the outlet in the glass cage, erasing the green fog outside the cage, and running the result through I2I with very low denoise.
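To give a rough idea of what "very low denoise" means in practice, here is a minimal sketch of the step arithmetic. It assumes the convention used by Hugging Face diffusers img2img pipelines, where the strength value scales how many of the scheduled steps are actually run on the edited image. Other tools may define denoise differently, and the function name is made up for illustration.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps actually run in img2img, under the assumed diffusers-style rule.

    strength = 1.0 ignores the input image (full denoise);
    strength close to 0 barely touches it.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Only the last `strength` fraction of the schedule is executed.
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    for strength in (0.125, 0.25, 0.5, 1.0):
        print(strength, img2img_steps(40, strength))
```

With the 40-step setting used in this post, a strength of 0.25 would run only the last 10 steps under this rule, which is why a photobashed edit keeps its composition while the seams get blended.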


Prompt #5 : the VHS slasher film cover.


*A cinematic horror movie poster in 1980s slasher style, set in a dark urban alley lit by a single flickering neon sign. In the forefront, a teenage girl in retro-mirror skates looks, freeze mid-motion, her eyes wide mouth and open in a scream. Her outfit is colorful and vintage: striped knee socks, denim shorts, and a T-shirt with bold 80s print. She is dramatically backlit, casting a long shadow across the wet pavement. Towering behind her is the silhouette of a masked killer, wearing a grimy hockey mask that hides his face completely. He wields a long gleaming samurai sword, raised menacingly, the blade catching the light, impaling the girl. On both side of the girl, the wound gushes with blood. The killer's body language is threatening and powerful, while the girl's posture conveys shock and helplessness. The entire composition feels like a horror movie still: mist curling around the street, neon reflections in puddles, posters peeling from walls brick. The colors are highly saturated in 80s horror style — neon pinks, blood reds, sickly greens. At the bottom of the image, bold block letters spell out a fake horror movie title "Horror at Horrorville", though this was a vintage VHS cover.*


Qwen Original:

[This version had no mention of the title due to a human error.](https://preview.redd.it/2i1b0ngjjqag1.png?width=1080&format=png&auto=webp&s=40ef6176256692867fa840ab1a4a72d4ac8bc2ec)


Qwen 2512:

https://preview.redd.it/41frfc9vjqag1.png?width=1328&format=png&auto=webp&s=f908f885dc3c6a8b1987cf7b2cf45dad468752c1

The newer model is better at gore, but it still can't do much in that department. I tried to get it to draw a headless, decapitated orc with its severed neck spewing blood, but it won't.


For reference, here are the best of 16 I got with ZIT for the same prompts (generating 16 images with ZIT takes approximately the same time as generating 4 with Qwen 2512):

[This is the only one where a cellphone wasn't visible.](https://preview.redd.it/q1yed942oqag1.png?width=1024&format=png&auto=webp&s=08dd3e5923cd898da8ce1fd40fac332bd1c07bf1)

https://preview.redd.it/jmn9gpgxpqag1.png?width=1024&format=png&auto=webp&s=868c685311515ddcb7e26bd8edff534e7dd63a0e

https://preview.redd.it/3y9jb1p4qqag1.png?width=1024&format=png&auto=webp&s=2fc73a922836bb7a5eb580d3b5e1d8970435911a

[Actually this one might beat Qwen 2512](https://preview.redd.it/khvcl2xbqqag1.png?width=1024&format=png&auto=webp&s=876b9006b7f9ece23235075d3fcc2999d74c21a7)

https://preview.redd.it/an7dojomrqag1.png?width=1024&format=png&auto=webp&s=a3d2faef2c8752c0f226ee5e233b134773f26c03

While ZIT Turbo is great for its small size, it is weaker at prompt adherence than Qwen 2512. Maybe we need a large model based on ZIT's architecture.
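The equal-time comparison (best of 16 with ZIT against best of 4 with Qwen 2512) can be put into numbers with a small sketch. It assumes each generation independently "succeeds" at following the prompt with a fixed probability, which is a simplification, and the probabilities used are purely illustrative, not measured.

```python
def p_at_least_one(p_single: float, n: int) -> float:
    """Chance that at least one of n independent generations follows the prompt."""
    return 1.0 - (1.0 - p_single) ** n


def break_even_p(p_slow: float, n_slow: int, n_fast: int) -> float:
    """Per-image success rate the faster model needs so that its best-of-n_fast
    matches the slower model's best-of-n_slow (same time budget)."""
    # Solve 1 - (1 - p_fast)**n_fast == 1 - (1 - p_slow)**n_slow for p_fast.
    return 1.0 - (1.0 - p_slow) ** (n_slow / n_fast)


if __name__ == "__main__":
    p_qwen = 0.5  # hypothetical per-image success rate, for illustration only
    print("best-of-4 hit chance:", p_at_least_one(p_qwen, 4))
    print("rate needed for best-of-16 to match:", break_even_p(p_qwen, 4, 16))
```

Under these assumptions, a faster model can get away with a much lower per-image hit rate and still match the slower one within the same time budget, so a slower model only wins the equal-time comparison if its per-image adherence is substantially higher.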


Qwen 2512 is also the first model that handles very complex scenes, for example ones with unusual poses:

*A master samurai performing an acrobatic backflip off a galloping horse, frozen in mid-air at the peak of motion. His body is perfectly balanced and tense, armor plates shifting with the movement, silk cords and fabric trailing behind him. The samurai has his bow fully drawn while upside down, muscles taut, eyes locked with absolute focus on his target.*

*Nearby, a powerful tiger sits calmly yet menacingly on the ground, its massive body coiled with latent strength. Its striped fur is illuminated by dramatic light, eyes sharp and unblinking, watching the airborne warrior with predatory intelligence.*

*The scene takes place in a wild, untamed landscape — tall grass bending under the horse’s charge, dust and leaves suspended in the air, the moment stretched in time. The horse continues forward beneath the samurai, muscles straining, mane flowing, captured mid-stride.*

*The composition emphasizes motion and tension: a dynamic diagonal framing, cinematic depth of field, dramatic lighting with strong contrasts, subtle motion blur on the environment but razor-sharp focus on the samurai and the tiger.*

https://preview.redd.it/rsg287gqtqag1.png?width=1328&format=png&auto=webp&s=b1298fbd7031501a6380167ace6bc1944c44771f

All in all, I'd say there is a significant increase in quality between the August 2025 Qwen model and the December 2025 Qwen model. I hope they keep releasing open-source models that continue this trend of improving quality.

For reference, here are the GPT and NBP results for the last prompt:

https://preview.redd.it/nkwcu2yquqag1.png?width=1024&format=png&auto=webp&s=17b8a1512f94d388f115ec13e8d90a3c80097beb

https://preview.redd.it/ytpwqi4duqag1.png?width=1024&format=png&auto=webp&s=b165618b2cd583d3907310f1c380f55ae676630d

While closed models are still on top, I think the gap is narrowing. At some point it might become too small to matter compared to the advantages of open models, notably the ability to train specific concepts that this board is very interested in and that usually can't be used with online models.

https://redd.it/1q14unh