Shoutout to China, they've given good competition and good local models!
From Qwen to Z-Image now.
Models like Flux and Flux 2.0, which are distilled, lobotomized, and harder to fine-tune, are not worth it anymore.
I'm currently working on a fine-tune of Qwen on 200 images, which I'll release soon.
https://redd.it/1p7vnxo
@rStableDiffusion
The best thing about Z-Image isn't the image quality, the small size, or the NSFW capability. It's that they will also release the non-distilled foundation model to the community.
## ✨ Z-Image
Z-Image is a powerful and highly efficient image generation model with 6B parameters. It currently has three variants:
* 🚀 Z-Image-Turbo – A distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers ⚡️sub-second inference latency⚡️ on enterprise-grade H800 GPUs and fits comfortably within 16G VRAM consumer devices. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
* **🧱 Z-Image-Base – The non-distilled foundation model. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development.**
* ✍️ Z-Image-Edit – A variant fine-tuned on Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing for precise edits based on natural language prompts.
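
As a rough sanity check on the "6B parameters" and "16G VRAM" figures above, here is a back-of-the-envelope sketch of how much memory the raw weights alone would take at a few common precisions. The list of dtypes is my assumption (the model card quoted here doesn't say which precision the VRAM figure refers to), and the estimate ignores activations, the text encoder, and the VAE, so treat it as a lower bound rather than a measurement.

```python
# Back-of-the-envelope weight-memory estimate for a 6B-parameter model.
# Assumption: weights only; activations, text encoder, and VAE are not counted.
PARAMS = 6_000_000_000  # "6B parameters" from the model card

# Standard storage widths in bytes; which one applies here is an assumption.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_gib(params: int, dtype: str) -> float:
    """Return the size of the raw weights in GiB for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gib(PARAMS, dtype):.1f} GiB")
```

Under these assumptions the 16-bit weights come out under 16 GiB while the 32-bit weights do not, which is at least consistent with the consumer-GPU claim.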
**Source:** https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo/
https://redd.it/1p7ykw8
@rStableDiffusion
Guys, Z-Image Can Generate COMICS with Multi-panels!!
Holy cow, I am blown away. Seriously, this model is what Stable Diffusion 3.5 should have been. It can generate a variety of images, including comics! I think if the model is further fine-tuned on comics, it would handle them pretty well. We are almost there! Soon, we can make our own manga!
I have an RTX 3090, and I generate at 1920x1200. It takes 23 seconds to generate, which is insane!
Here is the prompt used for these examples (written by Kimi2-thinking):
A dynamic manga page layout featuring a cyberpunk action sequence, drawn in a gritty seinen style. The page uses stark black and white ink with heavy cross-hatching, Ben-Day dot screentones, and kinetic speed lines.
**Panel 1 (Top, wide establishing shot):** A bustling neon-drenched alleyway in a dystopian metropolis. Towering holographic kanji signs flicker above, casting electric blue and magenta light on wet pavement. The perspective is from a high angle, looking down at the narrow street crowded with food stalls and faceless pedestrians. In the foreground, a mysterious figure in a long coat pushes through the crowd. Heavy rainfall is indicated with fast vertical motion lines and white-on-black sound effects: "ZAAAAAA" across the panel.
**Panel 2 (Below Panel 1, left side, medium close-up):** The figure turns, revealing a young woman with sharp eyes and a cybernetic eye gleaming with data streams. Her face is half-shadowed, jaw clenched. The panel border is irregular and jagged, suggesting tension. Detailed hatching defines her cheekbones, and concentrated screentones create deep shadows. Speed lines radiate from her head. A small speech bubble: "Found you."
**Panel 3 (Below Panel 1, right side, horizontal):** A gloved hand clenches into a fist, hydraulic servos in the knuckles activating with "SH-CHNK" sound effects. The cyborg arm is exposed, showing chrome plating and pulsing fiber-optic cables. Extreme close-up with dramatic foreshortening, deep black shadows, and white highlights catching on metal grooves. Thin panel frame.
**Panel 4 (Center, large vertical panel):** The woman explodes into action, launching from a crouch. Dynamic low-angle perspective (worm's eye view) captures her mid-leap, coat billowing, one leg extended for a flying kick. Her mechanical arm is pulled back, crackling with electricity rendered as bold, jagged white lines. Background dissolves into pure speed lines and speed blurs. The panel borders are slanted diagonally for energy.
**Panel 5 (Bottom left, inset):** Impact frame—her boot connects with a chrome helmet. The enemy's head snaps back, shards of metal flying. Drawn with extreme speed lines radiating from the impact point, negative space reversed (white background with black speed lines). "GA-KOOM!" sound effect in bold, cracked letters dominates the panel.
**Panel 6 (Bottom right, final panel):** The woman lands in a three-point stance on the rain-slicked ground, steam rising from her overheating arm. Low angle shot, her face is tilted up with a fierce smirk. Background shows fallen assailants blurred. Heavy blacks in the shadows, screentones on her coat, and a single white highlight on her cybernetic eye. Panel border is clean and solid, providing a sense of finality.
https://preview.redd.it/3cyjd350vs3g1.png?width=1200&format=png&auto=webp&s=28abcf04cad59c018d325c16d9118fcf90490f0f
The prompt for the second page:
**PAGE 2**
**Panel 1 (Top, wide shot):** The cyborg woman rises to her full height, rainwater streaming down her coat. Steam continues to vent from her arm's exhaust ports with thin, wispy lines. She cracks her neck, head tilted slightly. The perspective is eye-level, showing the alley stretching behind her with three downed assailants lying in twisted heaps. Heavy cross-hatching in the shadows under the neon signs. Sound effect: "GISHI..." (creak). Her speech bubble, small and cold: "...That's all?"
**Panel 2 (Inset, overlapping Panel 1, bottom right):** A tight close-up of her cybernetic eye whirring as the iris aperture contracts. Data streams and targeting reticles flicker in her vision, rendered as thin concentric circles and scrolling vertical text (binary code or garbled kanji) in the screentone. The pupil glows with a faint white highlight. No border, just the eye detail floating over the previous panel.
**Panel 3 (Middle left, vertical):** Her head snaps to the right, eyes wide, rain droplets flying off her hair. Dynamic motion lines arc across the panel. In the blurred background, visible through the downpour, a massive silhouette emerges—heavy tactical armor with a single glowing red optic sensor. The panel border is cracked and fragmented. Sound effect: "ZUUN!" (rumble).
**Panel 4 (Middle right, small):** A booted foot stomps down, cracking the concrete. Thick, jagged cracks radiate from the impact. Extreme foreshortening from a low angle, showing the weight and power. The armor plating is covered in warning stickers and weathered paint. Sound effect: "DOON!" (crash).
**Panel 5 (Bottom, large horizontal spread):** Full reveal of the enemy—an 8-foot tall enforcer droid, bulky and asymmetrical, with a rotary cannon arm and a rusted riot shield. It looms over her, filling the panel. The perspective is from behind the woman's shoulder, low angle, emphasizing its size. Rain sheets down its chassis, white highlights catching on metal edges. In the far background, more red eyes glow in the darkness. The woman's shadow stretches small before it. Sound effect across the top: "GOGOGOGOGO..." (menacing rumble).
**Panel 6 (Bottom right corner, inset):** A tight shot of her face, now smirking dangerously, one eye hidden by wet hair. She raises her mechanical arm, fingers spreading as hidden compartments slide open, revealing glowing energy cores. White-hot light bleeds into the black ink. Her dialogue bubble, sharp and cocky: "Now we're talking."
https://preview.redd.it/n454tt4rvs3g1.png?width=1200&format=png&auto=webp&s=b5a50811918ead8ed3fbbbe74b06a7bc9a423382
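
Both page prompts follow the same pattern: a style preamble, then one paragraph per panel that gives the panel number, its position and shot type in parentheses, and a description. If you want to write pages like this by hand instead of asking an LLM, a minimal sketch of a prompt builder could look like the following. The names `Panel` and `build_page_prompt` are hypothetical helpers made up for illustration; this is not how the prompts in this post were produced (those were written by Kimi2-thinking), and the panel text is shortened from the first page.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    """One comic panel (hypothetical helper type, not from the post)."""
    position: str     # layout hint, e.g. "Top, wide establishing shot"
    description: str  # what is drawn in the panel


def build_page_prompt(style: str, panels: list[Panel]) -> str:
    """Join a style preamble and numbered panel paragraphs into one prompt."""
    lines = [style]
    for number, panel in enumerate(panels, start=1):
        # Same "**Panel N (position):** description" shape as the prompts above.
        lines.append(f"**Panel {number} ({panel.position}):** {panel.description}")
    return "\n".join(lines)


page = build_page_prompt(
    "A dynamic manga page layout featuring a cyberpunk action sequence, "
    "drawn in a gritty seinen style.",
    [
        Panel("Top, wide establishing shot",
              "A bustling neon-drenched alleyway in a dystopian metropolis."),
        Panel("Below Panel 1, left side, medium close-up",
              "The figure turns, revealing a young woman with sharp eyes."),
    ],
)
print(page)
```

Keeping the layout hint separate from the description makes it easy to rearrange panels without renumbering them by hand.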
https://redd.it/1p823jr
@rStableDiffusion
According to Laxhar Labs, the Alibaba Z-Image team intends to do its own official anime fine-tune of Z-Image and has reached out to ask for access to the NoobAI dataset.
https://redd.it/1p856z1
@rStableDiffusion