What makes Z-image so good?
I'm a bit of a noob when it comes to AI and image generation. Mostly I watch different models generate images, like Qwen or SD.
I just use Nano Banana as a hobby.
The question I had was: what makes Z-Image so good? I know it can run efficiently on older GPUs and generate good images, but what prevents other models from doing the same?
tl;dr: what is Z-Image doing differently? Better training, better weights?
Second question: what is the "Z-Image Base" everyone is talking about? Is it the next version of Z-Image?
https://redd.it/1pldusz
@rStableDiffusion
Increase Your Level Of Details With Daemon Details Nodes and Generate Images at 4k With Z Img Turbo with DyPE
https://youtu.be/8rRCfhj4GBo
https://redd.it/1plh6dm
@rStableDiffusion
ComfyUI Tutorial: Increase Your Details and Generate Images at 4K With Z Img Turbo #comfyui
In this tutorial I will show you how to take your images to the next level using two workflows. The first one is based on the DyPE nodes, which let you generate high-resolution images at 4K without upscaling, going beyond the model's training…
Use Qwen3-VL-8B for Image-to-Image Prompting in Z-Image!
Knowing that Z-Image uses Qwen3-VL-4B as its text encoder, I've been using Qwen3-VL-8B as an image-to-prompt model: it writes detailed denoscriptions of images, which I then feed to Z-Image.
I tested all the Qwen3-VL models from 2B to 32B and found that denoscription quality is similar for 8B and above. Z-Image seems to really love long, detailed prompts, and in my testing it just prefers prompts written by the Qwen3 series of models.
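The workflow above (caption an image with a Qwen3-VL model, then paste the denoscription into Z-Image) can be sketched as a small helper that builds an OpenAI-compatible vision chat request. The model name, the instruction text, and the `build_caption_request` helper are all assumptions for illustration; point it at whatever server you use to host Qwen3-VL-8B (e.g. vLLM or LM Studio), not an official API:

```python
import base64

def build_caption_request(image_path: str, model: str = "qwen3-vl-8b") -> dict:
    """Build an OpenAI-compatible chat payload asking a vision model for a
    long, detailed denoscription of an image, suitable as a Z-Image prompt.
    Model name and instruction wording are illustrative assumptions."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # The text part asks for the kind of dense, exhaustive
                # paragraph the post says Z-Image responds well to.
                {"type": "text",
                 "text": ("Denoscribe this image in exhaustive detail: subject, "
                          "clothing, lighting, background, text overlays, and "
                          "overall mood. Write one dense paragraph.")},
                # The image travels inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

The returned dict can be POSTed to any OpenAI-compatible `/v1/chat/completions` endpoint serving the model; the resulting denoscription is then used as the Z-Image prompt.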
P.S. I strongly believe some of the TechLinked videos were used in the training dataset; otherwise it's uncanny how closely Z-Image managed to reproduce the images from the text denoscription alone.
Prompt: "This is a medium shot of a man, identified by a lower-third graphic as Riley Murdock, standing in what appears to be a modern studio or set. He has dark, wavy hair, a light beard and mustache, and is wearing round, thin-framed glasses. He is directly looking at the viewer. He is dressed in a simple, dark-colored long-sleeved crewneck shirt. His expression is engaged and he appears to be speaking, with his mouth slightly open. The background is a stylized, colorful wall composed of geometric squares in various shades of blue, white, and yellow-orange, arranged in a pattern that creates a sense of depth and visual interest. A solid orange horizontal band runs across the upper portion of the background. In the lower-left corner, a graphic overlay displays the name "RILEY MURDOCK" in bold, orange, sans-serif capital letters on a white rectangular banner, which is accented with a colorful, abstract geometric design to its left. The lighting is bright and even, typical of a professional video production, highlighting the subject clearly against the vibrant backdrop. The overall impression is that of a presenter or host in a contemporary, upbeat setting. Riley Murdock, presenter, studio, modern, colorful background, geometric pattern, glasses, dark shirt, lower-third graphic, video production, professional, engaging, speaking, orange accent, blue and yellow wall."
Original Screenshot
Image generated from the text denoscription alone
Image generated from the text denoscription alone
Image generated from the text denoscription alone
https://redd.it/1pli1np
@rStableDiffusion
Excuse me, WHO MADE THIS NODE??? Please elaborate, how can we use this node?
https://redd.it/1pli7p9
@rStableDiffusion
The upcoming Z-image base will be a unified model that handles both image generation and editing.
https://redd.it/1pllpaf
@rStableDiffusion
Creating a person LoRA for Z-Image Turbo, for beginners, with AI-Toolkit
https://redd.it/1plojo7
@rStableDiffusion
Just a quick PSA: delete your ComfyUI prefs after big updates.
I noticed that the new theme was quite different from the copy I had made (I had set it to show nodes as boxes), and thought to myself that perhaps the default settings are different now too.
So I deleted my prefs and, sure enough, a lot of strange issues I was having just disappeared. I just wish I had done this before filling out the survey... some of my complaints won't make sense to them 🤦
https://redd.it/1plp7ye
@rStableDiffusion
Announcing The Release of Qwen 360 Diffusion, The World's Best 360° Text-to-Image Model
https://www.reddit.com/gallery/1pltzay
https://redd.it/1plv6ry
@rStableDiffusion
Announcing The Release of Qwen 360 Diffusion, The World's Best 360° Text-to-Image Model
https://redd.it/1pltzay
@rStableDiffusion