r/StableDiffusion – Telegram
What makes Z-image so good?

Im a bit of a noob when it comes to AI and image generation. Mostly watching different models generating images like qwen or sd.
I just use Nano banana for hobby.

Question i had was what makes Z-image so good? I know it can run efficiently on older gpus and generate good images but what prevents other models from doing the same.

tldr : what is Z-image doing differently?
Better training , better weights?

Question : what is the Z-image base what everyone is talking about? Next version of z-image



https://redd.it/1pldusz
@rStableDiffusion
Use Qwen3-VL-8B for Image-to-Image Prompting in Z-Image!

Knowing that Z-image used Qwn3-VL-4B as a text encoder. So, I've been using Qwen3-VL-8B as an image-to-image prompt to write detailed denoscriptions of images and then feed it to Z-image.

I tested all the Qwen-3-VL models from the 2B to 32B, and found that the denoscription quality is similar for 8B and above. Z-image seems to really love long detailed prompts, and in my testing, it just prefers prompts by the Qwen3 series of models.

P.S. I strongly believe that some of the TechLinked videos were used in the training dataset, otherwise it's uncanny how much Z-image managed to reproduced the images from text denoscription alone.

Prompt: "This is a medium shot of a man, identified by a lower-third graphic as Riley Murdock, standing in what appears to be a modern studio or set. He has dark, wavy hair, a light beard and mustache, and is wearing round, thin-framed glasses. He is directly looking at the viewer. He is dressed in a simple, dark-colored long-sleeved crewneck shirt. His expression is engaged and he appears to be speaking, with his mouth slightly open. The background is a stylized, colorful wall composed of geometric squares in various shades of blue, white, and yellow-orange, arranged in a pattern that creates a sense of depth and visual interest. A solid orange horizontal band runs across the upper portion of the background. In the lower-left corner, a graphic overlay displays the name "RILEY MURDOCK" in bold, orange, sans-serif capital letters on a white rectangular banner, which is accented with a colorful, abstract geometric design to its left. The lighting is bright and even, typical of a professional video production, highlighting the subject clearly against the vibrant backdrop. The overall impression is that of a presenter or host in a contemporary, upbeat setting. Riley Murdock, presenter, studio, modern, colorful background, geometric pattern, glasses, dark shirt, lower-third graphic, video production, professional, engaging, speaking, orange accent, blue and yellow wall."

Original Screenshot

Image generated from text Denoscription alone

Image generated from text Denoscription alone

Image generated from text Denoscription alone

https://redd.it/1pli1np
@rStableDiffusion
Excuse me, WHO MADE THIS NODE??? Please elaborate, how can we use this node?
https://redd.it/1pli7p9
@rStableDiffusion
The upcoming Z-image base will be a unified model that handles both image generation and editing.
https://redd.it/1pllpaf
@rStableDiffusion
Just a quick PSA. Delete your ComfyUI prefs after big updates.

I had noticed that the new theme was quite different from the copy I had made. (Had set it to show nodes as boxes). And thought to myself, perhaps default settings are different now too.

So I deleted my prefs and, sure enough, a lot of strange issues I was having just disappeared. Just wish I had done this before filling out the survey... some of my complaints won't make sense to them 🤦

https://redd.it/1plp7ye
@rStableDiffusion