r/StableDiffusion – Telegram
Do you still use older models?

Who here still uses older models, and what for? I still get a ton of use out of SD 1.4 and 1.5. They make great start images.

https://redd.it/1pkwms0
@rStableDiffusion
Z-Image: A bit of prompt engineering (prompt included)
https://redd.it/1pl6cxv
@rStableDiffusion
Chroma on itself kinda sux due to speed and image quality. Z-image kinda sux regarding artistic styles. both of them together kinda rules. small 768x1024 10 steps chroma image and 2 k zimage refiner.

https://redd.it/1plaaeo
@rStableDiffusion
What makes Z-image so good?

Im a bit of a noob when it comes to AI and image generation. Mostly watching different models generating images like qwen or sd.
I just use Nano banana for hobby.

Question i had was what makes Z-image so good? I know it can run efficiently on older gpus and generate good images but what prevents other models from doing the same.

tldr : what is Z-image doing differently?
Better training , better weights?

Question : what is the Z-image base what everyone is talking about? Next version of z-image



https://redd.it/1pldusz
@rStableDiffusion
Use Qwen3-VL-8B for Image-to-Image Prompting in Z-Image!

Knowing that Z-image used Qwn3-VL-4B as a text encoder. So, I've been using Qwen3-VL-8B as an image-to-image prompt to write detailed denoscriptions of images and then feed it to Z-image.

I tested all the Qwen-3-VL models from the 2B to 32B, and found that the denoscription quality is similar for 8B and above. Z-image seems to really love long detailed prompts, and in my testing, it just prefers prompts by the Qwen3 series of models.

P.S. I strongly believe that some of the TechLinked videos were used in the training dataset, otherwise it's uncanny how much Z-image managed to reproduced the images from text denoscription alone.

Prompt: "This is a medium shot of a man, identified by a lower-third graphic as Riley Murdock, standing in what appears to be a modern studio or set. He has dark, wavy hair, a light beard and mustache, and is wearing round, thin-framed glasses. He is directly looking at the viewer. He is dressed in a simple, dark-colored long-sleeved crewneck shirt. His expression is engaged and he appears to be speaking, with his mouth slightly open. The background is a stylized, colorful wall composed of geometric squares in various shades of blue, white, and yellow-orange, arranged in a pattern that creates a sense of depth and visual interest. A solid orange horizontal band runs across the upper portion of the background. In the lower-left corner, a graphic overlay displays the name "RILEY MURDOCK" in bold, orange, sans-serif capital letters on a white rectangular banner, which is accented with a colorful, abstract geometric design to its left. The lighting is bright and even, typical of a professional video production, highlighting the subject clearly against the vibrant backdrop. The overall impression is that of a presenter or host in a contemporary, upbeat setting. Riley Murdock, presenter, studio, modern, colorful background, geometric pattern, glasses, dark shirt, lower-third graphic, video production, professional, engaging, speaking, orange accent, blue and yellow wall."

Original Screenshot

Image generated from text Denoscription alone

Image generated from text Denoscription alone

Image generated from text Denoscription alone

https://redd.it/1pli1np
@rStableDiffusion
Excuse me, WHO MADE THIS NODE??? Please elaborate, how can we use this node?
https://redd.it/1pli7p9
@rStableDiffusion
The upcoming Z-image base will be a unified model that handles both image generation and editing.
https://redd.it/1pllpaf
@rStableDiffusion