To be very clear: as good as it is, Z-Image is NOT multi-modal or auto-regressive. There is NO difference whatsoever in how it uses Qwen relative to how other models use T5 / Mistral / etc. It DOES NOT "think" about your prompt and it never will. It is a standard diffusion model in all ways.
A lot of people seem extremely confused about this and appear convinced that Z-Image is something it isn't and never will be. The somewhat misleadingly worded blurbs (perhaps intentionally, perhaps not) on various parts of the Z-Image Hugging Face page are mostly to blame.
TL;DR: it loads Qwen the SAME way that any other model loads any other text encoder. It's purely processing the prompt text, with absolutely none of the typical Qwen chat-format personality being "alive". This is why, for example, it also cannot refuse prompts that Qwen certainly would refuse if you had it loaded in a conventional chat context in Ollama or LM Studio.
https://redd.it/1pm5vw0
@rStableDiffusion
It turns out that weight size matters quite a lot with Kandinsky 5
fp8
bf16
Sorry for the boring video. I initially set out to do some basics with CFG on the Pro 5s T2V model, and someone asked which quant I was using, so I did this comparison while I was at it. This is the same seed and settings; the only difference here is fp8 vs bf16. I'm used to most models having small accuracy issues, but this is practically a whole different result, so I thought I'd pass this along here.
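For a rough sense of how far apart the two formats are numerically, here is a minimal sketch that rounds a single value to the mantissa width of each. It assumes "fp8" means the E4M3 variant (3 explicit mantissa bits), which the post does not state, and it ignores exponent range, saturation and subnormals. It does not reproduce anything about Kandinsky 5 itself; the helper names and the sample weight are made up for illustration.

```python
import math

# Explicit mantissa bits per format. fp8 is ASSUMED to be E4M3 here;
# the post does not say which fp8 variant was used.
MANTISSA_BITS = {"fp8_e4m3": 3, "bf16": 7}


def round_to_mantissa(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with the given explicit mantissa bits.

    Simplification: exponent range, saturation and subnormals are ignored.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)  # +1 accounts for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


def relative_error(x: float, mantissa_bits: int) -> float:
    """Relative rounding error of x at the given mantissa width."""
    return abs(round_to_mantissa(x, mantissa_bits) - x) / abs(x)


if __name__ == "__main__":
    weight = 0.3  # hypothetical weight value, not taken from any real model
    for name, bits in MANTISSA_BITS.items():
        rounded = round_to_mantissa(weight, bits)
        print(f"{name}: {rounded!r} (relative error {relative_error(weight, bits):.4%})")
```

Under these assumptions the worst-case relative rounding error per weight is 2^-4 (about 6%) for fp8 E4M3 versus 2^-8 (about 0.4%) for bf16. Real fp8 checkpoints often use scaling factors, so treat this as intuition rather than a measurement.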
Workflow: https://pastebin.com/daZdYLAv
edit: Crap! I uploaded the wrong video for bf16, this is the proper one:
proper bf16
https://redd.it/1pm4y7t
@rStableDiffusion
Simplest method to increase the variation in Z-Image Turbo
from https://www.bilibili.com/video/BV1Z7m2BVEH2/
Add a new KSampler in front of the original KSampler. The new one uses the ddim_uniform scheduler and runs only one step; everything else remains unchanged.
https://preview.redd.it/i7b9dajcd47g1.png?width=1688&format=png&auto=webp&s=8555bc28187e53edf922a1baaf7014b694415708
Test: 15 images generated from the same prompt.
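The two-sampler setup described above can be sketched against a ComfyUI API-format workflow (a dict of nodes keyed by id). This is only one reading of the post: the helper name, the node ids, and every value in the example fragment (seed, cfg, sampler, step count) are hypothetical, and "everything else unchanged" is interpreted as copying the original KSampler's inputs verbatim.

```python
import copy


def prepend_one_step_sampler(workflow: dict, sampler_id: str, new_id: str) -> dict:
    """Return a copy of the workflow with a 1-step ddim_uniform KSampler
    inserted in front of the KSampler node `sampler_id`.

    Hypothetical helper: it only edits the dict, it does not talk to ComfyUI.
    """
    wf = copy.deepcopy(workflow)
    original = wf[sampler_id]

    # The new sampler copies all inputs of the original one ...
    pre = copy.deepcopy(original)
    # ... except for the scheduler and the step count, as described in the post.
    pre["inputs"]["scheduler"] = "ddim_uniform"
    pre["inputs"]["steps"] = 1
    wf[new_id] = pre

    # The original sampler now starts from the new sampler's LATENT output (index 0).
    original["inputs"]["latent_image"] = [new_id, 0]
    return wf


if __name__ == "__main__":
    # Hypothetical workflow fragment; ids and values are placeholders.
    workflow = {
        "3": {
            "class_type": "KSampler",
            "inputs": {
                "model": ["1", 0],
                "positive": ["6", 0],
                "negative": ["7", 0],
                "latent_image": ["5", 0],
                "seed": 42,
                "steps": 8,
                "cfg": 1.0,
                "sampler_name": "euler",
                "scheduler": "simple",
                "denoise": 1.0,
            },
        }
    }
    patched = prepend_one_step_sampler(workflow, sampler_id="3", new_id="30")
    print(patched["30"]["inputs"]["scheduler"], patched["3"]["inputs"]["latent_image"])
```

Whether the two samplers should share a seed or a denoise value is not stated in the post, so the sketch leaves both as they were in the original node.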
https://redd.it/1pm82hf
@rStableDiffusion
Bilibili
Major premiere: a dedicated scheduler for Z-image (Bilibili video)