Quick PSA: the stable-diffusion.cpp implementation of z-image is up to 2x faster than the ComfyUI implementation on some cards.
I was first alerted to this by someone in a Discord server, and I felt it was important enough to share. There's a good chance this only applies to 20 series cards (and maybe some other non-bf16 cards), but if you're interested I suggest giving it a try. This is very informal; I'm just reporting my experience in case it's useful to others.
With my 2060 6GB and the fp16 hack I get about 7.5-8s/it on z-image with comfyui. I've tried several different speedups like cache-dit and the gguf nodes (to limit offloading), but they either look noticeably worse (cache-dit) or make no difference (gguf).
Now with StableDiffusioncpp I'm getting 4s/it, nearly a 2x speed increase without any noticeable quality degradation.
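To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the s/it figures reported in this post and the 8 steps from the command; the function names are made up for illustration, and the totals cover the sampling loop only (model loading, text encoding and VAE decode are not included and were not measured here).

```python
# Figures reported in the post (seconds per iteration on a 2060 6GB).
COMFYUI_S_PER_IT = (7.5, 8.0)  # reported range in ComfyUI with the fp16 hack
SDCPP_S_PER_IT = 4.0           # reported with stable-diffusion.cpp
STEPS = 8                      # --steps value from the command in the post


def speedup(baseline_s_per_it: float, new_s_per_it: float) -> float:
    """Return how many times faster the new s/it is than the baseline."""
    return baseline_s_per_it / new_s_per_it


def sampling_seconds(s_per_it: float, steps: int) -> float:
    """Return time spent in the sampling loop only (hypothetical helper)."""
    return s_per_it * steps


if __name__ == "__main__":
    for comfy in COMFYUI_S_PER_IT:
        print(
            f"{comfy} s/it -> {SDCPP_S_PER_IT} s/it: "
            f"{speedup(comfy, SDCPP_S_PER_IT):.2f}x faster, "
            f"{sampling_seconds(comfy, STEPS):.0f}s vs "
            f"{sampling_seconds(SDCPP_S_PER_IT, STEPS):.0f}s for {STEPS} steps"
        )
```

So the reported range works out to somewhere between 1.875x and 2x, or roughly half a minute saved per 8-step image on this card.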
I'm using the newest (at the time of writing) versions of both. Comfy is the portable version, while I compiled sdcpp myself (troubleshooting the precompiled CUDA binaries was too much work).
I assume the difference comes primarily from more effective use of the gguf format and better offloading. It's also possible that the way sdcpp handles the fp32 upcasting required on 20 series cards is more efficient than what the comfyui hack does.
The big drawback here is obviously the lack of flexibility compared to comfy, but for me at least it's a big enough gain that I'll probably slop together a wrapper node for it at some point (if it doesn't already exist).
Here's the command I'm using for sdcpp:
.\stable-diffusion.cpp\build\bin\Release\sd.exe --diffusion-model ./z_image_turbo-Q4_K.gguf --vae ./ae.sft --llm ./qwen-4b-zimage-heretic-q8.gguf -p "a photo of a skeleton wearing tophat and a black t-shirt with white text that says '4s/it on a 2060' in a cursive font" --cfg-scale 1.0 -v --offload-to-cpu --diffusion-fa -H 1024 -W 1024 --steps 8
To get this image:
https://preview.redd.it/um059qe04u4g1.png?width=1024&format=png&auto=webp&s=9c79951152a301aabd5b8b0098fb512cb7cd127a
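If you'd rather drive this from a script than retype the command, something like the following Python sketch might be a starting point. It is not a ComfyUI node and not part of sdcpp; `build_sd_command` and `generate` are hypothetical helpers, and the only flags used are the ones from the command above. Paths are whatever they are on your machine.

```python
import subprocess


def build_sd_command(
    sd_exe: str,
    diffusion_model: str,
    vae: str,
    llm: str,
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 8,
    cfg_scale: float = 1.0,
) -> list[str]:
    """Return the argv list for one sd.exe run, mirroring the command in the post."""
    return [
        sd_exe,
        "--diffusion-model", diffusion_model,
        "--vae", vae,
        "--llm", llm,
        "-p", prompt,
        "--cfg-scale", str(cfg_scale),
        "-v",
        "--offload-to-cpu",
        "--diffusion-fa",
        "-H", str(height),
        "-W", str(width),
        "--steps", str(steps),
    ]


def generate(cmd: list[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    # A list (not a shell string) means the prompt needs no extra quoting.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_sd_command(
        sd_exe=r".\stable-diffusion.cpp\build\bin\Release\sd.exe",
        diffusion_model="./z_image_turbo-Q4_K.gguf",
        vae="./ae.sft",
        llm="./qwen-4b-zimage-heretic-q8.gguf",
        prompt="a photo of a skeleton wearing tophat",
    )
    # Only print here; call generate(cmd) once the paths point at real files.
    print(cmd)
```

Building the argument list separately from running it makes it easy to check what will be executed before spending a minute of GPU time on it.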
You might notice that I'm using the heretic version of qwen. This is simply because I already had the gguf version downloaded from when I tested it previously. In practice it makes a very mild difference.
Instructions for setting up z-image in stablediffusioncpp are here: https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/z_image.md
Huge props to leejet and everyone else who works on stablediffusioncpp.
https://redd.it/1pchpjb
@rStableDiffusion