r/StableDiffusion – Telegram
Local Dream 2.2.0 - batch mode and history

The new version of Local Dream has been released, with two new features:
- you can also perform (linear) batch generation,
- you can review and save previously generated images, per model!

The new version can be downloaded for Android from here:
https://github.com/xororz/local-dream/releases/tag/v2.2.0

https://redd.it/1omhokh
@rStableDiffusion
Is SD 1.5 still relevant? Are there any cool models?

The other day I was testing the stuff I generated on old infrastructure of the company (for one year and half the only infrastructure we had was a single 2080 Ti...) and now with the more advanced infrastructure we have, something like SDXL (Turbo) and SD 1.5 will cost next to nothing.

But I'm afraid with all these new advanced models, these models aren't as satisfying as the past. So here I just ask you, if you still use these models, which checkpoints are you using?

https://redd.it/1omkh9h
@rStableDiffusion
It turns out WDDM driver mode is making our RAM - GPU transfer extremely slower compared to TCC or MCDM mode. Anyone has figured out the bypass NVIDIA software level restrictions?

We have noticed this issue while I was working on Qwen Images models training.

We are getting massive speed loss when we do big data transfer between RAM and GPU on Windows compared to Linux. It is all due to Block Swapping.

The hit is such a big scale that Linux runs 2x faster than Windows even more.

Tests are made on same : GPU RTX 5090

You can read more info here : https://github.com/kohya-ss/musubi-tuner/pull/700

It turns out if we enable TCC mode on Windows, it gets equal speed as Linux.

However NVIDIA blocked this at driver level.

I found a Chinese article with just changing few letters, via Patching nvlddmkm.sys, the TCC mode fully becomes working on consumer GPUs. However this option is extremely hard and complex for average users.

Everything I found says it is due to driver mode WDDM

Moreover it seems like Microsoft added this feature : MCDM

https://learn.microsoft.com/en-us/windows-hardware/drivers/display/mcdm-architecture

And as far as I understood, MCDM mode should be also same speed.

Anyone managed to fix this issue? Able to set mode to MCDM or TCC on consumer GPUs?

This is a very hidden issue on the community. This would probably speed up inference as well.

Usin WSL2 makes absolutely 0 difference. I tested.

https://redd.it/1ommmek
@rStableDiffusion