Local Dream 2.2.0 - batch mode and history
The new version of Local Dream has been released, with two new features:
- you can now perform (serial) batch generation,
- you can review and save previously generated images, per model!
The new version can be downloaded for Android from here:
https://github.com/xororz/local-dream/releases/tag/v2.2.0
https://redd.it/1omhokh
@rStableDiffusion
GitHub
Release v2.2.0 · xororz/local-dream
Support single-click serial batch generation feature (#111)
Added generation history feature for each model, displaying only the most recent 100 images
Display the most recent 20 historical generat...
Is SD 1.5 still relevant? Are there any cool models?
The other day I was testing the stuff I had generated on the company's old infrastructure (for a year and a half the only infrastructure we had was a single 2080 Ti...), and on the more advanced infrastructure we have now, running something like SDXL (Turbo) or SD 1.5 costs next to nothing.
But I'm afraid that, next to all these new advanced models, the old ones aren't as satisfying as they used to be. So I'm asking: if you still use these models, which checkpoints are you using?
https://redd.it/1omkh9h
@rStableDiffusion
It turns out WDDM driver mode makes our RAM-to-GPU transfers extremely slow compared to TCC or MCDM mode. Has anyone figured out how to bypass NVIDIA's software-level restrictions?
I noticed this issue while working on Qwen Image model training.
We get a massive speed loss on Windows compared to Linux whenever big data transfers happen between RAM and GPU, which is exactly what block swapping does.
The hit is so big that Linux runs 2x faster than Windows, or even more.
Tests were made on the same GPU: an RTX 5090.
You can read more here: https://github.com/kohya-ss/musubi-tuner/pull/700
It turns out that if we enable TCC mode on Windows, we get the same speed as Linux.
However, NVIDIA has blocked this at the driver level.
I found a Chinese article showing that, by patching just a few bytes in nvlddmkm.sys, TCC mode becomes fully functional on consumer GPUs. That approach is far too hard and complex for average users, though.
Everything I found says the slowdown is due to the WDDM driver mode.
Moreover, it seems Microsoft has added a newer driver model, MCDM:
https://learn.microsoft.com/en-us/windows-hardware/drivers/display/mcdm-architecture
As far as I understand, MCDM mode should give the same speed as well.
Has anyone managed to fix this issue? Has anyone been able to set MCDM or TCC mode on a consumer GPU?
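For reference, nvidia-smi can report which driver model each GPU is currently using (the driver_model query fields are Windows-only); a minimal Python sketch, assuming nvidia-smi is on PATH:

import subprocess

# Ask the driver which model (WDDM/TCC) each GPU runs now and after reboot.
# driver_model.current / driver_model.pending are Windows-only query fields.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,driver_model.current,driver_model.pending",
     "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)

# Switching a GPU to TCC ("nvidia-smi -i 0 -dm TCC", run as admin) is the
# documented route, but the driver rejects it on most consumer GeForce cards,
# which is exactly the restriction being asked about here.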
This is a very little-known issue in the community; fixing it would probably speed up inference as well.
Using WSL2 makes absolutely zero difference. I tested it.
https://redd.it/1ommmek
@rStableDiffusion
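To reproduce the transfer gap yourself, here is a rough PyTorch sketch (my own illustration, not from the thread; it assumes a CUDA build of torch, and the buffer size and iteration count are arbitrary) that times host-to-GPU copies from pageable versus pinned memory, the distinction the pinned-memory PR below is built around:

import torch

def h2d_gbps(src: torch.Tensor, iters: int = 10) -> float:
    # Time repeated host->device copies with CUDA events and return GB/s.
    dst = torch.empty_like(src, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        dst.copy_(src, non_blocking=True)
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000.0  # elapsed_time is in ms
    return src.numel() * src.element_size() * iters / seconds / 1e9

buf = torch.randn(64 * 1024 * 1024)  # 256 MiB float32 host buffer (pageable)
print("pageable:", round(h2d_gbps(buf), 1), "GB/s")
print("pinned:  ", round(h2d_gbps(buf.pin_memory()), 1), "GB/s")

On a WDDM system the pageable figure typically falls well short of the pinned one; closing that gap is what the pinned-memory option in the PR below is for.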
GitHub
feat: add use_pinned_memory option for block swap in multiple models by kohya-ss · Pull Request #700 · kohya-ss/musubi-tuner
Add --use_pinned_memory_for_block_swap for each training script to enable pinned memory. Will work with Windows and Linux, but tested with Windows only.
Qwen-Image fine tuning is tested.