It turns out WDDM driver mode makes RAM-to-GPU transfers far slower than TCC or MCDM mode. Has anyone figured out how to bypass NVIDIA's software-level restriction?
We noticed this issue while I was working on Qwen Image model training.
Big data transfers between RAM and the GPU are massively slower on Windows than on Linux, and block swapping makes the hit very visible.
The hit is so large that Linux runs 2x faster than Windows, or even more.
Tests were made on the same GPU: an RTX 5090.
You can read more here: https://github.com/kohya-ss/musubi-tuner/pull/700
It turns out that enabling TCC mode on Windows brings the speed up to par with Linux.
However, NVIDIA blocks TCC on consumer GPUs at the driver level.
I found a Chinese article showing that patching a few bytes in nvlddmkm.sys makes TCC mode fully work on consumer GPUs, but that approach is far too hard and risky for average users.
Everything I found points to the WDDM driver mode as the cause.
Moreover, it seems Microsoft has added a newer driver model, MCDM:
https://learn.microsoft.com/en-us/windows-hardware/drivers/display/mcdm-architecture
As far as I understand, MCDM mode should reach the same speed as well.
Has anyone managed to fix this issue, i.e., set a consumer GPU to MCDM or TCC mode?
This issue is largely hidden in the community, and fixing it would probably speed up inference as well.
Using WSL2 makes absolutely zero difference; I tested it.
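For context, the PR linked above speeds things up by using pinned (page-locked) host memory for block swap. Here is a minimal sketch, not taken from that PR, for measuring the gap between pageable and pinned host-to-device copies on your own machine (assumes PyTorch with CUDA):

```python
# Rough sketch (not from the PR): compare pageable vs. pinned host-to-device copies.
# Assumes a CUDA-capable GPU and PyTorch installed; results vary by OS and driver mode.
import time
import torch

def h2d_time(tensor, iters=20):
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        tensor.to("cuda", non_blocking=True)  # host-to-device copy
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

size = 256 * 1024 * 1024  # ~1 GiB of float32
pageable = torch.empty(size, dtype=torch.float32)             # ordinary pageable RAM
pinned = torch.empty(size, dtype=torch.float32).pin_memory()  # page-locked RAM

print(f"pageable: {h2d_time(pageable) * 1000:.1f} ms/copy")
print(f"pinned:   {h2d_time(pinned) * 1000:.1f} ms/copy")
```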
https://redd.it/1ommmek
@rStableDiffusion
GitHub
feat: add use_pinned_memory option for block swap in multiple models by kohya-ss · Pull Request #700 · kohya-ss/musubi-tuner
Add --use_pinned_memory_for_block_swap for each training script to enable pinned memory. Will work with Windows and Linux, but tested with Windows only.
Qwen-Image fine tuning is tested.
Updates on the ComfyUI-integrated video editor; I'd love to hear your opinion
https://reddit.com/link/1omn0c6/video/jk40xjl7nvyf1/player
"Hey everyone, I'm the cofounder of **Gausian** with u/maeng31
2 weeks ago, I shared a demo of my AI video editor web app, the feedback was loud and clear: **make it local, and make it open source.** That's exactly what I've been heads-down building.
I'm now deep in development on a **ComfyUI-integrated desktop editor** built with Rust/Tauri. The goal is to open-source it as soon as the MVP is ready for launch.
The Core Idea: Structured Storytelling
I started this project because I found ComfyUI great for **generation** but terrible for **storytelling**. We need a way to go easily from a narrative idea to a final sequence.
**Gausian connects the whole pre-production pipeline with your ComfyUI generation flows:**
* **Screenplay & Storyboard:** Create a script/screenplay and visually plan your scenes with a linked storyboard.
* **ComfyUI Integration:** Send a specific prompt/scene description from a storyboard panel directly to your local ComfyUI instance (a minimal sketch of this call follows the list).
* **Timeline:** The generated video automatically lands in the correct sequence and position on the timeline, giving you an instant rough cut.
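For anyone wondering what that hand-off to ComfyUI can look like in practice, here is a minimal sketch. It assumes a default local ComfyUI instance on 127.0.0.1:8188 and a workflow exported via "Save (API Format)"; this is my own illustration, not Gausian's actual code:

```python
# Minimal sketch of queueing a workflow on a local ComfyUI instance.
# Assumes ComfyUI runs on 127.0.0.1:8188 and "workflow_api.json" was exported
# from ComfyUI via "Save (API Format)". Not Gausian's actual code.
import json
import urllib.request

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST a workflow graph to ComfyUI's /prompt endpoint and return its JSON response."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"http://{host}/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Patch the positive-prompt node with a storyboard panel's description.
# Node id "6" is just an example; the id depends on your exported graph.
workflow["6"]["inputs"]["text"] = "wide shot, rainy neon street, hero walks toward camera"

print(queue_prompt(workflow))  # e.g. {"prompt_id": "..."} on success
```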
https://redd.it/1omn0c6
@rStableDiffusion
OneTrainer config for Illustrious
As the title suggests, I'm still new to training and hoping someone has a OneTrainer configuration file I could start with. I'm looking to train a LoRA of a specific realistic face on a 4070 Super / 32 GB RAM.
https://redd.it/1omj6cr
@rStableDiffusion
Any ideas how to achieve high-quality video-to-anime transformations?
https://redd.it/1omv63f
@rStableDiffusion
Wan2.2 FLF used for VFX clothing changes - There's a very interesting fact in the post about the Tuxedo.
https://redd.it/1on0v6v
@rStableDiffusion
Qwen Image Edit lens-conversion LoRA test
https://preview.redd.it/bvwqoofaqzyf1.jpg?width=3666&format=pjpg&auto=webp&s=5090a938dbee41e249840760d7cbc3a3edecf4fa
https://preview.redd.it/q7gsql7hqzyf1.jpg?width=1970&format=pjpg&auto=webp&s=c55a1fd1db5080258a567ca1572829e42e55a543
Today, I'd like to share a very interesting LoRA for Qwen Edit, shared by an expert named Big Xiong. This LoRA lets us move the camera up, down, left, and right, rotate it left or right, and tilt it into a top-down or upward view. The camera can also be switched to a wide-angle or close-up lens.
**Model link**: https://huggingface.co/dx8152/Qwen-Edit-2509-Multiple-angles
**Workflow download**: https://civitai.com/models/2096307/qwen-edit2509-multi-angle-storyboard-direct-output
The pictures above show tests of ten different camera prompts, the first being "Move the camera forward."; the rest are listed below, followed by a minimal usage sketch.
* Move the camera left.
* Move the camera right.
* Move the camera down.
* Rotate the camera 45 degrees to the left.
* Rotate the camera 45 degrees to the right.
* Turn the camera to a top-down view.
* Turn the camera to an upward angle.
* Turn the camera to a wide-angle lens.
* Turn the camera to a close-up.
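Outside of ComfyUI, loading the LoRA could look roughly like the sketch below with Hugging Face diffusers. This is unverified: it assumes your diffusers version resolves a pipeline for Qwen-Image-Edit-2509 via DiffusionPipeline.from_pretrained, that the LoRA file works with load_lora_weights, and argument names may differ between versions.

```python
# Unverified sketch: applying the multi-angle LoRA with diffusers instead of ComfyUI.
# Assumes the LoRA is diffusers-compatible and your diffusers build supports
# Qwen-Image-Edit-2509; argument names may differ between versions.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("dx8152/Qwen-Edit-2509-Multiple-angles")

source = load_image("input.jpg")
prompts = [
    "Move the camera forward.",
    "Move the camera left.",
    "Rotate the camera 45 degrees to the right.",
    "Turn the camera to a top-down view.",
]
for i, prompt in enumerate(prompts):
    result = pipe(image=source, prompt=prompt, num_inference_steps=30).images[0]
    result.save(f"angle_{i}.png")
```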
https://redd.it/1on560y
@rStableDiffusion
How do you curate your mountains of generated media?
Until recently, I have just deleted any image or video I've generated that doesn't directly fit into a current project. Now though, I'm setting aside anything I deem "not slop" with the notion that maybe I can make use of it in the future. Suddenly I have hundreds of files and no good way to navigate them.
I could auto-caption these and slap together a simple database, but surely this is an already-solved problem. Google and LLMs show me many options for managing image and video libraries. Are there any that stand above the rest for this use case? I'd like something lightweight that can just ingest the media and the metadata and then allow me to search it meaningfully without much fuss.
How do others manage their "not slop" collection?
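For the "auto-caption and slap together a simple database" route, here is a minimal sketch using only the Python standard library; caption_image is a hypothetical stand-in for whatever captioner you run (BLIP, Florence-2, a vision LLM, etc.):

```python
# Sketch: index generated media by caption in SQLite FTS5 for keyword search.
# caption_image() is a hypothetical placeholder for your captioning model of choice.
import sqlite3
from pathlib import Path

def caption_image(path: Path) -> str:
    # Placeholder: swap in a real BLIP / Florence-2 / LLM caption call here.
    return path.stem.replace("_", " ")

db = sqlite3.connect("media_index.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS media USING fts5(path, caption)")

for path in Path("not_slop").rglob("*"):
    if path.suffix.lower() in {".png", ".jpg", ".webp", ".mp4"}:
        db.execute("INSERT INTO media VALUES (?, ?)", (str(path), caption_image(path)))
db.commit()

# Keyword search, e.g. everything whose caption mentions both "castle" and "sunset":
for (path,) in db.execute("SELECT path FROM media WHERE media MATCH ?", ("castle sunset",)):
    print(path)
```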
https://redd.it/1on7h64
@rStableDiffusion
Flux Gym updated (fluxgymbuckets)
I updated my fork of Flux Gym:
[https://github.com/FartyPants/fluxgymbucket](https://github.com/FartyPants/fluxgymbucket)
I was surprised to realise that the original code would often skip some of the images: I had 100 images, but Flux Gym collected only 70. It isn't obvious unless you look in the dataset directory.
It comes down to how the collection code was written, which was questionable.
The new code is more robust and does what it is supposed to do.
You only need [app.py](https://github.com/FartyPants/fluxgymbucket/blob/main/app.py); that's where all the changes are (back up your original and just drop the new one in).
As before, this version also fixes other things regarding buckets and resizing; it's described in the readme.
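To illustrate the kind of pitfall described above (this is not the actual fluxgym code): a naive glob that only matches lowercase .jpg/.png silently drops .JPG, .jpeg, or .webp files, while a more robust collection pass looks roughly like this:

```python
# Illustrative sketch, not the actual fluxgym code: collect every image in a
# dataset folder regardless of extension case, and report what was skipped.
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}

def collect_images(dataset_dir: str) -> list[Path]:
    found, skipped = [], []
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            found.append(path)
        elif path.is_file() and path.suffix.lower() != ".txt":  # captions live in .txt
            skipped.append(path)
    if skipped:
        print(f"Skipped {len(skipped)} non-image files: {[p.name for p in skipped]}")
    return found

images = collect_images("datasets/my_lora")
print(f"Collected {len(images)} images")
```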
https://redd.it/1on8bcw
@rStableDiffusion
Telegram's Cocoon - AI network (Important)
Pavel Durov (Telegram's founder) has announced a new project called Cocoon.
It's a decentralized AI network built on the TON blockchain.
The goal is to let people use AI tools without giving up their data privacy to big tech companies.
https://preview.redd.it/qyr4pgb7c1zf1.png?width=700&format=png&auto=webp&s=6893fa20ea19738ebe2c137d553099479ab833f0
https://redd.it/1onachu
@rStableDiffusion
Finetuned LoRA for Enhanced Skin Realism in Qwen-Image-Edit-2509
Today I'm sharing a Qwen Edit 2509 based LoRA I created for improving skin detail across a variety of subjects and shot styles.
I wrote about the problem, the solution, and my training process in more detail here on LinkedIn, if you're interested in a deeper dive, in exploring Nano Banana's attempt at improving skin, or in understanding the approach to the dataset.
If you just want to grab the resources themselves, feel free to download them:
here on HF: [https://huggingface.co/tlennon-ie/qwen-edit-skin](https://huggingface.co/tlennon-ie/qwen-edit-skin)
here on Civitai: https://civitai.com/models/2097058?modelVersionId=2372630
The HuggingFace repo also includes a ComfyUI workflow I used for the comparison images.
It also includes the AI-Toolkit configuration file which has the settings I used to train this.
Want some comparisons? See below for some before/after examples using the LoRA.
If you have any feedback, I'd love to hear it. It might not be a perfect result, and there are likely other LoRAs trying to do the same thing, but I thought I'd at least share my approach along with the resulting files to help out where I can. If you have further ideas, let me know; if you have questions, I'll try to answer.
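If you want to build the same kind of side-by-side comparison images locally, here is a small Pillow sketch; it's my own illustration, not the ComfyUI workflow from the repo:

```python
# Small illustration (not the workflow from the HF repo): stitch before/after
# pairs into a single side-by-side comparison image with Pillow.
from PIL import Image

def side_by_side(before_path: str, after_path: str, out_path: str) -> None:
    before = Image.open(before_path).convert("RGB")
    after = Image.open(after_path).convert("RGB")
    # Match heights so the pair lines up cleanly.
    h = min(before.height, after.height)
    before = before.resize((int(before.width * h / before.height), h))
    after = after.resize((int(after.width * h / after.height), h))
    canvas = Image.new("RGB", (before.width + after.width, h), "white")
    canvas.paste(before, (0, 0))
    canvas.paste(after, (before.width, 0))
    canvas.save(out_path)

side_by_side("original.png", "with_skin_lora.png", "comparison.png")
```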
https://preview.redd.it/yhph5r2vp1zf1.png?width=1333&format=png&auto=webp&s=fa4d649848cbf0061b50587a07785eeb79f94341
https://preview.redd.it/0wh4bs2vp1zf1.png?width=1333&format=png&auto=webp&s=aaafd02c6fae76b2f075efadee416379f1930afc
https://preview.redd.it/quhqxt2vp1zf1.png?width=1333&format=png&auto=webp&s=20d10493d681fca7998f0088d569d183e0e2f8f5
https://preview.redd.it/02ecc7xzp1zf1.png?width=3936&format=png&auto=webp&s=e5fe8690f77a9cc18a30b227fa3bbe924f50b910
https://preview.redd.it/3z6497xzp1zf1.png?width=3936&format=png&auto=webp&s=d166ed5d96295a2d4de4923c86c58ca05ef7b350
https://preview.redd.it/43q7ufxzp1zf1.png?width=3840&format=png&auto=webp&s=0a7f1bf052817b2770412c869daf631c027019e5
https://preview.redd.it/o6ab48xzp1zf1.png?width=3936&format=png&auto=webp&s=ebc94d0cc70dd8290c6fac69f3314f3c36d1a131
https://preview.redd.it/1o31e7xzp1zf1.png?width=4480&format=png&auto=webp&s=b67791b34e61272b977bfb195e8d6ec75745ae30
https://preview.redd.it/sy8557xzp1zf1.png?width=3936&format=png&auto=webp&s=bd7d2554ca5295dd45f48b86a4146410180eedbc
https://preview.redd.it/ce3yn8xzp1zf1.png?width=3936&format=png&auto=webp&s=aa75b4a735be9283d246d294d2899b0d9b906ef8
https://preview.redd.it/ahnq89xzp1zf1.png?width=3936&format=png&auto=webp&s=e7291843ae3404ff6c521f43b29689e572f1f973
https://preview.redd.it/52xgi8xzp1zf1.png?width=4096&format=png&auto=webp&s=ee72c3672c2534fe03848539684313e8f8a4f4a9
https://preview.redd.it/cz9ev9xzp1zf1.png?width=3936&format=png&auto=webp&s=550867f814d691b60e935a38c2970a958c2425a1
https://redd.it/1onc7ok
@rStableDiffusion
Linkedin
The Uncanny Valley of AI-Generated Skin: A Training Approach to Realism
A small exploration into the common pitfall of AI image generation , the flawless, plastic-like skin and how a targeted lora training of Qwen-Image-Edit-2509 allows for more natural and detailed human subjects. Artificial intelligence has made astounding…
Alibaba has released an early preview of its new AI model, Qwen3-Max-Thinking.
Even as an early version still in training, it's already achieving 100% on challenging reasoning benchmarks like AIME 2025 and HMMT. You can try it now in Qwen Chat and via the Alibaba Cloud API.
https://preview.redd.it/1r4kjj7je2zf1.png?width=680&format=png&auto=webp&s=1d0567f47199dc5cfda5d0c381b0e20da37c3f4a
https://redd.it/1onfljd
@rStableDiffusion