Qwen Image LoRA Training Tutorial on RunPod using Diffusion Pipe
https://www.youtube.com/watch?v=hXnFChMvLwg
https://redd.it/1olnwl4
@rStableDiffusion
This video takes you through captioning a dataset and training a Qwen Image LoRA on RunPod.
To deploy:
https://get.runpod.io/diffusion-pipe-template
► To Join The Hideout: https://www.hearmemanai.com
Join my Discord server for updates on new Workflows…
Wan 2.2 multi-shot scene + character consistency test
The post Wan 2.2 MULTI-SHOTS (no extras) Consistent Scene + Character : r/comfyui caught my interest as an approach to raising consistency across shots in a scene. The idea is not to create the whole scene in one go, but to create 81-frame videos containing multiple shots, to get material for the start/end frames of the actual shots. Because of the 81-frame sampling window, the model keeps consistency at a higher level within that window. It's not perfect, but it moves in the direction of believable.
Here is the test result, which started with one 1080p image generated in Wan 2.2 t2i.
Final result after rife47 frame interpolation + Wan2.2 v2v and SeedVR2 1080p passes.
Unlike the original post, I used Wan 2.2 Fun Control with 5 random Pexels videos in different poses, cut down to fit into 81 frames.
https://reddit.com/link/1oloosp/video/4o4dtwy3hnyf1/player
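For the clip-prep step mentioned above, here is a minimal sketch (not from the original post; file names, fps, and the 720p size are assumptions) of trimming a pose reference down to the 81-frame window with OpenCV:

```python
# Minimal sketch: cut a pose reference clip to the first 81 frames at 720p so it
# fits the Wan 2.2 Fun Control window. Paths, fps and size are assumptions.
import cv2

def trim_to_window(src_path, dst_path, max_frames=81, size=(1280, 720), fps=16):
    cap = cv2.VideoCapture(src_path)
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    kept = 0
    while kept < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(cv2.resize(frame, size))
        kept += 1
    cap.release()
    writer.release()
    return kept

# e.g. trim_to_window("pexels_pose_01.mp4", "pose_01_81f.mp4")
```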
With the starting t2i image and the poses Wan 2.2 Fun control generated the following 81 frames at 720p.
Not sure if it's needed, but I added random shot descriptions to the prompt describing a simple photo studio scene with a plain gray background.
Wan 2.2 Fun Control 87 frames
Still a bit rough around the edges, so I did a Wan 2.2 v2v pass to bring it to 1536x864 resolution and sharpen things up.
https://reddit.com/link/1oloosp/video/kn4pnob0inyf1/player
And the top video is after rife47 frame interpolation from 16 to 32 and SeedVR2 upscale to 1080p with batch size 89.
---------------
My takeaway is that this may help produce believable, somewhat consistent shot frames. More importantly, it can be used to generate material for a character LoRA, since from one high-res start image dozens of shots can be made, covering all sorts of expressions and poses with high likeness.
The workflows used are just the default workflows with almost nothing changed other than the resolution and some random tweaking of sampler values.
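To turn the generated multi-shot clips into character-LoRA material as suggested above, dumping every Nth frame as a still is enough to start with; a rough sketch (the sampling interval and paths are arbitrary choices, not from the post):

```python
# Rough sketch: sample stills from a generated multi-shot clip so they can be
# captioned and used as character-LoRA training images. Interval/paths arbitrary.
import cv2
from pathlib import Path

def dump_stills(video_path, out_dir, every_n=8):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(str(out / f"{Path(video_path).stem}_{idx:04d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# e.g. dump_stills("multishot_01.mp4", "lora_dataset/raw")
```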
https://redd.it/1oloosp
@rStableDiffusion
Any way to get consistent face with flymy-ai/qwen-image-realism-lora
https://redd.it/1olpt5t
@rStableDiffusion
Mario the crazy conspiracy theorist was too much fun not to create! LTX-2
https://redd.it/1olt8jb
@rStableDiffusion
Reporting Pro 6000 Blackwell can handle batch size 8 while training an Illustrious LoRA.
https://redd.it/1olvxy8
@rStableDiffusion
What Illustrious models is everyone using?
I have experimented with many Illustrious models, with WAI, Prefect and JANKU being my favorites, but I am curious what you guys are using! I'd love to find a daily driver as opposed to swapping between models so often.
https://redd.it/1om1e9a
@rStableDiffusion
Got Wan2.2 I2V running 2.5x faster on 8xH100 using Sequence Parallelism + Magcache
https://preview.redd.it/07lwyvl5zryf1.png?width=1200&format=png&auto=webp&s=ad22c52c861c18c94c54f27bbe71a6e120a8f3e7
Hey everyone,
I was curious how much faster we can get with Magcache on 8xH100 instead of 1xH100 for Wan 2.2 I2V. Currently, the original Magcache and Teacache repositories only support single-GPU inference for Wan 2.2 because of FSDP, as shown in this GitHub issue.
I managed to scale Magcache to 8xH100 with FSDP and sequence parallelism. I also experimented with several techniques: Flash-Attention-3, TF32 tensor cores, int8 quantization, Magcache, and torch.compile.
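For readers unfamiliar with sequence parallelism: each GPU owns a contiguous chunk of the token sequence, and the full sequence only needs to be reassembled where attention requires global context. A naive all-gather sketch of the idea (illustrative only, not the Morphic/Wan implementation; real systems like Ulysses or ring attention avoid materializing the full sequence on every rank):

```python
# Naive sequence-parallel attention sketch: each rank holds [B, S/world, D],
# all-gather rebuilds the full [B, S, D] sequence for attention, then each rank
# keeps only its own slice so the rest of the block stays sharded.
import torch
import torch.distributed as dist

def sp_attention(local_tokens, attn_fn):
    world, rank = dist.get_world_size(), dist.get_rank()
    gathered = [torch.empty_like(local_tokens) for _ in range(world)]
    dist.all_gather(gathered, local_tokens)     # rebuild the full sequence
    full = torch.cat(gathered, dim=1)
    out = attn_fn(full)                         # attention over all tokens
    chunk = out.shape[1] // world               # hand back this rank's shard
    return out[:, rank * chunk:(rank + 1) * chunk]
```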
The fastest combo I got was FA3 + TF32 + Magcache + torch.compile, which runs a 1280x720 video (81 frames, 40 steps) in 109 s, down from the 250 s baseline (8xH100 sequence parallelism and FA2 only), without noticeable loss of quality. We can also tune the Magcache parameters for a quality tradeoff, for example E024K2R10 (error threshold = 0.24, skip K = 2, retention ratio = 0.1), to get a 2.5x+ speed boost.
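To make the E024K2R10 notation concrete, here is a hedged sketch of a cache-and-skip sampling loop driven by those three knobs. It only illustrates the idea: the real Magcache decides skips from calibrated residual-magnitude ratios rather than this heuristic, and reading "retention ratio" as "always compute the last fraction of steps" is my assumption:

```python
import torch

def sample_with_skipping(model, step_fn, latents, timesteps,
                         err_thresh=0.24, skip_k=2, retention=0.1):
    """Illustrative Magcache-style skip policy (E024K2R10 defaults), not the real algorithm.

    model:   callable (latents, t) -> model output
    step_fn: callable (latents, out, t) -> next latents (your scheduler update)
    """
    keep_from = int(len(timesteps) * (1 - retention))  # assumed meaning of retention ratio
    cached_out, prev, skips = None, None, 0
    for i, t in enumerate(timesteps):
        run = cached_out is None or i >= keep_from or skips >= skip_k
        if not run:
            # cheap proxy for expected output change; Magcache itself uses
            # calibrated residual-magnitude ratios
            change = (latents - prev).norm() / prev.norm()
            run = bool(change >= err_thresh)
        if run:
            out = model(latents, t)
            cached_out, skips = out, 0
        else:
            out = cached_out            # reuse the cached model output (skip)
            skips += 1
        prev = latents
        latents = step_fn(latents, out, t)
    return latents
```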
Full breakdown, commands, and comparisons are here:
👉 Blog post with full benchmarks and configs
👉 Github repo with code
Curious if anyone else here is exploring sequence parallelism or similar caching methods on FSDP-based video diffusion models? Would love to compare notes.
Disclosure: I worked on and co-wrote this technical breakdown as part of the Morphic team
https://redd.it/1om8sr9
@rStableDiffusion
ComfyUI Tutorial: Take Your Prompt To The Next Level With Qwen 3 VL
https://youtu.be/cfgtvXeYYb0
https://redd.it/1omavip
@rStableDiffusion
In this tutorial I will show you how to generate a prompt by analyzing an input image, using the new Qwen 3 VL model dedicated to prompt extraction from an input image. It allows you to extract all the needed data like poses, outfit, colors, and environment to create…
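The underlying pattern is a standard VLM captioning call; a minimal sketch against the Hugging Face Qwen2-VL API (assuming the newer Qwen 3 VL checkpoints expose a similar interface; the model ID, image path, and instruction text are placeholders, not from the video):

```python
# Minimal sketch: ask a Qwen VL model to turn an input image into a detailed
# text-to-image prompt. Written against the Qwen2-VL transformers API; the
# model ID and instruction wording are placeholders.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "reference.png"},
        {"type": "text", "text": "Describe this image as a detailed text-to-image "
                                 "prompt: pose, outfit, colors, environment, lighting."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
prompt = processor.batch_decode(out[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(prompt)
```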
Dataset tool to organize images by quality (sharp / blurry, jpeg artifacts, compression, etc).
I have rolled some of my own image quality tools before, but I'll try asking: is there any tool that allows grouping / sorting / filtering images by different quality criteria such as sharpness, blurriness, JPEG artifacts (even imperceptible ones), compression, out-of-focus depth of field, etc. - basically by overall quality?
I am looking to root out outliers from larger datasets that could negatively affect training quality.
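If nothing ready-made turns up, the sharpness/blur axis at least is easy to approximate with the variance of the Laplacian; a small sketch that ranks a folder so the blurriest candidates can be reviewed first (paths are arbitrary, and it does not cover JPEG artifacts or compression):

```python
# Small sketch: rank images by variance of the Laplacian (low = likely blurry)
# to surface outliers in a training dataset. Covers only the sharpness axis.
import cv2
from pathlib import Path

def sharpness_score(path):
    img = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(img, cv2.CV_64F).var()

def rank_by_sharpness(folder, exts=(".png", ".jpg", ".jpeg", ".webp")):
    files = [p for p in Path(folder).iterdir() if p.suffix.lower() in exts]
    return sorted(((sharpness_score(p), p) for p in files), key=lambda x: x[0])

# Review the 20 blurriest candidates first:
# for score, path in rank_by_sharpness("dataset/images")[:20]:
#     print(f"{score:8.1f}  {path}")
```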
https://redd.it/1omac5p
@rStableDiffusion