There still doesn't seem to be a robust way of creating extended videos with Wan 2.2
With 2.1 and InfiniteTalk, we can create long-running videos with very little quality loss
It seems strange to me that nothing in 2.2 offers this capability. Wan Animate does a decent job, but it's limited to fixed pose references, which struggle with any complex movement between multiple characters
All extend-from-last-frame techniques look extremely questionable because quality is lost after decoding. VACE 2.2 does nothing to help here, and even when it does provide continuous movement between segments (with frames for context), it will 'smooth the transition' rather than keep it consistent
Without something like InfiniteTalk in 2.2, I'm finding it difficult to make any good extended video, which is a shame given all the capabilities of 2.2's motion and LoRAs
https://redd.it/1p187pr
@rStableDiffusion
ComfyUI Tip: How To Use rgthree Labels
https://www.youtube.com/watch?v=Iv98i8HWeKw
https://redd.it/1p1bquo
@rStableDiffusion
Is it just me who gets this impression? Is SDXL better than Flux and Qwen for generating art like this? Is the problem the text encoder?
https://redd.it/1p1dsce
@rStableDiffusion
A 2-million-parameter denoiser model is everything you need! (Source code + model + poison detector) Anti-Nightshade, Anti-Glaze
Wassup
Today, I’m going to show you a new model designed for image **depoisoning**.
I decided to build something fresh, and this time the focus was on efficiency: the new model is incredibly lightweight, clocking in at just **2 million parameters**.
In addition to the denoiser, I’ve also trained a separate AI "Detector" that can tell you whether an image has been poisoned or if it's clean.
A quick heads-up: neither model is magic. They can (and likely will) make mistakes, but I have done my best to minimize errors. Regarding the denoiser specifically, I feel the architecture is a solid improvement over my previous version.
# 1. The Denoiser Architecture
Unlike standard heavy U-Nets, this architecture is designed to be Bias-Free and highly responsive to the specific noise level of the image.
Here is how this works:
* **Gaussian Prior Extraction:** Before the network even starts processing, the model uses a `ResidualPriorExtractor`. It runs fixed Gaussian kernels over the image to separate high-frequency details (edges/noise) from the smooth background. This gives the model a "head start" by highlighting areas where poison usually hides.
* **Noise Conditioning:** The model isn't static. It uses a `NoiseConditioner` that takes a noise level (sigma) and a content descriptor. It projects these into an embedding that modulates the network layers. Essentially, the model adjusts its "aggression" based on how noisy the image is.
* **Bias-Free Design:** All convolutions in the network have `bias=False`. This forces the network to rely purely on the feature data and normalization (`LayerNorm2d`), which often leads to better generalization in restoration tasks.
* **Gated Residual Blocks:** The core building blocks use **Global Gating**. The network calculates a gating value (0 to 1) based on the global mean of the features, allowing it to selectively let information pass through or be suppressed.
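To make the bullets above a bit more concrete, here is a minimal PyTorch-style sketch of a bias-free, noise-conditioned, gated residual block. It is illustrative only; the module and argument names are my own shorthand, not the exact code in the repo:

```python
# Illustrative sketch only, not the repo's code: bias-free convs, LayerNorm2d,
# a (sigma, content) embedding that modulates the block, and a global gate.
import torch
import torch.nn as nn

class LayerNorm2d(nn.Module):
    """LayerNorm over an NCHW feature map (implemented here via GroupNorm(1, C))."""
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels, affine=True)

    def forward(self, x):
        return self.norm(x)

class GatedResidualBlock(nn.Module):
    def __init__(self, channels, cond_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)  # bias-free
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.norm = LayerNorm2d(channels)
        self.cond_proj = nn.Linear(cond_dim, channels)   # noise/content conditioning
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x, cond_emb):
        h = self.norm(self.conv1(x))
        # Conditioning: the (sigma, descriptor) embedding rescales the features,
        # so the block's "aggression" depends on the estimated noise level.
        scale = self.cond_proj(cond_emb).unsqueeze(-1).unsqueeze(-1)
        h = torch.relu(h * (1 + scale))
        h = self.conv2(h)
        # Global gating: a 0..1 value from the global feature mean decides how
        # much of the residual branch is let through.
        g = self.gate(h.mean(dim=(2, 3))).unsqueeze(-1).unsqueeze(-1)
        return x + g * h

# Quick shape check.
block = GatedResidualBlock(channels=32, cond_dim=16)
print(block(torch.randn(1, 32, 64, 64), torch.randn(1, 16)).shape)  # (1, 32, 64, 64)
```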
# 2. The "Predictor" (Detector) Architecture
Why I call it the "Predictor" (for fun and giggles):
I named this model the Predictor because it doesn't just classify an image as "Bad" or "Good"—it simultaneously predicts the noise mask (where the poison is located).
This is a much more complex beast called **GhostResidualDecomposition-Net**. Here is how it achieves high accuracy:
* **The Backbone (ResNet + SE):** The encoder uses Residual Blocks enhanced with **Squeeze-and-Excitation (SE) Blocks**. SE blocks allow the network to perform "channel attention"—learning which feature maps are important and weighing them higher.
* **ASPP (Atrous Spatial Pyramid Pooling):** Located at the bottleneck, this module looks at the image with different "zoom levels" (dilated convolutions). This captures context at multiple scales, ensuring the model sees both fine noise patterns and the global image structure.
* **Attention Gates in the Decoder:** When the network upsamples the image to reconstruct it, it uses **Attention Gates** on the skip connections. Instead of blindly copying features from the encoder, these gates filter the features to focus only on relevant regions (the poisoned pixels).
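For readers who haven't seen these before, here is a rough sketch of the two attention pieces named above (an SE block for channel attention and an additive attention gate on a skip connection). Again, this is a generic illustration, not the exact modules from the repo:

```python
# Generic illustrations of the attention components, not the repo's exact code.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels using a gate from global stats."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # squeeze: global average pool
        return x * w.unsqueeze(-1).unsqueeze(-1)  # excite: per-channel scaling

class AttentionGate(nn.Module):
    """Additive attention gate on a U-Net skip connection."""
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, 1)
        self.phi = nn.Conv2d(gate_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)

    def forward(self, skip, gate):
        # 'gate' is the coarser decoder feature, assumed already resized to match.
        a = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * a  # keep only the regions the gate marks as relevant

# Quick shape check.
se, ag = SEBlock(64), AttentionGate(skip_ch=64, gate_ch=128, inter_ch=32)
x = torch.randn(1, 64, 32, 32)
print(se(x).shape, ag(x, torch.randn(1, 128, 32, 32)).shape)
```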
# 3. The "Ghost Loss" Function
[yeah, only 7 epochs. ](https://preview.redd.it/58zzm7qyd92g1.png?width=331&format=png&auto=webp&s=66692cfdf6f459c25024dd86a2a0a3456dbc2038)
To train the Predictor, I used a custom loss function I call **Ghost Loss** (very original). It ensures the model isn't just hallucinating a clean image. It combines four specific penalties:
1. **Pixel-wise Noise Match:** Does the predicted noise mask match the real poison?
2. **Restoration Match (MSE):** If we subtract the mask, does the result look like the original clean image?
3. **Binary Classification (BCE):** Did it correctly flag the image as Poisoned/Safe?
4. **Semantic Anchor (Perceptual Loss):** This is the "Ghost" part. It runs the restored image through a frozen **VGG16** network to ensure the *features* (not just pixels) match
the clean image. This prevents the model from blurring out important details.
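Put together, the combined objective looks roughly like the sketch below. The weights and exact term formulations are illustrative guesses on my part, not the values from my training run:

```python
# Illustrative combination of the four penalties; weights and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class GhostLossSketch(nn.Module):
    def __init__(self, w_noise=1.0, w_restore=1.0, w_cls=0.5, w_perc=0.1):
        super().__init__()
        # Frozen VGG16 features act as the "semantic anchor".
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.w = (w_noise, w_restore, w_cls, w_perc)

    def forward(self, pred_mask, true_mask, noisy, clean, cls_logit, is_poisoned):
        restored = noisy - pred_mask                                  # subtract predicted poison
        l_noise   = F.mse_loss(pred_mask, true_mask)                  # 1. pixel-wise noise match
        l_restore = F.mse_loss(restored, clean)                       # 2. restoration match
        l_cls     = F.binary_cross_entropy_with_logits(               # 3. poisoned/safe flag
            cls_logit, is_poisoned.float())
        l_perc    = F.mse_loss(self.vgg(restored), self.vgg(clean))   # 4. VGG feature anchor
        wn, wr, wc, wp = self.w
        return wn * l_noise + wr * l_restore + wc * l_cls + wp * l_perc
```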
Of course, the source code is on GitHub and the models are on Google Drive: [https://github.com/livinginparadise/GRDDenoiser/tree/main](https://github.com/livinginparadise/GRDDenoiser/tree/main) [https://drive.google.com/file/d/1u9xd3bAtF4zxSNeYB59Hcrwa1SoJxbhc/view?usp=sharing](https://drive.google.com/file/d/1u9xd3bAtF4zxSNeYB59Hcrwa1SoJxbhc/view?usp=sharing)
That's all. If you have questions, ask them.
[Poison detector \( yeah, best GUI, I know.... \)](https://preview.redd.it/u4eqg9vw892g1.png?width=1299&format=png&auto=webp&s=ae510760568ce1073e578304647b8d50b5637086)
[Denoiser](https://preview.redd.it/6k03636pa92g1.png?width=1919&format=png&auto=webp&s=96553c1972cd4fb58de9e2d089d48dc8715080d6)
[Sometimes it helps \( with TTO\)](https://preview.redd.it/hhcp7lonb92g1.png?width=1900&format=png&auto=webp&s=f6deaa3ad619a645b1cf3e321f47be149977a19f)
[Without TTO](https://preview.redd.it/5i7ixc0qb92g1.png?width=1884&format=png&auto=webp&s=8f68c1bfd15a207302b44ba284de52bcdf24923a)
https://redd.it/1p1gjb2
@rStableDiffusion
Nvidia sells an H100 for 10 times its manufacturing cost. Nvidia is the big villain here; it's because of them that large models like GPT-4 aren't available to run on consumer hardware. AI development will only advance when this company is dethroned.
Nvidia's profit margin on data center GPUs is really very high: they sell for 7 to 10 times the manufacturing cost.
Without Nvidia's inflated monopoly pricing, these GPUs could hypothetically be within reach of home consumers!
This company is delaying the development of AI.
https://redd.it/1p1m5gl
@rStableDiffusion
Version 1.0: The Easiest Way to Train Wan 2.2 LoRAs (Under $5)
https://github.com/obsxrver/wan22-lora-training
If you’ve been wanting to train your own Wan 2.2 Video LoRAs but are intimidated by the hardware requirements, parameter tweaking insanity, or the installation nightmare—I built a solution that handles it all for you.
https://preview.redd.it/8avncmwwbb2g1.png?width=875&format=png&auto=webp&s=71f66d615d269a03af89744285543476c7ab880e
This is currently the easiest, fastest, and cheapest way to get a high-quality training run done.
Why this method?
Zero Setup: No installing Python, CUDA, or hunting for dependencies. You launch a pre-built [Vast.AI](http://Vast.AI) template, and it's ready in minutes.
Full WebUI: Drag-and-drop your videos/images, edit captions, and click "Start." No terminal commands required.
Extremely Cheap: You can rent a dual RTX 5090 node, train a full LoRA in 2-3 hours, and auto-shutdown. Total cost is usually under $5.
Auto-Save: It automatically uploads your finished LoRA to your Cloud Storage (Google Drive/S3/Dropbox) and kills the instance so you don't pay for a second longer than necessary.
How it works:
1. Click the Vast.AI template link (in the repo).
2. Open the WebUI in your browser.
3. Upload your dataset and press Train.
4. Come back in an hour to find your LoRA in your Google Drive.
It supports both Text-to-Video and Image-to-Video, and optimizes for dual-GPU setups (training High/Low noise simultaneously) to cut training time in half.
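For anyone curious what the dual-GPU part amounts to: it is essentially one trainer process per noise expert, each pinned to its own GPU. The sketch below is a hypothetical illustration; the script name and flags are placeholders, not the template's actual commands:

```python
# Hypothetical illustration only: launch one trainer per noise expert, each
# pinned to a separate GPU via CUDA_VISIBLE_DEVICES. Script name and flags
# are placeholders, not the template's real commands.
import os
import subprocess

def launch(gpu: str, noise: str) -> subprocess.Popen:
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return subprocess.Popen(["python", "train_lora.py", "--noise", noise], env=env)

procs = [launch("0", "high"), launch("1", "low")]  # high/low noise experts in parallel
for p in procs:
    p.wait()
```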
Repo + Template Link:
https://github.com/obsxrver/wan22-lora-training
Let me know if you have questions.
https://redd.it/1p1puml
@rStableDiffusion
A LoRA for transferring characters into scenes
https://preview.redd.it/csium62eye2g1.png?width=2217&format=png&auto=webp&s=f768ad1c26423cb63435f42aa904494aa8dcfe53
https://preview.redd.it/hq5g80ifye2g1.png?width=6509&format=png&auto=webp&s=d306d61880fb3ad31ee28656502938097a3dc20d
https://preview.redd.it/8bmhpf5gye2g1.png?width=6134&format=png&auto=webp&s=69629ea3f65beb4d59e4ab1532b9024de1b7213f
https://preview.redd.it/0lixjergye2g1.png?width=5727&format=png&auto=webp&s=b1cd9df101639a61bf93ce0a696fca11c28cd2b0
https://preview.redd.it/f3b8bhrgye2g1.png?width=2450&format=png&auto=webp&s=d84fdb2028527b833834a2d933e221203ae5ac20
https://preview.redd.it/wcwolqfhye2g1.png?width=3848&format=png&auto=webp&s=67704d46a0fc69706298d6a26426cc61f37387c4
I used Qwen Image Edit 2509 + the RoleScene Blend LoRA; on a 5090, transferring the characters below into the scene took about 30 seconds.
You can download the model here: https://civitai.com/models/2142049/rolescene-blend
Use the workflow I built here: https://www.runninghub.ai/post/1991385798813790209
You can register using my invitation link: https://www.runninghub.ai/?inviteCode=t0lfdxyz
Here is my tutorial video, currently only in Chinese: https://www.bilibili.com/video/BV1afCfBFEJG/?spm\_id\_from=333.1387.homepage.video\_card.click&vd\_source=ae85ec1de21e4084d40c5d4eec667b8f
https://redd.it/1p233zo
@rStableDiffusion
Is InstantID + Canny still the best method in 2025 for generating consistent LoRA reference images?
Hey everyone,
I’m building a LoRA for a custom female character and I need around 10–20 consistent face images (different angles, lighting, expressions, etc.). I’m planning to use the InstantID + Canny ControlNet workflow in ComfyUI.
Before I finalize my setup, I want to ask:
1. Is InstantID + Canny still the most reliable method in 2025 for producing identity-consistent images for LoRA training?
2. Are there any improved workflows (InstantID + Depth, FaceID, or new consistency nodes) that give better results?
3. Does anyone have a ComfyUI graph or recommended settings they can share?
4. Anything I should avoid when generating reference shots (lighting, resolution, negative prompts, etc.)?
I’m aiming for high identity consistency (90%+), so any updated advice from 2025 users would really help.
Thanks!
https://redd.it/1p22zbb
@rStableDiffusion
How do I stop female characters from dancing and bouncing their boobs in WAN 2.2 video?
Every time I include a reference character of a woman, she just starts dancing and her boobs start bouncing for literally no reason. The prompt I used for one of the videos is "the woman pulls out a gun and aims at the man", but while aiming the gun she just started doing TikTok dances and furiously shaking her hips.
I included in the negative prompts "dancing, tiktok dances, shaking hips" etc... but it doesn't seem to be having any effect.
Edit: I'm using the Wan smooth mix checkpoint. Does that affect the motion that much? The characters only bounce and dance when they are 3D models, real women just follow the prompt.
https://redd.it/1p26ebl
@rStableDiffusion
Any suggestions on getting images that look like they came from a Sears Portrait Studio?
https://redd.it/1p261vr
@rStableDiffusion