r/StableDiffusion – Telegram
WIP report: t5 sd1.5

Just a little attention mongering, because I'm an attention.. junkie...
Still trying to retrain SD to take a T5 frontend.

Uncountable oddities. But here's a training output progression to make it look like I'm actually progressing towards something :-}

Target was "a woman". This is from 10,000 steps through 18,000 steps, batch size 64.


\\"woman\\"

Sad thing is, the output degrades in various ways after that, so I can't release that checkpoint.

The work continues....

https://redd.it/1oyphkb
@rStableDiffusion
Most efficient/convenient setup/tooling for a 5060 Ti 16gb on Linux?

I just upgraded from an RTX 2070 Super 8gb to an RTX 5060 Ti 16gb. Typical generation time for a single image went from ~20.5 seconds to ~12.5 seconds. I then used a Dockerfile to build a wheel for Sage Attention 2.2 (so I could use recent versions of Python/Torch/CUDA); installing that yielded about a 6% speedup, to roughly ~11.5 seconds.

The RTX 5060 Ti is sm120 (SM 12.0) Blackwell. It's fast, but I guess there aren't a ton of optimizations (Sage/Flash) built for it yet. ChatGPT tells me I can install prebuilt wheels of Flash Attention 3 with great Blackwell support that offer far greater speeds, but I'm not sure it's right about that: where are these wheels? I don't even see a major version 3 in the Flash Attention repo's releases section yet.

IMO this is all pretty fast now. But I was interested in testing out some video (e.g. Wan 2.2), and for that any speedup is really helpful. I'm not up for compiling Flash Attention; I gave it a try one evening, but after two hours of 100% CPU I was only about 1/8th of the way through the compilation and quit. It seems much better to download a good precompiled wheel if one is available. But (on Blackwell) would I really get a big improvement over Sage Attention 2.2?
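For reference, a quick sanity check along these lines (assuming a standard PyTorch install; `sageattention` and `flash_attn` are the usual import names, adjust if your wheels differ) can confirm the card reports sm_120 and which attention backends actually import:

```python
# Rough sanity check: confirm compute capability and which attention backends import.
import importlib
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: sm_{major}{minor}")  # a 5060 Ti should report sm_120

for pkg in ("sageattention", "flash_attn"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: available (version {getattr(mod, '__version__', 'unknown')})")
    except ImportError:
        print(f"{pkg}: not installed")
```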

And I've never tried Nunchaku and I'm not sure how that compares.

Is Sage Attention 2.2 about on par with alternatives for sm120 Blackwell? What do you think the best option is for someone with a RTX 5060 Ti 16gb on Linux?

https://redd.it/1oyomk1
@rStableDiffusion
Get rid of the halftone pattern in Qwen Image/Qwen Image Edit with this
https://redd.it/1oytasv
@rStableDiffusion
Has anyone switched fully from cloud AI to local? What surprised you most?

Hey everyone,
I’ve been thinking about moving away from cloud AI tools and running everything locally instead. I keep hearing mixed things. Some people say it feels amazing and private, others say the models feel slower or not as smart.

If you’ve actually made the switch to local AI, I would love to hear your honest experience:

What surprised you the most?
Was it the speed? The setup? Freedom?
Did you miss anything from cloud models?
And for anyone who tried switching but went back, what made you return?

I’m not trying to start a cloud vs. local fight. I am just curious how it feels to use local AI day to day. Real stories always help more than specs or benchmarks.

Thanks in advance!

https://redd.it/1oyv3zt
@rStableDiffusion
3060 12gb to 5060 Ti 16gb upgrade

So I can potentially get a 5060 Ti 16gb for like $450 (I'm not from the USA, so that price may or may not be accurate :) ), brand new from a local business with warranty and all the good stuff.


Could you tell me if the upgrade is worth it, or should I keep saving until next year so I can get an even better card?


I'm pretty sure that, at least for this year, this is as good as it gets. I already tried FB Marketplace in my city and it's full of lemons/iffy stuff/overpriced garbage.

The best I could get is a 3080 12gb that I can't run with the PSU I have. No used 4060 16gb, not a single decent x070 RTX card, just nothing.


As a note, I only have a 500W Gold PSU, so right now I can't put anything power-hungry in my PC.

https://redd.it/1oyz49j
@rStableDiffusion
Best way to change eye direction?

What is the best way to change the eye direction of a character in an image, so that their eyes look exactly in the direction I want? A model/LoRA/ComfyUI node that does this? Thank you.

https://redd.it/1oza286
@rStableDiffusion
Help: How to do SFT on Wan2.2-I2V-A14B while keeping Lightning's distillation speedups?

Hi everyone, I'm working with Wan2.2-I2V-A14B for image-to-video generation, and I'm running into issues when trying to combine SFT with the Lightning acceleration.

# Setup / context

Base model: Wan2.2-I2V-A14B.
Acceleration: Lightning LoRA.
Goal: Do SFT on Wan2.2 with my own dataset, without losing the speedup brought by Lightning.



# What I’ve tried

1. Step 1: SFT on vanilla Wan2.2. I used DiffSynth-Studio to fine-tune Wan2.2 with a LoRA. After training, this LoRA alone works reasonably well when applied to Wan2.2 (no Lightning).
2. Step 2: Add Lightning on top of the SFT LoRA. At inference time, I stacked the Lightning LoRA as well. The result is very bad: quality drops sharply and there are strange colors in the video.

So simply “SFT first, then slap the Lightning LoRA on top” obviously doesn’t work in my case.

# What I want to do

My intuition is that Lightning should be active during training, so that the model learns under the same accelerated setup it will use at inference. In other words, I want to:

Start from Wan2.2 + Lightning
Then run SFT on top of that

But here is the problem: I haven’t found a clean way to do SFT on “Wan2.2 + Lightning” together. DiffSynth-Studio seems to assume you fine-tune a single base model, not base + a pre-existing LoRA. And the scheduler might be a hindrance.
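One workaround I'm considering (just a sketch, untested; the key names below are made up for illustration, and real Wan2.2/Lightning checkpoints use their own prefixes and scaling conventions): merge the Lightning LoRA into the base weights first, so DiffSynth-Studio only ever sees a single “base” model to fine-tune.

```python
import torch

def merge_lora_into_base(base_sd, lora_sd, alpha=1.0):
    """Fold LoRA deltas into base weights: W <- W + alpha * (up @ down).

    Key naming here is illustrative only; the matching logic would need
    to be adapted to the actual Wan2.2 / Lightning checkpoint layout.
    """
    merged = dict(base_sd)
    for down_key, down in lora_sd.items():
        if not down_key.endswith("lora_down.weight"):
            continue
        up_key = down_key.replace("lora_down.weight", "lora_up.weight")
        base_key = down_key.replace(".lora_down.weight", ".weight")  # hypothetical mapping
        if up_key not in lora_sd or base_key not in merged:
            continue
        delta = alpha * (lora_sd[up_key].float() @ down.float())
        merged[base_key] = (merged[base_key].float() + delta).to(merged[base_key].dtype)
    return merged

# Usage sketch (placeholder paths; load with safetensors or torch as appropriate):
# base = torch.load("wan2.2_i2v_a14b.pt")
# lightning = torch.load("wan2.2_lightning_lora.pt")
# torch.save(merge_lora_into_base(base, lightning), "wan2.2_plus_lightning.pt")
```

The hope is that after merging, SFT starts from the distilled weights, so the LoRA trained afterwards is learned under the same accelerated setup (with the few-step scheduler settings matched at inference). No idea yet whether this holds up in practice.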

# Questions

So I’m looking for advice from anyone who has fine-tuned Wan2.2 with Lightning and kept the speedups after SFT.

https://redd.it/1oz9d1p
@rStableDiffusion
As of today, what is the best cloud AI Image Generator?

I’m looking for a good AI image gen with:
- A wide variety of models (Flux, Kontext, Illustrious, etc.)
- The ability to import my own LoRAs
- A clean and easy-to-use UI/UX for a smooth experience
- Upscaling
- No restrictive safety filters
- Some unexpected or extra useful features

Preferably at a fair and reasonable price!

What would you say is the best option?




https://redd.it/1ozkn46
@rStableDiffusion
A spotlight (quick finding tool) for ComfyUI

Quite possibly the most important QOL plugin of the year.

tl;dr - find anything, anywhere, anytime.

https://preview.redd.it/op4op4fsm21g1.png?width=1068&format=png&auto=webp&s=41731f6781e16b9fc93e89454726aec49a0a4d31

The (configurable) hotkeys are Control+Shift+Space, Control+K, or (if you are lazy) just /.

https://github.com/sfinktah/ovum-spotlight or search for `spotlight` in Comfy Manager.

Hold down Shift while scrolling to have the graph scroll with you to the highlighted node; that includes going inside subgraphs!

Want to find where you set the width to 480? Just search for `width:480`

Want to know what 16/9 is? Search for `math 16/9`

Want to find out where "link 182" is? Search for `link 182`

Want to jump to a node inside a subgraph by number? Search for `123:456:111` and you can go straight there.

Want to write your own extensions? It's supported, and there are examples.

https://redd.it/1ozmay9
@rStableDiffusion