r/StableDiffusion – Telegram
lora-gym update: local GPU training for WAN LoRAs

Update on lora-gym (github.com/alvdansen/lora-gym) — added local training support.

Running on my A6000 right now. Same config structure, same hyperparameters, same dual-expert WAN 2.2 handling. No cloud setup required.

Currently validated on 48GB VRAM.

https://redd.it/1ravptl
@rStableDiffusion
I built the first Android app in the world that detects AI content locally and offline over any app using a Quick Tile

https://redd.it/1raxdg6
@rStableDiffusion
FLUX2 Klein 9B LoKR Training – My Ostris AI Toolkit Configuration & Observations

I’d like to share my current Ostris AI Toolkit configuration for training FLUX2 Klein 9B LoKR, along with some structured insights that have worked well for me. I’m quite satisfied with the results so far and would appreciate constructive feedback from the community.

Step & Epoch Strategy

Here’s the formula I’ve been following:

• Assume you have N images (example: 32 images).

• Save every (N × 3) steps

→ 32 × 3 = 96 steps per save

• Total training steps = (Save Steps × 6)

→ 96 × 6 = 576 total steps

In short:

• Multiply your dataset size by 3 → that’s your checkpoint save interval.

• Multiply that result by 6 → that’s your total training steps.
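The rule of thumb above can be written as a small helper. This is only a sketch of the arithmetic in the post: the function name and parameter names are made up for illustration and are not Ostris AI Toolkit settings.

```python
def step_schedule(num_images: int, save_multiplier: int = 3, num_saves: int = 6) -> dict:
    """Apply the post's rule: save every N x 3 steps, train for 6 save intervals.

    The names here are hypothetical; map the two results onto whatever
    "save every" and "total steps" fields your trainer config uses.
    """
    save_every = num_images * save_multiplier   # checkpoint save interval
    total_steps = save_every * num_saves        # total training steps
    return {"save_every": save_every, "total_steps": total_steps}


# The 32-image example from the post.
print(step_schedule(32))  # {'save_every': 96, 'total_steps': 576}
```

For a different dataset size, only `num_images` changes; the multipliers 3 and 6 are the values from the post.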

Training Behavior Observed

• Noticeable improvements typically begin around epoch 12–13

• Best balance achieved between epoch 13–16

• Beyond that, gains appear marginal in my tests
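To relate these epoch numbers to the saved checkpoints, here is a rough sketch. It assumes one epoch means one pass over the dataset and an effective batch size of 1 (no gradient accumulation); the post does not state either, so treat both as assumptions and adjust `effective_batch_size` to match your config. The function name is hypothetical.

```python
def checkpoint_epochs(num_images: int, effective_batch_size: int = 1,
                      save_multiplier: int = 3, num_saves: int = 6) -> list:
    """Return (step, epoch) for each checkpoint under the N x 3 / x 6 rule.

    Assumption: epoch = steps / (images per epoch / effective batch size).
    """
    save_every = num_images * save_multiplier
    steps_per_epoch = num_images / effective_batch_size
    return [(k * save_every, k * save_every / steps_per_epoch)
            for k in range(1, num_saves + 1)]


# 32-image example: checkpoints land on epochs 3, 6, 9, 12, 15 and 18.
for step, epoch in checkpoint_epochs(32):
    print(f"step {step}: epoch {epoch:g}")
```

Under those assumptions, the only saved checkpoint inside the 13–16 epoch window is the fifth one (step 480, epoch 15), with the fourth (step 384, epoch 12) right at the point where improvements start to show.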

Results & Observations

• Reduced character bleeding

• Strong resemblance to the trained character

• Decent prompt adherence

• LoKR strength works well at power = 1

Overall, this setup has given me consistent and clean outputs with minimal artifacts.



I’m open to suggestions, constructive criticism, and genuine feedback. If you’ve experimented with different step scaling or alternative strategies for Klein 9B, I’d love to hear your thoughts so we can refine this configuration further.

Here is the config: https://pastebin.com/sd3xE2Z3

Note: This configuration was tested on an RTX 5090. Depending on your GPU (especially if you’re using lower VRAM cards), you may need to adjust certain parameters such as batch size, resolution, gradient accumulation, or total steps to ensure stability and optimal performance.

https://redd.it/1rayrbj
@rStableDiffusion
I'm completely done with Z-Image character training... exhausted

First of all, I'm not a native English speaker. This post was translated by AI, so please forgive any awkward parts.

I've tried countless times to make a LoRA of my own character using Z-Image base with my dataset.
I've run over 100 training sessions already.

It feels like it reaches about 85% similarity to my dataset.
But no matter how many more steps I add, it never improves beyond that.
It always plateaus at around 85% and stops developing further, like that's the maximum.

Today I loaded up an old LoRA I made before Z-Image came out — the one trained on the Turbo model.
I only switched the base model to Turbo and kept almost the same LoKr settings... and suddenly it got 95%+ likeness.
It felt so much closer to my dataset.

After all the experiments with Z-Image (aitoolkit, OneTrainer, every recommended config, etc.), the Turbo model still performed way better.

There were rumors about Ztuner or some fixes coming to solve the training issues, but there's been no news or release since.

So for now, I'm giving up on Z-Image character training.
I'm going to save my energy, money, and electricity until something actually improves.

I'm writing this just in case there are others who are as obsessed and stuck in the same loop as I was.

(Note: I tried aitoolkit and OneTrainer, and all the recommended settings, but they were still worse than training on the Turbo model.)

Thanks for reading. 😔

https://redd.it/1rb0uh8
@rStableDiffusion
Turns out LTX-2 makes a very good video upscaler for WAN

I have had a lot of fun with LTX, but for a lot of use cases it is useless for me. For example, in this use case I could not get anything proper out of LTX no matter how much I tried (mild nudity):
https://aurelm.com/portfolio/ode-to-the-female-form/
The video may be choppy on the site, but you can download it and play it locally. It looks quite good to me: the LTX pass gets rid of the warping and artefacts from WAN, and the temporal upscaler also does a damn good job.
First 5 shots were upscaled from 720p to 1440p and the rest are from 440p to 1080p (that's why they look worse). No upscaling outside Comfy was used.

Workflow: I could not chain the 2 steps properly in one run (OOM), so the first group is for WAN; for the second step, load the WAN video and run with only the second group active.

https://aurelm.com/upload/ComfyWorkflows/Wan_22_IMG2VID_3_STEPS_TOTAL_LTX2Upsampler.json

These are the kind of videos I could get from LTX alone: sometimes with double faces or twisted heads, and all in all milky and blurry.
https://aurelm.com/upload/ComfyUI_01500-audio.mp4
https://aurelm.com/upload/ComfyUI_01501-audio.mp4

Denoising should normally not go above 0.15; otherwise you run into LTX-related issues like blur, distortion, and artefacts. Also, for WAN you can set the number of steps to 3 on both samplers for faster iteration.

https://redd.it/1rb4ms7
@rStableDiffusion
How would you go about generating video with a character ref sheet?

I've generated a character sheet for a character that I want to use in a series of videos. I'm struggling to figure out how to properly use it when creating videos. Specifically, I'm after a Titmouse-style DnD animation of a fight sequence that happened in game.

Would appreciate any workflow examples you can point to, or tutorial vids for making my own.

https://preview.redd.it/kpallbyckxkg1.png?width=1024&format=png&auto=webp&s=d0fe33baeabeee6d356020ea81c0bae707cad638

https://preview.redd.it/805h1eyckxkg1.png?width=1024&format=png&auto=webp&s=42ef42bde1edee800e25210bf471831c93290726



https://redd.it/1rb5n9h
@rStableDiffusion
A single diffusion pass is enough to fool SynthID

I've been digging into invisible watermarks (SynthID, StableSignature, TreeRing), the stuff baked into pixels by Gemini, DALL-E, etc. You can't see them, can't Photoshop them out, and they survive screenshots. I got curious how robust they actually are, so I threw together noai-watermark over a weekend. It runs a watermarked image through a diffusion model; the output looks the same, but the watermark is gone. A single pass at low strength fools SynthID. There's also a CtrlRegen mode for higher quality. It strips all AI metadata too.

Mostly built this for research and education, wanted to understand how these systems work under the hood. Open source if anyone wants to poke around.

github: https://github.com/mertizci/noai-watermark

https://redd.it/1rbb24f
@rStableDiffusion