LoRA Training for Z Image Turbo on 12 GB VRAM

Shoutout to Ostris for getting Z Image supported for LoRA training so quickly.

[https://github.com/ostris/ai-toolkit](https://github.com/ostris/ai-toolkit)

[https://huggingface.co/ostris/zimage_turbo_training_adapter](https://huggingface.co/ostris/zimage_turbo_training_adapter)

Wanted to share that it looks like you'll be able to train this on GPUs with 12 GB of VRAM. I'm currently running it on his RunPod template.

[https://console.runpod.io/hub/template/ai-toolkit-ostris-ui-official?id=0fqzfjy6f3](https://console.runpod.io/hub/template/ai-toolkit-ostris-ui-official?id=0fqzfjy6f3)

`MODEL OPTIONS`

* `Low VRAM: ON`
* `Layer Offloading: OFF`


`QUANTIZATION`

* `Transformer: float8 (default)`
* `Text Encoder: float8 (default)`

`TARGET`

* `Target Type: LoRA`
* `Linear Rank: 32`

`SAVE`

* `Data Type: BF16`
* `Save Every: 500`
* `Max Step Saves to Keep: 4`

`TRAINING`

* `Batch Size: 1`
* `Gradient Accumulation: 1`
* `Steps: 3000`
* `Optimizer: AdamW8Bit`
* `Learning Rate: 0.0001`
* `Weight Decay: 0.0001`
* `Timestep Type: Sigmoid`
* `Timestep Bias: Balanced`
* `EMA (Exponential Moving Average):`
  * `Use EMA: OFF`
* `Text Encoder Optimizations:`
  * `Unload TE: OFF`
  * `Cache Text Embeddings: ON`
* `Regularization:`
  * `Differential Output Preservation: OFF`
  * `Blank Prompt Preservation: OFF`
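
If you'd rather edit a config than click through the UI, here's roughly how the settings above map onto an ai-toolkit YAML file. Treat it as a sketch: the key names follow ai-toolkit's published example configs, the dataset path is a placeholder, and the Z Image-specific model/adapter keys are omitted, so check the repo's example config before running it.

```python
# Sketch only: key names based on ai-toolkit's example configs, not a
# verified Z Image Turbo config. Requires PyYAML.
import yaml

config = {
    "job": "extension",
    "config": {
        "name": "zimage_turbo_lora",
        "process": [{
            "type": "sd_trainer",
            "network": {"type": "lora", "linear": 32, "linear_alpha": 32},
            "save": {"dtype": "bf16", "save_every": 500, "max_step_saves_to_keep": 4},
            "datasets": [{
                "folder_path": "/workspace/dataset",  # placeholder path
                "caption_ext": "txt",
                "resolution": [512, 768, 1024],       # matches the settings above
            }],
            "train": {
                "batch_size": 1,
                "gradient_accumulation_steps": 1,
                "steps": 3000,
                "optimizer": "adamw8bit",
                "lr": 1e-4,
            },
        }],
    },
}

with open("zimage_turbo_lora.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```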
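
Side note on `Timestep Type: Sigmoid`: this almost certainly means logit-normal timestep sampling (the SD3 flow-matching recipe), i.e. a standard normal draw squashed through a sigmoid so training concentrates on mid-noise timesteps. A minimal sketch, assuming that's what ai-toolkit does under the hood:

```python
import torch

def sample_sigmoid_timesteps(batch_size: int) -> torch.Tensor:
    # Sigmoid of a standard normal draw gives timesteps in (0, 1)
    # clustered around 0.5, so mid-noise levels get the most training signal.
    return torch.sigmoid(torch.randn(batch_size))

print(sample_sigmoid_timesteps(4))  # e.g. tensor([0.61, 0.38, 0.55, 0.72])
```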

17-image dataset, resolution settings 512, 768, 1024 (all ON)

RTX 5090

1.30 s/it, lr: 1.0e-04, loss: 3.742e-01

Halfway through my training and it's already looking fantastic. Estimating about 1.5 hours to train 3000 steps with samples and saves: 3000 steps at 1.30 s/it is about 65 minutes of pure training, and sampling plus checkpointing accounts for the rest.

CivitAI is about to be flooded with LoRAs. Give this dude some money: [https://www.patreon.com/ostris](https://www.patreon.com/ostris)

https://redd.it/1p957k2
Z-Image: Best Practices for Maximum Detail, Clarity, and Quality?

Z-Image pics tend to be a *little* blurry, a *little* grainy, and a *little* compressed-looking.

Here's what I know (or think I know) so far that can help clear things up a bit.

- Don't render at 1024x1024. Go higher: 1440x1440, 1920x1088, or 2048x2048. 3840x2160 is too high for this model natively.

- Change the shift (ModelSamplingAuraFlow) from the default of 3 to 7. If the node is off, it defaults to 3. (See the shift sketch after this list for what the setting actually does.)

- Using more steps than 9 doesn't help, it hurts. 20 or 30 steps just result in blotchy skin.
EDIT - The combination of euler and sgm_uniform solves the problem of skin getting blotchy at higher steps. But after SOME testing I can't find any reason to go higher than 9 steps: the image isn't any sharper, there aren't any more details, text accuracy doesn't increase, and anatomy is equal at 9 or 25 steps. But maybe there is SOME reason to increase steps? IDK

- From my testing, res2 and bong_tangent also produce worse-looking, blotchy skin. Euler/Beta or Euler/linear_quadratic seem to produce the cleanest images (I have NOT tried all combinations).

- Lowering cfg from 1 to 0.8 will mute colors a bit, which you may like. Raising cfg from 1 to 2 or 3 will saturate colors and make them pop while still staying balanced. Any higher than 3 and your images burn. Honestly, I prefer the look of cfg 2 to cfg 1, BUT raising cfg above 1 will also nearly double your render time, since the sampler can no longer skip the unconditional pass (see the CFG sketch after this list).

- Up-scaling with Topaz produces *very* nice results, but if you know of an in-Comfy solution that is better, I'd love to hear about it.
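
On the shift point above: ModelSamplingAuraFlow remaps the sampler's sigma schedule before sampling. Here's a small sketch of that remapping as I understand it from ComfyUI's model-sampling code; a higher shift spends more of the step budget at high-noise sigmas, where composition and large-scale detail get decided:

```python
import torch

def aura_flow_shift(sigmas: torch.Tensor, shift: float = 7.0) -> torch.Tensor:
    # Pushes each sigma toward the high-noise end; shift == 1 is the identity.
    return shift * sigmas / (1 + (shift - 1) * sigmas)

# A 9-step linear schedule, before and after shifting:
sigmas = torch.linspace(1.0, 0.0, 10)
print(sigmas)
print(aura_flow_shift(sigmas, shift=7.0))
```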
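
And on why cfg above 1 nearly doubles render time: classifier-free guidance needs both a conditional and an unconditional model pass every step, while at cfg 1 the combined result reduces to the conditional prediction alone, so the second pass can be skipped. A minimal sketch of the combine step (not ComfyUI's actual code):

```python
import torch

def cfg_combine(cond: torch.Tensor, uncond: torch.Tensor, cfg: float) -> torch.Tensor:
    # Classifier-free guidance: extrapolate away from the unconditional
    # prediction. At cfg == 1 this returns exactly `cond`, which is why
    # samplers can skip the unconditional pass (and ~half the compute).
    return uncond + cfg * (cond - uncond)
```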

What have you found produces the best results from Z-Image?

https://redd.it/1p8xtln