train for a looong time (say, 1000 steps), the likelihood of hitting those tail regions is almost zero. The problem? When the model finally does see them, the loss spikes hard and throws training out of whack, even with a huge batch size.
At high batch sizes (BS), this instability may be partly diluted across the batch. At small BS, however, every so often a batch ends up dominated by samples from these "**sparse timestep**" zones, a region the model has rarely seen, causing instability.
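The small-batch argument can be made concrete with a quick binomial calculation. Note the 5% zone width and the batch sizes below are illustrative numbers of mine, not values from the post:

```python
from math import comb

def prob_most_in_zone(batch_size: int, p_zone: float) -> float:
    # P(strictly more than half the batch lands in a timestep zone that is
    # sampled with probability p_zone), assuming i.i.d. timestep draws.
    k_min = batch_size // 2 + 1
    return sum(comb(batch_size, k) * p_zone**k * (1 - p_zone)**(batch_size - k)
               for k in range(k_min, batch_size + 1))

small_bs = prob_most_in_zone(2, 0.05)    # rare, but it will happen over many steps
large_bs = prob_most_in_zone(64, 0.05)   # astronomically small
```

At BS=2 a zone-dominated batch shows up on the order of once per few hundred steps; at BS=64 it essentially never does, which matches the dilution intuition above.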
**The Solution:** I manually modified the configuration to set **Min SNR Gamma = 5**.
* This drastically reduced the loss at low timesteps.
* Surprisingly, it also alleviated the loss spikes at the 950-1000 range. The high-step instability might actually be a ripple effect of the low-step spikes.
https://preview.redd.it/bc29t9aoylhg1.png?width=323&format=png&auto=webp&s=296f6f9c0359f20b143d959cddcb16683d82a8c9
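For reference, this is roughly what a Min SNR Gamma option does under the hood: a minimal sketch of the classic Min-SNR-gamma loss weighting for epsilon-prediction diffusion. Flow-matching trainers adapt the same clamp-the-SNR idea to their prediction type, and the linear alpha schedule below is purely illustrative:

```python
import torch

def min_snr_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    # Clamp the per-timestep SNR at gamma, then normalize by the SNR, so the
    # near-noiseless low timesteps (huge SNR) stop dominating the loss.
    return torch.clamp(snr, max=gamma) / snr

# Illustrative schedule only: SNR = alpha_bar / (1 - alpha_bar)
alphas_cumprod = torch.linspace(0.9999, 0.01, 1000)
snr = alphas_cumprod / (1.0 - alphas_cumprod)
weights = min_snr_weight(snr, gamma=5.0)
# the per-sample MSE would then be multiplied by weights[t] before averaging
```

With gamma = 5 the low-timestep weights collapse toward zero while the high-timestep weights stay at 1, which is exactly the "drastically reduced loss at low timesteps" seen above.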
# 3. How to Implement
If you are using unmodified OneTrainer or AI Toolkit, Z-IMAGE might not support the Min SNR option directly yet. You can try **limiting the minimum timesteps** to achieve a similar effect, and in OneTrainer, use logit-normal timestep sampling with dynamic timestep shift.
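The timestep-floor workaround amounts to never drawing training timesteps below some cutoff. A minimal sketch, where `t_min=50` is an illustrative value of mine, not a recommendation from the post:

```python
import torch

def sample_timesteps(batch_size: int, num_train_steps: int = 1000,
                     t_min: int = 50) -> torch.Tensor:
    # Uniform sampling with a hard floor: timesteps below t_min (the region
    # where the loss spikes) are simply never drawn during training.
    return torch.randint(t_min, num_train_steps, (batch_size,))

t = sample_timesteps(4096)
```

This doesn't reweight the loss the way Min SNR does; it just excludes the problematic region entirely, which is why it only approximates the effect.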
Alternatively, you can use my fork of OneTrainer:
**GitHub:** [https://github.com/gesen2egee/OneTrainer](https://github.com/gesen2egee/OneTrainer)
My fork includes support for:
* LoKR
* Min SNR Gamma
* A modified optimizer: `automagic_sinkgd` (which already includes Kahan summation).
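For context, Kahan summation carries the rounding error of each addition in a separate compensation term, which is why it helps long low-precision accumulations like optimizer state updates. A generic sketch of the algorithm, not the optimizer's actual code:

```python
def kahan_sum(values):
    # Compensated summation: `comp` tracks the low-order bits lost by each
    # floating-point add, so the accumulated error stays bounded instead of
    # growing with the number of terms.
    total = 0.0
    comp = 0.0
    for v in values:
        y = v - comp            # apply the correction from the previous step
        t = total + y           # big + small: low bits of y may be lost here...
        comp = (t - total) - y  # ...recover exactly what was lost
        total = t
    return total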
**(If you prefer to stay on the original OneTrainer, all optimizers ending in `_ADV` already include stochastic rounding, which largely solves the precision problem.)**
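For context, stochastic rounding for fp32 → bf16 can be implemented with a well-known bit trick: add uniform noise to the 16 low bits that bf16 discards, then truncate. In expectation this preserves the fp32 value, which is why tiny optimizer updates survive instead of being rounded away. A sketch of the idea, not the actual `_ADV` implementation:

```python
import torch

def stochastic_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    # Reinterpret fp32 bits as int32, add uniform noise to the low 16 bits
    # (the bits bf16 truncates), then clear them. A carry into bit 16 bumps
    # the value one bf16 ulp, so rounding up happens with probability equal
    # to the truncated fraction -> unbiased in expectation.
    # (Sketch only: ignores NaN/inf and values near the fp32 max.)
    bits = x.view(torch.int32)
    noise = torch.randint(0, 1 << 16, x.shape, dtype=torch.int32, device=x.device)
    return ((bits + noise) & -65536).view(torch.float32).to(torch.bfloat16)

torch.manual_seed(0)
x = torch.full((10000,), 1.0001)         # between bf16 neighbors 1.0 and 1.0078125
y = stochastic_round_to_bf16(x).float()  # a mixture of the two neighbors
```

Plain round-to-nearest would map every one of these values to exactly 1.0, silently discarding the update; stochastic rounding keeps the mean of `y` near 1.0001.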
Hope this helps anyone else struggling with ZIB training!
https://redd.it/1qwc4t0
@rStableDiffusion
Z-Image LoRA training is solved! A new Ztuner trainer is coming soon!
Finally, the day we have all been waiting for has arrived. On X we got the answer:
https://x.com/bdsqlsz/status/2019349964602982494
The problem was that adam8bit performs very poorly, and even AdamW struggles; user "None9527" had already spotted this earlier. But now we have the answer: **prodigy_adv + Stochastic rounding**. This optimizer gets the job done, and not only that.
Soon we will get a new trainer called "Ztuner".
As of now, OneTrainer already exposes Prodigy_Adv as an optimizer option and explicitly lists Stochastic Rounding as a toggleable feature for BF16/FP16 training.
Hopefully we will get this implementation soon in other trainers too.
https://redd.it/1qwj4hu
@rStableDiffusion
From the linked post by 青龍聖者 (@bdsqlsz) on X:
> Through ablation experiments and collaboration with the official team, the training problem was finally solved.
> Recommended configuration now: prodigy_adv + Stochastic rounding
> It has been confirmed that adam8bit performs very poorly, and adamw seems to do…
Ace-Step 1.5 instrument-only = garbage?
Is it just me, or does everyone else have the same problem? I really just want calm, soothing piano music, and everything I get sounds like dubstep... any advice?
https://redd.it/1qwe940
@rStableDiffusion