NEW BOT Telegram, page

6 views, 13:40

I implemented text encoder training for Z-Image-Turbo in AI-Toolkit, and here is how you can too!

I love Kohya and Ostris, but I have been very disappointed by the lack of text encoder training for all the newer models from WAN onwards.

This became especially noticeable with Z-Image-Turbo: without text encoder training, it really struggles to portray a character or other concept through your chosen token unless that token is a generic one like "woman".

Last night I spent 5 hours vibe-coding and troubleshooting to implement text encoder training in AI-Toolkit's Z-Image-Turbo training, and I succeeded. However, this is still highly experimental: it was very easy to overtrain the text encoder, and very easy to undertrain it too.

So far the best settings I have found are:

64 dim/alpha, a 2e-4 UNet LR on a cosine schedule with a 1e-4 min LR, and a separate 1e-5 text encoder LR.
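For reference, a cosine schedule with a floor decays the learning rate from its peak to the minimum along half a cosine wave. The sketch below assumes the standard cosine-annealing form with no warmup and a hypothetical total of 3000 steps; the post does not state the step count, whether AI-Toolkit adds warmup, or whether the text encoder LR follows the same schedule, so the text encoder rate is shown as a constant. With dim = alpha = 64, the usual LoRA scaling factor alpha / dim works out to 1.

```python
import math


def cosine_lr(step: int, total_steps: int, max_lr: float = 2e-4, min_lr: float = 1e-4) -> float:
    """Cosine annealing from max_lr at step 0 down to min_lr at total_steps (no warmup)."""
    # Clamp progress so steps past the end stay at the floor instead of rising again.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumption: the text encoder LR is held constant; the post only gives its value.
TEXT_ENCODER_LR = 1e-5

# LoRA updates are scaled by alpha / rank; with 64/64 the scale is 1.
LORA_RANK, LORA_ALPHA = 64, 64
LORA_SCALE = LORA_ALPHA / LORA_RANK

if __name__ == "__main__":
    total = 3000  # hypothetical step count, not from the post
    for step in (0, 750, 1500, 2250, 3000):
        print(step, f"unet_lr={cosine_lr(step, total):.2e}", f"te_lr={TEXT_ENCODER_LR:.0e}")
```

In PyTorch-based trainers, a separate text encoder LR is typically implemented as its own optimizer parameter group.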

However, this was still somewhat overtrained. I am now testing various combinations of lower text encoder LRs, UNet LRs, and dims.

To implement and use text encoder training, you need the following files:

https://www.dropbox.com/scl/fi/d1efo1o7838o84f69vhi4/kohya_lora.py?rlkey=13v9un7ulhj2ix7to9nflb8f7&st=h0cqwz40&dl=1

https://www.dropbox.com/scl/fi/ge5g94h2s49tuoqxps0da/BaseSDTrainProcess.py?rlkey=10r175euuh22rl0jmwgykxd3q&st=gw9nacno&dl=1

https://www.dropbox.com/scl/fi/hpy3mo1qnecb1nqeybbd9/__init__.py?rlkey=bds8flo9zq3flzpq4fz7vxhlc&st=jj9r20b2&dl=1

https://www.dropbox.com/scl/fi/ttw3z287cj8lveq56o1b4/z_image.py?rlkey=1tgt28rfsev7vcaql0etsqov7&st=zbj22fjo&dl=1

https://www.dropbox.com/scl/fi/dmsny3jkof6mdns6tfz5z/lora_special.py?rlkey=n0uk9rwm79uw60i2omf9a4u2i&st=cfzqgnxk&dl=1

Put BaseSDTrainProcess.py into /jobs/process, kohya_lora.py and lora_special.py into /toolkit/, and z_image.py into /extensions_built_in/diffusion_models/z_image.
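The file placement can be scripted. This is a hypothetical helper, not something from the post: it assumes the five files were downloaded into one folder under their original names and that the root you pass in is your AI-Toolkit checkout; the destination folders are the ones the post lists. The post puts __init__.py into the ComfyUI directory instead, so it gets its own target table. The helper keeps a .bak copy of each stock file it replaces so the patch can be reverted.

```python
import shutil
from pathlib import Path

# Where each downloaded file goes, relative to the AI-Toolkit root (paths from the post).
AI_TOOLKIT_TARGETS = {
    "BaseSDTrainProcess.py": "jobs/process",
    "kohya_lora.py": "toolkit",
    "lora_special.py": "toolkit",
    "z_image.py": "extensions_built_in/diffusion_models/z_image",
}

# The LoRA loader node goes into the ComfyUI directory instead.
COMFYUI_TARGETS = {"__init__.py": "custom_nodes/qwen_te_lora_loader"}


def install(download_dir: Path, root: Path, targets: dict[str, str]) -> list[Path]:
    """Copy each file into root/<subdir>, keeping a .bak of any file it replaces."""
    installed = []
    for name, subdir in targets.items():
        src = download_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"missing download: {src}")
        dest_dir = root / subdir
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / name
        if dest.exists():
            # Keep the stock file so the change can be reverted later.
            shutil.copy2(dest, dest.with_name(name + ".bak"))
        shutil.copy2(src, dest)
        installed.append(dest)
    return installed


# Example (placeholder paths, adjust to your setup):
# dl = Path("~/Downloads").expanduser()
# install(dl, Path("~/ai-toolkit").expanduser(), AI_TOOLKIT_TARGETS)
# install(dl, Path("~/ComfyUI").expanduser(), COMFYUI_TARGETS)
```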

Put the following into your config.yaml under train:

train:
  train_text_encoder: true
  text_encoder_lr: 0.00001

You also must not quantize the text encoder, cache the text embeddings, or unload the text encoder.
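Because these requirements are easy to miss, a quick pre-flight check on the parsed config can help. This is a hypothetical sketch: it takes the job entry that holds the train: and model: sections as a plain dict (for example from yaml.safe_load). The train_text_encoder and text_encoder_lr keys come from the post, but the names and sections used for quantization, caching and unloading (quantize_te, cache_text_embeddings, unload_text_encoder) are assumptions about AI-Toolkit's config schema and should be checked against your own config file.

```python
def check_te_training(process: dict) -> list[str]:
    """Return problems that would conflict with text encoder training.

    `process` is the job entry holding the `train:` and `model:` sections.
    Key names other than train_text_encoder / text_encoder_lr are assumptions.
    """
    train = process.get("train") or {}
    model = process.get("model") or {}
    problems = []
    if train.get("train_text_encoder") is not True:
        problems.append("train.train_text_encoder must be true")
    lr = train.get("text_encoder_lr")
    # PyYAML reads `1e-5` (no decimal point) as a string; `0.00001` parses as a float.
    if isinstance(lr, bool) or not isinstance(lr, (int, float)) or lr <= 0:
        problems.append("train.text_encoder_lr must be a positive number (the post uses 0.00001)")
    # Assumed key names for the three settings the post says must be off.
    if model.get("quantize_te"):
        problems.append("model.quantize_te must be off (do not quantize the text encoder)")
    if train.get("cache_text_embeddings"):
        problems.append("train.cache_text_embeddings must be off")
    if train.get("unload_text_encoder"):
        problems.append("train.unload_text_encoder must be off")
    return problems
```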

The __init__.py is a custom LoRA loader node, needed because ComfyUI cannot otherwise load the text encoder parts of the LoRA. Put it under /custom_nodes/qwen_te_lora_loader/ in your ComfyUI directory. The node is then called Load LoRA (Z-Image Qwen TE).
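The post does not describe how the node works internally. As a rough illustration of why a dedicated loader is needed: a LoRA file that also covers the text encoder holds weights for two different models, and a loader has to route each key to the right one. The sketch below assumes kohya-style key prefixes (lora_unet_ for the diffusion model, lora_te_ for the text encoder); the key names the modified kohya_lora.py actually writes may differ, so inspect your own file's keys first.

```python
from collections.abc import Iterable

# Assumed kohya-style prefixes; check the keys in your own .safetensors file.
DEFAULT_PREFIXES = {"unet": ("lora_unet_",), "text_encoder": ("lora_te_", "lora_te1_")}


def split_lora_keys(
    keys: Iterable[str], prefixes: dict[str, tuple[str, ...]] = DEFAULT_PREFIXES
) -> dict[str, list[str]]:
    """Group LoRA tensor names by the model they patch; unmatched keys go to 'unknown'."""
    groups: dict[str, list[str]] = {name: [] for name in prefixes}
    groups["unknown"] = []
    for key in keys:
        for name, pfx in prefixes.items():
            if key.startswith(pfx):  # str.startswith accepts a tuple of prefixes
                groups[name].append(key)
                break
        else:
            groups["unknown"].append(key)
    return groups


# Usage with a real file (requires the safetensors package):
# from safetensors import safe_open
# with safe_open("my_lora.safetensors", framework="pt") as f:
#     groups = split_lora_keys(f.keys())
```

If the text_encoder group comes back empty for a LoRA trained with the text encoder enabled, the file likely uses different prefixes than assumed here.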

Then restart ComfyUI.

Please note that training the text encoder increases VRAM usage considerably, and training time somewhat as well.

I am currently using 96.x GB of VRAM on a rented H200 with 140 GB of VRAM, with no UNet or TE quantization, no caching, no AdamW8bit (I am using plain AdamW, i.e. 32-bit), and no gradient checkpointing. You can certainly fit this into an A100 80GB with these optimizations turned on, maybe even into a 48 GB A6000.
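Some back-of-the-envelope arithmetic shows where the optimizer choice matters. For a LoRA of rank r on a linear layer with d_in inputs and d_out outputs, the trainable parameter count is r * (d_in + d_out). With 32-bit AdamW, each trainable parameter costs about 16 bytes (fp32 weight, gradient, and two moment buffers); an 8-bit AdamW stores the two moments in about 1 byte each, roughly 10 bytes total. The layer count and shapes below are hypothetical, and the estimate deliberately ignores the frozen base weights and activations.

```python
# Bytes per trainable parameter: weight + gradient + optimizer state.
BYTES_PER_PARAM = {
    "adamw_fp32": 4 + 4 + 4 + 4,  # fp32 weight, fp32 grad, two fp32 moments
    "adamw_8bit": 4 + 4 + 1 + 1,  # moments quantized to 8 bits (ignores small scale tables)
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA pair: down (rank x d_in) + up (d_out x rank)."""
    return rank * (d_in + d_out)


def optimizer_gib(n_params: int, optimizer: str) -> float:
    """Memory for trainable weights, gradients and optimizer state, in GiB."""
    return n_params * BYTES_PER_PARAM[optimizer] / 2**30


if __name__ == "__main__":
    # Hypothetical: 200 adapted linear layers of 3072 x 3072 at rank 64.
    n = 200 * lora_params(3072, 3072, 64)
    for opt in BYTES_PER_PARAM:
        print(f"{opt}: {n:,} params -> {optimizer_gib(n, opt):.2f} GiB")
```

With these hypothetical shapes the trainable state comes to about 1.17 GiB with fp32 AdamW versus about 0.73 GiB with 8-bit AdamW. That is small next to the 96 GB reported, which suggests most of the usage comes from keeping the unquantized models resident and from activations (no gradient checkpointing), which is where the optimizations mentioned above would save the most.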

Hopefully someone else will experiment with this too!

If you like my experimentation and my freely sharing models and knowledge with the community, consider donating to my Patreon or Ko-Fi!

https://redd.it/1prdbke
@rStableDiffusion

6 views, 14:40

r/StableDiffusion

Disappointment about Qwen-Image-Layered

This is frustrating:

There is no control over the content of the layers (or I couldn't figure out how to tell it what I wanted).
The fill quality is unsatisfactory.
It requires a lot of resources.
It takes a lot of time.

https://preview.redd.it/iopdkwemhc8g1.png?width=720&format=png&auto=webp&s=668fe36625d35ae3cf0a1f438d461f3323b92a84

https://preview.redd.it/npkw0tythc8g1.png?width=720&format=png&auto=webp&s=a5567878f9cc8df17aa56455b4c29b42be6a2c97

https://preview.redd.it/zfku2522ic8g1.png?width=720&format=png&auto=webp&s=4f3cb91ec1e23584237f5afcef4c88321fa592f1

2 layers (720x1024), 20 steps, time 16:25

https://preview.redd.it/th9bnivuhc8g1.png?width=368&format=png&auto=webp&s=1fb5380f2db0405ea68ecbb16d72f6663a949ffb

https://preview.redd.it/b8l97oavhc8g1.png?width=368&format=png&auto=webp&s=5566a98b32223a77e9e6450ddec7ea9d28ab68a8

https://preview.redd.it/62crq6ovhc8g1.png?width=368&format=png&auto=webp&s=a527cbcffc2c5a619b41f11e349167fb20971b0f

3 layers (368x512), 20 steps, time 07:04

I tested "Qwen_Image_Layered-Q5_K_M.gguf" because I don't have a very powerful computer.

https://redd.it/1prc89p
@rStableDiffusion

7 views, 15:40

r/StableDiffusion

Yes, it is THIS bad!
https://redd.it/1prhofj
@rStableDiffusion

8 views, 17:40

r/StableDiffusion

LoRAs work on DFloat11 now (100% lossless).
https://redd.it/1prjnn9
@rStableDiffusion

7 views, 18:40

r/StableDiffusion

Let’s reconstruct and document the history of open generative media before we forget it

If you have been here for a while you must have noticed how fast things change. Maybe you remember that just in the past 3 years we had AUTOMATIC1111, Invoke, text embeddings, IPAdapters, Lycoris, Deforum, AnimateDiff, CogVideoX, etc. So many tools, models and techniques that seemed to pop out of nowhere on a weekly basis, many of which are now obsolete or deprecated.

Many people who contributed to the community with models, LoRAs, and scripts, content creators who made free tutorials for everyone to learn from, and companies like Stability AI that released open source models are now forgotten.

Personally, I’ve been here since the early days of SD1.5, and I’ve watched this community evolve together with the rest of the open source AI ecosystem. I’ve seen the impact that things like ComfyUI, SDXL, Flux, Wan, Qwen, and now Z-Image have had on the community, and I’m noticing a shift towards things becoming more centralized, less open, and less local. There are several reasons why this might be happening: maybe models are becoming increasingly bigger, maybe unsustainable business models are dying off, maybe the people who contribute are burning out or getting busy with other stuff, who knows? ComfyUI is focusing more on developing its business side, Invoke was acquired by Adobe, Alibaba is keeping newer versions of Wan behind APIs, Flux is getting too big for local inference while hardware is getting more expensive…

In any case, I’d like to open this discussion for documentation purposes, so that we can collectively write about our experiences with this emerging technology over the past years. Feel free to write whatever you want about what attracted you to this community, what you enjoy about it, what impact it had on you personally or professionally, projects (even if small and obscure ones) that you engaged with, extensions/custom nodes you used, platforms, content creators you learned from, people like Kijai, Ostris and many others (write their names in your replies) that you might be thankful for, anything really.

I hope many of you can contribute to this discussion with your experiences so we can have a good common source of information, publicly available, about how open generative media evolved, and we are in a better position to assess where it’s going.

https://redd.it/1prp3cz
@rStableDiffusion

6 views, 01:40

r/StableDiffusion

Final Fantasy Tactics Style LoRA for Z-Image-Turbo - Link in description

https://redd.it/1prt5oj
@rStableDiffusion

6 views, 02:40

r/StableDiffusion