r/StableDiffusion – Telegram
Improve Z-Image Turbo Seed Diversity with this Custom Node.
https://redd.it/1pg0vvv
@rStableDiffusion
Auto-generate caption files for LoRA training with local vision LLMs

Hey everyone!

I made a tool that automatically generates .txt caption files for your training datasets using local Ollama vision models (Qwen3-VL, LLaVA, Llama Vision).

Why this tool over other image annotators?

Modern models like Z-Image or Flux need long, precise, and well-structured descriptions to perform at their best, not just a string of tags separated by commas.

The advantage of multimodal vision LLMs is that you can give them instructions in natural language to define exactly the output format you want. The result: much richer descriptions, better organized, and truly adapted to what these models actually expect.
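
If you're curious what this looks like on the Ollama side, here is a minimal sketch (not the project's actual code) of captioning a folder of images with the ollama Python client. The model tag, prompt text, and folder path are placeholders; swap in whatever model and instructions you actually use:

```python
# Minimal sketch: caption a dataset folder with a local Ollama vision model.
# Placeholders: "qwen3-vl" model tag, the prompt text, and the "dataset" folder.
import ollama
from pathlib import Path

PROMPT = (
    "Describe this image in detail for text-to-image training. "
    "Cover composition, lighting, textures, and atmosphere in full sentences, "
    "not comma-separated tags."
)

def caption_image(image_path: str, model: str = "qwen3-vl") -> str:
    # The ollama client accepts local image paths via the `images` field.
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": PROMPT, "images": [image_path]}],
    )
    return response["message"]["content"]

if __name__ == "__main__":
    for img in Path("dataset").glob("*.jpg"):
        caption = caption_image(str(img))
        # Write the caption next to the image, which is what LoRA trainers expect.
        img.with_suffix(".txt").write_text(caption, encoding="utf-8")
        print(f"Captioned {img.name}")
```
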

Built-in presets:

Z-Image / Flux: detailed, structured descriptions (composition, lighting, textures, atmosphere); the prompt uses the [official Tongyi-MAI instructions](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo/blob/main/pe.py) from the team behind Z-Image
Stable Diffusion: classic format with weight syntax (element:1.2) and quality tags

You can also create your own presets very easily by editing the config file.
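
As a purely illustrative example of what a custom preset could carry (the real config keys in the repo may be named differently), a preset is essentially a target model family plus the instruction text handed to the vision LLM:

```python
# Hypothetical preset, for illustration only; see the repo's config file for
# the actual format and key names.
MY_PRESET = {
    "name": "sdxl-portrait",
    "instruction": (
        "Describe the subject, outfit, pose, background, and lighting. "
        "Append quality tags and use (element:1.2) weighting for key features."
    ),
    "max_words": 150,
}
```
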

Check out the project on GitHub: https://github.com/hydropix/ollama-image-describer
Feel free to open issues or suggest improvements!

https://redd.it/1pfusb3
@rStableDiffusion
Is Z-image a legit replacement for popular models, or just the new hotness?

Currently the subreddit is full of gushing over Z-Image. I'm not experienced enough to draw my own conclusions from testing, but I was wondering whether it looks like a legitimate replacement for the current popular models (e.g. Flux, SDXL, Qwen), or whether it's just the flavour of the day?

https://redd.it/1pg93up
@rStableDiffusion