Z-Image-Turbo - GPU Benchmark (RTX 5090, RTX Pro 6000, RTX 3090 (Ti))
https://redd.it/1pfyk0y
@rStableDiffusion
Auto-generate caption files for LoRA training with local vision LLMs
Hey everyone!
I made a tool that automatically generates .txt caption files for your training datasets using local Ollama vision models (Qwen3-VL, LLaVA, Llama Vision).
Why this tool over other image annotators?
Modern models like Z-Image or Flux need long, precise, and well-structured descriptions to perform at their best, not just a string of tags separated by commas.
The advantage of multimodal vision LLMs is that you can give them instructions in natural language to define exactly the output format you want. The result: much richer descriptions, better organized, and truly adapted to what these models actually expect.
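The tool's internals aren't shown here, but the core call is easy to picture. A minimal sketch, assuming the official `ollama` Python client and a pulled `llava` model; the instruction string below is illustrative, not one of the tool's actual presets:

```python
# Sketch: a natural-language instruction sent to a local Ollama vision
# model controls the caption's structure end to end.
# Requires `pip install ollama` and e.g. `ollama pull llava`.
import ollama

INSTRUCTION = (
    "Describe this image as a text-to-image training caption. "
    "Cover, in order: subject, composition, lighting, textures, atmosphere. "
    "Write flowing prose, not comma-separated tags."
)

def caption_image(image_path: str, model: str = "llava") -> str:
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": INSTRUCTION, "images": [image_path]}],
    )
    return response["message"]["content"].strip()
```

Swapping the instruction is all it takes to go from prose captions to tag-style output, which is what the presets below boil down to.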
Built-in presets:
Z-Image / Flux: detailed, structured descriptions (composition, lighting, textures, atmosphere); the prompt uses the [official instructions](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo/blob/main/pe.py) from Tongyi-MAI, the team behind Z-Image
Stable Diffusion: classic format with weight syntax (element:1.2) and quality tags
You can also create your own presets very easily by editing the config file.
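For context, here is how the batch step might look: one .txt sidecar file per image, the convention most LoRA trainers (e.g. kohya_ss) read. A sketch built on the hypothetical `caption_image()` above, not the repo's actual code:

```python
# Sketch: write a .txt caption next to each image in the dataset folder.
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def caption_dataset(folder: str, model: str = "llava") -> None:
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        txt = image.with_suffix(".txt")
        if txt.exists():  # don't overwrite captions you've already edited
            continue
        txt.write_text(caption_image(str(image), model), encoding="utf-8")
        print(f"captioned {image.name}")

caption_dataset("dataset")
```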
Check out the project on GitHub: https://github.com/hydropix/ollama-image-describer
Feel free to open issues or suggest improvements!
https://redd.it/1pfusb3
@rStableDiffusion