Auto-generate caption files for LoRA training with local vision LLMs
Hey everyone!
I made a tool that automatically generates .txt caption files for your training datasets using local Ollama vision models (Qwen3-VL, LLaVA, Llama Vision).
Why this tool over other image annotators?
Modern models like Z-Image or Flux need long, precise, and well-structured descriptions to perform at their best, not just a string of tags separated by commas.
The advantage of multimodal vision LLMs is that you can give them instructions in natural language to define exactly the output format you want. The result: much richer, better-organized descriptions, truly adapted to what these models actually expect.
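To make that concrete, here's a minimal sketch of the underlying idea (not the tool's actual code) using the official `ollama` Python client: pass your formatting instructions as the prompt, hand the vision model a local image, and write the result to a `.txt` sidecar file. The model tag and instruction text are placeholders.

```python
# Minimal sketch, not the tool's implementation. Assumes a local Ollama
# server with a vision model pulled (e.g. `ollama pull llava`) and the
# `ollama` Python package installed.
from pathlib import Path

import ollama

# Natural-language instructions define the caption format you want.
INSTRUCTION = (
    "Describe this image for text-to-image training: composition, "
    "lighting, textures, and atmosphere. Write flowing prose, no tag lists."
)

def caption_folder(folder: str, model: str = "llava") -> None:
    for image in sorted(Path(folder).glob("*.jpg")):
        response = ollama.chat(
            model=model,
            messages=[{
                "role": "user",
                "content": INSTRUCTION,
                "images": [str(image)],  # local path handed to the vision model
            }],
        )
        # LoRA trainers expect a .txt caption file next to each image.
        image.with_suffix(".txt").write_text(response["message"]["content"])

caption_folder("dataset/")
```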
Built-in presets:
Z-Image / Flux: detailed, structured descriptions (composition, lighting, textures, atmosphere); the prompt uses the [official instructions from Tongyi-MAI](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo/blob/main/pe.py), the team behind Z-Image
Stable Diffusion: classic format with weight syntax (element:1.2) and quality tags
You can also create your own presets very easily by editing the config file.
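Conceptually, a preset just pairs a name with the instruction prompt the vision model receives for every image; see the repo for the exact config format. A hypothetical sketch:

```python
# Illustration only; the actual config schema is documented in the repo.
# Each preset maps a name to the instruction sent with every image.
CUSTOM_PRESETS = {
    "product_shots": (
        "Describe the product, its materials and colors, the background, "
        "and the lighting setup. One detailed paragraph, no tag lists."
    ),
    "sd15_tags": (
        "Output comma-separated tags with weight syntax such as "
        "(detailed eyes:1.2), ordered from subject to style to quality."
    ),
}
```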
Check out the project on GitHub: https://github.com/hydropix/ollama-image-describer
Feel free to open issues or suggest improvements!
https://redd.it/1pfusb3
@rStableDiffusion
New image model based on Wan 2.2 just dropped 🔥 early results are surprisingly good!
So, a new image model based on Wan 2.2 just dropped quietly on HF, no big announcements or anything. From my early tests, it actually looks better than the regular Wan 2.2 T2V! I haven’t done a ton of testing yet, but the results so far look pretty promising.
https://huggingface.co/aquif-ai/aquif-Image-14B
https://preview.redd.it/m792hxr7zp5g1.png?width=1024&format=png&auto=webp&s=bf9743ea8381c6d2ecc97a936d6eaff1c5f1ace7
https://preview.redd.it/2kpuqvr7zp5g1.png?width=1024&format=png&auto=webp&s=9ae473cd3b4ffdaa45e4d5b3ffad096972ee0fea
https://preview.redd.it/czvkwur7zp5g1.png?width=1024&format=png&auto=webp&s=077b7b4645b71d178ad842b6764f66aef56f79b4
https://preview.redd.it/ha4gmft9zp5g1.png?width=950&format=png&auto=webp&s=f3e98db2fb05863d937d6c3b3e237cb732c5f8e4
https://redd.it/1pganxi
@rStableDiffusion
Is Z-Image a legit replacement for popular models, or just the new hotness?
Currently the subreddit is full of gushing over Z-Image. I'm not experienced enough to draw my own conclusions from testing, but I was wondering whether it looks to be a legitimate replacement for current popular models (e.g. Flux, SDXL, Qwen), or whether it's just the flavour of the day?
https://redd.it/1pg93up
@rStableDiffusion
Z-Image Turbo Workflow Update: Console Z v2.1 - Modular UI, Color Match, Integrated I2I and Stage Previews
https://redd.it/1pg9jmn
@rStableDiffusion