r/StableDiffusion – Telegram
Improve Z-Image Turbo Seed Diversity with this Custom Node.
https://redd.it/1pg0vvv
@rStableDiffusion
Auto-generate caption files for LoRA training with local vision LLMs

Hey everyone!

I made a tool that automatically generates .txt caption files for your training datasets using local Ollama vision models (Qwen3-VL, LLaVA, Llama Vision).

Why this tool over other image annotators?

Modern models like Z-Image or Flux need long, precise, and well-structured descriptions to perform at their best, not just a string of tags separated by commas.

The advantage of multimodal vision LLMs is that you can give them instructions in natural language to define exactly the output format you want. The result: much richer descriptions, better organized, and truly adapted to what these models actually expect.
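
If you're curious what this looks like on the Ollama side, here is a minimal sketch (not the project's actual code) of captioning a folder of images with the ollama Python client. The model tag, prompt text, and folder path are placeholders; swap in whatever model and instructions you actually use:

```python
# Minimal sketch: caption a dataset folder with a local Ollama vision model.
# Placeholders: "qwen3-vl" model tag, the prompt text, and the "dataset" folder.
import ollama
from pathlib import Path

PROMPT = (
    "Describe this image in detail for text-to-image training. "
    "Cover composition, lighting, textures, and atmosphere in full sentences, "
    "not comma-separated tags."
)

def caption_image(image_path: str, model: str = "qwen3-vl") -> str:
    # The ollama client accepts local image paths via the `images` field.
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": PROMPT, "images": [image_path]}],
    )
    return response["message"]["content"]

if __name__ == "__main__":
    for img in Path("dataset").glob("*.jpg"):
        caption = caption_image(str(img))
        # Write the caption next to the image, which is what LoRA trainers expect.
        img.with_suffix(".txt").write_text(caption, encoding="utf-8")
        print(f"Captioned {img.name}")
```
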

Built-in presets:

Z-Image / Flux: detailed, structured descriptions (composition, lighting, textures, atmosphere); the prompt uses the [official Tongyi-MAI instructions](https://huggingface.co/spaces/Tongyi-MAI/Z-Image-Turbo/blob/main/pe.py) from the team behind Z-Image
Stable Diffusion: classic format with weight syntax (element:1.2) and quality tags

You can also create your own presets very easily by editing the config file.
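
As a purely illustrative example of what a custom preset could carry (the real config keys in the repo may be named differently), a preset is essentially a target model family plus the instruction text handed to the vision LLM:

```python
# Hypothetical preset, for illustration only; see the repo's config file for
# the actual format and key names.
MY_PRESET = {
    "name": "sdxl-portrait",
    "instruction": (
        "Describe the subject, outfit, pose, background, and lighting. "
        "Append quality tags and use (element:1.2) weighting for key features."
    ),
    "max_words": 150,
}
```
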

Check out the project on GitHub: https://github.com/hydropix/ollama-image-describer
Feel free to open issues or suggest improvements!

https://redd.it/1pfusb3
@rStableDiffusion
Is Z-image a legit replacement for popular models, or just the new hotness?

Currently the subreddit is full of gushing over Z-Image. I'm not experienced enough to draw my own conclusions from testing, but I was wondering whether it looks like a legitimate replacement for the current popular models (e.g. Flux, SDXL, Qwen), or whether it's just the flavour of the day?

https://redd.it/1pg93up
@rStableDiffusion