r/StableDiffusion – Telegram
Z-Image Turbo: The definitive guide to creating a realistic character LoRA

https://preview.redd.it/isxkvx7ir69g1.jpg?width=780&format=pjpg&auto=webp&s=f5c2f67a92b28dec4c368db5aef66bf50c04102b

I’m going to leave here a small guide on how to create LoRAs for **Z-Image Turbo** of **real people** (not styles or poses). I’ve followed many recommendations, and I think that with these parameters you can be sure to get an amazing result that you can use however you want.

The fastest and cheapest way to train it is by using **AI Toolkit**, either locally or on **RunPod**. In about **30 minutes** you can have the LoRA ready, depending on the GPU you have at home.

**1 – Collect photos to create the LoRA**

The goal is to gather **as many high-quality photos as possible**, although **medium-quality images are also valid**. Around **70–80 photos** should be enough. It’s important to include **good-quality face photos (close-ups)** as well as **some full-body shots**.

In the example I use here, I searched for photos of **Wednesday Addams**, and most of the images were **low quality and very grainy**. This is clearly a **bad scenario** for building a good dataset, but I collected whatever Google provided.

If the photos are grainy like in this example, the generations made with **Z-Image** will look similar. If you use a **cleaner dataset**, the results will be **cleaner as well**—simple as that.

**2 – Clean the dataset**

Once the photos are collected, they need to be **cleaned**. This means **reframing them**, **cropping out people or elements that shouldn’t appear**, and **removing watermarks, text, etc.**

After that, what I usually do is open them in **Lightroom** and simply crop them there, or alternatively use the **Windows image viewer** itself.

When exporting all the images, it’s recommended to set the **longest edge to 1024 pixels**.
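If you’d rather batch this step than export images one by one, a minimal Pillow sketch along these lines does the resize. The folder names are just placeholders:

```python
from pathlib import Path
from PIL import Image

SRC, DST = Path("dataset_raw"), Path("dataset_1024")   # placeholder folder names
DST.mkdir(exist_ok=True)

for img_path in sorted(SRC.iterdir()):
    if img_path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(img_path).convert("RGB")
    scale = 1024 / max(img.size)            # shrink so the longest edge becomes 1024 px
    if scale < 1:                           # never upscale small images
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.LANCZOS)
    img.save(DST / f"{img_path.stem}.jpg", quality=95)
```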

**Optional step – Enhance low-quality photos**

If you’re working with **low-quality images**, an optional step is to **improve their sharpness** using tools like **Topaz** to recover some quality. However, this can **negatively affect certain parts of the image**, such as hair, which may end up looking **weird or plastic-like**.

Topaz allows you to **enhance only the face**, which is very useful and helps avoid these issues.

**3 – Open AI Toolkit (local or RunPod)**

Open **AI Toolkit** either locally or on **RunPod**. For the RunPod option, it’s as simple as going to the website, searching for the **Ostris AI Toolkit** template, and renting a typical **RTX 5090**.

In about **30–40 minutes at most**, you’ll have the LoRA trained. Once it’s open, **add the dataset**.

**4 – Caption the dataset**

There are different theories about the best way to do this: some say **don’t add anything**, others recommend **detailed phrases**. I’ve had the **best results by keeping it simple**. I usually caption them with phrases like: “a photo of (subject’s name)”

If a photo contains something unusual that I **don’t want the model to learn**, I specify it, for example: “a photo of (subject’s name) with ponytail”

In the example photos of **Wednesday Addams**, I didn’t tag anything about her characteristic uniform. When generating images later, simply writing **“school uniform”** makes the model **automatically reproduce that specific outfit**.

**Tip:** It also works to include photos **without the face** (body-only shots) and label them as “a photo of (subject’s name) without face”.
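If you don’t feel like typing every caption by hand, a small sketch like this writes a same-named .txt caption next to each image (the usual sidecar format most trainers, AI Toolkit included, can read). The folder and subject name are placeholders:

```python
from pathlib import Path

DATASET = Path("dataset_1024")      # placeholder: the folder with your cleaned images
SUBJECT = "wednesday addams"        # placeholder: your subject's name
EXTS = {".jpg", ".jpeg", ".png", ".webp"}

for img in sorted(DATASET.iterdir()):
    if img.suffix.lower() not in EXTS:
        continue
    caption = f"a photo of {SUBJECT}"
    # Add anything unusual you don't want the model to learn, e.g. "with ponytail"
    img.with_suffix(".txt").write_text(caption, encoding="utf-8")
```

Then just tweak the few captions that need extra detail (ponytail, no face, etc.).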

**5 – New Job**

Create the new job with the settings described below:

We **don’t use a trigger word**; it’s not necessary.

Select the **Z-Image Turbo model** with the **training adapter** (V2 required).

* If you’re using a card like the **RTX 5090**, **deselect “Low VRAM”**.
* If you have a **powerful GPU**, it’s highly recommended to select **NONE** for the **Quantization** of both the **Transformer** and **Text Encoder**.
* If you’re training locally with a **less powerful GPU**, use **Float8 Quantization**.

The **Linear Rank** is important: leave it at **64** if you want **realistic skin texture**. I’ve tried 16 and 32, and the results aren’t good.

For the **recommended save steps**, I suggest keeping the **last 6–7 checkpoints**, saved every **250 steps**. If you’re using **4000 steps**, that means the final checkpoint plus the ones at **3750, 3500, 3250, 3000, 2750, and 2500**.

Then select **4000 steps**, **Adam8bit optimizer**, a **learning rate of 0.0002**, and **weight decay set to 0.0001**. It’s important to set the **Timestep Type to Sigmoid**.

After that, select the **dataset you created earlier** and set the training **resolution to 512**. In my tests, increasing the resolution doesn’t add much benefit.

Finally, **disable sample generation** (it’s not very useful and only makes the training take longer unnecessarily).
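To keep everything in one place, here is the recipe above summarised as a plain Python dict, plus the checkpoint arithmetic. The key names are only my shorthand for the UI fields, not AI Toolkit’s actual config schema:

```python
# Shorthand summary of the UI settings above -- not AI Toolkit's real config keys.
training_settings = {
    "model": "Z-Image Turbo",
    "training_adapter": "V2",      # V2 adapter required
    "trigger_word": None,          # no trigger word needed
    "low_vram": False,             # deselect on an RTX 5090-class card
    "quantization": "none",        # use "float8" on weaker local GPUs
    "linear_rank": 64,             # 16/32 lose the realistic skin texture
    "steps": 4000,
    "optimizer": "adam8bit",
    "learning_rate": 2e-4,
    "weight_decay": 1e-4,
    "timestep_type": "sigmoid",
    "resolution": 512,
    "save_every": 250,
    "max_step_saves": 7,
    "generate_samples": False,     # sampling off, it only slows training down
}

# Which checkpoints survive with save_every=250 and the last 7 kept:
kept = [training_settings["steps"] - i * training_settings["save_every"] for i in range(7)]
print(kept)   # [4000, 3750, 3500, 3250, 3000, 2750, 2500]
```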

[This is how the workflow should look.](https://preview.redd.it/kdz0vkmxl69g1.png?width=3834&format=png&auto=webp&s=f98a14a9f9bc9db0025b5f5f8d922896affe5060)

Generate the job and save the **LoRAs** that are produced. They are usually **usable from around 2000 steps**, but they reach their **sweet spot between 3000 and 4000 steps**, depending on the dataset.

I’m leaving a file with the **dataset I used** - [Wednesday Dataset](https://drive.google.com/file/d/1hZhi1Yzv4XPxOGSHFZMwYTxHb7VBSH3R/view?usp=drive_link)

Aaand here’s the **workflow I used to generate the examples**. It includes **FaceDetailer**, since it’s sometimes necessary - [Workflow](https://drive.google.com/file/d/1yOvubEN_drf2hBnc1zLFAKSCVxA4jfwj/view?usp=drive_link)

Some Examples with better quality:

https://preview.redd.it/fn0a9m6xr69g1.png?width=1088&format=png&auto=webp&s=57fc90486fbb18039c0cd0e765ff43702045b2a5

https://preview.redd.it/vpw4wm6xr69g1.png?width=1088&format=png&auto=webp&s=e78344272e8b8139a93c4d9dc58e07e781e13f6e

https://preview.redd.it/ukdzfu7xr69g1.png?width=1088&format=png&auto=webp&s=315c10348cf4e47404c5615437e2e325d6ce420b

https://preview.redd.it/jj966s7xr69g1.png?width=1088&format=png&auto=webp&s=6b55d7e3acdad8120ebcde08dff033aaf894ffe7

https://preview.redd.it/546snp7xr69g1.png?width=1088&format=png&auto=webp&s=54c1a1d94b7159360ae9956750451c2577f09c21

https://preview.redd.it/iiv7dn6xr69g1.png?width=1088&format=png&auto=webp&s=ef4a501b98a06ad824e525c65138eddcfbeaccba

https://redd.it/1pusfnz
@rStableDiffusion
A ComfyUI workflow where nobody understands shit anymore (including the author).
https://redd.it/1puviaq
@rStableDiffusion
Not very satisfied by Qwen Edit 2511

I've been testing it all day, but I'm not really happy with the results. I'm using the Comfy workflow without the Lightning LoRA, with the FP8 model on a 5090, and the results are usually sub-par (a lot of detail changed, blurred images and so forth). Are your results perfect? Is there anything you'd suggest? Thanks in advance.

https://redd.it/1puynjh
@rStableDiffusion
You subscribed to Gemini pro, so naturally Google decided it's time for the model's daily lobotomy.

Let’s give a round of applause to the absolute geniuses in Google’s finance department. 👏

https://preview.redd.it/9wkvb492y99g1.png?width=2816&format=png&auto=webp&s=9029d735caa2e24a33929529b45cfc93b38d40d3

The strategy is brilliant, really. It’s the classic “Bait and Switch” masterclass.

Phase 1: Release a "Full Power" model. It’s smart, it follows instructions, it actually codes. It feels like magic. We all rush in, credit cards in hand, thinking, "Finally, a worthy competitor."

Phase 2 (We are here): Once the subscription revenue is locked in, start the "Dynamic Compute Rationing" (or whatever corporate euphemism they use for "throttling the hell out of it").

Has anyone else noticed that Gemini Advanced feels like it’s undergoing a progressive cognitive decline? It’s not just “hallucinating”; it’s straight-up refusing to think. It feels like they are actively A/B testing our pain threshold: "How much can we lower the parameter count and reduce the inference compute before this user cancels?"

It’s insulting. We are paying a premium for a “Pro” model, yet the output quality varies wildly depending on traffic, often degrading to the level of a free tier chatbot that’s had a few too many drinks.

It’s corporate gaslighting at its finest. They get us hooked on the high-IQ version, then quietly swap it out for a cheaper, dumber, quantized version to save on server costs, banking on the fact that we’re too lazy to switch ecosystems.

So, here is the million-dollar question:

At what point does this become actual consumer fraud or false advertising? We are paying for a specific tier of service (Advanced/Ultra), but the backend delivery is opaque and manipulated.

For those of you with legal backgrounds or experience in consumer protection:

Is there any precedent for class-action pressure against SaaS companies that dynamically degrade product quality after payment?

How do we actually verify the compute/model version we are being served?

Aside from voting with our wallets and cancelling (which I’m about to do), is there any regulatory body that actually cares about this kind of "digital shrinkflation"?

Disappointed but not surprised. Do better, Google. Or actually, just do what you advertised.

https://redd.it/1pv4v3i
@rStableDiffusion
PhotomapAI - A tool to optimise your dataset for LoRA training

One of the most common questions I see is: "How many images do I need for a good LoRA?"

The raw number matters much less than the diversity and value of each image. Even if all your images are high quality, if you have 50 photos of a person but 40 of them are from the same angle in the same lighting, you aren’t training the LoRA on a concept; you’re training it to overfit on a single moment.

For example: say you’re training an Arcane LoRA. If your dataset has 100 images of Vi and only 10 images of other characters, you won't get a generalized style. Your LoRA will be heavily biased toward Vi (overfit) and won't know how to handle other characters (underfit).

I struggled with this in my own datasets, so I built a tool for my personal workflow based on PhotoMapAI (an awesome project by lstein on GitHub). It’s been invaluable for identifying low-quality images and refining my datasets to include only semantically different images. I figured it would be useful for others too, so I created a PR.

lstein’s original tool uses CLIP embeddings, generated 100% locally, to "map" your images based on their relationship to one another: the closer two images are on the map, the more similar they are. The feature I’ve added, called the Dataset Curator, builds on this functionality and has now been merged into the official 1.0 release. It uses those CLIP embeddings to automatically pick the most "valuable" images (the ones that are most different from each other) so you don’t have to do it manually.
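For intuition on what that map is doing under the hood, here is a rough sketch of the idea (my own illustration, not PhotoMapAI’s code) using the openly available CLIP model from Hugging Face: embed every image locally, then compare embeddings with cosine similarity, where higher means more similar.

```python
import torch
from pathlib import Path
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = sorted(Path("dataset").glob("*.jpg"))             # placeholder folder
images = [Image.open(p).convert("RGB") for p in paths]

with torch.no_grad():
    inputs = processor(images=images, return_tensors="pt")
    emb = model.get_image_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)                # L2-normalise

similarity = emb @ emb.T                                   # cosine similarity matrix
# similarity[i, j] close to 1.0 means images i and j sit right next to each other on the map
```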


Have a read here first to understand how it works:
Image Dataset Curation - PhotoMapAI

Here's a quick summary:

How it works:

Diversity (Farthest Point Sampling): This algorithm finds "outliers" (there’s a sketch of this below). It’s great for finding rare angles or unique lighting. Warning: it also finds the "garbage" (blurry or broken images), which is actually helpful because it shows you exactly what you need to exclude first! Use this when you want to maximise variability in your dataset.

Balance (K-Means): This groups your photos into clusters and picks a representative from each. If you have 100 full-body shots and 10 close-ups, it ensures your final selection preserves those ratios so the model doesn't "forget" the rare concepts. Use this to thin out your dataset while maintaining those ratios.
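For intuition, farthest point sampling over those embeddings is only a few lines. A minimal sketch (not the library’s actual implementation), assuming `emb` is an (N, D) NumPy array of L2-normalised CLIP embeddings (e.g. `emb.numpy()` from the sketch above):

```python
import numpy as np

def farthest_point_sampling(emb: np.ndarray, k: int, start: int = 0) -> list[int]:
    """Greedily pick k indices so each new pick is as far as possible from those already chosen."""
    dists = np.full(len(emb), np.inf)
    selected = [start]
    for _ in range(k - 1):
        # cosine distance of every image to the most recently selected one
        new_d = 1.0 - emb @ emb[selected[-1]]
        dists = np.minimum(dists, new_d)      # distance to the nearest already-selected image
        selected.append(int(np.argmax(dists)))
    return selected

# keep = farthest_point_sampling(emb, k=50)   # indices of the 50 most mutually dissimilar images
```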

The workflow I use:

1. Run the Curator with 20 iterations in FPS mode (see the sketch after this list): this uses a Monte Carlo simulation to find "consensus" selections. Since these algorithms can be sensitive to the starting point, running multiple passes helps identify the images that are statistically the most important regardless of where the algorithm starts.
2. Check the Magenta (Core Outliers): these are the images that showed up in >90% of the Monte Carlo runs. If any of them are blurry or "junk," I just hit "Exclude." If they aren't junk, that's good news: it means these images have the most distinct CLIP embeddings, and for good reason.
3. Run it again if you excluded images. The algorithm will now ignore the junk and find the next best unique (but clean) images to fill the gap.
4. Export: It automatically copies your images and your .txt captions to a new folder, handling any filename collisions for you. You can even export an analysis to see how many times the images were selected in the process.
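And a rough sketch of the consensus idea from step 1 (again my own illustration, not the Curator’s code): rerun the sampler from random starting points, count how often each image gets picked, and treat anything selected in roughly 90%+ of the runs as a core pick.

```python
import numpy as np

def consensus_selection(emb: np.ndarray, k: int, runs: int = 20, threshold: float = 0.9) -> np.ndarray:
    """Count selections across random-start FPS runs; return indices picked in >= threshold of runs."""
    rng = np.random.default_rng(0)
    counts = np.zeros(len(emb), dtype=int)
    for _ in range(runs):
        start = int(rng.integers(len(emb)))
        for idx in farthest_point_sampling(emb, k, start=start):   # from the earlier sketch
            counts[idx] += 1
    return np.flatnonzero(counts >= threshold * runs)

# core = consensus_selection(emb, k=50)   # roughly the "magenta" consensus picks
```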

The goal isn't to have the most images; it’s to have a dataset where every single image teaches the model something new.


Huge thanks to lstein for creating the original tool, which is incredible for its original purpose too.

Here are the release notes and install files for 1.0.0 by lstein:
Release v1.0.0 · lstein/PhotoMapAI



https://redd.it/1pv6aok
@rStableDiffusion
I've created a series of tutorial videos on LoRA training (with English subtitles) and a Chinese-localized version of AITOOLKIT

I've created a series of tutorial videos on LoRA training (with English subtitles) and a Chinese-localized version of AITOOLKIT. These resources provide detailed explanations of each parameter's settings and their functions in the most accessible way possible, helping you embark on your AI model training journey. If you find the content helpful, please show your support by liking, following, and subscribing. ✧٩(ˊωˋ)و✧

https://youtube.com/playlist?list=PLFJyQMhHMt0lC4X7LQACHSSeymynkS7KE&si=JvFOzt2mf54E7n27

https://redd.it/1pvb4x2
@rStableDiffusion