Unable to create images with Illustrious XL
Hello,
I have not worked with Stable Diffusion in a long time. I returned because I wanted to use it to make some concept pixel art for an upcoming project. I did some research on what is currently the go-to system and ended up downloading and setting up Forge. I got the Illustrious-XL base model, but anything I enter results in abstract art. Even a simple single-word prompt like "alien" does not produce anything viable.
I am sorry if this is too noobish, but how can I investigate what is failing?
https://preview.redd.it/8moclc8umcmg1.png?width=1920&format=png&auto=webp&s=d63b94479fb1f83798922fe1d6f17387f9350d4e
https://redd.it/1rhm36b
@stablediffusion_r
How to get clean audio using ace step 1.5?
I tried it a few times with ComfyUI but got bad audio. Is it possible with ComfyUI?
https://redd.it/1rhsccd
@stablediffusion_r
ComfyUI Custom Node - Music Flamingo
https://github.com/C0untFloyd/comfyui-musicflamingo
https://redd.it/1rhtgsn
@stablediffusion_r
[CVPR 2026] ImageCritic: Correcting Inconsistencies in Generated Images!
https://redd.it/1rhvhmc
@stablediffusion_r
Basic Guide to Creating Character LoRAs for Klein 9B
**\*\*\*Downloadable LoRAs at the end of the guide\*\*\***
**Disclaimer**: This guide was not created using ChatGPT; however, I did use it to translate the text into English.
This guide is based on my numerous tests creating LoRAs with AI Toolkit, including characters, styles, and poses. There may be better methods, but so far I haven’t found a configuration that outperforms these results. Here I will focus exclusively on the process for character LoRAs. Parameters for actions or poses are different and are not covered in this guide. If anyone would like to contribute improvements, they are welcome.
# 1️⃣ Dataset Preparation
**Image Selection:**
The first step is gathering the photos for the dataset. The idea is simple: the higher the quality and the more variety, the better. There is no strict minimum or maximum number of photos; what really matters is that the dataset is good.
In the example LoRA created for this guide:
* Well-known character from a TV Series.
* Few images available, many low-quality photos (very grainy images)
Final dataset: 50 images:
* Mostly face shots
* Some half-body
* Very few full-body
It’s a difficult case, but even so, it’s possible to obtain good results.
**Resolution and Basic Enhancement:**
* Shortest side at least 1024 pixels
* Basic sharpening applied in Lightroom (optional)
* No extreme artificial upscaling
It’s recommended to crop to standard aspect ratios: 3:4, 1:1, or 16:9, always trying to frame the subject properly.
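As a rough illustration of the size and aspect-ratio rules above, here is a minimal sketch that checks an image's dimensions. The function name and the dictionary it returns are hypothetical and not part of AI Toolkit or any other tool; only the 1024-pixel threshold and the three ratios come from this guide.

```python
# Hypothetical helper: the threshold and ratios mirror this guide, not any tool's API.
STANDARD_RATIOS = {"3:4": 3 / 4, "1:1": 1.0, "16:9": 16 / 9}
MIN_SHORT_SIDE = 1024


def check_image(width: int, height: int) -> dict:
    """Report whether the shortest side is large enough and which
    standard aspect ratio the image is closest to (either orientation)."""
    short, long = sorted((width, height))
    ratio = short / long  # always <= 1, so portrait and landscape compare equally

    def folded(name: str) -> float:
        # Fold each standard ratio into the same <= 1 range before comparing.
        value = STANDARD_RATIOS[name]
        return min(value, 1 / value)

    closest = min(STANDARD_RATIOS, key=lambda name: abs(folded(name) - ratio))
    return {"short_side_ok": short >= MIN_SHORT_SIDE, "closest_ratio": closest}


print(check_image(1024, 1365))
print(check_image(800, 800))
```

An image that fails `short_side_ok` is a candidate for replacement rather than for aggressive upscaling, in line with the list above.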
**Dataset Cleaning:**
Very important: Remove watermarks or text, delete unwanted people, remove distracting elements. This can be done using the standard Windows image editor, AI erase tools, and manual cropping if necessary.
# 2️⃣ Captions (VERY IMPORTANT)
Once the dataset is ready, load it into AI Toolkit. The next step is adding captions to each image. After many tests, I’ve confirmed that:
❌ Using only a single token (e.g., merlinaw) is NOT effective
✅ It’s better to use a descriptive base phrase
This allows you to:
* Introduce the token at the beginning
* Reinforce key characteristics
* Better control variations
❌ Do not describe characteristics that are always present.
✅ Only describe elements when there are variations.
**Edit**: You should include the person’s or character’s distinctive name at the beginning of each caption, as in this example: “photo of Merlina.” You shouldn’t include the character’s gender in the caption; a simple distinctive name is enough.
If the character has a very distinctive hairstyle that appears in most images, do NOT mention it in the captions. But if in some images the character has a ponytail or a different loose hairstyle, then you should specify it.
The same applies to a signature uniform, an iconic dress, special poses, or specific expressions.
For example, if a character is known for making the “rock horns” hand gesture, and the base model does not represent it correctly, then it’s worth describing it.
Example Captions from This Guide’s LoRA
>photo of merlina wearing school uniform
>photo of merlina wearing a dress
With this approach, when generating images using the LoRA, if you write “school uniform,” the model will understand it refers to the character’s signature uniform.
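The captioning rule can be sketched in a few lines. This is only an illustration with a hypothetical helper name: it puts the distinctive name first and appends a description only when something in the image varies from the character's usual look.

```python
def build_caption(name: str, variation: str = "") -> str:
    """Start with the distinctive name, then add only what varies in this image."""
    caption = f"photo of {name}"
    if variation:
        caption += f" {variation}"
    return caption


captions = [
    build_caption("merlina", "wearing school uniform"),
    build_caption("merlina", "wearing a dress"),
    build_caption("merlina"),  # nothing unusual in the image: name only
]
for caption in captions:
    print(caption)
```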
**How Many Images to Use?**
I’ve tested with 25, 50, and 100 images.
Conclusion: It depends heavily on the dataset quality.
With 25 good images, you can achieve something usable.
With 50–100 images, it usually works very well.
More than 100 can improve it even further.
It’s better to have too many good images than too few.
# 3️⃣ Training (Using AI Toolkit)
**Recommended Settings:**
🔹 Trigger Word: Leave this field empty.
🔹 Steps: Recommended average of 3500 steps
* Similarity starts to become noticeable around 1500 steps
* Around 2500 it usually improves significantly
* Continues improving progressively until 3000–3500 steps
Recommendation: Save every 100 steps and test results progressively.
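To get a feel for how many checkpoints this schedule produces, here is a small arithmetic sketch. The variable names are illustrative; the numbers are the ones given above.

```python
SAVE_EVERY = 100
TOTAL_STEPS = 3500
FIRST_WORTH_TESTING = 1500  # similarity is reported to become noticeable around here

# Every step count at which a checkpoint would be saved.
checkpoints = list(range(SAVE_EVERY, TOTAL_STEPS + 1, SAVE_EVERY))
# The subset worth loading and comparing by hand.
to_test = [step for step in checkpoints if step >= FIRST_WORTH_TESTING]

print(len(checkpoints), "checkpoints saved,", len(to_test), "from step 1500 onward")
```

That is 35 saved files in total, of which 21 fall in the range where the guide reports visible similarity.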
🔹 Learning Rate: 0.00008
🔹 Timestep: Linear
I’ve tested Weighted and Sigmoid, and they did not give good results for characters.
🔹 Precision: BF16 or FP16
FP16 may provide a slight quality improvement, but the difference is not huge.
🔹 Rank (VERY IMPORTANT)
Two common options:
**Rank 32**
* More stable
* Lower risk of hallucinations
* Slightly more artificial texture
**Rank 64**
* Absorbs more dataset information
* More texture
* More realistic
* But may introduce later hallucinations
Both can work very well; it depends on what you want to achieve.
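One way to see why rank 64 "absorbs more" is to count trainable parameters. In the standard LoRA formulation, an adapter on a linear layer adds two small matrices with `rank * (d_in + d_out)` weights in total. The sketch below applies that formula to a made-up 4096×4096 layer; this size is purely an assumption for illustration and is not taken from Klein 9B.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights a LoRA adds to one linear layer:
    a (rank x d_in) down-projection plus a (d_out x rank) up-projection."""
    return rank * (d_in + d_out)


# Hypothetical layer size, chosen only to make the comparison concrete.
D_IN, D_OUT = 4096, 4096

for rank in (32, 64):
    print(f"rank {rank}: {lora_params(D_IN, D_OUT, rank):,} parameters per layer")
```

Doubling the rank doubles the adapter's capacity per layer, which fits the trade-off described above: more room for detail, but also more room to memorize quirks of the dataset.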
🔹 EMA
It can be advantageous to enable it; recommended value: 0.99
I’ve obtained good results both with and without EMA.
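For readers unfamiliar with EMA (exponential moving average), the generic update rule looks like the sketch below. Whether AI Toolkit applies it in exactly this form is an assumption; the example only shows what a decay of 0.99 means for a single weight.

```python
def ema_update(ema: float, value: float, decay: float = 0.99) -> float:
    """Generic EMA step: keep `decay` of the old average, blend in the rest."""
    return decay * ema + (1.0 - decay) * value


# A single weight that jumps from 0.0 to 1.0 and then stays there.
ema = 0.0
for _ in range(100):
    ema = ema_update(ema, 1.0)

print(round(ema, 3))
```

After 100 steps the averaged weight has covered roughly 63% of the jump, so a decay of 0.99 smooths over a window on the order of 100 steps.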
🔹 Training Resolution
You can train at 512px only: faster, but it loses detail in distant faces.
A better option is to train simultaneously at 512, 768, and 1024px.
This helps retain finer details, especially in long shots. For close-ups, it’s less critical.
🔹 Batch Size and Gradient Accumulation
Recommended:
Batch size: 1
Gradient accumulation: 2
More stable training, but longer training time.
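A quick calculation shows what these values imply for the 50-image example dataset. It assumes that one training step corresponds to one optimizer update over `batch_size * gradient_accumulation` images; how AI Toolkit actually counts steps is not confirmed here, so treat the result as an estimate.

```python
batch_size = 1
gradient_accumulation = 2
dataset_size = 50   # the example dataset in this guide
steps = 3500        # the recommended average

# Assumption: each step consumes batch_size * gradient_accumulation images.
effective_batch = batch_size * gradient_accumulation
images_seen = steps * effective_batch
passes_over_dataset = images_seen / dataset_size

print(effective_batch, images_seen, passes_over_dataset)
```

Under that assumption each image is seen about 140 times over the full run. If the tool counts every micro-batch as a step instead, the figure would be half of that.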
🔹 Samples During Training
Recommendation: Disable automatic sample generation, but save every 100 steps and test manually.
🔹 Optimizer
Tested AdamW8bit/AdamW
My impression is that AdamW may give slightly better quality. I can’t guarantee it 100%, but my tests point in that direction. I’ve tested Prodigy, but I haven’t obtained good results. It requires more experimentation.
[AI Toolkit Parameters](https://preview.redd.it/wpw5f5vcghmg1.png?width=3831&format=png&auto=webp&s=46e323165eb8295c2821b833c5ed8e147b5d0c15)
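For quick reference, the recommended values from this section can be collected in one place. The sketch below is only a summary: the class and field names are hypothetical labels and are not AI Toolkit's actual configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterLoraSettings:
    """Values recommended in this guide. Field names are hypothetical,
    not AI Toolkit's real config schema."""
    trigger_word: str = ""             # left empty; the name lives in the captions
    steps: int = 3500
    save_every: int = 100
    learning_rate: float = 0.00008
    timestep_type: str = "linear"
    precision: str = "bf16"            # fp16 is the other option mentioned
    rank: int = 32                     # 64 is the other option discussed
    ema_decay: float = 0.99            # EMA is optional in this guide
    resolutions: tuple = (512, 768, 1024)
    batch_size: int = 1
    gradient_accumulation: int = 2
    optimizer: str = "adamw"           # adamw8bit was also tested
    sample_during_training: bool = False


settings = CharacterLoraSettings()
rank64_variant = CharacterLoraSettings(rank=64)
print(settings)
```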
Also, I want to mention that I tried creating a LoKr instead of a LoRA. Although the results are good, it’s too heavy, and I don’t yet have a good handle on how to get high quality out of it. The potential is high.
Resulting LoRAs and some example images:
[V1 - V2 - V3 - V4](https://preview.redd.it/jr4q1v8gghmg1.jpg?width=1040&format=pjpg&auto=webp&s=861394e8fa09575834200da75c501a0751c38fd3)
https://preview.redd.it/xoxuzdwgghmg1.jpg?width=1050&format=pjpg&auto=webp&s=9bbf14b89d78e2316b7bf52bf01667d3236051e5
https://preview.redd.it/uxc4f0vhghmg1.jpg?width=1050&format=pjpg&auto=webp&s=65f71974896a9b52161efaf3ad7f3eab89b280ce
Attached are the resulting LoRAs of the fictional character Wednesday, included to illustrate this guide and for your own tests. (I used “Merlina,” the Spanish name, because using the token “Wednesday” could have caused confusion when creating the LoRA.)
Each download includes the checkpoints at 2000, 2500, 3000, and 3500 steps:
LoRA V1 - Timestep: Weighted, Rank 64, trained at 512, 724, and 1024px
[Download V1](https://drive.google.com/file/d/1p3A4y04mKc-elE1zK8Sg84ypCvvvJSK_/view?usp=sharing)
LoRA V2 - copy of V1 but Timestep: Linear
[Download V2](https://drive.google.com/file/d/1_u2CrEC7c_N7x75FMOljMGXOdcqwDGyh/view?usp=sharing)
LoRA V3 - copy of V2 but no EMA
[Download V3](https://drive.google.com/file/d/1Jjd072cU5ef4qov-Yuajv03Z1SpV53MQ/view?usp=sharing)
LoRA V4 - copy of V3 but Rank 32
[Download V4](https://drive.google.com/file/d/1jaKp_BlDdBK3irXt9tYqv-HwKn-XDc1_/view?usp=sharing)
https://redd.it/1ri65uz
@stablediffusion_r