PhotoMapAI - A tool to optimise your dataset for LoRA training
One of the most common questions I see is: "How many images do I need for a good LoRA?"
The raw number matters much less than the diversity and value of each image. Even if every image is high quality, if 40 of your 50 photos of a person are from the same angle in the same lighting, you aren’t training the LoRA on a concept; you’re training it to overfit on a single moment.
For example: say you’re training an Arcane LoRA. If your dataset has 100 images of Vi and only 10 images of other characters, you won't get a generalized style. Your LoRA will be heavily biased toward Vi (overfit) and won't know how to handle other characters (underfit).
I struggled with this in my own datasets, so I built a tool for my personal workflow based on PhotoMapAI (an awesome project by lstein on GitHub). It’s been invaluable for identifying low-quality images and refining my datasets to include only semantically different images. I thought it would be useful for others too, so I created a PR.
Lstein’s original tool uses CLIP embeddings, generated 100% locally, to "map" your images by their relationship to one another: the closer two images sit on the map, the more similar they are. The feature I've added, called the Dataset Curator, builds on this functionality and has now been merged into the official 1.0 release. It uses the CLIP embeddings to pick the most "valuable" (most mutually different) images so you don't have to do it manually.
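To make the similarity idea concrete, here's a minimal sketch using Hugging Face transformers. This is not PhotoMapAI's actual code, and the filenames are made up; it just shows how embedding images and comparing them with cosine similarity puts near-duplicates close to 1.0:

```python
# Illustrative only: embed images with CLIP and compare them by cosine
# similarity. PhotoMapAI's own pipeline may differ in model and details.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # unit-normalise

# Hypothetical filenames: two shots of Vi should score higher together
# than either does against a shot of another character.
emb = embed(["vi_01.png", "vi_02.png", "jinx_01.png"])
print(emb @ emb.T)  # cosine similarities; near 1.0 = near-duplicates
```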
Have a read here first to understand how it works:
Image Dataset Curation - PhotoMapAI
Here's a quick summary:
How it works:
Diversity (Farthest Point Sampling): This algorithm finds "outliers." It’s great for finding rare angles or unique lighting. Warning: it also finds the "garbage" (blurry or broken images), which is actually helpful because it shows you exactly what you need to exclude first! Use this to optimise your dataset for variability (see the FPS sketch after this list).
Balance (K-Means): This groups your photos into clusters and picks a representative from each. If you have 100 full-body shots and 10 close-ups, it ensures your final selection preserves those ratios so the model doesn't "forget" the rare concepts. Use this to thin out your dataset while maintaining ratios (see the k-means sketch below).
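Here's a hedged sketch of what Farthest Point Sampling does over unit-normalised embeddings; the function name and details are mine, not PhotoMapAI's:

```python
# Greedy farthest-point sampling: repeatedly pick the image farthest
# from everything selected so far. Illustrative, not PhotoMapAI's code.
import numpy as np

def farthest_point_sampling(emb: np.ndarray, k: int, start: int = 0):
    selected = [start]
    # distance from every point to its nearest selected point
    dist = np.linalg.norm(emb - emb[start], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())  # the current biggest outlier
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(emb - emb[nxt], axis=1))
    return selected
```

The very first picks are the points farthest from everything else, which is exactly why FPS surfaces both rare angles and broken images.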
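And a similar sketch for the balanced mode, using scikit-learn's KMeans. The proportional-quota logic is my own illustration of "preserve the ratios", not the tool's exact behaviour:

```python
# Cluster the embeddings, then draw from each cluster in proportion to
# its size, taking the images nearest each centroid. Illustrative only.
import numpy as np
from sklearn.cluster import KMeans

def balanced_selection(emb: np.ndarray, n_clusters: int, budget: int):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(emb)
    picks = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        quota = max(1, round(budget * len(members) / len(emb)))
        # closest-to-centroid members are the cluster's "representatives"
        d = np.linalg.norm(emb[members] - km.cluster_centers_[c], axis=1)
        picks.extend(members[np.argsort(d)[:quota]].tolist())
    return picks[:budget]
```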
The workflow I use:
1. Run the Curator with 20 iterations in FPS mode: This uses a Monte Carlo approach to find "consensus" selections. Since these algorithms can be sensitive to the starting point, running multiple passes identifies the images that are statistically the most important regardless of where the algorithm starts (see the consensus sketch after this list).
2. Check the Magenta (Core Outliers): These are the images that showed up in >90% of the Monte Carlo runs. If any of them are blurry or "junk," I just hit "Exclude." If they aren't junk, that's good: it means the analysis shows these images have the most distinct CLIP embeddings, and for good reason.
3. Run it again if you excluded images. The algorithm will now ignore the junk and find the next best unique (but clean) images to fill the gap.
4. Export: It automatically copies your images and your .txt captions to a new folder, handling any filename collisions for you (a collision-handling sketch follows this list). You can even export an analysis showing how many times each image was selected during the process.
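For step 1, the consensus idea looks roughly like this, reusing the farthest_point_sampling sketch from above. The 90% threshold mirrors the magenta highlighting described in step 2, but the exact mechanics here are my illustration, not the tool's code:

```python
# Run FPS from many random starting points and count how often each
# image is selected; images chosen in >90% of runs are the "core".
import numpy as np
from collections import Counter

def consensus_fps(emb: np.ndarray, k: int, runs: int = 20, seed: int = 0):
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(runs):
        start = int(rng.integers(emb.shape[0]))
        counts.update(farthest_point_sampling(emb, k, start=start))
    core = [i for i, n in counts.items() if n / runs > 0.9]
    return counts, core
```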
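And step 4's export boils down to collision-safe copying. A minimal sketch, assuming captions sit next to their images as same-named .txt files:

```python
# Copy each selected image plus its caption into an output folder,
# appending a numeric suffix on filename collisions. Illustrative only.
import shutil
from pathlib import Path

def export(selected_paths, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img in map(Path, selected_paths):
        for src in (img, img.with_suffix(".txt")):  # image + caption
            if not src.exists():
                continue
            dst, n = out / src.name, 1
            while dst.exists():  # collision: img.png -> img_1.png ...
                dst = out / f"{src.stem}_{n}{src.suffix}"
                n += 1
            shutil.copy2(src, dst)
```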
The goal isn't to have the most images; it’s to have a dataset where every single image teaches the model something new.
Huge thanks to lstein for creating the original tool, which is incredible for its original purpose too.
Here are the release notes and install files for 1.0.0 by lstein:
Release v1.0.0 · lstein/PhotoMapAI
https://redd.it/1pv6aok
@rStableDiffusion
PSA: Eliminate or greatly reduce Qwen Edit 2509/2511 pixel drift with latent reference chaining
https://redd.it/1pv96a2
@rStableDiffusion
I've created a series of tutorial videos on LoRA training (with English subtitles) and a Chinese-localized version of AITOOLKIT
I've created a series of tutorial videos on LoRA training (with English subtitles) and a localized version of AITOOLKIT. These resources provide detailed explanations of each parameter's settings and their functions in the most accessible way possible, helping you embark on your AI model training journey. If you find the content helpful, please show your support by liking, following, and subscribing. ✧٩(ˊωˋ)و✧
https://youtube.com/playlist?list=PLFJyQMhHMt0lC4X7LQACHSSeymynkS7KE&si=JvFOzt2mf54E7n27
https://redd.it/1pvb4x2
@rStableDiffusion
LoRA vs. LoKr: it's amazing!
I tried making a LoKr for the first time, and it's amazing. I saw in the comments on this sub that LoKr is better for characters, so I gave it a shot, and it was a game-changer. With just 20 photos, 500 steps on the ZIT-Deturbo model with factor 4 settings, it took only about 10 minutes on my 5090—way better than the previous LoRA that needed 2000 steps and over an hour.
The most impressive part: LoRAs often bled their effect onto the man in images with both genders, but this LoKr applied precisely, and only, to the woman. Aside from the larger file size, LoKr seems much superior overall.
I'm curious why more people aren't using LoKr. Of course, this is highly personal and based on just a few samples, so it could be off the mark.
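For anyone wondering what LoKr actually is: instead of LoRA's low-rank product, it approximates the weight update as a Kronecker product of two much smaller matrices, with the factor controlling the split. A rough numpy illustration; the shapes are made up, and LyCORIS's actual scheme has more moving parts:

```python
# LoKr idea in miniature: build a full-size weight update from the
# Kronecker product of two small blocks. Not LyCORIS code; illustrative.
import numpy as np

d_out, d_in, factor = 320, 768, 4
W1 = np.random.randn(factor, factor) * 0.01                   # tiny block
W2 = np.random.randn(d_out // factor, d_in // factor) * 0.01  # larger block
delta_W = np.kron(W1, W2)                                     # full update
assert delta_W.shape == (d_out, d_in)
print("trainable:", W1.size + W2.size, "vs full:", delta_W.size)
```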
https://redd.it/1pvdxs5
@rStableDiffusion