Lora Training with different body parts
I am trying to create and train a character LoRA for ZiT. I have a good set of images, but I want the LoRA to be able to produce uncensored images without using any other LoRAs. So is it possible to take random pictures of intimate body parts (close-ups without any face), combine them with my character images, and train on the mix, so that whenever I prompt, it can produce such images without needing external LoRAs?
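For illustration, here is a minimal sketch of how such a mixed dataset could be laid out for a trainer that reads per-image .txt captions: character images get the character's trigger token, the faceless close-ups do not. The folder names, trigger token, and captioning scheme are all hypothetical, and whether this mixing actually teaches the concept is exactly the open question.

```python
# Hypothetical layout for a mixed LoRA dataset: write sidecar .txt captions so
# identity images carry the character's trigger token while the faceless
# close-up set is captioned generically without it. Paths, the trigger token,
# and the caption wording are placeholders, not a recommended recipe.
from pathlib import Path

TRIGGER = "myChar"  # hypothetical trigger token for the character

SUBSETS = {
    "dataset/character": f"photo of {TRIGGER}, ",  # identity images
    "dataset/closeups": "close-up photo, ",        # faceless body-part images
}


def write_captions(prefix: str, image_dir: str) -> None:
    """Create a sidecar .txt caption next to every image in image_dir."""
    for img in Path(image_dir).glob("*.[jp][pn]g"):
        caption = prefix + img.stem.replace("_", " ")  # crude filename tags
        img.with_suffix(".txt").write_text(caption)


if __name__ == "__main__":
    for folder, prefix in SUBSETS.items():
        write_captions(prefix, folder)
```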
https://redd.it/1q1r5ru
@rStableDiffusion
Zipf's law in AI learning and generation
So Zipf's law is a well-recognized phenomenon that shows up across a ton of areas, but most famously in language: the frequency of an item is roughly inversely proportional to its rank, so the most common thing is predictably more common than the second most common thing, which is in turn more common than the third, and so on.
A practical example is words in books: the most common word appears roughly twice as often as the second most common word, three times as often as the third most common word, and so on all the way down.
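In its idealized form, the law says the frequency of the item at rank r falls off as an inverse power of the rank:

```latex
% Idealized Zipf's law: the rank-r item has frequency proportional to 1/r^s,
% with the exponent s close to 1 for natural language.
f(r) \propto \frac{1}{r^{s}}, \qquad s \approx 1
```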
This has also been observed in the outputs of language models. (The linked paper isn't the only example; nearly all LLMs adhere to Zipf's law even more strictly than human-written data does.)
More recently, this paper came out, showing that LLMs inherently fall into power-law scaling, not only as a result of human language but by their architectural nature.
Now, I'm an image model trainer/provider, so I don't care a ton about LLMs beyond whether they do what I ask them to. But since this discovery about power-law scaling in LLMs has implications for training them, I wanted to see whether anything similar shows up in image models.
I found something pretty cool:
If you treat colors as the 'words' in the example above, with 'frequency' being how many pixels of each color appear in the image, human-made images (artwork, photography, etc.) DO NOT follow a zipfian distribution, but AI-generated images (across several models I tested) DO.
I only tested some 'small' sets of images, but the effect was statistically significant enough to be interesting. I'd love to see a larger-scale test.
Human-made images (colors are X, frequency is Y)
AI-generated images (colors are X, frequency is Y)
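A rough sketch of how this color-rank experiment could be reproduced (my own reconstruction, not the author's script; the image path and the 4-bit color quantization are assumptions):

```python
# Reconstruction sketch of the color-rank experiment: quantize colors, rank
# them by pixel count, then fit log(frequency) against log(rank). A slope near
# -1 with a high R^2 suggests a Zipf-like distribution. The file path and the
# quantization level are placeholder assumptions.
from collections import Counter

import numpy as np
from PIL import Image


def color_rank_fit(path, bits_per_channel=4):
    """Return (slope, r_squared) of a log-log fit of color frequency vs. rank."""
    img = np.asarray(Image.open(path).convert("RGB"))
    shift = 8 - bits_per_channel
    quantized = (img >> shift).reshape(-1, 3)   # merge near-identical shades
    counts = Counter(map(tuple, quantized))     # color "word" -> pixel count
    freqs = np.array(sorted(counts.values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1, dtype=float)

    log_r, log_f = np.log(ranks), np.log(freqs)
    slope, intercept = np.polyfit(log_r, log_f, 1)
    predicted = slope * log_r + intercept
    ss_res = np.sum((log_f - predicted) ** 2)
    ss_tot = np.sum((log_f - log_f.mean()) ** 2)
    return slope, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    slope, r2 = color_rank_fit("example.png")   # placeholder path
    print(f"slope={slope:.2f}, R^2={r2:.3f}")
```

Comparing the fit quality across folders of human-made versus AI-generated images is the kind of comparison the plots above show.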
I suspect if you look at a more fundamental component of image models, you'll find a deeper reason for this and a connection to why LLMs follow similar patterns.
What really sticks out to me here is how differently shaped the color distributions are. This changes across image categories and models, but even Gemini (which has a more human-shaped curve, with the slope and then a hump at the end) still has a <90% fit to a zipfian distribution.
Anyway, there's my incomplete thought. It seemed interesting enough that I wanted to share.
What I still don't know:
Does training on images that closely follow a zipfian distribution create better image models?
Does this method hold up at larger scales?
Should we try and find ways to make image models LESS zipfian to help with realism?
https://redd.it/1q20c3k
@rStableDiffusion
Linked paper (arXiv.org): "Genlangs" and Zipf's Law: Do languages generated by...
SVI 2.0 Pro - Tip about seeds
I apologize if this is common knowledge, but I saw a few SVI 2.0 Pro workflows that use one global random seed, and this trick won't work with those.
If your workflow has a random noise seed node attached to each extension step (instead of one global random seed for all of them), you can work like this:
E.g.: if you have generated steps 1, 2, and 3, but don't like how step 3 turned out, you can just change the seed and/or prompt of step 3 and run again.
The workflow will then skip steps 1 and 2 (they are already generated and nothing about them changed), keep their outputs, and only regenerate step 3.
This way you can extend and adjust as many times as you want, without having to regenerate earlier extensions that already turned out well or wait for them again.
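To see why this works, here is a toy Python sketch of the caching idea (this is not how ComfyUI actually implements its node cache, just the general principle): each step's cache key depends on its own seed and prompt plus everything upstream of it, so changing only step 3 invalidates only step 3.

```python
# Toy illustration (not ComfyUI code) of per-step caching: each extension step
# is keyed by its own (prompt, seed) plus the key of the step before it, so a
# change to step 3 re-runs only step 3 while steps 1 and 2 are reused.
import hashlib


class ExtensionChain:
    def __init__(self, generate_fn):
        self.generate_fn = generate_fn  # stand-in for the actual sampler call
        self.cache = {}                 # step key -> generated clip

    def _key(self, prev_key, prompt, seed):
        raw = f"{prev_key}|{prompt}|{seed}".encode()
        return hashlib.sha256(raw).hexdigest()

    def run(self, steps):
        """steps: list of (prompt, seed) pairs; returns the generated clips."""
        outputs, prev_key = [], "root"
        for prompt, seed in steps:
            key = self._key(prev_key, prompt, seed)
            if key not in self.cache:           # only regenerate what changed
                prev_clip = outputs[-1] if outputs else None
                self.cache[key] = self.generate_fn(prev_clip, prompt, seed)
            outputs.append(self.cache[key])
            prev_key = key
        return outputs


if __name__ == "__main__":
    calls = []
    chain = ExtensionChain(lambda prev, prompt, seed: calls.append(seed) or seed)
    chain.run([("walk", 1), ("run", 2), ("jump", 3)])
    chain.run([("walk", 1), ("run", 2), ("jump", 99)])  # only step 3 changed
    print(calls)  # [1, 2, 3, 99]: steps 1 and 2 were generated only once
```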
It's awesome, really - I'm a bit mind-blown by how good SVI 2.0 Pro is.
https://preview.redd.it/r4ymil14ryag1.png?width=2292&format=png&auto=webp&s=d3a17bbb8e70438cf773474449a9f35ea6e23b6c
Edit:
This is the workflow I am using:
https://github.com/user-attachments/files/24359648/wan22_SVI_Pro_native_example_KJ.json
Though I did switch the models to the native ones, and I'm experimenting with some other speed LoRAs.
https://redd.it/1q2354i
@rStableDiffusion
Frustrated with current state of video generation
I'm sure this boils down to a skill issue at the moment, but...
I've been trying video generation for a long time, and I just don't think it's useful for much other than short, dumb videos. It's too hard to get actual consistency, and you have little control over the action, so a lot of redos are needed, which takes a lot more time than you would think. Even the closed-source models are really unreliable.
Whenever you see someone's video that "looks finished", they probably had to generate that thing 20 times to get what they wanted, and that's just one chunk of the video; most have many chunks. If you are paying for an online service, that's a lot of wasted "credits" burning on nothing.
I want to like doing video, and I want to believe it's going to let people tell stories, but it's just not good enough, not easy enough to use, too unpredictable, and too slow right now.
Even the online tools aren't much better in my testing. They still give me too much randomness. For example, even Veo gave me slow-motion problems similar to WAN for some scenes.
What are your thoughts?
https://redd.it/1q27cp7
@rStableDiffusion
I figured out how to completely bypass Nano Banana Pro's invisible watermark and here's how you can try it for free
https://redd.it/1q29ya6
@rStableDiffusion