ACE-Step 1.5 instrument-only = garbage?
Is it me, or does everyone else have the same problem? I really just want calm, soothing piano music, and everything I get is like dubstep... any advice?
https://redd.it/1qwe940
@rStableDiffusion
Why simple image merging fails in Flux.2 Klein 9B (And how to fix it)
Not like this
If you've ever tried to combine elements from two reference images with Flux.2 Klein 9B, you’ve probably seen how the two reference images merge together into a messy mix:
https://preview.redd.it/xove50g79phg1.png?width=2638&format=png&auto=webp&s=cb6dec4fec43bb3896a2b69043be7733f1cff8bc
Why does this happen? Why can’t I just type "change the character in image 1 to match the character from image 2"? Actually, you can.
# The Core Principle
I’ve been experimenting with character replacement recently but with little success—until one day I tried using a figure mannequin as a pose reference. To my surprise, it worked very well:
https://preview.redd.it/etx7jxd99phg1.jpg?width=2262&format=pjpg&auto=webp&s=67918ddaa11c9d029684e4e988586cfa71b27fe0
But why does this work, while using a pose with an actual character often fails? My hypothesis is that failure occurs due to information interference.
Let me illustrate what I mean. Imagine you were given these two images and asked to "combine them together":
Follow the red rabbit
These images together contain two sets of clothes, two haircuts/hair colors, two poses, and two backgrounds. Any of these elements could end up in the resulting image.
But what if the input images looked like this:
https://preview.redd.it/xsy2rnpi9phg1.jpg?width=1617&format=pjpg&auto=webp&s=f82f65c6de97dd6ebb151e8b68b744f287dfd19b
Now there’s only one outfit, one haircut, and one background.
Think of it this way: No matter how good prompt adherence is, too many competing elements still vie for Flux’s attention. But if we remove all unwanted elements from both input images, Flux has an easier job. It doesn’t need to choose the correct background - there’s only one background for the model to work with. Only one set of clothes, one haircut, etc.
And here’s the result (image with workflow):
https://preview.redd.it/fdz0t3ix9phg1.png?width=1056&format=png&auto=webp&s=140b63763c2e544dbb3b1ac49ff0ad8043b0436f
I’ve built this ComfyUI workflow that runs both input images through a preprocessing stage to prepare them for merging. It was originally made for character replacement but can be adapted for other tasks like outfit swap (image with workflow):
https://preview.redd.it/0ht1gfzhbphg1.jpg?width=2067&format=pjpg&auto=webp&s=d0cdbdd3baec186a02e1bc2dff672ae43afa1c62
So you can modify it to fit your specific task. Just follow the core principle: Remove everything you don’t want to see in the resulting image.
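As a toy illustration of that core principle (not the actual ComfyUI workflow, which does this with segmentation/background-removal nodes), the preprocessing step can be sketched in plain Python: given a subject mask, every pixel outside the subject is flattened to a neutral gray, so the only competing element left in the reference is the subject itself. The nested-list image and hand-written mask here are purely hypothetical stand-ins.

```python
# Toy sketch of the core principle: strip every element you do NOT want
# in the merged result, leaving only the subject on a neutral background.
# Images are nested lists of (R, G, B) tuples; mask[y][x] is True where
# the subject is. In a real workflow, a background-removal or segmentation
# node in ComfyUI produces this mask, not hand-written booleans.

NEUTRAL = (128, 128, 128)  # flat gray: carries no style or scene information

def isolate_subject(image, mask):
    """Return a copy of `image` with all non-subject pixels set to neutral gray."""
    return [
        [pixel if keep else NEUTRAL for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]

# 2x2 example: top-left pixel is the "subject"; the rest is background clutter.
image = [[(200, 10, 10), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 0)]]
mask = [[True, False],
        [False, False]]

cleaned = isolate_subject(image, mask)
print(cleaned[0][0])  # subject pixel preserved: (200, 10, 10)
print(cleaned[1][1])  # background clutter replaced: (128, 128, 128)
```

The point of the flat fill is that a neutral background contributes no second outfit, haircut, or scene for the model to choose from.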
# More Examples
https://preview.redd.it/2anrb93qaphg1.jpg?width=2492&format=pjpg&auto=webp&s=c6638adb60ca534f40f789202418367e823d33f4
https://preview.redd.it/6mgjvo8raphg1.jpg?width=2675&format=pjpg&auto=webp&s=99d1cdf5e576963ac101defa7fc02572c970a0fa
https://preview.redd.it/854ua2jmbphg1.jpg?width=2415&format=pjpg&auto=webp&s=47ef2f530a11305bb2f58f338ad39321ab413782
https://preview.redd.it/8htl2dfobphg1.jpg?width=2548&format=pjpg&auto=webp&s=040765eac57a26d0dc5e8e5a2859a7dd118f32ae
# Caveats
Style bleeding: The resulting style will be a blend of the styles from both input images. You can control this by bringing your reference images closer to the desired target style of the final image. For example, if your pose reference has a cartoon style but your character reference is 3D or realistic, try adding "in the style of amateur photo" to the end of the pose reference’s prompt so it becomes stylistically closer to your subject reference. Conversely, try a prompt like "in the style of flat-color anime" if you want the opposite effect.
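The style-matching workaround above boils down to a tiny helper: pick the reference whose style differs from your target and append a style tag to its prompt. This sketch is hypothetical; the tag strings are just the examples from the text, a prompting convention rather than any fixed API.

```python
# Hypothetical helper for the style-bleeding workaround: append a style tag
# so both reference prompts sit closer to the target style of the final image.
STYLE_TAGS = {
    "photo": "in the style of amateur photo",    # pull a cartoon reference toward realism
    "anime": "in the style of flat-color anime", # pull a realistic reference toward anime
}

def match_style(prompt: str, target: str) -> str:
    """Append the target-style tag to a reference prompt."""
    return f"{prompt.rstrip('. ')}, {STYLE_TAGS[target]}"

print(match_style("a figure mannequin in a running pose", "photo"))
# -> "a figure mannequin in a running pose, in the style of amateur photo"
```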
Missing bits: Flux will only generate what's visible. So if your character reference shows only the upper body, add prompt details describing their lower half, unless you want to leave them pantsless.
https://redd.it/1qwpqek
@rStableDiffusion
Free local browser to organize your generated images — Filter by Prompt, LoRA, Seed & Model. Now handles Video/GIFs too
https://redd.it/1qwr3pd
@rStableDiffusion
Tried training an ACEStep1.5 LoRA for my favorite anime. I didn't expect it to be this good!
https://redd.it/1qwrhp4
@rStableDiffusion
The real "trick" to simple image merging on Klein: just use a prompt that actually has a sufficient level of detail to make it clear what you want
https://redd.it/1qwrqph
@rStableDiffusion
Anima is the new illustrious!!? 2.0!
I've been using Illustrious/NoobAI for a long time, and arguably it's the best for anime so far. Qwen is great for image editing, but it doesn't recognize famous characters. So after Pony's disastrous v7 launch, the only option was NoobAI, which is good, especially if you know Danbooru tags, but my god, it's hell trying to make a complex multi-character image (even with Krita).
Until yesterday, when I tried this thing called Anima (this is not an advertisement for the model; you're free to tell me your opinions on it, and I'd love to know if I'm wrong). Anima is a mixture of Danbooru tags and natural language, FINALLY FIXING THE BIGGEST PROBLEM OF SDXL MODELS. No doubt it's not magic; for now it's just a preview model, which I'm guessing is the base one. It's not compatible with any Pony/Illustrious/NoobAI LoRAs because its structure is different. But in my testing so far, it handles artist styles better than NoobAI. Still, NoobAI wins on character accuracy thanks to its sheer number of LoRAs.
https://redd.it/1qwukjs
@rStableDiffusion