r/StableDiffusion – Telegram
Why simple image merging fails in Flux.2 Klein 9B (And how to fix it)

Not like this

If you've ever tried to combine elements from two reference images with Flux.2 Klein 9B, you've probably seen them merge into a messy mix:

https://preview.redd.it/xove50g79phg1.png?width=2638&format=png&auto=webp&s=cb6dec4fec43bb3896a2b69043be7733f1cff8bc

Why does this happen? Why can’t I just type "change the character in image 1 to match the character from image 2"? Actually, you can.

# The Core Principle

I’ve been experimenting with character replacement recently but with little success—until one day I tried using a figure mannequin as a pose reference. To my surprise, it worked very well:

https://preview.redd.it/etx7jxd99phg1.jpg?width=2262&format=pjpg&auto=webp&s=67918ddaa11c9d029684e4e988586cfa71b27fe0

But why does this work, while using a pose with an actual character often fails? My hypothesis is that failure occurs due to information interference.

Let me illustrate what I mean. Imagine you were given these two images and asked to "combine them together":

Follow the red rabbit

These images together contain two sets of clothes, two haircuts/hair colors, two poses, and two backgrounds. Any of these elements could end up in the resulting image.

But what if the input images looked like this:

https://preview.redd.it/xsy2rnpi9phg1.jpg?width=1617&format=pjpg&auto=webp&s=f82f65c6de97dd6ebb151e8b68b744f287dfd19b

Now there’s only one outfit, one haircut, and one background.

Think of it this way: No matter how good prompt adherence is, too many competing elements still vie for Flux’s attention. But if we remove all unwanted elements from both input images, Flux has an easier job. It doesn’t need to choose the correct background - there’s only one background for the model to work with. Only one set of clothes, one haircut, etc.

And here’s the result (image with workflow):

https://preview.redd.it/fdz0t3ix9phg1.png?width=1056&format=png&auto=webp&s=140b63763c2e544dbb3b1ac49ff0ad8043b0436f

I’ve built this ComfyUI workflow that runs both input images through a preprocessing stage to prepare them for merging. It was originally made for character replacement but can be adapted for other tasks like outfit swap (image with workflow):

https://preview.redd.it/0ht1gfzhbphg1.jpg?width=2067&format=pjpg&auto=webp&s=d0cdbdd3baec186a02e1bc2dff672ae43afa1c62

So you can modify it to fit your specific task. Just follow the core principle: Remove everything you don’t want to see in the resulting image.
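
For readers who want to reproduce the preprocessing step outside ComfyUI, here's a minimal sketch of the same "remove everything you don't want" idea in plain Python. It assumes the rembg library for background removal; the file names are placeholders, and the workflow in the post does this with ComfyUI nodes instead:

```python
# Minimal sketch of the "strip the distractions" preprocessing idea.
# Assumes: pip install rembg pillow  (the post's workflow uses ComfyUI nodes instead).
from PIL import Image
from rembg import remove

def clean_reference(src_path: str, dst_path: str) -> None:
    """Cut the subject out of a reference image and flatten it onto plain white,
    so only the element you want to keep (character, pose, outfit) remains."""
    img = Image.open(src_path).convert("RGBA")
    cutout = remove(img)  # subject with a transparent background
    white = Image.new("RGBA", cutout.size, (255, 255, 255, 255))
    Image.alpha_composite(white, cutout).convert("RGB").save(dst_path)

# Preprocess both references before handing them to the Flux.2 Klein edit step.
clean_reference("character_ref.png", "character_ref_clean.png")
clean_reference("pose_ref.png", "pose_ref_clean.png")
```

The point isn't this particular tool; any background or outfit removal method works, as long as each cleaned reference carries only the one element you want Flux to take from it.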

# More Examples

https://preview.redd.it/2anrb93qaphg1.jpg?width=2492&format=pjpg&auto=webp&s=c6638adb60ca534f40f789202418367e823d33f4

https://preview.redd.it/6mgjvo8raphg1.jpg?width=2675&format=pjpg&auto=webp&s=99d1cdf5e576963ac101defa7fc02572c970a0fa

https://preview.redd.it/854ua2jmbphg1.jpg?width=2415&format=pjpg&auto=webp&s=47ef2f530a11305bb2f58f338ad39321ab413782

https://preview.redd.it/8htl2dfobphg1.jpg?width=2548&format=pjpg&auto=webp&s=040765eac57a26d0dc5e8e5a2859a7dd118f32ae

# Caveats

Style bleeding: The resulting style will be a blend of the styles from both input images. You can control this by bringing your reference images closer to the desired target style of the final image. For example, if your pose reference has a cartoon style but your character reference is 3D or realistic, try adding "in the style of amateur photo" to the end of the pose reference’s prompt so it becomes stylistically closer to your subject reference. Conversely, try a prompt like "in the style of flat-color anime" if you want the opposite effect.

Missing bits: Flux will only generate what's visible. So if your character reference shows only the upper body, add a prompt that describes their lower half unless you want to leave them pantless.

https://redd.it/1qwpqek
@rStableDiffusion
Free local browser to organize your generated images — Filter by Prompt, LoRA, Seed & Model. Now handles Video/GIFs too

https://redd.it/1qwr3pd
@rStableDiffusion
Tried training an ACEStep1.5 LoRA for my favorite anime. I didn't expect it to be this good!

https://redd.it/1qwrhp4
@rStableDiffusion
The real "trick" to simple image merging on Klein: just use a prompt that actually has a sufficient level of detail to make it clear what you want
https://redd.it/1qwrqph
@rStableDiffusion
Anima is the new illustrious!!? 2.0!

I've been using Illustrious/NoobAI for a long time, and arguably it's the best for anime so far. Qwen is great for image editing, but it doesn't recognize famous characters. So after Pony's disastrous v7 launch, the only option left was NoobAI, which is good, especially if you know danbooru tags, but my god, it's hell trying to make a complex multi-character image (even with Krita).
That was true until yesterday, when I tried this thing called Anima (this is not an advertisement for the model; you're free to tell me your opinions on it, and I'd love to know if I'm wrong). Anima uses a mixture of danbooru tags and natural language, FINALLY FIXING THE BIGGEST PROBLEM OF SDXL MODELS. No doubt it's not magic; for now it's just a preview model, which I'm guessing is the base one. It's not compatible with any Pony/Illustrious/NoobAI LoRAs because its architecture is different. But in my testing so far, it handles artist styles better than NoobAI. NoobAI still wins on character accuracy, though, thanks to its sheer number of LoRAs.

https://redd.it/1qwukjs
@rStableDiffusion
I obtained these images by training a DoRA on Flux.1 Dev. The advantage is that it made each person's face look different. Perhaps it would be a good idea for people to try training DoRA on the newer models.
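
For context, DoRA (weight-decomposed LoRA) is usually just a flag on top of a normal LoRA training setup. Here's a hedged sketch of how it's typically enabled with Hugging Face PEFT; the rank and target modules below are illustrative placeholders, not the poster's actual settings:

```python
# Hedged sketch: turning on DoRA instead of plain LoRA via Hugging Face PEFT.
# Rank, alpha, and target modules are placeholders, not the poster's setup.
from peft import LoraConfig

dora_config = LoraConfig(
    r=16,                      # adapter rank
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # typical attention projections
    use_dora=True,             # weight-decomposed LoRA (DoRA) instead of vanilla LoRA
)
# Pass this config to whichever PEFT-based training loop you use for Flux.1 Dev;
# the same idea applies to other trainers that expose a DoRA option.
```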

https://redd.it/1qx1rr0
@rStableDiffusion
Is CivitAI slop now?

Now, I could just be looking in the wrong places (sometimes the real best models and LoRAs are obscure), but it seems to me 99% of CivitAI is complete slop now: just poor-quality LoRAs that add more boobs with plasticky skin textures, looking lowkey worse than old SDXL finetunes. I was so amazed back when I found JuggernautXL, RealVisXL, or even PixelWave, to mention a slightly more modern one that was the first full fine-tune of FLUX.1 [dev], and it was pretty great. But nobody seems to make big, impressive fine-tunes anymore that actually change the model significantly.

Am I misinformed? I would love it if I was and there are actually really good ones for models that aren't SDXL or Flux

https://redd.it/1qx8y38
@rStableDiffusion
Seedance 2.0 (teaser) better than Sora 2! True multimodal video creation (text + images + video + audio) and seriously controllable outputs.

https://redd.it/1qxdtjb
@rStableDiffusion
LTX-2: I was going to trim it, but this made me LMFAO. Anyone found a rock-solid way to reduce blur in faster motion?

https://redd.it/1qxez7u
@rStableDiffusion
most effective ways to earn money using ComfyUI right now?

What are the most effective ways to earn money using ComfyUI right now? I’m interested in how people are actually monetizing it—client work, content creation, selling workflows, automation, or something else. If you’ve had real results, I’d love to hear what’s working for you.

https://redd.it/1qxmhf6
@rStableDiffusion
SwarmUI 0.9.8 Release

https://preview.redd.it/rfmgtb22jwhg1.png?width=2016&format=png&auto=webp&s=f8aac5ffb981c15f9d21d092c2d976f4cb16f075


Following up on my promise in the SwarmUI 0.9.7 release notes, the schedule continues to follow the Fibonacci sequence: it has been 6 months since that release, and I'm now posting the next one. It's worth noting that these release version numbers are arbitrary and don't actually correspond to when updates come out; updates ship instantly. I just like summing up periods of development in big posts every once in a while.

# If You're New Here

If you're not familiar with Swarm - it's an image/video generation UI. It's a thing you install that lets you run Flux Klein, LTX-2, Wan, or whatever AI generator you want.

https://preview.redd.it/0ggaa84cfwhg1.png?width=1080&format=png&auto=webp&s=ad4c999c0f9d043d9b0963ed8c9bb5087c06205e

It's free, local, open source, smart, and a bunch of other nice adjectives. You can check it out on GitHub https://github.com/mcmonkeyprojects/SwarmUI or the nice lil webpage https://swarmui.net/

Swarm is a carefully crafted, user-friendly yet still powerful frontend that uses ComfyUI's full power as its backend (including letting you customize workflows when you want; you literally get an entire unrestricted Comfy install as part of your Swarm install).

Basically, if you're generating AI images or video on your computer and you're not using Swarm yet, you should give Swarm a try. I can just about guarantee you'll like it.

# Model Support

https://preview.redd.it/usr6sqf2kwhg1.png?width=2018&format=png&auto=webp&s=21b5e01a634b5e6b23c7fef5d0b3926595c41c16

New models get released all the time, and SwarmUI proudly adds day-1 support whenever Comfy does. It's been 6 months since the last big update post, so, uh, a lot of those have come out! Here are some models Swarm supported immediately on release:
- Flux.2 Dev, the giant boi (both image gen and very easy to use image editing)
- Flux.2 Klein 4B and 9B, the reasonably sized but still pretty cool bois (same as above)
- Z-Image, Turbo and then also Base
- The different variants of Qwen Edit plus and 2511/2512/whatever
- Hunyuan Image 2.1 (remember that?)
- Hunyuan Video 1.5 (not every release gets a lot of community love, but Swarm still adds them)
- LTX-2 (audio/video generation fully supported)
- Anima
- Probably others too; honestly, it's been a long time. Whatever came out, we added support when it did, y'know?

# Beyond Just Image

https://preview.redd.it/8om7crv5iwhg1.png?width=1428&format=png&auto=webp&s=c84eb77c7b6ca3d4be659fb98c111761f7cad1ef

Prior versions of SwarmUI were very focused on image generation. Video generation was supported too (all the way back to when SVD, Stable Video Diffusion, came out. Ancient history, wild right?) but always felt a bit hacked on. A few months ago, video became a full first-class citizen of SwarmUI. Audio is decently supported too, though there's still some work to do: by the time of the next release, audio-only models (ACE-Step, TTS, etc.) will be well supported (currently the ACE-Step implementation works, but it's a little janky tbh).

I'd like to expand for a moment on why and how Swarm is such a nice, user-friendly frontend, using the screenshot of a video in the UI as an example.

Most software you'll find and use out there in the AI space is slapped together from common components. You'll get a basic HTML video object, or maybe a Gradio version of one, or maybe a real ✨fancy✨ option built with React.

Swarm is built from the ground up with care in every step. That video player UI? Yeah, that's custom. Why is it custom? Well to be honest because the vanilla html video UI is janky af in most browsers and also different between browsers and just kinda a pain to work with. BUT also, look at how the colored slidebars use the theme color (in my case I have a purple-emphasis theme