Qwen Image Edit is a beauty I don't fully understand...
I'll keep this post as short as I can.
For the past few days, I've been testing Qwen Image Edit and comparing its outputs to Nano Banana. Sometimes, I've gotten results on par with Nano Banana or better. It's never 100% consistent quality, but neither is NB. Qwen is extremely powerful, far more than I originally thought. But it's a weird conundrum, and I don't quite understand why.
When you use Qwen IE out of the box, the results can be moderate to decent. And yet, when you give it a reference, it can generate quality on the same level as that reference. I'm talking super detailed/realistic work across all kinds of styles. So it's like a really good copy-cat. And if you prompt it the right way, it can generate results on the level of some of the best models. And I'm talking without LoRAs. And it can even improve on that work.
So somewhere inside, Qwen IE has the ability to produce just about anything.
And yet, its general output seems mid without LoRAs. So it CAN match the best models; it has the ability. But it needs "guidance" to get there.
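For anyone who wants to poke at this outside ComfyUI, here's a minimal sketch of reference-driven editing with the diffusers QwenImageEditPipeline. The prompt, CFG value, and step count are my own guesses, not tuned settings, and the file names are placeholders.

```python
# Minimal sketch: reference-driven editing with Qwen-Image-Edit via diffusers.
# Prompt, true_cfg_scale, and step count are illustrative assumptions.
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

reference = Image.open("reference.png").convert("RGB")  # placeholder path

# Phrasing the edit in terms of the reference's style seems to be what
# triggers the "copy-cat" behavior described above.
result = pipe(
    image=reference,
    prompt="Relight the scene at golden hour, keeping the style and "
           "detail level of the input image",
    negative_prompt=" ",
    true_cfg_scale=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
result.save("edited.png")
```

The interesting part is the prompt, not the code: the more precisely you describe what to keep from the reference, the closer the output tracks its quality.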
I feel like Qwen is like this magic "black box" that maybe we don't really understand how big its potential is yet. Which raises a bigger question:
Are we tossing out too many models before we've really learned to get the most out of the ones we have?
Between LoRAs, model mixing, and refining, I'm seeing flexibility out of older Illustrious models to such an extent that I'm creating content that looks absolutely NOTHING like the models I'm using.
We're releasing finetuned versions of these models almost daily, but it could literally take years to get the most out of the ones we already have.
Now that I've finally gotten around to testing out Wan 2.2, I've been in a state of "mind blown" for the past 2 weeks. Pandora's @#$% box.
Anyway, back to the topic - Qwen IE? This is pretty much Nano-Banana at home. But unlimited.
I really want to see this model grow. It's one of the most useful open-source tools we've gotten in the past two years. The potential I see here could permanently change creative pipelines and speed up production.
I just need to better understand it so I can maximize it.
https://redd.it/1onop3p
@rStableDiffusion
Voting is happening for the first edition of our open source AI art competition, The Arca Gidan Prize. Astonishing to see what people can do in a week w/ open models! If you have time, your attention/votes would be appreciated! Link below, trailer attached.
https://redd.it/1onpxvw
@rStableDiffusion
Will Stability ever make a comeback?
I know the family of SD3 models was really not what we had hoped for. But it seemed like they got a decent investment after that, and they've been making a lot of commercial deals (EA and UMG). Do you think they'll ever come back to the open-source space? Or are they just going to go fully closed and be corporate model providers at this point?
I know we have much better open models like Flux and Qwen, but for me SDXL is still a GOAT of a model, and I find myself still using it for specific tasks even though I can run the larger ones.
https://redd.it/1onkffi
@rStableDiffusion
Sprite generator | Generation of detailed full-body sprites | SDXL/Pony/IL/NoobAI
https://redd.it/1oo00tk
@rStableDiffusion
Has anybody managed to get Hunyuan 3D to work on GPUs that only have 8GB of VRAM?
I'm a 3D hobbyist looking for a program that can turn images into rough blockouts.
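For reference, the shape-generation stage on its own is a short script. Here's a minimal sketch, assuming the hy3dgen package from Tencent's Hunyuan3D-2 repo; the file names are placeholders, and whether it fits in 8GB isn't guaranteed without offloading or quantized variants.

```python
# Minimal sketch of image-to-mesh with Hunyuan3D-2, assuming the hy3dgen
# package from Tencent's repo. File names are placeholders; fitting this
# into 8GB of VRAM is not guaranteed without offloading or quantization.
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline

pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(
    "tencent/Hunyuan3D-2"
)

# One image in, one triangle mesh out -- plenty for a rough blockout.
mesh = pipeline(image="input.png")[0]
mesh.export("blockout.glb")  # import into Blender and decimate from there
```

Skipping the texture-generation stage saves a lot of memory, and for blockouts you only need the geometry anyway.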
https://redd.it/1ony2nw
@rStableDiffusion
QwenEditUtils2.0 Any Resolution Reference
Hey everyone, I am xiaozhijason aka lrzjason! I'm excited to share my latest custom node collection for Qwen-based image editing workflows.
Comfyui-QwenEditUtils is a comprehensive set of utility nodes that brings advanced text encoding with reference image support for Qwen-based image editing.
Key Features:
- Multi-Image Support: Incorporate up to 5 reference images into your text-to-image generation workflow
- Dual Resize Options: Separate resizing controls for VAE encoding (1024px) and VL encoding (384px); see the sketch after this list
- Individual Image Outputs: Each processed reference image is provided as a separate output for flexible connections
- Latent Space Integration: Encode reference images into latent space for efficient processing
- Qwen Model Compatibility: Specifically designed for Qwen-based image editing models
- Customizable Templates: Use custom Llama templates for tailored image editing instructions
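To make the dual-resize idea concrete, here's a rough sketch of the logic (not the node's actual code; whether it scales by long side or total pixel area is my assumption):

```python
# Rough sketch of the dual-resize idea (not the node's actual code):
# one copy of the reference for VAE encoding, one for VL encoding.
from PIL import Image

def resize_long_side(img: Image.Image, target: int) -> Image.Image:
    """Scale so the longer side equals target, preserving aspect ratio."""
    w, h = img.size
    scale = target / max(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)

reference = Image.open("reference.png").convert("RGB")  # placeholder path
vae_input = resize_long_side(reference, 1024)  # detail path: latent encode
vl_input = resize_long_side(reference, 384)    # semantic path: VL encoder
```

Keeping the two paths separate is what lets the VL encoder see a cheap thumbnail while the VAE still gets a high-resolution latent.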
New in v2.0.0:
- Added TextEncodeQwenImageEditPlusCustom_lrzjason for highly customized image editing
- Added QwenEditConfigPreparer, QwenEditConfigJsonParser for creating image configurations
- Added QwenEditOutputExtractor for extracting outputs from the custom node
- Added QwenEditListExtractor for extracting items from lists
- Added CropWithPadInfo for cropping images with pad information
Available Nodes:
- TextEncodeQwenImageEditPlusCustom: Maximum customization with per-image configurations
- Helper Nodes: QwenEditConfigPreparer, QwenEditConfigJsonParser, QwenEditOutputExtractor, QwenEditListExtractor, CropWithPadInfo
The package includes complete workflow examples in both simple and advanced configurations. The custom node offers maximum flexibility by allowing per-image configurations for both reference and vision-language processing.
Perfect for users who need fine-grained control over image editing workflows with multiple reference images and customizable processing parameters.
Installation: install via ComfyUI Manager, or clone/download into your ComfyUI custom_nodes directory and restart.
Check out the full documentation on GitHub for detailed usage instructions and examples. Looking forward to seeing what you create!
https://preview.redd.it/7j76g2csi7zf1.jpg?width=4344&format=pjpg&auto=webp&s=6e4f39f8da6aabae91c9f9b4f047f4184434a43f
https://preview.redd.it/iseesncsi7zf1.jpg?width=4344&format=pjpg&auto=webp&s=2e2ad72f92e2e3bf74b0396d3ff2dbe99f0532b0
https://preview.redd.it/wd97d3csi7zf1.jpg?width=4344&format=pjpg&auto=webp&s=25cc1724d8397ad214f594886f75816b8086c750
https://redd.it/1oo2u0i
@rStableDiffusion
Open-source model to create posters/educational pictures
I have been trying to create a text-to-image tool for K-12 students for educational purposes. The outputs need to be aesthetic pictures as well as posters, flash cards, etc. with text in them.
The problem is that Stable Diffusion models and even Flux struggle heavily with text. Flux is somewhat OK sometimes, but not reliable enough. I have also tried layout parsing over a background generated by Stable Diffusion; this gives me okayish results if I hard-code layouts properly, so it can't be automated by attaching an LLM for the layouts.
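For what it's worth, the hard-coded-layout approach can be boiled down to a short compositing step. A minimal sketch, assuming a pre-generated background and a local font file (all paths are placeholders):

```python
# Minimal sketch: crisp text composited over a generated background with a
# hard-coded layout. background.png and the font paths are placeholders.
from PIL import Image, ImageDraw, ImageFont

background = Image.open("background.png").convert("RGB")
draw = ImageDraw.Draw(background)

title_font = ImageFont.truetype("DejaVuSans-Bold.ttf", 72)
body_font = ImageFont.truetype("DejaVuSans.ttf", 36)

# Hard-coded layout: title at the top, one body line below it.
draw.text((60, 40), "The Water Cycle", font=title_font,
          fill="white", stroke_width=2, stroke_fill="black")
draw.text((60, 160), "Evaporation -> Condensation -> Precipitation",
          font=body_font, fill="white", stroke_width=1, stroke_fill="black")

background.save("poster.png")
```

Rendering the text as a post-process keeps it crisp no matter how weak the diffusion model's typography is; the unsolved part is exactly what's described above, getting an LLM to emit usable layout coordinates.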
What are my options in terms of open-source models? Or has anyone done work in this domain before that I can take reference from?
https://redd.it/1oo4w5g
@rStableDiffusion