r/StableDiffusion – Telegram
Has anybody managed to get Hunyuan 3D to work on GPUs that only have 8GB of VRAM?

I'm a 3D hobbyist looking for a program that can turn images into rough blockouts.

https://redd.it/1ony2nw
@rStableDiffusion
QwenEditUtils2.0 Any Resolution Reference

Hey everyone, I am xiaozhijason aka lrzjason! I'm excited to share my latest custom node collection for Qwen-based image editing workflows.



Comfyui-QwenEditUtils is a comprehensive set of utility nodes that brings advanced text encoding with reference image support for Qwen-based image editing.



Key Features:

- Multi-Image Support: Incorporate up to 5 reference images into your text-to-image generation workflow
- Dual Resize Options: Separate resizing controls for VAE encoding (1024px) and VL encoding (384px); see the sketch after this list
- Individual Image Outputs: Each processed reference image is provided as a separate output for flexible connections
- Latent Space Integration: Encode reference images into latent space for efficient processing
- Qwen Model Compatibility: Specifically designed for Qwen-based image editing models
- Customizable Templates: Use custom Llama templates for tailored image editing instructions
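
For intuition, here's a minimal sketch of the dual-resize idea in plain Python/Pillow: the same reference image is scaled once to ~1024px on its longest side for VAE (latent) encoding, and once to ~384px for the vision-language encoder. The function name and exact resize policy are illustrative assumptions, not the node's actual implementation:

```python
# Illustrative sketch of the dual-resize idea (not the node's actual code).
from PIL import Image

def resize_longest_side(img: Image.Image, target: int) -> Image.Image:
    """Scale so the longest side equals `target`, preserving aspect ratio."""
    w, h = img.size
    scale = target / max(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.Resampling.LANCZOS)

ref = Image.open("reference.png").convert("RGB")
vae_ref = resize_longest_side(ref, 1024)  # goes to the VAE for latent conditioning
vl_ref = resize_longest_side(ref, 384)    # goes to the Qwen VL encoder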



New in v2.0.0:

- Added TextEncodeQwenImageEditPlusCustom_lrzjason for highly customized image editing
- Added QwenEditConfigPreparer and QwenEditConfigJsonParser for creating image configurations
- Added QwenEditOutputExtractor for extracting outputs from the custom node
- Added QwenEditListExtractor for extracting items from lists
- Added CropWithPadInfo for cropping images with pad information



Available Nodes:

- TextEncodeQwenImageEditPlusCustom: Maximum customization with per-image configurations
- Helper Nodes: QwenEditConfigPreparer, QwenEditConfigJsonParser, QwenEditOutputExtractor, QwenEditListExtractor, CropWithPadInfo



The package includes complete workflow examples in both simple and advanced configurations. The custom node offers maximum flexibility by allowing per-image configurations for both reference and vision-language processing.
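
To make "per-image configuration" concrete, a config list fed to TextEncodeQwenImageEditPlusCustom might pair each reference image with its own resize targets, along the lines of the Python sketch below. The field names are assumptions for illustration only; check the GitHub docs for the actual schema used by QwenEditConfigPreparer/QwenEditConfigJsonParser:

```python
# Hypothetical per-image configs (field names are assumptions, not the real schema).
configs = [
    {"image": "subject.png", "vae_resize": 1024, "vl_resize": 384},  # main subject
    {"image": "style.png", "vae_resize": 768, "vl_resize": 384},     # style reference
    {"image": "pose.png", "vae_resize": 1024, "vl_resize": 256},     # pose reference
]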



Perfect for users who need fine-grained control over image editing workflows with multiple reference images and customizable processing parameters.



Installation: install via ComfyUI Manager, or clone/download the repo into your ComfyUI custom_nodes directory and restart.



Check out the full documentation on GitHub for detailed usage instructions and examples. Looking forward to seeing what you create!

https://preview.redd.it/7j76g2csi7zf1.jpg?width=4344&format=pjpg&auto=webp&s=6e4f39f8da6aabae91c9f9b4f047f4184434a43f

https://preview.redd.it/iseesncsi7zf1.jpg?width=4344&format=pjpg&auto=webp&s=2e2ad72f92e2e3bf74b0396d3ff2dbe99f0532b0

https://preview.redd.it/wd97d3csi7zf1.jpg?width=4344&format=pjpg&auto=webp&s=25cc1724d8397ad214f594886f75816b8086c750




https://redd.it/1oo2u0i
@rStableDiffusion
Open-source model to create posters/educational pictures

I have been trying to create a text-to-image tool for K-12 students for educational purposes. Besides aesthetic pictures, the outputs need to be posters, flash cards, etc., with text in them.

The problem is that Stable Diffusion models and even Flux struggle heavily with text. Flux is somewhat OK sometimes, but not reliable enough. I have also tried layout parsing over a background generated by Stable Diffusion; this gives me okay-ish results if I hard-code the layouts properly, so it can't be automated by attaching an LLM for layouts.
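
For context, the hard-coded-layout approach described above usually looks something like the sketch below: generate a text-free background with a diffusion model, then composite the text deterministically with Pillow so it's always crisp and spelled correctly. A minimal sketch, assuming diffusers, an SDXL base checkpoint, and a TTF font available on the system:

```python
# Sketch: generate a background with diffusers, then overlay text with Pillow
# at hard-coded positions (so the text is always crisp and correct).
import torch
from diffusers import StableDiffusionXLPipeline
from PIL import ImageDraw, ImageFont

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

bg = pipe("colorful classroom poster background, flat illustration, no text").images[0]

draw = ImageDraw.Draw(bg)
font = ImageFont.truetype("DejaVuSans-Bold.ttf", 72)  # any installed TTF file
draw.text((bg.width // 2, 80), "The Water Cycle", font=font,
          fill="white", anchor="mm", stroke_width=4, stroke_fill="black")
bg.save("poster.png")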

What are my options in terms of open-source models? Or has anyone done work in this domain before that I can use as a reference?


https://redd.it/1oo4w5g
@rStableDiffusion
New extension for ComfyUI: Model Linker, a tool that automatically detects and fixes missing model references in workflows using fuzzy matching, eliminating the need to manually relink models through multiple dropdowns.
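
For anyone curious how this kind of relinking can work under the hood, the core idea is fuzzy string matching between the model name saved in the workflow and the filenames actually present on disk. A toy sketch using Python's difflib (not the extension's actual code):

```python
# Toy sketch of fuzzy model relinking (not the extension's actual code).
import difflib

def relink(missing_name: str, installed_models: list[str]) -> str | None:
    """Return the closest installed model name, or None if nothing is close."""
    matches = difflib.get_close_matches(missing_name, installed_models, n=1, cutoff=0.6)
    return matches[0] if matches else None

installed = ["sdxl_base_1.0.safetensors", "flux1-dev.safetensors"]
print(relink("SDXL-base-v1.0.safetensors", installed))  # -> "sdxl_base_1.0.safetensors"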

https://redd.it/1oo823a
@rStableDiffusion
What’s the best AI tool for actually making cinematic videos?

I’ve been experimenting with a few AI video creation tools lately, trying to figure out which ones actually deliver something that feels cinematic instead of just stitched-together clips. I’ve mostly been using Veo 3, Runway, and imini AI; all of them have solid strengths, but each one seems to excel at different things.

Veo does a great job with character motion and realism, but it’s not always consistent with complex scenes. Runway is fast and user-friendly, especially for social-style edits, though it still feels a bit limited when it comes to storytelling. imini AI, on the other hand, feels super smooth for generating short clips and scenes directly from prompts, especially when I want something that looks good right away without heavy editing.

What I’m chasing is a workflow where I can type something like: “A 20-second video of a sunset over Tokyo with ambient music and light motion blur,” and get something watchable without having to stitch together five different tools.

What’s everyone else using right now? Have you found a single platform that can actually handle visuals, motion, and sound together, or are you mixing multiple ones to get the right result? Would love to hear what's working best for you.

https://redd.it/1oo4ir2
@rStableDiffusion