Open source Model to create posters/educational pictures
I have been trying to create a text-to-image tool for K-12 students for educational purposes. Along with aesthetic pictures, the outputs need to be posters, flash cards, etc. with text in them.
The problem is that Stable Diffusion models, and even Flux, struggle heavily with text. Flux is somewhat OK sometimes, but not reliable enough. I have also tried compositing a parsed layout over a background generated by Stable Diffusion; this gives me okay-ish results if I hard-code the layouts properly, so it can't be automated by attaching an LLM for the layouts.
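Roughly, my current approach looks like this (a minimal sketch using Pillow; the file names, layout fields, and font are placeholders, and the layout dict is exactly the kind of JSON I would want an LLM to emit):

```python
# Minimal sketch: render text from a layout spec onto a pre-generated background.
# Assumes "background.png" was already produced by SD/Flux; all names are placeholders.
from PIL import Image, ImageDraw, ImageFont

layout = [  # hypothetical layout spec, the part an LLM would be asked to generate
    {"text": "The Water Cycle", "xy": (64, 48), "size": 96, "fill": "white"},
    {"text": "Evaporation -> Condensation -> Precipitation", "xy": (64, 180), "size": 40, "fill": "white"},
]

def render_poster(background_path, items, out_path):
    img = Image.open(background_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for item in items:
        try:
            font = ImageFont.truetype("DejaVuSans-Bold.ttf", item["size"])
        except OSError:
            font = ImageFont.load_default()  # fallback if the font isn't installed
        draw.text(item["xy"], item["text"], font=font, fill=item["fill"])
    img.save(out_path)

render_poster("background.png", layout, "poster.png")
```

This works, but only when the hard-coded coordinates happen to fit the generated background, which is exactly the part I can't automate reliably.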
What are my options in terms of open-source models, and has anyone done work in this domain before that I could use as a reference?
https://redd.it/1oo4w5g
@rStableDiffusion
New extension for ComfyUI: Model Linker. A tool that automatically detects and fixes missing model references in workflows using fuzzy matching, eliminating the need to manually relink models through multiple dropdowns.
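The core idea, reduced to a minimal illustration (this is not the extension's actual code, just the fuzzy-matching concept shown with Python's difflib against a hypothetical models folder):

```python
# Illustration of the fuzzy-matching idea only, not the extension's implementation:
# match a workflow's missing model reference against locally installed checkpoints.
import difflib
from pathlib import Path

def resolve_model(missing_name, models_dir):
    local = [p.name for p in Path(models_dir).glob("**/*.safetensors")]
    # closest filename by similarity ratio; the cutoff avoids wild guesses
    matches = difflib.get_close_matches(missing_name, local, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(resolve_model("sdxl_base_1.0.safetensors", "ComfyUI/models/checkpoints"))
```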
https://redd.it/1oo823a
@rStableDiffusion
What’s the best AI tool for actually making cinematic videos?
I’ve been experimenting with a few AI video creation tools lately, trying to figure out which ones actually deliver something that feels cinematic instead of just stitched-together clips. I’ve mostly been using Veo 3, Runway, and imini AI; all of them have solid strengths, but each one seems to excel at different things.
Veo does a great job with character motion and realism, but it’s not always consistent with complex scenes. Runway is fast and user-friendly, especially for social-style edits, though it still feels a bit limited when it comes to storytelling. imini AI, on the other hand, feels super smooth for generating short clips and scenes directly from prompts, especially when I want something that looks good right away without heavy editing.
What I’m chasing is a workflow where I can type something like: “A 20-second video of a sunset over Tokyo with ambient music and light motion blur,” and get something watchable without having to stitch together five different tools.
What’s everyone else using right now? Have you found a single platform that can actually handle visuals, motion, and sound together, or are you mixing multiple ones to get the right result? Would love to hear what’s working best for you.
https://redd.it/1oo4ir2
@rStableDiffusion
Spent 48 hours building a cinematic AI portrait workflow — here’s my best result so far.
https://redd.it/1oocz8o
@rStableDiffusion
Stability AI largely wins UK court battle against Getty Images over copyright and trademark
https://abcnews.go.com/amp/Technology/wireStory/stability-ai-largely-wins-uk-court-battle-getty-127164244
https://redd.it/1oodg1i
@rStableDiffusion
How to avoid slow motion in Wan 2.2?
New to Wan, just kicking the tires right now. The quality is great, but everything is in super slow motion. I've tried changing prompts, clip length, duration, and fps, and the characters are always moving through molasses. Does anyone have any thoughts on how to correct this? Thanks.
https://redd.it/1oojkjq
@rStableDiffusion
Can Windows itself hog less VRAM if I only control it remotely?
For some reason Windows is hogging 1.2 GB of my VRAM even when I have no apps open and I'm not generating anything, leaving less for my gens.
I'm thinking about using this computer strictly as a remote machine (for my Wan 2.2 gens), no monitors connected, controlling it entirely from my laptop. Would Windows still hog the VRAM in this situation?
I know that IF I had integrated graphics I could just let Windows use that instead, but sadly my garbage computer has no iGPU. I know I could buy a separate GPU for Windows, but that feels so wasteful if it's only being accessed remotely anyway.
Threadripper 3960X, TRX40 Extreme motherboard, Win11 Pro, 5090, 256 GB RAM.
https://redd.it/1oov5q3
@rStableDiffusion
How to Generate 4k Images With Flux Dype Nodes + QwenVL VS Flash VSR
https://youtu.be/iFK4AJHhOks
https://redd.it/1oowed2
@rStableDiffusion
YouTube
ComfyUI Tutorial: Generate 4k Images With Flux Dype Nodes #comfyui #comfyuitutorial #fluxkrea #flux
In this tutorial I will show you how to generate high-resolution images using a new node known as DYPE, which takes regular Flux models that were trained on regular-resolution images and lets them create images at resolutions far beyond their limits, reaching…
[WIP] VRAM Suite — Declarative VRAM Management Layer for PyTorch / ComfyUI
I've been developing an experimental runtime-level framework called VRAM Suite — a declarative meta-layer designed to predict and orchestrate GPU memory behavior during graph execution.
The project started as an internal debugging tool and gradually evolved into a minimal architecture for VRAM state modeling, fragmentation analysis, and predictive release scheduling.
---
Core Concept
Instead of profiling memory usage after the fact, VRAM Suite introduces a predictive orchestration layer that manages VRAM pressure before out-of-memory conditions occur.
It uses an abstract resource descriptor (.vramcard) and a runtime guard to coordinate allocation bursts across independent workflow nodes.
---
Architecture Overview
.vramcard
A JSON-based descriptor that defines the VRAM state at each workflow phase (reserved, allocated, released, predicted_peak).
VRAM Reader
Collects live telemetry from the CUDA allocator (total, reserved, active, fragmented).
Lightweight and independent of PyTorch internals.
VRAM Guard
Implements a phase-based memory orchestration model.
Tracks allocation patterns between nodes and predicts release windows using lag between alloc_peak and release_time.
Workflow Profiler (WIP)
Integrates with ComfyUI node graphs to visualize per-node VRAM utilization and allocation overlap.
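To give a concrete feel before the repo drops, here is a rough sketch of what a .vramcard could look like and how the Reader pulls allocator telemetry. The field names are illustrative, not the final schema; the telemetry uses only standard torch.cuda calls:

```python
# Illustrative only: these field names are NOT the final .vramcard schema.
import json
import torch

# Hypothetical .vramcard entry for one workflow phase, using the fields named above.
vramcard = {
    "node": "VAEDecode",
    "phase": "decode",
    "reserved_mb": 9216,
    "allocated_mb": 7680,
    "released_mb": 1536,
    "predicted_peak_mb": 10240,
}
print(json.dumps(vramcard, indent=2))

def read_vram_telemetry(device=0):
    """Live allocator telemetry via standard torch.cuda calls."""
    free_b, total_b = torch.cuda.mem_get_info(device)
    reserved_b = torch.cuda.memory_reserved(device)
    allocated_b = torch.cuda.memory_allocated(device)
    return {
        "total_mb": total_b // 2**20,
        "reserved_mb": reserved_b // 2**20,
        "active_mb": allocated_b // 2**20,
        # cached-but-unused memory is a rough proxy for fragmentation pressure
        "fragmented_mb": (reserved_b - allocated_b) // 2**20,
    }

if torch.cuda.is_available():
    print(read_vram_telemetry())
```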
---
Technical Notes
Runtime: PyTorch ≥ 2.10 (CUDA 13.0)
Environment: Ubuntu 24.04 (WSL2)
Error margin of VRAM prediction: ~3%
No modification of CUDACachingAllocator
Designed for ComfyUI custom node interface
---
Motivation
Current ComfyUI pipelines fail under complex chaining (VAE → LoRA → Refiner) due to unpredictable fragmentation.
Allocator caching helps persistence, but not orchestration.
VRAM Suite models the temporal structure of allocations, providing a deterministic headroom window for each node execution.
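Reduced to its simplest form, the Guard's pre-execution check looks roughly like this (illustrative only; the real Guard tracks the per-node lag between alloc_peak and release_time rather than a fixed safety margin):

```python
# Illustrative headroom check before running a node; not the actual Guard code.
import torch

def has_headroom(predicted_peak_mb, safety_margin_mb=512):
    free_b, _total_b = torch.cuda.mem_get_info()
    return free_b // 2**20 >= predicted_peak_mb + safety_margin_mb

def run_node(node_fn, predicted_peak_mb):
    if not has_headroom(predicted_peak_mb):
        # release cached-but-unused blocks before retrying, instead of OOMing mid-graph
        torch.cuda.empty_cache()
        if not has_headroom(predicted_peak_mb):
            raise RuntimeError("Guard: predicted peak exceeds available VRAM; deferring node")
    return node_fn()
```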
---
Roadmap
Public repository and documentation release: within a week
Initial tests will include:
sample .vramcard schema
early Guard telemetry logs
ComfyUI integration preview
---
TL;DR
> VRAM Suite introduces a declarative and predictive layer for VRAM management in PyTorch / ComfyUI.
The first MVP is functional, with open testing planned in the coming week.
https://redd.it/1oouthy
@rStableDiffusion
I'm making a turn back to older models for a reason.
I guess this sub mostly knows me for my startup "Mann-E", which was and is focused on image generation. I personally enjoy making and modifying models, and no joke, I love doing this stuff.
Honestly, the whole beast of a startup I own now started as my hobby of modifying and fine-tuning models in my spare time. But nowadays models are getting so big that there is little practical difference between Qwen Image and Nano Banana: to use either, unless you have a big enough GPU, you need a cloud-based solution or an API, which isn't really "open source" anymore.
So I took a U-turn back to SDXL, but I want to make it a "personal project" of mine now. Not a startup, but a personal project with some new concepts and ideas.
First, I am thinking of using Gemma (maybe 1B or even 270M) as the text encoder of the model. I know there was already a Gemma-based model, so it should be easier to utilize (maybe even 12B or 27B for bigger GPUs and for multilinguality).
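The rough shape of the Gemma idea would be something like this (just a sketch of pulling hidden states to use as conditioning; the model ID is a placeholder and the projection into the diffusion model's cross-attention would still have to be trained):

```python
# Sketch of the Gemma-as-text-encoder idea: use hidden states as prompt conditioning.
# Model ID and everything downstream are placeholders; nothing here is trained yet.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "google/gemma-3-270m"  # or a 1B variant for stronger prompt understanding

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
encoder = AutoModel.from_pretrained(MODEL_ID).eval()

@torch.no_grad()
def encode_prompt(prompt):
    tokens = tokenizer(prompt, return_tensors="pt")
    hidden = encoder(**tokens).last_hidden_state  # (1, seq_len, hidden_dim)
    # would be projected to the UNet's cross-attention dim and trained jointly
    return hidden

print(encode_prompt("a watercolor poster of the solar system").shape)
```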
Second, we have always had image-editing abilities in this game of open models, right? Why not have it again? It might not be Nano Banana, but it obviously would be a cool local product for people with medium/low VRAM who want to experiment with these models.
P.S.: I'm also considering FLUX models, but I think quantized versions of FLUX won't match the good results of SDXL, especially the directory of artists that most SD-based models (1.5 and XL) could recognize.
https://redd.it/1oozl0j
@rStableDiffusion
Considering a beefy upgrade. How much would WAN and VACE benefit from 96 GB VRAM?
Considering buying the RTX Pro 6000 with 96 GB of VRAM to increase resolution and frame range in WAN. I also train models, but will mostly use it for high-end video diffusion and VFX projects. I have heard that WAN struggles with quality above 720p, but in my experience, 1-second test clips rendered at 1080p look fine. I have had good results at 1408×768 for about 121 frames, but hit OOM errors when going any higher on my current RTX 4090 24 GB.
I would love to hear any real-world experiences regarding maximum resolution and frame ranges with 96 GB VRAM before upgrading.
https://redd.it/1op2gqd
@rStableDiffusion
Masking and Scheduling LoRA
https://blog.comfy.org/p/masking-and-scheduling-lora-and-model-weights
https://redd.it/1op5sdw
@rStableDiffusion
blog.comfy.org
Masking and Scheduling LoRA and Model Weights
As of Monday, December 2nd, ComfyUI now supports masking and scheduling LoRA and model weights natively as part of its conditioning system.
I still find Flux Kontext much better for image restoration once you get an intuition for prompting and preparing the images. Qwen Edit ruins and changes way too much.
https://redd.it/1op7wv0
@rStableDiffusion