r/StableDiffusion – Telegram
Open source Model to create posters/educational pictures

I have been trying to build a text-to-image tool for K-12 students for educational purposes. Along with aesthetic pictures, the outputs need to be posters, flash cards, etc., with text in them.

The problem is that Stable Diffusion models, and even Flux, struggle heavily with text. Flux is somewhat OK sometimes, but not reliable enough. I have also tried compositing parsed layouts over backgrounds generated by Stable Diffusion; this gives me okay-ish results if I hard-code the layouts properly, so it can't be automated by attaching an LLM to produce the layouts.
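For reference, a minimal sketch of the layout-compositing approach mentioned above: render the background with a diffusion model, then draw the text as a separate PIL pass so glyphs stay crisp regardless of the base model. The layout dict, font file, and file paths are illustrative placeholders, not part of the original post.

```python
# Sketch: composite crisp text over a generated background with PIL.
# Assumes a background image already produced by SDXL/Flux and a hand-written
# layout dict; in practice an LLM could emit this dict, which is the part
# the poster says is hard to automate reliably.
from PIL import Image, ImageDraw, ImageFont

layout = [  # hypothetical layout spec
    {"text": "The Water Cycle", "xy": (64, 48), "size": 96, "fill": "white"},
    {"text": "Evaporation - Condensation - Precipitation", "xy": (64, 180), "size": 48, "fill": "#ffe680"},
]

def compose_poster(background_path: str, out_path: str, font_path: str = "DejaVuSans-Bold.ttf"):
    img = Image.open(background_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for item in layout:
        font = ImageFont.truetype(font_path, item["size"])
        draw.text(item["xy"], item["text"], font=font, fill=item["fill"])
    img.save(out_path)

# compose_poster("sdxl_background.png", "poster.png")
```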

What are my options in terms of open-source models? Has anyone done work in this domain before that I could use as a reference?


https://redd.it/1oo4w5g
@rStableDiffusion
New extension for ComfyUI: Model Linker. A tool that automatically detects and fixes missing model references in workflows using fuzzy matching, eliminating the need to manually relink models through multiple dropdowns.
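As a rough illustration of the kind of fuzzy matching such a tool could use (this is not the extension's actual code), Python's standard difflib can map a missing checkpoint name to the closest file found on disk:

```python
# Sketch: map a missing model filename from a workflow to the closest local file.
# Not the Model Linker source; just a plausible difflib-based approach.
import difflib
from pathlib import Path

def relink(missing_name: str, models_dir: str, cutoff: float = 0.6) -> str | None:
    candidates = [p.name for p in Path(models_dir).rglob("*.safetensors")]
    matches = difflib.get_close_matches(missing_name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# relink("sdxl_base_1.0.safetensors", "ComfyUI/models/checkpoints")
```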

https://redd.it/1oo823a
@rStableDiffusion
What’s the best AI tool for actually making cinematic videos?

I’ve been experimenting with a few AI video creation tools lately, trying to figure out which ones actually deliver something that feels cinematic instead of just stitched-together clips. I’ve mostly been using Veo 3, Runway, and imini AI; all of them have solid strengths, but each one seems to excel at different things.

Veo does a great job with character motion and realism, but it’s not always consistent with complex scenes. Runway is fast and user-friendly, especially for social-style edits, though it still feels a bit limited when it comes to storytelling. imini AI, on the other hand, feels super smooth for generating short clips and scenes directly from prompts, especially when I want something that looks good right away without heavy editing.

What I’m chasing is a workflow where I can type something like: “A 20-second video of a sunset over Tokyo with ambient music and light motion blur,” and get something watchable without having to stitch together five different tools.

What’s everyone else using right now? Have you found a single platform that can actually handle visuals, motion, and sound together, or are you mixing multiple ones to get the right result? Would love to hear what’s working best for you.

https://redd.it/1oo4ir2
@rStableDiffusion
Spent 48 hours building a cinematic AI portrait workflow — here’s my best result so far.
https://redd.it/1oocz8o
@rStableDiffusion
How to avoid slow motion in Wan 2.2?

New to Wan, just kicking the tires right now. The quality is great, but everything comes out in super slow motion. I've tried changing prompts, clip length, and fps, and the characters are always moving through molasses. Does anyone have any thoughts on how to correct this? Thanks.

https://redd.it/1oojkjq
@rStableDiffusion
Can Windows itself hog less VRAM if I only control it remotely?

For some reason Windows is hogging 1.2 GB of my VRAM even when I have no apps open and I'm not generating anything, leaving less for my gens.

I'm thinking about using this computer strictly as a remote machine (for my Wan 2.2 gens): no monitors connected, controlling it entirely from my laptop. Would Windows still hog the VRAM in this situation?

I know that IF I had integrated graphics I could just let Windows use that instead, but sadly my garbage computer has no iGPU. I know I could buy a separate GPU for Windows, but that feels wasteful if the machine is only ever accessed remotely anyway.
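One way to see exactly what is holding that VRAM (and to re-check after going headless) is to list per-process GPU memory with NVIDIA's NVML bindings. A minimal sketch, assuming the nvidia-ml-py package (pynvml) is installed:

```python
# Sketch: list which processes are holding VRAM on GPU 0 (e.g. the Windows
# desktop compositor when a display is attached).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"used {info.used / 1024**2:.0f} MiB of {info.total / 1024**2:.0f} MiB")

# Graphics processes (compositor, browsers) show up here; compute processes
# (e.g. ComfyUI) via nvmlDeviceGetComputeRunningProcesses. Note that
# usedGpuMemory can be None under some Windows driver modes.
for proc in pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle):
    print(proc.pid, proc.usedGpuMemory)

pynvml.nvmlShutdown()
```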

Threadripper 3960x, TRX40 extreme motherboard, win11 pro, 5090, 256gb RAM.

https://redd.it/1oov5q3
@rStableDiffusion
[WIP] VRAM Suite — Declarative VRAM Management Layer for PyTorch / ComfyUI

I've been developing an experimental runtime-level framework called VRAM Suite — a declarative meta-layer designed to predict and orchestrate GPU memory behavior during graph execution.

The project started as an internal debugging tool and gradually evolved into a minimal architecture for VRAM state modeling, fragmentation analysis, and predictive release scheduling.


---

Core Concept

Instead of profiling memory usage after the fact, VRAM Suite introduces a predictive orchestration layer that manages VRAM pressure before out-of-memory conditions occur.

It uses an abstract resource descriptor (.vramcard) and a runtime guard to coordinate allocation bursts across independent workflow nodes.

---

Architecture Overview

.vramcard
A JSON-based descriptor that defines the VRAM state at each workflow phase (reserved, allocated, released, predicted_peak).
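The post doesn't include a schema yet (the sample .vramcard is on the roadmap below), so the following is purely a guess at what such a descriptor might look like, built as a Python dict and serialized to JSON; every field name here is hypothetical:

```python
# Hypothetical .vramcard layout, guessed from the field names mentioned in the
# post (reserved, allocated, released, predicted_peak); not the project's schema.
import json

vramcard = {
    "workflow": "sdxl_vae_lora_refiner",
    "phases": [
        {"node": "VAEDecode", "reserved_mb": 2048, "allocated_mb": 1730,
         "released_mb": 1600, "predicted_peak_mb": 2200},
        {"node": "KSampler_refiner", "reserved_mb": 6144, "allocated_mb": 5900,
         "released_mb": 5100, "predicted_peak_mb": 6400},
    ],
}

print(json.dumps(vramcard, indent=2))
```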

VRAM Reader
Collects live telemetry from the CUDA allocator (total, reserved, active, fragmented).
Lightweight and independent of PyTorch internals.
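For orientation only, the kind of telemetry described here can be pulled from the public torch.cuda API without touching allocator internals. A minimal sketch (not the project's VRAM Reader):

```python
# Sketch: read allocator telemetry via the public torch.cuda API.
import torch

def read_vram(device: int = 0) -> dict:
    free, total = torch.cuda.mem_get_info(device)
    stats = torch.cuda.memory_stats(device)
    reserved = stats["reserved_bytes.all.current"]
    active = stats["active_bytes.all.current"]
    return {
        "total_mb": total // 2**20,
        "free_mb": free // 2**20,
        "reserved_mb": reserved // 2**20,
        "active_mb": active // 2**20,
        # Rough fragmentation proxy: memory the caching allocator holds
        # but is not actively using.
        "fragmented_mb": (reserved - active) // 2**20,
    }

# print(read_vram())
```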

VRAM Guard
Implements a phase-based memory orchestration model.
Tracks allocation patterns between nodes and predicts release windows using lag between alloc_peak and release_time.
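Again as a hedged illustration rather than the project's actual Guard, a phase-based wrapper around node execution might look roughly like this: record the observed peak per node, and refuse to start a node when the predicted peak exceeds the current headroom.

```python
# Sketch: phase-based guard that checks predicted peak against free VRAM
# before running a node, and records observed peaks for the next run.
# All names (VRAMGuard, history, predicted) are illustrative.
import time
import torch

class VRAMGuard:
    def __init__(self):
        self.history: dict[str, int] = {}  # node name -> observed peak bytes

    def run(self, name: str, fn, *args, **kwargs):
        free, _ = torch.cuda.mem_get_info()
        predicted = self.history.get(name, 0)
        if predicted and predicted > free:
            # A real orchestrator would wait for a release window or call
            # torch.cuda.empty_cache(); here we just flag the condition.
            raise RuntimeError(f"{name}: predicted peak {predicted} > free {free}")
        torch.cuda.reset_peak_memory_stats()
        start = time.time()
        out = fn(*args, **kwargs)
        self.history[name] = torch.cuda.max_memory_allocated()
        print(f"{name}: peak={self.history[name]} bytes in {time.time() - start:.2f}s")
        return out
```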

Workflow Profiler (WIP)
Integrates with ComfyUI node graphs to visualize per-node VRAM utilization and allocation overlap.


---

Technical Notes

Runtime: PyTorch ≥ 2.10 (CUDA 13.0)

Environment: Ubuntu 24.04 (WSL2)

Error margin of VRAM prediction: ~3%

No modification of CUDACachingAllocator

Designed for ComfyUI custom node interface



---

Motivation

Current ComfyUI pipelines fail under complex chaining (VAE → LoRA → Refiner) due to unpredictable fragmentation.
Allocator caching helps persistence, but not orchestration.
VRAM Suite models the temporal structure of allocations, providing a deterministic headroom window for each node execution.


---

Roadmap

Public repository and documentation release: within a week

Initial tests will include:

sample .vramcard schema

early Guard telemetry logs

ComfyUI integration preview




---

TL;DR

> VRAM Suite introduces a declarative and predictive layer for VRAM management in PyTorch / ComfyUI.
The first MVP is functional, with open testing planned in the coming week.

https://redd.it/1oouthy
@rStableDiffusion
I'm making a turn back to older models for a reason.

I guess this sub mostly knows me for my startup "Mann-E", which was and is focused on image generation. I personally enjoy making and modifying models and, no joke, I love doing this stuff.

Honestly, the whole beast of a startup I own now started as my hobby of modifying and fine-tuning models in my spare time. But nowadays models have gotten so big that there is no practical difference between Qwen Image and Nano Banana: to use either, unless you have a big enough GPU, you need a cloud-based solution or an API, which is not really "open source" anymore.

So I took a U-turn back to SDXL, but I want to keep it a "personal project" of mine now. Not a startup, just a personal project with some new concepts and ideas.

First, I am thinking of using Gemma (maybe 1B, or even 270M) as the model's text encoder. I know there has already been a Gemma-based model, which makes it easier to adopt (maybe even 12B or 27B for bigger GPUs and for multilinguality).
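As a hedged sketch of what swapping in Gemma as the text encoder could look like: the model ID and projection dimension below are placeholders (SDXL's UNet expects a 2048-dim cross-attention context, so a learned projection would be needed and trained), not anything from the post.

```python
# Sketch: use a small Gemma model as a text encoder for an SDXL-style UNet.
# "google/gemma-3-270m" and the 2048-dim projection are illustrative
# assumptions; the projection layer would have to be trained.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class GemmaTextEncoder(nn.Module):
    def __init__(self, model_id: str = "google/gemma-3-270m", ctx_dim: int = 2048):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.lm = AutoModel.from_pretrained(model_id, torch_dtype=torch.bfloat16)
        self.proj = nn.Linear(self.lm.config.hidden_size, ctx_dim)

    def forward(self, prompts: list[str]) -> torch.Tensor:
        tokens = self.tokenizer(prompts, padding=True, return_tensors="pt")
        hidden = self.lm(**tokens).last_hidden_state   # (B, T, hidden_size)
        return self.proj(hidden.float())               # (B, T, 2048) for cross-attention
```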

Second, we have always had image-editing abilities in this game of open models, right? Why not have them again? It might not be Nano Banana, but it would obviously be a cool local option for people with medium/low VRAM who want to experiment with these models.

P.S.: I'm also considering FLUX models, but I think quantized versions of FLUX won't match SDXL's results, especially the range of artists that most SD-based models (1.5 and XL) could recognize.

https://redd.it/1oozl0j
@rStableDiffusion
Considering a beefy upgrade. How much would WAN and VACE benefit from 96 GB VRAM?

Considering buying the RTX Pro 6000 with 96 GB VRAM to increase resolution and frame range in WAN. I also train models, but will mostly use it for high-end video diffusion and VFX projects. I have heard that WAN struggles with quality above 720p, but in my experience, 1-second test clips rendered in 1080p look fine. I have had good results at 1408×768 for about 121 frames, but hit OOM errors when going any higher on my current RTX 4090 24 GB.
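For a rough sense of why higher resolution and frame counts blow past 24 GB, here is a back-of-envelope token-count calculation. It assumes Wan's commonly cited 4×8×8 VAE compression and a 1×2×2 DiT patch size; treat the numbers as illustrative only, since activations and attention (which scale with sequence length), not the latent tensor itself, dominate peak VRAM.

```python
# Back-of-envelope: how the DiT sequence length grows with resolution/frames.
# Assumptions (not from the post): 4x temporal / 8x spatial VAE compression,
# 1x2x2 patchify in the DiT. Attention cost grows roughly with tokens^2.
def wan_tokens(frames: int, height: int, width: int) -> int:
    t = (frames - 1) // 4 + 1        # latent frames
    h, w = height // 8, width // 8   # latent spatial size
    return t * (h // 2) * (w // 2)   # tokens after 2x2 spatial patchify

for frames, width, height in [(81, 1280, 720), (121, 1408, 768), (121, 1920, 1080)]:
    print(f"{frames}f {width}x{height}: {wan_tokens(frames, height, width):,} tokens")
```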

I would love to hear any real-world experiences regarding maximum resolution and frame ranges with 96 GB VRAM before upgrading.

https://redd.it/1op2gqd
@rStableDiffusion