LTX-2 GGUF T2V/I2V 12GB Workflow V1.1, updated with the new Kijai node for the new video VAE. That's what I get for going to sleep!
https://civitai.com/models/2304098?modelVersionId=2593987
https://redd.it/1qbsoge
@rStableDiffusion
LTX-2 19B GGUF T2V/I2V 12GB ComfyUI Workflows - V1.1 | LTXV Workflows | Civitai
1/12/26: you may need to update the ComfyUI-GGUF node pack to use LTX-2. These are two workflows that I've been using for my setup. I have 12GB VR...
Who is Sarah Peterson, and why does she spam Civitai with bad LoRAs?
For a while now, this person has been absolutely spamming the Civitai LoRA section with bad (usually adult) LoRAs. For Z-Image, almost half of the most recent LoRAs are by Sarah Peterson, and they're all bad. It makes me wonder what is going on here.
https://redd.it/1qbzt3v
@rStableDiffusion
Soprano TTS training code released: Create your own 2000x realtime on-device text-to-speech model with Soprano-Factory!
https://redd.it/1qc5n9r
@rStableDiffusion
GLM-Image launched just now.
https://redd.it/1qc8tne
@rStableDiffusion
LTX-2 Audio Synced to an Added MP3 (I2V) - 6 examples: 3 realistic, 3 animated - Non-Distilled - 20s clips stitched together (Music: Dido's "Thank You")
https://redd.it/1qcc81m
@rStableDiffusion
GLM-Image explained: why autoregressive + diffusion actually matters
I'm seeing some confusion about what makes GLM-Image different, so let me break it down.
How diffusion models work (Flux, SD, etc):
You start with pure noise. The model looks at ALL pixels simultaneously and goes "this should be a little less noisy." Repeat 20-50 times until you have an image.
The entire image evolves together in parallel. There's no concept of "first this, then that."
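Here's a toy Python sketch of that loop, just to show that every step touches the whole image at once. The `denoiser` function and the update rule are made-up stand-ins, not any real model's code:

```python
import torch

def denoiser(x, t):
    # Stand-in for a trained denoiser (U-Net / DiT): predicts the noise in x at step t.
    return 0.1 * x

def generate(steps=30, shape=(1, 4, 64, 64)):
    x = torch.randn(shape)              # start from pure noise
    for t in reversed(range(steps)):    # the 20-50 denoising steps
        eps = denoiser(x, t)            # the model sees ALL pixels/latents at once
        x = x - eps / (t + 1)           # toy update: the whole image gets a bit less noisy
    return x                            # everything evolves together, in parallel

latent = generate()
```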
How autoregressive works:
Generate one piece at a time. Each new piece looks at everything before it to decide what comes next.
This is how LLMs write text:
"The cat sat on the "
→ probably "mat"
"The cat sat on the mat and "
→ probably "purred"
Each word is chosen based on all previous words.
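Same idea in toy Python. `next_token_logits` is a stand-in for a real LLM forward pass, the vocab size is made up, and greedy argmax replaces the usual temperature/top-p sampling:

```python
import torch

VOCAB_SIZE = 32000  # made-up vocab size

def next_token_logits(tokens):
    # Stand-in for an LLM forward pass: scores every vocabulary entry
    # given the entire prefix generated so far.
    return torch.randn(VOCAB_SIZE)

def generate_autoregressive(prompt_tokens, max_new_tokens=8):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)        # conditioned on EVERYTHING before it
        tokens.append(int(torch.argmax(logits)))  # pick the next piece ("mat", then "purred"...)
    return tokens

print(generate_autoregressive([1, 2, 3]))
```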
GLM-Image does BOTH:
1. Autoregressive stage: A 9B LLM (literally initialized from GLM-4) generates ~256-4096 semantic tokens. These tokens encode MEANING and LAYOUT, not pixels.
2. Diffusion stage: A 7B diffusion model takes those semantic tokens and renders actual pixels.
Think of it like: the LLM writes a detailed blueprint, then diffusion builds the house.
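As a rough sketch of that two-stage flow (every class and method name below is invented for illustration; this is NOT GLM-Image's actual code or API):

```python
class ARPlanner:
    """Stand-in for the ~9B LLM stage."""
    def generate(self, prompt: str) -> list[int]:
        # Emits semantic tokens (~256-4096 of them) that encode meaning and
        # layout, not pixels. Here just a dummy tokenization for illustration.
        return [hash(word) % 65536 for word in prompt.split()]

class DiffusionRenderer:
    """Stand-in for the ~7B diffusion stage."""
    def render(self, semantic_tokens: list[int]) -> str:
        # Denoises pixels conditioned on the semantic-token "blueprint"
        # instead of only a pooled text embedding.
        return f"<image rendered from {len(semantic_tokens)} semantic tokens>"

def two_stage_generate(prompt: str) -> str:
    blueprint = ARPlanner().generate(prompt)      # stage 1: write the blueprint
    return DiffusionRenderer().render(blueprint)  # stage 2: build the house

print(two_stage_generate("A coffee shop chalkboard menu: Espresso $3.50, Latte $4.25"))
```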
Why this matters
Prompt: "A coffee shop chalkboard menu: Espresso $3.50, Latte $4.25, Cappuccino $4.75"
Diffusion approach:
- Text encoder compresses your prompt into embeddings
- Model tries to match those embeddings while denoising
- No sequential reasoning happens
- Result: "Esperrso $3.85, Latle $4.5?2" - garbled nonsense
Autoregressive approach:
- LLM actually PARSES the prompt: "ok, three items, three prices, menu format"
- Generates tokens sequentially: menu layout → first item "Espresso" → price "$3.50" → second item...
- Each token sees full context of what came before
- Result: readable text in correct positions
This is why GLM-Image hits 91% text accuracy while Flux sits around 50%.
Another example - knowledge-dense images:
Prompt: "An infographic showing the water cycle with labeled stages: evaporation, condensation, precipitation, collection"
Diffusion models struggle here because they're not actually REASONING about what an infographic should contain. They're pattern matching against training data.
Autoregressive models can leverage actual language understanding. The same architecture that knows "precipitation comes after condensation" can encode that into the image tokens.
The tradeoff:
Autoregressive is slower (sequential generation vs parallel) and the model is bigger (16B total). For pure aesthetic/vibes generation where text doesn't matter, Flux is still probably better.
But for anything where the image needs to convey actual information accurately - text, diagrams, charts, signage, documents - this architecture has a real advantage.
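Rough back-of-envelope on the speed gap, using only the numbers quoted above and ignoring KV caching, batching, and the different per-pass costs of a 9B LLM vs a full diffusion model:

```python
# All figures come from the post above; this is only a rough count of
# sequential model calls, not a real benchmark.
ar_semantic_tokens = (256, 4096)   # one LLM forward pass per semantic token, in order
decoder_steps = (20, 50)           # diffusion steps for the 7B renderer
pure_diffusion_steps = (20, 50)    # Flux/SD-style: each step is parallel over all pixels

low = ar_semantic_tokens[0] + decoder_steps[0]
high = ar_semantic_tokens[1] + decoder_steps[1]
print(f"AR + diffusion: ~{low}-{high} sequential model calls")
print(f"Pure diffusion: ~{pure_diffusion_steps[0]}-{pure_diffusion_steps[1]} sequential steps")
```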
Will report back in a few hours with some test images.
https://redd.it/1qcegzd
@rStableDiffusion
Starting to play with LTX-2 IC-LoRA with pose control. Made a Pwnisher-style video.
https://redd.it/1qciya2
@rStableDiffusion
Local Comparison: GLM-Image vs Flux.2 Dev vs Z-Image Turbo, no cherry picking
https://redd.it/1qcn46q
@rStableDiffusion
The Dragon (VHS Style): Z-Image Turbo - Wan 2.2 FLFTV - Qwen Image Edit 2511 - RTX 2060 Super 8GB VRAM
https://redd.it/1qcosvm
@rStableDiffusion