TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
https://huggingface.co/inclusionAI/TwinFlow-Z-Image-Turbo
https://redd.it/1q3lrk6
@rStableDiffusion
Wan2.2 : better results with lower resolution?
Usually I do a test by generating at a low resolution like 480x480; if I like the results, I generate at a higher resolution.
But in some cases I find the low-resolution generations to be better in prompt adherence and more natural-looking, while higher resolutions like 720x720 sometimes look weird.
Anyone else notice the same?
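A minimal sketch of that preview-then-upscale test, assuming the diffusers WanPipeline and the Wan 2.1 1.3B Diffusers checkpoint (the model name, resolutions, and settings are illustrative; a Wan 2.2 or ComfyUI setup would differ):

```python
# Rough sketch: render the same seed at a low "preview" resolution and a
# higher "final" resolution, then compare the two clips side by side.
# Assumes the diffusers WanPipeline; checkpoint name and numbers are illustrative.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "a red fox running through snow, cinematic lighting"
seed = 42  # keep the seed fixed so both runs are comparable

for label, (w, h) in {"preview": (480, 480), "final": (720, 720)}.items():
    generator = torch.Generator("cuda").manual_seed(seed)
    frames = pipe(
        prompt=prompt,
        width=w,
        height=h,
        num_frames=49,
        guidance_scale=5.0,
        generator=generator,
    ).frames[0]
    export_to_video(frames, f"{label}_{w}x{h}.mp4", fps=16)
```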
https://redd.it/1q3lq5n
@rStableDiffusion
Release: Invoke AI 6.10 - now supports Z-Image Turbo
The new Invoke AI v6.10.0 RC1 now supports Z-Image Turbo... https://github.com/invoke-ai/InvokeAI/releases
https://redd.it/1q3ruuo
@rStableDiffusion
Time-lapse of a character creation process using Qwen Edit 2511
https://redd.it/1q3sb0z
@rStableDiffusion
The Z-Image Turbo LoRA-Training Townhall
Okay guys, I think we all know that bringing up training on Reddit is always a total fustercluck. It's an art more than it is a science. To that end I'm proposing something slightly different...
Put your steps, dataset image count and anything else you think is relevant in a quick, clear comment. If you agree with someone else's comment, upvote them.
I'll run training for as many of the most upvoted setups as I can with an example dataset, and we can do a science on it.
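Purely as an illustration of the kind of comment being asked for, here is a hypothetical settings summary (every value below is made up, not a recommendation):

```python
# Hypothetical example of a training-settings comment, written as a dict.
# None of these numbers are recommendations; they only show the fields
# (steps, dataset image count, etc.) the post is asking people to report.
example_comment = {
    "model": "Z-Image Turbo",
    "dataset_images": 30,
    "repeats": 10,
    "steps": 2000,
    "learning_rate": 1e-4,
    "network_rank": 16,
    "resolution": 1024,
    "notes": "short natural-language captions",
}

print(", ".join(f"{k}={v}" for k, v in example_comment.items()))
```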
https://redd.it/1q3tcae
@rStableDiffusion
Turned myself into a GTA-style character. Kinda feels illegal
https://redd.it/1q3vjp7
@rStableDiffusion
WAN2.2 SVI v2.0 Pro Simplicity - infinite prompt, separate prompt lengths
https://redd.it/1q3wjyo
@rStableDiffusion
SVI: One simple change fixed my slow motion and lack of prompt adherence...
https://redd.it/1q45liy
@rStableDiffusion
LTXV2 Pull Request In Comfy, Coming Soon? (weights not released yet)
https://github.com/comfyanonymous/ComfyUI/pull/11632
Looking at the PR, it seems to support audio and to use Gemma 3 12B as the text encoder.
The previous LTX models had speed but nowhere near the quality of Wan 2.2 14B.
LTX 0.9.7 actually followed prompts quite well and had a good way of handling infinite-length generation in Comfy: you just put in prompts delimited by a '|' character. The dev team behind LTX clearly cares: the workflows are nicely organised, they release distilled and non-distilled versions the same day, etc.
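A rough sketch of what that convention boils down to (not the actual ComfyUI/LTX node code, just an illustration of how a '|'-delimited prompt might be mapped onto frame segments):

```python
# Illustration of the '|'-delimited prompt convention described above:
# one long prompt string becomes an ordered list of per-segment prompts,
# each paired with a chunk of the total frame count.
def split_prompt_segments(prompt: str, total_frames: int):
    segments = [p.strip() for p in prompt.split("|") if p.strip()]
    frames_per_segment = total_frames // len(segments)
    plan = []
    start = 0
    for i, seg in enumerate(segments):
        # the last segment absorbs any remainder frames
        end = total_frames if i == len(segments) - 1 else start + frames_per_segment
        plan.append((seg, start, end))
        start = end
    return plan

plan = split_prompt_segments(
    "a man walks into a bar | he orders a drink | he leaves at sunset",
    total_frames=241,
)
for seg, start, end in plan:
    print(f"frames {start}-{end - 1}: {seg}")
```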
There seems to be something about Wan 2.2 that makes it avoid body horror and keep coherence when doing more complex things. Smaller/faster models like Wan 5B, Hunyuan 1.5, and even the old Wan 1.3B CAN produce really good results, but 90% of the time you'll get weird body horror or artifacts somewhere in the video, whereas with Wan 2.2 it feels more like 20%.
On top of that, some of the models break down a lot quicker at lower resolutions, so you're forced into higher res, partially losing the speed benefits, or they have a high-quality but stupidly slow VAE (HY 1.5 and Wan 5B are like this).
I hope LTX can achieve that while being faster, or improve on Wan (more consistent, less dice-roll prompt following, similar to Qwen Image / Z-Image, which might be likely thanks to Gemma as the text encoder) while being the same speed.
https://redd.it/1q49ulp
@rStableDiffusion
GLM-Image AR Model Support by zRzRzRzRzRzRzR · Pull Request #43100 · huggingface/transformers
https://github.com/huggingface/transformers/pull/43100/files
https://redd.it/1q42gv8
@rStableDiffusion
I open-sourced a tool that turns any photo into a playable Game Boy ROM using AI
https://redd.it/1q4pgaa
@rStableDiffusion
I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA.
Hi everyone. **I’m Zeev Farbman, Co-founder & CEO of Lightricks.**
I’ve spent the last few years working closely with our team on [LTX-2](https://ltx.io/model), a production-ready audio–video foundation model. This week, we did a full open-source release of LTX-2, including weights, code, a trainer, benchmarks, LoRAs, and documentation.
Open releases of multimodal models are rare, and when they do happen, they’re often hard to run or hard to reproduce. We built LTX-2 to be something you can actually use: it runs locally on consumer GPUs and powers real products at Lightricks.
**I’m here to answer questions about:**
* Why we decided to open-source LTX-2
* What it took to ship an open, production-ready AI model
* Tradeoffs around quality, efficiency, and control
* Where we think open multimodal models are going next
* Roadmap and plans
Ask me anything!
I’ll answer as many questions as I can, with some help from the LTX-2 team.
*Verification:*
[Lightricks CEO Zeev Farbman](https://preview.redd.it/3oo06hz2x4cg1.jpg?width=2400&format=pjpg&auto=webp&s=4c3764327c90a1af88b7e056084ed2ac8f87c60b)
https://redd.it/1q7dzq2
@rStableDiffusion
LTX-2 For Low V-RAM: Audio-Video Model Using Comfy UI (720p & 1080p Videos)
https://youtu.be/XOVF0wIAMQQ
https://redd.it/1q6wzqh
@rStableDiffusion
The LTX-2 team is literally challenging the Alibaba Wan team; this was shared on their official X account :)
https://redd.it/1q7kygr
@rStableDiffusion