r/StableDiffusion – Telegram
I open-sourced a tool that turns any photo into a playable Game Boy ROM using AI

https://redd.it/1q4pgaa
@rStableDiffusion
I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA.

Hi everyone. **I’m Zeev Farbman, Co-founder & CEO of Lightricks.**

I’ve spent the last few years working closely with our team on [LTX-2](https://ltx.io/model), a production-ready audio–video foundation model. This week, we did a full open-source release of LTX-2, including weights, code, a trainer, benchmarks, LoRAs, and documentation.

Open releases of multimodal models are rare, and when they do happen, they’re often hard to run or hard to reproduce. We built LTX-2 to be something you can actually use: it runs locally on consumer GPUs and powers real products at Lightricks.

**I’m here to answer questions about:**

* Why we decided to open-source LTX-2
* What it took to ship an open, production-ready AI model
* Tradeoffs around quality, efficiency, and control
* Where we think open multimodal models are going next
* Roadmap and plans

Ask me anything!
I’ll answer as many questions as I can, with some help from the LTX-2 team.

*Verification:*

[Lightricks CEO Zeev Farbman](https://preview.redd.it/3oo06hz2x4cg1.jpg?width=2400&format=pjpg&auto=webp&s=4c3764327c90a1af88b7e056084ed2ac8f87c60b)



https://redd.it/1q7dzq2
@rStableDiffusion
LTX-2 team literally challenging the Alibaba Wan team; this was shared on their official X account :)

https://redd.it/1q7kygr
@rStableDiffusion
Someone posted today about Sage Attention 3; I tested it, and here are my results

Hardware: RTX 5090 + 64GB DDR4 RAM.

Test: same input image, same prompt, 121 frames, 16 fps, 720x1280

1. Lightx2v high/low models (not loras) + sage attention node set to auto: 160 seconds
2. Lightx2v high/low models (not loras) + sage attention node set to sage3: 85 seconds
3. Lightx2v high/low models (not loras) + no sage attention: 223 seconds
4. Full WAN 2.2 fp16 models, no loras + sage 3: 17 minutes
5. Full WAN 2.2 fp16, no loras, no sage attention: 24.5 minutes

Quality best to worst: 5 > 1&2 > 3 > 4

I'm too lazy to upload all the generations, but here's what's important:

4. Wan 2.2 fp16 + sage3: https://files.catbox.moe/a3eosn.mp4 (quality speaks for itself)

2. lightx2v + sage3: https://files.catbox.moe/nd9dtz.mp4

3. lightx2v, no sage attention: https://files.catbox.moe/ivhy68.mp4

hope this helps.

Edit: if anyone wants to test this, here is how I installed sage3 and got it running in ComfyUI portable:

Note 1: do this at your own risk. I personally have multiple working copies of ComfyUI portable in case anything goes wrong.

Note 2: this assumes you have Triton installed, which you should already have if you use SageAttention 2.2.

1. Download the wheel that matches your CUDA, PyTorch, and Python versions from here: https://github.com/mengqin/SageAttention/releases/tag/20251229 (the version-check snippet after this list shows how to find yours)
2. Place the wheel in your .\python_embeded\ folder
3. Run this in a command prompt: "ComfyUI\python_embeded\python.exe -m pip install full_wheel_name.whl" (replace full_wheel_name.whl with the actual file name)
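If you're not sure which wheel to pick, a quick sanity check is to print the versions the wheel has to match, using the same embedded interpreter you'll install into. This is just a minimal sketch; the file name check_versions.py is an example, not something that ships with ComfyUI:

```python
# check_versions.py -- print the versions the SageAttention wheel has to match.
# Run with the embedded interpreter, e.g.: python_embeded\python.exe check_versions.py
import sys
import torch

print("Python :", sys.version.split()[0])   # e.g. 3.12.x -> look for a cp312 wheel
print("PyTorch:", torch.__version__)        # e.g. 2.7.1+cu128
print("CUDA   :", torch.version.cuda)       # e.g. 12.8 -> look for a cu128 build
print("GPU    :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no CUDA device")
```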

https://redd.it/1q7yzsp
@rStableDiffusion
Thx to Kijai, LTX-2 GGUFs are now up. Even Q6 is better quality than FP8 imo.

https://redd.it/1q8590s
@rStableDiffusion
Tips on Running LTX2 on Low VRAM (8GB, a little less or more)

There seems to be a lot of confusion here about how to run LTX2 on 8GB VRAM or other low-VRAM setups. I have been running it in a completely stable setup on an 8GB VRAM 4060 (Mobile) laptop with 64 GB RAM, generating 10-second videos at 768x768 within 3 minutes. In fact, I got most of my info from someone who was running the same stuff on 6GB VRAM and 32GB RAM. When done correctly, this throws out videos faster than Flux used to make single images. In my experience, the following things are critical, and ignoring any of them results in failures.

1. Use the workflow provided by ComfyUI in their latest updates (LTX2 Image to Video). None of the versions provided by 3rd-party references worked for me. Use the same models it ships with (the distilled LTX2) and the variation of Gemma below.
2. Use the fp8 version of Gemma (the one provided in the workflow is too heavy): expand the workflow and change the clip to this version after downloading it separately.
3. Increase the pagefile to 128 GB, as the model, clip, etc. take up more than 90 to 105 GB of RAM + virtual memory to load. RAM alone, no matter how much, is usually never enough. This is the biggest failure point if not done.
4. Use the flags: Low VRAM (for 8GB or less) or Reserve VRAM (for 8GB+) in the executable file (see the example launch file after this list).
5. Start with 480x480 and gradually work up to see what limit your hardware allows.
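For reference, the flag edit in step 4 goes into the portable launch file. This is roughly what it looks like, assuming the stock run_nvidia_gpu.bat that ships with ComfyUI portable and ComfyUI's standard --lowvram / --reserve-vram flags; adjust paths and the reserve amount to your setup:

```
rem run_nvidia_gpu.bat -- pick ONE of the two launch lines below
rem 8GB VRAM or less:
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --lowvram
rem more than 8GB: instead keep ~1 GB of VRAM free for the OS and display
rem .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --reserve-vram 1.0
pause
```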
Finally, this:

In ComfyUI\comfy\ldm\lightricks\embeddings_connector.py

replace:

hidden_states = torch.cat((hidden_states, learnable_registers[hidden_states.shape[1]:].unsqueeze(0).repeat(hidden_states.shape[0], 1, 1)), dim=1)

with

hidden_states = torch.cat((hidden_states, learnable_registers[hidden_states.shape[1]:].unsqueeze(0).repeat(hidden_states.shape[0], 1, 1).to(hidden_states.device)), dim=1)
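The only difference is the .to(hidden_states.device) at the end: torch.cat() refuses to concatenate tensors that live on different devices, and with aggressive offloading the learnable registers can still be sitting on the CPU while hidden_states is already on the GPU. A standalone sketch of the same pattern (toy shapes and reused names for illustration, not the actual ComfyUI code):

```python
import torch

# Stand-in shapes: (batch, seq_len, dim) hidden states on the GPU,
# (num_registers, dim) learnable registers possibly left on the CPU by offloading.
device = "cuda" if torch.cuda.is_available() else "cpu"
hidden_states = torch.randn(2, 4, 8, device=device)
learnable_registers = torch.randn(16, 8)  # no device given -> CPU

# Take the registers beyond the current sequence length, expand to the batch size,
# and move them to wherever hidden_states lives before concatenating.
pad = learnable_registers[hidden_states.shape[1]:]            # (12, 8)
pad = pad.unsqueeze(0).repeat(hidden_states.shape[0], 1, 1)   # (2, 12, 8)
hidden_states = torch.cat((hidden_states, pad.to(hidden_states.device)), dim=1)
print(hidden_states.shape)  # torch.Size([2, 16, 8])
```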

... Did all of this after a day of banging my head around and almost giving up, then found this info in multiple places ... with all of the above, I did not have a single issue.

https://redd.it/1q87hdn
@rStableDiffusion
20-second LTX2 video on a 3090 in only 2 minutes at 720p. Wan2GP, not Comfy this time

https://redd.it/1q8e2g8
@rStableDiffusion