Tips on Running LTX2 on Low VRAM (8 GB, a little less, or a bit more)
There seems to be a lot of confusion here about how to run LTX2 on 8GB VRAM or other low-VRAM setups. I have been running it in a completely stable setup on an 8GB VRAM 4060 (Mobile) laptop with 64 GB RAM, generating 10-second videos at 768 x 768 within 3 minutes. In fact, I got most of my info from someone who was running the same stuff on 6GB VRAM and 32GB RAM. When done correctly, this throws out videos faster than Flux used to make single images. In my experience, the following things are critical; ignoring any of them results in failures.
Use the workflow provided by ComfyUI in their latest updates (LTX2 Image to Video). None of the versions provided by 3rd-party references worked for me. Use the same models in it (the distilled LTX2) and the Gemma variation below:
Use the fp8 version of Gemma (the one provided in the workflow is too heavy): download it separately, expand the workflow, and switch the CLIP loader to this version.
Increase the pagefile to 128 GB, as the model, CLIP, etc. take up around 90 to 105 GB of RAM + virtual memory to load. RAM alone, no matter how much, is usually not enough. Skipping this is the biggest failure point.
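(For reference, on Windows this is done under System Properties > Advanced > Performance > Settings > Advanced > Virtual memory > Change: untick "Automatically manage paging file size for all drives" and set a custom initial and maximum size of 131072 MB, i.e. 128 GB, on a drive with enough free space. The exact dialog path may vary slightly between Windows versions.)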
Use the flags Low VRAM (for 8 GB or less) or Reserve VRAM (for 8 GB+) in the executable/launcher file.
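(As far as I can tell, these correspond to ComfyUI's command-line switches --lowvram and --reserve-vram, the latter taking an amount of VRAM in GB to keep free, appended to the launch line in the portable build's run .bat, e.g. .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --lowvram, or ... --reserve-vram 1.0. Treat the exact option names as my assumption and confirm them with python main.py --help in your install.)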
Start with 480 x 480 and gradually work up to see what limit your hardware allows.
Finally, this:
In ComfyUI\comfy\ldm\lightricks\embeddings_connector.py
replace:
hidden_states = torch.cat((hidden_states, learnable_registers[hidden_states.shape[1]:].unsqueeze(0).repeat(hidden_states.shape[0], 1, 1)), dim=1)
with:
hidden_states = torch.cat((hidden_states, learnable_registers[hidden_states.shape[1]:].unsqueeze(0).repeat(hidden_states.shape[0], 1, 1).to(hidden_states.device)), dim=1)
(The added .to(hidden_states.device) just moves the learnable registers onto the same device as the hidden states before the concatenation, which avoids a device-mismatch error when parts of the model are offloaded.)
Did all this after a day of banging my head against the wall and almost giving up, then found this info from multiple places. With all of the above, I did not have a single issue.
https://redd.it/1q87hdn
@rStableDiffusion
A 20-second LTX2 video on a 3090 in only 2 minutes at 720p. Wan2GP, not Comfy this time
https://redd.it/1q8e2g8
@rStableDiffusion
Stop using T2V & Best Practices IMO (LTX Video / ComfyUI Guide)
https://redd.it/1q8dxon
@rStableDiffusion
How Many Male Genital Pics Does Z-Turbo Need for a Lora to work? Sheesh.
Trying to make a LoRA that can generate people with male genitalia. I gathered about 150 photos to train in AI Toolkit and so far the results are pure nightmare fuel... is this going to take like 1,000+ pictures to train? Any tips from those who have had success in this realm?
https://redd.it/1q8olqf
@rStableDiffusion
Another single 60-second test in LTX-2 with a more dynamic scene
https://redd.it/1q8plrd
@rStableDiffusion
All sorts of LTX-2 workflows. It's getting messy. Can we have Workflow Link + Description of what it achieves in the comments here, in a single place?
Could everyone with a workflow maybe comment/link it with a description/example?
https://redd.it/1q8o0d0
@rStableDiffusion
SDXL → Z-Image → SeedVR2, while the world burns with LTX-2 videos, here are a few images.
https://redd.it/1q8w47s
@rStableDiffusion
Open Source Needs Competition, Not Brain-Dead “WAN Is Better” Comments
Sometimes I wonder whether all these "WAN vs anything else, WAN is better" comments floating around aren't just a handful of organized Chinese users trying to tear down any other competitive model 😆
or (here's the sad truth) whether they're simply a bunch of idiots ready to spit on everything, even on what's handed to them for free right under their noses, who
haven't understood the importance of the competition that drives progress in this open-source sector, which is ESSENTIAL while we're all hanging by a thread, begging for production-ready tools that can compete with the big corporations.
WAN and LTX are two different things:
one was trained to create video and audio together.
I don’t know if you even have the faintest idea of how complex that is.
Just ENCOURAGE OPENSOURCE COMPETITION, help if you can, give polite comments and testing, then add your new toy to your arsenal! wtf. God you piss me off so much with those nasty fingers always ready to type bullshit against everything.
https://redd.it/1q8wt2b
@rStableDiffusion
Control the FAL Multiple-Angles-LoRA with Camera Angle Selector in a 3D view for Qwen-image-edit-2511
https://redd.it/1q90gq8
@rStableDiffusion
WOW!! I accidentally discovered that the native LTX-2 ITV workflow can use very short videos to make longer videos containing the exact kind of thing this model isn't supposed to do (example inside w/prompt and explanation itt)
BEFORE MAKING THIS THREAD, I was Googling around to see if anyone else had found this out. I thought for sure someone had stumbled on this. And they probably have. I probably just didn't see it or whatever, but I DID do my due diligence and search before making this thread.
At any rate, yesterday, while doing an ITV generation in LTX-2, I meant to copy/paste an image from a folder but accidentally copy/pasted a GIF I'd generated with WAN 2.2. To my surprise, despite GIF files being hidden when you click to load via the file browser, you can just straight-up copy and paste the GIF you made into the LTX-2 template workflow and use that as the ITV input, and it will actually go frame by frame and add sound to the GIF.
But THAT by itself is not why this is useful, because if you do only that, it won't change the actual video; it'll just add sound.
However, let's say you use a 2- or 3-second GIF, something just to establish a basic motion, say a certain "position" that the model doesn't understand. It can extend that, following along with what came before.
Thus, a 2-second clip of a 1girl moving up and down (I'll be vague about why) can easily become a 10-second clip with dialogue and the correct motion, because it has the first two seconds or so as reference.
Ideally, the shorter the GIF the better (33 frames works well): the least you need to capture the motion and details you want. Then of course there is some luck, but I have consistently gotten decent results in the hour I've played around with this. I have NOT put effort into making the video quality itself better; that, I imagine, can easily be done the ways people usually do it. I threw this example together to prove it CAN work.
The video output likely suffers from poor quality only because I am using much lower res than recommended.
Exact steps I used:
Wan 2.2 with a LORA for ... something that rhymes with "cowbirl monisiton"
I created a GIF using 33 frames at 16 fps (see the sketch after these steps).
Copy/pasted the GIF using Ctrl+C and Ctrl+V into the LTX-2 ITV workflow. Enter prompt, generate.
Used the following prompt: A woman is moving and bouncing up very fast while moaning and expressing great pleasure. She continues to make the same motion over and over before speaking. The woman screams, "[WORDS THAT I CANNOT SAY ON THIS SUB MOST LIKELY. BUT YOU'LL BE ABLE TO SEE IT IN THE COMMENTS]"
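(Not from the original post, but as a minimal sketch of the GIF-assembly step above, assuming the Wan 2.2 frames have been exported as numbered PNGs into a folder; the folder and file names here are made up for illustration:)
from pathlib import Path
from PIL import Image
# Load the 33 exported frames in order, e.g. wan_frames/frame_000.png ... frame_032.png
frames = [Image.open(p) for p in sorted(Path("wan_frames").glob("frame_*.png"))]
# 16 fps -> ~62 ms per frame; loop=0 makes the GIF loop forever
frames[0].save(
    "motion_reference.gif",
    save_all=True,
    append_images=frames[1:],
    duration=round(1000 / 16),
    loop=0,
)
The resulting motion_reference.gif is what then gets copy/pasted into the workflow as the ITV input, per the copy/paste step above.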
I have an example I'll link in the comments on Streamable. Mods, if this is unacceptable, please feel free to delete, and I will not take it personally.
Current Goal: Figuring out how to make a workflow that will generate a 2-second GIF and feed it automatically into the image input in LTX-2 video.
EDIT: if nothing else, this method also appears to guarantee non-static outputs. I don't believe it is capable of doing the "static" non-moving image thing when using this method, as it has motion to begin with and therefore cannot switch to static.
https://redd.it/1q94nlk
@rStableDiffusion