Qwen3-TTS: a series of powerful speech generation models
https://redd.it/1qjuebr
@rStableDiffusion
PersonaPlex: Voice and role control for full-duplex conversational speech models, by NVIDIA
https://redd.it/1qjtpf1
@rStableDiffusion
LTX2 issues probably won't be fixed by loras/workflows
When Wan2.2 released, the speedup loras were a mess, there was mass confusion about getting enough motion out of characters, and the video-length limits produced a flood of hacky continuation workflows
But the core model always worked well: it had excellent prompt adherence, and it understood human movement and structure well
LTX2 at its peak exceeds Wan, and some of its outputs are brilliant in terms of fluid movement and quality
But the model is unstable, which results in a high fail rate. Whether a prompt will land as expected is a complete shot in the dark, and its rendering of human anatomy is fragile and often nonsensical
I'll admit LTX2 has made it difficult to go back to Wan, because when it's better, it's much better. But its core base simply needs more work, so I'm mostly holding out for LTX3
https://redd.it/1qjyoqz
@rStableDiffusion
AI girls flooding social media, including Reddit
Hi everyone,
I guess anyone who has worked with diffusion models for a while can spot that average 1girl AI look from a mile away.
By now I'm just curious: how do you deal with it? Do you report it or just ignore it?
Personally, I report it if the subreddit explicitly bans AI. But Instagram is so flooded with bots and accounts fishing for engagement that I feel like it's pointless to try and report every single one.
https://redd.it/1qk0vac
@rStableDiffusion
According to the Ace Step team, the ACE STEP 1.5 music model releases soon! This is what I got when I asked it for a mix of dubstep, arpeggios, gritty bassline, female singer, melodic. Nice mashup IMO.
https://redd.it/1qk2odv
@rStableDiffusion
How to render 80+ second videos with LTX-2 using one simple node and no extensions.
I've had amazing results with this node:
Reddit: Enabling 800-900+ frame videos (at 1920x1088) on a single 24GB GPU Text-To-Video in ComfyUI
Github: ComfyUI_LTX-2_VRAM_Memory_Management
From the github repo:
"Generate extremely long videos with LTX-2 on consumer GPUs
This custom node dramatically reduces VRAM usage for LTX-2 video generation in ComfyUI, enabling 800-900+ frames (at 1920x1088) on a single 24GB GPU. LTX-2's FeedForward layers create massive intermediate tensors that normally limit video length. This node chunks those operations to reduce peak memory by up to 8x, without any quality loss."
This really helps prevent OOMs, especially if you have less VRAM.
You can add this node to any existing LTX-2 workflow, no need to reinvent the wheel.
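For intuition, here's a minimal PyTorch sketch of the chunking idea (my own illustration, not the node's actual code; the function name and sizes are made up). A transformer FeedForward block acts on each token independently, so you can split the sequence into pieces and run them one at a time: the expanded intermediate tensor only ever exists for one piece, and the output matches the unchunked pass, which is why there's no quality loss.

```python
import torch
import torch.nn as nn

def chunked_feedforward(ff: nn.Module, x: torch.Tensor, num_chunks: int = 8) -> torch.Tensor:
    # x: (batch, seq_len, dim). FeedForward is position-wise, so processing
    # the sequence in pieces changes peak memory, not the result: a
    # (seq_len / num_chunks, 4*dim) intermediate replaces the full
    # (seq_len, 4*dim) one -- roughly the "up to 8x" saving at num_chunks=8.
    return torch.cat([ff(part) for part in x.chunk(num_chunks, dim=1)], dim=1)

# Hypothetical sizes for illustration: a long video means a very long token sequence.
ff = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
x = torch.randn(1, 16384, 1024)
assert torch.allclose(chunked_feedforward(ff, x), ff(x), atol=1e-5)
```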
I just finished a 960x544, 2000-frame / 80 sec. render in 17 minutes on a 4090 (24 GB VRAM) with 64 GB system RAM. In the past, there was no way I'd come close to these results. Lip-sync and image quality hold throughout the video.
This project is a work in progress and the author is actively seeking feedback.
Go get chunked!
https://redd.it/1qke313
@rStableDiffusion
Flux Dev 1 vs Z-Image. Unfortunately, Z-Image is very poor at generating landscapes (the same applies to WAN, but it's very good with interior scenes)
https://redd.it/1qke70f
@rStableDiffusion
Tensorstack Diffuse v0.4.0 beta just dropped about an hour ago and I like it!
https://redd.it/1qkjxsv
@rStableDiffusion
Flux.2 / Klein Inpaint Segment Edit (edit segments one-by-one!)
https://redd.it/1qkkv6q
@rStableDiffusion
Follow-up: AI 3D generation has improved a lot, with better topology & texturing results
https://redd.it/1qkl12j
@rStableDiffusion