Images created with a corrupted version of Z‑Image using a Vibecoded tool called Universal Model Corruptor.
https://redd.it/1pgsf61
What are your current favorite realistic-image checkpoints?
Newish to Stable Diffusion. Which checkpoints do you currently use to generate the most realistic AI images? Thank you.
https://redd.it/1pgsw96
Last week in Image & Video Generation
I curate a weekly newsletter on multimodal AI. Here are the image and video generation highlights from this week:
ViBT - 20B Vision Bridge Transformer
Direct trajectory modeling for conditional image and video generation.
4x faster than comparable models through unified data-to-data translation.
[Website](https://yuanshi9815.github.io/ViBT_homepage/) | [Paper](https://huggingface.co/papers/2511.23199) | [GitHub](https://github.com/Yuanshi9815/ViBT) | [Demo](https://huggingface.co/spaces/Yuanshi/ViBT) | [Model](https://huggingface.co/Yuanshi/ViBT)
https://reddit.com/link/1ph9i7o/video/m29ko6p6my5g1/player
Stable Video Infinite 2.0
Extended video generation with maintained temporal consistency.
Open-source release with full weights and ComfyUI support through KJ version.
Hugging Face | GitHub | KJ ComfyUI
Live Avatar (Alibaba) - Streaming Avatar Generation
Real-time audio-driven avatar generation with infinite length.
Streaming architecture removes time constraints from generation.
[Website](https://liveavatar.github.io/) | [Paper](https://arxiv.org/abs/2512.04677) | [GitHub](https://github.com/Alibaba-Quark/LiveAvatar) | [Hugging Face](https://huggingface.co/Quark-Vision/Live-Avatar)
https://reddit.com/link/1ph9i7o/video/gfg5k5ccmy5g1/player
Reward Forcing (Alibaba) - Real-Time Streaming Video
Interactive video generation with real-time modification capabilities.
1.3B parameter model enabling streaming video workflows.
Website | Paper | Hugging Face | GitHub
LongCat Image - 6B Image Generation
Efficient 6B parameter model for image generation.
Balances quality with computational efficiency.
[Hugging Face](https://huggingface.co/meituan-longcat/LongCat-Image) | [GitHub](https://github.com/meituan-longcat/LongCat-Image)
YingVideo-MV - Portrait Animation
Animates static portraits into singing performances with audio synchronization.
Handles facial expressions and lip-sync from audio input.
Website | Paper | GitHub
https://reddit.com/link/1ph9i7o/video/ybf3hkmemy5g1/player
BlockVid - Minute-Long Video Generation
Block diffusion approach for high-quality, consistent extended videos.
Handles minute-long generation with maintained coherence.
[Paper](https://huggingface.co/papers/2511.22973)
https://reddit.com/link/1ph9i7o/video/3mdbw4jfmy5g1/player
NeuralRemaster - Structure-Aligned Generation
Phase-preserving diffusion for structure-aligned image generation.
Maintains structural consistency through generation process.
Paper
https://reddit.com/link/1ph9i7o/video/7ccqwyegmy5g1/player
Infinity-RoPE Framework
Training-free approach for unlimited length video generation.
Extends video sequences without additional model training.
[Website](https://infinity-rope.github.io/) | [Paper](https://arxiv.org/abs/2511.20649)
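The blurb above doesn't spell out Infinity-RoPE's exact mechanism, but as background: rotary position embeddings (RoPE) encode positions as rotations, and one family of training-free length-extension tricks simply remaps positions at inference time (e.g., position interpolation) so longer sequences fit the trained position range. A minimal numpy sketch of that idea, with illustrative sizes, not the paper's method:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding: rotate consecutive pairs of
    dimensions of x (shape (n, d), d even) by position-dependent angles."""
    n, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)        # (d/2,) per-pair frequencies
    angles = positions[:, None] * inv_freq[None, :]     # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                  # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Training-free extension by position interpolation: squeeze a longer
# sequence's positions back into the range seen during training.
train_len, test_len = 16, 64
q = np.random.default_rng(0).normal(size=(test_len, 8))
scaled_pos = np.arange(test_len) * (train_len / test_len)
q_rot = rope(q, scaled_pos)
```

Because RoPE is a pure rotation, it preserves vector norms, and attention scores between rotated queries and keys depend only on relative position, which is what makes inference-time position remapping possible without retraining.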
< Can't add more videos to this post, but there are more videos and demos in my [free newsletter](https://open.substack.com/pub/thelivingedge/p/last-week-in-multimodal-ai-36-factual?r=12l7fk&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true). >
Community Highlight: Video Models on 4GB VRAM
yanokusnir runs SOTA video models on 4GB VRAM and 16GB RAM.
An impressive demonstration of optimization techniques on consumer hardware.
Reddit Thread
Vision Bridge Transformer at Scale
Vision Bridge Transformer (ViBT) scales Brownian Bridge Models to billions of parameters for efficient image and video translation, using a transformer backbone and a stabilized velocity loss.
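The description above doesn't give ViBT's training objective, but the Brownian Bridge it builds on is easy to state: a diffusion-like process whose mean interpolates linearly between a source sample and a target sample, with variance t(1-t) that vanishes at both endpoints, so the process is pinned to real data on both ends (data-to-data translation rather than noise-to-data). A minimal numpy sketch of sampling such a bridge, with an illustrative sigma:

```python
import numpy as np

def brownian_bridge_sample(x0, x1, t, sigma=1.0, rng=None):
    """Sample the Brownian bridge between source x0 and target x1 at time t in [0, 1].
    Mean: (1-t)*x0 + t*x1.  Std: sigma * sqrt(t*(1-t)), zero at t=0 and t=1,
    so the process is pinned to x0 at the start and x1 at the end."""
    rng = rng or np.random.default_rng()
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.normal(size=np.shape(x0))

x0 = np.zeros(4)   # stand-in for a flattened source image
x1 = np.ones(4)    # stand-in for the flattened target image
mid = brownian_bridge_sample(x0, x1, 0.5, rng=np.random.default_rng(0))
```

A model trained on such bridges learns to predict the drift (or velocity) toward x1 from intermediate states, which is what lets a bridge model translate one image or video directly into another instead of denoising from pure Gaussian noise.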