What is your current favorite realistic-image checkpoint?
Newish to Stable Diffusion. What checkpoints do you currently use to generate the most realistic AI images? Thank you.
https://redd.it/1pgsw96
@rStableDiffusion
Last week in Image & Video Generation
I curate a weekly newsletter on multimodal AI. Here are the image and video generation highlights from this week:
ViBT - 20B Vision Bridge Transformer
Scales Brownian bridge models to billions of parameters, directly modeling trajectories between data for conditional image and video generation.
4x faster than comparable models through unified data-to-data translation, using a transformer backbone with a stabilized velocity loss.
[Website](https://yuanshi9815.github.io/ViBT_homepage/) | [Paper](https://huggingface.co/papers/2511.23199) | [GitHub](https://github.com/Yuanshi9815/ViBT) | [Demo](https://huggingface.co/spaces/Yuanshi/ViBT) | [Model](https://huggingface.co/Yuanshi/ViBT)
https://reddit.com/link/1ph9i7o/video/m29ko6p6my5g1/player
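The homepage pitches ViBT as Brownian bridge modeling at scale; purely as an illustration of that idea (not ViBT's actual code), here is a minimal sketch of bridge sampling and a velocity-regression training step, with a stand-in network:

```python
import torch

def brownian_bridge_sample(x0, x1, t, sigma=1.0):
    """Sample x_t on a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    The mean interpolates linearly and the variance sigma^2 * t * (1 - t)
    vanishes at both endpoints, pinning the path to the data on each side.
    """
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * torch.sqrt(t * (1.0 - t))
    return mean + std * torch.randn_like(x0)

# Toy training step: regress a stand-in network onto the straight-line
# velocity x1 - x0 (the release mentions a stabilized velocity loss; this
# plain MSE is only the textbook baseline).
model = torch.nn.Conv2d(3, 3, kernel_size=1)   # stand-in for the transformer
x0 = torch.randn(4, 3, 64, 64)                 # source data (condition)
x1 = torch.randn(4, 3, 64, 64)                 # target data
t = torch.rand(4, 1, 1, 1)                     # per-sample time in (0, 1)
xt = brownian_bridge_sample(x0, x1, t)
loss = ((model(xt) - (x1 - x0)) ** 2).mean()   # real model would also see t
loss.backward()
```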
Stable Video Infinite 2.0
Extended video generation with maintained temporal consistency.
Open-source release with full weights and ComfyUI support through KJ version.
Hugging Face | GitHub | KJ ComfyUI
Live Avatar (Alibaba) - Streaming Avatar Generation
Real-time audio-driven avatar generation with infinite length.
Streaming architecture removes time constraints from generation.
[Website](https://liveavatar.github.io/) | [Paper](https://arxiv.org/abs/2512.04677) | [GitHub](https://github.com/Alibaba-Quark/LiveAvatar) | [Hugging Face](https://huggingface.co/Quark-Vision/Live-Avatar)
https://reddit.com/link/1ph9i7o/video/gfg5k5ccmy5g1/player
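LiveAvatar's actual interface is in the linked repo; the general shape of a streaming, audio-driven generator (sketched with hypothetical `init_state`/`step` methods) is a loop that consumes fixed-size audio chunks and carries state forward, so output length is unbounded:

```python
from typing import Iterable, Iterator

CHUNK_SAMPLES = 16000 // 25  # hypothetical: one frame's worth of 16 kHz audio

def stream_avatar(audio_chunks: Iterable[bytes], model) -> Iterator[object]:
    """Yield one frame per audio chunk, carrying recurrent state forward.

    Because state is passed between steps instead of attending over the whole
    clip, memory stays constant and the stream can run indefinitely.
    """
    state = model.init_state()                   # hypothetical API
    for chunk in audio_chunks:
        frame, state = model.step(chunk, state)  # hypothetical API
        yield frame                              # hand off to encoder/display
```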
Reward Forcing (Alibaba) - Real-Time Streaming Video
Interactive video generation with real-time modification capabilities.
1.3B parameter model enabling streaming video workflows.
Website | Paper | Hugging Face | GitHub
LongCat Image - 6B Image Generation
Efficient 6B parameter model for image generation.
Balances quality with computational efficiency.
[Hugging Face](https://huggingface.co/meituan-longcat/LongCat-Image) | [GitHub](https://github.com/meituan-longcat/LongCat-Image)
YingVideo-MV - Portrait Animation
Animates static portraits into singing performances with audio synchronization.
Handles facial expressions and lip-sync from audio input.
Website | Paper | GitHub
https://reddit.com/link/1ph9i7o/video/ybf3hkmemy5g1/player
BlockVid - Minute-Long Video Generation
Block diffusion approach for high-quality, consistent extended videos.
Handles minute-long generation with maintained coherence.
[Paper](https://huggingface.co/papers/2511.22973)
https://reddit.com/link/1ph9i7o/video/3mdbw4jfmy5g1/player
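The paper has the actual mechanism; as a generic sketch of block-wise long-video generation (with a hypothetical `sample_block` sampler), each new block is conditioned on the tail frames of the previous one so coherence carries across block boundaries:

```python
import torch

BLOCK_FRAMES = 16  # frames denoised per block (illustrative)
OVERLAP = 4        # trailing frames reused as context for the next block

def generate_long_video(sample_block, num_blocks: int) -> torch.Tensor:
    """Chain blocks into one clip; `sample_block(context)` is a hypothetical
    sampler that denoises BLOCK_FRAMES frames given the prior block's tail."""
    blocks = [sample_block(context=None)]             # first block: no context
    for _ in range(num_blocks - 1):
        context = blocks[-1][-OVERLAP:]               # tail of previous block
        blocks.append(sample_block(context=context))  # condition on overlap
    return torch.cat(blocks, dim=0)                   # (T, C, H, W) frames
```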
NeuralRemaster - Structure-Aligned Generation
Phase-preserving diffusion for structure-aligned image generation.
Maintains structural consistency through generation process.
Paper
https://reddit.com/link/1ph9i7o/video/7ccqwyegmy5g1/player
Infinity-RoPE Framework
Training-free approach for unlimited length video generation.
Extends video sequences without additional model training.
[Website](https://infinity-rope.github.io/) | [Paper](https://arxiv.org/abs/2511.20649)
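Infinity-RoPE's exact recipe is in the paper; the lever it works with is rotary position embedding (RoPE), whose rotation angles can be rescaled at inference time to address positions never seen during training. A minimal 1-D sketch of standard RoPE with a simple position-rescaling knob (not the paper's actual scheme):

```python
import torch

def rope_rotate(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq, dim), dim even.

    scale < 1 compresses position indices (position interpolation), one
    common training-free way to reach sequence lengths beyond training.
    """
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = (pos[:, None] * scale) * inv_freq[None, :]  # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Queries at positions far past a nominal 1024-frame training length,
# compressed back into the trained range:
q = torch.randn(2048, 64)
q_rot = rope_rotate(q, torch.arange(2048).float(), scale=1024 / 2048)
```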
(I can't add more videos to this post, but more videos and demos are in my [free newsletter](https://open.substack.com/pub/thelivingedge/p/last-week-in-multimodal-ai-36-factual?r=12l7fk&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true).)
Community Highlight: Video Models on 4GB VRAM
yanokusnir runs SOTA video models on 4GB VRAM and 16GB RAM.
An impressive demonstration of optimization techniques on consumer hardware.
Reddit Thread
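The thread has the poster's exact setup; the usual diffusers levers for fitting large models into a small VRAM budget look like the snippet below (which of these yanokusnir actually used is not stated here, and not every pipeline exposes all of them):

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder model id; substitute the video pipeline you actually run.
pipe = DiffusionPipeline.from_pretrained("some/video-model",
                                         torch_dtype=torch.float16)

pipe.enable_sequential_cpu_offload()  # stream weights to GPU module by module
pipe.enable_attention_slicing()       # compute attention in smaller slices
pipe.enable_vae_slicing()             # decode latents one frame at a time
pipe.enable_vae_tiling()              # decode each frame in tiles
```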
Community Highlight: SOTA Image Model Comparison
[BoostPixels](https://www.reddit.com/user/BoostPixels/) compares Z-Image-Turbo, Gemini 3 Pro, and Qwen Image Edit 2509 on uncanny valley performance.
Reddit Thread
https://preview.redd.it/bcukexdkny5g1.png?width=1080&format=png&auto=webp&s=cfec4e72b99f305fd1e8e3901f4f59a6bac512ce
Community Highlight: NanoBanana Pro LoRA Dataset Generator
[Lovis Odin](https://x.com/OdinLovis) releases tool for creating training datasets for Flux 2, Z-Image, Qwen Image Edit, and other image-to-image models.
Simplifies dataset creation for fine-tuning workflows.
[Post](https://x.com/OdinLovis/status/1996683967861608839?s=20) | [Website](http://lovis.io/NanoBananaLoraDatasetGenerator) | [GitHub](https://github.com/lovisdotio/NanoBananaLoraDatasetGenerator)
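The tool's own export format is documented in its repo; as an illustration of what instruction-based image-editing fine-tuning generally consumes, here is a hypothetical JSONL manifest (not the tool's actual schema) pairing a source image, its edited target, and the edit text:

```python
import json

# Hypothetical manifest layout: one JSON object per line pairing a source
# image, its edited target, and the instruction used to produce the edit.
examples = [
    {"source": "imgs/0001_src.png",
     "target": "imgs/0001_edit.png",
     "instruction": "make it nighttime, keep the composition"},
]
with open("dataset.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```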
(I couldn't add any more videos to this post, but more videos, demos, and resources are available in my free newsletter.)
https://redd.it/1ph9i7o
@rStableDiffusion
Aquif-Image-14B Was a Stolen Model: The Real One Is Magic-Wan-Image V2.0
https://redd.it/1phd5gx
@rStableDiffusion