
r/StableDiffusion

Last week in Image & Video Generation

I curate a weekly newsletter on multimodal AI. Here are the image and video generation highlights from this week:

ViBT - 20B Vision Bridge Transformer

Direct trajectory modeling for conditional image and video generation.
4x faster than comparable models through unified data-to-data translation.
[Website](https://yuanshi9815.github.io/ViBT_homepage/) | [Paper](https://huggingface.co/papers/2511.23199) | [GitHub](https://github.com/Yuanshi9815/ViBT) | [Demo](https://huggingface.co/spaces/Yuanshi/ViBT) | [Model](https://huggingface.co/Yuanshi/ViBT)

https://reddit.com/link/1ph9i7o/video/m29ko6p6my5g1/player

Stable Video Infinity 2.0

Long video generation that maintains temporal consistency.
Open-source release with full weights, plus ComfyUI support through the KJ version.
Hugging Face | GitHub | KJ ComfyUI

Live Avatar (Alibaba) - Streaming Avatar Generation

Real-time, audio-driven avatar generation of unbounded length.
A streaming architecture removes fixed limits on output duration.
[Website](https://liveavatar.github.io/) | [Paper](https://arxiv.org/abs/2512.04677) | [GitHub](https://github.com/Alibaba-Quark/LiveAvatar) | [Hugging Face](https://huggingface.co/Quark-Vision/Live-Avatar)

https://reddit.com/link/1ph9i7o/video/gfg5k5ccmy5g1/player

Reward Forcing (Alibaba) - Real-Time Streaming Video

Interactive video generation that can be modified in real time.
A 1.3B-parameter model enabling streaming video workflows.
Website | Paper | Hugging Face | GitHub

LongCat Image - 6B Image Generation

An efficient 6B-parameter image generation model.
Aims to balance output quality with computational cost.
[Hugging Face](https://huggingface.co/meituan-longcat/LongCat-Image) | [GitHub](https://github.com/meituan-longcat/LongCat-Image)

YingVideo-MV - Portrait Animation

Animates static portraits into singing performances with audio synchronization.
Handles facial expressions and lip-sync from audio input.
Website | Paper | GitHub

https://reddit.com/link/1ph9i7o/video/ybf3hkmemy5g1/player

BlockVid - Minute-Long Video Generation

Block diffusion approach for high-quality, temporally consistent long videos.
Handles minute-long generation while maintaining coherence.
[Paper](https://huggingface.co/papers/2511.22973)

https://reddit.com/link/1ph9i7o/video/3mdbw4jfmy5g1/player

NeuralRemaster - Structure-Aligned Generation

Phase-preserving diffusion for structure-aligned image generation.
Maintains structural consistency throughout the generation process.
Paper

https://reddit.com/link/1ph9i7o/video/7ccqwyegmy5g1/player

Infinity-RoPE Framework

Training-free approach for unlimited-length video generation.
Extends video sequences without additional model training.
[Website](https://infinity-rope.github.io/) | [Paper](https://arxiv.org/abs/2511.20649)

< I can't add more videos to this post, but more videos and demos are available in my [free newsletter](https://open.substack.com/pub/thelivingedge/p/last-week-in-multimodal-ai-36-factual?r=12l7fk&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true) >

Community Highlight: Video Models on 4GB VRAM

yanokusnir runs SOTA video models on 4GB VRAM and 16GB RAM.
An impressive demonstration of optimization techniques on consumer hardware.
Reddit Thread - I can't add more videos to this post, but this video is available in this thread.

Community Highlight: SOTA Image Model Comparison

[BoostPixels](https://www.reddit.com/user/BoostPixels/) compares Z-Image-Turbo, Gemini 3 Pro, and Qwen Image Edit 2509 on uncanny valley performance.
Reddit Thread

https://preview.redd.it/bcukexdkny5g1.png?width=1080&format=png&auto=webp&s=cfec4e72b99f305fd1e8e3901f4f59a6bac512ce

Community Highlight: NanoBanana Pro LoRA Dataset Generator

[Lovis Odin](https://x.com/OdinLovis) releases a tool for creating training datasets for Flux 2, Z-Image, Qwen Image Edit, and other image-to-image models.
Simplifies dataset creation for fine-tuning workflows.
[Post](https://x.com/OdinLovis/status/1996683967861608839?s=20) | [Website](http://lovis.io/NanoBananaLoraDatasetGenerator) | [GitHub](https://github.com/lovisdotio/NanoBananaLoraDatasetGenerator)

I couldn't add any more videos to this post, but more videos, demos, and resources are available in my free newsletter.

https://redd.it/1ph9i7o
@rStableDiffusion

Aquif-Image-14B Was a Stolen Model: The Real One Is Magic-Wan-Image V2.0
https://redd.it/1phd5gx

Flux 2 is good at meaningless prompts
https://redd.it/1phgjir

DC Vivid Dark Fantasy Painting & DC Dark Fantasy Style 1 [Z-Image Turbo Loras]

https://redd.it/1phgj87