r/StableDiffusion – Telegram
Last week in Image & Video Generation

I curate a weekly newsletter on multimodal AI. Here are the image & video generation highlights from this week:

**One Attention Layer is Enough (Apple)**

* Apple shows that a single attention layer can turn vision features into SOTA generators.
* Dramatically simplifies diffusion architecture without sacrificing quality.
* [Paper](https://arxiv.org/abs/2512.07829)

https://preview.redd.it/ggv1v459qb7g1.jpg?width=2294&format=pjpg&auto=webp&s=7c830bb9a64cfeddf7442910e7eef6c6dff935e1

**DMVAE - Reference-Matching VAE**

* Matches latent distributions to any reference for controlled generation.
* Achieves state-of-the-art synthesis with fewer training epochs.
* [Paper](https://huggingface.co/papers/2512.07778) | [Model](https://huggingface.co/sen-ye/dmvae/tree/main)

https://preview.redd.it/ve5tk92aqb7g1.jpg?width=692&format=pjpg&auto=webp&s=6e1edf72b4f45677759b78d7d9e73cd59aef20d2

**Qwen-Image-i2L - Image to Custom LoRA**

* First open-source tool that converts a single image into a custom LoRA.
* Enables personalized generation from minimal input.
* [ModelScope](https://modelscope.cn/models/DiffSynth-Studio/Qwen-Image-i2L/summary) | [Code](https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/qwen_image/model_inference_low_vram/Qwen-Image-i2L.py)

https://preview.redd.it/or5kkkhgqb7g1.jpg?width=1640&format=pjpg&auto=webp&s=dc88bd866947cf89a3a564832dfbae4253e5638b

**RealGen - Photorealistic Generation**

* Uses detector-guided rewards to improve text-to-image photorealism.
* Optimizes for perceptual realism beyond standard training.
* [Website](https://yejy53.github.io/RealGen/) | [Paper](https://huggingface.co/papers/2512.00473) | [GitHub](https://github.com/yejy53/RealGen?tab=readme-ov-file) | [Models](https://huggingface.co/lokiz666/Realgen-detection-models)

https://preview.redd.it/wpnnvh6iqb7g1.jpg?width=1200&format=pjpg&auto=webp&s=ae33b572b90d969db7655bb4dc948117149867a4

**Qwen 360 Diffusion - 360° Text-to-Image**

* State-of-the-art text-to-360° image generation.
* Best-in-class immersive content creation.
* [Hugging Face](https://huggingface.co/ProGamerGov/qwen-360-diffusion) | [Viewer](https://progamergov.github.io/html-360-viewer/)

**Shots - Cinematic Multi-Angle Generation**

* Generates 9 cinematic camera angles from one image with consistency.
* Keeps visual coherence across the different viewpoints.
* [Post](https://x.com/higgsfield_ai/status/1998895357707825503?s=20)

https://reddit.com/link/1pn1xym/video/2floylaoqb7g1/player

**Nano Banana Pro Solution (ComfyUI)**

* Efficient workflow generating 9 distinct 1K images from 1 prompt.
* \~3 cents per image with improved speed.
* [Post](https://x.com/hellorob/status/1999537115168636963?s=42)

https://reddit.com/link/1pn1xym/video/g8hk35mpqb7g1/player

Check out the [full newsletter](https://open.substack.com/pub/thelivingedge/p/last-week-in-multimodal-ai-37-less?utm_campaign=post-expanded-share&utm_medium=web) for more demos, papers, and resources (couldn't add all the images/videos due to Reddit's limit).

https://redd.it/1pn1xym
@rStableDiffusion
My LoRA "PONGO" is available on Civitai - link in the first comment
https://redd.it/1pmzw3x
@rStableDiffusion
Qwen Image Edit 2511!!!! Alibaba is cooking.
https://redd.it/1pn4it4
@rStableDiffusion