r/StableDiffusion – Telegram
RTX 5080 + SageAttention 3 — 2K Video in 5.7 Minutes (WSL2, CUDA 13.0)

**Repository:** [github.com/k1n0F/sageattention3-blackwell-wsl2](https://github.com/k1n0F/sageattention3-blackwell-wsl2)

I’ve completed the full **SageAttention 3 Blackwell build** under **WSL2 + Ubuntu 22.04**, using **CUDA 13.0 / PyTorch 2.10.0-dev**.
The build, compiled for Blackwell (SM_120), runs stably inside **ComfyUI + WAN Video Wrapper** and fully detects the **FP4 quantization API**.

**Results:**

* 125 frames @ 1984×1120
* Runtime: 341 seconds (\~5.7 minutes)
* VRAM usage: 9.95 GB max allocated, 10.65 GB reserved (see the readout sketch after this list)
* FP4 API detected: `scale_and_quant_fp4`, `blockscaled_fp4_attn`, `fp4quant_cuda`
* Device: RTX 5080 (Blackwell SM_120)
* Platform: WSL2 Ubuntu 22.04 + CUDA 13.0
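
The peak and reserved figures above line up with what PyTorch's CUDA allocator statistics report. A minimal sketch of reading them out after a run, assuming a CUDA-enabled PyTorch build (the helper name is mine):

```python
import torch

def report_vram(device: int = 0) -> None:
    """Print peak allocated and reserved VRAM for a CUDA device, in GB."""
    gib = 1024 ** 3
    print(f"Max allocated memory: {torch.cuda.max_memory_allocated(device) / gib:.3f} GB")
    print(f"Max reserved memory:  {torch.cuda.max_memory_reserved(device) / gib:.3f} GB")

# Call once the workload (e.g. the ComfyUI prompt) has finished:
# report_vram()
```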

# Summary

* Built **PyTorch 2.10.0-dev + CUDA 13.0** from source
* Compiled SageAttention3 with `TORCH_CUDA_ARCH_LIST="12.0+PTX"`
* Fixed all major issues: `-lcuda`, `allocator mismatch`, `checkPoolLiveAllocations`, `CUDA_HOME`, `Python.h`, missing module imports
* Verified presence of the FP4 quantization and attention kernels (not yet used in inference; verification sketched after this list)
* Achieved stable runtime under ComfyUI with full CUDA graph support
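
A minimal sketch of how the Blackwell target and the FP4 symbols listed above can be probed from Python. The module name `sageattention` is an assumption; the actual import path depends on how the wheel was built and installed:

```python
import importlib
import torch

# Confirm the GPU reports Blackwell (SM_120) and that the arch was compiled in.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: sm_{major}{minor}")           # expect sm_120
print("Compiled arch list:", torch.cuda.get_arch_list())  # expect an sm_120 / PTX entry

# Probe for the FP4 kernels reported by the build. The module name
# "sageattention" is an assumption; adjust to whatever the wheel installs.
FP4_SYMBOLS = ("scale_and_quant_fp4", "blockscaled_fp4_attn", "fp4quant_cuda")
try:
    sa = importlib.import_module("sageattention")
    for name in FP4_SYMBOLS:
        print(f"{name}: {'found' if hasattr(sa, name) else 'missing'}")
except ImportError:
    print("SageAttention module not found -- check the install path.")
```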

# Proof of Successful Build

    attention mode override: sageattn3
    tensor out (1, 8, 128, 64) torch.bfloat16 cuda:0
    Max allocated memory: 9.953 GB
    Comfy-VFI done — 125 frames generated
    Prompt executed in 341.08 seconds


# Conclusion

This is a **fully documented, stable SageAttention3 build for Blackwell (SM_120)**,
compiled and executed entirely inside **WSL2**, **without official support**.
The FP4 infrastructure is fully present and verified, ready for future activation and testing.

https://redd.it/1ojosl5
@rStableDiffusion
What's the most technically advanced local model out there?

Just curious: which of the models, architectures, etc. that can be run on a PC is the most advanced from a technical point of view? I'm not asking for better images or more optimizations, but for a model that, say, uses something more powerful than CLIP encoders to associate prompts with images, or that incorporates multimodality, or any other trick that holds more promise than just perfecting the training dataset for a checkpoint.

https://redd.it/1ojgek3
@rStableDiffusion
Has anyone tried the new FIBO model?

https://huggingface.co/briaai/FIBO

https://huggingface.co/spaces/briaai/FIBO

The following is the official introduction, forwarded here:

# What's FIBO?

Most text-to-image models excel at imagination—but not control. FIBO is built for professional workflows, not casual use. Trained on structured JSON captions up to 1,000+ words, FIBO enables precise, reproducible control over lighting, composition, color, and camera settings. The structured captions foster native disentanglement, allowing targeted, iterative refinement without prompt drift. With only 8B parameters, FIBO delivers high image quality, strong prompt adherence, and professional-grade control—trained exclusively on licensed data.
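
For a sense of what "structured JSON captions" means in practice, here is a purely hypothetical example built around the fields named above (lighting, composition, color, camera). The field names are illustrative only; the real FIBO schema on the Hugging Face model card may differ:

```python
import json

# Hypothetical structured caption in the spirit of FIBO's JSON prompts.
# Every field name below is illustrative; consult briaai/FIBO for the real schema.
caption = {
    "subject": "portrait of a violinist on a rainy street at night",
    "lighting": {"key": "soft sodium streetlight", "direction": "45 degrees camera-left", "mood": "low-key"},
    "composition": {"framing": "medium close-up", "rule": "rule of thirds"},
    "color": {"palette": "teal and amber", "saturation": "muted"},
    "camera": {"lens": "85mm", "aperture": "f/1.8", "angle": "eye level"},
}

print(json.dumps(caption, indent=2))
```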

https://redd.it/1ojsdji
@rStableDiffusion
UDIO just got nuked by UMG.

I know this is not an open-source tool, but there are serious implications here for the whole generative AI community. Basically:

UDIO settled with UMG and ninja-rolled out a new TOS that PROHIBITS you from:

1. Downloading generated songs.
2. Owning a copy of any generated song on ANY of your devices.

The TOS applies retroactively: you can no longer download songs generated under the old TOS, which allowed free personal and commercial use.

It's worth noting that Udio was not just a purely generative tool: many musicians uploaded their own music to modify and enhance it, given its ability to separate stems. People lost months of work overnight.

https://redd.it/1ojvjh3
@rStableDiffusion