Consistency characters V0.4 | Generate characters only by image and prompt, without a character LoRA! | IL\NoobAI Edit
https://redd.it/1okrsld
@rStableDiffusion
Update — FP4 Infrastructure Verified (Oct 31 2025)
Quick follow-up to my previous post about running SageAttention 3 on an RTX 5080 (Blackwell) under WSL2 + CUDA 13.0 + PyTorch 2.10 nightly.
After digging into the internal API, I confirmed that the hidden FP4 quantization hooks (scale_and_quant_fp4, enable_block_scaled_fp4_attn, etc.) are fully implemented at the Python level, even though the low-level CUDA kernels are not yet active.
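For anyone who wants to repeat that check, a small introspection sketch is below; the module name sageattention and the exact hook spellings are assumptions inferred from the post, not a confirmed public API.

```python
import importlib

# Candidate hook names taken from the post; both the module name and the
# attribute names here are assumptions, not a confirmed SageAttention 3 API.
CANDIDATE_HOOKS = ["scale_and_quant_fp4", "enable_block_scaled_fp4_attn"]

def probe_fp4_hooks(module_name: str = "sageattention") -> dict:
    """Report which candidate FP4 hooks the module exposes as callables."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return {name: False for name in CANDIDATE_HOOKS}
    return {name: callable(getattr(mod, name, None)) for name in CANDIDATE_HOOKS}

if __name__ == "__main__":
    print(probe_fp4_hooks())
```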
I built an experimental FP4 quantization layer and integrated it directly into nodes_model_loading.py.
The system initializes correctly, executes under Blackwell, and logs tensor output + VRAM profile with FP4 hooks active.
However, true FP4 compute isn’t yet functional, as the CUDA backend still defaults to FP8/FP16 paths.
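The post doesn't include the layer itself, so here is a minimal, self-contained sketch of what block-scaled FP4 (E2M1) fake quantization can look like in pure PyTorch. The function name, the block size of 16, and the max-to-6.0 scaling rule are my assumptions for illustration, not the author's code; real NVFP4 kernels do this step in hardware and additionally store the per-block scales in FP8, which this sketch omits.

```python
import torch

# Representable magnitudes of the FP4 E2M1 format (sign handled separately).
FP4_E2M1_VALUES = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4_blockwise(x: torch.Tensor, block_size: int = 16) -> torch.Tensor:
    """Simulate block-scaled FP4 (E2M1) quantization, then dequantize.

    Each contiguous block of `block_size` values shares one scale, chosen so
    the block's max magnitude maps to 6.0 (the largest E2M1 value). Values are
    rounded to the nearest representable FP4 magnitude and returned in the
    original dtype ("fake quantization"), which is enough to study accuracy
    impact without FP4 hardware kernels.
    """
    assert x.numel() % block_size == 0, "sketch only handles sizes divisible by block_size"
    orig_shape, orig_dtype = x.shape, x.dtype
    x = x.reshape(-1, block_size).float()

    # Per-block scale: max |value| / largest representable FP4 magnitude.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / 6.0

    # Round each scaled value to the nearest E2M1 magnitude, keeping the sign.
    scaled = (x / scale).clamp(-6.0, 6.0)
    grid = FP4_E2M1_VALUES.to(x.device)
    idx = (scaled.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
    quant = grid[idx] * scaled.sign()

    return (quant * scale).reshape(orig_shape).to(orig_dtype)

if __name__ == "__main__":
    t = torch.randn(4, 64)
    print("max abs error:", (t - fake_quant_fp4_blockwise(t)).abs().max().item())
```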
---
Proof of Execution
attention mode override: sageattn3
FP4 quantization applied to transformer
FP4 API fallback to BF16/FP8 pipeline
Max allocated memory: 9.95 GB
Prompt executed in 341.08 seconds
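For reference, figures like the last two are typically captured with PyTorch's built-in CUDA memory statistics plus a wall-clock timer; the sketch below shows that pattern, with run_prompt standing in as a placeholder for the actual workflow execution rather than anything from the author's setup.

```python
import time
import torch

def profile_run(run_prompt) -> None:
    """Measure peak VRAM and wall-clock time for one generation pass.

    `run_prompt` is a placeholder callable standing in for the actual
    workflow execution.
    """
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    run_prompt()
    torch.cuda.synchronize()  # wait for all queued GPU work before timing
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"Max allocated memory: {peak_gb:.2f} GB")
    print(f"Prompt executed in {elapsed:.2f} seconds")
```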
---
Next Steps
Wait for full NV-FP4 exposure in future CUDA / PyTorch releases
Continue testing with non-quantized WAN 2.2 models
Publish an FP4-ready fork once reproducibility is verified
Full build logs and technical details are on GitHub:
Repository: github.com/k1n0F/sageattention3-blackwell-wsl2
https://redd.it/1oktwaz
@rStableDiffusion