I still find Flux Kontext much better for image restoration once you get the intuition for prompting and preparing the images. Qwen Edit ruins and changes way too much.
https://redd.it/1op7wv0
@rStableDiffusion
Qwen-trained model: wild examples, both realistic and fantastic. Full step-by-step tutorial published; train on GPUs with as little as 6 GB VRAM. Qwen handles ultra-complex prompts and emotions very well. Images generated with SwarmUI using our easy-to-use, 1-click presets
https://redd.it/1opivzh
@rStableDiffusion
[Release] New ComfyUI Node – Maya1_TTS 🎙️
Hey everyone! Just dropped a new ComfyUI node I've been working on – **ComfyUI-Maya1\_TTS** 🎙️
[https://github.com/Saganaki22/-ComfyUI-Maya1\_TTS](https://github.com/Saganaki22/-ComfyUI-Maya1_TTS)
This one runs the Maya1 TTS 3B model, an expressive voice TTS, directly in ComfyUI as a single all-in-one (AIO) node.
https://preview.redd.it/on7otvl7fizf1.png?width=1426&format=png&auto=webp&s=288d3e2ee0081fd789c7ae3c13f305f238e8a6e8
**What it does:**
* Natural language voice design (just describe the voice you want in plain text)
* 17+ emotion tags you can drop right into your text: `<laugh>`, `<gasp>`, `<whisper>`, `<cry>`, etc.
* Real-time generation with decent speed (I'm getting ~45 it/s on a 5090 with bfloat16 + SDPA)
* Built-in VRAM management and quantization support (4-bit/8-bit if you're tight on VRAM)
* Works with all ComfyUI audio nodes
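To make the emotion-tag feature concrete, here's a minimal sketch of the two text inputs described above: a plain-language voice description plus a script with inline tags. The exact input names and the full tag set live in the node's repo; the strings below are illustrative only.

```python
import re

# Hypothetical inputs for the node, per the post's description.
# Tag names (<gasp>, <laugh>, ...) come from the post; see the README
# for the complete list of 17+ supported emotion tags.
voice_desc = (
    "Realistic male voice in his 30s with an American accent. "
    "Normal pitch, warm timbre, conversational pacing."
)
script = "Wait... <gasp> you actually finished it? <laugh> That's amazing!"

# Emotion tags are plain angle-bracket tokens embedded in the text:
tags = re.findall(r"<(\w+)>", script)
print(tags)  # ['gasp', 'laugh']
```

The tags sit inline in the script exactly where you want the emotion to land, so no separate timing or annotation input is needed.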
**Quick setup note:**
* Flash Attention and Sage Attention are *optional* – use them if you like to experiment
* If you've got less than 10GB VRAM, I'd recommend installing `bitsandbytes` for 4-bit/8-bit support. Otherwise float16/bfloat16 works great and is actually faster.
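If you do go the low-VRAM route, enabling quantization is just one extra package installed into whatever Python environment ComfyUI runs from (activate your venv or use the embedded Python first; the bare command below is illustrative):

```shell
# Optional: enables the node's 4-bit/8-bit quantization path
# for cards under ~10 GB VRAM. Install into ComfyUI's Python env.
pip install bitsandbytes
```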
Also, you can pair this with my [**dotWaveform node**](https://github.com/Saganaki22/ComfyUI-dotWaveform) if you want to visualize the speech output.
[Realistic male voice in the 30s age with american accent. Normal pitch, warm timbre, conversational pacing.](https://reddit.com/link/1oph2fi/video/w0ayr8gqiizf1/player)
[Realistic female voice in the 30s age with british accent. Normal pitch, warm timbre, conversational pacing.](https://reddit.com/link/1oph2fi/video/kal929sriizf1/player)
The README has a bunch of character voice examples if you need inspiration. Model downloads from HuggingFace, everything's detailed in the repo.
If you find it useful, toss the project a ⭐ on GitHub – helps a ton! 🙌
https://redd.it/1oph2fi
@rStableDiffusion