I still find Flux Kontext much better for image restoration once you get the intuition for prompting and preparing the images. Qwen Edit ruins and changes way too much.
https://redd.it/1op7wv0
@rStableDiffusion
Qwen-trained model: wild examples, both realistic and fantastic. Full step-by-step tutorial published; train on GPUs with as little as 6 GB VRAM. Qwen handles ultra-complex prompts and emotions very well. Images generated with SwarmUI using our easy-to-use, 1-click presets
https://redd.it/1opivzh
@rStableDiffusion
[Release] New ComfyUI Node – Maya1_TTS 🎙️
Hey everyone! Just dropped a new ComfyUI node I've been working on – **ComfyUI-Maya1\_TTS** 🎙️
[https://github.com/Saganaki22/-ComfyUI-Maya1\_TTS](https://github.com/Saganaki22/-ComfyUI-Maya1_TTS)
This one runs the Maya1 TTS 3B model, an expressive voice TTS, directly in ComfyUI as a single all-in-one (AIO) node.
https://preview.redd.it/on7otvl7fizf1.png?width=1426&format=png&auto=webp&s=288d3e2ee0081fd789c7ae3c13f305f238e8a6e8
**What it does:**
* Natural language voice design (just describe the voice you want in plain text)
* 17+ emotion tags you can drop right into your text: `<laugh>`, `<gasp>`, `<whisper>`, `<cry>`, etc.
* Real-time generation with decent speed (I'm getting ~45 it/s on a 5090 with bfloat16 + SDPA)
* Built-in VRAM management and quantization support (4-bit/8-bit if you're tight on VRAM)
* Works with all ComfyUI audio nodes
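To make the emotion-tag feature concrete, here's a minimal sketch of the two text inputs described above: a plain-language voice description plus a script with inline tags. The exact input names and the full tag set live in the node's repo; the strings below are illustrative only.

```python
import re

# Hypothetical inputs for the node, per the post's description.
# Tag names (<gasp>, <laugh>, ...) come from the post; see the README
# for the complete list of 17+ supported emotion tags.
voice_desc = (
    "Realistic male voice in his 30s with an American accent. "
    "Normal pitch, warm timbre, conversational pacing."
)
script = "Wait... <gasp> you actually finished it? <laugh> That's amazing!"

# Emotion tags are plain angle-bracket tokens embedded in the text:
tags = re.findall(r"<(\w+)>", script)
print(tags)  # ['gasp', 'laugh']
```

The tags sit inline in the script exactly where you want the emotion to land, so no separate timing or annotation input is needed.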
**Quick setup note:**
* Flash Attention and Sage Attention are *optional* – use them if you like to experiment
* If you've got less than 10GB VRAM, I'd recommend installing `bitsandbytes` for 4-bit/8-bit support. Otherwise float16/bfloat16 works great and is actually faster.
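If you do go the low-VRAM route, enabling quantization is just one extra package installed into whatever Python environment ComfyUI runs from (activate your venv or use the embedded Python first; the bare command below is illustrative):

```shell
# Optional: enables the node's 4-bit/8-bit quantization path
# for cards under ~10 GB VRAM. Install into ComfyUI's Python env.
pip install bitsandbytes
```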
Also, you can pair this with my [**dotWaveform node**](https://github.com/Saganaki22/ComfyUI-dotWaveform) if you want to visualize the speech output.
[Realistic male voice in the 30s age with american accent. Normal pitch, warm timbre, conversational pacing.](https://reddit.com/link/1oph2fi/video/w0ayr8gqiizf1/player)
[Realistic female voice in the 30s age with british accent. Normal pitch, warm timbre, conversational pacing.](https://reddit.com/link/1oph2fi/video/kal929sriizf1/player)
The README has a bunch of character voice examples if you need inspiration. Model downloads from HuggingFace, everything's detailed in the repo.
If you find it useful, toss the project a ⭐ on GitHub – helps a ton! 🙌
https://redd.it/1oph2fi
@rStableDiffusion