r/StableDiffusion – Telegram
RTX 3090 24GB vs RTX 5080 16GB

Hey guys, I currently own an average computer with 32GB RAM and an RTX 3060, and I am looking to either buy a new PC or replace my old card with an RTX 3090 24GB. The new computer I have in mind has an RTX 5080 16GB and 64GB RAM.

I am just tired of struggling to use image models beyond XL (Flux, Qwen, Chroma), being unable to generate videos with Wan 2.2, and needing several hours to locally train a simple LoRA for 1.5; training for XL is out of the question. So what do you guys recommend?

How important is CPU RAM when using AI models? Is it worth passing on the 3090 24GB for a new computer with twice my current RAM, but with a 5080 16GB?

https://redd.it/1oso8md
@rStableDiffusion
Haven't used SD in a while, is illustrious/pony still the go-to, or have there been better checkpoints lately?

Haven't used SD for several months, since Illustrious came out, and I both do and don't like Illustrious. Curious what everyone is using now?

Also, I'd like to know what video models everyone is using for local stuff.

https://redd.it/1osv278
@rStableDiffusion
Good AI video generators that have a "mid frame" option?

So I've been using Pixverse to create videos because it has start, mid, and end frame options, but I'm kind of struggling to get a certain aspect down.

For simplicity's sake, say I'm trying to make a video of a character punching another character.

Start frame: Both characters in stances against each other

Mid frame: Still of one character's fist colliding with the other character

End frame: Aftermath still of the punch with character knocked back

From what I can tell, it seems like whatever happens before the mid frame and whatever happens after it are generated separately and spliced together without using each other for context, so there is no consistent momentum carried across the mid frame. As a result, there is a short period where the fist slows down until it is barely moving as it touches the other character, and after the mid frame the fist doesn't move.

Anyone figured out a way to preserve momentum before and after a frame you want to use?

https://redd.it/1ot3da3
@rStableDiffusion
UniLumos: Fast and Unified Image and Video Relighting

https://github.com/alibaba-damo-academy/Lumos-Custom?tab=readme-ov-file

So many new releases set off my 'wtf are you talking about?' klaxon, so I've tried to paraphrase their jargon. Apologies if I've misinterpreted it.

What does it do?

UniLumos is a relighting framework for both images and videos: it takes foreground objects, reinserts them into other backgrounds, and relights them to suit the new background. In effect, it makes an intelligent green-screen cutout that also grades the film.

iS iT fOr cOmFy? aNd wHeN?

No, and ask on GitHub, you lazy scamps.

Is it any good?

Like all AI, it's a tool for specific uses: some will work and some won't. If you try extreme examples, prepare to eat a box of 'Disappointment Donuts'. The examples (on GitHub) are there to show the relighting, not context.

[Original vs. processed comparison images in the Reddit post]

https://redd.it/1ota9tc
@rStableDiffusion
A little overwhelmed with all the choices
https://redd.it/1otaj4v
@rStableDiffusion
Is there a way to edit photos inside ComfyUI? Like a Photoshop node or something?
https://redd.it/1otdzku
@rStableDiffusion
Ovi 1.1 is now 10 seconds

https://reddit.com/link/1otllcy/video/gyspbbg91h0g1/player

Ovi 1.1 now generates 10-second videos! In addition:

1. We have simplified the audio description tags from

Audio Description: <AUDCAP>Audio description here<ENDAUDCAP>

to

Audio Description: Audio: Audio description here

This makes prompt editing much easier (a hypothetical before/after prompt example follows after this list).

2. We will also release a new 5-second base model checkpoint retrained on higher-quality, 960x960 resolution videos, instead of the 720x720 videos used to train the original Ovi 1.0. The new 5-second base model also follows the simplified prompt format above.

3. The 10-second model was trained using full bidirectional dense attention instead of a causal or AR approach, to ensure generation quality.
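
For illustration, here is a hypothetical prompt in the old vs. new format (the <S>...<E> speech tags are assumed from Ovi 1.0's documented prompt format and are not part of this announcement):

Old: A man looks up at the storm and says <S>We should head inside.<E> <AUDCAP>Heavy rain, distant thunder.<ENDAUDCAP>

New: A man looks up at the storm and says <S>We should head inside.<E> Audio: Heavy rain, distant thunder.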

We will release both the 10-second and the new 5-second weights very soon on our GitHub repo: https://github.com/character-ai/Ovi


https://redd.it/1otllcy
@rStableDiffusion
The simplest workflow for Qwen-Image-Edit-2509 that simply works

I tried Qwen-Image-Edit-2509 and got the expected result. My workflow was actually simpler than the standard one, as I removed all of the image resize nodes. In fact, you shouldn't use any resize node, since the TextEncodeQwenImageEditPlus node automatically resizes all connected input images (nodes_qwen.py, lines 89–96):

if vae is not None:
    total = int(1024 * 1024)
    scale_by = math.sqrt(total / (samples.shape[3] * samples.shape[2]))
    width = round(samples.shape[3] * scale_by / 8.0) * 8
    height = round(samples.shape[2] * scale_by / 8.0) * 8
    s = comfy.utils.common_upscale(samples, width, height, "area", "disabled")
    ref_latents.append(vae.encode(s.movedim(1, -1)[:, :, :, :3]))
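
For intuition, here is the same scaling math run standalone on a hypothetical 1920x1080 input (my example, not part of the node code): every connected image is scaled to roughly one megapixel, with both sides snapped to multiples of 8.

import math

w, h = 1920, 1080                      # hypothetical input size in pixels
total = 1024 * 1024                    # target area of about one megapixel
scale_by = math.sqrt(total / (w * h))  # uniform scale factor (~0.711 here)
new_w = round(w * scale_by / 8.0) * 8  # snap width to a multiple of 8
new_h = round(h * scale_by / 8.0) * 8  # snap height to a multiple of 8
print(new_w, new_h)                    # -> 1368 768

So whatever size you feed in, the node already normalizes it before encoding, which is why an extra resize node is redundant.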

This screenshot example shows how I directly connected the input images to the node. It addresses most of the comments, potential misunderstandings, and complications mentioned in the other post.

Image editing (changing clothes) using the Qwen-Image-Edit-2509 model



https://redd.it/1otityx
@rStableDiffusion
[Release] New ComfyUI node – Step Audio EditX TTS

🎙️ ComfyUI-Step_Audio_EditX_TTS: Zero-Shot Voice Cloning + Advanced Audio Editing

**TL;DR:** Clone any voice from 3-30 seconds of audio, then edit emotion, style, speed, and add effects—all while preserving voice identity. State-of-the-art quality, now in ComfyUI.

Currently recommend 10-18 GB VRAM

[GitHub](https://github.com/Saganaki22/ComfyUI-Step_Audio_EditX_TTS) | [HF Model](https://huggingface.co/stepfun-ai/Step-Audio-EditX) | [Demo](https://stepaudiollm.github.io/step-audio-editx/) | [HF Spaces](https://huggingface.co/spaces/stepfun-ai/Step-Audio-EditX)

---

This one brings Step Audio EditX to ComfyUI – state-of-the-art zero-shot voice cloning and audio editing. Unlike typical TTS nodes, this gives you two specialized nodes for different workflows:

[Clone on the left, Edit on the right](https://preview.redd.it/p33fzzhrzh0g1.png?width=1331&format=png&auto=webp&s=c5db8c5950bacd3b1ae91050bb26de52bb29b30c)

# What it does:

**🎤 Clone Node** – Zero-shot voice cloning from just 3-30 seconds of reference audio

* Feed it any voice sample + text transcript
* Generate unlimited new speech in that exact voice
* Smart longform chunking for texts over 2000 words (auto-splits and stitches seamlessly)
* Perfect for character voices, narration, voiceovers

**🎭 Edit Node** – Advanced audio editing while preserving voice identity

* **Emotions**: happy, sad, angry, excited, calm, fearful, surprised, disgusted
* **Styles**: whisper, gentle, serious, casual, formal, friendly
* **Speed control**: faster/slower with multiple levels
* **Paralinguistic effects**: `[Laughter]`, `[Breathing]`, `[Sigh]`, `[Gasp]`, `[Cough]`
* **Denoising**: clean up background noise or remove silence
* Multi-iteration editing for stronger effects (1=subtle, 5=extreme); a conceptual sketch follows this list
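
Conceptually, multi-iteration editing just re-applies the same edit with the previous output as the next input. This is an illustrative sketch only, not the node's actual API:

def apply_edit(edit_fn, audio, instruction, iterations=3):  # edit_fn stands in for one Edit pass
    for _ in range(iterations):              # 1 = subtle ... 5 = extreme
        audio = edit_fn(audio, instruction)  # each pass re-edits the previous output
    return audio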



[voice clone + denoise & edit style exaggerated 1 iteration / float32](https://reddit.com/link/1otsbfb/video/m1c8m1nd5i0g1/player)

[voice clone + edit emotion admiration 1 iteration / float32](https://reddit.com/link/1otsbfb/video/dczqvi6vai0g1/player)

# Performance notes:

* Getting solid results on RTX 4090 with bfloat16 (~11-14GB VRAM for clone, ~14-18GB for edit)
* Current quantization support (int8/int4) available but with quality trade-offs
* **Note: We're waiting on the Step AI research team to release official optimized quantized models for better lower-VRAM performance – will implement them as soon as they drop!**
* Multiple attention mechanisms (SDPA, Eager, Flash Attention, Sage Attention)
* Optional VRAM management – keeps model loaded for speed or unloads to free memory

# Quick setup:

* Install via ComfyUI Manager (search "Step Audio EditX TTS") or manually clone the repo
* Download **both** Step-Audio-EditX and Step-Audio-Tokenizer from HuggingFace (or script it; see the sketch after this list)
* Place them in `ComfyUI/models/Step-Audio-EditX/`
* Full folder structure and troubleshooting in the README
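
If you'd rather script the model downloads than grab the files by hand, here is a minimal sketch using huggingface_hub (the target subfolder names, and the tokenizer repo id, are my assumptions; follow the folder structure in the README):

from huggingface_hub import snapshot_download

base = "ComfyUI/models/Step-Audio-EditX"  # adjust to your ComfyUI install path
snapshot_download(repo_id="stepfun-ai/Step-Audio-EditX",
                  local_dir=f"{base}/Step-Audio-EditX")        # subfolder name assumed
snapshot_download(repo_id="stepfun-ai/Step-Audio-Tokenizer",
                  local_dir=f"{base}/Step-Audio-Tokenizer")    # repo id and subfolder assumed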

# Workflow ideas:

* Clone any voice → edit emotion/style for character variations
* Clean up noisy recordings with denoise mode
* Speed up/slow down existing audio without pitch shift
* Add natural-sounding paralinguistic effects to generated speech

[Advanced workflow with Whisper / transcription, clone + edit](https://preview.redd.it/wkc39r900i0g1.png?width=1379&format=png&auto=webp&s=557b8a0893fcbbb58dd957c299d8a3f8d6bed8e9)

The README has full parameter guides, VRAM recommendations, example settings, and troubleshooting tips. Works with all ComfyUI audio nodes.

If you find it useful, drop a star on GitHub.

https://redd.it/1otsbfb
@rStableDiffusion
Qwen-Image-Edit-2509 Photo-to-Anime ComfyUI workflow is out
https://redd.it/1ottbhz
@rStableDiffusion