QWEN Image Layers - Inherent Editability via Layer Decomposition

https://redd.it/1pq0s71
@rStableDiffusion

8 views, 20:40

Photo Tinder

Hi, I got sick of trawling through images manually and using destructive processes to figure out which images to keep, which to throw away, and which were best, so I vibe coded Photo Tinder with Claude (tested on macOS and Linux with no issues; a Windows build is available but untested).

Basically, you have two modes:

- triage, which puts rejected images into one folder and accepted images into another

- ranking, which uses the Glicko algorithm: you compare two photos and pick the winner, the scores get updated, and you repeat until the results are certain.
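For anyone curious how a pairwise ranking loop like this can work, here is a minimal sketch. It assumes the ranking mode refers to the Glicko rating system and uses the Glicko-1 update for a single comparison; the function names, the starting rating of 1500, and the starting deviation of 350 are illustrative assumptions, not taken from the Photo Tinder code.

```python
import math

# Glicko-1 scaling constant.
Q = math.log(10) / 400


def g(rd):
    """Dampen the impact of a comparison when the opponent's rating is uncertain."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score of a photo rated r against an opponent (r_opp, rd_opp)."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def update(r, rd, r_opp, rd_opp, score):
    """Return (new_rating, new_deviation) after one comparison.

    score is 1.0 if this photo was picked, 0.0 if the other one was.
    """
    g_opp = g(rd_opp)
    e = expected(r, r_opp, rd_opp)
    d2_inv = Q**2 * g_opp**2 * e * (1 - e)
    denom = 1 / rd**2 + d2_inv
    new_r = r + (Q / denom) * g_opp * (score - e)
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd


def compare(winner, loser):
    """Update both photos from one pick; each photo is a (rating, deviation) pair."""
    new_winner = update(*winner, *loser, 1.0)
    new_loser = update(*loser, *winner, 0.0)
    return new_winner, new_loser


if __name__ == "__main__":
    # Two unrated photos (hypothetical defaults), first one picked as the winner.
    a, b = compare((1500.0, 350.0), (1500.0, 350.0))
    print(a, b)
```

The deviation shrinks with every comparison, so one way to decide that the results are "certain" is to stop once every photo's deviation falls below a threshold you choose.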

There is also a browser that lets you look through the rejected and accepted folders and filter them by ranking, recency, etc.

Hope this is useful. Preparing datasets is hard, and this tool makes it that much easier.

https://github.com/relaxis/photo-tinder-desktop

https://redd.it/1ppwx68
@rStableDiffusion

GitHub

GitHub - relaxis/photo-tinder-desktop: Photo Tinder - Desktop app for image triage and ranking (Tauri)

8 views, 21:40

r/StableDiffusion

TwinFlow - Qwen Image with 2 steps.

https://redd.it/1pq3byz
@rStableDiffusion

7 views, 22:40

r/StableDiffusion

Kling released a video model, MemFlow, a few days ago. Long 60s video generation (real-time, 18 fps on an H100 GPU); lots of examples on the project page.

https://redd.it/1pq2uxb
@rStableDiffusion

7 views, 23:40

r/StableDiffusion

New incredibly fast realistic TTS: MiraTTS

Current TTS models are great, but unfortunately they lack either emotion/realism or speed. So I heavily optimized a finetuned LLM-based TTS model: MiraTTS. It is extremely fast thanks to lmdeploy and high quality thanks to FlashSR.

The main benefits of this repo and model are

1. Extremely fast: can reach speeds of up to 100x realtime through lmdeploy and batching!
2. High quality: generates clear 48 kHz audio using FlashSR (most other models generate 16-24 kHz audio, which is lower quality).
3. Very low latency: as low as 150 ms in initial tests.
4. Very low VRAM usage: can be as low as 6 GB of VRAM, so great for local users.
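To make "100x realtime" concrete: the factor is just seconds of audio produced per second of wall-clock time. A minimal sketch of that arithmetic follows; `realtime_factor` and its parameters are hypothetical helper names for illustration, not part of the MiraTTS repo.

```python
def realtime_factor(num_samples, sample_rate_hz, wall_seconds):
    """Seconds of audio generated per second of wall-clock time."""
    audio_seconds = num_samples / sample_rate_hz
    return audio_seconds / wall_seconds


if __name__ == "__main__":
    # Hypothetical numbers: 100 s of 48 kHz audio generated in 1 s of compute.
    print(realtime_factor(48_000 * 100, 48_000, 1.0))
```

With batching, the sample count would be the total across all clips in the batch, which is how a batched run can report a much higher factor than a single request.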

I am planning multilingual versions, a native 48 kHz bicodec, and possibly multi-speaker models.

GitHub link: https://github.com/ysharma3501/MiraTTS

Model and non-cherrypicked examples: https://huggingface.co/YatharthS/MiraTTS

Blog post explaining LLM TTS models: https://huggingface.co/blog/YatharthS/llm-tts-models

I would very much appreciate stars or likes, thank you.

https://redd.it/1pq5t35
@rStableDiffusion

GitHub

GitHub - ysharma3501/MiraTTS: A high quality and fast TTS repository

7 views, 00:40

r/StableDiffusion

Z-Image-Turbo - Smartphone Snapshot Photo Reality - LoRA - Release

https://redd.it/1pqgjxy
@rStableDiffusion

6 views, 09:40
