QWEN Image Layers - Inherent Editability via Layer Decomposition
https://redd.it/1pq0s71
@rStableDiffusion
Photo Tinder
Hi, I got sick of trawling through images manually and using destructive processes to figure out which images to keep, which to throw away, and which were best, so I vibe-coded Photo Tinder with Claude (tested on macOS and Linux with no issues; a Windows build is available but untested).
Basically, you have two modes:
- triage: sorts rejected images into one folder and accepted images into another
- ranking: uses the Glicko algorithm to compare two photos; you pick the winner, the scores get updated, and you repeat until the results are certain
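For anyone curious how a rating update like this works, here is a minimal Python sketch of a single Glicko-1 comparison. It is not taken from the Photo Tinder code: the function names, the 1500/350 starting values, and the choice of Glicko-1 (rather than Glicko-2 or whatever variant the app actually uses) are assumptions for illustration.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Dampen the influence of an opponent whose rating is uncertain."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score (0..1) against an opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def update(r, rd, r_opp, rd_opp, score):
    """Return the new (rating, rd) after one comparison; score is 1.0 or 0.0."""
    e = expected(r, r_opp, rd_opp)
    d2_inv = Q**2 * g(rd_opp) ** 2 * e * (1 - e)
    denom = 1 / rd**2 + d2_inv
    new_r = r + (Q / denom) * g(rd_opp) * (score - e)
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd


def compare(winner, loser):
    """winner and loser are (rating, rd) tuples; both use pre-match values."""
    new_winner = update(*winner, *loser, 1.0)
    new_loser = update(*loser, *winner, 0.0)
    return new_winner, new_loser


# Two unrated photos (hypothetical defaults), the first one wins.
photo_a, photo_b = compare((1500.0, 350.0), (1500.0, 350.0))
```

The rating deviation (RD) shrinks with every comparison, which is one plausible way to decide when the results are "certain" enough to stop.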
A built-in browser lets you look through the rejected and accepted folders and filter them by ranking, recency, etc.
Hope this is useful. Preparing datasets is hard, and this tool makes it that much easier.
https://github.com/relaxis/photo-tinder-desktop
https://redd.it/1ppwx68
@rStableDiffusion
GitHub - relaxis/photo-tinder-desktop: Photo Tinder - Desktop app for image triage and ranking (Tauri)
Kling released a video model a few days ago: MemFlow. Long 60 s video generation (real-time, 18 fps on an H100 GPU); lots of examples on the project page.
https://redd.it/1pq2uxb
@rStableDiffusion
New incredibly fast realistic TTS: MiraTTS
Current TTS models are great, but unfortunately they lack either emotion/realism or speed. So I heavily optimized a finetuned LLM-based TTS model: MiraTTS. It is extremely fast thanks to lmdeploy and produces high-quality audio thanks to FlashSR.
The main benefits of this repo and model are:
1. Extremely fast: can reach speeds of up to 100x realtime through lmdeploy and batching!
2. High quality: generates clear 48 kHz audio using FlashSR (most other models generate 16-24 kHz audio, which is lower quality).
3. Very low latency: as low as 150 ms in initial tests.
4. Very low VRAM usage: can be as low as 6 GB, so it's great for local users.
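To put those numbers in perspective, here is a small back-of-the-envelope sketch in Python. It does not call MiraTTS at all; the helper names are made up for illustration, and the only inputs are the figures quoted in the list above (the 100x figure is the claimed peak with batching, so treat the result as a best case).

```python
def compute_seconds(audio_seconds, realtime_speedup):
    """Wall-clock time needed to synthesize a clip at a given speedup."""
    return audio_seconds / realtime_speedup


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


# One hour of audio at the claimed peak of 100x realtime.
print(compute_seconds(3600, 100))  # 36.0 seconds of compute

# Frequency ceiling at 48 kHz versus a 16 kHz model.
print(nyquist_hz(48_000))  # 24000.0
print(nyquist_hz(16_000))  # 8000.0
```

The higher ceiling is why 48 kHz output can carry more high-frequency detail than 16-24 kHz output.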
I am planning multilingual versions, a native 48 kHz BiCodec, and possibly multi-speaker models.
GitHub link: https://github.com/ysharma3501/MiraTTS
Model and non-cherry-picked examples: https://huggingface.co/YatharthS/MiraTTS
Blog post explaining LLM TTS models: https://huggingface.co/blog/YatharthS/llm-tts-models
I would very much appreciate stars or likes, thank you.
https://redd.it/1pq5t35
@rStableDiffusion
Z-Image-Turbo - Smartphone Snapshot Photo Reality - LoRA - Release
https://redd.it/1pqgjxy
@rStableDiffusion