My LORAs update (WAN/Z Image)
Hello everyone!
I'm not posting often nowadays but Z Image Turbo dropped and I wanted to hop on the bandwagon sooner than usual :)
I have already trained 5 Z Image Turbo LoRAs (https://huggingface.co/malcolmrey/zimage/tree/main) and I intend to train a lot more (technically I could train 350 per week, but let's be realistic - I'll set a goal of 100 per week and we will just be pleasantly surprised when I print out more :P)
In the past weeks I have mainly been printing WAN LoRAs; you can check all my models at: https://huggingface.co/spaces/malcolmrey/browser
Those WAN LoRAs are trained on WAN 2.1 but they are compatible with WAN 2.2, and they work well for both images and videos (regular t2v and i2v, as well as VACE and Animate).
I have my workflows here: https://huggingface.co/datasets/malcolmrey/workflows/tree/main
The image samples are here: https://huggingface.co/datasets/malcolmrey/samples
Tutorials are here: https://huggingface.co/datasets/malcolmrey/various/tree/main
My CivitAI profile is here: https://civitai.com/user/malcolmrey (some remaining models and lots of articles)
My "database" contains of more than 1200 datasets, my current plan is to create at least 1000 loras for Z Image (this may change if some even greater model drops before I do so), all will be uploaded to huggingface and my MEGA page, some (those that I can) will be uploaded to CivitAI
You can request particular models on my coffee page.
Last time I made a trailer for my LoRAs and some people were salty, so this time I will just leave some stats:
On my huggingface you can find:
* 1338 SD 1.5 LoRAs
* 1330 SD 1.5 LoCoN (LyCORIS)
* 1338 SD 1.5 Embeddings (Textual Inversion)
* 381 Flux.1 Dev LoRAs
* 1127 WAN 2.1 LoRAs
* 17 SDXL LoRAs
* 5 Z Image LoRAs (and there will be more as I will focus on them :P)
Cheers and have a great start of the week!
https://redd.it/1payhth
@rStableDiffusion
Best AI image generator tool?
Hi all!! I'm a marketer who's been using AI tools for fashion and product content for a while now. The platform I've been on (Higgsfield) has gotten really unreliable lately, and I keep feeling like I'm being scammed with all kinds of fine print and tricks. I'm thinking of switching to something new.
Also, Nano Banana Pro stopped working completely; it's supposedly "unlimited", but each generation takes a few hours...
I'm looking at Freepik, Hypermark, and Krea. I've seen some mentions here, but I'm not sure which ones are actually worth it... do all of them reset credits monthly or do other shady things?
Has anyone here tried these? Are any of them legitimately good? Thanks!
https://redd.it/1pb3ky1
@rStableDiffusion
I prompted "the great leader" with Z-Image turbo and got this with every seed
https://redd.it/1pb3os7
@rStableDiffusion
A THIRD Alibaba AI Image model has dropped with demo!
Yet another new model! And it seems promising for a 7B-parameter model.
(https://huggingface.co/spaces/AIDC-AI/Ovis-Image-7B)
https://huggingface.co/AIDC-AI/Ovis-Image-7B
A little about this model:
Ovis-Image-7B achieves text-rendering performance rivaling 20B-scale models while maintaining a compact 7B footprint.
It demonstrates exceptional fidelity on text-heavy, layout-critical prompts, producing clean, accurate, and semantically aligned typography.
The model handles diverse fonts, sizes, and aspect ratios without degrading visual coherence.
Its efficient architecture enables deployment on a single high-end GPU, supporting responsive, low-latency use.
Overall, Ovis-Image-7B delivers near–frontier text-to-image capability within a highly accessible computational budget.
Here is the Space where you can try it right now:
https://huggingface.co/spaces/AIDC-AI/Ovis-Image-7B
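If you prefer to hit the Space programmatically, Hugging Face Spaces expose a Python API through gradio_client. A minimal sketch, assuming the Space has a standard text-to-image endpoint (the endpoint name and arguments below are guesses; check the Space's "Use via API" panel for the real signature):

```python
from gradio_client import Client

# Connect to the public Space and list its real endpoints and parameters.
client = Client("AIDC-AI/Ovis-Image-7B")
client.view_api()  # prints the actual endpoint names and argument types

# Hypothetical call; the api_name and arguments are assumptions, adjust them to
# whatever view_api() reports for this Space.
# result = client.predict("a storefront sign that reads 'OPEN 24 HOURS'", api_name="/generate")
```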
And finally, about the company that created this model:
AIDC-AI is the AI team at Alibaba International Digital Commerce Group. Here, we will open-source our research in the fields of language models, vision models, and multimodal models.
2026 is going to be wild, but I'm still waiting for the Z Image base and edit models though.
If you have more technical knowledge, please share your review of this model.
https://redd.it/1pb9aps
@rStableDiffusion
I crashed Seedream V4’s API and the error log accidentally revealed their entire backend architecture (DiT model, PyTorch, Ray, A100/H100, custom pipeline)
I was testing Seedream V4 through their API and accidentally pushed a generation that completely crashed their backend due to GPU memory exhaustion.
Surprisingly, the API returned a *full internal error log*, and it basically reveals a lot about how Seedream works under the hood.
Here’s what the crash exposed:
# 1. They’re running a Diffusion Transformer (DiT) model
The log references a **“DiTPipeline”** and a generation stage called **“ditvae”**.
That naming doesn’t exist in any public repo, but the structure matches:
* Text encoder
* DiT core
* VAE decoder
This is extremely close to **Stable Diffusion 3’s architecture**, and also somewhat similar to **Flux**, although the naming (“ditvae”) feels more SD3-style.
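For readers less familiar with DiT-style pipelines, here is a minimal structural sketch of the three stages named above, with the latent decode roughly where a "ditvae" stage would sit. Everything below uses toy stand-in modules; it is not Seedream's code.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the three stages: text encoder -> DiT core -> VAE decoder.
class ToyTextEncoder(nn.Module):
    def forward(self, prompt: str) -> torch.Tensor:
        return torch.randn(1, 77, 64)                     # stand-in conditioning embeddings

class ToyDiTCore(nn.Module):
    def forward(self, latents, t, cond) -> torch.Tensor:
        return torch.zeros_like(latents)                  # stand-in noise/velocity prediction

class ToyVAEDecoder(nn.Module):
    def forward(self, latents) -> torch.Tensor:
        # fake 8x spatial upsample, mimicking a latent -> pixel decode
        return latents.repeat_interleave(8, -1).repeat_interleave(8, -2)

class DiTPipelineSketch:
    def __init__(self):
        self.text_encoder, self.dit, self.vae = ToyTextEncoder(), ToyDiTCore(), ToyVAEDecoder()

    @torch.no_grad()
    def __call__(self, prompt: str, steps: int = 4) -> torch.Tensor:
        cond = self.text_encoder(prompt)
        latents = torch.randn(1, 4, 64, 64)
        for t in torch.linspace(1.0, 0.0, steps):
            latents = latents - self.dit(latents, t, cond) / steps  # toy denoising update
        return self.vae(latents)                                    # the decode ("ditvae"-like) stage

print(DiTPipelineSketch()("a test prompt").shape)  # torch.Size([1, 4, 512, 512])
```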
# 2. It’s all built on top of PyTorch
The traceback includes clear PyTorch memory management data:
* 36 GB allocated by PyTorch
* 6 GB reserved/unallocated
* CUDA OOM during a 2 GB request
This is a pure PyTorch inferencing setup.
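Those three numbers map directly onto PyTorch's CUDA caching-allocator counters. A small sketch of how a service could log the same breakdown (generic torch.cuda calls, not Seedream's code), plus the allocator flag the error message recommends:

```python
import torch

def report_cuda_memory(device: int = 0) -> None:
    """Print the same allocated / reserved-but-unallocated / free split the crash log shows."""
    allocated = torch.cuda.memory_allocated(device)   # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(device)     # bytes cached by PyTorch's allocator
    free, total = torch.cuda.mem_get_info(device)     # driver-level free / total VRAM
    gib = 1024 ** 3
    print(f"allocated by PyTorch: {allocated / gib:.2f} GiB")
    print(f"reserved but unallocated: {(reserved - allocated) / gib:.2f} GiB")
    print(f"free: {free / gib:.2f} GiB of {total / gib:.2f} GiB total")

# The hint at the end of the log refers to this allocator option; it must be set in the
# environment before CUDA is initialized, e.g.:
#   PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python serve.py
```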
# 3. They orchestrate everything with Ray
The crash shows:
get_ray_engine().process(context)
ray_engine.py
queue_consumer.py
vefuser/core/role_manager
This means Seedream is distributing tasks across **Ray workers**, typical for large-scale GPU clusters.
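For reference, the queue-consumer-to-worker pattern that traceback implies usually looks something like the sketch below. The class and function names are illustrative only (nothing here is the actual vefuser code), but they show how Ray pins one actor per GPU and fans requests out to them:

```python
import ray

# Hypothetical sketch of a queue consumer dispatching to GPU-backed Ray actors.
@ray.remote(num_gpus=1)            # each actor reserves one GPU in the cluster
class DiffusionWorker:
    def __init__(self):
        self.pipeline = None       # a real worker would load the DiT pipeline onto its GPU here

    def process(self, request: dict) -> dict:
        # a real worker would run the pipeline on request["prompt"] and return the image
        return {"request_id": request["request_id"], "status": "done"}

def consume(requests, workers):
    """Fan requests out across the actors round-robin, roughly what a queue_consumer.py does."""
    futures = [workers[i % len(workers)].process.remote(req)
               for i, req in enumerate(requests)]
    return ray.get(futures)        # raises if any worker task fails

if __name__ == "__main__":
    ray.init()
    workers = [DiffusionWorker.remote() for _ in range(2)]
    print(consume([{"request_id": "r1"}, {"request_id": "r2"}], workers))
```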
# 4. They’re using A100/H100 GPUs (≈ 45–48 GB VRAM)
The log reveals the exact VRAM stats:
* **Total: 44.53 GB**
* Only \~1 GB was free
* The process was using **43.54 GB**
* Then it tried to allocate 2 GB more → *boom, crash*
A single inference using >40 GB of VRAM implies a **very large DiT model (10B+ parameters)**.
This is not SDXL territory – it’s SD3-class or larger.
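The back-of-envelope math behind that estimate: in bf16/fp16 a model needs roughly 2 bytes per parameter for the weights alone, before activations, text encoder, and VAE. All numbers below are assumptions for illustration, not measurements of Seedream:

```python
def weights_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    """Weight memory in GiB for a model stored in bf16/fp16 (2 bytes per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

for b in (8, 12, 16, 20):
    print(f"{b:>2}B params -> ~{weights_gib(b):.1f} GiB of weights alone")
# 8B -> ~14.9 GiB, 20B -> ~37.3 GiB. Add a large text encoder, the VAE, and activation /
# attention buffers for a big latent, and ~36 GiB allocated is plausible for a 10B+ DiT.
```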
# 5. “vefuser” appears to be their internal task fuser
The path `/opt/tiger/vefuser/...` suggests:
* “tiger” = internal platform codename
* “vefuser” = custom module for fusing and distributing workloads to GPU nodes
This is typical in high-load inference systems (think internal Meta/Google-like modules).
# 6. They use Euler as sampler
The log throws:
EulerError
Which means the sampler is Euler, a classic choice for Stable Diffusion-style pipelines.
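For anyone unfamiliar, an Euler sampler is just a first-order update per noise level applied to the model's denoised prediction. A minimal sketch in the common k-diffusion convention (generic, not Seedream's implementation):

```python
import torch

@torch.no_grad()
def sample_euler(denoise, noise, sigmas):
    """denoise(x, sigma) returns the model's estimate of the clean latents at noise level sigma."""
    x = noise * sigmas[0]                           # start from pure noise at the highest sigma
    for i in range(len(sigmas) - 1):
        denoised = denoise(x, sigmas[i])
        d = (x - denoised) / sigmas[i]              # derivative dx/dsigma
        x = x + d * (sigmas[i + 1] - sigmas[i])     # one first-order (Euler) step
    return x
```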
# 7. My conclusion
Seedream V4 appears to be running:
**A proprietary or forked Diffusion Transformer architecture very close to SD3, with maybe some Flux-like components, deployed through Ray on A100/H100 infrastructure, with a custom inference pipeline (“ditvae”, “DiTPipeline”, “vefuser”).**
I haven’t seen anyone talk about this publicly, so maybe I'm the first one who got a crash log detailed enough to reverse-engineer the backend.
If anyone else has logs or insights, I’d love to compare.
Logs:
500 - "{\"error\":{\"code\":\"InternalServiceError\",\"message\":\"Request {{{redacted}}} failed: process task failure: stage: ditvae, location: 10.4.35.228:5000, error: task process error: Worker failed to complete request: request_id='{{{redacted}}}', error='DiTPipeline process failed: EulerError, error_code: 100202, message: do predict failed. err=CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 44.53 GiB of which 1003.94 MiB is free. Process 1733111 has 43.54 GiB memory in use. Of the allocated memory 36.01 GiB is allocated by PyTorch, and 6.12 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)', traceback: Traceback (most recent call last):\\n File \\\"/opt/tiger/vefuser/vefuser/core/role_manager/queue_consumer.py\\\", line 186, in process_task\\n result_context = get_ray_engine().process(context)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/tiger/vefuser/vefuser/core/engine/ray_engine.py\\\", line 247, in process\\n raise RayEngineProcessError(f\\\"Worker failed to complete request: {request_id=},
{error=}\\\")\\nvefuser.core.common.exceptions.RayEngineProcessError: Worker failed to complete request: request_id='{{{redacted}}}', error='DiTPipeline process failed: EulerError, error_code: 100202, message: do predict failed. err=CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 44.53 GiB of which 1003.94 MiB is free. Process 1733111 has 43.54 GiB memory in use. Of the allocated memory 36.01 GiB is allocated by PyTorch, and 6.12 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)'\\n Request id: {{{redacted}}}\",\"param\":\"\",\"type\":\"\"}}"
https://redd.it/1pbdo6j
@rStableDiffusion