better\)](https://preview.redd.it/jvg50yhy3gxf1.png?width=600&format=png&auto=webp&s=0fb3bf71073a362921b1ec1ef3eace36950f9412)
For first-time renders, the gap between the systems is also influenced by disk speed. The disks in my particular systems are not especially fast, and I'm sure other enthusiasts can load models much faster.
**Render Duration (Subsequent Runs)**
After the model is cached in memory, subsequent passes are significantly faster. Note that for the DGX Spark we should set \`--highvram\` to maximize use of the coherent memory and to increase the likelihood of keeping the model resident. For some models, omitting this flag on the DGX Spark results in significantly poorer performance on subsequent runs (especially Qwen Image Edit).
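As a minimal launch sketch (the ComfyUI checkout path is a placeholder assumption; the `--highvram` flag is the only relevant part):

```python
import subprocess

# Start ComfyUI with --highvram so models stay resident in the Spark's
# coherent memory between runs. The checkout path is an assumption.
subprocess.run(
    ["python3", "main.py", "--highvram"],
    cwd="/path/to/ComfyUI",  # wherever ComfyUI is cloned
    check=True,
)
```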
The following chart shows the time taken, in seconds, to complete a batch size of 1. Multiple passes were run until a steady state was reached.
[Render duration in seconds \(lower is better\)](https://preview.redd.it/llc7b0h84gxf1.png?width=600&format=png&auto=webp&s=65fbe1ae55cc7917d87b02fb9b4c41bbe25c69c1)
We can also infer the relative GPU compute performance of the two systems from the iteration speed:
[Iterations per second \(higher is better\)](https://preview.redd.it/7vn0vz4g4gxf1.png?width=600&format=png&auto=webp&s=904264194ced1f87cb4c152797c595d4e92bbbf0)
Overall we can infer that:
* The DGX Spark's render duration is around 3.06 times longer, and the gap widens with larger models
* The RTX 5090's compute performance is around 3.18 times higher
While the DGX Spark is not as fast as the desktop Blackwell GPU, its performance on diffusion tasks is close to an RTX 3090, while offering access to a much larger pool of memory.
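For clarity on how the two headline ratios are computed, here is a small sketch with placeholder numbers (these are not the measured values from the charts; it only illustrates the arithmetic of averaging per-model ratios):

```python
# Placeholder measurements (seconds per render, iterations per second).
# The real values are in the charts above; these exist only to show the math.
measurements = {
    "model_a": {"spark_s": 95.0, "rtx5090_s": 31.0, "spark_its": 0.9, "rtx5090_its": 2.9},
    "model_b": {"spark_s": 60.0, "rtx5090_s": 20.0, "spark_its": 1.4, "rtx5090_its": 4.4},
}

duration_ratios = [m["spark_s"] / m["rtx5090_s"] for m in measurements.values()]
compute_ratios = [m["rtx5090_its"] / m["spark_its"] for m in measurements.values()]

print(f"DGX Spark render duration: {sum(duration_ratios) / len(duration_ratios):.2f}x slower")
print(f"RTX 5090 compute:          {sum(compute_ratios) / len(compute_ratios):.2f}x faster")
```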
**Notes**
* This is not a sponsored review, I paid for it with my own money.
* I do not have a second DGX Spark to try NCCL with, because the shop where I bought mine no longer has any in stock. Otherwise I would probably be toying with Hunyuan Image 3.0.
* I do not have access to a Strix Halo machine so don't ask me to compare it with that.
* I do have an M4 Max MacBook, but I gave up waiting after 10 minutes on some of the larger models.
https://redd.it/1ogjjlj
@rStableDiffusion
Wan2.1 Mocha Video Character One-Click Replacement
https://reddit.com/link/1ogkacm/video/5banxduzggxf1/player
Workflow download:
https://civitai.com/models/2075972?modelVersionId=2348984
Project address: https://orange-3dv-team.github.io/MoCha/
Controllable video character replacement with a user-provided one remains a challenging problem due to the lack of qualified paired-video data. Prior works have predominantly adopted a reconstruction-based paradigm reliant on per-frame masks and explicit structural guidance (e.g., pose, depth). This reliance, however, renders them fragile in complex scenarios involving occlusions, rare poses, character-object interactions, or complex illumination, often resulting in visual artifacts and temporal discontinuities. In this paper, we propose MoCha, a novel framework that bypasses these limitations, which requires only a single first-frame mask and re-renders the character by unifying different conditions into a single token stream. Further, MoCha adopts a condition-aware RoPE to support multi-reference images and variable-length video generation. To overcome the data bottleneck, we construct a comprehensive data synthesis pipeline to collect qualified paired-training videos. Extensive experiments show that our method substantially outperforms existing state-of-the-art approaches.
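As a purely conceptual sketch (this is not the MoCha implementation; it only illustrates the general idea of packing several conditions into one token stream with per-segment ids that a condition-aware positional embedding such as RoPE could key off; all names and shapes are made up):

```python
import torch

def pack_conditions(ref_tokens, mask_tokens, video_tokens):
    """Concatenate reference-image, first-frame-mask, and video tokens into a
    single sequence, and return a condition id per token (0/1/2) that a
    condition-aware positional embedding could treat differently."""
    stream = torch.cat([ref_tokens, mask_tokens, video_tokens], dim=0)
    cond_ids = torch.cat([
        torch.zeros(ref_tokens.shape[0], dtype=torch.long),          # 0 = reference image(s)
        torch.ones(mask_tokens.shape[0], dtype=torch.long),          # 1 = first-frame mask
        torch.full((video_tokens.shape[0],), 2, dtype=torch.long),   # 2 = video latents
    ])
    return stream, cond_ids

# Toy example: 4 reference tokens, 2 mask tokens, 16 video tokens, dim 64
stream, ids = pack_conditions(torch.randn(4, 64), torch.randn(2, 64), torch.randn(16, 64))
print(stream.shape, ids.shape)  # torch.Size([22, 64]) torch.Size([22])
```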
https://redd.it/1ogkacm
@rStableDiffusion
Introducing the Arca Gidan Prize, an art competition focused on open models. It's an excuse to push yourself and the models, and 4 winners get to fly to Hollywood to show their piece, sponsored by Comfy/Banodoco.
https://redd.it/1ogrpmt
@rStableDiffusion
Wan 2.2 - Why the "slow" motion?
Hi,
Every video I generate using Wan 2.2 has somewhat "slow" motion, which is an easy tell that the video is generated.
Is there a way to get faster movements that look more natural?
https://redd.it/1ogt5ug
@rStableDiffusion
Holy crap. For me, Chroma Radiance is like 10 times better than Qwen.
https://redd.it/1ogwi51
@rStableDiffusion