I tested 11 AI image detectors on 1000+ images including SD 3.5. Here are the results.
Just finished my largest test yet: **10 AI image detectors** tested on 1,000+ images, 10,000 checks in total.
# Key findings for Stable Diffusion users:
**The detectors that catch SD images best:**
|Detector|Overall Accuracy|False Positive Rate|
|:-|:-|:-|
|TruthScan|94.75%|0.80%|
|SightEngine|91.34%|1.20%|
|Was It AI|84.95%|7.97%|
|MyDetector|83.85%|5.50%|
**The detectors that struggle:**
|Detector|Overall Accuracy|Notes|
|:-|:-|:-|
|HF AI-image-detector|16.22%|Misses 75% of AI images|
|HF SDXL-detector|60.53%|Despite being trained for SDXL|
|Decopy|65.42%|Misses over 1/3 of AI content|
# The False Positive Problem
This is where it gets interesting for photographers and mixed-media artists:
* **Winston AI** flags **23.24%** of real photos as AI — nearly 1 in 4
* **AI or Not** flags **21.54%** — over 1 in 5
* **TruthScan** only flags **0.80%** — best in class
If you're using SD for art and worried about detection, know that:
1. The top detectors (TruthScan, SightEngine) will likely catch modern SD outputs
2. Some platforms use less accurate detectors — your mileage may vary
3. HuggingFace open-source detectors perform significantly worse than commercial ones
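For anyone who wants to reproduce this kind of benchmark, the two headline metrics come straight from a confusion matrix. A minimal sketch (the counts here are made up for illustration, not the test's actual data):

```python
# Minimal sketch of how detector accuracy and false-positive rate are
# computed from a labelled test set. Counts below are hypothetical.

def detector_metrics(tp, fp, tn, fn):
    """tp: AI images flagged as AI, fp: real photos wrongly flagged as AI,
    tn: real photos passed, fn: AI images missed."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total   # overall accuracy across both classes
    fpr = fp / (fp + tn)           # share of real photos wrongly flagged
    return accuracy, fpr

acc, fpr = detector_metrics(tp=470, fp=4, tn=496, fn=30)
print(f"accuracy={acc:.2%}, fpr={fpr:.2%}")  # accuracy=96.60%, fpr=0.80%
```

Note that a detector can score a decent overall accuracy while still having an unusable false-positive rate, which is exactly the Winston AI / AI or Not pattern below.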
Test your own images: [https://aidetectarena.com/check](https://aidetectarena.com/check) — runs all available detectors simultaneously
https://redd.it/1qz9qu4
@rStableDiffusion
How to make Anime AI Gifs/Videos using Stable Diffusion/ComfyUI?
Hello, is there anyone here who knows how to make anime AI GIFs using either Forge Web UI or ComfyUI in Stable Diffusion, and would be willing to sit down and go step by step with me? Literally every guide I have tried doesn't work and always gives a ton of errors. I would really appreciate it. I just don't know what to do anymore, and I know I need help.
https://redd.it/1qzfpz2
LTX-2 I2V. This one took me a few days to make properly. I kept trying T2V, and the model kept adding a phantom third person on the bike, missing limbs, and bodies fused with the bike; it was hilarious. I2V fixed it. Heart Mula was used for the song, Klein 9B for the image.
https://redd.it/1qzffop
Got tired of waiting for Qwen 2512 ControlNet support, so I made it myself! Feedback needed.
After waiting forever for native support, I decided to just build it myself.
Good news for Qwen 2512 fans: The Qwen-Image-2512-Fun-Controlnet-Union model now works with the default ControlNet nodes in ComfyUI.
No extra nodes. No custom nodes. Just load it and go.
I've submitted a PR to the main ComfyUI repo: https://github.com/Comfy-Org/ComfyUI/pull/12359
Those who love Qwen 2512 can now have a lot more creative freedom. Enjoy!
https://redd.it/1qzht5h
GitHub PR #12359 by krigeta: "Add working Qwen 2512 ControlNet (Fun ControlNet) support". The PR adds full support for the Qwen 2.5 Fun ControlNet format, enabling ControlNet functionality for Qwen image generation models.
Is PatientX ComfyUI Zluda removed? Is it permanent? Are there any alternatives?
https://redd.it/1qzd5ln
Z-image base: simple workflow for high quality realism + info & tips
# What is this?
This is almost a copy-paste of a post I made on Civitai (which explains the formatting).
Z-image base produces really, really realistic images. Aside from being creative & flexible the quality is also generally higher than the distils (as usual for non-distils), so it's worth using if you want really creative/flexible shots at the best possible quality. IMO it's the best model for realism out of the ones I've tried (Klein 9B base, Chroma, SDXL), especially because you can natively gen at high resolution.
This post is to share a simple starting workflow with good sampler/scheduler settings & resolutions pre-set for ease. There are also a bunch of tips for using Z-image base below and some general info you might find helpful.
The sampler settings are geared towards sharpness and clarity, but you can introduce grain and other defects through prompting.
You can grab the workflow from the Civitai link above or from here: pastebin
Here's a short album of example images, all of which were generated directly with this workflow with no further editing (SFW except for a couple of mild bikini shots): imgbb | g-drive
# Nodes & Models
Custom Nodes:
RES4LYF - A very popular set of samplers & schedulers, and some very helpful nodes. These are needed to get the best Z-image base outputs, IMO.
RGTHREE - (Optional) A popular set of helper nodes. If you don't want this you can just delete the seed generator and LoRA stacker nodes, then use the default Comfy LoRA nodes instead. RES4LYF comes with a seed generator node as well, I just like RGTHREE's more.
ComfyUI-GGUF - (Optional) Lets you load GGUF models, which for some reason ComfyUI still can't do natively. If you want to use a non-GGUF model you can just skip this, delete the UNET loader node and replace it with the normal 'Load Diffusion Model' node.
Models:
Z-image base GGUFs - BF16 recommended if you have 16GB+ VRAM. Q8 will just barely fit on 8GB VRAM if you know what you're doing. Q6_K will fit easily in 8GB. Avoid using FP8; the Q8 GGUF is better.
Qwen 3 4B Text Encoder - Grab the biggest one that fits in your VRAM. Some people say text encoder quality doesn't matter much & to use a lower sized one, but it absolutely does matter and can drastically affect quality. For the same reason, do not use an abliterated text encoder unless you've tested it and compared outputs to ensure the quality doesn't suffer.
Flux 1.0 VAE
# Info & Tips
## Sampler Settings
I've found that a two-stage sampler setup gives very good results for z-image base. The first stage does 95% of the work, and the second does a final little pass with a low noise scheduler to bring out fine details. It produces very clear, very realistic images and is particularly good at human skin.
CFG 4 works most of the time, but you can go up as high as CFG 7 to get different results.
Stage 1:

* Sampler: res_2s
* Scheduler: beta
* Steps: 22
* Denoise: 1.00

Stage 2:

* Sampler: res_2s
* Scheduler: normal
* Steps: 3
* Denoise: 0.15
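The two-stage setup above is easy to express as plain data if you're scripting your workflow. The settings are the post's; the validation helper and field names are just illustrative:

```python
# The post's two-stage sampler chain as plain config data.
# Stage 1 does the full denoise; stage 2 is a light detail pass.

STAGES = [
    {"sampler": "res_2s", "scheduler": "beta",   "steps": 22, "denoise": 1.00},
    {"sampler": "res_2s", "scheduler": "normal", "steps": 3,  "denoise": 0.15},
]

def validate_chain(stages):
    """Sanity-check a multi-stage chain: the first stage must denoise from
    scratch, and every refinement pass must use a partial denoise."""
    assert stages[0]["denoise"] == 1.00, "first stage should start from noise"
    assert all(0.0 < s["denoise"] < 1.0 for s in stages[1:]), \
        "refinement stages should only rework part of the image"
    return sum(s["steps"] for s in stages)  # total step budget

print(validate_chain(STAGES))  # -> 25
```

The second stage adds only 3 steps on top of 22, so the detail pass costs very little extra time relative to the quality it adds.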
## Resolutions
### High res generation
One of the best things about Z-image in general is that it can comfortably handle very high resolutions compared to other models. You can gen in high res and use an upscaler immediately without needing to do any other post-processing.
(info on upscalers + links to some good ones further below)
Note: high resolutions take a long time to gen. A 1280x1920 shot takes around ~95 seconds on an RTX 5090, and a 1680x1680 shot takes ~110 seconds.
### Different sizes & aspect ratios change the output
Different resolutions and aspect ratios can often drastically change the composition of images. If you're having trouble getting something ideal for a given prompt, try using a higher or lower resolution or changing the aspect ratio.
It will change the amount of detail in different areas of the image, make it more or less creative (depending on the topic), and will often change the lighting and other subtle features too.
I suggest generating in one big and one medium resolution whenever you're working on a concept, just to see if one of the sizes works better for it.
### Good resolutions
The workflow has a variety of pre-set resolutions that work very well. They're grouped by aspect ratio, and they're all divisible by 16. Z-image base (as with most image models) works best when dimensions are divisible by 16, and some models require it or else they mess up at the edges.
Here's a picture of the different resolutions if you don't want to download the workflow: imgbb | g-drive
You can go higher than 1920 to a side, but I haven't done it much so I'm not making any promises. Things do tend to get a bit weird when you go higher, but it is possible.
I do most of my generations at 1920 to a side, except for square images which I do at 1680x1680. I sometimes use a lower resolution if I like how it turns out more (e.g. the picture of the rat is 1680x1120).
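If you'd rather compute sizes than pick from the presets, the divisible-by-16 rule is trivial to enforce. A small helper (mine, not from the workflow) that snaps any target size to the nearest multiple of 16:

```python
# Snap arbitrary dimensions to the nearest multiple of 16, per the
# divisible-by-16 rule above, so the model doesn't misbehave at the edges.

def snap16(x: int) -> int:
    return max(16, round(x / 16) * 16)

def snap_resolution(w: int, h: int) -> tuple[int, int]:
    return snap16(w), snap16(h)

print(snap_resolution(1275, 1915))  # -> (1280, 1920)
```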
## Realism Negative Prompt
The negative prompt matters a lot with z-image base. I use the following to get consistently good realism shots:
> 3D, ai generated, semi realistic, illustrated, drawing, comic, digital painting, 3D model, blender, video game screenshot, screenshot, render, high-fidelity, smooth textures, CGI, masterpiece, text, writing, subscript, watermark, logo, blurry, low quality, jpeg, artifacts, grainy
## Prompt Structure
You essentially just want to write clear, simple descriptions of the things you want to see. Your first sentence should be a basic intro to the subject of the shot, along with the style. From there you should describe the key features of the subject, then key features of other things in the scene, then the background. Then you can finish with compositional info, lighting & any other meta information about the shot.
Use new lines to separate key parts out to make it easier for you to read & build the prompt. The model doesn't care about new lines, they're just for you.
If something doesn't matter to you, don't include it. You don't need to specify the lighting if it doesn't matter, you don't need to precisely say how someone is posed, etc; just write what matters to you and slowly build the prompt out with more detail as needed.
You don't need to include parts that are implied by your negative prompt. If you're using the realism negative prompt I mentioned earlier, you don't usually need to specify that it's a photograph.
Your structure should look something like this (just an example, it's flexible):
> A <style> shot of a <subject + basic description> doing <something>. The <subject> has <more detail>. The subject is <more info>. There is a <something else important> in <location>. The <something else> is <more detail>.
>
> The background is a <location>. The scene is <lit in some way>. The composition frames <something> and <something> from <an angle or photography term or whatever>.
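The structure above also lends itself to a tiny prompt builder if you generate programmatically. The field names here are mine, purely illustrative; the model only sees the final joined text:

```python
# Illustrative builder for the prompt structure described above.
# Newlines are only for human readability; the model ignores them.

def build_prompt(subject, action, style="", details=(), background="",
                 lighting="", composition=""):
    parts = [f"A {style + ' ' if style else ''}shot of {subject} {action}."]
    parts += [d if d.endswith(".") else d + "." for d in details]
    if background:
        parts.append(f"The background is {background}.")
    if lighting:
        parts.append(f"The scene is {lighting}.")
    if composition:
        parts.append(f"The composition {composition}.")
    return "\n".join(parts)

print(build_prompt(
    "a woman", "performing a ballet routine",
    details=["She's wearing a ballet outfit", "She's in a dynamic pose"],
    background="a concert hall stage",
    lighting="lit dramatically",
))
```

Leaving a field empty simply omits that sentence, matching the advice to only describe what actually matters to you.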
Following that structure, here are a couple of the prompts for the images attached to this post. You can check the rest out by clicking on the images in Civitai, or just ask me for them in the comments.
The ballet woman
> A shot of a woman performing a ballet routine. She's wearing a ballet outfit and has a serious expression. She's in a dynamic pose.
>
> The scene is set in a concert hall. The composition is a close up that frames her head down to her knees. The scene is lit dramatically, with dark shadows and a single shaft of light illuminating the woman from above.
The rat on the fence post
> A close up shot of a large, brown rat eating a berry. The rat is on a rickety wooden fence post. The background is an open farm field.
The woman in the water
> A surreal shot of a beautiful woman suspended half in water and half in air. She has a dynamic pose, her eyes are closed, and the shot is full body. The shot is split diagonally down the middle, with the lower-left being under water and the upper-right being in air. The air side is bright and cloudy, while the water side is dark and menacing.
The space capsule
> A woman is floating in a space capsule. She's wearing a white singlet and white panties. She's off-center, with the camera focused on a window with an external view of earth from space. The interior of the space capsule is dark.
## Upscaling
Z-image makes very sharp images, which means you can directly upscale them very easily. Conventional upscale models rely on sharp/clear images to add detail, so you can't reliably use them on a model that doesn't make sharp images.
My favourite upscaler for NAKED PEOPLE or human face close-ups is 4xFaceUp. It's ridiculously good at skin detail, but has a tendency to make everything else look a bit stringy (for lack of a better word). Use it when a human being showing lots of skin is the main focus of the shot.
Here's a 6720x6720 version of the sitting bikini girl that was upscaled directly using the 4xFaceUp upscaler: imgbb | g-drive
For general upscaling you can use something like 4xNomos2.
Alternatively, you can use SeedVR2, which also has the benefit of working on blurry images (not a problem with z-image anyway). It's not as good at human skin as 4xFaceUp, but it's better at everything else. It's also very reliable and pretty much always works. There's a simple workflow for it here: https://pastebin.com/9D7sjk3z
## ClownShark sampler - what is it?
It's a node from the RES4LYF pack. It works the same as a normal sampler, but with two differences:
1. "ETA". This setting basically adds extra noise during sampling using fancy math, and it generally helps get a little bit more detail out of generations. A value of 0.5 is usually good, but I've seen it be good up to 0.7 for certain models (like Klein 9B).
2. "bongmath". This setting turns on bongmath. It's some kind of black magic that improves sampling results without any downsides. On some models it makes a big difference, others not so much. I find it does improve Z-image outputs. Someone tries to explain what it is here: https://www.reddit.com/r/StableDiffusion/comments/1l5uh4d/someone_needs_to_explain_bongmath/
You don't need to use this sampler if you don't want to; you can use the res_2s/beta sampler/scheduler with a normal ksampler node as long as you have RES4LYF installed. But seeing as the clownshark sampler comes with RES4LYF anyway we may as well use it.
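To make the ETA setting less mysterious: in ancestral/SDE-style samplers, eta controls how much of each step's noise reduction is replaced with *fresh* noise. Here's a toy sketch of that idea using the standard ancestral step split (this is not RES4LYF's actual update rule, just the general mechanism):

```python
import random

# Toy illustration of an "eta"-style ancestral step: the target noise
# level sigma_to is split into a deterministic part (sigma_down) and a
# stochastic part (sigma_up) controlled by eta. eta=0 is fully
# deterministic; higher eta injects more fresh noise per step.

def ancestral_step(x, denoised, sigma_from, sigma_to, eta, rng):
    sigma_up = min(sigma_to, eta * (sigma_to**2 * (sigma_from**2 - sigma_to**2)
                                    / sigma_from**2) ** 0.5)
    sigma_down = (sigma_to**2 - sigma_up**2) ** 0.5
    d = (x - denoised) / sigma_from            # current noise direction
    x = denoised + d * sigma_down              # deterministic move
    return x + rng.gauss(0.0, 1.0) * sigma_up  # fresh noise injection

rng = random.Random(0)
out = ancestral_step(x=1.0, denoised=0.2, sigma_from=1.0, sigma_to=0.5,
                     eta=0.5, rng=rng)
```

The extra noise is why moderate eta values (0.5 or so) tend to surface a bit more fine detail: each step gets something new to denoise.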
## Effect of CFG on outputs
CFG lower than 4 is bad. Beyond that, going higher has fairly big and unpredictable effects on Z-image base's output. You can usually range from 4 to 7 without destroying your image. It doesn't seem to affect prompt adherence much.
Going higher than 4 will change the lighting, composition and style of images somewhat unpredictably, so it can be helpful to do if you just want to see different variations on a concept. You'll find that some stuff just works better at 5, 6 or 7. Play around with it, but stick with 4 when you're just messing around.
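For reference, the CFG value is the scale in the standard classifier-free guidance combination, which is why pushing it higher exaggerates whatever separates the prompted prediction from the unconditional one (the scalar values here stand in for the model's actual tensor predictions):

```python
# Standard classifier-free guidance: the CFG value scales how far the
# output is pushed along the (cond - uncond) direction.

def apply_cfg(uncond: float, cond: float, cfg: float) -> float:
    # cfg = 1 returns the conditional prediction unchanged;
    # higher values amplify the difference from the unconditional one.
    return uncond + cfg * (cond - uncond)

assert apply_cfg(0.0, 1.0, 4.0) == 4.0  # cfg 4 amplifies the difference
assert apply_cfg(0.5, 0.5, 7.0) == 0.5  # no difference -> cfg has no effect
```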
Going higher than 4 also helps the model adhere to realism sometimes, which is handy if you're doing something realism-adjacent like trying to make a shot of a