r/StableDiffusion – Telegram
single shaft of light illuminating the woman from above.

The rat on the fence post

> A close up shot of a large, brown rat eating a berry. The rat is on a rickety wooden fence post. The background is an open farm field.

The woman in the water

> A surreal shot of a beautiful woman suspended half in water and half in air. She has a dynamic pose, her eyes are closed, and the shot is full body. The shot is split diagonally down the middle, with the lower-left being under water and the upper-right being in air. The air side is bright and cloudy, while the water side is dark and menacing.

The space capsule

> A woman is floating in a space capsule. She's wearing a white singlet and white panties. She's off-center, with the camera focused on a window with an external view of earth from space. The interior of the space capsule is dark.

## Upscaling

Z-image makes very sharp images, which means you can upscale them directly very easily. Conventional upscale models rely on sharp, clear input to add detail, so you can't reliably use them on a model that doesn't produce sharp images.

My favourite upscaler for NAKED PEOPLE or human face close-ups is 4xFaceUp. It's ridiculously good at skin detail, but has a tendency to make everything else look a bit stringy (for lack of a better word). Use it when a human being showing lots of skin is the main focus of the shot.

Here's a 6720x6720 version of the sitting bikini girl that was upscaled directly using the 4xFaceUp upscaler: imgbb | g-drive

For general upscaling you can use something like 4xNomos2.

Alternatively, you can use SeedVR2, which also has the benefit of working on blurry images (not a problem with z-image anyway). It's not as good at human skin as 4xFaceUp, but it's better at everything else. It's also very reliable and pretty much always works. There's a simple workflow for it here: https://pastebin.com/9D7sjk3z
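For very large outputs like the 6720x6720 example above, upscalers are normally run tile-by-tile so the whole image never has to sit in VRAM at once. Here's a rough numpy sketch of that idea — the `nearest_4x` stand-in is just a placeholder for a real model's forward pass (4xFaceUp, 4xNomos2, etc.), and the exact padding/blending real nodes use will differ:

```python
import numpy as np

def nearest_4x(tile):
    """Stand-in 'model': plain 4x nearest-neighbour upscale.
    Swap in a real upscaler forward pass here."""
    return tile.repeat(4, axis=0).repeat(4, axis=1)

def upscale_tiled(img, model_fn, scale=4, tile=128, pad=8):
    """Upscale a (H, W, C) image tile-by-tile to limit peak memory.
    Each tile is read with a `pad`-pixel context border (to suppress
    seam artifacts), then that border is cropped off after upscaling."""
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            # tile plus context border, clamped at the image edges
            y0, y1 = max(y - pad, 0), min(y + tile + pad, h)
            x0, x1 = max(x - pad, 0), min(x + tile + pad, w)
            big = model_fn(img[y0:y1, x0:x1])
            # crop the border back off, in upscaled coordinates
            oy, ox = (y - y0) * scale, (x - x0) * scale
            th, tw = min(tile, h - y) * scale, min(tile, w - x) * scale
            out[y * scale:y * scale + th,
                x * scale:x * scale + tw] = big[oy:oy + th, ox:ox + tw]
    return out
```

With a purely local upscaler like nearest-neighbour the tiled result matches the full-image result exactly; with a real CNN, the `pad` border is what keeps tile seams from showing.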

## ClownShark sampler - what is it?

It's a node from the RES4LYF pack. It works the same as a normal sampler, but with two differences:

1. "ETA". This setting basically adds extra noise during sampling using fancy math, and it generally helps get a little bit more detail out of generations. A value of 0.5 is usually good, but I've seen it be good up to 0.7 for certain models (like Klein 9B).
2. "bongmath". This setting turns on bongmath. It's some kind of black magic that improves sampling results without any downsides. On some models it makes a big difference, on others not so much. I find it does improve z-image outputs. Someone tries to explain what it is here: https://www.reddit.com/r/StableDiffusion/comments/1l5uh4d/someone_needs_to_explain_bongmath/

You don't need to use this sampler if you don't want to; you can use the res_2s/beta sampler/scheduler with a normal ksampler node as long as you have RES4LYF installed. But seeing as the clownshark sampler comes with RES4LYF anyway we may as well use it.
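To give a feel for what an ETA knob does in general (this is the textbook DDIM/ancestral formulation, not RES4LYF's actual, fancier math): each step's move toward the next noise level gets split into a deterministic part and a fresh-noise part, and eta controls the split.

```python
import numpy as np

def ddim_step(x, denoised, sigma, sigma_next, eta=0.0, rng=None):
    """One DDIM-style step from noise level `sigma` down to `sigma_next`.
    eta=0 -> fully deterministic; eta=1 -> fully ancestral (maximum
    fresh noise injected each step). In-between values trade determinism
    for extra texture -- the rough idea behind a sampler's ETA setting."""
    rng = rng or np.random.default_rng(0)
    # split sigma_next into a deterministic part and a fresh-noise part
    sigma_up = eta * np.sqrt(
        sigma_next**2 * (sigma**2 - sigma_next**2) / sigma**2)
    sigma_down = np.sqrt(sigma_next**2 - sigma_up**2)
    d = (x - denoised) / sigma          # estimated noise direction
    x_next = denoised + sigma_down * d  # deterministic move
    if eta > 0:
        x_next = x_next + sigma_up * rng.standard_normal(x.shape)
    return x_next
```

At eta=0 you get the same latent every step; raising it re-injects noise that the later steps then get to re-interpret, which is where the "little bit more detail" comes from.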

## Effect of CFG on outputs

CFG lower than 4 is bad. Other than that, going higher has pretty big and unpredictable effects on the output for z-image base. You can usually range from 4 to 7 without destroying your image. It doesn't seem to affect prompt adherence much.

Going higher than 4 will change the lighting, composition, and style of images somewhat unpredictably, so it can be helpful if you just want to see different variations on a concept. You'll find that some stuff just works better at 5, 6, or 7. Play around with it, but default to 4 when you're just messing around.

Going higher than 4 also helps the model adhere to realism sometimes, which is handy if you're doing something realism-adjacent like trying to make a shot of a realistic elf or something.
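For reference, CFG is just a linear extrapolation between the model's unconditional and prompt-conditioned predictions — a one-liner, which is why cranking it exaggerates whatever the prompt was pulling for:

```python
import numpy as np

def apply_cfg(cond, uncond, cfg=4.0):
    """Classifier-free guidance: push the prediction away from the
    unconditional output and toward the prompt-conditioned one.
    cfg=1 returns `cond` unchanged; higher values exaggerate the
    prompt's influence (hence the lighting/composition/style shifts)."""
    return uncond + cfg * (cond - uncond)
```

At cfg=4 the difference between the two predictions is amplified fourfold, so anything the prompt nudges (contrast, realism, composition) gets pushed harder.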

## Base vs Distil vs Turbo

They're good for different things. I'm generally a fan of base models, so most workflows I post are / will be for base models. Generally they give the highest quality but are much slower and can be finicky to use at times.

What is distillation?

It's basically a method of narrowing a model's focus so that it converges on what you want faster and more consistently. This allows a distil to generate images in fewer steps and more consistently for whatever subject/topic was chosen. They often also come pre-negatived (in a sense, don't @ me) so that you can use 1.0 CFG and no negative prompt. Distils can be full models or simple LoRAs.

The downside of this is that the model becomes more narrow, making it less creative and less capable outside of the areas it was focused on during distillation. For many models it also reduces the quality of image outputs, sometimes massively. Models like Qwen and Flux have god-awful quality when distilled (especially human skin), but luckily Z-image distils pretty well and only loses a little bit of quality. Generally, the fewer steps the distil needs the lower the quality is. 4-step distils usually have very poor quality compared to base, while 8+ step distils are usually much more balanced.
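A loose analogy for why step count matters (this is not how distillation actually works — distillation trains the model to compensate for coarse steps, which is why the quality only partly drops): sampling walks a trajectory, and coarser steps accumulate more integration error, just like forward Euler on a toy ODE.

```python
import numpy as np

def euler_solve(x0, steps, t_end=1.0):
    """Integrate dx/dt = -x from t=0 to t_end with forward Euler,
    standing in for a sampler walking the denoising trajectory."""
    x, dt = x0, t_end / steps
    for _ in range(steps):
        x = x + dt * (-x)
    return x

exact = np.exp(-1.0)  # true solution x(1) for x0 = 1
# fewer steps -> larger discretisation error, mirroring why 4-step
# distils tend to lose more quality than 8+ step ones
err = {n: abs(euler_solve(1.0, n) - exact) for n in (4, 8, 32)}
```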

Z-image turbo is just an official distil, and it's focused on general realism and human-centric shots. It's also designed to run in around 10 steps, allowing it to maintain pretty high quality.

So, if you're just doing human-centric shots and don't mind a small quality drop, Z-image turbo will work just fine for you. You'll want to use a different workflow though - let me know if you'd like me to upload mine.

Below are the typical pros and cons of base models and distils. These are pretty much always true, but not always a 'big deal' depending on the model. As I said above, Z-image distils pretty well so it's not too bad, but be careful which one you use - tons of distils are terrible at human skin and make people look plastic (z-image turbo is fine).

Base model pros:

- Generally gives the highest quality outputs with the finest details, once you get the hang of it
- Creative and flexible

Base model cons:

- Very slow
- Usually requires a lengthy negative prompt to get good results
- Creativity has a downside: you'll often need to generate something several times to get a result you like
- More prone to mistakes in the areas distils were focused on, e.g. z-image base is more likely to mess up hands/fingers or distant faces than z-image turbo

Distil pros:

- Fast generations
- Good at whatever it was focused on (e.g. people-centric photography for z-image turbo)
- Doesn't need a negative prompt (usually)

Distil cons:

- Bad at whatever it wasn't focused on, compared to base
- Usually bad at facial expressions (not able to do 'extreme' ones like anger properly)
- Generally less creative, less flexible (not always a downside)
- Lower quality images, sometimes by a lot and sometimes only by a little, depending on the model, the specific distil, and the subject matter
- Can't have a negative prompt (usually), though you can get access to negative prompts using NAG (not covered in this post)

https://redd.it/1qzncrz
@rStableDiffusion
What is up with the "plastic mouths" that LTX-2 generates when using i2v with your own audio? Info in comments.

https://redd.it/1qzqgbm
What’s the new model: Hype or real?
https://redd.it/1qzv46s
Ace Step 1.5 could open up a booming market for huge, comprehensive music LoRAs

I'm still settling into my initial Ace Step 1.5 setup, but I'm getting some pretty high-quality sound out of the software as I gain familiarity with the parameters and prompting conventions. All that's missing to bring Ace Step much closer to Udio is a huge database of the music we already like.

Personally, I'm not talking about - nor do I have any interest in - "ethical" databases.

I would be delighted to pay for well-trained LoRAs. I don't know how big Ace Step LoRAs can be or how many songs they can hold, or whether they can be combined, but I'm eager to find out more about that stuff. I'm not even sure yet how to load/implement LoRAs, but I'll figure it out.

It seems that training music LoRAs might be a bit more involved than training AI image LoRAs, so I don't know if I should expect to see a CivitAI-style gallery of frequent releases such as the huge & still growing collections of SDXL models and LoRAs.

Anyway, I'm really looking forward to what the community produces. I haven't been this excited about Music AI since I discovered Udio almost two years ago.

https://redd.it/1qzutkp