My 4 stage upscale workflow to squeeze every drop from Z-Image Turbo
Workflow: [https://pastebin.com/b0FDBTGn](https://pastebin.com/b0FDBTGn)
ChatGPT Custom Instructions: [https://pastebin.com/qmeTgwt9](https://pastebin.com/qmeTgwt9)
I made [this](https://www.reddit.com/r/StableDiffusion/comments/1p80j9x/comment/nr1jak5/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) comment on a separate thread a couple of days ago and noticed that some of you were interested in learning more details.
What I basically did is this (and before I continue, I must admit that this is not my idea. I have been doing this since SD 1.5 and I don't remember where I borrowed the original idea from):
* Generate at a very low resolution, small enough to let the model draw an outline, then do a massive latent upscale with 0.7 denoise (see the sketch right after these two points)
* This adds a ton of detail, a sharper image and the best quality (almost close to the *I can jerk off to my own generated image* level)
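If it helps to see the idea outside of ComfyUI, here is a minimal diffusers-style sketch of that two-step trick. It is not my actual graph: my workflow does the upscale on the latent inside ComfyUI, and the model id, resolutions and step counts below are placeholders, not the values from my workflow.

```python
# Rough sketch of "generate tiny, then upscale and re-denoise at ~0.7".
# Model id, resolutions and step counts are placeholders, not my workflow's values.
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

prompt = "wide shot of a misty pine forest at dawn, a lone hiker on a ridge"

txt2img = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Step 1: tiny generation - just enough for the model to rough out an outline
small = txt2img(prompt, width=256, height=320, num_inference_steps=20).images[0]

# Step 2: upscale the rough image and re-denoise at 0.7 so the model redraws the details
# (ComfyUI does this upscale on the latent; resizing the decoded image is the crude equivalent)
img2img = AutoPipelineForImage2Image.from_pipe(txt2img)
final = img2img(
    prompt,
    image=small.resize((1024, 1280)),
    strength=0.7,                # the "0.7 denoise" from the old workflow
    num_inference_steps=30,
).images[0]
final.save("upscaled.png")
```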
I already shared that workflow with others in that same thread. I was reading through the comments and ideas that others shared here and decided to double down on this approach.
New and improved workflow:
* The one I am posting here is a 4-stage workflow. It starts by generating an image at 64x80 resolution (a rough sketch of the full stage schedule follows this list)
* Stage 1: The magic starts here. We use a very low shift value to give the model some breathing space and let it be creative - we don't want it to follow our prompt strictly yet
* Stage 2: A high shift value so it follows our prompt and draws the composition. This is where it gets interesting: what you see here is roughly what your final image (from Stage 4) will look like, at least a 90% resemblance. So you can stop here if you don't like the composition. It barely takes a couple of seconds
* Stage 3: If you are satisfied with the composition, run stage 3. This is where we add details. We use a low shift value to give the model some breathing space. The composition will not change much because the denoise value is low
* Stage 4: If you are happy with where the model is heading in terms of composition, lighting, etc., run this stage and get the final image. Here we use a shift value of 7
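Here is a data-only sketch of how the four stages line up. Only the numbers stated above come from this post (the 64x80 start, shift 7 and 1456x1840 at the end, and CFG > 1 for stages 1 to 3); every other value is an illustrative placeholder, so pull the real ones from the pastebin workflow.

```python
# Stage schedule sketch. Values marked "placeholder" are NOT from the actual
# workflow; only the 64x80 start, the shift-7 / 1456x1840 final stage and
# "CFG > 1 for stages 1-3" are stated in the post.
STAGES = [
    dict(stage=1, size=(64, 80),     shift=1.5, denoise=1.00, cfg=3.0),  # low shift (placeholder values): let the model be creative
    dict(stage=2, size=(448, 560),   shift=6.0, denoise=0.70, cfg=3.0),  # high shift (placeholder): lock in the composition; bail out here if you dislike it
    dict(stage=3, size=(896, 1120),  shift=2.0, denoise=0.55, cfg=3.0),  # low shift (placeholder): add detail without moving the composition
    dict(stage=4, size=(1456, 1840), shift=7.0, denoise=0.45, cfg=1.0),  # shift 7 is stated; the denoise and CFG=1 here are my assumption
]

for s in STAGES:
    w, h = s["size"]
    print(f"stage {s['stage']}: {w}x{h}  shift={s['shift']}  denoise={s['denoise']}  cfg={s['cfg']}")
```

The point is the shape of the schedule, not the exact numbers: resolution goes up each stage, shift dips low whenever we want creativity and goes high when we want the prompt followed, and denoise drops so the later stages refine instead of repainting.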
What about CFG?
* Stages 1 to 3 use CFG > 1. I also included an *ahem, very large* negative prompt in my workflow. It works for me and it does make a difference
Is it slow?
* Nope. The whole process (stages 1 to 4) still finishes in 1 minute, 1 minute 10 seconds at most (on my 4060 Ti), and you are greeted with a 1456x1840 image. You will not lose speed, and you have the flexibility to bail out early if you don't like the composition
Seed variety?
* You get good seed variety with this workflow because stage 1 forces the model to generate something random while still following your prompt. It will not generate the same 64x80 image every time, and combined with the low denoise values in each later stage, that gives you good variation
Important things to remember:
* Please do not use shift 7 for everything. You will kill the model's creativity and get the same boring image on every single seed. Let it breathe. Experiment with different values
* The 2nd pastebin link has the ChatGPT instructions I use to get prompts (use GPT-4o; GPT-5 refuses to name the subjects, at least in my case).
* You can use them or not. Either way, the important thing is that the first few keywords in your prompt should briefly describe the whole scene. Why? Because we are generating at a very low resolution, we want the model to draw an outline first. If you describe it like *"oh there is a tree, it's green, the climate is cool, bla bla bla, there is a man"*, the low-res generation will give you a tree haha (see the example right after this list)
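For example, a hypothetical prompt ordered the way I mean (scene summary up front, decorations later), not taken from my instructions file:

```
a man standing under a large green tree in a cool, misty park, wide shot,
soft morning light, detailed bark texture, dew on the grass
```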
If you have issues working with this workflow, just comment and I will assist. Feedback is welcome. Enjoy