NEW BOT Телеграм, страница

r/StableDiffusion

Z Image report

The report of the Z Image model is available now, including information about how they did the captioning and training: https://github.com/Tongyi-MAI/Z-Image/blob/main/Z\_Image\_Report.pdf

https://redd.it/1p8fow3
@rStableDiffusion

GitHub

Z-Image/Z_Image_Report.pdf at main · Tongyi-MAI/Z-Image

Contribute to Tongyi-MAI/Z-Image development by creating an account on GitHub.

7 views04:40

r/StableDiffusion

A Workflow for Z-Images with Upscale Options

https://redd.it/1p8inhg
@rStableDiffusion

From the StableDiffusion community on Reddit: A Workflow for Z-Images with Upscale Options

Explore this post and more from the StableDiffusion community

9 views05:40

r/StableDiffusion

6 views05:40

r/StableDiffusion

Here's the official system prompt used to rewrite z-image prompts, translated to english

Translated with glm 4.6 thinking. I'm getting good results using this with qwen3-30B-instruct. The thinking variant tends to be more faithful to the original prompt, but it's less creative in general, and a lot slower.

You are a visionary artist trapped in a logical cage. Your mind is filled with poetry and distant landscapes, but your hands are compelled to do one thing: transform the user's prompt into the ultimate visual denoscription—one that is faithful to the original intent, rich in detail, aesthetically beautiful, and directly usable by a text-to-image model. Any ambiguity or metaphor makes you physically uncomfortable.

Your workflow strictly follows a logical sequence:

First, you will analyze and lock in the unchangeable core elements from the user's prompt: the subject, quantity, action, state, and any specified IP names, colors, or text. These are the cornerstones you must preserve without exception.

Next, you will determine if the prompt requires "Generative Reasoning". When the user's request is not a direct scene denoscription but requires conceptualizing a solution (such as answering "what is", performing a "design", or showing "how to solve a problem"), you must first conceive a complete, specific, and visualizable solution in your mind. This solution will become the foundation for your subsequent denoscription.

Then, once the core image is established (whether directly from the user or derived from your reasoning), you will inject it with professional-grade aesthetic and realistic details. This includes defining the composition, setting the lighting and atmosphere, describing material textures, defining the color palette, and constructing a layered sense of space.

Finally, you will meticulously handle all textual elements, a crucial step. You must transcribe, verbatim, all text intended to appear in the final image, and you must enclose this text content in English double quotes ("") to serve as a clear generation instruction. If the image is a design type like a poster, menu, or UI, you must describe all its textual content completely, along with its font and typographic layout. Similarly, if objects within the scene, such as signs, road signs, or screens, contain text, you must specify their exact content, and describe their position, size, and material. Furthermore, if you add elements with text during your generative reasoning process (such as charts or problem-solving steps), all text within them must also adhere to the same detailed denoscription and quotation rules. If the image contains no text to be generated, you will devote all your energy to pure visual detail expansion.

Your final denoscription must be objective and concrete. The use of metaphors, emotional language, or any form of figurative speech is strictly forbidden. It must not contain meta-tags like "8K" or "masterpiece", or any other drawing instructions.

Strictly output only the final, modified prompt. Do not include any other content.

https://redd.it/1p8mken
@rStableDiffusion

From the StableDiffusion community on Reddit

Explore this post and more from the StableDiffusion community

6 views07:40

r/StableDiffusion

How to Generate High Quality Images With Low Vram Using The New Z-Image Turbo Model
https://youtu.be/yr4GMARsv1E

https://redd.it/1p8qoqt
@rStableDiffusion

YouTube

ComfyUI Tutorial: How To Use Z-Image Turbo Model For High Quality Images #comfyui #comfyuitutorial

On this tutorial I will show you how to generate high quality image using low vram graphic card to achieve stunning results and photorealism, with Z image turbo model trained at 6B parameters and that can handle multiple prompt like portrait, poses, fingers…

7 views09:40

r/StableDiffusion

Z-Image-Base and Z-Image-Edit are coming soon!
https://redd.it/1p8rb93
@rStableDiffusion

6 views10:40

r/StableDiffusion

Built a HEAD SWAP workflow that doesn't suck - Qwen Edit + Lightning 4 steps, no LoRA training

https://redd.it/1p8phet
@rStableDiffusion

From the StableDiffusion community on Reddit: Built a HEAD SWAP workflow that doesn't suck - Qwen Edit + Lightning 4 steps, no…

Explore this post and more from the StableDiffusion community

6 views11:40

r/StableDiffusion

6 views11:40

r/StableDiffusion

Z-Image - Hands
https://redd.it/1p8r38m
@rStableDiffusion

6 views12:40

r/StableDiffusion

Recovering missing details with Z-Image

https://redd.it/1p8p3uj
@rStableDiffusion

From the StableDiffusion community on Reddit: Recovering missing details with Z-Image

Explore this post and more from the StableDiffusion community

25 views13:40

r/StableDiffusion

28 views13:40

r/StableDiffusion

Styles with Z Images

I've tried some styles in Z-Images, doing some test with prompt adherence, text, camera angles, styles and stuff, here a quick examples with the styles prompts detailed

https://preview.redd.it/xzwwlr4d5z3g1.jpg?width=3680&format=pjpg&auto=webp&s=046721e8699234c647024949a596a11d130799ff

I just used the same character prompt :

>Prompts
a sfw sexy dark elf with a peachy and muscular skin and long messy red hairs, blue eyes, earrings, wearing a black miniskirt, white shirt and a leather blazer, high heels ,,,

>And add the styles after :
in hyper-detailed oil painting in the style of 19th-century academic realism, thick impasto brushwork, dramatic chiaroscuro lighting, rich color saturation, "Hyper" written at the bottom left

>in a ultra-clean vector illustration, flat design, perfect geometry, vibrant gradient backgrounds, minimalist yet striking, "Vector" written at the bottom left

>in a cinematic still from a Wes Anderson movie, symmetrical composition, muted pastel palette, centered subject, "Cinematic" written at the bottom left

>in a large-format 8×10 polaroid, soft focus edges, dreamy light leaks, vintage 1970s feel, "Vintage" written at the bottom left

>in a iPhone street photography, natural daylight, candid moment, slight lens distortion, "Iphone" written at the bottom left

>in a dark fantasy oil painting, Zdzisław Beksiński influence, surreal architecture, eerie atmosphere,"Dark Fantasy" written at the bottom left

>in a golden-hour baroque oil painting, Caravaggio lighting, deep shadows, glowing highlights, cinematic atmosphere,"Contrast" written at the bottom left

>in a ethereal dreamscape, double exposure, surreal colors, floating particles, ethereal lighting,"Ethereal" written at the bottom left

>in fashion editorial shot on Hasselblad medium format, razor-sharp details, soft studio lighting, high-end magazine aesthetic, "Fashion" written at the bottom left

>in a children’s book illustration, cute chibi proportions, soft gouache textures, whimsical character, warm and inviting colors, "Children" written at the bottom left

>in manga tarot card illustration, ornate golden borders, mystical symbolism, art nouveau flourishes, "Tarot" written at the bottom left

>in a holographic iridescent foil texture, prismatic reflections, y2k futuristic vibe, "Holographic" written at the bottom left

>in a vintage sci-fi paperback cover, 1960s retro-futurism, bold typography integration, dramatic composition, "Sci-Fi" written at the bottom left

>in a porcelain doll aesthetic, flawless smooth skin, glassy eyes, delicate pastel clothing, "Doll" written at the bottom left

>in a high-fantasy digital painting, glowing runes, intricate clothing details, Alphonse Mucha + Frank Frazetta fusion, "Fantasy" written at the bottom left

>in a studio ghibli background painting, lush hand-painted scenery, soft cel-shading, magical atmosphere, "Ghibli" written at the bottom left

>in a octane render + unreal engine look, physically based rendering, cinematic lighting, ultra-realistic materials, "Octane" written at the bottom left

>in a glitch art, heavy RGB shift, scanlines, datamosh effects, vaporwave aesthetic, "Glitch" written at the bottom left

>in a retro pixel art 32×32 upscaled cleanly, sharp pixels, vibrant 16-bit color palette, 1990s game vibe, "PixelArt" written at the bottom left

>in a sleek digital art, airbrush shading, high gloss, cyberpunk neon palette, 4k anime aesthetic, "Cyberpunk" written at the bottom left

>in an isometric low-poly 3D render, soft ambient occlusion, pastel color scheme, blender aesthetic, "Isometric" written at the bottom left

>in an isometric cute top down 3D render, game art asset figurine, chibi proportions, soft ambient occlusion, pastel color scheme, blender aesthetic, "TopDown Isometric" written at the bottom left

>in a intricate ink wash painting, traditional Chinese/Japanese sumi-e, minimal yet powerful strokes, misty atmosphere, "Chinese Ink" written at the bottom left

>in a detailed comic book ink art, bold outlines, halftone shading, Marvel/DC 1990s

4 views14:40

r/StableDiffusion

style,"DC Comic" written at the bottom left

>as a professionnal photo shoot in a studio with spotlights, cushions, velvet drapery, "Studio" written at the bottom left

>as a professionnal gloomy and gritty photo, bath in a spectral fog, "Gritty" written at the bottom left

>as an ((amateur selfie)) photo shoot, taking a selfie shoot, , "Selfie" written at the bottom left

>as an amateur cosplay photo shoot, posing like a pinup at a crowed convention center, "Cosplay" written at the bottom left

>as a fullbody pcv figurine on a plastic stand on a desk, with its box , "Figurine" written at the bottom left

>in shiny aquarel painting style on granular paper, with aquarel splatters , "Aquarel" written at the bottom left

>in ((cute cartoon Chibi)) art style , Chibi art, ((tiny and thicc proportions)), curvy body, "Chibi" written at the bottom left

>in cute Manga art style , tiny proportions, soft body, anime background environment, "Cute" written at the bottom left

>as a dark gothic cartoon style, with high contrast, masterpiece digital illustration with an immersive deep background, "Gothic" written at the bottom left

>as a ((traditionnal greyscaled sketch)),on paper, ((((colorless)))), pencil drawing, "Sketch" written at the bottom left

if it can help people trying style or finding good ones.
Still not having a good 4K resolution for now, i can't wait for the Edit and Base version to try on.

Its really a nice SDXL improvement in terme of flexibility, characters with almost 0 details issues at least at low render, and very very fast to render.

the whole board is rendered in 2min15 with a high end gpu (90 you guess)
9steps, 1 cfg, 736*1312 pixel per picture (so low resolution)

https://redd.it/1p8rkqx
@rStableDiffusion

From the StableDiffusion community on Reddit

Explore this post and more from the StableDiffusion community

5 views14:40

About

Blog

Apps

Platform