Introduction to Stable Diffusion XL 0.9
Improving Latent Diffusion Models for High-Resolution Image Synthesis
By reading this article, you will learn to generate high-resolution images using the new Stable Diffusion XL 0.9 architecture.
Note that this tutorial is based on the diffusers package rather than the original implementation.
SDXL is a new, pre-release latent diffusion model created by StabilityAI. Compared to previous models (SD1.5, SD2.1, etc.), SDXL 0.9 has the following characteristics:
- uses a UNet backbone three times larger (more attention blocks)
- has a second text encoder and tokenizer
- is trained on multiple aspect ratios
- has a refinement model that improves visual fidelity (post-hoc image-to-image)
- works on a 128 x 128 latent image and produces a final image resolution of 1024 x 1024
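The relationship between the 128 x 128 latent and the 1024 x 1024 output can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the autoencoder downsamples each spatial dimension by a factor of 8 and uses 4 latent channels, as in earlier Stable Diffusion releases. The names VAE_SCALE_FACTOR, LATENT_CHANNELS, and latent_shape are hypothetical helpers, not part of diffusers.

```python
# Assumption: the VAE shrinks height and width by 8 and emits 4 latent
# channels, as in earlier Stable Diffusion models. These constants and the
# helper below are illustrative, not part of the diffusers API.
VAE_SCALE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(height, width, batch_size=1):
    """Return the (batch, channels, height, width) shape of the latent tensor."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (
        batch_size,
        LATENT_CHANNELS,
        height // VAE_SCALE_FACTOR,
        width // VAE_SCALE_FACTOR,
    )


print(latent_shape(1024, 1024))  # (1, 4, 128, 128)
print(latent_shape(512, 512))    # (1, 4, 64, 64)
```

Under the same assumption, a 512 x 512 image maps to a 64 x 64 latent, so each side of the latent grid doubles when the output resolution goes from 512 to 1024.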
As illustrated in the image above, SDXL 0.9 comes with the following checkpoints:
- Text-to-Image (1024x1024 resolution): stabilityai/stable-diffusion-xl-base-0.9
- Image-to-Image / Refiner
…