Introduction to Stable Diffusion XL 0.9

Ng Wai Foong
6 min read · Jul 12, 2023

Improving Latent Diffusion Models for High-Resolution Image Synthesis

Image by the author

In this article, you will learn how to generate high-resolution images with the new Stable Diffusion XL (SDXL) 0.9 model.

Note that this tutorial is based on the diffusers package rather than the original implementation.

SDXL is a new latent diffusion model from Stability AI, currently available as a pre-release. Compared to the previous models (SD1.5, SD2.1, etc.), SDXL 0.9 has the following characteristics:

  • uses a UNet backbone that is three times larger (with more attention blocks)
  • adds a second text encoder and tokenizer
  • is trained on multiple aspect ratios
  • includes a refinement model that improves visual fidelity (post-hoc image-to-image)
  • works on a 128 x 128 latent image and produces a final image at 1024 x 1024
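
The last point implies an 8x reduction per side between the final image and its latent (1024 / 8 = 128). The short sketch below makes that arithmetic explicit for other resolutions. The helper name latent_shape is hypothetical, and the 4 latent channels are an assumption carried over from earlier Stable Diffusion releases rather than something stated above.

```python
from typing import Tuple

# Derived from the figures above: 1024 px / 128 latent px = 8 per side.
VAE_SCALE_FACTOR = 8
# Assumption: 4 latent channels, as in earlier Stable Diffusion models.
LATENT_CHANNELS = 4


def latent_shape(height: int, width: int, batch_size: int = 1) -> Tuple[int, int, int, int]:
    """Return the (batch, channels, height, width) shape of the latent tensor."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (
        batch_size,
        LATENT_CHANNELS,
        height // VAE_SCALE_FACTOR,
        width // VAE_SCALE_FACTOR,
    )


print(latent_shape(1024, 1024))  # (1, 4, 128, 128)
print(latent_shape(512, 512))    # (1, 4, 64, 64)
```

Because the denoising steps run on the latent, the tensor being processed has 64 times fewer spatial positions than the final image.
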
Image taken from HuggingFace

As illustrated in the image above, SDXL 0.9 comes with the following checkpoints:


Written by Ng Wai Foong

Senior AI Engineer@Yoozoo | Content Writer #NLP #datascience #programming #machinelearning | Linkedin: https://www.linkedin.com/in/wai-foong-ng-694619185/
