SDXL 0.9 with Ensemble of Expert Denoisers
Enhanced Control of the Inference Process in SDXL 0.9
Today's topic is using the base and refiner models of SDXL together as an ensemble of expert denoisers. This concept was first proposed in the eDiff-I paper and was brought to the diffusers package by community contributors.
In theory, the base model serves as the expert for the high-noise diffusion stage and the refiner model serves as the expert for the low-noise diffusion stage. In layman’s terms, the base model generates text-aligned content, while the refiner model adds high visual fidelity.
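To make the handoff between the two experts concrete, here is a minimal sketch of how it might look with diffusers. It assumes the `denoising_end` and `denoising_start` arguments as they appear in the released 0.19 pipelines (a dev build may differ), access to the gated SDXL 0.9 weights on the Hugging Face Hub, and a CUDA GPU. The function name `run_ensemble`, the range check, and the default values of 40 steps and a 0.8 split are illustrative choices, not something prescribed by the library.

```python
def run_ensemble(prompt, n_steps=40, high_noise_frac=0.8):
    """Sketch: base model handles the high-noise part, refiner the rest."""
    # Illustrative guard (an assumption of this sketch, not a library rule):
    # the split point must lie strictly between 0 and 1.
    if not 0.0 < high_noise_frac < 1.0:
        raise ValueError("high_noise_frac must be between 0 and 1")

    # Imported lazily so the function can be defined without a GPU stack.
    import torch
    from diffusers import DiffusionPipeline

    # Model IDs assume access to the gated SDXL 0.9 repositories.
    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-0.9",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")

    # Reuse the second text encoder and the VAE to save memory.
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-0.9",
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")

    # High-noise expert: stop early and hand over latents, not a decoded image.
    latents = base(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_end=high_noise_frac,
        output_type="latent",
    ).images

    # Low-noise expert: resume from the same point in the noise schedule.
    image = refiner(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_start=high_noise_frac,
        image=latents,
    ).images[0]
    return image
```

Both calls receive the same `num_inference_steps` so that the two pipelines agree on where the split falls in the schedule.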
Compared to the standard method, this concept requires fewer denoising steps overall, so image generation should be faster. It can also be used to create different variations of the same image.
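A rough step count shows where the saving could come from. The sketch below assumes that the standard method means running the base model for the full schedule and then refining with image-to-image at some strength, which adds roughly `n_steps * strength` extra steps. The helper names and the 0.3 strength are hypothetical, and the rounding only approximates how a real scheduler splits its timesteps.

```python
def ensemble_steps(n_steps, high_noise_frac):
    """Approximate (base, refiner) step counts for the ensemble approach."""
    base_steps = round(n_steps * high_noise_frac)
    # The refiner only finishes what is left of the shared schedule.
    return base_steps, n_steps - base_steps


def standard_steps(n_steps, refiner_strength):
    """Approximate (base, refiner) step counts for base + img2img refinement."""
    # The base model runs the whole schedule, then the refiner adds its own steps.
    return n_steps, round(n_steps * refiner_strength)


if __name__ == "__main__":
    n_steps = 40
    print("ensemble:", ensemble_steps(n_steps, 0.8))
    print("standard:", standard_steps(n_steps, 0.3))  # 0.3 is an assumed strength
```

Under these assumptions the ensemble spends 40 steps in total, while the standard route spends 52.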
The drawback is that the output of the base model will still be heavily noised, so there is no way to meaningfully inspect it.
Note that this tutorial is based on the diffusers package and only works with version 0.19.0.dev0 and above. SDXL 0.9 requires at least a 12GB GPU for full inference with both the base and refiner models. For more information on SDXL 0.9, kindly read the Introduction to Stable Diffusion XL 0.9 article.