How to Fine-tune SDXL using LoRA

Ng Wai Foong
7 min read · Aug 7, 2023

Personalized text-to-image generation with custom datasets

Image by the author

Previously, I covered How to Fine-tune SDXL 0.9 using Dreambooth LoRA. Dreambooth is a specialized method that needs only a few images to create a personalized subject or style. It works really well for single-subject or single-style image generation.

Note that some frameworks do support Dreambooth training with image-caption pair datasets. Refer to the corresponding repositories for more information.

This tutorial covers vanilla text-to-image fine-tuning using LoRA. Training uses image-caption pair datasets with SDXL 1.0 as the base model. Prefer this method when training models with multiple subjects and styles.
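
As a rough illustration of what an image-caption pair dataset can look like, the sketch below writes a `metadata.jsonl` file in the layout read by the Hugging Face datasets `imagefolder` loader: one JSON object per line, with a `file_name` key plus extra columns. The image file names, the captions, and the `text` column name are assumptions made for illustration and are not taken from this tutorial.

```python
import json
from pathlib import Path


def write_metadata(dataset_dir, captions):
    """Write metadata.jsonl mapping each image file name to its caption."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = dataset_dir / "metadata.jsonl"
    with metadata_path.open("w", encoding="utf-8") as f:
        for file_name, caption in captions.items():
            # "file_name" links the row to an image stored in the same folder;
            # "text" is an assumed name for the caption column.
            record = {"file_name": file_name, "text": caption}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return metadata_path


# Hypothetical image-caption pairs, for illustration only.
example_captions = {
    "0001.png": "a watercolor painting of a red fox",
    "0002.png": "a pencil sketch of a lighthouse",
}
```

Calling `write_metadata("my_dataset", example_captions)` would place `metadata.jsonl` next to the images in a hypothetical `my_dataset` folder, so each image travels with its own caption rather than sharing a single subject prompt.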

This tutorial is based on the diffusers package, which does not support image-caption datasets for Dreambooth training. Training has been tested on version 0.19.3. Note that the output LoRA can only be used via the diffusers package and is not compatible with the original implementation (most open-source web UIs use the original implementation).
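
To make "used via the diffusers package" concrete, here is a minimal sketch of loading the SDXL 1.0 base model and attaching trained LoRA weights. It assumes a diffusers release with SDXL LoRA support, a CUDA GPU, and half-precision weights. The function name and the `lora_dir` argument are hypothetical; this is not necessarily the exact inference code used in this tutorial.

```python
def load_sdxl_with_lora(lora_dir, base_model="stabilityai/stable-diffusion-xl-base-1.0"):
    """Load the SDXL 1.0 base pipeline and attach LoRA weights saved by diffusers."""
    # Imported lazily so that defining this function does not require torch/diffusers.
    import torch
    from diffusers import StableDiffusionXLPipeline

    # Assumption: half precision to reduce VRAM usage on consumer GPUs.
    pipe = StableDiffusionXLPipeline.from_pretrained(base_model, torch_dtype=torch.float16)
    # lora_dir is a hypothetical path to the folder holding the trained LoRA weights.
    pipe.load_lora_weights(lora_dir)
    return pipe.to("cuda")
```

With the pipeline returned by this function, an image could then be generated with something like `pipe(prompt="...").images[0]`, where the prompt is whatever matches the captions used during training.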

Based on local experiments, VRAM consumption is as follows:

  • GeForce RTX 3060 GPU (12 GB): consumes about 12.3 GB for training. Training takes about 7 hours…

Ng Wai Foong

Senior AI Engineer@Yoozoo | Content Writer #NLP #datascience #programming #machinelearning | Linkedin: https://www.linkedin.com/in/wai-foong-ng-694619185/