How to Fine-tune Stable Diffusion using LoRA

Ng Wai Foong
8 min read · Feb 21

Personalized generated images with custom datasets

Image by the author

Previously, I have covered other articles on fine-tuning the Stable Diffusion model to generate personalized images:

By default, full-fledged fine-tuning requires about 24 to 30GB of VRAM. However, with the introduction of Low-Rank Adaptation of Large Language Models (LoRA), it is now possible to fine-tune on consumer GPUs.

Based on a local experiment, a single-process training run with a batch size of 2 can be done on a single 12GB GPU (about 10GB of VRAM without xformers, 6GB with xformers).
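The memory savings come from the fact that only the small low-rank matrices receive gradients and optimizer state, while the base weights stay frozen. A rough back-of-the-envelope sketch (the layer width and rank below are illustrative assumptions, not the training script's exact values):

```python
# Illustrative parameter count for a single attention projection layer.
# d_model and rank are assumed example values for the sketch.
d_model = 768   # width of a cross-attention projection (assumed)
rank = 4        # LoRA rank (a small value like this is typical)

full_params = d_model * d_model   # a full d x d weight update
lora_params = 2 * d_model * rank  # low-rank factors A (r x d) and B (d x r)

print(full_params, lora_params)   # → 589824 6144
print(f"LoRA trains {lora_params / full_params:.2%} of the full update")
```

Multiply that saving across every adapted layer, and the optimizer state shrinks accordingly.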

LoRA offers the following benefits:

  • less likely to suffer catastrophic forgetting, since the pre-trained weights are kept frozen
  • LoRA weights have far fewer parameters than the original model, which makes them easily portable
  • allows control over the extent to which the model is adapted toward the new training images (supports interpolation)
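The last point works because the LoRA update is simply added on top of the frozen weights, scaled by a multiplier: W' = W + scale · (B @ A). A minimal numpy sketch (the shapes and rank here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # layer width and LoRA rank (illustrative)

W = rng.standard_normal((d, d))  # frozen pre-trained weight
A = rng.standard_normal((r, d))  # trainable low-rank factor
B = rng.standard_normal((d, r))  # in real training B starts at zero and is learned

def adapted(W, A, B, scale):
    """Effective weight with the LoRA update blended in."""
    return W + scale * (B @ A)

# scale = 0 recovers the original model exactly ...
assert np.allclose(adapted(W, A, B, 0.0), W)

# ... while intermediate scales interpolate linearly toward the adapted weight
half = adapted(W, A, B, 0.5)
full = adapted(W, A, B, 1.0)
assert np.allclose(half, (W + full) / 2)
```

This is also why the adapter is so portable: only A and B need to be shipped, and they can be merged into (or removed from) the base weights at any strength.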

This tutorial is strictly based on the diffusers package. Training and inference will be done using the StableDiffusionPipeline class directly. Model conversion is required for checkpoints trained with other repositories or a web UI.
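As a preview, inference with a LoRA checkpoint produced by the diffusers training script looks roughly like this. The base model ID, output directory, and scale are placeholder assumptions, and the imports are deferred inside the function so the sketch can be defined without a GPU:

```python
def generate(lora_dir: str, prompt: str, lora_scale: float = 0.7):
    """Load a base Stable Diffusion model, attach LoRA weights, and sample.

    Imports are deferred so this sketch can be defined without torch/diffusers
    installed; lora_dir is the (hypothetical) training output directory.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    # "runwayml/stable-diffusion-v1-5" is an assumed base model
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe.unet.load_attn_procs(lora_dir)  # attach the trained LoRA weights
    pipe.to("cuda")

    # cross_attention_kwargs={"scale": ...} blends between the base model
    # (0.0) and the fully fine-tuned behaviour (1.0)
    return pipe(prompt, cross_attention_kwargs={"scale": lora_scale}).images[0]
```

For example, `generate("lora_output", "a photo of a sks dog")`, where "lora_output" stands in for wherever your training run saved its weights.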

Let’s proceed to the next section for the setup and installation.


Setup and installation

Before anything else, it is highly recommended to create a new virtual environment.

Python packages

Activate the virtual environment and run the following command to install the dependencies:

pip install accelerate torchvision transformers datasets ftfy tensorboard

Next, install the diffusers package as follows:

pip install diffusers

For the latest development version of diffusers, install it directly from the Git repository:

pip install git+