How to Fine-tune Stable Diffusion using LoRA
Personalized generated images with custom datasets
Previously, I have written the following articles on fine-tuning the Stable Diffusion model to generate personalized images:
- How to Fine-tune Stable Diffusion using Textual Inversion
- How to Fine-tune Stable Diffusion using Dreambooth
- The Beginner’s Guide to Unconditional Image Generation Using Diffusers
By default, a full-fledged fine-tuning run requires about 24 to 30GB of VRAM. However, with the introduction of Low-Rank Adaptation of Large Language Models (LoRA), it is now possible to fine-tune on consumer GPUs.
Based on a local experiment, a single-process training run with a batch size of 2 fits on a single 12GB GPU (10GB without xformers, 6GB with xformers).
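For reference, xformers' memory-efficient attention is switched on through diffusers with a single call. The snippet below is only a sketch: the model id is a placeholder, it assumes the xformers package is installed, and the same method is also available on the UNet model itself inside a training loop.

import torch
from diffusers import StableDiffusionPipeline

# Load the base model in half precision to keep VRAM usage down
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model id
    torch_dtype=torch.float16,
).to("cuda")

# Swap the attention layers to xformers' memory-efficient implementation
pipe.enable_xformers_memory_efficient_attention()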
LoRA offers the following benefits:
- less likely to suffer catastrophic forgetting, since the pre-trained weights are kept frozen
- LoRA weights have far fewer parameters than the original model and are easily portable
- allows control over the extent to which the model is adapted toward the new training images (supports interpolation; see the sketch after this list)
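The last point can be illustrated at inference time: a trained LoRA checkpoint is blended with the frozen base weights by a scale factor. Below is a minimal sketch under a few assumptions: the base model id, the LoRA output directory, and the prompt are placeholders, and it uses load_lora_weights together with the cross_attention_kwargs scale argument, which may differ slightly across diffusers versions.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder base model
    torch_dtype=torch.float16,
).to("cuda")

# Load the trained LoRA weights on top of the frozen base model
pipe.load_lora_weights("path/to/lora_output_dir")  # placeholder path

# scale=0.0 ignores the LoRA weights, scale=1.0 applies them fully,
# and values in between interpolate toward the new training images
image = pipe(
    "a photo in the style of the training images",  # placeholder prompt
    cross_attention_kwargs={"scale": 0.7},
).images[0]
image.save("lora_sample.png")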
This tutorial is strictly based on the diffusers package. Training and inference will be done directly with the StableDiffusionPipeline class. Model conversion is required for checkpoints trained with other repositories or the web UI.
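As one example of such a conversion, the diffusers repository ships a script that turns a full Stable Diffusion checkpoint in the original .ckpt format into the diffusers layout. The script name and flags below come from the repository at the time of writing and may change between versions; LoRA-only files from other trainers need their own conversion path.

python scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path /path/to/model.ckpt --dump_path /path/to/converted_model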
Let’s proceed to the next section for the setup and installation.
Setup
Before installing any packages, it is highly recommended to create a new virtual environment.
Python packages
Activate the virtual environment and run the following command to install the dependencies:
pip install accelerate torchvision transformers datasets ftfy tensorboard
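Since the diffusers training examples are typically launched through accelerate, it usually helps to configure it once after installation. This assumes training will later be launched with accelerate launch; answering the interactive prompts with the defaults is fine for a single-GPU setup.

accelerate config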
Next, install the diffusers package as follows:
pip install diffusers
To use the latest development version of diffusers instead, install it with the following command:
pip install git+https://github.com/huggingface/diffusers
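As a quick sanity check (nothing more than a sketch), the installed versions and GPU visibility can be printed from Python:

import diffusers
import torch

# Print the installed versions and whether a CUDA device is visible
print(diffusers.__version__)
print(torch.__version__, torch.cuda.is_available())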