Generating personalized images with custom datasets
I have previously covered fine-tuning the Stable Diffusion model to generate personalized images in the following articles:
- How to Fine-tune Stable Diffusion using Textual Inversion
- How to Fine-tune Stable Diffusion using Dreambooth
- The Beginner’s Guide to Unconditional Image Generation Using Diffusers
By default, a full-fledged fine-tuning run requires about 24 to 30GB of VRAM. However, with the introduction of Low-Rank Adaptation of Large Language Models (LoRA), it is now possible to fine-tune on consumer GPUs.
Based on a local experiment, single-process training with a batch size of 2 can be done on a single 12GB GPU (10GB without xformers, 6GB with xformers).
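To make the idea behind LoRA concrete, here is a minimal sketch of a low-rank adapter wrapped around a frozen linear layer. The class and variable names are illustrative only and do not come from the diffusers codebase:

import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pre-trained weights stay frozen
        # the low-rank factors are the only trainable parameters
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)  # the adapter starts as a no-op
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))

Because only the low-rank factors receive gradients, the optimizer state and the saved weights are a small fraction of the full model, which is what makes training on a 12GB card feasible.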
LoRA offers the following benefits:
- less likely to suffer catastrophic forgetting, since the pre-trained weights are kept frozen
- LoRA weights have far fewer parameters than the original model and are easily portable
- allows control over the extent to which the model is adapted toward the new training images (supports interpolation; see the example after this list)
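For example, once LoRA weights have been loaded into a pipeline, recent versions of diffusers let you scale their contribution at inference time. The model ID, weights path, and prompt below are placeholders:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# load the trained LoRA weights into the UNet's attention processors
pipe.unet.load_attn_procs("path/to/lora-weights")
# scale=0.0 reproduces the frozen base model, scale=1.0 applies the full adaptation
image = pipe(
    "a photo of a corgi", cross_attention_kwargs={"scale": 0.5}
).images[0]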
This tutorial is strictly based on the `diffusers` package. Training and inference will be done using the `StableDiffusionPipeline` class directly. Model conversion is required for checkpoints that were trained using other repositories or the web UI.
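For reference, the diffusers repository ships a conversion script for such checkpoints; the file paths below are placeholders, and the exact flags may differ between versions:

python scripts/convert_original_stable_diffusion_to_diffusers.py \
    --checkpoint_path model.ckpt \
    --dump_path ./converted-model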
Let’s proceed to the next section for the setup and installation.
Setup and installation
First, it is highly recommended to create a new virtual environment.
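For example, using Python's built-in venv module (the environment name lora-env is just an illustration):

python -m venv lora-env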
Activate the virtual environment and run the following command to install the dependencies:
pip install accelerate torchvision transformers datasets ftfy tensorboard
Next, install the `diffusers` package as follows:
pip install diffusers
For the latest development version of `diffusers`, install it directly from GitHub:
pip install git+https://github.com/huggingface/diffusers
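Either way, a quick sanity check confirms that the package imports correctly and prints the installed version:

python -c "import diffusers; print(diffusers.__version__)"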