How to Fine-tune Stable Diffusion using Dreambooth
Personalized generated images with custom styles or objects

Previously, I covered an article on fine-tuning Stable Diffusion using textual inversion. This tutorial focuses on how to fine-tune Stable Diffusion using another method called Dreambooth. Unlike the textual inversion method, which trains just the embedding without modifying the base model, Dreambooth fine-tunes the whole text-to-image model so that it learns to bind a unique identifier to a specific concept (an object or a style). As a result, the generated images are more personalized to the object or style than with textual inversion.
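To make the idea of a unique identifier concrete, here is a minimal inference sketch using the diffusers library. It assumes you have already fine-tuned and saved a model; the model path is a placeholder, and "sks" is just a rare token commonly used as the identifier, not something required by Dreambooth itself:

import torch
from diffusers import StableDiffusionPipeline

# Load a fine-tuned Dreambooth checkpoint (the path is a placeholder)
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-model",
    torch_dtype=torch.float16,
).to("cuda")

# "sks" is the unique identifier bound to the custom concept during training
image = pipe("a photo of sks dog sitting on a beach").images[0]
image.save("sks_dog_beach.png")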
This tutorial is based on a forked version of the Dreambooth implementation by HuggingFace. The original implementation requires about 16GB to 24GB of VRAM to fine-tune the model. The maintainer, ShivamShrirao, optimized the code to reduce VRAM usage to under 16GB. Depending on your needs and settings, you can fine-tune the model with 10GB to 16GB of GPU memory. I have personally verified that training is feasible on a Tesla T4 GPU.
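If you are unsure how much memory your GPU has, a quick check with PyTorch (a sketch, assuming a single-GPU machine) looks like this:

import torch

# Print the name and total VRAM of the first CUDA device, if any
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU detected")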
Please note that none of the existing implementations are by the original author of Dreambooth. As a result, there might be slight differences in terms of reproducibility.
Let’s proceed to the next section to set up all the necessary modules.
Setup
It is recommended to create a new virtual environment before you continue with the installation.
Python packages
In your working directory, create a new file called requirements.txt with the following content:
accelerate==0.12.0
torchvision
transformers>=4.21.0
ftfy
tensorboard
modelcards
Activate your virtual environment and run the following commands one by one to install all the necessary modules:
pip install git+https://github.com/ShivamShrirao/diffusers.git
pip install -r requirements.txt
NOTE: You need to install diffusers using the URL above instead of installing it directly from PyPI.
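Once the installation finishes, you can run a quick sanity check from a Python shell to confirm that the forked diffusers and its companion packages import correctly (a simple sketch; the exact version numbers will depend on the fork):

import accelerate
import diffusers
import transformers

# If any of these imports fail, the environment is not set up correctly
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)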
bitsandbytes package
There is an optional package called bitsandbytes, which can reduce the VRAM usage further…
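For context, the saving mainly comes from swapping the regular optimizer for an 8-bit one. The snippet below is only an illustrative sketch of that idea with a placeholder model, not the training script used in this tutorial:

import torch
import bitsandbytes as bnb

# Placeholder model; in Dreambooth this would be the model being fine-tuned
model = torch.nn.Linear(768, 768)

# 8-bit AdamW keeps optimizer state in 8 bits, cutting VRAM usage
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=5e-6)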