Introduction to Token Merging for Stable Diffusion

Ng Wai Foong

Speeds up image generation by merging redundant tokens


The topic for today is Token Merging (ToMe), a technique that speeds up image generation by merging redundant tokens. The concept was first introduced by a team at Facebook Research. Token Merging (ToMe)

… allows you to take an existing Vision Transformer architecture and efficiently merge tokens inside of the network for 2–3x faster evaluation.

Later on, one of the team members, Daniel Bolya, applied the same concept to the underlying transformer blocks in Stable Diffusion. The implementation does not require any training and works out of the box with any Stable Diffusion model. It improves inference speed and memory consumption with minimal quality loss.
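To get a feel for why merging helps, consider some rough token-count arithmetic (a sketch; the exact numbers depend on the model and resolution, and a merging ratio of 0.5 is used here purely as an example):

```python
# Rough token-count arithmetic for ToMe on Stable Diffusion.
# A 512x512 image is encoded by the VAE (8x downsampling) into a
# 64x64 latent, so the highest-resolution self-attention blocks
# operate on 64 * 64 = 4096 tokens.
latent_side = 512 // 8            # VAE downsamples by a factor of 8
tokens = latent_side * latent_side

ratio = 0.5                       # example fraction of tokens to merge away
remaining = tokens - int(tokens * ratio)

print(tokens, remaining)          # 4096 2048
```

Since self-attention cost grows quadratically with the token count, halving the tokens in these blocks cuts their attention cost to roughly a quarter, which is where the speedup comes from.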

Token Merging (ToMe) can be applied to the original Stable Diffusion checkpoint (ckpt) or diffusers-based model. This tutorial will be based on the diffusers package.

Let’s proceed to the next section for the setup and installation.


It is recommended to create a new virtual environment before the installation.

Run the following command to install diffusers and other dependencies:

pip install diffusers accelerate transformers

Token Merging (ToMe) is now available as a PyPI package and can be installed locally via pip:

pip install tomesd



Create a new Python file and append the following import statements:

import torch
import tomesd
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

Then, initialize the pipeline using mixed precision:

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint; use any diffusers-based SD model
    torch_dtype=torch.float16,
).to("cuda")

Continue by changing the scheduler as follows:

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
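With the pipeline ready, apply the ToMe patch and generate an image. The sketch below uses `tomesd.apply_patch` with `ratio=0.5`; the prompt and step count are illustrative placeholders:

```python
# Patch the pipeline in place so its transformer blocks merge
# redundant tokens during inference. ratio is the fraction of
# tokens to merge; higher values are faster but may cost quality.
tomesd.apply_patch(pipe, ratio=0.5)

# Generate an image as usual (example prompt and step count).
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("output.png")
```

The patch can be undone at any time with `tomesd.remove_patch(pipe)`, which restores the original, unmerged behavior.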