Introduction to Token Merging for Stable Diffusion

Ng Wai Foong
4 min read · Apr 6, 2023

Speeds up image generation by merging redundant tokens


Today's topic is Token Merging (ToMe), a technique that speeds up image generation by merging redundant tokens. The concept was first introduced by researchers at Facebook Research. In their words, Token Merging (ToMe)

… allows you to take an existing Vision Transformer architecture and efficiently merge tokens inside of the network for 2–3x faster evaluation.

Later on, one of the team members, Daniel Bolya, applied the same concept to the underlying transformer blocks in Stable Diffusion. The implementation does not require any training and works out of the box with any Stable Diffusion model. It provides a slight improvement in inference speed and memory consumption with minimal quality loss.
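To make the idea of "merging redundant tokens" more concrete, here is a minimal pure-Python sketch. It is a simplified illustration of similarity-based merging, not the actual ToMe or Stable Diffusion implementation: the function names (cosine, merge_tokens), the even/odd split, and the plain averaging are all assumptions made for this example. Tokens are split into two groups, each token in the first group is matched to its most similar token in the second group, and the r most similar pairs are averaged into one token.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, r):
    """Merge up to r tokens into their most similar partner by averaging.

    Hypothetical helper for illustration only. Returns a new token list that
    is shorter than the input by the number of merged tokens.
    """
    src = tokens[0::2]  # even-indexed tokens: candidates to be merged away
    dst = tokens[1::2]  # odd-indexed tokens: merge destinations
    if not dst:
        return [list(t) for t in tokens]

    # For every source token, find its most similar destination token.
    edges = []
    for i, s in enumerate(src):
        j = max(range(len(dst)), key=lambda k: cosine(s, dst[k]))
        edges.append((cosine(s, dst[j]), i, j))

    # Keep only the r most similar (source, destination) pairs.
    edges.sort(reverse=True)
    merged = {i: j for _, i, j in edges[: min(r, len(src))]}

    # Average each destination token with the sources merged into it.
    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for i, j in merged.items():
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1
    merged_dst = [[x / c for x in s] for s, c in zip(sums, counts)]

    # Unmerged source tokens are kept as-is, followed by the destinations.
    kept_src = [list(s) for i, s in enumerate(src) if i not in merged]
    return kept_src + merged_dst


tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
reduced = merge_tokens(tokens, r=1)
print(len(tokens), "->", len(reduced))
```

In this toy input the first two tokens are identical, so they are the pair that gets merged and the sequence shrinks from four tokens to three. Fewer tokens means less work in the attention layers, which is where the speed and memory savings come from.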

Token Merging (ToMe) can be applied to either an original Stable Diffusion checkpoint (ckpt) or a diffusers-based model. This tutorial is based on the diffusers package.

Let’s proceed to the next section for the setup and installation.

Setup

It is recommended to create a new virtual environment before the installation.

Run the following command to install diffusers and other dependencies:

pip install diffusers accelerate…
