Introduction to VideoFusion

Ng Wai Foong
5 min read · Mar 30

Decomposed Diffusion Models for High-Quality Video Generation

Image by the author

By reading this article, you will learn to perform text-to-video generation using TextToVideoSDPipeline, a new pipeline based on the VideoFusion paper. It is available in the development version of the diffusers package (0.15.0.dev0).

VideoFusion is a new research project by the Damo Vilab team that uses decomposed diffusion models for high-quality video generation. According to the official repository, the text-to-video generation diffusion model

… consists of three sub-networks: text feature extraction model, text feature-to-video latent space diffusion model, and video latent space to video visual space model. The overall model parameters are about 1.7 billion. Currently, it only supports English input. The diffusion model adopts a UNet3D structure, and implements video generation through the iterative denoising process from the pure Gaussian noise video.

The model is licensed under CC BY-NC-ND 4.0, and is meant for research purposes only.
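As a quick preview of where this article is heading, here is a minimal sketch of how the pipeline is typically called. It assumes that torch is installed alongside the packages from the setup section, that a CUDA GPU is available, and that the checkpoint is the one published on the Hugging Face Hub as damo-vilab/text-to-video-ms-1.7b (this section does not name the checkpoint, so treat the ID as an assumption). The generate_video helper and the choice of 25 inference steps are hypothetical and only for illustration.

```python
# Assumed checkpoint ID on the Hugging Face Hub (not named in this section).
MODEL_ID = "damo-vilab/text-to-video-ms-1.7b"


def generate_video(prompt: str, num_inference_steps: int = 25) -> str:
    """Hypothetical helper: generate a short clip and return the video file path.

    Written against diffusers 0.15.0.dev0; later releases may return the
    frames in a different layout.
    """
    # Heavy imports are kept inside the function so that defining it is cheap.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    # Load the text encoder, the UNet3D denoiser and the VAE decoder in half precision.
    pipe = DiffusionPipeline.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        variant="fp16",
    )
    # Move sub-models to the GPU only while they are needed (requires accelerate).
    pipe.enable_model_cpu_offload()

    # Iteratively denoise from Gaussian noise, then decode the latents into frames.
    frames = pipe(prompt, num_inference_steps=num_inference_steps).frames

    # Frame-to-video conversion; this step is the one that relies on OpenCV.
    return export_to_video(frames)


# Example call (downloads several GB of weights on first use):
# video_path = generate_video("An astronaut riding a horse")
```

The three objects loaded by the pipeline correspond to the three sub-networks described in the quote above. Only English prompts are expected to work.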

Let’s proceed to the next section for setup and installation.


First and foremost, it is recommended to create a new virtual environment. Activate it and run the following command to install all the base dependencies:

pip install transformers accelerate


Next, run the following command to install the latest development version of diffusers:

pip install git+

At the time of this writing, the stable version of diffusers is 0.14.0, which does not support text-to-video generation. Make sure to install the latest development version until the release of version 0.15.0.
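If you are unsure which build ended up in your environment, a small check like the one below can help. It is only a sketch: the helper names are hypothetical, and it deliberately treats a development build such as 0.15.0.dev0 as 0.15.0, since that is the build this article relies on.

```python
import re
from importlib import metadata

# First release line expected to ship text-to-video support, per the article.
MIN_VERSION = (0, 15, 0)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release, e.g. '0.15.0.dev0' -> (0, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    # A missing patch number (as in '0.15') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def supports_text_to_video(version: str) -> bool:
    """True if the given diffusers version is 0.15.0 (including dev builds) or newer."""
    return parse_release(version) >= MIN_VERSION


def installed_diffusers_version():
    """Return the installed diffusers version string, or None if it is not installed."""
    try:
        return metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return None


version = installed_diffusers_version()
if version is None:
    print("diffusers is not installed")
else:
    print(f"diffusers {version}: text-to-video supported = {supports_text_to_video(version)}")
```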


Note that opencv-python is required for frame-to-video conversion. OpenCV for Python is distributed as four different packages:

  • opencv-python — main package
  • opencv-contrib-python — full package (comes with contrib/extra modules)
  • opencv-python-headless — main package without GUI
  • opencv-contrib-python-headless —…
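All four packages provide the same cv2 module, and the OpenCV maintainers advise installing only one of them per environment. The sketch below lists which variants are currently installed so that a conflict is easy to spot. The helper name is hypothetical, and it relies only on the standard library.

```python
from importlib import metadata

# The four distribution names listed above.
OPENCV_VARIANTS = (
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
)


def installed_opencv_variants() -> dict:
    """Map each installed OpenCV distribution name to its version string."""
    found = {}
    for name in OPENCV_VARIANTS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This variant is not installed; skip it.
            continue
    return found


variants = installed_opencv_variants()
if not variants:
    print("No OpenCV package found; install exactly one of:", ", ".join(OPENCV_VARIANTS))
elif len(variants) > 1:
    print("Multiple OpenCV packages installed, which may conflict:", variants)
else:
    print("OpenCV package in use:", variants)
```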