Beginner’s Guide to Neural Speaker Diarization with pyannote

Ng Wai Foong
3 min read · Aug 8, 2023

An open-source toolkit written in Python for speaker diarization


By reading this article, you will learn to split an audio input into different segments or chunks according to the identity of each speaker. This process is also known as speaker diarization.

This tutorial is based on the pyannote.audio Python package for speaker diarization. It comes with the following capabilities:

  • speech activity detection
  • speaker change detection
  • overlapped speech detection
  • speaker embedding

Let’s proceed to the next section for the setup and installation process.

Setup

It is highly recommended to create a new virtual environment before you continue with the installation.

PyTorch

Run the following command to install PyTorch:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
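Once the install completes, you can quickly confirm that both packages are importable. The small helper below is not from the original article; it is just a convenience check using the standard library:

```python
import importlib.util

def check_installed(packages):
    """Map each package name to whether it can be imported."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# Prints e.g. {'torch': True, 'torchaudio': True} on a working install
print(check_installed(["torch", "torchaudio"]))
```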

Other Python packages

Run the following command to install all the required dependencies:

pip install git+https://github.com/pyannote/pyannote-audio

Create a new account on the Hugging Face platform and accept the user conditions for the following models:

  • hf.co/pyannote/segmentation
  • hf.co/pyannote/speaker-diarization

Then, generate an access token at the following URL:

hf.co/settings/tokens

Take note of the access token as we will need it later on in the inference script.

ffmpeg

Head over to the official page and download the installer:

https://ffmpeg.org/download.html

For example, Windows users can download a pre-built binary from the following page:

https://github.com/BtbN/FFmpeg-Builds/releases

After extracting the archive, add the following path to the environment variables (adjust it to match your installation directory):

C:\Program Files\ffmpeg\bin
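To confirm that ffmpeg is reachable after updating the environment variables, you can check from Python with the standard library (this check is not part of the original article):

```python
import shutil

def ffmpeg_on_path():
    """Return True if the ffmpeg executable can be found via PATH."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg available:", ffmpeg_on_path())
```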

Usage
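
With the access token from the setup step, a minimal diarization run can be sketched as follows using the pyannote.audio `Pipeline` API. The audio file name and the `ACCESS_TOKEN` placeholder are illustrative, and `to_rttm` is a hypothetical helper for printing results in RTTM format:

```python
def to_rttm(uri, segments):
    """Format (start, end, speaker) tuples as RTTM lines (hypothetical helper)."""
    return "\n".join(
        f"SPEAKER {uri} 1 {start:.3f} {end - start:.3f} <NA> <NA> {speaker} <NA> <NA>"
        for start, end, speaker in segments
    )

def diarize(audio_path, hf_token):
    """Run the pretrained speaker diarization pipeline on an audio file."""
    # Requires pyannote.audio and accepted model conditions on Hugging Face
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization", use_auth_token=hf_token
    )
    diarization = pipeline(audio_path)
    # Each track is a (segment, track_id, speaker_label) triple
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]

# Example (requires a valid token and an audio file on disk):
# segments = diarize("audio.wav", "ACCESS_TOKEN")
# print(to_rttm("audio", segments))
```

Each output line follows the RTTM convention: the file identifier, the segment start time and duration in seconds, and the anonymous speaker label (e.g. SPEAKER_00) assigned by the pipeline.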

