Split and Transcribe Audio Files with OpenAI Whisper

Ng Wai Foong
7 min read · Aug 1, 2023

A general-purpose method to streamline audio preprocessing


In this article, you will learn to split an audio file into multiple chunks, each paired with its transcription text. This comes in handy for anyone who wishes to cut and trim an audio recording automatically.

The steps are as follows:

  1. Transcribe the audio clip using the OpenAI Whisper large model (30-second sliding window).
  2. (Optional) Convert the transcription text from Traditional Chinese to Simplified Chinese using OpenCC.
  3. Split the audio into segments using ffmpeg, based on the timestamps obtained from the transcription.
  4. Trim each segment to remove silence from the beginning and end using the librosa package.
  5. Save the audio files and transcription text.
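
The five steps above can be sketched roughly as follows. This is a minimal outline under stated assumptions, not necessarily the exact code used later in this tutorial: the function names, the output file naming, the `transcript.txt` format and the `top_db=30` threshold are all hypothetical choices. It assumes the `openai-whisper`, `opencc-python-reimplemented`, `librosa` and `soundfile` packages are installed and that `ffmpeg` is on the PATH.

```python
import subprocess
from pathlib import Path


def transcribe(audio_path, model_name="large"):
    """Step 1: return Whisper segments (dicts with 'start', 'end', 'text')."""
    import whisper  # imported lazily because loading it is slow

    model = whisper.load_model(model_name)
    return model.transcribe(str(audio_path))["segments"]


def to_simplified(text):
    """Step 2 (optional): Traditional -> Simplified Chinese with OpenCC."""
    from opencc import OpenCC  # assumes opencc-python-reimplemented

    return OpenCC("t2s").convert(text)


def ffmpeg_cut_command(src, dst, start, end):
    """Step 3: build an ffmpeg command that cuts [start, end] seconds from src."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-ss", f"{start:.3f}",       # where the segment begins
        "-t", f"{end - start:.3f}",  # how long the segment lasts
        str(dst),
    ]


def trim_silence(path, top_db=30):
    """Step 4: strip leading and trailing silence, overwriting the file."""
    import librosa
    import soundfile as sf

    y, sr = librosa.load(str(path), sr=None)  # keep the original sample rate
    trimmed, _ = librosa.effects.trim(y, top_db=top_db)
    sf.write(str(path), trimmed, sr)


def split_and_transcribe(audio_path, out_dir, convert=False):
    """Run steps 1-5 and write the clips plus a transcript file to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for i, seg in enumerate(transcribe(audio_path)):
        text = seg["text"].strip()
        if convert:
            text = to_simplified(text)
        clip = out_dir / f"{i:04d}.wav"  # hypothetical naming scheme
        cmd = ffmpeg_cut_command(audio_path, clip, seg["start"], seg["end"])
        subprocess.run(cmd, check=True)
        trim_silence(clip)
        lines.append(f"{clip.name}|{text}")
    # Step 5: one "filename|text" line per clip (an assumed format)
    (out_dir / "transcript.txt").write_text("\n".join(lines), encoding="utf-8")
```

Calling `split_and_transcribe("input.wav", "output", convert=True)` would then produce one trimmed clip per Whisper segment. The heavy imports sit inside the functions so that the command-building helper can be inspected without loading any model.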

Note that this method is not perfect, but it serves as a good starting point. It works best when there are sufficiently long pauses between sentences and only a single speaker in each utterance.
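
One way to spot where the pauses may be too short is to measure the gap between consecutive segments. The helper below is a hypothetical illustration, not part of Whisper: it assumes segments shaped like Whisper's output (dicts with `start` and `end` in seconds), and the 0.3-second threshold is an arbitrary starting value to tune for your recordings.

```python
def short_pauses(segments, min_gap=0.3):
    """Return indices i where the pause between segment i and i+1 is below min_gap seconds."""
    flagged = []
    for i in range(len(segments) - 1):
        gap = segments[i + 1]["start"] - segments[i]["end"]
        if gap < min_gap:
            flagged.append(i)
    return flagged


# Made-up timestamps for illustration only
segments = [
    {"start": 0.0, "end": 2.0, "text": "a"},
    {"start": 2.1, "end": 4.0, "text": "b"},
    {"start": 5.0, "end": 6.0, "text": "c"},
]
print(short_pauses(segments))  # [0]: only the first pause (about 0.1 s) is below the threshold
```

Flagged segments are good candidates for a manual check, since a cut placed in a very short pause may clip the end of one sentence or the start of the next.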

This tutorial covers tips and tricks for processing Chinese audio files. However, the method should work for all languages supported by Whisper.


Written by Ng Wai Foong

Senior AI Engineer@Yoozoo | Content Writer #NLP #datascience #programming #machinelearning | Linkedin: https://www.linkedin.com/in/wai-foong-ng-694619185/