Split and Transcribe Audio Files with OpenAI Whisper
General purpose method to streamline audio preprocessing
By reading this article, you will learn how to split an audio file into multiple chunks, each paired with its transcription text. This comes in handy for those who wish to cut and trim audio recordings automatically.
The steps are as follows:
- Transcribe the audio clip using the OpenAI Whisper large model (30-second sliding window).
- (Optional) Convert the transcription text from Traditional Chinese to Simplified Chinese using OpenCC.
- Split the audio into segments using ffmpeg. Each segment is cut at the start and end timestamps obtained from the transcription.
- Trim each segment to remove silence from the beginning and end using the librosa package.
- Save the audio files and transcription text.
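As a rough orientation, the steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the `openai-whisper`, `opencc-python-reimplemented`, `librosa` and `soundfile` packages are installed and that `ffmpeg` is on the PATH. The function names, the `metadata.txt` file, the `filename|text` layout and the `top_db=30` threshold are hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path


def transcribe(audio_path, model_name="large", language="zh"):
    """Return Whisper segments (dicts with 'start', 'end' and 'text')."""
    import whisper  # imported lazily so the pure helpers work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio_path), language=language)
    return result["segments"]


def to_simplified(text):
    """Convert Traditional Chinese to Simplified Chinese (optional step)."""
    from opencc import OpenCC  # assumes opencc-python-reimplemented

    return OpenCC("t2s").convert(text)


def ffmpeg_cut_command(src, dst, start, end):
    """Build the ffmpeg argument list that cuts [start, end] seconds from src."""
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-ss", f"{start:.3f}",
        "-to", f"{end:.3f}",
        str(dst),
    ]


def trim_silence(path, top_db=30):
    """Overwrite the file with leading and trailing silence removed."""
    import librosa
    import soundfile as sf

    samples, sample_rate = librosa.load(str(path), sr=None)
    trimmed, _ = librosa.effects.trim(samples, top_db=top_db)  # top_db is an assumed threshold
    sf.write(str(path), trimmed, sample_rate)


def split_and_transcribe(audio_path, out_dir, language="zh", simplify=True):
    """Run all steps and write one audio file per segment plus metadata.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for i, seg in enumerate(transcribe(audio_path, language=language)):
        text = seg["text"].strip()
        if simplify:
            text = to_simplified(text)
        dst = out_dir / f"{Path(audio_path).stem}_{i:04d}.wav"
        cmd = ffmpeg_cut_command(audio_path, dst, seg["start"], seg["end"])
        subprocess.run(cmd, check=True)
        trim_silence(dst)
        lines.append(f"{dst.name}|{text}")  # hypothetical "filename|text" layout
    (out_dir / "metadata.txt").write_text("\n".join(lines), encoding="utf-8")
    return lines
```

Calling `split_and_transcribe("input.wav", "output")` would then write the segments and `metadata.txt` into the `output` folder. For languages other than Chinese, pass the matching `language` code and set `simplify=False`.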
Note that this method is not perfect, but it serves as a good starting point. It works best when there are sufficiently long pauses between sentences and only one speaker is talking during each utterance.
This tutorial covers tips and tricks for processing Chinese audio files, but the same approach should work for all the languages that Whisper supports.