
Speech Translation with OpenAI Whisper

Ng Wai Foong
5 min read · Jun 27, 2023

An experimental hack that works out-of-the-box

Photo by Hannah Wright on Unsplash

Whisper is a general-purpose speech recognition model built by OpenAI. It was officially released to the public in late 2022 and is now one of the state-of-the-art models for speech recognition.

The model is trained on a large dataset of diverse audio and is capable of performing the following tasks:

  • multilingual speech recognition
  • speech translation
  • language identification

The official repository primarily focuses on speech recognition, and its built-in translation task only targets English. However, Whisper can translate speech quite well between languages that share similar traits, for example from English to Spanish.

Recently, I came across a community experiment that used the Transformers-based Whisper model to transcribe speech into any language. Inspired by that experiment, this tutorial applies the same technique to speech translation, using the original implementation of Whisper instead of the Transformers-based one.
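To make the idea concrete, here is a minimal sketch of the trick as I understand it: keep the task as `transcribe`, but set the language to the one you want the text in rather than the one being spoken. The helper names `build_options` and `translate_speech` and the file name `audio.mp3` are placeholders of my own, not part of Whisper. The sketch assumes the `openai-whisper` package and ffmpeg are installed, and results will vary by language pair.

```python
from typing import Any, Dict


def build_options(target_language: str) -> Dict[str, Any]:
    """Decoding options for the hack (hypothetical helper, not a Whisper API).

    The task stays "transcribe", but the language is set to the language
    we want the output text in, not the language spoken in the audio.
    """
    return {"task": "transcribe", "language": target_language}


def translate_speech(audio_path: str, target_language: str,
                     model_name: str = "large") -> str:
    """Return the text Whisper produces when forced to `target_language`."""
    import whisper  # imported lazily so build_options works without the package

    model = whisper.load_model(model_name)
    # Extra keyword arguments to transcribe() are passed on as decoding options.
    result = model.transcribe(audio_path, **build_options(target_language))
    return result["text"].strip()


# Example (assumes an English recording saved as audio.mp3):
# print(translate_speech("audio.mp3", "es"))
```

Since this is not an officially supported mode, treat the output as experimental and spot-check it against a known translation.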

At the time of this writing, Whisper comes with five different multilingual models.

This tutorial is based on the large model, which requires about 10 GB of VRAM.
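If your GPU has less memory than that, a smaller model may still work. The sketch below picks the largest multilingual model that should fit in a given amount of VRAM. The figures are the approximate requirements listed in the official README as I recall them, so check them against your installed version. `APPROX_VRAM_GB` and `pick_model` are illustrative names of my own, not part of Whisper.

```python
from typing import Optional

# Approximate VRAM needs in GB, ordered from smallest to largest model.
# Assumption: values follow the Whisper README at the time of writing.
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(available_vram_gb: float) -> Optional[str]:
    """Return the largest model that should fit, or None if none does."""
    fitting = [name for name, need in APPROX_VRAM_GB.items()
               if need <= available_vram_gb]
    # Dicts keep insertion order, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None
```

For example, `pick_model(8)` returns `"medium"`, and the resulting name can be passed straight to `whisper.load_model`. Expect translation quality to drop with the smaller models.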

