October 6, 2022
OpenAI’s Whisper: The AI whisperer model
By Sofía Sánchez González
The world of artificial intelligence has gone crazy lately. Every week we have a new model available, but the best comes when it’s an open source model. This is the case with Whisper, from OpenAI, which has overshadowed Stable Diffusion these past weeks.
How does Whisper work?
Whisper is an Automatic Speech Recognition (ASR) system: you give the model an audio track and it returns text. Since it is a multitask model, you can generate two things with it:
- Audio transcription
- Audio translation (from other languages to English)
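As a rough sketch of these two tasks, here is how they might look in Python. We assume the open source `openai-whisper` package is installed (`pip install openai-whisper`). The helper name `transcribe_and_translate` and the file name `podcast.mp3` are ours, chosen only for illustration.

```python
def transcribe_and_translate(model, audio_path):
    """Run both Whisper tasks on one audio file and return the two texts."""
    # Default task: transcription in the language spoken in the audio.
    transcription = model.transcribe(audio_path)
    # task="translate" asks for an English translation instead.
    translation = model.transcribe(audio_path, task="translate")
    return transcription["text"], translation["text"]


def main(audio_path="podcast.mp3"):  # hypothetical file name
    # Imported here so the helper above can be read without the package installed.
    import whisper

    model = whisper.load_model("base")  # one of the published model sizes
    original, english = transcribe_and_translate(model, audio_path)
    print(original)
    print(english)
```

Calling `main()` downloads the model weights the first time, so expect a short wait before the first transcription.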
The real-life uses of Whisper are fantastic, especially if you’re a content creator. If you are a YouTuber and want to add subtitles to your videos, Whisper can do it for you with slightly better quality (according to some) than what YouTube itself offers. Likewise, if you want to transcribe a podcast, just upload the audio and Whisper will quickly turn it into text.
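For the subtitle use case, the transcription has to end up in a subtitle file. The sketch below turns a list of timed segments into SRT text. We assume each segment is a dictionary with `start`, `end` (in seconds) and `text` keys, which is the shape the `openai-whisper` package uses for its segments. The function names and the example segments are ours and are not real model output.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT subtitle text from dicts with 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


# Hand-written example segments (not real model output).
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 61.04, "text": " Today we talk about Whisper."},
]
print(segments_to_srt(example))
```

The resulting text can be saved with an `.srt` extension and uploaded alongside the video.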
But remember, Whisper’s main language is English.
Whisper’s strengths
Whisper is a vast improvement over the ASR systems we’ve seen to date. These are some of its advantages:
- It is the most robust and accurate automatic speech recognition (ASR) system that exists.
- Whisper is very permissive when it comes to accents. You don’t have to speak British English for Whisper to understand what you’re saying, an important point for non-native speakers. The approaches other organizations have used so far for these models rely on smaller datasets, or on training audio that is not paired with text, which means less variety of accents and less data overall.
- OpenAI’s model copes well with background noise. Many existing models are unable to distinguish between silence and voice, so if we remain silent they will try to transcribe the silence.
- It handles technical language well. Whisper understands, transcribes and translates industry or niche expressions that can be difficult for other speech recognition systems.
- It makes approximately 50% fewer errors than other models.
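To make the "50% fewer errors" figure concrete, here is a small sketch using word error rate (WER), a common way to score speech recognition. The article does not say which metric is behind the number, so treating it as WER is our assumption, and the sentences below are made up for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Made-up transcripts from two hypothetical systems.
reference = "the model turns a long podcast into text"
system_a = "a model turn a long broadcast in text"
system_b = "the model turns a long broadcast in text"

wer_a = word_error_rate(reference, system_a)
wer_b = word_error_rate(reference, system_b)
relative_reduction = (wer_a - wer_b) / wer_a
print(wer_a, wer_b, relative_reduction)
```

In this toy case the second system gets half as many words wrong as the first, which is what a 50% relative reduction in errors means.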
OpenAI has far surpassed other models thanks to a huge and exclusive dataset with which they carried out the training.
We will give you more details.
How has Whisper been trained?
You may wonder why OpenAI has reached places that other researchers have not been able to approach. Well, open research can only go so far. The data used to train the models is the most important thing: it is the gasoline that fuels artificial intelligence.
And not all organizations have the same amount of data to train their models. In the case of Whisper, the system was trained on 680,000 hours of audio collected from (unspecified) web content.
All of those hours were broken down into 30-second audio tracks, as that is the length of audio that Whisper supports. If you want to transcribe a 50-minute podcast, the audio will be split into 30-second tracks, and each one will be turned into a Mel spectrogram.
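The splitting step can be sketched with a bit of arithmetic. This is a simplified illustration with fixed, back-to-back windows, and the real implementation may move its window differently. The 16 kHz sample rate is an assumption based on Whisper’s reference code, and the function name is ours.

```python
SAMPLE_RATE = 16_000   # samples per second (assumed)
CHUNK_SECONDS = 30     # window length stated in the article


def split_into_chunks(num_samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Return (start, end) sample indices of consecutive fixed-length windows.

    The last window may be shorter than the others; it would need to be
    padded to the full length before being processed.
    """
    chunk_size = sample_rate * chunk_seconds
    return [
        (start, min(start + chunk_size, num_samples))
        for start in range(0, num_samples, chunk_size)
    ]


# A 50-minute podcast, expressed as a number of audio samples.
podcast_samples = 50 * 60 * SAMPLE_RATE
chunks = split_into_chunks(podcast_samples)
print(len(chunks))  # 100 windows of 30 seconds each
```

So a 50-minute podcast becomes 100 separate 30-second inputs for the model.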
Each spectrogram is then passed to an encoder, and a decoder produces the text, performing the tasks we mentioned above (transcription and translation) as well as language identification. Also, Whisper has not been fine-tuned on any specific dataset, so it is not biased toward any one of them.
Where can I try it?
The launch of Whisper has been truly open to everyone, that is, there has been no waiting list or exclusive access to try it. You can try it now at the following link.
You can upload audio to Whisper:
- From the microphone
- In MP3 or MP4 format
As always, our NLP engineer Manuel Romero has adapted Whisper to the Hugging Face ecosystem. Here you can try it.
About Narrativa
Narrativa is an internationally recognized content services company that uses its proprietary artificial intelligence and machine learning platforms to build and deploy digital content solutions for enterprises. Its technology suite consists of data extraction, data analysis, natural language processing (NLP) and natural language generation (NLG) tools that work together seamlessly to power a lineup of smart content creation, automated business intelligence reporting and process optimization products for a variety of industries.
Contact us to learn more about our solutions!