
OpenAI’s Whisper can Reach Human-Level Robustness in ASR





OpenAI’s Whisper will enable speech recognition apps to reach new levels of efficiency

Speech recognition, or voice recognition, technology has come a long way since the concept first emerged, but one problem has persisted for users: accuracy. Over the past couple of years, researchers have been working on AI algorithms that can accurately process voice input, keeping speech recognition a consistent focus of research and development. Recently, OpenAI’s Whisper has been making headlines as an avant-garde open-source ML model that can perform automatic speech recognition across a wide selection of global languages. Built on a transformer trained on 680,000 hours of weakly supervised, multilingual audio data, Whisper approaches human-level robustness and accuracy in ASR without fine-tuning or any intermediaries. The model is open source, and its weights are available to the public in various sizes.
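
For readers who want to try it, the open-source Python package exposes a simple high-level transcription call. The snippet below is a minimal sketch, assuming the openai-whisper package and ffmpeg are installed and that audio.mp3 is a local file you supply.

```python
# Minimal transcription sketch (assumes: pip install openai-whisper, ffmpeg on PATH,
# and a local file "audio.mp3" -- a placeholder name, not from the article).
import whisper

# "base" is one of the published checkpoint sizes (tiny, base, small, medium, large);
# larger checkpoints trade speed for accuracy.
model = whisper.load_model("base")

# Transcribe the audio file; the model handles decoding internally.
result = model.transcribe("audio.mp3")
print(result["text"])
```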

Over the years, many big tech companies have been trying to reach an efficient level of accuracy in ASR systems, which sit at the core of speech recognition apps, and services from tech giants like Google, Amazon, and Meta have contributed substantially to the growth and development of the speech recognition domain. OpenAI mentions in the GitHub repository for Whisper that the model has shown successful results in over 10 languages and demonstrates additional capabilities in tasks like voice activity detection, speaker classification, and speaker diarization, which weren’t actively addressed previously.
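
The repository also documents a lower-level API that breaks transcription into explicit steps, including identifying the spoken language before decoding. The sketch below follows that documented flow; audio.mp3 is again a placeholder file name, and the same installation assumptions apply.

```python
# Lower-level sketch: language identification followed by decoding of a 30-second clip.
# Assumes openai-whisper and ffmpeg are installed; "audio.mp3" is a placeholder.
import whisper

model = whisper.load_model("base")

# Load the audio and pad/trim it to the 30-second window the model expects.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram on the same device as the model.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Identify the spoken language from the spectrogram.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode the clip into text.
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)
```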

Is Whisper Limitless?

No, Whisper does have its limitations, particularly in the area of text prediction. Because the system is trained on a large amount of noisy data, its transcriptions sometimes include words that were never actually spoken, largely because the model is predicting the next word and transcribing the audio at the same time. Furthermore, this open-source ML model does not perform equally well across languages, with higher error rates for speakers of languages that are not well represented in the training data.

Bias has been one of the major obstacles to streamlining machine learning models. Studies of ASR systems built by some of the biggest tech companies in the world, such as Google, IBM, and Amazon, have found that error rates differ noticeably across groups of speakers. Despite this, Whisper’s transcription capabilities are already being used to improve existing accessibility tools.

Bottom Line

Whisper does not reflect the full extent of OpenAI’s potential, nor of its plans. The company’s efforts have been aided by the growing popularity of DALL-E 2 and GPT-3, and it is definitely pursuing several other AI research projects.

The post OpenAI’s Whisper can Reach Human-Level Robustness in ASR appeared first on Analytics Insight.


