Machine Learning Times

Introducing Speech-to-Text, Text-to-Speech, and More for 1,100+ Languages

Originally published in Meta AI, May 22, 2023.

Equipping machines with the ability to recognize and produce speech can make information accessible to many more people, including those who rely entirely on voice to access information. However, producing good-quality machine learning models for these tasks requires large amounts of labeled data: in this case, many thousands of hours of audio, along with transcriptions. For most languages, this data simply does not exist. For example, existing speech recognition models cover only approximately 100 languages, less than 2 percent of the 7,000+ known languages spoken on the planet. Even more concerning, nearly half of these languages are in danger of disappearing in our lifetime.

In the Massively Multilingual Speech (MMS) project, we overcome some of these challenges by combining wav2vec 2.0, our pioneering work in self-supervised learning, and a new dataset that provides labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages. Some of these, such as the Tatuyo language, have only a few hundred speakers, and for most of these languages, no prior speech technology exists. Our results show that the Massively Multilingual Speech models outperform existing models and cover 10 times as many languages. Meta is focused on multilinguality in general: For text, the NLLB project scaled multilingual translation to 200 languages, and the Massively Multilingual Speech project scales speech technology to many more languages.

