Introducing Speech-to-Text, Text-to-Speech, and More for 1,100+ Languages

This blog post was made possible by the work of Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Ali Elkahky, Zhaoheng Ni, Sayani Kundu, Maryam Fazel-Zarandi, Apoorv Vyas, Alexei Baevski, Yossef Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, and Michael Auli.

Originally published in Meta AI, May 22, 2023.

Equipping machines with the ability to recognize and produce speech can make information accessible to many more people, including those who rely entirely on voice to access information. However, producing good-quality machine learning models for these tasks requires large amounts of labeled data — in this case, many thousands of hours of audio, along with transcriptions. For most languages, this data simply does not exist. For example, existing speech recognition models only cover approximately 100 languages — a fraction of the 7,000+ known languages spoken on the planet. Even more concerning, nearly half of these languages are in danger of disappearing in our lifetime.

In the Massively Multilingual Speech (MMS) project, we overcome some of these challenges by combining wav2vec 2.0, our pioneering work in self-supervised learning, and a new dataset that provides labeled data for over 1,100 languages and unlabeled data for nearly 4,000 languages. Some of these, such as the Tatuyo language, have only a few hundred speakers, and for most of these languages, no prior speech technology exists. Our results show that the Massively Multilingual Speech models outperform existing models and cover 10 times as many languages. Meta is focused on multilinguality in general: For text, the NLLB project scaled multilingual translation to 200 languages, and the Massively Multilingual Speech project scales speech technology to many more languages.

The Machine Learning Times © 2020 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact
