Meta's AI breakthrough: Supporting thousands of languages with MMS model

In a groundbreaking effort to preserve the world's languages and enhance global communication, Meta has unveiled its latest achievement: Massively Multilingual Speech (MMS) models. The models extend text-to-speech and speech-to-text support to over 1,100 languages, more than ten times the coverage of earlier systems. Furthermore, they can identify over 4,000 spoken languages, a significant leap forward in language recognition technology.


The urgent need to protect endangered languages and bridge communication gaps drove Meta's team to develop this technology. By giving people access to information and letting them use devices in their preferred language, MMS models offer a solution to the challenges faced by linguistically diverse communities worldwide.

Meta's MMS models have vast potential across industries and use cases, including virtual and augmented reality, messaging services, and more, by enabling devices to recognise and generate speech in a far broader range of languages and voices than before.


Meta has decided to open-source the models and accompanying code. This move allows researchers and developers worldwide to build upon Meta's pioneering work, fostering collaboration in the pursuit of preserving linguistic diversity and bringing humanity closer together.
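
As a minimal sketch of what this open access enables, the snippet below loads an MMS speech-to-text checkpoint through the Hugging Face Transformers integration. It assumes the `facebook/mms-1b-all` checkpoint and its per-language adapters; exact APIs may differ between library versions.

```python
# Sketch: MMS speech-to-text via Hugging Face Transformers.
# Assumes the facebook/mms-1b-all checkpoint and 16 kHz mono audio;
# details may vary across Transformers versions.
import torch
from transformers import Wav2Vec2ForCTC, AutoProcessor

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS uses per-language adapters: select the target language by ISO code,
# e.g. "fra" for French.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# `speech` should be a float waveform sampled at 16 kHz
# (e.g. loaded with librosa); here, a placeholder second of silence.
speech = torch.zeros(16000)
inputs = processor(speech.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding of the predicted token ids into text.
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```

The same release also covers text-to-speech and language-identification checkpoints, which follow a similar loading pattern.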


The development of MMS models posed unique challenges, as existing speech datasets covered only about 100 languages. To overcome this hurdle, Meta turned to religious texts, such as the Bible, which have been translated into numerous languages and widely studied in translation research. These translations come with publicly available audio recordings of people reading the texts in different languages.


For the MMS project, Meta curated a dataset of New Testament readings in over 1,100 languages, averaging 32 hours of audio per language. By also incorporating unlabelled recordings of other Christian religious readings, the dataset expanded to cover more than 4,000 languages. Notably, Meta's analysis shows the models perform equally well for male and female voices, even though most speakers in the recordings are male, and they do not disproportionately produce religious language despite the religious content of the training audio.


Meta said it remains committed to further advances in language accessibility. The company aims to expand MMS coverage to even more languages, while also tackling dialects, a challenge that existing speech technology has struggled with.


With Meta's Massively Multilingual Speech models, language barriers will crumble, and voices from every corner of the globe will be heard.
