AI Babel Fish becomes reality

Publicly released:
International

While it may not be so easy to insert into your ear, your smartphone could soon be the equivalent of the Babel Fish from The Hitchhiker's Guide to the Galaxy (a small fish capable of real-time language translation), according to researchers from Meta. Most existing machine learning translation systems are text-oriented, or involve multiple steps, often translating speech into text and then converting it to speech in another language. The new model, called SEAMLESSM4T, instead performs direct speech-to-speech translation, recognizing speech in 101 languages and translating it into 36. The AI model can filter out background noise and adjust to speaker variation, and is up to 23% more accurate than existing systems. The resource will be made available for public non-commercial use, say the authors.

Media release

From: Springer Nature

An AI model that can translate speech and text, including direct speech-to-speech translation, for up to 101 languages is described in Nature. The model, named SEAMLESSM4T, fills gaps in language coverage and outperforms existing systems. The work may pave the way for rapid universal translation, with resources being made publicly available (for non-commercial use) to assist further research on inclusive speech translation technologies.

Readers of science fiction might be familiar with the Babel Fish from The Hitchhiker’s Guide to the Galaxy, a small fish that could be inserted into an ear and simultaneously translate from one spoken language to another. Such a tool would be valuable in facilitating communication in an interconnected global landscape, but most existing machine learning translation systems are text-oriented, or involve multiple steps: speech recognition, text translation, and conversion of text to speech. In addition, language coverage for existing speech-to-speech models falls behind that of text-to-text models and tends to be skewed towards translating from a source language into English, rather than from English into another language.

Addressing these limitations, the Seamless Communication Team from Meta have developed a single model that supports multiple modes of translation between up to 101 languages. SEAMLESSM4T handles speech-to-speech translation (recognizing 101 languages and translating into 36 languages), speech-to-text translation (101 to 96 languages), text-to-speech translation (96 to 36 languages), text-to-text translation (96 languages), and automatic speech recognition (96 languages). For speech-to-speech translation, SEAMLESSM4T is up to 23% more accurate than existing systems. The AI model can filter out background noise and adjust to speaker variation. Although further optimization is required, SEAMLESSM4T may represent a step towards improving communication across language barriers, the authors conclude.

Journal/conference: Nature
Research: Paper
Organisation/s: Meta Fundamental AI Research (FAIR)
Funder: Meta