- Meta launches Voicebox: powerful generative AI for speech tasks.
- Voicebox: in-context text-to-speech, editing, noise reduction, cross-lingual style transfer.
- Applications: natural voices for metaverse, aid for visually impaired, content creation.
While Microsoft and Google often dominate the headlines when it comes to AI, several other companies are also racing to develop AI products. One of them is Meta, the world's largest social media company.
In a recent blog post, Meta announced the launch of its new generative AI tool called Voicebox. This tool is designed to perform various speech-generation tasks, even ones it hasn’t been specifically trained for, thanks to its ability to learn from context.
Voicebox offers a range of features. First, it can perform in-context text-to-speech synthesis: given an audio sample as short as two seconds, it matches that sample's audio style when generating speech from text. It also supports speech editing and noise reduction, so users can regenerate parts of a recording that were interrupted by noise, or replace misspoken words, without re-recording.
Meta AI (@MetaAI) announced the system on Twitter on June 16, 2023: "Introducing Voicebox, a new breakthrough generative speech system based on Flow Matching, a new method proposed by Meta AI. It can synthesize speech across six languages, perform noise removal, edit content, transfer audio style & more."
Another capability of Voicebox is cross-lingual style transfer. Given a speech sample and a passage of text, the tool can read the text aloud in the style of the sample, in English, French, German, Spanish, Polish, or Portuguese. Voicebox is also trained on diverse data, so the speech it generates better reflects how people actually talk in these six languages.
Meta highlights several potential applications of Voicebox. In the future, multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters in the metaverse. They could also help visually impaired people by letting AI read written messages from friends aloud in those friends' voices. Content creators, too, could use Voicebox to easily create and edit audio tracks for videos, among many other possibilities.
To see Voicebox in action, watch the video Meta has posted on its blog.