Meta's New Voicebox AI Serves Up Generative Text-To-Speech - SlashGear

Voicebox relies on a novel training method called Flow Matching, which Meta claims yields more intelligible text-to-speech output and audio that more closely resembles the original voice sample. Compared to rival models, Meta says Voicebox brings the word error rate in text-to-speech down from 10.9% to 5.2%. It also supports style transfer from one language to another, making the audio output sound more authentic.
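To put the quoted figures in perspective, here is a minimal Python sketch of how an error rate of this kind is commonly computed, assuming the metric is word error rate (WER): the word-level edit distance between a reference transcript and a transcript of the generated speech, divided by the number of reference words. The function names and the sample sentences are illustrative assumptions, not taken from Meta's paper or code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, relative to the starting value."""
    return (before - after) / before


# Hypothetical example: one wrong word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.3f}")

# The figures quoted in the article: 10.9% down to 5.2%.
drop = relative_reduction(10.9, 5.2)
print(f"Relative reduction: {drop:.0%}")
```

Under that assumption, going from 10.9% to 5.2% means the error rate was roughly halved.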

But the most impressive capability in Voicebox’s arsenal is its “zero-shot” approach, which means it doesn’t need to be specifically trained on a large cache of a speaker’s audio to imitate that voice. All it needs is a two-second audio clip, from which it picks up everything from the distinct tone and pitch to personal pauses, before it starts generating fresh audio clips with a similar sound profile.

For comparison, Microsoft’s Vall-E AI model needs a three-second audio clip of the target voice. Meta says its text-to-speech generation model is faster than Vall-E. Just as Microsoft paused the public release of Vall-E citing abuse risks, Meta is holding back Voicebox.

“We recognize that this technology brings the potential for misuse and unintended harm,” Meta says, adding that it wants to take a responsible approach to AI innovation. The company has also released a research paper documenting a classifier model it built that can differentiate between Voicebox-generated audio and an authentic clip of a real human speaking.
