Microsoft unveils VALL-E, the AI that copies human voices: what it is all about

Trained on more than 60,000 hours of speech, VALL-E can reproduce human voices and their tone with impressive accuracy.

These are interesting times for artificial intelligence news. The latest innovation in the field is VALL-E, an AI system that reproduces human voices.

The artificial intelligence model, presented to the public by Microsoft, can reproduce not only the words spoken by a person but also their tone, according to their emotional state.

In this article, we will go over what is currently known about VALL-E, how it works, and the possible risks associated with its use.

What VALL-E is, the AI that reproduces human voices, and how it works

The principle behind VALL-E is as simple as it is effective.

From just 3 seconds of speech recorded from a person, the artificial intelligence is able to reproduce that person’s voice. Given a text input, it generates speech that reads the text aloud in the same voice and tone, with remarkable fidelity.

To achieve this, the system has been trained with more than 60,000 hours of speech in English, for now the only language VALL-E supports. With such a dataset at its disposal, the software has a solid foundation for processing newly received audio input.

How accurate are the results of this AI? According to the research paper describing the system, published on arXiv, the preprint repository hosted by Cornell University, the reproduction is remarkably accurate.

In short, according to the paper, VALL-E significantly outperforms the current state-of-the-art zero-shot TTS system in terms of naturalness and similarity to the original voice.

Although VALL-E makes the software used in this field so far look outdated, it is still not a perfect mechanism. In testing, some samples show minor problems.

Among the many audio clips produced so far, some contain imperfections that reveal, at least to the most attentive ear, that the reproduced voice is not entirely natural.

That said, given these results, it is easy to see how VALL-E, if developed properly, could become the leading artificial intelligence when it comes to reproducing human voices.

VALL-E benefits and risks

Advanced artificial intelligence systems such as VALL-E or the now-famous ChatGPT are surprising and interesting, but also potentially very dangerous.

If until now it was possible to create deepfake videos that were credible, to say the least, this system will make it possible to produce even more convincing montages, perhaps even capable of fooling Intel’s FakeCatcher.

It will then be possible to make people appear to say words they never said, with obvious repercussions for fake news and propaganda of various kinds.

While these technological developments appear fascinating and entertaining, we should consider the downside.

The unbridled application of AI could create rather worrisome situations for public opinion and freedom of expression.
