Text-To-Speech Conversion by Microsoft AI Is Incredibly Realistic

Microsoft and Chinese researchers may have found an effective way of converting text to speech. Text-to-speech technology has been advancing quickly, but the training time and resources needed to produce natural-sounding output have remained an obstacle.

What Microsoft and Chinese researchers have done is create a text-to-speech artificial intelligence (AI) system that needs only 200 voice samples with matching transcriptions, approximately 20 minutes' worth of audio, to generate realistic-sounding speech.

How is it linked to the brain?

The system relies in part on Transformers, a type of deep neural network loosely modeled on the way neurons in the brain pass signals to one another. Much as synapses strengthen or weaken connections, a Transformer assigns a weight to every element of its input and output as it processes them. This lets it work through even long and complicated sequences, such as a complex sentence, in an efficient way.
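To make the idea of "weighing" inputs concrete, here is a minimal sketch of scaled dot-product attention, the basic weighting operation inside a Transformer. It is an illustration only and not the researchers' system: the function names and the toy vectors are assumptions chosen for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weigh each value by how well its key matches the query.

    query:  list of floats
    keys:   list of vectors, each the same length as query
    values: list of vectors, one per key
    Returns (weights, output), where output is the weighted sum of values.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output


# Toy example: the query resembles the first key more than the second,
# so the first value receives the larger weight.
weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
print(weights, output)
```

In a real Transformer the queries, keys and values are learned from data, and many such weightings run in parallel across every position in the sentence.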

With a noise-removing encoder added to the mix, the AI manages quite well even with relatively little training data, as is the case here.

Although the output still sounds slightly robotic, the word intelligibility of the recordings comes in at 99.84 percent. On top of that, the approach may make text-to-speech more accessible, because producing realistic-sounding voices would not take much more effort.

The researchers are continuing to improve the system and are hopeful that, in the future, it will take even less work to generate lifelike speech.

