Text-to-speech engines have made significant progress over the last couple of years and have reached a point where they sound more or less like a real person. One of the finest examples is Google's own Cloud Text-to-Speech engine, which the company currently uses to power the Google Assistant and Google Maps directions.
Soon, you might see the Cloud Text-to-Speech engine in many more places, as Google has now opened it up to developers. The engine was developed by DeepMind, a subsidiary of Google parent Alphabet, and offers a total of 32 voices spanning 12 languages and variants. Google also lets developers adjust the pitch, speaking rate, and volume gain of the MP3 or WAV files the engine generates, so they can give the output their own personal touch.
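As a rough illustration of those tuning knobs, here is a minimal sketch of what a synthesis request might look like. The field names (`pitch`, `speakingRate`, `volumeGainDb`, `audioEncoding`) assume the v1 REST `text:synthesize` surface of the Cloud Text-to-Speech API, and the voice name is just an example; check Google's API reference for the current values.

```python
import json

# Sketch of a Cloud Text-to-Speech request body (v1 REST `text:synthesize`).
# The audioConfig fields below are the tuning options the article mentions;
# exact names and ranges are assumptions based on the v1 API surface.
request_body = {
    "input": {"text": "Turn left in 500 feet."},
    "voice": {
        "languageCode": "en-US",
        "name": "en-US-Wavenet-A",  # example name for a WaveNet-built voice
    },
    "audioConfig": {
        "audioEncoding": "MP3",   # or "LINEAR16" for WAV-style output
        "pitch": 2.0,             # semitones relative to the default pitch
        "speakingRate": 1.1,      # 1.0 is the voice's normal speed
        "volumeGainDb": -2.0,     # gain in dB relative to normal volume
    },
}

# In a real call this JSON would be POSTed to the synthesize endpoint
# with an API key or OAuth credentials attached.
print(json.dumps(request_body, indent=2))
```

The response would carry the synthesized audio as base64-encoded bytes, which you would decode and write out as an `.mp3` or `.wav` file.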
The service also includes six English voices that were built using WaveNet, DeepMind's machine-learning model for generating raw audio. What sets WaveNet apart is that it produces the audio waveform directly, resulting in far more natural-sounding speech. Google claims the WaveNet voices scored 20 percent better than its standard voices in tests with human listeners. The Cloud Text-to-Speech service is now available to all developers, and if you're interested in adding text-to-speech to your app, you can check out Google's pricing details.