Google Releases Largest Overhaul for Cloud Speech-to-Text Engine


Late last month, Google released its Cloud Text-to-Speech engine to developers worldwide, featuring 32 voices across 12 languages and variants. Now the company has released a major update to another product in its Cloud AI speech lineup: the Cloud Speech-to-Text engine (formerly known as the Cloud Speech API).

The Cloud Speech-to-Text engine, which was released back in 2016, has been available to developers for almost a year now. With the latest release, Google has added a number of new features and updates that are expected to make the engine much more useful for businesses, particularly for phone-call and video transcription. Nothing, however, stops developers of consumer apps from building on the engine as well.

According to Google’s blog post, the new and updated Cloud Speech-to-Text engine now supports:

  1. A selection of pre-built models for improved transcription accuracy from phone calls and video
  2. Automatic punctuation, to improve readability of transcribed long-form audio
  3. A new mechanism (recognition metadata) to tag and group your transcription workloads, and provide feedback to the Google team
  4. A standard service level agreement (SLA) with a commitment to 99.9% availability
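As a rough illustration of how the first three features might appear in a request, here is a minimal Python sketch that only builds a JSON request body and sends nothing over the network. The field names (`model`, `enableAutomaticPunctuation`, `metadata`, `interactionType`) reflect our understanding of the Speech-to-Text REST API and should be checked against the current reference; the helper `build_recognize_request` and the bucket path are hypothetical.

```python
import json


def build_recognize_request(gcs_uri, model="phone_call",
                            language_code="en-US", metadata=None):
    """Build a JSON-serializable body for a speech recognition request.

    Hypothetical helper: it only assembles a dict and makes no API call.
    Field names are assumed from the public REST API, not from the article.
    """
    config = {
        "languageCode": language_code,
        # Pre-built model; the article mentions phone call and video models.
        "model": model,
        # Ask the service to insert punctuation into the transcript.
        "enableAutomaticPunctuation": True,
    }
    if metadata:
        # Recognition metadata used to tag and group transcription workloads.
        config["metadata"] = dict(metadata)
    return {"config": config, "audio": {"uri": gcs_uri}}


if __name__ == "__main__":
    body = build_recognize_request(
        "gs://example-bucket/support-call.wav",  # hypothetical audio location
        model="phone_call",
        metadata={"interactionType": "PHONE_CALL"},  # assumed enum value
    )
    print(json.dumps(body, indent=2))
```

Keeping the request construction separate from the network call makes it easy to inspect or log exactly which model and options a given workload used.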

At least a few of these could have real-world consumer applications, such as using the engine to transcribe voice recordings.

However, the new video and phone call transcription models have been designed specifically for business use cases, such as call centers, where there is a need to keep track of all communication between the company and its customers.

The API can support up to four speakers on phone calls and more than four speakers on video calls, while accounting for background noise, static from the phone line, and other agents.

To train the model, Google used real data from customers who volunteered to share it in exchange for access to the improvements. Thanks to the use of real data, the new model now has 54% fewer errors than the previous model. In the blog post, Dan Aharon, Product Manager, Cloud AI at Google, wrote:

“Most major cloud providers use speech data from incoming requests to improve their products. Here at Google Cloud, we’ve avoided this practice, but customers routinely request that we use real data that’s representative of theirs, to improve our models. We want to meet this need, while being thoughtful about privacy and adhering to our data protection policies. That’s why today, we’re putting forth one of the industry’s first opt-in programs for data logging, and introducing the first model based on this data.”
