Cloud Blog – Google Speech-to-Text now includes improved call and video transcription, automatic punctuation, and metadata recognition.

Google has announced a significant update to its Cloud Speech-to-Text technology that makes the API more useful for businesses, including improved transcription of phone calls and video.

Text-to-Speech, a related Google Cloud Platform (GCP) service, now enables developers to power voice response systems in call centers, let Internet of Things (IoT) devices talk to users, and convert text into spoken audio. Both services show that the tech giant is increasingly interested in providing businesses with solutions powered by Google’s artificial intelligence.

Cloud Speech-to-Text – formerly known as the Cloud Speech API – was first made public in 2016 and has been available for about three years. During this time, API usage has more than doubled every six months, according to Google.

Updates to Cloud Speech-to-Text include speech recognition models tailored to specific use cases, such as transcribing phone calls and transcribing audio from video. Customers can choose the model that best suits their business needs.

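The update also includes one of the industry’s first opt-in data logging programs, which gives access to an enhanced “phone_call” model. Customers who opt in share their data with Google to help improve the system, and the enhanced model makes 54% fewer errors than the basic “phone_call” model.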

Google has also released a video model optimized for audio from video and for multi-speaker audio. This model uses machine learning technology similar to that behind YouTube captioning and makes 64% fewer errors than the default model.
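To make the model choice concrete, here is a minimal sketch of a request body for the REST `speech:recognize` method. It assumes the `model` and `useEnhanced` fields of the API's recognition config; the helper name `build_recognize_request` and the bucket path are hypothetical, and audio settings such as encoding and sample rate are left out for brevity. Treat it as an illustration and check the field names against the current API reference before relying on them.

```python
import json

# Model names mentioned in the article, plus the general-purpose "default".
KNOWN_MODELS = {"default", "phone_call", "video"}


def build_recognize_request(gcs_uri, model="default", use_enhanced=False,
                            language_code="en-US"):
    """Return a JSON-serializable body for a speech:recognize call.

    Hypothetical helper: it only assembles the payload and sends nothing.
    """
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    config = {"languageCode": language_code, "model": model}
    if use_enhanced:
        # Per the article, the enhanced model is tied to the opt-in
        # data logging program.
        config["useEnhanced"] = True
    return {"config": config, "audio": {"uri": gcs_uri}}


request_body = build_recognize_request(
    "gs://example-bucket/support-call.wav",  # hypothetical audio location
    model="phone_call",
    use_enhanced=True,
)
print(json.dumps(request_body, indent=2))
```

Switching to the video model would only change the `model` argument to `"video"`; the rest of the payload stays the same.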

Cloud Speech-to-Text also adds automatic punctuation to transcriptions, thanks to a new LSTM neural network. The model can automatically suggest commas, question marks, and periods in the text. This is helpful for transcribing conference calls and voice recordings.
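As a rough sketch of how this looks from the developer's side, the snippet below sets the `enableAutomaticPunctuation` flag in a recognition config and then joins the transcripts from a response. The response here is a hypothetical, hand-written sample shaped like a `speech:recognize` reply, not real API output, and `full_transcript` is an illustrative helper.

```python
import json

# Recognition config with automatic punctuation switched on.
config = {
    "languageCode": "en-US",
    "enableAutomaticPunctuation": True,
}

# Hypothetical sample, shaped like a speech:recognize response body.
sample_response = json.loads("""
{
  "results": [
    {"alternatives": [{"transcript": "Hi, can you hear me?", "confidence": 0.93}]},
    {"alternatives": [{"transcript": " Yes, go ahead.", "confidence": 0.91}]}
  ]
}
""")


def full_transcript(response):
    """Join the top alternative of every result into one string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts)


print(full_transcript(sample_response))
```

With punctuation enabled, the commas and question marks arrive as part of each `transcript` string, so no post-processing is needed to restore them.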


Users can also attach metadata to their transcription requests, which gives the Google Cloud Platform team feedback for improving the product. For example, you can describe the recorded audio or add tags like “shopping app voice commands” or “basketball sports TV shows.” Google Cloud aggregates this information across all Cloud Speech-to-Text users and uses it to decide what to improve next.
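The tagging described above can be sketched as a `metadata` object inside the recognition config. The field names `audioTopic`, `interactionType`, and `originalMediaType`, and the enum values used below, are given as recalled from the API's recognition metadata message and should be treated as assumptions; the feature may have changed or been deprecated since the announcement. `build_metadata` is a hypothetical helper.

```python
def build_metadata(audio_topic, interaction_type=None, original_media_type=None):
    """Assemble a recognition metadata dict, skipping fields that are unset.

    Hypothetical helper; field names are assumptions to verify against the
    current API reference.
    """
    metadata = {"audioTopic": audio_topic}
    if interaction_type is not None:
        metadata["interactionType"] = interaction_type
    if original_media_type is not None:
        metadata["originalMediaType"] = original_media_type
    return metadata


# Tags taken from the article's examples.
voice_command_config = {
    "languageCode": "en-US",
    "metadata": build_metadata(
        "shopping app voice commands",
        interaction_type="VOICE_COMMAND",
    ),
}

tv_show_config = {
    "languageCode": "en-US",
    "model": "video",
    "metadata": build_metadata(
        "basketball sports TV shows",
        original_media_type="VIDEO",
    ),
}

print(voice_command_config)
print(tv_show_config)
```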

According to Dan Aharon, Product Manager for Google Cloud AI:

“Access to quality speech transcription technology opens up a world of opportunity for companies that want to communicate and learn from their users. With this Cloud Speech-to-Text update, you get access to the latest developments from our team of machine learning experts through a simple REST API.”

The API costs $0.006 per 15 seconds of audio for all models except the video model, which costs $0.012 per 15 seconds.
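For budgeting, these rates can be turned into a quick estimate. The sketch below uses the two prices quoted in the article and assumes that audio is billed in whole 15-second increments, rounded up; it ignores any free tier or volume discounts, and the `estimate_cost` helper is illustrative only.

```python
import math
from decimal import Decimal

# Prices per 15-second increment, as quoted in the article (USD).
PRICE_PER_INCREMENT = {
    "default": Decimal("0.006"),
    "phone_call": Decimal("0.006"),
    "video": Decimal("0.012"),
}
INCREMENT_SECONDS = 15


def estimate_cost(audio_seconds, model="default"):
    """Estimate the charge for one audio clip.

    Assumption: duration is rounded up to the next 15-second increment.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    increments = math.ceil(audio_seconds / INCREMENT_SECONDS)
    return PRICE_PER_INCREMENT[model] * increments


# One hour of audio is 240 increments of 15 seconds.
print(estimate_cost(3600, "phone_call"))  # 240 * 0.006 = 1.440
print(estimate_cost(3600, "video"))       # 240 * 0.012 = 2.880
```

At these rates, an hour of call audio comes to about $1.44, and an hour transcribed with the video model to about $2.88.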

For Cloudfresh customers, special prices for the service are available, with payment by bank transfer in hryvnia, US dollars, or euros. The company also provides all the necessary accounting and legal documents. As a bonus, Cloudfresh helps with the use, configuration, and ongoing technical support of Google Cloud Platform services.