OpenAI’s Voice Engine Can Clone Human Voices From a 15-Second Sample

In Short
  • OpenAI developed the Voice Engine model in late 2022. It can clone a voice from a single 15-second audio sample and generate speech in multiple languages.
  • The company has not yet released the model to the public because of the serious risks associated with voice cloning.
  • OpenAI encourages society to adapt to this new reality and to understand the capabilities of AI.

OpenAI has state-of-the-art models for text and image generation, and most recently, it also introduced Sora, an impressive text-to-video model. Now, the company has announced Voice Engine, a model that can generate speech from a single 15-second audio sample. It’s essentially a text-to-speech model: you provide a 15-second audio clip as a reference for the speaker’s voice, input your text, and the model generates natural-sounding speech in that voice.

OpenAI says that even though the model is small, Voice Engine can generate realistic and emotive voices that closely resemble the original speaker. According to the company, the model was created in late 2022 and has been powering the ChatGPT Voice Chat feature.

OpenAI acknowledges the “serious risks” associated with the technology and the “potential for synthetic voice misuse”. So the company is not releasing the model to the public at this time. Instead, it’s previewing the model to start a conversation about voice synthesis and how society can adapt to these new capabilities.

As for the model, it can translate speech into different languages and produce realistic audio with a nuanced accent. HeyGen, a popular AI video and audio generation platform, has been using OpenAI’s Voice Engine to create custom voices. In this space, ElevenLabs has built its own speech synthesis model that can clone voices and generate speech in multiple languages.

While the technology is quite powerful, it can be used to deceive and may put users at risk in various situations. OpenAI notes that voice-based authentication is still used for accessing bank accounts and other sensitive information, and the company hopes that such authentication systems are phased out. Apart from that, social media is filled with people cloning popular voices to promote their products.

In India, in particular, AI voice cloning scams are on the rise. Cybercriminals are cloning kids’ voices to threaten parents and extort money. In such a scenario, OpenAI is in no position to release the model widely. As we move towards the AI era, more caution and resilience are needed from society at large.

What do you think about OpenAI’s voice cloning engine? Should the company release the model to the public? Let us know your thoughts in the comments below.
