- After months of delay, OpenAI is finally rolling out ChatGPT Advanced Voice to Plus and Team users. The rollout will be complete by the end of this week.
- The new Advanced Voice feature is powered by the GPT-4o multimodal model, bringing free-flowing conversation with interruption support.
- Camera input is not supported yet. OpenAI says it will add support for more modalities in the future.
ChatGPT Advanced Voice was demoed months ago during the GPT-4o launch, but OpenAI kept delaying the release, citing safety issues. Then a controversy erupted over the ‘Sky’ voice, which sounded strikingly similar to Scarlett Johansson’s. Now, five months later, OpenAI is rolling out Advanced Voice to all ChatGPT Plus and Team users, and says the rollout will be completed this week.
In case you are unaware, Advanced Voice is a huge upgrade over the standard voice chat available to free ChatGPT users. It uses the multimodal capability of the GPT-4o model to deliver natural, free-flowing conversation with support for interruptions.
Advanced Voice in ChatGPT might sound similar to Google’s Gemini Live, but there is a key distinction. Gemini Live uses speech-to-text (STT) and text-to-speech (TTS) engines as intermediaries: your speech is transcribed, the LLM responds in text, and the reply is synthesized back into speech. ChatGPT Advanced Voice, by contrast, handles audio input and output directly within the model. Gemini Live also supports interruptions, but it doesn’t offer a truly end-to-end multimodal experience.
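To make the distinction concrete, here is a conceptual sketch of the two architectures. Every function below is a hypothetical stub, not a real Gemini or OpenAI API; the point is only that a cascaded pipeline discards everything the transcript can’t capture, while a natively multimodal model hears the audio itself.

```python
# Hypothetical stubs to illustrate the two voice-assistant architectures.
# None of these names correspond to real Gemini or OpenAI APIs.

def speech_to_text(audio: str) -> str:
    # STT stage: tone/emotion markers in the audio are discarded here.
    return audio.replace("[happy voice] ", "")

def llm_respond(text: str) -> str:
    # Text-only LLM: it sees just the transcript.
    return f"reply to: {text}"

def text_to_speech(text: str) -> str:
    # TTS stage: synthesizes speech from the text reply.
    return f"[synthesized] {text}"

def cascaded_pipeline(audio: str) -> str:
    """Gemini Live-style: audio -> text -> LLM -> text -> audio.
    The model never hears the speaker, only a transcript."""
    return text_to_speech(llm_respond(speech_to_text(audio)))

def multimodal_pipeline(audio: str) -> str:
    """GPT-4o-style Advanced Voice: one model takes audio in and produces
    audio out, so prosody and non-speech sounds can influence the reply."""
    return f"[audio reply, aware of tone in] {audio}"
```

Running both on the same input shows the difference: `cascaded_pipeline("[happy voice] hello")` loses the speaker’s tone at the STT step, while `multimodal_pipeline` still has access to it.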
That said, in my brief testing, ChatGPT Advanced Voice seems to have lost many of the demoed multimodal features. During the demo, OpenAI showed that it could sing for you, identify your mood or emotion from your speech, detect different sounds, do accents, and a lot more. Currently, however, Advanced Voice says it can’t identify such characteristics from your speech. Camera input is not supported yet either.
It seems OpenAI has removed some of these features to avoid embarrassing conversations with ChatGPT. Nevertheless, are you excited to use ChatGPT Advanced Voice? Let us know in the comments below.