I Tried Sesame AI’s Voice Companion, and It Was Like Talking to a Real Person

OpenAI, the leading AI lab, launched ChatGPT Advanced Voice Mode last year, but it failed to impress me. By the time it was released, OpenAI had watered down its capabilities, and Voice Mode refused to produce human-like expressions. Google’s Gemini Live, meanwhile, relied on a TTS engine to generate spoken words, delivering a robotic experience at best.

Enter Sesame, an AI startup founded by Oculus co-founder Brendan Iribe and Ankit Kumar, which has taken the AI industry by storm. Sesame’s “Maya” (female) and “Miles” (male) voice companions are so natural and engaging that, for the first time, I feel AI has genuinely blurred the line between machine and human interaction.

Sesame avoids calling them voice assistants, instead referring to them as “conversationalists” and “voice companions,” which is an apt description. Rather than waste any more of your time, let me take you straight to my interaction with Sesame’s Maya voice companion.

My Engaging Interaction with Sesame’s Maya

In my conversations, Maya started with a natural tone and paused to listen to what I was saying. There are micro-pauses and shifts in tonality, which are missing from existing voice assistants. It can laugh, change pace, emphasize, give expressive cues, and even detect your mood from your voice. In one interaction, I suddenly laughed to test the AI voice companion, and it asked me, “Why are you giggling?”

What I find interesting is that Sesame’s voice companion gives you some space to think and reflect, which makes conversations feel much more natural. To give another example, when Sesame’s Maya speaks, there are subtle hesitations that feel like it is thinking before responding, just like a human. The conversation feels organic, as if the voice model is not simply reading programmed responses.

Note that while the voice interaction feels full-duplex — where both participants can talk and listen at the same time — Sesame says it’s not actually full-duplex as it processes the speech after you are done talking. Humans, on the other hand, can process the information while the other person is still speaking.

Nevertheless, in its current form, Sesame’s voice companion truly feels human-like. It has finally crossed the uncanny valley in AI speech, something OpenAI only demoed with ChatGPT Advanced Voice Mode early on. It’s designed not just to talk, but to engage the user with nuanced tone, pitch, and contextual awareness, which adds depth to the conversation.

What is the Tech Behind Sesame’s Voice Companion?

First, note that Sesame is still working on its voice companions; this is an early research demo. The team is backed by the venture firm Andreessen Horowitz (a16z). As for the underlying technology that makes everything tick, Sesame has developed a Conversational Speech Model (CSM), a Transformer-based multimodal model for speech generation.


The company has trained three model sizes, each paired with a smaller decoder: Tiny (1B parameters), Small (3B), and Medium (8B). They were trained on close to 1 million hours of mostly English audio, so conversations are currently limited to English, with some incidental multilingual capability.

The company’s goal is to develop a full-duplex model with long-term memory and an adaptive personality. Sesame is also working on a lightweight eyeglass wearable that lets you talk to the voice companion all day, which reminds me of the movie ‘Her’. The wearable can also see the world around you, hinting that vision capability may arrive in the coming months.

So, if you are impressed with Sesame’s voice companion, click the link below and interact with either Maya or Miles for free. Google Chrome is recommended for the best experience.
