Gemini 1.5 Pro Now Listens to Audio and Is Available to All

In Short
  • Google has made the Gemini 1.5 Pro model generally available to all users, without any waiting list.
  • You can access the powerful model at no cost while it's in public preview. The model is accessible through Google AI Studio.
  • Gemini 1.5 Pro also supports audio files now, in addition to videos and images.

At the Google Cloud Next 2024 event in Las Vegas, Google announced that it is making Gemini 1.5 Pro generally available to all users. The highly anticipated model is finally in public preview with a 1-million-token context window, and you no longer have to sign up for the waitlist to access it.

I tried to access the Gemini 1.5 Pro model from a new Google account and the model was readily available without any wait. And all this is available for free.

[Image: Google AI Studio with the Gemini 1.5 Pro model]

That said, it does not mean you can start using the Gemini 1.5 Pro model on the Gemini portal. For now, you will have to head to aistudio.google.com to access the model. After a few months of public preview, the model will be made available on the Gemini portal, where you will likely need a Gemini Advanced subscription to use it.

Keep in mind that Gemini 1.5 Pro is a mid-tier model built on the Mixture-of-Experts (MoE) architecture. Even so, it easily beats Gemini 1.0 Ultra, the largest model in the 1.0 family. And in our comparison with the GPT-4 model, Gemini 1.5 Pro showed remarkable capabilities in several tests. When Gemini 1.5 Pro debuts on the Gemini portal, expect it to perform better than GPT-4 and Claude 3’s Opus model.

Apart from that, Gemini 1.5 Pro can now process audio files too. You can upload audio recordings of meetings or videos, and the model listens to them directly, with no need to generate a transcript manually first. It can be of immense help to people who want to pull quick, structured information out of recorded meetings or discussions.

Gemini 1.5 Pro could already process videos and images, and now audio files are supported too, which makes it a powerful multimodal model with a context length of 1 million tokens. We tested the audio processing capability of the Gemini 1.5 Pro model. Here is how it went.

How to Process Audio Files on Gemini 1.5 Pro

  • Head over to aistudio.google.com in a browser.
  • Next, make sure the “Gemini 1.5 Pro” model is selected in the drop-down menu.
  • After that, click on the “Audio” menu in the top row and upload your audio file. It supports these audio file formats: FLAC, MIDI, MP3, M4A, OPUS, OGG, OGA, WAV, and MID.
  • It will process the audio file and consume tokens.
  • Now, start asking your questions, and Gemini 1.5 Pro will find the information from the audio and respond accordingly.
  • The best part is that it generates the transcript in a structured format with labels for the different speakers. And in our testing, it didn’t hallucinate.
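
Since the uploaded audio consumes tokens from the 1-million-token context window, it can help to run a quick check before uploading. The Python sketch below is only an illustration: the extension list comes from the step above, but the rate of 32 tokens per second of audio is an assumption that this article does not state, so verify it against Google's current documentation. The function names are made up for this example.

```python
import math
from pathlib import Path

# Extensions listed in the upload step above.
SUPPORTED_EXTENSIONS = {
    ".flac", ".midi", ".mp3", ".m4a", ".opus", ".ogg", ".oga", ".wav", ".mid",
}

# ASSUMPTION: tokens consumed per second of audio. Not stated in this
# article; check Google's documentation before relying on it.
ASSUMED_TOKENS_PER_SECOND = 32

# Context length mentioned in the article.
CONTEXT_WINDOW_TOKENS = 1_000_000


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is in the supported list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Rough token estimate for a clip, under the assumed rate."""
    return math.ceil(duration_seconds * ASSUMED_TOKENS_PER_SECOND)


def fits_in_context(duration_seconds: float, reserved_for_text: int = 0) -> bool:
    """Check whether the clip plus some text budget fits in the window."""
    needed = estimate_audio_tokens(duration_seconds) + reserved_for_text
    return needed <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    one_hour = 60 * 60
    print(is_supported_audio("standup.MP3"))  # True
    print(estimate_audio_tokens(one_hour))    # 115200
    print(fits_in_context(one_hour))          # True
```

Under that assumed rate, a one-hour meeting would take up a little over a tenth of the context window, leaving plenty of room for follow-up questions.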

So this is how you can upload and process audio files on Gemini 1.5 Pro. It’s really a powerful model from the Google DeepMind team and I am excited that it’s now available to the public at large without any cost. Go ahead and try it and let us know your thoughts in the comment section below.

VIA The Verge