Gemini Nano to Get Multimodal Capabilities; Coming to Pixel Later This Year

Google has announced that it is bringing multimodal capabilities to Gemini Nano. With multimodal support, Gemini Nano will be able to understand contextual information not only from text input but also from images, sounds, and spoken language.

Google’s ongoing I/O 2024 event has become a hotbed of AI announcements. Alongside several updates, such as Veo, an AI video generator meant to rival OpenAI’s Sora, and Gemini 1.5 Flash, Google has announced that it is bringing multimodal capabilities to Gemini Nano, its on-device LLM. This means Gemini Nano will be able to accept audio, images, and files as input in addition to text.

For those who are unaware, Gemini Nano is a lightweight LLM designed to perform AI tasks on-device. Google announced Gemini Nano in December last year alongside Gemini Ultra and Gemini Pro. As of now, Gemini Nano is available only on the Google Pixel 8 series and the Samsung Galaxy S24. However, in its current state, it accepts only text input.

With multimodal capabilities, Gemini Nano will be able to draw contextual information from sounds, images, and spoken language in addition to text. As for availability, Google says it will roll out multimodal capabilities to Gemini Nano starting with Pixel later this year.
