Gemini 1.5 Flash is an Underrated Gem You Need to Try Right Now: Here’s How

In Short
  • Gemini 1.5 Flash was overshadowed by other announcements at Google I/O 2024, but it packs serious capabilities.
  • It's the fastest AI model for inference and brings multimodality and a large context window of 1 million tokens.
  • Gemini 1.5 Flash is also very cheap to run. On price, it undercuts both small and large competing models.

At I/O 2024, Google announced several new AI models, upcoming projects, and a plethora of AI features coming to its products. However, what caught my attention was the Gemini 1.5 Flash model. It’s an impressively fast and efficient model that brings multimodal capability and a context window of up to 1 million tokens (2M via waitlist).

Despite the small size of Gemini 1.5 Flash (Google has not disclosed its parameter count), it achieves great scores across all modalities: text, vision, and audio. In the Gemini 1.5 technical report, Google disclosed that Gemini 1.5 Flash outperforms much larger models like 1.0 Ultra and 1.0 Pro in many aspects. It lags behind the larger models only in speech recognition and translation.

Gemini 1.5 Flash performance benchmark
Image Courtesy: Google

Unlike Gemini 1.5 Pro, which is a sparse Mixture of Experts (MoE) model, Gemini 1.5 Flash is a dense model, online distilled from the larger 1.5 Pro model for improved quality. In terms of speed as well, the Flash model, running on Google’s custom TPUs, outperforms all smaller models out there, including Claude 3 Haiku.

Gemini 1.5 Flash speed performance
Image Courtesy: Google

And its pricing is unbelievably low. For prompts up to 128K tokens, Gemini 1.5 Flash costs $0.35 per 1 million input tokens and $0.53 per 1 million output tokens. For longer prompts, the rates are $0.70 and $1.05 per 1 million tokens. It’s much cheaper than Llama 3 70B, Mistral Medium, GPT-3.5 Turbo, and of course, larger models.
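
To get a feel for what these numbers mean in practice, here is a rough cost estimator. It is only a sketch under stated assumptions: it reads the quoted prices as rates per 1 million tokens, applies the lower pair to prompts of up to 128K tokens and the higher pair to longer prompts, and treats 128K as exactly 128,000 tokens. The names `Rates`, `TIER_LIMIT`, and `estimate_cost` are made up for illustration, and Google's pricing may change at any time.

```python
from dataclasses import dataclass

# Assumption: "128K" is treated as exactly 128,000 prompt tokens.
TIER_LIMIT = 128_000


@dataclass(frozen=True)
class Rates:
    """US dollars per 1 million tokens (assumed unit for the quoted prices)."""
    input_per_million: float
    output_per_million: float


# Figures quoted in the article; the tier split is an assumption of this sketch.
SHORT_PROMPT = Rates(input_per_million=0.35, output_per_million=0.53)
LONG_PROMPT = Rates(input_per_million=0.70, output_per_million=1.05)


def estimate_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request under the assumptions above."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # The prompt length decides which pair of rates applies to the request.
    rates = SHORT_PROMPT if prompt_tokens <= TIER_LIMIT else LONG_PROMPT
    total = (prompt_tokens * rates.input_per_million
             + output_tokens * rates.output_per_million)
    return total / 1_000_000
```

Under these assumptions, `estimate_cost(100_000, 2_000)` works out to roughly $0.036, and a full 1-million-token prompt with no output comes to about $0.70.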

If you are a developer and need multimodal reasoning with a large context window at a low cost, you should definitely check out the Flash model. Here is how you can try Gemini 1.5 Flash for free.

How to Use Gemini 1.5 Flash For Free

  • Head to Google AI Studio and sign in with your Google account. There is no waitlist to use the Flash model.
  • Next, choose the “Gemini 1.5 Flash” model in the drop-down menu.
  • Now, you can start chatting with the Flash model. You can also upload images, videos, audio clips, files, and folders.
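
If you would rather call the model from code than chat in the browser, the same model can be reached through the Gemini API. Here is a minimal sketch that uses only Python's standard library. I did not test this as part of the steps above, and several things here are assumptions: the `v1beta` endpoint path, the `gemini-1.5-flash` model ID, the `x-goog-api-key` header, and the helper names `build_request` and `ask`. Check them against Google's current API documentation before relying on them.

```python
import json
import urllib.request

# Assumed endpoint root and model ID; verify both against Google's docs.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-1.5-flash"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{API_ROOT}/models/{MODEL}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # assumed auth header
        },
        method="POST",
    )


def ask(prompt: str, api_key: str) -> str:
    """Send the request and return the text of the first candidate reply."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.load(response)
    # Assumed response shape: candidates -> content -> parts -> text.
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

With an API key of your own, a call like `ask("Explain what a context window is in one sentence.", api_key)` should return the model's reply as a string. Keeping `build_request` separate from `ask` lets you inspect the payload without making a network call.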

First Impression of Gemini 1.5 Flash

While Gemini 1.5 Flash is not a state-of-the-art model, its advantages are breakneck speed, efficiency, and low cost. In terms of capabilities, it ranks below Gemini 1.5 Pro and other larger models from OpenAI and Anthropic. Nevertheless, I tried some of the reasoning prompts that I used to compare ChatGPT 4o and Gemini 1.5 Pro.


It answered only one of the five questions correctly. It might not be very smart at commonsense reasoning, but for applications that require multimodal capability and a large context window, it might fit your use case. Also, Gemini models are very good at creative tasks, which can be of value to developers and users.

Simply put, there is no other AI model out there that is this fast and efficient, offers multimodality, and has a large context window with near-perfect recall. On top of that, it’s insanely cheap to run. So what is your opinion on Google’s latest Flash model? Let us know in the comments below.
