
- Google's latest Gemini 2.5 Flash AI model is faster, cheaper, and smarter. It's currently in preview.
- Google says it's the company's first fully hybrid reasoning model: developers can toggle thinking on or off and set thinking budgets.
- Gemini 2.5 Flash is rolling out on the Gemini app, AI Studio, and Vertex AI.
Google has released a cost-efficient Gemini 2.5 Flash AI model. It's currently in preview and available for free in the Gemini app, and developers can start using the API through AI Studio and Vertex AI. Google says Gemini 2.5 Flash is the company's "first fully hybrid reasoning model."
This means it works both as a traditional LLM and as a reasoning ("thinking") model. Google says developers can turn thinking on or off and allocate a thinking budget per request. The model is optimized for cost, quality, and latency, balancing fast performance with quality output at a lower price. While Gemini 2.5 Pro offers the best performance, Gemini 2.5 Flash is designed with affordability in mind.
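In practice, the thinking budget is set per API request. The sketch below shows roughly what such a request body could look like against Google's REST endpoint; the exact field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumptions based on the v1beta Gemini API, so check the current docs before relying on them:

```json
{
  "contents": [
    { "parts": [{ "text": "Explain quicksort briefly." }] }
  ],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingBudget": 1024
    }
  }
}
```

A budget of `0` would correspond to turning thinking off entirely, making the model behave like a traditional non-reasoning LLM.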
On benchmarks, Gemini 2.5 Flash ranks slightly below OpenAI's latest o4-mini reasoning model. On Humanity's Last Exam, Gemini 2.5 Flash scores 12.1% versus o4-mini's 14.3%. On GPQA Diamond, Gemini 2.5 Flash achieves 78.3% while o4-mini does slightly better at 81.4%. On AIME 2025, the gap is wider: Gemini 2.5 Flash gets 78% against o4-mini's 92.7%.
The real value of Gemini 2.5 Flash lies in its pricing. Input costs $0.15 per million tokens, and output costs $0.60 per million tokens with thinking off, rising to $3.50 per million output tokens with thinking on. Gemini 2.5 Flash is also compatible with newer Gemini app features like Canvas.
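Using those per-million-token rates, the cost of a call is easy to estimate. The helper below is a hypothetical illustration of the arithmetic, not part of any Google SDK:

```python
def flash_cost(input_tokens: int, output_tokens: int, thinking: bool = False) -> float:
    """Estimate the dollar cost of a Gemini 2.5 Flash call from the quoted rates."""
    INPUT_RATE = 0.15          # $ per 1M input tokens
    OUTPUT_RATE_FAST = 0.60    # $ per 1M output tokens, thinking off
    OUTPUT_RATE_THINK = 3.50   # $ per 1M output tokens, thinking on
    output_rate = OUTPUT_RATE_THINK if thinking else OUTPUT_RATE_FAST
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * output_rate


# A million tokens in and out costs $0.75 without thinking, $3.65 with it.
print(flash_cost(1_000_000, 1_000_000))                 # 0.75
print(flash_cost(1_000_000, 1_000_000, thinking=True))  # 3.65
```

The gap between the two output rates is why the on/off toggle matters: routine calls can run cheaply, while the higher reasoning rate is only paid when a thinking budget is actually enabled.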