
- Google has announced a Gemini 2.5 Pro Deep Think mode that uses new research techniques to consider multiple hypotheses before responding.
- The new Deep Think mode improves the model's performance significantly across major benchmarks.
- In addition, the new Gemini 2.5 Flash AI model is cheaper and more intelligent, taking the second spot on the LMArena leaderboard, just behind Gemini 2.5 Pro.
On Tuesday, at the Google I/O 2025 event, the search giant announced Gemini 2.5 Pro Deep Think and a much improved version of the Gemini 2.5 Flash AI model. Deep Think is an enhanced reasoning mode for Gemini 2.5 Pro that “uses new research techniques enabling the model to consider multiple hypotheses before responding.”
On the USAMO 2025 benchmark, the Deep Think mode scores 49.4%, well ahead of the standard Gemini 2.5 Pro model, which achieves 34.5%. Similarly, on LiveCodeBench and MMMU, Gemini 2.5 Pro Deep Think outperforms both the standard Gemini 2.5 Pro and OpenAI’s o3 model. It’s currently being tested with trusted partners and will roll out more broadly in the future.

Apart from that, the new Gemini 2.5 Flash model is much more intelligent. It’s smaller and cheaper than the flagship Gemini 2.5 Pro, yet its performance is impressive. On the LMArena leaderboard, the new Gemini 2.5 Flash ranks just below Gemini 2.5 Pro, with an Elo score of 1424 against Gemini 2.5 Pro’s 1446.

The new Gemini 2.5 Flash model will be generally available in early June, but you can try the preview version right away in the Gemini app, Google AI Studio, and Vertex AI. For developers, the new Gemini 2.5 Flash brings improved capabilities, transparency via thought summaries, and better cost efficiency. On top of that, you can set a thinking budget that caps how many tokens the model spends reasoning before it answers.
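To make the thinking budget concrete, here is a minimal sketch using the google-genai Python SDK. The model ID and the budget value are illustrative placeholders, not details confirmed in Google’s announcement:

```python
# A minimal sketch of setting a thinking budget with the google-genai
# Python SDK. The model ID and budget value below are illustrative
# placeholders, not details confirmed in Google's announcement.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder; use the current preview ID
    contents="Summarize the trade-offs of speculative decoding.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,   # cap on reasoning tokens; 0 disables thinking
            include_thoughts=True,  # request the thought summaries mentioned above
        ),
    ),
)

# Thought summaries arrive as parts flagged with `thought=True`.
for part in response.candidates[0].content.parts:
    label = "summary" if part.thought else "answer"
    print(f"[{label}] {part.text}")
```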
Google also says the Gemini 2.5 Flash model is 22% more efficient than before, cutting its token consumption significantly. Moreover, the model supports native audio output and can switch between different voices, a capability that goes live on the Gemini API starting today.
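For the audio side, the sketch below shows one plausible shape for requesting native audio output with a named voice through the google-genai SDK. The model ID (gemini-2.5-flash-preview-tts) and the voice name (Kore) are assumptions for illustration, and the shipped API may differ:

```python
# A hedged sketch of requesting native audio output with a chosen voice
# through the google-genai Python SDK. The model ID and the voice name
# are assumptions for illustration; the shipped API may differ.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumed audio-capable model ID
    contents="Say cheerfully: the new Flash model is live today!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name="Kore",  # assumed voice; swap the name to switch
                )
            )
        ),
    ),
)

# The audio is returned as raw PCM bytes in the response's inline data.
audio_bytes = response.candidates[0].content.parts[0].inline_data.data
with open("output.pcm", "wb") as f:
    f.write(audio_bytes)
```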