Google Drops Its First “Reasoning” Model to Take On OpenAI o1

In Short
  • Google's first reasoning model is finally here. The "Gemini 2.0 Flash Thinking" model can solve complex reasoning, math, and coding problems.
  • It supports multimodal inputs such as images, videos, and audio files.
  • It uses more compute resources and time to re-evaluate its response before generating the final answer.

After OpenAI introduced its o1 reasoning model, which takes some time to “think” before responding, Google has now released its own thinking model. The new AI model is “Gemini 2.0 Flash Thinking”, aka gemini-2.0-flash-thinking-exp-1219. It’s an experimental preview model, and it is already available on AI Studio for testing and feedback.

The Gemini 2.0 Flash Thinking model follows the new paradigm of test-time compute that OpenAI introduced in September. Basically, it allows the model to use more compute resources and time to re-evaluate its response before generating the final answer.

Early research suggests that when AI models are given more time to “think” during inference, they can perform far better than models that rely on a larger parameter count alone.
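To get a feel for why extra inference-time compute can help, here is a toy sketch. It is not how Gemini or o1 works internally; it swaps in a much simpler technique, majority voting over several independent attempts (often called self-consistency). The `noisy_solver` function, its 60% hit rate, and the answer options are all made-up assumptions for illustration.

```python
import random
from collections import Counter

CORRECT = "3"            # assumed ground-truth answer for the toy question
WRONG = ["1", "2", "4"]  # assumed wrong answers the toy solver may give


def noisy_solver(rng, p_correct=0.6):
    """Hypothetical single attempt: right with probability p_correct."""
    if rng.random() < p_correct:
        return CORRECT
    return rng.choice(WRONG)


def answer_with_budget(rng, n_samples):
    """Spend more inference compute: sample n_samples attempts, then vote."""
    votes = Counter(noisy_solver(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(answer_with_budget(rng, n_samples) == CORRECT
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(f"{n:>2} attempts per question -> accuracy {accuracy(n):.2f}")
```

In this toy setup, accuracy climbs as the number of attempts per question grows, even though the underlying solver never changes. Real reasoning models spend their extra compute differently, but the trade-off is the same: more time per answer in exchange for better answers.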

Google has built its first thinking model on the smaller Gemini 2.0 Flash model, but inference scaling is expected to come to the larger Gemini 2.0 Pro model (Gemini-Exp-1206) as well.

Google says Gemini 2.0 Flash Thinking can solve complex reasoning questions and difficult math and coding problems. And unlike OpenAI o1, it shows the model’s raw thinking process, which is great for transparency.

Not to mention, the new Thinking model can process multimodal inputs such as images, videos, and audio files. Finally, its knowledge cutoff date is August 2024.

I briefly tested the Gemini 2.0 Flash Thinking model on AI Studio. It failed the popular Strawberry question on the first try, but on the next run, it got the answer right and said there are three r’s in the word “Strawberry”. Next, I asked it to find Indian states that don’t have ‘a’ in their names. Again, it got the answer wrong.
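For reference, both test questions can be checked mechanically. The short script below is my own ground-truth check, not something the model produced. The `STATES` list is an assumption on my part: it reflects the 28 Indian states (union territories excluded) in their common English spellings.

```python
# Ground-truth check for the two test prompts (illustrative only).
WORD = "Strawberry"
r_count = WORD.lower().count("r")

# Assumed list: the 28 Indian states, union territories not included.
STATES = [
    "Andhra Pradesh", "Arunachal Pradesh", "Assam", "Bihar",
    "Chhattisgarh", "Goa", "Gujarat", "Haryana", "Himachal Pradesh",
    "Jharkhand", "Karnataka", "Kerala", "Madhya Pradesh", "Maharashtra",
    "Manipur", "Meghalaya", "Mizoram", "Nagaland", "Odisha", "Punjab",
    "Rajasthan", "Sikkim", "Tamil Nadu", "Telangana", "Tripura",
    "Uttar Pradesh", "Uttarakhand", "West Bengal",
]

# Case-insensitive filter for names with no letter 'a'.
states_without_a = [s for s in STATES if "a" not in s.lower()]

if __name__ == "__main__":
    print(r_count)           # 3
    print(states_without_a)  # ['Sikkim']
```

So the expected answers are three r’s and, with this list, a single state: Sikkim.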

I think we should wait for the larger Gemini 2.0 Pro Thinking model, which should deliver stronger performance and demonstrate the power of inference scaling. Meanwhile, on the LMSYS Chatbot Arena leaderboard, Gemini’s thinking model has topped the chart across all categories.
