
- Anthropic has announced Claude 3.7 Sonnet, a unified model that can both generate quick responses and reflect deeply on complex queries.
- Claude 3.7 Sonnet's "Normal" Thinking mode is available to all users, while "Extended" Thinking is limited to Pro subscribers.
- In coding tasks, Claude 3.7 Sonnet delivers state-of-the-art performance, surpassing OpenAI o3-mini-high, o1, DeepSeek R1, and Grok 3.
Anthropic has finally released its reasoning model and, surprisingly, it’s not a separate model. The new Claude 3.7 Sonnet is the first “hybrid reasoning model”: a standard LLM and a reasoning model unified into one. OpenAI recently said that GPT-5 will be a unified model, but Anthropic has gotten there first with Claude 3.7 Sonnet, which is capable of both quick responses and deeper reasoning.
Anthropic says in its blog, “We’ve developed Claude 3.7 Sonnet with a different philosophy from other reasoning models on the market. Just as humans use a single brain for both quick responses and deep reflection, we believe reasoning should be an integrated capability of frontier models rather than a separate model entirely.”
The new Claude 3.7 Sonnet has two Thinking modes: Normal and Extended. “Normal” is the default Thinking mode and is available to free users as well. The “Extended” Thinking mode is only available to Pro subscribers; in this mode, the model self-reflects before giving its final answer.
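For developers, Extended Thinking is also exposed through Anthropic's Messages API. The sketch below builds a request body as a plain dict rather than making a live API call; the `thinking` parameter, its `budget_tokens` field, and the model identifier are assumptions based on Anthropic's API documentation and should be checked against the current reference before use.

```python
import json

# Sketch of a Messages API request body with Extended Thinking enabled.
# Field names ("thinking", "budget_tokens") and the model ID are assumed
# from Anthropic's docs; verify against the live API reference.
request_body = {
    "model": "claude-3-7-sonnet-20250219",  # assumed model identifier
    "max_tokens": 4096,
    # Extended Thinking: reserve a token budget the model can spend on
    # self-reflection before it writes the final answer.
    "thinking": {
        "type": "enabled",
        "budget_tokens": 2048,
    },
    "messages": [
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
}

print(json.dumps(request_body, indent=2))
```

The thinking budget is separate from the visible answer, so the idea is to size `budget_tokens` to the difficulty of the task while keeping it below `max_tokens`.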
Unlike OpenAI and xAI, Anthropic has also decided to show the model’s thought process in raw form. According to the company, extended thinking brings dramatic performance improvements in math, physics, coding, instruction-following, and more.

Now, coming to benchmarks, Claude 3.7 Sonnet achieves a state-of-the-art 62.3% on SWE-bench Verified, a benchmark that evaluates the ability to solve real-world software issues. On the same benchmark, OpenAI’s o3-mini-high scores 49.3%, o1 scores 48.9%, and DeepSeek R1 achieves 49.2%.

Thanks to these reasoning improvements, Claude 3.7 Sonnet has also become much better at agentic use cases. On TAU-bench (retail), 3.7 Sonnet scores 81.2%, higher than OpenAI o1’s 73.5%. With Extended Thinking mode, the new Sonnet achieves 78.2% on GPQA Diamond and 96.2% on MATH 500. In nearly every benchmark, Claude 3.7 Sonnet matches or exceeds OpenAI’s o3-mini and o1, Grok 3, and DeepSeek R1.
Apart from that, Anthropic also announced a new command-line tool called “Claude Code”. It’s an agentic coding tool you can use from the terminal: it can search and read code, edit files, run tests, and commit and push code to GitHub. Claude Code is currently in preview, and developers can apply for early access.