
- Anthropic has announced Claude 3.7 Sonnet, a unified model that can both generate quick responses and reflect deeply on complex queries.
- Claude 3.7 Sonnet's "Normal" Thinking mode is available to all users, while "Extended" Thinking is limited to Pro subscribers.
- In coding tasks, Claude 3.7 Sonnet delivers state-of-the-art performance, surpassing OpenAI o3-mini-high, o1, DeepSeek R1, and Grok 3.
Anthropic has finally released its reasoning model and, surprisingly, it’s not a separate model. The new Claude 3.7 Sonnet is the first “hybrid reasoning model”: a standard LLM and a reasoning model unified into one. OpenAI recently said that GPT-5 will be a unified model, but Anthropic has gotten there first with Claude 3.7 Sonnet, which is capable of both quick responses and deeper reasoning.
Anthropic says in its blog, “We’ve developed Claude 3.7 Sonnet with a different philosophy from other reasoning models on the market. Just as humans use a single brain for both quick responses and deep reflection, we believe reasoning should be an integrated capability of frontier models rather than a separate model entirely.”
The new Claude 3.7 Sonnet has two Thinking modes: Normal and Extended. “Normal” is the default Thinking mode and is available to free users as well. The “Extended” Thinking mode is only available to Pro subscribers; in this mode, the model self-reflects before giving its final answer.
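For developers, Extended Thinking is also exposed through Anthropic's Messages API. The sketch below builds a request body as a plain dict rather than making a live API call; the `thinking` parameter, its `budget_tokens` field, and the model identifier are assumptions based on Anthropic's API documentation and should be checked against the current reference before use.

```python
import json

# Sketch of a Messages API request body with Extended Thinking enabled.
# Field names ("thinking", "budget_tokens") and the model ID are assumed
# from Anthropic's docs; verify against the live API reference.
request_body = {
    "model": "claude-3-7-sonnet-20250219",  # assumed model identifier
    "max_tokens": 4096,
    # Extended Thinking: reserve a token budget the model can spend on
    # self-reflection before it writes the final answer.
    "thinking": {
        "type": "enabled",
        "budget_tokens": 2048,
    },
    "messages": [
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
}

print(json.dumps(request_body, indent=2))
```

The thinking budget is separate from the visible answer, so the idea is to size `budget_tokens` to the difficulty of the task while keeping it below `max_tokens`.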
Unlike OpenAI and xAI, Anthropic has also decided to show the model’s thought process in raw form. According to the company, extended thinking brings dramatic performance improvements in math, physics, coding, instruction-following, and more.

Now, coming to benchmarks, Claude 3.7 Sonnet achieves a state-of-the-art 62.3% on SWE-bench Verified, a benchmark that evaluates the ability to solve real-world software issues. On the same benchmark, OpenAI’s o3-mini-high scores 49.3%, o1 scores 48.9%, and DeepSeek R1 achieves 49.2%.

Thanks to these reasoning improvements, Claude 3.7 Sonnet has also become much better at agentic use cases. On TAU-bench (retail), 3.7 Sonnet scores 81.2%, higher than OpenAI o1’s 73.5%. With Extended Thinking mode, the new Sonnet achieves 78.2% on GPQA Diamond and 96.2% on MATH 500. In nearly every benchmark, Claude 3.7 Sonnet matches or exceeds OpenAI’s o3-mini and o1, Grok 3, and DeepSeek R1.
Apart from that, Anthropic also announced a new command-line tool called “Claude Code”. It’s an agentic coding tool you can use from the terminal: it can search and read code, edit files, run tests, and commit and push code to GitHub. Claude Code is currently in preview, and developers can apply for early access.