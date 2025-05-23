

Anthropic’s Claude Opus 4 and Sonnet 4 Set a New Benchmark in AI Coding

Arjun Sha
[Image: Anthropic launches Claude Opus 4 and Claude Sonnet 4 AI models. Image Credit: Anthropic]
In Short
  • Anthropic has released two new AI models in the Claude 4 series: Claude Opus 4 and Claude Sonnet 4.
  • Anthropic says that Claude Opus 4 is the "world's best coding model," outperforming OpenAI's Codex-1 and Gemini 2.5 Pro on SWE-bench Verified.
  • Claude 4 models are rolling out to all paid plans, and free users can access Claude Sonnet 4 without extended thinking.

On Thursday, Anthropic launched two new AI models in the Claude 4 series: Claude Opus 4 and Claude Sonnet 4. Anthropic says Claude Opus 4 is the “world’s best coding model” and that it delivers sustained performance on long-horizon, agentic workflows. Claude Sonnet 4, meanwhile, offers better coding and reasoning performance than Claude 3.7 Sonnet.

First, let’s talk about Claude Opus 4. On the SWE-bench Verified benchmark, which measures performance on real software engineering tasks, Claude Opus 4 scores 72.5%, slightly higher than the 72.1% of OpenAI’s best coding model, Codex-1. With parallel test-time compute, which appears similar to the Deep Think mode in Gemini 2.5 Pro, Opus 4 climbs to 79.4%.

Interestingly, Claude Sonnet 4 scores 72.7% on SWE-bench Verified and reaches 80.2% with parallel test-time compute, edging out the larger Opus 4 model on this benchmark.

[Image: Claude Opus 4 and Claude Sonnet 4 performance on SWE-bench. Image Credit: Anthropic]

Anthropic says the Claude Sonnet 4 model “balances performance and efficiency for internal and external use cases, with enhanced steerability for greater control over implementations. While not matching Opus 4 in most domains, it delivers an optimal mix of capability and practicality.”

Claude Opus 4 excels at complex, long-running tasks and agentic workflows, while Claude Sonnet 4 combines strong coding performance with efficiency. Both are hybrid reasoning models, meaning they can either respond near-instantly or use extended thinking for deeper reasoning.

Anthropic also notes that when given access to local files, Claude Opus 4 can create and maintain memory files to store key information. For example, while playing Pokémon, Claude Opus 4 created a navigation guide file to improve its gameplay.

Finally, on the safety front, the company has for the first time activated AI Safety Level 3 (ASL-3) protections for the Claude Opus 4 model, in line with Anthropic’s Responsible Scaling Policy (RSP). Anthropic has implemented Constitutional Classifiers and other defenses to guard against jailbreaks.

Claude 4 models are rolling out to all paid users on the Pro, Max, Team, and Enterprise plans. Thankfully, Claude Sonnet 4 is also available to free users, but without extended thinking.

