- Anthropic has launched a new model called Claude 3.5 Sonnet. It's 2x faster than the largest Claude 3 Opus model and offers better intelligence.
- Claude 3.5 Sonnet beats OpenAI's GPT-4o and Google's Gemini 1.5 Pro model in various benchmarks.
- The model is also very good at visual tasks and can accurately transcribe text from hard-to-read images.
Just three months after releasing the Claude 3 models, Anthropic has introduced a much-improved Claude 3.5 Sonnet model. It’s not the largest model from Anthropic’s lab, yet it beats GPT-4o and Gemini 1.5 Pro, at least in several benchmarks. Claude 3.5 Sonnet is a mid-tier model, and it runs 2x faster than Claude 3 Opus, the largest Claude 3 model.
Anthropic has kept the API price the same as the previous Sonnet model, and Claude 3.5 Sonnet comes with a context window of 200K tokens. For general users, it’s available for free on claude.ai and supports both image and document uploads. Keep in mind that there is a rate limit for free users.
Coming to benchmarks, Claude 3.5 Sonnet beats GPT-4o in nearly all of them. The exceptions are MMLU and MATH, where GPT-4o stays ahead (on MMLU, only marginally). In HumanEval, which tests coding abilities, Claude 3.5 Sonnet scores 92% whereas GPT-4o scores 90.2%. In GPQA Diamond, which evaluates graduate-level reasoning, the new Sonnet model achieves a score of 59.4% whereas GPT-4o stands at 53.6%.
With 0-shot prompting in the MMLU test, Claude 3.5 Sonnet gets 88.3% and OpenAI’s GPT-4o model gets 88.7%. From the table, you can infer that Anthropic has developed a highly capable model that outranks both GPT-4o and Gemini 1.5 Pro in most of these tests.
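To put the scores quoted above side by side, here is a minimal Python sketch that computes the gap between the two models in percentage points. It uses only the three benchmark figures mentioned in this article; the `SCORES` dictionary and the `margins` helper are illustrative names, not part of any official tool.

```python
# Scores quoted in this article, in percent: (Claude 3.5 Sonnet, GPT-4o).
SCORES = {
    "HumanEval": (92.0, 90.2),
    "GPQA Diamond": (59.4, 53.6),
    "MMLU (0-shot)": (88.3, 88.7),
}


def margins(scores):
    """Return Claude 3.5 Sonnet's lead over GPT-4o in percentage points.

    A negative value means GPT-4o is ahead on that benchmark.
    """
    return {
        name: round(claude - gpt, 1)
        for name, (claude, gpt) in scores.items()
    }


if __name__ == "__main__":
    for name, lead in margins(SCORES).items():
        leader = "Claude 3.5 Sonnet" if lead > 0 else "GPT-4o"
        print(f"{name}: {leader} leads by {abs(lead):.1f} points")
```

The numbers show why the MMLU gap counts as marginal: it is well under one point, while the GPQA Diamond gap is several points wide.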
Next, Claude 3.5 Sonnet is also a powerful vision model and again does better than GPT-4o in various visual reasoning tests. It’s very good at understanding and transcribing text from hard-to-read images. It’s also excellent at interpreting charts, graphs, and illustrations.
Moreover, Anthropic has announced a new Artifacts tool for Claude, which works like OpenAI’s Code Interpreter tool. Artifacts shows generated code and other AI-generated content in a separate interface. It isn’t limited to Python and can work with other programming languages as well. For example, I created an SVG image of the Taj Mahal with the Artifacts tool on claude.ai.
Anthropic says Claude 3.5 Haiku and Claude 3.5 Opus are coming later this year. Overall, I am very impressed with Claude 3.5 Sonnet’s speed and intelligence. It seems I can finally replace ChatGPT 4o with Anthropic’s new model for my everyday tasks.