Elon Musk's Grok 4 AI Models Set New Benchmark Records

Image Credit: xAI via X

In Short

Elon Musk's xAI company has launched two new AI models called Grok 4 and Grok 4 Heavy.
The new Grok 4 models post record benchmark results, outranking OpenAI o3, Gemini 2.5 Pro, and Claude Opus 4.
Grok 4 achieved 15.9% on the new ARC-AGI-2 benchmark, the highest score on that test to date.

Elon Musk’s AI firm, xAI, has released its frontier Grok 4 AI models with record-breaking benchmark numbers. There are two new models, Grok 4 and Grok 4 Heavy, and both are reasoning models. Along with the new models, xAI announced a new subscription plan called SuperGrok Heavy, which costs $300 per month and offers access to the Grok 4 Heavy model.

On benchmarks, Grok 4 outperforms the leading AI models from OpenAI, Google, and Anthropic. On GPQA, Grok 4 scored 87.5% and Grok 4 Heavy achieved 88.9%. On the AIME 2025 test, Grok 4 Heavy scored a perfect 100%.

Grok 4 benchmarks (Image Credit: xAI via X)

On the challenging Humanity’s Last Exam benchmark, Grok 4 Heavy achieved 44.4% and Grok 4 scored 38.6%, both with tool support. On the same test, Gemini 2.5 Pro scored 26.9% and OpenAI’s o3 scored 24.9% with tools. These results suggest that Grok 4 is currently the state-of-the-art reasoning AI model.

Most notably, on the newly launched ARC-AGI-2 benchmark, Grok 4 achieved 15.9%, the highest score to date and about double the scores of Claude Opus 4 and OpenAI o3. On this test, that puts Grok 4 ahead of every model released by any AI lab so far. On the older ARC-AGI-1 benchmark, Grok 4 achieved 66.7%, again higher than the publicly available OpenAI o3-pro and o4-mini.

xAI says Grok 4 Heavy is the company's largest AI model and that it can run multiple agents in parallel to solve a problem. Musk also said that an AI coding model will be released in August, a multi-modal agent is planned for September, and a video generation model may follow in October.

Overall, xAI has again shown that it is one of the prominent AI labs training foundation models, and it stands to challenge the major AI players around the world.


Arjun Sha
