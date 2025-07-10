Elon Musk’s AI firm, xAI, released its frontier Grok 4 AI models with record-breaking benchmark numbers. There are two new AI models — Grok 4 and Grok 4 Heavy — and both are reasoning AI models. Along with the new models, xAI announced a new subscription plan called SuperGrok Heavy, which costs $300 per month and offers access to the Grok 4 Heavy model.

On the benchmark figures xAI shared, Grok 4 outperforms the leading AI models from OpenAI, Google, and Anthropic. On GPQA, Grok 4 scored 87.5% and Grok 4 Heavy scored 88.9%. On the AIME 2025 test, Grok 4 Heavy achieved a perfect 100%.

Image Credit: xAI via X

On the challenging Humanity’s Last Exam benchmark, Grok 4 Heavy scored 44.4% and Grok 4 scored 38.6%, both with tool support. On the same test, Gemini 2.5 Pro scored 26.9% and OpenAI’s o3 scored 24.9% with tools. On these results, Grok 4 is currently the state-of-the-art reasoning AI model.

Most notably, on the newly launched ARC-AGI-2 benchmark, Grok 4 scored 15.9%, the highest score to date and roughly double the scores of Claude Opus 4 and OpenAI o3. By this measure, Grok 4 leads every AI model released by any lab so far. On the older ARC-AGI-1 benchmark, Grok 4 scored 66.7%, again higher than the publicly available OpenAI o3-pro and o4-mini.

xAI says Grok 4 Heavy is the company’s largest AI model and that it can run multiple agents in parallel to solve a single problem. Musk also said that an AI coding model will be released in August, a multi-modal agent is planned for September, and a video generation model may finally arrive in October.

Overall, xAI has again shown that it is one of the prominent AI labs training foundation models, and it is positioned to challenge every major AI player around the world.