
- Elon Musk-led xAI finally unveiled its powerful Grok 3 AI model. The company also announced Grok 3 Reasoning models.
- Grok 3 outperforms GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Pro on AIME 2024, GPQA Science, LiveCodeBench, and Chatbot Arena.
- The Grok 3 Reasoning model delivers even stronger performance, outranking OpenAI's o3-mini and DeepSeek R1 models.
Elon Musk-led xAI finally released its frontier Grok 3 AI model after a few months of delay. Musk claims Grok 3 is the “smartest AI on Earth” and that it outperforms ChatGPT on several benchmarks. Going by the benchmarks xAI shared, Grok 3 does look like the most powerful AI model available right now.
Starting with training, Grok 3 was trained on a massive cluster of 200K GPUs, using almost 10x more compute than Grok 2. As for benchmarks, the standard (non-reasoning) Grok 3 model beats GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Pro, and DeepSeek V3. Grok 3 scores 52% on AIME 2024, 75% on GPQA Science, and 57% on LiveCodeBench.

In fact, the smaller Grok 3 mini model matches or outranks other state-of-the-art models. xAI had also been testing Grok 3 on LMSYS Chatbot Arena under the name “chocolate”, where it became the first AI model to cross the 1,400 Elo score mark. Grok 3 is now the number one chatbot on Chatbot Arena in all categories, be it creative writing, coding, math, hard prompts, or instruction following.

The Grok 3 Reasoning model again beats the competition. It consistently outperforms OpenAI’s o3-mini-high and the full o1, as well as DeepSeek R1 and Gemini 2.0 Flash Thinking. Even on the latest AIME 2025 question set, the Grok 3 Reasoning model does much better than competing reasoning models. What I find interesting is that the Grok 3 mini Reasoning model is also very capable for its size.
Next, Elon Musk announced a new DeepSearch agent that searches the web and gathers sources to compile accurate information. The agent uses the Grok 3 Reasoning model. It’s similar to OpenAI’s Deep Research agent but takes much less time to browse the web, do the thinking, and come up with an answer.
There is also a “Think” button, which uses the Grok 3 mini Reasoning model, and a “Big Brain” button, which uses the bigger Grok 3 Reasoning model and spends more compute and thinking time to solve complex problems. Elon Musk says Grok 3 will be available to X’s Premium+ subscribers, starting today. And if you want to use the newly launched features, you can subscribe to SuperGrok, which costs $30 a month.