Elon Musk's Grok-2 Beta Launched; Outperforms ChatGPT, Claude, and Gemini

Image Courtesy: xAI

In Short

xAI has released an early preview of Grok-2 and it scores a whopping 87.5% on the MMLU benchmark.
On the LMSYS leaderboard, Grok-2 beats ChatGPT, Gemini, and Claude. xAI has also launched a smaller Grok-2 mini model.
Users can start using the new model on x.com, but it will require an X Premium subscription.

Elon Musk’s AI venture, xAI, has released an early preview of the Grok-2 model, and it has surprisingly outperformed Claude, Gemini, and even ChatGPT. The earlier Grok-1.5 model was not well received, but Grok-2 has delivered great performance on the LMSYS leaderboard. xAI has released two new models: Grok-2 and a smaller Grok-2 mini.

xAI says Grok-2 has been significantly improved in key areas including reasoning, instruction following, and providing accurate and factual information. In traditional AI benchmarks, Grok-2 has scored a whopping 87.5% on MMLU and 88.4% on HumanEval. This is particularly interesting because the MMLU score was obtained using 0-shot CoT, that is, chain-of-thought prompting without any worked examples in the prompt.

Grok-2 was tested on LMSYS under the name “sus-column-r”. With around 12,000 votes, it stands at the third position, just below ChatGPT-4o-latest, Gemini-1.5-Pro-Experimental, and GPT-4o-2024-05-13. However, it performs better than GPT-4o-mini, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.1 405B.

In coding and math-related tasks, Grok-2 takes the 2nd spot, and in hard prompts, it takes the 4th position. xAI says that the Grok-2 multimodal model will be released soon. The company has not revealed the parameter count for either model. You can start using the new Grok-2 model on x.com, and developers can get started with the API as well.

Arjun Sha
