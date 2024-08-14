Elon Musk’s AI venture, xAI, has released an early preview of the Grok-2 model, and it has surprisingly outperformed Claude 3.5 Sonnet, Gemini 1.5 Pro, and GPT-4o-mini, and even matches GPT-4o, on the LMSYS leaderboard. The earlier Grok-1.5 model was not well received, but Grok-2 has delivered strong results. xAI has released two new models: Grok-2 and a smaller Grok-2 mini.

xAI says Grok-2 has been significantly improved in key areas, including reasoning, instruction following, and providing accurate, factual information. On traditional AI benchmarks, Grok-2 scored 87.5% on MMLU and 88.4% on HumanEval. The MMLU result is particularly interesting because it was obtained with 0-shot CoT (chain-of-thought) prompting, that is, without any worked examples in the prompt.


Grok-2 was tested on LMSYS under the name “sus-column-r”. With around 12,000 votes, it ranks third overall, behind ChatGPT-4o-latest and Gemini-1.5-Pro-Experimental and level with GPT-4o-2024-05-13. It performs better than GPT-4o-mini, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.1 405B.

Woah, another exciting update from Chatbot Arena❤️‍🔥

The results for @xAI’s sus-column-r (Grok 2 early version) are now public!

With over 12,000 community votes, sus-column-r has secured the #3 spot on the overall leaderboard, even matching GPT-4o! It excels in Coding (#2),… https://t.co/gqSWSwYN0z pic.twitter.com/j9UYDBYNt4

lmsys.org (@lmsysorg), August 14, 2024

In coding and math-related tasks, Grok-2 takes the 2nd spot, and on hard prompts it takes the 4th position. xAI says that the Grok-2 multimodal model will be released soon. The company has not revealed the parameter count of either model. You can start using the new Grok-2 model on x.com, and developers can get started with the API as well.