- The new Grok-1.5 model comes with improved reasoning and problem-solving capabilities.
- The model also has a context window of 128K tokens, 16x larger than Grok-1's.
- It will be available to existing Grok users and early testers on the X platform in the coming days.
After open-sourcing Grok-1 two weeks ago, Elon Musk’s xAI has now announced an upgraded Grok-1.5 model. The AI startup says Grok-1.5 comes with improved reasoning capabilities and a context length of 128,000 tokens. The model is not available right away; instead, it will roll out to early testers and existing Grok users on the X (formerly Twitter) platform in the coming days.
To showcase Grok-1.5’s problem-solving capability, xAI has evaluated the model on popular benchmarks. On MMLU, Grok-1.5 scored 81.3% (5-shot), higher than both Mistral Large and Claude 3 Sonnet. On MATH, it scored 50.6% (4-shot), again beating Claude 3 Sonnet. On GSM8K, it scored a whopping 90%, although that result used 8-shot prompting. Finally, on HumanEval, Grok-1.5 scored 74.1% (0-shot).
xAI has also increased the context length from 8K tokens on Grok-1 to 128K tokens on Grok-1.5. To evaluate the model’s long-context retrieval, the company ran the Needle in a Haystack (NIAH) test, and Grok-1.5 achieved perfect retrieval results across contexts of up to 128K tokens.
Since this is an incremental update, xAI has not disclosed Grok-1.5’s parameter count. For reference, though, Grok-1 has 314 billion parameters, making it one of the largest open-source models out there. It is also based on the Mixture-of-Experts (MoE) architecture. xAI released Grok-1’s model weights and architecture under the Apache 2.0 license, which is great.
Recently, Anthropic launched its Claude 3 family of models, which has shown great promise; in many benchmarks, the largest model, Claude 3 Opus, has already outranked OpenAI’s GPT-4. OpenAI is said to be working on an intermediate GPT-4.5 Turbo model, and GPT-5 is also on the cards, possibly launching in the summer of 2024. Google’s Gemini 1.5 Pro has also demonstrated impressive multimodal capabilities over a long context window.
Among the powerful proprietary models, xAI’s Grok-1.5 sits somewhere in the middle, judging by its benchmark numbers. We will have to wait and see how well it handles reasoning tasks in real-world use. In the meantime, what do you think about the Grok-1.5 model? Let us know in the comments below.