Meta Releases Llama 4 AI Models; Beats GPT-4o and Grok 3 in LMArena

Meta releases Llama 4 Scout, Maverick, and Behemoth AI models (Image Credit: Meta)
In Short
  • Meta has released a new series of Llama 4 open-weight models, including Llama 4 Scout and Maverick. These are non-reasoning AI models.
  • Both are natively multimodal and built on the MoE architecture. The Llama 4 Maverick model achieves an Elo score of 1,417 on the LMArena benchmark.
  • The largest Llama 4 Behemoth model is still in training, and it has a total of 2 trillion parameters, with 288B active parameters across 16 experts.

After a gap of four months, Meta has released a new series of Llama 4 open-weight models: Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth. Unlike earlier Llama generations, which used dense models, Meta has this time adopted the MoE (Mixture of Experts) architecture, just like DeepSeek R1 and V3. All Llama 4 models are also natively multimodal from the ground up.
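The MoE idea — only a fraction of the total parameters are "active" for any given token — can be illustrated with a small routing sketch. This is not Meta's implementation; all dimensions and names here are hypothetical, and top-1 routing is used purely for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: hidden dim, expert FFN dim, number of experts.
D, H, N_EXPERTS = 64, 256, 16

# A learned router plus one small feed-forward network per expert.
router_w = rng.normal(size=(D, N_EXPERTS))
experts = [
    (rng.normal(size=(D, H)) * 0.02, rng.normal(size=(H, D)) * 0.02)
    for _ in range(N_EXPERTS)
]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Send each token through only its top-1 expert.

    Because each token touches just 1 of the 16 experts, the parameters
    actually used per token ("active parameters") are a small fraction
    of the layer's total parameters.
    """
    logits = x @ router_w              # (tokens, experts) routing scores
    choice = logits.argmax(axis=-1)    # top-1 expert index per token
    out = np.empty_like(x)
    for e in range(N_EXPERTS):
        mask = choice == e
        if mask.any():
            w1, w2 = experts[e]
            h = np.maximum(x[mask] @ w1, 0.0)  # ReLU stand-in for the FFN
            out[mask] = h @ w2
    return out

tokens = rng.normal(size=(8, D))       # a batch of 8 token vectors
y = moe_layer(tokens)
print(y.shape)                         # (8, 64)
```

Production MoE models typically route to the top-k experts with weighted mixing and add load-balancing losses, but the core total-vs-active parameter distinction is the same.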

First, the smallest Llama 4 Scout model has a total of 109B parameters with 16 experts, but only 17B parameters are active at a time. It also supports a massive context length of 10 million tokens. Meta says Llama 4 Scout (17B) offers better performance than Gemma 3, Mistral 3.1, and Gemini 2.0 Flash Lite.

Next, the Llama 4 Maverick model brings a total of 400B parameters with an expanded pool of 128 experts, but again, only 17B parameters are active. This model is more capable than Llama 4 Scout because it has many more specialized experts to route between. It has a context length of 1 million tokens. Meta claims Llama 4 Maverick beats OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash.

Llama 4 Maverick benchmarks (Image Credit: Meta)

The impressive part about Llama 4 Maverick is that with just 17B active parameters, it has achieved an Elo score of 1,417 on the LMArena leaderboard. This puts the Maverick model in second place, just below Gemini 2.5 Pro and above Grok 3, GPT-4o, GPT-4.5, and more. It also achieves results comparable to the latest DeepSeek V3 model on reasoning and coding tasks — surprisingly, with about half the active parameters.

Meta has done a tremendous job distilling the Llama 4 Scout and Maverick models from the largest model, Llama 4 Behemoth. The Llama 4 Behemoth AI model has a total of 2 trillion parameters, of which only 288 billion are active across 16 experts. Meta says Behemoth is still in training, and more details about its release will be shared later.

Meta claims the Llama 4 Behemoth beats the largest AI models, such as GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro, on STEM benchmarks. Note that these are non-reasoning models, so Meta could extract even better performance from future reasoning models built on the Llama 4 base models.

As for availability, Meta says Llama 4 is rolling out on Meta AI in WhatsApp, Messenger, Instagram, and the Meta AI website, starting today in 40 countries. However, the multimodal features are currently available in the US only.
