#Llama 4
2 Stories

Meta is facing accusations of gaming Llama 4's benchmark scores, especially on Chatbot Arena, by mixing benchmark test sets into its post-training data. We break down the controversy and highlight how this isn't the first time Meta has been caught gaming benchmarks.

Meta Releases Llama 4 AI Models; Beats GPT-4o and Grok 3 in LMArena
Meta has unveiled a new series of Llama 4 open-weight models, and they look impressive. The Llama 4 Maverick model outperforms GPT-4o and Grok 3 on the LMArena benchmark. All Llama 4 models are built on a Mixture-of-Experts (MoE) architecture and are natively multimodal.