Nvidia Releases Nemotron 70B Model; Claims to Beat GPT-4o and Claude 3.5 Sonnet

Image Courtesy: Nvidia

In Short

Nvidia has unveiled the Nemotron 70B model and it's trained on Llama 3.1 70B using RLHF.
The model claims to beat GPT-4o and Claude 3.5 Sonnet based on LMSYS' Arena Hard benchmark, MT-Bench, and AlpacaEval.
Nvidia says Nemotron 70B can correctly answer the 'strawberry' question without using additional reasoning tokens or CoT prompting.

In June, Nvidia released the Nemotron-4 340B model which lets developers generate rich synthetic data. And now, Nvidia has unveiled Llama 3.1 Nemotron 70B model which is trained on Meta’s Llama 3.1 70B model using RLHF. The model is relatively smaller, but Nvidia claims it beats GPT-4o and Claude 3.5 Sonnet.

nvidia nemotron 70b benchmarks — Image Courtesy: HuggingFace / Nvidia

In LMSYS’s Arena Hard benchmark, the Llama 3.1 Nemotron 70B model scores 85.0 whereas GPT-4o gets 79.3 and Claude 3.5 Sonnet achieves 79.2 points. On AlpacaEval and MT-Bench too, Nvidia’s latest model does better than proprietary models despite its smaller size. Nvidia has not released traditional ML benchmarks for this model.

nvidia nemotron 70b solving strawberry question

Apart from that, Nvidia says that Llama 3.1 Nemotron 70B can correctly answer the strawberry question (how many r’s in strawberry?) that has stumped so many LLMs. It doesn’t use additional reasoning tokens like OpenAI o1 models or take advantage of specialized prompting to get the answer right. In my brief testing, the model got it wrong on the first try. However, when I asked the same question again, it correctly answered 3 R’s.

I Tried out My First Hindi LLM ‘Nanda,’ and Here’s How It Went

Sagnik Das Gupta Oct 16, 2024

Meta Releases Llama 3.2 Models with Vision Capability For the First Time

Arjun Sha Sep 25, 2024

You can test the Llama 3.1 Nemotron 70B model on HuggingFace (visit) for free. And developers can try the hosted inference for free at build.nvidia.com (visit).

#Tags

#NVIDIA

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant to solve everyday computing problems.

Comments 0