OpenAI Unveils o3 Model and Becomes First to Crack the ARC-AGI Benchmark in 5 Years

openai announces o3 model and cracks arc-agi benchmark

Image Credit: Beebom

In Short

OpenAI announced a new state-of-the-art o3 reasoning model that finally cracked the hallowed ARC-AGI benchmark.
On the ARC-AGI Semi-Private Evaluation Set, OpenAI o3 scored a whopping 87.5%, beating the 85% mark, generally achieved by humans.
OpenAI also announced o3-mini which is optimized for coding and offers faster performance at a lower cost.

On the final day of the “12 days of OpenAI” announcements, OpenAI revealed the biggest update. OpenAI announced the o3 and o3-mini reasoning models, and most notably, OpenAI made history as o3 became the first AI model to crack the hallowed ARC-AGI benchmark, breaking a five-year unbeaten streak.

openai o3 arc-agi benchmark — Image Credit: OpenAI via YouTube

On the ARC-AGI Semi-Private Evaluation Set, OpenAI’s o3 model scored a whopping 87.5% when using high-compute resources and given more time to think. The ARC Prize threshold was set at 85%, close to what humans generally achieve. Just so you know, the OpenAI o1 model could only score 32%.

ARC-AGI is designed to test AI models for generalized intelligence, focusing on the ability to solve novel problems, rather than relying on memorized patterns. So with the o3 model, OpenAI has indeed achieved a historic breakthrough in generalized intelligence. It may bring OpenAI closer to achieving AGI (Artificial General Intelligence) — an AI system that can match or exceed human intelligence.

Image Credit: OpenAI via YouTube
Image Credit: OpenAI via YouTube

Besides ARC-AGI, OpenAI o3 scored 71.7 in SWE-bench Verified, 2,727 in Codeforces, 96.7 in AIME 2024, and 87.7 in GPQA Diamond. All these tests are highly challenging and the scores are significantly higher than what o1 achieved. Finally, in the EpochAI Frontier Math benchmark which requires expert mathematicians hours to solve a problem, OpenAI o3 got 25.2 accuracy. The earlier best score was just 2.0.

Image Credit: OpenAI via YouTube
Image Credit: OpenAI via YouTube

Coming to the o3-mini model, OpenAI says it’s a distilled model from o3, and optimized for coding, fast performance, and cost-efficiency. o3-mini has three compute settings: low, medium, and high. At medium setting, the o3-mini outperforms the larger o1 model and costs less. Its latency is also lower than the o1 model.

In case you are wondering why is it called o3, and not o2, well, to avoid legal issues with O2, the UK-based mobile network operator, OpenAI decided to skip o2 altogether.

Gemini 2.0 Flash Thinking vs ChatGPT o1: OpenAI Thinks Deeper

Arjun Sha Dec 20, 2024

Sam Altman Downplays AGI Risks; Now Warns About Superintelligence

Arjun Sha Dec 5, 2024

OpenAI Planning to Launch an AI Agent Called ‘Operator’ in January

Arjun Sha Nov 14, 2024

Finally, about availability, OpenAI says it’s performing safety testing on o3 and o3-mini models. The company is also opening up the o3-mini model for public safety testing. OpenAI plans to release the o3-mini model by the end of January 2025. And after that, the o3 model will be released, after rigorous testing and approval by regulators.

#Tags

#AI

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant to solve everyday computing problems.

Comments 1