ARC-AGI-3: AI’S TOUGHEST TEST YET

ARC-AGI-3 is a new benchmark from François Chollet's ARC Prize Foundation, designed as a set of puzzle-like games. Humans can glance at the colorful grids, spot the hidden patterns, and solve them, typically scoring 100% on the first try. AI, by contrast, must learn the rules from scratch, with no hints, which tests raw intelligence: discovering goals and planning moves.

This test matters because it cuts through the hype around AGI, meaning true human-like intelligence. Older benchmarks let AI cheat by memorizing training data, but ARC-AGI forces genuine reasoning. With a $1M prize, it pushes labs to build smarter systems, not just bigger ones trained on ever more data.

Top models bombed: Google's Gemini Pro led at a measly 0.37%, GPT-5.4 High scored 0.26%, Opus 4.6 scored 0.25%, and Grok-4.20 scored 0%. Even after millions of dollars spent training on past versions, scores reset to below 1% on the harder V3, showing AI still can't match human intuition.

It means we're far from AGI, but progress could accelerate: ARC-AGI-2 scores jumped from 3% to 50% in a year. Track the journey via the live leaderboards on arcprize.org, prize increases, and new agent techniques. When will real AGI arrive? No one knows, but benchmarks like this keep the claims honest.

BENCHMARKS DON'T LIE—AGI AIN'T HERE YET!
