From 0% to 36% on Day 1 of ARC-AGI-3

lairv 60 points 29 comments March 27, 2026

Discussion Highlights (4 comments)

lairv

Note that this uses a harness, so it doesn't qualify for the official ARC-AGI-3 leaderboard. According to the authors, the harness isn't ARC-AGI specific though: https://x.com/agenticasdk/status/2037335806264971461

esafak

Anybody used this Agentica of theirs?

modeless

On the public set of 25 problems. These are intended for development and testing, not evaluation. There are 110 private problems for actual evaluation purposes, and the ARC-AGI-3 paper says "the public set is materially easier than the private set".

gslin

https://en.wikipedia.org/wiki/Goodhart's_law

> Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
