From 0% to 36% on Day 1 of ARC-AGI-3
lairv
60 points
29 comments
March 27, 2026
Related Discussions
- ARC-AGI-3 lairv · 328 pts · March 25, 2026 · 59% similar
- Measuring progress toward AGI: A cognitive framework surprisetalk · 114 pts · March 18, 2026 · 52% similar
- The first 40 months of the AI era jpmitchell · 156 pts · March 28, 2026 · 51% similar
- The changing goalposts of AGI and timelines skandium · 356 pts · March 08, 2026 · 48% similar
- AI coding is gambling speckx · 321 pts · March 18, 2026 · 48% similar
Discussion Highlights (4 comments)
lairv
Note that this uses a harness, so it doesn't qualify for the official ARC-AGI-3 leaderboard. According to the authors, the harness isn't ARC-AGI specific, though: https://x.com/agenticasdk/status/2037335806264971461
esafak
Anybody used this Agentica of theirs?
modeless
On the public set of 25 problems. These are intended for development and testing, not evaluation. There are 110 private problems for actual evaluation purposes, and the ARC-AGI-3 paper says "the public set is materially easier than the private set".
gslin
https://en.wikipedia.org/wiki/Goodhart's_law
> Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.