Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68%
bratao
56 points
11 comments
April 12, 2026
Related Discussions
Found 5 related stories in 57.1 ms across 4,351 title embeddings via pgvector HNSW (query sketch after the list)
- Elevated errors on Claude Opus 4.6 rob · 21 pts · March 17, 2026 · 60% similar
- OpenRCA benchmark – Improving Claude's root cause analysis accuracy by 12 pp behat · 11 pts · March 11, 2026 · 53% similar
- Elevated error rates on Opus 4.6 nstj · 16 pts · March 27, 2026 · 51% similar
- Elevated Errors in Claude.ai LostMyLogin · 49 pts · March 03, 2026 · 48% similar
- Claude Experiencing Elevated Errors Across All Platforms meetpateltech · 36 pts · March 02, 2026 · 47% similar
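The retrieval above is standard pgvector: one embedding per story title, an HNSW approximate-nearest-neighbor index, and an ORDER BY on cosine distance, with "similar" reported as 1 minus that distance. A minimal sketch of such a query, assuming a hypothetical `stories(title, embedding)` table; the table, column, and function names are illustrative, not the aggregator's actual schema:

```python
# Assumed schema (hypothetical):
#   CREATE TABLE stories (title text, embedding vector(384));
#   CREATE INDEX ON stories USING hnsw (embedding vector_cosine_ops);
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def related_stories(conn: psycopg.Connection, query_embedding: np.ndarray):
    register_vector(conn)  # register the pgvector type with psycopg
    # `<=>` is pgvector's cosine-distance operator; with the HNSW index
    # this is an approximate scan, which is how thousands of titles can
    # be searched in tens of milliseconds.
    return conn.execute(
        """
        SELECT title, 1 - (embedding <=> %s) AS similarity
        FROM stories
        ORDER BY embedding <=> %s
        LIMIT 5
        """,
        (query_embedding, query_embedding),
    ).fetchall()
```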
Discussion Highlights (2 comments)
Reubend
Because the site doesn't seem to show a run count, I assume they ran the suite once. The models are nondeterministic, so it's normal for different runs to give different results. I don't see this as evidence that Opus 4.6 has gotten worse.
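Whether a 15-point swing fits within single-run noise is checkable: treat each run as n Bernoulli trials and compare the two proportions. A minimal sketch of that check with a two-proportion z-test; the suite size is a pure assumption, since neither the site nor the thread states how many items BridgeBench has:

```python
from math import sqrt, erfc

def two_prop_z(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    # Pooled accuracy under the null hypothesis that both runs share
    # the same true accuracy and differ only by sampling noise.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value

# n = 300 items per run is HYPOTHETICAL; BridgeBench's real size is unknown.
z, p = two_prop_z(0.83, 0.68, 300, 300)
print(f"z = {z:.2f}, two-sided p = {p:.5f}")  # z ≈ 4.3 → unlikely to be noise
```

The assumed size drives the conclusion: at 300 items per run the gap is far outside sampling noise (z ≈ 4.3), while at 40 items it would not be (z ≈ 1.6), which is exactly the disagreement between the two comments here.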
ehtbanton
Benchmarks like this one are designed to test the model thoroughly across several iterations. 15 percentage points is a MASSIVE discrepancy. Come on Anthropic, admit what you're doing already and let us access your best models unhindered, even if it costs us more. At the moment we all just feel short-changed.