Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68%
bratao
56 points
11 comments
April 12, 2026
Related Discussions
- Elevated errors on Claude Opus 4.6 rob · 21 pts · March 17, 2026 · 60% similar
- Claude Opus 4.7 meetpateltech · 1621 pts · April 16, 2026 · 54% similar
- OpenRCA benchmark – Improving Claude's root cause analysis accuracy by 12 pp behat · 11 pts · March 11, 2026 · 53% similar
- Elevated error rates on Opus 4.6 nstj · 16 pts · March 27, 2026 · 51% similar
- Elevated error rates on Opus 4.7 rob · 55 pts · May 15, 2026 · 51% similar
Discussion Highlights (2 comments)
Reubend
The website doesn't seem to show how many runs were performed, so I assume they ran the suite once. The models are nondeterministic, so it's pretty normal for different runs to give different results. I don't see this as evidence that Opus 4.6 has gotten worse.
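To make the sample-size point concrete, here is a rough sketch of how one might check whether an 83% to 68% gap is distinguishable from noise. It treats each test item as an independent pass/fail trial (a simplification) and uses a standard two-proportion z-test. The item counts are hypothetical, since the benchmark's actual size and number of runs are not stated here, and `two_proportion_z` is just an illustrative helper name.

```python
import math


def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """z statistic for the difference between two independent proportions."""
    # Pooled pass rate under the null hypothesis that both runs share one true rate.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference between the two observed rates.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# 0.83 and 0.68 come from the headline; the item counts below are assumptions.
for n in (30, 100, 500):
    z = two_proportion_z(0.83, 0.68, n, n)
    verdict = "beyond" if abs(z) > 1.96 else "within"
    print(f"n={n:4d}  z={z:.2f}  {verdict} what noise alone explains at the 5% level")
```

Under these assumptions, the same 15-point gap is within noise for a 30-item suite but not for one with 100 or more items, which is why the missing run count and suite size matter.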
ehtbanton
Benchmarks like this one are designed to thoroughly test the model across several iterations. 15 percentage points is a MASSIVE discrepancy. Come on, Anthropic, admit what you're doing already and let us access your best models unhindered, even if it costs us more. At the moment we all just feel short-changed.