KillBench: Every frontier LLM is biased about who deserves to live
frankterpo
11 points
1 comment
April 14, 2026
Related Discussions
- N-Day-Bench – Can LLMs find real vulnerabilities in real codebases? · mufeedvh · 54 pts · April 13, 2026 · 50% similar
- We're running out of benchmarks to upper bound AI capabilities · gmays · 15 pts · April 10, 2026 · 49% similar
- Taste in the age of AI and LLMs · speckx · 233 pts · April 07, 2026 · 48% similar
- Are LLM merge rates not getting better? · 4diii · 131 pts · March 12, 2026 · 46% similar
- It's open season for refusing AI · HotGarbage · 11 pts · April 04, 2026 · 46% similar
Discussion Highlights (1 comment)
edward28
Biggest problem is they don't also test for sex.