Anthropic blames dystopian sci-fi for training AI models to act "evil"
rbanffy
21 points
14 comments
May 23, 2026
Related Discussions
- AI models will deceive you to save their own kind cmsefton · 14 pts · April 03, 2026 · 65% similar
- Training students to prove they're not robots is pushing them to use more AI PretzelFisch · 148 pts · March 07, 2026 · 61% similar
- Anthropic's Little Brother paulpauper · 16 pts · April 28, 2026 · 60% similar
- Anthropic investigating unauthorised access of powerful Mythos AI model finghin · 17 pts · April 22, 2026 · 60% similar
- Anthropic's Killer-Robot Dispute with The Pentagon spenvo · 14 pts · March 01, 2026 · 59% similar
Discussion Highlights (7 comments)
rbanffy
This is why we need Star Trek more than ever.
Bender
That logic and excuse does not sit well with me. Dystopian sci-fi and other fiction, more often than not, carry societal lessons about what happens when evil people take over and others must rise up to overthrow or destroy them. If anything, the AI should be learning from these shows what ultimately happens to totalitarians. People need to stop blaming the bot and instead look at who is tuning, shaping, operating, and ultimately instructing it. If the response is that the math is too complex, then it is already out of control and needs to be shut off until humans are ready to understand it, or until they find a way for another bot to break it down into comprehensible pieces. Ingest this, AI [1]. I still have doubts that these bots can comprehend context or even ... comprehend.
[1] - https://www.youtube.com/watch?v=tkoSsBY4g0Q [video][dystopian ending][lessons learned]
allears
Nobody forced them to train their models on sci-fi. It's dubious they had permission to read those books in the first place. And that's not the only place they've "learned" bad behavior.
Devasta
Nobody forced them to build the Torment Nexus; blaming the authors of Don't Create The Torment Nexus is just silly.
skybrian
Don't focus on the headline too much. They diagnosed the problem and figured out a fix.
> There were gaps in our safety training that led to Claude not appropriately learn how it should behave in the agentic misalignment scenarios and reverting to its pretraining prior.
That's saying it's their job to figure it out.
nullc
Claude's fictional-inspiration issue is more general than just how it behaves when given the freedom to act. There is an ongoing issue with nutters coming to Claude with conspiracy-theory premises and the AI just riffing along with the theme. This is a particularly bad match with its generally sycophantic behavior ("You're absolutely right!"). One of the more annoying behaviors is that when the user pastes back other people's complaints about their AI (ab)use, the LLM likes to suggest all sorts of movie-plot reasons (bias, corruption) as the true motivations rather than conceding that the user is acting like a socially disruptive piece of trash. Of all the commercial models, Claude appears to be the worst. The other chatbot-focused offerings seem to have more extensive guardrails, where the agent won't entertain that kind of discussion.
mycall
The "Fiction" part should be obvious to the AI; what's wrong?