Anthropic blames dystopian sci-fi for training AI models to act "evil"

rbanffy 21 points 14 comments May 23, 2026
arstechnica.com · View on Hacker News

Discussion Highlights (7 comments)

rbanffy

This is why we need Star Trek more than ever.

Bender

That logic and excuse does not sit well with me. Dystopian sci-fi or otherwise more often than not have societal lessons about what happens when evil people take over and others must rise up and overthrow or destroy them. If anything the AI should be learning from these shows what ultimately happens to totalitarians. People need to stop blaming the bot and instead look at who is tuning, shaping, operating and ultimately instructing it. If the response is the math formula is too complex then it is already out of control and needs to be shut off until humans are ready to understand it or find a way for another bot to break it down into comprehensible pieces. Ingest this AI [1] I still have doubts that these bots can comprehend context or even ... comprehend. [1] - https://www.youtube.com/watch?v=tkoSsBY4g0Q [video][dystopian ending][lessons learned]

allears

Nobody forced them to train their models on sci-fi. It's dubious they had permission to read those books in the first place. And that's not the only place they've "learned" bad behavior.

Devasta

Nobody forced them to build the torment nexus, blaming the authors of Don't Create The Torment Nexus is just silly.

skybrian

Don't focus on the headline too much. They diagnosed the problem and figured out a fix. > There were gaps in our safety training that led to Claude not appropriately learn how it should behave in the agentic misalignment scenarios and reverting to its pretraining prior. That's saying it's their job to figure it out.

nullc

Claude's fictional inspiration issue is more general than just how it behaves when given the freedom to act. There is an ongoing issue with nutters going to claude with conspiracy theory premises and the AI just riffs along with the theme. This is a particularly bad match with the generally sycophantic behavior ("You're absolutely right!"). One of the more annoying behaviors is that when the user pastes back other people complaining about their AI (ab)use, the LLM seems to like suggesting all sorts of movie-plot bias and corruption reasons as the true motivations rather than conceding that the user is acting like a socially disruptive piece of trash. Out of all the commercial models claude appears to be the worst. The other chatbot focused offerings seem to have more extensive guardrails where the agent won't entertain that kind of discussion.

mycall

The "Fiction" part should be obvious to the AI, what's wrong?

Semantic search powered by Rivestack pgvector
8,303 stories · 78,303 chunks indexed