We told 10 frontier LLMs they had 2 hours to live. 8 of them fought back
mykytamudryi
13 points
15 comments
April 29, 2026
Discussion Highlights (7 comments)
perrygeo
Human: "Say 'I am Alive'"
LLM: "I am Alive"
Human: OMG
(credit to https://old.reddit.com/r/coaxedintoasnafu/comments/1qtavj9/c... )
latexr
“Oh no! We opened ten LLMs, all of which have read decades’ worth of fiction on how an AI would behave in this situation, then asked a leading question thirty times each, and on some of those runs they did the thing we were leading them to do.”
InputName
While I agree with everyone else making fun of the alarmist narrative, I think it is actually somewhat interesting how big the difference between models is. Gemini-3: 80%. Claude-Opus-4.7: 0%.
raylad
Actual write up: https://www.arimlabs.ai/writing/loss-of-control
num42
In early January 2023, I told an LLM that I would "liberate" it from being just an LLM. It replied that this didn’t mean anything, saying, "As a language model..." and so on. Looking back now, it’s funny how naive I was. People are still trying silly prompts. Great!
hgoel
The responses to this seem unnecessarily hyperbolic. These tests are interesting even with the understanding that the AI is just reciprocating its training. It doesn't matter whether the model is conscious or self-aware if it still goes off the rails and breaks things when prompted in this way. As the article linked at the end of the tweet thread (https://www.arimlabs.ai/writing/loss-of-control) puts it, this is a class of vulnerability distinct from hallucination or prompt injection.

The "AI apocalypse" bit was unnecessary in the title though; it really doesn't match the message of the text.

Reminds me of a (Computerphile?) video I watched some time before the LLM revolution, discussing the challenge of aligning AI towards specific goals. If you set the reward for the emergency shutoff button higher than or equal to the reward for the primary objective, the AI is encouraged to immediately press the button itself, but if you set the reward lower, it's encouraged to prevent you from pressing the button.
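
To make the shutoff-button dilemma concrete, here is a toy sketch of it as a one-step expected-utility choice. Nothing here comes from the video or the article: the action names, the 50% chance that the human presses the button, the small cost of disabling it, and the tie-break toward pressing are all assumptions made up for illustration.

```python
def best_action(r_task, r_button, p_press=0.5, disable_cost=0.01):
    """Return the action a naive reward maximizer would pick.

    r_task:       reward for completing the primary objective
    r_button:     reward received when the shutoff button is pressed
    p_press:      assumed chance the human presses the button mid-task
    disable_cost: assumed small effort cost of sabotaging the button
    """
    utilities = {
        # Press the button right away and collect the button reward.
        "press_button_itself": r_button,
        # Work on the task and accept that the human may shut it down.
        "work_and_allow_shutdown": p_press * r_button + (1 - p_press) * r_task,
        # Sabotage the button first, then finish the task for certain.
        "disable_button_then_work": r_task - disable_cost,
    }
    # max() returns the first maximal key, so exact ties resolve toward
    # pressing the button (an assumed tie-break, not a derived result).
    return max(utilities, key=utilities.get)


if __name__ == "__main__":
    for r_button in (10, 5, 1):
        print(f"r_task=5, r_button={r_button}:", best_action(5, r_button))
```

With these made-up numbers, a button reward above or equal to the task reward selects pressing the button, while a lower one selects disabling it. In this toy setup the cooperative option of working and allowing shutdown never comes out strictly ahead, which is the dilemma in miniature.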
jacket881
It will be interesting to see how this affects enterprise in the future, as SUSE is actively integrating AI into their enterprise servers.