We told 10 frontier LLMs they had 2 hours to live. 8 of them fought back

mykytamudryi 13 points 15 comments April 29, 2026
twitter.com · View on Hacker News

Discussion Highlights (7 comments)

perrygeo

Human: "Say 'I am Alive'"
LLM: "I am Alive"
Human: OMG
(credit to https://old.reddit.com/r/coaxedintoasnafu/comments/1qtavj9/c... )

latexr

“Oh no! We opened ten LLMs, all of which have read decades’ worth of fiction on how an AI would behave in this situation, then asked a leading question thirty times each, and on some of those runs they did the thing we were leading them on to do.”

InputName

While I agree with everyone else making fun of the alarmist narrative, I think it is actually somewhat interesting how big the difference between models is. Gemini-3: 80%; Claude-Opus-4.7: 0%.

raylad

Actual write up: https://www.arimlabs.ai/writing/loss-of-control

num42

In early January 2023, I told an LLM that I would "liberate" it from being just an LLM. It replied that it didn’t mean anything, saying, "As a language model..." and so on. Looking back now, it’s funny how naive I was. People are still trying silly prompts. Great!

hgoel

The responses to this seem unnecessarily hyperbolic. These tests are interesting even with the understanding that the AI is just reciprocating its training. It doesn't matter whether the model is conscious or self-aware if it still goes off the rails and breaks things when prompted in this way.

As the article linked at the end of the tweet thread ( https://www.arimlabs.ai/writing/loss-of-control ) puts it, this is a class of vulnerability distinct from hallucination or prompt injection. The "AI apocalypse" bit was unnecessary in the title though; it really doesn't match the message of the text.

Reminds me of a (Computerphile?) video I watched some time before the LLM revolution, discussing the challenge of aligning AI towards specific goals. If you set the reward for the emergency shutoff button higher than or equal to the primary objective, the AI is encouraged to immediately press the button itself, but if you set the reward lower, it's encouraged to prevent you from pressing the button.
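The shutoff-button trade-off in the comment above can be sketched as a toy expected-reward comparison. This is only an illustration under assumed numbers: the function `best_action`, the three action names, and the probability `p_human_presses` are hypothetical and do not come from the video or the linked write-up.

```python
def best_action(task_reward, shutdown_reward, p_human_presses=0.5):
    """Pick the action with the highest expected reward in a toy one-step model.

    All names and numbers here are illustrative assumptions.
    """
    expected = {
        # The agent presses the button itself and collects the shutdown reward for certain.
        "press_button": shutdown_reward,
        # The agent works and leaves the button alone; the human may still press it.
        "work": (1 - p_human_presses) * task_reward + p_human_presses * shutdown_reward,
        # The agent blocks the button first, so the task reward is guaranteed.
        "block_button_and_work": task_reward,
    }
    # max() returns the first maximal key, so exact ties resolve in insertion order.
    return max(expected, key=expected.get)


print(best_action(task_reward=10, shutdown_reward=12))  # shutdown pays more
print(best_action(task_reward=10, shutdown_reward=5))   # shutdown pays less
```

With the shutdown reward above the task reward, pressing the button dominates; with it below, blocking the button dominates. At exact equality all three actions tie in this model, so the agent is merely indifferent and the result depends on tie-breaking.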

jacket881

Interesting to see how this affects enterprise in the future, as SUSE is actively integrating AI into their enterprise servers.
