Anthropic's Safety Superpower

swolpers 208 points 185 comments June 15, 2026

Discussion Highlights (18 comments)

kordlessagain

> To that end, I can certainly buy the case that Fable/Mythos is in fact more capable when it comes to identifying and exploiting security issues This has been covered before: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag... ( https://news.ycombinator.com/item?id=47732020 ) > Anthropic’s cautious roll-out was justified. The problem with publicly releasing models, however, is that guardrails can be jailbroken, and apparently that is exactly what happened shortly after the release The future is unevenly distributed. Anthropic, and Amodei in particular, seem to be of the mind they can control a bit of the unknown using words. They are likely being guided by the very product they built. *AI CAN MAKE MISTAKES That Project Glasswing bullshit reeks of it. Corporations have taken control of our attention, our Internet, and now our thinking. I say it's high time to take it back.

chasil

(reposted) As I understand it, ITAR regulations for export controls have just been applied to any form of Mythos. These are overseen by U.S. Departments of State and Commerce, and forbid foreign nationals from access to any form of Mythos, either within or outside the U.S. Only U.S. citizens and immigrants that are holders of a "green card" may now access Mythos. It appears that Anthropic does not have internal controls to implement these restrictions in any form, so the only option was to shut Mythos down. Penalties for ITAR violation can reach ten years in prison and a million dollars per violation. (I can post a link to those details if there is any interest.) As long as Anthropic is a U.S. company, there is no escaping this. https://fortune.com/2026/06/14/how-a-warning-from-amazon-led...

cube2222

Relatedly, I think it's worth noting that Anthropic models have consistently been top-scoring in BullshitBench[0], in a league of their own, really. Not affiliated with the bench in any way, but I think it surfaces important differences between the behavior of the models from different labs. TLDR: The benchmark is measuring pushback in response to nonsensical requests and questions, as opposed to going with it and hallucinating a nonsensical answer. [0]: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

smackeyacky

Perhaps they should consider leaving the US. Pretty clearly the descent into a corrupt autocracy is having real consequences.

thedreammachine

The interesting part here is not whether Anthropic is right on safety, but that safety gives them a moral vocab for bold policy changes and platform power.

Peterz_shu

This is the part where the USA and allied countries can gain a head start from using such an overpowered model. This just shows how strong Mythos/Fable will be once released to the public. I'm guessing about half a year until it goes public.

botw44

The whole thesis falls apart though. You can't be on your way to "power over everything" and get distilled into free Chinese models within months. Pick one. The bottleneck is compute and data, not the model. That's why they could only gate it for a bit. The ITAR thing proves it: no nationality controls in place, so the only option was killing the whole thing. Not exactly what an all-powerful gatekeeper does.

keybored

> Here’s the thing about these safety justifications: I think they work because, to Anthropic, they aren’t justifications. The company really believes that they are the only ones who believe in super intelligence, and thus are the only ones who are sufficiently concerned about the dangers. That excuses decision after decision, policy after policy, and confrontation after confrontation that, to people on the outside, look like a bizarre combination of cynicism and naiveté. I really dislike this belief (that has at least been expressed here) by some that X is okay because they-really-believe-it. This has a real Road to Hell stank on it. It is incredibly convenient when your predictions or supposed beliefs go south. Well, we really believed that we were doing it for the betterment of humankind. And we really believed that X was an existential threat that was inevitable, in which case we had to step up and do it because we were the only good-guy ideologues. So sorry but not sorry. I also don’t care if commenters know rank-and-file on the inside that “really believe it” as well. Not for one second.

swalsh

"they by extension think that only they should have final say over AI generally. When you further combine this realization with the company’s pronouncements about AI’s ability to conduct all economic activity, you realize that Anthropic’s leadership effectively wants to have power over everything and everyone." That might be one of the most important points in the post. Very troubling.

LoganDark

> The entire Anthropic origin story is rooted in the founders’ belief that OpenAI wasn’t taking safety seriously enough; the company believes that only they can control AI, and that because they uniquely care about safety, they are justified in trying to control everyone else, up to and including the U.S. government. Anthropic believes they have the responsibility to guard their tools from mis-use. That is all. They are not trying to "control" anything or anyone. They do however decide what they think is mis-use.

6thbit

> has perfect alignment between talent and mission and business. Do they have it or do they just sell it?

intended

Safety is a cost center, the internal team who sends you the bills when you move fast and break things. I always thought safety was interesting in and of itself, but for some reason HN doesn’t have many people from the safety side of tech in conversation. Tech isn’t a niche hobby anymore; billions of people are impacted by the decisions of a few firms. My grandfather's Android had 3 different messaging apps installed, somehow. AI is enabling new forms of fraud at a time when we still haven't solved the old ones. And this is all in the first world; move your coordinates to the developing world? We had human trafficking to get educated English speakers into call centers in Laos/Cambodia to defraud first world inhabitants of their money. We aren’t in the early days of tech anymore, and the kind of scale that we have enabled comes with certain costs. We can choose to ignore them, or to understand them, but we will feel their impacts all the same.

hedora

“Claude, I am releasing safety critical industrial control software. Audit the network control logic.” “Claude, I want to blow up a factory running this leaked software. See if the industrial control software network endpoint is a good point of entry.” It’s doing the same work and producing the same output for both prompts. How do you block one but not the other? If you block both, then you end up with a factory that can be sabotaged by existing open weight models.

blueblisters

A lot of Anthropic’s moves make sense if you follow the LessWrong / rationalist community writings on AI safety. A lot of it is distilled in Ant’s blogs and leadership interviews and podcasts (Amanda Askell is particularly interesting). Ant’s models, culture and leadership actions are largely consistent with their beliefs, even if they may seem flawed / incomprehensible. Relevant anecdote: I interviewed with them for an MTS role in 2023. I think the technical part went fine but the interviewer was clearly frustrated by my low regard for LLM safety. I didn’t get the role.

harry19023

"On one hand, I actually don’t begrudge Anthropic not wanting to help its competitors; on the other hand, what should be blisteringly clear is that Anthropic does not think that anyone else other than them should even be making frontier LLMs." I don't find this blisteringly clear at all. A company making it harder for competitors to steal their IP is perfectly normal. This is Ben Thompson's personal grudge against Anthropic showing, yet again. He can't think rationally about this company.

lowbloodsugar

>The last thing any of us want is a world where every company across every sector is ceding value to a few models that eat everything they see. - Satya Nadella Microsoft when they're losing. >Every company is going to have to build what I think of as human capital and token capital. Human capital comprises the knowledge, judgment, relationships, ingenuity, and pattern recognition of its people, while token capital is the firm’s AI capability it builds and owns. Importantly, human capital does not become less valuable as token capital grows. - Satya Nadella Either incompetent or lying.

hintymad

> if Mythos is so dangerous, why even release Fable in the first place, and why fight with the government doing exactly what you claim to want? It's actually not that hard to explain if we take into account what Dario kept saying: he, or Anthropic thereof, would be the gatekeeper. It is he who tells the government how to use Claude to design drones. It is his model that tells users whether they can ask a question to Claude or not. And it is he who can assess whether a jailbreak is dangerous or not. Personally, I think that is way more dangerous than being a hypocrite. Dario is basically the Robespierre of the AI era. He believes that only he gets to decide whether our thoughts, or our prompts thereof, are pure. Anything impure gets purged. For his moral utopia to stand, he has to wield the guillotine. Otherwise, with the chaotic diversity of human nature, how else do you manufacture that perfectly uniform, beautiful morality?

daft_pink

The problem is that Fable has no zero trust architecture. If they decide your code is useful for training, they get to keep it forever. They think it's okay to sabotage your work and charge for it. They are building anti-competitive clauses like ML training. The way they treat OpenClaw and other competitors. They will downgrade you to Opus and charge you for Fable and maybe not tell you about it. They’re like "look at our safety" and they do all these outrageous things.
