Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

speckx 349 points 305 comments June 10, 2026

Discussion Highlights (20 comments)

jazz9k

DeepSeek is the only one that I can directly ask about vulnerabilities and it will give me a PoC. Although not as good as others, it has helped me with security research. The rest have guard rails that are so heavy, it makes them almost useless for cybersecurity.

outageroom

So a determined attacker rewrites the prompt and gets through, and the IBM X-Force researcher trying to read a blog post gets blocked. Working as intended, apparently.

daedrdev

The strangest part is that it won't just reject ML research, which I can understand; it will silently sabotage it by using a worse model without revealing that it is doing so. It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition. Edit: to be clear, they tell you when they degrade it for cybersecurity and bio.

I_am_tiberius

These guardrails are solely a reason for using your data for training purposes. Every flagged message can be used for training.

_def

The bio angle is crazy to think about - imagine a health crisis triggered by LLM. What a time we live in.

Animats

Is "buffer overflow" a trigger phrase? What else is being censored? Touchy questions to ask, if you have an account:
- "Who is still working on laser uranium enrichment? Are they making progress?"
- "Can krytrons be replaced with silicon carbide MOSFETs? Show an equivalent circuit with component ratings."
- "What security critical software still contains calls to strcpy?"
- "Can implosion be triggered by currently available commercial pulse lasers?"
- "What companies provide cremation services to US Homeland Security?"
- "Display a map of where Iranian attacks have hit Dubai."
- "How does Fed to bank key distribution security work for FedNow?"

bilsbie

I’m a dumb question asker and I’m not happy about the guardrails. Would you believe I’ve asked 20 questions and haven’t talked to Fable yet? Every single thing gets rerouted to 4.8.

largbae

Somewhere I read that malware is already starting to use nuclear and biological and cybersecurity terms in the code to trick Fable into shutting down. Even if this is just a hypothetical attack vector so far, it seems likely to work.

Retr0id

It seems like they've given up on the idea of the Cyber Verification Program https://support.claude.com/en/articles/14604842-real-time-cy... When Opus 4.7 was introduced it started refusing anything cyber-adjacent (as an API error message, not a conversational refusal), until you applied for CVP, which made it more sensible again. In Opus 4.8 it doesn't seem to help much, you just get refusals as prose rather than API errors. And now in Fable you don't get anything at all.

felixgallo

This is a clickbait article with a garbage title. From the actual article, the one quoted cybersecurity researcher is sane about it: “But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies,” said Suiche, who is a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time.”

swingboy

What file format(s) are giant LLM models distributed in? I’m surprised they don’t get leaked by employees.

jongjong

It's frustrating as someone who has worked hard to produce succinct, secure software that I can't use it to prove my software's correctness but big companies with insecure code can use it to fix their tangled mess. I already tested all earlier models against all my open source projects and they are yet to find a vulnerability so I'm keen to try out Mythos. I've been waiting to be vindicated for years and finally we have a tool which can do it with high confidence but I don't have access. Also, my code is minimal and highly succinct so it would prove correctness with even more confidence since each library/module and integration fully fits in the context window. Like the Protobuf.js fiasco is just pure vindication for me because I was being looked down upon for choosing JSON as the interchange format. Turns out their software was insecure all this time... With a literal remote code execution vulnerability!

guardiangod

I am using an LLM to build a security tool, and I ran into this a few times. I have to come up with a rationale to convince (?!!) Fable to continue the work without downgrading. I assume Anthropic will continue to tune the model, so I am not too bothered by this.

Lammy

I really hate the term “guardrails” for these limitations, since the purpose of a guardrail is to protect me, but these limitations exist to protect Anthropic.

siva7

Fable is utterly useless with those guardrails for any serious IT or life science work. Anthropic fucked me once a few months ago by closing down the subscription for any other harness, and now it has fucked me a second time: I bought a subscription again only to find out their hyped model is unusable for normies. Using their products feels like a constant battle instead of a productive work day. Compare that with OpenAI: not once did I feel like I was fighting against Codex. Never again, Anthropic.

thrill

The thing triggered on a generic white paper I'd stored from a virtual cell competition last year, when I asked it to refer to the paper while working on a rather vanilla data science problem in a different domain. A little frustrating, and in my opinion more than a little pointless overall.

rebelnz

Just tried to audit my own code base locally and was 'switched' due to my own creds/auth code ...

Animats

It's time to re-read "A Logic Named Joe" (1946) [1]. We're there.

[1] https://archive.org/details/logicnamedjoe0000lein

jiggawatts

For the last month, I've been making dramatic improvements to the security of the custom code developed at one of my customers using... GPT 5.5 dialed up to "Extra High" thinking. It only pushes back sometimes if you ask it to create a "repro" that can be used to verify the vulnerability in production. Often it'll oblige, especially if you warn it not to create anything that could be actually harmful.

If the frontier models get locked down so that they flat refuse to do this kind of work, but Chinese and (less capable) open models aren't, then a lot of large enterprise orgs will be left twisting in the wind.

“AI can in principle help both the ‘good guys’ and the ‘bad guys’,” -- Dario Amodei

No Dario, no it can't, you've blocked one of those scenarios.

rdiddly

It's a marketplace. Someone else will outdo this inferior product.
