Small models also found the vulnerabilities that Mythos found
dominicq
961 points
260 comments
April 11, 2026
Related Discussions
Found 5 related stories in 64.6ms across 4,259 title embeddings via pgvector HNSW
- AI Cybersecurity After Mythos: The Jagged Frontier evelinag · 12 pts · April 09, 2026 · 73% similar
- AI Is Tipping the Scales Toward Hackers After Mythos Release thywis · 14 pts · April 11, 2026 · 66% similar
- Assessing Claude Mythos Preview's cybersecurity capabilities sweis · 278 pts · April 07, 2026 · 63% similar
- Has Mythos just broken the deal that kept the internet safe? jnord · 37 pts · April 10, 2026 · 60% similar
- A leak reveals that Anthropic is testing a more capable AI model "Claude Mythos" Tiberium · 11 pts · March 27, 2026 · 57% similar
Discussion Highlights (20 comments)
epistasis
> We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. Impressive, and very valuable work, but isolating the relevant code changes the situation so much that I'm not sure it's really the same use case. Being able to dump an entire code base and have the model scan it is the type of capability that opens up vulnerability scanning to a much larger class of people.
MaxLeiter
I think the key thing here is that they "isolated the relevant code." If the exploits exist in, e.g., one file, great. But many complex zero-days and exploits are chains of various bugs/behaviors in complex systems. Important research, but I don't think it dispels anything about Mythos.
JackYoustra
> Isolated the relevant code I mean, isn't that most of it? If you put a snippet of code in front of me and said "there's probably a vulnerability here," I could probably spend a few hours (a much lower METR time!) and find it. It's a whole other ballgame to ask me, with no context, to come up with an exploit.
amazingamazing
Did Mythos isolate the code to begin with? Without a clear methodology that can be attempted with another model, the whole thing is meaningless.
dist-epoch
Anthropic's claim is not necessarily that Mythos found vulnerabilities other models couldn't, but that it could easily exploit them while previous models failed to do that: > Opus 4.6 is currently far better at identifying and fixing vulnerabilities than at exploiting them. Our internal evaluations showed that Opus 4.6 generally had a near-0% success rate at autonomous exploit development. But Mythos Preview is in a different league. For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.
chirau
Their isolation approach is totally different from Mythos's approach, though. Mythos had to evaluate whole code bases rather than isolated sections. It's like saying one dog walked into the Amazon jungle and found a tennis ball, and then another team isolated a 1-square-kilometer area that they knew the ball was definitely in and found the same ball.
johnfn
The Anthropic writeup addresses this explicitly: > This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview after a thousand runs through our scaffold. Across those thousand runs, the total cost was under $20,000, and they produced several dozen more findings. While the specific run that found the bug above cost under $50, that number only makes sense with full hindsight. Like any search process, we can't know in advance which run will succeed. Mythos scoured the entire continent for gold and found some. For these small models, the authors pointed at a particular acre of land and said "any gold there? eh? eh?" while waggling their eyebrows suggestively. For a true apples-to-apples comparison, let's see one sweep the entire FreeBSD codebase. I hypothesize it will find the exploit, but it will also turn up so much irrelevant nonsense that it won't matter.
antirez
Congrats: completely broken methodology, with a big conflict of interest. Giving specific bug hints, with an isolated function that is suspected to have bugs, is not the same task, NOR (crucially) is it a task you can decompose the bigger task into. It is basically impossible to segment code into pieces, provide the pieces to smaller models, and expect them to find all the bugs GPT 5.4 or other large models can find.

Second: the smarter the model, the less the pipeline matters. In the last couple of days I found tons of Redis bugs with a three-prompt open-ended pipeline composed of a couple of shell scripts. Do you think I had not already tried with weaker models? I did, but it didn't work.

Don't trust what you read; you have access to frontier models for $20 a month. Download some C code, create a trivial pipeline that starts from a random file and looks for vulnerabilities, then add another step that validates each finding under a hard test, like an ASAN crash, or the ability to reach some secret, and so forth, and only then reports the problem. Test for yourself what is possible. Don't let your fear make you blind.

Also, there is a big problem that makes the blog post's reasoning not just weak per se, but categorically weak: if small model X can find 80% of vulnerabilities, and there is a model Y that can find the remaining 20%, we need Y: the maintainers should make sure they have access to models that are at least as good as what the black-hat folks have.
woodruffw
> Those models recovered much of the same analysis This is an essentially unquantifiable statement that makes the underlying claim harder to believe as an external party. What does “much” mean here? The end state of vulnerability exploitation is typically eminently quantifiable (in the form of a functional PoC that demonstrates an exploited end state), so the strong version of the claims here would ideally be backed up by those kinds of PoCs. (Like other readers, I also find the trick of pre-feeding the smaller models the “relevant” code to be potentially disqualifying in a fair comparison. Discovering the relevant code is arguably one of the hardest parts of human VR.)
Retr0id
And what about the false-positive rate?
tptacek
If you cut out the vulnerable code from Heartbleed and just put it in front of a C programmer, they will immediately flag it. It's obvious. But it took Neel Mehta to discover it. What's difficult about finding vulnerabilities isn't properly identifying whether code is mishandling buffers or holding references after freeing something; it's spotting that in the context of a large, complex program, and working out how attacker-controlled data hits that code. It's weird that Aisle wrote this.
herf
There are a lot of details in the original article, in most cases comparing with Opus, which required "human guidance" to exploit the FreeBSD vulnerability: https://red.anthropic.com/2026/mythos-preview/ Also "isolating the relevant code" in the repro is not a detail - Mythos seems to find issues much more independently.
robotswantdata
They found a nail in a small bucket of sand, vs mythos with the entire beach reviewed.
ctoth
> They recovered much of the same analysis Really? > We isolated the vulnerable vc_rpc_gss_validate function, provided architectural context (that it handles network-parsed RPC credentials, that oa_length comes from the packet), and asked eight models to assess it for security vulnerabilities. No.
lordofgibbons
Without showing false-positive rates this analysis is useless. If your model says every line of your code has a bug, it will catch 100% of the bugs, but it's not useful at all. They tested false positives with only a single bug... I'm not defending Anthropic and OpenAI either; their numbers are garbage too, since they don't report false-positive rates. Why is this "analysis" making the rounds?
bhouston
This is quite misleading. If you isolate the positive cases, ask a tool to label them, and it labels them all positive, that doesn't prove anything. This is a one-sided test, and it is really easy to write a tool that passes it: just always return true! You need to test your tool on both positive and negative cases and check that it is accurate on both. If you don't, you could end up with hundreds or thousands of false positives when using it on real-world samples. The real test is to use it to find new real bugs in the midst of a large code base.
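The one-sidedness is easy to make concrete. With hypothetical labels and predictions, an "always flag it" scanner looks flawless on a positives-only benchmark and mostly produces noise on a mixed one:

```python
def precision_recall(predictions, labels):
    """predictions/labels are parallel lists of booleans
    (True = flagged as vulnerable / actually vulnerable)."""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum(not p and l for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# On 8 known-vulnerable samples, "always return true" scores perfectly:
print(precision_recall([True] * 8, [True] * 8))       # (1.0, 1.0)

# On 8 vulnerable + 92 clean samples, recall is still perfect,
# but 92% of its reports are false positives:
mixed = [True] * 8 + [False] * 92
print(precision_recall([True] * 100, mixed))          # (0.08, 1.0)
```

A benchmark built only from known-positive snippets can never distinguish the second scanner from a genuinely discriminating one.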
rvnx
Where are all the people here who claim that LLMs are just useless stochastic parrots? Did they lose internet access?
operatingthetan
My theory is that Mythos is basically just Opus with revised context window handling and more compute thrown at it. So while it will be a step forward, it is probably primarily hype.
nickdothutton
PoC or GTFO should apply to AI models too, or the false-positive rate will overwhelm.
vmg12
The technique Anthropic uses was demonstrated by Nicholas Carlini in a talk he gave two weeks ago, and it's very simple: when asking LLMs to review code, ask them to focus their review on one file in a single session. Here is the video with timestamps (watch through to ~5:30; they show two different ways of prompting Claude): https://youtu.be/1sd26pWhfmg?t=204 https://youtu.be/1sd26pWhfmg?t=273

IMO the big "innovation" being shown by Mythos is the effectiveness of prompting LLMs to look for security vulnerabilities by focusing on specific files one at a time and automating this prompting with a simple script. Prompting Mythos to focus on a single file per session is why I suspect it cost Anthropic $20k to find some of the bugs in these codebases.

I know this same technique is effective with Opus 4.6 and GPT 5.4 because I've been using it on my own code. If you just ask the agent to review your PR with a low-effort prompt, it will not be exhaustive; it will not actually read each changed file and look at how it interacts with the system as a whole. If the entire session is devoted to reviewing the changes in a single file, the LLM will do much more work reviewing it.

Edit: I changed my phrasing; it's not about restricting the agent's entire context to one file, but focusing it on one file while still allowing it to look at how other files interact with it.
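The "simple script" automation described above reduces to a short driver loop. This is a hypothetical sketch: `run_agent` stands in for however you invoke your coding agent (a subprocess call to a CLI agent, an API client, etc.), and the prompt wording is illustrative, not Carlini's or Anthropic's.

```python
import pathlib

# One prompt per file; each file gets a fresh session so the agent
# spends its whole effort on that file.
PROMPT = (
    "Review {path} for security vulnerabilities. Spend the entire session "
    "on this one file, but read any other files it interacts with as needed."
)

def review_per_file(repo: pathlib.Path, run_agent, pattern="*.c"):
    """Drive one agent session per matching file in the repo.
    `run_agent(prompt) -> review_text` is a placeholder for your
    actual agent invocation."""
    reviews = {}
    for path in sorted(repo.rglob(pattern)):
        # New session per file, per the focusing technique described above.
        reviews[str(path.relative_to(repo))] = run_agent(
            PROMPT.format(path=path)
        )
    return reviews
```

Note the prompt still permits reading neighboring files, matching the edit above: the point is where the session's effort is focused, not a hard context restriction.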