Mythos Finds a Curl Vulnerability
TangerineDream
638 points
261 comments
May 11, 2026
Related Discussions
- Mythos Discovered a CVE in Its Training Data – and That's Still Worrying · chris_j · 14 pts · May 11, 2026 · 66% similar
- Small models also found the vulnerabilities that Mythos found · dominicq · 961 pts · April 11, 2026 · 65% similar
- Has Mythos just broken the deal that kept the internet safe? · jnord · 37 pts · April 10, 2026 · 59% similar
- FreeBSD CVE-2026-4747 Log Suggests Mythos Is a Marketing Trick · jgalt212 · 13 pts · April 21, 2026 · 58% similar
- Mozilla says 271 vulnerabilities found by Mythos and "almost no false positives" · epistasis · 121 pts · May 07, 2026 · 58% similar
Discussion Highlights (20 comments)
ahofmann
Putting on my tinfoil hat: Sooo, the guy who runs the test and delivers the report could just have removed the more interesting bugs and delivered those to any three-letter agency?
rzmmm
Quote: "My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing." It's a good reminder for us all that the competition in this space is rough and lots of more or less subtle marketing is involved.
bilekas
> The single confirmed vulnerability is going to end up a severity low CVE planned to get published in sync with our pending next curl release 8.21.0 in late June
My mind still cannot understand the quality and refinement that's gone into cURL. It really is the perfect example of something done so right that people barely think twice about it.
yjftsjthsd-h
> The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Piece.
Typo, or is there a spoof I should go read?
yjftsjthsd-h
> Not particularly “dangerous”
I'm not sure that follows. As noted, curl was already analyzed to death with every tool available; most software isn't at that level.
AntiUSAbah
There is always marketing involved, and people should be able to put marketing into perspective. Also, curl in this regard is an open source project: relatively small but critical, well known, and used everywhere. Besides image libraries, tools like curl or sudo, su, passwd, etc. would also be my first try. It is still not known at all what Mythos can do. What does it mean from a cost and benchmark point of view to have a 10-trillion-parameter model? Nonetheless, LLMs getting significantly better at finding these bugs, better than humans, started to happen half a year ago? So at some point we need to address the elephant in the room and state that today you need to do security scanning with LLMs as well. You need to take this seriously. In the worst case, use Anthropic's marketing to state that it's a must now and that something changed.
mohsen1
I don't know about Mythos, but in recent weeks I've noticed Opus is constantly failing to fix things in tsz[0], whereas GPT 5.5 can easily churn out fixes that are solid and pass tests. I've stopped paying for Claude for now and all my money is going to OpenAI at the moment. Either Opus is massively nerfed or GPT 5.5 is really head and shoulders above it on very difficult tasks. The last percent of conformance tests in tsz are really, really difficult and I've seen Opus bailing again and again. So annoying to waste time and tokens to finally get "this is too involved" or "this requires a multi-week sprint to fix". [0] https://tsz.dev
perching_aix
It's a shame he seems to reject the idea of actually diving in and using these tools interactively:
> It’s not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway.
His expertise, I think, would elevate the results quite a bit. Although if he never uses LLMs, which it reads like he doesn't, I guess it might backfire just as well. Prompting style (still?) does matter after all, certainly in my experience anyway.
absynth
I routinely used to compile C programs on other compilers to find defects that one or another didn't find. Compiling on Windows vs Linux. You could reduce it to compiling with warnings as errors, etc., but you'd be missing the point. The point wasn't actual cross-platform portability, even though that was a nice side effect. It was to flush out all the weird edge cases. Edges like security flaws. Buffer overflows are usually platform specific. There are plenty of other ways to find these issues, but simply recompiling for a different platform surfaces all sorts of issues.
apexalpha
> An amazingly successful marketing stunt for sure.
This. Well done by Anthropic. It even reached the CISO of my small semi-government org in the Netherlands, who slightly panicked at the announced 'tsunami' of vulnerabilities that was coming with Mythos. Got us some more money and priority with the board, though. Never waste a good marketing scare.
utopiah
Won my bet: "voted 10 [vulnerabilities] but in retrospect as you are familiar with Claude and such tooling if you already used any recent model to do some kind of security review then I'd drop to 1 or even 0." https://mastodon.pirateparty.be/@utopiah/116537456780283420
jongjong
I'm looking forward to trying Mythos run against my 5000-line, instant-finality, quantum-resistant blockchain project and decentralized exchange (an additional 5000 lines). I already ran all the models up to Opus 4.6 and they couldn't find anything.
nevi-me
> These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged in curl through-out the recent 8-10 months or so.
If you've just gone through a lengthy analysis of your code with other AI tools, surely it's reasonable not to expect to see hundreds more from a new tool? It should be possible, unless more bugs are introduced, to eventually get to a state where there are no more bugs in your code. Process aside, it sounds like Daniel expected to find dozens/hundreds more bugs.
jedisct1
Swival found many more vulnerabilities without Mythos https://github.com/swival/security-audits
NitpickLawyer
What's going on in this thread? It's weird how prevalent the negativity towards Mythos is, and I'm not sure if it's people throwing the baby out with the bathwater or something more tinfoil-adjacent, like a coordinated campaign. I also noticed this on a thread a few days ago, before the Mozilla post. There were dozens of comments saying basically "Mythos is vaporware".
I get the idea that they're using it for marketing. Of course they are. But to reduce it to "just marketing" feels either ill-informed or outright wrong. Unless you have reasons to not believe the dozens of credentialed, well-respected people in the field that have already shared their opinions after working with Mythos. Plenty of them on all the social media sites.
And then there's the team at Mozilla. They wrote a blog about this, and they've worked with Anthropic before, using Opus 4.6, and found and fixed 22 vulnerabilities. Then they worked with Mythos and found and fixed 271 vulnerabilities. Unless you're going to accuse them of being shills, these are unquestionable numbers. The model is quantitatively better at this thing. And it matches what everyone is saying.
I think there are better things to accuse Anthropic of than that they are simply lying for marketing purposes. Of course they'll use this as a marketing campaign, but there's plenty of evidence out there that there is something there, that the model is simply better than previous generations at this. Don't fall for the cheap reductionist stuff just because you don't like them, or feel that this is marketing fluff. It doesn't feel like a gimmick, even if it gets used to push their agenda. Something, something, propaganda often uses true statements as well.
andromaton
If priced like other Anthropic models, Mythos will make vulnerability discovery a lot more accessible. The author compares it to AISLE, ZeroPath, and OpenAI’s Codex Security. AISLE and ZeroPath are much more expensive. OpenAI’s Codex Security is gated. Most people don't care about the first two and don't complain about the latter's policy because they are all specialized models and/or harnesses. Mythos will be available to all.
Semkas
I'm disinclined to be overly generous to Anthropic, but I have to say that regardless of whether the talk of Mythos being uniquely dangerous was mostly cynical: It would be great if this starts a trend of giving security-critical software a few months' head start with any new, significantly improved model.
jrflo
I know that the Mythos hype is part marketing by Anthropic, but isn't it possible that with a highly scrutinized codebase, there just aren't any notable security exploits in its current state? The fact that it found nothing isn't necessarily an indictment of it, especially when other tools had identified hundreds of exploits previously. Seems like it's been completely picked over (for now).
srcreigh
I can't help but think that curl is, by nature, a relatively simple and well-contained tool. Compare to an operating system or web browser or database or billion dollar company codebase. It makes some sense that Mythos/ChatGPT 5.5 might be that much better with complexities that curl just doesn't have because it's a basic tool. Like yeah curl is obviously extremely fully featured as an "anything client" but it's orders of magnitude less complex than other software we rely on.
romaniv
"I signed the contract for getting access, but then nothing happened. Weeks went past and I was told there was a hiccup somewhere and access was delayed. Eventually, I was instead offered that someone else, who has access to the model, could run a scan and analysis on curl for me using Mythos and send me a report. To me, the distinction isn’t that important." Really? We're talking about (essentially) a product demo from a trillion dollar industry fueled by debt. Clearly, blog posts like this have an immense influence on the perception of usefulness of the particular model and AI in general. With so much staked on this for the company, wouldn't you want to be sure that you're using the actual product without anyone messing with the results in any way?