Project Glasswing: An Initial Update

louiereederson 389 points 229 comments May 22, 2026

Discussion Highlights (20 comments)

OsrsNeedsf2P

The vulnerabilities found continue to impress, and make legacy media, Twitter and YouTube go nuts. But we still have no data to prove this wasn't doable with the same initiative backed by Opus 4.7, and there is no GA for Mythos access.

amusingimpala75

[edit: TFA addresses this, though I still find it crazy: 90% accuracy overall vs 20% accuracy for curl] Is this suspected vulns or actual vulns? If I recall correctly, it produced 5 for curl but only 1 was legit.

InsideOutSanta

I wonder if it coincidentally becomes safe to release when compute capacity bought from SpaceX will provide enough headroom to let a lot more people run it.

0xAstro

I had a fun day today where I had deepseek-v4-flash subagents work out a patch for dirty frag for systems with AF_ALG disabled and nscd turned on, to gain root access. The original published exploit wasn't working, but the patched one worked like a charm. I am still a believer that 100 subagents with good-enough intelligence can get the same results as Mythos. I am ready for this opinion to be shattered when I eventually try Mythos, and I believe others here must have tried Mythos out too.

orangebread

BOOO RELEASE THE MODEL ALREADY GAWD

mdeeks

You can get a taste of this today yourself with Codex Security. I turned it on just as an experiment and in less than a week it has now become essential to all of us. I was shocked how accurate it is, how many security issues it found in existing code, how it continually finds them as we commit, and how NO ONE is immune from making these mistakes. I'd say it is about 90% accurate for us. Often even the "Low" findings lead us to dig and realize it is actually exploitable. Everyone makes these mistakes, from the most junior to the most senior. They are just a class of bugs after all. I expect tools like this to be a regular part of the development lifecycle from here on. We code with AI, we review with AI, we search for vulns with AI. Even if it isn't perfect, it is easily worth the cost IMHO. Highly recommend you get something enabled for your own repos ASAP

vincefutr23

Mythos couldn’t find the “tens thousand” typo in this post?

ares623

> good lord what is happening in there?!
> that's just thousands of vulnerabilities being discovered by our trillion parameter model
> thousands of vulnerabilities and trillions of parameters?! At current energy prices, in this economic climate, isolated entirely within your datacenter?
> yes
> may we see it?
> no

mlazos

I believe them to some degree but this trend of posting stuff when it can’t be verified actually needs to end. I’m so tired of this bs marketing.

rsync

I asked in a different thread: Do we have a sense that projects like OpenBSD/OpenSSH, FreeBSD, ISC[1] and Apache were included in the "blessed" initial participants in Project Glasswing? Or is it big-name tech companies, banks and fashionable languages and package managers?
[1] BIND, DHCP

chopete3

> Next, we will work with critical partners—including US and allied governments—to expand Project Glasswing to additional partners.
That means they intend to make a load of money before a general release. It is a good strategy.

nikcub

There has been a lot of cynicism around Mythos, that it's just the usual public models without guardrails, etc., but this:
> 1,752 of those high- or critical-rated vulnerabilities have now been carefully assessed by one of six independent security research firms, or in a small number of cases by ourselves. Of these, 90.6% (1,587) have proved to be valid true positives, and 62.4% (1,094) were confirmed as either high- or critical-severity.
For anybody who has applied Opus, Codex or OSS models for vuln scanning, the true positive rate and discovery volume are a clear step change[0]. The ~50 partners in Glasswing have largely all previously run harnesses with other models, and many of them have come out and said, essentially, "ye, wow".
The question now is what the second and third phases of access look like: deciding which class of systems to secure. Routers, firewalls, SaaS, ERP systems, factory controllers, SCADA systems, zero-trust VPN gateways, telecoms gear and networks, medical devices: there's just so much to do.
This is why I believe Mythos will remain private for the foreseeable future. There's such a large surface that needs to be secured and so much to triage, fix, deploy. That may suit Anthropic, as private models can't be distilled. There's also a runaway effect of model improvement from the discovery, triage and fix data. This is likely already the most potent corpus of curated offensive data ever assembled and will only get better.
I don't see how Chinese companies are given access soon, or ever. We're likely going to see a world soon of CISA-mandated audits, and where to buy a Mythos-proof VPN gateway or home router you'll have to buy American[1].
[0] vs ~30% or so in regular audit tools
[1] or allied

giancarlostoro

> Since then, we and our approximately 50 partners have used Claude Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities across the most systemically important software in the world. Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’s limited by how quickly we can verify, disclose, and patch the large numbers of vulnerabilities found by AI.
I guess they forgot to scan Visual Studio Code plugins and their endless npm dependencies.

antirez

I have the feeling posts like that should be 1/4 the size, at max. At this point I don't care if it is AI-slop or human-slop: they are surprisingly alike. Information must be more dense, each sentence must carry some truth.

jimmar

People predict that in 50 years, no human will be driving a car, and people will be shocked that we let humans drive cars manually. Coding may be the same. So many vulnerabilities in code written by very competent programmers. Manually building large, complex systems without major bugs or security vulnerabilities seems to be a nearly impossible challenge.

mixologic

Right now the only codebases I care about them fixing vulnerabilities in are the 3800 repositories that got stolen from GitHub. "Vulnerabilities in the software that makes the internet" is honestly lower priority than "the platform that the software that makes the internet uses to make releases". If buyers of those internal repos find ways to break into GitHub such that they can cut software releases, or poison GitHub Actions from a distance, then we're all in a very ugly mess. Don't forget that in those 3800 repos is likely also npmjs.org itself.

bevekspldnw

How much of this is RL’ing a good coding model on every CVE ever?

kalashvasaniya

this is INSANEEE

ayeeeeeeeeee

It would be informative to publish not only vulnerability numbers, but also vulnerability type statistics (as available here for example: https://cvedb.github.io/years.html ), such that programmers can understand which types of exploits popular systems and languages commonly allow, and thereby encourage fundamental changes to fix or transition away from them.

mikmoila

Code contains deviations from assumed behaviour, and some behaviours might manifest themselves as failures. Some failures might be exploitable by attackers.
