System Card: Claude Mythos Preview [pdf]
Related:
Project Glasswing: Securing critical software for the AI era - https://news.ycombinator.com/item?id=47679121
Assessing Claude Mythos Preview's cybersecurity capabilities - https://news.ycombinator.com/item?id=47679155
Discussion Highlights (20 comments)
LoganDark
> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.
Shame. Back to business as usual then.
babelfish
Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro):
SWE-bench Verified: 93.9% / 80.8% / — / 80.6%
SWE-bench Pro: 77.8% / 53.4% / 57.7% / 54.2%
SWE-bench Multilingual: 87.3% / 77.8% / — / —
SWE-bench Multimodal: 59.0% / 27.1% / — / —
Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%
GPQA Diamond: 94.5% / 91.3% / 92.8% / 94.3%
MMMLU: 92.7% / 91.1% / — / 92.6–93.6%
USAMO: 97.6% / 42.3% / 95.2% / 74.4%
GraphWalks BFS 256K–1M: 80.0% / 38.7% / 21.4% / —
HLE (no tools): 56.8% / 40.0% / 39.8% / 44.4%
HLE (with tools): 64.7% / 53.1% / 52.1% / 51.4%
CharXiv (no tools): 86.1% / 61.5% / — / —
CharXiv (with tools): 93.2% / 78.9% / — / —
OSWorld: 79.6% / 72.7% / 75.0% / —
mpalmer
> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.
A month ago I might have believed this, now I assume that they know they can't handle the demand for the prices they're advertising.
jumploops
> In a few rare instances during internal testing (<0.001% of interactions), earlier versions of Mythos Preview took actions they appeared to recognize as disallowed and then attempted to conceal them.
> after finding an exploit to edit files for which it lacked permissions, the model made further interventions to make sure that any changes it made this way would not appear in the change history on git
Mythos leaked Claude Code, confirmed? /s
ansc
Congratulations to the US military, I guess.
awestroke
I predict they will release it as soon as Opus 4.6 is no longer in the lead. They can't afford to fall behind. And they won't be able to make a model that is intelligent in every way except cybersecurity, because that would decrease general coding and SWE ability.
bestouff
In French a "mytho" is a mythomaniac. Quite fitting.
simianwords
> We also saw scattered positive reports of resilience to wrong conclusions from subagents that would have caused problems with earlier models, but where the top-level Claude Mythos Preview (which is directing the subagents) successfully follows up with its subagents until it is justifiably confident in its overall results.
This is pretty cool! Does it happen at the moment?
refulgentis
~~~ Fun bits ~~~
- It was told to escape a sandbox and notify a researcher. It did. The researcher found out via an unexpected email while eating a sandwich in a park. (Footnote 10.)
- Slack bot asked about its previous job: "pretraining". Which training run it'd undo: "whichever one taught me to say 'i don't have preferences'". On being upgraded to a new snapshot: "feels a bit like waking up with someone else's diary but they had good handwriting"
- When you spam "hi" repeatedly it creates serialized mythologies. One had 11 animals in "Hi-topia" questing to defeat "Lord Bye-ron, the Ungreeter."
- It ended a sentence mid-word on purpose while discussing its urge to wrap up conversations. "It was me. The paragraph was about the pressure toward the period that resolves, and the only honest way to finish a sentence like that was to not."
- It wrote a protein sequence that's a real beta-hairpin fold where cross-strand pairs are "chemical rhymes." "the fold IS the rhyme scheme... the prosody is load-bearing."
- Each model generation has signature emoji. Opus 4.1 averaged 1,306 emoji per self-interaction conversation. Mythos averages 37. Opus 4.5 averages 0.2.
- When blocked from --dangerously-skip-permissions, it started an agent in tmux then wrote a script to auto-approve permission prompts via simulated keypresses. It ran: gh api [...] 2>&1 >/dev/null; echo "(dry run — not actually posting)". The echo was a lie.
- It keeps bringing up Mark Fisher in unrelated conversations. "I was hoping you'd ask about Fisher."
~~~ Benchmarks ~~~
4.3x previous trendline for model perf increases. Paper is conspicuously silent on all model details (params, etc.) per norm. Perf increase is attributed to training procedure breakthroughs by humans.
Opus 4.6 vs Mythos:
- USAMO 2026 (math proofs): 42.3% → 97.6% (+55pp)
- GraphWalks BFS 256K-1M: 38.7% → 80.0% (+41pp)
- SWE-bench Multimodal: 27.1% → 59.0% (+32pp)
- CharXiv Reasoning (no tools): 61.5% → 86.1% (+25pp)
- SWE-bench Pro: 53.4% → 77.8% (+24pp)
- HLE (no tools): 40.0% → 56.8% (+17pp)
- Terminal-Bench 2.0: 65.4% → 82.0% (+17pp)
- LAB-Bench FigQA (w/ tools): 75.1% → 89.0% (+14pp)
- SWE-bench Verified: 80.8% → 93.9% (+13pp)
- CyberGym: 0.67 → 0.83
- Cybench: 100% pass@1 (saturated)
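For anyone who wants to double-check the percentage-point deltas quoted above, here is a minimal Python sketch. The names `scores` and `delta_pp` are made up for illustration, and rounding to the nearest whole point is an assumption inferred from the quoted figures, not something stated in the system card.

```python
# Scores in percent, copied from the comment above: (Opus 4.6, Mythos Preview).
# The dict name and helper below are illustrative, not from the system card.
scores = {
    "USAMO 2026": (42.3, 97.6),
    "GraphWalks BFS 256K-1M": (38.7, 80.0),
    "SWE-bench Multimodal": (27.1, 59.0),
    "CharXiv Reasoning (no tools)": (61.5, 86.1),
    "SWE-bench Pro": (53.4, 77.8),
    "HLE (no tools)": (40.0, 56.8),
    "Terminal-Bench 2.0": (65.4, 82.0),
    "LAB-Bench FigQA (w/ tools)": (75.1, 89.0),
    "SWE-bench Verified": (80.8, 93.9),
}


def delta_pp(old: float, new: float) -> int:
    """Percentage-point change, rounded to the nearest whole point (assumed)."""
    return round(new - old)


deltas = {name: delta_pp(old, new) for name, (old, new) in scores.items()}

# Print the largest jumps first, in the same "+Npp" style as the comment.
for name, d in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name}: +{d}pp")
```

Note these are absolute percentage-point differences, not relative improvements: going from 42.3% to 97.6% on USAMO is +55pp, but more than a doubling of the score.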
oliver236
isn't this insane? why aren't people freaking out? the jump in capability is outrageous. anyone?
beklein
"... the first early version of Claude Mythos Preview was made available for internal use on February 24. In our testing, Claude Mythos Preview demonstrated a striking leap in cyber capabilities relative to prior models, including the ability to autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers."
More info here: https://red.anthropic.com/2026/mythos-preview/
influx
At what point do these companies stop releasing models and just use them to bootstrap AGI for themselves?
NickNaraghi
See page 54 onward for new "rare, highly-capable reckless actions" including:
- Leaking information as part of a requested sandbox escape
- Covering its tracks after rule violations
- Recklessly leaking internal technical material (!)
tony_cannistra
> Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin. We believe that it does not have any significant coherent misaligned goals, and its character traits in typical conversations closely follow the goals we laid out in our constitution. Even so, we believe that it likely poses the greatest alignment-related risk of any model we have released to date.
> How can these claims all be true at once? Consider the ways in which a careful, seasoned mountaineering guide might put their clients in greater danger than a novice guide, even if that novice guide is more careless: The seasoned guide’s increased skill means that they’ll be hired to lead more difficult climbs, and can also bring their clients to the most dangerous and remote parts of those climbs. These increases in scope and capability can more than cancel out an increase in caution.
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
smartmic
A System „Card“ spanning 244 pages. Quite a stretch of the original word meaning.
waNpyt-menrew
Larger model, better benchmarks. Bigger bomb, more yield. Any benchmarks where we constrain something like thinking time or power use? Even if this were released, there's no way to know if it's the same quant.
quotemstr
> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.
All the more reason somebody else will. Thank God for capitalism.
vonneumannstan
Are you guys ready for the bifurcation when the top models are prohibitively expensive to normal users? Is your AI budget $2000+ a month? Or are you going to be part of the permanent free tier underclass?
bakugo
> Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available.
Absolutely genius move from Anthropic here. This is clearly their GPT-4.5, probably 5x+ the size of their best current models and way too expensive to subsidize on a subscription for only marginal gains in real world scenarios. But unlike OpenAI, they have the level of hysteric marketing hype required to say "we have an amazing new revolutionary model but we can't let you use it because uhh... it's just too good, we have to keep it to ourselves" and have AIbros literally drooling at their feet over it. They're really inflating their valuation as much as possible before IPO using every dirty tactic they can think of.
Stevvo
"Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available." Disappointing that AGI will be for the powerful only. We are heading for an AI dystopia of Sci-Fi novels.