Epoch confirms GPT-5.4 Pro solved a frontier math open problem
in-silico
218 points
129 comments
March 24, 2026
Related Discussions
Found 5 related stories in 100.6ms across 3,471 title embeddings via pgvector HNSW
- GPT 5.4 Thinking and Pro twtw99 · 64 pts · March 05, 2026 · 59% similar
- GPT-5.4 Thinking and GPT-5.4 Pro denysvitali · 92 pts · March 05, 2026 · 56% similar
- GPT-5.4 mudkipdev · 739 pts · March 05, 2026 · 54% similar
- GPT-5.4 meetpateltech · 156 pts · March 05, 2026 · 54% similar
- Cross-Model Void Convergence: GPT-5.2 and Claude Opus 4.6 Deterministic Silence rayanpal_ · 50 pts · March 22, 2026 · 49% similar
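The related-stories lookup described above (title-embedding similarity via pgvector's HNSW index) is a straightforward approximate-nearest-neighbour query. Below is a minimal sketch assuming a Postgres table named `stories` with a `title_embedding` vector column; the schema, connection string, and function name are illustrative, not the site's actual setup.

```python
# Illustrative sketch of a related-stories lookup: approximate nearest-neighbour
# search over title embeddings with pgvector. All identifiers are assumptions.
import psycopg                               # psycopg 3
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=news")        # hypothetical DSN
register_vector(conn)                        # teach psycopg about the 'vector' type

def find_related(conn, title_embedding, limit=5):
    """title_embedding: numpy array from the same model used to embed stored titles."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT title, submitter, points, posted_at,
                   1 - (title_embedding <=> %s) AS similarity  -- <=> is cosine distance
            FROM stories
            ORDER BY title_embedding <=> %s                    -- served by the HNSW index
            LIMIT %s
            """,
            (title_embedding, title_embedding, limit),
        )
        return cur.fetchall()
```

With an HNSW index in place (e.g. `CREATE INDEX ON stories USING hnsw (title_embedding vector_cosine_ops)`), the `ORDER BY … LIMIT` query is answered from the approximate index rather than a sequential scan, which is how a search over a few thousand embeddings can return in milliseconds.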
Discussion Highlights (17 comments)
6thbit
> Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).
Interesting. What's that “scaffold”? A sort of unit test framework for proofs?
karmasimida
No denying it at this point: AI can produce something novel, and they will be doing more of this going forward.
osti
Seems like the high-compute parallel-thinking models weren't even needed; both the normal 5.4 and Gemini 3.1 Pro solved it. Somehow Gemini 3 Deep Think couldn't.
renewiltord
Fantastic news! That means with the right support tooling existing models are already capable of solving novel mathematics. There’s probably a lot of good mathematics out there we are going to make progress on.
pinkmuffinere
As someone with only passing exposure to serious math, this section was by far the most interesting to me:
> The author assessed the problem as follows.
> [number of mathematicians familiar, number trying, how long an expert would take, how notable, etc.]
How reliably can we know these things a priori? Are these mostly guesses? I don't mean to diminish the value of guesses; I'm curious how reliable these kinds of guesses are.
an0malous
I feel like there’s a fork in our future approaching where we’ll either blossom into a paradise for all or live under the thumb of like 5 immortal VCs
johnfn
I like to imagine that the number of tokens consumed before a solution is found is a proxy for how difficult a problem is, and it looks like Opus 4.6 consumed around 250k tokens. That means a tricky React refactor I did earlier today at work was about half as hard as an open problem in mathematics! :)
alberth
For those, like me, who find the prompt itself of interest…
> A full transcript of the original conversation with GPT-5.4 Pro can be found here [0] and GPT-5.4 Pro's write-up from the end of that transcript can be found here [1].
[0] https://epoch.ai/files/open-problems/gpt-5-4-pro-hypergraph-...
[1] https://epoch.ai/files/open-problems/hypergraph-ramsey-gpt-5...
tombert
I was trying to get Claude and Codex to write a proof of the Collatz conjecture in Isabelle, but annoyingly they didn't solve it, and I don't feel any closer than when I started. AI is useless! In all seriousness, this is pretty cool. I suspect there's a lot of theoretical math that hasn't been solved simply because of the "size" of the proof. An AI feedback loop into something like Isabelle or Lean does seem like it could end up opening up a lot of proofs.
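On the "feedback loop into Isabelle or Lean" idea: the value of a proof assistant in such a loop is that the conjecture can be stated precisely, so a model's candidate proof is either accepted by the kernel or rejected with a concrete error the model can iterate on. A minimal sketch in Lean 4 (the commenter was using Isabelle; the definitions below are one standard way to state Collatz, not anyone's actual attempt):

```lean
-- Illustrative Lean 4 sketch; names and phrasing are assumptions, not Epoch's
-- or the commenter's code.

-- One Collatz step: halve even numbers, send odd n to 3n + 1.
def collatzStep (n : Nat) : Nat :=
  if n % 2 = 0 then n / 2 else 3 * n + 1

-- Iterate the step k times.
def collatzIter : Nat → Nat → Nat
  | 0,     n => n
  | k + 1, n => collatzIter k (collatzStep n)

-- The conjecture: every positive n eventually reaches 1.
-- In a feedback loop, a model would try to replace `sorry` with a proof;
-- the Lean kernel either accepts it or reports exactly where it fails.
theorem collatz_conjecture : ∀ n : Nat, 0 < n → ∃ k, collatzIter k n = 1 := by
  sorry
```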
daveguy
New goalpost, and I promise I'm not being facetious at all, genuinely curious: can an AI pose a frontier math problem that is of any interest to mathematicians? I would guess that 1) AI can solve frontier math problems and 2) AI can pose interesting/relevant math problems would, together, be an "oh shit" moment. Because that would be true PhD-level research.
vlinx
This is a remarkable result if confirmed independently. The gap between solving competition problems and solving open research problems has always been significant; bridging it suggests something qualitatively different in the models' capabilities.
measurablefunc
I guess this means AI researchers should be out of jobs very soon.
Validark
I have long said I would remain an AI doubter until AI could print out the answers to hard problems or ones requiring tons of innovation. Assuming this is verified to be correct (not by AI), I just became a believer. I would like to see a few more AI inventions to know for sure, but wow, it really is a new and exciting world. I really hope we use this intelligence resource to make the world better.
data_maan
A model whose internals we don't have access to solved a problem that, for all we know, was already in its training data. Great, I'm impressed.
gnarlouse
We only get one shot.
virgildotcodes
I don't know why I am still perpetually shocked that the default assumption is that humans are somehow unique. It's this pervasive belief that underlies so much discussion around what it means to be intelligent. The null hypothesis goes out the window. People constantly make comments like "well it's just trying a bunch of stuff until something works" and it seems that they do not pause for a moment to consider whether or not that also applies to humans. If they do, they apply it in only the most restrictive way imaginable, some 2 dimensional caricature of reality, rather than considering all the ways that humans try and fail in all things throughout their lifetimes in the process of learning and discovery. There's still this seeming belief in magic and human exceptionalism, deeply held, even in communities that otherwise tend to revolve around the sciences and the empirical.
qnleigh
Their 'Open Problems page' linked below gives some interesting context. They list 15 open problems in total, categorized as 'moderately interesting,' 'solid result,' 'major advance,' or 'breakthrough.' The solved problem is listed as 'moderately interesting,' which is presumably the easiest category. But it's notable that the problem was selected and posted here before it was solved. I wonder how long until the other 3 problems in this category are solved. https://epoch.ai/frontiermath/open-problems