GPT-5.4

mudkipdev 739 points 624 comments March 05, 2026
openai.com

https://openai.com/index/gpt-5-4-thinking-system-card/ https://x.com/OpenAI/status/2029620619743219811

Discussion Highlights (20 comments)

ignorantguy

It shows a 404 as of now.

mattas

"GPT‑5.4 interprets screenshots of a browser interface and interacts with UI elements through coordinate-based clicking to send emails and schedule a calendar event." They show an example of 5.4 clicking around in Gmail to send an email. I still think this is the wrong interface to be interacting with the internet. Why not use Gmail APIs? No need to do any screenshot interpretation or coordinate-based clicking.

denysvitali

Article: https://openai.com/index/introducing-gpt-5-4/

gpt-5.4
Input: $2.50 /M tokens
Cached: $0.25 /M tokens
Output: $15 /M tokens

gpt-5.4-pro
Input: $30 /M tokens
Output: $180 /M tokens

Wtf
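For scale, some back-of-the-envelope math at those listed rates (a sketch; the token counts are made-up examples, not measurements):

```python
# Back-of-the-envelope cost per request at the rates listed above.
# Token counts below are made-up examples, not measurements.
RATES = {  # USD per 1M tokens: (input, output)
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-pro": (30.00, 180.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single uncached request."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# A hypothetical 100k-token prompt with a 10k-token completion:
print(f"gpt-5.4:     ${request_cost('gpt-5.4', 100_000, 10_000):.2f}")      # $0.40
print(f"gpt-5.4-pro: ${request_cost('gpt-5.4-pro', 100_000, 10_000):.2f}")  # $4.80
```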

minimaxir

The marquee feature is obviously the 1M context window, compared to the ~200k that other models support, sometimes with an extra cost for generations beyond 200k tokens. Per the pricing page, there is no additional cost for tokens beyond 200k: https://openai.com/api/pricing/

Also per pricing, GPT-5.4 ($2.50/M input, $15/M output) is much cheaper than Opus 4.6 ($5/M input, $25/M output), and Opus has a penalty for its beta >200k context window. I am skeptical whether the 1M context window will provide material gains, as current Codex/Opus models show weaknesses once the context window is mostly full, but we'll see.

Per the updated docs ( https://developers.openai.com/api/docs/guides/latest-model ), it supersedes GPT-5.3-Codex, which is an interesting move.
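To put that comparison in rough numbers, the same back-of-the-envelope arithmetic at full context (base rates only; the rate of Opus's beta >200k surcharge isn't quoted in the thread, so it's left out):

```python
# Rough cost of one request that fills GPT-5.4's 1M-token window,
# at the base rates quoted in-thread (GPT-5.4: $2.50 in / $15 out;
# Opus 4.6: $5 in / $25 out, per M tokens). Opus's beta >200k
# surcharge is excluded because its rate isn't given here.
def cost(input_tok: int, output_tok: int, in_rate: float, out_rate: float) -> float:
    return (input_tok * in_rate + output_tok * out_rate) / 1_000_000


print(f"GPT-5.4,  1M in / 10k out: ${cost(1_000_000, 10_000, 2.50, 15.00):.2f}")  # $2.65
print(f"Opus 4.6, 1M in / 10k out: ${cost(1_000_000, 10_000, 5.00, 25.00):.2f}")  # $5.25 (+ surcharge)
```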

Chance-Device

I’m sure the military and security services will enjoy it.

twtw99

If you don't want to click through, here's an easy comparison with the other two frontier models: https://x.com/OpenAI/status/2029620619743219811?s=20

jryio

One million tokens is great until you notice the long-context scores fall off a cliff past 256K; the rest is basically vibes and auto-compaction.

iamronaldo

Notably, 75% on OSWorld, surpassing humans at 72% (a benchmark of how well models use operating systems).

minimaxir

More discussion of the blog post announcement here (it has been confusingly penalized by Hacker News's algorithm): https://news.ycombinator.com/item?id=47265005

ZeroCool2u

It's a bit concerning that in some cases we see significantly worse results when thinking is enabled, especially for math, but also in the browser-agent benchmark. I'm not sure whether this is more concerning for the test-time-compute paradigm or for the underlying model itself. Maybe I'm misunderstanding something, though? I'm assuming 5.4 and 5.4 Thinking are the same underlying model and that's not just marketing.

egonschiele

The actual system card is here: https://deploymentsafety.openai.com/gpt-5-4-thinking/introdu... The submitted link currently goes to the announcement.

nickysielicki

Can anyone compare the $200/mo Codex usage limits with the $200/mo Claude usage limits? It's extremely difficult to get a feel for whether switching between the two will result in hitting limits more or less often, and it's difficult to find discussion of this online. In practice, if I buy $200/mo Codex, can I basically run three Codex instances simultaneously in tmux, like I can with Claude Code Pro Max, all day every day, without hitting limits?

strongpigeon

It's interesting that they charge more for the > 200k token window, but the benchmark score seems to go down significantly past that. That's judging from the Long Context benchmark score they posted, but perhaps I'm misunderstanding what that implies.

tmpz22

Does this improve Tomahawk Missile accuracy?

simianwords

What is the point of GPT Codex?

ilaksh

Remember when everyone was predicting that GPT-5 would take over the planet?

nthypes

$30/M input and $180/M output tokens is nuts. Ridiculously expensive for not that great a bump in intelligence compared to other models.

world2vec

Benchmarks barely improved, it seems.

cj

I use ChatGPT primarily for health-related prompts: looking at bloodwork, playing doctor to diagnose minor aches/pains from weightlifting, etc. Interestingly, the "Health" category seems to report worse performance compared to 5.2.

wahnfrieden

No Codex model yet
