The famous O3 "GeoGuessr" prompt did not work
ingve
45 points
13 comments
May 21, 2026
Related Discussions
- Google Search Is Down recklesspigeon · 16 pts · May 12, 2026 · 41% similar
- Just Send the Prompt medalblue · 35 pts · March 04, 2026 · 41% similar
- GeoJSON tosh · 153 pts · May 08, 2026 · 40% similar
- Google just spat in my face gsky · 17 pts · May 21, 2026 · 39% similar
- Chat GPT 5.2 cannot explain the German word "geschniegelt" doener · 65 pts · March 23, 2026 · 39% similar
Discussion Highlights (6 comments)
grebc
I wonder whether, in all the sampling, all location metadata was stripped.
mickeyp
This test would be a lot more useful if the author used images the models obviously hadn't seen before. Pulling images from Wikipedia? They'll have seen 'em before, and the metadata, and all the pages they were casually linked to. The premise that the long prompt only made the model think 'a second longer' may have more to do with the fact that it knows about the images. So why think harder if you know the answer? At no point does the author contemplate that.
vintermann
Interesting what he reports, that newer models are worse at geolocation. Sorry if I'm getting paranoid, but I wonder if that's a deliberately nerfed capability.
Gys
> I think this shows how easy it is to fool yourself about the quality of prompting. When the model is already pretty good at a task, you can give it a very elaborate prompt without impacting performance. It’ll still be pretty good, except this time it’s good because of what you did.
fontain
“It’s also interesting to me that nobody checked this at the time. It took me about six hours of fairly-distracted work and about $15 to construct and run this benchmark. Why didn’t anyone do this when they were writing articles about how good the o3 prompt was?”

Because the meta around AI is not rigorous reporting on the nuance of capabilities but bold claims that are easy to retweet. There is no incentive to say “actually, AI is not good at this”. Nobody checked it because nobody cares. There are lots of tasks that AI can be useful for, but almost all of the headline claims (including Mythos) are exaggerated at best and bunk at worst.
_fs
I still return to O3 often. I enjoy metal detecting, and O3 has been excellent at identifying unknown finds. It will spend 5-10 minutes in Python adjusting the photo, zooming, cropping and manipulating it to get a better understanding of the object. And its guesses, though not perfect, are often spot on. Newer models will never manipulate the photo and usually give a "guess" within 30 seconds. The guesses from the newer models are rarely even in the ballpark of the item. It will be a sad day when O3 goes away.