Different language models learn similar number representations
Anon84
94 points
38 comments
April 24, 2026
Discussion Highlights (9 comments)
gn_central
Curious if this similarity comes more from the training data or the model architecture itself. Did they look into that?
ACCount37
The "platonic representation hypothesis" crowd can't stop winning. Potentially useful for things like innate mathematical operation primitives. A major part of what makes it hard to imbue LLMs with better circuits is that we don't know how to connect them to the model internally, in a way that the model can learn to leverage. Having an "in" on broadly compatible representations might make things like this easier to pull off.
dboreham
It's going to turn out that emergent states that are the same or similar across different learning systems fed roughly the same training data will be very common. I also predict this will explain much of what people today call "instinct" in animals (and the related behaviors in humans).
matja
The eigenvalue distribution looks somewhat similar to Benford's Law - isn't that expected for a human-curated corpus?
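For reference, Benford's Law predicts that a leading digit d appears with probability log10(1 + 1/d). Below is a minimal sketch of how one might check a set of values against it. The function names are hypothetical, and the powers of 2 are stand-in data chosen because they are known to follow Benford's Law; they are not the paper's eigenvalues.

```python
import math
from collections import Counter

def benford_expected():
    """Benford's Law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def leading_digit_frequencies(values):
    """Empirical frequency of each leading digit 1..9 among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(1, 10)]

# Stand-in data (an assumption for illustration): the first 1000 powers of 2.
expected = benford_expected()
observed = leading_digit_frequencies(2 ** n for n in range(1000))

for d, (e, o) in enumerate(zip(expected, observed), start=1):
    print(f"digit {d}: expected {e:.3f}, observed {o:.3f}")
```

To test the comment's suggestion, one would replace the stand-in data with the actual values of interest and compare the two columns.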
causal
Title is editorialized and needs to be fixed; the paper does not say what this title implies, nor is that the title of the paper.
jdonaldson
(Pardon the self-promotion.) Libraries like turnstyle take advantage of shared representations across models. Neurosymbolic programming: https://github.com/jdonaldson/turnstyle
fmbb
> Language models trained on natural text learn to represent numbers using periodic features with dominant periods at T=2,5,10.
This proves a decimal system is correct. Base twelve numeral systems are clearly unnatural and inefficient.
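To make "periodic features with dominant periods at T=2,5,10" concrete, here is a minimal sketch of how such periods could be detected with a Fourier transform over the number axis. Everything in it is an assumption for illustration: the embeddings are synthetic (built from cosines and sines at those periods plus noise), the function names are hypothetical, and this is not the paper's analysis pipeline.

```python
import numpy as np

def synthetic_number_embeddings(n=200, periods=(2, 5, 10), dim=16, noise=0.05, seed=0):
    """Fake 'embeddings' for the integers 0..n-1: periodic features mixed into dim channels."""
    rng = np.random.default_rng(seed)
    x = np.arange(n)
    feats = []
    for t in periods:
        feats.append(np.cos(2 * np.pi * x / t))
        feats.append(np.sin(2 * np.pi * x / t))
    feats = np.stack(feats, axis=1)               # shape (n, 2 * len(periods))
    mix = rng.normal(size=(feats.shape[1], dim))  # random linear mixing into channels
    return feats @ mix + noise * rng.normal(size=(n, dim))

def dominant_periods(embeddings, top_k=3):
    """Return the top_k periods with the most spectral power, summed over channels."""
    n = embeddings.shape[0]
    centered = embeddings - embeddings.mean(axis=0)
    power = (np.abs(np.fft.rfft(centered, axis=0)) ** 2).sum(axis=1)
    power[0] = 0.0                                # ignore the constant (zero-frequency) term
    top = np.argsort(power)[::-1][:top_k]
    return sorted(float(n / k) for k in top)      # frequency index k corresponds to period n / k

print(dominant_periods(synthetic_number_embeddings()))
```

With real models, the rows would instead be the embedding vectors of the tokens for consecutive integers, and the recovered periods would depend on the model and tokenizer.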
zjp
Different models, similar number representations. Different models for different languages, similar concept representations. They have to learn all of this from human text input, so they're not divining it themselves. It all makes a strong case for universal grammar, IMO.
sigbottle
What exactly is the Platonic Representation Hypothesis? You don't just "learn reality" by getting good at representations. You can learn a data set. You can learn a statistical regularity in things such as human languages. You can analyze the concept spaces of LLMs and compare them numerically. I agree with that. What the hell does "learning an objective shared reality" mean?
This reminds me of EY saying that a Solomonoff inductor would learn all of physics in a few days of a 1920x1080 data stream. Either it's false (because it needs to do empirical testing itself), or it's true, but only if you presuppose that it has a perfect model of all the interactions of the world and can decide between all theories a priori... so then why are we even asking if it's a "perfect learner"? It already has a model for all possible interactions; there's nothing out of distribution. You might argue, "Well, which model is the correct one?" That's the wrong question already - empirical data is often about learning what you didn't know that you didn't know, not just learning about in-distribution unknowns.
I just get an ick because I associate this hypothesis with people talking as if "LLMs converge on shared objective reality => they are super smart and objective, unlike humans". LLMs can be smart. They can even be smarter than humans. It's also true that empiricism is king.
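On the narrower point that representation spaces can be compared numerically: one widely used similarity score is linear centered kernel alignment (CKA), which is unchanged by rotations and uniform rescaling of either space. The sketch below is a minimal illustration under stated assumptions: the data is random, the function name is hypothetical, and nothing here says this is the measure the paper uses.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two representation matrices with the same number of rows.

    x has shape (n_items, d1), y has shape (n_items, d2); row i in both
    must describe the same input item.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

rng = np.random.default_rng(0)
a = rng.normal(size=(100, 8))                 # stand-in for "model A" representations
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix (a rotation/reflection)
b = 3.0 * a @ q                               # same geometry as a, rotated and rescaled
c = rng.normal(size=(100, 8))                 # unrelated representations

print("rotated copy:", linear_cka(a, b))
print("unrelated:   ", linear_cka(a, c))
```

A high score says only that two spaces share geometry on the inputs tested; it does not by itself settle the comment's question about what, if anything, that geometry says about "reality".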