LLMs are not the black box you were promised

_jayhack_ 53 points 35 comments June 02, 2026

Discussion Highlights (11 comments)

LoganDark

"Lack of metacognitive insight" is interesting, because I have observed people acting this way too. I have even observed it in myself.

cyanydeez

As far as I can tell, LLMs are approaching the mythical p-zombie.

MallocVoidstar

AI (assisted?) summary of the March 2025 Anthropic paper https://transformer-circuits.pub/2025/attribution-graphs/bio...

citizenpaul

If you had asked me last year (2025), I would have still said LLMs are a silly toy. As of Jan 2026 I have come to accept that LLMs are at least part of the puzzle of how intelligence works. At this point they are better than the majority of humans at various intellectual tasks. It may not be, or ever be, a 1:1 match, but "good enough" ran the world long before LLMs. There is not even a formal definition of intelligence, so saying LLMs are intelligent can't even be "right" or "wrong". It's just arguing semantics and definitions.

airstrike

> Ask it "what is the capital of the state containing Dallas" and you can observe, in order:
> the Dallas feature goes active,
> which causes the Texas feature to light up,
> which then causes Austin to light up.
> It seems fairly clear that this is tracing semantic relationships between high-level concepts — and in doing so, performing a kind of pseudo-symbolic inference, similar to what some philosophers would describe as "higher reasoning."

Uhhh, no reasoning is required for Austin to follow Texas after Dallas, let alone "higher reasoning". This is really grasping at straws.

senectus1

Just curious: are there languages other than English that are better or more efficient to build LLMs with?

tom_

I don't recall being promised a black box. Are we certain an LLM didn't write this article and just come up with one of those pithy "It's not [whatever]" zingers they're prone to?

paulfharrison

It's nice to see sparse interpretable LLMs being made. This is similar to factor rotation in factor analysis (or PCA). A varimax rotation, for example, can produce an equivalent factor analysis with sparse loadings, and which is generally more interpretable. Fortunately for us the world is not just a complete mess, and sparse loadings can often be found. There seem to be "natural" concepts that we have observed rather than invented. (Many examples in other simple machine learning methods too, I am sure.)
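The varimax point can be made concrete with a toy example. This is a hypothetical illustration, not anything from the article or from Anthropic's work: it assumes exactly two factors, uses the raw (not Kaiser-normalized) varimax criterion, and finds the rotation by brute-force angle search rather than the usual iterative algorithm. It starts from invented sparse loadings, rotates them into a dense but equivalent solution (an orthogonal rotation leaves L L^T, and hence the fitted covariance, unchanged), and checks that maximizing the varimax criterion rotates them back to a sparse form, up to column order and sign.

```python
import math

def rotate(loadings, theta):
    """Orthogonally rotate 2-factor loadings (one (a, b) row per variable) by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((q - mean) ** 2 for q in sq) / p
    return total

def varimax_2d(loadings, steps=3600):
    """Brute-force varimax for two factors: try angles in [0, pi/2) and keep the best.
    The criterion repeats every pi/2, since a 90-degree turn only swaps/negates columns."""
    angles = (i * (math.pi / 2) / steps for i in range(steps))
    best_theta = max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))
    return rotate(loadings, best_theta), best_theta

# Hypothetical "true" sparse loadings: variables 0-2 load on factor 1, 3-5 on factor 2.
sparse = [(0.9, 0.0), (0.8, 0.0), (0.7, 0.0), (0.0, 0.9), (0.0, 0.8), (0.0, 0.7)]
# An equivalent but dense solution: same fit, yet every variable loads on both factors.
dense = rotate(sparse, math.pi / 6)
recovered, theta = varimax_2d(dense)
for a, b in recovered:
    print(f"{a: .2f} {b: .2f}")
```

The grid search is only there because it is easy to verify for two factors; with more factors, practical implementations iterate rotations until the criterion stops improving.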

viccis

> On the Biology of a Large Language Model

Biology? Anthropic really needs to stop anthropomorphizing these things so much. I'm with Dijkstra on this one. I know they do it as a sort of marketing, but still...

mycall

Do the same principles apply to diffusion-based modeling?

camelmel

LLM-written article. It's also not accurate; the fact that language models have human-interpretable representations and neurons has been known since BERT. Circuits research also does not come from Anthropic. Mech interp is a huge field in academia, and most of the core circuit analysis papers were from OpenAI/GDM/academia. However, Anthropic tends to produce a lot of blog posts where they draw poorly supported but hype-able analogies between LLMs and biological intelligence. It's wild. For a better understanding of mech interp and circuits, including what we actually do know about LLM internals, I would recommend reading this paper: https://arxiv.org/pdf/2501.16496
