Language Model Contains Personality Subnetworks
PaulHoule
48 points
27 comments
March 02, 2026
Related Discussions
Found 5 related stories in 49.5ms across 3,663 title embeddings via pgvector HNSW
- Language model teams as distributed systems jryio · 87 pts · March 16, 2026 · 62% similar
- Emotion Concepts and Their Function in a Large Language Model Anon84 · 15 pts · April 04, 2026 · 52% similar
- Emotion concepts and their function in a large language model ianbutler · 11 pts · April 02, 2026 · 50% similar
- Emotion concepts and their function in a large language model dnw · 167 pts · April 04, 2026 · 50% similar
- TLA+ Mental Models r4um · 15 pts · March 23, 2026 · 49% similar
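The "% similar" figures above come from nearest-neighbor search over title embeddings. A minimal sketch of what that looks like, assuming similarity is cosine-based (pgvector's `<=>` operator returns cosine *distance*, so similarity is one minus it) and using a hypothetical table/column layout not taken from the original page:

```python
import math

# Hypothetical pgvector query for the HNSW-backed search described above.
# Table and column names ("stories", "embedding") are assumptions.
NEAREST_TITLES_SQL = """
SELECT title, 1 - (embedding <=> %(query_vec)s::vector) AS similarity
FROM stories
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT 5;
"""

def cosine_similarity(a, b):
    """Cosine similarity in [-1, 1]; pgvector's <=> is 1 minus this value."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

An HNSW index (`CREATE INDEX ... USING hnsw (embedding vector_cosine_ops)`) is what makes the `ORDER BY ... <=>` query fast enough to answer in tens of milliseconds over a few thousand rows.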
Discussion Highlights (4 comments)
sarducci
To me this suggests that language strongly influences behavior.
D-Machine
The personality thing seems kind of tautological / uninteresting, as I have pointed out before: https://news.ycombinator.com/item?id=46905692 . Psychological instruments and concepts (like MBTI) are constructed from the semantics of everyday language. Personality models (being based on self-report, not actual behaviour) are not models of actual personality, but of the correlation patterns in the language used to discuss things semantically related to "personality". It would thus be extremely surprising if LLM output patterns (trained on people's discussions and thinking about personality) did not also result in learning similar correlational patterns, and thus similar patterns of responses when prompted with questions from personality inventories.

The real and more interesting part of the paper is the use of statistical techniques to isolate sub-networks which can then be used to emit outputs more consistent with some desired personality configuration. There is no obvious reason to me that this couldn't be extended to other types of concepts, and it kind of reads to me like a way of doing a very cheap, training-free sort of "fine-tuning".
est
Is this somehow related? https://www.anthropic.com/research/persona-selection-model
tl2do
The word "personality" smuggles in biological assumptions. Asking "does this model have personality?" feels unproductive because the term implies something it can't be. More useful framing: how do these subnetworks produce outputs that observers evaluate as personality-consistent? Personality isn't an internal property - it's a judgment made by people watching behavior.