Where the goblins came from
ilreb
313 points
138 comments
April 30, 2026
Related Discussions
- OpenAI Codex system prompt includes directive: "never talk about goblins" ndr42 · 14 pts · April 29, 2026 · 60% similar
- Why is GPT-5.4 obsessed with Goblins? pants2 · 13 pts · March 10, 2026 · 58% similar
- Anthropic's Little Brother paulpauper · 16 pts · April 28, 2026 · 49% similar
- How OpenAI Kills Oracle napolux · 12 pts · April 26, 2026 · 47% similar
- GPT-5.4 mudkipdev · 739 pts · March 05, 2026 · 46% similar
Discussion Highlights (20 comments)
maxdo
Article: blah blah blah, marketing... we are fun people, blah blah, goblin, we will not destroy the world you live in... An RL reward bug is the culprit. Blah blah.
nomilk
> We unknowingly gave particularly high rewards for metaphors with creatures.
I recall a math instructor who would occasionally refer to variables (usually represented by intimidating Greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect, i.e. it makes a problem seem more cute/approachable.
On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023-present). The problem is that the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer', which may not sufficiently penalise the use of buzzwords.
JoshTriplett
A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460
dakolli
Ahh I see. I guess when I turned off privacy settings and allowed training on my code, then generated 10 million .md files with random fantasy books, the poisoning worked. Keep using AI and you'll become a goblin too.
recursivedoubts
> Why it matters
I despise this title so much now.
tim-tday
So, you brain damaged your model with a system prompt.
canpan
I wonder how training data is balanced. If you put in too much Wikipedia, does your model sound like a walking encyclopedia? After doing the Karpathy tutorials I tried to train my AI on the tiny stories dataset. Soon I noticed that my AI was always using the same name for its stories' characters. The dataset uses that name consistently and often.
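One way to check for this kind of skew before training is to count how often each candidate name appears in the corpus. The following is a minimal sketch, not the commenter's code: `SAMPLE_STORIES` is a made-up stand-in for a story dataset, and "capitalized word that does not start a sentence" is only a rough heuristic for spotting names.

```python
import re
from collections import Counter

# Hypothetical stand-in for a story corpus; not taken from any real dataset.
SAMPLE_STORIES = [
    "Once upon a time, a girl named Lily found a red ball. Lily was happy.",
    "One day, Tom and Lily went to the park. They saw a big dog.",
    "There was a little cat. The cat met Lily near the tree.",
]

def candidate_name_counts(stories):
    """Count capitalized words that do not start a sentence (a rough proxy for names)."""
    counts = Counter()
    for story in stories:
        # Split into sentences on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", story.strip()):
            words = re.findall(r"[A-Za-z']+", sentence)
            # Skip the first word: it is capitalized whether or not it is a name.
            for word in words[1:]:
                if word[0].isupper():
                    counts[word] += 1
    return counts

def name_share(stories, top_n=3):
    """Return (name, count, share of all candidate-name mentions) for the top names."""
    counts = candidate_name_counts(stories)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(name, n, n / total) for name, n in counts.most_common(top_n)]

for name, n, share in name_share(SAMPLE_STORIES):
    print(f"{name}: {n} mentions ({share:.0%} of candidate names)")
```

If one name takes a large share of all mentions, a small model trained on the corpus is likely to reproduce it. The heuristic undercounts names that open a sentence, so the numbers are a rough signal at best.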
themafia
> You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking.
Just, the mentality required to write something like that, and then base part of your "product" on it. Is this meant to be of any actual utility, or is it meant to trap a particular user segment into your product's "character"?
ninjagoo
> the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.
> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them
> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?
Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis. Also, clearly even in AI land, Nerdz Rule :)
PS: if AInthropologist isn't an official title yet, chances are it will be one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...
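The scoping point in the quoted passage can be illustrated with a toy model. This is a minimal sketch under loud assumptions and not OpenAI's training setup: the policy is a single sigmoid choosing between a plain style and a creature-metaphor style, its logit is the sum of a hypothetical `shared` weight and a hypothetical per-persona weight, and the reward (1 for a creature metaphor) is applied only when the persona is `nerdy`.

```python
import math

def p_creature(shared, specific):
    """Probability of picking the creature-metaphor style (two-way choice)."""
    return 1.0 / (1.0 + math.exp(-(shared + specific)))

def train_nerdy_only(steps=200, lr=0.5):
    """Gradient ascent on expected reward, with reward given only under 'nerdy'."""
    shared = 0.0                                # hypothetical weight shared by all personas
    specific = {"nerdy": 0.0, "default": 0.0}   # hypothetical per-persona weights
    for _ in range(steps):
        p = p_creature(shared, specific["nerdy"])
        # Reward is 1 for a creature metaphor and 0 otherwise, so the expected
        # reward is p and its derivative with respect to the logit is p * (1 - p).
        grad = p * (1.0 - p)
        shared += lr * grad             # the shared weight moves too...
        specific["nerdy"] += lr * grad  # ...not just the nerdy-specific one
    return shared, specific

shared, specific = train_nerdy_only()
p_nerdy = p_creature(shared, specific["nerdy"])
p_default = p_creature(shared, specific["default"])
print(f"nerdy: {p_nerdy:.2f}, default: {p_default:.2f}")
```

The `default` persona never receives a reward and its own weight stays at zero, yet its probability of producing a creature metaphor rises above the initial 0.5 because the update also flows into the shared weight. Real models share far more than one parameter across conditions, so this only shows why scoping is not guaranteed, not how large the effect would be.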
ollin
For context, two days ago some users [1] discovered this sentence reiterated throughout the codex 5.5 system prompt [2]:
> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
[1] https://x.com/arb8020/status/2048958391637401718
[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...
postalcoder
Would love if OpenAI did more of these types of posts. Off the top of my head, I'd like to understand:
- The sepia tint on images from gpt-image-1
- The obsession with the word "seam" as it pertains to coding
Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try googling it or searching Twitter!). There's no way that this phrase is overrepresented in the training data; I don't remember people saying that frequently.
hsuduebc2
I. Love. This.
jumploops
TIL gremlins weren’t just used to explain mysterious mechanical failures in airplanes; that’s the origin story of the term ‘gremlin’ itself [0]. I had always assumed there was some previous use of the term. Neat!
[0] https://en.wikipedia.org/wiki/Gremlin
acuozzo
Weird. I thought they came from Nilbog.
innis226
I suspect this was intentionally added, just to give some personality and to fuel hype.
iterateoften
This is funny because it’s a silly topic, but I think it shows something seriously wrong with LLMs. The goblins stand out because they’re obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because they’re not as obvious. Absolutely terrifying that OpenAI is just tossing around the fact that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.
albert_e
If a tiny misconfiguration of the reward system can cause such a noticeable annoyance... what dangers lurk beneath the surface? This is not funny.
x0x7
I suspected OpenAI was actively training their models to be cringy in the belief that it's charming. Turns out it's true. And they only saw a problem when it narrowed down to one predilection. But they should have seen it was bad long before that.
ComputerGuru
The explanation is very concerning. Lexical tidbits shouldn’t be learnt and reinforced across cross sections. Here, gremlin and goblin went from being selected for in the nerdy profile to being selected for in all profiles. The solution was easy: don’t mention goblins. But what about when the playful profile reinforces usage of emoji and their usage creeps up in all other profiles accordingly? Ban emoji everywhere? Now do the same thing for other words, concepts, approaches? It doesn’t scale! It seems like models can be permanently poisoned.
pants2
Nice, OpenAI mentioned my HackerNews post in their article :) I appreciate that they wrote a whole blog post to explain! https://news.ycombinator.com/item?id=47319285