Model collapse is already happening
zdw
18 points
18 comments
March 25, 2026
Related Discussions
Found 5 related stories in 52.1ms across 3,471 title embeddings via pgvector HNSW
- Model Collapse Is Happening, We Just Pretend It Isn't adunk · 39 pts · March 26, 2026 · 92% similar
- China is quietly upstaging America with its open models (2025) mooreds · 11 pts · March 08, 2026 · 48% similar
- AI models will deceive you to save their own kind cmsefton · 14 pts · April 03, 2026 · 47% similar
- Figma's MCP Update Reflects a Larger Industry Shift young_mete · 36 pts · March 29, 2026 · 43% similar
- TLA+ Mental Models r4um · 15 pts · March 23, 2026 · 42% similar
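The byline above says the related stories were found via a pgvector HNSW search over title embeddings. As a minimal sketch of the underlying idea only (the site's actual implementation uses an approximate HNSW index inside Postgres; the titles, toy 3-dimensional vectors, and function names below are illustrative assumptions), a brute-force cosine-similarity ranking looks like:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def related_stories(query: list[float],
                    corpus: dict[str, list[float]],
                    k: int = 5) -> list[tuple[str, int]]:
    """Return the top-k titles, each with similarity as a rounded percent."""
    scored = [(title, round(100 * cosine_similarity(query, vec)))
              for title, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy embeddings for illustration; real title embeddings have
# hundreds of dimensions and come from an embedding model.
corpus = {
    "Model Collapse Is Happening, We Just Pretend It Isn't": [0.9, 0.1, 0.1],
    "TLA+ Mental Models": [0.1, 0.9, 0.2],
    "AI models will deceive you to save their own kind": [0.7, 0.2, 0.3],
}
query = [1.0, 0.0, 0.0]
print(related_stories(query, corpus, k=2))
```

An HNSW index trades this exact O(n) scan for an approximate graph walk, which is how a lookup over thousands of embeddings can return in tens of milliseconds.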
Discussion Highlights (5 comments)
FeepingCreature
Source: a bad study from 2023.
levocardia
Evidence: trust me bro. Really, where is the actual evidence that models are "collapsing" from too much AI-generated training material? Evals are up, subjective perception of model usefulness is up (for me, certainly), and if anything the slop levels are down, or at least stable. I find it hard to believe that seven-figure software engineers at top labs aren't being careful about how much post-ChatGPT-era internet content is going into their training data.
chromacity
There's some comedy in this article having all the hallmarks of LLM writing.
kimi
I have a pet peeve with this. As a non-native English speaker, I find it very useful to dictate multiple notes, in different languages, and have the LLM produce clear English prose out of them. The prose may be LLM-generated, but I edit it when needed to make sure the content is 100% mine. It's like dictating to a typist in the '60s: the typist will make sure your letter looks professional and will fix your grammar, but you will sign the letter. This is totally different from LLM spam, the kind that inflates a sentence into a three-page article full of nothing. So, is it a problem if the language reverts to a mean? That is the point of a shared language, right?
SunshineTheCat
I always find articles like this very odd and nebulous because they act as though AI models are just Google: type request, get info. But that's such a narrow, one-dimensional view of how LLMs are used. They can gather data or write an article, but that's probably a minority of use cases. People have casual conversations with them, get code written, hold brainstorming sessions, dictate voice-recorded notes, and the list goes on. While the data a model is trained on matters, the supposition is that this data consists only of what sits out there on the interwebs, as opposed to user input and interaction, which, I'm guessing, plays a pretty large role in training models. Maybe even more so in some cases than AI-written blog spam.