Marcus AI Claims Dataset
davegoldblatt
60 points
51 comments
March 04, 2026
Related Discussions
Found 5 related stories in 57.2ms across 3,471 title embeddings via pgvector HNSW
- AI (2014) bjornroberg · 69 pts · March 20, 2026 · 50% similar
- Where did you think the training data was coming from? speckx · 48 pts · March 11, 2026 · 49% similar
- The Hardest Document Extraction Problem in Insurance sgondala_ycapp · 26 pts · April 03, 2026 · 45% similar
- The AI Marketing BS Index speckx · 96 pts · April 01, 2026 · 45% similar
- Datasets for Reconstructing Visual Perception from Brain Data katsee · 55 pts · March 05, 2026 · 45% similar
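The widget above ranks stories by embedding similarity. A minimal, brute-force sketch of that kind of lookup, assuming titles are embedded as vectors and ranked by cosine similarity (pgvector's HNSW index approximates the same nearest-neighbor search at scale; the titles and tiny 3-d vectors below are made-up stand-ins for real embedding-model output):

```python
import math

# Hypothetical title embeddings; real pipelines use hundreds of
# dimensions produced by an embedding model.
embeddings = {
    "AI (2014)": [0.9, 0.1, 0.2],
    "The AI Marketing BS Index": [0.8, 0.3, 0.1],
    "Unrelated cooking story": [0.0, 0.9, 0.4],
}

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def related(query_vec, k=2):
    """Return the k stored titles most similar to the query vector."""
    scored = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [(title, round(cosine_similarity(query_vec, vec), 2))
            for title, vec in scored[:k]]

query = [0.85, 0.2, 0.15]  # hypothetical embedding of the submitted title
print(related(query))
```

This exhaustive scan is O(n) per query; HNSW trades exactness for a graph-based index that answers the same top-k question in sub-linear time, which is how a lookup over thousands of embeddings finishes in tens of milliseconds.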
Discussion Highlights (17 comments)
barbarr
Can't wait to see Gary Marcus's rebuttal
dakolli
In 2026, I feel like a painter in 2022 being screamed at non-stop by people telling me my craft is soon to be dead and that NFTs are the future, people who are largely behaving like gambling addicts (just like the NFT crowd).
bananaflag
Well, the main thing he is known for is "it's all gonna crash", and this page admits he's wrong about that. Everything else, yeah, he's right, and I never doubted it. I agree LLMs are unreliable, insecure, etc. But I don't deduce from that that they're going to amount to nothing.
bionhoward
Surprisingly accurate! Is Gary the AI equivalent of the “nothing ever happens” guy?
latexr
> All verdicts are LLM-scored, not human-verified.

In other words, could be all slop. Or maybe it’s not. Maybe it’s mixed. No one knows.
d_silin
I don't think there will be any market crashes before the major AI companies do their IPOs, and then not for some time after that (late 2027 to mid 2028).
albatross79
Sounds about right. Boosters are always vaguely claiming he's been obviously and ridiculously wrong, but when you actually listen to him, he's tracked very well with the state of AI. GPT-5 was supposed to be AGI, remember?
atleastoptimal
His whole thing is to make obvious, incontestable claims about AI (LLMs make mistakes) and connect them to unfalsifiable grand prognostications (it’s all gonna crash… any day now). It’s the same tactic any preacher who harps on about the impending apocalypse uses.
dvt
I'm still not sure I fully understand the methodology. For example, if Marcus makes the claim "OpenAI sucks!", why would OpenAI's blog ever corroborate that? The sources used are all AI company blogs (Anthropic, Google, OpenAI) filled with inoffensive corpo-speak, likely written to be as middle-of-the-road as possible. In fact, I'd need an A/B test to make sure the LLM itself can properly rate various claims (positive, negative, and neutral) against such corporate sludge.

Small aside: I'm only bringing this up because last year I worked on a game where you had to solve various moral dilemmas in a 1v1 situation (think trolley problem, where one player says "flip the switch" and the other says "don't flip the switch"). The idea was to get an LLM to rate the arguments in a fun turn-based online game. I built it out, but I kind of gave up when I realized how absolutely awful the LLM was at actually rating arguments and their nuances. Who won legitimately felt more like a roll of the dice than a verdict given by a real judge or a philosophy professor grading a paper. I put that project aside, but might do a Show HN at some point since the game is basically done.

Adjudication[1], which is the real meat of this project, is done in a very partial way, and I genuinely see basically zero value in it. Why not crawl Reddit (or HN)? I know that also has issues, but it at least has more variety of tone.

[1] https://github.com/davegoldblatt/marcus-claims-dataset/blob/...
ripped_britches
Strange, I had the same thought about doing this exact exercise this weekend.

I think the overall percentage is the wrong approach here. It’s easy to say a lot of things that are factually true, or to make predictions that are inevitably true. The more salient point with Gary Marcus is the one unforgivable thing he was wrong about and continues to double down on: that deep learning is hitting a wall. Starting in early 2022 and going through today, there is still so much low-hanging fruit with deep learning. Today’s LLM progress is mostly being made in RL, but world models are also still so early, and they’re deep learning all the way down. It would be nice if he would just admit he was wrong.
whattheheckheck
Now do this for every single person with actual power
hdgx63
Now do the Pentagon. Gary Marcus is uninteresting because he has no power over anything.
camerons03
Piping a few hundred Substack posts through Claude and ChatGPT and slapping a "hybrid reconciliation layer" on top doesn't magically turn token prediction into empirical evidence. Someone is so thin-skinned about a single guy writing a skeptical Substack that they spent their weekend building a dual-pipeline automation tool, scraping four years of his writing, instead of just building a product that actually disproves him. I’m not saying I agree with everything the man says, but until a human actually verifies these verdicts, this is just burnt tokens.
rvz
That's only if you trust and believe that the LLMs themselves are scoring 'correctly'. If I were Gary Marcus, I wouldn't immediately agree with an assessment made by these LLMs, as that could contradict the very claims he made, and accepting it would mean falling into the trustworthiness trap. I'd remain as skeptical as ever, because this is the worst of the red flags, and it ultimately supports Gary's argument that the LLM results may be untrustworthy:

> All verdicts are LLM-scored, not human-verified.

People should check for themselves and draw their own conclusions.

> The crash hasn't come. yet.
logicprog
I think the problem here is that most of his claims are obvious, uninteresting, and largely agreed with even by the biggest AI hype people: that AI hallucinates, that models don't perfectly follow guardrails in the system prompt, that they can be prompt-injected, or that OpenAI's financials look bad. But on the other hand, he completely ignores all of the developments in scaffolding built around these systems to address those problems, all of the changes in how these models are trained, all of the things they've actually been able to achieve, and basically all of the positive use cases that balance out his criticisms. Since he doesn't really talk about any of that, of course he doesn't make false claims about it; he just ignores it, implicitly creating a false picture. And it is this false picture that he uses to justify his grandiose claims: that everyone should have listened to him about how to do AI, that these systems are inevitably going to turn out to be useless, that the whole industry is going to collapse and fully disappear, that society is going to be ruined, and so on.

So, of course, it looks like, on the one hand, all of his specific claims about AI are perfectly correct, while on the other hand, all of his grander claims about what that implies about the industry have turned out to be wrong, and he spends much more time on the latter than the former. I think it is really crucial to emphasize that even though most of the individual claims he makes are correct, he spends much more time on the prognostications that are fundamentally not correct, or at least are very speculative right now. I think that's an indication of something gone very wrong with his epistemic and incentive situation.
cortesi
I'm sorry, but we can tell absolutely nothing about Gary Marcus from this. People should have a look at the final data: https://github.com/davegoldblatt/marcus-claims-dataset/blob/...

Many of the "supported" claims here are vague, banal, obvious, or just opinion. E.g.:

- "the general public hasn't quite realized what's not possible yet"
- "loads of things scale, but not at all"
- "To be sentient is to be aware of yourself in the world; LaMDA simply isn't."
- "To date, nobody, ever, has given a convincing and thorough account of how human children (and human children alone) learn language."
- "A cat holding a remote control shouldn't have a human hand."
- "What I didn't see last night was vision" (about Tesla Optimus)
nurettin
You need to be a special kind of troll to use LLMs to respond to someone whose entire online persona is built around "AI bubble".