Why AI Agents Cannot Change Software Systems

jhevans 46 points 36 comments May 27, 2026
phroneses.com

Discussion Highlights (18 comments)

taintlord223

I would simplify to: Why agents cannot meaningfully contribute

fzeindl

I was originally sceptical of LLMs and am far from the „agents will magically fix our future“-crowd, but sentences like these trip me up: > „But pattern‑matching is not system understanding, and plausibility is not correctness.“ Why not? Who says that? Who proved that system understanding is not just more complex pattern matching? > „LLMs predict tokens, not consequences“ Same here. LLMs output tokens but who says that they don’t form some internal group of token-predicting tensors that move together and constitute the internal model of a „consequence“? It is like saying humans don’t have thoughts, they just have electrical impulses moving their tongues. I too think that LLMs seem to be a very specific form of intelligence, maybe resembling the parts of our brain that do language-processing, but it is a fact that they at least fake intelligence very convincingly. And that we actually don’t know how they do it.

jvanderbot

TFA falls into a few traps, like a reductio argument about text prediction. There's no reason text prediction can't do these things, fundamentally. But I pretty much agree with what they are saying. The missing "thing" is the developer context. Each agent I kick off needs a nonlinearly increasing amount of coaching, as a function of feature complexity. The sweet spot for productivity is currently the first 3 steps (from TFA), to get things into _my head_, then using the writing abilities more as ubersed or ubergrep with LSP integration. Love it for that. For example, I'll often write the first fifth to a third of a feature by hand, then ask the agent to extrapolate from there. The "Core" contains the important bits but in a large system there are a lot of corner cases and wiring, and agents are good at discovering those. I interrupt when it tries to fix things by departing from the design and instead nudge or write a better solution quickly. I absolutely hate the "Spin a cadre of agents to design/implement a feature from a concise spec" workflow. It involves so much planning to get the automatic execution working that it's often just easier to switch to hybrid planning/execution with both AI and people.

lubujackson

The thing is AI can maintain systems. The key point is that it can't do this without human intent, but human intent can be encoded into skills and tied together with orchestration. Rough example: have an LLM generate a plan. Have a skill that refines the plan considering security risks, another that ensures codebase structures are followed, another that considers the infrastructure and usage demands, etc. Then write code and tests. Another process to validate the tests, validate all the above, simplify the logic, etc. The key is that an LLM can do every task capably, even in a complex system. We simply have not built reasonable orchestration of all the human intent behind each filter, and many of them are constantly in flux. It may be that some elements resist encoding because the complexity of encoding is not worth the hassle to maintain. For better or worse, managing intent, orchestrating narrow agentic tasks and solidifying patterns into deterministic code (i.e. validation/tests) is going to be the focus of engineers going forward.
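
A minimal sketch of the kind of orchestration described above, assuming each "skill" can be modeled as a function that refines a shared plan, followed by a deterministic validation step. Every name here (`Plan`, `security_review`, `structure_review`, `infrastructure_review`, `run_pipeline`, `missing_concerns`) is hypothetical, and the stage bodies are plain stubs standing in for LLM calls; this illustrates the shape of the idea, not any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    """A plan plus the notes each review stage has attached to it."""
    goal: str
    notes: List[str] = field(default_factory=list)


# A stage takes a plan and returns a refined copy.
Stage = Callable[[Plan], Plan]


def security_review(plan: Plan) -> Plan:
    # Stub: a real stage would ask an LLM to refine the plan for security risks.
    return Plan(plan.goal, plan.notes + ["security: inputs and secrets reviewed"])


def structure_review(plan: Plan) -> Plan:
    # Stub: a real stage would check the plan against codebase conventions.
    return Plan(plan.goal, plan.notes + ["structure: follows existing module layout"])


def infrastructure_review(plan: Plan) -> Plan:
    # Stub: a real stage would consider infrastructure and usage demands.
    return Plan(plan.goal, plan.notes + ["infrastructure: load estimate attached"])


def run_pipeline(goal: str, stages: List[Stage]) -> Plan:
    """Run every stage in order, threading the plan through each one."""
    plan = Plan(goal)
    for stage in stages:
        plan = stage(plan)
    return plan


def missing_concerns(plan: Plan, required: List[str]) -> List[str]:
    """Deterministic check: list required concerns that no stage addressed."""
    return [
        tag for tag in required
        if not any(note.startswith(tag + ":") for note in plan.notes)
    ]


if __name__ == "__main__":
    stages = [security_review, structure_review, infrastructure_review]
    plan = run_pipeline("add rate limiting to the public API", stages)
    print(plan.notes)
    print(missing_concerns(plan, ["security", "structure", "infrastructure"]))
```

The validation step is ordinary code rather than another model call, which matches the comment's point about solidifying patterns into deterministic checks.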

christkv

I find it works if you do it in small parts of the system but systemwide really creates a lot of slop.

dvh

If you spend $x amount of tokens to "produce a PR ready diff", how much of the $x are you willing to give upstream for incorporating your diff and maintaining it in the future? So far AI folks seem to expect it to be $0. That's my only issue so far.

antirez

> LLMs generate statistically plausible continuations of text Jesus, it's fucking 2026. Even LeCun would never say this again.

adamtaylor_13

This is a lot of words to confirm what we already know: we have exosuits, not robots. Use them as capability enhancers, not drones who go do all the things without review.

injidup

Why do people keep writing this drivel? Obviously written by an LLM itself. What they are describing, and which doesn't work, is one-shotting a fix. Almost or probably no human can one-shot a fix to a significant working system. The human / LLM needs to have some form of error correction signal. Either you have a corpus of tests or a proof system that prevents regressions. If you have a working system with no tests or validation and let a human loose on it then it will break. How is this different?

baq

> LLMs generate statistically plausible continuations of text. This works well for self-contained tasks like writing a function or drafting documentation because these are pattern‑extension problems. But pattern‑matching is not system understanding, and plausibility is not correctness. closes tab

r_lee

LLM slop article about LLM slop. amazing how this stuff just gets instantly to the front page

DanielHB

One thing I realized is just how much the harnesses are geared towards _not_ parsing files and take shortcuts. And even then I am very unimpressed at the speed these systems output code and the amount of tokens you consume doing fairly basic stuff is quite high. My gut feeling is that it will take at least a couple of orders of magnitude improvements before these LLMs can even hold large systems fully in their context, much less understand them holistically. And I don't see an order of magnitude improvement coming any time soon, it feels the last one was GPT 3.5.

EGreg

"But agentic work is global and transformative: the LLM must change the system itself, which requires understanding dependencies, invariants, interactions, and downstream consequences. This is causal reasoning, not pattern extension. LLMs predict tokens, not consequences — and that is why the leap from writing code to producing a safe, system‑aware PR‑ready diff is not incremental but a shift into a fundamentally different problem space." This is well said. We need a new paradigm. I could go into the shortcomings of the current agent-oriented approaches but it would turn into a huge post. If you want to read it, I wrote it up here: http://safebots.ai/agents.html

jgbuddy

How to get to HN front page: 1) AI generate an article about why AI sucks 2) Profit

passive

I think this does a very good job of describing the real gaps agents are hitting in practical usage, along with a fairly compelling rationale for why those gaps aren't likely to disappear any time soon. If we're going to stabilize the software industry, we need to have more discussions like this that identify what constraints apply. (We should have had those discussions before pushing AI out this widely, but that wouldn't have gotten anyone rich.) I actually think that there's a world of software systems agents can change, but it's materially different from the one we have now, and has a different set of constraints that we've also mostly done a poor job identifying. So hopefully the discussion can help those of us on both sides. ;)

danielpardo

I used Claude Code to migrate from Electron to Node + React across ~6k LOC. It handled the mechanical parts well, but anything that has to do with creativity or field of interest required human judgment. AI has no judgement or critical thinking even if it seems so, so we have to be wary not to let AI do this, because the result will be poor quality and not at all innovative.

liampulles

Developing software is as much about the journey as the destination. I build a lot of my understanding of the actual problem in the pursuit of solving it. There are many times when writing a feature that my spidey senses flare up and tell me that this thing is a lot more painful to code than I was expecting (and will be painful to maintain) and that a more elegant process may actually solve the problem, at which point I'll draw up an alternative option and talk to the product owner. I've definitely started to see the consequences of the converse, which is large amounts of shite brittle code that solved the original spec narrowly, but is now an elephant on our back when we need to add other concerns to the system that cross over. (BTW, this isn't against the use of coding agents entirely, it's more against high-level agentic usage. I tend to use Claude Code to do little well defined tasks whilst I reflect on it).

joshka

Counterpoint: https://openai.com/index/harness-engineering/
