New Research Reassesses the Value of Agents.md Files for AI Coding

noemit 19 points 22 comments March 08, 2026
www.infoq.com · View on Hacker News

Discussion Highlights (8 comments)

verdverm

That research has been so misinterpreted for headlines and clicks... AGENTS.md files are extremely helpful if done well.

noemit

The research mostly points to LLM-generated context lowering performance. Human-generated context improves performance, but any kind of AGENTS.md file increases token use through what they call "fake thinking." More research is needed.

stingraycharles

What is going on in this thread and why are all comments downvoted so heavily?

nayroclade

I suspect AGENTS.md files will prove to be a short-lived relic of an era when we had to treat coding agents like junior devs, who often need explicit instructions and guardrails about testing, architecture, repo structure, etc. But once agents have judgment equivalent to (or better than) a senior engineer's, they can make their own calls about these aspects, and trying to "program" their behaviour via an AGENTS.md file becomes as unhelpful as one engineer trying to micro-manage another's approach to solving a problem.

dev_l1x_be

I never use these files; instead I give the guardrails for a specific task to each short agent run. Having task-specific "agents.md" files works better for me.

CrzyLngPwd

I have a legacy codebase of around 300k lines spread across 1.5k files, and have had amazing success with the agents.md file. It just prevents hallucinations and coerces the AI to use existing files and APIs instead of inventing them. It also has gold-standard tests and APIs as examples. Before the agents file, it was just chaos of hallucinations and having to correct it all the time with the same things.

lmeyerov

I liked that they did this work and its sister paper, but disliked how it was positioned basically opposite of the truth.

The good: it shows that on one kind of benchmark, some flavors of agentically-generated docs don't help on that task. So naively generating these, for one kind of task, doesn't work. Thank you, useful to know!

The bad: some people assume this means these don't work in general, or that automation can't generate useful ones.

The truth: instruction files help measurably, and just a bit of engineering lets you guarantee high scores for the typical cases. As soon as you have an objective function, you can flip it into an eval and set an AI coder to editing these files until they pass. Ex: we recently released https://github.com/graphistry/graphistry-skills for more easily using graphistry via AI coding, and by having our authoring AI loop a bit with our evals, we jumped the scores from a 30-50% success rate to 90%+. As we encounter more scenarios (and mine them from our chats etc.), it's pretty straightforward to flip them into evals and ask Claude/Codex to loop until those pass well too. We do these kinds of eval-driven AI coding loops all the time, and IMO how to engineer these should be the message, not that they don't work on average. Deeper example near the middle/end of the talk here: https://media.ccc.de/v/39c3-breaking-bots-cheating-at-blue-t...
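The eval-driven loop described above can be sketched in a few lines. This is a minimal illustration, not anything from the linked repo: `EvalCase`, `eval_driven_loop`, and the `revise` callback (which would wrap a call to an AI coder such as Claude/Codex) are all hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    """One scenario, flipped into an eval: given the current instruction
    file, does the agent's behavior pass? Here modeled as a predicate."""
    name: str
    passes: Callable[[str], bool]


def eval_driven_loop(
    instructions: str,
    cases: List[EvalCase],
    revise: Callable[[str, List[str]], str],  # e.g. asks an AI coder to edit
    max_rounds: int = 5,
    target: float = 0.9,
) -> Tuple[str, float]:
    """Score the instruction file against the evals; while below the target
    pass rate, hand the failures back to the reviser and try again."""
    score = 0.0
    for _ in range(max_rounds):
        failures = [c.name for c in cases if not c.passes(instructions)]
        score = 1.0 - len(failures) / len(cases)
        if score >= target:
            break
        instructions = revise(instructions, failures)
    return instructions, score
```

In practice `revise` would be a prompt to an AI coder ("these evals failed, edit AGENTS.md until they pass"), and each eval would run the agent against a real task rather than a string check.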

OutOfHere

Duplicate of https://news.ycombinator.com/item?id=47280099
