Not enough on its own; you'd need artifacts to store contexts/TOC/lists. I think the shorter the better. Also, a strange finding from my own experience: specific empirical formats seem to yield much better results. For example, people often say "get this done to 100%", but I say "get this to 88.47%".
asp_hornet
As the author notes at the end, it would be really interesting to run these again on more recent models. I wonder if the finding that having no context file is cheaper still stands. But then, how much does the harness influence the results? It can be frustrating trying to gauge what’s influencing what, and whether something has suddenly started working against you.
wiseowise
Putting “you’re an expert jerk off master” in agents.md is the same as a shaman burning a bone to predict the future.
weddpros
If adding something to the context doesn't help, it only proves you're not adding the right stuff. I'm adding pointers to specification documents, and it saves me from the /new dumb coding agent that sees your code base for the first time and knows nothing about architecture, concepts, code organisation, etc. I'm using no cookie-cutter directives though (except maybe "do not attempt to deploy, we're using CI/CD to deploy" to avoid an automatic "wrangler deploy" to Cloudflare).
RugnirViking
Yes, they do. I think people overindex on this paper; I remember that when it came out we had a lot of discussion in my company about it. But it's clear that they do at least change the agent's behavior, and things like telling it "always use xyz version of java, use gradle to build the project, use this command to run the tests" are really important, instead of letting it fumble about trying to find the right thing every time you ask it anything.
I think the trap some people fall into, especially with LLM-authored files (which is where the paper sees the documents not helping), is describing the code, or the structure of the code, instead. I don't think that helps much: the agent can already see you have 4 modules called a, b, c and d, and can read the readmes inside them just fine if it has questions.
One more marginal thing I find helpful, though I'm less sure it has a positive impact, is describing the right terminology so the agent can be smarter at communicating with the developer. Things like different names for the product, products it interfaces with, resource names in infra, terms from the customer and product team. I don't think it helps the agent code (much), but it does help communication if it knows what we mean when we speak (and naming things is, as we know, one of the hard problems in CS).
Overall, most of my agents.md files are now a list of useful bash commands for working with and testing the project (here's how to spin up docker services, here's how to update the libraries, here's how to run a command against the local db, here's how to insert a document to be run, etc.) and then at the end a terminology blob, which I find myself referencing too.
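A minimal sketch of the layout described above (commands first, terminology blob at the end), in Python. The function name `render_agents_md`, the section headings, and every command and term in the example are hypothetical placeholders, not anything taken from the commenter's actual file.

```python
def render_agents_md(commands: dict[str, str], terminology: dict[str, str]) -> str:
    """Render a short AGENTS.md: runnable commands first, glossary last."""
    lines = ["# AGENTS.md", "", "## Commands", ""]
    for purpose, command in commands.items():
        lines.append(f"- {purpose}: `{command}`")
    lines += ["", "## Terminology", ""]
    for term, meaning in terminology.items():
        lines.append(f"- {term}: {meaning}")
    return "\n".join(lines) + "\n"


# Hypothetical example values; replace with your project's real commands.
example = render_agents_md(
    commands={
        "Start local services": "docker compose up -d",
        "Run the tests": "./gradlew test",
    },
    terminology={"Ledger": "internal name for the billing service"},
)
print(example)
```

Generating the file from two small dictionaries is only one possible design; it keeps the file short and makes it obvious when a section starts drifting into describing the code itself.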
sebra
The tweet misses the conclusion from the paper that handcrafted AGENTS.md files might help. To me it's no surprise that 100% vibed AGENTS.md files are unproductive. Not reviewing your design docs is probably even worse than not reviewing your code? I've seen some AI-generated agents.md files which were just plain wrong. No surprise agents perform worse after reading those. I use AGENTS.md to make sure my agents loop effectively (tests, quality, etc.), not to describe the code or architecture.
kandros
The amount of cargo-culting around AI tooling and practices is so weird to me. Why not just try and see? The fast feedback loop allows testing all kinds of weird theories in 30 minutes to an hour during normal working sessions, and most results are obvious.
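One way to make "try and see" concrete, sketched in Python: run the same set of tasks with and without the context file, record pass/fail for each, and compare pass rates. The function names and the result lists below are made-up placeholders, and a handful of runs like this will not separate a real effect from noise.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of task runs that passed; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


def compare(with_file: list[bool], without_file: list[bool]) -> float:
    """Pass-rate difference: positive means a higher pass rate with the file."""
    return pass_rate(with_file) - pass_rate(without_file)


# Hypothetical outcomes from running the same four tasks in each setup.
with_agents_md = [True, True, False, True]
without_agents_md = [True, False, False, True]
delta = compare(with_agents_md, without_agents_md)
print(f"pass-rate difference: {delta:+.2f}")
```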
deaux
[dupe] https://news.ycombinator.com/item?id=47034087 The paper was discussed here 4 months ago, and the linked tweet on this post doesn't add any insight and completely misses the huge caveats that come with the result: the main benefits of using AGENTS.md files are inherently opposed to the characteristics of the _median_ "public GitHub project that has an AGENTS.md file".
phplovesong
From my tests, agents.md does NOT work with Copilot. I have a custom language and Copilot thinks it's Rust.
simianwords
Is this one of those times you discard "scientific" tests and trust your instinct instead? OpenAI is doing it well using curated context files [1]. Maybe these "scientific" explanations are not useful, or are even misleading. [1] https://openai.com/index/harness-engineering/
popcorncowboy
Agents.md is just a prompt prepend. This is like asking "do prompts help coding agents?".
SeriousM
Yes. In my case (and I guess everyone's use case is subjective) my system prompt states to read the AGENT.md file when possible. On a new project I usually set up the context of the model (language to use, reason for the product/prototype, etc.) and then I tell the LLM to write an AGENT.md, STATE.md and ROADMAP.md. I don't tell the LLM what goes in there because the model has its own directive and flavor for what should be in these files. The models already know the purpose of these files by themselves! On a new session, I let the agent read the markdown files in order to continue with the work. Before a session ends, I let the LLM update the markdown files. Maybe one word of caution: don't switch models. It's like putting another person at a workstation and asking them to continue the work of others. Easy setup, really good outcome!
hyperpape
We need a companion to "IN MICE", which is "IN EVALS". I don't think this is bad research, but you have to understand how far it generalizes. I'm not saying that evals are useless; we need to do our best to produce good benchmarks. But benchmarks are always going to lag pretty far behind real-world applications.
cdogukank
Anecdotally yes, but with diminishing returns — a short, specific agents.md helped more than a long one in my experience. Past a point the agent stops respecting the extra context. Curious if others have found a sweet spot for length.
OutOfHere
Fwiw, don't put in AGENTS.md what belongs in README.md. Also, there is such a thing as excessive or useless context. Personally, I like to control what the AI reads by customizing the prompt with only what it needs and nothing more.