How do you capture WHY engineering decisions were made, not just what?
We onboarded a senior engineer recently: strong, 8 years of experience. He spent 3 weeks playing code archaeologist just to understand WHY our codebase looks the way it does. Not what the code does; that was fast. But the reasoning behind decisions:

- Why Redis over an in-memory cache?
- Why GraphQL for this one service but REST everywhere else?
- Why that strange exception in the auth flow for enterprise users?

The answers were buried in closed PRs with no descriptions, 18-month-old Slack threads, and the heads of two engineers who left last year.

We tried ADRs. Lasted 6 weeks; nobody maintained them. We tried PR description templates. Ignored within a month. We have a Notion architecture doc. Last updated 14 months ago. Every solution requires someone to manually write something, and nobody does.

Curious how teams on HN actually handle this:

1. Do you have a system that actually works long-term?
2. Has anyone automated any part of this?
3. Or is everyone quietly suffering through this on every new hire?
Discussion Highlights (19 comments)
airspresso
Also wrestling with this challenge at the moment and curious to hear others' experiences. Even though it requires human input, the capture and the updating have to be automated.
CGMthrowaway
Put the ADR in the PR as a requirement. Then automate extracting the decision info into an actual ADR.
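The extraction step can be a small CI script. A minimal sketch in Python, assuming the PR template enforces a `## Decision` heading (the section name and the ADR layout here are made up; adapt them to whatever template the team actually enforces):

```python
import re
from datetime import date

def extract_adr(pr_title: str, pr_body: str) -> str:
    """Pull the '## Decision' section out of a PR description and render
    it as a minimal ADR. Raises if the section is missing, so CI can
    block the merge until the author fills it in."""
    match = re.search(r"## Decision\s*\n(.*?)(?=\n## |\Z)", pr_body, re.S)
    if not match:
        raise ValueError("PR description has no '## Decision' section")
    return (
        f"# ADR: {pr_title}\n\n"
        f"Date: {date.today().isoformat()}\n\n"
        f"{match.group(1).strip()}\n"
    )
```

A CI job could run this against the PR body and commit the result under a docs/adr/ folder, so the human effort stays at "write one paragraph in the PR" rather than "maintain a separate document".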
_moof
This is called rationale and it goes in the design document. As work proceeds, it goes into tickets and meeting notes, and gets fed back into the design doc.
rustyzig
> - Why Redis over in-memory cache?
> - Why GraphQL for this one service but REST everywhere else?
> - Why that strange exception in the auth flow for enterprise users?

These are all implementation details that shouldn't actually matter. What does matter is that the properties of your system are accounted for and validated. That goes in your test suite, or your type system if your language has a sufficiently advanced one.

If replacing Redis with an in-memory cache is a problem technically, your tests/compiler should prevent you from switching to an in-memory cache. If you don't have that, that is where you need to start. Once you have those tests/types, many of the questions will also get answered. It won't necessarily answer why Redis over Valkey, but it will demonstrate with clear intent why not an in-memory cache.
Willamin
For context, my engineering team is fairly small – no guarantees this scales well for larger organizations. I capture the reasons why code was written a particular way, or why a particular architecture was decided upon, in commit messages. We follow a squash-and-rebase flow, so each PR is ultimately a single commit before merging. During that squash, I'll update the commit message, sometimes to a few paragraphs long. Later, when I'm curious why we made a decision in the past, I can use git blame to navigate back to the point where I can find the answer.
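The blame-then-read loop described above is easy to script. A small helper as a sketch, assuming git is on PATH and the repo path, file, and line number are supplied by the caller:

```python
import subprocess

def blame_commit_message(repo: str, path: str, line: int) -> str:
    """Return the full commit message of the commit that last touched
    the given (1-based) line of `path`, via `git blame --porcelain`."""
    blame = subprocess.run(
        ["git", "-C", repo, "blame", "-L", f"{line},{line}",
         "--porcelain", path],
        capture_output=True, text=True, check=True,
    ).stdout
    sha = blame.split(None, 1)[0]  # porcelain output starts with the sha
    return subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%B", sha],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
```

With squashed PRs this lands directly on the multi-paragraph message; without them you may have to repeat the lookup on `<sha>^` to step past mechanical commits.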
rich_sasha
Doesn't really answer your question, but IME this is sort of unavoidable unless you're massive and can afford people whose job is just to document this kind of stuff.

Reason being, a lot of this stuff happens for no good reason, or by accident, or for reasons that no longer apply. Someone liked the tech so used it - then left. Something looked better in a benchmark, but then the requirements drifted and now it's actually worse, but no one has the time to rewrite. Something was inefficient but implemented as a stopgap, then stayed and is now too hard to replace. So you can't explain the reasons when much of the time there aren't any.

The non-solutions are:

- Document the high-level principles and stick to them. Maybe you value speed of deployment, or stability, or control over the codebase. Individual software choices often make sense in light of such principles.
- Keep people around and be patient when explaining what happened.
- Write wiki pages, without that much effort at being systematic and up to date. Yes, they will drift out of sync, but they will provide breadcrumbs to follow.
SsgMshdPotatoes
I thought about this too recently. I guess documenting every consideration along the way would take way too much time (it would be longer than the documentation of the actual implementation), but one of these days that seems likely to change?
lowenbjer
My take after running engineering teams at multiple companies: documentation survives when it lives next to the code. File-level header comments explaining each component's purpose and role in the architecture. A good README tying it all together. If you compartmentalize architecture into folders, a README per folder. This works for humans, LLMs, and GitHub search alike. ADRs, Notion docs, and Confluence pages die because they're separate from the code. Out of sight, out of mind.

If you want to be really disciplined about it, set up an LLM-as-judge git hook that runs on each PR. It checks whether code changes are consistent with the existing documentation and blocks the merge if docs need updating. That way the enforcement is automated and you only need a little human discipline, not a lot. There's no way to avoid some discipline, though. But the less friction you add, the more likely it sticks.
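The deterministic half of such a hook is only a few lines. A sketch, where the doc-path conventions are assumptions and the actual LLM judge call is deliberately left out:

```python
def needs_doc_review(changed_files: list[str]) -> bool:
    """Cheap pre-filter for an LLM-as-judge hook: flag changes that
    touch code without touching any documentation. Flagged diffs would
    then be sent to the judge (or simply blocked); the model call
    itself is omitted here."""
    def is_doc(f: str) -> bool:
        return f.endswith((".md", ".rst")) or f.startswith("docs/")

    code_changed = any(not is_doc(f) for f in changed_files)
    docs_changed = any(is_doc(f) for f in changed_files)
    return code_changed and not docs_changed
```

In CI you would feed it the output of `git diff --name-only origin/main...HEAD` and, on a hit, either fail the check outright or ask the model whether the existing docs still describe the changed code.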
sdeframond
Sometimes the best way to find out why a (Chesterton's) fence is blocking the road is... to remove it and see what happens! Sorry, not really an answer to your problem, but I feel you; this is a genuinely hard problem. Keep in mind that, pretty often, the reason something is the way it is comes down to "no real reason", "that seemed easier at the time", or "we didn't know better". At least if you don't work on critical systems.
hammadfauz
If you do these things:

* File issues in a project tracker (GitHub, Jira, Asana, etc.)
* Use the issue id at the start of every commit message for that issue
* Use a single branch per issue, whose name also starts with the issue id
* Use a single PR to merge that branch and close the issue
* Don't squash-merge PRs

...you can use `git blame` to get the why. git blame gives you the change set and the commit message. Use the issue id in the commit message to get to the issue; the issue description and comments provide part of the story. Use the issue id to track down the branch and PR; the PR comments give you the rest of the story.
TheChelsUK
ADRs, but give ownership to the team. They should sit in the most relevant repo, but a central repo called ADRs has issue templates and a README which links off to all the repos and their ADRs. An ADR cannot be approved and its issue closed until all the docs are in place. Everyone can see the open ADRs in the main repo and can comment on them. Accountability is there if an assigned issue stays open for days or weeks. GitHub issue templates are perfect for ADR templates. All Hands for engineering is a great place to mention them and for teams to comment on the decision and outcomes.
nonameiguess
ADRs are the only way I've ever seen it done well for a sufficiently large project, let alone something like an entire product line or suite of many projects. Sometimes those span multiple organizations; think of the Internet and the IETF RFCs. Yes, they don't give a complete picture, and implementations may not match the specification.

I don't really agree that they require maintenance. It's just that you have to write up a new one any time you change a decision, and give a reason why. Yes, it takes a lot of organizational discipline to do that. You probably can't be in panic mode, and it won't work for a startup that needs to ship in five weeks or they can't make payroll. But there isn't really a substitute for discipline.

As maligned as it can be, the single best organization I've ever been a part of for code archaeology, on a huge multi-decade project that spanned many different companies and government agencies, simply made diligent use of the full Atlassian suite. Bitbucket, Jira, Confluence, Fisheye, and Crucible all had the integrations turned on. Commits and PRs had a Jira ticket number in them. Follow that link to the original story, epic, whatever the hell it was, and that had further links to ADRs with peer review comments. I don't know that I ever really had to ask a question. Just find a line of interest, follow a bunch of links, and you've got years of history on exactly what a whole bunch of different people (not just the one who committed code) were thinking and why they made the decisions they made.

I've always thought about the tradeoffs involved. They were waterfall. They didn't deliver fast. Their major customers were constantly trying to replace them with cheaper, more agile alternatives. But competitors could never match the strict non-functional requirements for security, reliability, and performance, and the non-tolerance of regressions, so it never happened, and they've had a decades-long monopoly in what they do because of it.
lwhsiao
> Every solution requires someone to manually write something. Nobody does. Hot take: hire people that value writing. Create a culture around that. Oxide is a great example of a company culture that values writing, as shown by their rigorous and prolific RFDs: https://rfd.shared.oxide.computer/rfd/0001 See also: https://oxide-and-friends.transistor.fm/episodes/rfds-the-ba... Many of these RFDs have hit HN by themselves.
hakunin
Simple: ask "why" in a PR review, put the answer in a code comment. If there is a bigger / higher level "why", add it to git commit description. This way it's auto-maintained with code, or stays frozen at a point in time in a git commit. More: https://max.engineer/reasons-to-leave-comment Much more: https://max.engineer/maintainable-code
hermitcrab
I worked on the problem of recording 'design rationale' ~25 years ago. It is a big problem, particularly for long-lived artefacts such as nuclear reactors. Nobody is quite sure exactly why decisions were made, as the original designers have forgotten, retired, or been run over by buses. And this makes changing things difficult and risky.

The biggest problem is that there is no real incentive for the people making the decisions to write down why they made them:

* they may see it as reducing their career security
* they may see it as opening them up to potential prosecution
* it takes a lot of time
physicles
First, recognize that, for the first time ever, having good docs actually pays dividends. LLMs love reading docs and they're fantastic at keeping them up to date. Just don't go overboard, and don't duplicate anything that can be easily grepped from the codebase.

Second, for #3, it's a new hire's job to make sure the docs are useful for new hires. Whenever they hit friction because the docs are missing or wrong, they go find the info, and then update the docs. No one else remembers what it's like to not know the things they know. And new hires don't yet know that "nobody writes anything" at your company.

In general, like another poster said, docs must live as close as possible to the code. LLMs are fantastic at keeping docs up to date, but only if they're in a place that they'll look. If you have a monorepo, put the docs in a docs/ folder and mention it in CLAUDE.md.

ADRs (architecture decision records) aren't meant to be maintained, are they? They're basically RFCs, a tool for communicating a proposal and having a discussion. If someone writes a nontrivial proposal in a Slack thread, say "I won't read this until it's in an ADR."

IMHO, PRs and commits are a pretty terrible place to bury this stuff. How would you search through them, dump all commit descriptions longer than 10 words into a giant .md and ask an LLM? No, you shouldn't rely on commits to tell you the "why" for anything larger in scope than that particular commit.

It's not magic, but I maintain a rude Q&A document that basically has answers to all the big questions. Often the questions were asked by someone else at the company, but sometimes they're to remind myself ("Why Kafka?" is one I keep revisiting because I want to ditch Kafka so badly, but it's not easy to replace for our use case). But I enjoy writing. I'm not sure this process scales.
al_borland
If it’s something in the code, that’s where I use comments. It’s the only place people have a chance of seeing it. Even when I add these comments, some people ask me about the code instead of reading them. This isn’t just for others; I forget as well. Something to the effect of:

# This previously used ${old-solution}, but has moved to ${new-solution} because ${reason}

Or:

# This is ugly and doesn’t make sense, but ${clean-logical-way} doesn’t work due to ${reason}. If you change ${x} it will break.

Or:

# This was a requirement from ${person} on ${date}. We want to remove this, but will need to wait until ${person} no longer needs it or leaves the company.
iSnow
I built an agentic framework that distills ADRs from Teams meetings where everyone discusses freely. Works surprisingly well to record the WHY without someone having to do the job.
vova_hn2
I suppose you are trying to "warm up" the audience before announcing your product, which is... fine, I guess. I also had an idea for a solution to this problem a long time ago. I wanted to make a thing that would let you record a meeting (in the company where I worked back then, such things were mostly discussed in person), transcribe it, and link parts of the conversation to relevant tickets, pull requests, and git commits. Back then the tech wasn't ready yet, but now it actually looks relatively easy to do. For now, I try to leave such breadcrumbs manually whenever I can. For example, if the reason why a part of the code exists seems non-obvious to me, I will write an explanation in a comment/docstring and leave a link to a ticket or a ticket comment that provides additional context.