After outages, Amazon to make senior engineers sign off on AI-assisted changes
https://www.ft.com/content/7cab4ec7-4712-4137-b602-119a44f77... ( https://archive.ph/wXvF3 ) https://twitter.com/lukolejnik/status/2031257644724342957 ( https://xcancel.com/lukolejnik/status/2031257644724342957 )
Discussion Highlights (20 comments)
andsoitis
> Amazon’s website and shopping app went down for nearly six hours this month in an incident the company said involved an erroneous “software code deployment.” The outage left customers unable to complete transactions or access functions such as checking account details and product prices. The environment breathed a little.
ndr42
I think the problem of responsibility will come for many more companies sooner rather than later. Some of the alleged efficiency gains from using AI may not look so big anymore once someone has to be accountable for the output.
AlexeyBrin
I wonder how this will work in practice. Say I'm a senior engineer and I myself produce thousands of lines of code per day with the help of LLMs, as mandated by the company. I presumably still need to read and test the code that I push to production. When will I have time to read and evaluate a similar amount of code produced by a junior or mid-level engineer?
kmg_finfolio
The accountability problem is real but I think it's slightly different from what's being described. The issue isn't just "who signs off"; it's that the reasoning behind a change becomes invisible when AI generates it. A senior engineer can approve output they don't fully understand, and six months later when something breaks, nobody can reconstruct why that decision was made. Human review works when the reviewer can actually interrogate the logic. At LLM-assisted velocity, that bar gets harder to clear every month.
CodingJeebus
I'm at a small company struggling with this problem. Fundamentally, we have a limited context and AI is capable of generating tremendous amounts of output that exceed our ability to deeply process. I find myself context-switching all the time and it's pretty exhausting, while also finding that I'm not retaining as much deep application domain knowledge as I used to. On the surface, it's nice that I can give my LLM a well-written bug ticket and let it loose since it does a good job most of the time. But when it doesn't do a good job or it's making a change in an area of the codebase I'm not familiar with, auditing the change gets tiring really fast.
AlotOfReading
I'm not surprised by the outages, but I am surprised that they're leaning into human code review as a solution rather than a neverending succession of LLM PR reviewers. I wonder if it's an early step towards an apprenticeship system.
josefritzishere
The excessive exuberance of AI adoption is all part of the bubble.
Lalabadie
I'm not sure the sustainable solution is to treat an excess of lower-quality code output as a fixed constraint and operationalize around it, but sure.
cobolcomesback
This “mandatory meeting” is just the usual weekly company-wide meeting where recent operational issues are discussed. There was a big operational issue last week, so of course this week will have more attendance and discussion. This meeting happens literally every week, and has for years. Feels like the media is making a mountain out of a molehill here.
dude250711
I knew this would happen. Take a perfectly productive senior developer and instead make him responsible for the output of a bunch of AI juniors, with the expectation of 10x output.
ritlo
The only way to see the kinds of speed-up companies want from these things, right now, is to do way too little review. I think we're going to see a lot of failures across a lot of sectors where companies set goals for reduced hours based on what they expected from LLM speed-ups, and it will turn out the only way to hit those goals was by spending far too little time reviewing LLM output. They're torn between "we want to fire 80% of you" and "... but if we don't give up quality/reliability, LLMs only save a little time, not a ton, so we can only fire like 5% of you max".

It's the same in writing: these things are only a huge speed-up if it's OK for the output to be low-quality, but producing good output using LLMs only saves a little time versus writing entirely by hand. So far, anyway—of course these systems are changing by the day, but this specific limitation has held for about four years now, without much improvement.
sethops1
> The response for now? Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off.

So basically, kill the productivity of senior engineers, kill the ability of junior engineers to learn anything, and ensure those senior engineers hate their jobs. Bold move, we'll see how that goes.
happytoexplain
> Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off

Review by a senior is one of the biggest "silver bullet" illusions managers suffer from. For a person (senior or otherwise) to examine code or configuration with the granularity required to verify that it even approximates the result of their own level of experience, even only in terms of security/stability/correctness, requires an amount of time approaching what they would have spent just doing it themselves. I.e. senior review is valuable, but it does not make bad code good. This is one major facet of probably the single biggest problem in system management over the last couple of decades: the misunderstanding by management that making something idiot proof means you can now hire idiots (not intended as an insult, just using the terminology of the phrase "idiot proof").
dragonelite
Expect a shitload of AI powered code review products the next 18 months.
lokar
If this is true, it misunderstands the primary goals of code review. Code review should not be (primarily) about catching serious errors. If there are always a lot of errors, you can't catch most of them with review; if there are few, it's not the best use of time. The goal is to ensure the team is in sync on design, standards, etc.; to train and educate junior engineers; to spread understanding of the system; and to bring more points of view to complex and important decisions. These goals help you reduce the number of errors going into the review process — that should be the actual goal.
bigbuppo
Ugh. The Great Oops has never been closer.
skeledrew
> the affected tool served customers in mainland China

Thought this blurb was the most interesting part. What's the between-the-lines subtext here? Are they deliberately serving something they know to be faulty to the Chinese? Or is it the case that the Chinese use it with little to no issue/complaint? Or...?
MDGeist
A former colleague of mine recently took a role that has largely turned out to be "greybeard who reviews the AI slop of the junior engineers". In theory it sounds workable, but the volume of slop makes thoughtful review impossible. Seems like most orgs will just put pressure on the slop generators to do more, put pressure on the approvers, and then scapegoat the slop approvers if necessary?
oxqbldpxo
Not fun to work at amazon.com, it seems.
mattschaller
Anyone worked with Kiro before? As I understood it, it was held as an INTERNAL USE ONLY tool for much longer than expected.