Did Claude increase bugs in rsync?

logicprog 372 points 375 comments June 05, 2026
alexispurslane.github.io

Discussion Highlights (20 comments)

wookmaster

Claude is just a tool? The developers who merged that code and didn't properly test it increased the bugs.

the_real_cher

Is there a non vibe coded fork of rsync?

rovr138

I'm just curious about testing. Is this a configuration that's not common and thus not tested? If people think they can do better, I want to see their forks and them keeping up with it. https://github.com/RsyncProject/rsync/graphs/contributors?fr...

nairboon

Is this an analysis made by/with Claude?

Polarity

so the answer is: no. actually less bugs. thanks

geraneum

> But the critics' accusation is also blunt: "Claude is making things worse." A blunt instrument is the fairest response.
So the criticism was bad, and that somehow makes it ok to use a bad metric?

faitswulff

> The analysis uses a single metric: bugs per 10 commits (bugs/10c).
Bugs per commit as a metric papers over severity, both in terms of security severity as well as the effect on the user. A mislabeled button has the same weight as the entire app crashing in this framework.
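
A minimal sketch of the point about severity, assuming the metric is simply 10 × bugs / commits as the quoted name suggests. The release names, counts, and severity labels below are made up for illustration and do not come from the article's data.

```python
from collections import Counter


def bugs_per_10_commits(bug_count: int, commit_count: int) -> float:
    """Bugs per 10 commits; assumed form of the article's bugs/10c metric."""
    if commit_count <= 0:
        raise ValueError("commit_count must be positive")
    return 10 * bug_count / commit_count


# Hypothetical releases: same commit count, same bug count, different severity.
releases = {
    "release-a": {"commits": 40, "bugs": ["cosmetic"] * 20},
    "release-b": {"commits": 40, "bugs": ["crash"] * 15 + ["security"] * 5},
}

# The rate only sees how many bugs there are, not what kind.
rates = {
    name: bugs_per_10_commits(len(r["bugs"]), r["commits"])
    for name, r in releases.items()
}

# A per-severity breakdown is what the rate throws away.
severity = {name: Counter(r["bugs"]) for name, r in releases.items()}

for name in releases:
    print(name, rates[name], dict(severity[name]))
```

Both hypothetical releases get the same rate even though one has only cosmetic bugs and the other has crashes and security issues, which is the weakness the comment describes.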

gadrev

Ok.
$ apt-cache policy rsync | grep Installed
Installed: 3.4.1+ds1-7ubuntu0.2
$ sudo apt-mark hold rsync
rsync set on hold.

scsh

> It does not control for commit complexity, security intensity, or bug severity. It does not distinguish between a one-line typo fix and a CVE patch. It is a blunt instrument. But the critics' accusation is also blunt: "Claude is making things worse." A blunt instrument is the fairest response.
If by fairest you mean to say that this analysis and response is sufficient, then I'm sorry but I have to disagree. We really need to understand whether the nature of the bugs is worse from a user's perspective. Even if the rate stayed unchanged, if the result is that the perceived quality of the software declined, then I would personally consider that worse, especially if I were a project maintainer. That's not meant to be wholly dismissive either. But in general, I don't think quantitative analysis alone is enough to fully answer this type of question.

logicprog

Okay, I really have to point out to everyone: the numbers and report cards are TEMPLATED IN BY A SCRIPT. Hallucinations are a moot point. https://github.com/alexispurslane/rsync-analysis/blob/main/s...

pushcx

> What followed was extraordinary: 329 comments and counting, ranging from thoughtful concern to outright harassment. The thread did not stop at words. One user posted My Little Pony drawings of themselves strangling the "project janitor that pushed vibecoded commits": It spread to Hacker News and Lobsters, generating hundreds more comments.
This is false, it did not appear on Lobsters. Here is the function in the codebase that prohibits this kind of brigading: https://github.com/lobsters/lobsters/blob/main/app/models/st... Please correct your article.

thorum

Unfortunately for the people mad about this, I predict the only thing they will accomplish by pressuring the rsync maintainers is to discourage everyone else from responsibly disclosing their use of AI. You’re just going to make people disable Claude attribution on their commits to avoid drama.

dang

[stub for offtopicness] [see https://news.ycombinator.com/item?id=48416020 for how all this happened in the first place]

overgard

The TLDR seems to be: needs more data.

aesthesia

I don't have a dog in this fight, but a few points that look a little suspicious:
- The release with the highest number of attributed bugs is the release _right before_ the first release with Claude-coauthored commits, released in January; is there a chance that unattributed LLM-authored commits made it into this release?
- The release attribution methodology is not great, since it will tend to attribute bugs introduced in a minor version update to the longest-lived patch release of that minor version. I doubt that 3.4.1 actually introduced a lot of bugs, but since it was released a day after 3.4.0, bugs that were introduced in that release get attributed to 3.4.1.
- Relatedly, more recent releases have had less time to have bugs filed against them, so there may be a bit of a bias toward evaluating recent releases as less buggy.

MagicMoonlight

Typical AI slop post. It’s pretty hilarious that he just added spaces before the emdashes and claims it’s human written. If I’m hiring and I see this kind of slop, I ain’t hiring you.

mikaeluman

Not going to critique this survey. Must have taken a lot of time and required a lot of patience. Great work! I think it will be up to some group in academia to make a real full-blown study across several repositories. There must be tons to learn on how LLMs have changed software development, and perhaps the cleanest separation will simply be going by what repositories declare, e.g. "No LLM involved" vs those that proudly do the opposite or are neutral. Bugs are not the only variable of interest here. I am guessing someone is already doing this as we discuss it here...

logicprog

Another update: did an automated severity analysis on each bug report (~2000 of them!) using an LLM at temp=0 with a very strict rubric (and I checked to make sure that it rated things in a consistent, stable way using it). The rubric, LLM used, and some example ratings are included in the methodology section. For now, the information was just stored per-bug in the DuckDB and used to filter out non-bug bugs, to get a clearer signal. I'm going to try to use it to see if the post-Claude bugs were more severe in any way next.

tptacek

This is a neat post and I'm glad it got written and this is a little bit off-topic but: Hey, 'logicprog, your writing is fine! Use LLMs to critique your writing, check its structure, vet your choice of topic sentences, check flow from graf to graf and section to section, look for passive voice and overused words. LLMs are fantastic for that. But don't use a single word an LLM suggests in your actual writing. If it suggests something really fucking good, too bad, those words are disqualified. It's an easy red line to adhere to, easier than it sounds, and it'll keep your writing human. (You ended up somewhere around here anyways, but that was after you posted something with LLM-written language because you weren't confident enough in your own writing. The things you do "worse" than an LLM are what make you you; be protective of them!)

KronisLV

Pretty cool site!
> v3.4.3 has been out long enough that its rate (5.00) is already comparable to historical releases. The "wait and see" argument is an appeal to an unknowable future that shifts the burden of proof away from the critics. If more bugs surface, they will enter the distribution like every other release. There is no reason to expect a regime break.
I mean, as someone who uses LLMs, it might be a good idea to consider how one might limit the amount of bugs that will appear in the future at least a little bit: parallel iterative code review loops would probably be the easiest and most applicable to LLMs, though I guess test coverage and other code analysis tools help too.
