Issue: Claude Code is unusable for complex engineering tasks with Feb updates

StanAngeloff 921 points 526 comments April 06, 2026
github.com · View on Hacker News

Discussion Highlights (20 comments)

StanAngeloff

(Being true to the HN guidelines, I’ve used the title exactly as seen on the GitHub issue) I was wondering if anyone else is also experiencing this? I have personally found that I have to add more and more CLAUDE.md guard rails, and my CLAUDE.md files have been exploding since around mid-March, to the point where I actually started looking online for other people corroborating my personal observations. This GH issue report sounds very plausible, but as with anything AI-generated (the issue itself appears to be largely AI assisted) it’s hard to know for sure whether it is accurate or completely made up. _Correlation does not imply causation_ and all that. Speaking personally, the findings match my own experience: I’ve seen noticeable degradation in Opus outputs and thinking. EDIT: The Claude Code Opus 4.6 Performance Tracker[1] is reporting Nominal. [1]: https://marginlab.ai/trackers/claude-code/

KaiLetov

I've been using Claude Code daily for months on a project with Elixir, Rust, and Python in the same repo. It handles multi-language stuff surprisingly well most of the time. The worst failure mode for me is when it does a replace_all on a string that also appears inside a constant definition -- ended up with GROQ_URL = GROQ_URL instead of the actual URL. Took a second round of review agents to catch it. So yeah, you absolutely can't trust it to self-verify.
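For illustration, the replace_all failure mode above can be reproduced with a plain string replace (a hypothetical sketch of the mechanism, not the actual edit the model made):

```python
# Hypothetical reconstruction of the failure: a blind replace-all that
# rewrites every occurrence of the URL literal, including the one in
# the constant's own definition.
source = (
    'GROQ_URL = "https://api.groq.com/v1"\n'
    'resp = get("https://api.groq.com/v1")\n'
)

# Intent: replace call-site literals with the constant name.
patched = source.replace('"https://api.groq.com/v1"', "GROQ_URL")

print(patched)
# The definition line is now GROQ_URL = GROQ_URL -- a self-reference
# that only a second review pass would catch.
```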

tyleo

Is this impacted by the effort level you set in Claude? e.g., if you use the new "max" setting, does Claude still think? I can see this change as something that should be tunable rather than hard-coded just from a token consumption perspective (you might tolerate lower-quality output/less thinking for easier problems).

summarity

Not Claude Code specific, but I've been noticing this on Opus 4.6 models through Copilot and others as well. Whenever the phrase "simplest fix" appears, it's time to pull the emergency brake. This has gotten much, much worse over the past few weeks. It will produce completely useless code, knowingly (because up to that phrase the reasoning was correct) breaking things. Today another thing started happening: phrases like "I've been burning too many tokens" or "this has taken too many turns". Which ironically takes more tokens of custom instructions to override. Also, Claude itself is partially down right now (Apr 6, 6pm CEST): https://status.claude.com/

phillipcarter

Maybe it's because I spend a lot of time breaking up tasks beforehand to be highly specific and narrow, but I really don't run into issues like this at all. A trivial example: whenever CC suggests doing more than one thing in a planning mode, just have it focus on each task and subtask separately, bounding each one by a commit. Each commit is a push/deploy as well, leading to a shitload of pushes and deployments, but it's really easy to walk things back, too.

petcat

I have found that Claude Opus 4.6 is a better reviewer than it is an implementer. I switch off between Claude/Opus and Codex/GPT-5.4 doing reviews and implementations, and invariably Codex ends up having to do multiple rounds of reviews and requesting fixes before Claude finally gets it right (and then I review). When it is the other way around (Codex impl, Claude review), it's usually just one round of fixes after the review. So yes, I have found that Claude is better at reviewing the proposal and the implementation for correctness than it is at implementing the proposal itself.

Retr0id

This seems anecdotal but with extra words. I'm fairly sure this is just the "wow this is so much better than the previous-gen model" effect wearing off.

jbethune

I think this is a model issue. I have heard similar complaints from team members about Opus. I'm using other models via Cursor and not having problems.

howmayiannoyyou

Not just engineering. Errors, delays and limits piling up for me across API and OAuth use. Just now: Unable to start session. The authentication server returned an error (500). You can try again.

himata4113

Not unique to Claude Code; I've noticed similar regressions elsewhere. The clearest case is a custom assistant I run in Telegram: it started confusing people and mixing up news coverage, and everyone in the group chat independently noticed that it is just not the same model it was a few weeks ago. The efficiency gains didn't come from nowhere, and it shows.

virtualritz

None of this is surprising given what happened late last summer with rate limits on Claude Max subscriptions, and even less so if you read [1] or similar assessments. I, too, believe that every token is heavily subsidized, from whatever angle you look at it. Thus quality/token/whatever rug pulls are inevitable, eventually. This is just another one. [1]: https://www.wheresyoured.at/subprimeai/

giwook

I wonder how much of this is simply needing to adapt one's workflows to models as they evolve, and how much is actual degradation of the model, whether due to a version change or something at the inference level. Also, everyone has a different workflow. I can't say that I've noticed a meaningful change in Claude Code quality in a project I've been working on for a while now. It's an LLM in the end, and even with strong harnesses and eval workflows you still need to keep a critical eye and review its work as if it were a very smart intern. Another commenter here mentioned they also haven't noticed any degradation in Claude quality, possibly because they frontload the planning work and break it down into more digestible pieces, which is something I do as well and have benefited greatly from. tl;dr I'm curious what OP's workflows are like and whether they'd benefit from additional tuning.

matheusmoreira

That analysis is pretty brutal. It's very disconcerting that they can sell access to a high quality model then just stealthily degrade it over time, effectively pulling the rug from under their customers.

ex-aws-dude

It's so silly that everyone is dependent on a black box like this.

zeroonetwothree

I haven’t had any issues. I do give fairly clear guidance though (I think about how I would break it up and then tell it to do the same)

thrtythreeforty

I noticed this almost immediately when attempting to switch to Opus 4.6. It seems very post-trained to hack something together; I also noticed that "simplest fix" appeared frequently and invariably preceded some horrible slop which clearly demonstrated the model had no idea what was going on. The link suggests this is due to lack of research. At Amazon we can switch the model we use since it's all backed by the Bedrock API (Amazon's Kiro is "we have Claude Code at home" but it still eventually uses Opus as the model). I suppose this means the issue isn't confined to just Claude Code. I switched back to Opus 4.5 but I guess that won't be served forever.

adonese

Things have gone downhill since they removed ultrathink /s

pjmlp

I am just waiting for everything to implode so that we can do away with those KPIs.

Aperocky

In my opinion, cramming in invisible subagents is entirely wrong: models suffer information collapse, as they all tend to agree with each other and then produce complete garbage. Good for Anthropic, though, since that's metered token usage. Instead, orchestrate all agents visibly together, even when there is hierarchy. Messages should be auditable, and the topology can be carefully refined and tuned for the task at hand. Other tools are significantly better at being this layer (e.g. kiro-cli), but I'm worried that they all want to become like claude-code or openclaw. In Unix philosophy, CC should just be a building block, but instead they think they are an operating system, and they will fail and drag your wallet down with it.
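A minimal sketch of the visible-orchestration idea (hypothetical code, not kiro-cli's or Claude Code's actual API): route every inter-agent message through a single auditable log instead of hiding it inside subagent calls.

```python
from dataclasses import dataclass, field

@dataclass
class Orchestrator:
    # Every message between agents is recorded here, so the full
    # topology can be inspected and tuned rather than hidden.
    log: list = field(default_factory=list)

    def send(self, sender: str, recipient: str, content: str) -> str:
        self.log.append((sender, recipient, content))
        return content

orch = Orchestrator()
orch.send("planner", "coder", "implement the parser for task 1")
orch.send("coder", "reviewer", "diff ready for review")

# The message log doubles as an audit trail of who talked to whom.
for sender, recipient, content in orch.log:
    print(f"{sender} -> {recipient}: {content}")
```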

Asmod4n

I’ve tried to use Claude Code for a month now. It has a 100% failure rate so far. Compare that to creating a Project and just chatting with it, which solves nearly everything I’ve thrown at it. That’s with a Pro plan and using Sonnet, since Opus drains all the tokens for a Claude Code session with one request.
