I cancelled Claude: Token issues, declining quality, and poor support
y42
845 points
499 comments
April 24, 2026
Related Discussions
Found 5 related stories in 65.7ms across 5,498 title embeddings via pgvector HNSW
- Claude is getting worse, according to Claude lschueller · 24 pts · April 13, 2026 · 63% similar
- Thoughts and feelings around Claude Design cdrnsf · 272 pts · April 18, 2026 · 59% similar
- I stopped hitting Claude's usage limits – things I changed taubek · 23 pts · April 06, 2026 · 57% similar
- Claude Code is not making your product better bblcla · 15 pts · April 22, 2026 · 56% similar
- Universal Claude.md – cut Claude output tokens killme2008 · 231 pts · March 31, 2026 · 55% similar
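The related-stories header above describes an approximate nearest-neighbour search over title embeddings (pgvector's HNSW index). The exact ranking that index approximates is plain cosine similarity over vectors, which can be sketched in Python; the titles and tiny 2-D embeddings below are made up for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_embedding, titled_embeddings, k=5):
    """Rank (title, embedding) pairs by similarity to the query.

    An HNSW index (as in pgvector) returns approximately this result
    without scanning every row.
    """
    scored = [(cosine_similarity(query_embedding, emb), title)
              for title, emb in titled_embeddings]
    scored.sort(reverse=True)
    return scored[:k]
```

A real deployment would store the embeddings in Postgres and let an `ORDER BY embedding <=> query LIMIT 5` clause use the HNSW index instead of scanning all 5,498 titles.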
Discussion Highlights (19 comments)
wilbur_whateley
Claude with Sonnet medium effort just used 100% of my session limit, some extra dollars, thought for 53 minutes, and said: API Error: Claude's response exceeded the 32000 output token maximum. To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.
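The variable named in that error message can be set in the shell before launching Claude Code. A minimal sketch; the variable name comes from the error above, but the value 64000 is an arbitrary example, not a documented limit:

```shell
# Raise the output-token cap before starting Claude Code.
# 64000 is an example value, not a documented default or maximum.
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
echo "$CLAUDE_CODE_MAX_OUTPUT_TOKENS"
```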
easythrees
I have to say, this has been the opposite of my experience. If anything, I have moved over more work from ChatGPT to Claude.
throwaway2027
Same. I think one of the issues is that Claude reached a threshold where I could just rely on it being good and fix things up manually less and less, while other models hadn't reached that point yet, so I was aware of that and knew I had to do a second pass or more. Other providers also move you to a worse model after you run out, which is key in setting expectations as well. Developers knew that was the trade-off. I think even with the worse limits people still hated it, but when you start to make the model dumber, on purpose or inadvertently, that's when there's really no reason to keep using Claude anymore.
jwaldrip
I would love to just say that if you are using Claude Code, you should not be on Pro. I feel like all the people complaining are complaining that an agent can't handle the work of a developer for $20/m. Get on at least Max 5; it's a world of difference.
zendarr
Seems like some of the token issues may be corrected now https://www.anthropic.com/engineering/april-23-postmortem
varispeed
It also seems to me they route prompts to cheaper, dumber models that present themselves as e.g. Opus 4.7. Perhaps that's what "adaptive reasoning" is, aka we'll route your request to something like Qwen saying it's Opus. Sometimes I get a good model, so I've found I'll ask a difficult question first, and if the answer is dumb, I terminate the session and start again, and only then go with the real prompt. But there is no guarantee the model won't be downgraded mid-session. I wish they just charged the real price and stopped these shenanigans. It wastes so much time.
DeathArrow
I use Claude Code with GLM, Kimi and MiniMax models. :) I was worried about Anthropic's model quality varying and about Anthropic jacking up prices. I don't think Claude Code is the best agent orchestrator and harness in existence, but it's the most widely supported by plugins and skills.
cbg0
I've been a fan since the launch of the first Sonnet model and big props for standing up to the government, but you can sure lose that good faith fast when you piss off your paying customers with bad communication, shaky model quality and lowered usage limits.
giancarlostoro
I'm torn, because I use it in my spare time, so I've missed some of these issues; I don't use it 9 to 5. But I've built some amazing things. When 1 million tokens dropped, that was peak Claude Code for me; it was also when I suspect their issues started. I've built some things I'd been drafting in my head for ages but never had time for, and I can review the code and refine it until it looks good. I'm debating trying out Codex; from some people I hear it's "uncapped", from others I hear they hit limits in short spans of time. There's also the really obnoxious "trust me bro" documentation update from OpenClaw, where they claim Anthropic is allowing OpenClaw usage again, but no official statement? Dear Anthropic: I would love to build a custom harness that just uses my Claude Code subscription. I promise I won't leave it running 24/7, 365. Can you please tell me how I can do this? I don't want to see some obscure tweet; make official blog posts or documentation pages to reflect policies. Can I get whitelisted for "sane use" of my Claude Code subscription? I would love this. I am not dropping $2,400 in credits for something I do for fun in my free time.
janwillemb
This is what worries me. People become dependent on these GenAI products that are proprietary, not transparent, and need a subscription. People build on them like they are a solid foundation. But all of a sudden the owner just pulls the foundation out from under your building.
drunken_thor
AI services have only a minor incentive to reduce token usage. They want high token usage; it makes you pay more. They are going to keep testing where the limit is, what the maximum token usage is before you get angry. All the AI companies will keep trading places on token use and cost as costs increase. We are sitting in tepid water pretending it's a bath, pretending we aren't about to be boiled frogs.
zkmon
Yesterday was a realization point for me. I gave a simple extraction task to Claude Code with a local LLM and it "whirred" and "purred" for 10 minutes. Then I submitted the same data and prompt directly to the model via the llama_cpp chat UI and the model single-shotted it in under a minute. So obviously something is wrong with the coding agent or the way it is talking to the LLM. Now I'm looking for an extremely simple open-source coding agent. Nanocoder doesn't seem to install on my Mac and it brings node_modules bloat, so no. Opencode seems not quite open-source. For now, I'm doing the work of the coding agent myself and using the llama_cpp web UI. Chugging along fine.
hedgehog
I used Opus via Copilot until December and then largely switched over to Claude Code. I'm not sure what the difference is but I haven't seen any of these issues in daily use.
bad_haircut72
Waiting 60s every time I send a message really kills the UX of Claude.
bauerd
They can't afford to care about individual customers because enterprise demand exploded and they're short on compute
aleqs
The usage metering is just so incredibly inconsistent: sometimes 4 parallel Opus sessions for 3 hours straight on max effort only use up 70% of a session; other times 20 minutes / 3 prompts in one session completely max it out (Max x20 plan). Is this just a bug on Anthropic's side, or is the usage metering just completely opaque and arbitrary?
stldev
Same, after being a long-time proponent too. First was the CC adaptive thinking change, then 4.7. Even with `/effort max` and keeping under 20% of 1M context, the quality degradation is obvious. I don't understand their strategy here.
siliconc0w
Shameless self-plug, but I'm also worried about silent quality regressions, so I started building a tool to track coding-agent performance over time: https://github.com/s1liconcow/repogauge Here is a sample report that tries out the cheaper models plus the newest Kimi 2.6 model against the 5.4 'gold' test cases from the repo: https://repogauge.org/sample_report
lanthissa
For all the drama, it's pretty clear that OpenAI, Google, and Anthropic have all had to degrade some of their products because of a lack of supply. There's really no immediate solution to this other than letting the price float or limiting users; as capacity is built out, this gets better.