We cut Claude's token usage 79% by redesigning our CLI for agents
glenngillen
11 points
4 comments
May 19, 2026
Related Discussions
- Universal Claude.md – cut Claude output tokens · killme2008 · 231 pts · March 31, 2026 · 65% similar
- Measuring Claude 4.7's tokenizer costs · aray07 · 574 pts · April 17, 2026 · 58% similar
- I cancelled Claude: Token issues, declining quality, and poor support · y42 · 845 pts · April 24, 2026 · 57% similar
- Claude Managed Agents · adocomplete · 152 pts · April 08, 2026 · 56% similar
- Reallocating $100/Month Claude Code Spend to Zed and OpenRouter · kisamoto · 319 pts · April 09, 2026 · 56% similar
Discussion Highlights (3 comments)
akh
Co-founder of Infracost here. We launched Infracost on HN five years ago, when the CLI just generated cost estimates for Terraform. Earlier this year we were scoping a 1.0 release in which the CLI would stop being just a cost-estimation tool and start surfacing the issues behind the costs: previous-generation instances, policy violations, and the other kinds of problems a thorough PR review would catch. Then agent traffic started showing up, and it became clear the 1.0 scope was the right idea aimed at the wrong caller. A human reviewer reads a PR comment; an agent runs `infracost inspect --filter` ... and gets the same insight as a tabular row it can pipe into the next step. So we decided to skip the planned 1.0 release and go straight to 2.0, which treats agents as first-class users of the CLI. Along the way we picked up some lessons on reducing users' token usage through CLI design, and we want to share them with the HN community since other CLI builders might benefit.
dividendflow
Designing interfaces specifically for agents (M2M DX) is a fascinating shift from traditional human-centric CLI design. We're moving from a world where "pretty" output and progress bars mattered to a world where raw, structured density is the goal. A 79% reduction is massive, but I wonder if we’ll see a new type of "Agent-Optimized" protocol emerge that completely bypasses the text-heavy nature of current CLIs. The overhead of an LLM trying to parse "human" terminal output is essentially a tax on every call.
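To make the "tax" concrete, here is a rough Python sketch that renders the same rows twice: once as a padded, box-drawn table and once as bare tab-separated lines. The rows, column names, and both renderers are invented for illustration and are not Infracost's actual output. Character count is only a crude proxy for token count, so treat the ratio as directional.

```python
# Hypothetical rows; field names and values are invented for illustration.
HEADER = ("resource", "type", "issue")
ROWS = [
    ("aws_instance.web", "m4.large", "previous-generation instance"),
    ("aws_instance.worker", "m3.medium", "previous-generation instance"),
]


def render_human(header, rows):
    """Box-drawn, padded table: the kind of output tuned for human eyes."""
    table = [header, *rows]
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(row):
        cells = (cell.ljust(w) for cell, w in zip(row, widths))
        return "| " + " | ".join(cells) + " |"

    out = [rule, line(header), rule]
    out += [line(r) for r in rows]
    out.append(rule)
    return "\n".join(out)


def render_agent(header, rows):
    """Tab-separated lines with no borders or padding."""
    return "\n".join("\t".join(r) for r in [header, *rows])


human = render_human(HEADER, ROWS)
agent = render_agent(HEADER, ROWS)

# Fraction of characters saved by dropping the decoration.
saving = 1 - len(agent) / len(human)
print(f"human: {len(human)} chars, agent: {len(agent)} chars, saved: {saving:.0%}")
```

The borders and padding carry no information the agent needs, so the compact form drops them while keeping every field.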
tommy29tmar
I'd treat the agent-facing output as an API, not just a display format. Once prompts and tools depend on it, a harmless CLI cleanup can break behavior the same way changing a JSON field would. The win here seems less about token count by itself and more about reducing inference from terminal decoration.
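One way to act on this is a contract test that pins the columns agents depend on, so a renamed header fails loudly instead of silently changing agent behavior. The sketch below assumes a hypothetical tab-separated output with a header row; the column names and sample data are invented and do not describe any real CLI's format.

```python
import csv
import io

# Hypothetical contract: the columns that prompts and tools rely on.
EXPECTED_COLUMNS = ["resource", "type", "issue"]


def parse_agent_output(text):
    """Parse tab-separated output, rejecting any change to the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(
            f"output contract changed: expected {EXPECTED_COLUMNS}, "
            f"got {reader.fieldnames}"
        )
    return list(reader)


# Sample output that honors the contract.
good = (
    "resource\ttype\tissue\n"
    "aws_instance.web\tm4.large\tprevious-generation instance\n"
)

# A "harmless cleanup" that renames one header column.
renamed = good.replace("issue", "finding", 1)

rows = parse_agent_output(good)
print(rows[0]["type"])

try:
    parse_agent_output(renamed)
except ValueError as err:
    print(err)
```

Running a check like this in CI treats the header like a versioned JSON schema: changing it becomes a deliberate, reviewed decision instead of a side effect of tidying the output.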