Prompt-caching – auto-injects Anthropic cache breakpoints (90% token savings)
ermis
68 points
27 comments
March 13, 2026
Related Discussions
Found 5 related stories in 48.8ms across 3,471 title embeddings via pgvector HNSW
- Sandboxing AI agents, 100x faster kentonv · 33 pts · March 24, 2026 · 53% similar
- Prompt Engineering for Humans mooreds · 14 pts · March 31, 2026 · 53% similar
- Show HN: A prompt that builds the most capable AI agent system fainir · 12 pts · March 28, 2026 · 52% similar
- Anthropic Subprocessor Changes tencentshill · 56 pts · March 26, 2026 · 50% similar
- Promptfoo Is Joining OpenAI Areibman · 25 pts · March 09, 2026 · 49% similar
Discussion Highlights (9 comments)
spiderfarmer
Will this work for Cowork as well?
somesnm
Hasn't this been largely solved by auto-caching introduced recently by Anthropic, where you pass "cache_control": {"type": "ephemeral"} in your request and it puts breakpoints automatically? https://platform.claude.com/docs/en/build-with-claude/prompt...
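The `cache_control` breakpoint the comment refers to can be sketched as a Messages API payload. This is a minimal illustration, not the plugin's code: the model id and `LONG_SYSTEM_PROMPT` are placeholders, and the payload is shown as a plain dict rather than an SDK call. The marked system block tells the API to cache everything up to and including that block, so repeat requests reuse it instead of reprocessing it.

```python
# Sketch of a Messages API request with an explicit prompt-cache breakpoint.
# LONG_SYSTEM_PROMPT and the model id are placeholders, not real values.
LONG_SYSTEM_PROMPT = "You are a helpful assistant. " * 200  # stand-in for a large, stable prompt

payload = {
    "model": "claude-sonnet-4-5",  # assumed model id; use whatever you deploy
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Cache breakpoint: everything up to this block is cached,
            # so subsequent requests with the same prefix hit the cache.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize our refund policy."}],
}

print(payload["system"][0]["cache_control"]["type"])  # -> ephemeral
```

In practice you would pass the same structure to `client.messages.create(**payload)` with the official `anthropic` SDK; the caching behavior is driven entirely by the `cache_control` field, not by any client-side logic.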
mijoharas
I don't understand, Claude code already has automatic prompt caching built in.[0] How does this change things? [0] https://code.claude.com/docs/en/costs
katspaugh
> This plugin is built for developers building their own applications with the Anthropic API.
> Important note for Claude Code users: Claude Code already handles prompt caching automatically for its own API calls — system prompts, tool definitions, and conversation history are cached out of the box.

Source: their GitHub
fschuett
Slightly off-topic, but I recently tested some tool and it turns out Opus is far cheaper than Sonnet, because it produces far fewer output tokens, and those are what's expensive. Sonnet is also much slower than Opus (I did 9 runs comparing Haiku, Sonnet and Opus on the same problem). I'd assumed "oh, Sonnet is more lightweight and cheaper than Opus"; no, that's actually just marketing.
adi_pradhan
This is applicable only to the API, from what I understand, since Claude Code already caches quite aggressively (try npx ccusage). Also, the Anthropic API already introduced prompt caching: https://platform.claude.com/docs/en/build-with-claude/prompt... What is new here?
numlocked
As per its own FAQ, this plugin is out of date and doesn't actually do anything incremental re: caching:
> "Hasn't Anthropic's new auto-caching feature solved this?"
> Largely, yes — Anthropic's automatic caching (passing "cache_control": {"type": "ephemeral"} at the top level) handles breakpoint placement automatically now. This plugin predates that feature and originally filled that gap.
Slav_fixflex
Interesting – I've been using Claude heavily for building projects without writing code myself. Token costs add up fast, so anything that reduces them is welcome. Has anyone tested this in production workflows?
joemazerino
Firing off a cache write costs 1.25x input tokens IIRC, meaning non-repeatable tasks will cost more in the long run.
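The break-even point behind this comment can be worked out with a quick back-of-envelope calculation. This sketch assumes Anthropic's published multipliers for the 5-minute cache (writes at ~1.25x the base input price, reads at ~0.1x); a prompt that is never reused pays the write premium and gets nothing back, while even a single reuse already comes out well ahead.

```python
# Break-even sketch for prompt caching, assuming published multipliers:
# cache writes ~1.25x base input price, cache reads ~0.1x.
WRITE_MULT = 1.25  # first request writes the cache (premium over plain input)
READ_MULT = 0.10   # each subsequent hit reads the cached prefix

def relative_cost(reuses: int) -> float:
    """Cost of the cached path (1 write + `reuses` reads) relative to
    sending the same prompt uncached (1 + reuses) times."""
    cached = WRITE_MULT + reuses * READ_MULT
    uncached = 1 + reuses
    return cached / uncached

print(round(relative_cost(0), 3))  # one-shot prompt: 1.25 -> 25% more expensive
print(round(relative_cost(1), 3))  # one reuse: 1.35 / 2 = 0.675 -> already cheaper
print(round(relative_cost(9), 3))  # heavy reuse: 2.15 / 10 = 0.215 -> ~78% saved
```

So the commenter's point holds: caching only pays off when the prefix is reused at least once within the cache's lifetime; truly one-off prompts are strictly more expensive to cache.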