Prompt-caching – auto-injects Anthropic cache breakpoints (90% token savings)

ermis 68 points 27 comments March 13, 2026
prompt-caching.ai · View on Hacker News

Discussion Highlights (9 comments)

spiderfarmer

Will this work for Cowork as well?

somesnm

Hasn't this been largely solved by the auto-caching Anthropic introduced recently, where you pass "cache_control": {"type": "ephemeral"} in your request and it places breakpoints automatically? https://platform.claude.com/docs/en/build-with-claude/prompt...
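For context, here is a minimal sketch of the explicit breakpoint placement this comment refers to: a Messages API payload with cache_control attached to a system content block, so the prefix up to the breakpoint can be served from cache. The model id and prompt text are placeholders; auto-caching instead accepts the field at the top level, per the linked docs.

```python
# Sketch of an Anthropic Messages API payload with an explicit
# prompt-caching breakpoint on the system prompt. Model id and
# text are placeholders, not a recommendation.
def build_payload(system_text: str, user_text: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Breakpoint: everything up to and including this
                # block is eligible for caching on later requests.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

payload = build_payload("You are a helpful assistant. " * 200, "Hi")
```

The point of the newer auto-caching is that this manual placement is no longer needed for the common case.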

mijoharas

I don't understand, Claude code already has automatic prompt caching built in.[0] How does this change things? [0] https://code.claude.com/docs/en/costs

katspaugh

> This plugin is built for developers building their own applications with the Anthropic API.
> Important note for Claude Code users: Claude Code already handles prompt caching automatically for its own API calls — system prompts, tool definitions, and conversation history are cached out of the box.

Source: their GitHub

fschuett

Slightly off-topic, but I recently tested some tool and it turns out Opus is far cheaper than Sonnet, because it produces far fewer output tokens, and those are what's expensive. Sonnet is also much slower than Opus (I did 9 runs comparing Haiku, Sonnet, and Opus on the same problem). I also thought "oh, Sonnet is more lightweight and cheaper than Opus", but no, that's actually just marketing.

adi_pradhan

This is applicable only to the API, from what I understand, since Claude Code already caches quite aggressively (try npx ccusage). Also, the Anthropic API already introduced prompt caching: https://platform.claude.com/docs/en/build-with-claude/prompt... What is new here?

numlocked

As per its own FAQ, this plugin is out of date and doesn't actually do anything incremental re: caching:

> "Hasn't Anthropic's new auto-caching feature solved this?"
> Largely, yes — Anthropic's automatic caching (passing "cache_control": {"type": "ephemeral"} at the top level) handles breakpoint placement automatically now. This plugin predates that feature and originally filled that gap.

Slav_fixflex

Interesting – I've been using Claude heavily for building projects without writing code myself. Token costs add up fast, so anything that reduces them is welcome. Has anyone tested this in production workflows?

joemazerino

Cache writes cost 1.2x tokens IIRC, meaning non-repeatable tasks will cost more in the long run.
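A quick back-of-the-envelope sketch of that trade-off, assuming Anthropic's published multipliers for the 5-minute cache (writes at 1.25x the base input price, reads at 0.1x; the 1.2x figure above is close). The break-even logic, not the exact numbers, is the point:

```python
# Assumed multipliers (5-minute cache): a cache write is billed at
# 1.25x the base input-token price, a cache read at 0.1x.
WRITE_MULT = 1.25
READ_MULT = 0.10

def relative_cost(calls: int) -> float:
    """Input cost of `calls` requests sharing one cached prefix,
    relative to sending the same prefix uncached every time."""
    uncached = calls * 1.0                        # pay full price each call
    cached = WRITE_MULT + (calls - 1) * READ_MULT # one write, then reads
    return cached / uncached

# A one-off task pays the write premium with no reads to amortize
# it: relative_cost(1) == 1.25, i.e. 25% more than uncached.
# Repeated tasks amortize quickly: relative_cost(10) ~= 0.215.
```

This matches the comment: caching only pays off once the same prefix is reused, and a single non-repeatable call is strictly more expensive.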
