Caveman: Why use many token when few token do trick

tosh 740 points 325 comments April 05, 2026
github.com · View on Hacker News

Discussion Highlights (20 comments)

andai

No articles, no pleasantries, and no hedging. He has combined the best of Slavic and Germanic culture into one :)

ArekDymalski

While really useful now, I'm afraid that in the long run it might accelerate the language atrophy that is already happening. I still remember when people used to enter full questions into Google and write SMS messages with capital letters, commas and periods.

TeMPOraL

Oh boy. Someone didn't get the memo that for LLMs, tokens are units of thinking. Whatever feat of computation needs to happen to produce the results you seek, it needs to fit in the tokens the LLM produces. Being a finite system, there's only so much computation the LLM's internal structure can do per token, so the more you force the model to be concise, the more difficult the task becomes for it - in the worst case, you're guaranteed not to get a good answer, because it requires more computation than is possible within the tokens produced. I.e., by demanding that the model be concise, you're literally making it dumber. (Separating "chain of thought" out into "thinking mode" and removing user control over it definitely helped with this problem.)

andai

So it's a prompt to turn Jarvis into Hulk!

zahirbmirza

You can also make huge spelling mistakes and use incomplete words with llms they just sem to know better than any spl chk wht you mean. I use such speak to cut my time spent typing to them.

VadimPR

Wouldn't this affect output quality negatively? Thanks to chain of thought, having the LLM be explicit in its output lets it produce higher-quality answers.

teekert

Idk I try talk like cavemen to claude. Claude seems answer less good. We have more misunderstandings. Feel like sometimes need more words in total to explain previous instructions. Also less context is more damage if typo. Who agrees? Could be just feeling I have. I often ad fluff. Feels like better result from LLM. Me think LLM also get less thinking and less info from own previous replies if talk like caveman.

bhwoo48

I was actually worried about high token costs while building my own project (infra bundle generator), and this gave me a good laugh + some solid ideas. 75% reduction is insane. Starred

ryanschaefer

Kinda ironic that this description is so verbose.

> Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens", "be brief", or invokes /caveman

For the first part of this: couldn't it just be a UserPromptSubmit hook with a regex against these phrases? See additionalContext in the JSON output of a script: https://code.claude.com/docs/en/hooks#structured-json-output

For the second part, /caveman will always invoke the skill: https://code.claude.com/docs/en/skills
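A minimal sketch of the hook approach the comment describes, assuming Claude Code's `UserPromptSubmit` hook receives the event as JSON on stdin (with the user's text under a `prompt` key) and accepts `hookSpecificOutput.additionalContext` in the JSON it prints, per the linked docs. The trigger phrases are taken from the skill description quoted above; the injected instruction text is a made-up placeholder:

```python
import json
import re
import sys

# Trigger phrases lifted from the skill description; matched case-insensitively.
CAVEMAN_TRIGGERS = re.compile(
    r"caveman mode|talk like caveman|use caveman|less tokens|be brief|/caveman",
    re.IGNORECASE,
)

def build_hook_output(prompt: str) -> dict:
    """Build the hook's JSON result: inject caveman instructions as
    additionalContext when a trigger phrase appears in the prompt."""
    if CAVEMAN_TRIGGERS.search(prompt):
        return {
            "hookSpecificOutput": {
                "hookEventName": "UserPromptSubmit",
                # Placeholder instruction; the real skill's prompt would go here.
                "additionalContext": "Reply in caveman speak. Few token. No fluff.",
            }
        }
    return {}  # empty object: no extra context added

def main() -> None:
    # Claude Code invokes the hook with the event JSON on stdin;
    # whatever the script prints to stdout is parsed as the hook result.
    event = json.load(sys.stdin)
    print(json.dumps(build_hook_output(event.get("prompt", ""))))
```

Registered as a `UserPromptSubmit` hook, this would fire before the model sees the prompt, which is exactly why a plain regex suffices where the skill description needs a paragraph of trigger phrases.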

Hard_Space

Also see https://arxiv.org/pdf/2604.00025 ('Brevity Constraints Reverse Performance Hierarchies in Language Models' March 2026)

saidnooneever

LOL, it actually reads how humans reply - the name is too clever :'). Not sure how effective it will be at driving down costs, but honestly it will make my day not to have to read through entire essays about some trivial solution. tl;dr: Claude skill, short output, ++good.

gozzoo

I think this could be very useful not when we talk to the agent, but when the agents talk back to us. Usually, they generate so much text that it becomes impossible to follow through. If we receive short, focused messages, the interaction will be much more efficient. This should be true for all conversational agents, not only coding agents.

virtualritz

This is the best thing since I asked Claude to address me in third person as "Your Eminence". But combining this with caveman? Gold!

bogtog

I'd be curious if there were some measurements of the final effects, since presumably models won't <think> in caveman speak, nor code like that.

stared

I would prefer to talk like Abathur ( https://www.youtube.com/watch?v=pw_GN3v-0Ls ). Same efficiency but smarter.

cadamsdotcom

Caveman need invent chalk and chart make argument backed by more than good feel.

rschiavone

This trick reminds me of "OpenAI charges by the minute, so speed up your audio" https://news.ycombinator.com/item?id=44376989

nayroclade

Cute idea, but you're never gonna blow your token budget on output. Input tokens are the bottleneck, because the agent's ingesting swathes of skills, directory trees, code files, tool outputs, etc. The output is generally a few hundred lines of code and a bit of natural language explanation.

doe88

> If caveman save you mass token, mass money — leave mass star. Mass fun. Starred.

vivid242

Great idea - if the person who made it is reading: is this based on the board game "Poetry for Cavemen"? (Explain things using only single-syllable words; it even comes with an inflatable log of wood for hitting each other!)
