Apple Silicon costs more than OpenRouter

datadrivenangel 309 points 264 comments May 17, 2026
www.williamangel.net

Discussion Highlights (20 comments)

synthos

How much does your data privacy cost?

SecretDreams

Will this cost structure always be this way, and are there other benefits to not running your LLM in the cloud? E.g.: privacy, uptime, future cost structure controls. This is a field that has moved very quickly, and it has moved in a direction that tries to trap users into certain habits. But these habits might not align with what best benefits end users today or at some time in the future.

SpyCoder77

OpenRouter doesn't cost money per se; it depends on the provider's pricing.

applfanboysbgon

Unless I'm misunderstanding, this is counting the entire laptop in the cost of generating tokens. The calculation seems to omit that, in addition to receiving LLM output, you have also received a laptop in exchange for your money. If you intend to put this machine in a dark corner and run it solely as a token-munching server, a laptop would be an exceptionally poor choice of technology for this purpose. But if you intend to use the laptop as a laptop, having a laptop is a pretty big benefit over not having a laptop. You also get the benefit of privacy, freedom from censorship, and control over the model used (i.e. it will not be rugpulled on you in three months after you've built a workflow around a specific model's idiosyncrasies).

an0malous

OpenRouter and other LLM platforms are being subsidized by VC investment to charge less than it costs them to run inference; the MacBook Pro is not.

maho

The author only compared output token costs -- but for typical agentic workloads, input tokens dominate the costs by a large margin. Running inference locally, input tokens are, to first order, free. (They only generate implicit costs through higher time-to-first-token, higher power use, and lower token output speed).
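To make the input-versus-output point concrete, here is a minimal sketch of how an API bill splits between the two. The token counts and per-million-token prices are hypothetical placeholders chosen for illustration, not figures from the article or from any provider's price list.

```python
def api_cost_usd(input_tokens, output_tokens, usd_per_mtok_in, usd_per_mtok_out):
    """Return (input_cost, output_cost) in USD for one workload."""
    input_cost = input_tokens / 1e6 * usd_per_mtok_in
    output_cost = output_tokens / 1e6 * usd_per_mtok_out
    return input_cost, output_cost


# Hypothetical agentic session: the context is re-sent on every tool call,
# so input tokens outnumber output tokens 50 to 1. Prices are assumptions.
in_cost, out_cost = api_cost_usd(
    input_tokens=2_000_000,
    output_tokens=40_000,
    usd_per_mtok_in=0.10,
    usd_per_mtok_out=0.40,
)
input_share = in_cost / (in_cost + out_cost)
print(f"input ${in_cost:.3f}, output ${out_cost:.3f}, input share {input_share:.0%}")
```

Under these assumed numbers the input side is most of the bill even though the per-token input price is lower, which is the effect an output-only comparison would miss.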

bilekas

I don't hear people debating which is cheaper, local or cloud-run models. The conversation, at least what I hear, is that a lot of the time users are not using an awful lot of tokens, and those providers get paid even if you never use them. 80-90% of the work my team and I are doing with AI is grunt work: write tests for this, implement an FFT here, write the DB query for X. Nothing exhausting. Those who are using AI for whole-cloth "vibe coded" applications and services are definitely better suited to cloud. If a work laptop can run my local models and get the performance my work needs for development, why wouldn't I as a company prefer that? Add to that the privacy improvements, data protection, and potentially more specialized inference if needed, and it's a no-brainer. Again, AI is a tool, and you want the right tool for the job. I would wager, with no evidence looked up, that the majority of devs would be happy with 10-30 tokens per second locally.

regexorcist

I simply can't go back to cloud AI. Privacy and full control are more important to me than speed and SOTA models.

JSR_FDED

Wouldn’t a Mac Mini be a better comparison?

bastawhiz

This isn't a good analysis, and it's because it keeps rounding everything up. The author rounds up the cost of electricity by 10%. He has a range of power use, takes the high end (which is 2x the low end) and multiplies it by the inflated electricity cost. But then he talks about using a newly purchased Mac to do the inference, running at full capacity, 24/7. Why would you do that? Apple silicon is fast but the author points out: you're only getting 10-40 tokens per second. It's not bad, but it's not meant for this! It's comparing apples to oranges. Yeah, data centers don't pay residential electricity rates. Data centers use chips that are power efficient. Data centers use chips that aren't designed to be a Mac. Apple silicon works out pretty well if you're not burning tokens 24/7/365 and you're not buying hardware specifically to do it. I use my Mac Studio a few times a week for things that I need it for, but I can run ollama on it over the tailnet "for free". The economics work when I'm not trying to make my Mac Studio behave like an H100 cluster with liquid cooling. Which should come as no surprise to anyone: more tokens per watt on hardware that's multi-tenant with cheap electricity will pretty much always win.

nu11ptr

"Accelerated depreciation (if any) from shortening the lifespan of the device will be more expensive than the electricity" Shortening the lifespan?

mrtimeman

The full-amortization framing is doing a lot of work here. I bought my laptop because I needed a laptop, not as an inference box, and running a model on it is incidental to that. Once the hardware is sunk for other reasons, the only cost left is electricity plus whatever depreciation you accelerate by hammering the SoC, which the post actually acknowledges in one parenthetical before allocating the full $4299 to tokens anyway. Also nobody I know picks local over OpenRouter on price. They pick it for offline, for data not leaving the machine, for no rate limits, for not having a provider go down mid-task. If $/Mtok is the only axis, sure, cloud wins. In practice the pattern I see is leaving a small model running on easy background tasks while using the laptop normally, not a dedicated inference box hammered flat out for 5 years.
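The gap between the two framings can be sketched numerically. This is a rough illustration, not the article's method: the $4299 price and the 5-year horizon come from the thread, 100 W and $0.20/kWh are the high-end figures quoted from the post, and 25 tokens per second is an assumed midpoint of the 10-40 range mentioned elsewhere in the discussion.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 24 * 365


def electricity_usd_per_mtok(watts, usd_per_kwh, tokens_per_sec):
    """Marginal electricity cost per million generated tokens."""
    usd_per_hour = watts / 1000 * usd_per_kwh
    mtok_per_hour = tokens_per_sec * SECONDS_PER_HOUR / 1e6
    return usd_per_hour / mtok_per_hour


def hardware_usd_per_mtok(price_usd, years, tokens_per_sec, utilization=1.0):
    """Purchase price spread over every token generated in `years`.

    `utilization` is the fraction of wall-clock time spent generating.
    """
    total_mtok = (
        tokens_per_sec * SECONDS_PER_HOUR * HOURS_PER_YEAR * years * utilization / 1e6
    )
    return price_usd / total_mtok


TOKENS_PER_SEC = 25  # assumption: midpoint of the 10-40 tok/s range

marginal = electricity_usd_per_mtok(100, 0.20, TOKENS_PER_SEC)
amortized = hardware_usd_per_mtok(4299, 5, TOKENS_PER_SEC)

print(f"sunk hardware (electricity only): ${marginal:.2f} per Mtok")
print(f"full amortization, 24/7 for 5 years: ${marginal + amortized:.2f} per Mtok")
```

Lowering `utilization` raises the amortized figure quickly, which is why allocating the whole purchase price to tokens only makes sense for a machine bought purely as an inference box.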

michaelbuckbee

A slightly different slice into a very similar situation (local vs OpenRouter AI inference): in _every_ metric other than privacy it was better to run via OpenRouter than a local model, and not by a small amount. Direct link to the comparison charts: https://sendcheckit.com/blog/ai-powered-subject-line-alterna...

Havoc

I like that the numbers were crunched, but the answer to these is always a bit of a foregone conclusion.
* Industrial power pricing
* Wholesale hardware pricing
* Utilization density of shared API
means API always wins a cost shootout. Privacy & tinkering is cool too though

panny

Your laptop AI costs too much? Speculative investors can help!

freakynit

So I did the India-specific analysis for a tier-3 city. Here, electricity costs a third of the US price, and you also get a solar subsidy up to a certain amount. https://shorturl.at/q6gRE TL;DR: Hardware depreciation costs are the major factor. But if we assume ZERO hardware depreciation (not realistic), then local inference becomes super cheap: roughly 90%+ cheaper. Third case: break-even happens only if we can get, at the very least, 8.7 years of useful hardware life. A more realistic number, however, when working 8 hrs/day rather than 24 hrs/day, is around 25 years. So, for now, local inference is preferable if you deeply care about privacy. From a cost perspective, it's still not there.

newsclues

Local isn’t (just) about cost, it’s control and trust.

Der_Einzige

OpenRouter doesn't expose all the LLM sampling parameters/research that llama.cpp, vLLM, SGLang, et al. expose (so no high temperature/highly diverse outputs). Also OpenRouter doesn't let you use steering vectors or LoRA or other personalization techniques per-request. Also no true guarantees of ZDR/privacy/data sovereignty. Oh, and the author didn't mention anything at all related to inference optimization, so no idea if they even know about or enabled things like speculative decoding, optimized attention backends, quantization, etc. At least AI slop would have hit on far more of the things I listed above. This is worse-than-AI.

brisket_bronson

> Let's round up to $0.20 per kWh.
Next paragraph
> At ~50-100 watts and $0.18/kWh that's $0.009 or $0.018 per hour. $0.02 per hour. $0.48 cents per day for the electricity to be running inference at 100%.
lol
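The arithmetic in the quoted passages can be checked with a few lines. This is a minimal sketch using only the wattage range and the two electricity rates quoted above; the function name is made up for illustration.

```python
def hourly_cost_usd(watts, usd_per_kwh):
    """Electricity cost of running a load of `watts` for one hour."""
    return watts / 1000 * usd_per_kwh


for rate in (0.18, 0.20):
    low = hourly_cost_usd(50, rate)
    high = hourly_cost_usd(100, rate)
    print(
        f"${rate:.2f}/kWh: ${low:.3f} to ${high:.3f} per hour, "
        f"${high * 24:.2f} per day at the high end"
    )
```

The $0.009 and $0.018 figures match the $0.18 rate, while $0.02 per hour and $0.48 per day match 100 W at the rounded-up $0.20 rate, so the quoted paragraph mixes the two rates.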

Jayakumark

OP is comparing against Gemma everywhere but concludes that paying Anthropic makes more sense. Anthropic is $15 per million output tokens, which is 30-35x more expensive even on OpenRouter. This is like comparing an e-bike at home with an e-bike rental and concluding that we therefore need to rent a Toyota since it can go at similar speeds. Getting tired of bad posts getting so much attention.
