Local AI needs to be the norm

cylo 856 points 384 comments May 10, 2026
unix.foo

Discussion Highlights (20 comments)

sgt

I guess Google got that memo!

williamtrask

I wonder if a popularization moment for local AI will ultimately be the pin-prick that pops the AI bubble. Like the deepseek or openclaw moments but bigger/next.

Galanwe

I would love for local inference to be possible, but in my experience Kimi 2.6 is the only model that would be worth it, and it's a $10k (M3 Ultra, max spec'd; 30s TTFT, so kind of slowish) to $30k (RTX6000/700GB+ DDR5) upfront cost, noise and power consumption aside.

jjordan

It feels like we're one technological breakthrough away from all of these data centers that are going up being deemed irrelevant.

timeattack

My problem with LLMs (apart from philosophical aspects and economic impact) is that it would be unlikely for any of us to be able to train something functional locally (toy-like LLMs -- sure, but something really useful -- no). Besides requiring immense computing power, it also requires a dataset, which for the most part is obtained illegally.

revolvingthrow

A local Answer Machine is the dream, especially when the internet is decaying and generally on its last legs, but the hardware requirements seem like a huge mountain to climb. Things are progressing tremendously - deepseek v4 flash is very good for what it is - but even that goes beyond any reasonable local setup, which imo is 128 GB RAM + 16 GB VRAM. 4 RAM slots on a consumer board craters RAM speed, 256 GB Macs are too expensive, and even then the inference is ungodly slow. On the other hand… the v4 flash model is actual magic compared to what was available 2 years ago. If the rate of improvement stays as is, we’ll get similar performance in a ~120B model in a year, which is viable (if expensive) for everyman hardware. Possibly you’ll be able to run its equivalent on a ~$1200 laptop by 2028, which for me-in-2020 would sound straight out of a scifi movie. A good harness that lets the model fetch data from other sources like a local wikipedia copy from kiwix could do a lot for factual knowledge, too; there’s only so much you can encode in the model itself, but even a cheapish (pre-current prices) 2TB drive can hold an immense amount of LLM-accessible data. Big caveat: I don’t see local models for programming or generally demanding agentic tasks being worth it anytime soon. You likely want bleeding edge models for it, and speed is far more important. Chat at 20tok/s is fine; working on even a small codebase at 20tok/s, especially on a noticeably weaker model, is just a waste of time. Maybe it’s a PEBKAC but I have no idea how people make any meaningful use out of qwen 3.6.
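As a rough illustration of the numbers in the comment above, here is a back-of-the-envelope sketch in Python. Only the ~120B parameter count and the 20 tok/s rate come from the comment; the function names, the 4-bit and 8-bit quantization levels, and the 300-token and 50,000-token workloads are assumptions chosen for illustration. The memory estimate covers weights only and ignores KV cache and runtime overhead.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def generation_minutes(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock minutes to emit output_tokens at a steady decode rate."""
    return output_tokens / tokens_per_second / 60


if __name__ == "__main__":
    # Assumed quantization levels for a ~120B-parameter model.
    for bits in (4, 8):
        print(f"120B at {bits}-bit: {weights_gib(120, bits):.1f} GiB of weights")

    # Assumed workloads: a short chat reply vs. a long agentic coding session.
    for label, tokens in (("chat reply", 300), ("agentic session", 50_000)):
        minutes = generation_minutes(tokens, 20)
        print(f"{label}: {tokens} tokens at 20 tok/s takes {minutes:.2f} min")
```

Under these assumptions a 300-token chat reply takes 15 seconds at 20 tok/s, while 50,000 generated tokens take about 42 minutes, which is the gap between chat and coding workloads that the comment describes.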

agentifysh

Until the hardware is economical and powerful enough, local AI that can compete with frontier models today is still far off. If we could even get something like GPT 5.5 running locally that would be quite useful.

vegabook

> years ago I launched "The Brutalist Report"
Proceeds to brutalise the reader with an 88-point headline font.

hypfer

Same as local compute. Welcome back to 2014. Let us now continue yelling at the cloud.

artursapek

I'm someone who is trying to build a subscription-based business to cover underlying LLM costs, and I'm very hopeful that one day I can just sell a permanent license to the software instead, with customers using local LLMs to power it.

TheJCDenton

For the mainstream audience, the sentiment around local AI today is the same as the one they had around open source a few decades ago. For a few products, some paid solutions were so much more advanced that open source alternatives were very often completely overlooked. Why bother? And the like. Then we had captive SaaS and other platforms and now it's obviously wrong for most of us. The dependency we have on Anthropic and OpenAI for coding, for instance, is insane. Most accept it because either they don't care, or they just hope Chinese labs will never stop releasing open weights. The business model of open weights is very new, includes some power play between countries and labs, and moves an absurd amount of money without any concrete oversight from most people. It's a very dangerous gamble. Today incredible value is available for nearly everyone. But it may stop without any warning, for reasons outside our control.

shmerl

Depending on some remote AI provider is a major lock-in pitfall. But it's exactly what those AI providers want you to do.

cubefox

Local AI is a bit like wind parks. Everyone is in favor, except if they are in your own backyard. There was recently a huge outcry when Chrome shipped a local 4 GB AI model: https://news.ycombinator.com/item?id=48019219 I have to conclude that people would like to have powerful local AI but it should at the same time only be a tiny model. In which case it wouldn't be powerful.

barrkel

Local models are extraordinarily expensive if you're not maximizing throughput, and you're not going to be maximizing it. Local models need to be resident in expensive RAM, the kind that has fat pipes to compute. And if you have a local app, how do you take a dependency on whatever random model is installed? Does it support your tool calling complexity? Does it have multimodal input? Does it support system messages in the middle of the conversation or not? Is it dumb enough to need reminders all the time? Spend enough time building against local models and you'll see they're jagged in performance. You need to tune context size, trade off system message complexity with progressive disclosure. You simply can't rely on intelligence. A bunch of work goes into the harness. Meanwhile, third party inference is getting the benefits of scale. You only need to rent a timeslice of memory and compute. It's consistent and everybody gets the same experience. And yes, it needs paying for, but the economics are just better.
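The compatibility questions raised above could be made explicit in an app as a capability check. The following is a minimal Python sketch; `ModelCapabilities`, `missing_requirements`, and every field name are hypothetical and do not correspond to any real runtime's API. How an app would actually discover what an installed model supports is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCapabilities:
    """Hypothetical description of what a model supports (or what an app needs)."""
    tool_calling: bool
    multimodal_input: bool
    mid_conversation_system_messages: bool
    context_tokens: int


def missing_requirements(required: ModelCapabilities,
                         available: ModelCapabilities) -> list[str]:
    """Return the names of required capabilities the installed model lacks."""
    missing = []
    for flag in ("tool_calling", "multimodal_input",
                 "mid_conversation_system_messages"):
        # A flag only counts as missing if the app needs it and the model lacks it.
        if getattr(required, flag) and not getattr(available, flag):
            missing.append(flag)
    if available.context_tokens < required.context_tokens:
        missing.append("context_tokens")
    return missing


if __name__ == "__main__":
    # Illustrative values only; not measurements of any real model.
    app_needs = ModelCapabilities(True, False, True, 32_000)
    installed = ModelCapabilities(True, True, False, 8_192)
    print(missing_requirements(app_needs, installed))
```

A check like this only covers the yes/no questions. It says nothing about the jagged performance the comment describes, which still has to be handled by tuning the harness.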

vb-8448

> Use cloud models only when they’re genuinely necessary.
The problem is that it's much easier to use the SOTA models (especially if they are subsidized) instead of spending time fixing the knobs with the local one. I just realized this with coding agents: yeah, you probably shouldn't always use the latest version at xhigh, but you will end up doing it because you do the job in less time, with less "effort" and basically at the same price. I guess we'll see a real effort for local AI only when major vendors start billing based on actual token usage.

holtkam2

I wish I could upvote this twice. We (devs) really REALLY need to consider on-device compute before going to the cloud for LLM inference.

eyk19

Apple stock is going to skyrocket

mattlondon

Yet there is another post a few rows down where people are losing their shit that Chrome has a local LLM model that uses a couple of GB of space for local-inference. Damned if they do, damned if they don't.

dana321

"NO AI" needs to be the norm, we should be working on better ways of sharing information and better documentation instead of fighting with computers for substandard results.

wilg

Two issues:
1. Local models are likely to be more power-expensive to run (per-"unit-of-intelligence") than remote models, due to datacenter economies of scale. People do not like to engage with this point, but if you have environmental concerns about AI, this is a pretty important one.
2. Using dumb models for simple tasks seems like a good idea, but it ends up being pretty clear pretty quick that you just want the smartest model you can afford for absolutely every task.
