Things I Think I Think... Preferring Local OSS LLMs
zdw
43 points
9 comments
April 02, 2026
Related Discussions
Found 5 related stories in 51.7ms across 3,471 title embeddings via pgvector HNSW
- How I write software with LLMs indigodaddy · 69 pts · March 16, 2026 · 59% similar
- I don't use LLMs for programming ms7892 · 68 pts · March 12, 2026 · 55% similar
- Vitalik Buterin – "My self-sovereign / local / private / secure LLM setup" derrida · 25 pts · April 02, 2026 · 54% similar
- Reliable Software in the LLM Era mempirate · 102 pts · March 12, 2026 · 53% similar
- Thoughts on LLMs – Psychological Complications cdrnsf · 11 pts · March 24, 2026 · 53% similar
Discussion Highlights (4 comments)
farfatched
I'd like a local LLM too, but they're expensive (consider the opportunity cost of a GPU, if it sits idle most of the time), and produce heat and noise in places that I'm trying to cool and quiet. I'd like a private jet too, alas.
androiddrew
I love local-first. I am finding that a 120B MoE hits the sweet spot for local hosting. Right now that takes a $2K Strix Halo, a $4K GB10 machine, or a $5K Mac Pro. Two years from now I think hardware will take us back to the $2K-ish range with good performance. I love my dual-GPU setup (2× AMD Radeon R9700, 64GB VRAM), but it uses 5× the electricity of my GX10 (GB10 chip inside), and since layers are landing in system memory my TPS is half the GX10's. Now, a dense model like Devstral2 24B slaps on the dual-GPU setup. I just haven't gotten as much out of that as I have the 120B MoEs.
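A rough back-of-envelope (my numbers, not the commenter's) for why a 120B model spills out of 64GB of VRAM and lands layers in system memory:

```python
# Back-of-envelope weight-memory estimate for a quantized LLM.
# Hypothetical helper for illustration; a real deployment also needs
# KV cache, activations, and runtime overhead, so treat these as lower bounds.

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for `params_b` billion
    parameters at a given quantization width."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

# A 120B-parameter model at 4-bit needs ~60 GB for weights alone,
# which nearly fills a dual-32GB-GPU rig before any KV cache --
# hence layers spilling into (slower) system RAM and halved TPS.
print(f"120B @ 4-bit: ~{weight_gb(120, 4):.0f} GB")
print(f"120B @ 8-bit: ~{weight_gb(120, 8):.0f} GB")
print(f"24B dense @ 8-bit: ~{weight_gb(24, 8):.0f} GB")
```

This also shows why the dense 24B model runs well on the same rig: at 8-bit it is only ~24 GB of weights, fitting entirely in VRAM.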
jeromechoo
I think many developers worth their salt will argue the same. Cloud is and has always been a shortcut to buying your own hardware. Local models will get better and smaller. Qwen3-coder-next runs on a Spark and is as capable as Sonnet 4.5. Bonsai released a 1-bit model yesterday. I also like the freedom of not having to ration a daily allowance of tokens.
vlowther
MBP M5 Max. 128GB RAM. oMLX. unsloth-Qwen3-Coder-Next-mlx-8bit. opencode with the telemetry stripped out. This seems to be the sweet spot for now for my local dev. It helps me not accidentally blow through $100 in Claude tokens in a day when exploring different performance tradeoffs in the backend of my $DAYJOB codebase.