Things I Think I Think... Preferring Local OSS LLMs
zdw
43 points
9 comments
April 02, 2026
Related Discussions
Found 5 related stories in 51.7ms across 3,471 title embeddings via pgvector HNSW
- How I write software with LLMs indigodaddy · 69 pts · March 16, 2026 · 59% similar
- I don't use LLMs for programming ms7892 · 68 pts · March 12, 2026 · 55% similar
- Vitalik Buterin – "My self-sovereign / local / private / secure LLM setup" derrida · 25 pts · April 02, 2026 · 54% similar
- Reliable Software in the LLM Era mempirate · 102 pts · March 12, 2026 · 53% similar
- Thoughts on LLMs – Psychological Complications cdrnsf · 11 pts · March 24, 2026 · 53% similar
Discussion Highlights (4 comments)
farfatched
I'd like a local LLM too, but they're expensive (consider the opportunity cost of a GPU, if it sits idle most of the time), and produce heat and noise in places that I'm trying to cool and quiet. I'd like a private jet too, alas.
androiddrew
I love local-first. I am finding that a 120B MoE hits the sweet spot for local hosting. Right now that takes a $2K Strix Halo, a $4K GB10 machine, or a $5K Mac Pro. Two years from now I think hardware will take us back to the $2K-ish range with good performance. I love my dual-GPU setup (2× AMD Radeon R9700, 64GB VRAM), but it uses 5× the electricity of my GX10 (GB10 chip inside), and since layers are landing in system memory my TPS is half the GX10's. Now, a dense model like Devstral2 24B slaps on the dual-GPU setup. I just haven't gotten as much out of that as I have the 120B MoEs.
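A rough back-of-envelope (my numbers, not the commenter's) for why a 120B model spills out of 64GB of VRAM and lands layers in system memory:

```python
# Back-of-envelope weight-memory estimate for a quantized LLM.
# Hypothetical helper for illustration; a real deployment also needs
# KV cache, activations, and runtime overhead, so treat these as lower bounds.

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for `params_b` billion
    parameters at a given quantization width."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

# A 120B-parameter model at 4-bit needs ~60 GB for weights alone,
# which nearly fills a dual-32GB-GPU rig before any KV cache --
# hence layers spilling into (slower) system RAM and halved TPS.
print(f"120B @ 4-bit: ~{weight_gb(120, 4):.0f} GB")
print(f"120B @ 8-bit: ~{weight_gb(120, 8):.0f} GB")
print(f"24B dense @ 8-bit: ~{weight_gb(24, 8):.0f} GB")
```

This also shows why the dense 24B model runs well on the same rig: at 8-bit it is only ~24 GB of weights, fitting entirely in VRAM.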
jeromechoo
I think many developers worth their salt will argue the same. Cloud is and has always been a shortcut to buying your own hardware. Local models will get better and smaller. Qwen3-coder-next runs on a Spark and is as capable as Sonnet 4.5. Bonsai released a 1-bit model yesterday. I also like the freedom of not having to ration a daily allowance of tokens.
vlowther
MBP M5 Max. 128GB RAM. oMLX. unsloth-Qwen3-Coder-Next-mlx-8bit. opencode with the telemetry stripped out. This seems to be the sweet spot for now for my local dev. It helps me not accidentally blow through $100 in Claude tokens in a day when exploring different performance tradeoffs in the backend of my $DAYJOB codebase.