Usage-based pricing killing your vibe? Here's how to roll your own local AI

Bender 42 points 43 comments May 04, 2026
www.theregister.com

Discussion Highlights (8 comments)

_345

It's a seriously degraded experience from a developer's perspective. Okay, you've finally got one local LLM installed after configuring everything perfectly; what happens when you want to run a second instance? Now you've blown past your VRAM and system RAM limits, and you're stuck with just one. Furthermore, the model they recommend doesn't quite reach ~gpt-5.4-mini level performance. That quality dip means you may as well just pay for something like Kimi K2.6 via openrouter if you want something ~>= sonnet 4.6 in performance as a backup for when you run out of anthropic/openai usage.

janice1999

A 24GB Nvidia RTX 3090 Ti is ~2000 euro.

efficax

qwen3.6 does a good job locally, except it can take 20-30 minutes to respond to a prompt on a Mac Studio with 32GB of RAM.

roscas

Local AI does not mean privacy or offline. Claude Code does not run offline; it needs an internet connection:

    ./claude-2.1.126-linux-x64
    Welcome to Claude Code v2.1.126
    Unable to connect to Anthropic services
    Failed to connect to api.anthropic.com: ECONNREFUSED
    Please check your internet connection and network settings.
    Note: Claude Code might not be available in your country. Check supported countries at https://anthropic.com/supported-countries

Let me also add that most services that are private will still connect to the internet. LMStudio and many others will try to get a connection. I don't remember a single one that does not connect to its servers and send some kind of information.

roscas

BTW, LMStudio and a few others are really amazing. They allow you to download models from HF and manage many details before loading them. A mid-range PC with an 8 or 10GB graphics card is already a nice setup to run many models that are really good. You can also run Ollama, which is very simple to use and helps you code in VSCodium with Continue. Pretty nice!

AussieWog93

I've tried these small models and they're nowhere near as good as Claude or GPT-5. The new ones running on a 16GB M1 are maybe GPT-4 level (with decent performance, to be fair). I wonder if it's possible to make some hyper-overtuned model that, say, does nothing but program in Python and gets SOTA-ish performance in that narrow task.

trashface

I like how in Copilot now, I need to consider in VS Code whether to accept a tab-complete, because if it's coming from Copilot it will count against my usage, whereas if it's coming from the IDE tools it will not. So I'm, like, making individual decisions on whether to type something myself or just "use up" some completion budget. Funny to get nickel-and-dimed like this by one of the biggest companies in the world.

gowld

A $20/month cloud plan is definitely better than anything you'll get locally. Cost is not a reason to go local.
