Usage-based pricing killing your vibe, here's how to roll your own local AI
Bender
42 points
43 comments
May 04, 2026
Related Discussions
- The current AI pricing was always going to go away arnon · 79 pts · May 22, 2026 · 61% similar
- AI is too expensive crescit_eundo · 135 pts · May 19, 2026 · 60% similar
- You are going to get priced out of the best AI coding tools (2025) fi-le · 76 pts · March 03, 2026 · 59% similar
- An AI coding agent, used to write code, needs to reduce your maintenance costs cratermoon · 104 pts · May 10, 2026 · 57% similar
- Do AI Agents Make Money in 2026? Or Is It Just Mac Minis and Vibes? SaaSasaurus · 26 pts · March 03, 2026 · 56% similar
Discussion Highlights (8 comments)
_345
It's a seriously degraded experience from a developer's perspective. Okay, you've finally got one local LLM installed after configuring everything perfectly; what happens when you want to run a second instance? Now you've blown past your VRAM and system RAM limits, and you're stuck with just one. Furthermore, the model they recommend doesn't quite reach ~gpt-5.4-mini level performance. That quality dip means you may as well just pay for something like Kimi K2.6 via OpenRouter if you want something ~>= Sonnet 4.6 in performance as a backup for when you run out of Anthropic/OpenAI usage.
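To put rough numbers on the second-instance problem, here is a back-of-the-envelope sketch. It assumes the weights dominate memory use, that each instance loads its own copy of the weights, and it ignores KV cache and runtime overhead, so real usage will be higher. The 27B parameter count, 4-bit quantization, and 24 GB card are illustrative assumptions, not figures for any specific model.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def instances_that_fit(vram_gb: float, params_billions: float, bits_per_weight: float) -> int:
    """How many independent copies of the weights fit in the given VRAM."""
    per_instance = weight_memory_gb(params_billions, bits_per_weight)
    return int(vram_gb // per_instance)


# Illustrative values only (assumptions, not measurements).
per_copy = weight_memory_gb(27, 4)
fits = instances_that_fit(24, 27, 4)
print(f"weights per copy: {per_copy:.1f} GB, copies that fit in 24 GB: {fits}")
```

Under these assumptions a single copy already takes more than half the card, which is why a second instance does not fit.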
janice1999
A 24GB Nvidia RTX 3090 Ti is ~2000 euros.
efficax
qwen3.6 does a good job locally, except it can take 20-30 minutes to respond to a prompt on a Mac Studio with 32GB of RAM.
roscas
Local AI does not mean privacy or offline. Claude Code does not run offline. It needs an internet connection.
"./claude-2.1.126-linux-x64
Welcome to Claude Code v2.1.126
Unable to connect to Anthropic services
Failed to connect to api.anthropic.com: ECONNREFUSED
Please check your internet connection and network settings.
Note: Claude Code might not be available in your country. Check supported countries at https://anthropic.com/supported-countries"
Let me also add that most services that claim to be private will still connect to the internet. LMStudio and many others will try to get a connection. I don't remember a single one that does not connect to its servers and send some kind of information.
roscas
BTW, LMStudio and a few others are really amazing. They allow you to download models from HF and manage many details before loading them. A mid-range PC with an 8 or 10GB graphics card is already a nice setup for running many models that are really good. You can also run Ollama, which is very simple to use and can help you code in VSCodium with Continue. Pretty nice!
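For anyone who wants to script against a local Ollama instead of going through an editor plugin, a minimal sketch follows. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name in the usage comment is a placeholder, and the helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a stock install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally running Ollama."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running Ollama; "your-model-name" is a placeholder):
#   print(generate("your-model-name", "Reverse a string in Python."))
```

Nothing here leaves localhost, so it is also an easy way to check what the model does with the network cable unplugged.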
AussieWog93
I've tried these small models and they're nowhere near as good as Claude or GPT-5. The new ones running on a 16GB M1 are maybe GPT-4 level (with decent performance, to be fair). I wonder if it's possible to make some hyper-overtuned model that, say, does nothing but program in Python and gets SOTA-ish performance in that narrow task.
trashface
I like how in Copilot now, I need to consider in VS Code whether to accept a tab-complete, because if it's coming from Copilot it will count against my usage, whereas if it's coming from the IDE tools it will not. So I'm, like, making individual decisions on whether to type something myself or just "use up" some completion budget. Funny to get nickel-and-dimed like this by one of the biggest companies in the world.
gowld
A $20/month cloud plan is definitely better than anything you'll get locally. Cost is not a reason to go local.