Can I run AI locally?
ricardbejarano
1103 points
276 comments
March 13, 2026
Related Discussions
Found 5 related stories in 38.3ms across 3,471 title embeddings via pgvector HNSW
- Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon sanchitmonga22 · 199 pts · March 10, 2026 · 55% similar
- How to run Qwen 3.5 locally Curiositry · 26 pts · March 07, 2026 · 55% similar
- Things I've Done with AI shepherdjerred · 80 pts · March 09, 2026 · 54% similar
- Launch an autonomous AI agent with sandboxed execution in 2 lines of code wiseprobe · 21 pts · March 18, 2026 · 53% similar
- AI (2014) bjornroberg · 69 pts · March 20, 2026 · 52% similar
Discussion Highlights (20 comments)
John23832
RTX Pro 6000 is a glaring omission.
sxates
Cool thing! A couple of suggestions: 1. I have an M3 Ultra with 256GB of memory, but the options list only goes up to 192GB; the M3 Ultra supports up to 512GB. 2. It'd be great if I could flip this around: choose a model, then see the performance for all the different processors. That would help with buying decisions!
GrayShade
This feels a bit pessimistic. Qwen 3.5 35B-A3B runs at 38 t/s tg with llama.cpp (mmap enabled) on my Radeon 6800 XT.
phelm
This is awesome. It would be great to cross-reference some intelligence benchmarks so I can understand the trade-off between RAM consumption, token rate, and how good the model is.
S4phyre
Oh how cool. Always wanted to have a tool like this.
adithyassekhar
This just reminded me of this https://www.systemrequirementslab.com/cyri . Not sure if it still works.
twampss
Is this just llmfit but a web version of it? https://github.com/AlexsJones/llmfit
mrdependable
This is great; I've been trying to figure this stuff out recently. One thing I do wonder is what sort of solutions there are for running your own model but using it from a different machine. I don't necessarily want to run the model on the machine I'm also working from.
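The usual answer to mrdependable's question is to serve the model over HTTP on the GPU box and point clients at it from anywhere on the LAN. A minimal sketch, assuming llama.cpp or Ollama is installed (the model file name and server IP below are placeholders):

```shell
# On the machine with the GPU: serve a model over the network.
# llama.cpp's bundled server exposes an OpenAI-compatible API:
llama-server -m model.gguf --host 0.0.0.0 --port 8080

# Or with Ollama, bind to all interfaces instead of only localhost:
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From any other machine on the LAN, point a client at the endpoint:
curl http://<server-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}]}'
```

Because both servers speak the OpenAI-compatible API, most chat UIs and editor plugins can use the remote box by just changing the base URL.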
g_br_l
Could you add the Raspberry Pi to the list, to see which ridiculously small models it can run?
vova_hn2
It says "RAM - unknown", but doesn't give me an option to specify how much RAM I have. Why?
charcircuit
On mobile it does not show the name of the model in favor of the other stats.
debatem1
For me the "can run" filter says "S/A/B" but lists S, A, B, and C and the "tight fit" filter says "C/D" but lists F. Just FYI.
metalliqaz
Hugging Face can already do this for you (with a much more up-to-date list of available models). So can LM Studio. However, they don't attempt to estimate tok/sec, so that's a cool feature. That said, I don't really trust those numbers much, since they don't incorporate information about the CPU, etc. Full GPU offload often isn't possible on consumer PC hardware. There are also different quants available that make a big difference.
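Metalliqaz's point about quants is easy to see with a back-of-envelope calculation: the quant level dominates the memory footprint. A rough sketch (the bits-per-weight figures and the flat overhead allowance are illustrative assumptions, not measured values):

```python
# Rough sketch: estimate a model's memory footprint per quant level.
# Bytes-per-weight values are approximate assumptions for common
# GGUF quants; real files vary by tensor mix.
BYTES_PER_WEIGHT = {
    "F16": 2.0,
    "Q8_0": 1.0625,    # ~8.5 bits per weight
    "Q4_K_M": 0.5625,  # ~4.5 bits per weight
}

def est_memory_gb(params_b: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Weights plus a flat allowance for KV cache and runtime buffers."""
    weights_gb = params_b * BYTES_PER_WEIGHT[quant]
    return round(weights_gb + overhead_gb, 1)

for quant in BYTES_PER_WEIGHT:
    print(f"7B @ {quant}: ~{est_memory_gb(7, quant)} GB")
```

For a 7B model this spans roughly 5-16 GB depending on quant, which is exactly why a fit calculator that ignores the quant can't be trusted very far.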
havaloc
Missing the A18 Neo! :)
arjie
Cool website. The one that I'd really like to see there is the RTX 6000 Pro Blackwell 96 GB, though.
ge96
Raspberry Pi? Say a 4B with 4GB of RAM. I also want to run vision (like YOLO) and a basic LLM with TTS/STT.
LeifCarrotson
This lacks a whole lot of mobile GPUs. It also does not understand that you can share CPU memory with the GPU, or perform various KV cache offloading strategies to work around memory limits. It says I have an Arc 750 with 2 GB of shared RAM, because that's the GPU that renders my browser...but I actually have an RTX 1000 Ada with 6 GB of GDDR6. It's kind of like an RTX 4050 (not listed in the dropdowns) with lower thermal limits. I also have 64 GB of LPDDR5 main memory. It works - Qwen3 Coder Next, Devstral Small, Qwen3.5 4B, and others can run locally on my laptop in near real-time. They're not quite as good as the latest models, and I've tried some bigger ones (up to 24GB, it produces tokens about half as fast as I can type...which is disappointingly slow) that are slower but smarter. But I don't run out of tokens.
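The partial-offload situation LeifCarrotson describes (a small-VRAM GPU plus large system RAM) is what llama.cpp's layer-offload option is for: put as many layers as fit on the GPU and run the rest on the CPU. A minimal sketch of the arithmetic, with illustrative sizes (the equal-layer-size and 1 GB reserve assumptions are simplifications):

```python
# Sketch: how many transformer layers can be offloaded to a GPU with
# limited VRAM (the idea behind llama.cpp's -ngl flag). Sizes are
# illustrative assumptions, not measurements.
def layers_that_fit(vram_gb: float, n_layers: int, model_gb: float,
                    reserve_gb: float = 1.0) -> int:
    """Assume layers are roughly equal in size and reserve some VRAM
    for the KV cache, scratch buffers, and whatever else uses the GPU."""
    per_layer_gb = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# A ~4 GB quant with 32 layers on a 6 GB GPU: everything fits.
print(layers_that_fit(vram_gb=6, n_layers=32, model_gb=4.0))
# The same model on a 2 GB GPU: only a fraction of the layers fit,
# and the rest run from system RAM at CPU speed.
print(layers_that_fit(vram_gb=2, n_layers=32, model_gb=4.0))
```

This is also why a fit calculator that only looks at dedicated VRAM undersells machines with fast unified or system memory.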
sshagent
I don't see my beloved 5060 Ti. Looks great though!
carra
Having a rating of how well the model will run for you is cool. I'd also like some rating of the model's capabilities (even if that's tricky). There are way too many to choose from, and just looking at the parameter count or the memory used is not always a good indication of actual performance.
jrmg
Is there a reliable guide somewhere to setting up local AI for coding (please don’t say ‘just Google it’ - that just results in a morass of AI slop/SEO pages with out of date, non-self-consistent, incorrect or impossible instructions). I’d like to be able to use a local model (which one?) to power Copilot in vscode, and run coding agent(s) (not general purpose OpenClaw-like agents) on my M2 MacBook. I know it’ll be slow. I suspect this is actually fairly easy to set up - if you know how.
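For jrmg's question, one common pattern (a sketch, not a definitive guide; the model tag below is an assumption about what's currently available in the Ollama registry, and whether it fits comfortably depends on the M2's unified memory):

```shell
# 1. Pull a coding-tuned model sized for a laptop.
ollama pull qwen2.5-coder:7b

# 2. Ollama serves an OpenAI-compatible API on localhost:11434 by
# default (usually already running as a background service).
ollama serve

# 3. Sanity-check from the terminal before wiring up an editor:
ollama run qwen2.5-coder:7b "write a python hello world"
```

From there, editor extensions that support a custom OpenAI-compatible endpoint (Continue is one example) can be pointed at `http://localhost:11434` to drive completions and chat from the local model.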