Can I run AI locally?
ricardbejarano
1103 points
276 comments
March 13, 2026
Related Discussions
Found 5 related stories in 38.3ms across 3,471 title embeddings via pgvector HNSW
- Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon sanchitmonga22 · 199 pts · March 10, 2026 · 55% similar
- How to run Qwen 3.5 locally Curiositry · 26 pts · March 07, 2026 · 55% similar
- Things I've Done with AI shepherdjerred · 80 pts · March 09, 2026 · 54% similar
- Launch an autonomous AI agent with sandboxed execution in 2 lines of code wiseprobe · 21 pts · March 18, 2026 · 53% similar
- AI (2014) bjornroberg · 69 pts · March 20, 2026 · 52% similar
Discussion Highlights (20 comments)
John23832
RTX Pro 6000 is a glaring omission.
sxates
Cool thing! A couple of suggestions: 1. I have an M3 Ultra with 256GB of memory, but the options list only goes up to 192GB; the M3 Ultra supports up to 512GB. 2. It'd be great if I could flip this around: choose a model, then see the performance for all the different processors. That would help with buying decisions!
GrayShade
This feels a bit pessimistic. Qwen 3.5 35B-A3B runs at 38 t/s tg with llama.cpp (mmap enabled) on my Radeon 6800 XT.
phelm
This is awesome. It would be great to cross-reference some intelligence benchmarks so I can understand the trade-off between RAM consumption, token rate, and how good the model is.
S4phyre
Oh how cool. Always wanted to have a tool like this.
adithyassekhar
This just reminded me of this https://www.systemrequirementslab.com/cyri . Not sure if it still works.
twampss
Is this just llmfit but a web version of it? https://github.com/AlexsJones/llmfit
mrdependable
This is great; I've been trying to figure this stuff out recently. One thing I do wonder is what sort of solutions there are for running your own model but using it from a different machine. I don't necessarily want to run the model on the machine I'm also working from.
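The usual answer to mrdependable's question is to serve the model over HTTP on the GPU box and point clients at it from anywhere on the LAN. A minimal sketch, assuming llama.cpp or Ollama is installed (the model file name and server IP below are placeholders):

```shell
# On the machine with the GPU: serve a model over the network.
# llama.cpp's bundled server exposes an OpenAI-compatible API:
llama-server -m model.gguf --host 0.0.0.0 --port 8080

# Or with Ollama, bind to all interfaces instead of only localhost:
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From any other machine on the LAN, point a client at the endpoint:
curl http://<server-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}]}'
```

Because both servers speak the OpenAI-compatible API, most chat UIs and editor plugins can use the remote box by just changing the base URL.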
g_br_l
Could you add the Raspberry Pi to the list, to see which ridiculously small models it can run?
vova_hn2
It says "RAM - unknown", but doesn't give me an option to specify how much RAM I have. Why?
charcircuit
On mobile it does not show the name of the model in favor of the other stats.
debatem1
For me the "can run" filter says "S/A/B" but lists S, A, B, and C and the "tight fit" filter says "C/D" but lists F. Just FYI.
metalliqaz
Hugging Face can already do this for you (with a much more up-to-date list of available models). So can LM Studio. However, they don't attempt to estimate tok/sec, so that's a cool feature. That said, I don't really trust those numbers much, since they don't incorporate information about the CPU, etc. Full GPU offload often isn't possible on consumer PC hardware. There are also different quants available that make a big difference.
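Metalliqaz's point about quants is easy to see with a back-of-envelope calculation: the quant level dominates the memory footprint. A rough sketch (the bits-per-weight figures and the flat overhead allowance are illustrative assumptions, not measured values):

```python
# Rough sketch: estimate a model's memory footprint per quant level.
# Bytes-per-weight values are approximate assumptions for common
# GGUF quants; real files vary by tensor mix.
BYTES_PER_WEIGHT = {
    "F16": 2.0,
    "Q8_0": 1.0625,    # ~8.5 bits per weight
    "Q4_K_M": 0.5625,  # ~4.5 bits per weight
}

def est_memory_gb(params_b: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Weights plus a flat allowance for KV cache and runtime buffers."""
    weights_gb = params_b * BYTES_PER_WEIGHT[quant]
    return round(weights_gb + overhead_gb, 1)

for quant in BYTES_PER_WEIGHT:
    print(f"7B @ {quant}: ~{est_memory_gb(7, quant)} GB")
```

For a 7B model this spans roughly 5-16 GB depending on quant, which is exactly why a fit calculator that ignores the quant can't be trusted very far.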
havaloc
Missing the A18 Neo! :)
arjie
Cool website. The one that I'd really like to see there is the RTX 6000 Pro Blackwell 96 GB, though.
ge96
Raspberry Pi? Say a 4B with 4GB of RAM. I also want to run vision (like YOLO) and a basic LLM with TTS/STT.
LeifCarrotson
This lacks a whole lot of mobile GPUs. It also does not understand that you can share CPU memory with the GPU, or perform various KV cache offloading strategies to work around memory limits. It says I have an Arc 750 with 2 GB of shared RAM, because that's the GPU that renders my browser...but I actually have an RTX 1000 Ada with 6 GB of GDDR6. It's kind of like an RTX 4050 (not listed in the dropdowns) with lower thermal limits. I also have 64 GB of LPDDR5 main memory. It works - Qwen3 Coder Next, Devstral Small, Qwen3.5 4B, and others can run locally on my laptop in near real-time. They're not quite as good as the latest models, and I've tried some bigger ones (up to 24GB, it produces tokens about half as fast as I can type...which is disappointingly slow) that are slower but smarter. But I don't run out of tokens.
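The partial-offload situation LeifCarrotson describes (a small-VRAM GPU plus large system RAM) is what llama.cpp's layer-offload option is for: put as many layers as fit on the GPU and run the rest on the CPU. A minimal sketch of the arithmetic, with illustrative sizes (the equal-layer-size and 1 GB reserve assumptions are simplifications):

```python
# Sketch: how many transformer layers can be offloaded to a GPU with
# limited VRAM (the idea behind llama.cpp's -ngl flag). Sizes are
# illustrative assumptions, not measurements.
def layers_that_fit(vram_gb: float, n_layers: int, model_gb: float,
                    reserve_gb: float = 1.0) -> int:
    """Assume layers are roughly equal in size and reserve some VRAM
    for the KV cache, scratch buffers, and whatever else uses the GPU."""
    per_layer_gb = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# A ~4 GB quant with 32 layers on a 6 GB GPU: everything fits.
print(layers_that_fit(vram_gb=6, n_layers=32, model_gb=4.0))
# The same model on a 2 GB GPU: only a fraction of the layers fit,
# and the rest run from system RAM at CPU speed.
print(layers_that_fit(vram_gb=2, n_layers=32, model_gb=4.0))
```

This is also why a fit calculator that only looks at dedicated VRAM undersells machines with fast unified or system memory.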
sshagent
I don't see my beloved 5060 Ti. Looks great though!
carra
Having a rating of how well the model will run for you is cool. I'd also like some rating of the model's capabilities (even if that's tricky). There are way too many to choose from, and just looking at the parameter count or the memory used is not always a good indication of actual performance.
jrmg
Is there a reliable guide somewhere to setting up local AI for coding (please don’t say ‘just Google it’ - that just results in a morass of AI slop/SEO pages with out of date, non-self-consistent, incorrect or impossible instructions). I’d like to be able to use a local model (which one?) to power Copilot in vscode, and run coding agent(s) (not general purpose OpenClaw-like agents) on my M2 MacBook. I know it’ll be slow. I suspect this is actually fairly easy to set up - if you know how.
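For jrmg's question, one common pattern (a sketch, not a definitive guide; the model tag below is an assumption about what's currently available in the Ollama registry, and whether it fits comfortably depends on the M2's unified memory):

```shell
# 1. Pull a coding-tuned model sized for a laptop.
ollama pull qwen2.5-coder:7b

# 2. Ollama serves an OpenAI-compatible API on localhost:11434 by
# default (usually already running as a background service).
ollama serve

# 3. Sanity-check from the terminal before wiring up an editor:
ollama run qwen2.5-coder:7b "write a python hello world"
```

From there, editor extensions that support a custom OpenAI-compatible endpoint (Continue is one example) can be pointed at `http://localhost:11434` to drive completions and chat from the local model.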