How to run Qwen 3.5 locally

Curiositry 26 points 5 comments March 07, 2026
unsloth.ai · View on Hacker News

Discussion Highlights (2 comments)

Twirrim

I've been finding it very practical to run the 35B-A3B model on an 8GB RTX 3050; it's pretty responsive and does a good job on the coding tasks I've thrown at it. I need to grab the freshly updated models, since the older one occasionally gets stuck in a loop with tool use, which they say they've fixed.

Curiositry

Qwen3.5 9B seems fairly competent at text manipulation and OCR running in llama.cpp on CPU, albeit slowly. However, I have compiled it umpteen ways and still haven't gotten GPU offloading working properly (which I had with Ollama) on an old 1650 Ti with 4GB of VRAM: it tries to allocate too much memory.
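One common workaround for out-of-memory failures on small cards is to cap how many layers llama.cpp offloads with the `-ngl` / `--n-gpu-layers` flag, rather than offloading the whole model. A back-of-envelope sketch for picking that number follows; the model size, layer count, and overhead figures are illustrative assumptions, not measured values for Qwen3.5 9B:

```python
# Rough estimate of how many transformer layers fit in VRAM.
# All sizes are illustrative assumptions; check your actual GGUF
# file size and layer count (llama.cpp prints both at load time).

def layers_to_offload(model_bytes: float, n_layers: int,
                      vram_bytes: float, overhead_bytes: float) -> int:
    """Return a conservative value for llama.cpp's -ngl flag."""
    per_layer = model_bytes / n_layers          # average bytes per layer
    usable = max(0.0, vram_bytes - overhead_bytes)  # leave room for KV cache etc.
    return min(n_layers, int(usable // per_layer))

GiB = 1024 ** 3
# Hypothetical numbers: a ~5.5 GiB Q4 quant with 40 layers,
# a 4 GiB card, and ~1 GiB reserved for context and buffers.
n = layers_to_offload(model_bytes=5.5 * GiB, n_layers=40,
                      vram_bytes=4 * GiB, overhead_bytes=1 * GiB)
print(f"try: llama-cli -m model.gguf -ngl {n}")
```

If the estimate still over-allocates, reducing the context size (`-c`) shrinks the KV cache, which is the other main consumer of VRAM.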
