I ran Gemma 4 as a local model in Codex CLI

dvaughan 25 points 7 comments April 12, 2026
blog.danielvaughan.com · View on Hacker News

Discussion Highlights (6 comments)

anactofgod

Amazing. Thanks for your detailed posts on the bake-off between the Mac and GB10, Daniel, and for sharing what you learned. I had trying something similar on both compute platforms on my to-do list. Your post should save me a lot of debugging, sweat, and tears.

fortyseven

I've been VERY impressed with Gemma 4 (26B at the moment). It's the first time I've been able to use OpenCode via a llama.cpp server reliably and actually get shit done. In fact, I started using it as a coding partner while learning how to use the Godot game engine (along with some custom 'skills' I pulled together from the official docs). I purposely avoided Claude and friends entirely and just used Gemma 4 locally this week... and it's really helped me figure out not just coding issues I was encountering, but also helped me sift through the documentation quite readily. I never felt like I needed to give in and use Claude. Very, very pleased.

blackmanta

With an Nvidia Spark or a 128 GB+ memory machine, you can get a good speedup on the 31B model if you use the 26B MoE as a draft model. It uses more memory, but I've seen acceptance rates of around 70%+ using Q8 on both models.
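The setup described above can be sketched as a llama.cpp `llama-server` invocation with a draft model for speculative decoding. The GGUF file names, context size, and draft-token limits below are assumptions for illustration, not details from the comment; only the target/draft pairing and Q8 quantization come from it.

```shell
# Speculative decoding: 31B dense model as the target, 26B MoE as the draft,
# both at Q8 quantization as the commenter describes.
# NOTE: model file names, -c, and --draft-* values are placeholder assumptions.
llama-server \
  -m gemma-4-31b-Q8_0.gguf \
  -md gemma-4-26b-moe-Q8_0.gguf \
  --draft-max 16 \
  --draft-min 1 \
  -c 8192
```

The server's log output reports per-request draft acceptance statistics, which is one way to check whether you are seeing acceptance rates in the ~70% range the commenter mentions.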

ehtbanton

This is genuinely very helpful. I'm planning a MacBook Pro purchase with local inference in mind, and now I see I'll have to aim for a slightly higher memory option because the Gemma 4 26B MoE is not all that!

brcmthrowaway

Nothing about omlx?

vsrinivas

Hey - I use the same, with both gemma4 and gpt-oss-*. Some things I have to do for a good experience:

1) Pin to an earlier version of Codex (sorry): 0.55 is the best experience IME, but YMMV (see https://github.com/openai/codex/issues/11940 and https://github.com/openai/codex/issues/8272).

2) Use the older completions endpoint (llama.cpp's Responses API support is incomplete: https://github.com/ggml-org/llama.cpp/issues/19138).
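Point 2 above maps to Codex CLI's provider configuration, where `wire_api` selects between the Chat Completions and Responses endpoints. A minimal sketch of `~/.codex/config.toml` for a local llama.cpp server, assuming the server is listening on port 8080 and the model name is a placeholder:

```toml
# ~/.codex/config.toml (sketch; model name and port are assumptions)
model = "gemma-4-26b"
model_provider = "llamacpp"

[model_providers.llamacpp]
name = "llama.cpp"
base_url = "http://localhost:8080/v1"
# Use the older Chat Completions endpoint rather than Responses,
# per the commenter's workaround for llama.cpp's incomplete Responses support.
wire_api = "chat"
```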
