Gemma 4 on iPhone
janandonly
534 points
139 comments
April 05, 2026
Related Discussions
Found 5 related stories in 49.5ms across 3,663 title embeddings via pgvector HNSW
- Google releases Gemma 4 open models jeffmcjunkin · 1306 pts · April 02, 2026 · 65% similar
- Gemma 4: Byte for byte, the most capable open models meetpateltech · 21 pts · April 02, 2026 · 63% similar
- Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud ikessler · 39 pts · April 06, 2026 · 60% similar
- Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code vbtechguy · 232 pts · April 05, 2026 · 56% similar
- Apple Can Create Smaller On-Device AI Models from Google's Gemini thm · 25 pts · March 25, 2026 · 51% similar
Discussion Highlights (20 comments)
hadrien01
Is it just me, or does the App Store website look... fake? The text in the header ("Productiviteit", "Alleen voor iPhone", Dutch for "Productivity" and "iPhone only") looks pixelated, like it was edited in Paint, the header background flickers, the app icon and screenshots are very low quality, and the title of the website is incomplete ("App Store voor iPho...").
pmarreck
Impressive model, for sure. I've been running it on my Mac, and now I get to have it locally on my iPhone? I need to test this. Wait, it does agent skills and mobile actions, all local to the phone? Whaaaat? (Have to check it out later! Anyone have any tips yet?)

I don't normally do the whole "abliterated" thing (dealignment), but after discovering https://github.com/p-e-w/heretic , I couldn't resist trying it with this model a couple of days ago (I made a repo to make it easier, actually: https://github.com/pmarreck/gemma4-heretical ) and... wow. It worked. And not having a built-in nanny is fun!

It's also possible to make an MLX version of it, which runs a little faster on Macs, but won't work through Ollama unfortunately (LM Studio maybe); a minimal sketch follows below. Runs great on my M4 MacBook Pro w/128GB and likely also runs fine under 64GB; smaller memory configurations might require lower quantizations.

I specifically like dealigned local models because if I have to get my thoughts policed when playing in someone else's playground, like hell am I going to be judged while messing around in my own local open-source one too. And there's a whole set of ethically justifiable but rule-flagging conversations (loosely categorizable as "sensitive", "ethically borderline but productive", or "violating sacred cows") that are now possible with this, at a level never possible before.

Note: I tried to hook this one up to OpenClaw and ran into issues.

To answer the obvious question: yes, this sort of thing enables bad actors more (as do many other tools). Fortunately, there are far more good actors out there, and bad actors don't listen to the rules that good actors subject themselves to anyway.
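For anyone who wants to try the MLX route, here's a minimal sketch using mlx-lm's Python API (assumes pip install mlx-lm; the repo id is illustrative, so point load() at whatever conversion you actually have):

    # Minimal mlx-lm sketch. The repo id below is hypothetical --
    # substitute the Gemma 4 MLX conversion you actually have.
    from mlx_lm import load, generate

    model, tokenizer = load("pmarreck/gemma4-heretical-mlx")  # hypothetical id

    # Chat models need their chat template applied before generation.
    messages = [{"role": "user", "content": "Summarize abliteration in one paragraph."}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )

    print(generate(model, tokenizer, prompt=prompt, max_tokens=256))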
PullJosh
This is awesome!

1) I am able to run the model on my iPhone and get good results. Not as good as Gemini in the cloud, but good.

2) I love the "mobile actions" tool calls that allow the LLM to turn on the flashlight, open maps, etc. (a sketch of what such a tool call might look like is below). It would be fun if they added Siri Shortcuts support; I want the personal automation that Apple promised but never delivered.

3) I am so excited for local models to be normalized. I build little apps for teachers, and there are stringent privacy laws involved, so I strongly prefer writing code that runs fully client-side when possible. When I develop apps and websites, I want easy API access to on-device models for free. I know it sort of exists on iOS and Chrome right now, but as far as I'm aware it's not particularly good yet.
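For the curious: Google hasn't published the app's internal schema, so this is only a guess at what a "mobile action" might look like if it's expressed as a standard function-calling tool declaration. The action name and fields here are made up:

    # Hypothetical sketch of a "mobile action" as an OpenAI-style tool
    # declaration. Google hasn't published the app's actual schema;
    # the name and parameters below are invented for illustration.
    flashlight_tool = {
        "type": "function",
        "function": {
            "name": "set_flashlight",  # hypothetical action name
            "description": "Turn the phone's flashlight on or off.",
            "parameters": {
                "type": "object",
                "properties": {
                    "on": {"type": "boolean", "description": "True turns it on."}
                },
                "required": ["on"],
            },
        },
    }

    # Given "turn on the flashlight", the model would emit a structured
    # call like {"name": "set_flashlight", "arguments": {"on": true}},
    # which the host app then maps onto the real OS API.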
jeroenhd
English version of the page: https://apps.apple.com/us/app/google-ai-edge-gallery/id67496...
Also on Android: https://play.google.com/store/apps/details?id=com.google.ai....
It's a demo app for Google's Edge project: https://ai.google.dev/edge
carbocation
It would be very helpful if the chat logs could (optionally) be retained.
TGower
These new models are very impressive. There should be a massive speedup coming as well: AI Edge Gallery is running on the GPU, but the NPUs in recent high-end processors should be much faster. The A16 chip, for example (MacBook Neo and iPhone 16 series), has 35 TOPS of Neural Engine versus 7 TFLOPS of GPU. Similar story for Qualcomm.
janandonly
OP here. It is my firm belief that the only realistic use of AI in the future is either locally on-device for almost free, or in the cloud but far more expensive than it is today. The latter option will only be used for tasks where humans are even more expensive or much slower. This Gemma 4 model gives me hope for a future Siri or similar with iPhone and macOS integration, "Her" style (as in the movie).
dwa3592
I think with this, Google starts a new race: best local model that runs on phones.
burnto
My iPhone 13 can’t run most of these models. A decent local LLM is one of the few reasons I can imagine actually upgrading earlier than typically necessary.
deckar01
It doesn't render Markdown or LaTeX. Scrolling is unusable during generation. E4B failed to correctly account for convection and conduction when reasoning about the effects of thermal radiation (31b was very good). After 3 questions in a session (with thinking), E4B went off the rails and started emitting nonsense fragments before the stated token limit was hit (unless it isn't actually checking).
__natty__
That's a great project! I just wonder whether Google would have a problem with you using their trademark.
rickdg
How do these compare to Apple's Foundation Models, btw?
karimf
This app is cool and it showcases some use cases, but it still undersells what the E2B model can do. I just made a real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B. I posted it on /r/LocalLLaMA a few hours ago and it's gaining some traction [0]. Here's the repo [1]. I'm running it on a MacBook instead of an iPhone, but based on the benchmark here [2], you should be able to run the same thing on an iPhone 17 Pro.

[0] https://www.reddit.com/r/LocalLLaMA/comments/1sda3r6/realtim...
[1] https://github.com/fikrikarim/parlor
[2] https://huggingface.co/litert-community/gemma-4-E2B-it-liter...
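The core capture loop in this kind of setup is conceptually simple. A stripped-down sketch (assumes pip install sounddevice numpy; ask_model is a placeholder for whatever local Gemma runtime you wire in, not a real library call):

    # Record fixed-size mic chunks and hand them to a local model.
    # ask_model is a stub, not a real API -- parlor wires this step
    # to Gemma 4 E2B through its own runtime.
    import numpy as np
    import sounddevice as sd

    SAMPLE_RATE = 16_000   # what most speech models expect
    CHUNK_SECONDS = 3      # trades latency against context

    def ask_model(audio: np.ndarray) -> str:
        raise NotImplementedError("plug in your local multimodal runtime here")

    while True:
        chunk = sd.rec(int(SAMPLE_RATE * CHUNK_SECONDS),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()  # block until the chunk is fully recorded
        print(ask_model(chunk.squeeze()))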
beeflet
Isn't this already possible in a much more open-ended way with PocketPal? https://github.com/a-ghorbani/pocketpal-ai https://apps.apple.com/us/app/pocketpal-ai/id6502579498 https://play.google.com/store/apps/details?id=com.pocketpala...
dzhiurgis
I recently found my first practical use for it. I was on a plane, filling out a landing card (what a silly thing these are), and I looked up my hotel address using a Qwen model on my iPhone 16 Pro. It was accurate. I was quite impressed. After some back and forth the chat app started to crash, though, so YMMV.
allpratik
Nice! Tried it on an iPhone 16 Pro and got 30 TPS from the Gemma-4-E2B-it model, although the phone got considerably hot while inferencing. It's quite impressive performance, and I can't wait to try it in one of my personal apps.
garff
How new of an iPhone model is needed?
XCSme
Gemma 4 is great: https://aibenchy.com/compare/google-gemma-4-31b-it-medium/go... I assume it is the 26B A4B one, if it runs locally?
dhbradshaw
My son just started using 2B on his Android. I mentioned that it was an impressively compact model, and the next thing I knew he had figured out how to use it on his inexpensive 2024 Motorola and was using it to practice reading and writing in foreign languages.
thot_experiment
Gemma 4 E4B is an incredible model for all the home assistant stuff I previously used Qwen3.5 35BA4B + Whisper for, while leaving me with way more free VRAM for other bullshit. It works as a drop-in replacement for all of my "turn the lights off" or "when's the next train" type queries and does a good job of tool use (sketch below). This is really the first time vramlets get a model that's reliably useful locally day to day.

I'm curious/worried about the audio capability. I'm still using Whisper, since the audio support hasn't landed in llama.cpp, and I'm not excited enough to temporarily rewire my stuff to use vLLM or whatever their reference impl is. The vision capabilities of Gemma are notably much, much worse than Qwen's (thus far; could be impl-specific issues? even the big MoE and dense Gemma are much worse), so hopefully the audio is at least on par with medium Whisper.
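For anyone replicating the light-switch setup, a minimal sketch of one tool-use round trip against a local OpenAI-compatible endpoint (llama.cpp's llama-server and Ollama both expose one; the URL, model name, and set_lights tool are my assumptions, and the execution side still has to be wired to Home Assistant or whatever you use):

    # Minimal tool-use round trip against a local OpenAI-compatible
    # server. Endpoint, model name, and the set_lights tool are
    # assumptions -- adapt to whatever your server has loaded.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

    tools = [{
        "type": "function",
        "function": {
            "name": "set_lights",
            "description": "Turn the lights in a room on or off.",
            "parameters": {
                "type": "object",
                "properties": {
                    "room": {"type": "string"},
                    "on": {"type": "boolean"},
                },
                "required": ["room", "on"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gemma-4-e4b-it",
        messages=[{"role": "user", "content": "Turn the office lights off"}],
        tools=tools,
    )

    # The model answers with structured calls instead of prose; execute them.
    for call in resp.choices[0].message.tool_calls or []:
        args = json.loads(call.function.arguments)
        print(call.function.name, args)  # e.g. set_lights {'room': 'office', 'on': False}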