Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap)
asenna
344 points
101 comments
May 21, 2026
Discussion Highlights (18 comments)
andai
Awesome. Say, this is very comprehensive. I was vaguely aware of all these pieces existing (except for running a facial recognition database at home o_o), but it's really neat to put them all together like that.
throwa356262
I ran Gemma on a 2015 ThinkPad to do something similar. Fortunately, I could upgrade the memory; otherwise it would have been a painful exercise. Not gonna lie, llama.cpp had the fans spinning at max speed. But it worked and I got the job done.
egorfine
> generative AI video has no place on a real travel brand
I am pretty sure that the vast majority of Airbnb hosts would not agree with you.
> equals TripAdvisor crucifixion
I have no idea how the Airbnb hosts with fake listings survive, really.
desro
> The skill is open at ~/.claude/skills/video-index/. If you're working on something similar (indexing personal archives, getting a local model to do real archival work, building agents that drive editing tools), I'd be glad to compare notes.
When your Claude wrote this post they might not have selected the right URL to share, unless your home folder is exposed. Care to share the skill files?
egorfine
Thanks for the article! I have a beefy M5 Pro and I'm eagerly looking around for ways to use local models (specifically Gemma4 & Qwen3.6). This is an excellent thing to do, especially since LLMs excel at batching, so you can index multiple photos and videos in parallel for no performance penalty.
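A minimal sketch of the parallel indexing idea in the comment above. It assumes a caller-supplied `describe` function (hypothetical, not part of any real library) that sends one file to a local model server and returns its description. How much throughput this gains depends on whether the server batches concurrent requests and on available memory.

```python
from concurrent.futures import ThreadPoolExecutor


def index_in_parallel(paths, describe, max_workers=4):
    """Run describe(path) for every path concurrently.

    `describe` is a hypothetical callable supplied by the caller,
    for example a function that posts one frame to a local model server.
    Returns a dict mapping each path to its description.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return dict(zip(paths, pool.map(describe, paths)))


# Stand-in for a real model call, used only to show the call shape.
def fake_describe(path):
    return f"description of {path}"


index = index_in_parallel(["a.mp4", "b.mp4", "c.jpg"], fake_describe)
```

Threads are enough here because the work is assumed to be I/O bound (waiting on the model server), not CPU bound in Python.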
herf
Two questions:
1. What is the search index?
2. The "description.md" example has things like "faces -> cluster_id". Is this from DaVinci Resolve's face index?
Things like faces+names and locations are really important with photo collections, but general LLMs don't handle them so well.
asenna
UPDATE: I quickly created a repo for this: https://github.com/Simbastack-hq/framedex (MIT License). It hasn't been tested properly since I genericized it; I'll try to go through it properly and add more updates. Two big things on my TODO: 1) Use this index, with Claude's help, to make video editing in DaVinci Resolve faster (now that I have a good index of all the content). 2) So far I've only done this for videos, but I want to extend it to the thousands of still images from my camera, which I need to make sense of. So I'll be working on this as well.
brcmthrowaway
So do they run the lodge or what?
theodorewiles
My take is that B2C AI applications are kind of structurally limited by how hard it is to build personalized context. The idea of capable local models could be a huge unlock here if they are able to do the bottom-up context collection research / tagging / etc. at scale.
gitowiec
Reading this text feels strange; the sentences seem to be detached.
zazibar
The subject matter is interesting, but the amount of slop makes it difficult to read through. Yeah, it's great that you can throw your technical problems at Claude without caring much about the generated output, but treating your own writing that you actually want to share with the world the same way is a terrible idea.
cold_harbor
The reason 50GB swap is even viable here is Apple Silicon's memory bandwidth. On x86, that much swap would make inference unusably slow.
yardie
Now I have another project for this weekend! I also have tons of video and not a lot of time to index them.
ngai_aku
I’d like to do something like this for the collection of home videos I have piling up, but I’m still on a 16GB M1. Any hope of getting decent results with smaller models? If not, does anyone have tips on GPU rental? I have a Claude Max sub and plenty of OpenRouter credit, but I don’t feel good about uploading my family’s private videos.
clueless
This sounds like a great capability to add to Immich.
Confiks
I'm not quite sure why all that swapping is necessary. It really does age your SSD quite fast, considering the enormous memory bandwidth required. Gemma 4 31B at 4-bit quantization should only be around 19 GiB [1], not 28.4 GiB. I'm not feeding it images regularly, so I'm not sure how much memory it needs to get those into context, but I can't imagine it is more than 10 GiB. Activity Monitor does show all kinds of Electron apps active, on top of a presumably model-loaded Handy and a virtual machine for Claude Code, so I guess that's the real root cause of all the swapping. If your laptop starts thrashing, I can't imagine you have any use for those apps, which will grind to a halt. [1] https://huggingface.co/mlx-community/gemma-4-31b-it-4bit
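As a rough check on the weight-size arithmetic in the comment above, here is a back-of-the-envelope sketch. The parameter count (31e9, taken from the model name), the group size of 64, and the 32 bits of scale and bias per group are all assumptions about a typical group-wise 4-bit scheme, not figures confirmed by the model card. The estimate ignores any tensors kept at higher precision, the KV cache, and image tokens, so it is a lower bound rather than a match for the roughly 19 GiB cited.

```python
GIB = 1024 ** 3


def quantized_weight_gib(n_params, bits_per_weight=4, group_size=64,
                         overhead_bits_per_group=32):
    """Estimate weight memory in GiB for group-wise quantization.

    Each group of `group_size` weights is assumed to carry
    `overhead_bits_per_group` extra bits (for example a 16-bit scale
    plus a 16-bit bias). Set overhead_bits_per_group=0 for raw bits only.
    """
    effective_bits = bits_per_weight + overhead_bits_per_group / group_size
    return n_params * effective_bits / 8 / GIB


# Assumed parameter count, inferred from the "31B" in the model name.
N_PARAMS = 31e9

raw = quantized_weight_gib(N_PARAMS, overhead_bits_per_group=0)
with_scales = quantized_weight_gib(N_PARAMS)
print(f"raw 4-bit: {raw:.1f} GiB, with per-group overhead: {with_scales:.1f} GiB")
```

Under these assumptions the weights alone land in the mid-teens of GiB, which is consistent with the commenter's point that the model by itself should not need anything close to 28.4 GiB.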
genxy
Why did you destroy your own voice to have it replaced by AI?
carpo
This is great. I wish I had enough RAM for a local model. I just spent the last few weeks writing something very similar, but I made it a local Electron app with Whisper and ffmpeg, and I added semantic search and embeddings for chatting with the videos. It talks to Claude for the vision analysis, tagging and video chat. Do you only send one image for yours? I used a customised scene detection algorithm to find multiple different images per video and then send them all in one request to Claude (along with the subtitles). It's definitely the most expensive part. Using Sonnet 4.6 for the analysis and Haiku for the tagging costs about $1 for an hour of footage. I can imagine it would be slow locally.
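The comment above does not describe its scene detection algorithm, so the following is only a generic sketch of the idea of picking several distinct frames per video. It assumes each frame has already been reduced to a normalized histogram (for example of grayscale intensities); the function names, the threshold, and the frame cap are all hypothetical.

```python
def histogram_distance(a, b):
    """L1 distance between two normalized histograms (range 0 to 2)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pick_scene_frames(histograms, threshold=0.5, max_frames=8):
    """Return indices of frames that look like the start of a new scene.

    Frame 0 is always kept. A later frame is kept when its histogram
    differs from the most recently kept frame by more than `threshold`.
    Selection stops once `max_frames` frames have been kept.
    """
    if not histograms:
        return []
    selected = [0]
    for i in range(1, len(histograms)):
        if len(selected) >= max_frames:
            break
        if histogram_distance(histograms[selected[-1]], histograms[i]) > threshold:
            selected.append(i)
    return selected


# Toy three-bin histograms: two near-duplicates, then two scene changes.
frames = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]
keyframes = pick_scene_frames(frames)
print(keyframes)  # [0, 2, 4]
```

Capping the number of frames keeps the size of a single vision request bounded, which matters when each image adds to the per-request cost.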