Replacing a 3 GB SQLite db with a 10 MB FST (finite state transducer) binary
hiAndrewQuinn
173 points
31 comments
May 10, 2026
Related Discussions
Found 5 related stories in 91.4ms across 8,303 title embeddings via pgvector HNSW
- SQLite in Production: Lessons from Running a Store on a Single File thunderbong · 27 pts · April 04, 2026 · 55% similar
- Show HN: Turbolite – a SQLite VFS serving sub-250ms cold JOIN queries from S3 russellthehippo · 131 pts · March 26, 2026 · 52% similar
- SQLite Is a Library of Congress Recommended Storage Format whatisabcdefgh · 123 pts · May 06, 2026 · 49% similar
- Modern SQLite: Features You Didn't Know It Had thunderbong · 197 pts · April 02, 2026 · 49% similar
- We used Quint to find over 10 bugs in SQLite while hardening Turso 0xedb · 32 pts · May 19, 2026 · 46% similar
Discussion Highlights (7 comments)
lscharen
I was halfway through the article and began thinking that his described data structure sounded very familiar to something I used about 20 years ago. Sure enough, the first paragraph on the Wikipedia entry for DAFSA is: DAFSA is the rediscovery of a data structure called Directed Acyclic Word Graph (DAWG)
Hendrikto
> I do wish to point out, of course, that the whole reason it was possible to experiment cheaply and come across this serendipity was because 9 months ago, faced with the choice to either do the bad easy thing or the good nothing, I chose to do the bad easy thing.5 The SQLite database worked! I understood how it worked, behind the scenes with its B-trees and its Full Text Search extension. This is the most important takeaway, imo, and a very valuable technique: Start with the obvious, stupid solution that definitely works. Then do the optimized version, while making sure it matches the naive implementation. In this case, the optimized version could even be generated from the naive one.
asibahi
This was a very interesting read. I wonder if similar techniques can apply to Turkish or Japanese dictionaries?
cadamsdotcom
Why was the download 3gb, if the solution created a 300x reduction primarily by sharing suffixes? Wouldn’t vanilla compression have dealt with that and achieved a decent (not ideal) amount of compression of the database?
wood_spirit
This was a fun read! Thanks for the great introduction to Finite State Transducers. I hadn't heard the formal term before, but your article gave me serious déjà vu. Years ago, I entered a Scrabble programming contest and needed to compress a GADDAG dictionary to fit into my 6MB L3 cache. Without knowing the official name for it, I ended up using the exact same suffix-compression mechanism by moving characters to the edges instead of the nodes to merge overlapping paths. Sharing my old write-up here in case you or other data-structure nerds find the overlap interesting! https://williame.github.io/post/87682811573.html
hmokiguess
This was such a pleasant read, thank you for sharing your experience, I feel motivated now to go solve the same problems twice!
rishkewl
This feels like a good case of "SQLite was the right bad solution."