Mounting tar archives as a filesystem in WebAssembly
datajeroen
115 points
37 comments
April 24, 2026
Related Discussions
Found 5 related stories in 63.4ms across 5,498 title embeddings via pgvector HNSW
- Edge.js: Run Node apps inside a WebAssembly sandbox syrusakbary · 122 pts · March 17, 2026 · 50% similar
- Watgo – A WebAssembly Toolkit for Go ibobev · 88 pts · April 10, 2026 · 50% similar
- Go on Embedded Systems and WebAssembly uticus · 142 pts · April 03, 2026 · 47% similar
- Notes on writing Rust-based Wasm vinhnx · 218 pts · March 08, 2026 · 47% similar
- Show HN: TurboQuant-WASM – Google's vector quantization in the browser teamchong · 148 pts · April 04, 2026 · 46% similar
Discussion Highlights (8 comments)
sillysaurusx
Only peripherally relevant, but also see Ratarmount: https://github.com/mxmlnkn/ratarmount It lets you mount .tar files as a read-only filesystem. It's cool because you basically get random access to the tarball without paying any decompression costs. (It builds an index recording exactly where each file's data lives.)
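The index idea can be sketched in a few lines of Python using the stdlib `tarfile` module. This is not ratarmount's actual code (ratarmount also handles compressed layers and persists its index), just the general technique for an uncompressed tar; the function names are mine:

```python
import tarfile

def build_index(path):
    """Scan the tar's headers once, recording each regular file's
    data offset and size keyed by its name."""
    index = {}
    with tarfile.open(path, "r:") as tf:  # "r:" = uncompressed tar only
        for member in tf:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index

def read_member(path, index, name):
    """Random access: seek straight to the member's data.
    No re-scan, no decompression of anything else."""
    offset, size = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

After the one-time O(n) scan, every lookup is a single seek-and-read, which is the property that makes mounting the archive as a filesystem practical.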
Ecco
How about using a format that has actually been designed to be a compressed read-only filesystem? Something like a SquashFS or cramfs disk image?
phiresky
I'm a bit disappointed that this only solves the "find the offset of a file in a tar" problem, but not at all the "partially read a tar.gz file" problem. So really you're still reading the whole file into memory, in which case why not just extract the files properly while you're at it? It takes the same amount of time (O(n)) and less memory.

The gzip random-access problem is a lot more difficult because the gzip stream has internal state. But solutions exist! Apparently the internal state is only 32kB, so if you save it at 1MB offsets, you can reduce the amount of data you need to decompress for a single file access to a constant. https://github.com/mxmlnkn/ratarmount does this, apparently using https://github.com/pauldmccarthy/indexed_gzip internally. zlib even has an example of this method in its own source tree: https://github.com/gcc-mirror/gcc/blob/master/zlib/examples/...

It all depends on the use case, of course. The author here seems to have a pretty specific one, though I still don't see the advantage of this vs. extracting in JS and adding each file individually to memfs. "Without any copying" doesn't really make sense, because the only difference is copying ONE 1MB tar blob into a Uint8Array vs. 1000 1kB file blobs.

One very valid constraint the author imposes is not being able to touch the source file. If you can touch it, there are of course a thousand better solutions to all this, like using zip, which compresses each file individually and always has a central index at the end.
Lerc
I did some similar shenanigans with a silly little system I made on NeoCities: https://lerc.neocities.org/ It uses IndexedDB for the filesystem. Rather dumbly, it loads the files from a tar archive encoded into a PNG, because tar files are one of the forbidden file formats.
haunter
Now I want to see how that works with BTFS, which in a similar vein mounts a torrent file or magnet link as a read-only directory: https://github.com/johang/btfs
crabique
Very cool, though I wish there were something similar for filesystem images. Just recently I needed to somehow generate a .tar.gz from a .raw ext4 image and, surprisingly, there's still no better option than actually mounting it and then creating an archive. I managed to "isolate" it a bit with guestfish's tar-out, but it's still pretty slow, as it needs to seek around the image (in my case over NBD) to get at the actual files.
Dwedit
TAR archives are good in a few ways, but random access to files is not one of them. You need to iterate over every file before you can create a mapping between filename and its TAR file address. (Meanwhile, sending TAR over Netcat is a valid way to clone a filesystem to another computer, including maintaining the hardlinks and symlinks)
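The reason for that full pass is visible in the format itself: tar has no central directory, only a 512-byte header block in front of each entry. A simplified sketch of walking those headers by hand (handling only plain ustar regular files; real archives also contain long-name, PAX, and other extension entries):

```python
BLOCK = 512  # tar works in fixed 512-byte blocks

def walk_tar(raw):
    """Walk the 512-byte header blocks of an in-memory tar, mapping each
    regular file's name to (data_offset, size). Every header must be
    visited, because each one only describes the entry that follows it."""
    index = {}
    pos = 0
    while pos + BLOCK <= len(raw):
        header = raw[pos:pos + BLOCK]
        if header == b"\0" * BLOCK:  # zero blocks mark end-of-archive
            break
        name = header[0:100].rstrip(b"\0").decode()
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)  # octal field
        typeflag = header[156:157]
        if typeflag in (b"0", b"\0"):  # regular file
            index[name] = (pos + BLOCK, size)
        data_blocks = (size + BLOCK - 1) // BLOCK  # data is block-padded
        pos += BLOCK * (1 + data_blocks)
    return index
```

Because each header gives only the size of its own entry, finding file N means skipping over entries 0..N-1 one at a time; contrast this with zip, whose central directory at the end lists every member's offset up front.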
jghn
Isn't "archive" embedded in "tar" already? In other words, is this like saying one went to the "ATM machine"?