S3 Files

werner 257 points 75 comments April 07, 2026
www.allthingsdistributed.com · View on Hacker News

https://aws.amazon.com/blogs/aws/launching-s3-files-making-s...

Discussion Highlights (20 comments)

goekjclo

the "under the hood uses EFS" part is the most interesting bit here

themafia

> we locked a bunch of our most senior engineers in a room and said we weren’t going to let them out till they had a plan that they all liked.

That's one way to do it.

> When you create or modify files, changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT. Sync runs in both directions, so when other applications modify objects in the bucket, S3 Files automatically spots those modifications and reflects them in the filesystem view.

That sounds about right given the above. I have trouble seeing this as anything other than a giant "hack." I already don't enjoy projecting costs for new types of S3 access patterns, and I feel like this has the potential to double the complication I already experience here. Maybe I'm too frugal, but I've been in the cloud for a decade now, and I've worked very hard to prevent any "surprise" bills from showing up. This seems like a great feature, if you don't care what your AWS bill is each month.
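The quoted sync behavior can be sketched as a toy loop, with plain dicts standing in for the filesystem view and the bucket. This is an illustration of the described semantics only, not the real implementation, and all names here are made up:

```python
SYNC_INTERVAL = 60  # seconds, per the announcement

def sync_cycle(fs_view, dirty, bucket):
    """One sync pass over plain dicts standing in for the real stores.

    fs_view: dict path -> bytes, the local filesystem view
    dirty:   set of paths modified locally since the last pass
    bucket:  dict key -> bytes, standing in for the S3 bucket
    """
    # Push: commit a pass's worth of local edits, one PUT per file
    for path in dirty:
        bucket[path] = fs_view[path]
    dirty.clear()

    # Pull: reflect objects changed by other writers into the FS view
    for key, data in bucket.items():
        if fs_view.get(key) != data:
            fs_view[key] = data
```

Note this sketch glosses over the hard part, which is exactly the conflict case the thread worries about: an object modified both locally and remotely between passes.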

DenisM

TLDR: Eventually consistent file system view on top of s3 with read/write cache.

mgaunard

Zero mention of s3fs which already did this for decades.

CrzyLngPwd

If there is ever a post that needs a TLDR or an AI summary, it is that one. Sell the benefits. I have around 9 TB in 21m files on S3. How does this change benefit me?

nvartolomei

> changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT Single PUT per file I assume?

gonzalohm

I cannot 100% confirm this, but I believe AWS insisted a lot on NOT using S3 as a file system. Why the change now?

gervwyk

Any recommendations for a Lambda-based SFTP server setup?

ovaistariq

TLDR: EFS as an eventually consistent cache in front of S3.

PunchyHamster

Eagerly awaiting the first blog post where developers didn't read the "eventually consistent" part, lost the data, and built some "genius" workaround with help from the LLM that got them into that spot in the first place.

MontyCarloHall

This is essentially S3FS using EFS (AWS's managed NFS service) as a cache layer for active data and small random accesses. Unfortunately, this also means that it comes with some of EFS's eye-watering pricing:

— All writes cost $0.06/GB, since everything is first written to the EFS cache. For write-heavy applications, this could be a dealbreaker.

— Reads hitting the cache get billed at $0.03/GB. Large reads (>128 kB) get streamed directly from the underlying S3 bucket, which is free.

— Cache storage is charged at $0.30/GB/month. Even though everything is written to the cache (for consistency purposes), it seems to be used only for persistent storage of small files (<128 kB), so this shouldn't cost too much.
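The rates above make for a quick back-of-envelope estimator. The figures are taken from the comment as stated (verify against current AWS pricing before relying on them), and the function name and parameters are illustrative:

```python
# Rates as quoted in the comment above; verify against current AWS pricing.
WRITE_PER_GB = 0.06                # every write lands in the EFS cache first
CACHED_READ_PER_GB = 0.03          # small (<128 kB) reads served from cache
CACHE_STORAGE_PER_GB_MONTH = 0.30  # cache residency

def monthly_cost(written_gb, cached_read_gb, cache_resident_gb):
    """Estimate one month's cache-related spend in USD."""
    return (written_gb * WRITE_PER_GB
            + cached_read_gb * CACHED_READ_PER_GB
            + cache_resident_gb * CACHE_STORAGE_PER_GB_MONTH)

# e.g. 500 GB written, 200 GB of small cached reads, 50 GB resident:
# 500*0.06 + 200*0.03 + 50*0.30 = 30 + 6 + 15 = 51 USD/month
```

Note that large reads streamed directly from S3 are free per the comment, so they don't appear here; raw S3 request and storage charges still apply separately.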

rdtsc

The synchronization bits are what I was wondering about: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-fil...

> For example, suppose you edit /mnt/s3files/report.csv through the file system. Before S3 Files synchronizes your changes back to the S3 bucket, another application uploads a new version of report.csv directly to the S3 bucket. When S3 Files detects the conflict, it moves your version of report.csv to the lost and found directory and replaces it with the version from the S3 bucket.

> The lost and found directory is located in your file system's root directory under the name .s3files-lost+found-file-system-id.
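Given that documented layout, an operator could sweep the mount for files parked by conflict resolution. This is a hypothetical helper based only on the directory naming convention in the docs excerpt above; the function name and the exact on-disk layout inside the lost+found directory are assumptions:

```python
from pathlib import Path

def find_conflicted_files(mount_root):
    """Return paths of files moved aside by S3 Files conflict resolution.

    Scans the mount root for .s3files-lost+found-<file-system-id>
    directories (naming per the docs) and lists every file inside them.
    """
    root = Path(mount_root)
    conflicted = []
    for lost_dir in root.glob(".s3files-lost+found-*"):
        conflicted.extend(p for p in lost_dir.rglob("*") if p.is_file())
    return conflicted
```

Something like this could run on a schedule and alert when the list is non-empty, since a silently growing lost+found is exactly the "didn't read the eventually consistent part" failure mode.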

mbana

Werner Vogels is awesome. I first discovered his writing when I learned about DynamoDB.

koolba

If you thought locking semantics over NFS were wonky, just wait till we throw a remote S3 backend into the mix!

nyc_pizzadev

This is very close to its first official release: https://fiberfs.io/ Built-in cache, CDN compatible, JSON metadata, concurrency safe, and it targets all S3-compatible storage systems.

jitl

I wish they offered some managed bridging to local NVMe storage. AWS NVMe is super fast compared to EBS, and EBS (node-exclusive access as block device) is faster than EFS (multi-node access). I imagine this can go fast if you put some kind of further-cache-to-NVMe FS on top, but a completely vertically integrated option would be much better.

mritchie712

tldr: this caches your S3 data in EFS. We run datalakes using DuckLake, and this sounds really useful. GCP should follow suit quickly.

up2isomorphism

This is why today’s sales pitches are often disguised as tech blogs.

dang

Since this is the thread that got attention, I've added the announcement link to the toptext and made the title work for both.

wbl

"NFS provides the semantics your applications expect" is one of the funniest things I have ever read.
