Linux 7.0 Broke PostgreSQL: The Preemption Regression Explained
0xKelsey
127 points
67 comments
April 29, 2026
Related Discussions
- AWS engineer reports PostgreSQL perf halved by Linux 7.0, fix may not be easy crcastle · 203 pts · April 05, 2026 · 64% similar
- What's new in Linux kernel for PostgreSQL erthalion · 21 pts · March 03, 2026 · 63% similar
- PostgreSQL production incident caused by transaction ID wraparound tcp_handshaker · 25 pts · April 18, 2026 · 52% similar
- Google Cloud: Investing in the Future of PostgreSQL kevincox · 23 pts · April 01, 2026 · 49% similar
- Postgres minor releases closing 11 CVEs tee-es-gee · 24 pts · May 14, 2026 · 49% similar
Discussion Highlights (17 comments)
selckin
https://news.ycombinator.com/item?id=47644864
PunchyHamster
Seems Linus needs to yell at someone again. Especially with containers around, you might very well hit the case of running a new kernel with an older version of PostgreSQL that has no code mitigation for the problem.
baq
TLDR of the LKML thread: 120GB RAM postgres with hugepages=off, lock contention went from terrible to abysmal. Nothing to see here except that Amazon for whatever reason runs DB tests with huge pages disabled. (Hope I'm not paying for RDS and Aurora like that in production!)
dataflow
An X% performance regression is basically a (100 - X)% feature breakage, so whatever that implies in terms of breaking userspace...
nijave
Right on the heels of 6.19 breaking tcmalloc and Mongo
MBCook
This only happened under a very odd configuration. Yeah it wasn’t great but it was not the normal case. The headline implies it broke PG everywhere. It didn’t.
buster
I'd rather know whether any real-world usage broke before concluding that an edge-case synthetic benchmark is worth changing the kernel (back or otherwise), when the change that broke the benchmark supposedly had real-world benefits. Since we will never know, it might be a good idea to feature-gate the change, change the default, and let users decide to change it back. This may give some feedback on the LKML or elsewhere to decide if the change is worthwhile?
ameliaquining
This post comes uncomfortably close to plagiarizing https://thebuild.com/blog/2026/04/23/preempt_none-is-dead-yo... , which it cites as a source; almost all the technical explanation is in there and some of the wording is extremely similar. Compare, e.g., "What Linux 7.0 actually changed" in Pettus's post to "What Is Preemption?" in this one. I think this link should have been to Pettus's post instead.
ozgrakkurt
It is a crime that postgres isn't able to allocate with 1GB huge pages by changing a config parameter in 2026. Also a crime that people are still running databases with 4 KB pages. To put it in perspective, this means you will have more than 30 million pages on a server with 128GB RAM. As an example, if there are 16 bytes of metadata per memory page, the metadata itself would take more than half a gigabyte.
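The arithmetic in the comment above can be checked with a short sketch. It assumes "128GB" means 128 GiB and takes the 16 bytes of per-page metadata as the commenter's hypothetical figure, not a measured kernel value; `page_stats` is a name made up for this illustration. Under those assumptions, 4 KiB pages give about 33.5 million pages and 512 MiB of metadata (roughly 0.54 GB in decimal units), while larger pages shrink both numbers sharply.

```python
# Back-of-the-envelope check of the page-count arithmetic.
# Assumptions: 128 GB is read as 128 GiB, and 16 bytes of metadata per page
# is the hypothetical figure from the comment, not a real kernel struct size.
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def page_stats(ram_bytes, page_bytes, meta_bytes_per_page=16):
    """Return (number of pages, total metadata bytes) for the given sizes."""
    pages = ram_bytes // page_bytes
    return pages, pages * meta_bytes_per_page


ram = 128 * GIB
for label, page_size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
    pages, meta = page_stats(ram, page_size)
    print(f"{label:>6} pages: {pages:>12,} pages, {meta:>12,} bytes of metadata")
```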
ApolloFortyNine
I can't help but think of the classic XKCD example of breaking a user's workflow [1]. Doing research, though, a spinlock actually doesn't seem as unusual a hack as it would first seem. Do drivers and the like not have similar issues because they don't trigger a page fault, I guess? [1] https://xkcd.com/1172/
ahartmetz
PREEMPT_LAZY triggering on page faults seems like a bad idea in light of this. It is probably not a good idea to suspend processes right when they get unexpectedly bogged down. The logic makes a little more sense for syscalls that are expected to take long compared to a scheduling quantum (a few milliseconds). But page faults are mostly invisible and unplannable. It only took a few decades for Linux to get a good CPU scheduler and good I/O schedulers, too. I don't get how such an important area can be so bad for so long. But then, bad scheduling is everywhere. I find it to be a pretty fun area to work in, but, judging by how much it is less than half-assed in much existing software, most developers seem to hate dealing with it?
jeltz
Moderators should change this headline because it is nowhere near true. It only regressed performance on some incorrect configurations.
fulafel
> PREEMPT_NONE: The kernel almost never interrupts a running thread
This seems confused. These are options for preemptibility of the kernel, which is a relatively modern feature. Userspace could always be preempted and these options do not change anything there. The kernel must in any case frequently interrupt threads and processes to implement preemptive multitasking, which Linux of course had since the beginning. Read more e.g. at https://lwn.net/Articles/944686/ or help texts at https://github.com/torvalds/linux/blob/master/kernel/Kconfig...
fabian2k
That regression is maybe most useful as a reminder to people to configure huge pages for PostgreSQL. That's the one recommended basic performance tuning that is just annoying enough to set up that I suspect many people with smaller DBs will skip it. Though I actually don't know how large shared buffers has to be for huge pages to make a noticeable difference.
singron
This has the wrong explanation of the proposed rseq (Restartable Sequences) solution.
> a Linux kernel facility that lets userspace code detect whether it was preempted or migrated during a critical section and restart it if so. PostgreSQL's spinlock paths would use rseq to detect preemption and retry, avoiding the scenario where a preempted lock holder stalls all waiting backends.
The real proposal is about time-slice extension, which is a feature that uses the ABI for rseq but otherwise has nothing to do with retrying critical sections. While a process holds an s_lock, it would set a request bit. If the kernel tries to preempt that thread while the request bit is set, it instead extends the time slice once and returns control back to the thread. It's further explained here: https://docs.kernel.org/userspace-api/rseq.html
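The time-slice extension idea described in the comment above can be illustrated with a toy tick-based model. This is only a sketch under stated assumptions: it does not use the real rseq ABI or PostgreSQL's s_lock code, and the function name, tick units, and parameters are all hypothetical. It shows the one behavior the comment describes: if the request bit is set when the time slice expires, the scheduler defers preemption once instead of switching away.

```python
# Toy model only: not the kernel's rseq ABI and not PostgreSQL's spinlock code.
# All names and the tick-based timing are assumptions made for illustration.

def holds_lock_until_release(lock_ticks, slice_ticks, extension_ticks, use_request_bit):
    """Return True if the thread releases its lock before being preempted.

    The thread takes the lock at tick 0 and needs lock_ticks ticks of CPU time
    to release it. Its time slice expires after slice_ticks ticks.
    """
    request_bit = use_request_bit  # set by the thread while it holds the lock
    extended = False               # the scheduler grants at most one extension
    deadline = slice_ticks
    tick = 0
    while tick < lock_ticks:
        if tick >= deadline:
            if request_bit and not extended:
                # Preemption is deferred once and the thread keeps running.
                extended = True
                deadline += extension_ticks
                continue
            # No request bit, or the single extension is already used up.
            return False
        tick += 1  # one tick of work inside the critical section
    return True


# Critical section needs 5 ticks, slice ends after 3, extension adds 4.
print(holds_lock_until_release(5, 3, 4, use_request_bit=False))
print(holds_lock_until_release(5, 3, 4, use_request_bit=True))
```

In this model the extension only helps when the critical section is short relative to the extension; a holder that needs longer than the slice plus one extension is still preempted while holding the lock.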
cachius
The last time a Linux upgrade broke PG was the xz backdoor.
nubinetwork
They got rid of PREEMPT_NONE? Just a while ago they got rid of slab, and the noop io scheduler. Why do they insist on removing features that don't make sense on a desktop according to some random bozo? Not everyone is running a dyntick laptop.