The fastest Linux timestamps
hmpc
45 points
12 comments
April 26, 2026
Discussion Highlights (6 comments)
Veserv
You can still do better by not doing any TSC-to-nanosecond conversion at all. Instead, inject clock adjustments into your log stream so that you convert when you decode the logs rather than when you generate them. Using the final version, you would just make the cache refresh() function emit a clock adjustment log entry instead of actually caching anything. Any later log entry's TSC is then implicitly relative to that clock adjustment entry when you decode the log. In the worst case you would need to persist every clock adjustment entry even when sampling, but that is still only on the order of a few KB/s, and you could drop any clock adjustment entries that have no other entries between them.
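A minimal sketch of the decode side of this idea, not the article's code. The record layout, the field names, and the `decode_events` function are all hypothetical; the only assumption carried over from the comment is that each clock adjustment entry pairs a raw counter value with a wall-clock time and a counter frequency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, invented for illustration. */
enum entry_kind { ENTRY_CLOCK_ADJUST, ENTRY_EVENT };

struct log_entry {
    enum entry_kind kind;
    uint64_t tsc;      /* raw counter value captured when the entry was logged */
    uint64_t wall_ns;  /* ENTRY_CLOCK_ADJUST only: wall-clock ns at `tsc` */
    uint64_t tsc_hz;   /* ENTRY_CLOCK_ADJUST only: counter ticks per second */
};

/* Resolve each event against the most recent clock adjustment entry that
 * precedes it in the stream. Writes one nanosecond timestamp per resolvable
 * ENTRY_EVENT into out_ns (which must hold at least n values) and returns
 * the number written. */
size_t decode_events(const struct log_entry *log, size_t n, uint64_t *out_ns)
{
    const struct log_entry *adj = NULL;
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        if (log[i].kind == ENTRY_CLOCK_ADJUST) {
            assert(log[i].tsc_hz > 0);
            adj = &log[i];
            continue;
        }
        if (adj == NULL)
            continue; /* no reference seen yet: the event cannot be placed */

        uint64_t delta = log[i].tsc - adj->tsc;
        /* Split into whole seconds and a remainder so the multiplication
         * cannot overflow for frequencies below roughly 18 GHz. */
        uint64_t secs = delta / adj->tsc_hz;
        uint64_t rem = delta % adj->tsc_hz;
        out_ns[written++] = adj->wall_ns
                          + secs * 1000000000ull
                          + rem * 1000000000ull / adj->tsc_hz;
    }
    return written;
}
```

The hot path then only stores the raw counter value; all division happens offline in the decoder.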
jeffbee
I can beat this by not trying to wrap a trace span around something that only takes 100ns. If the thing of interest just runs on the CPU briefly, tracing is not what you want. You want a profiler that only runs when you're looking at it. Distributed tracing is for things that can go wrong and take uncertain amounts of time.
rockwotj
Related reading: absl’s linear approximation of gettime from cycle counters, which I thought was a neat trick: https://github.com/abseil/abseil-cpp/blob/351086314d46e73d43...
amluto
If you do this, please be aware that there is absolutely no guarantee that you will not observe time going backwards. You probably will not have one thread ask for the time twice in a row and get results that are out of order, but you can have thread 1 ask for the time and do a store-release and then have thread 2 do a load-acquire, observe thread 1’s write, and ask for the time, and thread 2’s time may be earlier than thread 1’s. This is because RDTSC by itself does not respect x86’s memory ordering: it does not act like a load. Source: I wrote a bunch of this code and I’ve tested it fairly extensively.
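One commonly documented mitigation is to put an LFENCE immediately before RDTSC, so the counter is not read until earlier loads have completed. The sketch below assumes GCC or Clang on x86-64; the names `tsc_unordered` and `tsc_ordered` are made up here, and the non-x86 branch is only a portability fallback, not an equivalent. Whether the extra cost is acceptable depends on the workload, and nothing here is claimed to be what the article does.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* Plain RDTSC: cheap, but may execute before earlier loads complete. */
static inline uint64_t tsc_unordered(void)
{
    return __rdtsc();
}

/* LFENCE waits for earlier instructions (including loads) to complete
 * locally before RDTSC is allowed to execute. */
static inline uint64_t tsc_ordered(void)
{
    _mm_lfence();
    return __rdtsc();
}

#else
#include <time.h>

/* Fallback for other architectures so the sketch still compiles; this
 * reads the OS monotonic clock rather than a raw cycle counter. */
static inline uint64_t tsc_ordered(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif
```

In the two-thread scenario described above, thread 2 would call `tsc_ordered()` after its load-acquire, so that its timestamp is taken only once the load that observed thread 1's write has completed.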
RossBencina
Great read. A question: what is the status of this problem on other architectures such as ARM and RISC-V? Would the analysis and solution be the same? For example, does ARM have an equivalent of invariant TSC?
loeg
If you can get away from measuring in the seconds domain, just sticking to timecounter values really simplifies things. Embed a single reference timestamp pair (seconds domain <-> ticks) in your trace and do post-facto conversion in your analysis tool.
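As a rough illustration of the single-reference-pair approach, the analysis tool could carry something like the following. The `time_ref` struct and `ticks_to_unix_ns` function are hypothetical names, and the sketch assumes a constant tick rate over the whole trace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trace header: one (ticks, wall-clock) pair plus the tick rate. */
struct time_ref {
    uint64_t ref_ticks;     /* counter value at the reference point */
    int64_t  ref_unix_ns;   /* wall-clock time at the same instant */
    uint64_t ticks_per_sec; /* assumed constant for the whole trace */
};

/* Post-facto conversion, run only in the analysis tool. The tick delta is
 * treated as signed so events logged before the reference pair also map
 * correctly. Safe from overflow for tick rates below roughly 9 GHz. */
int64_t ticks_to_unix_ns(const struct time_ref *r, uint64_t ticks)
{
    assert(r->ticks_per_sec > 0);
    int64_t hz = (int64_t)r->ticks_per_sec;
    int64_t delta = (int64_t)(ticks - r->ref_ticks);
    int64_t secs = delta / hz;
    int64_t rem = delta % hz;
    return r->ref_unix_ns + secs * 1000000000 + rem * 1000000000 / hz;
}
```

The trace itself then contains only raw tick values plus one header record, and the seconds domain appears only at analysis time.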