Show HN: Retrace – reverse debugging for production CPython applications
Nathan here, one of the people who built Retrace. Happy to answer technical questions.
Retrace records a CPython application's interactions with the nondeterministic outside world (network, DB, filesystem, time, randomness, subprocesses) and lets you replay that execution locally and deterministically. The goal is to take a production failure and open the same execution in VS Code, with the ability to step forwards and backwards through the replay.
The core idea is to record boundary crossings rather than tracing every Python line in production. External calls are recorded as calls, results, and errors, and during replay, stubs return the recorded results so that the original application code runs again deterministically.
The preview today covers Python 3.11/3.12 on macOS and Linux, with Flask, Django, requests, psycopg2, and threading/forking supported. There is a compatibility table in the README.
This is a preview, not a finished product. Things we know are missing: async support is partial, FastAPI is not in the table yet, Windows is not supported, and free-threaded 3.13 is detected and refused.
Happy to go deep on:
- how we get determinism on real Python stacks (threads, async, third-party libraries, C extensions)
- recording overhead, what it depends on, and what we have actually benchmarked vs claimed
- what works and what does not yet
- how this differs from rr, Replay.io, pdb time-travel forks, and APM tools
Blog post (longer write-up): https://retracesoftware.com/blog/introducing-retrace/
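The boundary-recording idea in the post can be illustrated with a small, self-contained Python sketch. Everything here is an assumption for illustration: `Tape`, `boundary`, `flaky_fetch`, and `handle_request` are hypothetical names, not Retrace's API, and the sketch ignores threads, C extensions, and writing the recording to disk. In record mode each wrapped external call runs for real and its result or exception is appended to the tape. In replay mode the wrapper never calls the outside world; it returns or re-raises the recorded outcome in order, so the unchanged application code produces the same output. The name check on replay is an extra assumption, added so a mismatch fails loudly instead of returning the wrong value.

```python
import random
import time
from typing import Any, Callable, Optional

# One recorded boundary crossing: (call name, "result" or "error", value).
Event = tuple[str, str, Any]


class Tape:
    """Hypothetical in-memory log of boundary crossings (illustrative only)."""

    def __init__(self, mode: str, events: Optional[list[Event]] = None) -> None:
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.events: list[Event] = events if events is not None else []
        self.cursor = 0  # index of the next event to hand back during replay

    def boundary(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Wrap one external call so it is either recorded or replayed."""
        name = fn.__qualname__

        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if self.mode == "record":
                try:
                    result = fn(*args, **kwargs)  # the real side effect
                except Exception as exc:
                    self.events.append((name, "error", exc))
                    raise
                self.events.append((name, "result", result))
                return result
            # Replay: do not call fn; return or re-raise what was recorded.
            recorded_name, kind, value = self.events[self.cursor]
            self.cursor += 1
            if recorded_name != name:
                raise RuntimeError(f"replay diverged: expected {recorded_name}, got {name}")
            if kind == "error":
                raise value
            return value

        return wrapper


def flaky_fetch(url: str) -> str:
    """Stand-in for a network call; always fails in this sketch."""
    raise ConnectionError(f"could not reach {url}")


def handle_request(tape: Tape) -> str:
    """Application code: deterministic apart from its three boundary calls."""
    now = tape.boundary(time.time)()
    jitter = tape.boundary(random.random)()
    try:
        body = tape.boundary(flaky_fetch)("https://example.invalid/")
    except ConnectionError as exc:
        body = f"fallback: {exc}"
    return f"{now:.3f} {jitter:.6f} {body}"


# "Production": run for real and record every boundary crossing.
recording = Tape("record")
original = handle_request(recording)

# "Locally": same code, same call order, answers come from the tape.
replay = Tape("replay", recording.events)
assert handle_request(replay) == original
print(original)
```

Two things the sketch is meant to show: only the three boundary calls are logged, not each line of `handle_request`, which is the trade-off the post describes; and replay relies on the code making the same calls in the same order, so anything that changes that order breaks the replay.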
Discussion Highlights (3 comments)
michaelsalim
Congrats to the team on the launch! I helped build part of this in the past. The repo is complex, but at its core this is software that records execution without the performance and storage penalty that would usually come with recording all of production. To do that, they need to make sure they record anything that is not deterministic, while leaving deterministic code to be executed again at replay time. To be honest, I think this is a really hard problem, almost impossible I'd say. There are just so many things that can cause the same execution to produce different results. But from what I last saw, the team is slowly squashing the edge cases one by one, and I think they've now got it quite stable. If everything goes well, this is very exciting, and I think it could revolutionise how we as an industry debug production code. Unfortunately I don't run Python code, so I can't meaningfully test this. Here's hoping it takes off and one day gets ported to the languages I use!
thedavidprice
I (we) have seen these pieces before: replay, time-travel debugging, tracing, APM, event sourcing, etc. What I haven't seen is the whole thing working as described in a production setting, i.e. capturing enough of a production run that you can replay it locally, get the same inputs/results, and then trace a bad value back to where it came from. Has anyone seen prior tech that got this whole combination working viably? Or is there true (potential) novelty here in the combination of production replay + value provenance + usable workflow?
alzamos
Do I understand correctly that this would enable me to do retroactive logging/perf-instrumentation?