Artemis II fault tolerance
speckx
72 points
39 comments
May 01, 2026
Related Discussions
- How NASA built Artemis II’s fault-tolerant computer speckx · 229 pts · April 09, 2026 · 74% similar
- Artemis II is not safe to fly idlewords · 160 pts · March 31, 2026 · 58% similar
- Artemis II and the invisible hazard on the way to the Moon zeristor · 84 pts · April 10, 2026 · 58% similar
- Artemis II Looking Back at Earth DarmokJalad1701 · 22 pts · April 03, 2026 · 55% similar
- Artemis II is competency porn jgrodziski · 47 pts · April 11, 2026 · 55% similar
Discussion Highlights (10 comments)
tcp_handshaker
For the Airbus they used different CPUs because CPUs have bugs too...
WorkerBee28474
> Orion utilizes two Vehicle Management Computers, each containing two Flight Control Modules, for a total of four FCMs. But the redundancy goes even deeper: each FCM consists of a self-checking pair of processors.
Who sits down and determines that 8 is the correct number? Why not 4? Or 2? Or 16 or 32?
MiracleRabbit
Interesting. In safety components we use lockstep microcontrollers, which do something similar on a much smaller scale. https://en.wikipedia.org/wiki/Lockstep_(computing) Example: https://www.st.com/resource/en/datasheet/spc574k72e5.pdf
y1n0
What I would like to see is the fault data. Also a graph of the number of in-sync FCMs over time, and how well it correlated with predictions. In other words, how over-engineered is it?
_whiteCaps_
I'm a big fan of Dissimilar Redundancies (but didn't know that was the term until today) for building system software. Build for various Linux distros, and some of the BSDs. You'll encounter weird compile errors or edge cases. Oftentimes I've found that these expose undefined behaviour or incorrect assumptions that you wouldn't notice if you were building for a single platform.
m3kw9
The astronauts must need a lot of training.
ranger207
> The self-checking pairs ensure that if a CPU performs an erroneous calculation due to a radiation event, the error is detected immediately and the system responds.
How does a pair determine which of the pair did the calculation correctly?
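One common reading of "self-checking pair" is that the pair does not try to decide which processor is right. It only detects the disagreement, withholds its output, and leaves it to the other redundant units to carry on. A minimal Python sketch of that idea follows. `SelfCheckingPair` and `select_output` are hypothetical names made up for illustration, and nothing here is taken from the actual Orion flight software.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class SelfCheckingPair:
    """Two processors run the same computation; output is valid only if they agree."""
    cpu_a: Callable[[float], float]
    cpu_b: Callable[[float], float]

    def compute(self, x: float) -> Optional[float]:
        a = self.cpu_a(x)
        b = self.cpu_b(x)
        # The pair cannot tell which side is wrong, so on a mismatch it
        # goes silent (returns None) instead of guessing.
        return a if a == b else None


def select_output(pairs: Sequence[SelfCheckingPair], x: float) -> Optional[float]:
    """Return the output of the first pair that did not go silent (assumed policy)."""
    for pair in pairs:
        result = pair.compute(x)
        if result is not None:
            return result
    return None  # every pair disagreed internally


def healthy(x: float) -> float:
    return 2 * x


def upset(x: float) -> float:
    # Simulated radiation-induced error on one processor.
    return 2 * x + 1


pairs = [SelfCheckingPair(healthy, upset), SelfCheckingPair(healthy, healthy)]
print(select_output(pairs, 3.0))
```

Here the first pair disagrees internally and stays silent, so the value comes from the second pair. Under this assumption, the question of "which one was right" never has to be answered inside a pair.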
methodical
Candidly, while I understand the need for some amount of redundancy, I'm curious what this level of redundancy adds in terms of complexity to the system as a whole, and whether that added complexity almost outweighs the higher redundancy. I'm sure NASA has calculated the trade-off, but I'd be curious to see the thinking behind it. I feel similarly when learning of certain aircraft accidents over the years, where it seems the redundancy of certain systems, and the complexity it adds, has been the indirect cause of accidents instead of preventing them. I suppose there's not really a way to quantify the accidents that it has prevented, so they can't be compared directly.
arscan
I interned at a company called Stratus which did hardware fault tolerant computers in the 80s/90s. I think they called it a “Pair and spare” approach, where every component had 3 copies running and comparing state every cycle. If one component’s state stopped matching the other 2, the failing component would be taken offline and the system would call home for a replacement to be FedExed overnight. I think just about every component was hot-swappable too. Pretty cool, but expensive, and other architectures for improving availability, or mitigating impact from loss of availability, won out (except for a handful of exotic use cases).
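The three-copy comparison described above can be sketched as a simple majority vote. This is only an illustration of the general idea in Python: the `vote` function and the state values are hypothetical, and it says nothing about how Stratus hardware actually implemented the comparison.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(states: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Return (majority state, indices of copies that disagree with it).

    If no state is held by a strict majority, return (None, all indices),
    since no copy can be trusted over the others.
    """
    value, count = Counter(states).most_common(1)[0]
    if count * 2 <= len(states):
        return None, list(range(len(states)))
    outliers = [i for i, s in enumerate(states) if s != value]
    return value, outliers


# Copy 2 has drifted from the other two, so it is the one to take offline.
state, failed = vote(["0xA1", "0xA1", "0xFF"])
print(state, failed)
```

With three copies, one bad copy is outvoted and identified. With only two copies, a mismatch can be detected but not attributed, which is the difference between this scheme and a bare self-checking pair.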
Nevermark
The most significant redundancy is the decisive N rockets for N launches, a hedge against any and all operational degradation.