Robust Non-Intrusive Record-Replay with Processor Extraction
Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging at ISSTA (PADTAD) 2010
Publication Type: Paper
Repository URL: 201003_RecordReplay
Abstract
With the advent of increasingly large parallel machines, debugging is becoming more and more challenging. In particular, applications at this scale tend to behave non-deterministically, leading to race-condition bugs. Furthermore, gaining access to these large machines for long debugging sessions is generally infeasible. In this paper, we present a three-step algorithm to perform what we call "processor extraction": a procedure to record the execution of a set of processors from a parallel application and replay any of them in a controlled environment. Our technique introduces very little interference with the recorded program thanks to the separation between non-determinism elimination and detailed processor recording. To improve robustness and accuracy, we further augmented our algorithm with a self-correction mechanism.
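The following is a minimal Python sketch of the processor-extraction idea. It assumes the three steps are: (1) run the whole application and record only the message delivery order, which is cheap; (2) re-execute with that order enforced and record full message contents for the selected processors; (3) replay one selected processor in isolation from its log. The abstract does not spell these steps out, so this decomposition is an assumption. The `Message`, `handle`, `run` and `replay_one` names and the toy order-dependent handler are also hypothetical illustrations, not the paper's implementation.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    src: int      # id of the sending processor
    seq: int      # per-sender sequence number
    payload: int  # message contents (only logged in detailed mode)

def handle(state: int, msg: Message) -> int:
    """Hypothetical order-dependent handler: different delivery orders give different states."""
    return (state * 31 + msg.payload) % 1_000_003

def run(inboxes, rng=None, order=None, detailed=()):
    """Execute every processor's inbox.

    rng:      used to pick a random (non-deterministic) delivery order when `order` is None.
    order:    if given, enforce the recorded (src, seq) delivery order per processor.
    detailed: processor ids whose full messages are logged (step 2).
    Returns (final states, recorded order, detailed logs).
    """
    rng = rng or random.Random()
    states, rec_order, logs = {}, {}, {}
    for pe, msgs in inboxes.items():
        if order is not None:
            # Replay: look messages up by their (src, seq) identity, in recorded order.
            by_id = {(m.src, m.seq): m for m in msgs}
            delivered = [by_id[key] for key in order[pe]]
        else:
            # Record: arrival order is non-deterministic.
            delivered = list(msgs)
            rng.shuffle(delivered)
        state = 0
        for m in delivered:
            state = handle(state, m)
        states[pe] = state
        # Lightweight record: identities only, no payloads.
        rec_order[pe] = [(m.src, m.seq) for m in delivered]
        if pe in detailed:
            logs[pe] = list(delivered)  # full contents, selected processors only
    return states, rec_order, logs

def replay_one(log):
    """Step 3: replay one extracted processor alone, using only its detailed log."""
    state = 0
    for m in log:
        state = handle(state, m)
    return state

# Usage: two processors; processor 0 receives from processors 1 and 2.
inboxes = {
    0: [Message(1, i, i + 7) for i in range(5)] + [Message(2, i, 100 + i) for i in range(5)],
    1: [Message(0, i, 3 * i) for i in range(5)],
}
states1, order, _ = run(inboxes, rng=random.Random(42))      # step 1: record order only
states2, _, logs = run(inboxes, order=order, detailed={0})    # step 2: detailed recording of PE 0
assert states2 == states1
assert replay_one(logs[0]) == states1[0]                      # step 3: PE 0 replayed in isolation
```

Step 1 stores only `(src, seq)` pairs, and full payloads are captured only during step 2 for the chosen processor. This mirrors the separation the abstract credits for low interference. The self-correction mechanism is not modeled in this sketch.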
TextRef
Filippo Gioachin and Gengbin Zheng and Laxmikant V. Kalé, "Robust Record-Replay with Processor Extraction", in Proceedings of the Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging (PADTAD - VIII), 2010