Debugging Parallel Applications via Provisional Execution
PPL Technical Report 2010
Publication Type: Paper
Repository URL: papers/201009_ProvisionalDelivery

Among the bugs that arise in message-passing parallel applications, race conditions are notoriously difficult to resolve: messages from different sources may arrive at a processor in any order, and the application fails to handle some particular order correctly. Compounding the difficulty, the bug may not manifest itself until the application is deployed at large scale, and even then it may appear in only a small fraction of runs.

By allowing a developer to quickly test, while the application is running, the outcome of handling several messages in a specific order, we provide a fast way to inspect the application for race conditions. This can also reduce the need for large machine allocations to discover them. A fundamental component of this feature is the ability to deliver a message provisionally and quickly, with an equally fast rollback.
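The idea of provisional delivery followed by rollback can be illustrated with a small sketch. This is not the report's implementation: the abstract does not say how fast rollback is achieved, so the example below simply assumes that an object's state can be copied with `copy.deepcopy`, and that rollback means discarding the copy. The `Accumulator` class, its message format, and the helper names `deliver_provisionally` and `find_bad_orders` are all hypothetical and exist only for illustration.

```python
import copy
from itertools import permutations


class Accumulator:
    """Toy message-driven object whose result depends on arrival order."""

    def __init__(self):
        self.total = 0
        self.initialized = False

    def handle(self, msg):
        kind, value = msg
        if kind == "init":
            # Bug: overwrites the total, discarding any "add" handled earlier.
            self.total = value
            self.initialized = True
        elif kind == "add":
            self.total += value


def deliver_provisionally(obj, messages):
    """Deliver messages, in the given order, to a copy of obj.

    The original object is never modified, so "rollback" here is simply
    discarding the returned copy (an assumption of this sketch).
    """
    trial = copy.deepcopy(obj)
    for msg in messages:
        trial.handle(msg)
    return trial


def find_bad_orders(obj, pending, expected_total):
    """Return every delivery order whose outcome differs from expected_total."""
    bad = []
    for order in permutations(pending):
        trial = deliver_provisionally(obj, order)
        if trial.total != expected_total:
            bad.append(order)
    return bad


obj = Accumulator()
pending = [("init", 10), ("add", 5)]
bad_orders = find_bad_orders(obj, pending, expected_total=15)
print(bad_orders)   # delivery orders that expose the race
print(obj.total)    # the live object is untouched by the trials
```

Here the order in which "add" arrives before "init" is the only one reported, and the original object keeps its initial state after all trials. Copying the whole state is the simplest possible stand-in; a real debugger for a running parallel application would need a much cheaper checkpoint and restore mechanism.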

Filippo Gioachin, Laxmikant Kale, Debugging Parallel Applications via Provisional Execution, PPL Technical Report, October 2010.