Subroutines can also be suspended without using threads or macros by applying a simple preprocessor. We developed Structured Dagger (SDAG) [22] as a coordination language for expressing the control flow within a Charm++ parallel object naturally, using C-like constructs. It leads to a programming style that cleanly separates parallel from sequential functionality.
Figure 1 shows an example of a parallel 5-point stencil program with 1-D decomposition and ghost cell exchange written in SDAG. In the program, the for loop implements the outer iteration loop. Each iteration begins by calling sendStripToLeftAndRight in an atomic construct to send messages to the neighbors. The overlap construct that immediately follows asserts that the two events corresponding to getStripFromLeft and getStripFromRight can occur and be processed in any order. Each when construct says that when its message (e.g., getStripFromLeft) arrives, the enclosed atomic action is invoked, which calls a plain C++ function (e.g., copyStripFromLeft) to process the message. After both when constructs have executed, the function doWork in the last atomic construct is invoked and the program enters the next iteration of the for loop. The Structured Dagger preprocessor transforms all this syntax into code for an efficient finite-state machine, which receives and processes the network messages at runtime.
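To make the finite-state-machine idea concrete, the following is a minimal hand-written C++ sketch of the control flow described above. It is not the output of the Structured Dagger preprocessor and does not use the Charm++ API. The class name StencilObject, the advance method, the inbox queues, and the counters are assumptions introduced for illustration; only the names sendStripToLeftAndRight, getStripFromLeft, getStripFromRight, copyStripFromLeft, copyStripFromRight, and doWork mirror the example in the text, and their bodies here are stand-ins.

```cpp
#include <cassert>
#include <deque>
#include <vector>

using Strip = std::vector<double>;

// Hypothetical event-driven object: one state machine per parallel object.
class StencilObject {
public:
    explicit StencilObject(int maxIters) : maxIters_(maxIters) {}

    // Enter the for loop: issue the first sends, then wait for messages.
    void start() {
        assert(state_ == State::Idle);
        if (maxIters_ <= 0) { state_ = State::Done; return; }
        beginIteration();
        advance();
    }

    // Entry points a runtime would call when a message arrives.
    // Messages are buffered so that arrival order does not matter.
    void getStripFromLeft(const Strip& s)  { leftInbox_.push_back(s);  advance(); }
    void getStripFromRight(const Strip& s) { rightInbox_.push_back(s); advance(); }

    bool done() const { return state_ == State::Done; }
    int iterationsCompleted() const { return iter_; }
    int sendsIssued() const { return sends_; }

private:
    enum class State { Idle, Waiting, Done };

    // First atomic block of an iteration, then arm the two "when" clauses.
    void beginIteration() {
        sendStripToLeftAndRight();
        haveLeft_ = haveRight_ = false;
        state_ = State::Waiting;
    }

    // The state machine: consume at most one message per side per
    // iteration ("overlap"), then run doWork and loop.
    void advance() {
        while (state_ == State::Waiting) {
            if (!haveLeft_ && !leftInbox_.empty()) {
                copyStripFromLeft(leftInbox_.front());
                leftInbox_.pop_front();
                haveLeft_ = true;
            }
            if (!haveRight_ && !rightInbox_.empty()) {
                copyStripFromRight(rightInbox_.front());
                rightInbox_.pop_front();
                haveRight_ = true;
            }
            if (!(haveLeft_ && haveRight_)) return;  // suspend until next message
            doWork();
            ++iter_;
            if (iter_ < maxIters_) beginIteration();
            else state_ = State::Done;
        }
    }

    // Stand-ins for the sequential C++ functions named in the text.
    void sendStripToLeftAndRight() { sends_ += 2; }  // no real network here
    void copyStripFromLeft(const Strip& s)  { leftGhost_ = s; }
    void copyStripFromRight(const Strip& s) { rightGhost_ = s; }
    void doWork() { /* the stencil update would go here */ }

    int maxIters_;
    int iter_ = 0;
    int sends_ = 0;
    bool haveLeft_ = false, haveRight_ = false;
    State state_ = State::Idle;
    std::deque<Strip> leftInbox_, rightInbox_;
    Strip leftGhost_, rightGhost_;
};
```

In this sketch a driver would call start() once and then deliver messages through getStripFromLeft and getStripFromRight in whatever order they arrive. Returning from advance plays the role of suspending the subroutine: no stack is saved, because all state that must survive the wait lives in the object's member variables.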
Overall, the event-driven style can be made quite efficient and reasonably easy to program. However, it is difficult to apply to existing codes; in particular, most parallel programming interfaces, like the Message Passing Interface (MPI), are written in terms of blocking subroutine calls such as MPI_Recv. A traditional C-compatible compiled library can only hope to switch tasks at these blocking calls by saving and restoring the machine's stack and registers in a thread-like fashion. Hence, for the remainder of this paper, we focus on supporting threads.