CHARM++ is based on the Message-Driven parallel programming paradigm. The message-driven programming style avoids the use of blocking receives and allows overlap of computation and communication by scheduling computations depending on availability of data. This programing style enables CHARM++ programs to tolerate communication latencies adaptively. Threads suffer from loss of performance due to context-switching overheads and limited scalability due to large and unpredictable stack memory requirements, when used in a data-driven manner to coordinate a sequence of remotely triggered actions.
The need to sequence remotely triggered actions arises in many situations. Let us consider an example:
Consider an algorithm for computing cutoff-based pairwise interactions
between atoms in a molecular dynamics application, where interaction
between atoms is considered only when they are within some cutoff
distance of each other. This algorithm is based on a combination of
task and spatial decompositions of the molecular system. The bounding
box for the molecule is divided into a number of cubes (Patches)
each containing some number of atoms. Since each patch contains a
different number of atoms and these atoms migrate between patches as
simulation progresses, a dynamic load balancing scheme is used. In
this scheme, the task of computing the pairwise interactions between
atoms of all pairs of patches is divided among a number of Compute Objects. These compute objects are assigned at runtime to
different processors. The initialization message for each compute
object contains the indices of the patches. The patches themselves are
distributed across processors. Mapping information of patches to
processors is maintained by a replicated object called PatchManager. Figure
illustrates the CHARM++ implementation of the compute object. Each compute object requests
information about both patches assigned to it from the
PatchManager. PatchManager then contacts the appropriate processors
and delivers the patch information to the requesting compute
object. The compute object, after receiving information about each
patch, determines which atoms in a patch do not interact with atoms in
another patch since they are separated by more than the cut-off
distance. This is done in method filter. Filtering could be
done after both patches arrive. However, in order to increase
processor utilization, we do it immediately after any patch
arrives. Since the patches can arrive at the requesting compute object
in any order, the compute object has to buffer the received patches,
and maintain state information using counters or flags. This example
has been chosen for simplicity in order to demonstrate the necessity
of counters and buffers. In general, a parallel algorithm may have
more interactions leading to the use of many counters, flags, and
message buffers, which complicates program development significantly.
Threads are typically used to perform the abovementioned sequencing. Lets us code our previous example using threads.
Contrast the compute chare-object example in figure
with
a thread-based implementation of the same scheme in
figure
. Functions getFirst, and getSecond send
messages asynchronously to the PatchManager, requesting that the specified
patches be sent to them, and return immediately. Since these messages with
patches could arrive in any order, two threads, recvFirst and
recvSecond, are created. These threads block, waiting for messages to
arrive. After each message arrives, each thread performs the filtering
operation. The main thread waits for these two threads to complete, and then
computes the pairwise interactions. Though the programming complexity of
buffering the messages and maintaining the counters has been eliminated in this
implementation, considerable overhead in the form of thread creation, and
synchronization in the form of join has been added. Let us now code the
same example in Structured Dagger. It reduces the parallel programming complexity without
adding any significant overhead.
Structured Dagger is a coordination language built on top of CHARM++ that supports the sequencing mentioned above, while overcoming limitations of thread-based languages, and facilitating a clear expression of flow of control within the object without losing the performance benefits of adaptive message-driven execution. In other words, Structured Dagger is a structured notation for specifying intra-process control dependences in message-driven programs. It combines the efficiency of message-driven execution with the explicitness of control specification. Structured Dagger allows easy expression of dependences among messages and computations and also among computations within the same object using when-blocks and various structured constructs. Structured Dagger is adequate for expressing control-dependencies that form a series-parallel control-flow graph. Structured Dagger has been developed on top of CHARM++.Structured Dagger allows CHARM++ entry methods (in chares, groups or arrays) to specify code (a when-block body) to be executed upon occurrence of certain events. These events (or guards of a when-block) are entry methods of the object that can be invoked remotely. While writing a Structured Dagger program, one has to declare these entries in CHARM++ interface file. The implementation of the entry methods that contain the when-block is written using the Structured Dagger language. Grammar of Structured Dagger is given in the EBNF form below.
May 26, 2012
Charm Homepage