A. Structured Dagger

CHARM++ is based on the Message-Driven parallel programming paradigm. The message-driven programming style avoids the use of blocking receives and allows overlap of computation and communication by scheduling computations depending on availability of data. This programing style enables CHARM++ programs to tolerate communication latencies adaptively. Threads suffer from loss of performance due to context-switching overheads and limited scalability due to large and unpredictable stack memory requirements, when used in a data-driven manner to coordinate a sequence of remotely triggered actions.

The need to sequence remotely triggered actions arises in many situations. Let us consider an example:

      class compute_object : public Chare {
      private:
      int         count;
      Patch       *first, *second;
      public:
      compute_object(MSG *msg) {
      count = 2; MyChareID(&chareid);
      PatchManager->Get(msg->first_index, recv_first, &thishandle,NOWAIT);
      PatchManager->Get(msg->second_index, recv_second, &thishandle,NOWAIT);
      }
      void recv_first(PATCH_MSG *msg) {
       first = msg->patch;
       filter(first);
       if (-count == 0 ) computeInteractions(first,second);
      }
      void recv_second(PATCH_MSG *msg){
       second = msg->patch;
       filter(second);
       if (-count == 0) computeInteractions(first,second);
      }
     }

Consider an algorithm for computing cutoff-based pairwise interactions between atoms in a molecular dynamics application, where interaction between atoms is considered only when they are within some cutoff distance of each other. This algorithm is based on a combination of task and spatial decompositions of the molecular system. The bounding box for the molecule is divided into a number of cubes (Patches) each containing some number of atoms. Since each patch contains a different number of atoms and these atoms migrate between patches as simulation progresses, a dynamic load balancing scheme is used. In this scheme, the task of computing the pairwise interactions between atoms of all pairs of patches is divided among a number of Compute Objects. These compute objects are assigned at runtime to different processors. The initialization message for each compute object contains the indices of the patches. The patches themselves are distributed across processors. Mapping information of patches to processors is maintained by a replicated object called PatchManager. Figure [*] illustrates the CHARM++ implementation of the compute object. Each compute object requests information about both patches assigned to it from the PatchManager. PatchManager then contacts the appropriate processors and delivers the patch information to the requesting compute object. The compute object, after receiving information about each patch, determines which atoms in a patch do not interact with atoms in another patch since they are separated by more than the cut-off distance. This is done in method filter. Filtering could be done after both patches arrive. However, in order to increase processor utilization, we do it immediately after any patch arrives. Since the patches can arrive at the requesting compute object in any order, the compute object has to buffer the received patches, and maintain state information using counters or flags. This example has been chosen for simplicity in order to demonstrate the necessity of counters and buffers. In general, a parallel algorithm may have more interactions leading to the use of many counters, flags, and message buffers, which complicates program development significantly.

Threads are typically used to perform the abovementioned sequencing. Lets us code our previous example using threads.

void compute_thread(int first_index, int second_index)
{
    getPatch(first_index);
    getPatch(second_index);
    threadId[0] = createThread(recvFirst);
    threadId[1] = createThread(recvSecond);
    threadJoin(2, threadId);
    computeInteractions(first, second);
  }
  void recvFirst(void)
  {
    recv(first, sizeof(Patch), ANY_PE, FIRST_TAG);
    filter(first);
  }
  void recvSecond(void)
  {
    recv(second, sizeof(Patch), ANY_PE, SECOND_TAG);
    filter(second);
  }

Contrast the compute chare-object example in figure [*] with a thread-based implementation of the same scheme in figure [*]. Functions getFirst, and getSecond send messages asynchronously to the PatchManager, requesting that the specified patches be sent to them, and return immediately. Since these messages with patches could arrive in any order, two threads, recvFirst and recvSecond, are created. These threads block, waiting for messages to arrive. After each message arrives, each thread performs the filtering operation. The main thread waits for these two threads to complete, and then computes the pairwise interactions. Though the programming complexity of buffering the messages and maintaining the counters has been eliminated in this implementation, considerable overhead in the form of thread creation, and synchronization in the form of join has been added. Let us now code the same example in Structured Dagger. It reduces the parallel programming complexity without adding any significant overhead.

  array[1D] compute_object {
    entry void recv_first(Patch *first);
    entry void recv_second(Patch *first);
    entry void compute_object(MSG *msg){
      atomic {
         PatchManager->Get(msg->first_index,...);
         PatchManager->Get(msg->second_index,...);
      }
      overlap {
        when recv_first(Patch *first) atomic { filter(first); }
        when recv_second(Patch *second) atomic { filter(second); }
      }
      atomic { computeInteractions(first, second); }
    }
  }

Structured Dagger is a coordination language built on top of CHARM++ that supports the sequencing mentioned above, while overcoming limitations of thread-based languages, and facilitating a clear expression of flow of control within the object without losing the performance benefits of adaptive message-driven execution. In other words, Structured Dagger is a structured notation for specifying intra-process control dependences in message-driven programs. It combines the efficiency of message-driven execution with the explicitness of control specification. Structured Dagger allows easy expression of dependences among messages and computations and also among computations within the same object using when-blocks and various structured constructs. Structured Dagger is adequate for expressing control-dependencies that form a series-parallel control-flow graph. Structured Dagger has been developed on top of CHARM++.Structured Dagger allows CHARM++ entry methods (in chares, groups or arrays) to specify code (a when-block body) to be executed upon occurrence of certain events. These events (or guards of a when-block) are entry methods of the object that can be invoked remotely. While writing a Structured Dagger program, one has to declare these entries in CHARM++ interface file. The implementation of the entry methods that contain the when-block is written using the Structured Dagger language. Grammar of Structured Dagger is given in the EBNF form below.



Subsections

May 26, 2012
Charm Homepage