Consider an algorithm for computing cutoff-based pairwise interactions between atoms in a molecular dynamics application, where interaction between atoms is considered only when they are within some cutoff distance of each other. The bounding box for the molecule is divided into a number of cubes (Patches) each containing some number of atoms. Since each patch contains different number of atoms and these atoms migrate between patches as simulation progresses, a dynamic load balancing scheme is used. In this scheme, the task of computing the pairwise interactions between atoms of all pairs of patches is divided among a number of Compute Objects. These compute objects are assigned at runtime to different processors. The initialization message for each compute object contains the indices of the patches. The patches themselves are distributed across processors. Mapping information of patches to processors is maintained by a replicated object called PatchManager.
{CodeOne}
class compute_object : public Chare {
private:
int count;
Patch *first, *second;
CkChareID chareid;
public:
compute_object(MSG *msg) {
count = 2; MyChareID(&chareid);
PatchManager->Get(msg->first_index,recv_first, &chareid,NOWAIT);
PatchManager->Get(msg->second_index,recv_second, &chareid,NOWAIT);
}
void recv_first(PATCH_MSG *msg) {
first = msg->patch;
filter(first);
if (--count == 0 ) computeInteractions(first,second);
}
void recv_second(PATCH_MSG *msg) {
second = msg->patch;
filter(second);
if (--count == 0) computeInteractions(first,second);
}
}
Figure 4.3 illustrates the Charm++ implementation of the flow of control in compute object as illustrated in figure 4.4. Each compute object requests information about both patches assigned to it from the PatchManager. PatchManager then contacts the appropriate processors and delivers the patch information to the requesting compute object. The compute object, after receiving information about each patch, determines which atoms in a patch do not interact with atoms in other patch since they are apart by more than the cut-off distance. This is done in the filter method. Filtering could be done after both patches arrive. However, in order to increase overlap between computations and communications, we do it immediately after any patch arrives. Since the patches can arrive at the requesting compute object in any order, the compute object has to buffer the arrived patches, and maintain state information using counters or flags. This example has been chosen for simplicity in order to demonstrate the necessity of counters and buffers. In general, a parallel algorithm may have more interactions leading to the use of many counters, flags, and message buffers, which complicates program development significantly.