Structured Dagger is implemented on top of Charm++ as a translator and a run-time library. The Structured Dagger translator transforms the program to an equivalent Charm++ program. The translation consists of splitting a structured entry-method into a number of Charm++ entry-methods and private methods, inserting counters and flags to specify dependences between different component constructs of the structured entry-method. For each construct, the translator generates object-private methods for enabling the construct and for the completion of the construct. A construct is enabled when all its predecessors in the program order have been satisfied, and it is completed when all its component constructs have finished.
The runtime library maintains a collection of message queues and a list of pending when-blocks within every object that contains structured entries. The message queues are indexed by the entry-method numbers. Whenever any when-block is enabled, it checks if the messages intended for its component entries have already arrived in the message queue. If all of these are available, it enables its constituent constructs and if possible executes them. In particular, atomic constructs do not have dependence on message arrival, therefore the code generated for when-block executes code in those constructs. When a message directed at an entry method arrives, the generated code for that entry method inserts the message in an appropriate queue. It then checks to see if any when-blocks have been waiting for that entry to be triggered. If a when-block is waiting and has all its dependences satisfied with the arrival of this message, its component constructs are enabled, and if possible, executed. If this message has arrived out of order (i.e. when no when-block was waiting for it), the entry method buffers the message and sets appropriate flags indicating its availability. By doing a careful analysis of the dependence structure, the translator avoids periodic checking for all enabled when-blocks and only when-blocks that may be waiting for a particular entry-method are enabled.
For assessing the performance impact of our translation scheme, we ran two simple programs on a single node of SGI Origin2000. The first program created two objects (threads), supplying one of them with a seed message, which then started sending messages with reference numbers in the reverse order as that expected by the other object (thread). The other object (thread), upon receiving a message simply bounced it back to the first object (thread). The second program also created two objects (threads), but supplied the first object (thread) with a random order of reference numbers. The first object (thread) sent messages to the second object (thread) with reference numbers in the supplied order, while the other object (thread) received them in sequential order.
We compared the performance of our Structured Dagger programs with Charm++ programs and also with multi-threaded programs written using thread-objects in Converse. The results for 10000 round-trip messages (each of size 4 bytes) are in Table 1. As can be seen from these results, Structured Dagger program does not add significant overhead to the native Charm++ code while reducing the program complexity. The cost of context-switching in a multi-threaded program is significant when compared with Structured Dagger and Charm++ in the absence of any computation between the coordination steps.
|
Structured Dagger eliminates stack-size estimation required for multi-threaded components. Stack-size estimation is a non-trivial task for most real world applications, especially when system library functions are invoked. One has to conservatively estimate the stack size in order to avoid runtime errors of stack overflow. This leads to wastage of memory space. Structured Dagger methods are invoked from the message-driven scheduler, and thus execute on the default process stack, which is typically allocated in chunks by the operating system on demand, and thus does not require conservative estimation, thus leading to reduced memory requirements.
In order to make it possible for NAMD to simulate large molecular systems on machines with limited resources, we implemented the sequencer as a message-driven object using Structured Dagger. This new version of NAMD reduced the memory requirement significantly (consuming about 3.7 MB instead of 65 MB required for the threaded version, and it was possible to use NAMD on machines with limited memory without any performance degradation.