next up previous contents
Next: Motivating Example Up: Charisma: A Component Architecture Previous: Port Examples   Contents


Intra-component Coordination

Converse provides constructs for attaching code blocks to availability of specific messages. These blocks are scheduled for execution by the run-time system when the specified messages arrive. This scheme allows coexistence of multiple components within a single operating system process, while minimizing the performance impact of communication latency. Converse schedules a ready component for execution while other components are waiting for data. In an object-oriented message-driven languages, such as Charm++, these blocks correspond to methods of parallel objects, called entry-methods. In Charisma, computations are attached to availability of data at the input ports. For Charisma components written using Charm++, input ports are mapped to Charm++ entry methods, so that asynchronous invocation of Charm++ objects' entry methods is equivalent to making data available on input ports by emitting data to the connected output ports.

Thus a Charisma component is a flat collection of computations attached to a set of access points as shown in figure 4.1. However, real world components may have complex control-flow structure as shown in figure 4.2. Individual computations not only depend on the availability of data for them but also upon the completion of previous computations. Further, components that implement iterative parallel computations may have loops in their control-flow structures, which cannot be clearly expressed in components written in the style of figure 4.1.

Figure 4.1: Charisma components lack explicit control-flow specification.
\includegraphics[width=4in]{figures/flat}

Figure 4.2: Real world components may have complex control flow structures. In this diagram, solid circles denote computations that are internal to the component. Dashed circles denote external computations. Solid lines are dependencies internal to the component, whereas dashed lines denote external dependencies among computations.
\includegraphics[width=5in]{figures/controlflow}

In addition, The order of execution of computations is determined by the order of data (bundled in Converse messages) available at the input ports and by priorities associated with these ports. Due to unpredictable delays in remote response times, the messages may arrive in any order and the programmer must deal with all possible message orderings. This increases the programming complexity significantly because the component code has to handle all possible message orderings, by buffering unexpected messages for later delivery, and by writing bookkeeping code to check and fetch from these buffers upon completion of sub-tasks. We call this code ``component coordination''.

The intra-component control flow can be expressed by issuing blocking receives for tagged messages in multithreaded components. This method is appropriate for explicit message-passing components that block for receiving specific tagged messages. In thread-based coordination, a blocking receive blocks only the calling thread, and not entire processor. An example of such orchestrator is the sequencer subroutine in NAMD that coordinates the actions of a patch object. For message-passing components that use single threaded messaging libraries such as MPI, a user-level thread is an alien concept. Thus exposing explicit thread yielding to the component developer will need major changes to made to the existing MPI-based components. However, implementation of blocking calls such as MPI_Recv or collective communication operations can internally suspend a thread and yield to other running threads on a processor. In the next chapter, we describe Adaptive MPI, our implementation of MPI over Converse that simplifies component coordination.

While threads simplify coordination of message-driven components, there are several disadvantages of threads. Threads have to pay a performance penalty in terms of the context-switching costs. Implicit references to threads' stacks make thread migration complicated. Also, it is difficult to precisely estimate the stacksize requirements for threads, leading to wastage of memory. At thread creation time, one has to reserve the maximum stack space that may be used by the thread in its lifetime. For example, in NAMD, the estimated stack size for sequencer threads was 128 KB each. For a molecular system of 37000 atoms divided into 343 patches, NAMD would consume about 65 MB of memory, causing it to exhaust all available memory on some machines.

Our method for simplifying component coordination avoids use of threads. Instead, our approach provides a language for expressing control-dependence graphs for components, and generating the coordination glue code from these graphs. Dagger [28] allows easy expression of such dependence graphs by introducing when-blocks and condition variables in a message-driven component. A when-block specifies dependences as a list of messages and condition variables with their associated reference numbers. A Dagger program tells the run-time system that it is at a stage to process a message by issuing an expect statement. A condition variable is used to signal the end of a when-block with a ready statement. Thus control-dependences among when-blocks belonging to the same message-driven object can be expressed using condition variables. However, the structure of Dagger programs does not clearly reflect the control flow within an object because a Dagger program is a flat collection of when-blocks.

We have developed a coordination language called Structured Dagger (Structured Dagger [39]) for this purpose. This language provides structured constructs that adequately express most commonly occurring control flow graphs viz. the series-parallel graphs.

The next section motivates simplification of component coordination with an example.



Subsections
next up previous contents
Next: Motivating Example Up: Charisma: A Component Architecture Previous: Port Examples   Contents
Milind Bhandarkar 2002-06-12