Large scientific applications such as coupled simulations are composed from independently written MPI modules. AMPI makes it easier to combine such modules, because each module runs inside its own thread group, with its own MPI_COMM_WORLD. Encapsulating global data by converting it into thread-private data, together with the namespace separation provided by separate communicators, allows these MPI modules to co-exist within a single application. The modules execute concurrently, so idle time in one module is overlapped with useful work in others.
In order to allow these modules to interact with each other, AMPI introduces
the notion of cross-communicator point-to-point communication. Thus, any
virtual processor in one module may send messages to and receive messages from
other virtual processors in other modules using the same syntax and semantics
as the MPI point-to-point communication subroutines. A concrete example may
make this clearer. Suppose an application consists of two modules, A and B, as
shown in figure 5.10. Each of these modules must first be
registered, so that AMPI knows how to invoke it
and can allocate a communicator for it. This is done by providing a
subroutine called MPI_Setup, as shown in figure 5.11.
The AMPI registration call MPI_Register returns an index for the
module, which can be used to look up the ``world'' communicator
(MPI_COMM_WORLD) for that module. These communicators are stored in
an indexed communicator array MPI_COMM_UNIVERSE. Thus if a virtual
processor from module A needs to communicate with virtual processor 14 in
module B, it can send a message to it using an MPI call such as
MPI_Send, but specifying MPI_COMM_UNIVERSE[B_Idx] as the
communicator.
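subroutine MPI_Setup
  ! Register each module's entry point; the returned index selects
  ! that module's communicator in MPI_COMM_UNIVERSE.
  A_Idx = MPI_Register("A", A_Main)
  B_Idx = MPI_Register("B", B_Main)
end subroutine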
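The indexing scheme described above can be illustrated with a small in-memory model. This is only a sketch in Python, not AMPI code: the class `Universe`, its methods, the module sizes (8 and 16), and the sending rank (3) are all hypothetical names and values chosen for illustration. It shows registration returning an index, and a sender addressing a rank in another module by selecting that module's entry in the universe array.

```python
from collections import defaultdict, deque


class Universe:
    """Toy stand-in for MPI_COMM_UNIVERSE: an indexed list of module worlds."""

    def __init__(self):
        self.worlds = []

    def register(self, name, size):
        """Plays the role of MPI_Register: returns the new module's index."""
        self.worlds.append(
            {"name": name, "size": size, "mailboxes": defaultdict(deque)}
        )
        return len(self.worlds) - 1

    def send(self, payload, dest_rank, comm_idx, source):
        """Deliver payload to dest_rank of the world selected by comm_idx."""
        world = self.worlds[comm_idx]
        if not 0 <= dest_rank < world["size"]:
            raise ValueError("rank out of range for module " + world["name"])
        world["mailboxes"][dest_rank].append((source, payload))

    def recv(self, rank, comm_idx):
        """Return the oldest (source, payload) pair queued for rank."""
        return self.worlds[comm_idx]["mailboxes"][rank].popleft()


universe = Universe()
a_idx = universe.register("A", size=8)    # hypothetical module sizes
b_idx = universe.register("B", size=16)

# Rank 3 of module A sends to rank 14 of module B via B's world communicator.
universe.send("boundary data", dest_rank=14, comm_idx=b_idx, source=(a_idx, 3))
source, payload = universe.recv(rank=14, comm_idx=b_idx)
```

In this model the receive fails if no message is queued, whereas a real point-to-point receive would block until one arrives.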
While this technique is suitable for components that are designed to be complementary only to each other, it does not result in truly reusable components, since each component must possess explicit knowledge of the other components' decompositions. For example, in figure 5.10, virtual processor a of module A has to know the rank of virtual processor b of module B. If the decomposition of module B changes, say by splitting each original chunk into more chunks, module A's code will have to change to reflect that. This dependence on other modules is contrary to the Charisma philosophy. However, a small modification to the way components are registered yields Charisma-style interfaces for AMPI.
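The coupling problem can be made concrete with a few lines of arithmetic. The sketch below assumes a one-dimensional, equal-sized block decomposition; the element count (1600), the interface element (1450), and the rank counts (16 and 32) are hypothetical values, not taken from any actual application. It shows that the destination rank module A must name depends entirely on how module B is decomposed.

```python
def owner_rank(element, num_elements, num_ranks):
    """Rank that owns `element` under an equal-sized block decomposition."""
    block_size = num_elements // num_ranks
    return element // block_size


NUM_ELEMENTS = 1600          # hypothetical size of module B's domain
INTERFACE_ELEMENT = 1450     # element of B that module A must reach

# Module B decomposed over 16 virtual processors.
old_dest = owner_rank(INTERFACE_ELEMENT, NUM_ELEMENTS, 16)

# Module B splits every chunk in two, giving 32 virtual processors.
new_dest = owner_rank(INTERFACE_ELEMENT, NUM_ELEMENTS, 32)

print(old_dest, new_dest)
```

Any destination rank hard-coded in module A for the first decomposition is wrong for the second, even though module A itself has not changed.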
In the Charisma model, AMPI components register themselves with the runtime system and, as before, each gets a unique MPI_COMM_WORLD. However, each component also gets two additional communicators, on which it can register its input and output ports. These communicators are called MPI_COMM_INPUT and MPI_COMM_OUTPUT. At the registration stage, each component specifies its input and output ports to the runtime system using the call MPI_ADD_PORT. The parameters to this call are the name of the port and the MPI data type it expects. The call returns the port index, which can later be used to publish data in the case of output ports, or to wait for data in the case of input ports. After all components have been registered, the application composer specifies the connections between ports using Charisma port-binding calls. For an application composer written using AMPI, an MPI_BIND call is made available for this purpose.
When an AMPI component needs to publish data on an output port, it sends a message using MPI_Send to the appropriate ``pseudo-processor'' (the port index returned by MPI_ADD_PORT) in the communicator MPI_COMM_OUTPUT. Similarly, when it requires data on an input port, it calls MPI_Recv on the appropriate port index with the communicator MPI_COMM_INPUT. Since the connections between components are made outside of the components in this model, any AMPI component can be developed completely independently, without knowing about other components.
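The port mechanism can be sketched as a small in-memory model. This is an illustration in Python under stated assumptions, not the AMPI interface: the class `Component`, the function `bind`, the component names `fluid` and `solid`, and the port names are hypothetical, and the single MPI_ADD_PORT call is split here into separate input and output helpers for clarity. The sketch shows ports registered by name and data type, a composer-side binding made outside both components, and data published on an output port arriving at the bound input port.

```python
from collections import deque


class Component:
    """Toy component with separate input and output port tables."""

    def __init__(self, name):
        self.name = name
        self.inputs = []     # position in list = input port index
        self.outputs = []    # position in list = output port index

    def add_input_port(self, name, dtype):
        """Register an input port and return its index."""
        self.inputs.append({"name": name, "dtype": dtype, "queue": deque()})
        return len(self.inputs) - 1

    def add_output_port(self, name, dtype):
        """Register an output port and return its index."""
        self.outputs.append({"name": name, "dtype": dtype, "target": None})
        return len(self.outputs) - 1

    def publish(self, port, value):
        """Stands in for MPI_Send to pseudo-processor `port` on the output side."""
        target = self.outputs[port]["target"]
        if target is None:
            raise RuntimeError("output port is not bound")
        consumer, in_port = target
        consumer.inputs[in_port]["queue"].append(value)

    def receive(self, port):
        """Stands in for MPI_Recv from pseudo-processor `port` on the input side."""
        return self.inputs[port]["queue"].popleft()


def bind(producer, out_port, consumer, in_port):
    """Composer-side connection of an output port to an input port."""
    out, inp = producer.outputs[out_port], consumer.inputs[in_port]
    if out["dtype"] != inp["dtype"]:
        raise TypeError("port data types do not match")
    out["target"] = (consumer, in_port)


# Each component registers its ports knowing nothing about the other.
fluid = Component("fluid")
solid = Component("solid")
pressure_out = fluid.add_output_port("pressure", "MPI_DOUBLE_PRECISION")
load_in = solid.add_input_port("load", "MPI_DOUBLE_PRECISION")

# The composer connects the ports from outside both components.
bind(fluid, pressure_out, solid, load_in)

fluid.publish(pressure_out, [101.3, 99.8])
received = solid.receive(load_in)
```

Neither component refers to the other by name or rank; only the `bind` call, made by the composer, knows both ends. Changing the decomposition of one component would therefore leave the other's code untouched in this model.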
With colleagues at the Center for Simulation of Advanced Rockets (CSAR), we have converted some large MPI applications using this approach. The techniques used, efforts involved, and preliminary performance data are given in the next section.