Developing parallel Computational Science and Engineering (CSE) applications is a complex task. One has to implement the right physics, develop or choose and then code appropriate numerical methods, decide on and implement suitable input and output data formats, perform visualizations, and attend to the correctness and efficiency of the programs. The task becomes even more complex for coupled multi-physics simulations such as the solid propellant rocket simulation application. Our philosophy is to lessen the burden on application developers by providing advanced programming paradigms and versatile runtime systems that handle many common performance concerns automatically, letting application programmers focus on the actual application content.
One such concern is load imbalance. In a dynamic application such as rocket simulation, burning solid fuel, sub-scaling for a certain part of the mesh, crack propagation, and particle flows all contribute to load imbalance. A centralized load balancing strategy built into the application is impractical because the individual modules are developed almost independently by different developers. Runtime system support for load balancing therefore becomes even more critical.
Automatic load balancing is infeasible for a program about which nothing is known. Other approaches to automatic load balancing therefore either require the application to provide hints about its load to the runtime system, or restrict load balancing to certain kinds of algorithms, such as Adaptive Mesh Refinement, or to certain architectures, such as shared memory machines. Our approach is instead based on actual measurement of load information at runtime, and on migrating computations from heavily loaded to lightly loaded processors.
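The measure-then-migrate idea can be illustrated with a minimal sketch. It is not the CHARM++ implementation or any of its strategies: the names `processor_loads` and `refine`, the dictionary-based placement, and the "move the object closest to half the gap" rule are all assumptions made for illustration. The only input is a table of per-object times that is assumed to have been measured at runtime.

```python
from collections import defaultdict


def processor_loads(placement, measured):
    """Sum the measured per-object times for each processor."""
    loads = defaultdict(float)
    for obj, proc in placement.items():
        loads[proc] += measured[obj]
    return loads


def refine(placement, measured, num_procs, max_moves=100):
    """Hypothetical refinement step: repeatedly migrate one object from the
    most loaded processor to the least loaded one, using measured loads only."""
    placement = dict(placement)  # leave the caller's mapping untouched
    for _ in range(max_moves):
        loads = processor_loads(placement, measured)
        heavy = max(range(num_procs), key=lambda p: loads[p])
        light = min(range(num_procs), key=lambda p: loads[p])
        gap = loads[heavy] - loads[light]
        # Only objects lighter than the gap leave both processors below the
        # old maximum after the move.
        candidates = [o for o, p in placement.items()
                      if p == heavy and 0 < measured[o] < gap]
        if not candidates:
            break
        # Assumption: prefer the object whose load is closest to half the gap.
        best = min(candidates, key=lambda o: abs(measured[o] - gap / 2))
        placement[best] = light
    return placement
```

For example, with measured times `{"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}` and objects a, b, c initially on processor 0 and d on processor 1, a single migration of `a` brings both processors to a load of 5.0.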
For this approach to be effective, the computation must be split into many more pieces than there are available processors. This allows us to flexibly map and re-map these computational pieces to the available processors. This approach is usually called ``multi-domain decomposition''.
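Why the number of pieces matters can be seen in a small sketch. It assumes a simple greedy mapping (heaviest piece first, always onto the least loaded processor) chosen here purely for illustration; the function names `greedy_map` and `imbalance` and the example loads are hypothetical and do not come from the work described.

```python
import heapq


def greedy_map(piece_loads, num_procs):
    """Assign pieces to processors, heaviest first, each to the processor
    that is least loaded at that moment. Returns (mapping, per-proc totals)."""
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    mapping = {}
    order = sorted(range(len(piece_loads)), key=lambda i: -piece_loads[i])
    for piece in order:
        load, proc = heapq.heappop(heap)
        mapping[piece] = proc
        heapq.heappush(heap, (load + piece_loads[piece], proc))
    totals = [0.0] * num_procs
    for piece, proc in mapping.items():
        totals[proc] += piece_loads[piece]
    return mapping, totals


def imbalance(totals):
    """Ratio of the maximum processor load to the average (1.0 is perfect)."""
    return max(totals) / (sum(totals) / len(totals))


# Same total work (8.0 units) on 2 processors, split two different ways.
coarse = [6.0, 2.0]              # one piece per processor
fine = [1.5] * 4 + [0.5] * 4     # four pieces per processor

_, coarse_totals = greedy_map(coarse, 2)
_, fine_totals = greedy_map(fine, 2)
```

With one piece per processor, no mapping can do better than loads of 6.0 and 2.0 (imbalance 1.5). With the finer split, the same greedy rule reaches 4.0 on each processor (imbalance 1.0), which is the flexibility that over-decomposition is meant to provide.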
CHARM++, which we use as the runtime system layer for the work described here, exemplifies our approach. It embeds an elaborate performance tracing mechanism, a suite of plug-in load balancing strategies, and infrastructure for defining and migrating computational load, and it is interoperable with other programming paradigms.
June 29, 2008