Optimizing Distributed Application Performance Using Dynamic Grid Topology-Aware Load Balancing
Authors:
Gregory A. Koenig and Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University
of Illinois at Urbana-Champaign
Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, California, USA, March 2007.
Grid computing offers a model for solving large-scale scientific
problems by uniting computational resources owned by multiple
organizations to form a single cohesive resource for the duration of
individual jobs. Despite the appeal of using Grid computing to solve
large problems, its use has been hindered by the challenges involved
in developing applications that can run efficiently in Grid
environments. One substantial obstacle to deploying Grid applications
across geographically distributed resources is cross-site latency.
Certain classes of applications, such as master-slave or
functional-decomposition applications, lend themselves well to
running in Grid environments because they inherently tolerate latency.
Other classes, such as tightly-coupled applications in which
each processor regularly communicates with its neighboring processors,
are significantly harder to deploy on Grids.
In this paper, we present a dynamic load balancing technique for Grid
applications based on graph partitioning. This technique exploits
knowledge of the topology of the Grid environment to partition the
computation's communication graph in such a way as to reduce the
volume of cross-site communication, thus improving the performance of
tightly-coupled applications that are co-allocated across distributed
resources. Our technique is particularly well suited to codes from
disciplines like molecular dynamics or cosmology due to the
non-uniform structure of communication in these types of applications.
We evaluate the effectiveness of our technique when used to optimize
the execution of a tightly-coupled classical molecular dynamics code
called LeanMD deployed in a Grid environment.
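
As a rough illustration of partitioning a communication graph to reduce cross-site traffic, the following minimal Python sketch may help. It is not the load balancer evaluated in the paper: it assumes a two-site Grid, a hypothetical communication graph `comm` that maps pairs of objects to bytes exchanged, and a hypothetical placement `site_of`. A greedy pairwise-swap refinement stands in for a real graph partitioner, and all names are illustrative.

```python
from itertools import combinations

def cross_site_volume(comm, site_of):
    """Sum the bytes on edges whose two endpoints are placed on different sites."""
    return sum(w for (a, b), w in comm.items() if site_of[a] != site_of[b])

def refine_by_swaps(comm, site_of):
    """Greedily swap pairs of objects on different sites while that lowers
    the cross-site volume. Swaps leave the object count per site unchanged."""
    site_of = dict(site_of)  # work on a copy
    best = cross_site_volume(comm, site_of)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(site_of), 2):
            if site_of[a] == site_of[b]:
                continue
            site_of[a], site_of[b] = site_of[b], site_of[a]
            cost = cross_site_volume(comm, site_of)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo a swap that did not help.
                site_of[a], site_of[b] = site_of[b], site_of[a]
    return site_of

# Hypothetical example: four objects, two sites (0 and 1).
# Heavy edges (0,1) and (2,3) initially cross the wide-area link.
comm = {(0, 1): 10, (2, 3): 10, (0, 2): 1, (1, 3): 1}
placement = {0: 0, 1: 1, 2: 0, 3: 1}

initial = cross_site_volume(comm, placement)
refined = refine_by_swaps(comm, placement)
print(initial, cross_site_volume(comm, refined))
```

In this toy setup the refinement moves the heavily communicating pairs onto the same site, so only the light edges remain on the cross-site link. Because every move is a swap, each site keeps the same number of objects, which preserves load balance only under the simplifying assumption that all objects carry equal computational load.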