Load Balancing

Our research in load balancing focuses on two primary areas: object migration and seed load balancing.

Object Migration

  • Periodic Load Balancing for bipartite object networks
  • Adaptive use of Workstation Clusters
  • Optimal Object Migration to handle background load variation

A major factor slowing the deployment of parallel programs is that efficient parallel programs are difficult to write. Parallel programming adds a second dimension to programming: not just when a particular operation will be executed, but where, i.e., on which processor. A vast number of parallelizable applications lack the regular structure that makes efficient parallelization straightforward. Such applications require load balancing to perform efficiently in parallel, and their load may also change over time, requiring rebalancing. The programmer is left with the choice of either distributing computation haphazardly, producing poorly performing programs, or spending additional development time writing load-balancing code into the application.

[Figure: A typical object distribution on two processors in a Charm++ program.]

[Figure: The components and interactions in the load balancing framework.]

In recent years, new types of "parallel" computers have appeared. Networks of commodity workstations are making parallel computation available to an expanding group of researchers. Workstation networks present new issues for the application programmer: in addition to application imbalance, a parallel program must contend with background load from other simultaneous users. Parallel programs may also run on clusters of workstations sitting on interactive users' desks, where the primary user permits parallel computation only when the machine is not being used interactively. Finally, computational clusters may expand over time, and with the rapid increase in computational power, new processors are likely to be faster than the older machines they supplement. To maximize throughput, load balancers in parallel applications must account for all of these factors.

Work migration is a unified scheme for handling both application-induced and externally arising load imbalance. The difficulty with migrating work is that either the work is repartitioned in an application-specific way, placing the burden on the application programmer, or automatic migration is supported but with poor accuracy, because the system lacks application-specific knowledge.

Object migration provides a way of performing accurate, fine-grained automatic load balancing. Objects usually operate on small, well-defined regions of memory, which keeps the cost of migration low. Using the Charm++ object model, the run-time system measures the work represented by particular objects rather than deriving execution time from application-specific heuristics. Furthermore, the run-time system records object-to-object communication patterns, so the load balancer can assess the communication impact of migrating particular objects.
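
As a rough illustration of how an application exposes its objects to this framework, the sketch below shows a migratable Charm++ array element that opts into measurement-based load balancing. The class name, its data, and the worker.decl.h module are hypothetical, and a matching .ci interface file is assumed to declare the array and its entry methods; this is a sketch of the usual AtSync/PUP conventions, not code from any particular application.

    // Hypothetical migratable array element opting into measurement-based load balancing.
    #include <vector>
    #include "pup_stl.h"        // PUP support for STL containers
    #include "worker.decl.h"    // generated from an assumed .ci file declaring Worker

    class Worker : public CBase_Worker {
      std::vector<double> state;        // per-object data, packed when the object migrates

    public:
      Worker() {
        usesAtSync = true;              // participate in AtSync load balancing
      }
      Worker(CkMigrateMessage *m) {}    // migration constructor

      void pup(PUP::er &p) {            // serialize state so the runtime can move this object
        p | state;
      }

      void doStep() {
        // ... application work; the runtime records its execution time and the
        // messages this object exchanges with other objects ...
        AtSync();                       // hand control to the load balancer at this sync point
      }

      void ResumeFromSync() {           // invoked once rebalancing (and any migration) is done
        thisProxy[thisIndex].doStep();  // continue with the next step
      }
    };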

With the advent of massively parallel machines like Blue Gene/L and Cray XT3, our recent work has focused on topology-aware migration of objects. We have developed strategies that take into account both message sizes and network hop distances to minimize the total communication traffic on the network.

Communication latency is a significant factor in the performance of parallel applications on these large machines. The latency arises primarily from network contention in the grid and torus networks usually used in such machines. Our load balancing strategies reduce its impact by heuristically minimizing the number of hops traveled by each communicated byte. They are not network-specific and work for all classes of interconnection networks.
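
The quantity such strategies try to minimize is often described as hop-bytes: the bytes exchanged by each communicating pair of objects weighted by the hop distance between the processors they are mapped to. The following small, self-contained sketch (not the actual strategy code; the torus dimensions, mapping, and traffic are made up) shows how that metric can be evaluated for a given object-to-processor mapping on a 3D torus.

    // Evaluate the hop-bytes of an object-to-processor mapping on a 3D torus.
    #include <algorithm>
    #include <array>
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    struct Edge { int src, dst; long bytes; };   // communication between two objects

    // Manhattan distance on a 3D torus, accounting for wraparound links.
    int torusHops(const std::array<int,3>& a, const std::array<int,3>& b,
                  const std::array<int,3>& dims) {
      int hops = 0;
      for (int d = 0; d < 3; ++d) {
        int diff = std::abs(a[d] - b[d]);
        hops += std::min(diff, dims[d] - diff);
      }
      return hops;
    }

    // Total hop-bytes of a mapping: objMap[i] gives the torus coordinates of
    // the processor that object i is placed on.
    long hopBytes(const std::vector<Edge>& comm,
                  const std::vector<std::array<int,3>>& objMap,
                  const std::array<int,3>& dims) {
      long total = 0;
      for (const Edge& e : comm)
        total += e.bytes * torusHops(objMap[e.src], objMap[e.dst], dims);
      return total;
    }

    int main() {
      std::array<int,3> dims = {8, 8, 8};                            // an 8x8x8 torus
      std::vector<std::array<int,3>> objMap = {{0,0,0}, {7,0,0}, {4,4,4}};
      std::vector<Edge> comm = {{0, 1, 1 << 20}, {1, 2, 1 << 18}};   // bytes per edge
      std::printf("hop-bytes = %ld\n", hopBytes(comm, objMap, dims));
      return 0;
    }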

Seed Load Balancing

Seed load balancing involves moving object-creation messages, or "seeds", to balance work across a set of processors. Several variations of this strategy are being analyzed. In particular, we distinguish between global strategies, which may involve communication among all processors to exchange load information, and neighborhood strategies, which typically impose a dense graph organization on the processors and restrict communication to neighbors only. Some strategies use averaging of loads to determine how seeds should be distributed, while others are receiver-initiated: a processor requests work from elsewhere when it is about to go idle. A strategy that places seeds randomly when they are created, and never moves them thereafter, is used as a baseline for comparison on numerous benchmarks.
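
The toy program below (a simplified illustration, not the actual Charm++ seed balancer; all numbers are made up) conveys the flavor of a neighborhood averaging strategy: processors on a ring repeatedly compare loads with a neighbor and ship half of the difference, so seeds created on a single processor gradually spread across the machine.

    // Toy neighborhood-averaging seed distribution on a ring of processors.
    #include <cstdio>
    #include <vector>

    int main() {
      const int P = 8;                     // number of processors
      std::vector<int> seeds(P, 0);
      seeds[0] = 64;                       // all seeds initially created on PE 0

      // Repeated neighbor exchanges: each processor compares its load with its
      // right-hand neighbor and ships half of the difference.
      for (int sweep = 0; sweep < 32; ++sweep) {
        for (int p = 0; p < P; ++p) {
          int right = (p + 1) % P;
          int move = (seeds[p] - seeds[right]) / 2;   // half of the imbalance
          seeds[p]     -= move;
          seeds[right] += move;
        }
      }
      for (int p = 0; p < P; ++p)
        std::printf("PE %d: %d seeds\n", p, seeds[p]);
      return 0;
    }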
 

Papers
  • 11-07    Aaron Becker, Gengbin Zheng and Laxmikant Kale,  Distributed Memory Load Balancing,  Encyclopedia of Parallel Computing, David Padua, Ed., 2011 (to appear)
  • 10-26    Eduardo R. Rodrigues, Philippe O. A. Navaux, Jairo Panetta, Alvaro Fazenda, Celso L. Mendes and Laxmikant V. Kale,  A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model,  Proceedings of 22nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'2010)
  • 10-20    Gengbin Zheng, Abhinav Bhatele, Esteban Meneses and Laxmikant V. Kale,  Periodic Hierarchical Load Balancing for Large Supercomputers,  accepted for publication in International Journal for High Performance Computing Applications (IJHPCA), 2010
  • 10-17    Abhinav Bhatele,  Automating Topology Aware Mapping for Supercomputers,  PhD Thesis, Department of Computer Science, University of Illinois, 2010
    http://hdl.handle.net/2142/16578
  • 10-08    Gengbin Zheng, Esteban Meneses, Abhinav Bhatele and Laxmikant V. Kale,  Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers,  Proceedings of the International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), 2010
  • 09-02    Abhinav Bhatele, Laxmikant V. Kale and Sameer Kumar,  Dynamic Topology Aware Load Balancing Algorithms for Molecular Dynamics Applications,  Proceedings of 23rd ACM International Conference on Supercomputing (ICS), 2009.
  • 07-12    Abhinav Bhatele,  Application-specific Topology-aware Mapping and Load Balancing for three-dimensional Torus Topologies,  Master's Thesis, Department of Computer Science, University of Illinois, 2007
  • 07-01    Gregory A. Koenig and Laxmikant V. Kale,  Optimizing Distributed Application Performance Using Dynamic Grid Topology-Aware Load Balancing,  Proceedings of 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach California USA, March 2007.
  • 06-05    Gengbin Zheng, Orion Sky Lawlor, Laxmikant V. Kale,  Multiple Flows of Control in Migratable Parallel Programs,  The 8th Workshop on High Performance Scientific and Engineering Computing (HPSEC-06)
  • 05-18    Tarun Agarwal, Amit Sharma, Laxmikant V. Kale,  Topology-aware task mapping for reducing communication contention on large parallel machines,  Proceedings of IEEE International Parallel and Distributed Processing Symposium 2006
  • 05-06    Gengbin Zheng,  Achieving High Performance on Extremely Large Parallel Machines: Performance Prediction and Load Balancing,  Ph.D. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 2005
  • 99-06    Shobana Radhakrishnan, Robert Brunner, and Laxmikant Kale,  Branch and Bound Based Load Balancing for Parallel Applications,  Lecture Notes in Computer Science, Volume 1732, 1999.
  • 99-03    Robert K. Brunner and Laxmikant V. Kale,  Handling Application-Induced Load Imbalance using Parallel Objects,  Parallel and Distributed Computing for Symbolic and Irregular Applications, pp. 167-181
  • 99-02    Robert K. Brunner and Laxmikant V. Kale,  Adapting to Load on Workstation Clusters,  The Seventh Symposium on the Frontiers of Massively Parallel Computation
  • 98-02    L. V. Kale, Milind Bhandarkar and Robert Brunner,  Load Balancing in Parallel Molecular Dynamics,  Fifth International Symposium on Solving Irregularly Structured Problems in Parallel
  • 96-07    Sanjeev Krishnan and L. V. Kale,  Automating Runtime Optimizations for Load Balancing in Irregular Problems,  Proceedings of the Conference on Parallel and Distributed Processing Technology and Applications, San Jose, August 1996.
  • 93-13    Amitabh B. Sinha and Laxmikant V. Kale,  A Load Balancing Strategy For Prioritized Execution of Tasks,  International Symposium on Parallel Processing, Newport Beach, CA, April 1993.
  • 92-05    Amitabh B. Sinha and Laxmikant V. Kale,  A Load Balancing Strategy For Prioritized Execution of Tasks,  Workshop on Dynamic Object Placement and Load Balancing, in co-operation with ECOOP's 92, April 1992. Utrecht, The Netherlands.
  • 89-08    W. W. Shu and L. V. Kale,  A Dynamic Load Balancing Strategy for the Chare Kernel System,  Supercomputing '89, November 1989, pp. 389-398

This page is maintained by Gengbin Zheng and Abhinav Bhatele.