Periodic Hierarchical Load Balancing for Large Supercomputers
International Journal for High Performance Computing Applications (IJHPCA) 2010
Publication Type: Paper
Repository URL: 201006_HierLdbIJHPCA
Large parallel machines with hundreds of thousands of processors are being built. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to yield poor load balance on very large machines. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and the poor solutions of traditional distributed schemes. This is done by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We present techniques to deal with the scalability challenges of load balancing at very large scale. We show performance data for the hierarchical load balancing method on up to 16,384 cores of the Ranger cluster (at TACC) and 65,536 cores of a Blue Gene/P at Argonne National Laboratory (ANL) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on the Blue Gene/P machine at ANL.
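The idea of load balancing domains forming a tree can be illustrated with a minimal Python sketch. This is not the Charm++ implementation: the function name `build_domain_tree`, the fixed `group_size`, and the choice of each domain's first member as its leader are all assumptions made for illustration only.

```python
def build_domain_tree(num_procs, group_size):
    """Group processors into a tree of load balancing domains.

    Hypothetical illustration: level 0 splits processor ids into domains
    of at most `group_size` members; each higher level groups the leaders
    (here, simply the first member) of the domains below, until a single
    root domain remains.
    """
    if num_procs < 1 or group_size < 2:
        raise ValueError("need num_procs >= 1 and group_size >= 2")
    levels = []
    members = list(range(num_procs))
    while True:
        # Split the current members into consecutive domains.
        domains = [members[i:i + group_size]
                   for i in range(0, len(members), group_size)]
        levels.append(domains)
        if len(domains) == 1:
            break  # reached the root domain
        # Leaders of this level become the members of the next level.
        members = [domain[0] for domain in domains]
    return levels


if __name__ == "__main__":
    tree = build_domain_tree(16, 4)
    for depth, domains in enumerate(tree):
        print(depth, domains)
```

In this sketch, no domain ever holds more than `group_size` members, so a balancing decision made inside one domain never needs load data from every processor at once. With an assumed group size of 64, for example, 65,536 processors yield a three-level tree.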
Gengbin Zheng, Abhinav Bhatele, Esteban Meneses and Laxmikant V. Kale, "Periodic Hierarchical Load Balancing for Large Supercomputers", accepted for publication in International Journal for High Performance Computing Applications (IJHPCA), 2010
Research Areas