Handling Transient and Persistent Imbalance Together in Distributed and Shared Memory
PPL Technical Report 2016
Publication Type: Paper
Repository URL: http://charm.cs.illinois.edu/newPapers/16-19/techreport.pdf
The rapid increase in the number of cores per chip has resulted in a vast amount of on-node parallelism. Not only is the number of cores per node increasing substantially, but the cores are also becoming heterogeneous. This heterogeneity, together with high variability in the performance of hardware components, introduces load imbalance. Applications are also becoming more complex, resulting in dynamic load imbalance. Load imbalance can degrade performance and reduce system utilization. We address the challenge of handling both transient and persistent load imbalance while maintaining locality and incurring low overhead. In this paper, we propose a new integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to handle load imbalance. It uses infrequent, periodic assignment of work to cores based on measured load, in combination with user-created tasks. We integrate OpenMP with Charm++ so that potential tasks can be created via OpenMP’s parallel loop construct. This approach is not specific to Charm++: it is also available to MPI applications through the Adaptive MPI implementation. We demonstrate the benefit of this integrated runtime system on three applications. We show improvements of 2X for ChaNGa on 128K cores and more than 3X for NAMD on 2K cores. We also show the benefit for an MPI application, Kripke, using Adaptive MPI.
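The combination described above can be illustrated with a small, hypothetical toy model. It is not the paper's runtime, and it does not use Charm++ or OpenMP APIs. For the periodic, measurement-based step, the sketch assumes a simple greedy "heaviest object onto least-loaded core" heuristic (LPT), which is a stand-in and not necessarily the balancer the authors use. For transient imbalance, it assumes that a sudden burst of work on one core comes from a parallel loop split into chunks, and that idle cores can pick up those chunks. All function names and numbers here are illustrative.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_assign(loads: Dict[str, float], num_cores: int) -> Tuple[Dict[str, int], List[float]]:
    """Hypothetical periodic rebalancing step (persistent imbalance).

    Uses the load measured for each object in the previous period and places
    the heaviest objects first, each onto the currently least-loaded core
    (greedy LPT heuristic, assumed here for illustration only).
    """
    heap = [(0.0, core) for core in range(num_cores)]  # (current load, core id)
    heapq.heapify(heap)
    placement: Dict[str, int] = {}
    # Sort by descending load; break ties by name so the result is deterministic.
    for obj, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        core_load, core = heapq.heappop(heap)
        placement[obj] = core
        heapq.heappush(heap, (core_load + load, core))
    core_loads = [0.0] * num_cores
    for core_load, core in heap:
        core_loads[core] = core_load
    return placement, core_loads


def makespan_with_spike(core_loads: List[float], spike_core: int,
                        chunks: List[float], share: bool) -> float:
    """Toy model of a transient spike on one core (transient imbalance).

    The extra work is a loop split into chunks. Without task sharing, every
    chunk stays on spike_core. With sharing, each chunk goes to the core that
    is currently least loaded, mimicking idle cores picking up loop tasks.
    Returns the finishing time of the slowest core.
    """
    loads = list(core_loads)
    for chunk in chunks:
        if share:
            target = min(range(len(loads)), key=lambda i: loads[i])
        else:
            target = spike_core
        loads[target] += chunk
    return max(loads)


if __name__ == "__main__":
    placement, core_loads = greedy_assign({"a": 4.0, "b": 3.0, "c": 3.0, "d": 2.0}, 2)
    print(placement, core_loads)  # {'a': 0, 'b': 1, 'c': 1, 'd': 0} [6.0, 6.0]
    print(makespan_with_spike(core_loads, 0, [1.0] * 4, share=False))  # 10.0
    print(makespan_with_spike(core_loads, 0, [1.0] * 4, share=True))   # 8.0
```

In this toy run, the periodic step balances the measured loads (6 and 6), but it cannot anticipate a spike of four extra units on core 0. Letting the other core execute the spiking loop's chunks lowers the finishing time from 10 to 8. This is the kind of complementary effect the abstract attributes to combining infrequent measurement-based assignment with fine-grained tasks.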