Handling Transient and Persistent Imbalance Together in Distributed and Shared Memory
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2018
Publication Type: Paper
Repository URL: https://charm.cs.illinois.edu/gerrit/#/admin/projects/papers/201603_NodeLevel
The recent trend of rapidly increasing numbers of cores per chip has resulted in vast amounts of on-node parallelism. These high core counts result in hardware variability that introduces imbalance. Applications are also becoming more complex, resulting in dynamic load imbalance. Load imbalance of any kind can result in loss of performance and decreased system utilization. We address the challenge of handling both transient and persistent load imbalances while maintaining locality and incurring low overhead. In this paper, we propose an integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to mitigate load imbalances within and across shared memory address spaces. It combines an infrequent, periodic assignment of work to cores based on load measurement with user-created tasks to handle load imbalance. We integrate OpenMP with Charm++ to enable creation of potential tasks via OpenMP's parallel loop construct. This integration is not specific to Charm++ and is also available to MPI applications through the Adaptive MPI implementation. We demonstrate the benefits of this integrated runtime system on three different applications. For Lassen, we show improvements of about 29.6% on Cori and 46.5% on Theta. We also demonstrate a 25.7% improvement on Theta for ChaNGa, a Charm++ application, as well as benefits for Kripke, an MPI proxy application, using Adaptive MPI.
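The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation and uses no Charm++ API. The function names greedy_assign and uneven_loop, the greedy heaviest-first placement strategy, and the synthetic per-iteration cost are all assumptions chosen for illustration. The first function stands in for the periodic, measurement-based assignment of work to cores (persistent imbalance). The second uses a standard OpenMP parallel loop with dynamic scheduling so that idle threads can pick up remaining iterations (transient imbalance).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper (not a Charm++ API): assign objects to cores from
// measured loads, placing the heaviest object first on the least-loaded core.
std::vector<int> greedy_assign(const std::vector<double>& load, int ncores) {
    assert(ncores > 0);

    // Object indices sorted by decreasing measured load.
    std::vector<std::size_t> order(load.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return load[a] > load[b]; });

    // Min-heap of (accumulated load, core id): the top is the least-loaded core.
    using Entry = std::pair<double, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> cores;
    for (int c = 0; c < ncores; ++c) cores.push({0.0, c});

    std::vector<int> owner(load.size(), -1);
    for (std::size_t obj : order) {
        Entry e = cores.top();
        cores.pop();
        owner[obj] = e.second;   // place object on the least-loaded core
        e.first += load[obj];    // update that core's accumulated load
        cores.push(e);
    }
    return owner;
}

// Hypothetical kernel: iteration i performs cost[i] units of synthetic work.
// schedule(dynamic) hands out iterations on demand, so threads that finish
// early take on more iterations. Without OpenMP enabled, the pragma is
// ignored and the loop runs serially with the same result.
long long uneven_loop(const std::vector<int>& cost) {
    long long total = 0;
    const long long n = static_cast<long long>(cost.size());
    #pragma omp parallel for schedule(dynamic) reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        long long local = 0;
        for (int k = 0; k < cost[i]; ++k) local += k % 7;
        total += local;
    }
    return total;
}
```

In this sketch, greedy_assign would be called rarely, after loads have been measured, while uneven_loop absorbs short-lived imbalance inside a single piece of work. In the integrated runtime described above, the loop iterations would instead become potential tasks that other cores in the same address space can execute.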
Research Areas