Optimizing a Parallel Runtime System for Multicore Clusters: A Case Study
Publication Type: Paper
Repository URL: 201003_CharmSMPOptimization
Clusters of multicore nodes have become the most popular option for new HPC systems due to their scalability and performance/cost ratio. The complexity of programming multicore systems underscores the need for powerful and efficient runtime systems that manage resources such as threads and communication subsystems on behalf of applications. In this paper, we study several multicore performance issues on clusters using Intel, AMD, and IBM processors in the context of the Charm++ runtime system. We then present optimization techniques that overcome these issues; the techniques are general enough to apply to other runtime systems as well. We demonstrate the benefits of these optimizations on several popular multicore platforms through both synthetic benchmarks and production-quality applications, including NAMD and ChaNGa, whose performance improves by about 20% and 10%, respectively.
Chao Mei, Gengbin Zheng, Filippo Gioachin, and Laxmikant V. Kale, "Optimizing a Parallel Runtime System for Multicore Clusters: A Case Study", to appear in Proceedings of TeraGrid '10