ACM SRC: Fast Profiling-based Performance Modeling of Distributed GPU Applications
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2019
Publication Type: Poster
Repository URL: https://sc19.supercomputing.org/presentation/?id=spostg126&sess=sess240
An increasing number of applications use GPUs to accelerate computation, with MPI responsible for communication in distributed environments. Existing performance models focus on either GPU kernels or MPI communication alone; the few that model the entire application are often specialized to a single application and require extensive input from the programmer. To quickly model different types of distributed GPU applications, we propose a profiling-based methodology for creating performance models. We build upon the roofline performance model for GPU kernels and analytical models for MPI communication, with a significant reduction in profiling time. We also develop a benchmark to model the 3D halo exchange that occurs in many scientific applications. Our proposed model for the main iteration loops of MiniFE achieves 6-7% prediction error on LLNL Lassen and 1-2% error on PSC Bridges, with minimal code inspection required to model MPI communication.
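The combination of a roofline kernel model with an analytical communication model can be illustrated with a minimal sketch. This is not the poster's implementation: the abstract does not say which analytical MPI model is used, so the sketch assumes a simple latency-bandwidth (alpha-beta) cost per message, six face messages per 3D halo exchange, and no overlap between computation and communication. All function and parameter names are hypothetical.

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Roofline estimate of kernel time in seconds.

    The kernel is bound by whichever takes longer: executing its
    floating-point operations or moving its data through memory.
    """
    return max(flops / peak_flops, bytes_moved / peak_bandwidth)


def message_time(message_bytes, latency, bandwidth):
    """Latency-bandwidth (alpha-beta) cost of one message in seconds.

    Assumption: the poster's analytical MPI model may differ.
    """
    return latency + message_bytes / bandwidth


def halo_exchange_time(nx, ny, nz, bytes_per_cell, latency, bandwidth):
    """Cost of a 3D halo exchange for an nx * ny * nz local block.

    Assumption: one message per face (six in total), sent one after
    another, with edge and corner exchanges ignored.
    """
    face_cells = [ny * nz, ny * nz, nx * nz, nx * nz, nx * ny, nx * ny]
    return sum(
        message_time(cells * bytes_per_cell, latency, bandwidth)
        for cells in face_cells
    )


def iteration_time(kernel_times, comm_time):
    """Predicted time of one iteration, assuming no compute/communication overlap."""
    return sum(kernel_times) + comm_time
```

In a profiling-based workflow, the per-kernel operation and byte counts would come from a profiler, and the latency and bandwidth parameters would be fitted from a communication benchmark on the target machine, rather than being set by hand as in this sketch.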