Topology Aware Task Mapping Techniques: An API and Case Study
Authors:
Abhinav Bhatele, Eric Bohm and Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign
PPL Technical Report, August 2008
Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increase significantly with the number of hops. Yet we and others have recently shown that task mapping strategies on large machines should take the topology of the machine into account. Doing so reduces communication contention and hence message latencies on such machines. In this paper, we present a uniform API that obtains topology information on 3D torus machines such as Blue Gene and Cray XT, along with techniques that use this API to improve performance. User-level codes can call the API at runtime to obtain information about their allocated partitions, which is essential for mapping. We motivate why it is important to consider network topology using a simple 3D Stencil kernel. We then present mapping strategies for a production code, OpenAtom, running on three-dimensional torus and mesh topologies. OpenAtom presents complex communication scenarios involving interactions among multiple groups of objects. Results are presented for 3D Stencil and OpenAtom on up to 16,384 processors of Blue Gene/L, 2,048 processors of Cray XT3, and 8,192 processors of Blue Gene/P.