Topology Aware Task Mapping Techniques: An API and Case Study

PPL Paper Number: 08-06
PPL CVS: 200801_LeanCPTopo

Authors:
Abhinav Bhatele, Eric Bohm and Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign

PPL Technical Report, August 2008


Abstract

Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increase significantly with the number of hops. Yet we and others have recently shown that task mapping strategies on large machines should take the topology of the machine into account, because doing so reduces communication contention and message latencies. In this paper, we present a uniform API that obtains topology information on 3D torus machines such as Blue Gene and Cray XT, and we present techniques that use this API to improve performance. User-level codes can call the API at runtime to obtain information about their allocated partitions, which is essential for mapping. We motivate why it is important to consider network topology using a simple 3D Stencil kernel. We then present mapping strategies for a production code, OpenAtom, running on three-dimensional torus and mesh topologies. OpenAtom presents complex communication scenarios in which multiple groups of objects interact. Results are presented for 3D Stencil and OpenAtom on up to 16,384 processors of Blue Gene/L, 2,048 processors of Cray XT3 and 8,192 processors of Blue Gene/P.
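
To illustrate why placement matters for a 3D Stencil on a torus, the following minimal sketch compares the average number of hops per message under two mappings. It is not the API described in the report: the function names (`torus_hops`, `stencil_neighbors`, `average_hops`), the 4 x 4 x 4 torus, the periodic stencil and the one-task-per-processor assumption are all hypothetical and chosen only for illustration.

```python
import random
from itertools import product


def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus (with wraparound)."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_neighbors(task, dims):
    """Yield the six face neighbors of a task in a periodic 3D stencil."""
    for axis in range(3):
        for step in (-1, 1):
            nbr = list(task)
            nbr[axis] = (nbr[axis] + step) % dims[axis]
            yield tuple(nbr)


def average_hops(mapping, dims):
    """Mean hops per message when every task sends to each stencil neighbor.

    `mapping` takes a task coordinate to the processor coordinate hosting it.
    """
    total = count = 0
    for task, proc in mapping.items():
        for nbr in stencil_neighbors(task, dims):
            total += torus_hops(proc, mapping[nbr], dims)
            count += 1
    return total / count


# Assumed setup: a 4 x 4 x 4 torus with exactly one task per processor.
dims = (4, 4, 4)
tasks = list(product(range(4), repeat=3))

# Topology-aware mapping: task (x, y, z) runs on processor (x, y, z).
aware = {t: t for t in tasks}

# Topology-oblivious mapping: tasks scattered by a fixed pseudo-random shuffle.
procs = tasks[:]
random.Random(0).shuffle(procs)
oblivious = dict(zip(tasks, procs))

print("aware:    ", average_hops(aware, dims))
print("oblivious:", average_hops(oblivious, dims))
```

In this sketch the topology-aware mapping keeps every message to a single hop, while the scattered mapping sends messages across more links on average. Messages that traverse more links share those links with more traffic, which is the contention effect the abstract refers to.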

