A Case Study of Communication Optimizations on 3D Mesh Interconnects
Authors:
Abhinav Bhatele, Eric Bohm and Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University
of Illinois at Urbana-Champaign
To appear in Proceedings of Euro-Par (Topic 13 - High Performance Networks), 2009
Optimal network performance is critical to efficient parallel scaling for
communication-bound applications on large machines. With wormhole routing,
no-load latencies do not increase significantly with the number of hops. Yet we
and others have recently shown that task mapping strategies on large machines
should take the topology of the machine into account. Doing so reduces
communication contention and message latencies. In this
paper, we present a uniform API which obtains topology information on 3D torus
machines like Blue Gene and XT. We present techniques to use this API to
improve performance. The API can be used by user-level codes to obtain
information about allocated partitions at runtime, which is essential for
mapping.
We motivate why it is important to consider network topology using a simple 3D
Stencil kernel. We then present mapping strategies for a production code,
OpenAtom, running on three-dimensional torus and mesh topologies. OpenAtom
presents complex communication scenarios of interaction between multiple groups
of objects. Results are presented in the context of 3D Stencil and OpenAtom on
up to 16,384 processors of Blue Gene/L, 2,048 processors of Cray XT3 and 8,192
processors of Blue Gene/P.
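
As a rough illustration of why placement matters for a 3D Stencil, the sketch below counts the average number of network hops per message on a small 3D torus under two placements. It is not the paper's API or mapping code: the function names (torus_hops, stencil_avg_hops), the 4 x 4 x 4 size, the periodic stencil, and the shuffled "topology-oblivious" placement are all assumptions made for this example. On a mesh, which has no wraparound links, the per-dimension distance would simply be abs(x - y).

```python
import random
from itertools import product


def torus_hops(a, b, dims):
    """Shortest hop count between processor coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along a dimension
    of size d is the smaller of going forward or going backward.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_avg_hops(placement, dims):
    """Average hops per message for a periodic 3D stencil.

    Tasks form a grid of shape dims; each task sends to its +1 neighbor
    along every axis. placement maps a task coordinate to the processor
    coordinate it runs on (hypothetical structure for this sketch).
    """
    total = 0
    count = 0
    for task in product(*(range(d) for d in dims)):
        for axis in range(3):
            nbr = list(task)
            nbr[axis] = (nbr[axis] + 1) % dims[axis]
            total += torus_hops(placement[task], placement[tuple(nbr)], dims)
            count += 1
    return total / count


dims = (4, 4, 4)  # assumed partition shape, chosen only to keep the example small
tasks = list(product(*(range(d) for d in dims)))

# Topology-aware placement: task (x, y, z) runs on processor (x, y, z).
aware = {t: t for t in tasks}

# Topology-oblivious placement: tasks scattered by a fixed pseudo-random shuffle.
shuffled = tasks[:]
random.Random(0).shuffle(shuffled)
oblivious = dict(zip(tasks, shuffled))

aware_hops = stencil_avg_hops(aware, dims)
oblivious_hops = stencil_avg_hops(oblivious, dims)
print("aware:", aware_hops, "oblivious:", oblivious_hops)
```

With the aware placement every stencil message travels exactly one hop, while the scattered placement sends messages across several links, which is where contention can arise even though no-load latency barely depends on hop count.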