A Case Study of Communication Optimizations on 3D Mesh Interconnects
International European Conference on Parallel and Distributed Computing (Euro-Par) 2009
Publication Type: Paper
Repository URL: 200801_LeanCPTopo
Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increase significantly with the number of hops. Yet we and others have recently shown that task mapping strategies on large machines should take the topology of the machine into account, because doing so reduces communication contention and, in turn, message latencies. In this paper, we present a uniform API that obtains topology information on 3D torus machines such as Blue Gene and Cray XT, along with techniques that use this API to improve performance. User-level codes can call the API at runtime to obtain information about their allocated partitions, which is essential for mapping.

We motivate why it is important to consider network topology using a simple 3D Stencil kernel. We then present mapping strategies for a production code, OpenAtom, running on three-dimensional torus and mesh topologies. OpenAtom presents complex communication scenarios in which multiple groups of objects interact. Results are presented for 3D Stencil and OpenAtom on up to 16,384 processors of Blue Gene/L, 2,048 processors of Cray XT3, and 8,192 processors of Blue Gene/P.
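To illustrate why mapping matters for a 3D Stencil pattern, the following is a minimal sketch, not the paper's API or its mapping algorithm. It assumes a hypothetical 8 x 8 x 8 torus, one task per processor, and periodic stencil boundaries, and it uses average hop count per message as a rough stand-in for contention. All function names are invented for this illustration.

```python
import random
from itertools import product


def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus (wraparound links)."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_neighbors(task, dims):
    """Yield the six neighbors of a task in a periodic 3D stencil decomposition."""
    for axis in range(3):
        for step in (-1, 1):
            nbr = list(task)
            nbr[axis] = (nbr[axis] + step) % dims[axis]
            yield tuple(nbr)


def average_hops(mapping, dims):
    """Mean hops per stencil message; mapping is task coordinate -> processor coordinate."""
    total = 0
    count = 0
    for task, proc in mapping.items():
        for nbr in stencil_neighbors(task, dims):
            total += torus_hops(proc, mapping[nbr], dims)
            count += 1
    return total / count


# Assumed machine shape for the illustration (hypothetical, not from the paper).
dims = (8, 8, 8)
coords = list(product(*(range(d) for d in dims)))

# Topology-aware placement: task (x, y, z) runs on processor (x, y, z).
identity_map = {c: c for c in coords}

# Topology-oblivious placement: tasks scattered at random over the processors.
shuffled = coords[:]
random.Random(0).shuffle(shuffled)
random_map = dict(zip(coords, shuffled))

if __name__ == "__main__":
    print("topology-aware average hops:", average_hops(identity_map, dims))
    print("random placement average hops:", average_hops(random_map, dims))
```

Under these assumptions, the topology-aware placement keeps every stencil message to a single hop, while the random placement sends messages across several links, so more messages share each link. Hop count alone does not model contention on a real machine; it only indicates the direction of the effect.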
Abhinav Bhatele, Eric Bohm, Laxmikant V. Kale, "A Case Study of Communication Optimizations on 3D Mesh Interconnects", Proceedings of Euro-Par (Topic 13 - High Performance Networks), 2009
Research Areas