Application-specific Topology-aware Mapping for Three Dimensional Topologies
Authors:
Abhinav Bhatele, Laxmikant V. Kale
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign
Workshop on Large-Scale Parallel Processing (IPDPS), 2008
The fastest supercomputers today, such as Blue Gene/L and XT3, are connected by a three-dimensional torus or mesh interconnect. Applications running on these machines can benefit from topology awareness when mapping tasks to processors at runtime. Co-locating communicating tasks on nearby processors minimizes the distance traveled by messages, and hence the communication traffic, which reduces both communication latency and contention on the network. This paper describes the technique and the performance improvements resulting from it in the context of an n-dimensional k-point stencil program. Automated topology-aware mapping by the runtime, using similar ideas, can relieve the application writer of this burden and result in better performance. Preliminary work towards achieving this for a molecular dynamics application, NAMD, is also presented. Results on up to 32,768 processors of IBM's Blue Gene/L and 2,048 processors of Cray's XT3 support the ideas discussed in the paper.
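As a rough illustration of why placement matters on a torus, the sketch below counts the total number of network hops for a nearest-neighbour stencil under two placements. It is not the paper's implementation: the 4x4x4 torus size, the 6-point periodic stencil, the seeded random placement, and the helper names `torus_hops` and `total_hops` are all assumptions chosen for the example.

```python
import random
from itertools import product

def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along a
    dimension of size d is min(|x - y|, d - |x - y|).
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def total_hops(placement, edges, dims):
    """Sum of hop counts over all communicating task pairs."""
    return sum(torus_hops(placement[u], placement[v], dims) for u, v in edges)

# Assumed machine: a 4 x 4 x 4 torus, one task per processor.
dims = (4, 4, 4)
tasks = list(product(range(4), repeat=3))

# Assumed application: a 6-point stencil with periodic boundaries.
# Each task is paired with its +1 neighbour along every axis, so each
# communicating pair appears exactly once.
edges = []
for t in tasks:
    for axis in range(3):
        n = list(t)
        n[axis] = (n[axis] + 1) % dims[axis]
        edges.append((t, tuple(n)))

# Topology-aware placement: task (i, j, k) runs on processor (i, j, k),
# so every communicating pair is exactly one hop apart.
aware = {t: t for t in tasks}

# Topology-oblivious placement: tasks scattered by a seeded shuffle.
procs = tasks[:]
random.Random(0).shuffle(procs)
oblivious = dict(zip(tasks, procs))

aware_hops = total_hops(aware, edges, dims)
oblivious_hops = total_hops(oblivious, edges, dims)
print("topology-aware hops:", aware_hops)
print("oblivious hops:     ", oblivious_hops)
```

With the aware placement each of the 192 communicating pairs is a single hop apart, which is the minimum possible because distinct processors are at least one hop apart. Any other placement can only match or exceed that total. Weighting each pair by its message size would turn the same sum into a traffic estimate.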