Abhinav Bhatelé
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801, USA
Laxmikant V. Kalé
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801, USA
The fastest supercomputers today, such as Blue Gene/L, Blue Gene/P, Cray XT3 and
XT4, are connected by a three-dimensional torus/mesh interconnect. Applications
running on these machines can benefit from topology-awareness while mapping
tasks to processors at runtime. By co-locating communicating tasks on nearby
processors, the distance traveled by messages and hence the communication
traffic can be minimized, thereby reducing communication latency and contention
on the network. This paper describes preliminary work utilizing this technique
and performance improvements resulting from it in the context of an
n-dimensional k-point stencil program. It shows that even for simple
benchmarks, topology-aware mapping can have a significant impact on
performance. Automated topology-aware mapping by the runtime using similar
ideas can relieve the application writer of this burden and result in better
performance. Preliminary work towards achieving this for a molecular dynamics
application, NAMD, is also presented. Results on up to … processors of
IBM's Blue Gene/L, … processors of IBM's Blue Gene/P and … processors of
Cray's XT3 support the ideas discussed in the paper.
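The central claim above, that placing communicating tasks on nearby processors reduces the traffic on the network, can be illustrated with a small Python sketch. It measures load with hop-bytes (bytes sent multiplied by the number of links each message traverses), a common way to quantify this kind of traffic. This is a minimal illustration under stated assumptions and not the paper's code or mapping scheme. The 4x4x4 torus, the 3D 7-point stencil with periodic boundaries, the 1024-byte message size, the random-permutation baseline and the helper names (`hops`, `stencil_pairs`, `hop_bytes`) are all hypothetical choices made for this example.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]

def hops(a: Coord, b: Coord, dims: Coord, torus: bool = True) -> int:
    """Shortest-path link count between two nodes of a 3D torus (or mesh).

    On a torus every dimension wraps around, so the distance along it is the
    shorter of the direct route and the wraparound route.
    """
    total = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, n - d) if torus else d
    return total

def stencil_pairs(dims: Coord) -> List[Tuple[Coord, Coord]]:
    """(task, neighbor) pairs of a 3D 7-point stencil with periodic boundaries."""
    pairs = []
    for task in product(*(range(n) for n in dims)):
        for axis in range(3):
            for step in (-1, 1):
                nb = list(task)
                nb[axis] = (nb[axis] + step) % dims[axis]
                pairs.append((task, tuple(nb)))
    return pairs

def hop_bytes(pairs, placement: Dict[Coord, Coord], dims: Coord, nbytes: int) -> int:
    """Total bytes * hops when every stencil pair exchanges nbytes."""
    return sum(nbytes * hops(placement[a], placement[b], dims) for a, b in pairs)

dims = (4, 4, 4)          # hypothetical 64-node torus; one task per node
tasks = list(product(*(range(n) for n in dims)))
pairs = stencil_pairs(dims)
msg = 1024                # hypothetical message size in bytes

# Topology-aware placement: task (i, j, k) runs on torus node (i, j, k), so
# every stencil neighbor is exactly one link away.
aware = {t: t for t in tasks}

# Topology-oblivious placement: tasks scattered over the nodes by a
# (seeded) random permutation.
nodes = tasks[:]
random.Random(0).shuffle(nodes)
oblivious = dict(zip(tasks, nodes))

print(hop_bytes(pairs, aware, dims, msg))      # 393216 (384 pairs * 1 hop * 1024 B)
print(hop_bytes(pairs, oblivious, dims, msg))  # larger: most pairs are several hops apart
print(hops((0, 0, 0), (3, 0, 0), dims, torus=False), hops((0, 0, 0), (3, 0, 0), dims))  # 3 1
```

Under these assumptions the identity placement keeps every message on a single link, while the scattered placement multiplies the hop-bytes. The last line shows why wraparound links matter: nodes 0 and 3 along one dimension are three links apart on a mesh but only one on a torus.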