Topology-Aware Task Mapping for Reducing Communication Contention on Large Parallel Machines
IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2006
Publication Type: Paper
Repository URL: TopoLBPaper
Abstract
Communication latencies constitute a significant factor in the
performance of parallel applications. With techniques such as
wormhole routing, the variation in no-load latencies has become
insignificant, i.e., the no-load latencies for far-away processors
are not significantly higher than those for nearby processors (the
difference is too small to matter). Contention in the network is then left as
the major factor affecting latencies. With networks such as
Fat-Trees or hypercubes, with the number of wires growing as P log P,
even this is not a very significant factor. However, for torus and
grid networks now being used in large machines such as BlueGene/L
and the Cray XT3, such contention becomes an issue. We quantify the
effect of this contention with benchmarks that vary the number of
hops traveled by each communicated byte. We then demonstrate a
process mapping strategy that minimizes the impact of topology by
heuristically minimizing the total number of hop-bytes
communicated. This strategy and its variants are implemented in
an adaptive runtime system in Charm++ and Adaptive MPI, so they are
available to many applications written using Charm++ as well as
MPI.
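To make the hop-bytes metric mentioned in the abstract concrete, the following is a minimal sketch that computes it for a given task-to-processor mapping on a torus. It is only an illustration of the metric, not the mapping heuristic from the paper; the function names (torus_hops, hop_bytes), the 4x4 torus, and the traffic figures are all hypothetical.

```python
def torus_hops(a, b, dims):
    """Minimum number of hops between coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along a
    dimension of size d is the shorter of the two ways around the ring.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(comm, mapping, dims):
    """Total hop-bytes: for every communicating task pair, bytes sent
    multiplied by the hops between the processors the tasks are mapped to.

    comm    -- dict {(task_i, task_j): bytes exchanged}  (assumed input format)
    mapping -- dict {task: processor coordinate tuple}
    dims    -- torus size in each dimension
    """
    return sum(nbytes * torus_hops(mapping[i], mapping[j], dims)
               for (i, j), nbytes in comm.items())


# Hypothetical example: four tasks exchanging 100 bytes in a ring pattern.
dims = (4, 4)
comm = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 100}

# A placement that ignores topology versus one that keeps neighbours adjacent.
scattered = {0: (0, 0), 1: (2, 2), 2: (0, 1), 3: (2, 3)}
compact = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

print(hop_bytes(comm, scattered, dims))  # 1400
print(hop_bytes(comm, compact, dims))    # 400
```

In this toy case the compact placement carries the same traffic over far fewer links, which is the quantity a topology-aware mapping heuristic would try to reduce.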
TextRef
Tarun Agarwal and Amit Sharma and Laxmikant V. Kale, "Topology-aware task
mapping for reducing communication contention on large parallel machines",
Parallel Programming Laboratory, Department of Computer Science, University of
Illinois at Urbana-Champaign, Proceedings of IEEE International Parallel and
Distributed Processing Symposium 2006, April 2006.