| Introduction
|
Recent years have seen the emergence of large supercomputers consisting of
tens of thousands of processors. The most scalable and fastest among these are
connected by a three-dimensional torus or a fat-tree. On such machines, the
number of links (hops) traversed by a message directly affects the time it
takes to reach its destination. Dedicated resources for communication like
communication co-processors and DMA engines have led to higher injection rates
than the network links can handle. Bandwidth congestion in such cases can
significantly increase message latencies, again depending on the number of
links traversed by the message. For large parallel machines with a significant
diameter, this can become a serious performance bottleneck. Traditionally,
application developers have neglected the effect of distance because, in the
absence of contention, wormhole routing makes latency nearly independent of hop
count for most message sizes on small machines. This assumption may no longer
hold, given the large diameters of current machines and the small messages
resulting from fine-grained parallelization.
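As a rough illustration of why distance was long treated as negligible, the commonly used model for wormhole-routed latency without contention can be written as follows. The symbols are introduced here for illustration and do not appear in the text above: L_f is the flit length, L the message length, B the link bandwidth, and D the number of hops.

```latex
T_{\text{no contention}} \approx \frac{L_f}{B}\, D + \frac{L}{B}
```

When L_f is much smaller than L and D is small, the second term dominates and hop count barely matters. For very small messages on a machine with a large diameter, the first term is no longer negligible, and the model does not account for contention at all.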
We propose to minimize communication traffic, and hence bandwidth congestion on
the network, by topology-aware mapping of the tasks in an application. By placing
communicating tasks on processors that are physically close on the
network, communication can be restricted to near neighbors. This reduces link
sharing among messages and leads to better utilization of the available
bandwidth. Our aim is to minimize hop-bytes: for each message, the product of
the message size and the number of hops between its source and destination,
summed over all messages. Reducing hop-bytes can reduce communication time,
which can lead to significant speed-ups for parallel applications and remove
scaling bottlenecks in some cases.
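The hop-bytes metric can be sketched in a few lines. This is a minimal illustration, assuming a 3D torus with wraparound links and shortest-path routing; the function names `torus_hops` and `hop_bytes`, the message table, and the two example mappings are hypothetical and not taken from the project's code.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Shortest-path hop count between two nodes of a 3D torus.

    Assumes wraparound links in every dimension, so the distance along
    one axis is the shorter of the direct and the wrapped route.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(messages: Dict[Tuple[int, int], int],
              mapping: Dict[int, Coord],
              dims: Coord) -> int:
    """Sum of (message size in bytes) * (hops travelled) over all messages."""
    total = 0
    for (src, dst), nbytes in messages.items():
        total += nbytes * torus_hops(mapping[src], mapping[dst], dims)
    return total


# Hypothetical example: three tasks on an 8 x 8 x 8 torus.
dims = (8, 8, 8)
messages = {(0, 1): 1000, (1, 2): 1000, (0, 2): 10}  # (src, dst) -> bytes

# Heavily communicating tasks placed next to each other.
near = {0: (0, 0, 0), 1: (1, 0, 0), 2: (2, 0, 0)}
# The same tasks scattered across the machine.
far = {0: (0, 0, 0), 1: (4, 4, 4), 2: (0, 0, 7)}

print(hop_bytes(messages, near, dims))  # 2020
print(hop_bytes(messages, far, dims))   # 23010
```

In this toy case the scattered placement produces roughly eleven times the hop-bytes of the compact one, even though the application sends exactly the same messages.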
One part of this research is a study of the communication
characteristics of different networks and a quantification of message latencies,
both in the absence and in the presence of contention. The aim is to evaluate
how message latencies depend on the distance traversed in different
scenarios. This will improve our understanding of the causes of contention
for network resources and will benefit the development of topology-aware
mapping algorithms. The other part of this research involves developing a
general automatic topology-aware mapping framework that takes the task graph
and the processor graph as input and outputs near-optimal mapping solutions.
Topology-aware mapping is a form of the graph embedding problem, which is
NP-hard. Hence, the framework will employ heuristics chosen according to the
communication scenario to arrive at good solutions.
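To make the idea of a mapping heuristic concrete, here is a minimal greedy sketch. It is not the framework's algorithm: the function `greedy_map`, its input format, and the placement rule are assumptions chosen for illustration, and the processor graph is assumed to be a 3D torus. At each step it picks the unplaced task that exchanges the most bytes with already placed tasks and puts it on the free processor that adds the fewest hop-bytes.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Shortest-path hops between two nodes of a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def greedy_map(comm, dims):
    """Map tasks to torus coordinates with a simple greedy heuristic.

    comm: dict {(task_u, task_v): bytes} describing the task graph.
    dims: torus dimensions, e.g. (4, 4, 4).
    Returns a dict {task: coordinate}. The result is a heuristic
    placement and is not guaranteed to be optimal.
    """
    # Build a symmetric weighted adjacency structure from the edge list.
    nbrs = {}
    for (u, v), w in comm.items():
        nbrs.setdefault(u, {})
        nbrs.setdefault(v, {})
        nbrs[u][v] = nbrs[u].get(v, 0) + w
        nbrs[v][u] = nbrs[v].get(u, 0) + w

    free = list(product(*(range(d) for d in dims)))
    if len(nbrs) > len(free):
        raise ValueError("more tasks than processors")

    mapping = {}
    unplaced = set(nbrs)
    while unplaced:
        # Prefer the task with the most traffic to placed tasks;
        # break ties by total traffic, then by task id (via sorted).
        def attraction(t):
            placed = sum(w for n, w in nbrs[t].items() if n in mapping)
            return (placed, sum(nbrs[t].values()))

        task = max(sorted(unplaced), key=attraction)

        # Hop-bytes this task would add if placed on processor p.
        def cost(p):
            return sum(w * torus_hops(p, mapping[n], dims)
                       for n, w in nbrs[task].items() if n in mapping)

        best = min(free, key=cost)
        mapping[task] = best
        free.remove(best)
        unplaced.remove(task)
    return mapping


# Hypothetical usage: a chain of four tasks on a 4 x 4 x 4 torus.
chain = {(0, 1): 100, (1, 2): 100, (2, 3): 100}
placement = greedy_map(chain, (4, 4, 4))
for task in sorted(placement):
    print(task, placement[task])
```

A greedy rule like this is cheap but myopic: early placements are never revisited, so a real framework would likely combine several heuristics or add a refinement pass depending on the communication pattern.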
|
| Study of Interconnects
|
Understanding the characteristics of different parallel machines required a
detailed study of message latencies and of the effect of contention on them.
We have developed a set of benchmarks that quantify message latencies in
the absence and presence of contention as a function of the number of hops (links)
the messages traverse. More information about these benchmarks and their
results on IBM's Blue Gene/L, IBM's Blue Gene/P, Cray's XT3 and Cray's XT4 can
be found here.
|
| Application-specific Topology-aware Mapping
|
|
| Automatic Mapping Framework
|
|
|
- 08-15
Abhinav Bhatele and Laxmikant V. Kale, Dynamic Topology Aware Load Balancing Algorithms for MD Applications, submitted to Philosophical Transactions of the Royal Society A, 2008
- 08-09
Abhinav Bhatele and Laxmikant V. Kale, Benefits of Topology Aware Mapping for Mesh Interconnects, submitted to Parallel Processing Letters (LSPP special issue), 2008
- 08-07
Abhinav Bhatele and Laxmikant V. Kale, An Evaluation of the Effect of Interconnect Topologies on Message Latencies in Large Supercomputers, PPL Technical Report, May 2008
- 08-06
Abhinav Bhatele, Eric Bohm and Laxmikant V. Kale, Topology Aware Task Mapping Techniques: An API and Case Study, PPL Technical Report, August 2008
- 08-02
Abhinav Bhatele and Laxmikant V. Kale, Application-specific Topology-aware Mapping for Three Dimensional Topologies, Proceedings of Workshop on Large-Scale Parallel Processing (held as part of IPDPS '08), 2008
- 07-12
Abhinav Bhatele, Application-specific Topology-aware Mapping and Load Balancing for three-dimensional Torus Topologies, Master's Thesis, Department of Computer Science, University of Illinois, 2007
|