Improving Communication Performance in Dense Linear Algebra via Topology Aware Collectives
Edgar Solomonik | Abhinav Bhatele | James Demmel
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2011
Publication Type: Paper
Abstract
Recent results have shown that topology-aware mapping reduces network contention in communication-intensive kernels on massively parallel machines. We demonstrate that, on mesh interconnects, topology-aware mapping also enables the use of highly efficient topology-aware collectives. We map novel 2.5D dense linear algebra algorithms to cuboid partitions allocated by a Blue Gene/P supercomputer. Our mappings allow the algorithms to exploit optimized line multicasts and reductions. Commonly used 2D algorithms cannot be mapped in this fashion. On 65,536 cores of Blue Gene/P, 2.5D algorithms with rectangular collectives are 2.6x and 2.7x faster for matrix multiplication and LU factorization, respectively. For LU, communication time drops by up to 92%. We derive a novel performance model, based on the LogP model, for rectangular broadcasts and reductions, and use it to model performance on a hypothetical exascale architecture. Our study evaluates the benefits of topology-aware collectives for high-performance algorithms.
TextRef
Edgar Solomonik, Abhinav Bhatele, James Demmel, Improving communication performance in dense linear algebra via topology aware collectives, International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing) 2011