Scalable Algorithms for Constructing Balanced Spanning Trees on System-ranked Process Groups
European MPI Users' Group Meeting (EuroMPI) 2012
Publication Type: Paper
Repository URL: charmgit:papers/201203_DistributedST
Current implementations of process groups (subcommunicators) have non-scalable (O(group size)) memory footprints and even worse time complexities for setting up communication. We propose system-ranked process groups, where member ranks are picked by the runtime system, as a cheaper and faster alternative for a subset of collective operations (barrier, broadcast, reduction, allreduce). This paper presents two distributed algorithms for spanning tree construction over system-ranked process groups obtained by splitting a parent group. Our schemes have much smaller memory footprints and also perform better, even at modest process counts. We demonstrate performance results up to 131, 072 cores of BlueGene/P.