Scaling Collective Multicast on Fat-tree Networks
International Conference on Parallel and Distributed Systems (ICPADS) 2004
Publication Type: Paper
Collective communication operations can be a serious performance impediment, because naive strategies for collective communication do not scale to large numbers of processors. In this paper, we study the all-to-all multicast operation, presenting optimization strategies for it together with performance studies of those strategies. The strategies must differ for small and large messages. For small messages, the major issue is minimizing software overhead, which can be achieved by message combining. For large messages, the issue is network contention, which can be reduced by intelligent, topology-dependent message sequencing. We optimize these strategies for fat-tree networks, the interconnection topology used by many modern large parallel computers. We therefore thoroughly analyze network contention on fat-tree networks. Certain communication schedules are contention-free on such networks, and we use them to design a novel set of strategies. We evaluate the performance of the resulting strategies for collective multicast on up to 256 nodes (1024 processors) of Lemieux. We also demonstrate that the software overhead of a collective operation is a small fraction of the total completion time, because modern network interfaces have a communication co-processor that performs message management through zero-copy remote DMA operations. We therefore compare the performance of the studied strategies using two metrics: (i) completion time and (ii) computation overhead.
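As a rough illustration of why message combining helps for small messages, the sketch below counts the sends each processor issues in an all-to-all multicast. It compares a direct strategy with a generic recursive-doubling exchange. Recursive doubling is an assumption chosen for illustration, not necessarily one of the strategies evaluated in the paper, and the function names are hypothetical. The sketch models only the number of sends, not message sizes, contention, or fat-tree routing.

```python
def direct_all_to_all_multicast(p):
    """Each processor sends its item to every other processor separately."""
    held = [{rank} for rank in range(p)]
    sends = [0] * p
    for src in range(p):
        for dst in range(p):
            if dst != src:
                held[dst].add(src)
                sends[src] += 1
    return held, sends


def combined_all_to_all_multicast(p):
    """Recursive doubling (illustrative assumption, not the paper's method).

    In each round, rank r sends everything it currently holds, combined
    into one message, to rank r XOR distance.
    """
    assert p > 0 and p & (p - 1) == 0, "this sketch assumes p is a power of two"
    held = [{rank} for rank in range(p)]
    sends = [0] * p
    distance = 1
    while distance < p:
        # Snapshot so that all exchanges in a round use pre-round data.
        snapshot = [set(items) for items in held]
        for rank in range(p):
            partner = rank ^ distance
            held[partner] |= snapshot[rank]
            sends[rank] += 1
        distance *= 2
    return held, sends
```

With p = 8, the direct strategy issues 7 sends per processor, while the combined exchange issues 3, each carrying more data. Fewer, larger messages reduce per-message software overhead, which is the effect the abstract attributes to message combining.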
Sameer Kumar and L. V. Kale, "Scaling Collective Multicast on Fat-tree Networks", ICPADS, Newport Beach, CA, July 2004.