Scaling Collective Multicast on Fat-tree Networks
International Conference on Parallel and Distributed Systems (ICPADS) 2004
Publication Type: Talk
In this talk, we study the all-to-all multicast operation and strategies to optimize it. These strategies need to differ for small and large messages: for small messages the major issue is minimizing software overhead, whereas for large messages it is network contention. Many modern large parallel computers use the fat-tree interconnection topology. We therefore analyze network contention on fat-tree networks and use known contention-free communication schedules for fat-trees in the design of two novel strategies for collective multicast. We evaluate the performance of these strategies with up to 256 nodes (1024 processors) on an Alpha cluster. We also present schemes that perform well when a contiguous chunk of nodes is not available. For large messages, many of our strategies achieve twice the throughput of native MPI. We also demonstrate that, in the presence of a communication co-processor, the software overhead of a collective operation is a small fraction of the total completion time. We therefore compare the studied strategies using two metrics: (i) completion time and (ii) computation overhead.
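
To make the all-to-all multicast operation concrete, the following is a minimal sketch of one simple schedule, a cyclic shift, in which every node sends its message to the node k positions away in step k. This is an illustrative assumption, not necessarily one of the strategies presented in the talk; the function names `shift_schedule` and `simulate_multicast` are hypothetical. The sketch only shows that each step is a permutation (every node sends once and receives once) and that all nodes hold all messages after N - 1 steps. It does not model fat-tree routing or link contention.

```python
def shift_schedule(num_nodes):
    """Build a cyclic-shift schedule (illustrative assumption, not the talk's method).

    Returns a list of steps; each step is a list of (src, dst) pairs.
    In step k, node src sends its own message to node (src + k) % num_nodes.
    """
    return [
        [(src, (src + k) % num_nodes) for src in range(num_nodes)]
        for k in range(1, num_nodes)
    ]


def simulate_multicast(num_nodes):
    """Apply the schedule and return, per node, the set of message origins it holds."""
    # Every node starts with only its own message.
    received = [{node} for node in range(num_nodes)]
    for step in shift_schedule(num_nodes):
        for src, dst in step:
            # dst receives the message that originated at src.
            received[dst].add(src)
    return received


if __name__ == "__main__":
    for step_number, step in enumerate(shift_schedule(4), start=1):
        print(f"step {step_number}: {step}")
    print(simulate_multicast(4))
```

Because each step pairs every sender with a distinct receiver, no endpoint handles more than one incoming message per step. Whether the links inside the network are also free of contention depends on the topology and routing, which is the question the talk addresses for fat-trees.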