Parallel Programming Laboratory

Scaling Collective Multicast on Fat-tree Networks

Sameer Kumar

International Conference on Parallel and Distributed Systems (ICPADS) 2004

Publication Type: Talk

Repository URL:

Download: [PPT]

Summary

In this talk, we study strategies for the all-to-all multicast operation. The strategies need to differ for small and large messages: for small messages, the major issue is minimizing software overhead, whereas for large messages, it is network contention. Many modern large parallel computers use the fat-tree interconnection topology. We therefore analyze network contention on fat-tree networks and design two novel strategies that optimize collective multicast using known contention-free communication schedules on fat-tree networks. We evaluate the performance of these strategies on up to 256 nodes (1024 processors) of an Alpha cluster. We also present schemes that perform well when a contiguous chunk of nodes is not available. For large messages, many of our strategies achieve twice the throughput of native MPI. We also demonstrate that the software overhead of a collective operation is a small fraction of the total completion time in the presence of a communication co-processor. We therefore compare the performance of the studied strategies using two metrics: (i) completion time and (ii) computation overhead.
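To make the idea of a stepwise communication schedule concrete, the following is a minimal sketch of an all-to-all multicast built from per-step permutations. It is not taken from the talk: the XOR pairing, the function names `xor_schedule` and `simulate_all_to_all_multicast`, and the power-of-two node count are all assumptions chosen for illustration. The sketch only checks that each step is a permutation (every node sends once and receives once); it does not model fat-tree links or routing, so it says nothing about link contention on a real network.

```python
def xor_schedule(num_nodes):
    """Return a list of steps; each step is a list of (src, dst) pairs.

    Hypothetical illustration: in step k, node i sends to node i XOR k.
    Requires a power-of-two node count so that i XOR k stays in range.
    """
    if num_nodes < 1 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of two")
    return [[(src, src ^ step) for src in range(num_nodes)]
            for step in range(1, num_nodes)]


def simulate_all_to_all_multicast(num_nodes):
    """Simulate the schedule and return, per node, the set of senders seen."""
    # Each node starts out holding only its own message.
    received = [{node} for node in range(num_nodes)]
    for step in xor_schedule(num_nodes):
        # Every node must appear exactly once as a destination in a step.
        destinations = sorted(dst for _, dst in step)
        assert destinations == list(range(num_nodes))
        for src, dst in step:
            received[dst].add(src)
    return received


if __name__ == "__main__":
    result = simulate_all_to_all_multicast(8)
    # Check that every node ends up holding every node's message.
    print(all(messages == set(range(8)) for messages in result))
```

With P nodes the schedule takes P - 1 steps, and each node sends its own message directly in every step. Whether such a schedule avoids contention on a particular fat-tree depends on the routing scheme and on how nodes are placed, which this sketch leaves out.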

People

Sameer Kumar

Research Areas
