Architecture for supporting Hardware Collectives in Output-Queued High-Radix Routers
Sameer Kumar | Laxmikant Kale | Craig Stunkel
IEEE International Conference on High Performance Computing (HiPC) 2005
Publication Type: Paper
Collective communication performance is critical for many applications. In this paper, we present an architecture that efficiently supports collective operations (such as multicasts and reductions) in the switches of parallel computer interconnects. We present an output-queued switch architecture with cross-point buffering. Output-queued architectures have been less popular in the past because they require more internal speedup and buffering. However, with current technology it is straightforward to build output-queued switches. We demonstrate in this paper that output-queued architectures make multicasts and reductions fairly easy to implement efficiently. We show that our schemes scale to a large number of switch ports. We present the performance of multicasts and reductions on individual switches and on networks of switches, assuming a fat-tree topology for the networks. We also present simulation results based on synthetic workloads that emulate a molecular dynamics application.
Sameer Kumar, Laxmikant V. Kale, and Craig Stunkel, "Architecture for supporting Hardware Collectives in Output-Queued High-Radix Routers", Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, March 2005.
Research Areas