A uGNI-Based Asynchronous Message-Driven Runtime System for Cray Supercomputers with Gemini Interconnect
IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2012
Publication Type: Paper
Repository URL: papers/201109_Gemini
Gemini as the network for new Cray XE/XT systems features low latency, high bandwidth and strong scalability. Its hardware support for remote direct memory access enables efficient implementation of global address space programming languages. Although the Generic Network Interface (GNI) is designed to support message-passing applications, it is still challenging to attain good performance for applications written in alternative programming models, such as the message-driven programming model. In our earlier work we showed that CHARM++, an object-oriented message-driven programming model, scales up to the full Jaguar Cray machine. In this paper, we describe a general and lightweight asynchronous Low-level RunTime System (LRTS) for CHARM++, and its implementation on the uGNI software stack for Cray XE systems. Several techniques are presented to exploit the uGNI capabilities by reducing memory copy and registration overhead, taking advantage of persistent communication, and improving intra-node communication. Our micro-benchmark results demonstrate that the uGNI-based runtime system outperforms the MPI-based implementation by up to 50% in terms of message latency. For communication-intensive applications such as N-Queens, this implementation scales up to 15,360 cores of a Cray XE6 machine and is 70% faster than an MPI-based implementation. In the molecular dynamics application NAMD, performance also improves by as much as 18%.
Yanhua Sun, Gengbin Zheng, L. V. Kale, Terry R. Jones and Ryan Olson, A uGNI-based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect, Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2012, Shanghai, China
Research Areas