Optimizing Fine-grained Communication in a Biomolecular Simulation Application on Cray XK6
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2012
Publication Type: Paper
Abstract
Achieving good scaling for fine-grained, communication-intensive applications on
modern supercomputers remains challenging. In our previous work, we showed
that one such application, NAMD, scales well on the full Jaguar XT5
without long-range interactions; with them, however, the speedup falters beyond 64K
cores. Although the new Gemini interconnect on Cray XK6 has improved network
performance, the challenges remain, and are likely to remain for other such
networks as well. We analyze communication bottlenecks in NAMD and its CHARM++
runtime, using the Projections performance analysis tool. Based on the
analysis, we optimize the runtime, which is built on the uGNI library for Gemini, and
present several techniques that improve fine-grained communication.
As a result, the performance of running the 92,224-atom ApoA1 system on GPUs improves
by 36%. For the 100-million-atom STMV system, we improve on the prior Jaguar XT5
result of 26 ms/step, reaching 13 ms/step on 298,992 cores of Titan XK6.
People
- Yanhua Sun
- Gengbin Zheng
- Chao Mei
- Eric Bohm
- James Phillips
- Terry Jones
- Laxmikant Kale
Research Areas