Scaling Hierarchical N-Body Simulations on GPU Clusters
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2010
Publication Type: Paper
Repository URL: 2009ChaNGaGPU
Abstract
This paper focuses on the use of clusters of general purpose
graphics processors as offload devices for tree-based N-body
simulations. Whereas the behavior of these hierarchical methods has
been studied in the past on CPU-based architectures, we investigate
key performance issues in the context of clusters of GPUs. These
include kernel organization and efficiency, the balance between
tree traversal and force computation work, grain size selection
through the tuning of offloaded work request sizes, and the
reduction of sequential bottlenecks. The effects of various
application parameters are studied and experiments are carried out
to quantify gains in performance. Our studies are carried out in
the context of a production-quality parallel cosmological simulator
called ChaNGa. We highlight the re-engineering of the application
to make it more suitable for GPU-based environments. Finally, we
present scaling performance results from experiments on NCSA's
Lincoln GPU cluster.
TextRef
Pritish Jetley, Lukasz Wesolowski, Filippo Gioachin, Laxmikant V. Kalé and Thomas R. Quinn, "Scaling Hierarchical N-Body Simulations on GPU Clusters", Proceedings of the ACM/IEEE Supercomputing Conference 2010.
People
- Pritish Jetley
- Lukasz Wesolowski
- Filippo Gioachin
- Laxmikant Kalé
- Thomas Quinn
Research Areas