Predicting Application Performance using Supervised Learning on Communication Features
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2013
Publication Type: Paper
Abstract
Task mapping on torus networks has traditionally focused on either reducing the
maximum dilation or average number of hops per byte for messages in an
application. These metrics make simplifying assumptions about the cause of
network congestion and do not correlate perfectly with execution time. Hence,
when derived offline for different mappings using simulations, they cannot
reliably predict or compare application performance across mappings. In this
paper, we attempt to model the
performance of an application by using communication data, such as the
communication graph and network hardware counters. We use supervised learning
algorithms, such as forests of randomized decision trees, to correlate
performance with prior and new metrics and their combinations. We propose new
hybrid metrics that provide high correlation with application performance. For
three different communication patterns and a production application, we
demonstrate a very strong correlation between the newly proposed metrics and
the execution time of these codes.
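As an illustration of the two traditional metrics named in the abstract, the following minimal sketch computes maximum dilation and average hops per byte for one task mapping on a torus. It is not the authors' code: the function names, the dictionary-based communication graph, and the assumption of shortest-path routing with wraparound links are all choices made here for illustration.

```python
from typing import Dict, Tuple

Coord = Tuple[int, ...]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Shortest-path hop count between two nodes of a torus.

    Assumes minimal routing: in each dimension the message takes the
    shorter of the direct path and the wraparound path.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def mapping_metrics(
    comm_bytes: Dict[Tuple[int, int], int],
    mapping: Dict[int, Coord],
    dims: Coord,
) -> Tuple[int, float]:
    """Return (maximum dilation, average hops per byte) for a mapping.

    comm_bytes maps a (source task, destination task) pair to the bytes sent;
    mapping places each task on a torus coordinate. Both layouts are
    hypothetical and chosen for this sketch.
    """
    max_dilation = 0
    weighted_hops = 0
    total_bytes = 0
    for (src, dst), nbytes in comm_bytes.items():
        hops = torus_hops(mapping[src], mapping[dst], dims)
        max_dilation = max(max_dilation, hops)  # longest path of any message
        weighted_hops += hops * nbytes          # hops weighted by message size
        total_bytes += nbytes
    avg_hops_per_byte = weighted_hops / total_bytes if total_bytes else 0.0
    return max_dilation, avg_hops_per_byte


# Toy example: four tasks placed on a 4 x 4 torus.
dims = (4, 4)
mapping = {0: (0, 0), 1: (0, 1), 2: (3, 0), 3: (2, 2)}
comm_bytes = {(0, 1): 100, (0, 2): 50, (1, 3): 50}
print(mapping_metrics(comm_bytes, mapping, dims))  # (3, 1.5)
```

Both values depend only on the communication graph and the placement, which is why they can be derived offline, and also why, as the abstract argues, they capture network congestion only approximately.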
TextRef
Nikhil Jain, Abhinav Bhatele, Michael P. Robson, Todd Gamblin, and Laxmikant V. Kale. Predicting application performance using supervised learning on communication features. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13. IEEE Computer Society, November 2013 (to appear). LLNL-CONF-635857.
People
- Nikhil Jain
- Abhinav Bhatele
- Michael Robson
- Todd Gamblin
- Laxmikant Kale
Research Areas