Chee Wai Lee, Celso Mendes and Laxmikant V. Kalé
Department of Computer Science
University of Illinois at Urbana-Champaign
Performance analysis tools based on event tracing are important for
understanding the complex computational activities and communication
patterns in high-performance applications. The purpose of these tools
is to help applications scale well to large numbers of
processors. However, the tools themselves have to be scalable. As
application problem sizes grow larger to exploit larger machines, the
volume of performance trace data generated becomes unmanageable,
especially as we scale to tens of thousands of
processors. Simultaneously, at analysis time, the amount of
information that has to be presented to a human analyst can also
become overwhelming.
This paper investigates the effectiveness of employing heuristics and
clustering techniques in a scalability framework to determine a subset
of processors whose detailed event traces should be retained. This is a
form of compression in which we retain information from processors with
high signal content.
We quantify the reduction in the volume of performance trace data
generated by NAMD, a molecular dynamics simulation application
implemented using CHARM++.
We show that, for the known performance problem of poor application
grainsize, the quality of the trace data preserved by this approach is
sufficient to highlight the problem.
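
As a rough illustration of the processor-selection idea, the following
Python sketch clusters processors by a small per-processor metric vector
and keeps detailed traces only for one representative and one atypical
member of each cluster. It is a minimal sketch under stated assumptions,
not the framework evaluated in this paper: the choice of k-means, the
metric vectors, and the names kmeans and select_processors are all
hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic initialization (illustrative only)."""
    # Assumption: seed centroids with k points spread evenly through the input.
    step = max(1, len(points) // k)
    centroids = [list(points[min(i * step, len(points) - 1)]) for i in range(k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each processor joins its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def select_processors(metrics, k):
    """Return sorted indices of processors whose detailed traces are kept.

    metrics[i] is a hypothetical summary vector for processor i,
    for example (idle fraction, megabytes sent).
    """
    assign, centroids = kmeans(metrics, k)
    keep = set()
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        by_dist = sorted(
            members, key=lambda i: math.dist(metrics[i], centroids[c])
        )
        keep.add(by_dist[0])   # representative: closest to the centroid
        keep.add(by_dist[-1])  # atypical: farthest from the centroid
    return sorted(keep)
```

With this scheme at most 2k processors retain full event traces, and the
remaining processors could be reduced to summary data. In practice the
metrics would need to be normalized so that no single dimension dominates
the distance computation.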