
The Projections Performance Analysis Framework

An Introduction to Projections

The significant gap between the peak and realized performance of parallel machines motivates the need for effective performance analysis and tuning of the applications running on them. To this end, we have developed Projections, a performance analysis and visualization framework for Charm++.

Performance Instrumentation

The Charm++ runtime system provides Projections' instrumentation component with the ability to record detailed performance information about events as an application executes. Examples of these events are the start and end of Charm++ entry methods and message sends. This data is recorded in per-processor log buffers and written out as log files when the application finishes. The log files are then used for post-mortem performance analysis through the visualization component of Projections. This instrumentation is provided automatically whenever the application developer links the application with Projections' tracing modules. We also provide various runtime options and APIs that let the user control the intrusiveness and volume of data collection, as well as the resolution of the performance data collected, ranging from full event traces to a summary profile of entry method utilization.

Performance Visualization and Analysis

In its current form, the visualization component of Projections relies on manual analysis by the user. It is implemented in Java and supports this analysis through application views and abstractions such as utilization graphs, histograms and event timelines. Performance analysis is human-centric, as Figures 1a to 1c illustrate. Starting from visual distillations of overall application performance characteristics, the analyst combines application domain knowledge with experience in reading the visual cues Projections presents to identify general areas of potential performance problems (e.g. a set of processors over certain time intervals). The analyst then zooms in for more detail and/or seeks additional perspectives by aggregating information across data dimensions (e.g. processors). The same process is repeated, usually at higher levels of detail, as the analyst homes in on a problem or zooms into another area to correlate problems. The richness of the information, together with the tool's ability to provide relevant visual cues, contributes greatly to the efficacy of this analysis process.

The Overview (Figure 1a) gives the user a general picture of application behavior in terms of utilization across processors and over time. The Time Profile (Figure 1b) breaks down entry method activity over time, summed across all processors, providing another, more detailed perspective on the data shown in the Overview. The Timeline (Figure 1c) offers the most detailed look at exactly which performance events occurred on each selected processor, allowing the examination of causal effects and other runtime information. Other views include the Usage Profile (Figure 2), which shows the overall workload on each processor over a specified time range and is particularly useful for identifying the Charm++ events that contribute to computational load imbalance in the program.
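The per-processor buffering described under Performance Instrumentation can be illustrated with a small sketch. It is not the Charm++ or Projections implementation: the `TraceBuffer` class, the `BEGIN_PROCESSING`/`END_PROCESSING` event names and the text line format are hypothetical. It only shows the trade-off the text mentions between a full event trace (every event kept, so the buffer grows with the run) and a summary profile (only per-entry-method totals kept), with data written out once at the end of the run.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import IO, Dict, List, Tuple

# Hypothetical event kinds; the actual Projections log format is different.
BEGIN = "BEGIN_PROCESSING"
END = "END_PROCESSING"


@dataclass
class TraceBuffer:
    """Illustrative per-processor event buffer, flushed once at the end of a run."""

    pe: int              # processor (PE) that owns this buffer
    mode: str = "log"    # "log": keep every event; "summary": keep per-entry totals only
    events: List[Tuple[float, str, str]] = field(default_factory=list)
    totals: Dict[str, float] = field(default_factory=lambda: defaultdict(float))
    _stack: List[Tuple[str, float]] = field(default_factory=list, repr=False)

    def record(self, time: float, kind: str, entry: str) -> None:
        """Record a begin/end event for an entry method at the given timestamp."""
        if self.mode == "log":
            # Full tracing: the buffer grows with the number of events executed.
            self.events.append((time, kind, entry))
        # Both modes accumulate per-entry time, assuming begin/end pairs nest properly.
        if kind == BEGIN:
            self._stack.append((entry, time))
        elif kind == END:
            name, start = self._stack.pop()
            self.totals[name] += time - start

    def flush(self, out: IO[str]) -> None:
        """Write the buffered data (e.g. to this PE's log file) after the run."""
        if self.mode == "log":
            for time, kind, entry in self.events:
                out.write(f"{self.pe} {time:.6f} {kind} {entry}\n")
        else:
            for entry, total in sorted(self.totals.items()):
                out.write(f"{self.pe} SUMMARY {entry} {total:.6f}\n")
```

In this sketch one buffer would exist per processor, its `record` method would be called from instrumentation hooks around each entry method, and each buffer would be flushed into its own file, for example with `with open(f"app.{pe}.log", "w") as f: buf.flush(f)`.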
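The kind of per-processor workload breakdown shown in the Usage Profile can be approximated with a short post-mortem computation. This is a hedged sketch, not how Projections computes its views: the input format (per-processor lists of `(start, end, entry)` intervals), the function names and the two imbalance scores (a max-to-mean ratio overall, and max-minus-mean per entry method) are our assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: for each PE, the (start, end, entry method) intervals
# reconstructed from its trace log.
Interval = Tuple[float, float, str]
Profile = Dict[int, Dict[str, float]]


def usage_profile(intervals_by_pe: Dict[int, List[Interval]], t0: float, t1: float) -> Profile:
    """Fraction of the window [t0, t1) that each PE spends in each entry method."""
    span = t1 - t0
    profile: Profile = {}
    for pe, intervals in intervals_by_pe.items():
        per_entry: Dict[str, float] = defaultdict(float)
        for start, end, entry in intervals:
            # Only the part of each execution that falls inside the window counts.
            overlap = min(end, t1) - max(start, t0)
            if overlap > 0:
                per_entry[entry] += overlap / span
        profile[pe] = dict(per_entry)
    return profile


def imbalance_ratio(profile: Profile) -> float:
    """Max-to-mean ratio of per-PE busy fraction; 1.0 means perfectly balanced."""
    loads = [sum(per_entry.values()) for per_entry in profile.values()]
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


def imbalance_contributors(profile: Profile) -> List[Tuple[str, float]]:
    """Rank entry methods by how far their heaviest PE exceeds their mean load."""
    entries = {e for per_entry in profile.values() for e in per_entry}
    scores = {}
    for entry in entries:
        loads = [per_entry.get(entry, 0.0) for per_entry in profile.values()]
        scores[entry] = max(loads) - sum(loads) / len(loads)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, if PE 0 runs `compute` for the whole window [0, 4) while PE 1 runs it for only half, `compute` ranks first among the contributors to imbalance.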
Keeping Performance Analysis Effective

Issues and Motivation

In general, analyzing and subsequently tuning an application is a non-trivial task for the analyst or developer. It is time-consuming, and as an application scales to larger numbers of processors and larger simulations, locating performance bottlenecks can become intractable. This is because such scaling inevitably increases the volume of performance data. The consequences are twofold: the performance tool must read and process much more data, taking more time and reducing its responsiveness; and the performance information presented visually to the analyst can quickly become overwhelming. Our current research on performance tools addresses these challenges so that Projections remains an effective and useful tool.

Automating Performance Problem Discovery

We have been developing ways to help analysts automate the discovery of performance bottlenecks and to present this information quickly through the Projections visualization tool. One of these is our NoiseMiner tool, which automatically locates the precise sections of the performance space where unusually long durations are spent, relative to the rest of the application's activity (Figure 3). Such long events may be symptoms of operating system interference, software interference, or computational noise. The analyst can then browse these sections of the performance space in mini-timelines (Figure 4).
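The idea of automatically flagging unusually long events can be illustrated with a simple heuristic. NoiseMiner's actual algorithm is not described here, so this sketch substitutes a plain median-based rule: an event is flagged when it lasts more than `factor` times the median duration of the same entry method, and flagged events are then counted per (processor, time window) to localize noisy regions. The `Event` type, the function names and the `factor` and `window` parameters are all hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    pe: int          # processor the event ran on
    entry: str       # entry method name
    start: float     # start time
    duration: float  # execution time


def find_long_events(events: List[Event], factor: float = 3.0) -> List[Event]:
    """Flag events lasting more than `factor` times the median of their entry method."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for ev in events:
        durations[ev.entry].append(ev.duration)
    typical = {entry: median(ds) for entry, ds in durations.items()}
    flagged = [ev for ev in events if ev.duration > factor * typical[ev.entry]]
    # List the most extreme stretches first.
    return sorted(flagged, key=lambda ev: ev.duration, reverse=True)


def noisy_regions(flagged: List[Event], window: float) -> Dict[Tuple[int, int], int]:
    """Count flagged events per (PE, time-window index) to localize noisy regions."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for ev in flagged:
        counts[(ev.pe, int(ev.start // window))] += 1
    return dict(counts)
```

The median is used as the baseline rather than the mean so that the long outliers being searched for do not inflate the notion of a "typical" duration. Comparing each event only against other executions of the same entry method also keeps legitimately expensive methods from being reported as noise.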
Performance Tool Scalability

As applications scale to larger datasets running on larger processor counts, the volume of performance data grows significantly. For performance tools to remain effective and relevant, the scalability of the performance analysis process itself must be addressed. Currently, we are pursuing research and development in the following directions of scalability: