Center for Petascale Computing  
A collaboration led by Laxmikant Kalé (Computer Science) and Duane Johnson (Materials Science and Engineering) on a research theme within IACAT

Performance Analysis and Debugging Tools

Projections

The Projections performance analysis framework is unique in that it is aimed at studying applications where Charm++ objects (and AMPI threads) are first-class entities, providing deeper insights than a purely message-level, MPI-based tool can. Other tools and frameworks, such as Vampir and Vampir NG, Cray Apprentice2, TAU, SvPablo, Jumpshot, and KOJAK, are rich in MPI-specific or architecture-neutral features but lack support for the performance semantics of the Charm++ model.

Work is ongoing to improve the scalability of Projections, both in handling the large volume of performance data generated by petascale applications and in presenting visual information in a way that does not overwhelm analysts. Techniques are being developed to pick out a suitable subset of processors for analysis in order to reduce the total volume of performance data. This makes Projections well placed as an integrated tool for supporting performance analysis and tuning of the intended applications on extremely large numbers of processors.
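
To make the idea of analyzing only a subset of processors concrete, here is a minimal Python sketch. It is not the technique used by Projections, which the text does not specify. It assumes a single hypothetical per-processor metric (`busy_time`) and keeps the least and most loaded processors plus evenly spaced ranks in between. The function name and parameters are illustrative only.

```python
def select_representative_processors(busy_time, num_representatives):
    """Return a sorted list of processor IDs to keep for detailed analysis.

    busy_time: hypothetical list where busy_time[pe] is the fraction of time
               processor `pe` spent doing useful work.
    num_representatives: how many processors to keep (at least 2).
    """
    if not busy_time:
        return []
    # Rank processor IDs from least to most busy.
    ranked = sorted(range(len(busy_time)), key=lambda pe: busy_time[pe])
    n = len(ranked)
    if n == 1:
        return ranked
    k = max(2, min(num_representatives, n))
    # Evenly spaced positions in the ranking; the first and last positions
    # are always included, so both extremes (likely outliers) are kept.
    positions = {(i * (n - 1)) // (k - 1) for i in range(k)}
    return sorted(ranked[p] for p in positions)


if __name__ == "__main__":
    sample = [0.9, 0.1, 0.5, 0.52, 0.95, 0.48, 0.51, 0.5]
    print(select_representative_processors(sample, 3))
```

Keeping the extremes is a deliberate choice in this sketch: load-imbalance problems tend to show up on the least and most loaded processors, so dropping them would hide the behavior an analyst most wants to see.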

BigSim

It is important to test the application in a "real" petascale environment and to understand its performance on future petascale machines even before they are available for performance tuning. Identifying performance issues requires a series of runs on the entire machine, each further clarifying the issues and possibly yielding improvements. However, the parallel machine is a scarce resource, which makes it difficult to work on the performance of the code.

In previous work, we developed a simulation tool called BigSim to surmount these challenges. The basic idea is to combine parallel emulation with trace-driven parallel simulation. We first use an emulator to execute the application at full scale on an emulation of the target parallel machine, using a much smaller parallel machine. The message-level execution traces obtained during emulation, along with message-dependence information, are then passed to a simulator, which also takes the architectural specification of the target machine as input and produces detailed performance data for the application running on the simulated (target) machine.
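
The trace-driven step can be illustrated with a toy Python sketch. It does not reproduce BigSim's trace format or its simulation algorithm. The `TraceEntry` and `MachineSpec` types, their fields, and the simple cost model (scaled compute time plus a fixed per-message latency) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TraceEntry:
    task_id: int
    processor: int          # target processor the task ran on in the emulation
    duration: float         # compute time recorded during emulation
    depends_on: list = field(default_factory=list)  # tasks whose messages trigger this one


@dataclass
class MachineSpec:
    cpu_scale: float        # target compute time = recorded duration * cpu_scale
    latency: float          # fixed network latency per remote message


def simulate(trace, machine):
    """Replay a trace against a machine model; return finish time per task.

    The trace is assumed to be listed in dependency order.
    """
    finish = {}      # task_id -> predicted finish time
    location = {}    # task_id -> processor
    proc_free = {}   # processor -> time at which it becomes idle
    for entry in trace:
        ready = 0.0
        for dep in entry.depends_on:
            arrival = finish[dep]
            if location[dep] != entry.processor:
                arrival += machine.latency  # the message crosses the network
            ready = max(ready, arrival)
        # A task starts once its messages have arrived and its processor is idle.
        start = max(ready, proc_free.get(entry.processor, 0.0))
        finish[entry.task_id] = start + entry.duration * machine.cpu_scale
        proc_free[entry.processor] = finish[entry.task_id]
        location[entry.task_id] = entry.processor
    return finish


if __name__ == "__main__":
    trace = [
        TraceEntry(0, processor=0, duration=2.0),
        TraceEntry(1, processor=1, duration=1.0, depends_on=[0]),
        TraceEntry(2, processor=0, duration=1.0, depends_on=[0]),
        TraceEntry(3, processor=1, duration=1.0, depends_on=[1, 2]),
    ]
    result = simulate(trace, MachineSpec(cpu_scale=0.5, latency=0.25))
    print(max(result.values()))  # predicted completion time on the target
```

Because the trace records dependencies rather than absolute timestamps, the same trace can be replayed against different `MachineSpec` values to compare hypothetical target machines without re-running the emulation.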
 

Investigator:

Performance Analysis and Debugging Tools Information

Projections Information

Projections Manual

BigSim Information

BigSim Manual