To further improve the accuracy of performance prediction, the timings of sequential code fragments must be predicted with instruction-level accuracy. We are exploring the idea of incorporating an architectural simulation of multiprocessor nodes via the RSIM [14] simulation infrastructure and its enhancements.
Exploiting such a hardware simulator in BigSim is challenging in the context of such large-scale simulations, because detailed microarchitecture-level simulation runs an order of magnitude slower than multiprocessor simulations that do not model the processor in detail.
To overcome the speed penalty, we have enhanced RSIM with a fast functional simulator called Rabbit. RSIM runs in Rabbit mode to accelerate initialization and other portions of the code for which timing statistics need not be collected. Performance results show that the acceleration provided by ``Rabbit Mode'' is significant [14].
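The split between fast functional execution and detailed timed execution can be illustrated with a minimal sketch. It is not RSIM or Rabbit code: the `ToySim` class, the `LATENCY` table, and the instruction tuples are hypothetical names introduced only for illustration, and a fixed per-opcode latency stands in for a real pipeline model. Every instruction updates architectural state, but cycles are charged only inside the region of interest.

```python
from dataclasses import dataclass, field

# Hypothetical fixed latencies; a detailed simulator models a pipeline instead.
LATENCY = {"li": 1, "add": 1, "mul": 3}


@dataclass
class ToySim:
    regs: dict = field(default_factory=dict)
    cycles: int = 0        # charged only for instructions run in timed mode
    timed_instrs: int = 0  # instructions executed with timing
    fast_instrs: int = 0   # instructions executed functionally only

    def _execute(self, op, dst, a, b):
        """Apply the functional effect of one instruction (no timing)."""
        if op == "li":                       # load immediate: a is a literal
            self.regs[dst] = a
        elif op == "add":
            self.regs[dst] = self.regs[a] + self.regs[b]
        elif op == "mul":
            self.regs[dst] = self.regs[a] * self.regs[b]
        else:
            raise ValueError(f"unknown opcode: {op}")

    def run(self, program, timed_range):
        """Run the program; collect timing only for indices in timed_range."""
        for pc, instr in enumerate(program):
            self._execute(*instr)            # state is always updated
            if pc in timed_range:
                self.cycles += LATENCY[instr[0]]
                self.timed_instrs += 1
            else:                            # fast-forward: skip the timing model
                self.fast_instrs += 1


program = [
    ("li", "r1", 6, None),       # initialization, run functionally
    ("li", "r2", 7, None),
    ("mul", "r3", "r1", "r2"),   # region of interest, run with timing
    ("add", "r4", "r3", "r1"),
]
sim = ToySim()
sim.run(program, timed_range=range(2, 4))
print(sim.regs["r4"], sim.cycles, sim.fast_instrs)
```

The point of the sketch is that the final register state does not depend on which instructions were timed, so initialization can be fast-forwarded without affecting the correctness of the timed region.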
The second enhancement to RSIM models on-chip multithreading, which is likely to be a key feature of future high-performance chips. The capability to simulate chip-level multiprocessors has also been added to RSIM. We also have an initial version of RSIM that supports detection of application phases [15], allowing it to predict the performance of some phases without having to simulate them in detail.
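One way phase detection can avoid detailed simulation is sketched below. This is an illustrative assumption, not necessarily the method of [15]: each execution interval is summarized by the relative frequency of the basic blocks it executes, and an interval whose signature is close to one already seen reuses that phase's measured cycles per instruction (CPI). The function names, the `threshold` value, and the `detailed_cpi` callback are hypothetical, and each entry of an interval is counted as one instruction for simplicity.

```python
from collections import Counter


def signature(interval):
    """Relative frequency of each basic-block id in a non-empty interval."""
    counts = Counter(interval)
    total = len(interval)
    return {bb: n / total for bb, n in counts.items()}


def distance(s1, s2):
    """Manhattan distance between signatures (0 = identical, 2 = disjoint)."""
    keys = set(s1) | set(s2)
    return sum(abs(s1.get(k, 0.0) - s2.get(k, 0.0)) for k in keys)


def estimate_cycles(intervals, detailed_cpi, threshold=0.2):
    """Estimate total cycles, simulating in detail only for unseen phases.

    detailed_cpi is a hypothetical callback standing in for an expensive
    detailed simulation of one interval; it returns that interval's CPI.
    """
    phases = []         # (signature, measured CPI) for each known phase
    total_cycles = 0.0
    detailed_runs = 0
    for interval in intervals:
        sig = signature(interval)
        cpi = next((c for s, c in phases if distance(s, sig) <= threshold), None)
        if cpi is None:                  # new phase: pay for detailed simulation
            cpi = detailed_cpi(interval)
            phases.append((sig, cpi))
            detailed_runs += 1
        total_cycles += cpi * len(interval)
    return total_cycles, detailed_runs


# Four intervals drawn from two distinct phases.
intervals = [[1, 1, 2, 2], [1, 2, 1, 2], [3, 3, 3, 3], [3, 3, 3, 3]]
toy_cpi = lambda iv: 2.0 if 3 in iv else 1.0   # stand-in for detailed simulation
print(estimate_cycles(intervals, toy_cpi))
```

In this toy run only two of the four intervals trigger the detailed callback; the other two reuse a previously measured CPI. The accuracy of such a scheme depends on how well the signature captures performance-relevant behavior, which the sketch does not attempt to evaluate.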
With these enhancements in place, we are integrating RSIM into BigSim to improve simulation accuracy to the instruction level.