Next: Thread Migration and Load Up: Performance Previous: Minimal Context Switching

Application -- Parallel Simulator

BigSim [43,44] is a parallel simulator developed on top of our Charm++ runtime system. It can predict the performance of parallel applications on a massively parallel, petascale machine using an existing parallel machine with only hundreds of processors, even before the target machine is built. Such a simulation requires one physical processor to simulate hundreds or even thousands of processors of the target machine, so each simulating processor must run many flows of control, one for each target processor it simulates.
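To make the idea of many flows of control on one simulating processor concrete, the following is a minimal sketch in Python. It uses generators as a stand-in for user-level threads and a simple round-robin queue as the scheduler. None of these names come from BigSim or Converse: `target_processor` and `run_round_robin` are hypothetical, and the real runtime schedules Cth threads quite differently.

```python
from collections import deque

def target_processor(rank, steps, log):
    """One flow of control standing in for a single simulated target processor."""
    for step in range(steps):
        log[rank] = step + 1      # record how many steps this target has completed
        yield                     # hand control back to the scheduler

def run_round_robin(num_targets, steps):
    """Interleave all target-processor flows on one simulating processor."""
    log = [0] * num_targets
    ready = deque(target_processor(r, steps, log) for r in range(num_targets))
    switches = 0
    while ready:
        flow = ready.popleft()
        try:
            next(flow)            # resume the flow until its next yield
            ready.append(flow)    # it yielded, so it may have more work: requeue it
            switches += 1
        except StopIteration:
            pass                  # the flow has finished all of its steps
    return log, switches

# Hypothetical small run: 1,000 target processors, 3 steps each.
log, switches = run_round_robin(num_targets=1000, steps=3)
```

Because each switch is an ordinary function resume rather than a kernel operation, the number of flows is limited mainly by memory, which is the property the simulator depends on.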

In a typical simulation, we simulated a Blue Gene-like machine with 200,000 processors running a molecular dynamics (MD) simulation code. Running the test on 4 processors requires each processor to simulate 50,000 separate target processors, which is clearly not feasible on most machines using either processes or kernel threads. With user-level threads, however, we were able to simulate 50,000 target processors with 50,000 user-level threads on just one real processor.
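The per-processor load quoted above follows from dividing the target processors evenly among the simulating processors. The sketch below checks that arithmetic with a simple block mapping. The mapping itself is an assumption made for illustration, and `block_owner` is a hypothetical helper; the text does not say how BigSim assigns target processors to physical ones.

```python
def block_owner(target_rank, num_targets, num_physical):
    """Return the physical processor that simulates a given target processor.

    Assumes a contiguous block mapping (an illustrative choice, not BigSim's).
    """
    per_physical = -(-num_targets // num_physical)   # ceiling division
    return target_rank // per_physical

NUM_TARGETS = 200_000    # simulated Blue Gene-like machine (from the text)
NUM_PHYSICAL = 4         # simulating processors (from the text)

# Count how many target processors land on each physical processor.
counts = [0] * NUM_PHYSICAL
for rank in range(NUM_TARGETS):
    counts[block_owner(rank, NUM_TARGETS, NUM_PHYSICAL)] += 1
```

With these inputs every entry of `counts` is 50,000, matching the figure given in the text.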

Figure 11: Simulation time per step using a total of 200,000 user-level threads

Figure 11 illustrates the performance of BigSim using Cth, the Converse user-level threads. The test was run on LeMieux, a machine with 750 quad-processor AlphaServer ES45 nodes at the Pittsburgh Supercomputing Center (PSC). Each node is a 4-processor SMP with 4 Gbytes of memory. We measured the time taken to simulate one timestep of the MD simulation using 4 to 64 LeMieux processors. The figure shows that the simulator scales well over this range of processors.
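The node configuration above gives a rough sense of how little memory each user-level thread can occupy. The following back-of-the-envelope sketch computes the number of threads per processor and an upper bound on memory per thread. It rests on assumptions not stated in the text: "4 Gbytes" is read as 4 GiB, node memory is split equally among the 4 processors, processor counts are powers of two between 4 and 64, and memory used by the operating system, the runtime, and the application itself is ignored.

```python
NUM_TARGETS = 200_000              # total user-level threads (from the text)
NODE_MEMORY_BYTES = 4 * 1024**3    # 4 Gbytes per node, read here as 4 GiB (assumption)
PROCS_PER_NODE = 4                 # each node is a 4-processor SMP (from the text)

def per_thread_budget(num_physical):
    """Return (threads per processor, upper bound on bytes per thread)."""
    threads = NUM_TARGETS // num_physical
    mem_per_proc = NODE_MEMORY_BYTES // PROCS_PER_NODE   # equal split (assumption)
    return threads, mem_per_proc // threads

# Assumed sweep of processor counts within the 4-to-64 range in the text.
budgets = {p: per_thread_budget(p) for p in (4, 8, 16, 32, 64)}
```

Under these assumptions, a 4-processor run leaves at most about 21 KB per thread for stack and thread state, which suggests why processes or kernel threads with their larger per-thread footprint are impractical at this scale.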


Gengbin Zheng 2006-03-18