
Parallel Performance

On ASCI RED, an Intel Paragon machine, we ran a simple scaling benchmark with 65,536 polygons per processor. The wall-clock time per timestep for various numbers of processors is shown in the figure below. A program with perfect speedup would have a constant time per step. Instead, we see a slow, logarithmic rise in the time per step due to the $O(\lg p)$ synchronization overhead, where $p$ is the number of processors.
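The logarithmic rise can be illustrated with a simple two-parameter model, T(p) = T(1) + c lg p. The sketch below is only an illustration: the model form and the fitting of c through the two endpoint timings reported for this benchmark (0.44 seconds on one processor, 0.73 seconds on 1,500) are assumptions, and the intermediate values it prints are model predictions, not measurements.

```python
import math

# Endpoint timings reported for the scaling benchmark (seconds per step).
T_ONE_PROC = 0.44      # 1 processor
T_MAX_PROCS = 0.73     # 1,500 processors
P_MAX = 1500


def fit_log_coefficient(t1, tp, p):
    """Solve tp = t1 + c * log2(p) for c (assumed model, two-point fit)."""
    return (tp - t1) / math.log2(p)


def model_time_per_step(p, t1, c):
    """Predicted time per step under the assumed model T(p) = t1 + c * log2(p)."""
    return t1 + c * math.log2(p)


# Per-doubling synchronization cost implied by the two endpoints.
c = fit_log_coefficient(T_ONE_PROC, T_MAX_PROCS, P_MAX)

if __name__ == "__main__":
    print(f"fitted cost per doubling of p: {c * 1000:.1f} ms")
    for p in (1, 2, 32, 256, 1500):
        print(f"p = {p:5d}: modeled {model_time_per_step(p, T_ONE_PROC, c):.3f} s/step")
```

Under this model each doubling of the processor count adds a fixed cost of a few tens of milliseconds per step, which is why the curve stays nearly flat across three orders of magnitude in p.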

The smallest run shown, 65,536 triangles on a single processor, takes 0.44 seconds per step. The largest run shown, 65,536 triangles on each of 1,500 processors (98.3 million triangles in total), takes 0.73 seconds per step, for a scaled speedup of 915, or a parallel efficiency of 60 percent.
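The arithmetic behind these figures can be checked with a short sketch. Because the problem size grows with the processor count, efficiency is taken here as T(1)/T(p) and scaled speedup as p times that; these definitions are the usual weak-scaling ones and are assumed rather than stated in the text. With the rounded timings above, the computed speedup comes out near 904, slightly below the quoted 915, which presumably reflects unrounded measurements.

```python
POLYGONS_PER_PROC = 65_536
NUM_PROCS = 1_500
T_ONE_PROC = 0.44    # seconds per step, 1 processor
T_MAX_PROCS = 0.73   # seconds per step, 1,500 processors


def parallel_efficiency(t1, tp):
    """Weak-scaling efficiency: ideal (constant) time over observed time."""
    return t1 / tp


def scaled_speedup(t1, tp, p):
    """Scaled speedup: p times the work is done in tp instead of t1."""
    return p * t1 / tp


total_polygons = POLYGONS_PER_PROC * NUM_PROCS                       # 98,304,000
efficiency = parallel_efficiency(T_ONE_PROC, T_MAX_PROCS)            # about 0.60
speedup = scaled_speedup(T_ONE_PROC, T_MAX_PROCS, NUM_PROCS)         # about 904

if __name__ == "__main__":
    print(f"total polygons: {total_polygons:,}")
    print(f"efficiency:     {efficiency:.1%}")
    print(f"scaled speedup: {speedup:.0f}")
```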

The observed parallel performance is excellent. It compares quite favorably with the result of 1 second per timestep for 8,000 objects per processor reported for the parallel RCB scheme described in [Hend96].
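One rough way to quantify the comparison is per-processor throughput, in objects handled per processor per second of wall-clock time. The sketch below uses only the figures quoted above; it assumes that polygons here and objects in [Hend96] are comparable units and ignores differences in hardware, so the ratio is indicative only.

```python
def throughput_per_processor(objects_per_proc, seconds_per_step):
    """Objects handled per processor per second of wall-clock time."""
    return objects_per_proc / seconds_per_step


# This work: largest run, 65,536 polygons per processor at 0.73 s per step.
ours = throughput_per_processor(65_536, 0.73)

# Parallel RCB scheme of [Hend96]: 8,000 objects per processor at 1 s per step.
rcb = throughput_per_processor(8_000, 1.0)

ratio = ours / rcb  # roughly an order of magnitude

if __name__ == "__main__":
    print(f"this work:  {ours:,.0f} polygons per processor-second")
    print(f"RCB scheme: {rcb:,.0f} objects per processor-second")
    print(f"ratio:      {ratio:.1f}x")
```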

Figure: Time per step for a scaling parallel benchmark, with a fixed 65,536 polygons per processor.

The parallel implementation also scales down to smaller models where fast response time matters, as in interactive applications. On 32 processors of a 195 MHz Origin2000 system, it handles 300,000 triangles in 30 milliseconds per step, an interactive rate of about 33 steps per second.


Orion Lawlor 2001-08-31