next up previous
Next: Performance Up: Big FEM Simulation Previous: FEM Grainsize on Large

Bottlenecks on Large Machines

In summary, there are a number of practical bottlenecks to execution on very large machines. First, large meshes must be generated; this is difficult with today's tools. Second, the meshes must be partitioned for parallel execution. Finally, the resulting computation may still have small grainsize, so messaging performance is important.

Our runs with BigSim also exposed a number of unexpected bottlenecks and limitations to scalability. For example, the serial partitioning library we use consumes memory proportional to the number of output pieces, not the total size of the mesh; so even our 4GB machine ran out of memory when partitioning a relatively small 5M element mesh into more than 16K pieces. Hopefully ParMetis will solve this problem.

Similarly, even though our MPI implementation, AMPI, was designed to be scalable, while trying to simulate very large machines we discovered our implementation used $P^2$ total memory for $P$ processors. The culprit was a simple linear message ordering table kept by each processor, because the table's length was proportional to the number of processors. For today's machines, where $P$=1000, the total amount of memory used was 16MB; but for $P$=100,000, the tables would use 160GB! The solution was to break the tables into (software) pages and only allocate pages when referenced; this dramatically reduces the storage requirements for large machines because most processors only communicate directly with a small subset of other processors.


next up previous
Next: Performance Up: Big FEM Simulation Previous: FEM Grainsize on Large
Gengbin Zheng 2004-01-21