Consider a high-resolution fracture dynamics simulation of a block of metal. We first decompose the block of metal into ``elements'', which are small pieces of the domain with a simple shape, often tetrahedra or cubes. In theory, the number of elements is determined only by the physical fidelity we wish to achieve, but in practice the number of elements, and hence the accuracy of the simulation, is often limited by the memory and speed of the machine, as summarized in Table 1.
For example, a 1-meter cube of metal discretized into a 1cm scale mesh will require one million elements. But 1cm is quite coarse; if we need 1mm resolution the mesh will have one billion elements. If elements require 40 bytes each, such a mesh would require 40 GB of storage. This is larger than current serial machines can handle, but is plausible even on today's parallel machines.
Once a mesh is generated, it must be partitioned, and the pieces sent to different processors for parallel execution. The FEM framework currently uses the serial Metis partitioning library, so the partitioning is performed completely on one processor, which becomes a bottleneck for large meshes. We are working on integrating the parallel ParMetis partitioning package to avoid the serial mesh partitioning bottleneck, which should allow us to use larger meshes and scalably partition the mesh. An alternative approach is to use a simpler but inaccurate mesh partitioner such as geometric recursive coordinate bisection, then fix the resulting load imbalance using our load balancing framework.