Next: Conclusion Up: Performance Previous: Application Parallel

Thread Migration and Load Balancing

The ``Multi-Zone'' NAS Parallel Benchmark [18] is an extension of the well-known NPB suite. It solves the application benchmarks LU, BT and SP on collections of loosely coupled discrete meshes. The problems are partitioned at a coarse-grained level to expose more parallelism and to stress the need for load balancing. Among these tests, BT-MZ creates the most dramatic load imbalance, so we use it in our test runs.

We ran the BT-MZ benchmark with Adaptive MPI (AMPI) and used thread migration for load balancing. The migratable threads use the isomalloc and swap-global mechanisms (Section 3.4.2), which make thread migration transparent and require no changes to the benchmark code. For load balancing to be effective, the number of AMPI migratable threads must be much larger than the number of physical processors, so that threads can migrate from overloaded processors to underloaded ones.
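The effect of having more threads than processors can be illustrated with a small simulation. The sketch below is not AMPI's load balancing strategy; it is a generic greedy heuristic, and the function names and per-thread loads are hypothetical values chosen for illustration. It compares a load-oblivious block placement with a placement that assigns the heaviest threads first, each to the currently least-loaded processor.

```python
import heapq

def block_assign(loads, num_pes):
    """Place threads on processors in contiguous blocks, ignoring load.

    Assumes len(loads) is divisible by num_pes.
    """
    per_pe = len(loads) // num_pes
    return [sum(loads[p * per_pe:(p + 1) * per_pe]) for p in range(num_pes)]

def greedy_rebalance(loads, num_pes):
    """Assign heaviest threads first, each to the least-loaded processor."""
    heap = [(0.0, p) for p in range(num_pes)]  # (current load, processor id)
    heapq.heapify(heap)
    totals = [0.0] * num_pes
    for load in sorted(loads, reverse=True):
        total, p = heapq.heappop(heap)
        totals[p] = total + load
        heapq.heappush(heap, (totals[p], p))
    return totals

# Hypothetical per-thread loads: 8 threads on 4 processors.
thread_loads = [8, 7, 6, 5, 4, 3, 2, 1]
before = block_assign(thread_loads, 4)
after = greedy_rebalance(thread_loads, 4)
print("max load before:", max(before))
print("max load after: ", max(after))
```

Since the slowest processor determines the execution time, the maximum per-processor load is the quantity to minimize. With only one thread per processor there would be nothing to move, which is why the thread count must exceed the processor count.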

Figure 12: The NAS BT-MZ benchmark with and without thread migration for automatic parallel load balancing.

These tests were run on the Tungsten Xeon Linux cluster at NCSA. This cluster is based on Dell PowerEdge 1750 servers, each with two Intel Xeon 3.2 GHz processors, running Red Hat Linux and connected by a Myrinet interconnect. Figure 12 shows the total execution time of BT-MZ in various configurations, with and without load balancing. The x-axis lists the test cases. For example, ``A.8,4PE'' indicates that BT-MZ is compiled with CLASS=A and NPROCS=8 and runs with 8 AMPI threads on 4 physical processors. Note that problems of the same class (A, B, etc.) have the same problem size. Indeed, for all three class B tests on 8 processors (B.16,8PE, B.32,8PE and B.64,8PE), the execution times after load balancing are about the same, while the execution times before load balancing vary dramatically. This benchmark demonstrates the effect of load balancing made feasible by thread migration.
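The test-case labels on the x-axis can be decoded mechanically. The following sketch assumes the label format described above (class letter, thread count, processor count); the helper name `parse_case` is hypothetical and is not part of the benchmark or of AMPI.

```python
import re

# Label format assumed from the text: <CLASS>.<NPROCS>,<processors>PE
LABEL_RE = re.compile(r"^([A-Z])\.(\d+),(\d+)PE$")

def parse_case(label):
    """Return (problem class, AMPI threads, physical processors)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized test-case label: {label!r}")
    cls, threads, pes = match.groups()
    return cls, int(threads), int(pes)

for label in ["A.8,4PE", "B.16,8PE", "B.32,8PE", "B.64,8PE"]:
    cls, threads, pes = parse_case(label)
    # Threads per processor is the degree of virtualization.
    print(f"{label}: class {cls}, {threads} threads on {pes} PEs, "
          f"{threads / pes:g} threads per PE")
```

For the three class B cases on 8 processors, the ratio of threads to processors grows from 2 to 8 while the problem size stays fixed.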


Gengbin Zheng 2006-03-18