next up previous contents
Next: Performance Up: Parallel Implementation Previous: Parallel Details   Contents

Load Balancing

The CHARM++ automatic load balancer [#!AdaptLBSFMPC99!#] measures the load and communication patterns of each parallel object. It then uses a load balancing ``strategy''3.3 to determine where each object should be migrated for optimal performance. Performing load balancing automatically at run-time enables an application to react quickly to irregularity in the problem structure.

In fact, since collision detection is often only a small part of a much larger parallel program, perfect load balance for the collision library is actually not necessary. As long as the program is structured to allow the execution of different libraries to interleave (as is always the case in CHARM++), different processors may have different amounts of collision work yet all have the same amount of work overall.

Consider the work done by a parallel machine during an unsynchronized period- that is, between two global synchronizations. In symbols, let $ A_i$ and $ B_i$ represent the amount of work needed by two libraries $ A$ and $ B$ on processor $ i$. Then for load balance we require that each processor have the same amount of work overall, i.e.:

$\displaystyle \forall_{i,j}{\; A_i+B_i = A_j+B_j}$ (3.1)

This is often ensured by requiring that every library impose the same amount of work on every processor, or:

$\displaystyle \forall_{i,j}{\; A_i=A_j \wedge B_i=B_j}$ (3.2)

It is clear that [*] implies [*], but [*] does not imply [*]. That is, library load balance is sufficient but not necessary for global load balance. Thus the usual parallel programming approach of load balancing every library is not necessary. A load balancer can use the extra degrees of freedom obtained by replacing [*] with [*] to minimize communication overhead.

For the voxel method, the CHARM++ automatic load balancer is free to migrate both the contributors, which do work before the synchronization; and the voxels, which do work afterwards. Hence the load balancer can balance the collision algorithm in the context of the larger program.


next up previous contents
Next: Performance Up: Parallel Implementation Previous: Parallel Details   Contents
Orion Lawlor 2001-08-31