The CHARM++ runtime framework includes an automated run-time load balancer, which monitors the performance of your parallel program. If needed, the load balancer can ``migrate'' mesh chunks from heavily loaded processors to more lightly loaded processors, improving the load balance and speeding up the program. For this to be useful, pass the +vpN argument with a larger number of blocks N than processors. Because supporting migration is somewhat involved, you may refrain from calling MBLK_Migrate, and migration will never take place.
The runtime system can automatically move your thread stack to the new processor, but you must write a PUP function to move any global or heap-allocated data to the new processor (global data is declared at file scope or static in C and COMMON in Fortran77; heap-allocated data comes from C malloc, C++ new, or Fortran90 ALLOCATE). A PUP (Pack/UnPack) function performs both packing (converting heap data into a message) and unpacking (converting a message back into heap data). All your global and heap data must be collected into a single block (struct in C; user-defined TYPE in Fortran) so the PUP function can access it all.
Your PUP function will be passed a pointer to your heap data block and a special handle called a ``pupper'', which contains the network message to be sent. Your PUP function returns a pointer to your heap data block. In a PUP function, you pass all your heap data to routines named pup_type, where type is either a basic type (such as int, char, float, or double) or an array type (as before, but with an ``s'' suffix). Depending on the direction of packing, the pupper will either read from or write to the values you pass; normally, you need not even know which. The only time you need to know the direction is when you are leaving a processor or just arriving. Correspondingly, the pupper passed to you may be deleting (indicating that you are leaving the processor, and should delete your heap storage after packing), unpacking (indicating you've just arrived on a processor, and should allocate your heap storage before unpacking), or neither (indicating the system is merely sizing a buffer, or checkpointing your values).
PUP functions are much easier to write than to explain. A simple C heap block and the corresponding PUP function are:
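A minimal sketch of such a block and its PUP function follows. The names my_block and pup_my_block, the argument order of the PUP function, and the helper simulate_migration are assumptions made for illustration. So that the sketch compiles without CHARM++, it also defines a small stand-in pupper; that stand-in (pupper_state, standin_copy, and the bodies of the pup_ routines) is hypothetical and is not the CHARM++ implementation. In a real program you would include the CHARM++ headers instead and keep only the user code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- Stand-in pupper (hypothetical; NOT the CHARM++ implementation) ---- */
typedef enum { PUP_SIZING, PUP_PACKING, PUP_UNPACKING } pup_mode;
typedef struct {
  pup_mode mode;
  int deleting;   /* nonzero: caller should free heap storage after packing */
  char *buf;      /* message buffer (unused while sizing) */
  size_t pos;     /* bytes counted, written, or read so far */
} pupper_state;
typedef pupper_state *pup_er;

static int pup_isUnpacking(pup_er p) { return p->mode == PUP_UNPACKING; }
static int pup_isDeleting(pup_er p) { return p->deleting; }

/* Move n bytes between user data and the message; sizing only counts. */
static void standin_copy(pup_er p, void *data, size_t n) {
  if (n == 0) return;
  if (p->mode == PUP_PACKING) memcpy(p->buf + p->pos, data, n);
  else if (p->mode == PUP_UNPACKING) memcpy(data, p->buf + p->pos, n);
  p->pos += n;
}
static void pup_int(pup_er p, int *v) { standin_copy(p, v, sizeof(int)); }
static void pup_ints(pup_er p, int *a, int n) {
  standin_copy(p, a, (size_t)n * sizeof(int));
}
static void pup_doubles(pup_er p, double *a, int n) {
  standin_copy(p, a, (size_t)n * sizeof(double));
}

/* ---- User code: the heap block and its PUP function ---- */
typedef struct {
  int n1;        /* length of arr1 */
  int n2;        /* length of arr2 */
  double *arr1;  /* heap-allocated doubles */
  int *arr2;     /* heap-allocated ints */
} my_block;

my_block *pup_my_block(pup_er p, my_block *m) {
  if (pup_isUnpacking(p)) {      /* just arrived: allocate the block itself */
    m = malloc(sizeof(my_block));
    assert(m != NULL);
  }
  pup_int(p, &m->n1);
  pup_int(p, &m->n2);
  if (pup_isUnpacking(p)) {      /* lengths are now known: allocate arrays */
    m->arr1 = malloc((size_t)m->n1 * sizeof(double));
    m->arr2 = malloc((size_t)m->n2 * sizeof(int));
    assert(m->arr1 != NULL && m->arr2 != NULL);
  }
  pup_doubles(p, m->arr1, m->n1);
  pup_ints(p, m->arr2, m->n2);
  if (pup_isDeleting(p)) {       /* leaving: free storage after packing */
    free(m->arr1);
    free(m->arr2);
    free(m);
    m = NULL;
  }
  return m;
}

/* Hypothetical helper: size, pack and delete, then unpack, as one migration. */
my_block *simulate_migration(my_block *m) {
  pupper_state sizer = { PUP_SIZING, 0, NULL, 0 };
  pup_my_block(&sizer, m);               /* counts bytes only */
  char *msg = malloc(sizer.pos);
  assert(msg != NULL);
  pupper_state packer = { PUP_PACKING, 1, msg, 0 };
  pup_my_block(&packer, m);              /* m is freed by this call */
  pupper_state unpacker = { PUP_UNPACKING, 0, msg, 0 };
  my_block *arrived = pup_my_block(&unpacker, NULL);
  free(msg);
  return arrived;
}
```

Note that the same sequence of pup_ calls runs in every mode; only the allocation and deallocation steps depend on whether the pupper is unpacking or deleting.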
This single PUP function can be used to copy the my_block data into a message buffer and free the old heap storage (deleting pupper); allocate storage on the new processor and copy the message data back (unpacking pupper); or save the heap data for debugging or checkpointing.
A Fortran block TYPE and corresponding PUP routine is as follows:
int MBLK_Register(void *block, MBLK_PupFn pup_ud, int* rid)
subroutine MBLK_Register(block,pup_ud, rid)
integer, intent(out)::rid
TYPE(varies), POINTER :: block
SUBROUTINE :: pup_ud
Associates the given data block and PUP function. Stores a block
ID in rid, which can be passed to MBLK_Get_registered later. Can only be
called from driver. The return value is MBLK_SUCCESS if the call was
successful and MBLK_FAILURE in case of error. For the declarations above,
you call MBLK_Register as:
Note that Fortran blocks must be allocated on the stack in driver,
while C/C++ blocks may be allocated on the heap.
void MBLK_Migrate()
subroutine MBLK_Migrate()
Informs the load balancing system that you are ready to be
migrated, if needed. If the system decides to migrate you, the
PUP function passed to MBLK_Register will be called with a sizing
pupper, then a packing, deleting pupper. Your stack (and pupped
data) will then be sent to the destination machine, where your PUP
function will be called with an unpacking pupper. MBLK_Migrate
will then return, whereupon you should call MBLK_Get_registered to
get your unpacked data block. Can only be called from driver.
int MBLK_Get_Userdata(int n, void** block)
Returns your unpacked user data after migration, that is, the
return value of the unpacking call to your PUP function. Takes
the block ID returned by MBLK_Register. Can be called from
driver at any time.
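The calling pattern described above (register once, offer to migrate, then fetch the block again) can be sketched as follows. This is an illustration under stated assumptions, not the MBlock implementation: every MBLK_ and pup_ body below is a hypothetical stand-in so the example runs without CHARM++, the MBLK_PupFn typedef and the numeric values of MBLK_SUCCESS and MBLK_FAILURE are placeholders, and the retrieval routine is written under the name MBLK_Get_registered used in the prose, with the (int, void**) signature listed above. The names my_state, pup_my_state, and run_steps are likewise made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- Stand-ins so the sketch runs without CHARM++ (all hypothetical) ---- */
typedef struct { int unpacking, deleting; char buf[64]; size_t pos; } pupper_state;
typedef pupper_state *pup_er;
typedef void *(*MBLK_PupFn)(pup_er p, void *block);  /* assumed typedef */
enum { MBLK_FAILURE = 0, MBLK_SUCCESS = 1 };         /* placeholder values */

static int pup_isUnpacking(pup_er p) { return p->unpacking; }
static int pup_isDeleting(pup_er p) { return p->deleting; }
static void pup_int(pup_er p, int *v) {
  if (p->unpacking) memcpy(v, p->buf + p->pos, sizeof *v);
  else memcpy(p->buf + p->pos, v, sizeof *v);
  p->pos += sizeof *v;
}

static void *reg_block;     /* a single registered block is enough here */
static MBLK_PupFn reg_pup;

int MBLK_Register(void *block, MBLK_PupFn pup_ud, int *rid) {
  if (block == NULL || pup_ud == NULL) return MBLK_FAILURE;
  reg_block = block;
  reg_pup = pup_ud;
  *rid = 0;
  return MBLK_SUCCESS;
}

/* This stand-in always "migrates": pack and delete, then unpack afresh. */
void MBLK_Migrate(void) {
  pupper_state p = { 0, 1, {0}, 0 };
  reg_pup(&p, reg_block);            /* packing, deleting pass */
  p.unpacking = 1;
  p.deleting = 0;
  p.pos = 0;
  reg_block = reg_pup(&p, NULL);     /* unpacking pass on "arrival" */
}

int MBLK_Get_registered(int n, void **block) {
  if (n != 0 || reg_block == NULL) return MBLK_FAILURE;
  *block = reg_block;
  return MBLK_SUCCESS;
}

/* ---- User code ---- */
typedef struct { int step; int nCells; } my_state;

void *pup_my_state(pup_er p, void *v) {
  my_state *s = v;
  if (pup_isUnpacking(p)) { s = malloc(sizeof *s); assert(s != NULL); }
  pup_int(p, &s->step);
  pup_int(p, &s->nCells);
  if (pup_isDeleting(p)) { free(s); s = NULL; }
  return s;
}

/* Body of a driver-like time loop; returns the final step count. */
int run_steps(int nSteps) {
  my_state *s = malloc(sizeof *s);
  assert(s != NULL);
  s->step = 0;
  s->nCells = 100;
  int rid;
  int ok = MBLK_Register(s, pup_my_state, &rid);
  assert(ok == MBLK_SUCCESS);
  for (int i = 0; i < nSteps; i++) {
    s->step++;                        /* ... compute on this block ... */
    MBLK_Migrate();                   /* the old pointer s may now be stale */
    void *fresh;
    ok = MBLK_Get_registered(rid, &fresh);
    assert(ok == MBLK_SUCCESS);
    s = fresh;                        /* continue with the unpacked block */
  }
  int final_step = s->step;
  free(s);
  return final_step;
}
```

The point of the sketch is the last few lines of the loop: after the migrate call returns, the pointer held before the call is no longer used, and the block is fetched again by its ID.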
Since Fortran blocks are always allocated on the stack, the system migrates them to the same location on the new processor, so no Get_registered call is needed from Fortran.
January 17, 2008