7.5 Migration

The CHARM++ runtime framework includes an automated run-time load balancer, which automatically monitors the performance of your parallel program. If needed, the load balancer can "migrate" mesh chunks from heavily loaded processors to more lightly loaded processors, improving the load balance and speeding up the program. For this to be useful, pass the +vpN argument with a number of blocks N larger than the number of processors. Because supporting migration is somewhat involved, you may choose not to call MBLK_Migrate, in which case migration will never take place.

The runtime system can automatically move your thread stack to the new processor, but you must write a PUP function to move any global or heap-allocated data to the new processor. (Global data is declared at file scope or static in C, or in COMMON blocks in Fortran77; heap-allocated data comes from C malloc, C++ new, or Fortran90 ALLOCATE.) A PUP (Pack/UnPack) function performs both packing (converting heap data into a message) and unpacking (converting a message back into heap data). All your global and heap data must be collected into a single block (a struct in C; a user-defined TYPE in Fortran) so that the PUP function can access all of it.
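As a minimal sketch of that reorganization, the file-scope variables of a C program can be gathered into one struct that the code then passes around explicitly. The names solver_state, solver_state_new, solver_sum, and solver_state_free are hypothetical and exist only for this illustration; they are not part of MBlock.

```c
#include <assert.h>
#include <stdlib.h>

/* Before (for comparison): state held in file-scope globals, which the
   runtime cannot migrate on its own.
     static int     nsteps;
     static double *temps;    -- heap array of nsteps doubles            */

/* After: the same state gathered into one hypothetical block type, so a
   single PUP function can reach all of it through one pointer. */
typedef struct {
  int nsteps;      /* length of temps */
  double *temps;   /* heap-allocated array of nsteps doubles */
} solver_state;

/* Allocate a zero-initialized block; returns NULL if allocation fails. */
solver_state *solver_state_new(int nsteps)
{
  solver_state *s = malloc(sizeof *s);
  if (s == NULL) return NULL;
  s->nsteps = nsteps;
  s->temps = calloc((size_t)nsteps, sizeof *s->temps);
  if (s->temps == NULL && nsteps > 0) {
    free(s);
    return NULL;
  }
  return s;
}

/* Code that used to read the globals now receives the block explicitly. */
double solver_sum(const solver_state *s)
{
  assert(s != NULL);
  double total = 0.0;
  for (int i = 0; i < s->nsteps; i++) total += s->temps[i];
  return total;
}

/* Release the array, then the block itself. */
void solver_state_free(solver_state *s)
{
  if (s == NULL) return;
  free(s->temps);
  free(s);
}
```

Once the state lives behind a single pointer, that pointer is what gets registered with the runtime, and one PUP function can reach every field.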

Your PUP function will be passed a pointer to your heap data block and a special handle called a "pupper", which contains the network message to be sent. Your PUP function returns a pointer to your heap data block. In a PUP function, you pass all your heap data to routines named pup_type, where type is either a basic type (such as int, char, float, or double) or an array type (the same names with an "s" suffix, such as pup_ints or pup_doubles). Depending on the direction of packing, the pupper will either read from or write to the values you pass; normally, you don't even need to know which. The only time you need to know the direction is when you are leaving a processor or just arriving. Correspondingly, the pupper passed to you may be deleting (you are leaving the processor and should delete your heap storage after packing), unpacking (you have just arrived on a processor and should allocate your heap storage before unpacking), or neither (the system is merely sizing a buffer or checkpointing your values).

PUP functions are much easier to write than to explain. A simple C heap block and the corresponding PUP function are:

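     #include <stdlib.h>  /* malloc, free */
     #include "pup_c.h"   /* Charm++ C PUP interface: pup_er, pup_int, ... */

     typedef struct {
       int n1;/*Length of first array below*/
       int n2;/*Length of second array below*/
       double *arr1; /*Some doubles, allocated on the heap*/
       int *arr2; /*Some ints, allocated on the heap*/
     } my_block;

     my_block *pup_my_block(pup_er p,my_block *m)
     {
       /*Arriving on a new processor: allocate the block itself first*/
       if (pup_isUnpacking(p)) m=malloc(sizeof(my_block));
       pup_int(p,&m->n1);
       pup_int(p,&m->n2);
       /*The array lengths are now known, so allocate the arrays*/
       if (pup_isUnpacking(p)) {
         m->arr1=malloc(m->n1*sizeof(double));
         m->arr2=malloc(m->n2*sizeof(int));
       }
       pup_doubles(p,m->arr1,m->n1);
       pup_ints(p,m->arr2,m->n2);
       /*Leaving this processor: release the old heap storage*/
       if (pup_isDeleting(p)) {
         free(m->arr1);
         free(m->arr2);
         free(m);
         m=NULL; /*don't return a dangling pointer*/
       }
       return m;
     }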

This single PUP function can be used to copy the my_block data into a message buffer and free the old heap storage (deleting pupper); allocate storage on the new processor and copy the message data back (unpacking pupper); or save the heap data for debugging or checkpointing.
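To make the three pupper roles concrete without depending on the Charm++ runtime, the sketch below reruns the same pup_my_block logic against a toy pupper. Everything prefixed toy_ is a hypothetical stand-in written for illustration, not the real pup_er API; it only records a mode and a byte offset. toy_roundtrip performs the sizing, packing/deleting, and unpacking passes in the order MBLK_Migrate is described as using them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy pupper: NOT the Charm++ pup_er type. */
typedef enum { TOY_SIZING, TOY_PACKING, TOY_UNPACKING } toy_mode;

typedef struct {
  toy_mode mode;
  int deleting;   /* nonzero: caller should free heap storage after packing */
  char *buf;      /* message buffer (unused while sizing) */
  size_t pos;     /* bytes counted, written, or read so far */
} toy_pupper;

/* One routine serves every direction: the caller never checks which. */
static void toy_pup_bytes(toy_pupper *p, void *data, size_t n)
{
  if (p->mode == TOY_PACKING) memcpy(p->buf + p->pos, data, n);
  else if (p->mode == TOY_UNPACKING) memcpy(data, p->buf + p->pos, n);
  p->pos += n;    /* sizing only advances the count */
}
static void toy_pup_int(toy_pupper *p, int *x) { toy_pup_bytes(p, x, sizeof *x); }
static void toy_pup_ints(toy_pupper *p, int *x, int n) { toy_pup_bytes(p, x, (size_t)n * sizeof *x); }
static void toy_pup_doubles(toy_pupper *p, double *x, int n) { toy_pup_bytes(p, x, (size_t)n * sizeof *x); }
static int toy_isUnpacking(const toy_pupper *p) { return p->mode == TOY_UNPACKING; }
static int toy_isDeleting(const toy_pupper *p) { return p->deleting; }

typedef struct {
  int n1, n2;     /* lengths of arr1 and arr2 */
  double *arr1;
  int *arr2;
} my_block;

/* Same structure as pup_my_block above, written against the toy pupper. */
my_block *pup_my_block(toy_pupper *p, my_block *m)
{
  if (toy_isUnpacking(p)) m = malloc(sizeof *m);
  toy_pup_int(p, &m->n1);
  toy_pup_int(p, &m->n2);
  if (toy_isUnpacking(p)) {
    m->arr1 = malloc((size_t)m->n1 * sizeof *m->arr1);
    m->arr2 = malloc((size_t)m->n2 * sizeof *m->arr2);
  }
  toy_pup_doubles(p, m->arr1, m->n1);
  toy_pup_ints(p, m->arr2, m->n2);
  if (toy_isDeleting(p)) {
    free(m->arr1);
    free(m->arr2);
    free(m);
    m = NULL;
  }
  return m;
}

/* Size, pack and delete, then unpack: the block ends up at a new address. */
my_block *toy_roundtrip(my_block *m, size_t *msg_size)
{
  toy_pupper p = { TOY_SIZING, 0, NULL, 0 };
  pup_my_block(&p, m);              /* sizing pass only counts bytes */
  *msg_size = p.pos;

  char *msg = malloc(*msg_size);
  assert(msg != NULL);
  p = (toy_pupper){ TOY_PACKING, 1, msg, 0 };
  m = pup_my_block(&p, m);          /* old storage is freed; m is NULL */

  p = (toy_pupper){ TOY_UNPACKING, 0, msg, 0 };
  m = pup_my_block(&p, m);          /* allocates and fills a fresh block */
  free(msg);
  return m;
}
```

The PUP function itself never branches on "packing" versus "sizing"; only allocation and deallocation depend on the direction, which is the property that lets one function serve all three roles.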

A Fortran block TYPE and the corresponding PUP routine are as follows:

     MODULE my_block_mod
       TYPE my_block
         INTEGER :: n1,n2x,n2y
         REAL*8, POINTER, DIMENSION(:) :: arr1
         INTEGER, POINTER, DIMENSION(:,:) :: arr2
       END TYPE
     END MODULE
 
     SUBROUTINE pup_my_block(p,m)
       USE my_block_mod
       USE pupmod
       IMPLICIT NONE
       INTEGER :: p
       TYPE(my_block) :: m
       call pup_int(p,m%n1)
       call pup_int(p,m%n2x)
       call pup_int(p,m%n2y)
       IF (pup_isUnpacking(p)) THEN
         ALLOCATE(m%arr1(m%n1))
         ALLOCATE(m%arr2(m%n2x,m%n2y))
       END IF
       call pup_doubles(p,m%arr1,m%n1)
       call pup_ints(p,m%arr2,m%n2x*m%n2y)
       IF (pup_isDeleting(p)) THEN
         DEALLOCATE(m%arr1)
         DEALLOCATE(m%arr2)
       END IF
     END SUBROUTINE

int MBLK_Register(void *block, MBLK_PupFn pup_ud, int* rid)
subroutine MBLK_Register(block,pup_ud,rid,err)
integer, intent(out)::rid,err
TYPE(varies), POINTER :: block
SUBROUTINE :: pup_ud
Associates the given data block and PUP function, and stores a block ID in rid; this ID can later be passed to MBLK_Get_Userdata. Can only be called from driver. Returns MBLK_SUCCESS if the call was successful and MBLK_FAILURE in case of error. For the declarations above, you call MBLK_Register as:

          /*C/C++ driver() function*/
          int myId, err;
          my_block *m=malloc(sizeof(my_block));
          err=MBLK_Register(m,(MBLK_PupFn)pup_my_block,&myId);
 
          !- Fortran driver subroutine
          use my_block_mod
          interface
            subroutine pup_my_block(p,m)
              use my_block_mod
              INTEGER :: p
              TYPE(my_block) :: m
            end subroutine
          end interface
          TYPE(my_block) :: m
          INTEGER :: myId,err
          call MBLK_Register(m,pup_my_block,myId,err)

Note that Fortran blocks must be allocated on the stack in driver, while C/C++ blocks may be allocated on the heap.
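The contract described for MBLK_Register (keep a block/PUP-function pair and hand back an integer ID that retrieves the block later) can be modeled as a small table. This is a hypothetical sketch for illustration only: toy_register, toy_get_userdata, toy_set_userdata, toy_noop_pup, and the TOY_ constants are invented names, not MBlock's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a register/lookup table; not MBlock's implementation. */
#define TOY_SUCCESS 1
#define TOY_FAILURE 0
#define TOY_MAX_BLOCKS 8

typedef void *(*toy_pup_fn)(void *pupper, void *block);

static struct {
  void *block;     /* current address of the user's data block */
  toy_pup_fn pup;  /* function that packs/unpacks that block */
} toy_table[TOY_MAX_BLOCKS];
static int toy_count = 0;

/* Remember the pair and report its ID through *rid. */
int toy_register(void *block, toy_pup_fn pup, int *rid)
{
  if (block == NULL || pup == NULL || rid == NULL || toy_count >= TOY_MAX_BLOCKS)
    return TOY_FAILURE;
  toy_table[toy_count].block = block;
  toy_table[toy_count].pup = pup;
  *rid = toy_count++;
  return TOY_SUCCESS;
}

/* Look up the block currently associated with an ID. */
int toy_get_userdata(int rid, void **block)
{
  if (rid < 0 || rid >= toy_count || block == NULL) return TOY_FAILURE;
  *block = toy_table[rid].block;
  return TOY_SUCCESS;
}

/* Where a runtime would store the unpacking PUP call's return value. */
void toy_set_userdata(int rid, void *block)
{
  assert(rid >= 0 && rid < toy_count);
  toy_table[rid].block = block;
}

/* Do-nothing PUP function, used only to exercise the table. */
void *toy_noop_pup(void *pupper, void *block)
{
  (void)pupper;
  return block;
}
```

Because the table entry is updated after unpacking, the ID stays valid across a migration even though the block's address does not.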

void MBLK_Migrate()
subroutine MBLK_Migrate()
Informs the load balancing system that you are ready to be migrated, if needed. If the system decides to migrate you, the PUP function passed to MBLK_Register will be called with a sizing pupper, then with a packing, deleting pupper. Your stack (and pupped data) will then be sent to the destination processor, where your PUP function will be called with an unpacking pupper. MBLK_Migrate will then return, whereupon you should call MBLK_Get_Userdata to get your unpacked data block, since the unpacking call allocates it at a new address. Can only be called from driver.
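The practical consequence is that a driver must not keep using a block pointer across a call to MBLK_Migrate. The following sketch assumes hypothetical stand-ins, toy_migrate_stub and toy_fetch_block, in place of MBLK_Migrate and MBLK_Get_Userdata; it models a move by copying the block to a new allocation and freeing the old one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user block for this sketch. */
typedef struct {
  int step;
  double value;
} toy_state;

/* Single-slot stand-in for the runtime's record of the registered block. */
static toy_state *toy_registered = NULL;

/* Stand-in for MBLK_Migrate: models a move by copying the block to a new
   allocation and freeing the old one, as the deleting and unpacking PUP
   calls would. The old pointer is invalid afterwards. */
static void toy_migrate_stub(void)
{
  toy_state *moved = malloc(sizeof *moved);
  assert(moved != NULL);
  memcpy(moved, toy_registered, sizeof *moved);
  free(toy_registered);
  toy_registered = moved;
}

/* Stand-in for MBLK_Get_Userdata. */
static toy_state *toy_fetch_block(void) { return toy_registered; }

/* Driver-style loop: do some work, offer to migrate, re-fetch the block.
   Returns the step count and stores the accumulated value in *value_out. */
int toy_driver(int nsteps, double *value_out)
{
  toy_state *s = malloc(sizeof *s);
  assert(s != NULL);
  s->step = 0;
  s->value = 0.0;
  toy_registered = s;               /* plays the role of MBLK_Register */

  for (int i = 0; i < nsteps; i++) {
    s->value += 0.5;
    s->step++;
    toy_migrate_stub();             /* s now dangles: do not use it */
    s = toy_fetch_block();          /* pick up the block at its new address */
  }

  int steps = s->step;
  *value_out = s->value;
  free(s);
  toy_registered = NULL;
  return steps;
}
```

In a real driver, the two stand-in calls in the loop would be MBLK_Migrate followed by MBLK_Get_Userdata with the ID stored by MBLK_Register.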

int MBLK_Get_Userdata(int n, void** block)
Returns your unpacked userdata after migration, that is, the return value of the unpacking call to your PUP function, by storing it in *block. Pass the userdata ID that MBLK_Register stored in rid as n. Can be called from driver at any time.

Since Fortran blocks are always allocated on the stack, the system migrates them to the same location on the new processor, so no MBLK_Get_Userdata call is needed from Fortran.

January 17, 2008