CHARM++ includes several application frameworks, such as the Finite Element Framework, the Multiblock Framework, and AMPI. These frameworks do almost all their work in load-balanced, migratable threads.
The Threaded CHARM++ Framework, TCHARM, provides both common runtime support for these threads and facilities for combining multiple frameworks within a single program. For example, you can use TCHARM to create a Finite Element Framework application that also uses AMPI to communicate between Finite Element chunks.
Specifically, TCHARM provides language-neutral interfaces for:
The first portion of this manual describes the general properties of TCHARM common to all the application frameworks, such as program contexts and how to write migratable code. The second portion describes in detail how to combine separate frameworks into a single application.
Parallel context routines run in a migratable, user-level thread maintained by TCHARM. Since there are normally several of these threads per processor, any code that runs in the parallel context has to be thread-safe. However, TCHARM is non-preemptive, so it will only switch threads when you make a blocking call, such as MPI_Recv or FEM_Update_field.
Global variables are shared by all the threads on a processor, which makes using global variables extremely error-prone. To see why this is a problem, consider a program fragment that sets foo=a, makes a blocking call to MPI_Recv, and then sets b=foo.
After this code executes, we might expect b to always be equal to a, but if foo is a global variable, MPI_Recv may block and foo could be changed by another thread.
For example, if two threads execute this program, they could interleave like:
| Thread 1 | Thread 2 |
| foo=1 | |
| block in MPI_Recv | |
| | foo=2 |
| | block in MPI_Recv |
| b=foo | |
At this point, thread 1 might expect b to be 1; but it will actually be 2. From the point of view of thread 1, the global variable foo suddenly changed its value during the call to MPI_Recv.
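The interleaving can be replayed in plain C without any threads, as a minimal sketch. Each step function below stands for the code a thread runs between blocking calls; the function names and the thread_locals struct are hypothetical and are not part of TCHARM. The second function shows the usual remedy of keeping the value in per-thread (stack) storage.

```c
#include <assert.h>

/* A global is shared by every user-level thread on the processor. */
static int foo;

/* Code a thread runs before it blocks in MPI_Recv: foo=a. */
static void before_recv(int a) { foo = a; }

/* Code a thread runs after MPI_Recv returns: b=foo. */
static int after_recv(void) { return foo; }

/* Replays the table: thread 1 sets foo=1 and blocks, thread 2 sets
   foo=2 and blocks, then thread 1 resumes and reads foo. */
int thread1_b_with_global(void) {
    before_recv(1);      /* thread 1, then it blocks        */
    before_recv(2);      /* thread 2 runs in the meantime   */
    return after_recv(); /* thread 1 resumes and sees 2     */
}

/* Remedy: each thread keeps its own copy, here a struct standing in
   for variables on that thread's stack. */
typedef struct { int foo; } thread_locals;

int thread1_b_with_local(void) {
    thread_locals t1, t2;
    t1.foo = 1;          /* thread 1, then it blocks        */
    t2.foo = 2;          /* thread 2 touches only its copy  */
    assert(t2.foo == 2);
    return t1.foo;       /* thread 1 still sees its own 1   */
}
```

With the global, thread 1 reads the value written by thread 2; with per-thread storage, each thread reads back what it wrote.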
There are several possible solutions to this problem:
The above only applies to routines that run in the parallel context. There are no restrictions on global variables for serial context code.
In the parallel context, there are several limitations on open files. First, several threads may run on one processor, so Fortran Logical Unit Numbers are shared by all the threads on a processor. Second, open files are left behind when a thread migrates to another processor--it is a crashing error to open a file, migrate, then try to read from the file.
Because of these restrictions, it is best to open files only when needed, and close them as soon as possible. In particular, it is best if there are no open files whenever you make blocking calls.
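One way to follow this advice is to read everything you need in a single helper that closes the file before it returns, so that no file is open across a later blocking call. The sketch below uses only the standard C library; the helper name read_file_then_close is hypothetical. Note that the returned buffer is ordinary heap data, so it is subject to the migration rules for heap data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a malloc'ed, NUL-terminated buffer and closes
   the file before returning. Returns NULL on failure. The caller frees
   the buffer. */
char *read_file_then_close(const char *path, size_t *len_out) {
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    size_t cap = 256, len = 0, got;
    char *buf = malloc(cap);
    if (buf == NULL) { fclose(f); return NULL; }

    /* Always leave one byte free for the terminating NUL. */
    while ((got = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += got;
        if (cap - len - 1 == 0) {          /* buffer full: grow it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); fclose(f); return NULL; }
            buf = bigger;
            cap *= 2;
        }
    }

    fclose(f);                             /* closed before any blocking call */
    buf[len] = '\0';
    if (len_out != NULL) *len_out = len;
    return buf;
}
```

The thread can then parse the buffer at leisure, including across blocking calls, without holding an open file.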
The CHARM++ runtime framework includes an automatic run-time load balancer, which can monitor the performance of your parallel program. If needed, the load balancer can "migrate" threads from heavily-loaded processors to more lightly-loaded processors, improving the load balance and speeding up the program. For this to be useful, you need to pass the link-time argument -balancer B to set the load balancing algorithm, and the run-time argument +vp N (use N virtual processors) to set the number of threads. The ideal number of threads per processor depends on the problem, but we've found five to a hundred threads per processor to be a useful range.
When a thread migrates, all its data must be brought with it. "Stack data", such as variables declared locally in a subroutine, will be brought along with the thread automatically. Global data, as described in Section 2.1, is never brought with the thread and should generally be avoided.
"Heap data" in C is structures and arrays allocated using malloc or new; in Fortran, heap data is TYPEs or arrays allocated using ALLOCATE. To bring heap data along with a migrating thread, you have two choices: write a pup routine or use isomalloc. Pup routines are described in Section 3.1.
Isomalloc is a special mode which controls the allocation of heap data. You enable isomalloc allocation using the link-time flag "-memory isomalloc". With isomalloc, migration is completely transparent--all your allocated data is automatically brought to the new processor. The data will be unpacked at the same location (the same virtual addresses) as it was stored originally; so even cross-linked data structures that contain pointers still work properly.
The limitations of isomalloc are:
The runtime system can automatically move your thread stack to the new processor, but unless you use isomalloc, you must write a pup routine to move any global or heap-allocated data to the new processor. A pup (Pack/UnPack) routine can perform both packing (converting your data into a network message) and unpacking (converting the message back into your data). A pup routine is passed a pointer to your data block and a special handle called a "pupper", which contains the network message.
In a pup routine, you pass all your heap data to routines named pup_type or fpup_type, where type is either a basic type (such as int, char, float, or double) or an array type (as before, but with an "s" suffix). Depending on the direction of packing, the pupper will either read from or write to the values you pass; normally, you need not even know which. The only time you need to know the direction is when you are leaving a processor, or just arriving. Correspondingly, the pupper passed to you may be deleting (indicating that you are leaving the processor, and should delete your heap storage after packing), unpacking (indicating you've just arrived on a processor, and should allocate your heap storage before unpacking), or neither (indicating the system is merely sizing a buffer, or checkpointing your values).
Pup functions are much easier to write than to explain. A simple C heap block and the corresponding pup function is:
This single pup function can be used to copy the my_block data into a message buffer and free the old heap storage (deleting pupper); allocate storage on the new processor and copy the message data back (unpacking pupper); or save the heap data for debugging or checkpointing.
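The dual-purpose pattern can be tried out without the CHARM++ headers. The sketch below is a toy model, not the CHARM++ pup API: toy_pupper, toy_mode, toy_pup_ints, and toy_pup_doubles are hypothetical stand-ins for the real pupper handle and pup_type routines, and the field layout of my_block is assumed. Because the toy has no migrating stack, it also pups the two lengths.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a pupper; NOT the CHARM++ handle. */
typedef enum { TOY_SIZING, TOY_PACKING, TOY_UNPACKING } toy_mode;
typedef struct {
    toy_mode mode;
    int deleting;        /* nonzero: the block is leaving this processor */
    unsigned char *buf;  /* message buffer (unused while sizing)         */
    size_t pos;          /* bytes sized, written, or read so far         */
} toy_pupper;

/* Moves nbytes in whichever direction the pupper dictates. */
static void toy_pup_bytes(toy_pupper *p, void *data, size_t nbytes) {
    if (p->mode == TOY_PACKING)   memcpy(p->buf + p->pos, data, nbytes);
    if (p->mode == TOY_UNPACKING) memcpy(data, p->buf + p->pos, nbytes);
    p->pos += nbytes;    /* sizing only counts bytes */
}
static void toy_pup_ints(toy_pupper *p, int *a, int n) {
    toy_pup_bytes(p, a, (size_t)n * sizeof(int));
}
static void toy_pup_doubles(toy_pupper *p, double *a, int n) {
    toy_pup_bytes(p, a, (size_t)n * sizeof(double));
}

typedef struct {
    int n1, n2;          /* lengths of the two arrays (assumed positive) */
    double *arr1;        /* n1 doubles on the heap */
    int *arr2;           /* n2 ints on the heap    */
} my_block;

/* One routine serves sizing, packing, and unpacking. */
void pup_my_block(toy_pupper *p, my_block *m) {
    toy_pup_ints(p, &m->n1, 1);
    toy_pup_ints(p, &m->n2, 1);
    if (p->mode == TOY_UNPACKING) {   /* just arrived: allocate first */
        assert(m->n1 > 0 && m->n2 > 0);
        m->arr1 = malloc((size_t)m->n1 * sizeof(double));
        m->arr2 = malloc((size_t)m->n2 * sizeof(int));
        assert(m->arr1 != NULL && m->arr2 != NULL);
    }
    toy_pup_doubles(p, m->arr1, m->n1);
    toy_pup_ints(p, m->arr2, m->n2);
    if (p->deleting) {                /* leaving: free after packing */
        free(m->arr1);
        free(m->arr2);
        m->arr1 = NULL;
        m->arr2 = NULL;
    }
}
```

Calling pup_my_block with a sizing pupper yields the message length in pos; calling it with a packing, deleting pupper fills the buffer and frees the source arrays; calling it with an unpacking pupper on an empty my_block rebuilds the arrays from the buffer.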
A Fortran block TYPE and corresponding pup routine is as follows:
You indicate to TCHARM that you want a pup routine called using the routine below. An arbitrary number of blocks can be registered in this fashion.
void TCHARM_Register(void *block, TCharmPupFn pup_fn)
SUBROUTINE TCHARM_Register(block,pup_fn)
TYPE(varies), POINTER :: block
SUBROUTINE :: pup_fn
Associate the given data block and pup function. Can only be called from the parallel context. For the declarations above, you call TCHARM_Register as:
Note that the data block must be allocated on the stack. Also, in Fortran, the "TARGET" attribute must be used on the block (as above) or else the compiler may not update values during a migration, because it believes only it can access the block.
void TCHARM_Migrate()
subroutine TCHARM_Migrate()
Informs the load balancing system that you are ready to be migrated, if needed. If the system decides to migrate you, the pup function passed to TCHARM_Register will first be called with a sizing pupper, then a packing, deleting pupper. Your stack and pupped data will then be sent to the destination machine, where your pup function will be called with an unpacking pupper. TCHARM_Migrate will then return. Can only be called from the parallel context.
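The order of these calls can be made concrete with a small stand-alone driver. This is only an illustration of the sequence described above: toy_phase, toy_pup_fn, record_phase, and toy_migrate_sequence are hypothetical names, and the real runtime performs the transfer between processors that this sketch leaves out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags describing the pupper handed to a pup function. */
typedef struct {
    int sizing, packing, unpacking, deleting;
} toy_phase;

typedef void (*toy_pup_fn)(const toy_phase *phase, void *block);

/* Demo pup function: appends the kind of pass it was given to a
   caller-supplied string, which must be large enough to hold it. */
void record_phase(const toy_phase *ph, void *block) {
    char *trace = block;
    if (ph->sizing)    strcat(trace, "size;");
    if (ph->packing)   strcat(trace, ph->deleting ? "pack+delete;" : "pack;");
    if (ph->unpacking) strcat(trace, "unpack;");
}

/* Toy driver that issues the passes in the order the text describes. */
void toy_migrate_sequence(toy_pup_fn fn, void *block) {
    toy_phase sizing    = { 1, 0, 0, 0 };
    toy_phase packing   = { 0, 1, 0, 1 };
    toy_phase unpacking = { 0, 0, 1, 0 };
    assert(fn != NULL);
    fn(&sizing, block);     /* 1. measure the message                  */
    fn(&packing, block);    /* 2. fill it and free the old heap data   */
    /* 3. stack and message travel to the destination (not modeled)    */
    fn(&unpacking, block);  /* 4. rebuild heap data after arrival      */
}
```

Running the driver with record_phase on an empty 64-byte string leaves the passes recorded in the order sizing, packing with delete, unpacking.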
You can also use a pup routine to set up initial values for global variables on all processors. This pup routine is called with only a pup handle, just after the serial setup routine, and just before any parallel context routines start. The pup routine is never called with a deleting pup handle, so you need not handle that case.
A C example is:
A Fortran example is:
You register your global variable pup routine using the method below. Multiple pup routines can be registered the same way.
void TCHARM_Readonly_globals(TCharmPupGlobalFn pup_fn)
SUBROUTINE TCHARM_Readonly_globals(pup_fn)
SUBROUTINE :: pup_fn
This section describes how to combine multiple frameworks in a single application. You might want to do this, for example, to use AMPI communication inside a finite element method solver.
You specify how you want the frameworks to be combined by writing a special setup routine that runs when the program starts. The setup routine must be named TCHARM_User_setup. If you declare a user setup routine, the standard framework setup routines (such as the FEM framework's init routine) are bypassed, and you do all the setup in the user setup routine.
The setup routine creates a set of threads and then attaches frameworks to the threads. Several different frameworks can be attached to one thread set, and there can be several sets of threads; however, most frameworks cannot be attached more than once to a single set of threads. That is, a single thread cannot have two attached AMPI frameworks, since the MPI_COMM_WORLD for such a thread would be indeterminate.
void TCHARM_Create(int nThreads, TCharmThreadStartFn thread_fn)
SUBROUTINE TCHARM_Create(nThreads,thread_fn)
INTEGER, INTENT(in) :: nThreads
SUBROUTINE :: thread_fn
Create a new set of TCHARM threads of the given size. The threads will execute the given function, which is normally your user code. You should call TCHARM_Get_num_chunks() to get the number of threads from the command line. This routine can only be called from your TCHARM_User_setup routine.
You then attach frameworks to the new threads. The order in which frameworks are attached is irrelevant, but attach commands always apply to the current set of threads.
To attach a chare array to the TCHARM array, use:
CkArrayOptions TCHARM_Attach_start(CkArrayID *retTCharmArray,int *retNumElts)
This function returns a CkArrayOptions object that will bind your chare array to the TCHARM array, in addition to returning the TCHARM array proxy and number of elements by reference. If you are using frameworks like AMPI, they will automatically attach themselves to the TCHARM array in their initialization routines.
The complete set of link-time arguments relevant to TCHARM is:
The complete set of command-line arguments relevant to TCHARM is:
Until now, things were presented from the perspective of a user--one who writes a program for a library written on TCharm. This section gives an overview of how to go about writing a library in Charm++ that uses TCharm.
The overall scheme for writing a TCharm-based library "Foo" is:
One simple way to make the non-master threads block until the corresponding local array element is created is to use TCharm semaphores. These are simply a one-pointer slot you can assign using TCharm::semaPut and read with TCharm::semaGet. They're useful in this context because a TCharm::semaGet blocks if a local TCharm::semaPut hasn't yet executed.
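The one-pointer-slot idea can be modeled in a few lines of C. This is a toy, not the TCharm implementation: toy_sema and its functions are hypothetical, and where the real semaGet would suspend the calling thread until the slot is filled, the toy simply reports that the slot is still empty.

```c
#include <assert.h>
#include <stddef.h>

/* Toy one-pointer slot; NOT TCharm::semaPut / TCharm::semaGet. */
typedef struct {
    void *value;   /* the stored pointer            */
    int filled;    /* nonzero once a put has run    */
} toy_sema;

void toy_sema_init(toy_sema *s) {
    s->value = NULL;
    s->filled = 0;
}

/* Fills the slot, for example with the newly created local array element. */
void toy_sema_put(toy_sema *s, void *value) {
    assert(!s->filled);
    s->value = value;
    s->filled = 1;
}

/* Returns 1 and stores the pointer in *out if the slot is filled.
   Returns 0 if it is empty: this is the point at which a real
   blocking get would suspend the thread until the put arrives. */
int toy_sema_try_get(const toy_sema *s, void **out) {
    if (!s->filled) return 0;
    *out = s->value;
    return 1;
}
```

A non-master thread that calls the get before the library has stored its element sees the empty result, and sees the stored pointer on any call made after the put.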
The charm-api.h macros CDECL, FDECL and FTN_NAME should be used to provide both C and Fortran versions of each API call. You should use the "MPI capitalization standard", where the library name is all caps, followed by a capitalized first word, with all subsequent words lowercase, separated by underscores. This capitalization system is consistent, and works well with case-insensitive languages like Fortran.
Fortran parameter passing is a bit of an art, but basically for simple types like int (INTEGER in Fortran), float (REAL or REAL*4), and double (DOUBLE PRECISION or REAL*8), things work well. Single parameters are always passed via pointer in Fortran, as are arrays. Even though Fortran indexes arrays based at 1, it will pass you a pointer to the first element, so you can use the regular C indexing. The only time Fortran indexing need be considered is when the user passes you an index: the int index will need to be decremented before use, or incremented before a return.
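As a minimal sketch of these conventions, the two routines below are written the way a Fortran caller would see them: every argument is a pointer, arrays are indexed from zero inside C, and user-visible indices are converted at the boundary. The library name "foo" and both routine names are hypothetical, and the lowercase name with a trailing underscore is only an assumed compiler mangling; in a real library the name would come from FTN_NAME.

```c
#include <assert.h>

/* Fortran view: CALL FOO_Get_value(values, n, index, result)
   The caller passes a 1-based index, so decrement it before use. */
void foo_get_value_(const double *values, const int *n,
                    const int *index1, double *result) {
    int i = *index1 - 1;            /* 1-based -> 0-based */
    assert(i >= 0 && i < *n);
    *result = values[i];            /* regular C indexing */
}

/* Fortran view: CALL FOO_Find_max(values, n, index)
   The index is returned to Fortran, so increment it before return. */
void foo_find_max_(const double *values, const int *n, int *index1) {
    int best = 0;
    assert(*n > 0);
    for (int i = 1; i < *n; i++) {
        if (values[i] > values[best]) best = i;
    }
    *index1 = best + 1;             /* 0-based -> 1-based */
}
```

From C, the same routines can be exercised by passing the addresses of ordinary variables, which is exactly what the Fortran compiler does on the user's behalf.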
May 26, 2012
Charm Homepage