Parallel objects using "Asynchronous Remote Method Invocation":
Entry methods are the methods of a chare to which other chares can send messages. They are declared in the .ci files, and they must be defined as public methods of the C++ object representing the chare.
No! This is one of the biggest differences between Charm++ and most other ``remote procedure call'' systems like CORBA, Java RMI, or RPC. ``Invoke an asynchronous method'' and ``send a message'' have exactly the same semantics and implementation. Since the invoking method does not wait for the remote method to terminate, it normally cannot receive any return value (see later for a way to return values).
Asynchronous method invocation is more efficient because it can be implemented as a single message send. Unlike with synchronous methods, thread blocking and unblocking and a return message are not needed.
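The ``single message send'' idea can be illustrated without any Charm++ code at all. The sketch below is plain C++ and is not the Charm++ API: ToyScheduler, send, run and asyncDemo are hypothetical names, and a single in-memory queue stands in for the runtime's message delivery. It only shows that the caller returns before either invoked method has run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a message queue; this is NOT the Charm++ runtime.
class ToyScheduler {
public:
    // "Invoking a method" only enqueues a message and returns immediately.
    void send(std::function<void()> message) { pending_.push(std::move(message)); }

    // Deliver every queued message, including ones sent during delivery.
    // Returns the number of messages delivered.
    std::size_t run() {
        std::size_t delivered = 0;
        while (!pending_.empty()) {
            std::function<void()> message = std::move(pending_.front());
            pending_.pop();
            message();
            ++delivered;
        }
        return delivered;
    }

    std::size_t queued() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};

// Hypothetical caller: sends two messages and carries on without waiting.
std::vector<std::string> asyncDemo() {
    ToyScheduler scheduler;
    std::vector<std::string> log;
    scheduler.send([&log] { log.push_back("foo ran"); });
    scheduler.send([&log] { log.push_back("bar ran"); });
    assert(scheduler.queued() == 2);   // neither method has run yet
    log.push_back("caller returned");
    scheduler.run();                   // the methods execute only now
    return log;
}
```

The first-in, first-out order of foo and bar here is an artifact of the toy's single queue and should not be read as a delivery-order guarantee.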
Another big advantage of asynchronous methods is that it's easy to make things run in parallel. If I execute:
Yes. If you want synchronous methods, so the caller will block, use the [sync] keyword before the method in the .ci file. This requires the sender to be a threaded entry method, as it will be suspended until the callee finishes. Sync entry methods are allowed to return values to the caller.
A threaded entry method is an entry method for a chare that executes in a separate user-level thread. It is useful when the entry method wants to suspend itself (for example, to wait for more data). Note that threaded entry methods have nothing to do with kernel-level threads or pthreads; they run in user-level threads that are scheduled by Charm++ itself.
In order to make an entry method threaded, one should add the keyword threaded within square brackets after the entry keyword in the interface file:
The usual way to get data back to your caller is via another invocation in the opposite direction:
The above example is very non-modular, because b has to know that a called it, and what method to call a back on. For this kind of request/response code, you can abstract away the ``where to return the data'' with a CkCallback object:
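The decoupling described here can be sketched in plain C++. This is an illustration only, not the CkCallback API: a std::function stands in for the callback object, the class and method names (Requester, Responder, requestValue, receive) are hypothetical, and the reply is delivered by a direct synchronous call rather than by an asynchronous message. The point is that the responder never names the requester or its method.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Stand-in for a callback object: any callable that accepts the result.
using ReplyTo = std::function<void(int)>;

// Hypothetical responder: it has no knowledge of who asked.
class Responder {
public:
    void requestValue(int input, ReplyTo reply) const {
        reply(input * 2);  // placeholder computation, then "send" the reply
    }
};

// Hypothetical requester: it alone decides where the answer is delivered.
class Requester {
public:
    void start(const Responder& b) {
        b.requestValue(21, [this](int value) { receive(value); });
    }
    void receive(int value) {
        result_ = value;
        received_ = true;
    }
    bool received() const { return received_; }
    int result() const { return result_; }

private:
    int result_ = 0;
    bool received_ = false;
};

int callbackDemo() {
    Responder b;
    Requester a;
    a.start(b);
    assert(a.received());
    return a.result();
}
```

Because the responder only sees ReplyTo, the same requestValue can serve any caller that supplies a different target.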
There are a few reasons for that:
Each processor executes the following operations strictly in order:
This implies that you can assume that the previous step has completely finished before the next one starts, and that any side effects from all the previous steps are committed (and can therefore be used).
Inside a single step there is no order guarantee. This implies that, for example, two groups allocated from the mainchare can be instantiated in any order. The only exception to this is processor zero, where chare objects are instantiated immediately when allocated in the mainchare, i.e., if two groups are allocated, their order is fixed by the allocation order in the mainchare constructing them. Again, this is only valid for processor zero; this assumption should not be made on any other processor.
Note that if array elements are allocated in a block (by specifying the number of elements at the end of the ckNew function), they are all instantiated before normal execution is resumed; if manual insertion is used, each element can be constructed at any time on its home processor, and not necessarily before other regular communication messages have been delivered to other chares (including other elements of the same array).
A proxy is a local C++ class that represents a remote C++ class. When you invoke a method on a proxy, it sends the request across the network to the real object it represents. In Charm++, all communication is done using proxies.
A proxy class for each of your classes is generated based on the methods you list in the .ci file.
Proxies can be:
This will not compile, because a CProxy_A is not an A. What you want is CProxy_A *ap = new CProxy_A(handle).
You can include the def.h file once you've actually declared everything it will reference: all your chares and readonly variables. If your chares and readonlies are in your own header files, it is legal to include the def.h right away.
However, if the class declaration for a chare isn't visible when you include the def.h file, you'll get a confusing compiler error. This is why we recommend including the def.h file at the end.
Make the global variable ``readonly'' by declaring it in the .ci file. Remember also that read-onlies can be safely set only in the mainchare constructor. Any change after the mainchare constructor has finished will be local to the processor that made the change. To change a global variable later in the program, every processor must modify it accordingly (e.g., by using a chare group; note that chare arrays are not guaranteed to cover all processors).
One can have class-static variables as read-onlies. Inside a chare, group or array declaration in the .ci file, one can have a readonly variable declaration. Thus:
You then refer to the variable in your program as someChare::someGroup.
You can use CkWallTimer() to determine the time on some particular processor. To time some parallel computation, you need to call CkWallTimer on some processor, do the parallel computation, then call CkWallTimer again on the same processor and subtract.
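The read-compute-read-subtract pattern looks roughly like the sketch below. It is plain C++, with std::chrono::steady_clock standing in for CkWallTimer(); toyWallTimer, placeholderWork and timeWork are hypothetical names, and the work is a serial loop. In a real Charm++ program the computation is asynchronous, so the second timer reading would presumably be taken in whatever method is invoked when the computation completes, on the same processor as the first.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in for CkWallTimer(): seconds, as a double, from a monotonic clock.
double toyWallTimer() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}

// Placeholder for the computation being timed: sums 1..n.
std::uint64_t placeholderWork(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 1; i <= n; ++i) sum += i;
    return sum;
}

struct Timed {
    std::uint64_t result;
    double seconds;
};

// Read the timer, do the work, read the timer again in the same place, subtract.
Timed timeWork(std::uint64_t n) {
    const double start = toyWallTimer();
    const std::uint64_t result = placeholderWork(n);
    const double stop = toyWallTimer();
    return {result, stop - start};
}
```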
These are just like the standard C++ assert calls in <assert.h>: they call abort if the condition passed to them is false.
We use our own version rather than the standard version because we have to call CkAbort, and because we can turn our asserts off when CMK_OPTIMIZE is defined.
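A macro with those two properties can be sketched as follows. This is not the Charm++ source: TOY_ASSERT, TOY_OPTIMIZE and toyAbort are hypothetical names, with toyAbort standing in for CkAbort and TOY_OPTIMIZE for CMK_OPTIMIZE.

```cpp
#include <cassert>  // the standard facility that this sketch imitates
#include <cstdio>
#include <cstdlib>

// Stand-in for CkAbort: report where the check failed, then terminate.
[[noreturn]] inline void toyAbort(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "Assertion \"%s\" failed at %s:%d\n", expr, file, line);
    std::abort();
}

#ifdef TOY_OPTIMIZE
// Optimized builds: the check disappears and the condition is never evaluated.
#define TOY_ASSERT(cond) ((void)0)
#else
// Checked builds: a false condition goes through our own abort routine.
#define TOY_ASSERT(cond) ((cond) ? (void)0 : toyAbort(#cond, __FILE__, __LINE__))
#endif

// Hypothetical use: guard a precondition.
int checkedDivide(int numerator, int denominator) {
    TOY_ASSERT(denominator != 0);
    return numerator / denominator;
}
```

Since the condition is not evaluated when the macro is compiled out, it should not carry side effects that the program depends on.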
No.
There is no nice library to solve this problem, as some messages might be queued on the receiving processor, some on the sender, and some on the network. You can still:
Quiescence is when nothing is happening anywhere on the parallel machine.
A low-level background task counts sent and received messages. When, across the machine, all the messages that have been sent have been received, and nothing is being processed, quiescence is triggered.
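The counting idea can be shown on a single queue. The sketch below is plain C++ and greatly simplified: CountingQueue, step and quiescenceDemo are hypothetical names, and there is only one queue, so none of the work of combining counts across processors or accounting for messages still on the network appears here. It only illustrates the condition ``everything sent has been processed''.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy single-queue model that counts sent and processed messages.
class CountingQueue {
public:
    void send(std::function<void()> message) {
        ++sent_;
        pending_.push(std::move(message));
    }

    // Process one message if any is queued; returns false when there is none.
    bool step() {
        if (pending_.empty()) return false;
        std::function<void()> message = std::move(pending_.front());
        pending_.pop();
        message();      // the handler may itself call send()
        ++processed_;
        return true;
    }

    // Quiescent: every message ever sent has also been fully processed.
    bool quiescent() const { return sent_ == processed_; }
    std::size_t sent() const { return sent_; }

private:
    std::queue<std::function<void()>> pending_;
    std::size_t sent_ = 0;
    std::size_t processed_ = 0;
};

std::size_t quiescenceDemo() {
    CountingQueue q;
    q.send([&q] { q.send([] {}); });  // the first message spawns a second one
    assert(!q.quiescent());
    while (q.step()) {}
    assert(q.quiescent());
    return q.sent();
}
```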
Probably not.
In some ways, quiescence is a very strong property (it guarantees nothing is happening anywhere) so if some other library is doing something, you won't reach quiescence. In other ways, quiescence is a very weak property, since it doesn't guarantee anything about the state of your application like a reduction does, only that nothing is happening. Because quiescence detection is on the one hand so strong it breaks modularity, and on the other hand is too weak to guarantee anything useful, it's often better to use something else.
Often global properties can be replaced by much easier-to-compute local properties. For example, my object could wait until all its neighbors have sent it messages (a local property my object can easily detect by counting message arrivals), rather than waiting until all neighbor messages across the whole machine have been sent (a global property that's difficult to determine). Sometimes a simple reduction is needed instead of quiescence, which has the benefits of being activated explicitly (each element of a chare array or chare group has to call contribute) and allows some data to be collected at the same time. A reduction is also a few times faster than quiescence detection. Finally, there are a few situations, such as some tree-search problems, where quiescence detection is actually the most sensible, efficient solution.
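The local alternative mentioned first, waiting until all of an object's neighbors have sent it messages, needs nothing more than a counter. A minimal plain C++ sketch follows, with hypothetical names (NeighborBarrier, arrived) and assuming the object knows how many neighbors it has and that each neighbor sends exactly one message per step:

```cpp
#include <cassert>
#include <cstddef>

// Counts message arrivals from a known number of neighbors.
class NeighborBarrier {
public:
    explicit NeighborBarrier(std::size_t neighborCount) : expected_(neighborCount) {}

    // Call once per arriving neighbor message.
    // Returns true when the last expected message of this step has arrived.
    bool arrived() {
        ++count_;
        if (count_ < expected_) return false;
        count_ = 0;  // reset so the same object can be reused for the next step
        return true;
    }

private:
    std::size_t expected_;
    std::size_t count_ = 0;
};

// Hypothetical use: an object with four neighbors proceeds after the fourth arrival.
bool neighborDemo() {
    NeighborBarrier barrier(4);
    bool ready = false;
    for (int i = 0; i < 4; ++i) {
        assert(!ready);
        ready = barrier.arrived();
    }
    return ready;
}
```

In an entry method that receives neighbor data, the object would call arrived() and continue its computation only when the call returns true.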
June 29, 2008