Communication Optimization Framework

A communication optimization framework.


Modules

 Converse Communication Optimization Framework
 Framework for delegating converse level communication to Comlib.
 Charm++ Communication Optimization Framework
 Framework for delegating Charm++ level communication to Comlib.
 Converse level message Routers
 Routers used by Converse strategies to route messages in certain topologies: grid, hypercube, etc.
 Strategies for use in Charm++
 Communication optimizing strategies for use in Charm++ programs.
 Strategies for use in converse
 Communication optimizing strategies for use in converse programs or in other comlib strategies.

Detailed Description

A communication optimization framework.

Comlib is a framework for delegating Converse-level and Charm++-level communication to optimizing strategies, which can use appropriate topological routers.

The comlib framework underwent a large refactoring that was committed into CVS in February 2009. The framework was extricated from ck-core. Various files and classes were renamed, and many bugs were fixed. A new system for bracketed strategies was introduced.

Bracketed communication strategies, such as those for all-to-all communication, now have knowledge that is updated on demand. If errors are detected in the knowledge base (e.g., a chare array element is no longer located on some PE), the strategies enter an error mode. After the knowledge has been updated, the strategies exit the error mode and send messages along the new optimized paths. While in the error mode, the strategies send existing messages directly and buffer new messages.

Usage Restrictions

Strategies should be created in a MainChare's Main method (hence on PE 0). Proxies can be created later and these can be delegated to the existing strategies.

Startup

The initialization of Comlib is done partly at startup and partly asynchronously. There is an initproc routine, initConvComlibManager(), that instantiates the Ckpv processor-local conv_com_object. This object needs to be created before the ComlibManagerMain method is called. Because we cannot guarantee the order in which the mainchares execute, we must do this in an initproc routine.

The startup of Comlib happens asynchronously. The mainchare ComlibManagerMain has a constructor that runs along with all other mainchares while Charm++ messages are not yet activated at startup. This constructor simply sets a few variables from command line arguments and then creates the ComlibManager group. After all mainchares have run (in no determinable order), the Charm++ system releases all Charm++ messages.

At this point the user program will continue asynchronously with the comlib startup.

Then ComlibManager::ComlibManager() calls ComlibManager::init(), sets up a few variables, and then calls ComlibManager::barrier(). After barrier() has been called by all PEs, it calls ComlibManager::resumeFromSetupBarrier().

ComlibManager::resumeFromSetupBarrier() completes the initialization of the charm layer of comlib after all group branches are created. It is guaranteed that Main::Main has already run at this point, so all strategies created there can be broadcast. This function calls ComlibDoneCreating() and then it sends all messages that were buffered (in unCompletedSetupBuffer).

ComlibDoneCreating() does nothing on any PE other than 0. On PE 0, it calls ConvComlibManager::doneCreating(). The strategy table is broadcast at this point.

The process for broadcasting the strategy table is as follows (see convcomlibmanager.C):

  1. The strategies are inserted on processor 0 (and possibly on other processors, in the same order). The strategies are marked as "new".
  2. When ConvComlibManager::doneCreating() is called, processor 0 broadcasts all the new strategies to all the processors and marks them as "inSync".
  3. When a processor receives a table, it updates its own table with the incoming one, marks all the strategies that just arrived as "inSync", and sends an acknowledgement back to processor 0.
  4. When an acknowledgement is received by processor 0, a counter is decremented. When the counter reaches 0, all the "inSync" strategies are switched to status "ready" and they can start working. All the messages in the tmplist are delivered. The sync is then broadcast.
  5. When an acknowledgement is received by a processor other than 0, all the "inSync" strategies are switched to "ready" and the messages in tmplist are delivered.
  6. To prevent two consecutive table broadcasts from interfering with each other, each processor sends an additional acknowledgement back to processor 0 to allow a new table update to happen.
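
The status transitions in steps 1 to 5 can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the code in convcomlibmanager.C: the names StrategyStatus, PeTable and simulateTableBroadcast are hypothetical, the acknowledgement counter is assumed to start at the number of processors other than 0, one buffered message per processor is invented for the demonstration, and step 6 is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical status values mirroring the "new" / "inSync" / "ready" states.
enum class StrategyStatus { New, InSync, Ready };

// Hypothetical view of one processor's strategy table and its tmplist.
struct PeTable {
    std::vector<StrategyStatus> strategies;
    std::vector<int> tmplist;    // message ids buffered until strategies are ready
    std::vector<int> delivered;  // message ids released after the switch to "ready"

    // Switch every "inSync" strategy to "ready" and deliver the buffered messages.
    void markReadyAndFlush() {
        for (auto &s : strategies)
            if (s == StrategyStatus::InSync) s = StrategyStatus::Ready;
        delivered.insert(delivered.end(), tmplist.begin(), tmplist.end());
        tmplist.clear();
    }
};

// Sequential simulation of steps 1-5 for numPes processors.
std::vector<PeTable> simulateTableBroadcast(std::size_t numPes, std::size_t numStrategies) {
    assert(numPes >= 1);
    std::vector<PeTable> pes(numPes);

    // Step 1: strategies are inserted on processor 0 and marked "new".
    pes[0].strategies.assign(numStrategies, StrategyStatus::New);

    // Assumption for the demo: every processor has one message waiting in tmplist.
    for (std::size_t pe = 0; pe < numPes; ++pe)
        pes[pe].tmplist.push_back(static_cast<int>(pe));

    // Step 2: processor 0 marks the new strategies "inSync" and broadcasts them.
    for (auto &s : pes[0].strategies) s = StrategyStatus::InSync;
    std::size_t pendingAcks = numPes - 1;  // assumed initial counter value

    // Step 3: each receiver installs the table as "inSync" and acknowledges.
    for (std::size_t pe = 1; pe < numPes; ++pe) {
        pes[pe].strategies = pes[0].strategies;
        --pendingAcks;
    }

    // Step 4: the counter reached 0, so processor 0 goes "ready" and flushes.
    assert(pendingAcks == 0);
    pes[0].markReadyAndFlush();

    // Step 5: the sync reaches the other processors, which do the same.
    for (std::size_t pe = 1; pe < numPes; ++pe) pes[pe].markReadyAndFlush();

    return pes;
}
```

Calling simulateTableBroadcast(4, 2) returns four tables in which every strategy is "ready" and every tmplist has been emptied into delivered. In the real framework these steps are driven by messages between processors rather than by a loop.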

Startup: Buffering of Messages

The startup of Comlib happens asynchronously. Thus, if a user program sends messages through a comlib strategy before the strategy has started up completely, the messages may be delayed in one of two queues.

  1. CkQ<MessageHolder*> tmplist; found in convcomlibstrategy.h buffers converse level messages when the converse strategies are not ready.
  2. std::map<ComlibInstanceHandle, std::set<CharmMessageHolder*> > ComlibManager::delayMessageSendBuffer in ComlibManager.h buffers charm level messages at startup before ComlibManager::resumeFromSetupBarrier() or while a strategy is in an error state. Messages are flushed from here once both the startup has finished and the strategy is not in an error state. The messages are flushed from one place: ComlibManager::sendBufferedMessages().
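
The flush condition for the charm-level buffer (startup finished and strategy not in an error state) can be sketched as follows. This is an illustrative model, not the ComlibManager implementation: DelayBufferSketch, InstanceHandle, finishStartup and setErrorState are hypothetical names, messages are plain strings, and a std::vector replaces the std::set of message holders so that the demo keeps send order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using InstanceHandle = int;  // hypothetical stand-in for ComlibInstanceHandle

// Hypothetical model of a per-strategy delay buffer with a single flush point.
class DelayBufferSketch {
public:
    void finishStartup() { startupDone_ = true; }
    void setErrorState(InstanceHandle h, bool inError) { inError_[h] = inError; }

    // Sends immediately when allowed; otherwise buffers. Returns true if sent.
    bool send(InstanceHandle h, const std::string &msg) {
        if (canSend(h)) {
            sent_.push_back(msg);
            return true;
        }
        buffer_[h].push_back(msg);
        return false;
    }

    // The only place where buffered messages are released.
    void sendBufferedMessages() {
        for (auto &entry : buffer_) {
            if (!canSend(entry.first)) continue;  // strategy still blocked
            for (const auto &m : entry.second) sent_.push_back(m);
            entry.second.clear();
        }
    }

    const std::vector<std::string> &sent() const { return sent_; }

    std::size_t buffered(InstanceHandle h) const {
        auto it = buffer_.find(h);
        return it == buffer_.end() ? 0 : it->second.size();
    }

private:
    // A message may go out only after startup and outside an error state.
    bool canSend(InstanceHandle h) const {
        auto it = inError_.find(h);
        bool inError = (it != inError_.end()) && it->second;
        return startupDone_ && !inError;
    }

    bool startupDone_ = false;
    std::map<InstanceHandle, bool> inError_;
    std::map<InstanceHandle, std::vector<std::string>> buffer_;
    std::vector<std::string> sent_;
};
```

In this model a message sent before finishStartup() stays buffered, remains buffered while the strategy is flagged as being in an error state, and is released by the next sendBufferedMessages() call once both conditions have cleared.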

Bracketed Strategies

Usage of Bracketed Strategies

Bracketed strategies have the following usage pattern. For each iteration of the program:

  1. Each source object calls ComlibManager::beginIteration(int instid, int iteration)
  2. Each source object invokes one or more entry method(s) on the delegated proxy
  3. Each source object then calls ComlibManager::endIteration().
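
The call order of the three steps can be shown with a mock that only records what it is asked to do. BracketSketch, invokeEntryMethod, runIteration and the entry method name recvData are hypothetical and stand in for the real manager and a delegated proxy call; the assertions that a send happens inside an open bracket are an assumption of this sketch, not a documented check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the manager: it records calls instead of sending.
struct BracketSketch {
    bool open = false;               // true between beginIteration and endIteration
    std::vector<std::string> trace;  // ordered log of the calls seen

    void beginIteration(int instid, int iteration) {
        assert(!open);  // sketch assumption: brackets do not nest
        open = true;
        trace.push_back("begin " + std::to_string(instid) + "/" +
                        std::to_string(iteration));
    }

    // Stands in for invoking an entry method on the delegated proxy.
    void invokeEntryMethod(const std::string &name) {
        assert(open);  // sketch assumption: sends happen inside the bracket
        trace.push_back("send " + name);
    }

    void endIteration() {
        assert(open);
        open = false;
        trace.push_back("end");
    }
};

// One iteration as seen from a single source object.
void runIteration(BracketSketch &mgr, int instid, int iteration) {
    mgr.beginIteration(instid, iteration);  // step 1
    mgr.invokeEntryMethod("recvData");      // step 2 (hypothetical entry method)
    mgr.endIteration();                     // step 3
}
```

Running runIteration twice on the same BracketSketch leaves a trace of begin, send, end for each iteration, with the bracket closed at the end.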

Restrictions on Bracketed Strategies

  1. The user application is not allowed to call beginIteration for iteration n until all messages from iteration n-1 have been received.
  2. Migrations of elements are not allowed between when they call ComlibManager::beginIteration and the associated ComlibManager::endIteration for the same iteration.

Detecting migrations in Bracketed Strategies

The instance of each strategy on each PE maintains a list of the local array elements, and the last known iteration value. The current implementation only detects migrations when a PE gains a net positive number of migrated objects. All of the objects on that PE will call ComlibManager::beginIteration. Because the strategy knows how many elements were previously on the PE, it will detect more calls to ComlibManager::beginIteration than its previous element count. At this point, the future messages for the strategy will be enqueued in a buffer (ComlibManager::delayMessageSendBuffer). Once ComlibManager::endIteration() is called, the error recovery protocol will be started. All PEs will cause any objects that have migrated away to report back to PE 0, which updates a list of object locations. Once all PEs and migrated objects have reported back to PE 0, the updated PE list will be broadcast to all PEs, and the strategy will be enabled again. At this point any buffered messages will be released. The subsequent iteration of the application should then be optimized.

If two objects swap places between two PEs, the current implementation does not detect the change, because the element count on each PE is unchanged. In the future, ComlibManager::beginIteration should compare the calling object against the list of known local objects and, on a mismatch, start buffering messages and correct the error condition.
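
The difference between the count-based detection and the proposed comparison against known local objects can be sketched as below. MigrationDetectorSketch and its members are hypothetical, element identifiers are plain integers, and the membership check models the future behaviour suggested above rather than anything the current implementation does.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical per-PE bookkeeping for one strategy instance.
class MigrationDetectorSketch {
public:
    explicit MigrationDetectorSketch(std::set<int> knownLocal)
        : known_(std::move(knownLocal)) {}

    // Reset the per-iteration state before objects start calling in.
    void startIteration() {
        calls_ = 0;
        countError_ = false;
        membershipError_ = false;
    }

    // Called once per local object at the start of an iteration.
    void beginIteration(int elementId) {
        ++calls_;
        // Count-based check: more callers than previously known elements.
        if (calls_ > known_.size()) countError_ = true;
        // Proposed check: the caller is not in the list of known local objects.
        if (known_.count(elementId) == 0) membershipError_ = true;
    }

    bool countError() const { return countError_; }
    bool membershipError() const { return membershipError_; }

private:
    std::set<int> known_;
    std::size_t calls_ = 0;
    bool countError_ = false;
    bool membershipError_ = false;
};
```

With known elements {1, 2}, calls from 1, 2 and 7 trip the count-based check, whereas calls from 1 and 9 (element 2 swapped for 9) trip only the membership check.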


Generated on Mon Nov 23 07:56:02 2009 for Charm++ by  doxygen 1.5.5