Modules

| Module | Description |
|---|---|
| Converse Communication Optimization Framework | Framework for delegating converse level communication to Comlib. |
| Charm++ Communication Optimization Framework | Framework for delegating Charm++ level communication to Comlib. |
| Converse level message Routers | Routers used by converse strategies to route messages in certain topologies: grid, hypercube, etc. |
| Strategies for use in Charm++ | Communication optimizing strategies for use in Charm++ programs. |
| Strategies for use in converse | Communication optimizing strategies for use in converse programs or in other comlib strategies. |

Comlib is a framework for delegating Converse-level and Charm++-level communication to optimizing strategies, which can use appropriate topological routers.
The comlib framework underwent a large refactoring that was committed into CVS in February 2009. The framework was extricated from ck-core. Various files and classes were renamed, and many bugs were fixed. A new system for bracketed strategies was introduced.
Bracketed communication strategies, such as those for all-to-all communication, now maintain knowledge that is updated on demand. If errors are detected in the knowledge base (e.g., a chare array element is no longer located on some PE), the strategies enter an error mode. While in error mode, the strategies send existing messages directly and buffer new messages. After the knowledge has been updated, the strategies exit error mode and send messages along the new optimized paths.
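The error-mode behavior can be pictured as a small state machine. The following is only a sketch under stated assumptions: `BracketedStrategyModel`, `StrategyMode`, `RouteKind`, and every method name are hypothetical and are not Comlib classes or functions. It models a single strategy instance in one process and records which route each message would take.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one bracketed strategy instance.
// None of these names are Comlib classes or functions.
enum class StrategyMode { Normal, Error };
enum class RouteKind { Optimized, Direct };

struct SentMessage {
    std::string payload;
    RouteKind route;
};

class BracketedStrategyModel {
public:
    // The application hands a message to the strategy.
    void insertMessage(std::string payload) {
        if (mode_ == StrategyMode::Error) {
            buffered_.push_back(std::move(payload));  // new messages wait for fresh knowledge
        } else {
            pending_.push_back(std::move(payload));   // held until the bracket closes
        }
    }

    // The bracket closes in normal mode: pending messages use the optimized path.
    void closeBracket() {
        if (mode_ != StrategyMode::Normal) return;
        drain(pending_, RouteKind::Optimized);
    }

    // Stale knowledge was detected: messages already held are sent directly.
    void enterErrorMode() {
        mode_ = StrategyMode::Error;
        drain(pending_, RouteKind::Direct);
    }

    // Knowledge is current again: leave error mode and release buffered messages.
    void knowledgeUpdated() {
        if (mode_ != StrategyMode::Error) return;
        assert(pending_.empty());  // invariant: error mode never holds pending messages
        mode_ = StrategyMode::Normal;
        drain(buffered_, RouteKind::Optimized);
    }

    StrategyMode mode() const { return mode_; }
    const std::vector<SentMessage>& sent() const { return sent_; }

private:
    // Move every queued payload into the sent log with the given route.
    void drain(std::vector<std::string>& queue, RouteKind route) {
        for (std::string& payload : queue) {
            sent_.push_back({std::move(payload), route});
        }
        queue.clear();
    }

    StrategyMode mode_ = StrategyMode::Normal;
    std::vector<std::string> pending_;
    std::vector<std::string> buffered_;
    std::vector<SentMessage> sent_;
};
```

In this model, a message inserted before `enterErrorMode()` ends up in the sent log with `RouteKind::Direct`, while a message inserted during error mode appears only after `knowledgeUpdated()` and is marked `RouteKind::Optimized`.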
Strategies should be created in a MainChare's Main method (hence on PE 0). Proxies can be created later and these can be delegated to the existing strategies.
Comlib is initialized partly at startup and partly asynchronously. An initproc routine, initConvComlibManager(), instantiates the Ckpv processor-local conv_com_object. This object must be created before the ComlibManagerMain method is called. Because the order in which the mainchares execute cannot be guaranteed, this must be done in an initproc routine.
The startup of Comlib happens asynchronously. The mainchare ComlibManagerMain has a constructor that runs along with all other mainchares while Charm++ messages are not yet activated at startup. This constructor simply sets a few variables from command-line arguments and then creates the ComlibManager group. After all mainchares have run (in no determinable order), the Charm++ system releases all Charm++ messages.
At this point the user program will continue asynchronously with the comlib startup.
Then ComlibManager::ComlibManager() calls ComlibManager::init(), sets up a few variables, and then calls ComlibManager::barrier(). After barrier() has been called by all PEs, it calls ComlibManager::resumeFromSetupBarrier().
ComlibManager::resumeFromSetupBarrier() completes the initialization of the charm layer of comlib after all group branches are created. It is guaranteed that Main::Main has already run at this point, so all strategies created there can be broadcast. This function calls ComlibDoneCreating() and then it sends all messages that were buffered (in unCompletedSetupBuffer).
ComlibDoneCreating() does nothing on any PE other than PE 0. On PE 0, it calls ConvComlibManager::doneCreating(). The strategy table is broadcast at this point.
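The barrier-then-flush sequence described above can be sketched in a few lines. This is a single-process simplification under stated assumptions: `SetupBarrierModel` and its members are hypothetical names, the real barrier spans all PEs, and the real code broadcasts the strategy table rather than setting a flag. Only the ordering is modeled: messages sent before the barrier completes are buffered, then released once every PE has arrived.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-process model of the setup barrier; not Comlib code.
class SetupBarrierModel {
public:
    explicit SetupBarrierModel(int numPes) : numPes_(numPes) {
        assert(numPes > 0);
    }

    // One PE reports that its manager branch has reached the barrier.
    void arriveAtBarrier() {
        assert(arrived_ < numPes_);
        ++arrived_;
        if (arrived_ == numPes_) resumeFromSetupBarrier();
    }

    // A user message sent through a strategy.
    void send(std::string msg) {
        if (setupComplete_) {
            delivered_.push_back(std::move(msg));
        } else {
            // Plays the role of unCompletedSetupBuffer in the description above.
            setupBuffer_.push_back(std::move(msg));
        }
    }

    bool setupComplete() const { return setupComplete_; }
    bool doneCreatingCalled() const { return doneCreatingCalled_; }
    std::size_t bufferedCount() const { return setupBuffer_.size(); }
    const std::vector<std::string>& delivered() const { return delivered_; }

private:
    // Runs once, after every PE has arrived at the barrier.
    void resumeFromSetupBarrier() {
        doneCreatingCalled_ = true;  // stands in for the ComlibDoneCreating() call
        setupComplete_ = true;
        for (std::string& msg : setupBuffer_) {
            delivered_.push_back(std::move(msg));  // release buffered messages in order
        }
        setupBuffer_.clear();
    }

    int numPes_;
    int arrived_ = 0;
    bool setupComplete_ = false;
    bool doneCreatingCalled_ = false;
    std::vector<std::string> setupBuffer_;
    std::vector<std::string> delivered_;
};
```

With two PEs in the model, a message sent before either arrival stays buffered after the first `arriveAtBarrier()` call and is delivered as soon as the second call completes the barrier.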
The process for broadcasting the strategy table is as follows (see convcomlibmanager.C):
Because the startup of Comlib happens asynchronously, a user program may send messages through a comlib strategy before the strategy has started up completely. In that case, the messages may be delayed in one of two queues.
Bracketed strategies have the following usage pattern. For each iteration of the program:
The instance of each strategy on each PE maintains a list of the local array elements, and the last known iteration value. The current implementation only detects migrations when a PE gains a net positive number of migrated objects. All of the objects on that PE will call ComlibManager::beginIteration. Because the strategy knows how many elements were previously on the PE, it will detect more calls to ComlibManager::beginIteration than its previous element count. At this point, the future messages for the strategy will be enqueued in a buffer (ComlibManager::delayMessageSendBuffer).
Once ComlibManager::endIteration() is called, the error recovery protocol will be started. All PEs will cause any objects that have migrated away to report back to PE 0, which updates a list of object locations. Once all PEs and migrated objects have reported back to PE 0, the updated PE list will be broadcast to all PEs, and the strategy will be enabled again. At this point any buffered messages will be released. The subsequent iteration of the application should then be optimized.
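The count-based detection described above can be illustrated with a small model. This is a sketch under stated assumptions: `IterationCounterModel` and its members are hypothetical, and the real ComlibManager::beginIteration has a different signature. The model only counts calls per iteration and flags an error when the count exceeds the previously known number of local elements.

```cpp
#include <cassert>

// Hypothetical model of per-PE migration detection by call counting.
// Not the ComlibManager API; names and signatures are illustrative only.
class IterationCounterModel {
public:
    explicit IterationCounterModel(int knownLocalElements)
        : knownLocalElements_(knownLocalElements) {
        assert(knownLocalElements >= 0);
    }

    // Called once by each local array element at the start of an iteration.
    // Returns true once more calls have been seen than elements were known.
    bool beginIteration(int iteration) {
        if (iteration != lastKnownIteration_) {
            lastKnownIteration_ = iteration;  // first call of a new iteration
            callsThisIteration_ = 0;
        }
        ++callsThisIteration_;
        if (callsThisIteration_ > knownLocalElements_) {
            errorDetected_ = true;  // a migrated-in element must be present
        }
        return errorDetected_;
    }

    bool errorDetected() const { return errorDetected_; }

private:
    int knownLocalElements_;
    int lastKnownIteration_ = -1;
    int callsThisIteration_ = 0;
    bool errorDetected_ = false;
};
```

A model constructed with two known elements reports an error on the third call within the same iteration. If an element has migrated away and only one call arrives, the model reports nothing, which matches the limitation that only a net gain of objects is detected.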
If two objects swap places between two PEs, the current implementation does not detect this change. In the future ComlibManager::beginIteration should compare the object to the list of known local objects, and start buffering messages and correcting this error condition.
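One way the proposed identity check could look is sketched below. This is speculative and not part of the current implementation: `KnownElementsModel` is a hypothetical name, and plain `int` element IDs are an assumption made for brevity, since real chare array indices are richer types. The idea is simply to test each caller against the set of known local elements instead of counting calls.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical identity-based check; not part of Comlib.
// Element IDs are modeled as non-negative ints for illustration only.
class KnownElementsModel {
public:
    explicit KnownElementsModel(std::set<int> knownLocal)
        : knownLocal_(std::move(knownLocal)) {}

    // Called by each local element at the start of an iteration.
    // Returns true if the caller is not in the known local set.
    bool beginIteration(int elementId) {
        assert(elementId >= 0);
        const bool unknown = knownLocal_.count(elementId) == 0;
        if (unknown) {
            errorDetected_ = true;  // a swapped-in or migrated-in element
        }
        return unknown;
    }

    bool errorDetected() const { return errorDetected_; }

private:
    std::set<int> knownLocal_;
    bool errorDetected_ = false;
};
```

If a PE that knew elements 1 and 2 now hosts elements 1 and 7 after a swap, the call count is unchanged at two, but the call from element 7 is flagged because it is not in the known set.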