In step 1, the contributors use the input format described in
Appendix
. As usual in CHARM++, because
contributors are almost always parallel objects themselves,
there can be several contributors per processor, or none at all on a
particular processor.
In step 2, the local collideMgr actually accumulates the triangles
to be sent off, then finally sends all the triangles for a voxel
at once. This local accumulation is much more efficient than sending
a separate message for each triangle, and allows us to use the
indexed triangle representation. Splitting up indexed triangle lists
efficiently is the subject of Appendix
.
The message sends in step 2 actually create the voxels, if they haven't already been created. By default, the creation happens on the sending machine- this makes all future communication between this collideMgr and the voxel efficient. If the triangles contributed on a processor are clustered well, most will never be sent over the network.
The synchronization in step 3 is needed because voxels may receive objects from several processors. Since any processor can send a triangle to any voxel, ``Have all triangles been delivered?'' is a global property, obtainable only via a global operation. Recall that in CHARM++, however, other local objects will automatically use the CPU during this single synchronization.
After the synchronization, each voxel runs a separate serial collision detection algorithm and reports the results.