Consider step 2 of the parallel algorithm above.
Triangles destined for voxel
can be given
to the collideMgr on any processor. Somehow,
each processor must determine if that voxel already exists
somewhere on the parallel machine. If so, the triangles
need to be sent to the voxel. If not, a new voxel must
be created, ideally on the same processor.
Further, triangles destined for the same, nonexistent voxel may be delivered to two different processors at the same time. Exactly one of the processors must create a new voxel; and the other's triangles must be sent there.
It's easy to imagine using a centralized voxel registry to map grid locations to processors and serialize the creation race condition. However, a centralized registry is is not scalable and would quickly become a serial bottleneck. The natural solution, then, is to use a distributed registry.
This is the approach used by the parallel array framework. To deliver a message to an unknown grid location, it computes a ``home'' processor3.2for that location via a simple hash function. This home processor will either inform the sender of the voxel's location, or authorize the sender to create a new local voxel. Home processors thus act as a distributed registry to keep track of which voxels exist, and where. Processors cache the location of recently referenced voxels, eliminating the overhead of contacting the home processor for repeated communication.
Home processors also provide an efficient, distributed means to support migrating array elements. The array framework also properly supports broadcasts, used in step 4 above; as well as reductions, used in step 5. The details of how to support these operations with ongoing migrations are presented by the author in [#!ArrayMigISCOPE01!#].