A much simpler goal is automatic determination of the ideal voxel size. Voxels that are too small have too much overhead; voxels that are too large do not create enough parallelism and lump together too many objects. The tradeoff should be better analyzed and, if possible, automatic.
Voxels need not form a regular grid at all. There are several other regular and irregular ways to tile 3D space. A less angular grid cell, with a higher volume to surface area than a cube, would be less likely to bring together objects that will never collide.
Although the voxel algorithm applies to arbitrary objects, the implementation is currently limited to triangles. A useful extension would be to include more complex primitives. The implementation could also make much better use of temporal and spatial coherence, as suggested in [#!LC91!#] and [#!Lin93!#].
The implementation currently requires one barrier operation per collision, which could become a bottleneck for small problems on large machines. Some sort of global operation is needed to ensure all triangles have been delivered; but a systolic approach, continually summing the current send and receive counts, might be more proactive.
We currently only determine which objects intersect. This is sufficient for some applications, such as motion planning, clearance checking, and many graphics applications. However, a more complete solution would go on to determine the entire set of penetrating points, the penetration distance, and the penetration direction for each point. Kawachi and Suzuki [#!KS00!#] use a ``discrete closest feature list'' to efficiently determine these quantities via an algorithm similar to the voxel method.