next up previous contents
Next: Bibliography Up: A Grid-Based Parallel Collision Previous: Efficiently Splitting Indexed Triangle   Contents


Efficient Gridding

The inner loop of the voxel method outlined in this work computes the grid locations spanned by an object, and then inserts the object into the object lists at each grid location. For simplicity, we compute the grid locations based on the object bounding box. Hence one extremely common operation is to convert the coordinate of each face of a bounding box, a floating-point number, into a grid location, an integer. However, on modern architectures, converting floating-point numbers to integers is often slow.

In certain situations, a novel but simple technique can be used to dramatically speed up floating-point to integer conversions. In particular, given a double-precision source and integer destination array, we may use either the first or the second two C statements below:

    dest[i] = (int)floor(src[i]);  /* Conventional version*/
    float f = (float)(src[i] + convert_offset);
    dest[i] = *(int32 *)&f;    /* Optimized version*/

The reason is that an IEEE single precision floating-point number's mantissa field overlaps the low bits of an integer, as shown in Figure [*]. If the grid size is a power of two, we can convert a coordinate into a grid location by shifting and extracting out the appropriate bits of the coordinate.

Figure: An IEEE single precision floating-point number and a 32-bit integer
\includegraphics[width=3in]{fig/a_ieee.eps}

Floating-point numbers are always normalized into the form $ 2^m 1.xxxx_2$, so we can shift the mantissa of a floating point number by simply adding a constant. For example, given a floating-point coordinate with binary representation $ xxxx.yyyy_2$, after adding $ 2^{23} 1.1000_2$, we get

$\displaystyle 2^{23} 1.\underbrace{100000000000000000xxxx}_{23 \mathrm{bits}} $

By adding a constant, we have shifted the grid location $ xxxx$ into the low bits, and by rounding to single precision we have removed the excess bits $ yyyy$ that represent the sub-gridcell location.

The resulting floating-point number can be compared, incremented, and decremented as if an integer. The IEEE exponent field corrupts the high bits of this integer, but these can easily be subtracted off. Since the origin of the grid coordinates is arbitrary anyway, we leave the exponent. The mantissa is only 23 bits, but this is still enough to represent 8 million grid cells along each axis, or $ 10^{20}$ cells total. Converting a double-precision floating point number to a 64-bit integer in the same way would yield a 52-bit grid index.

By adding a carry-under protection bit to the constant, this method can accept both positive and negative input coordinates. By subtracting the coordinate origin, we can accomplish a shifting as well as scaling. We can also scale the constant to extract out a different set of bits, simulating any power-of-two grid size. Finally, we must compensate for IEEE rounding, which rounds to the nearest bitpattern rather than truncating. The ideal constant for this method thus has the form:

$\displaystyle (\underbrace{1.5}_{\mathrm{carry protect}} *
\overbrace{2^{23}}^{\mathrm{shift}}
-\underbrace{0.5}_{\mathrm{round}})
*gridsize - origin $

Thus given a power-of-two grid size and origin, the optimized C code to compute the conversion constant and then convert a floating point value src to an integer grid cell index dest is as follows.

    double convert_offset = (1.5*(1<<23)-0.5)*gridsize-origin;
    float f = (float)(src + convert_offset);
    int dest = *(int32 *)&f;

Of course, the same method can be applied in any language that allows us to quickly interpret the bits of one data type as bits of another data type.

The table on the next page shows the time per floating point to integer conversion for both the conventional and optimized approaches. The performance difference between the two is dramatic- a factor of 2 to 40, depending on the architecture. Because mapping one bounding box onto a grid requires six such conversions, the total savings by using the optimized method is a microsecond or two per object- a significant speedup.


Table: Comparing conventional and optimized conversion times.
Machine ConventionalB.1 OptimizedB.1
195 MHz MIPS R10000B.2 109 ns 21 ns
300 MHz SPARC Ultra 10B.3 245 ns 30 ns
332 MHz PowerPC 604eB.4 281 ns 127 ns
500 MHz Pentium 3 XenonB.5 236 ns 8 ns
400 MHz AMD K6-3B.6 368 ns 64 ns
300 MHz AMD K6-3B.7 491 ns 84 ns
240 MHz PA-RISC 8200B.8 648 ns 15 ns



next up previous contents
Next: Bibliography Up: A Grid-Based Parallel Collision Previous: Efficiently Splitting Indexed Triangle   Contents
Orion Lawlor 2001-08-31