Table of content

Context
- Division is pretty much always slower than multiplication
Solution
Conclusion

Context

Consider $x/c$ where x and c are both unsigned integers. How do you avoid asking the compiler to submit a division instruction and instead submit addition/multiplication instructions instead?

Division is pretty much always slower than multiplication

Asking the computer to divide an integer is pretty always slower than multiplication. At a high level, multiplication can be done in parallel and all information is present. Contrast that to division. In division, we do NOT know what the quotient is; consequently, we MUST iterate. Iteration implies sequence as we continue the iteration until we find the correct quotient.

Solution

We can avoid division by changing the equation to be multiplication by first computing the reciprocal.

// First consider unsigned division.
// Our strategy is to precompute 1/c then do
//   ⎣x / c⎦ = ⎣x * (1/c)⎦.

Assuming $1/c$ is exact, this identity is true. A consideration that is obvious in math but not in reality.

We need to scale the equation so that we stay in integer land.

// 1/c is less than 1, so we can't compute it directly in
// integer arithmetic.  Let's instead compute 2^e/c
// for a value of e TBD (^ = exponentiation).  Then
//   ⎣x / c⎦ = ⎣x * (2^e/c) / 2^e⎦.

To do this, we introduce a scaling factor of $2^e$ . We have a base of 2 because computer bits. We can divide by merely shifting the bits over.

Since $2^e/c$ isn’t always an integer, we need an approximation, m.

// Dividing by 2^e is easy.  2^e/c isn't an integer, unfortunately.
// So we must approximate it.  Let's call its approximation m.
// We'll then compute
//   ⎣x * m / 2^e⎦
// Which we want to be equal to ⎣x / c⎦ for 0 <= x < 2^n-1
// where n is the word size.

We need to now show that using the approximation m will be the same as the true division. To do this, we need to find bounds.

// Setting x = c gives us c * m >= 2^e.
// We'll chose m = ⎡2^e/c⎤ to satisfy that equation.

We chosse $x=c$ because that’s where the property first breaks. Note that we are exploring the property $\left\lfloor \frac{x m}{2^e} \right\rfloor = \left\lfloor \frac{x}{c} \right\rfloor \text{ for all } 0 \le x < 2^n$ At $x=c$ , this flips to 1 instead of 0. Remember that $\frac{x}{2^e}$ contains the original $1/c$ multiplied by the scaling factor $2^e$ .

But now,we need to show this approximation will never round differently. As m is an approximation, it MUST contain a delta from the exact.

// Let m = 2^e/c + delta, 0 <= delta < 1
//   ⎣x * (2^e/c + delta) / 2^e⎦
//   ⎣x / c + x * delta / 2^e⎦
// We must have x * delta / 2^e < 1/c so that this
// additional term never rounds differently than ⎣x / c⎦ does.

The delta represents the error in the approximation. As long as we can guarantee that the delta does not change the rounding behavior, we are good.

We can rearrange the terms.

c * x * delta < 2^e

Since delta is guaranteed to be at most 1 and x is at most $2^n-1$ , we can use

c * 2^n < 2^e

In other words, we can pick e to be $e = n + s \space \text{where } s = \lceil log_2(c)\rceil$ . We require the ceiling on the log base 2 of c because the inequality implies that e must be larger. The floor operation does not make this true.

Thus, we are done.

Conclusion

We have shown an alternative to just straight division. This allows us to at compiler time, to perform “strength reduction” for division operations. The compiler provided only the list of integer types such int8 (8 bytes).

The compiler maintains the table/map of

key: (type-width, constant divisor c)
value: (m, e)

Because the trick needs the width of the integer and the constant divisor, to get the magic number to scale it to and then e is the number of bits to shift back for the correct value.

The benefit is the runtime savings at program time since the compiler is precomputing and storing this map. So when you divde, the map is referenced over and over again.

q = x/ c
q = (x * m) >> e
multiple then shift

Unsigned Division Optimization

Table of content

Context

Division is pretty much always slower than multiplication

Solution

Conclusion