## Floating-point maths, constexpr style

(Start at the beginning of the series – and all the source can be found in my github repo)

To ease into constexpr programming I decided to tackle some floating-point maths functions. Disclaimer: I’m not a mathematician and this code has not been rigorously tested for numeric stability or convergence in general. I wrote it more for my own learning than for serious mathematical purposes, and anyway, it’s perhaps likely that `constexpr`

maths functions will be in the standard library sooner or later (and possibly implemented by intrinsics). However, I did subject it to many ad-hoc tests using `static_assert`

. One of the nice things about compile-time computation: the tests are compile-time too, and if it compiles, one can be reasonably sure it works (at least for the values tested)!

Anyway, I went to cppreference.com’s list of common mathematical functions to see what to tackle. Starting with `abs`

is trivial:

Starting with `abs`

is trivial:

template <typename T> constexpr T abs(T x) { return x >= 0 ? x : x < 0 ? -x : throw err::abs_runtime_error; } |

For the sake of clarity, I’m omitting some `enable_if`

boilerplate here that ensures that the argument has the right type.

**Iterative methods FTW**

Now, other functions take a little thought. But remembering calculus, many of them are susceptible to iterative methods, so my general plan formed: formulate a recursive calculation which converges to the real answer, and terminate the recursion when the value is within epsilon of the answer.

For example, to compute `sqrt`

, we can use the well-known Newton-Raphson method, where we start with a guess g_{0} as an approximation to √x, then:

First, we need a function to determine termination, which I put in a `detail`

namespace:

namespace detail { template <typename T> constexpr bool feq(T x, T y) { return abs(x - y) <= std::numeric_limits<T>::epsilon(); } } |

And then `sqrt`

is straightforward to express (using the value itself as the initial guess), with a driver function handling the domain of the argument and the `throw`

pattern, and delegating the recursion to a function in the `detail`

namespace.

namespace detail { template <typename T> constexpr T sqrt(T x, T guess) { return feq(guess, (guess + x/guess)/T{2}) ? guess : sqrt(x, (guess + x/guess)/T{2}); } } template <typename T> constexpr T sqrt(T x) { return x == 0 ? 0 : x > 0 ? detail::sqrt(x, x) : throw err::sqrt_domain_error; } |

This ends up being a useful pattern for many functions. For example, `cbrt`

is practically identical; only the equational details change. And once we have `sqrt`

, `hypot`

is trivial of course.

Next I tackled trigonometric functions. Remembering Taylor series and turning to that Oracle of All Things Maths, Wolfram Alpha, we can find the Taylor series expansions:

and

These are basically the same formula, and with some massaging, they can share a common recursive function with slightly different initial constants:

namespace detail { template <typename T> constexpr T trig_series(T x, T sum, T n, int i, int s, T t) { return feq(sum, sum + t*s/n) ? sum : trig_series(x, sum + t*s/n, n*i*(i+1), i+2, -s, t*x*x); } } template <typename T> constexpr T sin(T x) { return true ? detail::trig_series(x, x, T{6}, 4, -1, x*x*x) : throw err::sin_runtime_error; } template <typename T> constexpr T cos(T x) { return true ? detail::trig_series(x, T{1}, T{2}, 3, -1, x*x) : throw err::cos_runtime_error; } |

Once we have `sin`

and `cos`

, `tan`

is trivial of course. And `exp`

has almost the same series as `sin`

and `cos`

– easier, actually:

Which yields in code:

namespace detail { template <typename T> constexpr T exp(T x, T sum, T n, int i, T t) { return feq(sum, sum + t/n) ? sum : exp(x, sum + t/n, n * i, i+1, t * x); } } template <typename T> constexpr T exp(T x) { return true ? detail::exp(x, T{1}, T{1}, 2, x) : throw err::exp_runtime_error; } |

**No peeking at the floating-point details**

Now, how about `floor`

, `ceil`

, `trunc`

and `round`

? They are all variations on a theme: solving one basically means solving all, with only slight differences for things like negative numbers. I thought about this for a half hour – there wasn’t an obvious solution to me at first. We can’t trivially cast to an integral type, because there is no integral type big enough to hold a floating point value in general. And in the name of strict portability, we can’t be sure that the floating point format is IEEE 754 standard (although I think I am prepared to accept this as a limitation). So even if we could fiddle with the representation, portability goes out the window. Finally, `constexpr`

(including C++14 `constexpr`

) forbids `reinterpret_cast`

and other tricks like access through `union`

members.

So how to compute `floor`

? After a little thought I hit upon an approach. In fact, I am relying on the IEEE 754 standard in a small way – the fact that any integer power of 2 is exactly representable as floating point. With that I formed the plan:

- Start with a guess of zero.
- Start with an increment: 2
^{(std::numeric_limits::max_exponent – 1)}. - If (guess + increment) is larger than x, halve the increment.
- Otherwise, add the increment to the guess, and that’s the new guess.
- Stop when the increment is less than 2.

This gives a binary-search like approach to finding the correct value for `floor`

. And the code is easy to write. First we need a function to raise a floating point value to an integer power, in the standard O(log n) way:

namespace detail { template <typename T> constexpr T ipow(T x, int n) { return (n == 0) ? T{1} : n == 1 ? x : n > 1 ? ((n & 1) ? x * ipow(x, n-1) : ipow(x, n/2) * ipow(x, n/2)) : T{1} / ipow(x, -n); } } |

Then the actual `floor`

function, which uses `ceil`

for the case of negative numbers:

namespace detail { template <typename T> constexpr T floor(T x, T guess, T inc) { return guess + inc <= x ? floor(x, guess + inc, inc) : inc <= T{1} ? guess : floor(x, guess, inc/T{2}); } } constexpr float floor(float x) { return x < 0 ? -ceil(-x) : x >= 0 ? detail::floor( x, 0.0f, detail::ipow(2.0f, std::numeric_limits<float>::max_exponent-1)) : throw err::floor_runtime_error; } |

The other functions `ceil`

, `trunc`

and `round`

are very similar.

**That’s some deep recursion**

Now, there is a snag with this plan. It works well enough for `float`

, but when we try it with `double`

, it falls down. Appendix B of the C++ standard (Implementation quantities) recommends that compilers offer a minimum recursive depth of 512 for `constexpr`

function invocations. And on my machine, `std::numeric_limits<float>::max_exponent`

is 128. But for `double`

, it’s 1024. And for `long double`

, it’s 16384. Since by halving the increment each time, we’re basically doing a linear search on the exponent, 512 recursions isn’t enough. I’m prepared to go with what the standard recommends in the sense that I don’t mind massaging switches for compilers that don’t follow the recommendations, but I’d rather not mess around with compiler switches for those that do. So how can I get around this?

Well, I’m using a binary search to cut down the increment. What if I increase the fanout so it’s not a binary search, but a trinary search or more? How big of a fanout do I need? What I need is:

max_recursion_depth > max_exponent/x, where 2^{x} is the fanout

So to deal with `double`

, x = 3 and we need a fanout of 8. An octary search? Well, it’s easy enough to code, even if it is a nested-ternary-operator-from-hell:

template <typename T> constexpr T floor8(T x, T guess, T inc) { return inc < T{8} ? floor(x, guess, inc) : guess + inc <= x ? floor8(x, guess + inc, inc) : guess + (inc/T{8})*T{7} <= x ? floor8(x, guess + (inc/T{8})*T{7}, inc/T{8}) : guess + (inc/T{8})*T{6} <= x ? floor8(x, guess + (inc/T{8})*T{6}, inc/T{8}) : guess + (inc/T{8})*T{5} <= x ? floor8(x, guess + (inc/T{8})*T{5}, inc/T{8}) : guess + (inc/T{8})*T{4} <= x ? floor8(x, guess + (inc/T{8})*T{4}, inc/T{8}) : guess + (inc/T{8})*T{3} <= x ? floor8(x, guess + (inc/T{8})*T{3}, inc/T{8}) : guess + (inc/T{8})*T{2} <= x ? floor8(x, guess + (inc/T{8})*T{2}, inc/T{8}) : guess + inc/T{8} <= x ? floor8(x, guess + inc/T{8}, inc/T{8}) : floor8(x, guess, inc/T{8}); } |

(Apologies for the width, but I really can’t make it much more readable.) For the base case, I revert to the regular binary search implementation of `floor`

. What about for `long double`

? Well, since max_exponent is 16384, x = 33 and we need a fanout of 2^{33}. That’s not happening! For `long double`

, I have no choice but to revert to the C++14 `constexpr`

rules:

constexpr long double floor(long double x) { if (x < 0.0) return -ceil(-x); long double inc = detail::ipow( 2.0l, std::numeric_limits<long double>::max_exponent - 1); long double guess = 0.0l; for (;;) { while (guess + inc > x) { inc /= 2.0l; if (inc < 1.0l) return guess; } guess += inc; } throw err::floor_runtime_error; } |

And that’s enough of such stuff for one blog post. Next time, I’ll go through the rest of the maths functions I implemented, with help from Leonhard Euler and Wikipedia.