Floating-point maths, constexpr style – Why is a raven like a writing desk?

(Start at the beginning of the series – and all the source can be found in my github repo)

To ease into constexpr programming I decided to tackle some floating-point maths functions. Disclaimer: I’m not a mathematician and this code has not been rigorously tested for numeric stability or convergence in general. I wrote it more for my own learning than for serious mathematical purposes, and anyway, it’s perhaps likely that constexpr maths functions will be in the standard library sooner or later (and possibly implemented by intrinsics). However, I did subject it to many ad-hoc tests using static_assert. One of the nice things about compile-time computation: the tests are compile-time too, and if it compiles, one can be reasonably sure it works (at least for the values tested)!

Anyway, I went to cppreference.com’s list of common mathematical functions to see what to tackle. Starting with abs is trivial:

Starting with abs is trivial:

template 
constexpr T abs(T x)
{
  return x >= 0 ? x :
    x < 0 ? -x :
    throw err::abs_runtime_error;
}

For the sake of clarity, I'm omitting some enable_if boilerplate here that ensures that the argument has the right type.

Iterative methods FTW

Now, other functions take a little thought. But remembering calculus, many of them are susceptible to iterative methods, so my general plan formed: formulate a recursive calculation which converges to the real answer, and terminate the recursion when the value is within epsilon of the answer.

For example, to compute sqrt, we can use the well-known Newton-Raphson method, where we start with a guess g₀ as an approximation to √x, then:

[latex]g_{n+1} = \frac{(g_n + \frac{x}{g_n})}{2}[/latex]

First, we need a function to determine termination, which I put in a detail namespace:

namespace detail
{
  template 
  constexpr bool feq(T x, T y)
  {
    return abs(x - y) <=
      std::numeric_limits::epsilon();
  }
}

And then sqrt is straightforward to express (using the value itself as the initial guess), with a driver function handling the domain of the argument and the throw pattern, and delegating the recursion to a function in the detail namespace.

namespace detail
{
  template 
  constexpr T sqrt(T x, T guess)
  {
    return feq(guess, (guess + x/guess)/T{2}) ? guess :
      sqrt(x, (guess + x/guess)/T{2});
  }
}
template 
constexpr T sqrt(T x)
{
  return x == 0 ? 0 :
    x > 0 ? detail::sqrt(x, x) :
    throw err::sqrt_domain_error;
}

This ends up being a useful pattern for many functions. For example, cbrt is practically identical; only the equational details change. And once we have sqrt, hypot is trivial of course.

Next I tackled trigonometric functions. Remembering Taylor series and turning to that Oracle of All Things Maths, Wolfram Alpha, we can find the Taylor series expansions:

[latex]sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{1+2k}}{(1+2k)!}[/latex]
and
[latex]cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}[/latex]

These are basically the same formula, and with some massaging, they can share a common recursive function with slightly different initial constants:

namespace detail
{
  template 
  constexpr T trig_series(T x, T sum, T n, int i, int s, T t)
  {
    return feq(sum, sum + t*s/n) ?
      sum :
      trig_series(x, sum + t*s/n, n*i*(i+1), i+2, -s, t*x*x);
  }
}
template 
constexpr T sin(T x)
{
  return true ?
    detail::trig_series(x, x, T{6}, 4, -1, x*x*x) :
    throw err::sin_runtime_error;
}
template 
constexpr T cos(T x)
{
  return true ?
    detail::trig_series(x, T{1}, T{2}, 3, -1, x*x) :
    throw err::cos_runtime_error;
}

Once we have sin and cos, tan is trivial of course. And exp has almost the same series as sin and cos - easier, actually:

[latex]e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}[/latex]

Which yields in code:

namespace detail
{
  template 
  constexpr T exp(T x, T sum, T n, int i, T t)
  {
    return feq(sum, sum + t/n) ?
      sum :
      exp(x, sum + t/n, n * i, i+1, t * x);
  }
}
template 
constexpr T exp(T x)
{
  return true ? detail::exp(x, T{1}, T{1}, 2, x) :
    throw err::exp_runtime_error;
}

No peeking at the floating-point details

Now, how about floor, ceil, trunc and round? They are all variations on a theme: solving one basically means solving all, with only slight differences for things like negative numbers. I thought about this for a half hour - there wasn't an obvious solution to me at first. We can't trivially cast to an integral type, because there is no integral type big enough to hold a floating point value in general. And in the name of strict portability, we can't be sure that the floating point format is IEEE 754 standard (although I think I am prepared to accept this as a limitation). So even if we could fiddle with the representation, portability goes out the window. Finally, constexpr (including C++14 constexpr) forbids reinterpret_cast and other tricks like access through union members.

So how to compute floor? After a little thought I hit upon an approach. In fact, I am relying on the IEEE 754 standard in a small way - the fact that any integer power of 2 is exactly representable as floating point. With that I formed the plan:

Start with a guess of zero.
Start with an increment: 2^{(std::numeric_limits::max_exponent - 1)}.
If (guess + increment) is larger than x, halve the increment.
Otherwise, add the increment to the guess, and that's the new guess.
Stop when the increment is less than 2.

This gives a binary-search like approach to finding the correct value for floor. And the code is easy to write. First we need a function to raise a floating point value to an integer power, in the standard O(log n) way:

namespace detail
{
  template 
  constexpr T ipow(T x, int n)
  {
    return (n == 0) ? T{1} :
      n == 1 ? x :
      n > 1 ? ((n & 1) ? x * ipow(x, n-1) :
                         ipow(x, n/2) * ipow(x, n/2)) :
      T{1} / ipow(x, -n);
  }
}

Then the actual floor function, which uses ceil for the case of negative numbers:

namespace detail
{
  template 
  constexpr T floor(T x, T guess, T inc)
  {
    return guess + inc <= x ? floor(x, guess + inc, inc) :
      inc <= T{1} ? guess : floor(x, guess, inc/T{2});
  }
}
constexpr float floor(float x)
{
  return x < 0 ? -ceil(-x) :
    x >= 0 ? detail::floor(
        x, 0.0f,
        detail::ipow(2.0f,
            std::numeric_limits::max_exponent-1)) :
    throw err::floor_runtime_error;
}

The other functions ceil, trunc and round are very similar.

That's some deep recursion

Now, there is a snag with this plan. It works well enough for float, but when we try it with double, it falls down. Appendix B of the C++ standard (Implementation quantities) recommends that compilers offer a minimum recursive depth of 512 for constexpr function invocations. And on my machine, std::numeric_limits<float>::max_exponent is 128. But for double, it's 1024. And for long double, it's 16384. Since by halving the increment each time, we're basically doing a linear search on the exponent, 512 recursions isn't enough. I'm prepared to go with what the standard recommends in the sense that I don't mind massaging switches for compilers that don't follow the recommendations, but I'd rather not mess around with compiler switches for those that do. So how can I get around this?

Well, I'm using a binary search to cut down the increment. What if I increase the fanout so it's not a binary search, but a trinary search or more? How big of a fanout do I need? What I need is:

max_recursion_depth > max_exponent/x, where 2^x is the fanout

So to deal with double, x = 3 and we need a fanout of 8. An octary search? Well, it's easy enough to code, even if it is a nested-ternary-operator-from-hell:

template 
constexpr T floor8(T x, T guess, T inc)
{
  return
    inc < T{8} ? floor(x, guess, inc) :
    guess + inc <= x ? floor8(x, guess + inc, inc) :
    guess + (inc/T{8})*T{7} <= x ? floor8(x, guess + (inc/T{8})*T{7}, inc/T{8}) :
    guess + (inc/T{8})*T{6} <= x ? floor8(x, guess + (inc/T{8})*T{6}, inc/T{8}) :
    guess + (inc/T{8})*T{5} <= x ? floor8(x, guess + (inc/T{8})*T{5}, inc/T{8}) :
    guess + (inc/T{8})*T{4} <= x ? floor8(x, guess + (inc/T{8})*T{4}, inc/T{8}) :
    guess + (inc/T{8})*T{3} <= x ? floor8(x, guess + (inc/T{8})*T{3}, inc/T{8}) :
    guess + (inc/T{8})*T{2} <= x ? floor8(x, guess + (inc/T{8})*T{2}, inc/T{8}) :
    guess + inc/T{8} <= x ? floor8(x, guess + inc/T{8}, inc/T{8}) :
    floor8(x, guess, inc/T{8});
}

(Apologies for the width, but I really can't make it much more readable.) For the base case, I revert to the regular binary search implementation of floor. What about for long double? Well, since max_exponent is 16384, x = 33 and we need a fanout of 2³³. That's not happening! For long double, I have no choice but to revert to the C++14 constexpr rules:

constexpr long double floor(long double x)
{
  if (x < 0.0) return -ceil(-x);
  long double inc = detail::ipow(
      2.0l,
      std::numeric_limits::max_exponent - 1);
  long double guess = 0.0l;
  for (;;)
  {
    while (guess + inc > x)
    {
      inc /= 2.0l;
      if (inc < 1.0l)
        return guess;
    }
    guess += inc;
  }
  throw err::floor_runtime_error;
}

And that's enough of such stuff for one blog post. Next time, I'll go through the rest of the maths functions I implemented, with help from Leonhard Euler and Wikipedia.

Leave a comment