## Archive for the ‘C++’ Category

### C++11 compile-time string hashing

Tuesday, October 13th, 2015

(Start at the beginning of the series – and all the source can be found in my github repo)

Now that I was used to writing C++11-style constexpr, I decided to try some compile-time string hashing. It turns out that FNV1 is very easy to express in a move-down-the-string recursive style:

namespace detail { constexpr uint64_t fnv1(uint64_t h, const char* s) { return (*s == 0) ? h : fnv1((h * 1099511628211ull) ^ static_cast<uint64_t>(*s), s+1); } } constexpr uint64_t fnv1(const char* s) { return true ? detail::fnv1(14695981039346656037ull, s) : throw err::fnv1_runtime_error; }

That wasn’t difficult enough

So far so good. How about some other hash functions? On a previous project I had used MurmurHash, so I decided to give that a go.

(Coincidentally, a blog post has recently surfaced implementing 32-bit Murmur3 using C++14 constexpr. But I was sticking to C++11 constexpr.)

Implementing such a big function from cold in C++11 constexpr was bound to be difficult, so I scaffolded. I used the freely-available runtime implementation for testing, and initially I converted it to C++14 constexpr, which was fairly easy. After doing that, and with Wikipedia as a pseudocode guide I was ready to start breaking down (or building up) the functionality, C++11-style. At the base of 32-bit Murmur3 is the hash round function which is done for each 4-byte chunk of the key:

constexpr uint32_t murmur3_32_k(uint32_t k) { return (((k * 0xcc9e2d51) << 15) | ((k * 0xcc9e2d51) >> 17)) * 0x1b873593; }   constexpr uint32_t murmur3_32_hashround( uint32_t k, uint32_t hash) { return (((hash^k) << 13) | ((hash^k) >> 19)) * 5 + 0xe6546b64; }

And this can easily be put into a loop, with a helper function to constitute 4-byte chunks (the somewhat strange formulation of word32le here is because it is used differently in some other code):

constexpr uint32_t word32le(const char* s, int len) { return (len > 0 ? static_cast<uint32_t>(s[0]) : 0) + (len > 1 ? (static_cast<uint32_t>(s[1]) << 8) : 0) + (len > 2 ? (static_cast<uint32_t>(s[2]) << 16) : 0) + (len > 3 ? (static_cast<uint32_t>(s[3]) << 24) : 0); }   constexpr uint32_t word32le(const char* s) { return word32le(s, 4); }   constexpr uint32_t murmur3_32_loop( const char* key, int len, uint32_t hash) { return len == 0 ? hash : murmur3_32_loop( key + 4, len - 1, murmur3_32_hashround( murmur3_32_k(word32le(key)), hash)); }

So this is the first part of Murmur3 that will process all the 4-byte chunks of key. What remains is to deal with the trailing bytes (up to 3 of them) and then finalize the hash. To deal with the trailing bytes, I chain the end functions (_end0 through _end3) and “jump in” at the right place. There is probably a more elegant way to do this, but…

constexpr uint32_t murmur3_32_end0(uint32_t k) { return (((k*0xcc9e2d51) << 15) | ((k*0xcc9e2d51) >> 17)) * 0x1b873593; }   constexpr uint32_t murmur3_32_end1(uint32_t k, const char* key) { return murmur3_32_end0( k ^ static_cast<uint32_t>(key[0])); }   constexpr uint32_t murmur3_32_end2(uint32_t k, const char* key) { return murmur3_32_end1( k ^ (static_cast<uint32_t>(key[1]) << 8), key); }   constexpr uint32_t murmur3_32_end3(uint32_t k, const char* key) { return murmur3_32_end2( k ^ (static_cast<uint32_t>(key[2]) << 16), key); }   constexpr uint32_t murmur3_32_end(uint32_t hash, const char* key, int rem) { return rem == 0 ? hash : hash ^ (rem == 3 ? murmur3_32_end3(0, key) : rem == 2 ? murmur3_32_end2(0, key) : murmur3_32_end1(0, key)); }

Finalizing the hash is a very similar affair, with 3 stages:

constexpr uint32_t murmur3_32_final1(uint32_t hash) { return (hash ^ (hash >> 16)) * 0x85ebca6b; } constexpr uint32_t murmur3_32_final2(uint32_t hash) { return (hash ^ (hash >> 13)) * 0xc2b2ae35; } constexpr uint32_t murmur3_32_final3(uint32_t hash) { return (hash ^ (hash >> 16)); }   constexpr uint32_t murmur3_32_final(uint32_t hash, int len) { return murmur3_32_final3( murmur3_32_final2( murmur3_32_final1(hash ^ static_cast<uint32_t>(len)))); }

And that’s all there is to it: all that remains is to stitch the pieces together and provide the driver function:

constexpr uint32_t murmur3_32_value(const char* key, int len, uint32_t seed) { return murmur3_32_final( murmur3_32_end( murmur3_32_loop(key, len/4, seed), key+(len/4)*4, len&3), len); }   constexpr uint32_t murmur3_32(const char *key, uint32_t seed) { return true ? murmur3_32_value(key, strlen(key), seed) : throw err::murmur3_32_runtime_error; }

Doing strlen non-naively

One more thing: strlen crept in there. This is of course a constexpr version of strlen. The naive way to do this is to step down the string linearly, as seen in the FNV1 algorithm. With a max recursive depth of 512, I thought this fairly limiting… it might not be uncommon to have 1k string literals that I want to hash?

So the way I implemented strlen was to measure the string by chunks, constraining a max recursion depth of say 256. Instead of a plain linear recursion, it’s basically recursing for every 256 bytes of string length, and then recursing on the chunk inside that. So the max depth is 256 + 256 = 512, and we can deal with strings close to 64k in size (depending on how deep we are already when calling strlen). There’s a chance that strlen might be made constexpr in the future – I think it’s already implemented as an intrinsic in some compilers. So maybe I can throw away that code someday.

Anyway, now we have a complete implementation of 32-bit Murmur3 in C++11 constexpr style, and the tests go something like this (and can be checked online):

static_assert(murmur3_32("hello, world", 0) == 345750399, "murmur3 test 1"); static_assert(murmur3_32("hello, world1", 0) == 3714214180, "murmur3 test 2"); static_assert(murmur3_32("hello, world12", 0) == 83041023, "murmur3 test 3"); static_assert(murmur3_32("hello, world123", 0) == 209220029, "murmur3 test 4"); static_assert(murmur3_32("hello, world1234", 0) == 4241062699, "murmur3 test 5"); static_assert(murmur3_32("hello, world", 1) == 1868346089, "murmur3 test 6");

OK. That was a lot of code for a blog post. But after the trivial FNV1 and the only slightly harder Murmur3, I wanted more of a challenge. What about MD5? Or SHA-256? That’s for the next post.

### More constexpr floating-point computation

Tuesday, October 13th, 2015

(Start at the beginning of the series – and all the source can be found in my github repo)

In the last post, I covered my first forays into writing C++11-style constexpr floating point math functions. Once I’d done some trig functions, exp, and floor and friends, it seemed like the rest would be a piece of cake. Not entirely…

But there was some easy stuff. With trunc sorted out, fmod was just:

constexpr float fmod(float x, float y) { return y != 0 ? x - trunc(x/y)*y : throw err::fmod_domain_error; }

And remainder was very similar. Also, fmax, fmin and fdim were trivial: I won’t bore you with the code.

Euler was quite good at maths

Now I come from a games background, and there are a few maths functions that get used all the time in games. I already have sqrt. What I want now is atan and atan2. To warm up, I implemented the infinite series calculations for asin and acos, each of which are only defined for an input |x| <= 1. That was fairly straightforward, broadly in line with what I'd done before for the series expansions of sin and cos.

Inverting tan was another matter: but a search revealed a way to do it using asin:

$arctan(x) = arcsin(\frac{x}{\sqrt{x^2+1}})$

Which I went with for a while. But then I spotted the magic words:

“Leonhard Euler found a more efficient series for the arctangent”

$arctan(z) = \frac{z}{1+z^2} \sum_{n=0}^{\infty} \prod_{k=1}^n \frac{2kz^2}{(2k+1)(1+z^2)}$

More efficient (which I take here to mean more quickly converging)? And discovered by Euler? That’s good enough for me.

namespace detail { template <typename T> constexpr T atan_term(T x2, int k) { return (T{2}*static_cast<T>(k)*x2) / ((T{2}*static_cast<T>(k)+T{1}) * (T{1}+x2)); } template <typename T> constexpr T atan_product(T x, int k) { return k == 1 ? atan_term(x*x, k) : atan_term(x*x, k) * atan_product(x, k-1); } template <typename T> constexpr T atan_sum(T x, T sum, int n) { return sum + atan_product(x, n) == sum ? sum : atan_sum(x, sum + atan_product(x, n), n+1); } } template <typename T> constexpr T atan(T x) { return true ? x / (T{1} + x*x) * detail::atan_sum(x, T{1}, 1) : throw err::atan_runtime_error; }

And atan2 follows from atan relatively easily.

It’s big, it’s heavy, it’s wood

Looking over what I’d implemented so far, the next important function (and one that would serve as a building block for a few others) was log. So I programmed it using my friend the Taylor series expansion.

Hm. It turns out that a naive implementation of log is quite slow to converge. (Yes, what a surprise.) OK, a bit of searching yields fruit: “… use Newton’s method to invert the exponentional function … the iteration simplifies to:”

$y_{n+1} = y_n + 2 \frac{x - e^{y_n}}{x + e^{y_n}}$

“… which has cubic convergence to ln(x).”

Result! I coded it up.

namespace detail { template <typename T> constexpr T log_iter(T x, T y) { return y + T{2} * (x - cx::exp(y)) / (x + cx::exp(y)); } template <typename T> constexpr T log(T x, T y) { return feq(y, log_iter(x, y)) ? y : log(x, log_iter(x, y)); } }

It worked well. A week or so later, an article about C++11 constexpr log surfaced serendipitously, and on reading it I realised my implementation wasn’t very well tested, so I added some tests. The tests revealed numerical instability for larger input values, so I took a tip from the article and used a couple of identities to constrain the input:

$ln(x) = ln(\frac{x}{e}) + 1$
and
$ln(x) = ln(e.x) - 1$

Coding these expressions to constrain the input proper to detail::log seemed to solve the instability issues.

Are we done yet?

Now I was nearing the end of what I felt was a quorum of functions. To wit:

• sqrt, cbrt, hypot
• sin, cos, tan
• floor, ceil, trunc, round
• asin, acos, atan, atan2
• fmod, remainder
• exp, log, log2, log10

With a decent implementation of exp and log it was easy to add the hyperbolic trig functions and their inverses. The only obviously missing function now was pow. Recall that I already had ipow for raising a floating-point value to an integral power, but I had been puzzling over a more general version of pow to allow raising to a floating-point power for a while. And then I suddenly realised the blindingly obvious:

$x^y = (e^{ln(x)})^y = e^{ln(x).y}$

And for the general case, pow was solved.

template <typename T> constexpr T pow(T x, T y) { return true ? exp(log(x)*y) : throw err::pow_runtime_error; }

I had now reached a satisfying conclusion to maths functions and I was getting fairly well-versed in writing constexpr code. Pausing only to implement erf (because why not?), I turned my attention now to other constexpr matters.

### Floating-point maths, constexpr style

Tuesday, October 13th, 2015

(Start at the beginning of the series – and all the source can be found in my github repo)

To ease into constexpr programming I decided to tackle some floating-point maths functions. Disclaimer: I’m not a mathematician and this code has not been rigorously tested for numeric stability or convergence in general. I wrote it more for my own learning than for serious mathematical purposes, and anyway, it’s perhaps likely that constexpr maths functions will be in the standard library sooner or later (and possibly implemented by intrinsics). However, I did subject it to many ad-hoc tests using static_assert. One of the nice things about compile-time computation: the tests are compile-time too, and if it compiles, one can be reasonably sure it works (at least for the values tested)!

Anyway, I went to cppreference.com’s list of common mathematical functions to see what to tackle. Starting with abs is trivial:

Starting with abs is trivial:

template <typename T> constexpr T abs(T x) { return x >= 0 ? x : x < 0 ? -x : throw err::abs_runtime_error; }

For the sake of clarity, I’m omitting some enable_if boilerplate here that ensures that the argument has the right type.

Iterative methods FTW

Now, other functions take a little thought. But remembering calculus, many of them are susceptible to iterative methods, so my general plan formed: formulate a recursive calculation which converges to the real answer, and terminate the recursion when the value is within epsilon of the answer.

For example, to compute sqrt, we can use the well-known Newton-Raphson method, where we start with a guess g0 as an approximation to √x, then:

$g_{n+1} = \frac{(g_n + \frac{x}{g_n})}{2}$

First, we need a function to determine termination, which I put in a detail namespace:

namespace detail { template <typename T> constexpr bool feq(T x, T y) { return abs(x - y) <= std::numeric_limits<T>::epsilon(); } }

And then sqrt is straightforward to express (using the value itself as the initial guess), with a driver function handling the domain of the argument and the throw pattern, and delegating the recursion to a function in the detail namespace.

namespace detail { template <typename T> constexpr T sqrt(T x, T guess) { return feq(guess, (guess + x/guess)/T{2}) ? guess : sqrt(x, (guess + x/guess)/T{2}); } } template <typename T> constexpr T sqrt(T x) { return x == 0 ? 0 : x > 0 ? detail::sqrt(x, x) : throw err::sqrt_domain_error; }

This ends up being a useful pattern for many functions. For example, cbrt is practically identical; only the equational details change. And once we have sqrt, hypot is trivial of course.

Next I tackled trigonometric functions. Remembering Taylor series and turning to that Oracle of All Things Maths, Wolfram Alpha, we can find the Taylor series expansions:

$sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{1+2k}}{(1+2k)!}$
and
$cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}$

These are basically the same formula, and with some massaging, they can share a common recursive function with slightly different initial constants:

namespace detail { template <typename T> constexpr T trig_series(T x, T sum, T n, int i, int s, T t) { return feq(sum, sum + t*s/n) ? sum : trig_series(x, sum + t*s/n, n*i*(i+1), i+2, -s, t*x*x); } } template <typename T> constexpr T sin(T x) { return true ? detail::trig_series(x, x, T{6}, 4, -1, x*x*x) : throw err::sin_runtime_error; } template <typename T> constexpr T cos(T x) { return true ? detail::trig_series(x, T{1}, T{2}, 3, -1, x*x) : throw err::cos_runtime_error; }

Once we have sin and cos, tan is trivial of course. And exp has almost the same series as sin and cos – easier, actually:

$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$

Which yields in code:

namespace detail { template <typename T> constexpr T exp(T x, T sum, T n, int i, T t) { return feq(sum, sum + t/n) ? sum : exp(x, sum + t/n, n * i, i+1, t * x); } } template <typename T> constexpr T exp(T x) { return true ? detail::exp(x, T{1}, T{1}, 2, x) : throw err::exp_runtime_error; }

No peeking at the floating-point details

Now, how about floor, ceil, trunc and round? They are all variations on a theme: solving one basically means solving all, with only slight differences for things like negative numbers. I thought about this for a half hour – there wasn’t an obvious solution to me at first. We can’t trivially cast to an integral type, because there is no integral type big enough to hold a floating point value in general. And in the name of strict portability, we can’t be sure that the floating point format is IEEE 754 standard (although I think I am prepared to accept this as a limitation). So even if we could fiddle with the representation, portability goes out the window. Finally, constexpr (including C++14 constexpr) forbids reinterpret_cast and other tricks like access through union members.

So how to compute floor? After a little thought I hit upon an approach. In fact, I am relying on the IEEE 754 standard in a small way – the fact that any integer power of 2 is exactly representable as floating point. With that I formed the plan:

2. Start with an increment: 2(std::numeric_limits::max_exponent – 1).
3. If (guess + increment) is larger than x, halve the increment.
4. Otherwise, add the increment to the guess, and that’s the new guess.
5. Stop when the increment is less than 2.

This gives a binary-search like approach to finding the correct value for floor. And the code is easy to write. First we need a function to raise a floating point value to an integer power, in the standard O(log n) way:

namespace detail { template <typename T> constexpr T ipow(T x, int n) { return (n == 0) ? T{1} : n == 1 ? x : n > 1 ? ((n & 1) ? x * ipow(x, n-1) : ipow(x, n/2) * ipow(x, n/2)) : T{1} / ipow(x, -n); } }

Then the actual floor function, which uses ceil for the case of negative numbers:

namespace detail { template <typename T> constexpr T floor(T x, T guess, T inc) { return guess + inc <= x ? floor(x, guess + inc, inc) : inc <= T{1} ? guess : floor(x, guess, inc/T{2}); } } constexpr float floor(float x) { return x < 0 ? -ceil(-x) : x >= 0 ? detail::floor( x, 0.0f, detail::ipow(2.0f, std::numeric_limits<float>::max_exponent-1)) : throw err::floor_runtime_error; }

The other functions ceil, trunc and round are very similar.

That’s some deep recursion

Now, there is a snag with this plan. It works well enough for float, but when we try it with double, it falls down. Appendix B of the C++ standard (Implementation quantities) recommends that compilers offer a minimum recursive depth of 512 for constexpr function invocations. And on my machine, std::numeric_limits<float>::max_exponent is 128. But for double, it’s 1024. And for long double, it’s 16384. Since by halving the increment each time, we’re basically doing a linear search on the exponent, 512 recursions isn’t enough. I’m prepared to go with what the standard recommends in the sense that I don’t mind massaging switches for compilers that don’t follow the recommendations, but I’d rather not mess around with compiler switches for those that do. So how can I get around this?

Well, I’m using a binary search to cut down the increment. What if I increase the fanout so it’s not a binary search, but a trinary search or more? How big of a fanout do I need? What I need is:

max_recursion_depth > max_exponent/x, where 2x is the fanout

So to deal with double, x = 3 and we need a fanout of 8. An octary search? Well, it’s easy enough to code, even if it is a nested-ternary-operator-from-hell:

template <typename T> constexpr T floor8(T x, T guess, T inc) { return inc < T{8} ? floor(x, guess, inc) : guess + inc <= x ? floor8(x, guess + inc, inc) : guess + (inc/T{8})*T{7} <= x ? floor8(x, guess + (inc/T{8})*T{7}, inc/T{8}) : guess + (inc/T{8})*T{6} <= x ? floor8(x, guess + (inc/T{8})*T{6}, inc/T{8}) : guess + (inc/T{8})*T{5} <= x ? floor8(x, guess + (inc/T{8})*T{5}, inc/T{8}) : guess + (inc/T{8})*T{4} <= x ? floor8(x, guess + (inc/T{8})*T{4}, inc/T{8}) : guess + (inc/T{8})*T{3} <= x ? floor8(x, guess + (inc/T{8})*T{3}, inc/T{8}) : guess + (inc/T{8})*T{2} <= x ? floor8(x, guess + (inc/T{8})*T{2}, inc/T{8}) : guess + inc/T{8} <= x ? floor8(x, guess + inc/T{8}, inc/T{8}) : floor8(x, guess, inc/T{8}); }

(Apologies for the width, but I really can’t make it much more readable.) For the base case, I revert to the regular binary search implementation of floor. What about for long double? Well, since max_exponent is 16384, x = 33 and we need a fanout of 233. That’s not happening! For long double, I have no choice but to revert to the C++14 constexpr rules:

constexpr long double floor(long double x) { if (x < 0.0) return -ceil(-x); long double inc = detail::ipow( 2.0l, std::numeric_limits<long double>::max_exponent - 1); long double guess = 0.0l; for (;;) { while (guess + inc > x) { inc /= 2.0l; if (inc < 1.0l) return guess; } guess += inc; } throw err::floor_runtime_error; }

And that’s enough of such stuff for one blog post. Next time, I’ll go through the rest of the maths functions I implemented, with help from Leonhard Euler and Wikipedia.

### Experimenting with constexpr

Tuesday, October 13th, 2015

Since seeing Scott Schurr at C++Now in Aspen and hearing his talks about constexpr, it’s been on my list of things to try out, and recently I got around to it. With the release of Visual Studio 2015, Microsoft’s compiler now finally supports C++11 style constexpr (modulo some minor issues), so it’s a good time to jump in.

Doing things the hard way

Now, I have no expectation that MS will provide for C++14 constexpr any time soon, since (as I understand it) it requires a fairly dramatic rework of the compiler front end. So although C++14’s relaxed constexpr is much easier to use than C++11’s relatively draconian version, I decided to stick as much as possible to C++11 to see what I could do. This basically means:

• Functions are limited to a single return statement
• Constructor bodies are empty – everything must be done in the initialization list

But I can do conditionals with the ternary operator, and I can call functions. That means in theory I can do anything! I like functional programming!

Scott’s talk covers three broad application areas of constexpr: floating-point computations, parsing and containers. So I started at the beginning.

I decided to put everything in the cx namespace, and make use of a trick I learned from Scott. One of the snags with constexpr is that it can be at the compiler’s discretion. Let’s say you write a constexpr function, and call it. Unless you use the result in a context that is required at compile time (such as a non-type template argument, array size, or assigned to a constexpr variable), the compiler is free not to do the work at compile time, but instead generate a normal runtime function. And in my experience, compilers aren’t aggressive about doing work at compile time.

From one point of view this is somewhat desirable – or at least, we want constexpr functions to be able to do double duty as compile-time and runtime functions. Well, that’s a stated goal, but as we shall see, writing C++11-friendly constexpr functions sometimes results in formulations that are very different from what we’d usually expect/want at runtime. But let’s assume that I write a nice constexpr function:

constexpr float abs(float x) { return x >= 0 ? x : -x; }

Now what if I make a mistake using this? What if I call this function and forget to use the result in a constexpr variable? The compiler’s going to be quite happy to emit the function, and I will end up with runtime computation that I don’t want.

Avoiding accidental runtime work

There’s no way I know of at compile time to insist that the compiler stay in constexpr-land. But we can at least turn a runtime problem into a link-time error with a little trick:

namespace err { namespace { extern const char* abs_runtime_error; } }   constexpr float abs(float x) { return x >= 0 ? x : x < 0 ? -x : throw err::abs_runtime_error; }

This makes use of a couple of features of constexpr. First, throw is not allowed in constexpr functions (because what could it mean to throw an exception at compile time? Madness). But that doesn’t matter, because constexpr functions are effectively evaluated in a C++ interpreter, so as long as we never trigger the condition that causes evaluation of the throw, we’re OK. This compile-time interpreter also has a lot fewer features than the normal compiler that we’re used to, including (at least at time of writing) things like warnings for unreachable code and conditional expressions always being true, so we can even write things like:

constexpr float myFunc(float x) { return true ? x : throw some_error; }

And the compiler won’t make a sound.

The throw pattern

So what’s actually happening here? We’re getting two forms of protection. First, we can use this to enforce the domain, as we might with a square root function:

namespace err { namespace { extern const char* sqrt_error; } }   constexpr float sqrt(float x) { return x >= 0 ? sqrt_impl(x) : throw sqrt_error; }

If we accidentally pass a negative number to sqrt here, it will try to evaluate throw in a constexpr function and we’ll get a compile error. So that’s nice.

The second, more insidious error that we avoid is the one where we accidentally turn a compile-time function into a runtime one. If we call sqrt without (for example) assigning the result to a constexpr variable, and the compiler emits sqrt as a runtime function, the linker will fail trying to find sqrt_error. A link error isn’t quite as good as a compile error, but it’s a lot better than a runtime problem that might not even be discovered!

The final oddity here arises when you consider that this code might be in a header. Since constexpr functions are (we hope) computed at compile time, they are implicitly inline, so that’s OK. But perhaps the stranger thing is that we have a symbol in an anonymous namespace in a header. This, by the way, is called out in the ISO C++ Core Guidelines as a no-no: “It is almost always a bug to mention an unnamed namespace in a header file.” But in this case, we don’t want the user to be able to define the symbol, and the potential for ODR-violations/multiple definitions isn’t there – the symbol is never supposed to be defined or used. Maybe we’ve found the one reason for that “almost”.

So those are some of the basics of building constexpr machinery. Next: my dive into floating-point maths at compile time.

### CppCon 2015

Sunday, September 27th, 2015

CppCon 2015 is over and I’m home from Bellevue. It was a really great week; I learned a lot, talked to lots of interesting folks, and traded C++ tips and techniques with the cognoscenti. Having been to C++Now, I knew a bunch of people already, which was a good leg-up on socializing during the conference. I managed to persuade some of my colleagues to come out of their shells and talk to the great and the good – most of whom are very approachable! Chatting with Sean Parent about EoP/FMtGP before his keynote or lunching with STL talking about <functional> and Hearthstone – these are things that are well within the reach of mortals at CppCon 🙂

My highlights:

And of course, my own talk went pretty smoothly, which was good. Now I’m looking forward to digesting all the information, trying out all kinds of new things, seeing what can be put into practice.

### Exercising Ranges (part 8)

Wednesday, July 1st, 2015

(Start at the beginning of the series if you want more context.)

Having implemented iterate, there were a couple of other useful things from the Haskell Prelude that I wanted to have available with ranges. First, there’s cycle. It simply takes a finite range and repeats it ad infinitum. The functionality of the cursor is fairly simple, and uses patterns we’ve already seen, so here it is in its entirety:

template <bool IsConst> struct cursor { private: template <typename T> using constify_if = meta::apply<meta::add_const_if_c<IsConst>, T>; using cycle_view_t = constify_if<cycle_view>; cycle_view_t *rng_; range_iterator_t<constify_if<Rng>> it_;   public: using single_pass = std::false_type; cursor() = default; cursor(cycle_view_t &rng) : rng_{&rng} , it_{begin(rng.r_)} {} constexpr bool done() const { return false; } auto current() const RANGES_DECLTYPE_AUTO_RETURN_NOEXCEPT ( *it_ ) void next() { if (++it_ == end(rng_->r_)) it_ = begin(rng_->r_); } CONCEPT_REQUIRES((bool) BidirectionalRange<Rng>()) void prev() { if (it_ == begin(rng_->r_)) it_ = end(rng_->r_); --it_; } bool equal(cursor const &that) const { return it_ == that.it_; } };

Like all cursors, it implements current(), next() and equal(). Additionally, if the range passed in is bidirectional, so is the cursor, using the CONCEPT_REQUIRES macro to conditionally enable prev(). There is no sentinel type needed, and the cursor implements done() in a way that makes the range infinite.

In Haskell, cycle is occasionally useful, although it is often used with zip and filter to achieve the same effect that stride – a function already provided by the range library – offers. (I think this combo use of cycle, zip and filter was hinted at briefly in Eric’s ranges keynote when discussing the motivation that lead to stride.) But it’s sufficiently easy to implement and makes a nice example.

One more from the Prelude

A slightly more useful function from the Prelude is scan (in Haskell, scanl and scanr). Despite its somewhat obscure name, it’s easy to understand: it’s a fold that returns all the partial results in a sequence. For example:

Prelude> scanl (+) 0 [1,2,3,4] [0,1,3,6,10]

In C++, I’m not sure what to call this… perhaps accumulate_partials? I left it as scan, for want of a better name. (Edit: It’s partial_sum and it exists in the STL and the ranges library. So I’ll let this section just stand as a further example.)

One immediate thing to notice about scan is that the range it produces is one element longer than the input range. That’s easily done:

namespace detail { template<typename C> using scan_cardinality = std::integral_constant<cardinality, C::value == infinite ? infinite : C::value == unknown ? unknown : C::value >= 0 ? static_cast<ranges::cardinality>(C::value + 1) : finite>; } // namespace detail

In general, the action of the cursor is going to be the same as we saw with iterate: keep a cached value that is updated with a call to next(), and return it with a call to current().

Eric made a nice addition to the range version of accumulate which I also used for scan: in addition to the binary operation and the initial value, there is the option of a projection function which will transform the input values before passing them to the binary operation. So when next() updates the cached value, that code looks like this:

void update_value() { val_ = rng_->op_(val_, rng_->proj_(*it_)); }

It’s like having transform built in instead of having to do it in a separate step: a nice convenience. And that’s all there is to scan.

Conclusions

This has been a fun series of experiments with ranges, and I’ve learned a lot. I’ve made mistakes along the way, and I’m sure there are still parts of my code that reflect extant misunderstandings and general lack of polish. But as an experiment, this has been a success. The code works. As a user of the ranges library with some functional programming experience, it wasn’t too hard to put together solutions. And the Concept messages really helped, along with a knowledge transfer of iterator categories from the STL.

As an extender of the library adding my own views, it took a little effort to establish correct assumptions and understandings, but Eric’s code is clear and offers good patterns to build from. I’m sure a little documentation would have steered me more quickly out of problem areas that I encountered. There are some things that are still problematic and I’m not yet sure how to tackle: chiefly involving recursion. And at the moment, Concept messages don’t always help – I think digging into compiler diagnostics is going to be a fact of life as a range implementer.

C++ will never be as syntactically nice as Haskell is, but ranges offer our best chance yet at high-performance, composable, declarative, correct-by-construction code.

### Exercising Ranges (part 7)

Wednesday, July 1st, 2015

(Start at the beginning of the series if you want more context.)

Calculus operations

It’s relatively simple to do differentiation and integration of power series, since they’re just regular polynomial-style. No trigonometric identities, logarithms or other tricky integrals here. Just the plain rule for positive powers of x. For differentiation, we have:

$\frac{d}{dx}kx^{n} = knx^{n-1}$

Which is straightforwardly expressed with ranges: we simply take the tail of the power series and pairwise multiply with n, the power:

template <typename Rng> auto differentiate(Rng&& r) { return ranges::view::zip_with( std::multiplies<>(), ranges::view::iota(1), ranges::view::tail(std::forward<Rng>(r))); }

Integration is almost as easy. The fly in the ointment is that up until now, we have been working entirely with integers, or whatever underlying arithmetic type is present in the range. Here is where Haskell wins with its built in rational numbers and polymorphic fromIntegral for conversion from integral types. Integration brings division into the picture:

$\int_{}{} kx^n dx = \frac{kx^{n+1}}{n+1}$

Which means we need to turn a range of ints into a range of floats or doubles. Or use a rational number library. But anyway, sweeping these concerns under the rug, the actual code for integration is as easy as that for differentiation:

template <typename Rng> auto integrate(Rng&& r) { return ranges::view::concat( ranges::view::single(0), ranges::view::zip_with( [] (auto x, auto y) { return (float)x / (float)y; }, std::forward<Rng>(r), ranges::view::iota(1))); }

We prepend a zero as the constant of integration.

Calculus: in real life, more difficult than arithmetic, but with ranges, considerably easier. That was a welcome break from extreme mental effort. In the next part, I round out my range experiments for now, with a look at a couple more functions from the Haskell Prelude.

### Exercising Ranges (part 6)

Wednesday, July 1st, 2015

(Start at the beginning of the series if you want more context.)

Multiplying power series

Now we come to something that took considerable thought: how to multiply ranges. Doug McIlroy’s Haskell version of range multiplication is succinct, lazily evaluated and recursive.

(f:ft) * gs@(g:gt) = f*g : ft*gs + series(f)*gt

Which is to say, mathematically, that if we have:

$F(x) = f + xF'(x)$
$G(x) = g + xG'(x)$

then the product is given by:

$F(x)G(x) = fg + xF'(g + xG') + xfG'$

In C++, this function can be naively written:

template <typename R1, typename R2> auto mult(R1&& r1, R2&& r2) { auto h1 = *ranges::begin(std::forward<R1>(r1)); auto h2 = *ranges::begin(std::forward<R2>(r2)); return view::concat( view::single(h1 * h2), add(mult(view::tail(r1), r2), view::transform( view::tail(r2), [] (auto x) { return x * h1; } ))); }

If this could be compiled, it might even work. I think the lazy evaluation of the resulting range together with the monoidal_zip behaviour of add would terminate what looks like infinite recursion. But this doesn’t compile, because we don’t just have to deal with recursion at runtime: there is no termination of the recursion at compile-time resulting from the compiler trying to do type inference on the return value. Clang initially complained about exceeding the max template depth (default 256) and when I upped this to 1024, clang laboured over it for an impressive 50 seconds before crashing.

Back to pencil and paper

So I needed to think harder. It seemed that I would need to write a view representing multiplication of two series, and I started playing with polynomials – for the moment, assuming they were of the same degree. After a while I noticed something useful:

$F = f_0 : f_1 : f_2 : ...$
$G = g_0 : g_1 : g_2 : ...$
$FG = f_0g_0 : f_0g_1 + f_1g_0 : f_0g_2 + f_1g_1 + f_2g_0 : ...$

Each successive term is the sum of the terms for which multiplication produces the appropriate power of x. And the computation of any given term is:

$f_0g_n + f_1g_{n-1} + f_2g_{n-2} + ... + f_ng_0$

This is inner_product of a range with a reversed range! And we know we can reverse the range as long as it has a bidirectional iterator, because it’s just what we’ve already counted up to from the beginning – we already have the end iterator in hand. Next I had to think about how to deal with the cardinality of the product of two polynomials: it’s not enough to iterate to the end of both, even if they’re of the same degree, because there are higher powers of x that get generated in the “tail” of the product. The cardinality is actually the sum of the input cardinalities, minus one.

template<typename C1, typename C2> using series_mult_cardinality = std::integral_constant<cardinality, C1::value == infinite || C2::value == infinite ? infinite : C1::value == unknown || C2::value == unknown ? unknown : C1::value >= 0 && C2::value >= 0 ? static_cast<ranges::cardinality>(C1::value + C2::value - 1) : finite>;

A trick of the tail

And once we reach the end of both range iterators we are multiplying, we need to go an extra distance equal to the length we’ve just covered, to obtain the higher power coefficients. I do this with a length_ variable that tracks the distance advanced, and a tail_ variable that increments after the ranges are exhausted until it reaches length_, at which point we are at the end. The end sentinel comparison looks like this:

bool equal(cursor<IsConst> const &pos) const { return detail::equal_to(pos.it1_, end1_) && detail::equal_to(pos.it2_, end2_) && pos.tail_ == pos.length_; }

The only remaining thing is to deal with different degree polynomials, for which I use the same approach as monoidal_zip, keeping a diff_ variable and doing some bookkeeping. Computing the current value for the iterator becomes the inner product of one range leading up to the current position with the reverse of its counterpart in the other range (with appropriate adjustments to deal with the tail and the different degrees of the ranges):

auto compute_current() const { auto r1 = make_range( begin(rng_->r1_) + tail_ + (diff_ > 0 ? diff_ : 0), it1_); auto r2 = view::reverse( make_range( begin(rng_->r2_) + tail_ + (diff_ < 0 ? -diff_ : 0), it2_)); return inner_product(r1, r2, 0); }

So view::series_mult works. It isn’t perfect, because it imposes a bidirectional constraint on the arguments. It is O(n2), which can’t be avoided (we must ultimately multiply every coefficient by every other coefficient). To avoid the bidirectional constraint and just traverse the sequences once, I would need to go back to the recursive definition and reformulate it, somehow avoiding the recursive type, and I can think of a few possible ways to do that:

• Explicitly write a recursive type function, which includes a base case.
• Type-erase the multiplied range somehow (this would involve throwing away cardinality and maybe some other things, so it might not lead to a good outcome).
• Break the type recursion with a technique akin to recursive_variant.
• Somehow collapse the type of the range, how I’m not sure… the only real way to do this I think is by copying to a container and making a fresh range from it.
• Write my own system for lazy evaluation… which obviates ranges, really.

None of these are easy (and some have undesirable consequences). I continue to think on the matter, because Doug McIlroy’s Haskell formulations are recursive and beautiful, and multiplication is the easiest of the recursive definitions. Things get even more tricky with division and composition…

Tackling exponentiation

As a footnote to multiplication, I tried to define exponentiation. (I recently read the excellent From Mathematics to Generic Programming, and Chapter 7 is very instructive in the process of generalizing exponentiation from the basis of multiplication.) But I ran into basically the same issue: writing recursive functions to produce ranges is tricky because of the recursive type expansion. Maybe in this case, the problem is defined enough that I can write a recursive type function counterpart to avoid the compiler’s non-terminating type inference.

For the next part of this series, I’ll look at an easier topic: differentiation and integration of power series.

### Exercising Ranges (part 5)

Wednesday, July 1st, 2015

(Start at the beginning of the series if you want more context.)

Generalising iota

To pretty-print a reversible polynomial, we need to provide for reversibility of iota. So what is iota? It makes a range by repeatedly incrementing a value. What if instead of increment, we use an arbitrary function? We could do this with generate_n:

template <typename Rng> std::string to_string_reverse(Rng&& r) { auto s = ranges::size(r); auto powers = ranges::view::generate_n( [s] () mutable { return --s; }, s); return detail::to_string( ranges::view::zip(ranges::view::reverse(std::forward<Rng>(r)), powers)); }

But this isn’t quite as clean as it could be: the lambda mutates state. An alternative is found in the Haskell Prelude:

iterate :: (a -> a) -> a -> [a]

iterate applies a function repeatedly to an argument, feeding the result of the last iteration to the input of the next one. It’s very similar to generate, and we can write both iterate and iterate_n. The view takes a function and an initial value, and produces the sequence:

$x, f(x), f(f(x)), f(f(f(x))), ...$

Here’s the C++ for the skeleton of iterate_view:

template<typename G, typename T> struct iterate_view : view_facade<iterate_view<G, T>, infinite> { private: friend struct range_access; using result_t = concepts::Function::result_t<G, T>; semiregular_t<G> gen_; semiregular_t<result_t> val_; ... explicit iterate_view(G g, T&& t) : gen_(std::move(g)), val_(std::move(t)) {} ... };

The cursor is easily written just like generate‘s cursor: next() caches a new value, and current() returns it. Like generate, the view can only be traversed once, so the cached value is kept in the view itself rather than the iterator. And there is no end cursor.

struct cursor { iterate_view *view_; ... result_t current() const { return view_->val_; } void next() { view_->next(); } }; void next() { val_ = gen_(val_); }

For iterate_n, the approach is almost identical, except that next() decrements a count, and size() is also provided. Using iterate_n, we can generate a decreasing sequence and use it in the reverse print function for a polynomial, which we know is a SizedRange.

template <typename Rng> std::string to_string_reverse(Rng&& r) { auto s = ranges::size(r); auto powers = ranges::view::iterate_n( [] (auto i) { return i - 1; }, s-1, s); return detail::to_string( ranges::view::zip(ranges::view::reverse(std::forward<Rng>(r)), powers)); }

As for iterate itself, it’s potentially useful in lots more places. The obvious use cases (and test cases) are mathematical in nature, for instance my test case using the Collatz conjecture (aka Ulam conjecture, 3n+1 conjecture) sequence starting with 27:

int collatz(int n) { if (n & 1) return 3*n+1; return n>>1; }   DEF_TEST(Collatz27, Iterate) { auto m = view::iterate(collatz, 27) | view::take_while([] (int n) { return n != 1; }); EXPECT(ranges::find(m, 9232) != ranges::end(m)); return true; }

But there are also plenty of applications in other code, where we are updating the state of something based on its previous state: iterate separates the loop code and allows us to write a clean function that simply produces a new state from an old state. This is useful for games and simulations of all kinds. So iterate is a useful addition to the range views.

Now we can pretty-print polynomials properly, in the next part I’m going to take on the trickier subject of how to multiply them.

### Exercising Ranges (part 4)

Wednesday, July 1st, 2015

(Start at the beginning of the series if you want more context.)

On reversibility

It occurs to me that I’ve been touting reversibility as a good thing, both implicitly and explicitly (for polynomials), and the reason for that is not just that I might want to print polynomials in natural decreasing-power-of-x order.

The first reason reversibility is important to me: coming from the STL, we are used to containers being iterable. If a container supports bidirectional iterators, it’s necessarily reversible, since all containers are finite and one can obtain the end iterator. The same is not true of ranges – because of the generalization of end as a sentinel condition, ranges can be infinitely long. An infinitely long range has no end, so there’s no way to reverse it, because there’s nowhere to start. It can still support bidirectional iteration, but without reversibility that’s of limited use in most cases. So in my mind, reversibility is a better concept for ranges than simply the availability of bidirectional iteration. (There may be occasional uses for bidirectional iteration without the existence of end – Turing machine simulation springs to mind – but I think reversibility is the more common desire.)

The second reason might be called Postel’s Law of iterator categories: if you’re writing an algorithm (or range view), accept as weak an iterator category as you can, but provide as strong an iterator category as you can. Many views can work with input iterators. Some require forward iterators. Few require stronger guarantees. (They might work more efficiently if random access iteration is available, but this isn’t the same as saying they require that.) And on the flip side, a view should almost never degrade the iterator category that is passed in to it. Polynomials should be reversible, so views on them should be reversible.

In this vein, I tried to make monoidal_zip require only InputRange, but support reversing if the underlying ranges support it.

Pretty-printing polynomials

With that in mind, I embarked on pretty-printing polynomials and infinite series, wanting to print them both in infinite series form (increasing powers-of-x) and “more natural” polynomial form (decreasing powers-of-x).

We have a sequence of coefficients, and each corresponds to a power of x, so I start out by zipping the polynomial with an increasing sequence of integers representing the powers.

I first thought of view::transform and view::intersperse. I coded up a solution using view::transform (view::intersperse turned out to be problematic because the thing being interspersed – the plus or minus sign – depends on the sign of the coefficient).

Here’s a helper function for printing the power of x:

std::string x_to_power(int n) { if (n == 0) return {}; if (n == 1) return {"x"}; return {"x^" + std::to_string(n)}; }

And here’s the prettyprint function (that’s what I originally called it) using view::transform:

template <typename Rng> void prettyprint(Rng&& r) { bool start = true;   auto m = view::zip(std::forward<Rng>(r), view::iota(0)) | view::transform( [&start] (const std::pair<int, int>& p) -> std::string { std::string s; if (p.first != 0) { if (!start) s = p.first > 0 ? " + " : " - "; s += std::to_string(p.first) + x_to_power(p.second); start = false; } return s; });   ranges::for_each(m, [](const string& s){ cout << s; }); cout << endl; }

It has several bugs as-is, and I wasn’t happy with that start boolean hanging out there. And moreover, this seemed like a very procedural way to work and was annoying the functional programming part of my brain. What I actually wanted was not a function to output a sequence, but a fold to convert a sequence into a string, or to put it in C++ language, accumulate. The fold is independent of whether or not the sequence is reversed, so I factored out the “driver function” in anticipation of creating a reversing version of it.

template <typename Rng> std::string to_string(Rng&& r) { return detail::to_string( ranges::view::zip(std::forward<Rng>(r), ranges::view::iota(0))); }

And the fold looks like this:

template <typename Rng> std::string to_string(Rng&& r) { return ranges::accumulate( std::forward<Rng>(r), std::string(), [] (std::string s, const auto& p) { // if the coefficient is non-zero if (p.first != 0) { // treat the first one printed specially if (s.empty()) s += p.first > 0 ? "" : "-"; else s += p.first > 0 ? " + " : " - ";   auto coeff = p.first > 0 ? p.first : -p.first; // don't print coefficient of 1 (unless it's the constant) if (coeff != 1 || p.second == 0) { s += std::to_string(coeff); } s += detail::x_to_power(p.second); } return s; }); }

The complexity that creeps in here is, I think, the intrinsic complexity of printing a polynomial rather than complexity introduced by the wrong solution (start has disappeared, replaced with logic that deals with the first printed term). So this accumulate-based approach works with the power series approach of printing in increasing powers of x; but how do we reverse that zip? How can we reverse iota?

To answer that, I needed to go back to the Haskell Prelude, in the next part.