Compile-time counters, revisited – Why is a raven like a writing desk?

(Start at the beginning of the series – and all the source can be found in my github repo)

Some time ago I read a blog post showing how to make a compile-time counter: a constexpr function that would return monotonically increasing integers. When I first read it I didn’t really take the time to understand it fully, but now that I was on a compile-time computation kick, I decided to grok it fully.

Without getting too far into the nitty-gritty (go read the other blog post if you’re interested), the technique relies on using template instantiations to “bring-to-life” functions that affect future template instantiations. Thus we have a flag that declares (but does not yet define) a friend function:

template 
struct flag
{
  friend constexpr int adl_flag (flag);
};

And then a recursive reader function template that uses ADL and SFINAE to match against the (as-yet-undefined) friend function(s), bottoming out at zero:

template {})>
constexpr int reader(int, flag)
{
  return N;
}

template 
constexpr int reader(
    float, flag, int R = reader(0, flag{}))
{
  return R;
}

constexpr int reader(float, flag<0>)
{
  return 0;
}

And finally, a writer that, when instantiated with the result of calling reader, instantiates the friend function, making the next call to reader terminate at one level higher:

template 
struct writer
{
  friend constexpr int adl_flag (flag)
  {
    return N;
  }
  static constexpr int value = N;
};

This is a curiosity, right? A foible of C++, a fairy tale told by wizened programmers to fresh graduates to simultaneously impress and revolt them, no? Could anything useful be done with this? Well, C++ is full of such tales, and it’s a short hop from can’t-look-away-revolting to established feature – after all, template metaprogramming was discovered pretty much by accident…

In fact, while at CppCon, I met up with Ansel and Barbara from CopperSpice, who are using a very similar technique to do away with the Qt Metaobject Compiler.

Max recursion depth, we meet again

My first thought was that this technique suffers at the hands of my old enemy, maximum recursion depth. In this case, maximum template instantiation depth, which despite a standard-recommended 1024, is frequently just 256 – lower than the recommended constexpr recursion depth of 512. So let’s do something about that.

Well, one quick-and-dirty way to do this is to compute the count in two halves: lower bits and upper bits, and then stick them together. When we reach the max on the lower bits, we’ll roll over one of the upper bits. So we have two flags representing the high and low, with the low flag also parameterized on the high bits:

template 
struct flag1
{
  friend constexpr int adl_flag1(flag1);
};
template 
struct flag2
{
  friend constexpr int adl_flag2(flag2);
};

And two readers: the low bits reader is in a struct to avoid partial specialization of a function, because it’s effectively parameterized on the high bits as well as the low bits.

template 
struct r1
{
  template {})>
  static constexpr int reader(int, flag1)
  {
    return L;
  }
  template 
  static constexpr int reader(
      float, flag1, int R = reader(0, flag1{}))
  {
    return R;
  }
  static constexpr int reader(float, flag1)
  {
    return 0;
  }
};

template {})>
constexpr int reader(int, flag2)
{
  return H;
}
template 
constexpr int reader(
    float, flag2, int R = reader(0, flag2{}))
{
  return R;
}
constexpr int reader(float, flag2<0>)
{
  return 0;
}

The low bits writer looks much the same as before, and the high bits writer is specialized on a bool indicating whether or not to instantiate the friend function, which we only do when the low bits roll over:

template 
struct writehi
{
  friend constexpr int adl_flag2(flag2)
  {
    return H;
  }
  static constexpr int value = H;
};

template 
struct writehi
{
  static constexpr int value = H;
};

The writer can then write both the high and low bits accordingly:

template 
struct writer
{
  static constexpr int hi_value =
    writehi::value;
  static constexpr int lo_value =
    writelo::value;
  static constexpr int value = (H << BIT_DEPTH) + L;
};

Using this approach we can easily increase the maximum number that we can get out of our counter from 256 to 16k or so, which is enough for one translation unit, for me.

Random acts of compiler abuse

Now, a counter is OK, as far as that goes, but something more useful might be nice... how about random numbers? But surely only a madman would try to implement a Mersenne Twister in C++11 constexpr-land. (This despite the fact that I did SHA256 string hashing.) No, these days when I think of random numbers, I think of Melissa O'Neill and her excellent PCG32. If you haven't seen her video, go watch it. I'll wait here. PCG32 has some distinct advantages for constexpr implementation:

It's easy to implement
It's fast
It's not a lot of code (seriously, ~10 lines)
It's easy to implement
It's understandable
It's easy to implement

Here's a simple implementation of the whole 32-bit affair:

constexpr uint64_t pcg32_advance(uint64_t s)
{
  return s * 6364136223846793005ULL
    + (1442695040888963407ULL | 1);
}

constexpr uint64_t pcg32_advance(uint64_t s, int n)
{
  return n == 0 ? s : pcg32_advance(pcg32_advance(s), n-1);
}

constexpr uint32_t pcg32_xorshift(uint64_t s)
{
  return ((s >> 18u) ^ s) >> 27u;
}

constexpr uint32_t pcg32_rot(uint64_t s)
{
  return s >> 59u;
}

constexpr uint32_t pcg32_output(uint64_t s)
{
  return (pcg32_xorshift(s) >> pcg32_rot(s))
    | (pcg32_xorshift(s) << ((-pcg32_rot(s)) & 31));
}

And now we can use exactly the same pattern as the constexpr counter, except the writer, instead of giving us an integer, will give us a random number (and we also plumb through the random seed S as a template parameter):

template 
struct writer
{
  static constexpr int hi_value =
    writehi::value;
  static constexpr int lo_value =
    writelo::value;
  static constexpr uint32_t value =
    pcg32_output(pcg32_advance(S, (H << BIT_DEPTH) + L));
};

There is probably a way to improve this by storing the actual PCG-computed value in a template instantiation, rather than simply using the integer to pump the PCG every time as I am doing. But it's a proof-of-concept and works well enough for now. A simple macro will give us a suitable seed for our compile-time RNG by hashing a few things together:

#define cx_pcg32 \
  cx::pcg32

And now we have a compile-time RNG that is seeded differently every compile, every file, every line. There are potentially a lot of template instantiations - maybe using a lot of memory in the compiler - but we can do some useful things with this.

Leave a comment