Archive for May, 2015

“The Lambda Trick”

Monday, May 18th, 2015

I just got back from C++Now, an excellent conference where C++ template metaprogramming experts abound. A phrase I overheard often was “the lambda trick”. It’s a trick for speeding up compiles when templates get deep.

Every C++ programmer knows that deep templates can slow down compilation. Most assume that this is because of excessive code generation (“code bloat”). But it turns out, I’m told, this isn’t actually the primary cause. As templates get deeper, type names get longer and longer, and it’s the compiler manipulating (hashing/mangling) these internally that is a major cause of slow compilation.

But when a compiler generates a closure type for a lambda, the name of that type doesn’t depend on its arguments. So it’s short, and it effectively truncates the long type name that was building up because of templates. Thus the lambda trick was born. In a post to the Boost developers’ mailing list on June 6th 2014, Louis Dionne shared this technique that he’d discovered while programming the Hana library, a functional library implemented with heavy template metaprogramming.

Let’s look at some code by way of explanation. The effect is most easily demonstrated with large tuples.

#include <iostream>
#include <tuple>
using namespace std;
 
int main(void)
{
  auto len = tuple_size<decltype(
      make_tuple( 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
                 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
                 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
                 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
                 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
                 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
                 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
                 91, 92, 93, 94, 95, 96, 97, 98, 99, 100))>::value;
  cout << len << endl;
}

This is a very large tuple, resulting in a template type with a long name, and in fact I need to give GCC an argument to increase the template depth it can deal with. When I compile it (with GCC 4.9.2) this is the result:

$ time g++ -std=c++1y -O2 -ftemplate-depth=2048 -c main.cpp -o tmp.o
 
real	0m1.616s
user	0m1.500s
sys	0m0.108s

1.6 seconds to compile that. And it produces the following assembly, optimized as you’d expect:

0000000000000000 <main>:
   0:	48 83 ec 08          	sub    $0x8,%rsp
   4:	be 64 00 00 00       	mov    $0x64,%esi
   9:	bf 00 00 00 00       	mov    $0x0,%edi
   e:	e8 00 00 00 00       	callq  13 <main+0x13>
   . . .

The constant 100 (0x64) is clearly visible there. And as expected, len is a compile-time constant. I can use it in template expressions or as an array size, for instance.

Now here is the same code with the “lambda trick” applied.

int main(void)
{
  auto list = [] (auto... xs) {
    return [=] (auto access) { return access(xs...); };
  };
 
  auto length = [] (auto xs) {
    return xs([] (auto... z) { return sizeof...(z); });
  };
 
  auto len = length(list( 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
                         11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                         21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
                         31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
                         41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                         51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
                         61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
                         71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
                         81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
                         91, 92, 93, 94, 95, 96, 97, 98, 99, 100));
  cout << len << endl;
}

The idea here is that list is a lambda that hides the long template type. It’s a generic lambda (C++14), so the operator() of its closure type is a variadic function template. We can still pass in a function to operate on the list – in this case length – and once again, len is known at compile time (tuple_size has been replaced with sizeof...). Compiling with -O2 gives identical code (although the lambda returned by list captures by value, everything is still computed at compile time), but look at the time taken:

$ time g++ -std=c++1y -g -O2 -c main.cpp -o tmp.o
 
real	0m0.325s
user	0m0.296s
sys	0m0.024s

It’s 5x faster! The lambda hid the template, truncating the internal type name, so the compiler didn’t have to deal with manipulating such long strings. That’s “the lambda trick” in a nutshell.

“In-place” might be less in-place than you think

Tuesday, May 5th, 2015

The intuitive view of algorithms that work in-place is that (it sounds obvious) they don’t use any extra space. Canonically in C/C++ we think of something like reversing an array, or that interview staple, removing spaces from an ASCII string, which we might write as:

#include <ctype.h>
 
int remove_spaces(char *s)
{
  char *readptr = s;
  char *writeptr = s;
  while (*readptr)
  {
    if (!isspace(*readptr))
    {
      *writeptr++ = *readptr++;
    }
    else
    {
      ++readptr;
    }
  }
  *writeptr = 0;
  return writeptr - s;
}

But “in-place” has a technical definition that is actually more relaxed than using no extra space, because this intuitive sense is too limiting under a formal analysis. As Wikipedia says:

In computational complexity theory, in-place algorithms include all algorithms with O(1) space complexity, the class DSPACE(1). This class is very limited; it equals the regular languages.[1] In fact, it does not even include any of the examples listed above.

For this reason, we also consider algorithms in L, the class of problems requiring O(log n) additional space, to be in-place.

And in the recent book From Mathematics to Generic Programming by Alexander Stepanov (original author of the STL) and Daniel Rose, page 215 offers the definition:

An algorithm is in-place (also called polylog space) if for an input of length n it uses O((log n)k) additional space, where k is a constant.

(I highly recommend the book, by the way.) If you’re only used to thinking about “in-place” intuitively, you could probably be persuaded that using constant extra space is admissible – after all, one probably has to use a few stack variables – but the idea that an algorithm is entitled to call itself “in-place” if it uses logarithmic extra space might be surprising. Yet that’s the technical definition: an author may call an algorithm “in-place” even though it uses O(log n) extra space, and even though it has to allocate that space.