“The Lambda Trick”

I just got back from C++Now, an excellent conference where C++ template metaprogramming experts abound. A phrase I overheard often was “the lambda trick”. It’s a trick for speeding up compiles when templates get deep.

Every C++ programmer knows that deep templates can slow down compilation. Most assume that this is because of excessive code generation (“code bloat”). But it turns out, I’m told, this isn’t actually the primary cause. As templates get deeper, type names get longer and longer, and it’s the compiler manipulating (hashing/mangling) these internally that is a major cause of slow compilation.

But when a compiler generates a closure type for a lambda, the name of that type doesn’t depend on its arguments. So it’s short, and it effectively truncates the long type name that was building up because of templates. Thus the lambda trick was born. In a post to the Boost developers’ mailing list on June 6th 2014, Louis Dionne shared this technique that he’d discovered while programming the Hana library, a functional library implemented with heavy template metaprogramming.

Let’s look at some code by way of explanation. The effect is most easily demonstrated with large tuples.

#include <iostream>
#include <tuple>
using namespace std;
 
int main(void)
{
  auto len = tuple_size<decltype(
      make_tuple( 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
                 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
                 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
                 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
                 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
                 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
                 81, 82, 83, 84, 85, 86, 88, 88, 89, 90,
                 91, 92, 93, 94, 95, 96, 97, 98, 99, 100))>::value;
  cout << len << endl;
}

This is a very large tuple, resulting in a template type with a long name, and in fact I need to give GCC an argument to increase the template depth it can deal with. When I compile it (with GCC 4.9.2) this is the result:

$ time g++ -std=c++1y -O2 -ftemplate-depth=2048 -c main.cpp -o tmp.o
 
real	0m1.616s
user	0m1.500s
sys	0m0.108s

1.6 seconds to compile that. And it produces the following assembly, optimized as you’d expect:

0000000000000000 <main>:
   0:	48 83 ec 08          	sub    $0x8,%rsp
   4:	be 64 00 00 00       	mov    $0x64,%esi
   9:	bf 00 00 00 00       	mov    $0x0,%edi
   e:	e8 00 00 00 00       	callq  13 <main+0x13>;
   . . .

The constant 100 (0x64) is clearly visible there. And as expected, len is a compile-time constant. I can use it in template expressions or as an array size, for instance.

Now here is the same code with the “lambda trick” applied.

int main(void)
{
  auto list = [] (auto... xs) {
    return [=] (auto access) { return access(xs...); };
  };
 
  auto length = [] (auto xs) {
    return xs([] (auto... z) { return sizeof...(z); });
  };
 
  auto len = length(list( 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
                         11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                         21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
                         31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
                         41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                         51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
                         61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
                         71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
                         81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
                         91, 92, 93, 94, 95, 96, 97, 98, 99, 100));
  cout << len << endl;
}

The idea here is that list is a lambda that hides the long template type. It’s a generic lambda (C++14) and its closure object operator() takes a variadic template argument. We can still pass in a function to work on the template – in this case length – and once again, len is known at compile time (tuple_size has been replaced with sizeof...). Compiling with -O2 gives identical code (although the lambda returned by list looks like it captures, everything gets computed at compile time), but look at the time taken:

$ time g++ -std=c++1y -g -O2 -c main.cpp -o tmp.o
 
real	0m0.325s
user	0m0.296s
sys	0m0.024s

It’s 5x faster! The lambda hid the template, truncating the internal type name, so the compiler didn’t have to deal with manipulating such long strings. That’s “the lambda trick” in a nutshell.

Leave a Reply