Archive for the ‘Programming’ Category

“The Lambda Trick”

Monday, May 18th, 2015

I just got back from C++Now, an excellent conference where C++ template metaprogramming experts abound. A phrase I overheard often was “the lambda trick”. It’s a trick for speeding up compiles when templates get deep.

Every C++ programmer knows that deep templates can slow down compilation. Most assume that this is because of excessive code generation (“code bloat”). But it turns out, I’m told, this isn’t actually the primary cause. As templates get deeper, type names get longer and longer, and it’s the compiler manipulating (hashing/mangling) these internally that is a major cause of slow compilation.

But when a compiler generates a closure type for a lambda, the name of that type doesn’t depend on its arguments. So it’s short, and it effectively truncates the long type name that was building up because of templates. Thus the lambda trick was born. In a post to the Boost developers’ mailing list on June 6th 2014, Louis Dionne shared this technique that he’d discovered while programming the Hana library, a functional library implemented with heavy template metaprogramming.

Let’s look at some code by way of explanation. The effect is most easily demonstrated with large tuples.

#include <iostream>
#include <tuple>
using namespace std;
 
int main(void)
{
  auto len = tuple_size<decltype(
      make_tuple( 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
                 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
                 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
                 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
                 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
                 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
                 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
                 91, 92, 93, 94, 95, 96, 97, 98, 99, 100))>::value;
  cout << len << endl;
}

This is a very large tuple, resulting in a template type with a long name, and in fact I need to give GCC an argument to increase the template depth it can deal with. When I compile it (with GCC 4.9.2) this is the result:

$ time g++ -std=c++1y -O2 -ftemplate-depth=2048 -c main.cpp -o tmp.o
 
real	0m1.616s
user	0m1.500s
sys	0m0.108s

1.6 seconds to compile that. And it produces the following assembly, optimized as you’d expect:

0000000000000000 <main>:
   0:	48 83 ec 08          	sub    $0x8,%rsp
   4:	be 64 00 00 00       	mov    $0x64,%esi
   9:	bf 00 00 00 00       	mov    $0x0,%edi
   e:	e8 00 00 00 00       	callq  13 <main+0x13>
   . . .

The constant 100 (0x64) is clearly visible there. And as expected, len is a compile-time constant. I can use it in template expressions or as an array size, for instance.
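To see what "usable at compile time" buys us, here's a minimal sketch (with a smaller tuple to keep it readable; the principle is identical):

```cpp
#include <tuple>

// tuple_size::value is a compile-time constant, so it can feed a
// static_assert or size an array.
constexpr auto len = std::tuple_size<decltype(
    std::make_tuple(1, 2, 3))>::value;

static_assert(len == 3, "len is known at compile time");

int arr[len];  // usable as an array bound
```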

Now here is the same code with the “lambda trick” applied.

int main(void)
{
  auto list = [] (auto... xs) {
    return [=] (auto access) { return access(xs...); };
  };
 
  auto length = [] (auto xs) {
    return xs([] (auto... z) { return sizeof...(z); });
  };
 
  auto len = length(list( 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
                         11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                         21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
                         31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
                         41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
                         51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
                         61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
                         71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
                         81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
                         91, 92, 93, 94, 95, 96, 97, 98, 99, 100));
  cout << len << endl;
}

The idea here is that list is a lambda that hides the long template type. It’s a generic lambda (C++14) and its closure object’s operator() takes a variadic template argument pack. We can still pass in a function to work on the captured values – in this case length – and once again, len is known at compile time (tuple_size has been replaced with sizeof...). Compiling with -O2 gives identical code (although the lambda returned by list looks like it captures, everything gets computed at compile time), but look at the time taken:

$ time g++ -std=c++1y -g -O2 -c main.cpp -o tmp.o
 
real	0m0.325s
user	0m0.296s
sys	0m0.024s

It’s 5x faster! The lambda hid the template, truncating the internal type name, so the compiler didn’t have to deal with manipulating such long strings. That’s “the lambda trick” in a nutshell.

“In-place” might be less in-place than you think

Tuesday, May 5th, 2015

The intuitive view of algorithms that work in-place is that (it sounds obvious) they don’t use any extra space. Canonically in C/C++ we think of something like reversing an array, or that interview staple, removing spaces from an ASCII string, which we might write as:

#include <ctype.h>

int remove_spaces(char *s)
{
  char *readptr = s;
  char *writeptr = s;
  while (*readptr)
  {
    /* cast: isspace requires a value representable as unsigned char */
    if (!isspace((unsigned char)*readptr))
    {
      *writeptr++ = *readptr++;
    }
    else
    {
      ++readptr;
    }
  }
  *writeptr = 0;
  return writeptr - s;
}

But “in-place” has a technical definition that is actually more relaxed than using no extra space, because this intuitive sense is too limiting under a formal analysis. As Wikipedia says:

In computational complexity theory, in-place algorithms include all algorithms with O(1) space complexity, the class DSPACE(1). This class is very limited; it equals the regular languages.[1] In fact, it does not even include any of the examples listed above.

For this reason, we also consider algorithms in L, the class of problems requiring O(log n) additional space, to be in-place.

And in the recent book From Mathematics to Generic Programming by Alexander Stepanov (original author of the STL) and Daniel Rose, page 215 offers the definition:

An algorithm is in-place (also called polylog space) if for an input of length n it uses O((log n)^k) additional space, where k is a constant.

(I highly recommend the book, by the way.) If you’re only used to thinking about “in-place” intuitively, you could probably be persuaded that constant extra space is admissible – after all, one usually needs a few stack variables – but the idea that an algorithm may call itself “in-place” while using logarithmic extra space might be surprising. That is nevertheless the technical definition: an author is entitled to describe an algorithm as “in-place” even though it uses O(log n) extra space (and must allocate that space).
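To make the definition concrete, here’s an illustration of mine (not from the book): recursive binary search uses O(log n) stack frames, so it counts as in-place under the polylog definition despite not using constant space.

```cpp
// Recursive binary search on a sorted array: each call halves the range, so
// the recursion depth - and hence the additional (stack) space - is O(log n).
// Not in-place by the intuitive O(1) standard; in-place by the polylog one.
bool contains(const int* a, int lo, int hi, int key)
{
  if (lo >= hi) return false;
  int mid = lo + (hi - lo) / 2;
  if (a[mid] == key) return true;
  return a[mid] < key ? contains(a, mid + 1, hi, key)
                      : contains(a, lo, mid, key);
}
```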

Recursive lambdas

Thursday, April 16th, 2015

One can assign a lambda to auto or to std::function. Normally one would assign a lambda to auto to avoid possible unwanted allocation from std::function. But if you want recursion, you need to be able to refer to the lambda variable inside the lambda, and you can’t do that if it’s assigned to auto. So how do you do recursive lambdas without using std::function?
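For contrast, here’s a sketch of the std::function version we’d like to avoid:

```cpp
#include <functional>

// Recursive lambda via std::function: because f has a concrete type before
// the lambda body is compiled, the body can name it - at the cost of type
// erasure and a possible heap allocation.
int fib(int n)
{
  std::function<int(int)> f = [&f] (int x) -> int {
    return x <= 2 ? 1 : f(x - 1) + f(x - 2);
  };
  return f(n);
}
```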

Use a fixed-point combinator (a Y combinator), of course.

#include <functional>
#include <iostream>
 
template<typename F>
struct Fix
{
  Fix(const F& f)
    : m_f(f)
  {}
 
  Fix(F&& f)
    : m_f(std::move(f))
  {}
 
  template <typename T>
  auto operator()(T t) const
  {
    return m_f(*this, t);
  }
 
  F m_f;
};
 
template<typename F>
auto fix(F&& f)
{
  return Fix<F>(std::forward<F>(f));
}
 
int main(void)
{
  auto f = fix(
      [] (auto& f, int x) -> int
      {
        if (x <= 2) return 1;
        return f(x-1) + f(x-2);
      });
  std::cout << f(6) << std::endl;
 
  return 0;
}

This code compiles with GCC 4.9.1, Clang 3.4.2 or MSVC 2015 preview (and produces “8” as the output). Clang allows omission of the lambda’s trailing return type.

Rules for using <random>

Wednesday, April 8th, 2015

These days, it’s easy to do the right thing.

Don’t do this:

  • Don’t use std::rand(). Ever.
  • Don’t use std::random_shuffle() to permute containers. (Too easy to misuse; can use std::rand() under the hood.)
  • Don’t use any kind of clock for a seed.
  • Don’t use mod (%) to get a random value into a range. It introduces bias.
  • Don’t use std::default_random_engine – it’s probably not the best choice and can vary by implementation.

Do this:

  • Use std::random_device as a source of entropy. It should be the best randomness your OS can supply.
  • Use std::shuffle() to permute containers. Its interface doesn’t permit misuse like std::random_shuffle().
  • Use std::mt19937 (or std::mt19937_64) as an RNG engine. (Or consider PCG, but that’s not standard – yet.)
  • Seed the engine with as much seed data as it requires for its internal state (using its state_size member), otherwise you’re throwing away entropy. For std::mt19937 this is 624 32-bit values (not just a single 32-bit value). Use std::seed_seq to help with initialization.
  • RNG engines (particularly std::mt19937) can be expensive to initialize (and have large amounts of internal state), so don’t put them on the stack unnecessarily. Consider using thread_local.
  • Use std::uniform_int_distribution (and the other distributions) to get your random number into the required range. The distributions are carefully crafted to be unbiased (why you shouldn’t use %). Note that they work on closed (inclusive) ranges, not half-open ranges – otherwise the max value would be unreachable.

Wrong:

// init RNG
srand(time(NULL));
 
// get random number
int n = rand() % 100;

Right:

// init RNG (needs <algorithm>, <array>, <cstdint>, <functional>, <random>)
std::array<std::uint32_t, std::mt19937::state_size> seed_data;
std::random_device r;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
std::mt19937 gen(seq);
 
// get random number
std::uniform_int_distribution<int> dis(0, 99);
int n = dis(gen);
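The thread_local advice above can be sketched like this (helper names here are mine, not standard):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>
#include <random>

// One fully-seeded engine per thread, constructed on first use - this
// avoids repeatedly paying mt19937's construction and seeding cost.
std::mt19937& thread_rng()
{
  thread_local std::mt19937 gen = [] {
    std::array<std::uint32_t, std::mt19937::state_size> seed_data;
    std::random_device r;
    std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
    std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
    return std::mt19937(seq);
  }();
  return gen;
}

// Usage: a die roll in the closed range [1, 6]
int roll_d6()
{
  std::uniform_int_distribution<int> dis(1, 6);
  return dis(thread_rng());
}
```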

Edit: Alternatively, Melissa O’Neill has produced randutils which wraps C++11 random number generation in a nice interface, and does the right thing. She knows what she’s talking about: her work on randomness is worth your time.

C++ Tuples: the missing functionality

Monday, April 6th, 2015

C++ provides a strange mix of compile-time and runtime functionality for dealing with tuples. There are some interesting parts, like std::tie to destructure a tuple, and std::tuple_cat to join together several tuples into one. So there is evidence that the standard has been influenced by some functional programming ideas, but I don’t think the full power of tuples has been realized (in both senses), and I found myself thinking about some missing parts.

Like why do we have std::tuple_cat and std::get<0> (aka std::tuple_head), but not std::tuple_cons or std::tuple_tail? So here are some possible implementations.

First, something missing from type_traits which will help us constrain things:

template <typename T>
struct is_tuple : public std::false_type {};
 
template <typename... Ts>
struct is_tuple<std::tuple<Ts...>> : public std::true_type {};

Both tuple_cons and tuple_tail can make use of an expansion of std::index_sequence to work their magic, with a user-facing function that provides the appropriate std::index_sequence to an overload.

For tuple_cons, we just call std::make_tuple with the new element and the result of expanding std::get across the other tuple:

template <typename U, typename T, std::size_t ...Is>
auto tuple_cons(U&& u, T&& t, std::index_sequence<Is...>,
    std::enable_if_t<is_tuple<std::decay_t<T>>::value>* = nullptr)
{
  return std::make_tuple(std::forward<U>(u), 
                         std::get<Is>(std::forward<T>(t))...);
}
 
template <typename U, typename T>
auto tuple_cons(U&& u, T&& t,
    std::enable_if_t<is_tuple<std::decay_t<T>>::value>* = nullptr)
{
  using Tuple = std::decay_t<T>;
  return tuple_cons(std::forward<U>(u), std::forward<T>(t),
      std::make_index_sequence<std::tuple_size<Tuple>::value>());
}

And for tuple_tail, we construct a std::index_sequence of length n-1, then offset it by one in the expansion to obtain the tail:

template <typename T, std::size_t ...Is>
auto tuple_tail(T&& t, std::index_sequence<Is...>,
    std::enable_if_t<is_tuple<std::decay_t<T>>::value>* = nullptr)
{
  return std::make_tuple(std::get<Is + 1>(std::forward<T>(t))...);
}
 
template <typename T>
auto tuple_tail(T&& t, 
    std::enable_if_t<is_tuple<std::decay_t<T>>::value>* = nullptr)
{
  using Tuple = std::decay_t<T>;
  return tuple_tail(std::forward<T>(t),
      std::make_index_sequence<std::tuple_size<Tuple>::value - 1>());
}
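Putting them to work looks like this. (The definitions are restated here so the snippet stands alone, with the SFINAE moved to a default template argument purely for brevity – the behaviour is the same.)

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

template <typename T> struct is_tuple : std::false_type {};
template <typename... Ts>
struct is_tuple<std::tuple<Ts...>> : std::true_type {};

// cons: prepend an element by expanding std::get across the tuple
template <typename U, typename T, std::size_t ...Is>
auto tuple_cons(U&& u, T&& t, std::index_sequence<Is...>)
{
  return std::make_tuple(std::forward<U>(u),
                         std::get<Is>(std::forward<T>(t))...);
}

template <typename U, typename T,
          typename = std::enable_if_t<is_tuple<std::decay_t<T>>::value>>
auto tuple_cons(U&& u, T&& t)
{
  return tuple_cons(std::forward<U>(u), std::forward<T>(t),
      std::make_index_sequence<std::tuple_size<std::decay_t<T>>::value>());
}

// tail: an index_sequence of length n-1, offset by one in the expansion
template <typename T, std::size_t ...Is>
auto tuple_tail(T&& t, std::index_sequence<Is...>)
{
  return std::make_tuple(std::get<Is + 1>(std::forward<T>(t))...);
}

template <typename T,
          typename = std::enable_if_t<is_tuple<std::decay_t<T>>::value>>
auto tuple_tail(T&& t)
{
  return tuple_tail(std::forward<T>(t),
      std::make_index_sequence<std::tuple_size<std::decay_t<T>>::value - 1>());
}
```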

Another myth, about C++ lambdas

Monday, March 16th, 2015

Myth: Lambda expressions can cause heap allocations.

I see this myth coming up a lot. People think that lambdas can cause a heap allocation – on Reddit, on StackOverflow, on Channel9 comments, in personal conversations. I don’t blame people for thinking this – C++ is complex. But it always seems to be some vague notion of “heap allocations can happen under some circumstances”. So let’s think about it more clearly.

Lambdas are a language feature. The compiler will create a closure type representing the function literal you write. The closure object is a value type, created as a prvalue, with automatic storage when assigned to auto (the usual choice, since its type is unutterable). The only way heap allocation can occur is if you capture by value a variable whose copy constructor does heap allocation (and this isn’t a consequence of the lambda per se, as I hope you can see).

You can use lambdas all day long on the stack, assigned to auto variables, and never incur a heap allocation. Since lambdas are a language feature, it’s unclear if they could do allocation even if it were desirable – in C++, allocation is in the realm of library code, and I think it would almost certainly entail semantic difficulties for the language, as well as difficulties with exceptions and the like.

Where the confusion comes from, perhaps, is a conflation of lambdas with std::function. (Because lambdas are commonly assigned to std::function for storage and passing around?) std::function is a library construct that wraps a callable using type erasure, and may very well incur heap allocation. In order to wrap any callable, it has to create an internal class in the standard type-erasure way, and that might involve an allocation.

However, std::function does have a couple of common optimizations up its sleeve. First is the so-called “small functor optimization” – a buffer inside a std::function object that is typically big enough to store a few pointers and can be used to store the internal object, assuming it will fit. This allows std::function to avoid heap allocation in common cases (typically just one or maybe two captures).

The second optimization is a space optimization. The typical type-erasure pattern involves calling a virtual function on an internal base class, whose derived class is parametrized on the actual type passed in. But every lambda has a different type, so a naive implementation of this could result in many vtables being generated. So std::function commonly optimizes the call machinery, basically by supplanting the normal C++ virtual call process with a free function implementation that doesn’t cause vtable bloat.

And that’s about it. Lambdas don’t (intrinsically) cause any heap allocation. Ever. When you assign to a std::function, that may cause an allocation, but for a small number of captures, the small functor optimization will probably apply.
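A quick way to convince yourself (a sketch; exact sizes are implementation details, but this holds on mainstream compilers):

```cpp
#include <functional>

// A capturing lambda is just a stack object whose size is its captures;
// no heap is involved. Type erasure - and any allocation - only enters
// when the closure is wrapped in std::function.
int demo()
{
  int x = 42;
  auto lam = [x] { return x; };   // closure object, automatic storage
  static_assert(sizeof(lam) <= sizeof(void*),
                "a single int capture; no hidden heap pointer");
  std::function<int()> f = lam;   // may allocate, though the small functor
                                  // optimization usually covers one int
  return lam() + f();
}
```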

A persistent myth about STL’s remove (and friends)

Sunday, March 8th, 2015

There seems to be a persistent myth about STL’s remove, remove_if, etc.
Ask even a relatively experienced C++ programmer to explain this code.

vector<int> v = { 1,2,3,4,5 };
v.erase(remove_if(v.begin(), v.end(),
                  [] (int i) { return (i & 1) == 0; }),
        v.end());

They’ll recognize the erase-remove idiom and correctly say that it’s removing even numbers from the vector so that what’s left is { 1,3,5 }. So then you ask them how it works, and likely as not, they say something like:

“The remove_if moves all the elements you want to remove to the end, then the erase gets rid of them.”

This isn’t what remove_if does. (And likewise, the other remove* algorithms.) If it did that – which is more work than it does – it would in fact be partition. What remove does is move the elements that won’t be removed to the beginning. A suitable implementation of remove_if, which you can see in the libc++ source, is:

template <class _ForwardIterator, class _Predicate>
_ForwardIterator
remove_if(_ForwardIterator __first, _ForwardIterator __last, 
          _Predicate __pred)
{
    __first = _VSTD::find_if<
        _ForwardIterator, 
        typename add_lvalue_reference<_Predicate>::type>
            (__first, __last, __pred);
    if (__first != __last)
    {
        _ForwardIterator __i = __first;
        while (++__i != __last)
        {
            if (!__pred(*__i))
            {
                *__first = _VSTD::move(*__i);
                ++__first;
            }
        }
    }
    return __first;
}

The elements that got removed just got overwritten. They didn’t get moved to the end. After the call to remove_if (before the erase call), the vector was { 1,3,5,4,5 }.
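It’s easy to see for yourself by dropping the erase:

```cpp
#include <algorithm>
#include <vector>

// remove_if without the erase: the vector's size is unchanged, and the
// "removed" elements were overwritten at the front - not moved to the end.
std::vector<int> after_remove_only()
{
  std::vector<int> v = { 1, 2, 3, 4, 5 };
  std::remove_if(v.begin(), v.end(), [] (int i) { return (i & 1) == 0; });
  return v;  // { 1, 3, 5, 4, 5 }
}
```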

This means that remove can potentially invalidate invariants, e.g. if we expect the sequence to contain unique values (the erase restores the invariant, so erase-remove should always be paired in this case). And if this had been a container of pointers, in all likelihood, memory would have been leaked. Hence item 33 in Effective STL, “Be wary of remove-like algorithms on containers of pointers”.

So remember, remove doesn’t move things to the end. Next time you hear that, congratulate the programmer saying it – they’re one of today’s lucky 10,000!

Adopting C++11: no-brainer features

Friday, February 20th, 2015

C++11/14 is a significant change from C++98/03, and features like move semantics take a while to get used to. Also, people tend to be quite conservative about adopting new features (especially if they look unfamiliar). It took us in the games industry a while to move to C++ from C. But here are what I consider the no-brainer, programming-in-the-small things to adopt. Extra safety and expressiveness with zero runtime impact and pretty much no potential for misuse (never say never, but you’d really have to go out of your way).

1. using instead of typedef

Not only is it a whole two characters fewer to type, it’s easier to read in the common function pointer case, and you can use it for template aliases, which often saves a lot more verbosity in the rest of the code. No more typename everywhere! Compare:

// old and busted: typedef
template <class T>
struct Foo
{
  // I can't typedef Foo<T>::type 
  // and I have to use typename everywhere
  typedef T type;
};
 
// Typical usage: a function pointer
typedef void (*funcptr)(int);
// new hotness: using
template <class T>
struct Foo
{
  using type = T;
};
 
// This pattern is used a lot in C++14 STL
template <class T>
using foo_t = typename Foo<T>::type;
 
// Function pointer is easier to read
using funcptr = void (*)(int);

2. static_assert

Compile-time asserts; what’s not to like? Odds are you have a homegrown C++98 TMP version of this somewhere; now it’s part of the language. Extra safety for zero runtime cost.
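For instance:

```cpp
#include <cstdint>
#include <type_traits>

// Checked entirely at compile time; zero bytes, zero cycles at runtime.
static_assert(sizeof(std::int32_t) == 4, "int32_t is 4 bytes");

template <typename T>
T twice(T t)
{
  static_assert(std::is_arithmetic<T>::value,
                "twice requires an arithmetic type");
  return t + t;
}
```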

3. nullptr

Again, extra safety for zero cost. Ditch your zeros and NULLs, and you can safely have functions overloaded on pointer and integral types.
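A small sketch of why this matters:

```cpp
// With a literal 0 (or NULL), overload resolution picks the int version -
// rarely what was meant. nullptr has its own type, std::nullptr_t, and
// unambiguously selects the pointer overload.
int f(int)         { return 1; }
int f(const char*) { return 2; }
```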

4. scoped enums (enum class)

The compiler can help prevent accidental confusion between types, and the enum values don’t leak any more. Hurrah! (It’s like Haskell’s newtype for integers!)
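For example:

```cpp
// Scoped enum values don't leak into the surrounding scope and don't
// convert silently to int, so mixing up unrelated enums is a compile error.
enum class Color { Red, Green, Blue };
enum class Fruit { Apple, Orange };

bool is_red(Color c) { return c == Color::Red; }
// is_red(Fruit::Apple);  // error: no conversion from Fruit to Color
// int n = Color::Red;    // error: no implicit conversion to int
```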

5. forward declarations of enums, underlying type for enums

This works on (new) scoped and (old) unscoped enums alike. You don’t have to put in fake bit-width values to force the size of an enum any more, and you can hide the values separately from the declaration, so you don’t need to recompile everything when you add a value.
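A sketch of the header/implementation split this enables (both halves shown in one file here):

```cpp
#include <cstdint>

// In the header: the size is pinned by the underlying type, so the enum can
// be forward-declared and used without its values being visible.
enum class ErrorCode : std::uint8_t;
void report(ErrorCode e);  // fine while ErrorCode is still incomplete

// In the implementation file: the values, changeable without forcing
// clients of the header to recompile.
enum class ErrorCode : std::uint8_t { Ok, NotFound, Timeout };
static_assert(sizeof(ErrorCode) == 1, "no fake bit-width values needed");
```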

6. override

When (for example) the class you’re deriving from changes upstream, and now you’re accidentally not overriding a member function you thought you were… you’d really like to know that. With override, a bug that might take you a couple of hours to track down becomes a trivial-to-fix compile error. (Often mentioned in the same breath as override is final, and it’s fine, but much less usefully applicable.)
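In code:

```cpp
struct Base
{
  virtual int process(int x) { return x; }
  virtual ~Base() = default;
};

struct Derived : Base
{
  // If Base::process's signature changes upstream - say to process(long) -
  // this line becomes an immediate compile error rather than a silently
  // non-overriding function.
  int process(int x) override { return x * 2; }
};
```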

Now, there are also a couple of things to stop using.

1. rand()

Stop using rand(). Really, it’s bad. Deep down, we always knew it was “bad” but maybe we convinced ourselves it was OK for “small”/“quick-and-dirty” tasks. It isn’t. And now, it’s not any easier or faster than just doing the Right Thing with <random>. STL explains it all in rand() considered harmful.

2. bind

Lambdas supersede bind in every way. They’re more straightforward to write, easier to read, as powerful, and at least as efficient if not more so, and just… you know, not so weird. And if you think lambdas take some getting used to, well, I don’t think I ever got used to bind.
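Side by side:

```cpp
#include <functional>

int sub(int a, int b) { return a - b; }

// bind: argument routing through cryptic placeholders...
auto bound   = std::bind(sub, std::placeholders::_2, std::placeholders::_1);
// ...a lambda says the same thing readably
auto swapped = [] (int a, int b) { return sub(b, a); };
```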

VS2010 woes: tuples as map keys

Friday, February 20th, 2015

Another day, another compiler/library bug. If you’re unfortunate enough to still be using Visual Studio 2010, don’t use tuples as map keys.

#include <map>
#include <tuple>
#include <utility>
 
using namespace std;
 
typedef tuple<int *, int> FooT;
typedef map<FooT, int> MapT;
 
int main(int, char*[])
{
  MapT m;
 
  // put a value in the map
  {
    FooT f(nullptr, 0);
    m[f] = 1337;
  }
 
  // retrieve a value
  int s;
  {
    FooT f(nullptr, 0);
    auto i = m.find(f);
    if (i != m.end())
    {
      // VS2010 says:
      // error C2440: 'initializing' : 
      // cannot convert from 'const FooT' to 'int *'
      s = i->second;
    }
  }
 
  return 0;
}

A problem with C++ lambdas?

Wednesday, February 18th, 2015

C++ lambdas are wonderful for all sorts of reasons (especially with their C++14-and-beyond power). But I’ve run into a problem that I can’t think of a good way around yet.

If you’re up to date with C++, of course you know that rvalue references and move semantics are a major thing. At this point, there are a lot of blog posts, YouTube/conference videos, and even books about how they work, and also about how forwarding references (formerly known as universal references) work with templates and std::forward to provide nice optimal handling of objects with move semantics.

In C++14 the ability to handle lambda captures with move semantics was added with generalized lambda capture; another piece of the puzzle as far as move-semantic completeness goes.

Another lesser-known but important piece of the complete picture is that class member functions can have reference qualifiers. (You can find this in the standard in section 9.3.1 [class.mfct.non-static].) This means that just as we write member functions with const modifiers, we can overload member functions with & or && modifiers, and they will be selected according to the value category of the object they are called on. So you know when you can safely call std::move on data members.
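Ref qualifiers in action (a sketch):

```cpp
#include <string>
#include <utility>

struct Holder
{
  std::string s = "payload";

  // Selected when called on an lvalue: the object lives on, so copy.
  std::string get() const & { return s; }
  // Selected when called on an rvalue: the object is expiring, so it's
  // safe to move the member out.
  std::string get() &&      { return std::move(s); }
};
```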

Now, lambdas are conceptually like anonymous classes where the body is the operator() and the captured variables are the data members. And we can write lambdas with a mutable modifier indicating that the data members are mutable. (By far the common case is for them to be const, so the ordinary usage of const on a member function is inverted for lambdas.)

I said conceptually because it turns out they aren’t completely like that in at least one important way: lambdas can’t have reference qualifiers. Maybe for good reason – how would the compiler implement that? How would a programmer specify the overloads they wanted? These are tricky questions to answer well. But it is a problem – as far as I can tell, there is no way to know, inside a lambda, what the value category of the closure object is. So the performance promise of the move-semantic model falls down in the face of safety concerns: I don’t know whether I can safely move from a captured variable inside a lambda.

If anyone has any ideas about this, please let me know. Google and StackOverflow can tell me all about how move captures work with lambdas, but nothing about how to move things out of lambdas safely, or divine the value category of a lambda object. All the things I’ve tried have either not worked, or have resulted in suboptimalities of various kinds. (And frankly, if anything had worked, at this point I’d put it down to a compiler quirk and not to be relied on.)

As far as I can tell, it’s a definite shortcoming of C++ that there’s no way to do this in a compile-time, type-inferred, lambda-land way. I don’t see this in the core language issues – is it a known problem, or is there a known way to solve it that I don’t know yet?