Archive for March, 2015

Another myth, about C++ lambdas

Monday, March 16th, 2015

Myth: Lambda expressions can cause heap allocations.

I see this myth coming up a lot. People think that lambdas can cause a heap allocation – on Reddit, on StackOverflow, on Channel9 comments, in personal conversations. I don’t blame people for thinking this – C++ is complex. But it always seems to be some vague notion of “heap allocations can happen under some circumstances”. So let’s think about it more clearly.

Lambdas are a language feature. The compiler will create a closure object representing the function literal you write. The closure object is a value type, created as a prvalue and with automatic storage when assigned to auto (as you must, because its type is unutterable). The only way heap allocation can occur is if you by-value capture a variable whose copy constructor does heap allocation (and this isn’t a consequence of the lambda per se, as I hope you can see).

You can use lambdas all day long on the stack, assigned to auto variables, and never incur a heap allocation. Since lambdas are a language feature, it’s unclear if they could do allocation even if it were desirable – in C++, allocation is in the realm of library code, and I think it would almost certainly entail semantic difficulties for the language, as well as difficulties with exceptions and the like.

Where the confusion comes from, perhaps, is a conflation of lambdas with std::function. (Because lambdas are commonly assigned to std::function for storage and passing around?) std::function is a library construct that wraps a callable using type erasure, and may very well incur heap allocation. In order to wrap any callable, it has to create an internal class in the standard type-erasure way, and that might involve an allocation.

However, std::function does have a couple of common optimizations up its sleeve. First is the so-called “small functor optimization” – a buffer inside a std::function object that is typically big enough to store a few pointers and can be used to store the internal object, assuming it will fit. This allows std::function to avoid heap allocation in common cases (typically just one or maybe two captures).

The second optimization is a space optimization. The typical type-erasure pattern involves calling a virtual function on an internal base class, whose derived class is parametrized on the actual type passed in. But every lambda has a different type, so a naive implementation of this could result in many vtables being generated. So std::function commonly optimizes the call machinery, basically by supplanting the normal C++ virtual call process with a free function implementation that doesn’t cause vtable bloat.

And that’s about it. Lambdas don’t (intrinsically) cause any heap allocation. Ever. When you assign to a std::function, that may cause an allocation, but for a small number of captures, the small functor optimization will probably apply.

A persistent myth about STL’s remove (and friends)

Sunday, March 8th, 2015

There seems to be a persistent myth about STL’s remove, remove_if, etc.
Ask even a relatively experienced C++ programmer to explain this code.

vector<int> v = { 1,2,3,4,5 };
v.erase(remove_if(v.begin(), v.end(),
                  [] (int i) { return (i & 1) == 0; }),
        v.end());

They’ll recognize the erase-remove idiom and correctly say that it’s removing even numbers from the vector so that what’s left is { 1,3,5 }. So then you ask them how it works, and likely as not, they say something like:

“The remove_if moves all the elements you want to remove to the end, then the erase gets rid of them.”

This isn’t what remove_if does. (And likewise, the other remove* algorithms.) If it did that – which is more work than it does – it would in fact be partition. What remove does is move the elements that won’t be removed to the beginning. A suitable implementation of remove_if, which you can see in the libc++ source, is:

template <class _ForwardIterator, class _Predicate>
_ForwardIterator
remove_if(_ForwardIterator __first, _ForwardIterator __last, 
          _Predicate __pred)
{
    __first = _VSTD::find_if<
        _ForwardIterator, 
        typename add_lvalue_reference<_Predicate>::type>
            (__first, __last, __pred);
    if (__first != __last)
    {
        _ForwardIterator __i = __first;
        while (++__i != __last)
        {
            if (!__pred(*__i))
            {
                *__first = _VSTD::move(*__i);
                ++__first;
            }
        }
    }
    return __first;
}

The elements that got removed just got overwritten. They didn’t get moved to the end. After the call to remove_if (before the erase call), the vector was { 1,3,5,4,5 }.

This means that remove can potentially invalidate invariants, e.g. if we expect the sequence to contain unique values (the erase restores the invariant, so erase-remove should always be paired in this case). And if this had been a container of pointers, in all likelihood, memory would have been leaked. Hence item 33 in Effective STL, “Be wary of remove-like algorithms on containers of pointers”.

So remember, remove doesn’t move things to the end. Next time you hear that, congratulate the programmer saying it – they’re one of today’s lucky 10,000!

How many bugs are left? The Lincoln Index

Friday, March 6th, 2015

You’re working on some software, and you have some QA folks testing it. How do you know how many bugs are left? You know the bugs that the testers find, but how can you estimate the number that aren’t yet found?

If your testers work independently, and if your feature is not under continuous development (i.e. you’re not adding bugs), (and maybe a couple of other ifs) you can use a thing called the Lincoln Index.

It’s quite simple.

Assume that there are N total bugs (found and as-yet-unfound).
Tester a finds A bugs. A = PaN, where Pa is the probability of a finding a bug.
Similarly, tester b finds B bugs. B = PbN, by the same reasoning.

There will be some bugs found by both testers, call that number C. It is evident that C = PaPbN. (The probability that both find a bug is the product of the individual probabilities.)

Now:

AB / C = PaNPbN / PaPbN = PaPbN2 / PaPbN = N2 / N = N

So a simple calculation gives us an estimate for N, the total number of bugs.