## Archive for June, 2017

### Development of an Algorithm

Wednesday, June 21st, 2017

Here’s an exercise: given a nice piece of code sitting in a file, how do you take that code and make it generic, in the style of an STL algorithm?

For our example, let’s consider an algorithm that isn’t (yet) in the STL. First, the problem it solves. Imagine that you have a set (in the mathematical sense, not the container sense) of numbers; discontiguous, unordered, and notionally taken from the positive integers, e.g.:

$\{5,2,7,9,4,0,1\}$

What we want to find is the smallest number that isn’t included in the set. For the example given above, the answer is 3. If the set is sorted, the solution is apparent; adjacent_find will find the first gap for us. So one solution would be:

1 2 3 4 5 6 7 8 // assuming our set is in an array or vector std::sort(a.begin(), a.end()); auto p = std::adjacent_find( a.begin(), a.end(), [] (int x, int y) { return y-x > 1; }); int unused = a.back() + 1; if (p != a.end()) unused = *p + 1;

The asymptotic complexity of this solution is of course O(n.log(n)), since adjacent_find is linear and the complexity of sort dominates. Is there a better solution? It’s not obvious at first glance, but there is a linear time solution to this problem. The key is a divide and conquer strategy using that workhorse of algorithms, partition.

The smallest unused number

Let’s start by assuming that the minimum unused value is zero. We’ll choose a pivot value, m, which is the assumed midpoint of the sequence – it doesn’t actually have to be present in the sequence, but we’re going to simply assume that it’s the middle value. Partitioning the sequence about the pivot, we’ll get a situation like this:

where the first value equal to or greater than m is at position p (which is returned by the call to partition). If the lower half of the sequence has no gaps, m will be equal to p, and we can recurse on the top half of the sequence, setting the new minimum unused value to m.

We know that m cannot be less than p since there cannot be any repetitions in the set. Therefore, if m is not equal to p, it must be greater – meaning that there is at least one gap below p, and that we can recurse on the bottom half of the sequence (keeping the minimum unused value as-is).

The base case of the algorithm is when the sequence we need to partition is empty. At that point we will have found the minimum unused value. Here’s the algorithm in code, with a bit of setup:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 void min_unused() { // initialize RNG std::array<int, std::mt19937::state_size> seed_data; std::random_device r; std::generate_n(seed_data.data(), seed_data.size(), std::ref(r)); std::seed_seq seq(std::begin(seed_data), std::end(seed_data)); std::mt19937 gen(seq);   // fill an array with ints, shuffle, discard some int a[10]; std::iota(&a[0], &a[10], 0); std::shuffle(&a[0], &a[10], gen); int first_idx = 0; int last_idx = 7; // arbitrary truncation   for (int i = first_idx; i < last_idx; ++i) std::cout << a[i] << '\n';   // the algorithm int unused = 0; while (first_idx != last_idx) { int m = unused + (last_idx - first_idx + 1)/2; auto p = std::partition(&a[first_idx], &a[last_idx], [&] (int i) { return i < m; }); if (p - &a[first_idx] == m - unused) { unused = m; first_idx = p - &a[0]; } else { last_idx = p - &a[0]; } }   std::cout << "Min unused: " << unused << '\n'; }

You can also see the algorithm on wandbox. I leave it to you to convince yourself that the algorithm works. The asymptotic complexity of the algorithm is linear; at each stage we are running partition, which is O(n), and then recursing, but only on one half of the sequence. Thus, the complexity is:

$O(n + \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + ...) = O(\sum_{i=0}^\infty \frac{n}{2^i}) = O(2n) = O(n)$

From concrete to generic: first steps

So, now we have an algorithm that works… for C-style arrays, and in one place only. Naturally we want to make it as generic as we can; what, then, are the steps to transform it into a decent algorithm that can be used just as easily as those in the STL? Perhaps you’re already thinking that this is overly concrete – but hey, this is a pedagogical device. Bear with me.

First, we can pull the algorithm out into a function. You’ve already seen the setup code, so you should be able to imagine the calling code from here on. While we’re at it, let’s change the index-based arithmetic to pointers. We can pass the sequence in to the function as an argument; for now, we may as well say that the sequence is a vector, because vector is always the right default choice! Our updated function:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 // version 1: a function that takes a vector int min_unused(std::vector<int>& v) { int* first = &v[0]; int* last = first + v.size(); int unused = 0; while (first != last) { int m = unused + (last - first + 1)/2; auto p = std::partition(first, last, [&] (int i) { return i < m; }); if (p - first == m - unused) { unused = m; first = p; } else { last = p; } } return unused; }

This is already looking a little clearer. Using pointers removes some of the syntactic noise and allows us to see what’s going on better. But, as you may already have thought, once we have vectors, the next logical step is to consider using iterators. Let’s make that change now so that we can run the algorithm on a subrange within any vector.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 // version 2: use iterators, like a real algorithm template <typename It> inline int min_unused(It first, It last) { int unused = 0; while (first != last) { int m = unused + (last - first + 1)/2; auto p = std::partition(first, last, [&] (int i) { return i < m; }); if (p - first == m - unused) { unused = m; first = p; } else { last = p; } } return unused; }

Now instead of using the vector directly, now we can pass in begin() and end(), or any other valid range-delimiting pair of iterators. Note also that because min_unused is now a function template, we added inline on line 3, meaning that we can put it in a header without causing link errors when it’s instantiated the same way in multiple translation units.

This is a good start, but we can make it more generic yet! At the moment it’s still only working on sequences of ints, so let’s fix that.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 // version 3: infer the value_type template <typename It> inline typename std::iterator_traits<It>::value_type min_unused(It first, It last) { using T = typename std::iterator_traits<It>::value_type; T unused{}; while (first != last) { T m = unused + (last - first + 1)/2; auto p = std::partition(first, last, [&] (const T& i) { return i < m; }); if (p - first == m - unused) { unused = m; first = p; } else { last = p; } } return unused; }

Here we’re using iterator_traits to discover the value_type of the iterators passed in. This works with anything that satisfies the Iterator concept. Consequently we’ve also now removed the ints that were scattered through the code and instead are using Ts. Note that we are value-initializing the T on line 6, which will properly zero-initialize integral types.

Since we aren’t actually calculating the initial value of unused, we need not assume it to be zero; we could let the caller pass it in, and default the argument for convenience. In order to do that, we’ll pull out T into a defaulted template argument.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 // version 4: let the user supply the initial minimum template <typename It, typename T = typename std::iterator_traits<It>::value_type> inline T min_unused(It first, It last, T init = T{}) { while (first != last) { T m = init + (last - first + 1)/2; auto p = std::partition(first, last, [&] (const T& i) { return i < m; }); if (p - first == m - init) { init = m; first = p; } else { last = p; } } return init; }

A caller-provided minimum might be quite handy – it’s common for zero to be a reserved value, or to have a non-zero starting point for values.

Iterators again

At this point, min_unused is starting to look more like a real STL algorithm, at least superficially, but it’s not quite there yet. Lines 7 and 10 are doing plain arithmetic, which assumes that the iterators passed in support that. A fundamental concept in the STL is the idea of iterator categories: concepts which define operations available on different kinds of iterators. Not all iterators support random access, and any time we are writing a generic algorithm, we had better use the appropriate functions to manipulate iterators rather than assuming valid operations:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 // version 5: handle iterator arithmetic generically template <typename It, typename T = typename std::iterator_traits<It>::value_type> inline T min_unused(It first, It last, T init = T{}) { while (first != last) { T m = init + (std::distance(first, last)+1)/2; auto p = std::partition(first, last, [&] (const T& i) { return i < m; }); if (std::distance(first, p) == m - init) { init = m; first = p; } else { last = p; } } return init; }

In the code above, the arithmetic on lines 7 and 10 has been replaced with calls to distance. This is a key change! Now we are no longer limited to random access iterators like those of vector or just plain pointers. Our algorithm now works on all kinds of iterators, but which ones exactly? It behooves us to define exactly what we require of these iterators being used as our arguments. Until C++ has Concepts, we can’t enforce this, but we can document it, typically by calling the template argument something more descriptive than ‘It‘. Looking at the references for distance and partition, we find that a forward iterator is required.

For many algorithms in the STL, there are more efficient versions available for stronger iterator categories. A good example is partition itself: with just a forward iterator, it may do N swaps, but with a bidirectional iterator, it need only perform, at most, N/2 swaps. Wherever we want to use different algorithms for different iterator categories, overloading with tag dispatch is a common technique; the STL provides iterator tag types for use as arguments to these overloads.

But with min_unused, we don’t have any differences in the top level algorithm that would give us greater efficiency based on iterator category. Both distance and partition do that work themselves.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 // version 6: document the relaxed iterator category template <typename ForwardIt, typename T = typename std::iterator_traits<ForwardIt>::value_type> inline T min_unused(ForwardIt first, ForwardIt last, T init = T{}) { while (first != last) { T m = init + (std::distance(first, last)+1)/2; auto p = std::partition(first, last, [&] (const T& i) { return i < m; }); if (std::distance(first, p) == m - init) { init = m; first = p; } else { last = p; } } return init; }

Strength reduction

Now is also a good time to perform some manual strength reduction on our code, making sure that each operation used is the weakest option to do the job. This doesn’t really apply to any operation instances in min_unused, but in practical terms it typically means avoiding postfix increment and decrement. If the algorithm implements more than trivial mathematics it may also require quite a bit of decomposition to discover all potential efficiencies and opportunities to eliminate operations. For more examples, I recommend watching any and all of Alex Stepanov’s excellent series of video lectures.

We do have one operation in this algorithm that stands out: on line 7 there is a divide by 2. Twenty years ago, I might have converted this to a shift, but in my opinion the divide is clearer to read, and these days there is no competitive compiler in the world that won’t do this trivial strength reduction. According to my experiments with the Compiler Explorer, MSVC and GCC do it even without optimization.

STL-ready?

So there we have it. Our algorithm is looking pretty good now by my reckoning – appropriately generic and, overall, a good addition to my toolkit.

Remaining are just a couple of final considerations. The first is ranges: the Ranges TS will change the STL massively, providing for new algorithms and changes to existing ones. A specific consequence of ranges is that begin and end iterators need not be the same type any more; how that may affect this algorithm I leave as an exercise to the reader.

Even more generic?

The other consideration takes us further still down the path of making our algorithm as generic as possible. A useful technique for discovering the constraints on types is to print out the code in question and use markers to highlight variables that interact with each other in the same colour. This can reveal which variables interact as well as which ones are entirely disjoint in usage – a big help in determining how they should be made into template parameters. In general, examining how variables interact can be a fruitful exercise.

Thinking about how the variables are used in our case, we see (on line 7) that we need the ability to add the result of distance to T, and (on line 10) that the difference between Ts is comparable to the result of distance. We are used to thinking about addition and subtraction on integers, where they are closed operations (i.e. adding two integers or subtracting two integers gives you another integer), but there is another possibility. And in fact, there is already a good example of it in the STL.

Arithmetic the chrono way

It is possible for the type produced by subtraction on Ts to be something other than T. This is what happens with chrono, with time_point and duration. Subtracting two time_points yields a duration. It is meaningless – and, therefore, illegal – to add two time_points, but durations may be added both to themselves and to time_points.

In mathematical language, chrono models a one-dimensional affine space. (Thanks to Gašper Ažman for that particular insight!)

This is also the case with min_unused: T is our time_point equivalent, and the result of distance is our duration equivalent, suggesting that we can pull out the difference type as a template parameter:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 template <typename ForwardIt, typename T = typename std::iterator_traits<ForwardIt>::value_type, typename DiffT = typename std::iterator_traits<ForwardIt>::difference_type> inline T min_unused(ForwardIt first, ForwardIt last, T init = T{}) { while (first != last) { T m = init + DiffT{(std::distance(first, last)+1)/2}; auto p = std::partition(first, last, [&] (const T& i) { return i < m; }); if (DiffT{std::distance(first, p)} == m - init) { init = m; first = p; } else { last = p; } } return init; }

The advantage here is that we can now use min_unused to deal with any code that has different value and difference types, like chrono:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 // a vector of time_points, 1s apart constexpr auto VECTOR_SIZE = 10; std::vector<std::chrono::system_clock::time_point> v; auto start_time = std::chrono::system_clock::time_point{}; std::generate_n(std::back_inserter(v), VECTOR_SIZE, [&] () { auto t = start_time; start_time += 1s; return t; }); std::shuffle(v.begin(), v.end(), gen); v.resize(3*VECTOR_SIZE/4);   for (auto i : v) std::cout << std::chrono::duration_cast<std::chrono::seconds>(i.time_since_epoch()).count() << '\n';   // we can now find the minimum time_point not present auto u = min_unused< decltype(v.begin()), decltype(start_time), std::chrono::seconds>( v.begin(), v.end());   cout << "Min unused: " << std::chrono::duration_cast<std::chrono::seconds>(u.time_since_epoch()).count() << '\n';

See this code on wandbox.

This concludes our exercise in developing a generic algorithm – at least, for now. Good luck with your own experiments!

### C++17 Class Templates: Deduced or Not?

Thursday, June 15th, 2017

C++17 introduces class template deduction: a way for the compiler to deduce the arguments to construct a class template without our having to write a make_* function. But it’s not quite as straightforward as it seems.

Imagine we have a simple type that will tell us when it’s copied or moved, just for testing.

struct S { S() = default; S(const S&) { std::cout << "copy\n"; } S(S&&) { std::cout << "move\n"; } };

And likewise a very simple template like so:

template <typename T> struct Foo { Foo(const T& t_) : t(t_) {} Foo(T&& t_) : t(std::move(t_)) {}   private: T t; };

Note that we provide two forms of constructor to deal with both rvalues and lvalues being passed in. With C++14, prior to class template deduction, we would write a make_foo function to instantiate this template something like this:

template <typename T> auto make_foo(T&& t) { return Foo<std::decay_t<T>>(std::forward<T>(t)); }

And call it like so:

int main() { // rvalue: move auto f1 = make_foo(S());   // lvalue: copy S s; auto f2 = make_foo(s); }

The important thing here is that the template argument to make_foo is deduced, and the template argument to Foo is not (cannot be, in C++14). Furthermore, because the template argument to make_foo is deduced, make_foo‘s argument is a forwarding reference rather than an rvalue reference, and hence we use perfect forwarding to pass it along to the appropriate constructor for Foo.

So far so good. Now, with C++17, class template deduction is available to us, so we can get rid of make_foo. Now our main() function looks like this:

int main() { // rvalue: move? Foo f3{S()};   // lvalue: copy? S s; Foo f4{s}; }

Here’s the unintuitive part: the template arguments are being deduced now. But in that case, doesn’t that mean that Foo‘s constructor argument is being deduced? Which means that what looks like Foo‘s rvalue constructor is actually now a constructor with a forwarding reference that will outcompete the other constructor? If so, that would be bad – we could end up moving from an lvalue!

I woke up the other morning with this worrying thought and had to investigate. The good news is that even though the code looks like it would break, it doesn’t.

So in fact, yes, although Foo‘s template argument is being deduced, I think the crucial thing is that Foo‘s constructor still takes an rvalue reference – not a forwarding reference – at the point of declaration. And according to 16.3.1.8 [over.match.class.deduct] the compiler is forming an overload set of function templates that match the signatures of the constructors available, and it’s using the correct types. In other words, I think it’s doing something that we could not do: it’s forming a function template, for the purposes of deduction, whose argument is an rvalue reference rather than a forwarding reference.

As is often the case in C++, one needs to be careful to properly distinguish things. It is very easy to get confused over rvalue references and forwarding references, because they look the same. The difference is that forwarding references must be deduced… and in this case, even though it looks like it’s deduced, it isn’t.

Edit: Reddit user mps1729 points out that indeed, neither of the implicitly generated functions is using a forwarding reference, as clarified in 17.8.2.1/3 [temp.deduct.call]. Thanks for the clarification!