The internet has lots of C++-related book recommendations to make — for example, you should absolutely read all the authors listed above — at whatever C++ developmental stage is appropriate for you. But since you can already find so many C++-specific book lists out there, I’d like to recommend some books perhaps a bit further off the beaten track that should nevertheless be interesting and help you become a better programmer in C++ or any other language. C++ is a multiparadigm language, and there is a lot outside of the usual C++ ecosystem that is valuable to learn.

These recommendations fall broadly in two categories: books about specific areas of programming, mathematics, or computer science; and books about the history of the subject and industry. There is some overlap between these categories, and I believe that all of my recommendations will either give you extra tools to use or at the very least add context to your endeavours. I can earnestly say that I have read every single one of the books I’m recommending here, although some are now available in editions that have evolved since I last read them.

In no particular order:

*Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age* – Michael A. Hiltzik

I’m kicking off the list with a historical recommendation. As programmers, we tend to exalt the new and pay little attention to the old, but it pays to study the history of the field — *nihil sub sole novum*, after all. *Dealers of Lightning* follows the history of Xerox PARC, the place where luminaries like Alan Kay and Butler Lampson pioneered a few small things you’ve probably heard of, like laser printers, personal computing, ethernet, and GUIs. If your only experience with “object-oriented programming” is the way it’s done in C++, well… yeah. You should study history.

*A Book of Abstract Algebra* – Charles C. Pinter

Perhaps, like me, you have a mathematical background that ends somewhere around high school or your first year of university. Maybe you’ve heard of things like monoids, rings, or groups being used in the context of programming and wondered why smart people like Alexander Stepanov seem to talk about them so much. If you want to fill in the gaps in your knowledge, this is the book for you. One slight word of warning: it is dense, and it is mathematical. The good news is that it is also very accessible. Most of us without university-level mathematics under our belts haven’t studied abstract algebra at all, but it doesn’t really require anything beyond a comprehension of junior high mathematics to get started. When I was a teenager, calculus was hard. As an adult, I’ve been able to expand and enrich my experiences in the physical world, allowing me to make more of the mental connections needed for calculus to be accessible for me. So it is with abstract algebra — it is to programming what calculus is to modelling the real world.

*Digital Typography* – Donald E. Knuth

Ah, Knuth! Arguably the world’s most famous living computer scientist, Knuth is a programming household name thanks to *The Art of Computer Programming*, the complete volumes of which sit unread on many bookshelves. *Digital Typography* is an alternative offering composed of a series of essays relating various aspects of the development of TeX and Metafont. What comes across the most strongly for me while reading this is the astonishing attention to detail that Knuth brings to his works. Once you’re finished reading it, odds are you’ll have read one more Knuth book than most programmers!

*The Garbage Collection Handbook: The Art of Automatic Memory Management* – Richard Jones, Antony Hosking, Eliot Moss

This is the more modern descendant of *Garbage Collection* by Richard Jones and Rafael Lins. C++ has an aversion to garbage collection at the language level, but you might well find yourself implementing garbage collection or using a framework that has GC at some point in your career. After all, smart pointers are a form of garbage collection… one that doesn’t handle loops and incurs a potentially unbounded cost on a free. At any rate, this book is the bible of GC algorithms. It’s really interesting and useful for anyone concerned with handling memory — i.e., everyone who writes C++.

*Purely Functional Data Structures* – Chris Okasaki

Functional programming seems to be all the rage these days. Pick a random video from a random C++ conference and there’s an even chance it mentions some FP influence. This seminal book is almost 20 years old and still relevant as we C++-programmers tentatively explore the world of persistent data structures and immutability. It is the book form of Okasaki’s PhD thesis, so it’s quite rigorous, particularly with respect to cost and complexity analysis. It’s also a great read if you want to understand data structures in functional languages and come up with some ideas for your own implementations. The original PhD thesis is also available for free.

*Calendrical Calculations* – Edward M. Reingold & Nachum Dershowitz

This is a niche topic — and therefore an expensive outlay if you can’t apply it — but for those who are interested in date calculations, it’s both comprehensive and fascinating. The code given in the book is in Lisp and is not licensed for commercial use, but the authors also point out that their aim is simply to communicate the ideas and give you a starting point for your own implementation in your language of choice. If you’ve done any kind of CS or programming course, chances are you’ve written a function to determine leap years, so unlike Microsoft Excel, you know that 1900 was not a leap year. This book takes things so much further; when I say that it is comprehensive, I mean astonishingly so. Gregorian and Julian calendars are just the tip of the iceberg here, and the book covers almost everything else you could think of. Jewish and Chinese calendars? Of course. How about Mayan, or French Revolutionary, or a dozen other rules-based or astronomical calendars? If this whets your appetite, you may want to wait for the Ultimate Edition, currently due to be released at the end of March 2018. I wonder if Howard Hinnant is working his way through all these calendars while building his date library?

*About Face: The Essentials of Interaction Design* – Alan Cooper et al.

This is now up to its 4th edition; I read the 2nd. If you work on an application that has a user interface — and they all do — you should read this. If you work with designers, this will help you understand their processes and language. This book strikes the right balance between breadth and depth and covers a lot of ground. I found the chapter “Rethinking Files and Save” particularly thought-provoking.

*Hacker’s Delight* – Henry S. Warren

After a fairly high-level recommendation, this one gets straight to the really low-level fun stuff. This is the bit-twiddler’s almanac, a spiritual descendant of HAKMEM and a trove of two’s complement recreations. If you’ve ever been asked during interview how to count the number of set bits in a word, this is the book you want to reference. It has everything from popcount and power-of-2 boundaries to space-filling curves, error correction, and prime number generation. You should probably also look up HAKMEM (AI Memo 239), the original collection of bit manipulation hacks.

*Pearls of Functional Algorithm Design* – Richard Bird

You’ve probably seen the famous “C++ Seasoning” talk by Sean Parent — if you haven’t, please go watch it now. Perhaps you were motivated to read Stepanov or study algorithms as a result. The Stepanovian C++-centric view of algorithms is very much concerned with counting operations, performing manual strength reductions, and identifying intermediate calculations. This book offers an alternative way to design beautiful algorithms in a functional style using a comparatively top-down approach to express algorithmic ideas while making them fast through clean decomposition and subsequent fusion. I think it’s important to study and apply both camps of algorithm design. It is difficult to write beautiful code if you just “count the swaps” without an awareness of the mathematics behind it, but it is equally difficult to make code fast if you simply “express the mathematics” without any awareness of the operations.

*Taking Sudoku Seriously: The Math Behind the World’s Most Popular Pencil Puzzle* – Jason Rosenhouse & Laura Taalman

Like the first recommendation, this isn’t a book that will be immediately applicable to your day job, unless perhaps you’re writing a sudoku mobile app — in which case, buy and read this immediately. But I’m recommending it anyway because I love recreational mathematics, and this book is fun and accessible without dumbing down the material. If you’ve played sudoku and ever thought about solving — or better, generating — the puzzle programmatically, odds are you’ll enjoy this book. Group theory, graph theory, and sudoku variations of every kind abound. As a bonus, it’s dedicated to Martin Gardner.

Any recommendation list is going to be wildly incomplete and subjective, and, of course, there are a lot of books I had to leave out from this one, but perhaps a Part Two will follow at some point in the future. Feel free to chime in if you agree, disagree, or have your own favourites.

This year the program was strong as always, and many slots featured 2 or 3 talks that I wanted to see; of course I could only be in one place at a time. Of those that I attended, here are a few highlights.

Matt Godbolt’s keynote, “Unbolting the Compiler’s Lid”, was excellent. I think Matt was really feeling the love this conference, as well he should; his Compiler Explorer is an excellent tool that has changed the way we communicate about C++. His talk highlighted the accessibility of using Compiler Explorer and I hope that as a result, more C++ programmers get the curiosity to look under the hood and understand what their code is really doing. It’s great also that the whole project is available on GitHub so that anyone can grab it and use it locally, perhaps on a code base internal to their company.

John Regehr is a well-known blogger and authority on undefined behaviour, and his two-part talk was really good. He showed where and why undefined behaviour exists and offered ideas for a path forwards for dealing with UB. He is also a very good speaker, as I find many university professors tend to be. Scott Schurr’s talk “Type Punning in C++17: Avoiding Pun-defined Behavior” was also a good talk that showed a lot of code and covered practical techniques for avoiding UB. Scott ran out of time and had to omit a section of the slides, but what he did show was very good advice.

I like talks that are both entertaining and rigorous, and Arthur O’Dwyer delivers both in spades. He had two talks at CppCon this year. “dynamic_cast From Scratch” was an excellent exploration of C++ inheritance as implemented against the Itanium ABI, and “A Soupçon of SFINAE” offered very useful techniques for leveraging home-made type traits. There is a lot in both talks that can be applied against the codebases I work in.

This year’s CppCon featured many talks about allocators, and I found two very useful. Bob Steagall presented “How to Write a Custom Allocator”, which particularly opened my eyes to fancy pointers and their use in relocatable heaps. This is a technique I’ll definitely be looking into; I think it has great applicability to my projects. Pablo Halpern presented “Modern Allocators: The Good Parts” on Friday morning, a comprehensive overview of how allocators have changed for C++17 and a really good explanation of how exactly the new polymorphic memory resource approach works. Again, this is something that is very exciting for me personally in terms of how much easier it is to use. In the past I’ve seen people reimplement standard containers just because the allocation was so hard to control. My hope now is that with the C++17 changes, we can use an allocator to get the control we want and not take on the maintenance burden of rewriting so much.

The intense schedule and in-depth topic coverage of CppCon can leave one’s brain tired, so it’s nice now and then to have talks that are fun. Matt Godbolt’s early-morning open content session “Emulating a BBC Micro in Javascript” took me right back to my childhood. Sadly, open content sessions aren’t recorded, but it was great to remember the 6502-based BBC Micro, not to mention fascinating to hear the intricacies of the 8-bit hardware timing and copy protection schemes. Hearing that BBC disk drive and seeing Matt play Elite was a great way to start a Wednesday morning.

Another fun talk to finish up with on Friday was Juan Arrieta’s “Travelling the Solar System with C++: Programming Rocket Science”. This talk was light on C++ content compared to other talks, but really engrossing nonetheless. Juan is a good presenter and hearing all about his work at JPL kept me spellbound for the hour.

All these talks, and more, were great, and when the videos start appearing on YouTube, I’ll be watching the rest – and probably re-watching the ones I liked. Bash Films did a fantastic job last year getting the videos sorted out in short order, so I expect there won’t be long to wait. But for me CppCon isn’t just about the talks; it’s about the connections and the opportunity to discuss things, between sessions or over food and/or drink.

At CppCon, I can ask Marshall Clow about specifics of how the standard library is implemented (why is a moved-from string cleared when it’s within the small buffer, and does it matter?), or I can chat with David Sankel about whether or not the currently-proposed coroutines are sufficiently powerful to represent certain mathematical constructs. I can ask Chandler Carruth about whether such-and-such inhibits RVO (he was noncommittal, but empirical evidence shows that RVO still works fine in that case, yay!). I can hash out new ideas with Louis Dionne and Gašper Ažman for removing current limitations I run into with lambdas. Or I can meet new friends like Stephanie Hurlburt, who’s doing exciting things with texture compression that could really make a difference to the games industry. The list of inspiring people and interactions is almost endless.

I’ve come away from CppCon energised, with tons of new things to try and to apply. Thanks to Jon, Bryce, and everyone else who made CppCon run smoothly, and see you next year!

One of the great things that C++ gets right is that it grants programmers the ability to create types that behave like built-in types. Many languages don’t offer this feature, treating the built-in types as special in some way, e.g. limiting us to defining types with reference semantics, or perhaps preventing operators from working with user-defined types. But C++ allows us to define types with value semantics that work, syntactically and semantically, in an identical way to machine types like `float` and `int`.

**Regular types: “When in doubt, do as the ints do.”**

The concept of a regular type is probably familiar to anyone who has watched a C++ conference video or read a popular C++ blog within the past few years. It is an attempt to formalize the semantics of user-defined types in order to match built-in types. Alexander Stepanov and James C. Dehnert wrote in their paper *Fundamentals of Generic Programming*:

“Since we wish to extend semantics as well as syntax from built-in types to user types, we introduce the idea of a regular type, which matches the built-in type semantics, thereby making our user-defined types behave like built-in types as well.”

And they go on to define the fundamental operations that can be applied to a regular type:

- default construction
- copy construction
- destruction
- assignment
- equality & inequality
- ordering

The reason for choosing these operations is to provide a computational basis that supports interoperation of types with data structures (such as STL containers) and algorithms. The first four of these operations have default definitions in C++ for any type, user-defined or otherwise.

As a computational basis for data structures and algorithms, it seems that all of these operations serve a purpose in code — except default construction. Default construction can be used in specifying semantics, but it is not needed by data structures and algorithms. In *From Mathematics to Generic Programming*, chapter 10.3, Alexander Stepanov and Daniel Rose define regular types without the operation of default construction, then go on to say:

“Having a copy constructor implies having a default constructor, since `T a(b);` should be equivalent to `T a; a = b;`.”

This is a fine and necessary axiom for the semantics of regular types, but we never actually need to write that in C++. We would never write “`T a; a = b;`”. Instead, we would write it as “`T a(b);`”, invoking either the copy or the move constructor.

**Default construction: inherited from C?**

C++ has its roots in C, particularly when considering built-in types. In the Unix tradition, C is famously terse, parsimonious, and unforgiving of mistakes. C++ comes from the same stock with our maxim, “don’t pay for what you don’t use.”

We all recognize the following as undefined behaviour (applicable to C and C++ alike):

```
int main()
{
    int x;
    return x;
}
```

When we wrote “`int x;`” the compiler did nothing to initialize `x`, and quite rightly so. This is, in fact, precisely the meaning of default construction according to the axioms of regular types — to do as the `int`s do. Alexander Stepanov and Paul McJones use the phrase “partially formed” to convey this in *Elements of Programming*:

“An object is in a partially formed state if it can be assigned to or destroyed. For an object that is partially formed but not well-formed, the effect of any procedure other than assignment (only on the left side) and destruction is not defined.”

“A default constructor takes no arguments and leaves the object in a partially formed state.”

On first encountering this definition some years ago, I experienced some discomfort; this was not my mental model of default construction for most of my previous programming life. I thought of default construction as a way to completely form an object in some kind of default state. But the more I thought about it, the more I appreciated this new point of view.

As a junior programmer, my notion of a default state was not at all well defined. Many of the types I’ve written over the years used default constructors as a crutch. Some used two-phase construction, ostensibly in the name of performance, but more likely because it was easier to write quickly. Commonly, a default constructor would set sentinel “invalid” values that polluted use of the type, requiring checks in other methods or at call sites. If I was lucky, “default construction” would establish the type’s invariants.

I didn’t have a rigorous idea of what it meant to make a type, nor was I able to formulate a solid argument or semantics for my mental model of default construction — because I wasn’t writing default constructors. I was writing nullary (zero-argument) constructors, and they just don’t make sense for all, or even many, types.

**Aside: partially formed == moved-from?**

This lack of clarity seems to echo the current situation with moved-from objects. Setting aside any arguments about destructive move, the current standard says that the state of a moved-from object is “valid but unspecified.” It does not mention partially formed objects.

But in my view, the right way to think about moved-from objects is to consider them as having this partially formed state. Moved-from objects may only be assigned to or destroyed, and nothing else (is guaranteed). Where it gets a bit murky is the guaranteed part, because there are some types for which the ability to destroy them necessarily entails the ability to call other methods. Containers spring to mind; for a `vector` to be properly destructible, it must — albeit coincidentally — also support `size()` and `capacity()`.

This, in turn, is similar to the situation with certain types of undefined behaviour. These days, signed integer overflow is undefined behaviour but not necessarily *malum in se*. The overwhelming majority of us are programming on two’s-complement machines where we know exactly what behaviour to expect when wrapping the bit pattern. But the standard tells us that signed integer overflow is undefined behaviour and thus *malum prohibitum*, and optimizers exploit this. If the standard were to similarly define use-after-move, other than assignment or destruction, as undefined behaviour, I can imagine compilers taking advantage.

**Nullary constructors vs default constructors**

Back to default construction. How do we make a member of a type? My mental model of construction in general is as follows: a constructor takes an arbitrary piece of memory — a space apportioned from the heap or the stack — and makes that memory into a member of a given type.

I expect this jibes with what most C++ programmers think. The only problem is that this isn’t what default construction does. A partially formed object is not yet an object. There is no semantic link between the bit pattern it contains and the type it will eventually inhabit; that semantic link is made by assignment.

Consider the following examples of “default construction”:

```
int x;
// 1. Is x an int here? No!
enum struct E : int { A, B, C };
E e;
// 2. Is e an E here? No!
struct Coord { int x; int y; };
Coord c;
// 3. Is c a Coord here? No!
std::pair<int, int> p;
// 4. Is p a pair here? Yes.
std::vector<int> v;
// 5. Is v a vector here? Yes.
std::unique_ptr<int> u;
// 6. Is u a unique_ptr here? Yes?
```

We would probably call all of these declarations “default construction” when, in fact, they are all slightly different.

In the first example, some people would claim that `x` is an `int` after declaration. After all, in the memory where `x` lives, there is no possible bit pattern that is not a valid `int`. Representationally, `x` is perfectly fine as an `int` — it’s just undefined. In some sense it’s just a matter of theory that we choose to view `x` as not yet an `int`.

The “theory” argument is a little more persuasive in the second example. There are many possible bit patterns in the memory where `e` lives that don’t contain well-formed values of `E`. Since we are using C++, there are still arguments that can be made for seeing `e` as an `E`, but I won’t go down that rabbit hole, as it isn’t vital to the particular argument I’m pursuing.

Compare examples 3 and 4, the `Coord` and the `pair`, respectively. This is an interesting case where `c` is clearly default constructed and so, by our rules, is not yet a `Coord`. The `pair p` looks identical, but the standard says that `pair`’s zero-argument constructor value-initializes the elements, meaning `int`s are zero-initialized. This means that `p` isn’t default constructed according to regular type axioms; rather, it has been nullary constructed.

Example 5, the `vector`, is something new entirely. In the case of `vector`, a partially formed object that must support destruction is coincidentally a well-formed object. The only operations that are not valid on `v` are the ones, such as `front()`, that are specifically prohibited because of preconditions.

The last example is interesting because it is an example of a sentinel value within a type that is baked right into the language. A default constructed `unique_ptr` contains a value-initialized raw pointer. Again, partially formed coincides with well-formed here. But there’s more; the language allows the destructor to call `delete` on that null pointer. This is a sentinel value, like so many I’ve set inside “default constructors” over the years, but one that is so ubiquitous that we don’t even think of it as unusual. I’ll hazard a guess and say that there are many systems in the world where zero is a fine address to dereference, probably far more than there are one’s-complement systems. It is perhaps only due to history and convenience that we outlaw signed overflow while codifying null pointer sentinels within the language itself.

Considering the differences between these examples, I think it makes sense to mentally differentiate default construction from nullary construction and further, to carefully consider where nullary construction is warranted and where it doesn’t actually make sense.

**Sentinels can be harmful**

Probably the biggest problem with nullary construction is that it tends to introduce magic sentinel values into the type itself where they should not exist. The most famous magic sentinel is the null pointer, Tony Hoare’s “billion-dollar mistake”, but we run this risk with any number of types that we use with value semantics.

Empty strings are used much more frequently as a sentinel than is sensible. Numeric types are often given default values of zero that can lead to bugs. As any game programmer can tell you, “disappeared” objects can frequently be found at the origin.

Sometimes we take value types that have no natural defaults, give them nullary constructors because we think we have to, and choose sentinel values that deliberately stand out. Colours spring to mind here; there is no real “default value” for a colour. Still, we often write a nullary constructor anyway and use something like bright magenta, indicating that a default colour would be a bug that we want to spot. Why provide a nullary constructor in the first place?

Most types that model real-world quantities, like colour, have no good defaults. There’s no default country. There’s no default gender. There’s no default time zone. Trying to provide defaults for these can lead to bugs, bad user experiences, or both.

It isn’t my intent to build too tall of a straw man here. This sort of thing does happen, but nullary construction of this kind is also something we try to avoid in C++, especially as we get more and more features in the language to mitigate it. Unlike C, we’ve always had the ability to delay declaration until the point of initialization, thus avoiding the need for nullary construction. This is both safer and more efficient. As previously mentioned, we never default construct and initialize; we just copy construct or value-initialize. RAII is considered good, so we use it to avoid two-phase initialization. Regardless of language, using sentinels to signal invalid state can be considered a code smell and a type system power and/or usage failure.

The important point here is that we shouldn’t be making nullary constructors where they don’t really make sense just because we think they’re a requirement. They’re much less of a requirement than we have led ourselves to believe.

**Magic values from nowhere**

Nullary constructors hinder our ability to reason about code, especially generic code. If we know that a given function output cannot be conjured out of the ether, but can only be constructed from the arguments passed in, then we can infer things about the function. We can discount edge cases in our reasoning if we know from the function signature that the inputs must be used a certain way in order to provide specific output.

A lack of nullary constructors means that total functions are favoured, because it’s not possible to create magic values. To the extent that we can achieve this, it’s desirable, particularly in a value-oriented style of programming. It’s possible to envisage a complete lack of nullary constructors with everything built up from value initialization, n-ary constructors, conversions, *et cetera*. If even a subsection of the code can be partitioned in this way, it allows us to be more certain of its function.

**The odd requirement for nullary construction**

One sticking point remains — namely, that there are places in the STL that require nullary construction. Two in particular are commonly cited, one more problematic than the other.

`vector::resize`

The requirement that `vector::resize` has on nullary construction is fairly easy to work around. We simply don’t have to use `resize`, and if we never use it, it’s never instantiated.

Resizing a vector to a larger size is seldom useful; `reserve` and `push_back`, `emplace_back`, or `insert` handle those use cases. There may once have been an efficiency argument for resize-to-larger, but given move semantics and the fact that any contemporary STL implementation will use `memcpy` for trivially copyable types, I struggle to come up with any argument for ever calling resize-to-larger these days. Of course, if a situation where it is the best option should ever arise, it can still be used — just not with types that don’t provide nullary constructors.

Resizing to a smaller size can be achieved with `erase` at negligible extra cost. Of course, nullary construction is not strictly required here, so in the event that `vector` were revised, I would advocate removing `resize` from the interface and perhaps instead providing a `truncate` method to achieve resize-to-smaller.

`map::operator[]`

The index operator on `map` has a famously poor signature. It’s frustrating that you can’t use it on a `const map` even when you know that the value is contained. It is the sole member of `map` that requires the `mapped_type` to be nullary constructible.

Happily, C++17 has expanded the interface on `map`. We now have `insert_or_assign` to cover the mutable use case of the index operator, and it does not require nullary constructability. In the case of `const map`s, C++11 offers us `map::at`, which behaves analogously to `vector::at`. Although C++17 has `optional`, it is not yet integrated with container types, so there is currently no lookup function on a `const map` that returns an `optional` value.

**Nullary constructor leakage**

Looking through the lens of nullary construction being unnecessary, I believe that some types in the STL have nullary constructors for no good reason other than satisfying container requirements. For example, to my mind, `weak_ptr` doesn’t need to have a nullary constructor. The nullary constructor of `variant` also seems out of place, since the entire point of `variant` is the choice of type contained within. It seems perverse to create a `variant` without knowing what to put inside of it.

**Conclusions**

Default construction semantics are not as straightforward as they may seem.

The ability to write default constructors is undeniably valuable if we want to give our types the same semantics as built-in types, but given C++’s heritage and quirks, it’s not always possible to achieve exact parity.

The most useful, rigorous, and consistent model is the one advanced by the works of Alex Stepanov *et al.* on regular type semantics: default construction produces a partially formed object. A partially formed object has the same semantics as a moved-from object. A partially formed object is not yet a member of its type.

It is instructive to mentally separate the ideas of default construction — giving a partially formed object — and nullary construction — giving a well-formed object which is a member of the type with its invariants established. Some objects are well-formed coincidentally as a result of being partially formed.

We should not write nullary constructors without consideration, particularly for value types. Sensible defaults don’t always exist and sentinels should be avoided. Reasoning about functions and data flow becomes easier if types lack nullary constructors.

The requirement that regular types be default constructible is not an operational requirement; it is merely an axiomatic requirement. The vast majority of methods on most STL containers do not require default construction, and we can work around the few specific methods which do require it. Some types in the STL seem to have nullary constructors simply to fulfil a requirement which is questionable in the first place. Future revisions of the STL should scrutinize the operational requirements for default construction and remove them where possible.

For our example, let’s consider an algorithm that isn’t (yet) in the STL. First, the problem it solves. Imagine that you have a set (in the mathematical sense, not the container sense) of numbers: discontiguous, unordered, and notionally taken from the positive integers, e.g.:

`{ 0, 1, 2, 4, 5, 7, 9 }`

What we want to find is the smallest number that isn’t included in the set. For the example given above, the answer is 3. If the set is sorted, the solution is apparent; `adjacent_find` will find the first gap for us. So one solution would be:

```
// assuming our set is in an array or vector
std::sort(a.begin(), a.end());
auto p = std::adjacent_find(
    a.begin(), a.end(),
    [] (int x, int y) { return y - x > 1; });
int unused = a.back() + 1;
if (p != a.end())
  unused = *p + 1;
```

The asymptotic complexity of this solution is of course O(n log n), since `adjacent_find` is linear and the complexity of `sort` dominates. Is there a better solution? It’s not obvious at first glance, but there is a linear-time solution to this problem. The key is a divide-and-conquer strategy using that workhorse of algorithms, `partition`.

**The smallest unused number**

Let’s start by assuming that the minimum unused value is zero. We’ll choose a pivot value, `m`, which is the assumed midpoint of the sequence – it doesn’t actually have to be present in the sequence, but we’re going to simply assume that it’s the middle value. Partitioning the sequence about the pivot, we’ll get a situation where the first value equal to or greater than `m` is at position `p` (which is returned by the call to `partition`). If the lower half of the sequence has no gaps, `m` will be equal to `p`, and we can recurse on the top half of the sequence, setting the new minimum unused value to `m`.

We know that `m` cannot be less than `p`, since there cannot be any repetitions in the set. Therefore, if `m` is not equal to `p`, it must be greater – meaning that there is at least one gap below `p`, and that we can recurse on the bottom half of the sequence (keeping the minimum unused value as-is).

The base case of the algorithm is when the sequence we need to partition is empty. At that point we will have found the minimum unused value. Here’s the algorithm in code, with a bit of setup:

```
void min_unused()
{
  // initialize RNG
  std::array<std::uint32_t, std::mt19937::state_size> seed_data;
  std::random_device r;
  std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
  std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
  std::mt19937 gen(seq);

  // fill an array with ints, shuffle, discard some
  int a[10];
  std::iota(&a[0], &a[10], 0);
  std::shuffle(&a[0], &a[10], gen);
  int first_idx = 0;
  int last_idx = 7; // arbitrary truncation
  for (int i = first_idx; i < last_idx; ++i)
    std::cout << a[i] << '\n';

  // the algorithm
  int unused = 0;
  while (first_idx != last_idx) {
    int m = unused + (last_idx - first_idx + 1) / 2;
    auto p = std::partition(&a[first_idx], &a[last_idx],
      [&] (int i) { return i < m; });
    if (p - &a[first_idx] == m - unused) {
      unused = m;
      first_idx = p - &a[0];
    } else {
      last_idx = p - &a[0];
    }
  }
  std::cout << "Min unused: " << unused << '\n';
}
```

You can also see the algorithm on wandbox. I leave it to you to convince yourself that the algorithm works. The asymptotic complexity of the algorithm is linear; at each stage we are running `partition`, which is O(n), and then recursing, but only on one half of the sequence. Thus, the complexity is:

n + n/2 + n/4 + … + 1 ≤ 2n, i.e. O(n)

**From concrete to generic: first steps**

So, now we have an algorithm that works... for C-style arrays, and in one place only. Naturally we want to make it as generic as we can; what, then, are the steps to transform it into a decent algorithm that can be used just as easily as those in the STL? Perhaps you're already thinking that this is overly concrete - but hey, this is a pedagogical device. Bear with me.

First, we can pull the algorithm out into a function. You've already seen the setup code, so you should be able to imagine the calling code from here on. While we're at it, let's change the index-based arithmetic to pointers. We can pass the sequence in to the function as an argument; for now, we may as well say that the sequence is a `vector`, because `vector` is always the right default choice! Our updated function:

```
// version 1: a function that takes a vector
int min_unused(std::vector<int>& v)
{
  int* first = &v[0];
  int* last = first + v.size();
  int unused = 0;
  while (first != last) {
    int m = unused + (last - first + 1) / 2;
    auto p = std::partition(first, last,
      [&] (int i) { return i < m; });
    if (p - first == m - unused) {
      unused = m;
      first = p;
    } else {
      last = p;
    }
  }
  return unused;
}
```

This is already looking a little clearer. Using pointers removes some of the syntactic noise and lets us see better what's going on. But, as you may already have thought, once we have `vector`s, the next logical step is to consider using iterators. Let's make that change now so that we can run the algorithm on a subrange within any `vector`.

```
// version 2: use iterators, like a real algorithm
template <typename It>
inline int min_unused(It first, It last)
{
  int unused = 0;
  while (first != last) {
    int m = unused + (last - first + 1) / 2;
    auto p = std::partition(first, last,
      [&] (int i) { return i < m; });
    if (p - first == m - unused) {
      unused = m;
      first = p;
    } else {
      last = p;
    }
  }
  return unused;
}
```

Instead of using the `vector` directly, we can now pass in `begin()` and `end()`, or any other valid range-delimiting pair of iterators. Note also that because `min_unused` is now a function template, we added `inline` on line 3, meaning that we can put it in a header without causing link errors when it's instantiated the same way in multiple translation units.

This is a good start, but we can make it more generic yet! At the moment it's still only working on sequences of `int`s, so let's fix that.

```
// version 3: infer the value_type
template <typename It>
inline typename std::iterator_traits<It>::value_type min_unused(It first, It last)
{
  using T = typename std::iterator_traits<It>::value_type;
  T unused{};
  while (first != last) {
    T m = unused + (last - first + 1) / 2;
    auto p = std::partition(first, last,
      [&] (const T& i) { return i < m; });
    if (p - first == m - unused) {
      unused = m;
      first = p;
    } else {
      last = p;
    }
  }
  return unused;
}
```

Here we're using `iterator_traits` to discover the `value_type` of the iterators passed in. This works with anything that satisfies the Iterator concept. Consequently we've also now removed the `int`s that were scattered through the code and instead are using `T`s. Note that we are value-initializing the `T` on line 6, which will properly zero-initialize integral types.

Since we aren't actually calculating the initial value of `unused`, we need not assume it to be zero; we could let the caller pass it in, and default the argument for convenience. In order to do that, we'll pull out `T` into a defaulted template argument.

```
// version 4: let the user supply the initial minimum
template <typename It,
          typename T = typename std::iterator_traits<It>::value_type>
inline T min_unused(It first, It last, T init = T{})
{
  while (first != last) {
    T m = init + (last - first + 1) / 2;
    auto p = std::partition(first, last,
      [&] (const T& i) { return i < m; });
    if (p - first == m - init) {
      init = m;
      first = p;
    } else {
      last = p;
    }
  }
  return init;
}
```

A caller-provided minimum might be quite handy - it's common for zero to be a reserved value, or to have a non-zero starting point for values.

**Iterators again**

At this point, `min_unused` is starting to look more like a real STL algorithm, at least superficially, but it's not quite there yet. Lines 7 and 10 are doing plain arithmetic, which assumes that the iterators passed in support it. A fundamental concept in the STL is the idea of iterator categories: concepts which define the operations available on different kinds of iterators. Not all iterators support random access, and any time we are writing a generic algorithm, we had better use the appropriate functions to manipulate iterators rather than assuming which operations are valid:

```
// version 5: handle iterator arithmetic generically
template <typename It,
          typename T = typename std::iterator_traits<It>::value_type>
inline T min_unused(It first, It last, T init = T{})
{
  while (first != last) {
    T m = init + (std::distance(first, last) + 1) / 2;
    auto p = std::partition(first, last,
      [&] (const T& i) { return i < m; });
    if (std::distance(first, p) == m - init) {
      init = m;
      first = p;
    } else {
      last = p;
    }
  }
  return init;
}
```

In the code above, the arithmetic on lines 7 and 10 has been replaced with calls to `distance`. This is a key change! Now we are no longer limited to random-access iterators like those of `vector`, or plain pointers. Our algorithm now works on all kinds of iterators – but which ones, exactly? It behooves us to define exactly what we require of the iterators being used as our arguments. Until C++ has Concepts, we can't enforce this, but we *can* document it, typically by calling the template argument something more descriptive than '`It`'. Looking at the references for `distance` and `partition`, we find that a forward iterator is required.

For many algorithms in the STL, there are more efficient versions available for stronger iterator categories. A good example is `partition` itself: with just a forward iterator, it may do N swaps, but with a bidirectional iterator, it need only perform, at most, N/2 swaps. Wherever we want to use different algorithms for different iterator categories, overloading with tag dispatch is a common technique; the STL provides iterator tag types for use as arguments to these overloads.

But with `min_unused`, we don't have any differences in the top-level algorithm that would give us greater efficiency based on iterator category. Both `distance` and `partition` do that work themselves.

```
// version 6: document the relaxed iterator category
template <typename ForwardIt,
          typename T = typename std::iterator_traits<ForwardIt>::value_type>
inline T min_unused(ForwardIt first, ForwardIt last, T init = T{})
{
  while (first != last) {
    T m = init + (std::distance(first, last) + 1) / 2;
    auto p = std::partition(first, last,
      [&] (const T& i) { return i < m; });
    if (std::distance(first, p) == m - init) {
      init = m;
      first = p;
    } else {
      last = p;
    }
  }
  return init;
}
```

**Strength reduction**

Now is also a good time to perform some manual strength reduction on our code, making sure that each operation used is the weakest option that does the job. This doesn't really apply to any operations in `min_unused`, but in practical terms it typically means avoiding postfix increment and decrement. If the algorithm implements more than trivial mathematics, it may also require quite a bit of decomposition to discover all potential efficiencies and opportunities to eliminate operations. For more examples, I recommend watching any and all of Alex Stepanov's excellent series of video lectures.

We do have one operation in this algorithm that stands out: on line 7 there is a divide by 2. Twenty years ago, I might have converted this to a shift, but in my opinion the divide is clearer to read, and these days there is no competitive compiler in the world that won't do this trivial strength reduction. According to my experiments with the Compiler Explorer, MSVC and GCC do it even without optimization.

**STL-ready?**

So there we have it. Our algorithm is looking pretty good now by my reckoning - appropriately generic and, overall, a good addition to my toolkit.

Remaining are just a couple of final considerations. The first is ranges: the Ranges TS will change the STL massively, providing for new algorithms and changes to existing ones. A specific consequence of ranges is that begin and end iterators need not be the same type any more; how that may affect this algorithm I leave as an exercise to the reader.

**Even more generic?**

The other consideration takes us further still down the path of making our algorithm as generic as possible. A useful technique for discovering the constraints on types is to print out the code in question and use markers to highlight variables that interact with each other in the same colour. This can reveal which variables interact as well as which ones are entirely disjoint in usage - a big help in determining how they should be made into template parameters. In general, examining how variables interact can be a fruitful exercise.

Thinking about how the variables are used in our case, we see (on line 7) that we need the ability to add the result of `distance` to a `T`, and (on line 10) that the difference between `T`s is comparable to the result of `distance`. We are used to thinking about addition and subtraction on integers, where they are closed operations (i.e. adding or subtracting two integers gives you another integer), but there is another possibility. And in fact, there is already a good example of it in the STL.

**Arithmetic the chrono way**

It is possible for the type produced by subtracting `T`s to be something other than `T`. This is what happens in `chrono` with `time_point` and `duration`. Subtracting two `time_point`s yields a `duration`. It is meaningless – and, therefore, illegal – to add two `time_point`s, but `duration`s may be added both to themselves and to `time_point`s.

In mathematical language, `chrono` models a one-dimensional affine space. (Thanks to Gašper Ažman for that particular insight!)

This is also the case with `min_unused`: `T` is our `time_point` equivalent, and the result of `distance` is our `duration` equivalent, suggesting that we can pull out the difference type as a template parameter:

```
// version 7: separate the value and difference types
template <typename ForwardIt,
          typename T = typename std::iterator_traits<ForwardIt>::value_type,
          typename DiffT = typename std::iterator_traits<ForwardIt>::difference_type>
inline T min_unused(ForwardIt first, ForwardIt last, T init = T{})
{
  while (first != last) {
    T m = init + DiffT{(std::distance(first, last) + 1) / 2};
    auto p = std::partition(first, last,
      [&] (const T& i) { return i < m; });
    if (DiffT{std::distance(first, p)} == m - init) {
      init = m;
      first = p;
    } else {
      last = p;
    }
  }
  return init;
}
```

The advantage here is that we can now use `min_unused` with any code that has different value and difference types, like `chrono`:

```
// a vector of time_points, 1s apart
constexpr auto VECTOR_SIZE = 10;
std::vector<std::chrono::system_clock::time_point> v;
auto start_time = std::chrono::system_clock::time_point{};
std::generate_n(std::back_inserter(v), VECTOR_SIZE,
  [&] () {
    auto t = start_time;
    start_time += 1s;
    return t;
  });
std::shuffle(v.begin(), v.end(), gen);
v.resize(3 * VECTOR_SIZE / 4);
for (auto i : v)
  std::cout
    << std::chrono::duration_cast<std::chrono::seconds>(i.time_since_epoch()).count()
    << '\n';

// we can now find the minimum time_point not present
auto u = min_unused<
  decltype(v.begin()), decltype(start_time), std::chrono::seconds>(
    v.begin(), v.end());
std::cout << "Min unused: "
  << std::chrono::duration_cast<std::chrono::seconds>(u.time_since_epoch()).count()
  << '\n';
```

See this code on wandbox.

This concludes our exercise in developing a generic algorithm - at least, for now. Good luck with your own experiments!

`make_*` function. But it's not quite as straightforward as it seems.

Imagine we have a simple type that will tell us when it's copied or moved, just for testing.

```
struct S
{
S() = default;
S(const S&) { std::cout << "copy\n"; }
S(S&&) { std::cout << "move\n"; }
};
```

And likewise a very simple template like so:

```
template <typename T>
struct Foo
{
  Foo(const T& t_) : t(t_) {}
  Foo(T&& t_) : t(std::move(t_)) {}
private:
  T t;
};
```

Note that we provide two forms of constructor to deal with both rvalues and lvalues being passed in. With C++14, prior to class template deduction, we would write a `make_foo` function to instantiate this template, something like this:

```
template <typename T>
auto make_foo(T&& t)
{
  return Foo<std::decay_t<T>>(std::forward<T>(t));
}
```

And call it like so:

```
int main()
{
// rvalue: move
auto f1 = make_foo(S());
// lvalue: copy
S s;
auto f2 = make_foo(s);
}
```

The important thing here is that the template argument to `make_foo` is deduced, and the template argument to `Foo` is not (it cannot be, in C++14). Furthermore, because the template argument to `make_foo` is deduced, `make_foo`'s argument is a *forwarding reference* rather than an rvalue reference, and hence we use perfect forwarding to pass it along to the appropriate constructor for `Foo`.

So far so good. Now, with C++17, class template deduction is available to us, so we can get rid of `make_foo`. Now our `main()` function looks like this:

```
int main()
{
// rvalue: move?
Foo f3{S()};
// lvalue: copy?
S s;
Foo f4{s};
}
```

Here's the unintuitive part: the template arguments are being deduced now. But in that case, doesn't that mean that `Foo`'s constructor argument is being deduced? Which would mean that what looks like `Foo`'s rvalue constructor is actually a constructor with a forwarding reference that will outcompete the other constructor? If so, that would be bad - we could end up *moving from an lvalue*!

I woke up the other morning with this worrying thought and had to investigate. The good news is that even though the code looks like it would break, it doesn't.

So in fact, yes: although `Foo`'s template argument is being deduced, the crucial thing is that `Foo`'s constructor still takes an rvalue reference - *not* a forwarding reference - at the point of declaration. According to 16.3.1.8 [over.match.class.deduct], the compiler forms an overload set of function templates matching the signatures of the available constructors, using the correct types. In other words, it does something that we could not do ourselves: it forms a function template, for the purposes of deduction, whose argument is an rvalue reference rather than a forwarding reference.

As is often the case in C++, one needs to be careful to properly distinguish things. It is very easy to get confused over rvalue references and forwarding references, because they look the same. The difference is that forwarding references must be deduced... and in this case, even though it looks like it's deduced, it isn't.

Edit: Reddit user mps1729 points out that indeed, neither of the implicitly generated functions is using a forwarding reference, as clarified in 17.8.2.1/3 [temp.deduct.call]. Thanks for the clarification!

**The keynotes**

This year’s keynotes focused on what C++ can learn from other languages – Rust, Haskell, and D.

Rust: Hack Without Fear! – Niko Matsakis

The Rust keynote was really good, showcasing Rust’s ownership model that allows it to achieve memory safety and freedom from data races. Pretty cool stuff, and something that C++ is just beginning to take baby steps towards.

Competitive Advantage with D – Ali Çehreli

The D keynote was very underwhelming – I didn’t come away with a good sense of where D is better than C++ and why its features are desirable.

Haskell Taketh Away: Limiting Side Effects for Parallel Programming – Ryan Newton

The Haskell keynote was probably my favourite of the trio, and the presenter did a really good job of not going down the Haskell rabbit holes so common in such talks. He didn’t try to explain monads or anything like that. He didn’t explain how `return` is different. He did explain real usage of Haskell, the memory model, and the choices for achieving side effects, and he was obviously an experienced speaker. I know some Haskell, and I was surprised to see that he put up some code on slides that was pretty advanced for a C++ crowd of mostly Haskell novices. But I think he did a great job of keeping the talk concrete, and some of the compiler technology and directions he talked about toward the end of the talk were pretty amazing.

**Selected talks**

C++11’s Quiet Little Gem: `<system_error>` – Charley Bay

This was the first talk of the week after the opening keynote. Charley is a high-energy and entertaining speaker, and this talk really opened my eyes to the goodness that is `<system_error>` and how to use it. Around 350 fast-moving slides! I’ll definitely be using and recommending `<system_error>` in future.

The Mathematical Underpinnings of Promises in C++ – David Sankel

This was the kind of talk you only find at C++Now and it was great. David engages the audience really well on esoteric topics and showed how a rigorous mathematical treatment of a subject can help ensure that an API is appropriately powerful, but also that a good API shouldn’t just be a simple reflection of the mathematical operations.

He followed up this talk with its complement, a practical talk about using the API (Promises in C++: The Universal Glue for Asynchronous Programs). I’m sold: `std::future` is crippled, and a library like his promise library is the right way to deal with asynchronous control flow.

The ‘Detection Idiom’: A Better Way to SFINAE – Marshall Clow

This was a repeat of Marshall’s recent ACCU talk, with the opportunity for the C++Now audience to ask plenty of questions. The detection idiom (developed by Walter Brown in various talks and papers over the last few years) is a pretty simple way to do one common kind of metaprogramming operation: detecting whether a given type supports something. I feel like this is exactly the kind of thing which is going to help mainstream C++ take advantage of metaprogramming recipes: fairly uncomplicated techniques that just make code better.

The Holy Grail: A Hash Array Mapped Trie For C++ – Phil Nash

Phil presented a followup to his Meeting C++ and ACCU talk “Functional C++ for Fun and Profit“. He showed a very cool persistent HAMT implementation. There was plenty of hard data and code to digest in this talk and like all the best talks, I walked away wanting to play with things I’d just seen. He also noted that “trie” is properly pronounced “tree” (although many people say “try” which is good for disambiguation) and that he’d be sticking to “tree” not only for that reason, but because he didn’t want to be known for both “try” and Catch!

**Other thoughts**

The annual week in Aspen for C++Now is the highlight of my professional year. The program this year was very strong and I’m sad that I had to miss talks and make hard decisions about what to see in any given slot. But the talks are only half of this conference: at least as important is the socialising and discussion that goes on. C++Now is deliberately scheduled with long sessions and long breaks between sessions so that attendees can chat, argue, and collaborate. At other conferences I learn a handful of new things over the course of a week. At C++Now I learn dozens of new techniques, tricks and titbits at the rate of several per hour. Just a few examples of things that happen at C++Now:

- Library in a Week: Jeff Garland corrals whoever wants to get involved, and over the week people implement really cool things. It’s like a hackathon within a conference.
- Having a problem with your benchmarking? Take your laptop to the hotel bar where Chandler Carruth will take a look at your code and tell you exactly how the optimizer sees it, and what you can do to make it twice as fast.
- The student volunteer program is simply amazing and inspiring. Just look at the current crop and alumni and you see young men and women who are incredibly smart and doing amazing things with C++. Figuring out new tricks and ways to stretch the language that will become codified idiom, making the library you use in a couple of years faster, cleaner, and nicer to use.

Finally, my own talk (with Jason Turner) – entitled “constexpr ALL the things!” – was very well received. I’m heading home tomorrow, but the experiences from Aspen will go back with me, leading to improvements in my own code and that of my work colleagues, making lots of things better. And the friendships I’ve made and refreshed during this week will continue that flow of information and improvement through the year.

`<chrono>` and `<random>` functionality, with cryptarithmetic interludes…

At CppCon this year there were several good talks about randomness and time calculations in C++. On randomness: Walter Brown’s What C++ Programmers Need to Know About Header `<random>` and Cheinan Marks’ I Just Wanted a Random Integer! were both excellent talks. And Howard Hinnant gave several great talks: A `<chrono>` Tutorial, and Welcome to the Time Zone, a followup to his talk from last year, A C++ Approach to Dates and Times.

**CHRONO + RANDOM = HORRID ?**

That’s perhaps a little unfair, but recently I ran into the need to compute a random period of time. I think this is a common use case for things like backoff schemes for network retransmission. And it seemed to me that the interaction of `<chrono>` and `<random>` was not quite as good as it could be:

```
system_clock::duration minTime = 0s;
system_clock::duration maxTime = 5s;
uniform_int_distribution<> d(minTime.count(), maxTime.count());
// 'gen' here is a Mersenne twister engine
auto nextTransmissionWindow = system_clock::duration(d(gen));
```

This code gets more complex when you start computing an exponential backoff. Relatively straightforward, but clumsy, especially if you want a floating-point base for your exponent calculation: `system_clock::duration` has an integral representation, so in all likelihood you end up having to cast multiple times, using either `static_cast` or `duration_cast`. That’s a bit messy.

I remembered some code from another talk: Andy Bond’s AAAARGH!? Adopting Almost Always Auto Reinforces Good Habits!? in which he presented a function to make a uniform distribution by inferring its argument type, useful in generic code. Something like the following:

```
template <typename A, typename B = A,
          typename D = std::uniform_int_distribution<A>>
inline auto make_uniform_distribution(const A& a,
                                      const B& b = std::numeric_limits<B>::max())
  -> std::enable_if_t<std::is_integral<A>::value, D>
{
  return D(a, b);
}
```

Of course, the standard also provides `uniform_real_distribution`, so we can overload the function for real numbers:

```
template <typename A, typename B = A,
          typename D = std::uniform_real_distribution<A>>
inline auto make_uniform_distribution(const A& a,
                                      const B& b = B{1})
  -> std::enable_if_t<std::is_floating_point<A>::value, D>
{
  return D(a, b);
}
```

And with these two in hand, it’s easy to write a `uniform_duration_distribution` that uses the correct distribution for its underlying representation (using a home-made type trait to constrain it to `duration` types).

```
template <typename T>
struct is_duration : std::false_type {};
template <typename Rep, typename Period>
struct is_duration<std::chrono::duration<Rep, Period>> : std::true_type {};

template <typename Duration,
          typename = std::enable_if_t<is_duration<Duration>::value>>
class uniform_duration_distribution
{
public:
  using result_type = Duration;

  explicit uniform_duration_distribution(
      const Duration& a = Duration::zero(),
      const Duration& b = Duration::max())
    : m_a(a), m_b(b)
  {}

  void reset() {}

  template <typename Generator>
  result_type operator()(Generator& g)
  {
    auto d = make_uniform_distribution(m_a.count(), m_b.count());
    return result_type(d(g));
  }

  result_type a() const { return m_a; }
  result_type b() const { return m_b; }
  result_type min() const { return m_a; }
  result_type max() const { return m_b; }

private:
  result_type m_a;
  result_type m_b;
};
```

Having written this, we can once again overload `make_uniform_distribution` to provide for `duration` types:

```
template <typename A, typename B = A,
          typename D = uniform_duration_distribution<A>>
inline auto make_uniform_distribution(const A& a,
                                      const B& b = B::max()) -> D
{
  return D(a, b);
}
```

And now we can compute a random `duration` more expressively and tersely and, I think, in the spirit of the functionality that already exists in `<chrono>` for manipulating `duration`s.

```
auto d = make_uniform_distribution(0s, 5000ms);
auto nextTransmissionWindow = d(gen);
```

**CHRONO + RANDOM = DREAMY**

I leave it as an exercise for the reader to solve these cryptarithmetic puzzles. As for the casting problems, for now, I’m living with them.

`find` correctly. (Although even such a trivial-looking algorithm is based on a thorough understanding of iterator categories.)
**Uncovering the overlooked**

But there are some algorithms that are non-trivial, and some are important building blocks. Take `inplace_merge`. For brevity, let’s consider the version that just uses `operator<` rather than being parametrized on the comparison; the one easily generalizes to the other in a way that is not important to the algorithm itself.

```
template <typename BidirectionalIterator>
void inplace_merge(BidirectionalIterator first,
                   BidirectionalIterator middle,
                   BidirectionalIterator last);
```

It merges two consecutive sorted ranges into one sorted range. That is, given a range [first, last) where the subranges [first, middle) and [middle, last) are each sorted, we get a result occupying the same space: [first, last) is sorted, and the new range is a permutation of the original ranges. In addition, the standard states a few additional constraints:

- `inplace_merge` is stable – that is, the relative order of equivalent elements is preserved
- it uses a `BidirectionalIterator`, which shall be `ValueSwappable` and whose dereferent (is that a word?) shall be `MoveConstructible` and `MoveAssignable`
- when enough additional memory is available, it performs exactly (`last-first-1`) comparisons. Otherwise an algorithm with complexity `N log(N)` (where `N = last-first`) may be used

**Avenues of enquiry**

Leaving aside the possible surprise of discovering that an STL algorithm may allocate memory, some thoughts spring to mind immediately:

- Why does `inplace_merge` need a `BidirectionalIterator`?
- *How much* memory is required to achieve O(n) performance? Is a constant amount enough?

And to a lesser extent perhaps:

- Why are `merge` and `inplace_merge` not named the same way as other algorithms, where the more normal nomenclature might be `merge_copy` and `merge`?
- What is it with the algorists' weasel-word "in-place"?

**First thoughts about the algorithm**

It seems that an O(n log n) algorithm should be possible on average, because in the general case, simply sorting the entire range produces the desired output. The sort has to be stable, though, which means something like merge sort – which leads us down a recursive rabbit hole. Hm.

At any rate, it's easy to see how to achieve `inplace_merge` with no extra memory by walking iterators through the ranges:

```
template <typename ForwardIt>
void naive_inplace_merge(
    ForwardIt first, ForwardIt middle, ForwardIt last)
{
  while (first != middle && middle != last) {
    if (*middle < *first) {
      std::iter_swap(middle, first);
      auto i = middle;
      std::rotate(++first, i, ++middle);
    } else {
      ++first;
    }
  }
}
```

After a swap, the element displaced from the first range sits at the front of the second range, and the call to `rotate` fixes up the second range to be ordered again. From there we proceed as before on the remainders of the two ranges.

This algorithm actually conforms to the standard! It has O(n) comparisons, uses no extra memory, and has the advantage that it works on `ForwardIterator`! But unfortunately, it's O(n²) overall in operations because, of course, `rotate` is O(n). So how can we do better?

**Using a temporary buffer**

If we have a temporary buffer available that is equal in size to the smaller of the two ranges, we can move the smaller range to it, move the other range up if necessary, and perform a "normal" merge of the two ranges into the original space:

```
template <typename BidirIt>
void naive_inplace_merge2(
    BidirIt first, BidirIt middle, BidirIt last)
{
  using T = typename std::iterator_traits<BidirIt>::value_type;
  auto d1 = std::distance(first, middle);
  auto d2 = std::distance(middle, last);
  auto n = std::min(d1, d2);
  auto tmp = std::make_unique<char[]>(n * sizeof(T));
  T* begint = reinterpret_cast<T*>(tmp.get());
  T* endt = begint + n;
  if (d1 <= d2)
  {
    std::move(first, middle, begint);
    std::merge(begint, endt, middle, last, first);
  }
  else
  {
    std::move(middle, last, begint);
    auto i = std::move_backward(first, middle, last);
    std::merge(i, last, begint, endt, first);
  }
}
```

This is essentially the algorithm used by STL implementations when buffer space is available. And this is the reason why `inplace_merge` requires `BidirectionalIterator`: because `move_backward` does.

(This isn't quite optimal: the `std::move_backward` can be mitigated with reverse iterators and predicate negation, but the `BidirectionalIterator` requirement remains. Also, strictly speaking, the call to `std::merge` is undefined behaviour here because one of the input ranges overlaps the output range, but we know the equivalent loop is algorithmically safe.)

Provisioning of the temporary buffer is also a little involved, because we don't know that elements in the range are default constructible (and perhaps we wouldn't want to default-construct our temporaries anyway). So to deal correctly with non-trivial types here, `std::move` should actually be a loop move-constructing values. And when `std::inplace_merge` is used as a building block for e.g. `std::stable_sort`, it would also be nice to minimize buffer allocation rather than having an allocation per call. Go look at your favourite STL implementation for more details.
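To make the non-trivial-type handling concrete: a sketch (not the actual STL code) of the `d1 <= d2` branch using C++17's `std::uninitialized_move` to move-construct temporaries into raw storage, and `std::destroy` to end their lifetimes. It assumes the first range is the smaller (the mirror branch is omitted) and that default `new[]` alignment suffices for `T`:

```cpp
#include <algorithm>  // std::merge
#include <iterator>   // std::make_move_iterator, std::iterator_traits
#include <memory>     // std::make_unique, std::uninitialized_move, std::destroy
#include <string>
#include <vector>

template <typename BidirIt>
void buffered_merge_nontrivial(BidirIt first, BidirIt middle, BidirIt last)
{
  using T = typename std::iterator_traits<BidirIt>::value_type;
  auto n = std::distance(first, middle);  // assume first range is the smaller
  auto raw = std::make_unique<unsigned char[]>(n * sizeof(T));
  T* begint = reinterpret_cast<T*>(raw.get());
  // move-construct the temporaries: no default construction, and no
  // assignment to objects that don't exist yet
  T* endt = std::uninitialized_move(first, middle, begint);
  std::merge(std::make_move_iterator(begint), std::make_move_iterator(endt),
             std::make_move_iterator(middle), std::make_move_iterator(last),
             first);
  std::destroy(begint, endt);  // explicitly end the temporaries' lifetimes
}
```

The same caveat about formally-UB overlapping ranges in `std::merge` applies here as in the text above.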

**Thinking further**

The literature contains more advanced algorithms for merging if a suitably-sized buffer is not available: the basis for the STL's choice is covered in *Elements of Programming* chapter 11, and in general the work of Dudzinski & Dydek and of Kim & Kutzner seems to be cited a lot.

But I knew nothing of this research before tackling the problem, and attempted to solve it requiring just `ForwardIterator`.

I spent a couple of evenings playing with how to do `inplace_merge`. I covered a dozen or more A4 sheets of squared paper with diagrams of algorithms evolving. I highly recommend this approach! After a few hours of drawing and hacking I had a really good idea of the shape of things. Property-based testing came in very handy for breaking my attempts, and eventually led me to believe that a general solution on the lines I was pursuing would either involve keeping track of extra iterators or, equivalently, require extra space. Keeping track of iterators seemed a messy approach, so an extra-space approach seemed warranted.

How much extra space? Consider the "worst case": every element of the second range is less than every element of the first, so nothing starts in its final position. Say the first range (`x`) has m elements and the second (`y`) has n. If m ≤ n, we need extra space to hold all m elements of `x`; if m > n, we likewise need extra space for the n elements of `x` displaced while the `y` values are moved into place. Either way, the number of units of extra space we need is `min(n, m)`.

As we move elements of `x` into temporary storage, we can see that in general at each stage of the algorithm the output so far is merged, some leading positions of the `x` range hold moved-from values, and the displaced values of `x` sit, still in order, in temporary storage.

The temporary storage here is a queue: we always push on to the end and pop from the beginning, since the elements in it start, and remain, ordered. Since we know an upper bound on the number of things in the queue at any one time, it can be a ring buffer (recently proposed) over a regular array of values.
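A minimal fixed-capacity ring-buffer queue over contiguous storage might look like this — my own sketch for illustration, not the proposed standard-library interface (and for real use you'd want raw storage rather than default-constructed `T`s, per the buffer discussion above):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Minimal fixed-capacity FIFO: push at the tail, pop at the head,
// indices wrap modulo the capacity. Enough for a bounded queue whose
// maximum size is known up front.
template <typename T>
class ring_queue {
public:
  explicit ring_queue(std::size_t capacity) : buf_(capacity) {}
  bool empty() const { return size_ == 0; }
  void push(T v) {  // precondition: size() < capacity
    buf_[(head_ + size_) % buf_.size()] = std::move(v);
    ++size_;
  }
  T& front() { return buf_[head_]; }
  T pop() {  // precondition: !empty()
    T v = std::move(buf_[head_]);
    head_ = (head_ + 1) % buf_.size();
    --size_;
    return v;
  }
private:
  std::vector<T> buf_;  // default-constructs T: fine for a sketch
  std::size_t head_ = 0, size_ = 0;
};
```

With capacity 2, pushing 1 and 2, popping, pushing 3, then popping twice yields 1, 2, 3 in order, with the third element reusing the first slot.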

**Sketching the start**

From this, we can start sketching out an algorithm:

1. Allocate a buffer of size `min(m, n)` - call it `tmp`.
2. We'll walk the iterators along the `x` (`first`) and `y` (`middle`) ranges.
3. The output iterator `o` will start at `first`.
4. The next `x` to consider will either be in place in the `x` range, or (if `tmp` is not empty) in `tmp` - call it `xlow`.
5. If `*y` < `*xlow`, move `*x` to `tmp`, move `*y` to `o`, inc `y`.
6. Otherwise, if `*xlow` is in `tmp`, move `*x` to `tmp` and `*xlow` from `tmp` to `o`.
7. Inc `o`, inc `x`.
8. If `y` < `last` and `o` < `middle`, goto 4.
9. Deal with the end(s) of the ranges.

**Dealing with the end**

This gets us as far as exhausting the smaller range: after this, we will be in one of two situations.

*Situation 1*. We have exhausted the `y` range, with some values of `x` in temporary storage and the rest still in place before `middle`.

To fix up this situation, we can repeatedly swap the `tmp` range with the equivalent `x` range until we reach `middle`, and then simply move the remaining `tmp` values into place.

I originally wrote a loop repeatedly swapping the values in `tmp` right up to the end; but I realised that would involve swapping a moved-from object, which would be wrong (it might work… until it doesn't). Moved-from objects should either be destroyed or made whole (assigned to); nothing else.

*Situation 2*. The other possibility is that we have exhausted the `x` range, with its remaining values in temporary storage.

To fix up this situation, we can just do a regular `merge` on the remaining `y` range and `tmp`, outputting starting at `middle`. (With the same proviso as before about undefined behaviour with overlapping ranges.) We know that it will be safe to do a loop equivalent to `merge`, because we have exactly the required amount of space before the remaining `y` values to fit the contents of `tmp`. This is the same as the STL's normal buffered merge strategy.
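The sketch and the two fixups can be put together as follows. This is my own hedged, unoptimized rendering of the steps above, not the implementation the post links to: for clarity the queue is a `std::vector` plus a head index rather than a real ring buffer, so it may occupy more than `min(m, n)` slots (swapping through a true ring buffer, as described for situation 1, would restore the space bound):

```cpp
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

template <typename ForwardIt>
void forward_inplace_merge(ForwardIt first, ForwardIt middle, ForwardIt last)
{
  using T = typename std::iterator_traits<ForwardIt>::value_type;
  std::vector<T> tmp;    // FIFO queue of displaced x values
  std::size_t head = 0;  // pop index: the live queue is tmp[head..tmp.size())

  ForwardIt x = first, y = middle, o = first;
  // Main loop: o and x advance together, so *x is always the element at *o.
  while (y != last && o != middle) {
    bool x_in_tmp = head != tmp.size();
    if (*y < (x_in_tmp ? tmp[head] : *x)) {
      tmp.push_back(std::move(*x));  // displace *x into the queue
      *o = std::move(*y);
      ++y;
    } else if (x_in_tmp) {
      tmp.push_back(std::move(*x));  // displace *x, then place the queued xlow
      *o = std::move(tmp[head]);
      ++head;
    }  // else *x is smallest and already in place (o == x)
    ++o;
    ++x;
  }

  if (y == last) {
    // Situation 1: y exhausted. Move queued values into place up to middle,
    // re-queueing the displaced in-place x values, then drain the queue.
    while (head != tmp.size() && o != middle) {
      tmp.push_back(std::move(*o));
      *o = std::move(tmp[head]);
      ++head;
      ++o;
    }
  } else {
    // Situation 2: the remaining x values are all in the queue. Merge the
    // queue with the rest of y into the space from middle onwards.
    while (head != tmp.size() && y != last) {
      if (*y < tmp[head]) {
        *o = std::move(*y);
        ++y;
      } else {
        *o = std::move(tmp[head]);
        ++head;
      }
      ++o;
    }
    // any y values left over are already in their final positions
  }
  for (; head != tmp.size(); ++head, ++o)
    *o = std::move(tmp[head]);  // drain whatever remains in the queue
}
```

Note the invariant that makes the fixups safe: in the main loop `o` and `x` advance in lockstep, so by the time either range is exhausted the moved-from gap ahead of `o` is exactly the size of the queue.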

**Final thoughts**

I tackled this exercise from scratch, knowing nothing about actual implementations of `inplace_merge`. This algorithm does some extra housekeeping under the hood, but:

- it does the minimum number of comparisons
- each element is moved at most twice: into `tmp` and out again
- it needs only `ForwardIterator`
Optimization and benchmarking under differing conditions of range size, comparison and move cost is left as an exercise to the reader...

I cannot recommend *Elements of Programming* enough. I am partway through reading it; after this exercise I skipped to chapter 11 to see what it said. Every time I dive into the STL algorithms, I am re-impressed by the genius of Alex Stepanov: Paul McJones' recent talk The Concept of Concept explains this well, in particular the key role of concepts in the STL in service of algorithmic purity. Alex knew about concepts from the start: it's taken C++ over two decades to catch up.

After doing this work, I discovered a recent proposal that calls for weakening the iterator categories of `inplace_merge` and related algorithms.

An implementation of this algorithm is on github. It's just a sketch, written for clarity rather than optimality. This has been a fun exercise.

Wikipedia says: “In abstract algebra, a branch of mathematics, a monoid is an algebraic structure with a single associative binary operation and an identity element.”

Wolfram says: “A monoid is a set that is closed under an associative binary operation and has an identity element I ∈ S such that for all a ∈ S, Ia = aI = a.”

Mathematics has to be precise, which is why it uses jargon. But what do these concise definitions mean in everyday language? Consider adding up numbers.

- The set is the whole numbers (and we need zero): 0, 1, 2, 3, etc.
- The associative binary operation is addition.
  - “binary” just means it’s a thing you do to two numbers.
  - “associative” means it doesn’t matter how you group things. 1 + 2 + 3 gives the same answer whether you add 1 and 2 first and then add 3, or add 2 and 3 first and then add the answer to 1.
- The set being “closed” under addition means that when you add two numbers you get another number – you don’t get some other kind of thing. (You might think this is obvious, but in maths it has to be stated.)
- The identity element is 0 – the thing that doesn’t make any difference when you add it. Anything plus zero is itself.

So adding whole numbers is a monoid. A mathematician would say that the non-negative integers form a monoid under addition. The important thing is that the numbers aren’t a monoid on their own; it’s the combination of the set (0, 1, 2, 3…) *and* the operation (+) that makes the monoid. If we chose another operation, we could get another monoid. Think about multiplication, for instance.

It turns out that lots of things behave the same way as addition on numbers, which is why the notion of a monoid is very useful to mathematicians and computer scientists.
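In code, that usefulness shows up as genericity: one fold works for any monoid, given its operation and identity. A minimal sketch (the name `mconcat` is borrowed from Haskell; this is not a standard C++ facility):

```cpp
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// A monoid is (set, associative binary operation, identity). Given the
// operation and the identity, folding a sequence of values is generic.
template <typename T, typename Op>
T mconcat(const std::vector<T>& xs, T identity, Op op)
{
  return std::accumulate(xs.begin(), xs.end(), identity, op);
}
```

The same function then sums under (+, 0), multiplies under (×, 1), or concatenates strings under (+, ""): three different monoids, one fold.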

1. Calling `rand()` is lame because it’s an LCG with horrible randomness properties, and we can do better. And if you’re not calling `rand()`, there’s no reason to call `srand()`.
2. Using `time(NULL)` to seed your RNG is lame because it doesn’t have enough entropy. It’s only at a second resolution, so in particular, starting multiple processes (e.g. bringing up a bunch of servers) at the same time is likely to seed them all the same.
3. No, `rand()` isn’t good enough even for simple uses, and it’s easy to do the right thing these days. The lower order bits of `rand()`‘s output are particularly non-random, and odds are that if you’re using `rand()` you’re also using `%` to get a number in the right range. See item 6.
4. In C++14 `random_shuffle()` is deprecated, and it’s removed in C++17, which ought to be reason enough. If you need more reason, one version of it is inconvenient to use properly (it uses a Fisher-Yates/Knuth shuffle so takes an RNG that has to return random results in a shifting range) and the other version of it can use `rand()` under the hood. See item 1.
5. `default_random_engine` is implementation-defined, but in practice it’s going to be one of the standard generators, so why not just be explicit and cross-platform-safe (hint: item 10)? Microsoft’s default is good, but libc++ and libstdc++ both use LCGs as their default at the moment. So not much better than `rand()`.
6. It is overwhelmingly likely that whatever RNG you use, it will output something in a power-of-two range. Using `%` to get this into the right range probably introduces bias. Re item 3, consider a canonical simple use: rolling a d6. No power of two is divisible by 6, so inevitably, `%` will bias the result. Use a distribution instead. STL (and others) have poured a lot of time into making sure they aren’t biased.
7. `random_device` is standard, easy to use, and should be high quality randomness. It may not be very well-performing, which is why you probably want to use it for seeding only. But you do want to use it (mod item 8).
8. Just know your platform. It might be fine in desktop-land, but `random_device` isn’t always great. It’s supposed to be nondeterministic and hardware based if that’s available… trust but verify, as they say.
9. Not handling exceptions is lame. And will bite you. I know this from experience with `random_device` specifically.
10. The Mersenne twisters are simply the best randomness currently available in the standard.
11. Putting `mt19937` on the stack: a) it’s large (~2.5k) and b) you’re going to be initializing it each time through. So probably not the best. See item 17 for an alternative.
12. You’re just throwing away entropy if you don’t seed the generator’s entire state. (This is very common, though.)
13. Simply, `uniform_int_distribution` works on a closed interval (as it must – otherwise it couldn’t produce the maximum representable value for the given integral type). If you forget this, it’s a bug in your code – and maybe one that takes a while to get noticed. Not good.
14. Forgetting `ref()` around your generator means you’re copying the state, which means you’re probably not advancing the generator like you thought you were.
15. `seed_seq` is designed to seed RNGs, it’s that simple. It tries to protect against poor-quality data from `random_device` or whatever variable-quality source of entropy you have.
16. Not considering thread safety is always lame. Threads have been a thing for quite a while now. `thread_local` is an easy way to get “free” thread safety for your generators.
17. You should be using a Mersenne twister (item 10) so just use the right thing for `max()`. Job done.
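Putting several of those items together, a reasonable “roll a d6” might look like the sketch below: `mt19937` seeded from `random_device` via `seed_seq`, made `thread_local`, and drawn through a distribution instead of `%`. (Eight seed words is a compromise; fully following item 12 would mean filling all 624 words of `mt19937` state.)

```cpp
#include <random>

// One generator per thread (item 16), seeded once from random_device
// (items 7 and 15), with a distribution instead of % (items 6 and 13).
int roll_d6()
{
  thread_local std::mt19937 gen = [] {
    std::random_device rd;  // can throw - worth handling in real code (item 9)
    std::seed_seq seed{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};
    return std::mt19937{seed};
  }();
  // closed interval [1, 6]: uniform_int_distribution includes both ends
  return std::uniform_int_distribution<int>{1, 6}(gen);
}
```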

If you want more, see `rand()` Considered Harmful (a talk by Stephan T. Lavavej), or The bell has tolled for `rand()` (from the Explicit C++ blog), or see Melissa O’Neill’s Reddit thread, her talk on PCG, and the associated website.

And of course, cppreference.com.
