Thoughts on Default Construction – Why is a raven like a writing desk?

What does default construction mean? Why do we write default constructors? When and why should we require them? I’ve been pondering these questions lately.

One of the great things that C++ gets right is that it grants programmers the ability to create types that behave like built-in types. Many languages don’t offer this feature, treating the built-in types as special in some way, e.g. limiting us to defining types with reference semantics, or perhaps preventing operators from working with user-defined types. But C++ allows us to define types with value semantics that work, syntactically and semantically, in an identical way to machine types like float and int.

Regular types: “When in doubt, do as the ints do.”

The concept of a regular type is probably familiar to anyone who has watched a C++ conference video or read a popular C++ blog within the past few years. It is an attempt to formalize the semantics of user-defined types in order to match built-in types. Alexander Stepanov and James C. Dehnert wrote in their paper Fundamentals of Generic Programming:

“Since we wish to extend semantics as well as syntax from built-in types to user types, we introduce the idea of a regular type, which matches the built-in type semantics, thereby making our user-defined types behave like built-in types as well.”

And they go on to define the fundamental operations that can be applied to a regular type:

default construction
copy construction
destruction
assignment
equality & inequality
ordering

The reason for choosing these operations is to provide a computational basis that supports interoperation of types with data structures (such as STL containers) and algorithms. The first four of these operations have default definitions in C++ for any type, user-defined or otherwise.

As a computational basis for data structures and algorithms, it seems that all of these operations serve a purpose in code — except default construction. Default construction can be used in specifying semantics, but it is not needed by data structures and algorithms. In From Mathematics to Generic Programming, chapter 10.3, Alexander Stepanov and Daniel Rose define regular types without the operation of default construction, then go on to say:

“Having a copy constructor implies having a default constructor, since T a(b); should be equivalent to T a; a = b;.”

This is a fine and necessary axiom for the semantics of regular types, but we never actually need to write that in C++. We would never write “T a; a = b;”. Instead, we would write “T a(b);”, invoking either the copy or the move constructor.
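As a minimal sketch of that axiom, the following compares the two spellings for a small regular type. The type Celsius and both function names are hypothetical, chosen only for illustration; they do not come from either book.

```cpp
#include <cassert>

// Hypothetical regular type, used only to illustrate the axiom.
struct Celsius {
    double degrees;
    friend bool operator==(const Celsius& a, const Celsius& b) {
        return a.degrees == b.degrees;
    }
};

// The form we actually write: construct directly from b.
Celsius copy_direct(const Celsius& b) {
    Celsius a(b);
    return a;
}

// The form the axiom describes: default construct, then assign.
Celsius copy_via_assignment(const Celsius& b) {
    Celsius a;   // default construction: a.degrees holds no meaningful value yet
    a = b;       // assignment gives a its value
    return a;
}

// Both spellings should yield equal values for any well-formed b.
void check_axiom(const Celsius& b) {
    assert(copy_direct(b) == copy_via_assignment(b));
}
```

The second function exists only to state the equivalence; in real code the first form is the one to write.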

Default construction: inherited from C?

C++ has its roots in C, particularly when considering built-in types. In the Unix tradition, C is famously terse, parsimonious, and unforgiving of mistakes. C++ comes from the same stock with our maxim, “don’t pay for what you don’t use.”

We all recognize the following as undefined behaviour (applicable to C and C++ alike):

int main()
{
  int x;
  return x;
}

When we wrote “int x;” the compiler did nothing to initialize x, and quite rightly so. This is, in fact, precisely the meaning of default construction according to the axioms of regular types — to do as the ints do. Alexander Stepanov and Paul McJones use the phrase “partially formed” to convey this in Elements of Programming:

“An object is in a partially formed state if it can be assigned to or destroyed. For an object that is partially formed but not well-formed, the effect of any procedure other than assignment (only on the left side) and destruction is not defined.”

“A default constructor takes no arguments and leaves the object in a partially formed state.”
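A small sketch of what those two quotations permit, assuming a hypothetical Coord aggregate and a hypothetical function offset_for: between default construction and the first assignment, the object is only ever assigned to, never read.

```cpp
#include <cassert>

// Hypothetical aggregate used for illustration.
struct Coord {
    int x;
    int y;
};

Coord offset_for(bool mirrored) {
    Coord c;                 // partially formed: c.x and c.y must not be read yet
    if (mirrored) {
        c = Coord{-3, 4};    // assignment on the left side is allowed
    } else {
        c = Coord{3, 4};
    }
    return c;                // c has been assigned on every path
}

void check_offsets() {
    assert(offset_for(true).x == -3);
    assert(offset_for(false).x == 3);
}
```

If the function returned without assigning, the only other permitted operation on c would be its destruction.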

On first encountering this definition some years ago, I experienced some discomfort; this was not my mental model of default construction for most of my previous programming life. I thought of default construction as a way to completely form an object in some kind of default state. But the more I thought about it, the more I appreciated this new point of view.

As a junior programmer, my notion of a default state was not at all well defined. Many of the types I’ve written over the years used default constructors as a crutch. Some used two-phase construction, ostensibly in the name of performance, but more likely because it was easier to write quickly. Commonly, a default constructor would set sentinel “invalid” values that polluted use of the type, requiring checks in other methods or at call sites. If I was lucky, “default construction” would establish the type’s invariants.

I didn’t have a rigorous idea of what it meant to make a type, nor was I able to formulate a solid argument or semantics for my mental model of default construction — because I wasn’t writing default constructors. I was writing nullary (zero-argument) constructors, and they just don’t make sense for all, or even many, types.

Aside: partially formed == moved-from?

This lack of clarity seems to echo the current situation with moved-from objects. Setting aside any arguments about destructive move, the current standard says that the state of a moved-from object is “valid but unspecified.” It does not mention partially formed objects.

But in my view, the right way to think about moved-from objects is to consider them as having this partially formed state. A moved-from object may only be assigned to or destroyed; nothing else is guaranteed. Where it gets a bit murky is the “guaranteed” part, because there are some types for which the ability to destroy them necessarily entails the ability to call other methods. Containers spring to mind; for a vector to be properly destructible, it must, albeit coincidentally, also support size() and capacity().
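The discipline this suggests can be sketched as follows. The function name is hypothetical, and the example deliberately asserts nothing about the moved-from vector until it has been assigned a new value.

```cpp
#include <cassert>
#include <utility>
#include <vector>

std::vector<int> reuse_after_move() {
    std::vector<int> source{1, 2, 3};
    std::vector<int> sink = std::move(source);

    // source is now "valid but unspecified". Treating it as partially
    // formed, we only assign to it (or let it be destroyed).
    source = std::vector<int>{4, 5};

    assert(sink.size() == 3);     // sink received the original elements
    assert(source.size() == 2);   // source is well-formed again after assignment
    return source;
}
```

Calling source.size() before the assignment would be permitted by the standard for vector, but the result would tell us nothing we could rely on.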

This, in turn, is similar to the situation with certain types of undefined behaviour. These days, signed integer overflow is undefined behaviour but not necessarily malum in se. The overwhelming majority of us are programming on two’s-complement machines where we know exactly what behaviour to expect when wrapping the bit pattern. But the standard tells us that signed integer overflow is undefined behaviour and thus malum prohibitum, and optimizers exploit this. If the standard were to similarly define use-after-move, other than assignment or destruction, as undefined behaviour, I can imagine compilers taking advantage.

Nullary constructors vs default constructors

Back to default construction. How do we make a member of a type? My mental model of construction in general is as follows: a constructor takes an arbitrary piece of memory — a space apportioned from the heap or the stack — and makes that memory into a member of a given type.

I expect this jibes with what most C++ programmers think. The only problem is that this isn’t what default construction does. A partially formed object is not yet an object. There is no semantic link between the bit pattern it contains and the type it will eventually inhabit; that semantic link is made by assignment.

Consider the following examples of “default construction”:

int x;
// 1. Is x an int here? No!

enum struct E : int { A, B, C };
E e;
// 2. Is e an E here? No!

struct Coord { int x; int y; };
Coord c;
// 3. Is c a Coord here? No!

std::pair<int, int> p;
// 4. Is p a pair here? Yes.

std::vector<int> v;
// 5. Is v a vector here? Yes.

std::unique_ptr<int> u;
// 6. Is u a unique_ptr here? Yes?

We would probably call all of these declarations “default construction” when, in fact, they are all slightly different.

In the first example, some people would claim that x is an int after declaration. After all, in the memory where x lives, there is no possible bit pattern that is not a valid int. Representationally, x is perfectly fine as an int — it’s just undefined. In some sense it’s just a matter of theory that we choose to view x as not yet an int.

The “theory” argument is a little more persuasive in the second example. There are many possible bit patterns in the memory where e lives that don’t contain well-formed values of E. Since we are using C++, there are still arguments that can be made for seeing e as an E, but I won’t go down that rabbit hole, as it isn’t vital to the particular argument I’m pursuing.

Compare examples 3 and 4, the Coord and the pair, respectively. This is an interesting case where c is clearly default constructed and so, by our rules, is not yet a Coord. The pair p looks identical, but the standard says that pair’s zero-argument constructor value-initializes the elements, meaning ints are zero-initialized. This means that p isn’t default constructed according to regular type axioms; rather, it has been nullary constructed.
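The difference between examples 3 and 4 can be observed directly. This sketch redeclares Coord so that it stands alone, and adds a third, hypothetical variable z to show how to request the same zeroing for an aggregate explicitly.

```cpp
#include <cassert>
#include <utility>

struct Coord {
    int x;
    int y;
};

void compare_coord_and_pair() {
    Coord c;                  // default-initialized: members are indeterminate
    std::pair<int, int> p;    // pair's constructor value-initializes both ints
    Coord z{};                // empty braces zero the members of an aggregate

    assert(p.first == 0 && p.second == 0);
    assert(z.x == 0 && z.y == 0);

    c = Coord{1, 2};          // only after assignment may c's members be read
    assert(c.x == 1 && c.y == 2);
}
```

Note that the zeroing of z is chosen at the point of declaration, whereas for pair it is baked into the type.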

Example 5, the vector, is something new entirely. In the case of vector, a partially formed object that must support destruction is coincidentally a well-formed object. The only operations that are not valid on v are the ones, such as front(), that are specifically prohibited because of preconditions.

The last example is interesting because it is an example of a sentinel value within a type that is baked right into the language. A default constructed unique_ptr contains a value-initialized raw pointer. Again, partially formed coincides with well-formed here. But there’s more; the language allows the destructor to call delete on that null pointer. This is a sentinel value, like so many I’ve set inside “default constructors” over the years, but one that is so ubiquitous that we don’t even think of it as unusual. I’ll hazard a guess and say that there are many systems in the world where zero is a fine address to dereference, probably far more than there are one’s-complement systems. It is perhaps only due to history and convenience that we outlaw signed overflow while codifying null pointer sentinels within the language itself.
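A short sketch of examples 5 and 6, showing which operations are safe on the freshly declared objects. The function name is hypothetical; the member functions used are standard.

```cpp
#include <cassert>
#include <memory>
#include <vector>

void inspect_nullary_constructed() {
    std::vector<int> v;
    assert(v.empty());        // a well-formed, empty vector
    assert(v.size() == 0);
    // v.front() would violate a precondition: there is no first element.

    std::unique_ptr<int> u;
    assert(u == nullptr);     // the null sentinel
    assert(!u);               // contextually converts to false
    // *u would be undefined behaviour; destroying u is fine.
}
```

In both cases the object is usable immediately, but the set of usable operations is narrowed by preconditions that every caller has to know about.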

Considering the differences between these examples, I think it makes sense to mentally differentiate default construction from nullary construction and further, to carefully consider where nullary construction is warranted and where it doesn’t actually make sense.

Sentinels can be harmful

Probably the biggest problem with nullary construction is that it tends to introduce magic sentinel values into the type itself where they should not exist. The most famous magic sentinel is the null pointer, Tony Hoare’s “billion-dollar mistake”, but we run this risk with any number of types that we use with value semantics.

Empty strings are used much more frequently as a sentinel than is sensible. Numeric types are often given default values of zero that can lead to bugs. As any game programmer can tell you, “disappeared” objects can frequently be found at the origin.

Sometimes we take value types that have no natural defaults, give them nullary constructors because we think we have to, and choose sentinel values that deliberately stand out. Colours spring to mind here; there is no real “default value” for a colour. Still, we often write a nullary constructor anyway and use something like bright magenta, indicating that a default colour would be a bug that we want to spot. Why provide a nullary constructor in the first place?
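One way to act on that question is simply to leave the nullary constructor out. The following is a sketch under assumptions: the Colour class, its accessor, and pick_highlight are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical colour type: every Colour must be given its components.
class Colour {
public:
    constexpr Colour(std::uint8_t r, std::uint8_t g, std::uint8_t b)
        : r_(r), g_(g), b_(b) {}
    constexpr std::uint8_t red() const { return r_; }

private:
    std::uint8_t r_, g_, b_;
};

// No magenta sentinel is possible: "Colour c;" does not compile.
static_assert(!std::is_default_constructible_v<Colour>);
static_assert(std::is_copy_constructible_v<Colour>);

Colour pick_highlight(bool warning) {
    // The declaration is delayed until the value is known.
    const Colour c = warning ? Colour{255, 165, 0} : Colour{0, 128, 255};
    return c;
}

void check_highlight() {
    assert(pick_highlight(true).red() == 255);
    assert(pick_highlight(false).red() == 0);
}
```

A forgotten initialization now shows up as a compile error rather than as a magenta object at run time.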

Most types that model real-world quantities, like colour, have no good defaults. There’s no default country. There’s no default gender. There’s no default time zone. Trying to provide defaults for these can lead to bugs, bad user experiences, or both.

It isn’t my intent to build too tall of a straw man here. This sort of thing does happen, but nullary construction of this kind is also something we try to avoid in C++, especially as we get more and more features in the language to mitigate it. Unlike C, we’ve always had the ability to delay declaration until the point of initialization, thus avoiding the need for nullary construction. This is both safer and more efficient. As previously mentioned, we never default construct and then assign; we just copy construct or value-initialize. RAII is considered good, so we use it to avoid two-phase initialization. Regardless of language, using sentinels to signal invalid state can be considered a code smell: a failure either of the type system’s expressive power or of our use of it.

The important point here is that we shouldn’t be making nullary constructors where they don’t really make sense just because we think they’re a requirement. They’re much less of a requirement than we have led ourselves to believe.

Magic values from nowhere

Nullary constructors hinder our ability to reason about code, especially generic code. If we know that a given function output cannot be conjured out of the ether, but can only be constructed from the arguments passed in, then we can infer things about the function. We can discount edge cases in our reasoning if we know from the function signature that the inputs must be used a certain way in order to provide specific output.

A lack of nullary constructors means that total functions are favoured, because it’s not possible to create magic values. To the extent that we can achieve this, it’s desirable, particularly in a value-oriented style of programming. It’s possible to envisage a complete lack of nullary constructors with everything built up from value initialization, n-ary constructors, conversions, et cetera. If even a subsection of the code can be partitioned in this way, it allows us to be more certain of its function.

The odd requirement for nullary construction

One sticking point remains — namely, that there are places in the STL that require nullary construction. Two in particular are commonly cited, one more problematic than the other.

vector::resize

The requirement that vector::resize has on nullary construction is fairly easy to work around. We simply don’t have to use resize, and if we never use it, it’s never instantiated.

Resizing a vector to a larger size is seldom useful; reserve and push_back, emplace_back, or insert handle those use cases. There may once have been an efficiency argument for resize-to-larger, but given move semantics and the fact that any contemporary STL implementation will use memcpy for trivially copyable types, I struggle to come up with any argument for ever calling resize-to-larger these days. Of course, if a situation where it is the best option should ever arise, it can still be used — just not with types that don’t provide nullary constructors.

Resizing to a smaller size can be achieved with erase at negligible extra cost. Of course, nullary construction is not strictly required here, so in the event that vector were revised, I would advocate removing resize from the interface and perhaps instead providing a truncate method to achieve resize-to-smaller.
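Both workarounds can be sketched for a type without a nullary constructor. Sample, make_samples, and the free function truncate are hypothetical; truncate is only a stand-in for the member function suggested above, not an existing vector operation.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical element type with no nullary constructor.
struct Sample {
    explicit Sample(int v) : value(v) {}
    int value;
};
static_assert(!std::is_default_constructible_v<Sample>);

// Grow with reserve + emplace_back instead of resize-to-larger.
std::vector<Sample> make_samples(std::size_t count) {
    std::vector<Sample> samples;
    samples.reserve(count);   // allocates capacity, constructs no elements
    for (std::size_t i = 0; i < count; ++i) {
        samples.emplace_back(static_cast<int>(i));
    }
    return samples;
}

// Shrink with erase instead of resize-to-smaller.
void truncate(std::vector<Sample>& samples, std::size_t new_size) {
    assert(new_size <= samples.size());   // truncation never grows the vector
    samples.erase(samples.begin() + static_cast<std::ptrdiff_t>(new_size),
                  samples.end());
}
```

Because resize is never called, it is never instantiated for Sample, so the missing nullary constructor causes no error.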

map::operator[]

The index operator on map has a famously poor signature. It’s frustrating that you can’t use it on a const map even when you know that the value is contained. It is the sole member of map that requires the mapped_type to be nullary constructible.

Happily, C++17 has expanded the interface on map. We now have insert_or_assign to cover the mutable use case of the index operator, and it does not require nullary constructability. In the case of const maps, C++11 offers us map::at, which behaves analogously to vector::at. Although C++17 has optional, it is not yet integrated with container types, so there is currently no lookup function on a const map that returns an optional value.
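Here is a sketch of a map whose mapped type has no nullary constructor, using only the members mentioned above plus find. Population, build_census, and the lookup helper are hypothetical; lookup is one possible shape for the missing optional-returning query, not a standard library function.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <type_traits>

// Hypothetical mapped type with no nullary constructor.
struct Population {
    explicit Population(long c) : count(c) {}
    long count;
};
static_assert(!std::is_default_constructible_v<Population>);

using Census = std::map<std::string, Population>;

// Hypothetical helper: const lookup that reports absence as an empty optional.
std::optional<Population> lookup(const Census& census, const std::string& key) {
    const auto it = census.find(key);
    if (it == census.end()) {
        return std::nullopt;
    }
    return it->second;
}

Census build_census() {
    Census census;
    census.insert_or_assign("ravens", Population{7});    // inserts
    census.insert_or_assign("ravens", Population{12});   // overwrites
    // census["ravens"] would not compile: operator[] needs Population().
    return census;
}

void check_census() {
    const Census census = build_census();
    assert(census.at("ravens").count == 12);   // at() throws std::out_of_range if absent
    assert(!lookup(census, "writing desks").has_value());
}
```

The helper copies the mapped value into the optional, which suits small value types; returning a pointer from find would avoid the copy at the cost of reintroducing a null sentinel.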

Nullary constructor leakage

I believe that there are some types in the STL that have nullary constructors for no good reason other than satisfying existing constraints. If we look through the lens of nullary construction being unnecessary, it seems that some things in the STL have nullary constructors only because of container requirements. For example, to my mind, weak_ptr doesn’t need to have a nullary constructor. The nullary constructor of variant also seems out of place, since the entire point of variant is the choice of type contained within. It seems perverse to create a variant without knowing what to put inside of it.
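For reference, this is how the two types behave today. In this sketch Circle, Square, and the aliases are hypothetical; the behaviour shown is that variant's nullary constructor builds the first alternative, so it is only available when that alternative is itself nullary constructible, and std::monostate is the standard's way of opting in to an empty-like state.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <variant>

// Hypothetical alternatives, neither of which has a nullary constructor.
struct Circle { explicit Circle(double r) : radius(r) {} double radius; };
struct Square { explicit Square(double s) : side(s) {} double side; };

// The first alternative cannot be nullary constructed, so neither can the variant.
using Shape = std::variant<Circle, Square>;
static_assert(!std::is_default_constructible_v<Shape>);

// Putting std::monostate first opts in to a nullary-constructible variant.
using MaybeShape = std::variant<std::monostate, Circle, Square>;
static_assert(std::is_default_constructible_v<MaybeShape>);

void inspect_defaults() {
    Shape s{Circle{1.5}};   // the choice of alternative is made up front
    assert(std::holds_alternative<Circle>(s));

    MaybeShape m;           // nullary constructed: holds monostate
    assert(std::holds_alternative<std::monostate>(m));

    std::weak_ptr<int> w;   // nullary constructed: refers to nothing
    assert(w.expired());
}
```

So a variant in the spirit of this section is already expressible: choose alternatives without nullary constructors and the variant loses its own.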

Conclusions

Default construction semantics are not as straightforward as they may seem.

The ability to write default constructors is undeniably valuable if we want to give our types the same semantics as built-in types, but given C++’s heritage and quirks, it’s not always possible to achieve exact parity.

The most useful, rigorous, and consistent model is the one advanced by the works of Alex Stepanov et al. on regular type semantics: default construction produces a partially formed object. A partially formed object has the same semantics as a moved-from object. A partially formed object is not yet a member of its type.

It is instructive to mentally separate the ideas of default construction — giving a partially formed object — and nullary construction — giving a well-formed object which is a member of the type with its invariants established. Some objects are well-formed coincidentally as a result of being partially formed.

We should not write nullary constructors without consideration, particularly for value types. Sensible defaults don’t always exist and sentinels should be avoided. Reasoning about functions and data flow becomes easier if types lack nullary constructors.

The requirement that regular types be default constructible is not an operational requirement; it is merely an axiomatic requirement. The vast majority of methods on most STL containers do not require default construction, and we can work around the few specific methods which do require it. Some types in the STL seem to have nullary constructors simply to fulfil a requirement which is questionable in the first place. Future revisions of the STL should scrutinize the operational requirements for default construction and remove them where possible.

8 comments

Oliver Smith says:

16 August, 2017 at 10:45 am

The compiler would need to know whether an object is initialized.

void f (int& i, T& t)
{
i = kI;
t = kT;
}

Is the second a copy construction or operator=?

The root cause is the complexity, verbosity and non-triviality of having a function call act on new and or temperature.

TBC after work.

thewisp says:

16 August, 2017 at 12:30 pm

Very well said. I wonder though what you think of going value-semantic in game development. It seems difficult to not provide nullary constructors for most types, because the context is not limited to code scopes, but also has side-effects in the game world. Additionally, most game objects need to be serializable for editing and storage. Given the already pervasive two-phase initialization in the code, it feels almost impossible to ever achieve a modern C++ framework in game engines.

Jason Rice says:

16 August, 2017 at 5:36 pm

I think it makes sense for a variant to have an empty default type that it default constructs to.

An optional is conceptually that kind of variant.

Gal says:

16 August, 2017 at 8:20 pm

Great post

Matthieu Poullet says:

17 August, 2017 at 12:42 am

Reminds me of Marc Mutz’s blog post Stepanov-Regularity and Partially-Formed Objects vs. C++ Value Types.

Jonathan Boccara says:

17 August, 2017 at 2:42 am

This is a really interesting view (and a well written article). The cpp core guidelines C.43 (http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-default0) encourage defining nullary constructors because some contexts need them, and you’re taking the position of not using those contexts and not needing nullary construction. This makes sense and I find your approach more natural.
If I’m not mistaken, std::array needs nullary construction of its elements to be instantiated. What’s your opinion regarding this? Do you think arrays are used rarely enough to consider them as a side case like vector::resize, or would you make an exception for types you need to store in an array, and define a nullary constructor for them?

elbeno says:

17 August, 2017 at 9:48 am

Hi Jonathan, in the case of std::array (or indeed C-style array) I would say that the requirement for a default constructor can be satisfied by ‘= default;’ i.e. a partially formed object will do. That is consistent with the argument for not providing a nullary constructor that provides a questionable default value.

Hi Matthew, thanks for linking that blog post, it’s very much in line with this.

Zbigniew Skowron says:

17 August, 2017 at 12:43 pm

Variant has an empty state only because without it there was no way of achieving exception safety, without sacrificing performance.
