Competing constructors

We start with a known C++ gotcha:

std::vector<int> v (size_t(4), 2); // parentheses
std::vector<int> u {size_t(4), 2}; // braces
 
assert (v.size() == 4);
assert (u.size() == 2);
 
assert (v[0] == 2); // elements: {2, 2, 2, 2}
assert (u[0] == 4); // elements: {4, 2}

In this post I want to analyze the source of the problem a bit further, and offer some suggestions on class design.

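The asserts above hold: this is the specified behavior, not a compiler bug. For why changing parentheses to braces changes the result, see here. For why there is no narrowing conversion here (which brace initialization would have caught), see here; in short, size_t(4) is a constant expression whose value fits in an int, so converting it to int is not considered narrowing.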

The two relevant std::vector<int> constructors here, after simplification, are these:

vector(size_t s, int v);
vector(initializer_list<int> vals);

Suppose we only use brace notation for initialization; the above two declarations can be read as saying:

1. Whenever you see an initialization with a size_t as the first argument and an int as the second argument, interpret it as a request to fill the vector with s identical ints of value v.

2. Whenever you see an initialization with any sequence of ints, treat them as a list that needs to be stored inside the vector.

However, given all the implicit conversions in C++, and given that narrowing detection does not catch every conversion from a wider type to a narrower one, the two declarations actually say something closer to:

1. An initialization with two ints means that the first one is the size, and the second one is the value we fill the vector with.

2. An initialization with any sequence of ints (including 2 ints), represents the sequence to be contained in the vector.

When we put it this way, we can clearly see that we have two competing constructor declarations that try to define what happens when we initialize a std::vector<int> with two ints. The language resolves this competition by a fixed rule (for a non-empty braced list, a viable initializer_list constructor always wins), but we can be sure that sooner or later someone will expect the opposite outcome and will be surprised. And by “surprise”, unfortunately, I mean unintended program behavior.

Note that this problem was not introduced by C++11: we can create a similar situation in C++03 by defining competing constructors:

struct C
{
  C(long) {}          // (1)
  explicit C(int) {}  // (2)
};

int main() 
{
  C c = 1;  // selects (1)
  C d (1);  // selects (2)
}

Can you see why? (2) is a better match (an exact match for int, whereas (1) needs an integral conversion to long), but it has to be ignored in copy initialization because it is declared explicit. But have you ever heard anyone complain about this C++03 behavior? I have not. Probably because hardly anyone declares such competing constructors. Why do it, if it is so likely that someone will be surprised when using them?
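To check this claim, here is a minimal instrumented version of the same struct. The picked member and the check_c function are additions for illustration only; the asserts encode the outcomes described above.

```cpp
#include <cassert>

struct C
{
  int picked;                      // which constructor ran (illustration only)
  C(long) : picked(1) {}           // (1)
  explicit C(int) : picked(2) {}   // (2)
};

inline void check_c()
{
  C c = 1;  // copy-initialization: explicit (2) is not a candidate
  C d (1);  // direct-initialization: (2) is an exact match for int
  assert(c.picked == 1);
  assert(d.picked == 2);
}
```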

Of course, C++11 provided room for more inadvertent competing constructors. The following is an example that many programmers have stumbled upon:

struct Wrapper
{
  Wrapper(Wrapper const&) {}  // (C)
  Wrapper(Wrapper &&) {}      // (M)
 
  template <typename T>
  explicit Wrapper(T&&) {}    // (F)
};

int main() 
{
  const Wrapper w1 (1);  // selects (F)
        Wrapper w2 (2);  // selects (F)
  const Wrapper v1 = w1; // selects (C)
  const Wrapper v2 = w2; // selects (C)
  const Wrapper u1 (w1); // selects (C)
  const Wrapper u2 (w2); // selects (F) !!
}

It has been discussed at length here. In short, a copy constructor and a perfect-forwarding constructor compete to be selected for initialization from a non-const lvalue of type Wrapper, and depending on the kind of initialization either can win, which surprises users.
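The six initializations can be checked with a sketch that records which constructor ran. The Ctor enum, the used member and the selected function are illustrative additions, not part of the original example.

```cpp
#include <array>
#include <cassert>

enum class Ctor { copy, move, forwarding };

struct Wrapper
{
  Ctor used;  // which constructor ran (illustration only)
  Wrapper(Wrapper const&) : used(Ctor::copy) {}      // (C)
  Wrapper(Wrapper &&) : used(Ctor::move) {}          // (M)
  template <typename T>
  explicit Wrapper(T&&) : used(Ctor::forwarding) {}  // (F)
};

// The six initializations from the example above, in the same order.
inline std::array<Ctor, 6> selected()
{
  const Wrapper w1 (1);
        Wrapper w2 (2);
  const Wrapper v1 = w1;  // copy-initialization: explicit (F) not considered
  const Wrapper v2 = w2;
  const Wrapper u1 (w1);  // tie with (F); the non-template (C) wins
  const Wrapper u2 (w2);  // (F) with T = Wrapper& beats Wrapper const&
  return {{w1.used, w2.used, v1.used, v2.used, u1.used, u2.used}};
}
```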

Note that by “competing” I mean not simply the fact that you have two constructors that would take the same sequence of arguments (if one is somehow made invisible), but also that they do different, incompatible things. For instance, the same std::vector<int> has the following two constructors:

vector();                    // default ctor        
vector(initializer_list<T>); // sequence ctor

Now, if I initialize an object in the following way:

vector<int> v = {};

It chooses the default constructor (!). The sequence constructor would also work for a zero-sized sequence; it is just that, for empty braces, the default constructor gets picked. If vector didn’t define a default constructor, the sequence constructor would have been called. But in the case of std::vector, the sequence constructor with a zero-sized list does exactly the same thing as the default constructor: it initializes an empty vector. So you could never tell the difference. This is why I would not call these two constructors competing. This is explained in more detail in this post.
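The rule can be observed with two toy classes (Both and SequenceOnly are hypothetical names): when a default constructor exists, = {} goes through it; when it does not, the empty list goes to the sequence constructor.

```cpp
#include <cassert>
#include <initializer_list>

// Has both a default constructor and a sequence constructor.
struct Both
{
  bool sequence_ctor_used;
  Both() : sequence_ctor_used(false) {}
  Both(std::initializer_list<int>) : sequence_ctor_used(true) {}
};

// Has only a sequence constructor.
struct SequenceOnly
{
  bool sequence_ctor_used;
  SequenceOnly(std::initializer_list<int>) : sequence_ctor_used(true) {}
};

inline void check_empty_braces()
{
  Both a = {};          // empty braces + default constructor: default ctor
  Both c = {1, 2};      // non-empty braces: sequence ctor
  SequenceOnly b = {};  // no default constructor: sequence ctor, empty list
  assert(!a.sequence_ctor_used);
  assert(c.sequence_ctor_used);
  assert(b.sequence_ctor_used);
}
```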

The advice

Well, my advice here is quite simple: do not create competing constructors :). If I told you to have a constructor taking int do one thing, and a constructor taking long do another thing, would you take me seriously? But the constructors of std::vector<int> seem to be doing exactly that. I am not really trying to say that the authors of the STL did a bad job. First, I am sharing findings from a number of years of playing with the new initialization of STL containers; the authors did not have that luxury, as they were exploring new ground. Second, we are only considering a particular specialization of std::vector. The general case has the following constructors:

vector(size_t s, const T& v);
vector(initializer_list<T> vals);

And if we used vector<pair<int, int>>, we wouldn’t observe the problem. But, since we can see a flaw, we can draw some conclusions from it.

One observation to draw from the example of std::vector is that some specializations of our class templates can have competing constructors, even though in the general case the constructors are not competing.

This brings us to the first piece of practical advice. In a declaration such as the one above, size_t and T do not necessarily denote two distinct types. We are not looking at a class, but at a class template. Therefore, when you provide a list of constructors for a class template and some of them use non-dependent types, like size_t or int, ask yourself how the set of constructors changes when T is replaced with size_t or int. For instance, consider the following class template:

template <typename T>
struct C
{
  explicit C(T lo, T hi); // two T's
  explicit C(int n, T v); // n T's  
};

It will work for any type except int, because for C<int> we get two identical constructor declarations, C(int, int), and C<int> fails to compile.

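And remember that size_t, int and char are practically interchangeable here, because of all the nasty implicit conversions: for C<long> or C<char> the two constructors still compete, and which one wins depends on the exact argument types.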
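A sketch of that competition for T = long, using the class template above with a picked member added purely to record which constructor ran (C<int> is deliberately not instantiated, since it would not compile):

```cpp
#include <cassert>

enum class Picked { two_ts, n_ts };

template <typename T>
struct C
{
  Picked picked;  // which constructor ran (illustration only)
  explicit C(T /*lo*/, T /*hi*/) : picked(Picked::two_ts) {}  // two T's
  explicit C(int /*n*/, T /*v*/) : picked(Picked::n_ts) {}    // n T's
};

inline void check_competition()
{
  C<long> a (1, 2);    // int arguments: (int, long) beats (long, long)
  C<long> b (1L, 2L);  // long arguments: (long, long) is an exact match
  assert(a.picked == Picked::n_ts);
  assert(b.picked == Picked::two_ts);
}
```

The same source code, with 1 versus 1L, silently switches between two constructors that mean entirely different things.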

Of course, the same observation applies to any other overloaded member function, but constructors are more susceptible to this problem because they cannot have names. I do not recommend it, but suppose for a moment that we were doing two-phase initialization (only a default constructor, followed by a call to a function like init). If we were implementing a vector this way, we could give different names to different initialization strategies:

vector<int> v, u, w, x, y;
v.init_with_values(4, 2); // {4, 2}
u.init_with_values(4);    // {4}
w.init_with_size(4, 2);   // {2, 2, 2, 2}
x.init_with_size(4);      // {0, 0, 0, 0}
y.init_with_capacity(4);  // {}

If we initialize with a function, we can encode the initialization strategy in the function name. We can achieve a similar effect by defining factory functions:

vector<int> v = vector_with_values<int>(4, 2);
vector<int> u = vector_with_values<int>(4);
vector<int> w = vector_with_size<int>(4, 2);
vector<int> x = vector_with_size<int>(4);
vector<int> y = vector_with_capacity<int>(4);
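One possible implementation of these factory functions, assuming they are free function templates returning std::vector (the names come from the example above; the definitions are a hypothetical sketch). Inside each one, parentheses and an explicit std::initializer_list leave no doubt about which std::vector constructor runs:

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// The arguments become the elements: vector_with_values<int>(4, 2) is {4, 2}.
template <typename T, typename... Args>
std::vector<T> vector_with_values(Args&&... args)
{
  std::initializer_list<T> vals{T(std::forward<Args>(args))...};
  return std::vector<T>(vals);  // the sequence constructor, explicitly
}

// n copies of value: vector_with_size<int>(4, 2) is {2, 2, 2, 2}.
template <typename T>
std::vector<T> vector_with_size(std::size_t n, const T& value)
{
  return std::vector<T>(n, value);
}

// n value-initialized elements: vector_with_size<int>(4) is {0, 0, 0, 0}.
template <typename T>
std::vector<T> vector_with_size(std::size_t n)
{
  return std::vector<T>(n);
}

// No elements, storage reserved for at least n.
template <typename T>
std::vector<T> vector_with_capacity(std::size_t n)
{
  std::vector<T> v;
  v.reserve(n);
  return v;
}

inline void check_factories()
{
  assert((vector_with_values<int>(4, 2) == std::vector<int>{4, 2}));
  assert((vector_with_values<int>(4) == std::vector<int>{4}));
  assert((vector_with_size<int>(4, 2) == std::vector<int>{2, 2, 2, 2}));
  assert((vector_with_size<int>(4) == std::vector<int>{0, 0, 0, 0}));
  std::vector<int> y = vector_with_capacity<int>(4);
  assert(y.empty() && y.capacity() >= 4);
}
```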

But while this solves some problems (remember that, thanks to copy elision, usually no copies are made), it does not cope well with perfect forwarding. For instance, suppose I want to use the type std::experimental::optional<std::vector<int>> and initialize objects in place (without any temporary objects):

using opt_vec = std::experimental::optional<std::vector<int>>;
using std::experimental::in_place;

opt_vec v {in_place, vector_with_size<int>(4, 2)}; // temp
opt_vec u {in_place, 4, 2}; // does what?

The first initialization now requires a move that cannot be elided. And if we were dealing with a non-movable type, the technique would simply not work. In contrast, the second initialization perfect-forwards the arguments, so no temporary is required; but can you guess which of std::vector’s constructors gets chosen?
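Since C++17, std::optional (header <optional>) and std::in_place are part of the standard library. Its in-place constructor initializes the contained value as if by direct-non-list-initialization, that is, with parentheses, so the arguments 4, 2 reach std::vector’s size-and-value constructor. A short check, assuming a C++17 compiler:

```cpp
#include <cassert>
#include <optional>
#include <vector>

using opt_vec = std::optional<std::vector<int>>;

inline void check_in_place()
{
  // Forwarded as std::vector<int>(4, 2): four elements of value 2.
  opt_vec u {std::in_place, 4, 2};
  assert(u->size() == 4);
  assert(((*u)[0] == 2));

  // To get the elements {4, 2}, pass a braced list explicitly; this selects
  // optional's overload taking in_place_t and an initializer_list.
  opt_vec w {std::in_place, {4, 2}};
  assert(w->size() == 2);
  assert(((*w)[0] == 4));
}
```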

So can the problem be solved? Yes, and the hint is already contained in the last example. This in_place is an “artificial” argument (or a tag, if you will): its only purpose is to have a type distinct from any other meaningful type, and a unique name that describes the intent. It tells the compiler how the remaining arguments need to be interpreted. In the case of in_place, the meaning is: perfect-forward the remaining arguments to T’s constructor, which will be called internally.

This is the second piece of advice: a tag can be used to assign a name to a constructor. If vector had tags, there would never be any ambiguity, or competition between constructors:

vector<int> v {4, 2};              // {4, 2}
vector<int> u {4};                 // {4}
vector<int> w {with_size, 4, 2};   // {2, 2, 2, 2}
vector<int> x {with_size, 4};      // {0, 0, 0, 0}
vector<int> y {with_capacity, 4};  // {}
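A minimal sketch of such tags, assuming a wrapper class rather than a modified std::vector (tagged_vector, with_size_t, with_capacity_t and the tag objects are all hypothetical names). Each tag is an empty type modeled on std::in_place_t:

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical tag types. Like std::in_place_t, each has an explicit default
// constructor, so that a bare {} is not silently accepted as a tag.
struct with_size_t { explicit with_size_t() = default; };
struct with_capacity_t { explicit with_capacity_t() = default; };
inline constexpr with_size_t with_size{};
inline constexpr with_capacity_t with_capacity{};

template <typename T>
class tagged_vector
{
  std::vector<T> data_;

public:
  // No tag: the arguments are the elements.
  tagged_vector(std::initializer_list<T> vals) : data_(vals) {}

  // with_size: n value-initialized elements, or n copies of v.
  tagged_vector(with_size_t, std::size_t n) : data_(n) {}
  tagged_vector(with_size_t, std::size_t n, const T& v) : data_(n, v) {}

  // with_capacity: no elements, storage reserved for at least n.
  tagged_vector(with_capacity_t, std::size_t n) { data_.reserve(n); }

  const std::vector<T>& elements() const { return data_; }
};

inline void check_tags()
{
  tagged_vector<int> v {4, 2};              // {4, 2}
  tagged_vector<int> u {4};                 // {4}
  tagged_vector<int> w {with_size, 4, 2};   // {2, 2, 2, 2}
  tagged_vector<int> x {with_size, 4};      // {0, 0, 0, 0}
  tagged_vector<int> y {with_capacity, 4};  // {}
  assert((v.elements() == std::vector<int>{4, 2}));
  assert((u.elements() == std::vector<int>{4}));
  assert((w.elements() == std::vector<int>{2, 2, 2, 2}));
  assert((x.elements() == std::vector<int>{0, 0, 0, 0}));
  assert(y.elements().empty() && y.elements().capacity() >= 4);
}
```

Because a tag type converts to nothing else, a tagged initializer can never be read as a list of elements, and vice versa.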

Some people have suggested that the tags could take arguments themselves, making the initialization syntax shorter:

vector<int> w {with_size{4}, 2};   // {2, 2, 2, 2}
vector<int> x {with_size{4}};      // {0, 0, 0, 0}
vector<int> y {with_capacity{4}};  // {}

I have some reservations, however. It is not a tag anymore. Now with_size looks like a regular type, and sooner or later someone will want to store it in a container:

vector<with_size> w {with_size{4}, with_size{2}}; // does what?

You do not have this problem when the tag is just a tag. A pure tag is not useful as a value in a container: it can assume only one value.
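The problem with a value-carrying tag can be reproduced in a few lines. In this hypothetical sketch, with_size is an aggregate holding a size, and sized_vector has both a sequence constructor and a (with_size, value) constructor:

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

struct with_size { std::size_t n; };  // a "tag" that carries a value

template <typename T>
struct sized_vector
{
  std::vector<T> data;
  sized_vector(std::initializer_list<T> vals) : data(vals) {}  // elements
  sized_vector(with_size s, const T& v) : data(s.n, v) {}      // n copies of v
  explicit sized_vector(with_size s) : data(s.n) {}            // n default values
};

inline void check_value_tag()
{
  sized_vector<int> a {with_size{4}, 2};  // {2, 2, 2, 2}: fine for int
  assert(a.data.size() == 4);

  // With T = with_size, the sequence constructor is viable and wins:
  // two stored sizes, not four copies of with_size{2}.
  sized_vector<with_size> b {with_size{4}, with_size{2}};
  assert(b.data.size() == 2);
  assert(b.data[0].n == 4);
}
```

For sized_vector<int>, a with_size cannot convert to int, so only the fill constructor is viable; once T is with_size itself, the two constructors compete again.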

In a similar manner, we can provide unambiguous constructors for 2D points:

point<double> p {cartesian_x_y, 1.4142, 1.4142};
point<double> q {polar_r_alpha, 1.0000, 0.7854};  
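A sketch of such a point type under the same approach. The tag types, tag objects and the polar-to-Cartesian conversion are assumptions for illustration; both constructors store Cartesian coordinates:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical tag types and tag objects, modeled on std::in_place_t.
struct cartesian_x_y_t { explicit cartesian_x_y_t() = default; };
struct polar_r_alpha_t { explicit polar_r_alpha_t() = default; };
inline constexpr cartesian_x_y_t cartesian_x_y{};
inline constexpr polar_r_alpha_t polar_r_alpha{};

template <typename T>
struct point
{
  T x, y;  // always stored in Cartesian form

  point(cartesian_x_y_t, T x_, T y_) : x(x_), y(y_) {}

  // Converts polar coordinates (radius, angle in radians) to Cartesian ones.
  point(polar_r_alpha_t, T r, T alpha)
    : x(r * std::cos(alpha)), y(r * std::sin(alpha)) {}
};

inline void check_points()
{
  point<double> p {cartesian_x_y, 1.4142, 1.4142};
  point<double> q {polar_r_alpha, 1.0000, 0.7854};
  // Same direction (about 45 degrees); q has length 1, p about 2.
  assert(std::fabs(q.x - p.x / 2) < 1e-4);
  assert(std::fabs(q.y - p.y / 2) < 1e-4);
}
```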

The third piece of advice concerns the perfect-forwarding constructor. The following declaration of a constructor template:

template <typename T>
struct Optional
{ 
  template <typename U>
  Optional(U&& v);
};

reads, “works for just any type whatsoever”. Do you really want Optional<T> to be constructible from just anything? You probably want to use the argument to initialize some T. So, in fact, you are only interested in those types that T is constructible from. If you are using the newly released GCC 6.1, which supports Concepts Lite, you can express the intent like this:

template <typename U>
  requires std::is_constructible<T, U&&>::value
Optional(U&& v);

On a standard-conforming compiler without Concepts support, you have to resort to some enable_if tricks:

# define REQUIRES(...) \
  std::enable_if_t<(__VA_ARGS__), bool> = true

template <typename U,
          REQUIRES(std::is_constructible<T, U&&>::value)>
Optional(U&& v);
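Put together in a simplified, always-engaged Optional (a hypothetical stand-in, not std::optional; only the constructors matter), the constraint also removes the Wrapper trap from earlier: copying a non-const lvalue goes to the copy constructor, because T is not constructible from Optional&.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

# define REQUIRES(...) \
  std::enable_if_t<(__VA_ARGS__), bool> = true

template <typename T>
struct Optional
{
  T value;
  bool forwarded = false;  // true if the forwarding constructor ran

  Optional(const Optional& other) : value(other.value) {}

  // Viable only for arguments that T itself can be constructed from.
  template <typename U,
            REQUIRES(std::is_constructible<T, U&&>::value)>
  Optional(U&& v) : value(std::forward<U>(v)), forwarded(true) {}
};

inline void check_constrained()
{
  Optional<int> a (42);  // forwarding constructor, U = int
  Optional<int> b (a);   // U = Optional<int>& fails the constraint: copy ctor
  Optional<std::string> s ("hi");
  assert(a.forwarded && a.value == 42);
  assert(!b.forwarded && b.value == 42);
  assert(s.forwarded && s.value == "hi");
  static_assert(!std::is_constructible<Optional<int>, std::string>::value,
                "int cannot be constructed from std::string");
}
```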

Conclusion

To summarize my point: there is a risk that you may inadvertently create a set of competing constructors. If you need more than one constructor (not counting copy and move), tags are a way to give constructors names and avoid competition. An additional gain is that by giving a name to an initialization strategy, you clearly communicate to everyone what is going on.

8 Responses to Competing constructors

    • Thank you for the interesting link! How does this work with forwarding functions? Using the example from the paper:

      std::vector<float> v {.count = 2, 1.f};
      

      If I wanted to create such vector inside a set of vectors, and avoid any temporaries, would the following or some equivalent work?

      set.emplace(.count = 2, 1.f);
      

      ?

      • TONGARI J says:

        Designator forwarding is not covered by the current design. If you go through the thread, you’ll see Richard Smith also raised a similar issue; however, making that work would complicate things a lot (dealing with the type system, adding a new kind of template parameter, ABI changes, etc.). I agree it’d be a useful feature, but I’m not sure it’s worth the effort.

  1. Nawaz says:

    If you make `with_size` a value, say a functor (which returns unspecified type), instead of a type, then nobody would write this:

    vector w {with_size(4), with_size(2)}; //compiler-error

    And the returned object of `with_size` also holds the value, then you can write this:

    vector w { with_size (4), 2};

    So basically now `with_size` is not only a tag here but a value also (effectively)!

    • So, you mean something like this:

      struct with_size_generator
      {
        with_size_value operator() (size_t s) const { return with_size_value{s}; }
      };
      with_size_generator with_size{};
      

      Right?

      But now suppose I need to store N vectors, and for each of them I need to pre-compute a size; I need to store the pre-computed sizes and return them from a function. A quite natural way to declare this would be:

      std::vector<decltype(with_size(2))> make_sizes();
      

      And if I need to return two of these sizes, I would implement make_sizes() as

      std::vector<with_size_value> make_sizes()
      {
        return {with_size(7), with_size(8)};
      }
      

      And the constructor that gets chosen is likely not the one I intended.
