Intuitive interface — Part I

Let’s start with the popular C++(11) “uniform initialization” gotcha. Changing braces to parentheses in object initialization may change the semantics of the initialization:

std::vector<int> v1{5, 6}; // 2 elems: {5, 6}
std::vector<int> v2(5, 6); // 5 elems: {6, 6, 6, 6, 6}

This gotcha is described, for instance, in Problem 1 of the newly revisited “Guru of The Week” series by Herb Sutter.

When seeing such an example, one might conclude that the new rules for object initialization are very confusing or error-prone. But to me it looks like the problem here lies in how class template std::vector has been defined in C++03.

Given that std::vector<int> is meant to store ints, how come the constructor that takes two ints does not put them into the collection?! How come the following does not put integer value 100 into a container?

std::vector<int> v(100);
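As a quick check of what these two forms actually do, here is a minimal sketch. The helper function names are made up for this illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers, named only for this illustration.
std::size_t size_with_parens()
{
  std::vector<int> v(100);   // size constructor: 100 value-initialized ints
  return v.size();
}

std::size_t size_with_braces()
{
  std::vector<int> v{100};   // initializer-list constructor: one element, 100
  return v.size();
}

int first_with_parens()
{
  std::vector<int> v(100);
  return v.front();          // elements are value-initialized, i.e. 0
}
```

So the parenthesized form never stores the value 100 at all; only the braced form does.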

Even if you are familiar with this gotcha, and you are very careful about which type of initialization (braces or parentheses) to use, the fact that these two initializations do different things may still affect you. Imagine that we are writing a wrapper around std::vector<int>. We would like to write a “perfect forwarding” constructor template that generates for IntVecWrap all the constructors that std::vector<int> has, and forwards the arguments to the contained vector:

struct IntVecWrap
{
  std::vector<int> data;

  template <typename... P> 
  explicit IntVecWrap(P&&... v);
  // ...
};

If you are not familiar with this syntax, here is a short explanation. The two ellipses say that this template can generate constructors taking any number of parameters (including 0). P is not a single template type parameter but a template parameter pack, representing a number of type parameters. Similarly, v is not a single function parameter but a function parameter pack, representing all the arguments passed to our constructor. The && sign indicates that each parameter in pack v will be perfect-forwarded, that is, instantiated as an lvalue reference if the argument passed to the constructor is an lvalue, as an lvalue reference to a const object if the argument is a const lvalue, and as an rvalue reference if the argument is an rvalue.
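The deduction rules described above can be observed directly. The following is only a sketch; the three helper templates are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: reports whether P was deduced as an lvalue reference.
template <typename P>
constexpr bool deduced_as_lvalue_ref(P&&)
{
  return std::is_lvalue_reference<P>::value;
}

// Hypothetical helper: reports whether the referred-to type is const.
template <typename P>
constexpr bool refers_to_const(P&&)
{
  return std::is_const<typename std::remove_reference<P>::type>::value;
}

// Counts the arguments in the pack; works for zero arguments too.
template <typename... P>
constexpr std::size_t count_args(P&&...)
{
  return sizeof...(P);
}
```

Passing a named int yields an lvalue reference, passing a literal such as 5 does not, and passing a const int yields an lvalue reference to const.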

Now, how do we define the body of our constructor template? We have two options:

// option 1:
template <typename... P> 
  IntVecWrap::IntVecWrap(P&&... v)
  : data(std::forward<P>(v)...) {}
  //    ^
  //    parentheses

// option 2:
template <typename... P> 
  IntVecWrap::IntVecWrap(P&&... v)
  : data{std::forward<P>(v)...} {}
  //    ^
  //    braces

Before discussing the options, a word of explanation about the syntax again. Function std::forward is a necessary part of the perfect forwarding mechanism: it turns an rvalue reference back into an rvalue (because a function parameter of rvalue reference type is itself an lvalue!). The third ellipsis indicates that we want to expand function parameter pack v into normal arguments using the pattern std::forward<Pn>(an), where Pn is the n-th template parameter and an is the n-th function parameter. Thus, if our variadic template generated the constructor:

IntVecWrap::IntVecWrap(size_t& a1, int&& a2);

the pack expansion in the constructor’s initialization list will be:

IntVecWrap::IntVecWrap(size_t& a1, int&& a2)
: data( std::forward<size_t&>(a1), std::forward<int>(a2) )
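To see why std::forward is needed, consider this small sketch. The functions sink, pass_with_forward and pass_without_forward are hypothetical names used only here.

```cpp
#include <string>
#include <utility>

// Two overloads that tell us how the argument arrived.
inline std::string sink(int&)  { return "lvalue"; }
inline std::string sink(int&&) { return "rvalue"; }

// Forwards with std::forward: the value category is preserved.
template <typename P>
std::string pass_with_forward(P&& v)
{
  return sink(std::forward<P>(v));
}

// Passes the named parameter directly: inside the function it is
// always an lvalue, even if it was bound to an rvalue.
template <typename P>
std::string pass_without_forward(P&& v)
{
  return sink(v);
}
```

Calling pass_with_forward(5) reaches the rvalue overload, whereas pass_without_forward(5) reaches the lvalue one.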

Now, back to the selection of two options:

// option 1:
template <typename... P> 
  IntVecWrap::IntVecWrap(P&&... v)
  : data(std::forward<P>(v)...) {}
  //    ^
  //    parentheses

// option 2:
template <typename... P> 
  IntVecWrap::IntVecWrap(P&&... v)
  : data{std::forward<P>(v)...} {}
  //    ^
  //    braces

Does it matter which one we choose? It does, and neither does the right thing. If we choose option 1, we will get the following surprising effect:

// option 1:
std::vector<int> v{6, 5};
IntVecWrap       w{6, 5};

assert (v.size() == 2);
assert (w.data.size() == 6); // !!!

This is because in the initialization list of IntVecWrap we change the braces to parentheses. Now, if we choose option 2, we will fix the above problem, but we will get another surprise:

// option 2:
std::vector<int> v(6, 5);
IntVecWrap       w(6, 5);

assert (v.size() == 6);
assert (w.data.size() == 2); // !!!

This is because now we are changing parentheses into braces. We wouldn’t have this dilemma if vector’s initialization were doing the same thing when brace- and parentheses-initialized. Then we could just pick any option and we would be fine.
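Both surprises can be reproduced side by side. This is a sketch only: WrapParens and WrapBraces are made-up names for the two options written out as complete classes, and the two free functions exist just to report the resulting sizes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Option 1 as a complete class (the name WrapParens is made up).
struct WrapParens
{
  std::vector<int> data;

  template <typename... P>
  explicit WrapParens(P&&... v) : data(std::forward<P>(v)...) {}
};

// Option 2 as a complete class (the name WrapBraces is made up).
struct WrapBraces
{
  std::vector<int> data;

  template <typename... P>
  explicit WrapBraces(P&&... v) : data{std::forward<P>(v)...} {}
};

// Braces at the call site, parentheses inside: six elements of value 5.
std::size_t parens_size_from_braces()
{
  WrapParens w{6, 5};
  return w.data.size();
}

// Parentheses at the call site, braces inside: the two elements 6 and 5.
std::size_t braces_size_from_parens()
{
  WrapBraces w(6, 5);
  return w.data.size();
}
```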

We faced this problem when specifying the semantics of std::optional’s forwarding constructors:

std::optional<std::vector<int>> o(std::in_place, 6, 5);

We couldn’t just say that this initializes the vector with arguments 6 and 5, because we do not know which initialization it should be: brace- or parentheses-initialization. We had to make the call which one it is. (We chose the parentheses one, you can see it in the newest Standard draft: N3690.)
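For readers with a C++17 compiler, where std::optional was eventually standardized, the effect of that choice can be checked with a sketch like the one below. The two function names are hypothetical. The second function uses the separate overload that takes a braced list explicitly.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Requires C++17.
std::size_t size_from_plain_args()
{
  // Arguments are forwarded with parentheses: vector<int>(6, 5).
  std::optional<std::vector<int>> o(std::in_place, 6, 5);
  return o->size();
}

std::size_t size_from_init_list()
{
  // A braced list selects the initializer_list overload: vector<int>{6, 5}.
  std::optional<std::vector<int>> o(std::in_place, {6, 5});
  return o->size();
}
```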

Back to std::vector, and our one-argument constructor:

std::vector<int> v(100);

You might say: but we need some constructor that initializes the vector with the default capacity of 100. Default capacity, or was it default size? Damn, I always forget… This is because other libraries do provide a similar constructor that sets the initial capacity rather than size. For instance, java.util.ArrayList<E>. If someone switches from Java to C++, they will surely expect the following:

std::vector<int> v(100);
assert (v.empty());      // invalid expectation

They will not even look into the documentation, because they will feel they know what it does.
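For comparison, one way to get what the Java programmer expected (an empty vector with room for 100 elements) is to call reserve after default construction. A minimal sketch, with a made-up function name:

```cpp
#include <vector>

// Hypothetical helper: builds an empty vector with capacity for 100 ints.
bool empty_with_capacity()
{
  std::vector<int> v;
  v.reserve(100);            // changes capacity only, not size
  return v.empty() && v.capacity() >= 100;
}
```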

I guess the reasoning behind adding this constructor was: “in order to construct a vector with the given size I only need to pass it one value of type size_t. So let such a one-argument constructor create an n-sized vector with value-initialized elements.” But when you look at it from the other side, i.e. the user’s side, and see such a one-argument constructor taking a parameter of type size_t, you are puzzled, because it could do a couple of things: create a 0-sized vector with a capacity of n, create an n-sized vector, create a 1-sized vector with value n, or whatever else. Yes, you can look it up in the docs, but it is not immediately clear: it is just not intuitive.

One could argue that vector<int> is a very special case: I wouldn’t notice the braces-vs-parentheses problem if it were vector<string>. True, but I would still be confused whether the number indicates the initial size or initial capacity or initial “something else.” This latter problem is more general.

How to fix it?

This problem would be easily fixed if we had a feature known as named parameters. Then we could initialize the vector like this:

// NOT IN C++
std::vector<int> v1(size: 10, capacity: 200, value: 6);

But well, we don’t have them (although there is the Boost.Parameter library), and it is not clear how perfect forwarding would look for functions/constructors with named parameters. However, there is a decent substitute for this feature: function ‘tag’ parameters. Tags are empty classes that do not contain any data, but each represents a distinct type that can be used to disambiguate function/constructor overloads that would otherwise be ambiguous. If we had tags representing size and capacity of STL containers, like:

namespace std{
  constexpr struct with_size_t{} with_size{};
  constexpr struct with_value_t{} with_value{};
  constexpr struct with_capacity_t{} with_capacity{};
}

We could initialize the containers in a way that is completely unambiguous (although a bit verbose):

std::vector<int> v1(std::with_size, 10, std::with_value, 6);
std::vector<int> v2{std::with_size, 10, std::with_value, 6};

Such code takes longer to write, but takes less time to read and understand, and avoids potential confusion.
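Since we cannot add such tags to namespace std ourselves, here is a sketch of the same idea in a made-up namespace. Everything in it (the namespace sketch, the class IntVec, and its constructors) is hypothetical and only illustrates how tag parameters select an overload.

```cpp
#include <cstddef>
#include <vector>

namespace sketch  // hypothetical; none of this exists in the standard library
{
  struct with_size_t {};
  struct with_value_t {};
  struct with_capacity_t {};

  constexpr with_size_t with_size{};
  constexpr with_value_t with_value{};
  constexpr with_capacity_t with_capacity{};

  // A thin wrapper whose constructors say what each number means.
  class IntVec
  {
    std::vector<int> data_;

  public:
    IntVec(with_size_t, std::size_t n, with_value_t, int value)
      : data_(n, value) {}

    IntVec(with_capacity_t, std::size_t n)
    {
      data_.reserve(n);  // capacity only; the vector stays empty
    }

    std::size_t size() const { return data_.size(); }
    std::size_t capacity() const { return data_.capacity(); }
    int front() const { return data_.front(); }
  };
}
```

With this in place, sketch::IntVec a(sketch::with_size, 10, sketch::with_value, 6) and the same initialization with braces both produce ten elements of value 6, so a forwarding wrapper could pick either syntax.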

Tags are nothing new. They are already in the C++ Standard Library. Consider:

std::function<void()> f1{a};
std::function<void()> f2{std::allocator_arg, a};

std::allocator_arg is a tag indicating that the following argument is to be treated as an allocator, rather than a callable object. Next:

std::tuple<std::string, int, float>   t1 = /*...*/;
std::tuple<float, float, std::string> t2 = /*...*/;
std::pair<X, Y> p1{t1, t2};
std::pair<X, Y> p2{std::piecewise_construct, t1, t2};

Here, p1 is initialized as:

p1.first(t1),
p1.second(t2);

p2 is initialized as:

p2.first(get<0>(t1), get<1>(t1), get<2>(t1)),
p2.second(get<0>(t2), get<1>(t2), get<2>(t2));
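A concrete, compilable instance of the piecewise form might look like the sketch below. The element types and the function name make_piecewise are my own choices for illustration, not taken from the example above.

```cpp
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Builds a pair whose members are constructed in place from tuple elements.
std::pair<std::string, std::vector<int>> make_piecewise()
{
  return std::pair<std::string, std::vector<int>>(
      std::piecewise_construct,
      std::forward_as_tuple(3, 'x'),   // first:  std::string(3, 'x')
      std::forward_as_tuple(2, 7));    // second: std::vector<int>(2, 7)
}
```

Note that the tuple elements are passed on with parentheses, so the vector here ends up with two elements of value 7, not with the elements 2 and 7.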

Next:

std::optional<std::string> o1{};
std::optional<std::string> o2{std::in_place};

The former creates an object that stores no string; the latter creates an object that stores a string which is empty. Next:
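Assuming a C++17 compiler, the difference between the two optionals can be checked with a sketch like this one (the function names are hypothetical):

```cpp
#include <optional>
#include <string>

// Default construction: the optional holds no string at all.
bool default_has_value()
{
  std::optional<std::string> o1{};
  return o1.has_value();
}

// In-place construction with no arguments: holds a string, and it is empty.
bool in_place_has_value_and_is_empty()
{
  std::optional<std::string> o2{std::in_place};
  return o2.has_value() && o2->empty();
}
```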

std::mutex m;
std::unique_lock<std::mutex> l1{m};
std::unique_lock<std::mutex> l2{m, std::defer_lock};

The former does lock the mutex, while the latter does not.
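The effect of the std::defer_lock tag can be observed through owns_lock(). A minimal single-threaded sketch, with made-up function names:

```cpp
#include <mutex>

// Without a tag, the constructor locks the mutex immediately.
bool locks_on_construction()
{
  std::mutex m;
  std::unique_lock<std::mutex> l1{m};
  return l1.owns_lock();
}

// With std::defer_lock, the mutex is left unlocked until lock() is called.
bool deferred_does_not_lock()
{
  std::mutex m;
  std::unique_lock<std::mutex> l2{m, std::defer_lock};
  bool owned_before = l2.owns_lock();
  l2.lock();  // lock later, when needed
  return !owned_before && l2.owns_lock();
}
```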

Of course, tags are only needed when the meaning of the initialization would otherwise be unclear. For instance the following would be silly:

// not in C++
complex<double> z1{real_part, 1.0};
complex<double> z2{real_part, 1.0, imaginary_part, 0.0};

Tags are an interesting technique to consider for constructors. With other functions we can usually use different names for different behaviour. For initialization, we do not have this option: all constructors bear the same name, so we can use tags to, in effect, change the constructor name. However, tags are not an ideal solution, because they pollute the namespace scope while being useful only inside a function (constructor) call.


25 Responses to Intuitive interface — Part I

  1. Pete says:

    For vectors, I’ve started writing

    vector<int> v;
    v.resize(10, 6);
    

    It’s not so bad. I could also see using the named param idiom, with resize returning a reference:

    auto v = vector<int>().resize(10, 6);

    Of course, both cases still have the problem of remembering which argument is the value, and which is the number of elements.

  2. Marco Arena says:

    Hi Andrzej,

    thanks for the great article. Just a question. I don’t know if you have read something about the “argument pack idiom”; it’s a named parameter technique described in “Advanced C++ Metaprogramming” (by Davide Di Gennaro). Basically, you can define named parameters this way:

    auto aComplex = complex()[real_part = 10.0][imaginary_part = 0.0];

    What do you think about it? I just think the powerful part is the ‘=’, which makes clear that you are using, for example, an imaginary part equal to 0.0.

    Thanks!

    Marco

    • Marco, this “argument pack idiom” definitely makes it clear which one is the real part and which one is the imaginary part of the complex number. However, I am a bit suspicious about this notation.

      First, it introduces a new “embedded language”. Typically, square brackets are used in C++ for indexed element access. When someone new joins the project (and usually at that time you have to learn thousands of things) and he sees this, it will be an additional burden. Also, I suppose this requires a bit of magic to declare all the necessary things (like the assignment to real_part), which would likely interfere with debugging.

      Second, I suspect (although I am not sure) that this technique incurs a certain runtime cost. While I can imagine how to optimize away tags, I cannot imagine how a compiler could optimize your construct. In the example, you had to first default-construct a complex and then change its value. You used copy initialization, which works for complex, but would not work for non-movable types.

      I guess I would be more comfortable with this syntax:

      Complex z = ComplexParam{}.real(10.0).imaginary(0.0);
      // ComplexParam is MoveConstructible
      // Complex doesn't have to be
      

      But still, I am uncertain about runtime costs; and the construct is subject to abuses, like:

      Complex z = ComplexParam{}.real(10.0).imaginary(0.0).real(5.0);
      

      But all these concerns aside, the “named parameter idiom” does make it clear which value corresponds to which parameter.

      Regards,
      &rzej

  3. Another way to use tags:
    Your sample

    std::vector<int> v1(std::with_size, 10, std::with_value, 6);

    Can be written as

    std::vector<int> v1(std::with_size(10), std::with_value(6));

    Where with_size can be defined as

    template <class T>
    std::with_size_t<T> std::with_size(T size) {
        return std::with_size_t<T>(size);
    }
    
    • This is quite a nice solution.

      • Another implementation of named arguments with C++

        template <class tag_name>
        struct base_tag
        {
        	template<class T>
        	struct value{
        		value(T size) : _value(size) {}
        		template <class T2>
        		operator T2() const {return _value;}
        		T _value;
        	};
        
        	template<class T>
        	static value<T> build_tag(T v) {return value<T>(v);}
        };
        
        template <class TAG_NAME>
        struct do_tag {
        	template <class T>
        	typename TAG_NAME::template value<T> operator=(T val) const {return TAG_NAME::build_tag(val);}
        	template <class T>
        	typename TAG_NAME::template value<T> operator()(T val) const {return TAG_NAME::build_tag(val);}
        };
        
        #define DEFINE_TAG_TYPE(tag_name) \
        	struct tag_name##_t : base_tag< tag_name##_t > {}; \
        	const do_tag< tag_name##_t > tag_name
        
        struct size_tag_t : base_tag<size_tag_t> {};
        const do_tag<size_tag_t> size_tag;
        
        //struct capacity_tag_t : base_tag<capacity_tag_t> {};
        //const do_tag<capacity_tag_t> capacity_tag;
        DEFINE_TAG_TYPE(capacity_tag);
        
        class vector
        {
        	size_t _size;
        	size_t _capacity;
        public:
        	vector(size_tag_t::value<size_t> v) : _size(v) {}
        	template <class T>
        	vector(capacity_tag_t::value<T> v) : _capacity(v) {}
        };
        
        int main() {
        	vector v1(size_tag = 10);
        	vector v2(capacity_tag = 10);
        	vector v3(size_tag(10));
        	vector v4(capacity_tag(10));
        }
        
      • Michal Mocny says:

        Are there performance implications, particularly when the values aren’t constexpr (will we incur copies at runtime)?

        Another future possibility would be to add a new named type alias which treats types with different names as unique types, even if they map to the same underlying type. Go lang has this feature, and means you can do (NOT C++) type with_size size_t; type with_capacity size_t;

        Perhaps the above provided solution could be optimized to be just as efficient.
        And perhaps some other language proposals (such as inferring class template types from constructor arguments with intermediate make_blah function templates) would make it easier to write these flag types in the future.
        But I thought it was worth mentioning..

    • Radek says:

      Oleksandr, could you clarify this and provide the shortest compiling code? I am not so familiar with template magic.

  4. tpdi says:

    > all constructors bare the same name, so we can use the tags to somehow change constructor name.

    All ctors for the same class have the same name. Why not be pragmatic and make two different classes?

    // option 1:
    template <typename... P>
    IntVecWrapCapacity::IntVecWrapCapacity(P&&... v) : data(std::forward<P>(v)...) {}

    // AND ALSO option 2:
    template <typename... P>
    IntVecWrapValues::IntVecWrapValues(P&&... v) : data{std::forward<P>(v)...} {}

    Since the difference only matters at construction, and since they both end up constructing a std::vector named “data”, both classes could derive from the same base and be used identically. Or, depending on the wrapper’s need to control access to the member “data”, both could include a conversion operator to std::vector<int> that returns data.

  5. MJanes says:

    another solution could be to use strong aliases with associated literal type and conversion operators. In this way, say, we could have a “std::strong_size_t” convertible to “std::size_t” with literals “123_size” used to disambiguate constructors: std::vector v{ 12_size }; std::vector v{ 12_capacity }; // … etc., note that such literals could be reusable being representative of generic concepts

    • adrien c says:

      This I think is an interesting approach, it looks like the unit example of B Stroustrup, it is expressive, and we don’t need it for so many types.
      however, having a CTOR of a Type size_t strongly typed would generate a new issue: what happens if the vector’s element are of type size_t ?

      To me even if the tags approach is heavy, it is in this case better.
      Perhaps having a literal for most cases, and optionally a tag to disambiguate complex cases.

      Also about the tags, to me this will generate a dictionary, not just a list anymore. A bit like in python. So the initialize list would work often by type matching, but any element could also be a pair of value + tag.

      What happens then if you have a vector of pairs value + tags ???

      perhaps the easiest would be to add a keyword (tag) but this would be a bit ugly don’t you think ?

      Nice article anyway

      • MJanes says:

        >> what happens if the vector’s element are of type size_t ?

        nothing special, size_t would remain a typedef as it is ( and for good reasons ! making it a strong alias would break existing code and ultimately change its meaning … ), the stl would just provide strong aliases side by side to their weak counterparts ( eg. say, std::strong_size_t ). Nobody would ever instantiate a vector for the same reason nobody would ever instantiate vectors of tag types. People that need a “vector of element counts” would still use vector instead, so the expression “vector{10_size}” would still retain its expected meaning. Note that this would work with size_t variables too, by writing something like “vector{ some_size_t_var * 1_size }”; clearly, such a level of explicitness would be necessary when dealing with integer vectors or generic vector instantiations only …

        • MJanes says:

          PS: apparently, the formatter replaced some “<" braces in my post above, sorry. Of course, those "vector" are "vector ” …

    • Przemek says:

      unfortunately, this does not work for variable-size vectors, which are used in most cases

  6. Andrew says:

    Wouldn’t it be easier to make std::vector::size_type a class type, explicitly constructable from an integral type (size_t), but implicitly convertible back to it (with a conversion operator)?

  7. Sebastian Redl says:

    Tag parameters are an idea that was only introduced to the standard library with C++11, though. Before that, nobody thought them necessary apparently. We learned a lot since then. (Note the examples you used: unique_lock, function and tuple are C++11, optional is C++14.)

  8. Erm, for the problem at std::optional, why not to solve with:

    std::optional<std::vector<int>> o(std::in_place, {6, 5});

    It should work already (although I didn’t read the last version of the proposal), forwarding a std::initializer_list to vector’s constructor, and it does what the initializer_list constructor does.

    I don’t find this unintuitive. If I want to initialize a vector with a list, I give it an initializer list. The template wrapper interface _is_ unintuitive: it does not get a list (just an arbitrary number of numbers), but it treats them as a list.

    • Hi Róbert. Yes, such initialization is possible with std::optional (because it explicitly provides a forwarding constructor with initializer_list parameter).

      Regarding the template wrapper interface, I do not claim that it is intuitive. My goal was to show what surprises you get, *if* you decide to implement it. I guess it also shows that “perfect forwarding” is not possible when we deal with initializer_list function parameters.

  9. jdorhu says:

    Another evidence that C++0x succeeded – at least by the “C++ definition of success” …

    They retained old ways of shooting yourself in the foot and added new ones. YAAAAAAAY !!!

    And here come fanboys, designing “clever” and not-so-clever hacks that just muddy the waters even more.

    Another few iteration of “enhancing” the “language” and hopefully it will start biting off heads of its proponents ==> PROBLEM SOLVED !!!

  10. Evrey says:

    Something that came to my mind… How about this?

    std::vector<int> v{std::size_param{2}, std::capacity_param{42}};
    // With something like…
    template class vector {
    vector(std::initializer_list) = delete;
    };
    Would have the same effect, except that tagged parameters won’t take two entire parameter slots.

    • Evrey says:

      Meh, something went wrong with the code tag… Whatever, the idea was to use tags which may have values like std::size_param. A container may then delete an initializer_list constructor for those tag parameters.

      • I am not sure, I got your idea right.

        std::vector<int> v{std::size_param{2}, std::capacity_param{42}};
        

        This makes the programme’s intention clear, but you do not have to remove the sequence constructor to make this work. What is the added value of making the sequence constructor deleted?

        • Evrey says:

          The idea was to remove a sequence constructor (so that’s what it’s called…) for tag parameters in case, automatic conversions or so may ruin the day, like the int-to-size_t-conversion. But come to think about it, this may never be the case, so no, there is no value added by deleting it, right.

