Can you see the bug?

Recently, I was hit by one C++11 gotcha. It is funny: I know about it, I have blogged about it, and nonetheless I still fell into the trap.

Do you remember my other post on efficient optional values? I am using the tool at work, and I tried to define an empty-state policy for std::string. Which value of std::string can be “spared” to represent the non-value? I am pretty sure it cannot be the empty string. Empty strings are used too often for various purposes, and I can easily imagine that in many applications one may want to distinguish between an empty string and not-a-string. Fortunately, there exist better candidates. For instance, in my programs I never need to use the character '^', and even if other people use it, they most likely never need the control character of numeric value 2. Or a string composed of three characters of numeric value 0. (Remember, std::string can contain many zeros.) I decided to give the users a choice: my policy is a template, and one can specify which special character to use and how many times it is to be repeated:

template <char CH, size_t SIZE>
struct empty_string_policy
{
  static std::string empty_value() // may allocate
  {
    return std::string{SIZE, CH};
  }
  static bool is_empty_value(const std::string& v) // no alloc
  {
    return v.size() == SIZE &&
           std::all_of(v.begin(), v.end(),
                       [](char c){ return c == CH; });
  }
};

Suppose you want to represent the not-a-string value as three null characters. You just define an alias that reflects that:

using Null3Policy = empty_string_policy<'\0', 3>;

When I used it in my program, I observed that it had a bug. After a while of investigation, the problem boiled down to the following assertion:

Null3Policy p;
assert (p.is_empty_value(p.empty_value()));

Apparently, function is_empty_value checks for something other than what function empty_value creates. But which one is wrong, and how?

Note that I used braces to initialize the returned string. As indicated in this post, brace initialization is intended to be a superior alternative to the old-style function-call-like syntax. We are also aware (or are we?) of the container initialization gotcha related to this feature: namely, the sequence constructor (the one with std::initializer_list) could be inadvertently selected. But that is not our case. We are passing an object of type size_t as the first argument. While size_t is convertible to char, it is definitely a narrowing conversion, and as we know, triggering a narrowing conversion in brace-initialization would result in a compile-time failure (and our test compiles fine). Here is a relevant quote from the C++11 standard, 8.5.4/3:

List-initialization of an object or reference of type T is defined as follows: […] if T is a class type, constructors are considered. The applicable constructors are enumerated and the best one is chosen through overload resolution. If a narrowing conversion is required to convert any of the arguments, the program is ill-formed.

So, empty_value looks fine; maybe, then, the problem is in is_empty_value. The natural way (for me) to check which one it is, is to inspect the value after creation with the debugger. But the debugger displays an empty string. No wonder: it displays it as a C-string, and I wanted all zeros. But even when I change my policy to empty_string_policy<'^', 3> I get the same bug, and the debugger still renders an empty string. Only when I inspect the underlying array in a binary mode do I observe that the first character is '\3'. So it looks like it might have been the sequence constructor after all. I check the string size: it is 2. It is the sequence constructor; but if this is the case, how did it survive the narrowing conversion?
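The observation can be reproduced in isolation. Below is a minimal sketch; the helper names with_braces and with_parens are hypothetical and exist only for this illustration. It assumes nothing beyond the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers, for illustration only.

// Braces: the sequence constructor (std::initializer_list<char>) is chosen,
// so the string holds two elements: char(SIZE) and CH.
template <char CH, std::size_t SIZE>
std::string with_braces()
{
  return std::string{SIZE, CH};
}

// Parentheses: the (count, char) constructor is chosen,
// so the string holds SIZE copies of CH.
template <char CH, std::size_t SIZE>
std::string with_parens()
{
  return std::string(SIZE, CH);
}
```

With these definitions, with_braces<'^', 3>() gives a string of size 2 whose first character has numeric value 3, whereas with_parens<'^', 3>() gives "^^^".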

The answer lies in another surprising C++ behavior. There is no narrowing conversion in our example! The standard says in 8.5.4/7:

A narrowing conversion is an implicit conversion […] from an integer type or unscoped enumeration type to an integer type that cannot represent all the values of the original type, except where the source is a constant expression and the actual value after conversion will fit into the target type and will produce the original value when converted back to the original type.

This means that whether a conversion is narrowing or not depends not only on the source and target types, but also on the converted value! In a way this makes sense. As long as the compiler knows the value, it can see that there is no risk of losing information about the value after the conversion, so it can safely allow it. In our case SIZE is a template argument, hence a constant expression, and its value 3 fits into a char.
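To make the rule concrete, here is a small sketch (the variable names are mine, chosen for illustration, and the commented-out part assumes an 8-bit char). The ill-formed cases are left in comments so that the snippet still compiles.

```cpp
#include <cstddef>

// A constant expression whose value fits into char:
// this is NOT a narrowing conversion, so braces accept it.
constexpr std::size_t fits = 3;
constexpr char ok{fits};

// The following would be ill-formed, so they stay commented out:
//
//   constexpr std::size_t too_big = 1000;
//   char a{too_big};            // constant, but 1000 does not fit into an 8-bit char
//
//   std::size_t runtime = 3;    // not a constant expression
//   char b{runtime};            // the compiler cannot know the value: narrowing

static_assert(ok == 3, "the value survives the round trip");
```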

My takeaway from this experience is this. I still maintain that the containers’ constructor where you specify the size is bug-prone (as indicated in this post). If you have to use it (I had to, in my example), never, ever, ever use braces. Even when you know it is absolutely safe. Do you think you know C++?
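For completeness, here is a sketch of the policy with the one change that this advice implies: parentheses instead of braces in empty_value. Everything else is as in the listing above, plus the includes it presumably needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

template <char CH, std::size_t SIZE>
struct empty_string_policy
{
  static std::string empty_value() // may allocate
  {
    return std::string(SIZE, CH);  // parentheses: SIZE copies of CH
  }
  static bool is_empty_value(const std::string& v) // no alloc
  {
    return v.size() == SIZE &&
           std::all_of(v.begin(), v.end(),
                       [](char c){ return c == CH; });
  }
};

using Null3Policy = empty_string_policy<'\0', 3>;
```

With this version the original assertion, is_empty_value(empty_value()), holds for Null3Policy.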


9 Responses to Can you see the bug?

  1. tbeloqui says:

    Great finding, just if you don’t mind, could be nice to include it

  2. litbisme says:

    Has C++14 changed the rules of this? As far as I can remember, it doesn’t matter whether or not the conversion is narrowing for the question of what constructor is selected (in the same way as it doesn’t matter whether the invoked constructor is explicit or not): The sequence constructor is always selected, and if a conversion is narrowing, the call is ill-formed. See and the follow-up messages in that thread.

    • C++11 and C++14 work the same way here. And your statement is correct. My (faulty) reasoning was that since (1) we have a narrowing conversion (this was my bad) AND (2) this program compiles, THEN for some reason it is not the sequence constructor that was selected. (Because if it were selected, the program would have been ill-formed.)

  3. “So, empty_value looks fine; maybe, then, the problem is in empty_value. ”

    I think you meant: “[…]is_empty_value”.

  4. Pingback: Is it standard behaviour that adding const to size_t can cause compile failure? - BlogoSfera

  5. quicknir says:

    A good post (I also followed links to a couple of others). Personally though, I’ve always been against braces for non-aggregate classes. For such classes, you have a proper default constructor, or default initialize members at the point of declaration. So braces buy you nothing outside of list initialization. So, don’t use braces to call a constructor as if it were a function. It’s more error prone and less clear; this is what I thought beforehand and this post confirms it.

    • You didn’t say it explicitly, but I read your message as saying that brace initialization is too low-level (or too much POD-like) to be used for classes that hide abstractions and potentially lots of non-trivial logic, like containers.

      I share this part of the opinion; however, I am also close to reaching the conclusion that the traditional constructor syntax is also too “low level” or too error prone. Other member functions have names, and it is usually clear from the name what the function will do:

      cont.put_elements(1, 2, 6);

      Because during initialization there is no way to name the procedure, and because for a type like a container there are many ways in which you would like to initialize it, you are left with ambiguity:

      Container a(100, 101);
      Container b(100);

      And now you wonder, does a store two elements? It does not, but does the author know that? Is it 101 put 100 times or 100 put 101 times? Is the initial capacity of b 100, or is it its initial size?

      Clearly, it would be more convenient if we had a way to put a clearer message about what we want to do. It could be “named constructors” or “named function parameters”.
