Find the bug — comments

In response to the previous post, I have received a bunch of very useful comments. I would like to share them here.

I really found this problem in a production code. What struck me was this strange use of resize where obviously reserve belonged. My intuition told me it is just wrong. Why put some temporary invalid values into a vector? To show this was my intention, but readers have observed other interesting things. Let’s recall the solution we provided in the previous post.

void Catalogue::populate(vector<string> const& names)
  decltype(vec_) tmp;
  for (auto const& name : names)
    tmp.push_back( make_unique<Entry>(name) );
  swap(tmp, vec_);

Strong exception safety guarantee comes with a price. In order to provide the commit-or-rollback guarantee we need to build another object, while still keeping the original object intact. For the period of time when our function is executed we are doubling the amount of memory in use. This might be just too much for certain environments. It also has an impact on program efficiency. That’s unavoidable if we need the strong guarantee, but do we really need it? Not every function needs to provide the strong guarantee. One should weight the benefits and the costs thereof. Our function populate looks like a reset function. And such functions typically do provide a strong guarantee. But definitely, the very short fragments of code we are seeing in this post are not enough to make the right call.

You can arbitrarily mess with objects as long as you are sure noöne else will see them. Using resize where reserve belongs is not a bug in itself, but it is bug prone, and confusing. However, it does not matter that much if we create a copy on the side (this is what we did in the previous post) and put null pointers there. Even if an exception is thrown, the side object will get destroyed and noöne will be able to observe the null pointers. So we may just as well add them, especially that we may get some secondary benefits. For instance, by default-constructing a vector and then calling reserve we risk two memory allocations. Default constructor is allowed to reserve a portion of memory. By starting with an initial non-zero size, we are guaranteed to only have one allocation. I guess it is a limitation of std::vector design that you cannot initialize it with zero-size and a given capacity.

Use swap — not move assignment. In the final solution we used an idiom called ‘do work on the side and swap.’ At the end we swapped the ‘private’ object’s (tmp) value with the target object’s (vec_) value. But in fact, we are not really interested in exchanging the values. We want vec_ to obtain the value of tmp, but we do not care about the value tmp will have. It seams that move assignment would reflect our intent better:

vec_ = std::move(tmp);

The only problem with this improvement is that std::vecotr’s move assignment (and move constructor) are allowed to throw exceptions! This is necessary to accommodate all the allocator-related functionality. But if you dig dipper into the standard, you will discover that swap on vector is not declared as noexcept either. True, but the standard also guarantees that swap on vector does not throw exceptions. Non-trowing but noexcept(false) — why? This is also related to allocator support. vector::swap has a precondition and may need to contain an assertion. As required by document N3248 (“noexcept Prevents Library Validation”), a function that has a precondition must not be noexcept, even if it guarantees not to throw exceptions. This is because when its precondition is violated it can be considered a UB, and programs should be able to do whatever they want, in particular throw an exception that does not end in a call to std::terminate in unit tests.

But even though vector::swap is not noexcept it still offers a no-fail exception safety guarantee. An exception-safety guarantee is not something testable within the language.

Do not blindly catch exceptions! This was already indicated in the previous post. One cannot just catch any exceptions at any random point in the program. Every function that potentially throws an exception needs to clearly specify what can be done with the objects it manipulated in case the function ended in exception. For instance, the strong guarantee says you can do anything you could do before the call to the throwing function; the basic guarantee says you can destroy or assign to the objects. There can be more specifications like this, e.g. you can call function f1, f2 and f3, but you cannot call functions f4 and f5. Unless you know what can be safely done when some function throws an exception, you have to assume the minimum: the basic guarantee. If you catch an exception and then call a function on an object that may be in an indeterminate state, you are entering the behavior which was likely not intended ever to occur. It is nearly like an undefined behavior.

In the previous post we suspected that someone might catch an exception thrown by populate and then still use the half-populated object. If this was to be the case, the bug would be not the leaving of null pointers, but illegally accessing the object that was left out by a throwing function. True! But why catch exceptions then if you cannot use objects? Good question. Don’t catch them! Catch them only at the level where you are sure all objects left in an indeterminate state have been destroyed.

Avoid two-phase initialization. Function Catalogue::populate looks like it might encourage this pattern:

// DON'T DO IT !

MyType myObj; // construct half-initialized
// do something or not...
myObj.init(); // perform full initialization

While there might exist some reasons to use two-phase initialization, it is a good idea to avoid it wherever possible. It makes the program safer and easier to reason about when an object is either fully initialized or there is no object. This just avoids so many defensive ifs checking if the object is half-initialized or not.


I used comments from @KaiSt, @dhardy and @pip010 on this blog, @KrzaQ2 on Reddit, and Tomasz Kamiński. Thank you for your input.

This entry was posted in programming and tagged , , . Bookmark the permalink.

One Response to Find the bug — comments

  1. anon says:

    N3248 is nuts! AFAIK googletest uses out of process execution for death tests (e.g. testing assertions), an approach not covered in the “what we tried” section of that paper.

    Not marking library functions noexcept due to N3248 is insane.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s