(Not) detecting bugs

The following code contains a bug. A developer has spent quite some time looking for the source. The intent of this code is to iterate over two vectors simultaneously, from the first up to the one-before-last element. Thus the most interesting tools that will be employed will be boost::zip_iterator and std::prev.

#include <boost/iterator/zip_iterator.hpp>
#include <boost/tuple/tuple.hpp>
#include <vector>

using zip_iter = boost::zip_iterator< 
                   boost::tuple<
                     std::vector<int>::iterator,
                     std::vector<int>::iterator
                   >
                 >;
int main()
{
  std::vector<int> v1 = {1, 2, 3, 4, 0};
  std::vector<int> v2 = {2, 3, 5, 7, 0};
    
  zip_iter beg {boost::make_tuple(v1.begin(), v2.begin())};
  zip_iter end {boost::make_tuple(v1.end(), v2.end())};
  
  auto process = [](zip_iter::reference){};
  std::for_each(beg, std::prev(end), process);
}

The purpose of a zip iterator is to iterate over a number of containers at the same time. It takes a tuple of iterators, each iterating over a different collection, and upon each dereference it returns a tuple of references: each reference to an element form the corresponding container.

I have simplified the program to the minimum. I am using GCC 5.3. This code compiles fine, but when I run it, it goes on and on. In this post we will see where the bug is, but more importantly, we will look at why this bug has not been detected during the compilation, and wasted a fair deal of the developer’s time.

Where’s the bug?

Let me tell you up front what the source of the problem is. It is the miscommunication between boost::zip_iterator and std::prev. The latter expects the input to satisfy the requirements of a BidirectionalIterator. The former does return a bidirectional iterator in some sense. For instance, the following, slightly modified, program does exactly what one would expect:

int main()
{
  std::vector<int> v1 = {1, 2, 3, 4, 0};
  std::vector<int> v2 = {2, 3, 5, 7, 0};
    
  zip_iter beg {boost::make_tuple(v1.begin(), v2.begin())};
  zip_iter end {boost::make_tuple(v1.end(), v2.end())};
  
  auto process = [](zip_iter::reference){};
  std::for_each(beg, --zip_iter{end}, process);
}

Strictly speaking, definitions of a “bidirectional iterator” in STD and in Boost.Iterator are incompatible. Perhaps the least natural, but the most significant requirement of STD’s BidirectionalIterator is that it has an iterator tag same as or derived from class std::bidirectional_iterator_tag. Formally, we expect that:

template <typename Iter>
constexpr bool is_BidirectionalIterator()
{
  using Iter_tag =
    typename std::iterator_traits<Iter>::iterator_category;

  return std::is_base_of<std::bidirectional_iterator_tag,
                         Iter_tag>::value;
}

This tag is used for tag-dispatching the calls to most specialized algorithm overloads.

In contrast, Boost.Iterator departs from the classical iterator hierarchy. The authors observe that traditional hierarchy mixes two unrelated concepts:

  • traversal — upon increment, how the iterator changes the position in collection,
  • access — upon dereference, how the position is mapped onto a value or a reference to value.

In this spirit, they provide their own tag hierarchy, which encodes only the traversal. In this hierarchy the iterator’s traversal category is checked somewhat differently. For instance, Bidirectional Traversal Iterator is tested as follows:

template <typename Iter>
constexpr bool is_BidirectionalTraversalIterator()
{
  namespace b_iter = boost::iterators;

  using Iter_tag =
    typename b_iter::iterator_traversal<Iter>::type;

  return std::is_base_of<b_iter::bidirectional_traversal_tag,
                         Iter_tag>::value;
}

These are two different ways of checking for bidirectional iterator traversal, and while Boost.Iterator library understands STL iterator categories; it does not work the other way around: STL does not understand Boost.Iterator’s concepts and iterator traversal tags. From the STL’s perspective, our zip_iter is only an Input Iterator: the weakest of all iterator concepts.

But even with this explanation, one thing should still bother us.

Why didn’t compiler complain?

First, what does the C++ Standard has to say about this? [algorithms.general]/5 reads:

Throughout this Clause, the names of template parameters are used to express type requirements. […] If an algorithm’s template parameter is BidirectionalIterator, BidirectionalIterator1, or BidirectionalIterator2, the actual template argument shall satisfy the requirements of a bidirectional iterator.

If we look at the specification of function prev in [iterator.operations]/7, it does not explicitly provide a Requires clause, but it uses this convention of calling template parameters BidirectionalIterator.

Thus, the Standard says that “the actual template argument shall satisfy the requirements of a bidirectional iterator”. Our argument does not satisfy the requirements. What should the consequence be?

The only mention of such situation, we find in [res.on.functions]:

In certain cases (replacement functions, handler functions, operations on types used to instantiate standard library template components), the C++ standard library depends on components supplied by a C++ program. If these components do not meet their requirements, the Standard places no requirements on the implementation.

In particular, the effects are undefined in the following cases:

  • […]
  • for types used as template arguments when instantiating a template component, if the operations on the type do not implement the semantics of the applicable Requirements subclause.

This appears to be addressing the situations where operator++ exists but does something else than advancing the iterator; or where a begin and end iterators are passed to a function but the latter is not reachable from the former. But it does not address the situation where a type does not satisfy a “syntactic” constraints, detectable at compile-time.

Thus, the Standard does not address such situation as ours explicitly. However, it has one note in [defns.undefined]:

Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data.

I read this as saying, “if the Standard does not define what happens, it is an undefined behavior.” This gives the compilers the choice to do what they see fit in such situation. So, what do they chose?

If I test my example with Clang, I get an error:

error: no matching function for call to 'prev'

This is because Clang (in libc++) chooses to use an enable_if trick to disable the overload for types that do not provide the appropriate bidirectional iterator tag. They obviously cannot verify all the requirements of a bidirectional iterator, but checking for the appropriate tag is very easy.

If I test my example with Visual C++, I get an error:

error C2338: prev requires bidirectional iterator

In Visual C++’s Standard Library implementation, they have a static_assert that checks for the expected bidirectional iterator tag.

GCC, on the other hand, does not check anything (It is not required to), and instead trusts you that you have passed an iterator containing a bidirectional iterator tag, and based on this assumption it dispatches the call to the most appropriate overload of another STL function: std::advance. The following is not technically correct, but it gives you an idea how the call is forwarded:

template<typename BidirectionalIter, typename Distance>
  BidirectionalIter prev(BidirectionalIter i, Distance n = 1)
{
  advance(i, -n);
  return i;
}

Function std::advance has three variants:

template<typename RandomAccessIterator, typename Distance>
  void advance(RandomAccessIterator& i, Distance n)
{
  i += n;
}

template<typename BidirectionalIterator, typename Distance>
  void advance(BidirectionalIterator& i, Distance n)
{
  if (n > 0)
    while (n--) ++i;
  else
    while (n++) --i;
}

template<typename InputIterator, typename Distance>
  void advance(InputIterator& i, Distance n)
{
  assert (n >= 0);
  while (n--) ++i;
}

(Of course, the above is not correct C++, the real implementation is more complicated, as it uses tag dispatching techniques.)

When we pass our zip_iter to std::prev, the compiler needs to dispatch to the best-matching std::advance. Since, as we already know, according to STL rules, zip_iter is only an input iterator, it picks the last variant. Thus, a negative distance is passed to the overload for InputIterators, and the while-loop which assumes a non-negative value, goes on and on.

So, what GCC does is somewhat unethical. It dispatches based on the iterator tag, but does not verify if the tag satisfies requirements initially. But is legal according to C++ Standard. In order to eradicate programmer’s bugs, GCC could implement more than the minimum the Standard requires, or the Standard itself could be more restrictive about this. It could require that wherever a requirement mismatch is easily diagnosable at compile-time, such mismatch should be treated as an ill-formed program.

In fact, the future C++ Extensions for Ranges (a.k.a. STL2) does address this problem in specification by constraining the algorithms with concepts. Moreover, GCC 6, which implements Concepts Lite, does allow you to turn some type safety checks even if you do not enable concepts with -fconcepts. All you need to do is to define macro _GLIBCXX_CONCEPT_CHECKS prior to including the Standard Library headers.

The advice

To maximize the chances of your compiler finding a bug:

  • Use more than one compiler.
  • Use more than one Standard Library implementation (e.g., use libc++ when compiling with Clang).
  • On GCC 6, compile with -D_GLIBCXX_CONCEPT_CHECKS
This entry was posted in programming and tagged , , , , . Bookmark the permalink.

3 Responses to (Not) detecting bugs

  1. Matt says:

    For completeness, one can mention that in this case the fix is to use `boost::prior` (instead of `std::prev`) from the `boost/utility.hpp` header. At the same time, it’s interesting that the name is different from `prev` (like `boost::next` corresponding to `std::next`) — if it were the same, the solution would be to use the ADL like for `begin`, `end`, and `swap` functions (as in `using std::end;` followed by the subsequent use of unqualified `end`).

  2. cha5on says:

    Just wondering, is there a gcc bug report for this? I’d be somewhat surprised if this weren’t an oversight on their part, rather than a deliberate decision.

  3. I added one while writing this post: 78830.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.