String literals make bad ranges

C++20 will come with what we call the “Ranges” library. However, the range interface has been supported ever since C++11 in one context: the range-based for loop. A range-based for loop can detect anything that is a range and work with it. In particular, it can work with string literals:

#include <iostream>

int main()
{
  for (char ch : "ABC")
    std::cout << ch << "\n";
}

If you test this program, it appears to display what you expect: A, B, and C in a column. In fact, what it does is slightly different from what one would intuitively assume.

This can be observed if we change our test example to:

#include <iostream>

int main()
{
  int i = 0;
  for (char ch : "ABC")
    std::cout << i++ << ": " << ch << "\n";
}

The output from the program is:

0: A
1: B
2: C
3: 

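Yes: the loop runs four times. The literal "ABC" is an array of four characters: 'A', 'B', 'C', and the trailing null character, and the range-based for loop visits all of them.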
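The element count can also be checked at compile time. The following is a minimal C++17 sketch, not part of the original example; the helper name count_loop_iterations is hypothetical and only counts how many times a range-based for loop runs over a character array:

```cpp
#include <cstddef>

// Hypothetical helper: counts the iterations of a range-based for loop
// over a built-in array of char.
template <std::size_t N>
constexpr std::size_t count_loop_iterations(const char (&arr)[N])
{
  std::size_t n = 0;
  for (char ch : arr) {
    (void)ch;  // the character value is irrelevant here, only the count matters
    ++n;
  }
  return n;
}

// The literal "ABC" has type const char[4]: 'A', 'B', 'C', '\0'.
static_assert(sizeof("ABC") == 4);
static_assert("ABC"[3] == '\0');
static_assert(count_loop_iterations("ABC") == 4);
```

Even the empty literal "" yields one iteration, because it still contains the terminating null character.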

Thus, we have a bug-prone construct here. If a string literal were not a valid range, the above code would not compile, and the run-time behavior could not surprise us. The problem is all the more insidious because in some situations we cannot observe the effect of the last, invisible element in the range, as we have seen in the first example.

Once we learn that the above code does not do what we want, we can fix it in a number of ways, e.g.,

#include <iostream>
#include <string_view>

int main()
{
  using namespace std::literals;

  int i = 0;
  for (char ch : "ABC"sv) // Works since C++17
    std::cout << i++ << ": " << ch << "\n";
}

For this literal, that is equivalent to:

#include <iostream>
#include <string_view>

int main()
{
  int i = 0;
  for (char ch : std::string_view{"ABC"})
    std::cout << i++ << ": " << ch << "\n";
}
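The two spellings agree for "ABC", but they can differ when the literal contains an embedded null character: the sv suffix receives the literal's length from the compiler, while the std::string_view constructor taking a const char* stops at the first null. A minimal C++17 sketch of that difference follows; the helper names visited, via_suffix and via_constructor are hypothetical and exist only for this illustration:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of characters a range-based for loop visits.
inline std::size_t visited(std::string_view text)
{
  std::size_t n = 0;
  for (char ch : text) {
    (void)ch;  // only the count matters
    ++n;
  }
  return n;
}

// The sv suffix is given the length of the literal (without the terminator),
// so the embedded '\0' is kept: 'A', '\0', 'B'.
inline std::size_t via_suffix()
{
  using namespace std::literals;
  return visited("A\0B"sv);
}

// The const char* constructor measures the string up to the first '\0',
// so only 'A' is seen.
inline std::size_t via_constructor()
{
  return visited(std::string_view{"A\0B"});
}
```

Here via_suffix() visits three characters and via_constructor() visits one, whereas for a literal without embedded nulls, such as "ABC", both forms visit three.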

However, there is no way to avoid the problem in the first place. We cannot use the type system to prevent such code from compiling: the type of a string literal is an array of characters, and we could not possibly prevent an array from being a valid range. There is no question that the following code should compile fine:

#include <iostream>

int main()
{
  const char ABC[3] = {'A', 'B', 'C'};
  int i = 0;
  for (char ch : ABC)
    std::cout << i++ << ": " << ch << "\n";
}

While the compiler is obliged to accept such code, it could issue a warning. Maybe at some point we will get a check for this in clang-tidy. For now, we need to be aware of this gotcha. It may also become more common in C++20, where a string literal will be recognized as a range in more contexts:

// C++20
#include <iostream>
#include <ranges>

int main()
{
  std::cout << std::ranges::size("ABC") << "\n"; // prints 4, not 3
}
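The same mismatch can already be reproduced in C++17. The sketch below uses std::size as a stand-in for std::ranges::size, on the assumption that only the built-in array case matters here (both report the array extent for it). The name element_count is hypothetical:

```cpp
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical helper: the element count as generic code would see it.
template <typename Range>
constexpr std::size_t element_count(const Range& r)
{
  return std::size(r);
}

// A string literal is a four-element array; a string_view built from it
// has three elements.
static_assert(element_count("ABC") == 4);
static_assert(element_count(std::string_view{"ABC"}) == 3);
```

Generic code that takes "any range" therefore sees one element more for the literal than for the string_view built from it.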

And that’s all about string literals. As a bonus, did you notice that in the second range-based for loop example I had to use a variable i whose lifetime extends beyond the scope of the loop? C++20 will offer us a way of declaring such an index locally, with the extended loop syntax:

#include <iostream>

int main() // C++20
{
  for (int i = 0; char ch : "ABC")
    std::cout << i++ << ": " << ch << "\n";
}

We could think of rewriting the above in terms of ranges, but unfortunately C++20 ranges do not have a zip view (std::views::zip only arrived later, in C++23). However, we can use the range-v3 library by Eric Niebler along with structured bindings:

#include <iostream>
#include <range/v3/view/iota.hpp>
#include <range/v3/view/zip.hpp>
namespace views = ranges::views;

int main()
{
  for (auto [i, ch] : views::zip(views::iota(0), "ABC"))
    std::cout << i << ": " << ch << "\n";
}