Help the compiler warn you

Compiler warnings are a very useful tool for detecting bugs in your program. Because you can enable them selectively, and because you can choose to turn them into hard errors on your compiler, you can in fact build a dialect, which is a safer subset of C++. You can enable even more warning if you assist your compiler a bit and instruct it about your intentions. We will see how to do it in one case.

I have a function in my program for splitting strings. It has the following interface:

template <typename... T>  
bool split(string_view s, char sep, T&... tokens);

You can use it like this:

int a, b;
double c;
char d;
string s = "1|99|0.25|X";

if (split(s, '|', a, b, c, d)) {
  // use a, b, c, d
}  

That is, first the string to split, second the separator by which to split, and then a variable number of arguments of different types taken by reference. From the number and the types of the out parameters the function determines how many tokens are expected and in what format. While parsing the input string the function can determine if the split succeeded. In a way, it does two things.

  1. It checks if the number and the format of tokens in the string is the one expected.
  2. If so, it populates the variables from the scope with the corresponding values.

If the string is determined not to satisfy the expectations, it is not an “error”. Someone might be calling this function only to determine if the string is in the given format — nothing more. Thus, exceptions would not work here. Since the numeric values are communicated through out parameters, we can use the return value to indicate if the split was possible or not. However, there is a danger here: you cannot use values a, b, c or a unless function split returned true. But it is obvious that in a big program, sooner or later someone will forget about it, and he will just take for granted that the split must work, and will use the variables unguarded:

int a, b;
split(s, '|', a, b);
return b - a;

One might be tempted to just assign the initial values to a and b in order to “avoid” the “problem” later. But that would be very unfortunate. Using values a and b only makes sense when we know we have read them from the string. If not, whether they have some initial values taken out of the blue by the programmer or whether they have unspecified values is the same: in either case we have a logic error. However, if we go with unspecified values, we increase the chances that this bug will be found statically by the compiler or a tool like static analyzer.

Now, it might look like I am questioning the C++ Core Guideline ES.20 (Always initialize an object). But the real essence of the guideline is, “assign initial value upon declaration provided that you know this value”. If you do not have your value yet, and you already need the object (and its address), just initializing any value is not going to do any good.

So, we will likely get into the problem of using values we have not read. Exceptions would have prevented that, but using them is an overkill: there is nothing exceptional about a string (probably read from some config file, or user) not being in the expected format. Also note that sometimes we do have good defaults, and we only want to depart from them if the string matches the format:

int lo = 0, hi = 100;
split(s, '|', lo, hi); // ok to fail
return lo <= val && val <= hi;

Since C++17 you do not need exceptions to prevent unchecked failures from breaking into the program. We now have attribute [[nodiscard]]. We can use it to annotate our function:

template <typename... T>  
[[nodiscard]] bool split(string_view s, char sep, T&... tokens);

Not reading the value returned by this function results in a compiler warning. And if you turn warnings into errors: in an error. The user has to respond to this. For instance, with an if-statement:

int a, b;
double c;
char d;
string s = "1|99|0.25|X";

if (split(s, '|', a, b, c, d)) {
  // use a, b, c, d
}
else {
  throw bad_configuration{};
}  

Now the user is throwing an exception. This makes sense. It is the user who judges whether the situation is exceptional or not — not the implementer of function split. But the warning will now also appear in the cases where we consciously decided not to check the result:

int lo = 0, hi = 100;
split(s, '|', lo, hi); // ok to fail
return lo <= val && val <= hi;

Luckily, they usually go in pairs: a warning, and a construct to disable the warning. In our case, the idiomatic way to disable the warning is the cast to void:

int lo = 0, hi = 100;
(void)split(s, '|', lo, hi);
return lo <= val && val <= hi;

C++17 is quite new, but the attribute itself has been around for a while longer. In MSVC it is available since version 2017.3. In Clang it is available since version 3.9 (not restricted to C++17 mode); in GCC, since version 7 (not restricted to C++17 mode). In earlier versions of GCC (and therefore also in Clang) we have a compiler-specific alternative: attribute warn_unused_result:

template <typename... T>  
__attribute__((warn_unused_result))
bool split(string_view s, char sep, T&... tokens);

The only (sometimes annoying) difference is that you cannot silence this one with the cast to void, you need to read it. As per the explanation in bug report, attribute warn_unused_result is treated very strictly. No one should ignore the value, even consciously. You would use it for functions like malloc.

This entry was posted in programming and tagged , , , . Bookmark the permalink.

8 Responses to Help the compiler warn you

  1. Adam Badura says:

    In my experience possibility of ignoring the error / status result is always among the arguments given for use of exceptions. This would be a counterargument for it. Then again it is silenced way easier (including not noticing it while browsing through code) than you would need to silence an exception.

  2. richtw1 says:

    Over on the Reddit thread, someone suggested the following alternative, which I really like:


    template <typename... T>
    auto split(std::string_view s, char sep)
    -> std::optional<std::tuple<T...>>;

    if (auto tokens = split<int, int>(s, '|')) {
    auto &[a, b] = *tokens;
    // ...
    }

    • Thanks. This is a nice solution! (Although it does not illustrate my point well: that [[nodiscard]] is useful.)

      • robin says:

        I may be wrong, but when an attribute [[nodiscard]] is used, aren’t we always in this situation? I think the only place where we want to use [[nodiscard]] is where we return an error code. And in this case, I think that either exceptions or the technique of richtw1 is better.

        • The first STD function to use [[nodiscard]] is likely to be async, in order to avoid the problem described at the bottom of this post. And it has nothing to do with error codes. Similarly, function operator new would be a good candidate.

      • And it will not address my other use cases for function split():

        struct Range { int min, max; };
        
        void update (Range & r)
        {
          if (split(get_config(), r.min, r.max)) ...
        }
        
  3. heto says:

    I don’t think not initializing locals in the hopes of getting compile-time warnings is the right trade-off. The problem is that compilers are, as always, quite conservative with their warnings, and so they have to be quite certain that you are actually using an uninitialized value before they will warn about it. If you happen to have complex or hidden enough logic that the compiler hasn’t got the guts to warn you, and then you use the uninitialized value anyway, the behaviour can be quite bizarre. At least by initializing to a dummy value, you get the same behaviour every time you call the function, even if the behaviour is wrong.

    The situation is quite different in languages like Java and C#, where the language specifies rules for determining whether a variable is potentially unassigned, and mandates that all local variables being read must be provably assigned prior to reading, or the compilation must fail. Do you know of similar warning flags for C++ compilers, where all reads of a local variable will warn, unless the compiler can prove that the variable must have a known value either through assignment or through initialization?

    Then again, such a flag would partially defeat its own purpose, by making the programmer write initializations even though the programmer knows the uninitialized value is never used. Consider the split example in this post, but imagine it’s not a template, and the definition of the function is provided in another translation unit, so that the compiler cannot derive conclusions from it. You are passing references to local variables to the function, and the compiler cannot know that the function will not read the variables before assigning something to them. Hence, you must initialize the variables to avoid warnings. Furthermore, the compiler cannot know that the function will assign to the variables, so even to read their values yourself after the call without warnings, you would need to initialize them.

    • Correct. compilers will only warn when they are certain that I can use the value from an uninitialized object. I am fine with this. I do not expect the compiler to detect EVERY invalid read.

Leave a reply to heto Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.