Treating symptoms instead of the cause

Compiler warnings can sometimes help us find bugs before we even build our program, but it only works provided that we can make use of the warnings. Especially, when we are aware that warnings detect only symptoms of the bugs rather than bugs themselves.

By “bug” we mean the situation where our program compiles, and is a “correct” C++ program according to C++ Standard, but it does something else than what we intended. This could occur when we make a typo. Here is one instance:

int make_i(bool c1, bool c2)
{
  int v; // no value specified
  if (c1)        fill_1(v);
  if (!c1 && c2) fill_2(v);
  if (!c1 && c2) fill_3(v); // <- BUG
  return v;
}

Functions fill_1(), fill_2(), fill_3() are used to compute and set the initial value to the object of type int that is passed by reference. This interface where a to-be-initialized object is passed by reference is very unfortunate, but suppose this is an interface of a third party library that we have to use, and in fact we are writing a wrapper around it, so that in our program we will just use make_i() that simply returns the computed value.

But I have made a typo: the three conditions were supposed to cover all possible combinations of c1 and c2, but I forgot to type a bang before c2 in line 6. However, this is still a correct program, according to the rules of C++, and my compiler will accept it. But apart from compilation we can run a static analyzer which will take far more time than the compiler, but in turn will perform a deeper analysis of the program flow. I am using Clang Static Analyzer embedded in clang.

A static analyzer is not clever enough to be able to detect that I wrote something else than what I intended: it does not know my intentions. But it can look for things that are so fishy that they can hardly be considered a programmer’s intention. One such thing is returning an object of type int (or similar) with unspecified value. And it so happens that as a result of my bug I ended up returning an unspecified value. One of the checks in the static analyzer, called core.uninitialized.UndefReturn, looks for exactly this type of fishy situations. It reports a warning. It does not know that I have a bug, but it does see a fishy situation. When I see this report, I have sufficient information to find the bug myself. The important distinction here is: it is me who found the bug (missing bang operator), not the static analyzer. The static analyzer only found that an unspecified value is returned, which is something different than a missing bang. But the return of an unspecified value is a symptom of a bug. Without this symptom, I would not be able to see that there is a bug in function make_i() that I should look for.

This reasoning may sound obvious, but some programmers get the idea wrong. When they see a warning in function make_i() saying that v is potentially returned containing an unspecified value, they immediately “fix” the problem by changing the function to:

int make_i(bool c1, bool c2)
{
  int v = 0;
  if (c1)        fill_1(v);
  if (!c1 && c2) fill_2(v);
  if (!c1 && c2) fill_3(v);
  return v;
}

And they are confident that they made the program better because there was a warning reported before, and now there is no warning reported. What they do not realize is that they have only removed the symptom, but its cause — the bug — is still there, except that now there is nothing to warn us about it.

This idea to assign the initial value did not come out of nowhere. This is a misunderstood advice, which is otherwise good, that you should assign meaningful values to your objects upon initialization. But this advice is talking about meaningful values. And because of the constraints of the interface of functions fill_1(), fill_2() and fill_3(), we do not have a meaningful value to start with, so using some arbitrary value 0 which we will overwrite anyway does no good to anyone. And this situation is not uncommon. Consider reading value from a stream:

int i;
std::cin >> i;

Interestingly, in other languages that do assign deterministic values to objects of type int upon initialization, there is no unspecified value, and a warning about a bug like the one above cannot be produced. These languages do the same thing as the described unaware programmer. This is why I value C++ over many other languages: that it offers more opportunities to catch bugs statically without even running the program or even running unit tests. This is worth considering: unspecified values can help detect bugs.

This entry was posted in programming and tagged , , . Bookmark the permalink.

8 Responses to Treating symptoms instead of the cause

  1. Bruno says:

    What other languages are your talking about? For example in Java, the compiler is required to diagnose your make_i function as an error. No static analyzer needed, no warning, an error. If the variable is marked final, the compiler does an even stronger analysis (search for “Java definite assignment”).

    For a long time, language designers know that there are better tools to detect bugs than unspecified values. Unspecified values are a source of indeterminism, and hard-to-find bugs.

    • Thanks for the feedback. Indeed, I had Java in mind when I wrote that. It is cool that Java can turn this into an error for local variables: this is definitely superior to making this legal and then tracing back in search for problems. However, the same logic does not apply to class members: they are silently initialized to default values. And then it makes you wonder: did the author forget to initialize (a bug) or was she aware what default value would be assigned and worked based on this assumption?

      If there is a superior way, then it should be definitely used. Java can do it for automatic variables (there is some static analysis employed here also, even though we do not mention “static analyzer” explicitly), because it doesn’t provide reference semantics: it only provides value semantics and pointers. Therefore the state of the object (an int or a pointer) cannot be changed only because it is passed as argument to a function. In case of reference semantics in the language, such analysis is not possible in general:

      void read_value(int& i);
      
      int main()
      {
        int i;
        read_value(i); // fills value?
        return i;
      }
      

      You could argue that la language without reference semantics is superior to a language with reference semantics.

      But in more complex cases (class member), Java compiler cannot perform this analysis and has to resort to assigning default initial values. Maybe I should have been more explicit about this.

      Now, a code like this produces nondeterministic values:

      int get_i()
      {
        int i;
        return i;
      }
      

      But here the root of the problem is not that the language allows that, but that you wrote the code that causes the indeterminate value to be returned. It is more difficult to track such bug dynamically in debugger, but it is easier to track it statically through static analysis or code reviews. I agree it would be better to get the unspecified value only upon request:

      int get_i()
      {
        int i [[uninitialized]];
        return i;
      }
      

      And have the compiler warn you if you do it by accident.

      • Bruno says:

        Hi Andrzej,

        Thanks for the reply. Indeed, Java does not apply the same analysis mechanism for class members (unless final members), but Java could have, by analyzing every constructor of the class. (The Coverity static analyzer checks constructors in this manner for C++.)

        Note that C++ similarly uses the default constructor for class objects without an initializer, and this is prone to the same confusion. When you see:

        std::string result;
        </pre
        
        You can wonder wether the programmer meant the empty string or an unspecified string. In this case, C++ programmers like Java programmers have to go look elsewhere to find the answer.
        
        (begin side-remark)
        
        I sometimes find it more clear to write an initializer, e.g. for a local:
        
        
        std::string result = "";
        

        But tools like clang-tidy will insist on removing the initializer:
        https://clang.llvm.org/extra/clang-tidy/checks/readability-redundant-string-init.html
        https://clang.llvm.org/extra/clang-tidy/checks/readability-redundant-member-init.html

        (end side-remark)

        For a function like "read_value", I would prefer to track in the type system that parameter i is an output. When compiling the function itself, the compiler could ensure that the output is assigned on all execution paths, and when compiling callers of the function, it can make use of this information. I don't think that stands in the way of the option to "build-in" static analysis, like Java did for local variables.

        In the end, in my view, unspecified values in C++ are first and foremost a performance optimization.

  2. Yankes says:

    I very like C# approach. Code you write will be hard compiler error. Even adding `!` will not fix it.
    You need structure whole code to guarantee that variable is set:

    int make_i(bool a, bool b)
    {
      int v;
      try
      {
        if (a)
        {
          fill_1(out v); //`out` is buildin mechanism guarating that value will be returned or function will throw
        }
        else
        {
          if (b) fill_2(out v) else fill_3(out v);
        }
        throwingFunction();
      }
      catch
      {
        v = 0; //`v` there is unitalized, you can use `throw` or return other value.
      }
    
      return v;
    }
    

    This system have limitations, only simple checks are done but you can easy organize your code around it to get guarantee that every code path fill variable.

    • Kristof Kerekes says:

      Usually in C++ codebases readability is more valued, than using these patterns all over the code. Also, take performance into consideration, in C++ you would have to take all the overhead runtime without any benefits compile time.

  3. Lesley Lai says:

    I think instead of uninitialized value, we should use immediately invoked lambda in such cases.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.