Assertions

So, what is the point in still talking about assertions when so much has already been said and written about them? I believe that discussions which claim that “they are useless because they disappear in production”, or that “they crash the program that just must keep running”, draw attention away from one of the most important roles assertions play in program code. The main purpose of an assertion is simply to be there, in the code, so that the programmer can see what is asserted. What it does at run-time is of secondary importance: assertions could do nothing at run-time (in debugging or release mode) and they would still be useful. In this post, by the word “assertion” I mean not only the macro assert from the standard header <cassert>, but any similar tool used for a similar purpose.

Syntax-checked comments

Putting the run-time behavior of assertions aside for now, they resemble comments: no code will be generated from them. Yet comments are useful for programmers, and therefore indirectly affect programs: their correctness, how fast the code can be written, and therefore how soon a new program can be released. Although comments are often treated as not being part of the code, they are an important language feature. So why not just use comments for assertions about program behavior? The following example should illustrate the answer.

int i = 0;
int j = 9;

for( ; (i < 10) ; (++i, --j) ) {
    array1[i] = array2[j];
}

// at this point x should be 0

Did you ever encounter code like that? What is this x supposed to be? Most probably the code used to look different; whoever changed it did not update the comment along with it, either because he did not consider the comment to be code or because he simply forgot. Or the code may be the result of copy-and-paste from another place, where the comment made sense. The comment was put there on purpose. Was the information important? Does it still apply?

This is where assertions come in handy. The compiler parses their contents like any other piece of code: it checks names, types, and the validity of the expression. If you forget about an assertion while refactoring, the compiler will let you know.

Improving syntax checks

Checking expressions in assertions for syntactic correctness does not always work, at least not for C-style asserts. The macro NDEBUG, which controls assertion behavior, does not in fact indicate a release build; it simply indicates that we want to disable assertions. You can disable assertions in debugging mode and enable them in release mode. And disabling here also means disabling syntax checks. The following will compile if NDEBUG is defined.

assert( ## !! aaa ~~ );

Also, there are assertions that cannot reasonably be checked even in debugging mode. For example, binary search requires that the range be sorted. Verifying that the range is sorted has linear complexity, whereas the search itself is logarithmic. If our function is called often and the condition is checked time and again, even for medium-sized data, the assertion may be overkill. For another example, we may assert that our range has 10 elements. There is an easy way to check that: increment the first iterator 10 times and check whether we reach the second (past-the-end) one. But if our iterators are single-pass ones, verifying the condition spoils the range.
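
The single-pass case can be reproduced with a stream. The following is only a sketch under my own assumptions: the names CheckResult and count_then_read are made up for illustration, and std::istream_iterator stands in for whatever single-pass iterator the real code uses.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper type: what the "check" saw versus what is left afterwards.
struct CheckResult {
    std::ptrdiff_t counted;    // number of elements the check walked over
    std::ptrdiff_t left_over;  // number of elements still readable after the check
};

CheckResult count_then_read(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<int> end;

    // The "assertion": counting the elements of a single-pass range consumes them.
    std::ptrdiff_t counted = std::distance(std::istream_iterator<int>(in), end);

    // The real work starts only now, and finds the stream already exhausted.
    std::ptrdiff_t left_over = std::distance(std::istream_iterator<int>(in), end);

    return { counted, left_over };
}
```

Calling count_then_read("1 2 3 4 5 6 7 8 9 10") confirms that the range had 10 elements, but nothing is left for the code that wanted to process them.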

Not executing the predicates does not require compromising on syntax checks. An assertion for such cases could be defined as follows.

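#define ASSERT(COND) \
  static_cast<void>( sizeof((COND) ? true : false) )

In real life the definition would be longer to avoid compiler warnings, but you get the idea. sizeof has exactly the properties we need: its operand is parsed and type-checked, but never evaluated. The conditional operator is added to verify that COND is contextually convertible to bool. (The parameter is named COND rather than _COND because names starting with an underscore followed by an uppercase letter are reserved for the implementation.)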
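
To see that such a macro really checks the expression without running it, here is a small sketch. The macro is the same shape as above under a different, made-up name (SYNTAX_ONLY_ASSERT), and predicate_calls, costly_predicate, contains_sorted and calls_after_assert are hypothetical helpers introduced only for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Same shape as the macro sketched above, under a hypothetical name.
#define SYNTAX_ONLY_ASSERT(COND) \
  static_cast<void>( sizeof((COND) ? true : false) )

// Counts how many times the predicate below is really executed.
int predicate_calls = 0;

bool costly_predicate() {
    ++predicate_calls;
    return false;   // would fail if it were ever evaluated
}

bool contains_sorted(const std::vector<int>& sorted, int value) {
    // Parsed and type-checked, but std::is_sorted is never called,
    // so the logarithmic search is not slowed down by a linear check.
    SYNTAX_ONLY_ASSERT( std::is_sorted(sorted.begin(), sorted.end()) );
    return std::binary_search(sorted.begin(), sorted.end(), value);
}

int calls_after_assert() {
    SYNTAX_ONLY_ASSERT( costly_predicate() );   // operand of sizeof: unevaluated
    return predicate_calls;
}
```

Misspelling a name inside SYNTAX_ONLY_ASSERT is still a compile-time error, which is the whole point; the predicate itself is never executed, so calls_after_assert() leaves the counter untouched.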

What not to assert

We are not going to discuss where assertions should be preferred to run-time error reporting tools and vice versa; this has already been covered by many authors ([1]). This section is about differentiating between assertions and the concepts from Design by Contract: preconditions, postconditions and invariants ([2], [3]). The following interpretation of assertions may be controversial, but I think it is a consistent and useful approach. An assertion can be interpreted as a guarantee that the function author gives to himself (or to other programmers who happen to maintain the same function in the future). Users of the class have nothing to do with it, because what is asserted are implementation details that are hidden from class users. In this sense a function precondition cannot be asserted with an assert, because the function author cannot make any guarantees about the values that function users will provide.

The situation is different for class invariants. In this case, ensuring that the class invariant holds is exclusively in the control of the class author. However, there is no good place where the assertion could be planted. A class invariant is a ‘property’ of the class, and should be placed somewhere in the class definition, visible to every class user. An assertion, in contrast, is an expression that needs to be put somewhere in a (member) function’s definition. An invariant can be implemented as a function where all the checks are stated. Such an invariant function cannot use private class members, because an invariant is by definition a contract between the class author and the class user, and the class user does not have access to private members, which are only implementation details. Here is an example:

class vector
{
    // ... 
    bool invariant() const {
        return distance( begin(), end() ) == size()
            && empty() == (size() == 0);
    }
};

But it is not clear who would call such an invariant function. The invariant is supposed to hold after the constructor succeeds, before the destructor is called, before any public member function is called, and after each public member function finishes (this does not need to apply to protected member functions, though). Shall we plant the following code everywhere?

ASSERT( invariant() );

This is tedious, error-prone, and would only work for the “before function starts” checks. For the “after function finishes” part there is no good place to plant the assertion. The end of the function’s scope is not a candidate, because destructors of its local objects will be executed after the assertion, and it might be only those destructors that (re)establish the invariant. A similar argument applies to postconditions. Also, postconditions are part of the function’s interface, its specification, whereas typical assertions check tiny implementation details, likely in the middle of your function.
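
The problem with checking at the end of the scope can be shown in a few lines. This is a sketch only: Counter, UpdateGuard, the events log and run_demo are all hypothetical names, and the “invariant” is deliberately trivial.

```cpp
#include <string>
#include <vector>

// Hypothetical log that records what happened and in which order.
std::vector<std::string> events;

class Counter {
    bool updating_ = false;

    // Local guard: its destructor is what re-establishes the invariant.
    struct UpdateGuard {
        Counter& self;
        explicit UpdateGuard(Counter& c) : self(c) { self.updating_ = true; }
        ~UpdateGuard() {
            self.updating_ = false;
            events.push_back("guard destructor restores invariant");
        }
    };

public:
    bool is_updating() const { return updating_; }

    // Stated only in terms of the public interface.
    bool invariant() const { return !is_updating(); }

    void increment() {
        UpdateGuard guard(*this);
        // ... the real work would go here ...

        // Last statement in the scope: this is as late as a check can be planted.
        events.push_back(invariant() ? "check at end of scope: holds"
                                     : "check at end of scope: fails");
    }   // guard is destroyed here, after the check above
};

bool run_demo() {
    events.clear();
    Counter c;
    c.increment();
    return c.invariant();   // evaluated after increment() has fully finished
}
```

The check planted as the last statement reports a broken invariant, even though the invariant does hold once increment() has returned; the destructor that restores it simply runs later.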

What to assert

What are the typical places where assertions would be encountered? Wherever we say something about the consequences of the code that we wrote, something that, we guarantee, will always be true. So does this assertion make sense?

if( vec.empty() ) {
    counter_ = 0;
}
else {
    ASSERT( !vec.empty() );
    ++counter_;
}

No, this is too obvious. Assert something that is obvious to you, but you expect it might not be obvious to someone else; or to yourself in the future. The following may be a good example.

int i = 0;
int j = 9;

for( ; (i < 10) ; (++i, --j) ) {
    array1[i] = array2[j];
}

ASSERT( j == -1 );

Here the programmer did some non-trivial thinking to copy the first 10 elements of array2 into array1 in reverse order. His intention was that both indexes go out of range at the same time. He took some time to think about the algorithm and concluded that after the above loop the value of j will always be -1. To save you the effort of going through the same thinking process again (and again), he wrote it down in the form of an assertion. Another typical use of assertions is in very long functions, where you have to scroll up and down the screen and have forgotten the beginning before you reach the middle. An assertion makes a good checkpoint in such situations.

For one more example, consider the following code.

int i = getNumberOfPassengers(); 

if( i % 2 == 1 ) {
    handleOdd( i );
}
else {
    ASSERT( i % 2 == 0 );
    handleEven( i );
}

Is this not the same too-obvious case that we saw a bit earlier? No: in the former example we were asserting that if a boolean result is not true, it must be false. Here, the result of the remainder operation is an integer, and an integer can have many values. We are asserting that, since we divide by two, the remainder will only ever take one of two values. This example also illustrates how assertions help detect errors. The above assertion really signalled an error in a running application. Do you know why? It took me a while to figure out. i happened to be -1. As a result, i % 2 also evaluated to -1.
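
The surprise is easy to reproduce. Since C++11, integer division truncates toward zero, so a non-zero remainder takes the sign of the dividend. The helper names below (remainder_by_two, is_odd_naive, is_odd) are made up for this sketch.

```cpp
// Since C++11 integer division truncates toward zero, so a non-zero
// result of % has the same sign as the left-hand operand.
int remainder_by_two(int n) { return n % 2; }

// The test used in the example above: misses negative odd numbers.
bool is_odd_naive(int n) { return n % 2 == 1; }

// Comparing against zero works for negative values as well.
bool is_odd(int n) { return n % 2 != 0; }
```

With these definitions remainder_by_two(-1) is -1, so is_odd_naive(-1) is false and control would reach the else branch, where the assertion fires.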

Tools for making use of assertions

This section is mostly fiction. I am not aware of such tools, although in theory they appear feasible. First, one could imagine a tool that performs static analysis of your functions and says for which function or program inputs an assertion would fire, thereby detecting a design error in the code. Second, a static code analyser could generate unit tests based on the assertions you put in. Third, and this is the most interesting one, a compiler could use assertions to apply optimizations to the code. Let’s see how this would work.

The compiler trusts you: if you assert that a predicate will always evaluate to true, it will; there is no point in even considering what happens when it doesn’t. Does it sound irresponsible? Well, you could use one of the other techniques described above to first test your assertions in debugging mode, and enable this optimization in release mode. After all, in many places C++ follows the rule “trust the programmer.” Let’s go on. We need an undefined behavior that the compiler would understand. We could define it as follows.

#define UNDEFINED_BEHAVIOR() \
  ( *(static_cast<int*>(NULL)) = 0 ) 

Whenever this is executed, the program, according to the definition of undefined behavior, is allowed to do anything. Our imaginary compiler would have to be aware of the special macro UNDEFINED_BEHAVIOR. Next, we define our assertion as follows.

#define ASSERT(PRED) \
  static_cast<void>( (PRED) || UNDEFINED_BEHAVIOR() ) 

This now says that if the predicate evaluates to true we have a no-op; if the predicate evaluates to false the compiler is allowed to put in whatever it wants, since this situation will never happen. Now, how would such an assertion-based optimization work? We start from an example similar to one of the above.

int passengers = getPassengers();

switch( passengers % 2 ) {
    case 1:
        return handleOdd( passengers );
    case 0:
        return handleEven( passengers );
    default:
        ASSERT( false );
}

Our UB-aware compiler is allowed to insert any code in the default case, because it is certain that reaching it would trigger undefined behavior. It chooses the following:

switch( passengers % 2 ) {
    case 1:
        return handleOdd( passengers );
    case 0:
        return handleEven( passengers );
    default:
        return handleEven( passengers );
}

And now, since the two cases are identical, the compiler can further change the code to:

switch( passengers % 2 ) {
    case 1:
        return handleOdd( passengers );
    default:
        return handleEven( passengers );
}

This technique has been proposed by Niklas Matthies in comp.std.c++.
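
Some compilers do expose a hook of this kind: GCC and Clang provide __builtin_unreachable(), MSVC provides __assume(0), and C++23 adds std::unreachable() and the [[assume(expr)]] attribute. Below is a sketch of how the assertion could be built on top of them. The macro names ASSUME_UNREACHABLE and ASSUMING_ASSERT, the function dispatch, and the bodies of handleOdd and handleEven are my own placeholders, and whether a given compiler actually folds the default case away is not guaranteed.

```cpp
#include <cstdlib>

// Hypothetical macro names; only the compiler intrinsics themselves are real.
#if defined(__GNUC__) || defined(__clang__)
#  define ASSUME_UNREACHABLE() __builtin_unreachable()
#elif defined(_MSC_VER)
#  define ASSUME_UNREACHABLE() __assume(0)
#else
#  define ASSUME_UNREACHABLE() std::abort()   // portable fallback: no optimization hint
#endif

// If PRED is false the behavior is undefined, so the optimizer may assume PRED.
#define ASSUMING_ASSERT(PRED) \
  do { if (!(PRED)) { ASSUME_UNREACHABLE(); } } while (false)

// Placeholder stand-ins for handleOdd and handleEven from the text.
int handleOdd(int)  { return 1; }
int handleEven(int) { return 2; }

int dispatch(int passengers) {
    ASSUMING_ASSERT( passengers >= 0 );   // rules out a remainder of -1
    switch (passengers % 2) {
        case 1:
            return handleOdd(passengers);
        case 0:
            return handleEven(passengers);
        default:
            ASSUME_UNREACHABLE();         // the compiler may merge this with another case
    }
}
```

Such a macro is only safe when the predicate really cannot be false, which is why it makes sense to first run the same assertions in a checking mode, as suggested earlier in this section.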

References

  1. Herb Sutter, Andrei Alexandrescu, “C++ Coding Standards: 101 Rules, Guidelines, and Best Practices”.
  2. Matthew Wilson, “Contract Programming 101”.
  3. Bertrand Meyer, “Object-Oriented Software Construction”.
  4. Andrei Alexandrescu, “Assertions”.
  5. John Maddock, Steve Cleary, “Boost.StaticAssert”.
  6. Peter Dimov, “assert.hpp”.