Contracts, Preconditions & Invariants

In this post we will see what a contract is, how preconditions and invariants can be derived from the contract, and how this process can help detect bugs. Two points that I will be stressing in this post are: (1) preconditions and invariants are not “contracts” and (2) only a subset of contract-related bugs can be detected through preconditions and invariants.

Let’s start by comparing two function declarations that are equivalent form the type system perspective:

bool f(int a, int b, int c);

bool is_in_range(int val, int lo, int hi)
  // this checks if val is in closed range [lo, hi] 
;

There is a lot more information associated with the second declaration. The function name as well as function parameter names give us a hint as to what behavior we can expect of the function. In case this wasn’t enough, we have a comment explaining this in detail. For instance, we know that although lo and hi are just two integers, they will be interpreted as a closed range. We have sufficient information to use function is_in_range() correctly. The same cannot be said about function f() though. We do not know what it does. We do not know when to call it, and if it would be a good idea. We do not know what values to pass to it, and how they would be interpreted.

The difference between the two functions is in the contract. In this context, a contract is all the information that the user (function user, here) has to know in order to use the component (function) correctly. This information can be communicated by any means: good choice of names, comments, HTML documentation or a paper one, implied, communicated over the phone or face to face, also in the type system. Even for function f() above part of the contract has been communicated: that it will take three arguments of type int. That is something, although not enough: its contract is not complete.

With the contract in place, the user has enough information to use the component correctly, but it is still possible that she will use it incorrectly. For instance, the previous library used for dealing with ranges of integers used to have a similar interface, but took the arguments in different order:

bool OldLib::isInRange(int lo, int hi, int value)
  // returns true if value is in closed range [lo, hi] 
  // otherwise returns false
;

The user may have the old interface imprinted in her mind so deeply that even after learning the contract of the new library, she may unconsciously pass the arguments to the new function in the wrong order:

bool validate(int val)
{
  int lower = config.get("LOWER_BOUND");
  int upper = config.get("UPPER_BOUND");
  return is_in_range(lower, upper, val); // Bad order of args!
} 

This type of bugs is generally unavoidable. One way to reduce it is to do thorough code reviews. But humans are faulty, and in the same way as they make mistakes writing the code, they make mistakes reviewing the code. The best code review would be performed by a machine, but machines do not understand function contracts. The idea behind preconditions is to make as much information from the function contract as possible available to the machines. We cannot teach the machine the entire contract (at least not as of today), but we can insert “bug criteria” inferred from the contract. In our case, because lo and hi represent a closed range [lo, hi] it is never possible that a user willingly in a correct program passes value lo which is greater than hi. We could write it more explicitly:

bool is_in_range(int val, int lo, int hi)
  // this checks if val is in closed range [lo, hi] 
  [[contract::precondition(lo <= hi)]]
;

Interestingly, it is a valid C++17 program. But it may not be the best idea to write it this way: (1) compilers give a warning on this attribute declaration, (2) and they have reason: this declaration my give a false impression as to what it does: it does nothing: it is just a comment but with square brackets rather than slashes. However, if we were able to teach a tool to recognize these special comments, it could use them to rewrite our programs. For instance, an intelligent tool could rewrite the above function validate() to:

bool validate(int val)
{
  int lower = config.get("LOWER_BOUND");
  int upper = config.get("UPPER_BOUND");

  assert(upper <= val); // <- Generated from the precondition 
                        //    of function is_in_range()
  return is_in_range(lower, upper, val); // Note: bad order
} 

So, by declaring a precondition in our function, we allow the tool to inject the corresponding code anywhere our function is invoked. In case the injected code is an assert(), it can perform a check at runtime and abort the program should the check fail. Note that expression upper <= val is not my mistake. The automated tool that generated it simply did not give in to the confusion caused by the arguments being passed in the wrong order to is_in_range(). Value lower may be (correctly) smaller than value upper but if it so happens that val is less than lower the bug (arguments passed in the wrong order) will be detected.

But what if val is greater than upper? We still have a bug, and the assert() generated from the precondition will not detect it. Yes; and this illustrates the important characteristics of preconditions (and postconditions and invariants and similar): they are not for detecting every bug in a program that uses annotated functions. But it makes it easy to automatically detect some bugs.

By now, you must have wondered, would it not be better to define a type Range rather than using preconditions? The answer is: yes. It is not generally possible in all cases to turn a precondition into a new type, but in our case it is a perfect solution: it is very likely that we will reuse the new type Range in many places. For instance, not only function is_in_range() can use it, but also the type representing config:

struct Range
{
  int hi;
  int lo;
};

bool is_in_range(int val, Range r)
  // this checks if val is in closed range r
;

bool validate(int val)
{
  Range rng = config.get_range("LOWER_BOUND", "UPPER_BOUND");
  return is_in_range(rng, val); // ERROR: type mismatch
} 

Now, our bug (bad order of arguments) is detected by the type system! So it is clearly an improvement: bugs are detected earlier. But, we didn’t take care of all possible things that we could easily detect. For instance, the implementation of is_in_range is likely:

bool is_in_range(int val, Range r)
  // this checks if val is in closed range r
{
  return r.lo <= val && val <= r.hi;
}

But in order for this implementation to make sense, we have to trust that [r.lo, r.hi] is a closed range. So we still have a precondition to declare: it is just spelled differently:

bool is_in_range(int val, Range r)
  // this checks if val is in closed range r
  [[contract::precondition(r.lo <= r.hi)]]
;

Does this sound wrong? It does. Making sure that r.lo <= r.hi should not be our business: it should be the business of class Range. In fact it should have a member function that checks if a given value is in range. So, let’s try to define our type Range properly. First attempt:

struct Range
{
  int lo;
  int hi;

  bool is_inside(int val) const
  {
    return lo <= val && val <= hi;
  }

  int size() const
  {
    return hi - lo;
  }
};

This looks better, but still has the same problem: in order for is_inside to make sense, it is required that lo < hi. But how can we be sure of that? The same requirement applies to function size(), and likely to any other function in the interface of Range. They have the same precondition: lo <= hi. How can we guarantee that it is satisfied?

First, we cannot afford to have lo and hi public: if anyone can change them arbitrarily at any time, there is no way that we can guarantee anything. So, the second attempt:

class Range
{
  int lo;
  int hi;

public:
  explicit Range(int l, int h) 
    [[contract::precondition(l <= h)]]
  : lo(l), hi(h) {}

  bool is_inside(int val) const
    [[contract::precondition(lo <= hi)]]
  {
    return lo <= val && val <= hi;
  }

  int size() const
    [[contract::precondition(lo <= hi)]]
  {
    return hi - lo;
  }
};

Now, we can see that except for the constructor we have the same precondition for every public function: lo <= hi. This is hardly surprising. In fact, the whole point of having this class is to signal or guarantee that this condition holds. Otherwise we would have just used std::pair<int, int>. This condition that motivates the creation of a new class is called the class invariant. We can declare it once rather than repeat it upon every member function:

class Range
{
  int lo;
  int hi;
  [[contract::invariant(lo <= hi)]];
  
public:
  explicit Range(int l, int h) 
    [[contract::precondition(l <= h)]]
  : lo(l), hi(h) {}

  bool is_inside(int val) const
  {
    return lo <= val && val <= hi;
  }

  int size() const
  {
    return hi - lo;
  }
};

A class invariant is so tightly associated with the class that you can use it as a criterion for determining whether you need to define a new class or not. If you cannot think of a class invariant, then you probably do not need a new class either: a simple aggregate will do. (Although sometimes it is not possible or easy to express the invariant as a C++ expression: for instance an invariant can be that some resource is within our ownership throughout the lifetime of the object.)

You can notice one important thing, though: we didn’t get rid of the precondition in the constructor. We cannot fully guarantee that lo is never greater than hi. We depend on the callers of the constructor: that they pass an l that is no grater than h. A code that is using our Range can still have a bug:

bool validate(int val)
{
  int lower = config.get("LOWER_BOUND");
  int upper = config.get("UPPER_BOUND");

  Range r {upper, lower}; // Note: bad order
  return r.is_inside(val); 
} 

Thus, class invariant (much like a postcondition) is something conditional: the class can guarantee that it holds, only if its preconditions have been met.

Alternatively, one could think of defining the constructor of Range differently:

Range::Range(int l, int h) 
  // no precondition 
  : lo(l), hi(h)
{
  if (l > h)
    throw RangeError{};
}

This does guarantee that we literally never have Ranges with the violated invariant. But other than this, this solution has a number of downsides. First, exceptions are just a control flow: an invisible branch. They do not help detect bugs. As long as we have a precondition in place, a clever tool can make use of it. One example we have seen earlier is to inject checking code in the callers:

bool validate(int val)
{
  int lower = config.get("LOWER_BOUND");
  int upper = config.get("UPPER_BOUND");

  assert(upper <= lower); // <- generated from the precondition
                          //    of Range's constructor 
  Range r {upper, lower}; // Note: bad order
  return r.is_inside(val); 
} 

Macro assert() is a better tool for detecting bugs. But it is just one example of making use of preconditions. Other tools can use them for performing more informed static analysis. This opportunity goes away if we abandon preconditions and start throwing exceptions.

Second, there are more mundane issues with throwing exceptions. Not everybody wants or can afford to use them. You also cannot signal errors via exceptions from noexcept functions.

This entry was posted in programming and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.