Preconditions — Part III

In this post, I examine a few cases and try to answer the question of when and how to specify preconditions, and when it is better not to. I believe this gives deeper insight into the nature of preconditions.

Update. The discussion on specifying preconditions for protected member functions has been changed based on feedback from Elron.

Stating preconditions

First, it is worth observing that not every function has a precondition. Consider these:

bool invert(bool b);
unsigned XOR(unsigned a, unsigned b);
void output(std::string s);

They are well-defined for every possible value of the argument type. How about this one?

BigInt& BigInt::operator+= (BigInt const& b);

The addition of two integers is well defined for every pair of integers. And we know how to write the algorithm for it. But what if the addition requires allocating more memory than is available on the platform? Note that class BigInt might provide auxiliary functions for checking how much memory the object occupies, and you should be able to check the amount of available memory in the system.

This is how I see it. The purpose of type BigInt is to represent the mathematical concept of an integral number. Even if there are some auxiliary functions, it is still primarily about integral numbers. While it is obvious that the implementation of BigInt will require dynamic memory allocation and growth, the contract between library author and library user is that the implementation details will be hidden behind the interface, and the user is allowed to think of it as an infinite-precision integral number. But even an oblivious library user knows that computer memory is far from infinite, and that the guarantee of representing an infinite-precision integral number cannot be fulfilled if we try to use really big numbers, or if the system runs short of memory for other reasons.

This tension between providing meaningful abstractions on the one hand and computer system limitations on the other is elegantly resolved by the exception handling mechanism. The contract for operator+= should probably be: it either renders a value as per the mathematical concept of an integral number, or throws an exception if it has difficulty fulfilling this guarantee. This is a prominent example of what exceptions are for: the client satisfied the preconditions, but we still cannot guarantee the postconditions. This is one of Herb Sutter’s pieces of advice. You can find more in this article: When and How to Use Exceptions.
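To make this contract concrete, here is a minimal sketch. The class below is a hypothetical toy written only for illustration (unsigned values, decimal digits kept in a std::vector), not a real library. The point is that operator+= has no precondition, and that a failure to obtain memory surfaces as the std::bad_alloc thrown by the vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy unsigned big integer (hypothetical): little-endian base-10 digits.
class BigInt {
public:
    // precondition: decimal is a non-empty string of characters '0'..'9'
    explicit BigInt(const std::string& decimal) {
        assert(!decimal.empty());
        for (auto it = decimal.rbegin(); it != decimal.rend(); ++it)
            digits_.push_back(static_cast<std::uint8_t>(*it - '0'));
    }

    // No precondition: every pair of values is valid input.
    // Postcondition: *this holds the mathematical sum.
    // Throws: std::bad_alloc if storage for the result cannot be obtained;
    //         in that case *this is left unchanged.
    BigInt& operator+=(const BigInt& b) {
        std::vector<std::uint8_t> sum;
        sum.reserve(std::max(digits_.size(), b.digits_.size()) + 1);  // may throw
        unsigned carry = 0;
        for (std::size_t i = 0;
             i < digits_.size() || i < b.digits_.size() || carry != 0; ++i) {
            unsigned d = carry;
            if (i < digits_.size()) d += digits_[i];
            if (i < b.digits_.size()) d += b.digits_[i];
            sum.push_back(static_cast<std::uint8_t>(d % 10));
            carry = d / 10;
        }
        digits_.swap(sum);  // commit only after everything succeeded
        return *this;
    }

    std::string str() const {
        std::string s;
        for (auto it = digits_.rbegin(); it != digits_.rend(); ++it)
            s.push_back(static_cast<char>('0' + *it));
        return s;
    }

private:
    std::vector<std::uint8_t> digits_;
};
```

A caller who cares about memory exhaustion wraps the addition in a try block and catches std::bad_alloc; a caller who does not care writes nothing extra.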

How about this one:

BigInt BigInt::operator/ (BigInt const& b);

My personal preference: a non-zero divisor should be a precondition. The function should be allowed to assume that it will never get a zero divisor. How is the situation different from the previous one?

  1. The condition is related to integral numbers’ domain. In fact, the precondition enforces the domain in the mathematical sense.
  2. The user can easily verify that the precondition holds. In the worst case, she can simply check the value in an if-statement.

Am I not allowed to throw in such a case? Given the definition of the precondition from Part II, violating the precondition is UB: you can do anything; you can throw. But the user is not allowed to assume that you will throw. To her, the result will be unpredictable.

This is one option. There is another. You can specify in the contract that any value of the divisor is allowed, including 0. In such a model, the behaviour of the function is: if b != BigInt{0}, the result is that expected of integral numbers in mathematics; otherwise the function throws an exception.

Some libraries and languages take this latter approach, but it feels wrong. The function now mixes two things: the mathematical abstraction and software-specific considerations. Division in the mathematical sense is not defined for a zero divisor. Our operator/, on the other hand, would now be well defined. The only reason that motivates people to adopt this throwing solution nonetheless is that sometimes we do not want the program to crash or enter UB only because some third-party module (which is not necessarily critical to our program) has an internal logic error. For instance, a server may be processing different client requests, the requests being pieces of code, and if there is a bug in one request, we do not necessarily want the server to crash.

I personally do not promote this option. I understand that it might be well justified in some situations, but it looks to me more like a workaround for the lack of a proper mechanism for handling preconditions in the language.

While either of these options is possible, it does not make sense to both express a precondition that you should never give a zero divisor, and at the same time to guarantee that you throw on a zero divisor. You either specify what happens on zero divisor or you don’t. Stating a precondition is just saying explicitly that you choose not to specify what happens in such case. Note that this opinion differs from the approach taken by the C++ Standard. In § 17.6.4.11 it says “Violation of the preconditions specified in a function’s Requires: paragraph results in undefined behavior unless the function’s Throws: paragraph specifies throwing an exception when the precondition is violated.” I happen to disagree with the Standard here.
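The two options can be put side by side in a small sketch. I use unsigned instead of BigInt to keep it short, and the names divideNarrow and divideWide are made up for this illustration. The assert in the first function is only a debugging aid; it is not a promise to the caller.

```cpp
#include <cassert>
#include <stdexcept>

// Option 1: a precondition.
// precondition: divisor != 0
// What happens for a zero divisor is deliberately left unspecified.
unsigned divideNarrow(unsigned dividend, unsigned divisor) {
    assert(divisor != 0);  // may or may not be evaluated; callers cannot rely on it
    return dividend / divisor;
}

// Option 2: no precondition; zero is a valid input with its own semantics.
// Throws: std::domain_error if divisor == 0.
unsigned divideWide(unsigned dividend, unsigned divisor) {
    if (divisor == 0)
        throw std::domain_error("division by zero");
    return dividend / divisor;
}
```

With divideWide, a caller is entitled to write a handler for std::domain_error and rely on it being reached. With divideNarrow, passing zero is simply a bug in the caller.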

The next thing to bear in mind is that expressions in preconditions should be referentially transparent, or pure. For instance, the following does not even make sense:

void createFile(string path);
// precondition: removeFile(path)

Remember that preconditions are typically not evaluated at run-time. The above is an invented example, but sometimes you might really get hit by it. I made this mistake in Part I:

template <typename IIT> // requires: InputIterator<IIT>
void displayFirstSecondNext(IIT beg, IIT end);
// precondition: std::distance(beg, end) >= 2

Is function template std::distance referentially transparent? The answer is: it is not templates that can or cannot be transparent, but functions. Some functions instantiated from this template will be pure, others will not; it depends on what iterator type the template is instantiated with. Consider InputIterator. In the worst case (that of std::istream_iterator), incrementing the iterator invalidates other iterators referring to the same stream. This is tricky behaviour: by changing our internal copies of objects (iterators), we alter (invalidate) other, external objects. Function std::distance does increment iterators. If our precondition were to be evaluated, this might cause UB in the program. Evaluating the precondition at run-time is one possible (although not the best) way of validating program correctness. Here we have touched on what Matthew Wilson calls the principle of removability: a precondition may or may not be evaluated at run-time. Evaluating it should not affect the program behaviour (relative to not evaluating it), provided that the program is correct.
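The effect is easy to observe with a string stream. In the sketch below (the function name checkThenRead is made up), the "precondition check" is just a call to std::distance on a pair of std::istream_iterator objects. After the check, nothing is left in the stream for the function body to read.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>

// Evaluates std::distance on stream iterators, then tries to read the
// remaining ints. Returns {number counted by the check, number still readable}.
std::pair<long, int> checkThenRead(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<int> beg(in), end;

    // The "precondition check": increments a copy of beg until it equals end,
    // and every increment extracts a value from the shared stream.
    const long counted = static_cast<long>(std::distance(beg, end));

    // What is left for the actual function body to read?
    int leftover = 0;
    for (int value; in >> value; )
        ++leftover;

    return std::make_pair(counted, leftover);
}
```

For the input "10 20 30" the check counts three elements and leaves none, so evaluating the precondition changes what the program does afterwards. This is exactly what the principle of removability forbids.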

If we had a tool that recognizes and evaluates preconditions at run-time, there would have to be a way to say that it can be evaluated only for certain types. Using some invented syntax:

template <InputIterator IIT> // <- concept syntax
void displayFirstSecondNext(IIT beg, IIT end)
[[ enable_if(Regular<IIT>) ]] precondition { 
  std::distance(beg, end) >= 2
};

The [[]] notation is the new attribute syntax in C++11. Concept Regular is one of the most popular ones in STL (even though STL has no concepts). It has been described in detail here. In short, one could say that it is a ‘normal’ type with no ‘surprises’ (unlike std::istream_iterator, where the ‘remote invalidation’ of different objects is ‘surprising’). The attribute preceding the precondition says, “in case you need to evaluate the precondition at run time, do it only for regular types.”
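Without such an attribute, a rough approximation is possible in today’s C++ with tag dispatch on the iterator category. This is only a sketch: the helper names are invented, and the forward iterator category stands in for the Regular requirement, because forward iterators are multi-pass and traversing a copy does not disturb the caller’s range.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <sstream>

// Multi-pass iterators: safe to traverse a copy, so really evaluate the check.
template <typename It>
bool atLeastTwo(It beg, It end, std::forward_iterator_tag) {
    return std::distance(beg, end) >= 2;
}

// Single-pass iterators: counting would consume the input, so the check is
// skipped and optimistically reported as satisfied.
template <typename It>
bool atLeastTwo(It, It, std::input_iterator_tag) {
    return true;
}

// Picks one of the overloads above based on the iterator's category.
template <typename It>
bool checkAtLeastTwo(It beg, It end) {
    typedef typename std::iterator_traits<It>::iterator_category Category;
    return atLeastTwo(beg, end, Category());
}
```

Calling checkAtLeastTwo with std::list iterators performs the real comparison, while calling it with std::istream_iterator arguments returns true without touching the stream.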

Some precondition checking frameworks (like the one proposed in N1962, or the one implemented in Boost.Contract) require, and verify at compile-time, that only functions and operators that take their parameters by value or by const reference are invoked in preconditions. This makes sense, because const is used to indicate that you do not intend to modify the argument. However, it should be noted that it is still possible to modify an argument taken by const reference (e.g., by using const_cast), and that there exist pure functions that take arguments by mutable reference. There is also the question of functions taking arguments by pointer. In such cases, const-correctness in preconditions will not help avoid all potential mutations of the checked values, so in the end programmers still have to rely on discipline.

Let’s consider another example:

File openFile(string fname);
// precondition: fileExists(fname);

Is this ok or not? Compared to mathematical abstractions, file system operations are very fragile and very impure. The caller cannot guarantee that function fileExists will find the file. Even if she checked a second before calling the function, the file might have disappeared since then. Some other thread could have erased it, or some other program on the same computer, or the file might have been on a shared drive which someone has just disconnected. It is better to let function openFile throw if the file does not exist.
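A throwing version could look roughly like this. It is a sketch only: I return std::ifstream in place of the File type from the example above, and the choice of std::runtime_error is my assumption, not a recommendation of any particular exception type.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>

// No precondition about the file's existence.
// Throws: std::runtime_error if the file cannot be opened for reading.
std::ifstream openFile(const std::string& fname) {
    std::ifstream file(fname);
    if (!file)
        throw std::runtime_error("cannot open file: " + fname);
    return file;  // stream objects are movable since C++11
}
```

The check and the use are now a single operation from the caller’s point of view, so there is no window in which the file can disappear between the two.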

Sometimes, verifying if the precondition holds is too difficult, or requires calling the function whose precondition we are checking:

int add(int a, int b)
// precondition: "a + b does not overflow"
{
  return a + b;
}

How would you express the precondition? One could consider writing a ‘dummy’ function that does not really evaluate the real predicate, but can be used to express the precondition:

bool additionDoesNotOverflow(int a, int b)
{
  return true;
}

int add(int a, int b)
// precondition: additionDoesNotOverflow(a, b)
{
  return a + b;
}

This approach has been adopted in N3351, for function eq().
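For the particular case of adding two ints, a real predicate can in fact be written without performing the addition, by comparing against the limits of the type. The sketch below is one way of doing it; for many other functions no such cheap check exists, and the dummy predicate remains the only option.

```cpp
#include <cassert>
#include <limits>

// True if a + b is representable in int. The subtractions below never
// overflow: for b > 0, max - b stays in range; for b <= 0, min - b does.
bool additionDoesNotOverflow(int a, int b) {
    if (b > 0)
        return a <= std::numeric_limits<int>::max() - b;
    return a >= std::numeric_limits<int>::min() - b;
}

int add(int a, int b)
// precondition: additionDoesNotOverflow(a, b)
{
    return a + b;
}
```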

Another interesting case is function sqrt again:

double sqrt(double x);
// precondition: x >= 0

Should we also specify in the precondition that x must not be NaN? In this particular case, we have already covered it, due to the special rules for comparisons with NaN: ordered comparisons such as >= always return false when an operand is NaN. But what if we were writing function sin? Should we write:

double sin(double x);
// precondition: isfinite(x)

This looks convincing. But truth be told, I have never cared about NaN in my code. When dealing with weights, I just assumed I would get a normal, finite number and just forwarded my numbers to built-in functions.
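The two predicates can be written down as ordinary functions to see how they treat the special values. The function names are invented for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Predicate for sqrt: rejects negative numbers and, as a side effect of the
// comparison rules, NaN as well. Positive infinity passes.
bool sqrtPrecondition(double x) { return x >= 0; }

// Predicate for sin: rejects NaN and both infinities.
bool sinPrecondition(double x) { return std::isfinite(x); }
```

Note that x >= 0 accepts positive infinity, whereas isfinite(x) rejects it, so the two predicates differ in more than their treatment of negative numbers.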

Technical details

Consider the following function:

double f(double x, double y);

You require that both x and y are positive. These are two ‘independent’ assertions. We could combine them into one predicate:

double f(double x, double y);
// precondition: x > 0 && y > 0

The expression is obviously correct, but it has certain negative implications inside preconditions. Imagine that you have some tool that is able to read the assertions in preconditions and evaluate them at run-time. If a precondition check detects a violation, the report will say that either x or y was not positive, but we will not know which one. Some useful information will be lost. Therefore a framework for precondition support (like the one proposed in N1962, or the one implemented in Boost.Contract) will typically allow you to specify multiple preconditions for a function. A syntax for preconditions in C++ could look like this:

double f(double x, double y)
precondition{ x > 0; y > 0 };
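What a tool gains from separate assertions can be imitated by hand. The helper below is hypothetical and only mimics what a checking framework might report: each precondition of f is evaluated on its own, and the text of every violated one is collected.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Evaluates each precondition of f separately and returns the source text
// of the ones that do not hold (empty result: all preconditions satisfied).
std::vector<std::string> violatedPreconditionsOfF(double x, double y) {
    std::vector<std::string> violated;
    if (!(x > 0)) violated.push_back("x > 0");
    if (!(y > 0)) violated.push_back("y > 0");
    return violated;
}
```

With the combined predicate x > 0 && y > 0, the only thing such a report could contain is the whole expression, whichever operand was at fault.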

The next thing to remember, especially when using comments for expressing preconditions, is to use only the class’s public interface to express the predicates in the precondition. Preconditions are part of the function’s interface. If the caller is supposed to guarantee that a precondition holds, she should also have the means (a valid expression) for checking the precondition. Consider:

double Matrix::operator()(unsigned i, unsigned j) const;
// precondition: i < myRowCount_;
// precondition: j < myColCount_; 

Using names myRowCount_ and myColCount_ may be natural to the class author, because he uses the names all the time. But the user may not (and should not) be aware of their existence. Even if you did not intend to provide functions checking the number of columns and rows in the class, you can consider adding them only for the sake of being able to express the preconditions.
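A sketch of the corrected interface follows. The accessor names rowCount and colCount and the storage layout are my assumptions for the illustration; only the private member names come from the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(unsigned rows, unsigned cols)
        : myRowCount_(rows), myColCount_(cols),
          data_(static_cast<std::size_t>(rows) * cols, 0.0) {}

    // Public accessors, added so that callers can check the preconditions.
    unsigned rowCount() const { return myRowCount_; }
    unsigned colCount() const { return myColCount_; }

    // precondition: i < rowCount()
    // precondition: j < colCount()
    double operator()(unsigned i, unsigned j) const {
        assert(i < rowCount());
        assert(j < colCount());
        return data_[static_cast<std::size_t>(i) * myColCount_ + j];
    }

private:
    unsigned myRowCount_;
    unsigned myColCount_;
    std::vector<double> data_;  // row-major storage
};
```

Now a caller can write if (i < m.rowCount() && j < m.colCount()) before indexing, using only names that are part of the public interface.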

Should we be specifying preconditions for private functions? Who can call a class’s private member functions? Its other member functions, its nested classes and its friends. All of these can be considered something internal to the class, or authorized by the class to do tricky things with its guts. But internally, between themselves, they don’t have to guarantee anything to one another. A precondition is part of the contract between the class and external components. Only for them does it make sense to specify the contract. In the case of private members you would be specifying a contract between you and yourself. This might nonetheless prove useful (and in that case you could also use private functions for specifying preconditions). This is what assertions are for. But it can hardly still be seen as a contract.

Somewhat similarly, should we be specifying preconditions for protected member functions? On the one hand, protected members cannot be used directly by class users. Their purpose is to be used in subclasses to implement other public member functions; and therefore they are somewhat similar to private member functions. On the other hand, a set of protected member functions is given to other people. That is, you write the class with protected members and some other programmer will have to use them. He has to know what the contract is for them, in order to use them correctly. In this sense protected member functions constitute an interface, but to a limited audience.

And that’s it for today. If you disagree with some claims here, please point it out in the comments and suggest an alternative. I am still learning about preconditions, and it may be an opportunity for me to discover something new.

In the next post on preconditions I will describe how I imagine an ideal mechanism of handling preconditions in the language.


5 Responses to Preconditions — Part III

  1. Krzysiek says:

    Hi Andrzej,

    I did not yet have the time to read the whole article, but I agree that sometimes preconditions should not be specified.

    In the case of BigInt you should just try and add two values, and if it causes memory exhaustion, you should just handle the resulting error (exception). Checking available memory before calling the function would introduce a race condition anyway (what if between the check and actual call another thread/process allocates all the memory?).

    Sometimes “check and do” is useless, and you should “do and handle resulting errors if any”. How do you check that you have enough space on the stack to call a function? You don’t. You just limit your recursion depth to some level that seems right and assume that you can always call a function. The same should apply to the addition of two BigInts. You assume you can always do it, but if it proves otherwise, you just handle the std::bad_alloc exception.

    Krzysiek

  2. Elron says:

    “Protected members are not part of class’s public interface (which must define the contract with the users). They are implementation detail that can be used in subclasses to implement other public member functions; and therefore they are similar to private member functions.”

    They are not “public” in the meaning of the C++ keyword, but they are public in the sense that some third party may use them. The third party still needs to know the preconditions, if any.

  3. red1939 says:

    I don’t quite get your reasoning behind UB vs explicitly stated exception. I agree on the guideline that if the user is powerless to change auxiliary state (existence of file/database/whatever) we should throw, and that has to be documented. This way exceptions become our “interface”.

    On the other hand as denominator of 0 is not defined in our domain you propose to throw it into UB box. What is a UB? UB, for me, is like saying: “beware, I might `rm -rf /` your system!”. Under this assumption, every sensible developer will read the documentation and be super careful about passing arguments to your function.

    The big question is: how unrestrained you are while “handling” UB? Will you throw, log, rely on CPU interrupt, expect inner library exception? What I want to say is that as a client of some library I would like to know that it handles my invalid input in some “acceptable” way and won’t silently ignore my mistakes or mutate object state to a point that I won’t be able to tell when and why I made the mistake.

    • «I don’t quite get your reasoning behind UB vs explicitly stated exception. I agree on the guideline that if the user is powerless to change auxiliary state (existence of file/database/whatever) we should throw, and that has to be documented. This way exceptions become our “interface”.»

      — Thanks for bringing this up. This is a good question to start a discussion.

      «On the other hand as denominator of 0 is not defined in our domain you propose to throw it into UB box. What is a UB? UB, for me, is like saying: “beware, I might `rm -rf /` your system!”. Under this assumption, every sensible developer will read the documentation and be super careful about passing arguments to your function.»

      — Yes, this is also my view of UB: the function is not constrained by any contract anymore. It is allowed to do anything. It is allowed to do a different thing each time. Of course, no reasonable implementation will try to execute malicious things. Typical actions include signalling an error at the language level (e.g. throwing an exception); taking other security measures like dumping core, launching a debugger or terminating the program; and also just forwarding the call to lower-level APIs and breaking their preconditions (and triggering their UB). This is the case of std::vector::operator[]. It has an obvious precondition, and it does not try to check it (at least in NDEBUG builds), and if you give it too big an index, you will likely obtain access to some random chunk of memory. This is also a good example of why some functions choose not to check their preconditions. You often choose C++ as your language because you need efficiency. Evaluating an additional condition that must already be satisfied may slow the program down unacceptably. True, the users have to be super careful. But this is good advice for software development: be super careful when coding.

      «The big question is: how unrestrained you are while “handling” UB? Will you throw, log, rely on CPU interrupt, expect inner library exception? What I want to say is that as a client of some library I would like to know that it handles my invalid input in some “acceptable” way and won’t silently ignore my mistakes or mutate object state to a point that I won’t be able to tell when and why I made the mistake.»

      — Your expectation is valid. But what do you mean by ‘invalid’ input? What is an ‘invalid’ input, formally, to the program? What makes the input ‘invalid’? Let’s go back to our division-by-zero example. Is providing a zero denominator invalid? Why? How is valid input different from invalid input? As a library client, do you want the guarantee that if you give a zero denominator, the function will throw an exception? Then make zero a valid input to the division, as in this post. Say “any input is valid, but semantics are different for 0 than for any other number.” You do not need a precondition for that; especially a precondition as a language feature.

      The way I see preconditions (many people will differ in opinion, I guess), is that they help you state explicitly what values of input your function is not prepared for. Then the users know what not to feed it with, and automated tools (like static analysers) know how to help analyse the code to detect violations. If your function is prepared to handle any value — even if this handling means throwing an exception — the function does not have any precondition.
