Preconditions — Part II

In this post I will continue sharing my thoughts on preconditions. It will cover some philosophy behind the concept of preconditions (and bugs), and investigate the possibility of employing the compiler to verify some preconditions. Many people provided useful feedback on my previous post, and I will try to incorporate it here.

What can preconditions (not) help with?

First, I want to share one observation, which may be obvious to you, but which I realized only recently. Preconditions (and other contract specifications) do not prevent (or deal with) all kinds of bugs. They only address problems caused by misunderstanding or underspecification of interfaces.

When I try to recall the bugs that I have planted in my code in recent years, the most common one is a wrong Boolean expression:

if (!isConditionA || isConditionB) {
  relyOnConditionB();
}

Whereas I really meant to say:

if (!isConditionA && isConditionB)

I could have as well put the wrong Boolean expression into a precondition check. We can easily be misled to think that the functions we write may contain bugs (and we need to check these) but the checks we make are always correct. The same, in fact, applies to unit tests and all other kinds of tests.

Preconditions fill the missing gap in the specification of the interface. To illustrate this, if you treat functions in the code as mathematical functions, preconditions help you narrow down the domain of function arguments. If function f accepts any (but small enough) integer, you declare it:

void f(int i);

Now, if some function f1 is only well-defined for non-negative integers, you declare it as:

void f1(unsigned int i);

That is, you use the type system and its safety features to enforce at compile-time the constraints on the function domain. But now, suppose some function f2 is only well-defined for integral arguments greater than two. What do you do? Specify a new type capable of storing only integral numbers greater than two? This is a possible solution, although not as easy as just using a built-in type. Given the difficulties in implementing a new safe type, specifying a precondition is an attractive alternative.

There are obvious disadvantages of preconditions when compared with type safety checks: the correctness cannot be enforced that easily at compile-time. The additional run-time checks look more like a poor workaround. Therefore, they should ideally be used to complement the type-safety checks in places where the latter are infeasible. While preconditions offer weaker correctness guarantees, they do offer some guarantee; for instance, they can constitute a useful hint to static analyzers. We will try to address it in the next part.

Even without any tool support, preconditions are there for the programmers to see them. They help write and maintain the code. For instance, suppose you are fixing a bug in the following function:

double ratio(int nom, int den)
{
  if (den != 0) {
    return double(nom) / den;
  }
  else {
    return 0;
  }
}

You see that the return 0 is fishy. You want to replace it with an instruction that throws an exception. But you ask yourself a question: "am I going to break other people's code that may rely on the fact that our function returns 0 when den is 0?" If this function specified a precondition which says that den != 0, you have the answer: even if some people rely on the current behavior, they do it illegally. One should not make any assumptions about the behavior of a function whose precondition is broken, at least under the definition of 'precondition' that I try to advertise here.
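As a minimal sketch of what the fixed function might look like once the precondition is stated in the interface (the assert is one possible reaction of my choosing and is not part of the contract):

```cpp
#include <cassert>

// Sketch only: the precondition is documented in the interface and, in this
// variant, also checked in debug builds (assert is a no-op when NDEBUG is set).
double ratio(int nom, int den)
// precondition: den != 0
{
  assert(den != 0);            // reports a caller bug; callers must not rely on it
  return double(nom) / den;    // no special case for den == 0 any more
}
```

Callers that used to depend on the value 0 for den == 0 now have to make that decision themselves, before calling ratio.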

Given the above picture of a precondition, the purpose of specifying preconditions is very similar to that of specifying concept axioms.

The definition of a precondition

Most people would agree with the statement that "the caller should satisfy the function's precondition before calling the function." The point of disagreement and misunderstanding is this 'should'. Can the function take for granted that its precondition is met? Or should it do quite the opposite: assume that the precondition is not met and try to prove it (before doing any other thing)?

Resolving the ambiguity around the 'should' is very important. Without it, we are back to the problem that we started the previous post with: the one with function checkIfUserExists. If one developer expects that the function always checks the precondition and the other assumes that it can rely on the caller to check the precondition, we have the bug again — even though we are specifying preconditions diligently!

Interestingly, there is no one good answer to this question. On the one hand, imagine that std::vector’s operator[] checked for an out-of-range index each time it was called, even in production code. This would kill the performance. And one of the primary reasons people decide to use C++ is performance. This is exactly why the operator specifies the precondition: in order not to check it itself.

On the other hand, in functions where performance is not critical, it would be unwise not to check for a precondition violation that could cause serious damage. Also, even for std::vector’s operator[], performance is not critical on some occasions, especially in testing/debug builds.

I have already provided my definition of the 'should' in the previous post, which tries to address the two contradictory expectations. Calling a function whose precondition is not satisfied results in undefined behavior. This means that the caller is not allowed to make any assumptions about the results of the function call; and that the function is allowed to do anything that it sees fit. This 'anything' allows the function author to do the following things:

  1. Optimize based on the assumption that the precondition always holds.
  2. Verify the precondition and report the violation by any means (call std::terminate, throw an exception, launch debugger, etc.).
  3. Pick 1 or 2 based on other factors (like the value of macro NDEBUG).
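The three options above could be sketched as follows. MY_PRECONDITION, my_precondition_failed and elementAt are hypothetical names invented for this illustration; they are not a standard facility.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical reporting function used by option 2: print and abort.
[[noreturn]] inline void my_precondition_failed(char const* cond,
                                                char const* file, int line)
{
  std::fprintf(stderr, "%s:%d: precondition violated: %s\n", file, line, cond);
  std::abort();
}

// Option 3: pick option 1 or option 2 depending on NDEBUG.
#ifdef NDEBUG
  // Option 1: no check; the body simply assumes the precondition holds.
  #define MY_PRECONDITION(cond) ((void)0)
#else
  // Option 2: verify the precondition and report the violation.
  #define MY_PRECONDITION(cond) \
    ((cond) ? (void)0 : my_precondition_failed(#cond, __FILE__, __LINE__))
#endif

int elementAt(int const* data, std::size_t size, std::size_t i)
// precondition: i < size
{
  MY_PRECONDITION(i < size);
  return data[i];
}
```

Either way the contract is the same: a caller that passes an out-of-range index gets no guarantee about what happens next.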

"The caller may assume nothing": this is also very important. Even if our function throws (today) on precondition failure, the caller cannot rely on this and decide which control path to take depending on whether she catches the exception or not. This means that the following implementation of function AbsoluteValue is invalid:

double sqrt(double x)
// precondition: x >= 0
{
  if (x < 0) {
    throw std::domain_error{"negative number to sqrt"}; // ok
  }
  else {
    // compute...
  }
}

double AbsoluteValue(double x)
// precondition: true
try {
  sqrt(x);
  return x;
}
catch(std::domain_error const&) {
  return -x;
}

This is because AbsoluteValue relies on UB. Or, to say it more formally, if the above function AbsoluteValue is called with negative x, the behavior is undefined. (One of the possible outcomes of UB is obtaining the expected results.)
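For contrast, here is a sketch of a valid implementation. It makes the decision itself and never calls a function whose precondition might not hold; checkAbsoluteValue is a hypothetical helper added only to show the usage.

```cpp
#include <cassert>

double AbsoluteValue(double x)
// precondition: true
{
  // The sign is inspected here, so no narrower contract is ever violated
  // and nothing depends on how some other function reacts to a bug.
  return (x < 0) ? -x : x;
}

// Hypothetical usage: well-defined for negative and non-negative input alike.
inline void checkAbsoluteValue()
{
  assert(AbsoluteValue(-2.5) == 2.5);
  assert(AbsoluteValue(3.0) == 3.0);
}
```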

Preventing precondition failures

The only proper way of dealing with precondition failures (or, in general, with bugs) is to eliminate them. Of course, this is not possible in general and some measures need to be taken to respond to bugs at run-time. This is not the ideal way, but turns out necessary. But you know how to do it, and we will not discuss it in this post. We will focus on preventing bugs. We already indicated the method by showing how unsigned int can be substituted for int. That example was not the best one: the implicit conversion from int to unsigned int is likely to spoil our efforts, so we will proceed to more robust ones.
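To see how the implicit conversion spoils the unsigned int approach, consider this small sketch. Here f1 is given a hypothetical body that returns its argument, only so that the effect becomes observable.

```cpp
#include <cassert>
#include <limits>

// Hypothetical body: the function from the earlier example, made to return
// what it received.
unsigned int f1(unsigned int i)
{
  return i;
}

// The call f1(-1) compiles (at most with a warning): -1 is silently converted
// to the largest unsigned int value, so the "non-negative" constraint is met
// formally while the caller's bug goes unnoticed.
inline unsigned int callWithNegative()
{
  return f1(-1);
}
```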

A couple of people commented in my previous post that the discussed examples would be "secured" if we used more constrained types:

bool checkIfUserExists(Name userName);
// 'Name' never contains spaces or punctuation marks

NonNegative sqrt(NonNegative x);

The name of the type alone sends a clear message to the callers: do not give me just any string. The compiler will warn us if we just pass std::string to checkIfUserExists. Or will it not? Let's consider what the definition of type Name could look like. Here is one possible implementation:

class Name
{
public:
  explicit Name(std::string const& r);
  // precondition: isValidName(r)

  explicit Name(std::string && r);
  // precondition: isValidName(r)

  std::string const& get() const;

  operator std::string const&() const;
};

Three things are to be noted. First, the constructor is explicit: we want the caller to mention type Name explicitly. It is like requiring the caller to put down his signature: Yes, I guarantee that what I pass is a string suitable for a Name. Second, the conversion from Name to std::string is implicit. This is safe because any Name is also a valid std::string. When converting this way, the constraints are relaxed. Third, we need to specify the precondition! We did not get rid of the initial precondition; we moved it to another place. This is how we can use our type:

bool authenticate()
{
  std::string userName = UI::readUserInput();
  return checkIfUserExists(userName); // compile-time error 
}

Does it solve the problem? It does prevent mistakes, where the caller naïvely assumes that userName is a valid user name. The caller can still write:

bool authenticate()
{
  std::string userName = UI::readUserInput();
  return checkIfUserExists(Name{userName}); // I know what I am doing!
}

If userName has malicious content, we will still get the same unexpected results. But at least the necessity to type Name{userName} is likely to force the caller to think for a second.

In case we chose to have Name’s constructor check the precondition at run-time and signal it somehow, the error is signaled before the function checkIfUserExists is executed. This is a very nice feature because it makes it clear to all tools (like debuggers and dump generators) that the bug is not in the function to be called, but in the caller.
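A minimal sketch of such a checking constructor follows. The rule implemented in isValidName (non-empty, letters and digits only) is my assumption based on the description of Name in this post. The single by-value constructor is a simplification of the two overloads shown earlier, and throwing std::logic_error is just one of the possible ways to signal the violation.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Assumed validity rule: a non-empty sequence of letters and digits.
inline bool isValidName(std::string const& s)
{
  if (s.empty()) return false;
  for (unsigned char c : s) {
    if (!std::isalnum(c)) return false;
  }
  return true;
}

class Name
{
public:
  explicit Name(std::string r)
  // precondition: isValidName(r)
    : value_(std::move(r))
  {
    // The violation is reported here, in the caller's context, before
    // checkIfUserExists is ever entered. Callers must not rely on it.
    if (!isValidName(value_)) throw std::logic_error("invalid Name");
  }

  std::string const& get() const { return value_; }
  operator std::string const&() const { return value_; }

private:
  std::string value_;
};
```

With this variant, a stack trace or core dump taken at the point of failure points at the code that constructed the Name, not at the function that later consumes it.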

The real benefit of this method is visible if we can also make the UI return a Name, here through a function UI::readUserName:

bool authenticate()
{
  Name userName = UI::readUserName();
  return checkIfUserExists(userName);
}

This way we introduce a generally useful abstraction to our program: Name, which represents a (typically short) sequence of letters and digits with no white spaces or punctuation marks. The "generally useful" part is important, because we can also use it in other parts of the application to represent things other than user names: identifiers, short codes. The "typically short" part is also useful, because we can apply certain optimizations to our class: e.g., store the entire string within object storage and resort to heap memory only in exceptional situations. This is where stack allocators can help. Similarly, type NonNegative also represents a generally useful abstraction, especially since, with user-defined literals, we can additionally guarantee that no negative literals are allowed:

NonNegative sqrt(NonNegative x);

NonNegative y = sqrt(2.25_nn);  // "_nn" indicates a NonNegative literal 
NonNegative x = sqrt(-2.25_nn); // ERROR: negative literal
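As a rough illustration of how such a literal might work (a sketch under my own assumptions, not necessarily the implementation referred to below), the key observation is that the literal operator never sees a minus sign:

```cpp
#include <cassert>

class NonNegative
{
public:
  // precondition: v >= 0
  constexpr explicit NonNegative(double v) : value_(v) {}
  constexpr double get() const { return value_; }
  // Deliberately no unary operator- and no implicit conversion to double:
  // an expression like -2.25_nn therefore fails to compile.
private:
  double value_;
};

// The literal operator only ever receives the digits "2.25". A leading minus
// is a separate unary operator applied to the resulting NonNegative object.
constexpr NonNegative operator""_nn(long double v)
{
  return NonNegative{static_cast<double>(v)};
}
```

Integer literals such as 2_nn would need an additional overload taking unsigned long long. In this sketch the compiler error for a negative literal is "no matching operator-", which is less descriptive than one might wish.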

To see how this trick can be implemented see here. However, in general not every precondition renders a generally useful constrained type. Imagine the following case:

int fun(int i);
// precondition: i > 3

Of course, there is a way to construct a constrained type from this precondition. There even exists a library for making it easy: Constrained Value. But how often do you need to use type IntGreaterThan3? If you introduce such a type only for the sake of one function's precondition, the costs may outweigh the benefits. Introducing a new type comes with a cost. Defining new types is not trivial in C++. You need to consider whether your type is going to be copyable or movable, how you allow it to be constructed, and how it handles precondition failures. By introducing a new type you risk introducing new bugs. Also, the additional type increases compilation time and the memory used by your IDE: it now has to remember another type and give hints for it. We never think this way, because typically we introduce a type to solve some problem, or to reduce the program's complexity. In our case, we risk adding complexity to the program only to detect situations that shall never arise anyway.
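To make the cost concrete, here is a hand-written sketch of such a single-purpose type. IntGreaterThan3 and the body of fun are hypothetical; the original only declares fun.

```cpp
#include <cassert>

// Hypothetical constrained type for the precondition "i > 3".
class IntGreaterThan3
{
public:
  explicit IntGreaterThan3(int i)
  // precondition: i > 3   (the precondition moved here; it did not disappear)
    : value_(i)
  {
    assert(i > 3);   // one possible reaction; disabled when NDEBUG is defined
  }

  int get() const { return value_; }

private:
  int value_;
};

// Hypothetical body, only to have something to call.
int fun(IntGreaterThan3 i)
{
  return i.get() - 3;   // always positive, given the constraint
}
```

Even this minimal version already forced several decisions (explicit constructor, copyability, how to react to a violation) for the benefit of a single function.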

Above, we have shown only one possible implementation of type Name. There exist other possibilities. For instance, consider this one, which requires the user to satisfy almost no precondition:

class Name
{
private:
  explicit Name(std::string const& r); // precondition: isValidName(r)
  explicit Name(std::string && r);     // precondition: isValidName(r)
  friend boost::optional<Name> makeName(std::string && r);

public:
  std::string const& get() const;
  operator std::string const&() const;
};

boost::optional<Name> makeName(std::string && s)
{
  if (isValidName(s))
    return Name{std::move(s)};
  else
    return boost::none;
}

While the precondition is still there, the constructors are private, and the requirement that the precondition should hold is no longer the contract between the component user and the component author, because the component user cannot ever call the constructor directly, and therefore, she cannot ever fail to satisfy the precondition. Now, our user can only type:

bool authenticate()
{
  auto userName = makeName(UI::readUserInput());
  if (userName) {
    return checkIfUserExists(*userName);
  }
  else {
    // REACT
  }
}

By optional I mean the Boost.Optional library. If you need more C++11 features (like move constructor) with it, you can try this version. Only for brevity did I put the innocent-looking REACT. In reality, when the compiler identifies the problem, the user's reaction may be more invasive into the code than just adding an if-statement. It may even require changing the signature of authenticate, because it is not clear if we want to signal in the same way two different things: (1) that the entered user name does not exist and (2) that there is a bug in the code. You may also not like this implementation of Name because it adds a run-time overhead that may prove unacceptable. Even if you can somehow prove (e.g., using static analysis) that you never pass invalid strings, you still pay the penalty of always performing the correctness check. In this particular case (where we need to talk to the database) this will not be a problem, but in general, you do not know if it is affordable, or even possible, to check the precondition; therefore this technique cannot be generalized to every precondition.
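One way the changed signature could look is sketched below. This is my own variation, not the code from this post: it swaps in std::optional (C++17) for Boost.Optional, takes the input as a parameter instead of reading it from the UI, and uses a hypothetical AuthResult enumeration and a stand-in checkIfUserExists.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>

// Assumed validity rule: a non-empty sequence of letters and digits.
bool isValidName(std::string const& s)
{
  if (s.empty()) return false;
  for (unsigned char c : s) {
    if (!std::isalnum(c)) return false;
  }
  return true;
}

class Name
{
  explicit Name(std::string r) : value_(std::move(r)) {}  // precondition: isValidName(r)
  friend std::optional<Name> makeName(std::string s);

public:
  std::string const& get() const { return value_; }

private:
  std::string value_;
};

// The only way for users to obtain a Name: the check cannot be skipped.
std::optional<Name> makeName(std::string s)
{
  if (isValidName(s)) return Name{std::move(s)};
  return std::nullopt;
}

// Hypothetical: three outcomes instead of a single bool.
enum class AuthResult { Authenticated, NoSuchUser, InvalidName };

// Hypothetical stand-in for the database lookup.
bool checkIfUserExists(Name userName)
{
  return userName.get() == "alice";
}

AuthResult authenticate(std::string input)
{
  auto userName = makeName(std::move(input));
  if (!userName) return AuthResult::InvalidName;   // the REACT branch
  return checkIfUserExists(*userName) ? AuthResult::Authenticated
                                      : AuthResult::NoSuchUser;
}
```

The extra enumerator is exactly the kind of invasive change mentioned above: every caller of authenticate now has to decide what to do with the third outcome.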

Note one more thing about this "optional" solution. We still have one other precondition:

template <typename T>
T& optional<T>::operator*();
// precondition: bool(*this)

So, we just traded one precondition for another; hopefully, one that is better known to everybody.

Leaving a particular implementation aside, there are some issues to be addressed in general when using constrained types to replace preconditions. First, let's try to imagine what the updated implementation of function checkIfUserExists might look like:

bool checkIfUserExists(Name userName)
{
  std::string query = "select count(*) from USERS where NAME = \'"
                    + userName.get() + "\';";
  return DB::run_sql<int>(query) > 0;
}

Notice the call to member function get. Do we just trust object userName to return a good string value, or should we still specify the precondition for safety:

bool checkIfUserExists(Name userName)
// precondition: isValidName(userName.get())

My answer to that is: yes, make every assumption explicit. But are we not introducing type Name for nothing? The answer to this question is "no." As said above, by using type Name we disturb the smooth compilation of the program for people who forget that some user inputs may not be valid user names. This gives reasons to believe that some bugs will be detected during compilation and fixed. There is still a chance that we will get the bug (and therefore we still specify the precondition); however, we have reduced the probability of having one. Besides, introducing such types has other benefits apart from enforcing preconditions. It makes our intentions more explicit, makes the code easier to read and understand, and helps avoid other kinds of bugs:

bool save(Name owner, FilePath path, BigText content = {});

better reflects your intentions than:

bool save(string owner, string path, string content = {});

It also allows for certain optimizations, and it prevents bugs like:

save(owner, content);    // bad argument order

function<bool(string, string)> binaryPredicate;
binaryPredicate = save;  // accidental signature match
sort(vec.begin(), vec.end(), comparator);

So far we have been considering preconditions that "inspect" only one argument. The above technique for introducing auxiliary types does not work well if we want to put some constraint on two or more arguments simultaneously:

double atan2(double y, double x);
// precondition: y != 0 || x != 0

Also, sometimes we have only one-object constraint but it is difficult to use a constrained type because, for instance, the object is *this. Consider an example where for some reason you cannot initialize your object fully in the constructor and you need a two-phase initialization. The typical usage of your class works like this:

Machine m;                    // 1st phase
Param p = computeParamFor(m);
m.initialize(p);              // 2nd phase
m.run();

Typically, function run will have a precondition:

void Machine::run();
// precondition: this->isInitialized();

Technically, it is possible to introduce and use a constrained type, but it might confuse the users:

Machine m;
Param p = computeParamFor(m);
InitializedMachine im{m, p};
im.run();
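A sketch of how such a wrapper might be put together is shown below. Param's contents, the member names and the int returned from run are hypothetical, added only so that the example has something observable; in the original, run returns void.

```cpp
#include <cassert>

struct Param { int speed; };   // hypothetical contents

class Machine
{
public:
  void initialize(Param p) { param_ = p; initialized_ = true; }
  bool isInitialized() const { return initialized_; }

  int run()
  // precondition: this->isInitialized()
  {
    assert(isInitialized());
    return param_.speed;
  }

private:
  Param param_{};
  bool initialized_ = false;
};

// The wrapper performs the second phase itself, so its own run()
// needs no precondition: an InitializedMachine is always initialized.
class InitializedMachine
{
public:
  InitializedMachine(Machine& m, Param const& p) : machine_(m)
  {
    machine_.initialize(p);
  }

  int run() { return machine_.run(); }

private:
  Machine& machine_;   // refers to the caller's object; it must outlive the wrapper
};
```

The sketch also shows part of the confusion: the original Machine object remains accessible next to the wrapper, and the wrapper introduces a lifetime requirement of its own.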

That's it for this post. I still didn't manage to mention all things that I wanted to share with you. I guess I will need part III. This topic grew bigger than I anticipated.

One question for the end: what type do you think function sqrt should return: double or NonNegative?


9 Responses to Preconditions — Part II

  1. Cedric says:

    Excellent post.
    If you make the effort of creating a nonnegative type with an implicit conversion from nonnegative to double, then you’d better have sqrt return a nonnegative.
    There is no additional complexity, no additional cost, and it clarifies the function signature. You could for example pass the result of sqrt as an argument to log directly.

  2. Actually `sqrt` should return a `PlusOrMinus` ;)

  3. Thomas says:

    Actually ‘sqrt’ should return an imaginary number ;) That avoids the precondition.

  4. You probably can’t go wrong with making your functions safe by default; don’t admit any input that has an undefined result. The great thing about C++ is that interface-narrowing types like ConstrainedValue can be used to build up optimizations in the unusual case where the runtime check has a real performance impact. The tragedy of C++ is that there is no common practice between separate development organizations. Without a common style, we don’t get a rich vocabulary of interface-constraining types and patterns.

  5. Chip Salzenberg says:

    I think the Boost Constrained Type library is a good idea but binds at the wrong time. You can change the constraint on an object after assigning it. It seems to me that this should change the type of the constrained. It doesn’t work conveniently with lambdas, for the same reason.

    Modulo move efficiencies, I would like to use
    constrained<T, C> make_constrained(T t, C c)
    where C may be an unnameable type because c is a lambda. Of course this could make class composition awkward, so maybe I’m mistaken.

    • Indeed, the library does allow changing constraints at runtime; and indeed, we wouldn’t want that. However, this looks like an option that you do not have to use. You have a choice: either use a modifiable constraint, or embed a constraint in the type:

      struct isNonNegative
      {
        bool operator () (double x) const { return x >= 0; }
      };
      
      typedef constrained<double, isNonNegative> NonNegative;
      NonNegative x; // impossible to change the constraint
      

      Also, what the wrapper does when it finds the constraint to be violated is configurable in the library.

  6. Cedric says:

    Anyway, using constrained types does not solve the problem, but moves it to cast operators.
    Let me clarify: say that sqrt only takes positive_double as an argument. So far so good.

    Probably, in the course of a calculation, you’ll end up needing to cast from double to positive_double. Say for example:
    positive_double norm = sqrt( (x * x) + (y * y) );
    x and y are double. You know that the result is positive but you need to cast to a positive_double in order to call sqrt.

    From that point:
    * either you redefine all operators, and all functions to handle your new positive_double type (here, say pow2(double) -> positive_double, and operator+). That’s a huge job and it won’t be possible for every case anyway.
    * or you call a cast operator from double to positive_double
    Now my 2cents question: what should we have as preconditions for the operator, and what should be the behavior.

    And then the final question: for this example, was that worth the effort?

    • Using constrained types does not solve the problem — I am not sure what you mean by ‘solving the problem.’ If you mean to say that these constrained types are not a proper replacement for preconditions, then I fully agree with you. But we can consider a more modest ‘problem’: how can we use the available tools to get us as close as possible to contract enforcements such as preconditions.

      In your example with sqrt, I understand that the function is user-defined rather than the one from STD, and that x and y are just doubles, right? The way I imagine the constrained types would work, is that you have to use an explicit cast:

      void example(double x, double y)
      {
        auto norm = my::sqrt( NonNegative{x * x + y * y} );
      }
      

      The explicit cast is as though you were saying, "compiler, you cannot know if the result is non-negative, but I do know, and you should trust me." The cast is noisy, but should be fine with people who favor safety over compactness of the notation.

      You are right that doing all this type safety for every usage of doubles is impractical, if at all possible. And you successfully illustrate why such constrained types cannot be a replacement for proper preconditions.

      Nonetheless, constrained types do offer additional safety.

      • Cedric says:

        I would say it offers a convenient way to add safety, compared to preconditions as described in your paper. In other words, it gives a symbolic name (the type) to the constraint, and it is a very wise way of handling things , especially if you have several functions using the same constraints.

        To answer another remark, I also agree that changing the constraint at run time, as the Boost constrained type library allows, does not seem to be a very good idea.

        By the way, for those who have not experimented with it: as far as I remember, the Ada language includes a subtyping feature that I liked a lot, directly in the base language.
