Class invariants

The primary motivation for defining a class in C++ is to reflect and maintain a class invariant. In this post we will see what class invariants are and how you deal with them. Class invariants are important part of C++, even though there is no “invariant” keyword in C++.

Contrast the following two class definitions:

struct Point
{
  int x;
  int y;
};
class Range
{
  int _min;
  int _max;

public:
  // ...
};

In C++ a struct is practically a class but with a different default member access. The first is an aggregate: it only allows two pieces of data to travel together. If it was not for the nice member names, we might have as well used std::pair<int, int> instead. The second is a bit more. Its members are not just two pieces of data. Here we need to pay more attention to the situation where _min is greater than _max. We will probably have one of these:

Range::Range(int mn, int mx) 
  : _min(mn), _max(mx)
{
  if (_min > _max)
    throw some_exception();
}
Range::Range(int mn, int mx) 
  // precondition: mn <= mx
  : _min(mn), _max(mx)
{
  assert (_min <= _max);
}
void Range::set(int mn, int mx) 
{
  if (mn > mx)
    throw some_exception();

  _min = mn;
  _max = mx;
}
void Range::set(int mn, int mx) 
  // precondition: mn <= mx
{
  assert (mn <= mx);
  _min = mn;
  _max = mx;
}

What these functions are doing is maintaining the class invariant. It can be even expressed with a C++ expression:

_min <= _max

Many developers design invariant-maintaining classes without ever needing to give a name to this activity. But when we give it a name, we can see more. For instance, we can form a criterion for when we need to define our own class.

We only define a new class proper when we need an invariant to maintain.

If we don’t need an invariant, a simple aggregate or a tuple will do. Of course, in C++ we like doing clever things, so you will surely find exceptions to this criterion. But I think it is a good starting point. Sometimes an invariant is difficult to spot, though. Sometimes it cannot be expressed with a C++ expression. For instance, for a class representing a session with a resource, such as access to a file, the class invariant may be that for as long as the object is alive — its initialization has finished but its destruction hasn’t started — it provides access to an open file.

By defining a class proper (not just an aggregate) we provide a guarantee to our users that we will maintain the invariant. Now it is our, quite difficult, job to implement this guarantee by carefully choosing the interface of our class, as well as its implementation. The tools that help us are:

  • class member access (private, protected, public),
  • contract-related features (preconditions, assertions),
  • the exception handling mechanism.

Class invariants, more formally, specify a condition — not necessarily representable via a C++ expression — that must hold immediately after a class object is constructed, and after every function belonging to its interface has finished — normally or via exception. The class design also needs to make sure that no-one else can compromise the invariant via other means than its interface. As a consequence, we get an additional guarantee that whenever an interface function, or destructor, from the class is invoked, the invariant is also satisfied.

I use a vague term “interface”, because apart from public member functions I mean other things, like protected members, public data members and friends.

One natural thing to do when deciding to maintain the invariant is to declare the non-static data members private. Recall our example class Range. If its members _min and _max were public, anyone could easily, even inadvertently, spoil our invariant without our control. We make them private to control their modification.

A class invariant is more than a postcondition upon every member function of the class. A function postcondition is expected to be satisfied unless the function reports failure (via an exception). A class invariant is expected to hold even if a function that modified the class object reports failure. In fact, throwing an exception may be a way to make sure that the class invariant is preserved. This is what we did Range::set:

void Range::set(int mn, int mx) 
{
  if (mn > mx)
    throw some_exception();

  _min = mn;
  _max = mx;
}

Why we need class invariants

The reason we need class invariants is because they help us think about the programs in more abstract terms. It is easier to program when you deal with a Range than a pair<int, int> or even struct{int min, max;}. You may have been taught that it is classes that we use to make our code more abstract, but classes are really only means of delivering the invariant. It is all really about guaranteeing the class invariants.

If you have made an effort to guarantee your class invariant, your users no longer have to worry about it. Suppose that you only used the aggregate version of a range in your program:

struct Rng
{
  int min, max;
};

int fun(Rng rng, int val)
{
  if (is_in_range(val, rng))
    return size(rng);

  return values[rng.max - rng.min + 1];
}

At every point of your program, in every function, you would now have to consider what happens, or should happen, when rng.min > rng.max. You would have to investigate if it happens, all the time. Your attention — if you care for correctness — would be distracted with all these questions.

But all these questions disappear once you delegate the responsibility of keeping track of this condition to a class managing its invariant. From now on, you are dealing with a mathematical range rather than the relationship between two types of type int.

Strong and weak invariants

So suppose you have created a class Socket that gives you a guarantee that a living object is always representing an exclusive access to the OS-managed socket. The class acquires access to the socket in the constructor and releases the access in the destructor. Now you probably also want to make your class movable by providing a move constructor. Implementing a move constructor is quite easy. You just have to invent a special “null-state” that you will put your object in when it is being moved from. But now you have compromised your class invariant. You can no longer say “a living object is always representing an access to the socket”. Now you have to say, “a living object either represents a null-socket or an access to the socket”. Thus, whoever gets a reference to the object of type Socket will have to ask themselves a question: is this a null-socket or a proper socket? We call this effect a weakened invariant. The invariant is not as strong as it could have been. We have now “lowered” the abstraction level. We no longer think in terms of sockets, we now start thinking in terms of something that can be one of two things: a socket or a not-a-socket. We now move some responsibility of checking the correctness of the state on our users. A weak invariant is still better than no invariant. And making Socket movable may be worth it. But we do lose something.

At this point someone might think, “now that we have a null-socket state, we can add a default constructor, and we know what it will do: it will create a null-socket”. I personally find it a bad idea, as it will cause even more null-sockets to appear in our program. Sometimes, however, different frameworks force you to provide a default constructor, and you have no choice.

The choice of how strong you want your class invariant to be is much up to you. The stronger the class invariant, the less conditions your users have to pay attention to, and you leave fewer opportunities for bugs.

There may be also a practical reason for not providing a class invariant. Designing a class is in fact a difficult an error-prone task. You only do it when you know the effort will pay off: when you, or your users, will benefit from the class invariant in many places. However, if you are only going to use a new thing in one of two places, creating a class is an overkill and an unnecessary risk.

Consider algorithm minmax_element that scans through an STL-like range of elements and returns the smallest as well as the biggest element in the range. It needs to return two things, so they have to be packed together. There is an obvious relation between the two data: the first cannot be greater than the second, so specifying a class with an invariant would be an option, but we have to draw the line there. The user will surely unpack these two pieces and use them separately or with their own interface. In this case it is better to go with something simpler. And this is what STL did. The non-range version returns a std::pair, whereas the range version returns an aggregate.

And that’s it for today. I hope that this post will stir some comments or discussion. If you think that I have taken it too far or conversely, that I should have gone further, I would be interested to know about it.

Posted in programming | Tagged , , , | 1 Comment

The double life of objects

Some common knowledge: the lifetime of the object starts when its initialization is complete. Based on this we can get some further expectations: an object becomes const only after its initialization is complete. But this lifetime property of objects becomes blurred when copy elision comes into play. When a copy is elided, we have a situation where we would otherwise have two objects, each initialized separately, but now they are blended into one, its life time spanning across the caller and the calle, which has a number of surprising effects, receiving two initializations being one of them. Continue reading

Posted in programming | Tagged , | 2 Comments

The obvious final step

The title may be misleading, as I had to invent a new short term for the pattern that occurs in the code once in a while. Example first:

// write an element of an XML file

xml.begin_element("port");
xml.attribute("name", name);
xml.attribute("location", loc);
xml.end_element("port");  // <-- the obvious final step

During the construction of an XML file when you write an element, it is obvious that the last thing that you do is to write the closing tag. By obvious we mean:

  • Writing it down adds no new information (there is no other possible final instruction for this task).
  • It would be a bug if this instruction wasn’t there.

For this reason, people invent tools that allow expressing this type of business logic in a way that does not require of the programmer to write down the obvious final step. An oft used technique is to employ destructors. However, this is very difficult to get right and not implementable in general. In this post we will see why, and we will also see an alternative.

Continue reading

Posted in programming | Tagged , , , | 13 Comments

A moved-from optional

This post is in response to claims, that I have heard number of times, that the semantics of optional’s move operations are wrong:

template <typename T>
void test(T v)
{
  optional<T> o = v;
  assert (o);     // o contains a value
  optional<T> p = std::move(o);
  assert (o);     // o still contains a value!
}

Some people are surprised that the second assert holds: the unfulfilled expectation is that moving from an optional should “eject” the value and leave the object in the state of not containing any value. While such semantics could be made to work, there are good reasons for preferring the present behavior. Continue reading

Posted in programming | Tagged , , , | 2 Comments

Local Time

In this post we will see how to solve the task from the previous post, but display the time point in local time instead of system time. The solution is simple and takes only two additional lines (you can scroll down towards the end), but I want to take this opportunity to offer some reflections about the concepts of time zones and daylight saving time. Continue reading

Posted in programming | Tagged , | 6 Comments

Using std::chrono

The goal of this post is to show how the <chrono> library can be used to solve a practical but not that obvious problem. There is a lot of good material in the Internet where one can learn <chrono> from, for instance a series of lectures by Howard Hinnant — the author of the library:

And you probably know <chrono> already. But knowing the library may still not be enough to appreciate its power, scope and design. So the goal here is to start with the problem, and see which parts of the library need to be employed to solve it.

The task is this. We want to display the current system time, but rounded (down) to 20 minutes, so that 14:55 is displayed as 14:40.

Continue reading

Posted in programming | Tagged , , | 12 Comments

Concepts — case studies

This post has been inspired by the readers’ questions about using concepts to solve real problems. We will have a look at two such problems and see if, and how, concepts can help.

Case Study 1

My concept has two functions: one produces a value, and the other one later consumes this value:

template <typename T>
void test(T& encoder, span<byte> buffer)
{
  auto state = encoder.init();
  // do something with state
  encoder.encode(state, buffer);
}

How to reflect in a concept that the result of init() is a type that can be later passed to encode()?

A simple way to do it would be to put a combined expression as one of the concept requirements:

template <typename E>
concept Encoder = requires(E& e, span<byte> b)
{
  e.encode(e.init(), b);
};

But this has limitations. First, if our concept has two “init” functions init_1() and init_2() producing the value of the same type, and two “encode” functions encode_1() and encode_2() consuming the value of the same type, we will need a lot of combined expressions to reflect all the combinations.

Second, if function encode() modifies the state of the object in-place, and its argument type is an lvalue reference to non-const, such combined expression is actually ill-formed.

A solution to this problem employed in the STL is to additionally require in the concept that a constrained type provides a nested type alias that names the type in question: even if no algorithm actually uses this type.

template <typename E>
concept Encoder = requires(E& e, 
                           typename E::state_t& s,
                           span<byte> b)
{
  { e.init() } -> std::convertible_to<typename E::state_t>;
  e.encode(s, b);
};

Note that I have expressed the nested type requirement in an unusual way. It is not a separate declaration ending with a semicolon: it is in the “argument list” of the requires-expression. This makes the definition shorter, but this was not the reason. I needed to introduce the name s so that I can use it in the description of member function encode(). Otherwise, I would have to resort to solutions that look quite complicated:

template <typename E>
concept Encoder = requires(E& e)
{
  typename E::state_t;
  { e.init() } -> std::convertible_to<typename E::state_t>;
}
&& requires(E& e, 
            typename E::state_t& s,
            span<byte> b)
{
   e.encode(s, b);
};

This perfectly illustrates that requires-expression is just an expression, so you can combine it with other expressions. Token && is a conjunction operator — rather than a logical-or — and has the special compile-time short-circuiting properties. Another alternative:

template <typename E>
concept Encoder = requires(E& e)
{
  typename E::state_t;
  { e.init() } -> std::convertible_to<typename E::state_t>;

  requires requires(typename E::state_t& s,
                    span<byte> b)
  {
    e.encode(s, b);
  };
};

This is a nested requirement and uses the peculiar requires requires syntax. Another alternative uses std::declval:

template <typename E>
concept Encoder = requires(E& e, span<byte> b)
{
  typename E::state_t;
  { e.init() } -> std::convertible_to<typename E::state_t>;
  e.encode(std::declval<typename E::state_t&>(), b);
};

Which illustrates that arguments in requires-expression is just a syntactic sugar over what you can already do with std::declval. Similarly, we could substitute std::declval<E&>() for e in all the above examples, but using an argument name makes the declarations easier to read.

Anyway, in all these examples, including the first one, we have to repeat the typename E::state_t. This is a known inconvenience in C++20 concepts, and solutions are explored to resolve this problem in the future version of C++.

Case Study 2

In the above example, we were using a buffer of type std::span<std::byte>. Let’s change our concept a bit, and say that a type models concept Encoder if it has function encode that works with any type modelling concept Buffer. How do we do that? We could write something like this:

template <typename E, typename B>
concept Encoder = 
  Buffer<B> &&
  requires(E& e, typename E::state_t& s, B b)
{
  { e.init() } -> std::convertible_to<typename E::state_t>;
  e.encode(s, b);
};

This would compile, and could be used to check if a given encoder works with a given buffer, but this is not what we wanted. We want to check if an encoder can encode any type modelling concept Buffer.

Unfortunately, there is no way to do it. And this is not an omission in the concepts syntax. While this need occurs from time to time when we are doing template meta-programming, it is not in scope of what we call Generic Programming. In order to test if a given type (or set of types) satisfies a concept, we have to provide concrete types, and see if all the necessary declarations are present. Checking if a type satisfies a concept is similar to instantiating a template (although no template instantiation actually takes place).
If we have an encoder class:

struct MyEncoder
{
  using state_t = std::size_t;
  state_t init();
  void encode(state_t& s, span<byte> buf);
};

We can pass it to our concept, and it can test that all expressions are there, by substituting MyEncode for the template parameter E in concept Encoder:

static_assert(Encoder<MyEncoder>);

And once we do it, all types are known and concrete: there is no template anymore. We could test if our encoder works with one buffer type, or another buffer type, but there is no way a compiler can check — using template parameter substitution — if the encoder works with all possible types modelling concept Buffer that were ever defined or are yet to be defined. This would require an infinite number of tests. This is somewhat similar to virtual function templates: we do not have them, because it would require virtual tables of infinite sizes: one slot per any combination of function arguments’ types.


And that’s it for today. For the end, if you like to learn from videos, the organizers of C++Now offered me an opportunity to give a talk about practical aspects of concepts, based on the posts from this blog. You can watch it here.

Posted in programming | Tagged , , | 3 Comments

Decent concepts

Last year I published two posts on writing concepts:

  1. Concept Archetypes,
  2. Semantic requirements in concepts.

Having had some time to reflect upon these posts, I now realize that the model presented in them is not complete. In this post I want to give a more coherent view of concepts.

In this post we will see that:

  1. Duck-typing property cannot be relied upon for the majority of concepts;
  2. “Additional requirements” when specializing algorithms have to be reflected through concept refinement;

Continue reading

Posted in programming | Tagged , , , | 1 Comment

Contracts, Preconditions & Invariants

In this post we will see what a contract is, how preconditions and invariants can be derived from the contract, and how this process can help detect bugs. Two points that I will be stressing in this post are: (1) preconditions and invariants are not “contracts” and (2) only a subset of contract-related bugs can be detected through preconditions and invariants. Continue reading

Posted in programming | Tagged , , | 3 Comments