Null-state — part I

We are fairly used to the macro NULL in C++03, and now we will be getting used to the new keyword nullptr in C++11. The goal of this post is to introduce some philosophy behind using null-pointers, generalize the concept of a null-pointer to other types, show how the concept is used in the C++ standard library, and show the common practice of using special member functions to implement the null-state. Null-state is like a design pattern: being able to identify that someone is using it makes the analysis of code easier, eases the communication between programmers, and makes programs easier to maintain.

A null-pointer

A pointer is a type that represents an address in memory. We can use such an address to access an object at the designated location. Note that this applies equally to built-in ‘raw’ pointers inherited from C and to various ‘smart’ pointers like std::unique_ptr. This is, in short, the nature of the pointer. But there are exceptions: sometimes a pointer just does not point to any address. This situation is represented by a null-pointer: a special value of a pointer type. Note the difference: a pointer is a type, a null-pointer is a value. A null-pointer is not really a pointer: it doesn’t represent an address; you cannot call ptr->mem on it, as that would be a bug resulting in undefined behavior. There are only a couple of operations that you can safely perform on a pointer that stores a null-pointer value. The most interesting ones:

  1. check if it is a null-pointer,
  2. assign it a valid address (or even null-pointer value, again),
  3. destroy it.

The first one, being able to detect the special case, will not really be that essential when we later generalize the concept of ‘nullness’, but it is still very useful and essential for pointers: this is how you can check at run-time if you can use the full interface of pointers (operators -> and *) or only the limited subset. This is why this check is typically used in if-statements.
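A minimal sketch of such a check, assuming a hypothetical helper value_or that is not part of the examples in this post: the dereference is reached only after the null test has succeeded.

```cpp
#include <cassert>

// Hypothetical helper: reads through `p` only after the null check succeeds.
int value_or(const int* p, int fallback)
{
    if (p) {            // non-null: operators * and -> may be used
        return *p;
    }
    return fallback;    // null-pointer: only check, assign and destroy are safe
}
```

Calling value_or(&x, 0) reads x, while value_or(nullptr, -1) never touches the pointer and returns the fallback.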

Assigning a valid address is important to be able to implement a two-phase initialization. At the beginning of the scope we may not have a desired address just yet, but we may already need to instantiate an object of pointer type (for storing an address):

#include <cassert>
#include <memory>                           // std::addressof

int* find_biggest( int(&array)[10] )        // ref to 10-element array
{
    int* biggest = nullptr;
    for (int& val : array) {                // foreach loop
        if (!biggest || *biggest < val) {
            biggest = std::addressof(val);  // &val
        }
    }
    assert (biggest);
    return biggest;
}

Here, although we are sure we will find a valid address (because we will receive a 10-element array), we have to start with a null-pointer. This is because we cannot make comparison *biggest < val in the first iteration of the loop. We unconditionally assign the address of the first element as biggest, and in order to do that we use the check for null-pointer. Then we assign a valid address to an object representing a not-an-address. Henceforth, we will be only replacing one valid address with another valid address.

Finally, being able to destroy a null-pointer may seem too silly to even mention: it is obvious that we can just create a null pointer and then leave the scope, especially since raw pointers do not have a destructor (or, they have trivial destructors). But this is not the case for smart pointers: std::unique_ptr, which is supposed to release the heap memory under the address it stores, must be able to correctly run the destructor even if what it stores is not-an-address. It is even more important when we come to generalizing the concept of null-state to non-pointer types.
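To illustrate, here is a small sketch that destroys a std::unique_ptr in both states. The counting_deleter type and the function releases_when_destroying are hypothetical names introduced only for this demonstration; the deleter counts how many times it is asked to release memory.

```cpp
#include <cassert>
#include <memory>

// Hypothetical deleter that records every release request.
struct counting_deleter {
    int* calls;
    void operator()(int* p) const { ++*calls; delete p; }
};

// Wraps `raw` in a std::unique_ptr, destroys the wrapper, and reports
// how many times the deleter was invoked.
int releases_when_destroying(int* raw)
{
    int calls = 0;
    {
        std::unique_ptr<int, counting_deleter> p(raw, counting_deleter{&calls});
    }   // p is destroyed here
    return calls;
}
```

With a null-pointer the destructor runs but the deleter is never called; with a valid address it is called exactly once.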

Despite this obvious difference between an address and a not-an-address, to the compiler a null-pointer looks the same as any other pointer. Being a null-pointer is a run-time property, and in general the compiler cannot detect it. We have to be very cautious when dealing with null-pointers, because dereferencing one is accepted by the compiler but might trigger disastrous results at run-time.

Generalized null-state

The same concept of a special null-state often occurs in other types, even though the name ‘null-state’ or ‘null’, or even ‘zombie’ is never mentioned. Let’s have a look at some examples from the C++ Standard Library.

#include <cassert>
#include <fstream>
#include <string>

void read( std::string & s )
{
    std::ifstream f;      // null state
    f.exceptions( std::ios::badbit | std::ios::eofbit | std::ios::failbit );

    if (f.is_open()) {    // check for the null-state
        f >> s;
    }

    f.open("file1.txt");  // assign a valid, non-null state
    assert (f.is_open());
    f >> s;
    f.close();            // assign null state again

    f.open("file2.txt");  // assign a different non-null state
    f >> s;
    f.close();

    assert (!f.is_open());
    f >> s;               // programmer (run-time) error!
}

The read instruction in the last line is a programmer error. This error will not be detected by the type system. Programmers must simply be aware of such potential issues and be cautious. For many types, using an object in its null-state like this triggers UB; and even where it does not, the program does something different from what you expect, which is a serious logic error. In our case, because we requested f to throw exceptions upon any problem, the instruction will trigger a throw; but that should not make us any more comfortable. There is no good way of handling such an exception caused by a logic error.
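The throw can be observed in isolation with the sketch below. The function name read_from_null_stream_throws is hypothetical, and the exception is caught as std::exception (the base of std::ios_base::failure) only to keep the sketch independent of library details.

```cpp
#include <cassert>
#include <exception>
#include <fstream>
#include <string>

// Returns true if extracting from a never-opened stream throws once
// exceptions are enabled for badbit and failbit.
bool read_from_null_stream_throws()
{
    std::ifstream f;                     // null-state: no file attached
    f.exceptions(std::ios::badbit | std::ios::failbit);
    std::string s;
    try {
        f >> s;                          // logic error: there is nothing to read from
    } catch (const std::exception&) {    // std::ios_base::failure derives from it
        return true;
    }
    return false;
}
```

Catching the exception here only demonstrates that it is thrown; it does not make the read meaningful.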

You may be confused by the word ‘assignment’ in the comments, because there are no assignment operators involved. However, conceptually we are assigning new values to the file object; and in fact, in C++11, where fstreams are moveable, we can rewrite this code using move assignment operators:

#include <cassert>
#include <fstream>
#include <string>

void read( std::string & s )
{
    std::ifstream f;
    f.exceptions( std::ios::badbit | std::ios::eofbit | std::ios::failbit );

    if (f.is_open()) {
        f >> s;
    }

    f = std::ifstream{"file1.txt"};
    assert (f.is_open());
    f >> s;
    f = {};             // same as: f = std::ifstream{};

    f = std::ifstream{"file2.txt"};
    f >> s;
    f = {};

    assert (!f.is_open());
    f >> s;            // programmer error!
}

An interesting thing to observe here is that the interface for checking for null-state is usually not really needed; in our example, from the context we know at each point if f contains a null-state or not (assuming that opening of the files always succeeds), and the checks are really redundant.

Also note the funny notation f = {}. It is a very useful shorthand notation in C++11 which says: default-construct a temporary object of type decltype(f) and move-assign it to f. Or, in other words: reset f to default-created state. This has the chance to become a new C++ idiom for resetting objects.
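The same notation works for user-defined types that have a non-explicit default constructor and a move assignment operator. Below is a minimal sketch with a hypothetical move-only type handle, in which the id 0 is assumed to play the role of the null-state.

```cpp
#include <cassert>

// Hypothetical move-only handle; id 0 is assumed to mean "null-state".
class handle {
    int id_;
public:
    handle() noexcept : id_(0) {}                    // null-state
    explicit handle(int id) noexcept : id_(id) {}    // valid state
    handle(handle&& other) noexcept : id_(other.id_) { other.id_ = 0; }
    handle& operator=(handle&& other) noexcept
    {
        id_ = other.id_;
        other.id_ = 0;      // the source is left in the null-state
        return *this;
    }
    bool is_null() const noexcept { return id_ == 0; }
};

// Resets `h` to its default-created state and reports whether it is null.
bool reset_and_check(handle& h)
{
    h = {};     // move-assigns a default-constructed temporary
    return h.is_null();
}
```

After handle h(3), the call reset_and_check(h) leaves h in the same state as a freshly default-constructed handle.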

For a similar example consider this usage of std::thread:

#include <cassert>
#include <thread>
#include <utility>                      // std::move

void job();

void run_job()
{
    std::thread t;                      // not-a-thread (null-state)
    std::thread u;                      // another one
    assert (!t.joinable());             // test for null-state
    assert (t.get_id() == u.get_id());  // null-states compare equal

    t = std::thread{ &job };            // assign a valid thread
    assert (t.joinable());
    assert (t.get_id() != u.get_id());  // null-state != valid thread

    u = std::move(t);                   // u is a thread, t becomes null
    assert (u.joinable());
    assert (!t.joinable());
    assert (t.get_id() != u.get_id());

    u.join();                           // put back to null-state
    assert (!u.joinable());
    assert (t.get_id() == u.get_id());  // two null-states

    t.join();                           // ERROR: join not allowed on null-object (throws std::system_error)
}

Here, the interface is somewhat similar. The default constructor creates an object of type thread that does not represent a valid thread. Function joinable tests if the object represents a valid thread. The name for testing for a null-state is not standard; but here again, we do not need to check for null-state, because it is us who decide when objects t and u become null, and when they become valid threads.

For the third example, consider any STL iterator:

std::vector<int>::iterator it;

What is the value of the default-constructed vector iterator? The C++ standard calls it a singular value. It does not point to any vector element. It does not point to the one-past-the-last element. We cannot even check if it is singular, because we cannot compare it to any vec.end() and there is no is_singular function. So what can we do with it? We can assign it a non-singular (normal) value from another iterator:

it = vec.begin();

A singular iterator is another example of a null-state; it is a bit different from the previous ones because we cannot query for ‘nullness,’ yet it is still a null-state. This also shows that being able to check for null-state is not critical.

Properties of null-state

In general, there are only two useful operations that you can successfully perform on such an object: assign it a normal (non-null) value and safely destroy it. I say ‘in general’ because for particular types you can perform some more operations (not necessarily that useful): check for null-value, assign another null-value, use the object to assign the null-value to another one. But we will focus only on the first two.

Note that this applies also to raw pointers. A default-initialized raw pointer (for instance, a local variable declared as int* p;) does not store a null-pointer value. It is safest to assume that it may contain a random address. Yet it still holds that you can safely assign it a valid address, and you can safely destroy it: a raw pointer has a trivial destructor.

In order to explain what ‘safely destroy’ means, we will show an example of how we can destroy an object unsafely.

template <typename T>
class dumb_ptr
{
    T * ptr_;
public:
    dumb_ptr() = default;
    ~dumb_ptr() { delete ptr_; }
    // other pointer interface
};

Default constructor uses the default implementation which calls the default constructor of the raw pointer, which does nothing: we end up with the raw pointer that contains a random address. But if our dumb pointer goes out of scope in this state, operator delete is called with the random address as argument. If we are unlucky, the randomly stored address happens to be a valid address in the free storage (heap) and we delete someone else’s object!

Note that our pointer is dumber not only than any standard smart pointer, but it is also dumber than a raw pointer: raw pointer does not release the memory from the stored address but at least it is safely destroyed when having a null-state. Our design error can be fixed in two ways: either we do not call delete in destructor and make our pointer similar to raw pointer, or we explicitly initialize ptr_ to nullptr:

template <typename T>
class smart_ptr
{
    T * ptr_ = nullptr;  // legal in C++11
public:
    smart_ptr() = default;
    ~smart_ptr() { delete ptr_; }
    // other pointer interface
};

Constructing an object with a null-value is usually cheap and does not involve resource acquisition. In general, such construction can be implemented by introducing a flag that indicates whether an object has a null-value and setting it accordingly, leaving all other members alone. The flag needs to be checked in destructor before launching non-trivial clean-up. Such null-value construction can usually be declared as constexpr which makes it possible to null-initialize objects at compile-time. Because no resources are involved, such null-constructor can be also declared as noexcept which may enable its usage in contexts which require that no exceptions are thrown (exception handling functions, destructors, ensuring commit-or-rollback guarantees).
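The flag-based approach described above might look like the sketch below. The class resource, its int handle and the releases counter are hypothetical and exist only for illustration. One caveat is an assumption of this sketch rather than something stated in the text: a C++11 constexpr constructor has to initialize every member, so handle_ gets a dummy value instead of being left alone.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical resource wrapper; an int stands in for a real handle.
class resource {
    bool engaged_;   // false means null-state
    int  handle_;    // meaningful only when engaged_ is true
public:
    static int releases;    // counts clean-ups, for demonstration only

    // Null construction: cheap, cannot throw, usable in constant initialization.
    constexpr resource() noexcept : engaged_(false), handle_(0) {}
    explicit resource(int h) : engaged_(true), handle_(h) {}

    resource(const resource&) = delete;
    resource& operator=(const resource&) = delete;

    // The flag is checked before any non-trivial clean-up is launched.
    ~resource() { if (engaged_) ++releases; }

    bool engaged() const noexcept { return engaged_; }
};

int resource::releases = 0;

static_assert(std::is_nothrow_default_constructible<resource>::value,
              "null construction must not throw");
```

Destroying a default-constructed resource performs no clean-up, while destroying resource(5) performs exactly one.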

If we create a null-object, it is most likely because we want to reset it to some meaningful value at some later time. Therefore, types capable of storing the null-state will provide a way to reset the value in their interface.

Standard interface

As we have seen in the above cases, the default constructor is often (but not always) used to create an object storing a null-value. And this brings us to the idiom I wanted to show. This is a natural choice because typically when we want to create a meaningful value we must supply some arguments to the constructor: in the case of int it would be a literal, in the case of a file handle it would be the file name, in the case of a lock it would be a mutex. Supplying no parameters usually means that no meaningful value can be created, so the null-value is the natural choice. Class designers in such cases have in fact a second option: not to provide a default constructor at all. This clearly states that we want no null-values of this type; this is useful for classes modeling scoped resource ownership (or, guards).

Thus, if you want to enable creation of null-value objects, you had better use the default constructor for that purpose. Note, however, that having a default constructor does not imply that it constructs null-values. There is a class of types that have a natural default state that is a valid value: variable-size containers. In these cases default construction creates an empty container.

On the other hand, it is typically a bad practice to use default constructor to create an arbitrary valid value. This is often the case in date/time libraries where default constructor creates today’s date and current time. I call it ‘bad’ because:

  1. Code that uses it loses referential transparency: each time the constructor is called it renders a different value.
  2. The default constructor, which is often used only so that the object can wait until it is assigned a proper value, is expensive: it makes system calls, uses globals, may deadlock, or throw exceptions.

Therefore a decent date/time library should offer a separate function for accessing today’s date/current time, and either disable the default constructor or have it create a null-date/time. This is what Boost.Date_Time does.
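A rough sketch of this design is shown below. The class date and its member today are hypothetical (this is not the Boost.Date_Time interface), and all-zero fields are assumed to encode the null-date.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical date type; month 0 is assumed to encode "not-a-date".
class date {
    int year_, month_, day_;
public:
    // Null-date: no system calls, cannot throw.
    constexpr date() noexcept : year_(0), month_(0), day_(0) {}
    date(int y, int m, int d) noexcept : year_(y), month_(m), day_(d) {}

    bool is_null() const noexcept { return month_ == 0; }
    int year() const noexcept { return year_; }

    // The clock is consulted only when the caller asks for it explicitly.
    static date today()
    {
        std::time_t now = std::time(nullptr);
        const std::tm* local = std::localtime(&now);
        if (!local) {
            return date();    // clock unavailable: stay in the null-state
        }
        return date(local->tm_year + 1900, local->tm_mon + 1, local->tm_mday);
    }
};
```

With this split, date d; is cheap and always yields the same value, and the cost of reading the clock is paid only in d = date::today();.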

The next important operation on null-able types is re-setting the value. You could use a member function reset, but as we have seen in the above examples the natural choice is to use an assignment operator. Even if your types are non-copyable (because they model a unique resource ownership), you can still use a move-assignment operator, as we showed in the case of std::ifstream. The semantics of move assignment are conceptually the same as if we called a destructor for the object and then constructed it in the same place again:

#include <memory>                       // std::addressof
#include <new>                          // placement new

void reset( MyType & x )
{
    x = MyType{2};                      // same as:

    x.~MyType();                        // force destructor
    new (std::addressof(x)) MyType{2};  // force constructor
}

However the two are only equivalent ‘conceptually.’ The latter may be less efficient and is not exception safe.
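The exception-safety difference shows up when the constructor can throw. In the destroy-then-construct variant the old object is already gone when the exception is raised, so x is left without a live object, and its eventual destruction would be undefined behavior. With assignment, the temporary is built before x is touched. A minimal sketch, assuming a hypothetical type widget whose constructor rejects negative values:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical type whose constructor rejects negative values.
class widget {
    int value_;
public:
    explicit widget(int v) : value_(v)
    {
        if (v < 0) throw std::invalid_argument("negative value");
    }
    int value() const noexcept { return value_; }
};

// Assignment-based reset: the temporary is constructed first, so a
// throwing constructor leaves `w` unchanged.
bool try_reset(widget& w, int v)
{
    try {
        w = widget{v};
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

After a failed try_reset(w, -1), w still holds its previous value and can be used and destroyed normally.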

As a special case of this assignment, let’s recall the already mentioned idiom x = {}, which sets the value of x to a null-state.


There is more to be told about null-state concept, but I will leave it to my next post.
