Efficient optional values

In this post we will see a library for storing nullable/optional values that can in certain situations replace and outperform Boost.Optional.

What do you use Boost.Optional for? In my experience, the answer was typically one of the following:

  1. Because my type T has no null state (like -1) that would indicate that I have no proper value.
  2. Because, I need to perform a two-phase initialization (I cannot initialize my T yet, but I already need it alive).
  3. Because I need an interface that would indicate to the type system that my value may not be there and that its potential absence should be checked by the users.

In this post we will focus exclusively on the third motivation. We sometimes want to change the interface of T, and not to treat all its values equally, in order to avoid bugs, as described in this post. Sometimes we want to change the type in order to affect the overload resolution, and select a desired overload. I have often seen it used in XML processing frameworks: if an XML attribute is represented as int in the model, it is simply added when generating the XML attribute:

template <typename T>
void output_attribute(output& out, const T& attr)
{
  out.print_attribute(attr);
}

But when the attribute is of type optional<int>, we can add a clever overload that can skip the outputting of an optional attribute:

template <typename T>
void output_attribute(output& out, const optional<T>& attr)
{
  if (attr)
    output_attribute(*attr);
}

However, this application of boost::optional comes with a cost. We now need to store an additional bool that indicates if we have a valid value. Conceptually, boost::optional<T> can be represented as:

template <typename T>
class optional
{
  bool _initialized;
  std::aligned_storage_t<sizeof(t), alignof(T)> _storage;
};

The consequences of such memory layout have been described in detail here. In short, for scalar types, wrapping them into boost::optional doubles the memory usage.

Because of this problem, in performance-critical parts of the program, it may be more appropriate to resort to using “magic values”, like -1 in type int to represent absent values. With this you will bring back the risk of bugs and misunderstandings into the project.

In your program, when you know that -1 can never be a valid value of int, you can combine both type safety and run-time efficiency by defining a custom type:

class opt_int
{
  int _value;

public:
  constexpr opt_int() : _value{-1} {}
  constexpr opt_int(int i) : _value{i} {}

  constexpr bool has_value() const { return _value != -1; }
  constexpr int value() const { return _value; }
  // precondition: this->has_value()
};

Of course, if you had to write such a class for every optional type T you have, or for any magic value applicable in a given context, this burden might outweigh the gain in type safety. Therefore, our solution needs to be as easy to use as possible. We would need a class template, similar to boost::optional:

compact_optional<int> oi;

However, it is not clear how we tell it that it should use -1 to represent a missing value?

One option would be to introduce a type trait that would associate a type with its “special” value. But it is not obvious which value should be selected, and in different places, even within the same program, we may want to use different values for an empty int.

Provide this value as the second template parameter?

using opt_count = 
  compact_optional<int, -1>;

using opt_int = 
  compact_optional<int, std::numeric_limits<int>::min()>;

This seems to work, and has a number of benefits. First, we do not have to store the value anywhere, it is encoded in the type. Second instantiations with different special values are different types and cannot inter-operate with one another: opt_count and opt_int represent something different, and whoever wanted to use one in place of the other, has likely made a mistake — now detected at compile-time.

But while it will work for scalar integer types, it will not work for user-defined types like std::string or type code (described in this post).

Instead, what we can do as a generic solution, as described by Vitali Lovich here, is to provide a policy parameter for constructing and detecting the special value:

using opt_count = compact_optional<int, my_int_policy>;

What we expect of a policy class is that it defines two static member functions:

struct my_int_policy
{
  static int empty_value() { return -1; }
  static bool is_empty_value(const int& v) { return v == -1; }
};

The first function creates the special value, while the second checks the current value for being special. While in our case the second function could be implemented in terms of the first, in general case we can provide a faster implementation for the second one. Consider storing an optional string, where empty string is a legitimate value, but we can use "\0\0" for representing the special value (we do not expect null characters inside strings). The policy for handling this can look like this:

struct my_string_policy
{
  static std::string empty_value() {
    return std::string("\0\0", 2);
  } // may allocate

  static bool is_empty_value(const std::string& v) {
    return v.compare(0, v.npos, "\0\0", 2) == 0;
  } // no alloc
};

Class template compact_optional

And thus, we have outlined a design for a generic library capable of representing optional T at the expense of sacrificing one value of T, but with no spacial overhead. You can find the full implementation thereof here. It is supposed to be a replacement for “hidden optional values”, where one magic value is stored in int; as well as for boost::optional<int>, where space savings are desired.

Let me give you a short overview of this tool.

1. It has an interface different than that of boost::optional. This is so for two reasons. First, its semantics differ from boost::optional, so we want to avoid the confusion. Second, this library has strong focus on being unambiguous and explicit at the expense of scarifying some convenience: this results in different trade-offs in the interface. As in the examples above, you check if the value is present using member function has_value() and read the value using const member function value():

opt_int oi;           // no value
opt_int oj (2);       // with initial value 

if (oi.has_value())   // test for having value
  int i = oi.value(); // reading value

oi = opt_int(3);      // set new value
oj = opt_int();       // set to no-value

This, at the expense of certain flexibility, avoids two gotchas of boost::optional. With the latter, many programmers incorrectly assume that the following test:

optional<bool> ob = true;
if (ob) ...

checks if we have bool with value true. And also, because there is no implicit conversion from T, we avoid this gotcha. Second, there is no way to get the write access to the contained value. This is important, because unlike in optional, in our type compact_optional the state of having no value is part of the contained value. boost::optional guarantees the following:

if (oi) {      // contains value
  *oi = some_value();
  assert (oi); // still contains value
}

compact_optional could not guarantee that.

2. Because from the no-value policy argument we can deduce what T is being stored, there is no need to provide the first template argument: you only need to provide the policy:

using opt_int = compact_optional<evp_int<int, -1>>;

Here, evp_int is a policy template: we instantiate it by providing the T and its value that would represent the no-value state. It will work for some integer types. For other types you will have to define your own policy.

3. In case you want to use boost::optional or std::experimental::optional (with their other useful features) but with the interface of compact_optional a special policy is provided:

using opt_bool_policy = evp_optional<boost::optional<bool>>;
using opt_bool = compact_optional<opt_bool_policy>

So, in the general case, the type stored internally, can be other than T, provided that it can store no less states than T. In a similar way, we provide a policy that can store a bool in the capacity of the sizeof of 1.

4. No relational ops are provided. If you really need to compare this types, you need to provide your own, and decide how you want to handle the no-value state.

5. Similarly to what we did for class template code in the other post, we provide the second parameter that serves only to discriminate two otherwise identical types:

using opt_count = 
  compact_optional<evp_int<int, -1>, class cnt_tag>;

using opt_num = 
  compact_optional<evp_int<int, -1>, class num_tag>;

Both, opt_count and opt_num are ints where -1 indicates a no-value state, but they are different types, and using one in place of the other is a type-system error.

Comparison with Boost.Optional

So far, I have been advertizing compact_optional, so one might get an impression that it is superior to boost::optional (or std::experimental::optional). But this is not the case. The former offers an advantage only in certain situations. Let’s go over their differences.

1. boost::optional<T> is generic: it will work practically with any T, to the extent that it is even possible to use type optional<optional<T>>. With compact_optional you have to make the choice case-by-case how you want to represent the no-value state. Having a nested compact_optional is technically possible, but it requires sparing two values of T.

2. boost::optional<T> gives you a useful guarantee that if you initialize it to a no-value state, no object of type T is created. This is useful for run-time performance reasons and allows a two phase initialization. In contrast, compact_optional upon construction, immediately constructs a T.

3. Some type T may not have a ‘spare’ value to indicate the empty state. In such case, compact_optional cannot help you. In contrast, boost::optional<T> will work just fine: the additional empty state is stored separately. In a way, boost::optional<T> can be thought as boost::variant<boost::none_t, T>.

4. boost::optional<T> is almost a container, capable of storing 0 or 1 elements. When it stores one element, you can alter its value. It is impossible in compact_optional: its value and the “container’s size” are one thing. But in exchange, the latter, because it only provides a non-mutable access to the contained value, it can build the return value on the fly, similarly to the proxy reference in std::vector<bool>, but because we only provide a non-mutable reference, it is much simpler and safer. This is why we can provide a compact_optional for type bool (which has no spare value) with the sizeof of a single char.

5. compact_optional has a minimalistic interface: this is also to avoid any surprises. As a cost it is less convenient. Unlike boost::optional, it does not have operator==: you have to provide your own comparator and decide yourself how you want to compare the no-value states.

In general, it is expected that in the first pass of the implementation of your program, you will use boost::optional<T> as a simple ready-to-use solution. Later, if you determine that boost::optional kills your performance, you can think of replacing it with compact_optional and how you want to represent the no-value state.

Acknowledgements

Vitali Lovich suggested to use a template parameter for representing the no-value policy (see here). This really improved the design of the library. Vitali has implemented a variation of std::experimental::optional that implements a no-value policy design; it is here.

Advertisement
This entry was posted in programming and tagged , , , , . Bookmark the permalink.

9 Responses to Efficient optional values

  1. In the other direction, towards simplicity at the possible cost of efficiency, one can use a `std::vector` as value carrier for a very simple Optional class. That avoids dependency on Boost or alternatively a complicated home-brewed implementation, while still reaping the benefits of exception-free but still guaranteed correct normal case code (only exception when one forgets to check, as is proper). Oh, while I’m commenting, let me just give credit for the general concept of possible-value-class back to Barton and Nackman, namely the Fallible class in their book Scientific and Engineering C++.

  2. ciechowoj says:

    Hi.
    I was reading your implementation of std::optional. What is the reason for having two base classes: `optional_base` and `constexpr_optional_base` ?

    • Thanks for this question. This is to implement the requirement of optional’s destructor:

      Effects: If is_trivially_destructible_v<T> != true and *this contains a value, calls val->T::~T().

      Remarks: If is_trivially_destructible_v<T> == true then this destructor shall be a trivial destructor.

      Look at destructor in constexpr_optional_base, it is trivial.

  3. Francisco Paisana says:

    What I find scarier about the use of std::optional is this:
    > struct opt_int {std::optional value;}; –> size==8
    > struct opt_int2 {int v;bool v_defined;}; –> size==8
    > struct opt_int3 {std::optional value; std::optional value2;}; –> size==16
    > struct opt_int4 {int v;int v2;bool v_defined;bool v2_defined}; –> size==12
    So the more std::optionals I use per struct, the less compact my struct will become. std::optional is unusable for me because of this.

    • Kilian says:

      The reason, opt_int2 has a size of 8 is because of the padding bytes, the compiler inserts at the end. When you change the members within opt_int4 to {int v; bool v_defined; int v2; bool v2_defined; }; then you end up with size==16, too.

  4. Chinmay says:

    Hi
    Could you explain the reason for this : “But while it will work for scalar integer types, it will not work for user-defined types like std::string or type code (described in this post).”?

    • Sure. The quoted remark applies to the following example:

      compact_optional<int, -1>;
      

      We pass -1 as (non-type) template parameter. This is in the context of C++11. Passing values as template parameters is restricted to scalar types. You cannot pass a user-defined type like:

      compact_optional<std::string, "cat"s>; // compiler error!
      

      C++20 allows values of some restricted user defined types as template arguments, but the above example with std::string would still not work.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.