Rvalues redefined

In this post we will explore what I consider the most significant language change in C++17. I call it the most significant because it changes the way you design your resource-managing types and how you think about initialization. It is often called “guaranteed copy elision”, but I will not use that name (except for this single time) because it does not reflect what this feature is. C++ has completely changed the meaning of rvalue (actually, prvalue).

In C++98 the meaning of the following definition

X f()
{
  return X(0);
}

is that we are creating, inside the function, a temporary object of type X. Next, we use this object to copy-initialize another temporary object that will outlive the function. Once the function has finished, that second temporary can be used to copy-initialize the destination object. The compiler is allowed to elide the copying and behave as if the three objects were actually one and the same. But conceptually the temporaries and the copying are there, as we can observe by declaring the copy constructor as private: the function then fails to compile.

C++11’s move constructor changes the game here, because you can have something copy-like enough to be used for transferring guts from a temporary to another object, but which does not have to clone the state of the object: it alters the source temporary so that it is left in a no-resource state. But this solution comes with costs.

First, we are still dealing with temporaries and while moves can be elided, conceptually they are still there, and you can see it when the move constructor is declared as deleted: returning by value will not compile.

Second, while moving is often faster than copying, it still takes time to form the no-resource state in the source object, and sometimes the move cannot be elided.

Finally, moving requires the existence of the moved-from state (or the no-resource state), which weakens the class invariants, as described in this post. An object of a type without move semantics can always represent a session with an acquired resource; an object of a type with move semantics represents either a session with an acquired resource or a no-session state, and we may have to check which one it is.

To some extent we can work around it with hacks. For instance, in C++11 there is a way to sort of return by value a mutex from a function. std::mutex is a non-movable type — the most relevant part in a mutex is its address, so we cannot change it with a potential move. But we can do this:

std::mutex make_mutex()
{
  return {};
}

std::mutex&& m = make_mutex();

In the return statement we do not have a return with an object, but a return with an initializer. The syntax {} does not designate a temporary. It only says how the temporary object outside the function will be initialized. Then, in the last line, we do not initialize an object, but an rvalue reference: it can bind to a temporary, but is itself an lvalue. The reference also extends the lifetime of the temporary, so it is almost as if we were declaring an object.
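The lifetime extension can be observed with a small stand-in type. This is only a sketch: Pin, live_pins and live_while_bound are hypothetical names that play the role of std::mutex here, chosen so that the effect can be counted.

```cpp
#include <cassert>

// Hypothetical stand-in for std::mutex: neither copyable nor movable.
int live_pins = 0;   // number of Pin objects currently alive

struct Pin {
    Pin() { ++live_pins; }
    ~Pin() { --live_pins; }
    Pin(Pin const&) = delete;
    Pin(Pin&&) = delete;
};

// Returns an initializer, not an object, so this is valid even in C++11
// although Pin cannot be moved.
Pin make_pin() { return {}; }

// Reports how many Pin objects are alive while the reference is in scope.
int live_while_bound() {
    Pin&& p = make_pin();   // binds to the temporary and extends its lifetime
    (void)p;                // p is an lvalue naming that temporary
    return live_pins;
}
```

Under these assumptions the temporary stays alive for as long as the reference is in scope and is destroyed when the scope ends.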

But it is still a hack, with its limitations.

C++17 extends this to an elegant solution. Now the same definition as in the initial example:

X f()
{
  return X(0);
}

has a different meaning: an object will be created, with 0 as argument, but it is not yet clear what object. The function does not return a temporary. It does not create a temporary inside. It simply returns a “recipe” telling how the final object should be constructed. And if we call it like this:

X x = f();

It will be object x that will be created using this recipe. It is the only object of type X that will be created. It is equivalent to calling:

X x(0);

There are no temporaries involved. The type does not have to be movable. Similarly, we can return a mutex like this:

std::mutex make_mutex()
{
  return std::mutex{};
}

std::mutex m = make_mutex();
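The claim that exactly one object is created can be checked with an instrumented type. A minimal sketch, assuming a C++17 compiler; Tracked, make_tracked and count_constructions are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical instrumented type: counts constructor calls and can be
// neither copied nor moved.
struct Tracked {
    static inline int constructions = 0;
    int value;
    explicit Tracked(int v) : value(v) { ++constructions; }
    Tracked(Tracked const&) = delete;
    Tracked(Tracked&&) = delete;
};

// Returns a prvalue: a "recipe" saying "construct a Tracked from 0".
Tracked make_tracked() { return Tracked(0); }

// Initializes one object from the recipe and reports the constructor count.
int count_constructions() {
    Tracked::constructions = 0;
    Tracked t = make_tracked();   // t itself is constructed with argument 0
    (void)t;
    return Tracked::constructions;
}
```

With the copy and move constructors deleted, this compiles only from C++17 onwards, and the counter shows a single construction.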

Thus, we can return by value objects that are not movable. Or conversely, we can make our objects non-movable, and still retain the possibility to return them by value in some cases. Why would we want to do it? To make the invariants of our resource-managing classes stronger. Going back to the example from the other post, if a type is movable, it will look more or less like this:

#include <sys/socket.h> // Linux header
#include <unistd.h>     // Linux header
#include <stdexcept>
#include <utility>      // std::exchange
 
class Socket
{
  int socket_id;
 
public:
  explicit Socket()
    : socket_id{ socket(AF_INET, SOCK_STREAM, 0) }
  {
    if (socket_id < 0)
      throw std::runtime_error{translate(socket_id)};
  }
 
  Socket(Socket&& r) noexcept
    : socket_id{ std::exchange(r.socket_id, -1) }
  {}
 
  bool is_valid() const { return socket_id != -1; }
 
  // class invariant: !is_valid() || id() >= 0
   
  ~Socket()
  {
    if (is_valid())
      close(socket_id);
  }
 
  int id() const { return socket_id; }
  // precondition:  is_valid()
  // postcondition: return >= 0
 
  Socket(Socket const&) = delete;
};

In the move constructor, since we are stealing the resource, we have to give something in exchange so that the source object knows it no longer represents a session with a resource: we set the value -1. Now objects of type Socket may or may not represent a session, so we have to add an observer function, is_valid(), that tells which state we are in. The invariant stated in the comment below it is weak: the object’s lifetime is not identical with the duration of the session. Now all the functions need to take into consideration what the object should do if it does not represent a session. We can see this in the destructor and in function id(). In the destructor we have an if-statement. In function id() we have a precondition: the function trusts that we will never call it on a no-session object, but there is a risk that we might.
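The weak invariant is easy to observe from the outside. The sketch below replaces the real socket with a counter so that it can run anywhere; MovableHandle and open_handles are hypothetical names, not part of the Socket class above.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the movable Socket: the "resource" is a counter.
int open_handles = 0;

class MovableHandle {
    int id_;
public:
    MovableHandle() : id_(++open_handles) {}          // acquire; ids start at 1
    MovableHandle(MovableHandle&& r) noexcept
        : id_(std::exchange(r.id_, -1)) {}            // source becomes "no session"
    MovableHandle(MovableHandle const&) = delete;
    ~MovableHandle() { if (is_valid()) --open_handles; }

    bool is_valid() const { return id_ != -1; }
    int id() const { assert(is_valid()); return id_; } // precondition checked at run time
};

// Returns true when the moved-from object is observably in the no-session state.
bool source_is_empty_after_move() {
    MovableHandle a;
    MovableHandle b{std::move(a)};
    return !a.is_valid() && b.is_valid();
}
```

After the move, a is still a live object that any code can reach, but it no longer represents a session. That is exactly the state every member function has to account for.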

In contrast, if we drop the support for moving, the class design is simpler and less bug-prone:

class Socket
{
  int socket_id;
 
public:
  explicit Socket()
    : socket_id{ socket(AF_INET, SOCK_STREAM, 0) }
  {
    if (socket_id < 0)
      throw std::runtime_error{translate(socket_id)};
  }
 
  Socket(Socket&& r) = delete;
 
  // class invariant: id() >= 0
   
  ~Socket()
  {
    close(socket_id);
  }
 
  int id() const { return socket_id; }
  // postcondition: return >= 0
 
};

No move constructor: no way to get the value -1. The invariant is strong: if you have access to the object, the session with the socket is in progress; always. There is no need to check it in the destructor, no precondition on function id(), and you just cannot call it on an object not bound to a session. Hardly anyone designed their resource-managing classes like this, because until C++17 they could not be returned from functions by value. Now we can do it!
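The same design can be sketched with a counter instead of a real socket, which makes the properties checkable at compile time and at run time. Session, open_sessions and open_session are hypothetical names; a C++17 compiler is assumed.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the non-movable Socket.
int open_sessions = 0;

class Session {
    int id_;
public:
    Session() : id_(++open_sessions) {}   // acquire; ids start at 1
    Session(Session&&) = delete;          // also suppresses the copy constructor
    ~Session() { --open_sessions; }       // unconditional: a session always exists
    int id() const { return id_; }        // no precondition needed
};

static_assert(!std::is_move_constructible_v<Session>);
static_assert(!std::is_copy_constructible_v<Session>);

// Still returnable by value, because the operand of return is a prvalue.
Session open_session() { return Session{}; }

// Reports how many sessions are open while one object is in scope.
int sessions_open_inside_scope() {
    Session s = open_session();
    (void)s;
    return open_sessions;
}
```

The destructor has no branch and id() has no precondition, yet the factory function still compiles.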

This does not make non-movable types work for every factory function, though. We can initialize and return our Socket instance like this:

Socket make_socket()
{
  return Socket{};
}

But we cannot do this:

Socket make_socket()
{
  Socket s {};
  prepare_socket(s);
  return s;
}

This is because returning the named object s still requires a (possibly elided) move constructor; the new rule applies only to prvalues. A prvalue, informally, is an rvalue in the C++03 sense: usually either a literal, or a call to a function returning by value, or a type name followed by parentheses or braces with arguments: a recipe specifying how some future object will be initialized. A const recipe is the same as a non-const recipe, therefore initializing a const object from a non-const prvalue, or vice versa, works fine, like here:

Socket make_socket() 
{ 
  return Socket{}; 
}

const Socket new_socket()
{
  return make_socket(); // func call returning by value
}                       // is still a prvalue

Socket s = new_socket();

You can return more than one recipe from a function:

const Socket select_socket(bool cond)
{
  if (cond) return Socket{};
  return make_socket();
} 

In case you are wondering how a compiler can implement this: function select_socket(), when called, is passed an additional pointer that indicates the location where the destination object is going to be created, and it initializes the object in that place using the recipe from the prvalue. Whoever calls select_socket() to initialize their object passes the address of this to-be-object to function select_socket().
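One way to see this from within the language is to record the address that the constructor sees and compare it with the address of the destination variable. A minimal sketch, assuming C++17; Pinned and the functions around it are hypothetical names mirroring make_socket() and select_socket().

```cpp
#include <cassert>

// Hypothetical non-movable type that remembers where it was constructed.
struct Pinned {
    const Pinned* self;             // address seen by the constructor
    Pinned() : self(this) {}
    Pinned(Pinned const&) = delete;
    Pinned(Pinned&&) = delete;
};

Pinned make_pinned() { return Pinned{}; }

// Two different recipes, and a const return type, as in select_socket().
const Pinned select_pinned(bool cond) {
    if (cond) return Pinned{};
    return make_pinned();
}

// True when the constructor ran directly in the storage of the variable p.
bool constructed_in_place(bool cond) {
    Pinned p = select_pinned(cond);
    return p.self == &p;
}
```

Whichever branch is taken, the constructor runs in the storage of p, even though the recipe passes through one or two function returns on the way.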

A recipe can be transferred up, like inside function select_socket() but ultimately some object will be initialized with it. If you do not designate one, like here:

int main()
{
  make_socket();
}

a temporary object will be created. Similarly here:

int main()
{
  return make_socket().id();
}
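The materialized temporary behaves like any other temporary: it is created once and destroyed at the end of the full expression. A small sketch with hypothetical names (Probe, make_probe, call_on_temporary) that counts both events:

```cpp
#include <cassert>

int constructed = 0;
int destroyed = 0;

// Hypothetical non-movable type that counts its constructions and destructions.
struct Probe {
    Probe() { ++constructed; }
    ~Probe() { ++destroyed; }
    Probe(Probe&&) = delete;
    int id() const { return 42; }
};

Probe make_probe() { return Probe{}; }

// No destination object is named, so a temporary is materialized in order to
// call id() on it; it is destroyed before the function returns.
int call_on_temporary() {
    return make_probe().id();
}
```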

More than just return by value

This feature, which could be called “prvalues without temporaries”, can be used to solve another problem: conditional initialization. In order to illustrate it, we need to first change our Socket class once more. Because we can now afford to return by value without any move constructor, instead of providing a public constructor, we will only allow instances to be created through factory functions:

class Socket
{
private:
  explicit Socket(int AddressFamily);
  Socket() = delete;
  Socket(Socket&&) = delete;
 
public:
  static Socket make_inet() { return Socket{AF_INET}; }
  static Socket make_unix() { return Socket{AF_UNIX}; }

  // ...
};

This is superior to constructors, because now we can have two functions with an identical set of parameters (an empty set in our case) that perform different initialization. Now, suppose we want to use our Socket inside class Client:

class Client
{
  Socket _socket;  

public:
  explicit Client (Params params);
};

Params contains a data member isUnixDomain. Based on this parameter, we want to use one factory function or the other. We can do it like this:

Client::Client(Params params)
: _socket(params.isUnixDomain ? Socket::make_unix()
                              : Socket::make_inet())
{}

And this just works. No move constructor is needed, because only one object is initialized: _socket. This syntax was already correct before C++17, but previously it required a move.
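A self-contained version of this pattern can be sketched without the socket headers. Endpoint, Options and Link are hypothetical stand-ins for Socket, Params and Client, and the values 1 and 2 are placeholder tags rather than real address-family constants; C++17 is assumed.

```cpp
#include <cassert>

// Hypothetical stand-in for the factory-only, non-movable Socket.
class Endpoint {
    int family_;
    explicit Endpoint(int family) : family_(family) {}
public:
    Endpoint(Endpoint&&) = delete;
    static Endpoint make_inet() { return Endpoint{2}; }   // placeholder tag
    static Endpoint make_unix() { return Endpoint{1}; }   // placeholder tag
    int family() const { return family_; }
};

struct Options { bool isUnixDomain; };

class Link {
    Endpoint endpoint_;
public:
    // Both operands of ?: are prvalues of the same type, so the result is a
    // prvalue too, and endpoint_ is the only Endpoint ever constructed.
    explicit Link(Options o)
        : endpoint_(o.isUnixDomain ? Endpoint::make_unix()
                                   : Endpoint::make_inet())
    {}
    int family() const { return endpoint_.family(); }
};
```

The member is initialized directly by whichever factory the condition selects.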

Unfortunately, while it works for initializing member subobjects, the Standard is not clear whether the same thing should work for initializing base classes and for delegating constructors. GCC does implement prvalues without temporaries in constructor delegation, but this may turn out to be non-portable.

What if we wanted to emplace our Socket in a std::vector? This would not work because adding a new element might cause the vector to grow, and this requires moving the elements around. But what if we wanted to emplace a Socket in a container that doesn’t grow in this way? Let’s try to implement our own: a simplified version of std::optional: we provide a raw storage for an object of type T. By default no object is allocated, and then later we can emplace data inside the storage:

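#include <cassert>      // assert
#include <new>          // placement new
#include <type_traits>  // std::aligned_storage_t
#include <utility>      // std::forward

template <typename T>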
class Opt
{
  std::aligned_storage_t<sizeof(T), alignof(T)> _storage;
  bool _initialized = false;
  
  void* address () { return &_storage; }
  T* pointer() { return static_cast<T*>(address()); }

public:

  Opt() = default;
  Opt(Opt&&) = delete;
  ~Opt() { if (_initialized) pointer()->T::~T(); }
 
  template <typename... Args>
  void emplace(Args&&... args) 
  {
    assert (!_initialized);
    new (address()) T(std::forward<Args>(args)...);
    _initialized = true;
  }
};

For the purpose of our discussion we make Opt non-movable, because we intend to store a non-movable T. The placement new inside emplace() also works without creating temporaries when it is given a prvalue. However, function emplace() takes its arguments by reference, so a temporary has to be created to bind to the reference, and constructing the stored object from it would require a move. So, the following will not work:

Opt<Socket> os;
os.emplace(Socket::make_inet()); // error

But, we can work around this by creating a temporary of a different type than Socket: with a conversion operator to Socket that will create a prvalue directly in the in-place construction. Here is how we can implement it:

template <typename F>
class rvalue
{        
  F fun;
public:
  using T = std::invoke_result_t<F>; 
  explicit rvalue(F f) : fun(std::move(f)) {}
  operator T () { return fun(); }
};

Metafunction invoke_result_t is the C++17 replacement for std::result_of. The construct invoke_result_t<F> means the type of the result of invoking a function-like object of type F with no arguments. With this tool in place, we can emplace a Socket in our container like this:

Opt<Socket> os;
os.emplace(rvalue{&Socket::make_inet});

Let me explain. We are creating an object of type rvalue<F>. F is deduced from the argument. This is another feature of C++17 called class template argument deduction. The initialization only stores a pointer to a function. We can create temporaries of this type as they are cheap and movable. But in the in-place initialization that takes place inside emplace() an object of this type is converted to Socket. Only inside this conversion do we call the factory function and produce a prvalue that is only used to initialize the object in the raw storage of the optional object.
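The deduction and the conversion can be checked in isolation. The sketch below repeats the rvalue template from above and adds a hypothetical non-movable Gadget; it uses copy-initialization rather than placement new, because that form is unambiguously specified to pick the conversion operator. Direct-initialization through a conversion function, as in emplace(), has been the subject of a core language issue (CWG 2327), so that form is worth testing on the compilers you target.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <typename F>
class rvalue {
    F fun;
public:
    using T = std::invoke_result_t<F>;
    explicit rvalue(F f) : fun(std::move(f)) {}
    operator T() { return fun(); }      // produces the prvalue on demand
};

// Hypothetical non-movable type with a factory function.
struct Gadget {
    int value;
    explicit Gadget(int v) : value(v) {}
    Gadget(Gadget&&) = delete;
    static Gadget make() { return Gadget{7}; }
};

// Class template argument deduction picks F = Gadget (*)().
using Wrapped = decltype(rvalue{&Gadget::make});
static_assert(std::is_same_v<Wrapped, rvalue<Gadget (*)()>>);
static_assert(std::is_same_v<Wrapped::T, Gadget>);

// The factory is called only inside the conversion, and its result
// initializes g directly.
int convert_wrapper() {
    Gadget g = rvalue{&Gadget::make};
    return g.value;
}
```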

We can get away with passing only a pointer because the function does not take additional parameters. In general, rather than passing a pointer we would pass a closure object:

Opt<Socket> os;
os.emplace(rvalue{[&]{ return Socket::make_inet(); }});
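A related variation, which is not what the post does, is to let the container accept the callable itself and invoke it inside the placement new. The sketch below assumes a hypothetical member emplace_from() on a simplified container Slot; the operand of the new-expression is then a prvalue of type T, so no conversion operator is involved.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Simplified, hypothetical variant of Opt with a factory-based emplace.
template <typename T>
class Slot {
    alignas(T) unsigned char storage_[sizeof(T)];
    bool initialized_ = false;

    T* pointer() { return std::launder(reinterpret_cast<T*>(storage_)); }

public:
    Slot() = default;
    Slot(Slot&&) = delete;
    ~Slot() { if (initialized_) pointer()->~T(); }

    // Constructs the element from whatever prvalue the factory returns.
    template <typename Factory>
    T& emplace_from(Factory&& make) {
        assert(!initialized_);
        T* p = ::new (static_cast<void*>(storage_)) T(std::forward<Factory>(make)());
        initialized_ = true;
        return *p;
    }
};

// Hypothetical non-movable element type.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
    Widget(Widget&&) = delete;
    static Widget make() { return Widget{5}; }
};

int emplace_non_movable() {
    Slot<Widget> slot;
    return slot.emplace_from([] { return Widget::make(); }).value;
}
```

The trade-off is that the container needs a dedicated member function, whereas the rvalue wrapper works with any existing perfect-forwarding emplace().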

But there is more. We have said that it is impossible to emplace into a vector because it might move elements around while growing. Moving elements around would not require a move constructor, though, if we had a destructive move. And with C++17’s prvalues, we can implement the library part of the destructive move.

In order to do this, we require every type T that wants to be destructively moved to provide a function that can be found through ADL:

T destructive_move(T& old) noexcept;

(This is somewhat similar to how swap is used: if you want your type to be swapped efficiently, provide an overload for swap for your type.)

The semantics of destructive_move are: once this function is called on a piece of storage representing an object of type T, the object is considered destroyed: no destructor must be called for it, and another prvalue (“recipe for creating an object”) is returned.

Whereas a move constructor for some types may need to throw exceptions, a destructive move operation should never need to. We therefore require that it never throws.

The requirement on not calling the destructor only makes sense for container-like types that manage the life-time of objects manually. This is the case for our type Opt. Let’s add member function eject to its interface. It will return the contained object by value, and leave the optional object valueless:

template <typename T>
class Opt
{
  // ...

public:
  // ...

  T eject()
  {
    assert (_initialized);
    _initialized = false;
    return destructive_move(*pointer());
  }
};

Function eject() returns a prvalue by value, that is, it returns a recipe. It marks the optional object as not containing a value. T’s destructor is not called. It is assumed that destructive_move() does anything that is required to consider the object destroyed. From now on, the life-time of the contained object is finished.

What might the implementation of destructive_move() for our Socket look like? Let’s see the rewritten class Socket. We will then explain what is going on:

class Socket
{
  int socket_id;
  // class invariant: id() >= 0

  struct destructive_t {}; // for tagging a special ctor
    
  explicit Socket(int AddressFamily)
    : socket_id{ socket(AddressFamily, SOCK_STREAM, 0) }
  {
    if (socket_id < 0)
      throw std::runtime_error{""};
  }
   
  explicit Socket(Socket& s, destructive_t)
    : socket_id{std::exchange(s.socket_id, -1)} 
  {
    s.Socket::~Socket();
  }

public:
  Socket(Socket&& r) = delete;

  ~Socket() {
     if (BOOST_LIKELY(socket_id != -1)) 
       close(socket_id); 
  }
  
  int id() const { return socket_id; }
  // postcondition: return >= 0
    
  static Socket make_inet() { return Socket{AF_INET}; }
  static Socket make_unix() { return Socket{AF_UNIX}; }
    
  friend Socket destructive_move(Socket& s) noexcept {
    return Socket{s, destructive_t{}};
  }
};

The empty class destructive_t is a tag that we use for tagging a new constructor.

The new “destructive” constructor (the one taking the tag) takes another Socket by lvalue reference. It is to some degree similar to a move constructor, but it goes further. It steals the contents from s (in this case the contents are only socket_id), it puts a not-a-socket id in their place (much like a move constructor would do), and immediately calls the destructor of s, which ends its lifetime. The destructor again needs to check for the special not-a-socket value before it calls close(). This looks like we are back to the type with a moved-from state, but this time it is different. The not-a-socket state can only be set by the special “destructive” constructor, which is private, and the next thing it does is to destroy the object holding the not-a-socket value. So apart from the destructor, no-one will observe this state. This safe-to-destroy state offers weaker guarantees than the moved-from “valid but unspecified” state.

We annotate the check with BOOST_LIKELY (this is a macro over GCC’s and clang’s __builtin_expect) which hints the compiler that unless there are other indications it should assume that the condition will evaluate to true. A similar annotation [[likely]] is a likely (pun intended) addition to the future revisions of C++ (see here).

In the destructive-move case the check will be optimized out by the compiler, as it is performed a couple of instructions after the socket id is set to -1. Our invariant is still declared as strong, although technically this is incorrect, because sometimes it will not hold when the destructor starts. This would have been cleaner if the language offered native support for destructive moves. In that case invoking a “destructive” constructor would be recognized as ending the lifetime of the object, and we would not need to call the destructor manually, and would not have to set the special value -1.

Our friend function destructive_move uses the “destructive” constructor in its returned prvalue. The contract of function destructive_move is: after it has been called, no attempt will be made to destroy the object referred to by the argument.

This is all we have to do to be able to eject a non-movable type from our optional:

Opt<Socket> os;
os.emplace(rvalue{&Socket::make_inet});
Socket s = os.eject();

And we can also emplace the ejected socket:

Opt<Socket> os, ot;
os.emplace(rvalue{&Socket::make_inet});
ot.emplace(rvalue( [&]{ return os.eject(); } ));

This shows how we can move around (to some extent) a non-movable object with a strong invariant. A similar technique could be used in stl2::vector.
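The whole mechanism can be exercised without real sockets. The following is only a sketch under stated assumptions: Token stands in for Socket, Box for Opt, and open_tokens for the operating system's table of open resources; all three names are hypothetical.

```cpp
#include <cassert>
#include <new>
#include <utility>

int open_tokens = 0;   // hypothetical stand-in for the set of open resources

class Token {
    int id_;
    struct destructive_t {};               // tag for the private constructor

    Token(Token& old, destructive_t) : id_(std::exchange(old.id_, -1)) {
        old.~Token();                      // ends the lifetime of the source
    }
public:
    explicit Token(int id) : id_(id) { ++open_tokens; }
    Token(Token&&) = delete;
    ~Token() { if (id_ != -1) --open_tokens; }
    int id() const { return id_; }

    friend Token destructive_move(Token& old) noexcept {
        return Token{old, destructive_t{}};
    }
};

// The no-throw requirement can be verified at compile time.
static_assert(noexcept(destructive_move(std::declval<Token&>())));

template <typename T>
class Box {
    alignas(T) unsigned char storage_[sizeof(T)];
    bool initialized_ = false;
    T* pointer() { return std::launder(reinterpret_cast<T*>(storage_)); }
public:
    Box() = default;
    Box(Box&&) = delete;
    ~Box() { if (initialized_) pointer()->~T(); }

    template <typename... Args>
    void emplace(Args&&... args) {
        assert(!initialized_);
        ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
        initialized_ = true;
    }

    T eject() {
        assert(initialized_);
        initialized_ = false;                  // Box will not destroy it again
        return destructive_move(*pointer());   // found through ADL
    }
};

// Moves a Token out of a Box into a local variable and returns its id.
int eject_roundtrip() {
    Box<Token> box;
    box.emplace(7);
    Token t = box.eject();
    return t.id();
}
```

Throughout the round trip exactly one resource is counted as open, and it is released once, when t goes out of scope.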

And that’s it for today. I would like to thank Tomasz Kamiński for explaining to me the significance and the potential of “prvalues without temporaries” feature.


23 Responses to Rvalues redefined

  1. Ian O'Shaughnessy says:

    You wrote “struct Opt” then later in the definition specify “public” visibility. Is that a mistake?

  2. JCAB says:

    When you say “returns a recipe”, would you say this a way of thinking about, or rather an actual concept from the standard?

    I ask this because “returning a recipe” evokes thoughts that wouldn’t really apply. There’s really nothing that can be done with this “recipe”, besides immediately using it to construct an object or “passing it along”. You can’t store it for later. This is what allows it to work with the classic implementation strategy of passing forward the location where to construct the returned object (where to apply the recipe), so no recipe is actually physically returned.

    Do you envision possible implementations that actually return the recipe by returning, say, a continuation?

    Nice post! Thought provoking!

    • When I say “returns a recipe”, I want the reader to move away as far as possible from thinking “temporary as usual, but elided more often”. It is not a concept from the standard. The Standard only describes a prvalue, which is an element in the abstract syntax tree, and then it says when a temporary object is materialized.

      I am not sure I understand what you mean by a recipe. But if I guess your meaning correctly, a closure object can be thought as a recipe that can be passed around:

      auto recipe = []{ return MyClass{size, "label"}; };
      // recipe can be passed around.
      

      Additionally, if you wrap it into a type that provides a conversion to MyClass, you can use this recipe in in-place construction, perfect forwarding, and probably more. This is what class template rvalue does. Maybe it should be called recipe.

      Is this something you are after? Or did you mean something else?

      • JCAB says:

        I was just trying to probe the language you’re using here, and chewing on ways to tie it to (and dissociate it from!) actual existing implementation details.

        “Recipe” is your terminology here. By using it I only mean whatever model my mind had built from it at the time :). So far it’s a model of double-thinking: on one hand, the clear idea that the caller is the one that “owns” the object that will be initialized, so “return the recipe” indicates the caller actually performs the construction. On the other hand, in practice, compilers today would have the caller pass a pointer to the storage, asking the function to “construct me an object right here”. I don’t expect this implementation reality to change any time soon, but it’s interesting and fun to consider what-if, hence the mention of “continuations”. And thinking in terms of continuations here helps ground the mental model.

        My suggestion to you is to try and tie the novel concepts you describe to actual implementation details. Some people out there just can’t abstract themselves from such details. That’s the sort of people that think of references as pointers with funky syntax, for instance. And I think there’s value in covering every angle. It’s a thought, anyway :-).

        Thanx!

        Nit: you are missing “[=]” in the reification of the recipe.

  3. Onduril says:

    Thank you for the interesting post.
    I wonder, shouldn’t destructive_move take a pointer to discourage the use on an object that would be automatically destroyed again later?

    • Indeed, having a pointer would be likely to suggest that something special is going on. On the other hand, inside rather than having a reference to an object we would have a pointer, which represents either an address of an object or a no-address (null pointer), and people will have to deal with it (probably by making not-null a precondition). Maybe it is worth it, but it is not immediately clear to me.

  4. Drew Dormann says:

    A small typo – you mention a std::mutes that certainly was meant to be std::mutex.

  5. Ernst says:

    What is the difference from RVO, which has been available for a long time, even before C++11, and which also constructs the return value outside of the callee?

    • The main difference is that in C++11, when you declare your move constructor as deleted (because you want to have a non-movable strong-invariant type), the compiler will give you a compile-time error when you try to return your type by value. In other words, before C++17, returning by value requires a copy or move constructor, even though the call is usually optimized away. In C++17 returning by value (of a prvalue) does not require a move constructor at all.

      People encounter this difference in practice, when they want to prevent returning by value by disabling the move constructor (declare it private or deleted). In C++11 when you do it, you are guaranteed that your types cannot be returned by value. This guarantee suddenly breaks in C++17.

  6. Davidbrcz says:

    Great, a new way to make code more subtle and harder to understand !

  7. einpoklum says:

    These insights alone are reason enough to switch to C++17!

    … If only nothing needed to be backwards-compatible 😦

    Thank you, Andrzej.

  8. Minor typo: potable for portable

    GCC does implement rvalue references without temporaries in constructor delegation, but this may turn out to be non-potable.

  9. Hi, Andrzej,

    I followed your article with interest. I’ve been going through guaranteed copy elision myself, mostly consulting Nicolai Josuttis’ C++ 17 (online book) and going through the appendix in C++ Templates, by David and Nicolai.

    One of the things that your post helps me with is in finding other scenarios beyond the “returning non-movable types by value without creating a temporary”.

    The examples you provided for emplace seem a little far fetched for me–it’s possible, but would one want to go through all that? I liked the “prvalue wrapper” that you provided. Granted, it requires a factory method but it does not seem to impose too much of an overhead on a given type.

    Nice article.

    • I am already missing something like this when initializing an optional resource using a factory:

      class MyClass
      {
        optional<Resource> r; // non-movable 
      public:
        explicit MyClass(with_resource)
        : r{in_place, Resource::create()} // DOES NOT WORK
        {}
      };
      

      This is because what we call “perfect forwarding” cannot perfectly forward prvalues anymore. So, I will need to rewrite the constructor to:

      class MyClass
      {
        optional<Resource> r; // non-movable 
      public:
        explicit MyClass(with_resource)
        : r{in_place, rvalue(&Resource::create)} 
        {}
      };
      

      Which looks more like a workaround, but it works, and it shows that we can (at least manually) perfect-forward prvalues; which in turn shows that implementing it in the future language should be possible.

  10. Guest says:

    Your reasoning is based on RAII. But beware, most of the time RAII is an illusion if one doesn’t throw in the destructor. Most closing and destroying functions, like the Linux close() you used and Windows CloseHandle(), return error values, which are really needed. If you don’t throw in the destructor, you can either completely ignore potential errors (Cache not written? Other people’s bad luck, Harrr Harrr!) or you need a function to close and destroy manually. close() member functions need a no-resource state, which is as good as a moved-from state.
