Lvalues, rvalues and references

Update. Having read this reply by Daniel Krügler and this comment by Howard Hinnant, I realized that my post was missing some important information. I have now cleared it up. The changes are highlighted in a bluish color.

If you search for it, the question “what is an lvalue” is asked fairly often. The short answers typically given are “an lvalue can appear on the left-hand side of an assignment” or “lvalues represent objects, rvalues represent values.” Neither is very precise or correct. If you are not satisfied with these, a full and meaningful answer is difficult to give, primarily because the names are confusing. To add to the confusion, C++11 introduces more terms: xvalue, glvalue, prvalue. In this post I will try to describe some of these lvalue-related concepts and show that they make sense and are fairly easy to understand once we take note of a few things.

Knowing what lvalues are did not use to be critical knowledge. You could successfully write good programs and use lvalues without even knowing that you were using them. This is less true now that we have rvalue references. In order to use techniques like move constructors you need to be able to tell lvalues from rvalues correctly.

The first thing to note is that being an lvalue or an rvalue, contrary to what the names suggest, is not a property of a value: it is a property of an expression. In other words, every single expression in the program is either an lvalue or an rvalue. This is a static property of the expression that can be determined at compile time.

Let’s look at some code:

unsigned int size( Node & node ) {
  return node.is_empty() ? 0 : 1 + size( node.left() ) + size( node.right() );
}

How many expressions can you see? One full expression in the return statement and a lot of sub-expressions: each of them can be categorized as an lvalue or an rvalue. node is an lvalue. Every name (of an object or of a reference) is an lvalue. node.is_empty() most probably returns a temporary of type bool. All expressions that return temporaries are rvalues. The expression node.left() most probably returns an lvalue reference to another node. Any expression that returns an lvalue reference is an lvalue.
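
The claim that the category is a compile-time property of an expression can be checked directly. The following is a minimal sketch, not a definitive recipe: it relies on the fact that decltype((expr)) yields a T& type exactly when expr is an lvalue. The Node struct is a hypothetical stand-in for the type in the example above (its members are assumptions), and IS_LVALUE is a helper macro invented for this illustration.

```cpp
#include <type_traits>

// Hypothetical stand-in for the Node type used in the text.
struct Node {
    Node* l = nullptr;
    Node* r = nullptr;
    bool is_empty() const { return l == nullptr && r == nullptr; } // returns by value
    Node& left()  { return *l; }                                    // returns an lvalue reference
    Node& right() { return *r; }
};

// decltype((expr)) is an lvalue reference type exactly when expr is an lvalue.
// The operand is never evaluated, so nothing is actually called here.
#define IS_LVALUE(expr) (std::is_lvalue_reference<decltype((expr))>::value)

inline void check_categories(Node& node) {
    static_assert(IS_LVALUE(node), "a name is an lvalue");
    static_assert(!IS_LVALUE(node.is_empty()), "a call returning by value is an rvalue");
    static_assert(IS_LVALUE(node.left()), "a call returning an lvalue reference is an lvalue");
    static_assert(!IS_LVALUE(1 + 2), "the result of built-in addition is an rvalue");
}
```

If any of these categorizations were wrong, the translation unit would simply fail to compile; no code needs to run.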

But what is an “lvalue reference”? In C++11 we have two kinds of references. The one that we used to call just “reference” in C++03 is now called an lvalue reference. In the above example function parameter node is an lvalue reference to type Node.

OK, those are just examples, but is there a simple, comprehensive and intuitive definition of an lvalue? The answer is ‘no’. The C++ standard does provide a comprehensive definition, but it is far from simple. For instance, different rules apply to expressions denoting objects than to expressions denoting functions. Yet this should not stop us from observing some useful properties of the division into lvalues and rvalues.

3 kinds of references

Technically there are two kinds of references in C++, but for the purpose of this post we will consider an lvalue reference to const a third kind of reference. It is so special that it deserves this separate treatment.

An lvalue reference (to non-const type) is a reference that can be initialized with an lvalue. Well, only with those lvalues that do not designate const or volatile types. An rvalue reference (to non-const type) is a reference that can be initialized with an rvalue (again, only with those rvalues that do not designate const or volatile types). An lvalue reference to const type is a reference that can be initialized with rvalues and lvalues alike (designating constant and non-constant types).
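
These binding rules can be written down as compile-time checks. The sketch below is only an illustration under stated assumptions: can_bind and bind_to are hypothetical helpers written for this example (they are not standard library facilities), and std::string stands in for an arbitrary class type. The trait asks whether an argument of category From could be passed to a function whose parameter is the reference type To.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: declared only, used inside decltype.
template <typename To> void bind_to(To);

// can_bind<To, From>::value is true when a reference of type To can be
// initialized with an expression of type and category From.
// std::declval<X&>() is an lvalue; std::declval<X>() is an rvalue.
template <typename To, typename From, typename = void>
struct can_bind : std::false_type {};

template <typename To, typename From>
struct can_bind<To, From, decltype(bind_to<To>(std::declval<From>()))>
    : std::true_type {};

typedef std::string S;

// Lvalue reference to non-const: non-const lvalues only.
static_assert( can_bind<S&, S&>::value,        "binds to a non-const lvalue");
static_assert(!can_bind<S&, const S&>::value,  "rejects a const lvalue");
static_assert(!can_bind<S&, S>::value,         "rejects an rvalue");

// Rvalue reference to non-const: non-const rvalues only.
static_assert( can_bind<S&&, S>::value,        "binds to a non-const rvalue");
static_assert(!can_bind<S&&, S&>::value,       "rejects an lvalue");
static_assert(!can_bind<S&&, const S>::value,  "rejects a const rvalue");

// Lvalue reference to const: accepts everything.
static_assert( can_bind<const S&, S&>::value,       "binds to a non-const lvalue");
static_assert( can_bind<const S&, const S&>::value, "binds to a const lvalue");
static_assert( can_bind<const S&, S>::value,        "binds to a non-const rvalue");
static_assert( can_bind<const S&, const S>::value,  "binds to a const rvalue");
```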

Thus, by selecting from the choice of three reference kinds, you can choose which expressions to pick. This is very useful when you define function overloads that differ only by reference type:

void fun( T & obj );  // pick lvalues
void fun( T && obj ); // pick rvalues
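
As a small illustration of this overload selection, here is a sketch with std::string substituted for T. The return values and the helper make_name are assumptions made up for this example; the point is only which overload gets picked.

```cpp
#include <cassert>
#include <string>

// Each overload reports which kind of argument it received.
std::string fun(std::string&)  { return "lvalue overload"; }
std::string fun(std::string&&) { return "rvalue overload"; }

// Hypothetical helper: a call to it is an rvalue (it returns a temporary).
std::string make_name() { return "temp"; }

void demo() {
    std::string named = "named";
    assert(fun(named) == "lvalue overload");        // a name is an lvalue
    assert(fun(make_name()) == "rvalue overload");  // a temporary is an rvalue
}
```

Note that with only these two overloads a const lvalue matches neither; an overload taking an lvalue reference to const would be needed to accept it.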

Now you can make your function fun behave slightly (or entirely) differently for lvalues and for rvalues. Why would you do that? This is where the practical aspects of dividing expressions into lvalues and rvalues come into play. Rvalues denote temporaries or objects that want to look like a temporary. What is particular about temporaries is that they will be used in a very limited way: their value will be read once, and then they will be destroyed. This is a very useful observation for implementing move semantics. Since the temporary will not be inspected after its value is read, we can cheat while reading and steal its resources. But there is a limit to that theft: the destructor will still be called for the temporary, so we must leave it in a state where the destructor can run safely, without causing problems such as releasing resources that were stolen and are now owned by another object. The typical usage for move semantics is to have two function overloads:

Type::Type( T const& obj );  // expensive
Type::Type( T && obj );      // cheap

While the former function would also bind to an rvalue, the latter is simply a better match for one. So in the end we have one constructor that is expensive but leaves the original (which someone else may still be using) intact, and another, fast one, that moves the guts out of the original.
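
To make the expensive/cheap distinction concrete, here is a minimal sketch of such a pair of constructors. Buffer is a hypothetical class invented for this illustration, with both constructors taking the class's own type; assignment is deliberately left out. The static_cast in demo is just a way of producing an rvalue from a named object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical resource-owning class, for illustration only.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Expensive: allocates new storage and copies; the source stays intact.
    Buffer(const Buffer& other) : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Cheap: steals the pointer and leaves the source safely destructible.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;  // the source's destructor must not free stolen memory
    }

    Buffer& operator=(const Buffer&) = delete;  // assignment omitted in this sketch

    ~Buffer() { delete[] data_; }  // delete[] on nullptr is a no-op

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    std::size_t size_;
    int* data_;
};

void demo() {
    Buffer original(4);
    Buffer copy(original);                          // lvalue argument: copy constructor
    assert(copy.data() != original.data());         // separate storage
    assert(original.size() == 4);                   // original untouched

    const int* storage = original.data();
    Buffer moved(static_cast<Buffer&&>(original));  // rvalue argument: move constructor
    assert(moved.data() == storage);                // same storage, now owned by 'moved'
    assert(original.data() == nullptr);             // source left empty but destructible
}
```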

Now, it is time to explain what I meant by “objects that want to look like a temporary.” Let’s have a look at the following example.

vector<string> buildCatalogue()
{
  vector<string> ans; // note - an automatic variable
  if (prepare(ans)) {
    return ans;       // (1) an lvalue
  }
  else {
    Exception ex;     // note - an automatic variable
    throw ex;         // (2) an lvalue
  }
}

In the return statement, ans is an lvalue, but since it is an automatic object and it is obvious that it is not going to be used after the return, for the purpose of optimization it is treated ‘as if it were a temporary’, or more precisely: as an rvalue. It can be bound to an rvalue reference. Similarly, in the throw expression, ex is an lvalue, but for similar reasons it is safe to treat it as though it were a temporary: as an rvalue. Therefore in the throw expression it can be bound to an rvalue reference (and thus trigger the call to a move constructor).
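
One way to observe this special treatment of the return statement is to return a type that cannot be copied at all. In the sketch below (an illustration, not the code from the example above) the catalogue is wrapped in a std::unique_ptr, which is move-only; buildOwnedCatalogue is a hypothetical name. The return statement compiles only because the named local is treated as an rvalue there.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

typedef std::vector<std::string> Catalogue;

// Hypothetical variant of buildCatalogue that returns a move-only handle.
std::unique_ptr<Catalogue> buildOwnedCatalogue() {
    std::unique_ptr<Catalogue> ans(new Catalogue());  // an automatic variable
    ans->push_back("first entry");
    // std::unique_ptr<Catalogue> other = ans;  // error: 'ans' is an lvalue, copying is deleted
    return ans;  // fine: in a return statement 'ans' is treated as an rvalue
}

void demo() {
    std::unique_ptr<Catalogue> catalogue = buildOwnedCatalogue();
    assert(catalogue != nullptr);
    assert(catalogue->size() == 1);
    assert((*catalogue)[0] == "first entry");
}
```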

But there are also cases where it is obvious to us that it is safe to treat an object (not necessarily an automatic one) as a temporary, yet it is not obvious to the compiler:

void populate( vector<string> & dictionary )
{
  vector<string> ans;
  dictionary = ans; // (1)
}

In the assignment expression, it is obvious to us that ans is not going to be used again, so it is safe to treat it as though it were a temporary, but this is less obvious to the compiler. Technically, the compiler could figure it out, but the rules of C++ are clear: this is an lvalue. However, a tool for hinting to the compiler that in this particular case it should treat an lvalue as though it were an rvalue would be useful for optimizations (and for enabling move semantics). In fact, the C++ standard library provides such a tool: std::move.


std::string name = "name";
std::string && ref1 = name;             // illegal
std::string && ref2 = std::move(name);  // legal

Function std::move takes an lvalue expression that refers to object o and returns an rvalue expression that still refers to o. We could have achieved the same effect by casting name to rvalue reference type:

std::string && ref2 = static_cast<std::string &&>(name);

A call to a function that returns an rvalue reference (like std::move) is an xvalue. An xvalue is a special case of rvalue, where we have a reference that points to an object. In other words, it is a reference to an object that we want to bind to references as though it were a temporary.
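
The following sketch puts these pieces together. It checks the categories at compile time using decltype((expr)), which gives T& for an lvalue, T&& for an xvalue and plain T for other rvalues, and it shows that std::move by itself moves nothing: it only produces an rvalue expression referring to the same object. The populate function repeats the earlier example with the hint added; the two strings pushed into ans are made up for the illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

void populate(std::vector<std::string>& dictionary) {
    std::vector<std::string> ans;
    ans.push_back("alpha");
    ans.push_back("beta");
    dictionary = std::move(ans);  // rvalue argument: move assignment is selected
}

void demo() {
    std::string name = "name";

    static_assert(std::is_lvalue_reference<decltype((name))>::value,
                  "a name is an lvalue");
    static_assert(std::is_rvalue_reference<decltype((std::move(name)))>::value,
                  "a call returning an rvalue reference is an xvalue");
    static_assert(!std::is_reference<decltype((std::string("tmp")))>::value,
                  "a temporary created in place is an rvalue, but not an xvalue");

    std::string&& ref = std::move(name);  // binds; nothing has been moved yet
    assert(&ref == &name);                // still the very same object
    assert(name == "name");               // value untouched

    std::vector<std::string> dictionary;
    populate(dictionary);
    assert(dictionary.size() == 2);
    assert(dictionary[0] == "alpha");
}
```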

But given that rvalue references are used to rob the object of its resources, isn’t using std::move unsafe? Yes, it is unsafe, similar to using pointers, unions, unchecked indexing operators: it can cause undefined behavior if misused. Yet, using it offers great performance opportunities in the contexts where we know an object will never be read again.
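
A short sketch of what “misused” can look like, using std::unique_ptr because its moved-from state is well defined: after the move, the source holds a null pointer, so dereferencing it would be undefined behavior. The variable names are arbitrary.

```cpp
#include <cassert>
#include <memory>
#include <utility>

void demo() {
    std::unique_ptr<int> first(new int(7));
    std::unique_ptr<int> second(std::move(first));  // ownership moves to 'second'

    assert(second != nullptr);
    assert(*second == 7);
    assert(first == nullptr);  // the moved-from unique_ptr is empty
    // *first;                 // would dereference a null pointer: undefined behavior
}
```

The moved-from object can still be destroyed or assigned a new value; what is unsafe is reading it as though it still held the old one.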

If you want to know the history behind the name “xvalue” you can read this article.

One important thing to note about rvalue references is that while they bind to rvalues, they are lvalues themselves when used in expressions:

std::string && rref = std::string{"temp"}; // ok
std::string &  lref = rref; // ok, 'rref' is an lvalue

There is no inconsistency in it. Any named reference (rvalue or lvalue) is an lvalue. The rationale for this occasionally surprising behavior has been explained in this reply by Daniel Krügler. In short, if some object is referred to by name (either the original object name or an alias in the form of a reference), we do not want to bind it to an rvalue reference silently, because such binding usually means that the object will be spoiled (moved from). Any such binding to a named object should be either 100% safe (as in the case of temporaries) or requested explicitly by the programmer. The same goes for an rvalue reference that is bound to a temporary: if you have a name in scope, there is a risk that you might refer to it later on, after it has been bound to another rvalue reference.
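
This can be observed with a pair of overloads: inside a function, a parameter of rvalue reference type has a name, so passing it on selects the lvalue overload unless the programmer asks for the rvalue one explicitly. The function names below are hypothetical, chosen for this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>

std::string which(std::string&)  { return "lvalue"; }
std::string which(std::string&&) { return "rvalue"; }

// 'rref' binds only to rvalues, but the expression 'rref' is itself an lvalue.
std::string pass_by_name(std::string&& rref)   { return which(rref); }
std::string pass_with_move(std::string&& rref) { return which(std::move(rref)); }

void demo() {
    assert(pass_by_name(std::string("temp")) == "lvalue");    // named reference: lvalue
    assert(pass_with_move(std::string("temp")) == "rvalue");  // explicit request: rvalue
}
```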

Note that binding a temporary to a reference extends the lifetime of the temporary until the reference it is bound to goes out of scope. This is explained in more detail here.
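
A minimal sketch of this lifetime extension, assuming a hypothetical Probe type that counts how many of its instances are currently alive:

```cpp
#include <cassert>

// Hypothetical type that tracks the number of live instances.
struct Probe {
    static int alive;
    Probe() { ++alive; }
    Probe(const Probe&) { ++alive; }
    ~Probe() { --alive; }
};
int Probe::alive = 0;

Probe make_probe() { return Probe(); }

void demo() {
    make_probe();                    // unbound temporary: destroyed at the end of the statement
    assert(Probe::alive == 0);

    {
        Probe&& ref = make_probe();  // temporary bound to a reference
        (void)ref;
        assert(Probe::alive == 1);   // still alive after the full expression has ended
    }                                // 'ref' goes out of scope: the temporary is destroyed

    assert(Probe::alive == 0);
}
```

The same extension applies when the temporary is bound to an lvalue reference to const.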
