A gotcha with ptr_vector

Posted on December 10, 2014 by Andrzej Krzemieński

Recently I came across an interesting gotcha with Boost.Pointer Container library in my project. Making some incorrect assumptions as to what the library does could cause a bug.

What would you use boost::ptr_vector for? Why would you need to have a vector of pointers, which you want to delete yourself? Is it because:

You want the objects to remain at the same address even if you re-allocate the array under the vector?
You want to inter-operate with a library that already deals with owing pointers?
You want it to be faster than if you were storing values in std::vector?
You want the “polymorphic behavior” of your objects?

If your reason is (1) or (2) and you are not concerned with performance too much, you would probably do the right thing.

If your reason is (3), it is likely that you would be picking the slower solution. But do not trust me on that: measure the two solutions and check if ptr_vector is really faster.

If your reason is (4) and your familiarity with ptr_vector is superficial (as was mine when writing this post), it is likely that you would be implementing a bug. In this post we will be exploring this use case.

Suppose we have the following base class representing an interface (in OO sense):

struct Consumer
{
  virtual void consumeData(const Data&) {} // no-op by default
  virtual void consumeTime(const time&) {} // no-op by default
  virtual ~Consumer() {}
};

It can be passed to any piece of work and do either of the two the two things:

Collect points in time, e.g. for measuring time, velocity, etc.
Collect any arbitrary data, e.g. for logging

Any implementation of the interface can do both these things or only one. This is why we provide the default no-op implementation. For instance, if we want to implement a simple logger that adheres to this interface, we can do it like this:

struct CoutLogger : Consumer
{
  void consumeData(const Data& d) override { std::cout << d; }
};

No need to override consumeTime, the default no-op implementation will be used.

If, for some reason, we need to store these Consumers in a collection, we could use a ptr_vector:

boost::ptr_vector<Consumer> make_consumers()
{
  boost::ptr_vector<Consumer> ans;
  ans.push_back(new CoutLogger); // leak-safe, even on realloc
  // push_back more...
  return ans;
}

Returning it by value works fine and, as we know, it does no (or at least does not have to do any) copying. Now, suppose that at some point we need to copy such vector. For instance, because we want to use it in multiple threads, and we want each thread to have a copy in order to avoid any data races, or locking problems. What happens if we copy a ptr_vector?

The library knows that you expect elements to appear as though they were stored by value, so in order to fulfill this expectation, it will attempt to make a deep copy. But because it has no means of telling what the most derived type is stored under the pointer to Consumer, it will be copying elements assuming that their real type is Consumer. In other words, it will slice the objects. Thus a resulting copy, will only be storing pointers to Consumer, with trivial implementation of either member function! The worst thing about this is that such copy will compile, the program will be doing something that may even look correct. And likely we will learn about the problem only from the users!

This is perhaps one of the reasons why we should strive to make our interfaces abstract or at least non-copyable. My interface with default implementations may look fishy, but even if I want to preserve this idea I could have made the class abstract:

struct Consumer
{
  virtual void consumeData(const Data&) {}
  virtual void consumeTime(const time&) {}
  virtual ~Consumer() = 0;  // pure virtual destructor
};

inline Consumer::~Consumer() {} // default implementation!

This exploits an interesting feature of C++: pure virtual functions can still have a body. One other thing I could have done with my interface is to inherit it from boost::noncopyable.

If I did either of that, the attempt to copy a ptr_vector<Consumer> would result in a compilation failure. That would be an improvement, but an ideal situation would be to have the copying do the right thing. It is doable, but it requires of us some effort. The procedure is described in detail here, in the library documentation. In short, you would have to augment the interface with some member function, like clone:

struct Consumer
{
  virtual void consumeData(const Data&) {}
  virtual void consumeTime(const time&) {}
  virtual Consumer* clone() const = 0; // welcome to Java
  virtual ~Consumer() {}
};

And force every implementation, to also implement it. Then, you would have to define one function, in order to teach the library how objects in your hierarchy are cloned:

inline Consumer* new_clone(const Consumer& c)
{
  return c.clone();
}

But frankly, when you need to do this, I would consider using a value-semantic polymorphism instead as described in this post.

This entry was posted in programming and tagged Boost, gotchas, value semantics. Bookmark the permalink.

24 Responses to A gotcha with ptr_vector

Krzysztof Czaiński says:

December 10, 2014 at 11:34 am

I think this is well documented in Pointer Container. The function new_clone is intended to be overloaded in your type’s namespace and ptr_vector will find it via ADL.
http://www.boost.org/doc/libs/1_57_0/libs/ptr_container/doc/reference.html#the-clonable-concept

Reply
- Andrzej Krzemieński says:
  
  December 10, 2014 at 12:53 pm
  
  Thanks. And indeed, it is well documented. The problem is when you (that is, me) take some things for granted, and forget to nonetheless consult the documentation.
  
  Reply
  - Krzysztof Czaiński says:
    
    December 11, 2014 at 1:14 pm
    
    That is why the docs say [about boost::new_clone()]: “Warning: We are considering the removal of default implementation above. Therefore always make sure that you overload the functions for your types and do not rely on the defaults in any way.”
    But I think the source of the bug is C++ auto-generated functions (copy-constructor in this case), and not the pointer container’s Clonable concept itself. So I think your gotcha is yet another example of why one should always carefully consider whether copying should be disabled for each class one introduces.
    Btw, without providing a make_clone for your type, what kind of a copy did you primarily expect: a deep copy or a shared copy, or a compilation failure?
    
    Reply
    - Andrzej Krzemieński says:
      
      December 11, 2014 at 1:45 pm
      
      Deep copy, as in Adobe.Poly or Boost.TypeErasure. Although, I can see that the latter have an obvious advantage over clone-able pointers: at some point they are always able to see the real (most derived) type.
Ophir says:

December 10, 2014 at 4:06 pm

I wonder why in C++ we have the typeid operator, but we haven’t got a cast from a pointer/reference to the actual object’s type. For example a syntax like dynamic_cast(ptr), without the template argument. With such a cast, cloning can become very simple. Is there any reason why this cannot be implemented in the language?

Reply
- Andrzej Krzemieński says:
  
  December 10, 2014 at 4:15 pm
  
  How would you use it, and what for?
  
  Reply
  - Ophir says:
    
    December 12, 2014 at 10:30 pm
    
    What I though about was something like this:
    virtual Base* Base::clone()
    {
    return new decltype(dynamic_cast(*this)) (dynamic_cast(*this));
    }
    
    This is supposed to clone any object of a derived class using the “miracolous ” dynamic_cast without template argument, (which is suggested to cast according to the same rules as the typeid operator). But then I realized that this cannot works since decltype is deduced during compilation, not at run-time.
    
    Reply
    - luxagen says:
      
      December 26, 2019 at 12:44 pm
      
      There’s a reason this isn’t possible: your template-argument-free dynamic_cast operator cannot have a compile-time return type – the type must be dynamic, and C++ does not do dynamic typing because “you don’t pay for what you don’t use”.
      
      The way I see it, the dynamic return type isn’t necessary anyway – you simple want to be able to use the underlying object type to call the correct copy constructor and deep-clone the object; a static return type wouldn’t interfere with that.
      
      What your idea boils down to is a wish that the language would provide some kind of “base” interface for all objects (no matter how trivial) with an auto-generated clone() method that uses the derived copy constructor – à la Java. The problem with that idea is that it makes all objects polymorphic by default, and since in C++ you don’t pay for what you don’t use, this would be hard to introduce into the language. To avoid breaking the ability to define POD structs, and to preserve performance of small POD objects in general, you’d need either the ability to annotate such structures to omit vtable pointers or a compiler that does it automatically for types with no virtual methods.
      
      These problems aren’t insurmountable, and I’m not saying it’s a bad idea, but the ramifications are significant and the committee would have deal with them all in keeping with the language’s design philosophy.
Olaf van der Spek says:

December 10, 2014 at 11:48 pm

Why are you ‘suggesting’ ptr_vector is slower than vector?
What does the body for the pure virtual destructor do?
Isn’t it your fault for not deriving from noncopyable? :p

Reply
- Andrzej Krzemieński says:
  
  December 11, 2014 at 8:28 am
  
  1. Not really the ptr_vector itself, but operations on it are expected to be slower. For instance iteration: in case of std::vector (of values rather than pointers) you are iterating over a contiguous array: this is the maximum utilization of a CPU cache. In case of _ptr_vector, you get to each object through indirection, each is at a different unrelated memory location: no CPU cache utilization.
  
  2. The body is empty, as in the case of most destructors. It just calls the sub-objects’ destructors (there are none in my case).
  
  3. Yes, it is my fault that I didn’t make it non-copyable or abstract. I was trying to show how the combination of this (my not being cautious) and false expectations of std_vector together can cause a bug. If I make my interface abstract and combine it with false expectations of std_vector, I ‘only’ get a cryptic compile-time error.
  
  Reply
  - Olaf van der Spek says:
    
    December 11, 2014 at 8:43 am
    
    2. What’s the difference with a pure virtual function without body?
    3. Don’t compilers warn for this?
    
    Reply
    - Andrzej Krzemieński says:
      
      December 11, 2014 at 9:25 am
      
      2. Derived classes need to call the destructor of the base class, so it has to be defined (even if it is pure virtual).
      3. No. First, there is no way for them to know that slicing would occur in the situation like this (It is valid reasonable to assume that under the pointer to Consumer we have an object of type Consumer). Second, sometimes you do want to slice, and this has a well-defined, desired semantics. For one example, see this comment. For another, consider “veneers”.
Szymon says:

December 11, 2014 at 9:17 am

Pardon my ignorance but why would you use ptr_vector over vector<unique_ptr> in 2014? No surprises with semantics there.

Reply
- Andrzej Krzemieński says:
  
  December 11, 2014 at 9:28 am
  
  This is a very good observation. vector<unique_ptr<T>> handles 85% of the cases where you would need a ptr_vector. The only case where it fails short is when you need to make the copy of the vector.
  
  Reply
  - Szymon says:
    
    December 11, 2014 at 9:36 am
    
    Yeah, but that is exactly 15% that is problematic (and hence the post) 😉 Polymorphic cloning has to implemented anyway. Hmm this got me thinking that one would just need a smart ptr that has clone() based copy semantics and vector. But I suppose that is what ptr_vector in fact is.
    
    Reply
    - Andrzej Krzemieński says:
      
      December 11, 2014 at 9:44 am
      
      Something like this. But again, if you go this way why not do a non-intrusive value-semantic interface-based type erasure like Adobe.Poly or Boost.TypeErasure?
- Andrzej Krzemieński says:
  
  December 11, 2014 at 9:30 am
  
  Plus one other practical reason. In my company, for various reasons we have to stick to GCC 4.4. So unique_ptr is out of the question. You could say that my company is not in 2014. I suppose it is not the only one.
  
  Reply
  - Olaf van der Spek says:
    
    December 11, 2014 at 9:32 am
    
    4.4 does have unique_ptr
    
    Reply
    - Andrzej Krzemieński says:
      
      December 11, 2014 at 9:54 am
      
      Indeed. That’s a surprise!
      It will not help me much though, because adding -std=c++0x flag is a ‘political issue’ in my company.
      
      But thanks.
    - Olaf van der Spek says:
      
      December 11, 2014 at 9:55 am
      
      So you’re without range-based for as well? That sucks.
    - Andrzej Krzemieński says:
      
      December 11, 2014 at 10:06 am
      
      We are forced to use BOOST_FOREACH
- Thorsten Ottosen says:
  
  December 12, 2014 at 4:52 pm
  
  A few remarks:
  
  boost::ptr_vector and std::vector<std::unique_ptr> are different in many aspects.
  
  A. ability to deep-clone
  B. ability to guarantee that the vector never contains null_ptr’s.
  C. const-correctness propagation and verbosity
  D. interface changes that makes sense because we know we are dealing with pointers
  (e.g. pop_back() can return popped value).
  
  I don’t think there is any way to make a clone_ptr which is efficient to use with e.g. vector while preserving (B) above (Hint: to be efficient it needs to move, and to move it needs an “empty” state).
  
  Reply
  - Andrzej Krzemieński says:
    
    December 12, 2014 at 8:24 pm
    A. ability to deep-clone
    — This is where I got hit. This statement can be understood in two ways:
    
    If you make an effort, you can teach the library how to deep-clone.
    
    It does deep-cloning out of the box.
    
    I somehow assumed it does the latter. I know I should have read the docs more thoroughly. In fact I did, 10 years ago, and now I thought I remembered. I got confused because this is what the other libraries — which serve similar purpose — do: Adobe.Poly, Boost.TypeErasure.
    Reply
Szymon says:

December 11, 2014 at 9:54 am

[I am unable to reply to your comment for some reason]
I was just about to post like to this:
http://www.boost.org/doc/libs/1_57_0/doc/html/move/implementing_movable_classes.html#move.implementing_movable_classes.copyable_and_movable_cpp0x

Regarding Any or Poly: I am not a fan, from the top of my head:
1) I really complicates things a lot for no real benefit
2) For every derivative one have to create new Any/Poly
3) Down-casting is non-trivial problem

but what I dislike the most is that the original problem was with the container yet one would deal with classes that were contained. With container-smart_pointer combo semantics are very customizable and original types do not have to be modified – if I want to disable copying, i just switch to unique_ptr, if I want implicit sharing i switch to shared etc.

Reply