The One-Definition Rule

We have been hit by the same bug twice already this year. It ends in a crash, and in each case it took developers days to find, even though it is reproducible on every run of the unit tests. The code, after significant simplification, looks like this:

namespace tests
{
  struct Fixture
  {
    std::vector<int> v1;
    std::vector<int> v2;

    static Fixture create();
  };

  TEST_CASE(test1)
  {
    auto fixture = Fixture::create();
    std::cout << "running test1";
  }
}

We are using a unit-testing framework that employs certain clever techniques, so that merely by declaring a test we also register it to be run. With this, everything related to testing one piece of functionality is neatly confined to a single file.

The problem is really not with the macros. They expand to quite a simple construct, in which the test is executed during the initialization of a global object. After further simplification, the code looks like this:

namespace tests
{
  struct Fixture
  {
    std::vector<int> v1;
    std::vector<int> v2;

    static Fixture create();
  };

  // expanded from macro TEST_CASE(test1)
  struct UTF_test1_runner
  {
    UTF_test1_runner();               // test run here
  };

  UTF_test1_runner UTF_run_test1 {};  // init of a global

  UTF_test1_runner::UTF_test1_runner()
  {
    auto fixture = Fixture::create(); // test contents
    std::cout << "running test1" << std::endl;
  }
}

After the test contents have been run, the program crashes when the destructor of fixture is in progress. Can you see why?

The problem can hardly be in the implementation of a constructor or a destructor, because they are all compiler-generated. We cannot see the definition of function Fixture::create, as it is defined in another file, so it looks like the prime suspect. But its definition, if we look at it, is in fact quite harmless:

Fixture Fixture::create()
{
  return Fixture{};
}

The bug is hard to track down because it cannot be observed by looking at any one file. Not only is it difficult for humans; static analyzers that work on a single translation unit at a time cannot observe the problem either. The problem is in another file, which also performs unit tests:

namespace tests
{
  struct Fixture
  {
    static Fixture create();
  };

  Fixture Fixture::create() { return Fixture{}; }

  TEST_CASE(test2)
  {
    auto fixture = Fixture::create();
    std::cout << "running test2" << std::endl;
  }
}

It also defines class Fixture, in the very same namespace tests, but with different members. Why should this be a problem?

The One-Definition Rule

By default, unless you make an effort to specify otherwise, the compiler assumes that each class or function defined in a normal namespace (like namespace tests, namespace std, or the global namespace) is meant to be usable by anyone in the program, not only in the current .cpp file (or translation unit). In more technical terms, by default, each class or function name in a normal namespace has external linkage. Anyone who wants to, whether deliberately or by accident, can link against it.

There is also a contract in C++ between you (the programmer) and the compiler: if a program contains more than one definition of a class (and likewise of an inline function or a function template) with external linkage, all these definitions must be identical.

It might sound strange: this contract implies that we can legally have more than one definition of a class with the same name in the program. But this is what we often need and do, perhaps even without being aware of it. It is exactly what happens when you put the definition of your class in a header file and then include that header in two .cpp files. Because header inclusion in C++ works by copy-and-paste, this creates a situation where the two .cpp files define the same class. There is no other way in C++ to share class definitions between files; therefore the compiler must accept the repeated definition of a given class in different files, but on the provision that you do not play nasty tricks, like changing the definition of the same class from file to file. You are only expected to include the same header file in each .cpp file. Anything other than that is risky.

If you are willing to play by the rules, the fragile header-inclusion model works in C++, and the compiler can safely assume that even if you provide a definition of the same class in another file, it will be the very same definition as the one already seen (or still to be seen) elsewhere. It can then pretend that there is in fact only one definition, merely repeated in a number of files, and just pick one and use it everywhere.

And this is exactly what happened in our example. When the compiler observed that function Fixture::create in the second file returns an object of type tests::Fixture, an empty class, it trusted the programmer to fulfill their part of the contract: that it would be the same tests::Fixture as in the first file. So, it generated code for constructing, copying, and destroying an empty class.

But when it compiled the first file, it observed a definition of tests::Fixture that contains two vectors. It assumed (from the same contract) that the second file would likewise construct, copy, and destroy the two vectors, so in the destructor of tests::Fixture it invokes the destructors of two vectors that were never there. And this causes a memory access violation.

The compiler worked on assumptions which, under the contract of the One-Definition Rule, it is allowed to make. It was the programmer who violated the contract. But you might ask: if violating this contract has such severe consequences, why does the compiler not check the programmer's part of the deal? The reason is performance. We mainly choose C++ because we care about our programs' performance, but often it is also the performance of the process of building our program that we must consider: performance measured not only in time but also in memory consumed. While the compilation of the translation units alone has linear complexity (the input size being measured in the number of translation units and the number of declarations inside them), linkage has quadratic complexity. Making it even more expensive might render linking impractical.

Precautions

The above example illustrates only one of many potential problems caused by a One-Definition Rule (ODR) violation. We have seen that it can cause a memory access violation, which on many systems halts the application. And this is perhaps one of the best things that can happen to you, because you and the user are immediately informed that something needs to be fixed. In general, an ODR violation is undefined behavior, which in practice means that if we have two different definitions of the same function (or member function) in the program, the compiler is free to choose any of them at any place the name is used. This means a program may be invoking a different function than you think.

To avoid these problems, one has to stick to one simple rule: unless it is your intention to expose your function or class to everyone in the program, declare it with internal linkage. Having internal linkage means that a given name cannot be accessed from another translation unit, even if that other translation unit tries to use the same name. You do this by putting your declarations in an anonymous namespace. Thus, the fixed unit-test example looks like this:

namespace tests
{
  namespace
  {
    struct Fixture
    {
      std::vector<int> v1;
      std::vector<int> v2;

      static Fixture create();
    };
  }

  TEST_CASE(test1)
  {
    auto fixture = Fixture::create();
    std::cout << "running test1";
  }
}

The rule is simple: either a name is for everyone (and declared in a header file), or it is translation-unit-local and lives in an anonymous namespace.

Some technical details

If I fix the problem from this post as in the last example, I start getting linker errors saying that function tests::<anonymous>::Fixture::create is declared but never defined. This is an improvement, because I am alerted to the problem by the linker rather than by the end user.

In the case of our bugs, the problem was somewhat different and more fascinating: function create was defined inline in two translation units:

namespace tests
{
  struct Fixture
  {
    std::vector<int> v1;
    std::vector<int> v2;
    static Fixture create() { return Fixture{}; }
  };
}

Inline functions are subject to the same ODR-violation problems as classes. I did not show it like this in the post because, in a small example, the crash no longer appears. This is one of the aspects of undefined behavior: you are not guaranteed that anything will visibly go wrong. Instead, I used my knowledge of Clang and GCC to fabricate an example that is quite easily reproducible. I think it is still convincing. Unit tests are a place where people often keep repeating the same names, like Helper.


14 Responses to The One-Definition Rule

  1. Matthew Smit says:

    I had this issue when I was starting out in C++. I had some struct that had the same name in two different cpp files. Thankfully it didn’t take me days to fix. Now I just use anonymous namespaces for everything that isn’t declared in the header.

  2. Bruno says:

These kinds of problems can be detected in an LTO build (although that’s not exactly its purpose). GCC 5 has -Wodr, and GCC 6 added -Wlto-type-mismatch; both are enabled by default.

  3. mjanes says:

unless I misread something, another solution could have been to define Fixture’s destructors non-inline (e.g. outside the class body).
    Indeed, from a practical point of view, assuming that translation units are linked statically (and hence ODR violations of non-inline functions are diagnosed by the linker), I’d say that the true reason for the mentioned bug’s trickiness is that the implicitly generated destructors are inline …

    • I partially agree. That is, if I make the destructors non-inline the crash goes away. This is because the problem of un-diagnosed ODR violations applies to classes, inline functions and function templates.

      But I would hardly call defining custom non-inline destructors a “solution” to the problem. It is rather the hiding of the symptoms. Because the root of the problem is that you are giving external linkage to a class that is supposed to be an implementation detail confined to one translation unit.

      • mjanes says:

yep, I do agree with you; by ‘solution’ I meant from a testing point of view, in the sense that having those destructors non-inlined should give us a nice linker error on most (all?) linkers, making the test fail and hence the bug not manifest at all.
        This also reminds me of the pimpl idiom, when one tries to delete a pointer to an incomplete type via the implicitly generated destructor.
        In this sense, my point was that if one were to ask you what renders such examples so pathological (and hard to diagnose), I’d answer that it’s the tendency of the programmer to consider “implicitly generated” == “always safe”, rather than ODR or type completeness or similar technical issues (important but trivial once made manifest).

  4. bootrom says:

The linker could easily check class definitions when resolving a reference to an external method by using debug info. It is disappointing that none of the compilers support this; in fact the problem appears very often when one works as an integrator on huge projects.

    I have a feeling that C++ compilers/linkers still claim it is programming language for nerds and not for everyone.

It is “easy” in the sense that one could easily come up with the algorithm, but less easy in terms of consumed resources: you effectively need to compare each object file against every other.

      According to Bruno, in one of the comments above, newer versions of GCC can do it. I tried to test it with my GCC 5, but my bug still goes unnoticed.

      • bootrom says:

come on, Andrzej, I do not think resources are a real problem these days (8 cores, 16 GB RAM).
        There are anyway a lot of things done by the linker if you enable /GL in Visual Studio (whole program optimisation). And it already merges debug information into one PDB. OK, GCC/ld do not do this and do not even reduce the number of instantiated template functions (my current GCC is 4.9.4 on purpose).

So at least for debug builds I would switch this check on and save myself week(s) of time looking for bugs in a 3.5 GB codebase.

I suppose it is the right choice for the C++ Standard not to require such linker checks, in order to accommodate smaller compilers that work on systems with limited resources. At the same time, I agree with you that “top” compilers should implement such checks nonetheless. Maybe it is just a question of time before we get such tools.

  5. bootrom says:

    If you compile with debug information (/Zi) in VS2015 and link with /DEBUG option and /ODR option then you will get a list of classes with incompatible layout.

    I tried on your test program and it finds mismatching class layout in sample. Unfortunately there are a lot of mismatching layouts in Microsoft standard libraries.

    • Do I read this correctly? Does this mean there are lots of potential ODR violations in VS standard libraries?

      • bootrom says:

Yes. The reason is the latest split of the standard runtime libraries into the “Windows Kit”: they have different layouts even though the linker shows they are built from the same source. No comment on this.
        The 2nd source is the different layout of libcmt.lib (standard lib, single thread) and libcpmt.lib (standard lib, multithread). But in fact you are never supposed to link to both!

        So either the /ODR option in VS2015 is buggy and compares the layout of unused libraries, or the linker has a bug linking libraries that were never supposed to be used together.

        BTW You can try yourself: VS2015 Community Edition is free of charge.

  6. bootrom says:

Now I can prove that VS2015 violates the ODR internally in its standard libraries (using /ODR for the amd64 target):

    User defined type ‘_TP_CALLBACK_ENVIRON_V3’ has different definitions or layouts in the following modules:
    ‘f:\binaries\Intermediate\vctools\msvcrt.nativeproj_607447030\objd\amd64\_error_.obj (C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\lib\amd64\MSVCRTD.lib)’ (‘f:\dd\externalapis\windows\8.1\sdk\inc\winnt.h’ line 19043)
    ‘f:\binaries\Intermediate\vctools\msvcrt.nativeproj_607447030\objd\amd64\_cpu_disp_.obj (C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\lib\amd64\MSVCRTD.lib)’ (‘f:\dd\externalapis\windows\8.1\sdk\inc\winnt.h’ line 19043)
    1>LINK : warning LNK4262: 2 instances of type mismatch

  7. bootrom says:

Basically, GCC and VS use different approaches to searching for ODR violations. -Wodr in GCC only detects violations in classes with virtual tables, by signature, and may not detect your sample bug. VS /ODR uses debug information and does detect your bug. Even different “pack” attributes trigger the VS warning with /ODR, which leads to false positives on standard headers, perhaps even for unused structure types.
