Handling short codes — part I

In my work, we often deal with codes: country codes, airport codes, airline codes, aircraft codes, and more. The thing they have in common is that they are really short. Sometimes 2-letter long, sometimes 3-letter long, sometimes a bit longer. Most of the time their value is used to compare against other codes, e.g., when searching through a collection. We hardly ever need to see their textual representation.

The initial choice was to represent them in programs by type std::string, a general-purpose type for storing sequences of characters. But, as with any general-purpose tools, they fail to take into account the particular usage cases which often offer room for improvements.

Many implementations of string offer a Short String Optimization (SSO), so that any memory allocation is avoided altogether, and the desired text is stored in the place where you would otherwise have pointers to the allocated memory. This is very useful and avoids expensive memory allocation, but it is still far from the optimum solution when we know we will always be storing sequences of length, say, 2.

Typically, string’s size will comprise of three pointers: the beginning of the allocated array, the end of the allocated array, and the end of the currently stored sequence. On my platform sizeof(void*) is 8. This means that the size of the string will be no less than 24 (chars). And we know we can store any 2-letter code in type char[2], which is 12 times smaller. An extreme implementation of string (and inefficient one) could have the size of one pointer (which points to the structure of three pointers described above), but even then we would have the size of 8 whereas technically we only need 2.

The size of the data type affects run-time performance. Suppose you need to check if a given value exists in an array of values. The question is how many of the subsequent objects from the array you can fit in one CPU cache line. IOW, how many of the objects can you inspect without accessing the main memory. Suppose my CPU’s cache line is able to store 64 chars. This means I can fit 2 full strings with SSO. Compare this with 32 small codes that can be encoded in 2 characters.

Another cause of slow-down is the run-time check required by SSO. The decision whether to allocate memory or store characters locally is made dynamically, and cannot be deduced from the type alone. this decision is recorded as a flag inside the string implementation. Later, when we want to read the value, we first need to read the flag and then branch: from what address do we want to read the characters. This branch comes with a cost.

Designing class code

The following is an attempt at the design of a type that would take advantage of the information that we are only storing character strings of length N, where N is sufficiently small (say, no longer than 8).

Since N is known statically, at compile-time, we will make it part of the type. There is no need to check the length of the string at run-rime. For storage, we can use a raw array. So, the first approximation of the implementation could be the following:

template <int N>
class code
{
  char array_[N];

public:
  // the interface
};

We leave no room for the trailing null character. In C-strings it is used to indicate the end of the sequence, but in our case we know exactly where the sequence ends: after the N-th character. There is no need to duplicate this information. Therefore we can save one character, which can prove quite a huge saving for a 2-letter code.

Relational ops

Most of the time, in our assumed usage scenario, the codes will be compared for order and equality, so let’s implement these first. The lexicographical ordering of strings can be implemented with memcmp:

template <int N>
class code
{
  // ...

  friend bool operator<(code l, code r)
  {
    return std::memcmp(l.array_, r.array_, N) < 0;
  } 
};

You might be uncomfortable about passing the arguments by value. But remember we are passing the trivially copyable object of the size of an int.

The equality comparison could be implemented in a similar manner, but there is a faster way. We can cast the two arrays to integer types and compare the integers: it should be one processor instruction.

Rather than using a reinterpret_cast, we will use a union. They are comparable and equally type-unsafe. The implementation of our storage will look like this:

template <int N>
class code
{
  typedef typename shortest_fitting_int<N>::type int_t;

  union {            // anonymous union
    char  array_[N];
    int_t as_int_;
  };

public:
  // the interface
};

We use an anonymous union. Names array_ and as_int_ can be thought of as two views through which we can observe the storage. Type int_t is the smallest built-in integer type capable of holding our array. It is defined in terms of a compile-time meta-function which selects the appropriate type given number N. You may find this syntax obscure. In C++ (with alias templates) this could be changed into:

template <int N>
class code
{
  typedef shortest_fitting_int_t<N> int_t;

  union {
    char  array_[N];
    int_t as_int_;
  };

public:
  // the interface
};

Which is a bit shorter, but it now requires the reader to know the C++11 alias templates, and for some this may turn even more confusing Besides, I want to show you that this library for processing short codes is implementable in C++03. No new features are required.

Now, how do we define meta-function shortest_fitting_int?

Conceptually, it would “return” the desired integral type as follows:

// pseudo-language; not C++

if      (N <= sizeof(int8_t))  return int8_t;
else if (N <= sizeof(int16_t)) return int16_t;
else if (N <= sizeof(int32_t)) return int32_t;
else                           return int64_t;

In C++, we have to write a big nested meta-function. If it scares you, I can only say, you are not alone. Consider it a price you have to pay for gaining additional performance.

Instead of ‘returning’ we will be defining a nested type type. Instead of an if-statement, we will use another meta-function boost::conditional.

template <int N>
struct shortest_fitting_int
{
  BOOST_STATIC_ASSERT(N > 0);
  BOOST_STATIC_ASSERT(N <= sizeof(int64_t));

  typedef
    typename boost::conditional< (N <= sizeof(int8_t)),
      int8_t,
      typename boost::conditional< (N <= sizeof(int16_t)),
        int16_t,
        typename boost::conditional< (N <= sizeof(int32_t)),
          int32_t,
          int64_t
        >::type
      >::type
    >::type type;
};

Again, with C++11, its static assertions, alias templates and built-in meta-functions, this could be simplified a bit:

template <int N>
struct shortest_fitting_int
{
  static_assert(N > 0, "negative N");
  static_assert(N <= sizeof(int64_t), "N > 8");

  typedef
    std::conditional_t< (N <= sizeof(int8_t)),
      int8_t,
      std::conditional_t< (N <= sizeof(int16_t)),
        int16_t,
        std::conditional_t< (N <= sizeof(int32_t)),
          int32_t,
          int64_t
        >
      >
    > type;
};

That was the most difficult part of this post. Now, the implementation of operator== is trivial:

template <int N>
class code
{
  typedef typename shortest_fitting_int<N>::type int_t;

  union {
    char  array_[N];
    int_t as_int_;
  };

public:
  friend bool operator==(code l, code r)
  {
    return l.as_int_ == r.as_int_;
  } 

  // other interface
};

So, why not implement operator< similarly, with int comparison? Given the endian-ness issues, the resulting order might not sort the codes alphabetically. It might be acceptable for some applications, but I wanted the behavior to be more that of a string.

Warning: this technique of writing to one union member and reading from another may look suspicious. To the best of my knowledge, it is unclear from the Standard whether this is an undefined behavior or not. See this thread for the discussion. I used it in GCC, which documents it as supported behavior (see here).

Practical considerations

By now, we have covered the most interesting topics: compact storage and efficient relational ops. But a real-life type needs to address other, more mundane expectations. Construction, interoperability with other string-like interfaces (that expect C-arrays or std::strings), IO, and a couple of others.

How will codes obtain a value in a real application? Frameworks for parsing XML, GUI forms or communicating with DBs will most likely either use C-strings or std::string, so at the very minimum we must assure conversions to and from these types.

In order to convert to a std::string, we can provide a member function:

template <int N>
class code
{
  // ...

  std::string to_string() const
  {
    return std::string(array_, N);
  }
};

We cannot provide any function like c_str(), because we do not (and do not want to) store the terminating null character. The users could use our interface like this:

c.to_string().c_str();

But this involves issues of the life-time of temporaries, and besides, we do not want to involve the creation of std::strings and risk the potential memory allocation, unless this is absolutely necessary.

Instead, we can offer a std::vector-like compatibility interface in form of member functions data() and size():

template <int N>
class code
{
  // ...

  const char* data() const { return array_; }
  unsigned size() const { return static_cast<unsigned>(N); }
};

Conversion to string is easy, because the set of valid code values is a subset of all string values. The conversion from string to code is more difficult, as it may fail. There are strings (of size different than N) that are not representable as code. We have to figure out a way of handling the conversion failure.

I have considered three options:

  1. Simply require the correct string size as precondition, and move the responsibility for checking it to whoever requests the conversion.
  2. Make no precondition, check the size manually and throw an exception if the size is wrong. This way an attempt to pass the wrong string is treated as a condition that warrants (although not requires) the application shut-down.
  3. Have a conversion function that returns a Boolean result indicating success or failure. The caller may not know if the string fits or not, and may use our conversion function to check it.

Out of the three options, I consider (2) the worst one. I know that many people will strongly object. I do not intend to convince anyone. I can only try to explain my reasoning. I believe that passing an incompatible size is either a bug or a legitimate use. If it is a bug, bugs should be fixed, not reported at run-time. If it is a legitimate use, why trigger the stack unwinding and a potential application shut-down?

Option (2) tries to be somewhere in between: “you shouldn’t pass the incompatible string, but well, you can, and we will handle it, but don’t do it, but it is in fact OK if you do…” I disagree with such diffused stance. The incompatible string is either a legitimate input or not.

Options (1) and (3) give the opposite, but clear answers to the question what is a legitimate output. Either choice is good; one has to be made. I chose (3).

template <int N>
class code
{
  // ...

  bool from_string(const char* str, size_t len)
  {
    if (BOOST_LIKELY(len == N)) {
      as_int = int_t();              // zero out the buffer
      std::memcpy(array_, str, len); 
      return true;
    }
    else {
      return false;
    }
  }

  friend bool code_from_string(const std::string& s, code& c)
  {
    return c.from_string(s.c_str(), s.length());
  }

  friend bool code_from_string(const char* s, code& c)
  {
    return c.from_string(s, std::strlen(s));
  }
};

So, the contract is this: if the string’s size matches, we copy its value and return true; otherwise we return false and leave the initial value of the code object unchanged. This is called the strong failure guarantee.

Macro BOOST_LIKELY is a fairly new addition to Boost.Config library. It gives a hint to the compiler which branch it should optimize for.

But that interface causes another issue. In order to use it, we already need to have an object of type code created, with some value: but what value? This is a problem similar to the common pattern of reading a variable from the stream:

int i; // what value?
std::cin >> i;

Using a default construction like this, with indeterminate value is often considered a bad practice, and from time to time people have suggested that the language should offer a special syntax for expressing the intention here:

int i [[uninitialized]]; // can be written to but not read 
std::cin >> i;

While changing the language is the responsibility of the C++ Standards Committee, we can do something similar for our type code.

To implement this we will define a tag class:

class uninitialized_code_t{};
const uninitialized_code_t uninitialized_code;

The purpose of such tag class is to be a type distinct from any other type. We can now provide a constructor:

template <int N>
class code
{
  // ...

  code(uninitialized_code_t) : as_int_() {}
};

We fill the array with zeros, which should be a distinct value than any other proper code, and will compare unequal to and less than any decent code.

We can now use the conversion like this:

code<3> c (uninitialized_code); // singular state

if (code_from_string(s, ans)) {
  // ok, you can read the value of c
}
else {
  // you must not read from c
}

You may find it inconvenient that this design forces you to write at least two statements in order to convert a std::string to code. But it has one benefit: it is as efficient as possible and expressive enough for you to use it to define any interface of your liking. You can use it to implement a conversion function that returns an boost::optional<code<N>> on failure, or throws, or whatever you see fit.

This poses a new question. Given that we have a way to construct a code with a singular value, should we not define a default constructor, and have it assign the singular value?

Personally, I am not at all in favor of creating by default (and silently) an object in a singular state. In the cases you need to use a singular state temporarily, having to explicitly type uninitialized_code somewhere is at least a clear indication that you are doing something risky locally. With a default constructor it is worse, because the singular state is seeded implicitly, and it is difficult to follow.

However, I must admit that I was beaten by real life here. This library was designed to replace the usage of std::string in the program I was maintaining. The reliance on the std::string’s default constructor and in general on every type having a default constructor (putting an object in a half-initialized state) was so huge, that forcing the use of uninitialized_code everywhere, only to half-initialize codes inside other default constructors was pointless. I gave up and added the default constructor.

But that just triggers another set of issues. Because it is easy to inadvertently use the default construction, we are going to have a lot of default-constructed objects floating around, and we need to make sure that all the operations from the code’s interface are well defined for a default-constructed and in fact a meaningless code. We also need to provide a way of checking if a given object is in a default-constructed state. Luckily, the half-initialized state can be represented by a regular combination of characters in the same array; so, to a great extent the default-created state is just a regular state. The only tricky part is how the null chars are treated when dealing with strings and IO functions.

With the default-construction-related functions:

template <int N>
class code
{
  // ...

  code() : as_int_() {}
  bool is_initialized() const { return as_int_ != int_t(); }
};

We need to change at least function to_string:

template <int N>
class code
{
  // ...

  std::string to_string() const
  {
    if (BOOST_LIKELY(is_initialized()))
      return std::string(data(), size());
    else
      return std::string();
  }
};

This change is not strictly necessary, but I want to make an additional guarantee: that an uninitialized code converted to string returns an empty string:

assert (code<N>().to_string().empty());

The special effort is required because of how std::string handles null characters. For details see this post.

To be continued…

I have to stop for today. There is more to be said about the interface and the implementation of class template code and, to be honest, I haven’t yet even mentioned the most interesting feature of the library: type safety. But these have to wait for another post.

This entry was posted in programming and tagged , , , . Bookmark the permalink.

32 Responses to Handling short codes — part I

  1. Do you see any disadvantage to use similar code template for codes of different but all small sizes. For example if we know that all codes will be of size two or three characters? Should we restrict usage for ASCII subset only? If we add several additional restrictions like above our parse function most likely get more complex and probably should be domain specific. It is usually not safe to convert any string of fixed size to the code type. My suggestion is to use code like this for construction:

    code raw_construct(const char* buffer, size_t count);

    • Do you see any disadvantage to use similar code template for codes of different but all small sizes. For example if we know that all codes will be of size two or three characters?

      — It will work. In fact it works for my project, I just put two template parameters: lower and upper limit. I intend to write about it in the next post.

      Should we restrict usage for ASCII subset only?

      — I cannot answer this question in general. The only use case for codes I have seen so far is to store 26 capital English letters, 10 decimal digits, plus one additional character for wildcards. So I am hesitant to generalize prematurely.

      code raw_construct(const char* buffer, size_t count);

      — How does it signal failure?

      • — How does it signal failure?
        Usually raw_* methods just eat what we pass. With no checks. This is “udefined behavior” to pass invalid data as input. And usual implementation can ignore extra data and pad with zero if too low data.

        I was not6 able to use g+ for auth, so I used facebook. Sorry for that.

  2. Typo alert:
    > bool is_initialized() const { return as_int_ == int_t(); }
    Should read
    > bool is_initialized() const { return as_int_ != int_t(); }

    (please feel free to delete this comment).

  3. Is it kosher to read as_int_ after setting array_?

    • According to this thread: http://stackoverflow.com/questions/11373203/accessing-inactive-union-member-undefined, the issue is at least sloppy (especially, see the last reply). It works for me on GCC, but perhaps I should at least make a note of the potential peril.

      • Yeah, I saw that stackoverflow thread. My memory told me that it was UB to read a union member other than the last one set but I wasn’t sure so I searched and found that thread which didn’t really put my worries to rest…

        • pdw says:

          Even if it’s allowed, it’s only safe it the length of the code matches the length of the integer. Suppose you have a three-character code. In that case the value of the fourth byte is unspecified and the value of as_int_ will be useless. Explicitly setting as_int_ to zero before writing doesn’t fix that, as the compiler is allowed to modify the fourth byte when writing to array_. E.g. the compiler might be able to change memcpy(array_, str, 3) to a 32-bit load/store.

  4. gnzlbg says:

    I fail to see the real advantage of using an union here. From the brief discussion I got:

    – accessing inactive member of an union: undefined behavior in C++>=11.
    – reinterpret cast: works just fine.

    Why did you go for the union vs reinterpret cast? Did I miss any arguments?

    Have you considered making your type constexpr? In C++14 it should be easy, but reintepret_cast will still be tricky, and the union UB doesn’t work (clang tells you that you cannot access the inactive member of an union at compile time because it is indeed UB).

    • Honestly, it was during the course of receiving comment on this blog that I learned about the potential UB when type-punning with unions. I am still not totally convinced (see here and here), but I acknowledge the controversy. I will run some tests and probably change the implementation.

      Regarding constexpr I did not make it for two reasons: I want the library to be used in C++03, and also, I believe it is impossible. constexpr prevents any type punning: static_cast or unions alike.

      • gnzlbg says:

        One can maintain C++03 compatibility by using a macro (like BOOST_CXX14_CONSTEXPR).

        unions and static_cast are perfectly fine within constexpr functions. There are 2 full constexpr variant implementations (Eggs.Variant being the nicest one) that show this in full.

        • I will have to look at the implementation of Eggs.Variant at some point.

          However, my understanding is that type punning is disallowed in core constant expressions. I also use a union in my constexpr implementation of tr2::optional. I can use a union, but I cannot type-pun with it (store one type and read another). And type-punning is core to performance of my implementation of code.

      • krzaq says:

        Check this: http://blog.qt.io/blog/2011/06/10/type-punning-and-strict-aliasing/

        I think C99 explicitly allows casting through unions, whereas all C++ standards and C89 ignore this issue. In that case, I’d say it’s safer to assume that it is simply unsafe and use memcpy to copy the data into a variable of desired type.

        All that being said, I am wondering why you didn’t simply use std::array (or boost::array). It has all the necessary operators already implemented and I don’t think quality of implementation would be an issue.

        • If I abandon the type-punning optimization, I guess I can use boost::array in the implementation. I do not want to just substitute boost::array for my code (if this is what you are suggesting), as the latter also offers additional useful features, which I intend to share in the next post.

        • gnzlbg says:

          To: Andrzej Krzemieński

          I wonder how much slower an implementation without type pruning actually is on a release build. I would expect that comparing an array of chars using a loop where the count is known at compile time to be auto vectorized by gcc and clang, but I haven’t tried and part of me also knows that compilers generally suck at this.

        • krzaq says:

          No, I meant using boost::array instead of the C-array.

          We’re using std::array as a base class for our “SizedString” template at my work, though our most common use-cases are 3, 12 and 18 characters. I’ve heard a lot of general advice against inheriting from stdlib containers (mostly “because splicing”), but in this case, it hasn’t bitten us in the two years we’ve used this solution and it is nearly ubiquitous in our code base.

          By the way, is there a reason why you use c-arrays by default instead of the boost or std template? They have the same size, they aren’t any slower, they don’t deceptively turn into a pointer as soon as they can, their syntax can be read left to right and their syntax means exactly the same thing in function declarations.

        • By the way, is there a reason why you use c-arrays by default instead of the boost or std template?

          — In my case it would be only boost::array, I guess, as I am using GCC 4.4.7. Because I am using the array as a member, and do not expose its interface to the users, and myself have to do some nasty tricks with it anyway, I do not benefit from the advantages of boost::array that much.

          On the other hand, boost::array is no worse than the raw array, so I could just use it for the elegance’s sake, if it were not for one tiny thing: it requires me to include an additional header, and an additional template library. While I cannot support this with evidence, I have an intuition that tells me that this would contribute to the already long compilation times.

      • mikebmcl says:

        The UB issues in the ISO C++ Standard come from 3.10/10 (“If a program attempts to access the stored value…”) and 9.5/1 (“In a union, at most one of the non-static data members can be active at any time, that is, the value of at
        most one of the non-static data members can be stored in a union at any time. ….”); see: N3296 – http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf .

        Since only one NSDM of a union can be active at any time, use of any other results in UB.

        If, to dodge that UB, you took the active char[N] member and cast it to an int value, you would then end up with UB because of 3.10/10 (i.e. the strict aliasing rule). It might seem like 3.10/10(10.6) would save you from UB but since 9.5/1 only allows one active data member a compiler is free using a non-active data member is still UB (as I read it).

        In practice compilers I’m familiar with currently treat that UB usage in a way that fits your usage pattern. But there are historical examples of compilers changing how they handle UB from one version to the next (I’m thinking specifically of GCC but I’m sure others have done it too).

        In a comment below, krzaq mentions std::array and to me this leaps out as something worth testing and profiling. Since you are dealing with char values that have no locale implications, it seems like std::array should prove useful here. Or even a simple class template wrapper of char[N] that make use of variants of memcmp or strcmp for comparison.

        Barring that, I would add an error preprocessor statement couched in ifdef preprocessor statements that ensures that the compiler’s treatment of your UB usage hasn’t changed (i.e. each time someone tries to build with a compiler version that hasn’t been vetted to ensure that it always produces the desired behavior, compilation fails, forcing the person building it to (hopefully) check to ensure that the desired behavior hasn’t changed before adding in an ifdef for that compiler version that would bypass the error preprocessor statement).

  5. Shea Levy says:

    Did you actually test that the obvious equality operator impl of just comparing byte-by-byte generates inefficient code? Introducing type-punning here seems like a significantly premature optimization.

    • My benchmarks (GCC 4.8.2 MinGW on Windows with -O3) for comparison of int vs char[4] show that the former is around 4 times faster. I do not know how to include a file in the comment. I will publish it with one of the next installments.

      Regarding premature optimization, I do not share your view. The primary goal of this library (and of substituting it for std::strings) is a performance gain in an already correct program that uses std::strings. The goal of this library is to be fast. Its merit is to outperform std::strings, so I would not loose the opportunity to make it faster.

  6. Roman says:

    Why not make N unsigned? You only ever use it as an unsigned number, even casting it to unsigned at one point.

    • I spent on this question a lot of time, with no satisfactory answer so far. I try to follow the philosophy “signed int is for numerical values, unsigned int is for bitmasks and modulo arithmetic, the STL is wrong”. While I never need a negative array size, I would still allow for a negative value and static_assert against it. The benefit is that if as the result of some operations I get the negative result, static assert or the compiler will warn me. In contrast, underflow of unsigned value is required to be a modulo arithmetic and results in a correct positive value.

      But I realize these are more philosophical issues, and if you use an unsigned int it will just work fine for you. I cast to unsigned int, because this is the interface to the STL-like world, which uses the “unsigend philosophy”.

      • Dan Haffey says:

        FWIW, looking at the assembly output by GCC 4.8.2 in Linux shows the char[4] version consists of ~4x as many instructions as the int version. Also, both Clang (3.2) and GCC optimize away the memcpy calls involved in a “safe type-punning” version and produce identical code to the union-based version.

  7. uxcn says:

    I think type punning through unions is technically UB in at least C99 and up. There’s the problem of type representation (IEEE 754 vs. something else), but there would also be the problem that byte-orders don’t match on every machine. I will still use it on occasion though. I realize there are slightly different type semantics, but I’m still a little surprised the optimizer doesn’t do a good job optimizing short strings. I wonder if the performance is across translation units?

  8. Andy Prowl says:

    Interesting post, as always! 🙂

    To be honest, though, I am not a huge fan of the design of from_string(). The rationale for choosing option (3) does not hold water in my opinion. What is the most common usage scenario for from_string()? Is it sensible for the caller to simply ignore the result of the conversion and keep using the object the same way whatever the outcome was? And for such a usage scenario, is the previous state of the object really the most convenient one to be left in?

    If the answer to all of these questions is “Yes”, then it’s probably true that the stack unwinding triggered by an exception would require more work on the client side, and might make your choice easier to appreciate; but I’d wager the calling code will likely need to check the result of the conversion and take different actions based on its outcome. One of the reasons why exceptions are considered superior to error codes is that you can’t forget to check errors and make no decision at all.

    Probably I would also make from_string() a builder function (template) that returns a code object rather than a member function. The reason for this is SRP: the code class template would only be concerned with efficiently storing and comparing codes, while the logic used to convert to/from a string sounds like a separate aspect to me (by a similar reasoning I would make to_string() a free function).

    Putting everything together, I’d probably have from_string() be a free function template that returns a code, and have it throw in case of failure. If you really wanted to avoid the stack unwinding, you could take an additional boolean flag and default-initialize the object rather than throwing if the conversion fails; or you could take a callable as a second argument, which would be invoked on error; both approaches allow clients to turn the conversion into a no-op in case of failure, but without you worrying about this being done inadvertently.

    Just my 2 cents 🙂 Looking forward to the second part!

  9. Yaron Keren says:

    Checkout llvm::SmallString :
    http://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallstring-h

    while at it, look at StringRef and Twine as well.

  10. ks says:

    @gnzlbg: Be cautious with reinterpret_cast! Using a union enforces the correct alignment for the int access, where a plain char[4] array reinterpreted as int will not. Violating alignment will give You a performance penalty in best case and a crash in worse case. Since C++11 one can enforce alignment with alignas specifier. Using reinterpret_cast would be safe then (at least regarding alignment). See example http://coliru.stacked-crooked.com/a/8a2cffea298a17ea

  11. MF says:

    Ain’t the use of the punning cast costing you a byte(25%) of size for strings of length 3 (and three bytes(37.5%) for strings of length 5) and also forcing the alignment of the object to 4 (8) as opposed to 1 without it? Now, this obviously is a choice between speed and size but you could spell it out.

    Also, you have yet to show any constructors except for the default one but I suppose all of the constructors zero the parts of the union that are padding in the array_ case in order to ensure that the comparision avoids remenants of stack memory?

    Finally I am asking myself if a compiler is allowed to remove the assignment to as_int in
    as_int = int_t(); // zero out the buffer
    std::memcpy(array_, str, len);
    by means of noting that the write to array_ makes the write to as_int dead? (not that I think any compiler does, but I ponder if it is allowed) but this should be possible to handle through changing the code to
    inner_union = inner_union_type(); // zero out the buffer
    std::memcpy(inner_union.array_, str, len);

    • Ain’t the use of the punning cast costing you a byte(25%) of size for strings of length 3 […]

      — True, they do. I am aware of the negative impact on the size. As you observe, it is a trade-off between the size and the efficiency between some operations. It probably deserves more discussion in the post, though.

      Also, you have yet to show any constructors except for the default one but I suppose all of the constructors zero the parts of the union that are padding in the array_ case in order to ensure that the comparision avoids remenants of stack memory?

      — Yes, in the implementation (I intend to show the full one at the end of the series) I use as_int = int_t() in nearly every constructor. Although, since you mentioned the issue below, I will need to investigate deeper in order to check whether it is sufficient.

      Finally I am asking myself if a compiler is allowed to remove the assignment […]

      — I will dig into this to see what happens. My understanding was that if I first assign a null integer and then do a memcpy to a smaller region, the reminder remains filled with zeros, but as you are the second person to question this behavior, I will retire to investigate the issue, before coming back with a reply.

      Thank you for pointing these things out.

  12. ks says:

    Some more thoughts… One way to avoid the ugly union hack is to only use an unsigned integral type. In order to make the less/greater-than compares behave the same way as the lexicographical string compare is to shift the characters appropriately to the left within the unsigned. For conversion back to string one need a small temporary character buffer for reconstruction of the character sequence. A code snipped for 2-character codes can be found at http://coliru.stacked-crooked.com/a/70ef898f41363a36

    If I really want to optimize things (did You mentioned, if You want to optimize for speed or space?), I wouldn’t start with a generic solution, but with a concrete one. In my opinion generalization should be the second step (consider YAGNI). In case of 3-letter codes, I would compress them into a uint16_t. Compression can be done in a way, that comparing such compressed uint16_t values behaves like lexicographical string comparing. See http://coliru.stacked-crooked.com/a/d21c23300550db1c for a minimal example.

    Last not least I want to mention, that I really like Your blog! Keep on writing 🙂

  13. in fact the type-punning took my attention too !
    I have type for vector having smth like:
    struct matrix44
    {union{
    double elements[16];
    double map[4][4];
    char data[16*4];
    };};

    this helps me alot whether I want to manipulate/read the matrix as 2 or 1 dimensional. Also persisting it using APIs accepting byte arrays only (char*).
    Does this constitute type-punning thus undefined behavior ???

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s