Using error codes effectively

In the previous posts we have seen what error codes and error conditions are. But the way we used them is far from optimal. In particular, the implementation of FailureSourceCategory::equivalent was huge and the error category FailureSourceCategory was forced to be aware of all error codes from all sub-systems in the project. In this post we will see how we can do better, especially in more complex systems.

The problem

Let’s start with restating the problem from the previous posts and making it a bit more complicated.

We have a sub-system for finding flights, call it Flights, which documents the following error codes and their meaning.

Flights — error codes
10 non-existent locations in request
11 requested dates in the past
12 inverted date range in request
20 no flights found
30 protocol violation
31 connection error
32 resource error
33 timeout

This time we are not using an enum, but reflect the way two teams usually communicate: by documents. The only thing we are informed about is numeric values of error codes. They will be returned from subsystems as integer values, probably in XML files.

Similarly, we have a service for finding available seats on a given flight, call it Seats, which claims to be able to return the following error codes in case of failure:

Seats — error codes
1 invalid request
2 could not connect
3 internal error
4 no response
5 non-existent class
6 no seats available

Now, as specified in the previous posts, I want to be able at some point to decide whether the error code returned from any of the sub-systems falls into one of the following categories:

  1. User sent us an illogical request.
  2. There is some problem with the system which the user will not understand but which prevents us from returning the requested answer.
  3. No airline we are aware of is able to offer a requested trip.

This is the same problem as in previous posts. Now, however, we will also be logging different conditions, and at some point we want to determine the severity of an error. It will be one of:

  1. A bug.
  2. An incorrect configuration.
  3. A (perhaps temporary) shortage of resources.
  4. Normal (like “no solutions found”) — user may be disappointed, but developers or the system did nothing wrong.

Thus we will be performing two different queries. Additionally, to make matters more complicated, sometimes we will not be using sub-system Flights, but instead we will be calling a competing sub-system, called Flights-2, which does practically the same thing (but faster), but uses different values for errors:

Flights-2 — error codes
11 no flights found
12 invalid request format
13 could not connect
14 database error
15 not enough memory
16 internal error
17 no such airport
18 journey in the past
19 dates not monotonic

Now you get the idea of the complexity of the problem. Obviously, the problem can grow in two directions:

  1. More sub-systems can be replaced, each sub-system providing a different set of error code values.
  2. More queries on error codes may be needed.

The solution

In order to address this, we will introduce yet another, artificial set of error conditions that will enumerate anything that can go wrong in any of the sub-systems: it will be a super-set of the error codes from any of the sub-systems. Error codes like 30 from Flights or 1 from Seats or 12 from Flights-2 will all be mapped onto one normalized value “protocol error”. Such a “normalized” condition loses some information (like the original numeric value, or which system it originated from) but retains sufficient granularity for high-level queries (those for severity, or error source, or any future queries) to give correct answers.

Once we have these “normalized” error values, we will be able to provide:

  1. Mappings from sub-system error codes onto the normalized error conditions.
  2. Mappings from the normalized error conditions onto the high-level error queries.

This way, there is one “centralized” source of information, but high-level queries need not be concerned with new sub-systems, and similarly, sub-systems need not be bothered with new high-level queries.

Let’s implement this then. First, we encode the list of numeric error values from sub-system Flights into a C++ enumeration, as we did in the previous post:

enum class FlightsErrc
{
  NonexistentLocations = 10,
  DatesInThePast       = 11,
  InvertedDates        = 12,
  NoFlightsFound       = 20,
  ProtocolViolation    = 30,
  ConnectionError      = 31,
  ResourceError        = 32,
  Timeout              = 33,
};

We have encoded the informal table into a piece of source code. We preserve the same numeric values, the description of what they encode, and we encode in the type that these numeric values describe error codes from sub-system Flights. This is important because sub-system Flights-2 uses the same numeric values for representing different situations. We can see it when we also encode the error codes from Flights-2:

enum class Flights2Errc
{
  ZeroFlightsFound     = 11,
  InvalidRequestFormat = 12,
  CouldNotConnect      = 13,
  DatabaseError        = 14,
  NotEnoughMemory      = 15,
  InternalError        = 16,
  NoSuchAirport        = 17,
  JourneyInThePast     = 18,
  DatesNotMonotonic    = 19,
};

While the error conditions are similar in both services, they are assigned different numeric values. For instance, number 12 indicates an error in XML format in Flights-2, but silly dates provided by the user in Flights. But we will never confuse them because they are encoded in different types.

We will need to do the same for service Seats. Next, we need similar enums for the error queries:

enum class FailureSource
{
  // no 0
  BadUserInput = 1,
  SystemError,
  NoSolution,
};

enum class Severity
{
  // no 0
  Bug = 1,
  Config,
  Resource,
  Normal,
};

Here the numeric values are not that important, so long as they are different within the enum, so we only make sure they are not 0. Finally, we provide an enumeration for our “normalized” list of error conditions:

enum class SubsystemError
{
  // no 0
  InputBadAirport = 1,
  InputPastDate,
  InputBadDateRange,
  InputBadClass,
  NoFlightFound,
  NoSeatFound,
  SubsysProtocolErr,
  SubsysInternal,
  SubsysResource,
  SubsysConfig,
  SubsysTimeout,
};

Now, we will plug SubsystemError into the error system as an error condition: it will not be used for storing numeric values of codes from sub-systems, but only for performing (fine-grained) queries. We already know how to do that: specialize trait std::is_error_condition_enum, provide factory function overload make_error_condition. I will not show it here. The interesting part is error category corresponding to SubsystemError. This is how we define it:

namespace /* anonymous */ {

class SubsystemErrorCategory : public std::error_category
{
public:
  const char* name() const noexcept override;
  std::string message(int ev) const override;
};

}

You already know this: name assigns short mnemonic name to our category, message assigns short text description to each enum value. What is interesting is that we override no other member function: we define no relation to any sub-system error or any high-level query. This category is not aware of either! It is the other categories — those for defining semantics of sub-system error codes and those for high-level query codes — that will be later defining the numeric value relations to this category.
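For readers who skipped the earlier posts, here is a minimal sketch of the part not shown above: the trait specialization, the factory function, and the two member functions. The category name "subsystem-error", the message texts, and the object name theSubsystemErrorCategory are my assumptions for illustration; the article does not specify them.

```cpp
#include <string>
#include <system_error>

enum class SubsystemError
{
  // no 0
  InputBadAirport = 1,
  InputPastDate,
  InputBadDateRange,
  InputBadClass,
  NoFlightFound,
  NoSeatFound,
  SubsysProtocolErr,
  SubsysInternal,
  SubsysResource,
  SubsysConfig,
  SubsysTimeout,
};

namespace std
{
  // Lets a SubsystemError convert implicitly to std::error_condition.
  template <>
  struct is_error_condition_enum<SubsystemError> : true_type {};
}

namespace /* anonymous */ {

class SubsystemErrorCategory : public std::error_category
{
public:
  const char* name() const noexcept override
  {
    return "subsystem-error"; // assumed mnemonic
  }

  std::string message(int ev) const override // assumed texts
  {
    switch (static_cast<SubsystemError>(ev))
    {
    case SubsystemError::InputBadAirport:   return "no such airport";
    case SubsystemError::InputPastDate:     return "date in the past";
    case SubsystemError::InputBadDateRange: return "inverted date range";
    case SubsystemError::InputBadClass:     return "no such class";
    case SubsystemError::NoFlightFound:     return "no flight found";
    case SubsystemError::NoSeatFound:       return "no seat found";
    case SubsystemError::SubsysProtocolErr: return "protocol error";
    case SubsystemError::SubsysInternal:    return "internal error";
    case SubsystemError::SubsysResource:    return "resource shortage";
    case SubsystemError::SubsysConfig:      return "configuration error";
    case SubsystemError::SubsysTimeout:     return "timeout";
    default:                                return "(unrecognized condition)";
    }
  }
};

// The single category object; categories are compared by address.
const SubsystemErrorCategory theSubsystemErrorCategory {};

}

// Found through argument-dependent lookup during the implicit conversion.
std::error_condition make_error_condition(SubsystemError e)
{
  return {static_cast<int>(e), theSubsystemErrorCategory};
}
```

With this in place, a statement like std::error_condition c = SubsystemError::InputPastDate; compiles, and c.message() returns the text chosen in the switch.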

So, let’s see how you tie an error code to this super-category. Let’s start with enum FlightsErrc. We will plug it as error code because its numeric values represent real error values from a sub-system. We want this numeric value to be transported, and preserved. Again, we already know how to do that from the previous posts: specialize trait std::is_error_code_enum, provide factory function overload make_error_code. But unlike in previous posts, we will define the corresponding error category somewhat differently.

We have already indicated the difference between error codes and error conditions:

  • error_code is used for storing and transmitting error codes as they were produced by originating library, unchanged;
  • error_condition is used for performing queries on error_codes, for the purpose of grouping or classification or translation.

In this scheme, an error_category is used for defining the meaning of each value of an error code: how it is equivalent to different values of error conditions. Let’s go:

namespace /* anonymous */ {
struct FlightsErrCategory : std::error_category
{
  const char* name() const noexcept override;
  std::string message(int ev) const override;
  std::error_condition default_error_condition(int ev)
    const noexcept override;
};

}

Observe: no function equivalent as in the previous posts. Instead we override a different member function: default_error_condition. You override it to indicate your preferred, default, error condition, that best describes semantics of the error code values from your enum in the context of your program/library. In our case, we already said we will be mapping error code values from any sub-system onto the normalized error condition, represented by enum SubsystemError and by category SubsystemErrorCategory.

Let’s define the function to better understand what it does:

std::error_condition 
FlightsErrCategory::default_error_condition(int ev)
  const noexcept
{
  switch (static_cast<FlightsErrc>(ev))
  {
  case FlightsErrc::NonexistentLocations:
    return SubsystemError::InputBadAirport;
  
  case FlightsErrc::DatesInThePast:
    return SubsystemError::InputPastDate;
      
  case FlightsErrc::InvertedDates:
    return SubsystemError::InputBadDateRange;
        
  case FlightsErrc::NoFlightsFound:
    return SubsystemError::NoFlightFound;
      
  case FlightsErrc::ProtocolViolation:
    return SubsystemError::SubsysProtocolErr;
      
  case FlightsErrc::ConnectionError:
    return SubsystemError::SubsysConfig;
       
  case FlightsErrc::ResourceError:
    return SubsystemError::SubsysResource;
      
  case FlightsErrc::Timeout:
    return SubsystemError::SubsysTimeout;
    
  default:
    assert (false);
    return {};
  }
}

The first case label should be read as follows, “if the error code currently inspected stores numeric value FlightsErrc::NonexistentLocations, consider it equivalent to error condition SubsystemError::InputBadAirport”. At this point FlightsErrc::NonexistentLocations represents a numeric value, but SubsystemError::InputBadAirport is treated as (is converted to) a std::error_condition.

And we define a similar mapping for every numeric value from enum FlightsErrc. (If you are unhappy about the assert at the end, you can skip it, or throw an exception: I do not want to focus on such edge conditions in this post.)

With this in place, we can compare errors from Flights sub-system against our “default” error condition:

std::error_code ec = FlightsErrc::DatesInThePast;
assert (ec == SubsystemError::InputPastDate);
assert (ec != SubsystemError::InputBadDateRange);

In the above code snippet we are comparing an error code against an error condition. This error condition is so fine-grained that it might appear as though we were comparing two error codes, but it is not the case. We are still preserving the distinction we presented earlier: error codes are for storing and transporting values; error conditions are for performing queries, including micro-queries. What we buy with this is separation. As the next step, we will need to define a similar mapping for enum Flights2Errc, and maybe in the future we will add yet another set of codes Flights3Errc, but the other parts of the program, which only check for conditions using micro-queries like SubsystemError::InputPastDate, will be unaffected.

Just to explain what happens in the above code. The comparison between an error code and an error condition is defined as:

bool operator==(const error_code&      lhs,
                const error_condition& rhs) noexcept
{
  return lhs.category().equivalent(lhs.value(), rhs)
      || rhs.category().equivalent(lhs, rhs.value());
}

So, this triggers a call to a virtual member function equivalent on the category in error code, which is of our type FlightsErrCategory. Because we did not override this function, the default implementation from the base class is used:

bool equivalent(int code,
                const error_condition& cond) const noexcept
{
  return default_error_condition(code) == cond;
}

So, down below we are comparing one condition against another. Comparing conditions works the same as comparing two codes: both int value and the category must match.
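To make the chain concrete, here is a small sketch that performs the first half of the comparison by hand. It uses only two enumerators from each enum to stay short; the category names, the placeholder messages, and the helper equivalent_by_hand are hypothetical, introduced only for this illustration.

```cpp
#include <string>
#include <system_error>

// Subsets of the enums shown earlier, with the same numeric values.
enum class FlightsErrc    { DatesInThePast = 11, InvertedDates = 12 };
enum class SubsystemError { InputPastDate = 2, InputBadDateRange = 3 };

namespace std
{
  template <> struct is_error_code_enum<FlightsErrc> : true_type {};
  template <> struct is_error_condition_enum<SubsystemError> : true_type {};
}

std::error_code      make_error_code(FlightsErrc);
std::error_condition make_error_condition(SubsystemError);

namespace /* anonymous */ {

struct SubsystemErrorCategory : std::error_category
{
  const char* name() const noexcept override { return "subsystem-error"; }
  std::string message(int) const override { return "(message omitted)"; }
};

struct FlightsErrCategory : std::error_category
{
  const char* name() const noexcept override { return "flights"; }
  std::string message(int) const override { return "(message omitted)"; }

  std::error_condition default_error_condition(int ev)
    const noexcept override
  {
    switch (static_cast<FlightsErrc>(ev))
    {
    case FlightsErrc::DatesInThePast: return SubsystemError::InputPastDate;
    case FlightsErrc::InvertedDates:  return SubsystemError::InputBadDateRange;
    default:                          return std::error_condition(ev, *this);
    }
  }
};

const SubsystemErrorCategory theSubsystemErrorCategory {};
const FlightsErrCategory     theFlightsErrCategory {};

}

std::error_code make_error_code(FlightsErrc e)
{
  return {static_cast<int>(e), theFlightsErrCategory};
}

std::error_condition make_error_condition(SubsystemError e)
{
  return {static_cast<int>(e), theSubsystemErrorCategory};
}

// Hypothetical helper: repeats what the default
// error_category::equivalent(int, const error_condition&) does.
bool equivalent_by_hand(const std::error_code& ec,
                        const std::error_condition& cond)
{
  // Step 1: ask the code's category for its default condition.
  const std::error_condition mapped =
    ec.category().default_error_condition(ec.value());
  // Step 2: compare two conditions: category and int value must match.
  return mapped.category() == cond.category()
      && mapped.value() == cond.value();
}
```

For an error code initialized from FlightsErrc::DatesInThePast, equivalent_by_hand agrees with operator== when compared against SubsystemError::InputPastDate. Note that the real operator== additionally consults the condition's category; that second half is what the high-level queries rely on.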

In a similar manner we have to define mapping for Flights2Errc and SeatsErrc, but we will not show it in this post. What we will show is how you map from a low-level query to a high-level query. We will see how to do it for query Severity. Again, we have to specialize the trait std::is_error_condition_enum and provide factory function make_error_condition; this is uninteresting. What is interesting is our error category type:

namespace {
class SeverityCategory : public std::error_category
{
public:
  const char* name() const noexcept override;
  std::string message(int ev) const override;

  bool equivalent(
    const std::error_code& code,
    int condition) const noexcept override;
};
}

This time we are overriding function equivalent. The mapping in the function, from an error code to our category, is expressed by performing low-level queries. It looks like this:

bool SeverityCategory::equivalent(
      const std::error_code& ec,
      int cond) const noexcept
{     
  switch (static_cast<Severity>(cond))
  {
  case Severity::Bug:
    return ec == SubsystemError::SubsysProtocolErr
        || ec == SubsystemError::SubsysInternal;
        
  case Severity::Config:
    return ec == SubsystemError::SubsysConfig;

  case Severity::Resource:
    return ec == SubsystemError::SubsysResource        
        || ec == SubsystemError::SubsysTimeout;
        
  case Severity::Normal:
    return ec == SubsystemError::InputBadAirport
        || ec == SubsystemError::InputPastDate
        || ec == SubsystemError::InputBadDateRange
        || ec == SubsystemError::InputBadClass
        || ec == SubsystemError::NoFlightFound
        || ec == SubsystemError::NoSeatFound;
             
  default:
    return false;
  }
}

The first case label reads, “high-level condition Severity::Bug is a union of low-level conditions SubsystemError::SubsysProtocolErr and SubsystemError::SubsysInternal”. Thus, we are building high-level queries from low-level queries (error conditions) but not from error code values directly.

Similarly, we can define a mapping for FailureSource. Full compiling code illustrating all of the above can be found here.
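As a rough sketch of what the FailureSource mapping could look like, the block below groups the normalized conditions by the three descriptions given at the start of the post. The grouping is my reading of those descriptions, not code taken from the linked sources; FlightsErrc is cut down to three values, and the category names and placeholder messages are assumptions.

```cpp
#include <string>
#include <system_error>

enum class FlightsErrc { DatesInThePast = 11, NoFlightsFound = 20, Timeout = 33 };

enum class SubsystemError
{
  InputBadAirport = 1, InputPastDate, InputBadDateRange, InputBadClass,
  NoFlightFound, NoSeatFound,
  SubsysProtocolErr, SubsysInternal, SubsysResource, SubsysConfig, SubsysTimeout,
};

enum class FailureSource { BadUserInput = 1, SystemError, NoSolution };

namespace std
{
  template <> struct is_error_code_enum<FlightsErrc> : true_type {};
  template <> struct is_error_condition_enum<SubsystemError> : true_type {};
  template <> struct is_error_condition_enum<FailureSource> : true_type {};
}

std::error_code      make_error_code(FlightsErrc);
std::error_condition make_error_condition(SubsystemError);
std::error_condition make_error_condition(FailureSource);

namespace /* anonymous */ {

struct SubsystemErrorCategory : std::error_category
{
  const char* name() const noexcept override { return "subsystem-error"; }
  std::string message(int) const override { return "(message omitted)"; }
};

struct FlightsErrCategory : std::error_category
{
  const char* name() const noexcept override { return "flights"; }
  std::string message(int) const override { return "(message omitted)"; }

  std::error_condition default_error_condition(int ev)
    const noexcept override
  {
    switch (static_cast<FlightsErrc>(ev))
    {
    case FlightsErrc::DatesInThePast: return SubsystemError::InputPastDate;
    case FlightsErrc::NoFlightsFound: return SubsystemError::NoFlightFound;
    case FlightsErrc::Timeout:        return SubsystemError::SubsysTimeout;
    default:                          return std::error_condition(ev, *this);
    }
  }
};

struct FailureSourceCategory : std::error_category
{
  const char* name() const noexcept override { return "failure-source"; }
  std::string message(int) const override { return "(message omitted)"; }

  // Assumed grouping: built only from low-level conditions.
  bool equivalent(const std::error_code& ec, int cond)
    const noexcept override
  {
    switch (static_cast<FailureSource>(cond))
    {
    case FailureSource::BadUserInput:
      return ec == SubsystemError::InputBadAirport
          || ec == SubsystemError::InputPastDate
          || ec == SubsystemError::InputBadDateRange
          || ec == SubsystemError::InputBadClass;

    case FailureSource::SystemError:
      return ec == SubsystemError::SubsysProtocolErr
          || ec == SubsystemError::SubsysInternal
          || ec == SubsystemError::SubsysResource
          || ec == SubsystemError::SubsysConfig
          || ec == SubsystemError::SubsysTimeout;

    case FailureSource::NoSolution:
      return ec == SubsystemError::NoFlightFound
          || ec == SubsystemError::NoSeatFound;

    default:
      return false;
    }
  }
};

const SubsystemErrorCategory theSubsystemErrorCategory {};
const FlightsErrCategory     theFlightsErrCategory {};
const FailureSourceCategory  theFailureSourceCategory {};

}

std::error_code make_error_code(FlightsErrc e)
{ return {static_cast<int>(e), theFlightsErrCategory}; }

std::error_condition make_error_condition(SubsystemError e)
{ return {static_cast<int>(e), theSubsystemErrorCategory}; }

std::error_condition make_error_condition(FailureSource e)
{ return {static_cast<int>(e), theFailureSourceCategory}; }
```

An error code holding FlightsErrc::Timeout then compares equal to FailureSource::SystemError, even though FailureSourceCategory never mentions FlightsErrc: the query goes through SubsystemError::SubsysTimeout.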

One final thing. Sub-systems, like Flights, will likely be sending us error codes as raw ints. In order to turn a raw int into an error_code we will have to convert it to a value of the corresponding enum FlightsErrc. One way is to simply static_cast it:

int ret_value = 20; // it was returned from Flights
std::error_code ec2 = static_cast<FlightsErrc>(ret_value);

Another would be to provide a converting function (probably with a big switch-statement) that would handle the situation when the numeric value is for some reason different from those that sub-system Flights claims to return.
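One possible shape for such a converting function is sketched below, assuming C++17 for std::optional. The function name to_flights_errc is hypothetical, and returning an empty optional is only one of several reasonable ways to report an undocumented value.

```cpp
#include <optional>

enum class FlightsErrc
{
  NonexistentLocations = 10,
  DatesInThePast       = 11,
  InvertedDates        = 12,
  NoFlightsFound       = 20,
  ProtocolViolation    = 30,
  ConnectionError      = 31,
  ResourceError        = 32,
  Timeout              = 33,
};

// Returns the enumerator for a documented value, or std::nullopt when
// Flights sent a number that its documentation does not list.
std::optional<FlightsErrc> to_flights_errc(int raw)
{
  switch (static_cast<FlightsErrc>(raw))
  {
  case FlightsErrc::NonexistentLocations:
  case FlightsErrc::DatesInThePast:
  case FlightsErrc::InvertedDates:
  case FlightsErrc::NoFlightsFound:
  case FlightsErrc::ProtocolViolation:
  case FlightsErrc::ConnectionError:
  case FlightsErrc::ResourceError:
  case FlightsErrc::Timeout:
    return static_cast<FlightsErrc>(raw);

  default:
    return std::nullopt;
  }
}
```

The caller then decides what an unlisted value means, for instance treating it as a protocol violation, before building the std::error_code.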

In closing

The above design is analogous to what has been done for the standard library. Enum std::errc corresponds to our enum SubsystemError: it is registered as error condition. It is a low-level query for determining failure condition in a platform-agnostic way. Its corresponding error category can be obtained by calling function std::generic_category(). Numeric values of error codes, on the other hand, depend on the platform, and may correspond to POSIX error codes on Unix systems and to HRESULT values on Windows. This is analogous to our enums FlightsErrc and Flights2Errc. Their corresponding error category can be obtained by calling function std::system_category(). The platform-specific values are retained for debugging purposes, but in the program we are using the platform-agnostic low-level queries from std::errc.
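A short sketch of the standard-library side of this analogy follows. The helper names is_missing_file and from_errno are hypothetical. How a particular std::system_category() value maps onto std::errc is up to the implementation, so the sketch only relies on what the standard guarantees for std::generic_category().

```cpp
#include <cerrno>
#include <system_error>

// A platform-agnostic query, in the role of our SubsystemError checks.
bool is_missing_file(const std::error_code& ec)
{
  return ec == std::errc::no_such_file_or_directory;
}

// Wraps a raw OS value, in the role of our FlightsErrc codes: the number
// is stored unchanged together with the platform's category.
std::error_code from_errno(int raw)
{
  return {raw, std::system_category()};
}
```

A code made with std::make_error_code(std::errc::no_such_file_or_directory) belongs to std::generic_category() and satisfies is_missing_file. A code made with from_errno keeps the raw number for logging, and whether it satisfies the query depends on the platform's mapping.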

I am very grateful to Tomasz Kamiński for explaining to me the details of error code handling utility.


2 Responses to Using error codes effectively

  1. xABBAAA says:

    And we define a similar mapping for every numeric value from enum FlightsErrc. (If you are unhappy about the assert at the end, you can skip it, or throw an exception: I do not want to focus on such edge conditions in this post.)

    assert is usually good in the early phases of testing, and it is more useful for C. However, that will be useful thing to know. For development phase will do it.
    It looks nice and very good article.
    I just don’t know, could you add some more stuff with specific errors that are standard and then mention the operating system specific.

    THX
