Parsing floats

I was recently forced to revise the way I parse floating-point numbers from GUI forms. I made two observations on the subject that I would like to share with you. For instance, do you know why the conversion from string "1.2" does not render a valid floating-point number?

String representation of floats

My natural choice for the string-to-float conversion tool was boost::lexical_cast. However, it soon turned out that in one of the test cases my program did not recognize seemingly valid numbers as doubles. The innocent expression boost::lexical_cast<double>("1.2") threw an exception each time I ran the program. This might look bizarre at first. It looked bizarre to me until I realized one thing. The rules for converting a text representation of a number from a string into a double are very different from the rules for initializing a double from a literal like 1.2, which in a C++ program is also just a piece of text. We are used to the way literals work: first a sign, then the integral part, then a dot, then the fractional part, then e, then the base-10 exponent; some parts can be skipped:

double x = 1.2;

So, what is the rule for converting a string into a double? The rule depends on the locale. In my case, boost::lexical_cast relied on the global std::locale object, and this was set to the Polish rules for representing decimal numbers, where a comma separates the fractional part, so it expected strings like "1,2". Moreover, some locales allow additional thousands separators: "1.000.000,99".
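The effect can be reproduced without installing any named locale. The following is only a sketch: comma_decimal and parses_fully are hypothetical names, and a plain std::istringstream with an imbued locale stands in for boost::lexical_cast, which is not used here.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: uses a comma as the decimal separator,
// the way Polish number formatting does. No grouping is enabled.
struct comma_decimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Returns true only if the WHOLE string parses as a double under `loc`.
// `out` receives the parsed value on success.
bool parses_fully(std::string const& text, std::locale const& loc, double& out) {
    std::istringstream in(text);
    in.imbue(loc);   // the stream now follows the separators of `loc`
    in >> out;
    // Reject both a failed parse and trailing characters such as ".2".
    return !in.fail() && in.peek() == std::char_traits<char>::eof();
}
```

With a locale built as std::locale(std::locale::classic(), new comma_decimal), the string "1,2" is accepted and "1.2" is rejected, because parsing stops at the dot. With std::locale::classic() it is the other way round.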

If I knew my users would always be using one locale, I could set up the global locale appropriately, but using globals is dangerous: we do not know what other parts of the program we might be affecting. There is an even bigger issue. My users could use different formats at different times. They may be entering columns of numbers into a grid. If they do it manually, they are used to a dot as the decimal separator, but at other times they may be pasting columns copied from documents that were generated in different parts of the world, where a comma is used to separate decimals. My program should be able to tell that both "1,2" and "1.2" are valid decimal (and, for me, floating-point) numbers. This was the first reason I decided to write my own conversion function.
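One way such a function could look is sketched below. This is not the function I ended up with, only an illustration under a strong assumption: the input contains no thousands separators, so a string holding both a dot and a comma is rejected as ambiguous. The name parse_decimal is hypothetical.

```cpp
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>

// Accepts either '.' or ',' as the decimal separator, independently of
// the global locale. Assumption: no grouping separators in the input.
bool parse_decimal(std::string text, double& out) {
    bool const has_dot   = text.find('.') != std::string::npos;
    bool const has_comma = text.find(',') != std::string::npos;
    if (has_dot && has_comma)
        return false;                       // e.g. "1.000,5": ambiguous here

    // Normalize to the dot expected by the classic "C" locale.
    std::replace(text.begin(), text.end(), ',', '.');

    std::istringstream in(text);
    in.imbue(std::locale::classic());       // never consult the global locale
    double value = 0.0;
    in >> value;
    if (in.fail() || in.peek() != std::char_traits<char>::eof())
        return false;                       // not a number, or trailing junk

    out = value;
    return true;
}
```

Because the stream is explicitly imbued with the classic locale, the result does not change when some other part of the program modifies the global locale.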

Conversion failures

One other thing worth noting is that there is a big difference between converting your type to a string and converting a string to your type. Conversion to a string cannot fail due to an invalid value of the object you are converting. On the other hand, not every string is convertible to your type. For instance, "@!!^K" is not a valid floating-point number regardless of locale. It is very likely that the conversion from a string to your type will fail, and in the case of input validation we in fact expect failures. Therefore, throwing an exception when conversion from a string is not possible is not the appropriate semantics. I can still write my validating function as:

// value_ and valid_ are data members of the enclosing class
void validate( std::string input ) {
    try {
        value_ = boost::lexical_cast<double>( input );
        valid_ = true;
    }
    catch( boost::bad_lexical_cast const& ) {
        valid_ = false;
    }
}

But this solution has a couple of problems. The function is not easy to read. The try-catch block indicates that we are handling the process of stack unwinding caused by the error in the program. Stack unwinding is used to break a number of levels of function calls: in our case we are throwing an exception only to catch it immediately one level up — a simple if-statement would be more appropriate. Next, our development environment or program itself may recognize any thrown exception as an error and trigger additional behavior that we do not want here: e.g., in debugging mode it may trigger a debug break. Therefore the following signatures of the conversion functions appear more appropriate in general:

template< typename T >
boost::optional<T> from_string( std::string const& str );

template< typename T >
std::string to_string( T const& obj );

Both functions may still throw exceptions, but only in exceptional cases. A string that does not represent any T is not rare, at least when it comes to validating user input.
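To make the signatures above more concrete, here is a minimal sketch of how they might be filled in. It makes several assumptions: std::optional (C++17) stands in for boost::optional, streams imbued with the classic locale do the conversion, and number_field is a hypothetical class holding the value_ and valid_ members from the earlier example.

```cpp
#include <locale>
#include <optional>
#include <sstream>
#include <string>

// Returns an empty optional instead of throwing when `str` is not a T.
template< typename T >
std::optional<T> from_string( std::string const& str ) {
    std::istringstream in( str );
    in.imbue( std::locale::classic() );   // fixed rules, not the global locale
    T value{};
    in >> value;
    if( in.fail() || in.peek() != std::char_traits<char>::eof() )
        return std::nullopt;              // parse error or trailing characters
    return value;
}

template< typename T >
std::string to_string( T const& obj ) {
    std::ostringstream out;
    out.imbue( std::locale::classic() );
    out << obj;
    return out.str();
}

// Hypothetical form field: validation becomes a plain if-statement.
struct number_field {
    double value_ = 0.0;
    bool valid_ = false;

    void validate( std::string const& input ) {
        if( auto parsed = from_string<double>( input ) ) {
            value_ = *parsed;
            valid_ = true;
        }
        else {
            valid_ = false;
        }
    }
};
```

In this version an invalid string travels through the normal return path, so a debugger configured to break on thrown exceptions stays quiet during input validation.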
