Programming

Parsing an exact decimal value using GMP

For Leaf, I need to parse decimal numbers to retain the exact value of the input without rounding. I assumed such a library would exist in C++, but nothing I found handled decimal values directly. After considering several libraries, I ended up using GMP and wrote a small parsing routine. Why I’m doing this is covered in my article 1/2 should not equal zero.

Decimal trouble

We already know that “0.1” can’t be represented in binary floating point. You can see this quickly using octave:

> output_precision(20)
> 0.1
ans =  1.0000000000000000555e-01

The same problem crops up for large integers: The number “1234567890123456789012” is too big for a 64-bit integer, and if you parse to a double you will lose the final digits. Most of the time you are either not dealing with such large numbers or the precision loss is acceptable. That’s not always the case though, and I have one such situation. I don’t want an approximate value: I want the exact number.

Rational values

GMP can represent any decimal number exactly by using the rational number type. We know this since any decimal can always be expressed as a rational without loss of precision.

1.5 => 15/10 => 3/2
0.97 => 97/100
3.89e2 => 389/1

Parsing decimals into rationals

There is unfortunately no standard method to parse a decimal value into the rational type. Guided by several test cases, I developed a small algorithm to convert a decimal value from a string into a rational number. Here is the outline of that algorithm (the actual C++ code used in Leaf is listed afterward).

- parse the string according to the pattern +/-X.Ye+/-Z, where X is the whole number part, Y is the fractional part, and Z is the base 10 exponent

//create whole
- assign Q = rational( parse_integer(X) / 1 )

//add fraction
- let P = parse_integer(Y)
- let D = 10 ^ (strlen(Y))
- let F = rational(P/D)
- assign Q = Q + F

//multiply exponent
- let E = parse_integer(Z)
- let D = 10 ^ E
- if exponent sign is negative
-- assign Q = Q / D
- else
-- assign Q = Q * D

//apply sign
- if sign is negative
-- assign Q = Q * -1

Code

I’ve listed the current version of the implementation for Leaf below. I use boost::regex to parse the bits and the C++ interface for GMP. ‘mpz_class’ is an arbitrary size integer and ‘mpq_class’ is the rational. I am very lenient about the input format as I assume we’ve already identified the string as a number.

Note the “TODO” about large exponents. If the user were to enter a very larger exponent, the resulting rational would become quite large. For my purposes I’ll use a high-precision floating point value if the exponent gets too large.

/*static*/ number number::parse_decimal( std::string const & str, unsigned & meta )
{
    meta = pdf_is_exact;

    //this is intentionally far more lenient than the one in node_parser to retain flexibility
    static boost::regex re_num( "([+-]?)([0-9]*)(\\.([0-9]*))?([eE]([+-]?)([0-9]+))?" );
    boost::match_results<std::string::const_iterator> what;
    if( !boost::regex_match( str, what, re_num ) )
        PRE_FAIL( std::string( "invalid number: " + str ) );

    mpz_class part;
    mpq_class combine;
    bool negative = false;

    if( what[1].matched && *what[1].first == '-' )
        negative = true;

    if( what[2].matched )
        combine = parse_bitz( what[2].first, what[2].second );

    if( what[3].matched ) {
        meta |= pdf_had_decimal;
        unsigned len = unsigned( what[4].second - what[4].first );
        if( len ) {
            part = parse_bitz( what[4].first, what[4].second );
            mpz_class div(10);
            mpz_pow_ui( div.get_mpz_t(), div.get_mpz_t(), len );
            combine += mpq_class( part, div );
        }
    }

    if( what[5].matched ) {
        meta |= pdf_had_exponent;
        mpz_class e = parse_bitz( what[7].first, what[7].second );
        mpz_class base(10), base_exp;

        //TODO: check that "e" is constrained to unsigned long
        mpz_pow_ui( base_exp.get_mpz_t(), base.get_mpz_t(), e.get_ui() );

        if( *what[6].first == '-' )
            combine /= base_exp;
        else
            combine *= base_exp;
    }

    if( negative )
        combine *= -1;
    combine.canonicalize();
    return number( combine );
}

Categories: Programming

Tagged as: , , ,

7 replies »

  1. What are you doing with the number? If you just need identity you could leave it as a string.

    • I need to do constant folding on operations involving that number as well as determine the best type needed to store the result (8-128bit integer/float). I’ll give more details in a followup about Leaf number conversion.

    • I did look at that. It appears to use GMP as a backend, though it also adds a base ten float (of limited precision however). It seems if one were to do a lot of math work in C++ it would be a good way to go, but my uses are quite limited.

    • No, I was really hoping to avoid getting too deep into the topic. My needs for this are quite limited in the Leaf project.

    • It wont help for the big numbers part, but for rationals i think it’s the way to go, not difficult (it’s done on less than 200 C lines) and converge quickly

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s