Tags

, , ,

For Leaf, I need to parse decimal numbers to retain the exact value of the input without rounding. I assumed such a library would exist in C++, but nothing I found handled decimal values directly. After considering several libraries, I ended up using GMP and wrote a small parsing routine. Why I’m doing this is covered in my article 1/2 should not equal zero.

Decimal trouble

We already know that “0.1” can’t be represented in binary floating point. You can see this quickly using octave:

> output_precision(20)
> 0.1
ans =  1.0000000000000000555e-01

The same problem crops up for large integers: The number “1234567890123456789012” is too big for a 64-bit integer, and if you parse to a double you will lose the final digits. Most of the time you are either not dealing with such large numbers or the precision loss is acceptable. That’s not always the case though, and I have one such situation. I don’t want an approximate value: I want the exact number.

Rational values

GMP can represent any decimal number exactly by using the rational number type. We know this since any decimal can always be expressed as a rational without loss of precision.

1.5 => 15/10 => 3/2
0.97 => 97/100
3.89e2 => 389/1

Parsing decimals into rationals

There is unfortunately no standard method to parse a decimal value into the rational type. Guided by several test cases, I developed a small algorithm to convert a decimal value from a string into a rational number. Here is the outline of that algorithm (the actual C++ code used in Leaf is listed afterward).

- parse the string according to the pattern +/-X.Ye+/-Z, where X is the whole number part, Y is the fractional part, and Z is the base 10 exponent

//create whole
- assign Q = rational( parse_integer(X) / 1 )

//add fraction
- let P = parse_integer(Y)
- let D = 10 ^ (strlen(Y))
- let F = rational(P/D)
- assign Q = Q + F

//multiply exponent
- let E = parse_integer(Z)
- let D = 10 ^ E
- if exponent sign is negative
-- assign Q = Q / D
- else
-- assign Q = Q * D

//apply sign
- if sign is negative
-- assign Q = Q * -1

Code

I’ve listed the current version of the implementation for Leaf below. I use boost::regex to parse the bits and the C++ interface for GMP. ‘mpz_class’ is an arbitrary size integer and ‘mpq_class’ is the rational. I am very lenient about the input format as I assume we’ve already identified the string as a number.

Note the “TODO” about large exponents. If the user were to enter a very larger exponent, the resulting rational would become quite large. For my purposes I’ll use a high-precision floating point value if the exponent gets too large.

/*static*/ number number::parse_decimal( std::string const & str, unsigned & meta )
{
    meta = pdf_is_exact;

    //this is intentionally far more lenient than the one in node_parser to retain flexibility
    static boost::regex re_num( "([+-]?)([0-9]*)(\\.([0-9]*))?([eE]([+-]?)([0-9]+))?" );
    boost::match_results<std::string::const_iterator> what;
    if( !boost::regex_match( str, what, re_num ) )
        PRE_FAIL( std::string( "invalid number: " + str ) );

    mpz_class part;
    mpq_class combine;
    bool negative = false;

    if( what[1].matched && *what[1].first == '-' )
        negative = true;

    if( what[2].matched )
        combine = parse_bitz( what[2].first, what[2].second );

    if( what[3].matched ) {
        meta |= pdf_had_decimal;
        unsigned len = unsigned( what[4].second - what[4].first );
        if( len ) {
            part = parse_bitz( what[4].first, what[4].second );
            mpz_class div(10);
            mpz_pow_ui( div.get_mpz_t(), div.get_mpz_t(), len );
            combine += mpq_class( part, div );
        }
    }

    if( what[5].matched ) {
        meta |= pdf_had_exponent;
        mpz_class e = parse_bitz( what[7].first, what[7].second );
        mpz_class base(10), base_exp;

        //TODO: check that "e" is constrained to unsigned long
        mpz_pow_ui( base_exp.get_mpz_t(), base.get_mpz_t(), e.get_ui() );

        if( *what[6].first == '-' )
            combine /= base_exp;
        else
            combine *= base_exp;
    }

    if( negative )
        combine *= -1;
    combine.canonicalize();
    return number( combine );
}
About these ads