There’s something wrong when a language allows 1/2 to equal 0. It’s easy to understand why this happens, and perhaps it’s even easy to forgive the limitations of C. Nonetheless, it’s irksome and should be addressed by new languages. The following C++ code demonstrates the problem: both ‘a’ and ‘b’ are assigned the value 0.
int a = 1/2;
float b = 1/2;
Note: This is a follow-up to my article “Parsing an exact decimal value using GMP”. This article explains why the Leaf compiler needs that functionality.
The Type Problem
The problem originates in the way types are handled. The compiler sees the ‘1’ and the ‘2’ individually and decides they are both integers. The division is then handled as integer division, which results in 0. When assigned to a floating point value, the integer 0 is happily promoted to a floating point 0. The part which produces the 0 is disconnected from the assignment, so the compiler doesn’t see anything wrong.
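The disconnect between the operand types and the assignment can be seen directly in C++: only when one operand is already a floating point value does the division itself produce a fraction. A minimal sketch (the variable names are illustrative only):

```cpp
// Both operands are 'int', so integer division truncates to 0
// before the result is ever converted for the assignment.
int   truncated = 1 / 2;   // 0
float converted = 1 / 2;   // int 0, then converted to 0.0f

// Making one operand a 'float' selects floating point division instead.
float fraction  = 1.0f / 2;  // 0.5f
```

The fix in C++ is therefore always a change to the operands, never to the variable being assigned.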
Large integer constants suffer from a similar problem. It used to be that in C++ an integer constant was always just an “int”. The standard now mandates that a constant too large for “int” be given the type “long” or “long long”, but even compilers that implement this correctly still leave some really annoying problems open.
int a = 2147483648;  // const 'long' converted to 'int': the value does not fit
std::cout << a << std::endl;
std::cout << 2147483648 << std::endl;  // outputting the 'long' directly works fine
// output:
// -2147483648
// 2147483648

int64_t c = 2147483647 + 1;  // 'int' + 'int' overflows 'int'
int64_t d = 2147483648;      // 'long' holds the value just fine
std::cout << c << std::endl;
std::cout << d << std::endl;
// output:
// -2147483648
// 2147483648
For 2147483647 the compiler chooses an ‘int’, since the value still fits within that type (it is the largest 32-bit signed integer). Adding 1 to that value overflows the ‘int’, which yields the negative value on a two’s complement machine; strictly speaking, signed overflow is undefined behavior according to the standard. For 2147483648 the compiler chooses a ‘long’ (or ‘long long’ where ‘long’ is only 32 bits), since the value no longer fits in an ‘int’. Converting it back to ‘int’ wraps it modulo 2^32; before C++20 the standard left the result of such a conversion implementation-defined, and I find code relying on it somewhat suspect either way. At least GCC does produce a warning for the ‘2147483647+1’ expression.
Integers are not the only victims of this numeric typing problem. It applies to floating point numbers too; my initial example is one case in point. Below is another example using the Java BigDecimal class. The following two constructions of a BigDecimal don’t actually produce the same value:
System.out.println( "0.1 == " + new BigDecimal(0.1) );
System.out.println( "\"0.1\" == " + new BigDecimal("0.1") );
// prints:
// 0.1 == 0.1000000000000000055511151231257827021181583404541015625
// "0.1" == 0.1
The first line will treat the ‘0.1’ as a double value before it gets passed to BigDecimal. The second line will produce a proper 0.1 value, as BigDecimal does its own parsing. It seems kind of silly that a constant value, which is an exact decimal value, cannot be properly converted to an exact decimal type!
We fall victim to the same problem when we try to assign large values to integers. Say, for example, you wish to define constants for various divisions of a second. The logical choice would be scientific notation, since it’s easy to read. However, the language makes every number written in scientific notation a floating point value rather than an integer.
// would be nice
const long nano = 1e9;
const long micro = 1e6;
const long milli = 1e3;

// but alas
const long nano = 1000000000;
const long micro = 1000000;
const long milli = 1000;
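Why the first form is unsatisfying can be checked at compile time: in C++ a literal in scientific notation always has type ‘double’, even when its value is a whole number, so those ‘long’ constants are only obtained through a floating point to integer conversion. A small check using the standard <type_traits> header:

```cpp
#include <type_traits>

// Scientific notation always yields a floating point literal in C++,
// regardless of whether the value happens to be integral.
static_assert(std::is_same<decltype(1e9), double>::value, "1e9 is a double");
static_assert(std::is_same<decltype(1000000000), int>::value, "plain literal is an int");

// Initialization with '=' silently converts the double to 'long' ...
const long nano = 1e9;
// ... whereas the standard treats brace initialization from the same
// constant as an ill-formed narrowing conversion:
//   const long nano_braced{1e9};  // error
```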
A Rational Solution
When we type constant values into our code, we expect to get the exact value we actually typed in. Of course we assume that the value will be converted to the target type — but without any shenanigans prior to that point. If the target type cannot hold the value, an error should be generated.
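The rule in this paragraph can be approximated in today’s C++ for integer constants. The sketch below uses a hypothetical helper, ‘exact_cast’ (not a standard function): the constant is held in the widest signed type and converted only if the target can represent it. Because the function is constexpr, using it in a constant expression turns an out-of-range value into a compile-time error.

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert an integer constant to the signed type T
// only if T can hold it. In a constant expression the 'throw' makes
// compilation fail; at runtime it raises std::out_of_range.
template <typename T>
constexpr T exact_cast(long long value) {
    static_assert(std::numeric_limits<T>::is_signed, "signed targets only");
    return (value < static_cast<long long>(std::numeric_limits<T>::min()) ||
            value > static_cast<long long>(std::numeric_limits<T>::max()))
               ? throw std::out_of_range("constant does not fit target type")
               : static_cast<T>(value);
}

constexpr int fits = exact_cast<int>(2147483647LL);  // okay
// constexpr int too_big = exact_cast<int>(2147483648LL);  // compile-time error
```

This only covers integral targets; the sections below extend the same idea to fractions and floating point values.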
Beyond the fundamental types, I would also expect custom types to be able to use constants. If I have a BigDecimal class, I should be able to use a normal constant number and have that class accept the full value. (In C++11 this is actually possible via constexpr and user-defined literal suffixes, albeit with somewhat unclear syntax.)
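To illustrate the C++11 route: a raw literal operator receives the literal’s characters exactly as they were typed, before any conversion to ‘double’. The ‘Decimal’ type and ‘_dec’ suffix below are hypothetical, and this sketch only handles plain digits with an optional decimal point (no exponent, no overflow checks).

```cpp
#include <stdexcept>

// Hypothetical exact decimal: value == unscaled / 10^scale.
struct Decimal {
    long long unscaled;
    int scale;
};

// Raw literal operator: for 0.1_dec it is called with the string "0.1",
// so the value never passes through a binary floating point type.
Decimal operator""_dec(const char* text) {
    Decimal d{0, 0};
    bool after_point = false;
    for (const char* p = text; *p != '\0'; ++p) {
        if (*p == '.') {
            after_point = true;
        } else if (*p >= '0' && *p <= '9') {
            d.unscaled = d.unscaled * 10 + (*p - '0');
            if (after_point) ++d.scale;  // each fractional digit adds a power of ten
        } else {
            throw std::invalid_argument("unsupported literal form");
        }
    }
    return d;
}
```

With this in place, ‘Decimal tenth = 0.1_dec;’ yields an unscaled value of 1 with a scale of 1, which is exactly one tenth.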
For Leaf I’m taking the approach that all constants are rational numbers. During compilation they retain their exact value until assignment. These rationals are also retained during constant folding, so our initial ‘1/2’ expression is handled correctly.
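As an illustration only (this is not Leaf’s compiler code), the idea can be sketched in C++ with a hypothetical ‘Rational’ type over ‘long long’; a real implementation would need arbitrary precision, such as GMP from the earlier article. Division folds to an exact fraction, and conversion to the target type happens, with a check, only at assignment.

```cpp
#include <stdexcept>

// Hypothetical exact constant: num/den in lowest terms, den > 0.
struct Rational {
    long long num, den;
};

long long gcd_ll(long long a, long long b) {
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) { long long t = a % b; a = b; b = t; }
    return a;
}

Rational make_rational(long long num, long long den) {
    if (den == 0) throw std::domain_error("division by zero in constant");
    if (den < 0) { num = -num; den = -den; }  // keep the sign in the numerator
    long long g = gcd_ll(num, den);
    return Rational{num / g, den / g};
}

// Constant folding of 'a / b' keeps the exact quotient (no overflow checks).
Rational fold_divide(Rational a, Rational b) {
    return make_rational(a.num * b.den, a.den * b.num);
}

// Assignment to an integer target: reject any fractional part.
long long assign_to_int(Rational r) {
    if (r.den != 1) throw std::domain_error("error: truncation");
    return r.num;
}

// Assignment to a floating point target: one correctly rounded division,
// provided num and den are exactly representable as double.
double assign_to_double(Rational r) {
    return static_cast<double>(r.num) / static_cast<double>(r.den);
}
```

Folding ‘1/2’ gives the rational 1/2, which ‘assign_to_double’ turns into 0.5 while ‘assign_to_int’ reports truncation, matching the behavior described next.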
int a = 1/2;    //error: truncation
float b = 1/2;  //okay: b equals 0.5
Working with large integral constants should also be painless. Thus scientific notation should not imply a floating point value: whether it is integral or fractional depends on the value itself.
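One way a literal such as 1e9 or 1.23e1 could be read exactly, instead of as a ‘double’: the digits become an integer numerator, while the decimal point and the exponent only move powers of ten between numerator and denominator. The ‘parse_exact’ function and ‘Exact’ type are hypothetical; this sketch uses ‘long long’ instead of arbitrary precision and assumes a well-formed literal with a non-negative mantissa.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical exact literal value: num / den, reduced.
struct Exact {
    long long num, den;
    bool integral() const { return den == 1; }
};

static long long pow10_ll(int n) {
    long long p = 1;
    while (n-- > 0) p *= 10;
    return p;
}

static long long gcd_ll(long long a, long long b) {
    while (b != 0) { long long t = a % b; a = b; b = t; }
    return a;
}

// Parses "<digits>[.<digits>][e[+-]<digits>]" exactly (no overflow checks).
Exact parse_exact(const std::string& text) {
    long long digits = 0;
    int exponent = 0;  // power of ten applied to 'digits'
    bool after_point = false;
    std::size_t i = 0;
    for (; i < text.size() && text[i] != 'e' && text[i] != 'E'; ++i) {
        if (text[i] == '.') { after_point = true; continue; }
        digits = digits * 10 + (text[i] - '0');
        if (after_point) --exponent;  // each fractional digit divides by ten
    }
    if (i < text.size()) exponent += std::atoi(text.c_str() + i + 1);
    long long num = digits, den = 1;
    if (exponent >= 0) num *= pow10_ll(exponent);
    else den = pow10_ll(-exponent);
    long long g = gcd_ll(num, den);
    return Exact{num / g, den / g};
}
```

Here ‘1e9’ becomes the integer 1000000000, ‘1.23e1’ becomes 123/10 and is flagged as fractional, and ‘1.2e1’ reduces to the integer 12.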
int a = 1e9;      //okay
int b = 1.23e1;   //error: truncation
float c = 1.23e1; //okay
High Precision Floating Point
Rationals don’t cover all the constants you might want to use. Many calculations, such as physics computations, are simply better expressed with floating point values. Though several natural constants are exact, even the simplest of calculations would require too many digits to maintain that exactness. The intent of the solution should not be lost, however: for floating point values, a higher precision is still beneficial. Instead of processing constants in a native floating point size, the compiler will use a very high precision floating point type.
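The benefit of folding in a wider type and rounding only once can be seen even without an arbitrary precision library. In this C++ sketch ‘long double’ merely stands in for the compiler’s very high precision type (on some platforms it is no wider than ‘double’, which is still enough here). The same expression evaluated step by step in ‘float’ loses the small term entirely.

```cpp
// Evaluate 1e8 + 1 - 1e8 with every intermediate rounded to 'float'.
// Near 1e8 adjacent floats are 8 apart, so 1e8 + 1 rounds back to 1e8.
float fold_in_float() {
    float big = 1e8f;
    float sum = big + 1.0f;  // rounded to 100000000.0f
    return sum - big;        // 0.0f
}

// Evaluate the same expression in a wider type and round once at the end,
// as a compiler folding constants in high precision would.
float fold_in_high_precision() {
    long double big = 1e8L;
    long double sum = big + 1.0L;           // exact: 100000001
    return static_cast<float>(sum - big);   // 1.0f
}
```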
This approach smooths over the nuances of common constants such as π or e. The compiler offers expansions of these values well beyond the limits of the target types. You can then write expressions like ‘π/2’ which still produce a full precision result. There is no need for extra constants like ‘M_PI_2’, no worrying about whether you have the ‘double’ or ‘long double’ version (ex. ‘M_PI_2l’), and no forgetting constant suffixes (like ‘l’). Applications where high-precision floating point truly helps will, of course, be far less common than those using constant rationals.
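The suffix pitfall is easy to reproduce in C++: without the ‘L’ suffix a long literal is first rounded to ‘double’, so a ‘long double’ variable silently loses the extra digits. In this small sketch the digits of π are written out by hand; whether the two values actually differ depends on ‘long double’ being wider than ‘double’ on the platform.

```cpp
#include <limits>

// Same digits of pi; only the suffix differs.
const long double pi_via_double = 3.14159265358979323846264338327950288;   // rounded to double first
const long double pi_full       = 3.14159265358979323846264338327950288L;  // parsed as long double

// True only where long double actually has more mantissa bits than double.
const bool long_double_is_wider =
    std::numeric_limits<long double>::digits > std::numeric_limits<double>::digits;
```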
This approach also helps identify out-of-range floating point values. If the resulting exponent is too high or too low, an error can be reported. This is particularly useful if small floating point types are being used (ex. ‘1e50’ doesn’t fit in a 32-bit float). Of course, range and precision problems will still occur at runtime, but at least the compile-time constants won’t break.
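A minimal sketch of such a range check, assuming the constant has already been folded into a wide type (‘long double’ again stands in for the compiler’s high precision value, and ‘check_float_range’ is a hypothetical name). A magnitude beyond the largest ‘float’ is rejected before converting, since converting an out-of-range value is undefined; a nonzero value that rounds to zero is reported as underflow.

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>

// Reject a folded constant that a 32-bit 'float' cannot represent.
float check_float_range(long double value) {
    // Check overflow first: converting an out-of-range value is undefined.
    if (std::fabs(value) > std::numeric_limits<float>::max())
        throw std::range_error("constant overflows float");
    float result = static_cast<float>(value);
    // A nonzero constant that rounds to zero has underflowed.
    if (value != 0 && result == 0.0f)
        throw std::range_error("constant underflows float");
    return result;
}
```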