Defective Language

1/2 Should Not Equal Zero

There’s something wrong when a language allows 1/2 to equal 0. It’s easy to understand why this happens, and perhaps it’s even easy to forgive the limitations of C. Nonetheless it’s irksome and should be addressed by new languages. The following C++ code demonstrates the problem: both ‘a’ and ‘b’ are assigned the value 0.

int a = 1/2;
float b = 1/2;

I expect the first line to produce a compilation error and the second line to assign 0.5 to ‘b’. Even with all warnings enabled, GCC does not report any problems with this code. The result is the same in C and Java. In Python 2, where you can’t specify the type, the result is also 0 (Python 3 changed ‘/’ to true division, which yields 0.5). However, these results are not universal; in JavaScript and MATLAB the result is 0.5.

Note: This is a followup to my article “Parsing an exact decimal value using GMP”. This article explains why the Leaf compiler needs that functionality.

The Type Problem

The problem originates in the way types are handled. The compiler sees the ‘1’ and the ‘2’ individually and decides they are both integers. The division is then handled as integer division, which results in 0. When assigned to a floating point value, the integer 0 is happily promoted to a floating point 0. The part which produces the 0 is disconnected from the assignment, so the compiler doesn’t see anything wrong.

Large integer constants also suffer from a similar problem. It used to be that in C++ an integer constant was always just an “int” type. The standard now mandates that “long” or “long long” should be used, though even compilers that do this correctly still leave open really annoying problems.

int a = 2147483648; // the 'long' constant is converted to 'int' and overflows
std::cout << a << std::endl;
std::cout << 2147483648 << std::endl; // directly outputting the 'long' works fine
//output
-2147483648
2147483648

int64_t a =  2147483647+1; // 'int' + 'int' overflows 'int'
int64_t b = 2147483648; // 'long' holds value just fine
std::cout << a << std::endl;
std::cout << b << std::endl;
//output
-2147483648
2147483648

For 2147483647 the compiler chooses an ‘int’ since the value still fits within that type (it is the largest 32-bit signed integer). Adding 1 to that value overflows the ‘int’, and the negative result comes from the 2’s complement representation; strictly speaking, signed overflow is undefined behaviour according to the standard. For 2147483648 the compiler chooses a ‘long’ since it doesn’t fit in ‘int’ anymore. The conversion back to ‘int’ follows a 2’s complement modulo rule in practice, though the standard actually leaves out-of-range signed conversions implementation-defined, so I find code relying on this somewhat suspect. At least GCC does produce a warning for the ‘2147483647+1’ expression.

Integers are not the only values corrupted by this numeric typing problem; it applies to floating point numbers too, and my initial example is one case in point. Below is another example, using Java’s BigDecimal class. The following two constructions of a BigDecimal don’t actually produce the same value:

System.out.println( "0.1 == " + new BigDecimal(0.1) );
System.out.println( "\"0.1\" == " + new BigDecimal("0.1") );
//prints
0.1 == 0.1000000000000000055511151231257827021181583404541015625
"0.1" == 0.1

The first line will treat the ‘0.1’ as a double value before it gets passed to BigDecimal. The second line will produce a proper 0.1 value as BigDecimal does its own parsing. It seems kind of silly that a constant value, which is an exact decimal value, cannot be properly converted to an exact decimal type!

We fall victim to the same problem when we try to assign large values to integers. For example, say you wish to produce constants for various divisions of a second. The logical thing here would be to use scientific notation since it’s easy to read. However, the language forces all such numbers to be floating point instead of integers.

//would be nice
const long nano = 1e9;
const long micro = 1e6;
const long milli = 1e3;

//but alas
const long nano = 1000000000;
const long micro = 1000000;
const long milli = 1000;

A Rational Solution

When we type constant values into our code, we expect to get the exact value we actually typed in. Of course we assume that the value will be converted to the target type — but without any shenanigans prior to that point. If the target type cannot hold the value, an error should be generated.

Beyond the fundamental types, I would also expect custom types to be able to use constants. If I have a BigDecimal class, I should be able to use a normal constant number and have that class accept the full value. (In C++11 this is actually possible via constexpr and user-defined literal suffixes, though the syntax is somewhat unclear.)

For Leaf I’m taking the approach that all constants are rational numbers. During compilation they retain their exact value until assignment. These rationals are also retained during constant folding, so our initial ‘1/2’ expression is handled correctly.

int a = 1/2; //error: truncation
float b = 1/2; //okay: b equals 0.5

Working with large integral constants should also be painless. Thus scientific notation should not imply a floating point value: whether it is integral or fractional depends on the value itself.

int a = 1e9; //okay
int b = 1.23e1; //error: truncation
float c = 1.23e1; //okay

High Precision Floating Point

Rationals don’t cover all the constants you might want to use. Many calculations, such as physics computations, will simply be better expressed with floating points. Though several natural constants are exact, even the simplest of calculations would require too many digits of precision to maintain that exactness. The intent of the solution should not be lost however, and for floating point values, using a higher precision is beneficial. Instead of processing constants in a native floating point size, a very high precision floating point will be used.

This approach alleviates nuances when dealing with common constants, such as π or e. The compiler offers expansions of these values well beyond the limits of the target types. You can then write expressions like ‘π/2’ which will still produce a full precision result. No need for extra constants like ‘M_PI_2’, or worrying about whether you have the ‘double’ or ‘long double’ version (ex. ‘M_PI_2l’), or forgetting constant suffixes (like ‘l’). Applications where high-precision floating point truly helps will of course be a lot less common than constant rationals.

This approach also helps identify out-of-range floating point values. If the resulting exponent is too high or too low, an error can be reported. This is particularly useful if small floating point types are being used (ex. ‘1e50’ doesn’t fit in a 32-bit float). Of course at runtime range and precision problems will still occur, but at least the compile-time constants won’t break.

23 replies

    • Yes, this is the big question. What does one do with operations that are not closed on the data types they involve? It’s easy for me to say the ‘const’ case should be fixed, but I am also contemplating the variable case.

    • In a (good) statically typed language it’s fairly straightforward, I think. In Ocaml, (/) has type int -> int -> int. The types give you a pretty clear indication of what 1/2 could possibly be. If you want something other than integer math you have to use (/.), which has type float -> float -> float. There is also no automatic conversion from int to float. It is consistent and clear.

    • I don’t think distinct operators solve the problem. Consider that subtraction has a similar issue: near the minimum of the integer range, subtraction is no longer well defined either. Nor is addition near the maximum, nor division or multiplication of too widely separated floats.

      So I’ve never considered the approach of distinct operators a good one since it doesn’t really solve the problem and certainly makes the language harder to use (especially if you allow custom numeric types).

  1. (I can’t reply to your reply)

    I think you’re conflating issues.

    1 – What does division mean when the divisor doesn’t go into the numerator evenly? This is a definition problem. The beginning of your post argues the definition is wrong. I don’t agree with this, but types let you at least see what the definition is. If (/) is int -> int -> int then obviously I cannot get a floating value.

    2 – Operations become undefined (in some languages) once they hit outside their range. This is a different issue altogether. The only language I can think of that really makes an attempt on this is Ada, otherwise most languages just do what their underlying C implementation does.

    • I’m saying the distinct operators solution isn’t a good one since it doesn’t actually address #2. That is, I don’t like the idea of having a different operator for each different type of variable. I think it makes it even worse for the programmer. Consider if ‘/’ is (int,int)->(int) then ‘1/2=0’ but even ‘1.0/2.0=0’ (assuming you don’t get a compile error of type conversion).

      Separate operators also become extremely burdensome with custom types and modifiers. Do unsigned and signed values get different operators? What about complex values? Doubles vs. floats? BigDecimal? The list goes on. Having a single ‘/’ operator meaning division is the best approach.

    • “assuming you don’t get a compile error of type conversion”

      Then don’t have your language do automatic conversion (Ocaml does not, 1.0/2.0 is a type error).

      “isn’t a good one since it doesn’t actually address #2”

      And I am saying you are conflating two problems, which don’t necessarily need the same solution. One is the definition of an operation. The other is what to do on the edges of that operation.

      “Separate operators also become extremely burdensome with custom types and modifiers. Do unsigned and signed values get different operators?”

      It depends on the goal of your language. Higher level languages tend to only have a few numeric types. Or some languages have typeclasses to deal with this, which gives you one operator but you can only use it in particular contexts. In Ocaml you can also locally import a module, so Float.(1.0/2.0) is the same as 1.0 /. 2.0.

    • I find this kind of attitude to be a bit strange. Sure, if no better solution is available, then perhaps having multiple operators for numerics is a possible way to go. But it seems parochial to say “don’t even bother looking for a better way.”

    • Where does it say not to look? If anything I argue that there are two problems, not one, and one of them can be addressed with separate operators and I don’t provide a solution to the second problem but I suggest looking at Ada.

  2. Hi orbitz, what I am trying to say is just that having multiple operators is a bit of a kludge – it’s not a very aesthetically pleasing solution, and it certainly complicates both the syntax and conceptual framework of the language. If an alternative that works well can be found, then I tend to think it is worth pursuing to see how that might work. Especially since the intent appears to be for Leaf to be used as a systems language similar to C and C++ – thus with the option to use a variety of integer and floating point types.

    • “and it certainly complicates both the syntax and conceptual framework of the language”

      It does not complicate the syntax, it’s just another function, not new syntax. And I believe you can argue that multiple operators simplifies the conceptual framework. In C, for example, what does “a/b” do? Well…it depends. What types do a and b have? Do any of the types need promotion? Can they be promoted safely? What about their rank? In Ocaml, a and b are ints (unless you redefined (/) somewhere). That’s all there is to it. Now, you can argue this doesn’t scale when you have as many numerical types as C does, which is probably true. But it’s not because it’s conceptually complicated, but because it’s just annoying. Something like typeclasses solves this to a degree (as suggested above).

    • I have a feeling that’s one of the reasons the author is trying to stick with the same symbols for multiply and divide for Leaf. For my part I did a search for Ocaml and it’s quite an interesting idea.

  3. int a = 1/2;
    float b = 1/2;
    

    I think in the first case the result should be zero (the non-fractional part only) instead of a compilation error, and in the second case the result should be 0.5, because the expression is being assigned to a floating point variable.

    +1 for mentioning undefined behaviour when going beyond the max int val.

    • This would be a fairly minor adjustment once the rationals are in place. Essentially what you’re saying is that the expression “1/2” is the value 0.5, but when converting to an integer it simply truncates instead of giving an error. This is a possibility, but it would violate my rule of no data loss in implicit conversions. It’s not an absolute rule however, since obviously converting 0.1 to a float also involves precision loss, albeit not nearly as much as the conversion to integer.

      A question about the first one. If I do ‘a = 1/2*4’ would you expect a to equal 2 or 0?

      (Also, there will be the option to simply do ‘ trunc(1/2)’ to get a zero value.)

    • What is (1/2)? As in, does the value of this expression depend on what it’s assigned to? If so, that is very wrong for a programming language. If it’s always 0.5 and assigning to an int makes it 0, that is better, but still not ideal since integer math is precise and float math is not.

  4. Two of the main reasons why integer math behaves as it does in C++ are that

    1. A computer may support integer math, but that doesn’t mean it supports floating point. You can’t have operations on one producing results of the other (at least in the integer -> fp direction).

    2. You really do not want to have to truncate every time you do a division and use the result as an index into an array.

    • My concern primarily revolves around constant values. Should you have two integer types I am not suggesting their division result in a float. I only wish to indicate a constant/literal number should not automatically be treated as an integer.

      In Leaf I think I’ve solved the problem by having value literals function as rationals, retaining an exact value until converted to the target type.

    • It looks more like // is a floored division rather than an actual integer division. This seems like a really weird feature; I’d prefer to explicitly call a floor function.

      >>> 4.5//0.5
      9.0
      >>> 4.5//3
      1.0
      >>> type(2.0//2)
      float
      >>> -4.5//2
      -3.0

    • You are entirely correct; the Python docs call // floor division.
      Note that

      type(2/2) is float
      type(2//2) is int
      type(2.0/2) is float
      type(2/2.0) is float

      so // will return an int if given two ints and return a float otherwise.

  5. Try this in Haskell:
    > let a = 1/2
    a :: Fractional a => a
    > a
    0.5

    Now try to force integer:
    > let a = 1/2 :: Integer
    Error: No instance for (Fractional Integer) arising from a use of ‘/’
