Programming

What a compiler does: type conversion basics

Determining how to convert object types and ensuring they are correct is a significant component of what a compiler does. For every operation the compiler must determine exactly what type is required and do the appropriate conversion. Or, if no conversion is possible emit an error. I’ll introduce the basics of type conversion in this article.

Basic type assignment

Each expression in a system, this includes all sub-expressions and individual variables, must have their type identified. This is usually a multiple step process, but it depends heavily on the language. For example:

1
b + c

The compiler may do a first pass and determine this expression is:

1
(b : integer) + (c : float)

It does this by first resolving the symbols. It’s internal data structure for each symbol will include the type of that symbol. In the above the symbol table entry for b indicates it is an integer, and c is a float.

The resulting type of this operation must also be known. Even in this simple case of adding two values, languages can differ significantly in what happens. Is the result an integer, a single precision float, a double precision float, or perhaps it’s just an error?

Let’s assume our language says it’s a float. Chances are the system doesn’t have a function that adds an integer and a float directly. The compiler must convert them both to the same type first. At a high-level the expression may become:

1
add( integer_to_float( b ), c )

Parameter type conversion

Addition is an internal operation and perhaps a special case. How does the compiler know the target type in the general case?

Functions, like other symbols, also have a type. The type is a function signature, providing both input and output types. For example, the C function sqrt takes a double as input and returns a double. Unfortunately C doesn’t have a way to talk about this signature without having a type name as well. I’ll adopt Leaf’s syntax instead here, where we can say the type of sqrt is ( : double ) -> ( : double ).

When the compiler types code it also assigns types to the functions:

1
2
3
sqrt( a )
    sqrt : ( : double ) -> ( : double )
    a : integer

The compiler now knows that a must be converted to a double. In this case it will use an integer_to_double conversion. It also knows the overall type of this expression is double: it’s what sqrt returns.

What if there is no conversion?

1
2
3
sqrt( "Hello" )
    sqrt ( : double ) -> ( : double )
    "Hello" : char const *

In this case the compiler will attempt to convert a string to a floating point type. It doesn’t know how to do this. An error will be reported and compilation fails.

Overload resolution and ranking

In C things are simple as there are no function overloads. But in many other type languages you can overload a function, each version accepting different parameters and providing different results. In C++ the sqrt function has many different forms. One for double, float and long double. C++ 11 adds a formal integer template to the list, and we also have one for the complex template. The user may also further overload it. Let’s keep the list small for our example however:

1
2
3
4
5
6
7
sqrt( b )
    b : float
    sqrt : [
        ( : double ) -> ( : double ),
        ( : float ) -> ( : float ),
        ( : long double ) -> ( : long double ),
    ]

Now the compiler has a list of functions that are named sqrt. It must pick one of these functions. The choice of the ( : float ) -> ( : float ) seems obvious, yet the other ones would actually work. That is, the compiler knows how to convert a float to both a double and a long double, just as it can convert an integer to any of these values.

A choice must be made as to what is the best function to use. The list of functions is sorted based on the relative cost of the conversions. float to double costs a bit less than float to long double. float to float is the cheapest, since it isn’t a conversion at all. Thus the sqrt with signature ( : float ) -> ( : float) is chosen.

This ranking must deal with a myriad of types and user conversion functions. To make it really interesting we can add parametric type functions (templates/generics). A language must specify all the relative costs of type conversion to ensure the compiler either picks the right function, or reports an ambiguity error.

Internal types

In this article I’ve covered just the basics of type conversion. Noticeably lacking from my examples is an assignment expression, like:

1
a = b

To properly type that statement we need to consider internal compiler types, such as lvalue. A normal integer can’t be assigned to, but an integer lvalue can. The above may be fully typed like this:

1
assign( ( a : integer lvalue ), drop_lvalue( b : integer lvalue ) )

This is going a bit beyond the basics, so I’ll go into more detail in a future article. (Follow me on Twitter to get informed of that.). You can also read more of my current articles about compilers.

I’m the creator of the Leaf programming language: a beautiful culmination of my wide range of development experience. If you would like a presentation about Leaf, or any of my articles please contact me. I’m also available as a consultant and would be happy to discuss your project.

Categories: Programming

Tagged as: ,

3 replies »

  1. With overloaded functions, surely the compiler doesn’t consider conversion-requiring signatures when a direct match is available? Given a float parameter, doesn’t it have to pick the one that takes a float — the direct match?

    It’s only after the compiler fails to find a direct match that it comes back and says, “Okay. I have an integer… that’s a kind of long integer… do I have any matches on those? Nope. Okay, well, an integer is also a kind of float. How about a float match? Ah-ha! I do have one of those!”

    Similar logic with the add expression: “Do I have an (:integer,:integer):integer match? Nope… That integer is a kind of float. Do I have a (:float,:float):float match? Ah-ha!”

    Where it gets interesting is whether — having exhausted lossless upwards promotions — the language allows demoting the float down to an integer. For example: abs(-3.1415), assuming a single abs() function that takes only integers?

    • I will certainly allow the possibility for optimizing cases where the matching might be simpler. However, even with basic typing the obvious matches aren’t always so obvious. Consider that in C++ you can have functions with these signatures:

      void foo( my_object a );
      void foo( my_object & a );
      void foo( my_object const a );
      void foo( my_object const & a );
      

      And you can call the `foo` function with a variety of `my_object` type variants: a constant reference, a value type, a constant expression, a non-constant expression. It, at least logically, must consider all overloads and check the rank of them against the parameters.

      It actually gets more interesting once we add internal type variations (which I’ll get into later). This simply magnifies the number of possibly ways a parameter is passed to a function.

      Also keep in mind the single parameter case may be special, often we’re dealing with multiple parameters where each would undergo distinct conversion. So I’m not going to say that no short-cuts are taken by some compilers, I do believe they are just optimizations for special cases.

      As for implicit conversion with loss of information (the demoting of a float to integer for example), I abhor this in languages. It think it’s a significant cause of defects. Loss of information should never be done implicitly. In Leaf only conversions which can be 100% reversed are allowed.

    • I think we’re talking about the same thing — I’m not referring to any special optimization. The four overloaded examples you just gave all have very distinct type signatures. C++ is static and strongly typed, so the compiler knows the type of any expression for foo(). If that expression isn’t const, it can’t use the last two. If the expression isn’t a reference, it can’t use the 2nd or 4th.

      I was just getting at that it’s all very deterministic and proceeds in a predictable way. The compiler follows a well-known set of rules.

      There are a few situations where I’m not entirely opposed to silently treating floating point values as integers, but generally I’m with you. I don’t like too much “behind my back” stuff from languages (or apps).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s