What a compiler does: type conversion basics

Tags

,

Determining how to convert object types and ensuring they are correct is a significant component of what a compiler does. For every operation the compiler must determine exactly what type is required and do the appropriate conversion. Or, if no conversion is possible emit an error. I’ll introduce the basics of type conversion in this article.

Basic type assignment

Each expression in a system, this includes all sub-expressions and individual variables, must have their type identified. This is usually a multiple step process, but it depends heavily on the language. For example:

1
b + c

The compiler may do a first pass and determine this expression is:

1
(b : integer) + (c : float)

It does this by first resolving the symbols. It’s internal data structure for each symbol will include the type of that symbol. In the above the symbol table entry for b indicates it is an integer, and c is a float.

The resulting type of this operation must also be known. Even in this simple case of adding two values, languages can differ significantly in what happens. Is the result an integer, a single precision float, a double precision float, or perhaps it’s just an error?

Let’s assume our language says it’s a float. Chances are the system doesn’t have a function that adds an integer and a float directly. The compiler must convert them both to the same type first. At a high-level the expression may become:

1
add( integer_to_float( b ), c )

Parameter type conversion

Addition is an internal operation and perhaps a special case. How does the compiler know the target type in the general case?

Functions, like other symbols, also have a type. The type is a function signature, providing both input and output types. For example, the C function sqrt takes a double as input and returns a double. Unfortunately C doesn’t have a way to talk about this signature without having a type name as well. I’ll adopt Leaf’s syntax instead here, where we can say the type of sqrt is ( : double ) -> ( : double ).

When the compiler types code it also assigns types to the functions:

1
2
3
sqrt( a )
    sqrt : ( : double ) -> ( : double )
    a : integer

The compiler now knows that a must be converted to a double. In this case it will use an integer_to_double conversion. It also knows the overall type of this expression is double: it’s what sqrt returns.

What if there is no conversion?

1
2
3
sqrt( "Hello" )
    sqrt ( : double ) -> ( : double )
    "Hello" : char const *

In this case the compiler will attempt to convert a string to a floating point type. It doesn’t know how to do this. An error will be reported and compilation fails.

Overload resolution and ranking

In C things are simple as there are no function overloads. But in many other type languages you can overload a function, each version accepting different parameters and providing different results. In C++ the sqrt function has many different forms. One for double, float and long double. C++ 11 adds a formal integer template to the list, and we also have one for the complex template. The user may also further overload it. Let’s keep the list small for our example however:

1
2
3
4
5
6
7
sqrt( b )
    b : float
    sqrt : [
        ( : double ) -> ( : double ),
        ( : float ) -> ( : float ),
        ( : long double ) -> ( : long double ),
    ]

Now the compiler has a list of functions that are named sqrt. It must pick one of these functions. The choice of the ( : float ) -> ( : float ) seems obvious, yet the other ones would actually work. That is, the compiler knows how to convert a float to both a double and a long double, just as it can convert an integer to any of these values.

A choice must be made as to what is the best function to use. The list of functions is sorted based on the relative cost of the conversions. float to double costs a bit less than float to long double. float to float is the cheapest, since it isn’t a conversion at all. Thus the sqrt with signature ( : float ) -> ( : float) is chosen.

This ranking must deal with a myriad of types and user conversion functions. To make it really interesting we can add parametric type functions (templates/generics). A language must specify all the relative costs of type conversion to ensure the compiler either picks the right function, or reports an ambiguity error.

Internal types

In this article I’ve covered just the basics of type conversion. Noticeably lacking from my examples is an assignment expression, like:

1
a = b

To properly type that statement we need to consider internal compiler types, such as lvalue. A normal integer can’t be assigned to, but an integer lvalue can. The above may be fully typed like this:

1
assign( ( a : integer lvalue ), drop_lvalue( b : integer lvalue ) )

This is going a bit beyond the basics, so I’ll go into more detail in a future article. (Follow me on Twitter to get informed of that.)

I’m the creator of the Leaf programming language: a beautiful culmination of my wide range of development experience. If you would like a presentation about Leaf, or any of my articles please contact me. I’m also available as a consultant and would be happy to discuss your project.

About these ads
Follow

Get every new post delivered to your Inbox.

Join 303 other followers