All expressions have types. We often think only of the types of variables, as we explicitly declare those (at least, sometimes). But even though we can't see them, all expressions have types as well. Virtually any snippet of code has type information.
Determining types, converting between types, and ensuring their correctness are significant components of what a compiler does. For every expression, the compiler must determine exactly what type is required and convert to it where needed. If no conversion is possible, it emits an error. I’ll introduce the basics of type conversion in this article.
Basic type assignment
Each expression in a system, including every sub-expression and individual variable, must have its type identified. This is a multi-step process that depends heavily on the language. For example:
b + c
The compiler may do a first pass and determine this expression is:
(b : integer) + (c : float)
It does this by first resolving the symbols. Its internal data structure for each symbol includes the type of that symbol. In the above, the symbol table says b is an integer and c is a float.
The resulting type of this operation must also be known. Even in this simple case of adding two values, languages can differ significantly in what happens. Is the result an integer, a single precision float, a double precision float, or perhaps it's just an error?
Assume our language says it's a float. Chances are, the system doesn't have a function that adds an integer and a float directly. The compiler must convert them both to the same type first. At a high level the expression may become:
add( integer_to_float( b ), c )
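The steps above can be sketched in a few lines of Python. This is only an illustration, not how any particular compiler does it: the `SYMBOLS` table, the `type_add` helper, and the use of strings for types and expressions are hypothetical names invented for this example, and the rule that integer plus float yields float is the assumption made in the text.

```python
# Hypothetical symbol table: maps each symbol to its declared type.
SYMBOLS = {"b": "integer", "c": "float"}


def type_add(left, right):
    """Type the expression `left + right` and insert any needed conversion.

    Returns (result_type, rewritten_expression). The rule "integer + float
    gives float" is this sketch's assumption; real languages differ.
    """
    left_type, right_type = SYMBOLS[left], SYMBOLS[right]

    if left_type == right_type:
        # Same type on both sides: nothing to convert.
        return left_type, f"add( {left}, {right} )"

    if {left_type, right_type} == {"integer", "float"}:
        # Convert whichever operand is the integer so both become float.
        def as_float(name, type_name):
            if type_name == "integer":
                return f"integer_to_float( {name} )"
            return name

        rewritten = f"add( {as_float(left, left_type)}, {as_float(right, right_type)} )"
        return "float", rewritten

    raise TypeError(f"cannot add {left_type} and {right_type}")


result_type, rewritten = type_add("b", "c")
print(result_type)  # float
print(rewritten)    # add( integer_to_float( b ), c )
```

The symbol lookup comes first, the result type is decided second, and only then is the conversion call wrapped around the operand that needs it.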
Parameter type conversion
Addition is an internal operation and perhaps a special case. How does the compiler know the target type in the general case?
Functions, like other symbols, also have a type. The type is a function signature, providing both input and output types. For example, the C function sqrt takes a double as input and returns a double. Unfortunately, C doesn't have a way to talk about this signature without having a type name as well. I'll adopt Leaf's syntax instead here, where we can say the type of sqrt is ( : double ) -> ( : double ).
When the compiler types code, it also assigns types to the functions:
sqrt( a )
sqrt : ( : double ) -> ( : double )
a : integer
The compiler now knows that it must convert a to a double. In this case it will use an integer_to_double conversion. It also knows the overall type of this expression is double: it's what sqrt returns.
What if there is no conversion?
sqrt( "Hello" )
sqrt : ( : double ) -> ( : double )
"Hello" : char const *
In this case, the compiler will attempt to convert a string to a floating point type. It doesn't know how to do this, so it reports an error and compilation fails.
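Both outcomes, a successful conversion and a missing one, can be shown with a small Python sketch. Everything here is hypothetical and exists only for illustration: the `CONVERSIONS` and `FUNCTIONS` tables, the `type_call` helper, and the `float_to_double` name are assumptions, not part of any real compiler.

```python
# Hypothetical table of implicit conversions the compiler knows about,
# keyed by (source type, target type).
CONVERSIONS = {
    ("integer", "double"): "integer_to_double",
    ("float", "double"): "float_to_double",
}

# Hypothetical function table: name -> (parameter types, return type).
FUNCTIONS = {"sqrt": (["double"], "double")}


def type_call(name, args):
    """Type a call; `args` is a list of (expression_text, type) pairs.

    Returns (result_type, rewritten_call), or raises TypeError when an
    argument cannot be converted to the parameter type.
    """
    param_types, return_type = FUNCTIONS[name]
    if len(args) != len(param_types):
        raise TypeError(f"{name} expects {len(param_types)} argument(s)")

    converted = []
    for (text, arg_type), param_type in zip(args, param_types):
        if arg_type == param_type:
            converted.append(text)  # exact match, no conversion
        elif (arg_type, param_type) in CONVERSIONS:
            conversion = CONVERSIONS[(arg_type, param_type)]
            converted.append(f"{conversion}( {text} )")
        else:
            raise TypeError(f"no conversion from {arg_type} to {param_type}")

    return return_type, f"{name}( {', '.join(converted)} )"


print(type_call("sqrt", [("a", "integer")]))
# ('double', 'sqrt( integer_to_double( a ) )')

try:
    type_call("sqrt", [('"Hello"', "char const *")])
except TypeError as error:
    print("error:", error)  # error: no conversion from char const * to double
```

The function's signature supplies the target type for each argument, and the return type becomes the type of the whole call expression.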
Overload resolution and ranking
In C things are simple as there are no function overloads. But in many other typed languages you can overload a function, each version accepting different parameters and providing different results. In C++ the sqrt function has many forms. There is one each for float, double, and long double. C++11 adds a formal integer template to the list, and we also have one for the complex template. The user may also further overload it. Let's keep the list small for our example:
sqrt( b )
b : float
sqrt : [
( : double ) -> ( : double ),
( : float ) -> ( : float ),
( : long double ) -> ( : long double ),
]
Now the compiler has a list of functions that are named sqrt. It must pick one of these functions. The choice of ( : float ) -> ( : float ) seems obvious, yet the other ones would actually work. The compiler knows how to convert a float to both a double and a long double, just as it can convert an integer to any of these types.
A choice must be made as to which function is the best one to use. The list of functions is sorted based on the relative cost of the conversions. Converting float to double costs a bit less than converting float to long double. Passing a float as a float is the cheapest, since it isn't a conversion at all. Thus the sqrt with signature ( : float ) -> ( : float ) is chosen.
This ranking must deal with a myriad of types and user conversion functions. To make it really interesting, we can add parametric type functions (templates/generics). A language must specify all the relative costs of type conversion to ensure the compiler either picks the right function, or reports an ambiguity error.
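A cost-based ranking of this kind might look like the following Python sketch. The cost numbers, the `CONVERSION_COST` and `OVERLOADS` tables, and the `resolve` helper are all made up for illustration; real languages define their own ranking rules, and summing costs across arguments is a simplification. The integer costs are deliberately set equal here to show the ambiguity case.

```python
# Hypothetical conversion costs: lower is better, 0 means an exact match.
# Pairs that are absent have no implicit conversion at all.
CONVERSION_COST = {
    ("float", "float"): 0,
    ("float", "double"): 1,
    ("float", "long double"): 2,
    ("double", "double"): 0,
    ("double", "long double"): 1,
    ("long double", "long double"): 0,
    # Assumption for this sketch: integer converts equally well to each.
    ("integer", "float"): 3,
    ("integer", "double"): 3,
    ("integer", "long double"): 3,
}

# Hypothetical overload set: name -> list of (parameter types, return type).
OVERLOADS = {
    "sqrt": [
        (("double",), "double"),
        (("float",), "float"),
        (("long double",), "long double"),
    ]
}


def resolve(name, arg_types):
    """Pick the overload of `name` with the lowest total conversion cost."""
    candidates = []
    for param_types, return_type in OVERLOADS[name]:
        if len(param_types) != len(arg_types):
            continue
        costs = [CONVERSION_COST.get(pair) for pair in zip(arg_types, param_types)]
        if None in costs:
            continue  # some argument cannot be converted: not viable
        candidates.append((sum(costs), param_types, return_type))

    if not candidates:
        raise TypeError(f"no matching overload for {name}")

    # Sort by cost, then check whether the cheapest candidate is unique.
    candidates.sort(key=lambda candidate: candidate[0])
    best_cost = candidates[0][0]
    best = [c for c in candidates if c[0] == best_cost]
    if len(best) > 1:
        raise TypeError(f"ambiguous call to {name}")
    return best[0][1], best[0][2]


print(resolve("sqrt", ["float"]))  # (('float',), 'float')
```

With these assumed costs, `resolve("sqrt", ["integer"])` raises an ambiguity error because all three overloads tie, which is the situation the language's cost rules have to anticipate.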
In this article I've covered just the basics of type conversion. Noticeably lacking from my examples is an assignment expression, like:
a = b
To properly type that statement we need to consider internal compiler types, such as lvalue. A normal integer can't be assigned to, but an integer lvalue can. The above may be fully typed like this:
assign( ( a : integer lvalue ), drop_lvalue( b : integer lvalue ) )
This is going beyond the basics, so I'll go into more detail in a future article.