All expressions have types. We often think only of the types of variables, as we explicitly declare those (at least, sometimes). But even though we can't see them, all expressions have types as well. Virtually any snippet of code has type information.
Determining types, converting between types, and ensuring their correctness are significant components of what a compiler does. For every expression, the compiler must determine exactly what type is required and convert as required. Or, if no conversion is possible, emit an error. Iβll introduce the basics of type conversion in this article.
Basic type assignment
Each expression in a system, this includes all sub-expressions and individual variables, must have their type identified. This is a multi-step process that depends heavily on the language. For example:
b + c
The compiler may do a first pass and determine this expression is:
(b : integer) + (c : float)
It does this by first resolving the symbols. It's internal data structure for each symbol will include the type of that symbol. In the above, the symbol table says b
says is an integer
and c
is a float
.
The resulting type of this operation must also be known. Even in this simple case of adding two values, languages can differ significantly in what happens. Is the result an integer, a single precision float, a double precision float, or perhaps it's just an error?
Assume our language says it's a float
. Chances are, the system doesn't have a function that adds an integer
and a float
directly. The compiler must convert them both to the same type first. At a high-level the expression may become:
add( integer_to_float( b ), c )
Parameter type conversion
Addition is an internal operation and perhaps a special case. How does the compiler know the target type in the general case?
Functions, like other symbols, also have a type. The type is a function signature, providing both input and output types. For example, the C function sqrt
takes a double
as input and returns a double
. Unfortunately, C doesn't have a way to talk about this signature without having a type name as well. I'll adopt Leaf's syntax instead here, where we can say the type of sqrt
is ( : double ) -> ( : double )
.
When the compiler types code, it also assigns types to the functions:
sqrt( a ) sqrt : ( : double ) -> ( : double ) a : integer
The compiler now knows that it must convert a
to a double
. In this case it will use an integer_to_double
conversion. It also knows the overall type of this expression is double
: it's what sqrt
returns.
What if there is no conversion?
sqrt( "Hello" ) sqrt ( : double ) -> ( : double ) "Hello" : char const *
In this case, the compiler will attempt to convert a string to a floating point type. It doesn't know how to do this. An error will be reported and compilation fails.
Overload resolution and ranking
In C things are simple as there are no function overloads. But in many other type languages you can overload a function, each version accepting different parameters and providing different results. In C++ the sqrt
function has many forms. One for double
, float
and long double
. C++ 11 adds a formal integer template to the list, and we also have one for the complex
template. The user may also further overload it. Let's keep the list small for our example:
sqrt( b ) b : float sqrt : [ ( : double ) -> ( : double ), ( : float ) -> ( : float ), ( : long double ) -> ( : long double ), ]
Now the compiler has a list of functions that are named sqrt
. It must pick one of these functions. The choice of the ( : float ) -> ( : float )
seems obvious, yet the other ones would actually work. The compiler knows how to convert a float
to both a double
and a long double
, just as it can convert an integer to any of these values.
A choice must be made as to what is the best function to use. The list of functions is sorted based on the relative cost of the conversions. float
to double
costs a bit less than float
to long double
. float
to float
is the cheapest, since it isn't a conversion at all. Thus the sqrt
with signature ( : float ) -> ( : float)
is chosen.
This ranking must deal with a myriad of types and user conversion functions. To make it really interesting, we can add parametric type functions (templates/generics). A language must specify all the relative costs of type conversion to ensure the compiler either picks the right function, or reports an ambiguity error.
Internal types
In this article I've covered just the basics of type conversion. Noticeably lacking from my examples is an assignment expression, like:
a = b
To properly type that statement we need to consider internal compiler types, such as lvalue
. A normal integer
can't be assigned to, but an integer lvalue
can. The above may be fully typed like this:
assign( ( a : integer lvalue ), drop_lvalue( b : integer lvalue ) )
This is going beyond the basics, so I'll go into more detail in a future article.