The Life of a Programmer

What a compiler does: type conversion basics

All expressions have types. We often think only of the types of variables, as we explicitly declare those (at least, sometimes). But even though we can't see them, all expressions have types as well. Virtually any snippet of code has type information.

Determining types, converting between types, and ensuring their correctness are significant components of what a compiler does. For every expression, the compiler must determine exactly what type is required and convert to it as needed or, if no conversion is possible, emit an error. I’ll introduce the basics of type conversion in this article.

Basic type assignment

Each expression in a system, including all sub-expressions and individual variables, must have its type identified. This is a multi-step process that depends heavily on the language. For example:

b + c

The compiler may do a first pass and determine this expression is:

(b : integer) + (c : float)

It does this by first resolving the symbols. Its internal data structure for each symbol will include the type of that symbol. In the above, the symbol table says b is an integer and c is a float.

The resulting type of this operation must also be known. Even in this simple case of adding two values, languages can differ significantly in what happens. Is the result an integer, a single precision float, a double precision float, or perhaps it's just an error?
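As one concrete answer to that question, here is a small sketch of what C++ decides. The variables b and c are stand-ins for the ones in the example, and decltype is used only to ask the compiler which type it assigned to each expression. Other languages may answer differently.

```cpp
#include <type_traits>

// Stand-ins for the article's b and c.
int b = 2;
float c = 1.5f;

// decltype reports the type the compiler assigned to the whole expression.
// In C++ the int operand is converted, so int + float is a float.
static_assert(std::is_same_v<decltype(b + c), float>);

// Other operand combinations get other answers in the same language.
static_assert(std::is_same_v<decltype(b + 1.5), double>);  // int + double
static_assert(std::is_same_v<decltype('a' + 'b'), int>);   // char + char promotes to int
```

Nothing here runs: every check is resolved while compiling, which is exactly when the typing happens.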

Assume our language says it's a float. Chances are, the system doesn't have a function that adds an integer and a float directly. The compiler must convert them both to the same type first. At a high level the expression may become:

add( integer_to_float( b ), c )
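The rewrite above could be sketched roughly as follows. This is a toy, not how any particular compiler is built: the names Type, TypedAdd and type_add are made up for illustration, and it only knows the two types from the example.

```cpp
// Hypothetical, minimal type model with just the two types from the example.
enum class Type { Integer, Float };

// The typed form of `lhs + rhs`: the result type, plus which operands need
// an integer_to_float conversion wrapped around them.
struct TypedAdd {
    Type result;
    bool convert_lhs;
    bool convert_rhs;
};

constexpr TypedAdd type_add(Type lhs, Type rhs) {
    // Same type on both sides: no conversion needed.
    if (lhs == rhs) return {lhs, false, false};
    // Mixed operands: the result is a float, and whichever side is an
    // integer gets converted first.
    return {Type::Float, lhs == Type::Integer, rhs == Type::Integer};
}

// b : integer, c : float  =>  add( integer_to_float( b ), c )
constexpr TypedAdd typed = type_add(Type::Integer, Type::Float);
static_assert(typed.result == Type::Float);
static_assert(typed.convert_lhs && !typed.convert_rhs);
```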

Parameter type conversion

Addition is an internal operation and perhaps a special case. How does the compiler know the target type in the general case?

Functions, like other symbols, also have a type. The type is a function signature, providing both input and output types. For example, the C function sqrt takes a double as input and returns a double. Unfortunately, C doesn't have a way to talk about this signature without having a type name as well. I'll adopt Leaf's syntax instead here, where we can say the type of sqrt is ( : double ) -> ( : double ).

When the compiler types code, it also assigns types to the functions:

sqrt( a )
	sqrt : ( : double ) -> ( : double )
	a : integer

The compiler now knows that it must convert a to a double. In this case it will use an integer_to_double conversion. It also knows the overall type of this expression is double: it's what sqrt returns.
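The same thing can be observed in C++. The sketch below uses a hypothetical function half instead of sqrt, only because C++ overloads std::sqrt and that would hide the single-signature case described here. half has the same shape of signature, ( : double ) -> ( : double ).

```cpp
#include <type_traits>

// Hypothetical stand-in with the signature ( : double ) -> ( : double ).
constexpr double half(double x) { return x / 2.0; }

constexpr int a = 9;

// The argument is an int and the parameter is a double, so the compiler
// inserts an int-to-double conversion before the call.
static_assert(half(a) == half(static_cast<double>(a)));
static_assert(half(a) == 4.5);

// The type of the whole call expression is the function's return type.
static_assert(std::is_same_v<decltype(half(a)), double>);
```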

What if there is no conversion?

sqrt( "Hello" )
	sqrt : ( : double ) -> ( : double )
	"Hello" : char const *

In this case, the compiler will attempt to convert a string to a floating point type. It doesn't know how to do this. An error will be reported and compilation fails.
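In C++ the compiler can be asked the same question without failing the build. This sketch uses standard type traits and a hypothetical function root_of, declared but never defined, that has the same signature as sqrt.

```cpp
#include <type_traits>

// Hypothetical function with the signature ( : double ) -> ( : double ).
// Only the declaration is needed, since nothing here is ever called.
double root_of(double x);

// An int converts to a double, a string literal's pointer type does not.
static_assert(std::is_convertible_v<int, double>);
static_assert(!std::is_convertible_v<const char*, double>);

// So the call is well-typed with an int argument and an error with a string.
static_assert(std::is_invocable_v<decltype(&root_of), int>);
static_assert(!std::is_invocable_v<decltype(&root_of), const char*>);
```

Writing root_of( "Hello" ) directly would produce the compile error described above.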

Overload resolution and ranking

In C things are simple as there are no function overloads. But in many other typed languages you can overload a function, each version accepting different parameters and providing different results. In C++ the sqrt function has many forms: one each for double, float and long double. C++11 adds a form for integer arguments to the list, and we also have one for the complex template. The user may also further overload it. Let's keep the list small for our example:

sqrt( b )
	b : float
	sqrt : [
		( : double ) -> ( : double ),
		( : float ) -> ( : float ),
		( : long double ) -> ( : long double ),
	]

Now the compiler has a list of functions that are named sqrt. It must pick one of these functions. The choice of ( : float ) -> ( : float ) seems obvious, yet the other ones would actually work. The compiler knows how to convert a float to both a double and a long double, just as it can convert an integer to any of these types.

A choice must be made as to which function is the best one to use. The list of functions is sorted based on the relative cost of the conversions. float to double costs a bit less than float to long double. float to float is the cheapest, since it isn't a conversion at all. Thus the sqrt with signature ( : float ) -> ( : float ) is chosen.
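C++ follows this ordering, which a small experiment can show. The functions which and wide below are hypothetical. Each overload returns its own name so the one the compiler selected can be observed.

```cpp
#include <string_view>

// Hypothetical overload set shaped like the sqrt list above.
constexpr std::string_view which(double)      { return "double"; }
constexpr std::string_view which(float)       { return "float"; }
constexpr std::string_view which(long double) { return "long double"; }

// An exact match needs no conversion, so it always wins.
static_assert(which(1.0f) == "float");
static_assert(which(1.0) == "double");
static_assert(which(1.0L) == "long double");

// The same set without the float version: both candidates need a conversion.
constexpr std::string_view wide(double)      { return "double"; }
constexpr std::string_view wide(long double) { return "long double"; }

// C++ ranks float to double (a promotion) ahead of float to long double.
static_assert(wide(1.0f) == "double");
```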

This ranking must deal with a myriad of types and user conversion functions. To make it really interesting, we can add parametric type functions (templates/generics). A language must specify all the relative costs of type conversion to ensure the compiler either picks the right function, or reports an ambiguity error.
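A cost-based ranking with ambiguity detection might be sketched like this. The cost numbers are invented for illustration, as are the names conversion_cost and pick_overload. A real language specifies far more detailed rules.

```cpp
enum class Type { Integer, Float, Double, LongDouble };

// Assumed costs: 0 is an exact match, larger is less preferred,
// and -1 means no conversion exists.
constexpr int conversion_cost(Type from, Type to) {
    if (from == to) return 0;
    if (from == Type::Float && to == Type::Double) return 1;
    if (from == Type::Float && to == Type::LongDouble) return 2;
    // Assumption: an integer converts to every floating type at equal cost.
    if (from == Type::Integer) return 3;
    return -1;
}

constexpr int kNoMatch = -1;
constexpr int kAmbiguous = -2;

// Returns the index of the cheapest candidate, or one of the codes above.
constexpr int pick_overload(Type arg, const Type* params, int count) {
    int best = kNoMatch;
    int best_cost = 0;
    bool tie = false;
    for (int i = 0; i < count; ++i) {
        const int cost = conversion_cost(arg, params[i]);
        if (cost < 0) continue;  // this candidate is not viable
        if (best == kNoMatch || cost < best_cost) {
            best = i;
            best_cost = cost;
            tie = false;
        } else if (cost == best_cost) {
            tie = true;  // another candidate is exactly as cheap
        }
    }
    return tie ? kAmbiguous : best;
}

// Parameter types of the three sqrt candidates, in the order listed earlier.
constexpr Type sqrt_params[] = {Type::Double, Type::Float, Type::LongDouble};

static_assert(pick_overload(Type::Float, sqrt_params, 3) == 1);  // the float version
static_assert(pick_overload(Type::Integer, sqrt_params, 3) == kAmbiguous);
```

With these assumed costs an integer argument is ambiguous, since no candidate is cheaper than the others. A language that wants sqrt to accept an integer has to break that tie somehow, either with different costs or with an extra overload.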

Internal types

In this article I've covered just the basics of type conversion. Noticeably lacking from my examples is an assignment expression, like:

a = b

To properly type that statement we need to consider internal compiler types, such as lvalue. A normal integer can't be assigned to, but an integer lvalue can. The above may be fully typed like this:

assign( ( a : integer lvalue ), drop_lvalue( b : integer lvalue ) )
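C++ exposes a similar distinction, which makes it easy to poke at. In this sketch the reference type int& plays roughly the role of "integer lvalue" and plain int the role of a normal integer. That mapping is a loose analogy, not a claim that the two type systems match.

```cpp
#include <type_traits>

int a = 0;
int b = 7;

// A parenthesised name is an lvalue expression: decltype reports int&.
static_assert(std::is_same_v<decltype((a)), int&>);

// The result of arithmetic is a plain value: decltype reports int.
static_assert(std::is_same_v<decltype(a + b), int>);

// An int lvalue can be assigned to, a plain int value cannot.
static_assert(std::is_assignable_v<int&, int>);
static_assert(!std::is_assignable_v<int, int>);
```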

This is going beyond the basics, so I'll go into more detail in a future article.

Please join me on Discord to discuss, or ping me on Mastodon.
