I’m in the midst of reworking a significant aspect of the type system in Leaf. It’s a lot of work, and I’d like to avoid doing it too many times. I need to explore the way types can be represented in a language, both logically for the programmer and specifically to the compiler.
I will assume you have a general idea of what a type is, but I do need to explore that understanding itself. Yes, there have been many books and papers written about types. Here I am going to try and cover what is different in Leaf and why it needs to be modelled perhaps differently. If you’ve worked on compilers before you might wish to skip this article and wait for the second part: the one that actually addresses the Leaf issue.
For this article I will describe the compiler structures using a C++ like pseudo code. I’ll avoid Leaf since inheritance is not fully defined yet, but also because it might get quite confusing to describe a type system using its own types!
The Simple
I’ll start with perhaps the simplest possible type in any system, a ‘boolean’. This type is simple because it represents an object which can have only one of two possible values: ‘true’ or ‘false’. There is no variation to this type. To model this in a compiler we could simply make it the most derived class of a type tree.
[sourcecode]
class type_boolean : type { };
[/sourcecode]
An ‘integer’ on the other hand has some variation. Abstractly, or rather in mathematics, they are unbound and thus the term ‘integer’ is enough to fully describe the type. A computer however has limits. Simply saying ‘integer’ is not enough to define the type: it needs a ‘size’ attribute which says how big it is, either in bytes or bits. In C these sizes are partially abstract hiding behind words like ‘short’ and ‘long’, in Leaf the sizes are explicit:
[sourcecode]
var a : integer 32bit;
var b : integer 11bit;
[/sourcecode]
In both C and Leaf you can specify an integer without a size. This doesn’t mean it doesn’t have a size, rather it should just use the default size for the platform. This is apparent when we model this in the type tree.
[sourcecode]
class type_integer : type {
int size;
}
[/sourcecode]
In both cases I’ve derived from a root type simply called ‘type’. This is the root of our type hierarchy. It allows the compiler to use types abstractly. When it comes to actually working with a particular type it will need to determine which one it is. In my Leaf compiler the root ‘type’ class has a ‘get_fundamental’ function which returns an enum of the fundamental type classes.
[sourcecode]
class type {
enum fundamental {
boolean,
integer,
};
virtual fundamental get_fundamental();
}
[/sourcecode]
Each derived classes would then implement ‘get_fundamental’ and return an appropriate value. (In Leaf’s compiler however I don’t use virtual functions for this. I simply store the enum value as part of the ‘type’ class. The effect is however the same.)
Tuples
Those two basic types give us enough to construct a compound type. Let’s try this simple Leaf tuple which combines a boolean and an integer.
[sourcecode]
type mark = {
: boolean,
: integer
}
[/sourcecode]
In the compiler we need a way to represent this type. It may be tempting to just create a similar C++ structure:
[sourcecode]
class mark : type {
type_boolean p0;
type_integer p1;
}
[/sourcecode]
This of course won’t work since our compiler has already been compiled: we cannot introduce new types into C++ at runtime. So instead we need a generic mechanism to hold these tuples. For example, a 2-tuple might look like this:
[sourcecode]
class type_2tuple : type {
type * p0;
type * p1;
}
[/sourcecode]
Leaf of course supports more than two fields in a tuple. It also allows you to name those fields, thus the generic structure has to be a bit more robust.
[sourcecode]
class type_tuple : type {
class part {
string name;
type * sub;
};
vector<part> parts;
}
[/sourcecode]
Arrays and pointers
With just integers and tuples you can actually construct a very rich type system. It would not be complete however as you actually need pointers of some kind. Even if you don’t expose pointers in the language the type system itself needs a way to deal with them. This creates our first wrapper type. I’ll use C++ syntax here for an example since Leaf doesn’t expose pointers this way:
[sourcecode]
typedef (int*) int_ptr;
[/sourcecode]
To model this in our compiler we can create a “pointer” type.
[sourcecode]
class type_pointer : type {
type * sub;
}
[/sourcecode]
It really is quite a simple type. It even handles multiple levels of indirection since ‘sub’ can point to another ‘type_pointer’ class. This tends also to be enough to represent arrays. At the lower levels pointers and arrays are the same, so this would give the compiler enough flexibility to use the pointer as an array. This may not be a good idea however, and many languages won’t model them the same. In Leaf an ‘array’ is indeed a distinct type which has logically nothing to do with a pointer. Yet, due to the their underlying similarity the model class looks about the same:
[sourcecode]
class type_array : type {
type * sub;
}
[/sourcecode]
Conclusion
This is really just a simple introduction to types. I’m writing this to provide a basis for my next article on type attributes, and how to model non-traditional types.