The Life of a Programmer

Breaking the type tree with modifiers

A language’s type hierarchy is formed with both linking and attributes. But what if a uniform hierarchy is not the correct way to represent a type? In Leaf I’m about to refactor the type system to do exactly that: split up the type tree. Compared to C++ I have more fundamental type modifiers and conversion rules which make a simple tree hard to use (which is not to imply C++ compilers get away with using simple type trees).

To recall, in my previous article I used the ‘integer’ and ‘pointer’ types to demonstrate this. The ‘integer’ has an attribute ‘size’ whereas the pointer is a linking type which forms a simple list.

[sourcecode]
class type_integer : type {
int size;
}

class type_pointer : type {
type * sub;
}
[/sourcecode]

A tuple has multiple links and thus forms a full tree, but for this article a simple attribute and list-like type are enough.

What about ‘const’?

‘const’ is the first type keyword which puts a slight wrinkle in the tree system. Is it a tree construct like a pointer, or is an attribute, like size is for an integer. Either way it could apply to all other types. Here’s what those two approaches look like.

[sourcecode]
//as a tree-type
class type_const : type {
type * sub;
}

//as an attribute we modify the base type
class type {
bool const;
}
[/sourcecode]

Problems with the tree approach

The first problem with the approach is that it allows you to create types which aren’t actually possible in the language. Since ‘sub’ can be anything it could actually be a another ‘type_const’. This would allow the type tree to contain types like ‘const->const->const->integer’. From a compiler’s perspective this becomes an issue since you have to decide whether these are real types, or are simply errors in the system. In the modifier approach this cannot happen: the type tree cannot represent unreal types.

The second problem is with conversion. In C++ a ‘const’ value can be copied just like a normal value. The ‘const’ modifier plays no role in the ability to copy the value. In the compiler however you now have to deal with this in two manners. If you wish to copy you must first check if the type is ‘const’, then strip that off to get to the underlying type. In the attribute approach we always immediately have the true type and can simply ignore the ‘const’ modifier.

Key problem with the attribute approach

There is a problem with producing new types. Say you already have a type object and merely wish to create the ‘const’ form of that type. In the tree-approach you’d simply create a new ‘type_const’ and link to the existing type. In the modifier approach however you need clone the type and then set the ‘const’ modifier. At first this may not seem so bad, and it’s the way my Leaf compiler currently works.

But it leads to a big problem, the one I’m trying to solve. The moment you clone a type you end up with a second unrelated type. For fundamentals like ‘integer’, and even anonymous ‘tuple’ types this isn’t an issue. The moment you introduce named types, as created by a user, you get a problem. Say the user creates a tuple type called ‘point’. Then later they want a ‘const point’. In the modifer approach those two types are unrelated as you cloned one to create the other. This becomes problematic when matching types. There’s no longer an easy way to know that a ‘const point’ can be copied to produce a ‘point’ value.

Splitting the tree

The downside of the attribute approach may seem overwhelming and drive one towards the tree approach. In Leaf however the problem gets more complex. Beyond just a simple ‘const’ modifier, I have several other core attributes, such as ‘optional’ and ‘shared’. There are also several implicit types like ‘lvalue’ and ‘rvalue’ (which C++ also has to deal with). Trying to handle these in a tree is extremely troublesome for type conversion. The modifiers cannot form chains and need to be visible as a set, not one at a time. Consider the following Leaf variable.

[sourcecode]
alias type_a : optional shared integer;
[/sourcecode]

Logically forms a proper tree: ‘optional->shared->integer’. The other order ‘shared->optional->integer’ is simply not possible in Leaf. During conversion however, ‘optional’ and ‘shared’ have both individual and combined rules. In the tree-form it’s too hard to figure out which attributes have actually been applied.

Note that the chain form is used when I lower the code to Leaf IR (the simplified language between high-level Leaf and the back-end, LLVM at the moment). At the lower levels the attributes become somewhat problematic to use, so I take the chained approach. Again here however it makes type-collapsing/combining optimizations a bit more difficult.

Neither approach is ideal, perhaps a combined approach makes sense. Basically I retain the tree, but instead of linking directly to other types they all use a reference intermediate. The top-level of any tree then also starts with this reference.

[sourcecode]
class reference {
bool optional;
bool shared;
bool const;
type * underlying;
}

//now a compound type uses the references instead of the direct type, for example a 2-tuple would be
class type_2tuple : type {
reference first;
reference second;
}
[/sourcecode]

The ‘type’ hierarchy is now a bare hierarchy which tracks the “true” object type ignoring any storage, access, and other modifiers. I think this will work, but I’m not absolutely positive. I am however certain that the current approach is failing and I need to try something else.

Followup

I’ve converted Leaf to the proposed system. Generally it works well and my code is greatly simplified. A large advantage is that the type-tree is now immutable (post-construction) and thus compound types really share their sub-part types (through a type reference of course).

Perhaps the biggest problem is that it makes it easy to forget about the modifiers. Numerous times I would just check for a given fundamental type. My tests would also work since they are written only for the feature I’m adding. The moment I add some modifiers to the test, it breaks. For example, a function takes a 3-tuple parameter and I am trying to pass an optional 3-tuple. Originally the conversion code just checked the tuple part and forgot about the optional bit. The lower Leaf-IR layer then produces an error since it generates type-incompatible instructions.

A bigger problem is perhaps the ‘const’ modifier, especially when dealing with lvalues. Does the ‘const’ apply to the variable, or to the underlying dereferenced value? I haven’t thought more about this as I haven’t implemented ‘const’ yet. I will also be implementing a ‘readonly’ modifier and need the system to be consistent. It’s easy enough to have both ‘pre’ and ‘post’ variants, but then the syntax might become ambiguous.

Please join me on Discord to discuss, or ping me on Mastadon.

Breaking the type tree with modifiers

A Harmony of People. Code That Runs the World. And the Individual Behind the Keyboard.

Mailing List

Signup to my mailing list to get notified of each article I publish.

Recent Posts