
Visions of generics and templates: How parametrics are compiled

Parametric programming looks quite varied from language to language, but at the core there are two primary approaches; let's call them templates and generics. In this article I'll look briefly at how a compiler implements each and touch on the features and limitations of both.

I should note right up front that each compiler and language has its own way of doing things and might not conform strictly to these approaches. Mostly they tend to behave "as if" this is how it is implemented. Of course there's variation; if there weren't, it wouldn't be any fun.

Templates

C++-style templates are perhaps the easier of the two to understand. The compiler treats the template much like a macro and substitutes the type. In this code:

template<typename T> struct sum {
    T value;
    void add( T incr ) {
        value += incr;
    }
};

typedef sum<int> int_sum;

int_sum is equivalent to this code:

struct int_sum {
    int value;
    void add( int incr ) {
        value += incr;
    }
};

It's not treated exactly like a macro: the substitution follows the language's syntax and type rules. Yet looking at it this way is sufficient to understand the basics of what the compiler is doing. It takes the template code and makes a specific version of it for each instantiation. Post-compilation the template is gone and we're left with only the individual instances.

Pros

The primary advantage of this technique is the expressiveness of the templates. Virtually any code we can write without a template can be trivially transformed into a template version. We can treat T exactly as the instance type: we know what it is, we can new it, use operator overloads, and call any functions on it.

Templates also allow partial specialization, at least in C++. This is really helpful when you have two classes that behave nearly the same, but one aspect differs for a certain type. The shared part need not even be valid for all types so long as the proper specializations exist. Everything is resolved at compile time, so there is no fear of runtime type incompatibility.
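As a small sketch of partial specialization, consider a hypothetical container (the `storage` name and its behaviour are invented for illustration) whose pointer specialization differs in one aspect — here, silently dropping nulls — while the generic version stores everything:

```cpp
#include <cassert>
#include <vector>

// Generic case: store every element directly.
template<typename T> struct storage {
    std::vector<T> items;
    void put( T v ) { items.push_back(v); }
};

// Partial specialization for any pointer type: same shape,
// but one aspect of the behaviour differs (nulls are dropped).
template<typename T> struct storage<T*> {
    std::vector<T*> items;
    void put( T* v ) { if( v ) items.push_back(v); }
};
```

The compiler picks the specialization at instantiation time, so `storage<int>` and `storage<int*>` get entirely separate code with no runtime dispatch.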

Another advantage of this approach is speed. Each instance of the template has code generated just for it, which gives the optimizer full knowledge to do its work. Fundamental types can be used directly without any kind of boxing, giving good space efficiency. This makes templates suitable for computationally expensive or performance-critical code.

Cons

Distribution of such templates is perhaps the biggest problem. The full template code for a library must reside in header files and is compiled directly into each object file. While it is possible to get some of the code into a shared library, it's very hard to swap that library out for a new version: it's basically just not done. Changes to the template code require a recompile of every project that uses it. I'd say this is the key drawback to this approach.

A big downside to this approach is code size. Template classes can be big, and duplicating that code for every instance type can bloat the object code. In some cases this could lead to the CPU swapping its caches more, but for most uses the size itself isn't the problem. All these copies have to be individually created by the compiler, and then compiled. Anybody who's worked on a large C++ project with lots of templates can attest to the impact this has on compilation speed.

It is certainly possible for an optimizer and linker to merge a lot of this code. While instances may differ at the language level, similar types quite often result in identical assembly-level code. LLVM contains a nice MergeFunctions optimization pass for exactly this.

We also can't forget the error messages. Perhaps this is just a symptom of C++, and other languages have fared better. In C++ it's not uncommon to get pages and pages of cryptic template messages for even trivial errors. This arises from the compiler being unable to conceptualize what logical error was made. All it can do is show the lowest-level error, deep within the nested templates, and a path back to the instantiation, hoping you can identify the problem.

Concepts were proposed for C++11 to help with this issue, but were dropped before standardization; the idea continued as the Concepts TS and eventually landed in C++20. The goal is to detect problems at a much higher level in the template code and map them to logical error messages.

Generics

Java- or C#-style generics produce only one version of the parametric code. The generic parameter is substituted with a root object type, or a constrained base type. Using a constrained type makes it easier to show what happens:

class Adder<T> where T : IAddable {
    T Add( T a, T b ) {
        return a.Add(b);
    }
}

That code becomes something similar to:

class Adder {
    IAddable Add( IAddable a, IAddable b ) {
        return a.Add(b);
    }
}

The resulting class is entirely generic and well defined. The added value of generics is that the compiler takes care of casting and type conversion automatically: if we have a variable of type Adder&lt;Complex&gt;, the Add function will only accept Complex values and will return a Complex result.
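The erased form above can be approximated in C++ to show what the compiler effectively does: one body compiled against the interface, with the downcasts that the language would otherwise insert at the call site (the `IAddable` and `Complex` names follow the article's example; the rest is invented for illustration):

```cpp
#include <cassert>

// The constraint interface: all the generic body knows about T.
struct IAddable {
    virtual ~IAddable() = default;
    virtual IAddable* add( IAddable* other ) = 0;
};

// The single compiled body: works only through the interface.
IAddable* adder_add( IAddable* a, IAddable* b ) {
    return a->add( b );
}

// A concrete type flowing through the generic code.
struct Complex : IAddable {
    double re, im;
    Complex( double r, double i ) : re(r), im(i) {}
    IAddable* add( IAddable* other ) override {
        // The cast a generics compiler would insert for us.
        Complex* o = static_cast<Complex*>( other );
        return new Complex( re + o->re, im + o->im );
    }
};
```

In C# the casts and the guarantee that only `Complex` values reach `Add` are supplied by the compiler; in this hand-written sketch they are visible and unchecked.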

While C# logically processes generics this way, for value types the runtime's JIT compiler takes the C++ template approach and does type substitution, producing a copy of the code for each value type used. For reference types only one copy of the code is shared. This is hidden from the user though, so the feature set remains as though only the singular object form existed.

Pros

Generics can be provided in shared libraries. They can be used without access to the source code and can be updated without recompiling the programs that use them. This allows generics to be safely used as part of a public API. Though often underrated, this is a really nice advantage of generics.

Only one version of the code is needed to support the various instances. This keeps code size down and also significantly improves compilation time: the compiler compiles the generic just once. C# is the exception with value types, which do produce copies. However, since this is done by the runtime from already-compiled IL, the burden is far reduced and unlikely to be a speed issue.

Since the generic code must be semantically valid on its own, the compiler doesn't need to know the parameter types in order to compile it. This usually results in clear, well-located error messages. It may also lead to more readable code, though I'm not certain on this point.

Cons

Generics are quite inflexible. They work well for collection-like types, but have a hard time adapting to other uses. The generic class itself must be semantically valid for the constraint type used. This means any deviation from very basic behaviour requires defining and implementing interfaces on the objects. The lack of expressiveness is perhaps the key drawback to generics.

There is an efficiency issue when working with fundamental types. The generic only works with object types, so all fundamentals need to be boxed and unboxed. This has a direct speed impact, as well as creating many more objects to be managed. This is perhaps the main motivation for C# handling value-type generics as C++ does: by producing copies of the code, the generic need not box fundamentals.
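A minimal sketch of what boxing costs, written in C++ for continuity with the earlier examples (the `Object`/`BoxedInt` names are invented): every fundamental value crossing into object-typed generic code pays a heap allocation on the way in and a cast on the way out.

```cpp
#include <cassert>

// The root object type all generic code is compiled against.
struct Object {
    virtual ~Object() = default;
};

// Boxing: wrap the fundamental in a heap-allocated object.
struct BoxedInt : Object {
    int value;
    explicit BoxedInt( int v ) : value(v) {}
};

Object* box( int v ) {
    return new BoxedInt( v );            // heap allocation per value
}

int unbox( Object* o ) {
    return static_cast<BoxedInt*>( o )->value;  // cast back out
}
```

An `int` in a template instance occupies four bytes in place; the boxed form adds a pointer, a vtable, heap bookkeeping, and eventual collection or deletion, which is the overhead the value-type specialization in C# avoids.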

A problem that existed in Java, but not in C#, is that the generic does not know what type it is working with. This is a major drawback, since a collection is unable to create the objects it holds. There's no fundamental reason why this approach can't know its type at runtime.

In the case of C# reference types, I was unable to figure out how the .NET runtime actually exposes the type to generic functions. I suspect it's a hidden parameter, but if somebody knows for sure, please let me know. I'm curious.

There is no type specialization. Since there is only one version of the code, it isn't possible to alter behaviour based on the actual type used. In C# you can of course compare the type at runtime to provide specialized branches. It is a bit clumsy, and inefficient, compared to specialization, but it does work. Often this lack of specialization prevents two nearly identical code blocks from being merged.

More options

There are more options, and more combined ways to look at the issue. Consider that dynamic languages are basically parametric by default, employing a duck-typing approach based on the runtime type. This makes them rather expressive using only one copy of the code, but at the cost of efficiency and specialization.

In Leaf I'm considering a hybrid method. The core of the template will follow the generics approach, using a generalized object type. To keep it expressive, though, an automatic interface will be generated: all the operations needed in the parametric class or function will be bundled into a template. That template is copied for each type and passed to the generic code. It's just a rough idea at this point.
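One way such a hybrid could plausibly work is dictionary passing: a per-type record of the required operations is stamped out template-style and handed to a single generic body. This is only an illustrative C++ sketch under that assumption (all names invented), not Leaf's actual design:

```cpp
#include <cassert>

// The auto-generated "interface": the operations the parametric
// code needs, bundled as function pointers over erased values.
struct ops {
    void* (*add)( void* a, void* b );
};

// One copy of the generic body, shared by all types.
void* generic_sum( const ops& o, void* a, void* b ) {
    return o.add( a, b );
}

// A per-type adapter, generated template-style for int.
void* add_int( void* a, void* b ) {
    static int r;  // sketch only: a real design would manage storage
    r = *static_cast<int*>( a ) + *static_cast<int*>( b );
    return &r;
}
```

This keeps the single-body distribution benefit of generics while the small per-type adapters restore much of the expressiveness, at the cost of an indirect call per operation.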

If you know of a different parametric approach in other compilers, for any language, then please leave a comment.

If you’d like to learn more about compilers then follow me on Twitter. I have many more things to write on this topic. If there’s something special you’d like to hear about, or want to arrange a presentation feel free to contact me.

Replies

  1. You should check how Rust does it. All the templates are constrained, so they can be verified in isolation and compiled to an AST. That AST is then embedded in shared libraries, so although you cannot dynamically link against them, you can do so statically without having to include any source code.

    • Currently in Leaf I also compile the AST and put it in the shared library. I thus also don’t need the header files, but also can’t actually link dynamically. I don’t constrain at the moment at all though, so it’s really an untyped AST I’m sticking in my libraries.

  2. I’m working on an interesting approach for a language I’m designing at the moment. Like you, I’m planning on storing ASTs for any parametric class or function, which will enable use at run time with any type (I’ll have a JIT for type combinations not identified ahead-of-time), so one option will be to do a quick compile and leave expansion of all the types to run time. Where my plan really seems to depart from yours, however, is that I plan on making all functions that don’t have a precise type for their parameters use this mechanism. I’ll then use the template approach to build the final version of each function/parameter type combination that can be determined statically, and only use the generic approach when the type cannot be determined statically. This will, among other things, allow me to have fully polymorphic value types, something I often wish was possible in C++.

    • Actually, in Leaf I’m taking the same approach to typing. If you simply leave off type information to a function it becomes a parametric. This is a typed function:

      defn pine = ( x : integer, y : float ) -> ( q : integer) { … }

      To make this a parametric you just leave off the types (and make the return implicit):

      defn pine = ( x, y ) -> { … }

      It’s an interesting idea to have them all compiled at runtime by a VM. That would solve the distribution problem. This isn’t an option for Leaf since one of my requirements is that it runs without any significant runtime support (compiled to platform ABI).
