Variables contain values with types, but what uses this information, and when? In this article I look at how languages use type information, in particular, the case where the type information is required at runtime. Understanding type handling gives one a deeper understanding of how a language behaves, and some of its limitations.
I’ll start from the viewpoint of a statically typed language, as it’s easier to understand. Then I’ll show how dynamically typed languages differ.
In the following pseudo-code I introduce a few different typed variables. I’ll refer to these variables throughout this article.
    age: Integer = 17
    name: String = "Kesara"
    user: User = User(id=123, security=AccessType.read, name=name)
In this example, age has an integer type, name has a string type, and user has a User type. We know this since we can see it written in the code. The translator also knows this since it understands the code. Here I’ll use the term translator to refer to the compiler, or interpreter, and the abstract machine on which this code will run.
Static Type Information
Type information that is known before the code executes is called “static” type information: we can determine a variable’s type simply by reading the code. What we call statically typed languages are those where most type information can be determined this way. Let’s assume we have a statically typed language for now.
The translator uses this static information to decide how to process expressions. For example, given the expression age + 5, the translator knows it is adding two integers together. It knows this statically and thus immediately knows how to implement it. The static type information is also what allows the translator to know that name + "Suffix" is string concatenation, and not addition.
To put it another way, the translator knows which functions to call without needing to execute the code. For example, a compiler might allocate 8 bytes for the integer age, then write a value to it. Later it will call a specific function, add_integer_8( age, 5 ), which assumes that the bytes pointed to by age represent an integer. It doesn’t need to explicitly track at runtime that age is an integer: the knowledge of the static types at compile time was enough to call the right function.
Take a look at my article on type conversion for a perspective from a compiler.
With static types, functions and libraries are built to work with specific types. What looks like a single addition operator in the code may be mapped to a variety of functions, such as one for each size of integer and one for each floating point type. The compilation phase has already determined the correct function to call, so these functions don’t need to check the type of their arguments. Since nothing needs this type information after compilation, it can be discarded. It won’t be present in the compiled, or executed, form of the code.
Dynamic Type Information
Even in static languages, there is also “dynamic” type information, which cannot be known until the code is run. This comes up in type hierarchies and union types.
Let’s create a simple type hierarchy to demonstrate.
    abstract class SecurityObject:
        security: AccessType
        id: Integer
        abstract function Identify() : String

    class User extends SecurityObject:
        name: String
        function Identify() : String
            return name

    class Process extends SecurityObject:
        executable: String
        function Identify() : String
            return executable
Now consider this function.
    function Show( actor: SecurityObject ):
        print( actor.Identify() )
The Show function takes an actor, which we know has the SecurityObject type from reading the code. That is its static type information. However, we can’t yet know whether actor is a User or a Process. Since the concrete type of actor can differ from one call to the next, we say that it is “dynamic”.
This dynamic type information is one kind of runtime type information. The translator can’t know the dynamic type until it executes the function, implying it has to know at runtime which type it is. We should be careful here, though: technically it only needs to know which Identify function to call, which it can know without knowing the type fully.
If you are curious about one way to handle inheritance, look up virtual function tables.
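As a rough illustration, the hierarchy above can be sketched in Python. The language choice, the snake_case names, and the plain string standing in for AccessType are all assumptions of this sketch, not part of the pseudo-code. The point is that show never names User or Process, yet the right identify is picked at runtime from the object it receives.

```python
from abc import ABC, abstractmethod


class SecurityObject(ABC):
    def __init__(self, id: int, security: str) -> None:
        self.id = id
        self.security = security  # stand-in for AccessType (assumption)

    @abstractmethod
    def identify(self) -> str:
        """Return a human-readable identity for this object."""


class User(SecurityObject):
    def __init__(self, id: int, security: str, name: str) -> None:
        super().__init__(id, security)
        self.name = name

    def identify(self) -> str:
        return self.name


class Process(SecurityObject):
    def __init__(self, id: int, security: str, executable: str) -> None:
        super().__init__(id, security)
        self.executable = executable

    def identify(self) -> str:
        return self.executable


def show(actor: SecurityObject) -> str:
    # Statically we only know "SecurityObject"; which identify() runs
    # is resolved at runtime from the object that was passed in.
    return actor.identify()
```

Calling show(User(123, "read", "Kesara")) and show(Process(7, "read", "/bin/ls")) goes through the same call site but ends up in two different functions.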
Union types also require runtime type information. Given a function that takes an argument value: Integer | String, we can’t know statically whether value is an integer or a string. It’s dynamic. If we want to print out that value, we will need to know that information.
    type Sample = Integer | String

    function print_sample(sample: Sample):
        if sample.is<Integer>():
            print( sample.as<Integer>() + 5 )
        else:
            print( "Prefix " + sample.as<String>() )
The print_sample function first has to ascertain whether it is dealing with an integer or a string before it can print the value. In order for this to work, variables of type Sample must carry additional information that lets the is<Integer> expression work.
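A minimal Python sketch of the same idea follows. Here isinstance plays the role of is<Integer>, and the function returns the text instead of printing it so it is easy to check. The name format_sample is made up for this sketch.

```python
from typing import Union

# The union type from the pseudo-code, spelled with Python's typing module.
Sample = Union[int, str]


def format_sample(sample: Sample) -> str:
    # The branch can only be chosen at runtime, using the type information
    # that every Python value carries with it.
    # (bool is a subclass of int in Python; this sketch ignores that wrinkle.)
    if isinstance(sample, int):
        return str(sample + 5)
    return "Prefix " + sample
```

Python values always carry their type, so nothing extra has to be stored for the union. A statically typed language has to add that information explicitly.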
Operations with dynamic types
We’ve seen that for inheritance and union types, using the value requires runtime information about the type of the value. Let’s look at some other common language operations that also require runtime type information.
if( actor instanceof User ): ...
If we have a branch on the type of an object, then the translator needs to track enough type information to evaluate that condition.
This is a limited form of type information, providing only a true or false value. Languages often offer this check only on specific types, and only where the value could actually be of the checked type. In this example, it makes sense to ask if actor is an instance of User, since that’s one of the possible types it could be. It makes little sense to ask if it’s an Integer, since it couldn’t possibly be one.
This leaves the compiler a lot of room to decide how to represent this type information. It is runtime type information, but it may not be the full “type” you’d expect.
    age_type_info = typeof age
    print( age_type_info.name ) // Prints 'Integer'

    function Debug( actor: SecurityObject ):
        actor_type_info = typeof actor
        print( actor_type_info.name ) // Prints 'User' or 'Process'
In both of the above cases, we are using runtime type information: we are printing out a type’s name. The typeof operator requires that even more type information be available at runtime than the simple instanceof expression. The information may still be limited, since we often only need to compare types, or get their name for debugging.
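Both checks can be tried out in Python, where the built-in isinstance corresponds roughly to instanceof and the built-in type() corresponds roughly to the typeof operator used here. The empty classes and the helper names below are placeholders for this sketch only.

```python
class SecurityObject:
    """Placeholder base class for this sketch."""


class User(SecurityObject):
    pass


class Process(SecurityObject):
    pass


def is_user(actor: SecurityObject) -> bool:
    # The limited form: a yes/no answer about one candidate type.
    return isinstance(actor, User)


def type_name(value: object) -> str:
    # The richer form: obtain the type itself, then read its name.
    return type(value).__name__
```

type_name(17) yields "int", while type_name(actor) yields "User" or "Process" depending on what was passed in.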
The above example illustrates a difference between dynamic and runtime type information. In the case of age_type_info, there is nothing dynamic. Since the type of age is statically known, the translator also knows the exact type of age_type_info. I’m using an operator syntax for typeof to demonstrate it’s a builtin aspect of a language. Whether it does a compile-time replacement or a runtime lookup of the type is up to the translator.
With typeof actor though, we know it has to be doing some kind of dynamic resolution. There is no way, at compile time, for it to know whether actor is a User or a Process.
I mentioned the translator knows the type of age_type_info. It gets confusing here, as the result of typeof age needs its own type. Though the typeof part is special, the result of the expression is just another value, much as though you defined your own class. Here, the class that defines types is usually called Type. Type is best expressed as a parametric class, which may help in understanding: the specific type of age is logically an Integer, and the data structure that represents this knowledge at runtime is a Type<Integer>.
Some languages offer a way to get at properties, including methods, of an object by their name.
id = get_property( actor, "id" )
Assuming the translator can’t know the type of actor statically here, it’ll need at least a mapping of names to values available at runtime. Often this is paired with the ability to get the type of a property, a get_property_type function for example.
While instanceof and typeof have only needed bits of type information at runtime, something like get_property requires a lot more. The presence of this expression may require the entire type information for objects to be known at runtime: their names, methods, properties, base classes, and more.
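Python gives a feel for what this looks like in practice. In this sketch, get_property and get_property_type are thin wrappers (named after the pseudo-code, not real library functions) around the built-in getattr and the standard typing.get_type_hints. The cut-down User class is an assumption for the example.

```python
from typing import Any, get_type_hints


class User:
    # Class-level annotations are kept at runtime and can be queried later.
    id: int
    name: str

    def __init__(self, id: int, name: str) -> None:
        self.id = id
        self.name = name


def get_property(obj: object, prop: str) -> Any:
    # Looks the property up by name in the object's runtime data.
    return getattr(obj, prop)


def get_property_type(obj: object, prop: str) -> type:
    # Reads the declared type of the property from the class annotations.
    return get_type_hints(type(obj))[prop]
```

Neither helper knows anything about User in advance. Everything it needs, names, values, and declared types, has to be present while the program runs.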
In Java, the reflection API is the gateway to this level of runtime type information. Often this extended level of type information is called "type introspection".
In C++, the closest equivalent of the instanceof operator is dynamic_cast. Its name directly tells us it is a dynamic operation. C++ also has a static_cast operator, which requires that the cast be possible using only static type information.
The interesting part about C++ is that the dynamic_cast operator can only be used on dynamic types. The most common dynamic type is a class that has virtual functions, implying there is a type hierarchy. You cannot use dynamic_cast to check the type of objects that don’t have any virtual functions. This isn’t a problem, since you don’t need to dynamically cast non-dynamic objects.
Dynamically Typed Languages
In a dynamically typed language, the translator doesn’t have any static knowledge about the types of variables, or it has very limited static knowledge.
Don’t confuse dynamic types with inferred types. In several statically typed languages, the static type of a variable can be inferred. Not having an explicit type in the code doesn’t mean it’s dynamic.
    age = 17
    name = "Kesara"
    user = User(id=123, security=AccessType.read, name=name)
Obviously, the translator knows that 17 is an integer when it parses it, as it also knows that "Kesara" is a string and User(…) is a User object. But it doesn’t track this information statically. Therefore it doesn’t care if you assign them all to the same variable.
    age = 17
    age = "Kesara"
    age = User(...)
So what happens when the translator encounters the expression age + 5? In the static type case, we saw the translator knows from the type which function it needs to call. It knew that age was an integer, and called an integer addition function. But in the dynamic case, the translator doesn’t know what type age has until it attempts to execute the code.
As the addition operator has two operands, the translator will have to check both types at runtime. Maybe it decides which function to call solely based on the first type and then coerces the second type into an appropriate form. For example, we might have this logic for the age + 5 expression.
    function dynamic_add( a, b ):
        if( a instanceof Integer ):
            return integer_add( a, coerce_integer( b ) )
        if( a instanceof String ):
            return string_concatenate( a, coerce_string( b ) )
        raise UnsupportedAddType
That’s one potential approach. The translator could instead take the Python approach and treat every value as an instance of a class, where operators are special functions on that class. So age + 5 becomes age.add( 5 ). This still needs runtime information though, since we saw earlier that virtual functions need runtime type information. And the function implementation would still need to inspect the second value and either coerce it to the correct type or reject it.
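In actual Python the special function is spelled __add__, and it is looked up on the runtime type of the left operand. The sketch below makes that lookup explicit. Seconds is a hypothetical class invented for the example, and explicit_add is only an approximation of the first step Python takes for a + b (it ignores the fallback to the right operand's __radd__).

```python
from typing import Any


class Seconds:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, amount: int) -> None:
        self.amount = amount

    def __add__(self, other: object) -> "Seconds":
        # The method itself inspects the runtime type of the second operand.
        if isinstance(other, Seconds):
            return Seconds(self.amount + other.amount)
        if isinstance(other, int):
            return Seconds(self.amount + other)
        # Tells Python this combination is unsupported; it will try the
        # other operand and raise TypeError if that fails as well.
        return NotImplemented


def explicit_add(a: Any, b: Any) -> Any:
    # Approximately the first step of `a + b`: find __add__ on a's runtime type.
    return type(a).__add__(a, b)
```

Note that Python's built-in types reject a mismatched second operand rather than coercing it, so 17 + "5" raises a TypeError. Coercion is a design choice each language or class makes for itself.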
In a dynamic language, virtually every expression is using a dynamic operation. This may be a virtual dispatch, a typeof operator, or a get_property function. This contrasts with statically typed languages, where only specific operations require dynamic information.
Stored in memory
Since the translator can’t know how a dynamic value will be used, it needs to track the complete type information at runtime. This generally means that every variable contains both the actual value, and a pointer to the type information for that value. This pointer is real, as is the type information it points to. If you could inspect the raw memory of a dynamic variable, you’d be able to see this information — though it might be optimized and hard to decipher.
Indeed, static languages that support dynamic types tend to also have this same information in memory. This is because it’s hard, when compiling across several units and libraries, to know precisely which information will be required. That said, the data needed for something like C++’s virtual functions is stored separately from the generic type information to ensure the calls are fast.
Values returned by a typeof operation also have to be stored somewhere, even if the values involved are not dynamic. This is also runtime type information.
Union types also need to distinguish the type stored at runtime. In a dynamic language, since all values are dynamic, union types require no special handling. But something like C++’s std::variant class needs to work with non-dynamic types. It therefore stores an extra value in the variant object itself, saying which of the types it currently holds. This is technically runtime type information, but usually not what is meant by that term. It gets blurry as languages mix dynamic and static concepts.
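To make the "extra value stored next to the data" idea concrete, here is a small Python sketch of a tagged union. It keeps the payload as raw bytes on purpose, to mimic memory that carries no type of its own. The Variant class, the tag constants, and the 8-byte little-endian integer layout are all assumptions of this sketch and are not how any particular language lays out its variants.

```python
import struct
from dataclasses import dataclass

TAG_INTEGER = 0
TAG_STRING = 1


@dataclass(frozen=True)
class Variant:
    tag: int        # says which alternative is stored
    payload: bytes  # raw bytes; meaningless without the tag


def from_integer(value: int) -> Variant:
    # Store the integer as 8 little-endian bytes.
    return Variant(TAG_INTEGER, struct.pack("<q", value))


def from_string(value: str) -> Variant:
    return Variant(TAG_STRING, value.encode("utf-8"))


def describe(v: Variant) -> str:
    # The tag is the only runtime type information available here.
    if v.tag == TAG_INTEGER:
        (number,) = struct.unpack("<q", v.payload)
        return f"Integer {number}"
    if v.tag == TAG_STRING:
        return "String " + v.payload.decode("utf-8")
    raise ValueError(f"unknown tag {v.tag}")
```

The tag is far less than full type information: it cannot tell you a type's name or its properties, only which of a fixed set of alternatives is present.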
Runtime Type Information
In short, runtime type information is data about the types of values that is accessible at runtime, while the program is executing. This includes data that is necessary for the behaviour of dynamic types, as well as introspective information about the structure of types themselves. In the case of dynamically typed languages, the translator relies heavily on this information for most expressions, whereas in a statically typed language, the translator relies on this information only in specific cases.