Variables contain values with types, but what uses this information, and when? In this article I look at how languages use type information, in particular, the case where the type information is required at runtime. Understanding type handling gives one a deeper understanding of how a language behaves, and some of its limitations.
I’ll start from the viewpoint of a statically typed language, as it’s easier to understand. Then I’ll show how dynamically typed languages differ.
Some Variables
In the following pseudo-code I introduce a few variables of different types. I’ll refer to these variables throughout this article.
    age: Integer = 17
    name: String = "Kesara"
    user: User = User(id=123, security=AccessType.read, name=name)
In this example, age has an integer type, name has a string type, and user has a User type. We know this since we can see it written in the code. The translator also knows this since it understands the code. Here I’ll use the term translator to refer to the compiler, or interpreter, and the abstract machine on which this code will run.
Static Type Information
Type information that is known from the source code, prior to execution, is called “static” type information. We know a variable’s type simply from looking at the code. What we call statically typed languages are those where most type information can be known from the code. Let’s assume we have a statically typed language for now.
The translator uses this static information to decide how to process expressions. For example, given the expression age + 5, the translator knows it is adding two integers together. It knows this statically and thus immediately knows how to implement it. The static type information is also what allows the translator to know that name + "Suffix" is string concatenation, and not addition.
To put it another way, the translator knows which functions to call without needing to execute the code. For example, a compiler might allocate 8 bytes for the integer age, then write a value to it. Later it will call a specific function add_integer_8( age, 5 ), and that function will assume that the bytes pointed to by age represent an integer. It doesn’t need to explicitly track at runtime that age is an integer: the knowledge of the static types at compile time was enough to call the right function.
Take a look at my article on type conversion for a perspective from a compiler.
With static types, functions and libraries are built to work with specific types. What looks like a single addition operator in the code may be mapped to a variety of functions, such as one for each size of integer and one for each floating point type. The compilation phase already determined the correct function to call, so these functions don’t need to check the type of their arguments. Since nothing needs this type information after compilation, it can be discarded. It won’t be in the compiled, or executed, form of the code.
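To make the idea concrete, here is a minimal Python sketch that models an 8-byte memory slot with the standard struct module. The slot holds only bytes; the "type" lives entirely in which routine is chosen to read it. The name add_integer_8 comes from the text above, while add_float_8 and the slot layout are assumptions made up for illustration, not how any particular compiler works.

```python
import struct

# Model an 8-byte memory slot; the bytes themselves carry no type tag.
age_slot = struct.pack("<q", 17)  # "<q" = little-endian signed 64-bit integer


def add_integer_8(slot: bytes, operand: int) -> bytes:
    """Assume the slot holds a 64-bit integer, as a compiler-chosen routine would."""
    (value,) = struct.unpack("<q", slot)
    return struct.pack("<q", value + operand)


def add_float_8(slot: bytes, operand: float) -> bytes:
    """Hypothetical sibling routine that assumes the same 8 bytes hold a double."""
    (value,) = struct.unpack("<d", slot)
    return struct.pack("<d", value + operand)


# The "compiler" picked add_integer_8 because it statically knew age was an Integer.
result_slot = add_integer_8(age_slot, 5)
(result,) = struct.unpack("<q", result_slot)
print(result)  # prints 22
```

Nothing in age_slot says "integer". Passing the same bytes to add_float_8 would run without complaint and silently produce a meaningless value, which is why the choice of routine has to be correct before execution.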
Dynamic Type Information
Even in static languages, there is also “dynamic” type information, which cannot be known until the code is run. This comes up in type hierarchies and union types.
Let’s create a simple type hierarchy to demonstrate.
    abstract class SecurityObject:
        security: AccessType
        id: Integer
        abstract function Identify() : String

    class User extends SecurityObject:
        name: String
        function Identify() : String
            return name

    class Process extends SecurityObject:
        executable: String
        function Identify() : String
            return executable
Now consider this function.
    function Show( actor: SecurityObject ):
        print( actor.Identify() )
The Show function takes an actor, which we know has the SecurityObject type from reading the code. That is its static type information. However, we can’t yet know whether actor is a User or a Process. Since the actual type of actor can differ from one call to the next, we say that it is “dynamic”.
This dynamic type information is one kind of runtime type information. The translator can’t know the dynamic type until it executes the function, implying it has to know at runtime which type it is. We should be careful here, though. It technically only needs to know which Identify function to call, which it can know without knowing the type fully.
If you are curious about one way to handle inheritance, look up virtual function tables.
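The hierarchy above can be sketched in runnable Python. Keep in mind that Python resolves every method call at runtime, so this only illustrates the observable behaviour, not how a static compiler would implement the dispatch. The security field is simplified to a plain string here, which is an assumption; the article's AccessType is not defined.

```python
from abc import ABC, abstractmethod


class SecurityObject(ABC):
    def __init__(self, id: int, security: str) -> None:
        self.id = id
        self.security = security  # simplified stand-in for AccessType

    @abstractmethod
    def identify(self) -> str:
        """Each concrete subclass supplies its own implementation."""


class User(SecurityObject):
    def __init__(self, id: int, security: str, name: str) -> None:
        super().__init__(id, security)
        self.name = name

    def identify(self) -> str:
        return self.name


class Process(SecurityObject):
    def __init__(self, id: int, security: str, executable: str) -> None:
        super().__init__(id, security)
        self.executable = executable

    def identify(self) -> str:
        return self.executable


def show(actor: SecurityObject) -> str:
    # Which identify() runs is decided by the object passed in at runtime.
    return actor.identify()


print(show(User(123, "read", "Kesara")))
print(show(Process(7, "read", "/bin/sh")))
```

The show function is written once against SecurityObject, yet each call ends up in a different identify implementation depending on the object it receives.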
Union
Union types also require runtime type information. Given a function that takes an argument value: Integer | String, we can’t know statically whether value is an integer or a string. It’s dynamic. If we want to print out that value, we will need to know that information.
    type Sample = Integer | String

    function print_sample(sample: Sample):
        if sample.is<Integer>():
            print( sample.as<Integer>() + 5 )
        else:
            print( "Prefix " + sample.as<String>() )
The print_sample function first has to ascertain whether it is dealing with an integer or a string before it can print the value. In order for this to work, variables of type Sample must carry additional information that lets the is<Integer> expression work.
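A rough Python equivalent of the pseudo-code above, using isinstance in place of is<Integer>. The function returns the text rather than printing it so the result is easy to check; the name describe_sample is made up for this sketch.

```python
from typing import Union

# A value of this type is either an int or a str; which one is only known at runtime.
Sample = Union[int, str]


def describe_sample(sample: Sample) -> str:
    # The type information carried by the value decides which branch runs.
    if isinstance(sample, int):
        return str(sample + 5)
    return "Prefix " + sample


print(describe_sample(17))
print(describe_sample("text"))
```

In Python every value already carries its type, so the check comes for free. A statically typed language has to add that extra piece of information to values of the union type specifically.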
Operations with dynamic types
We’ve seen that for inheritance and union types, using the value requires runtime information about the type of the value. Let’s look at some other common language operations that also require runtime type information.
instanceof
if( actor instanceof User ): ...
If we have a branch on the type of an object, then the translator needs to track enough type information to evaluate that condition.
This is a limited form of type information, providing only a true or false value. Languages often offer this check only on specific types, and only where the value could possibly be of the checked type. In this example, it makes sense to ask if actor is an instance of User, since that’s one of the possible types it could be. It makes little sense to ask if it’s an Integer, since it couldn’t possibly be one.
This leaves the compiler a lot of room to decide how to represent this type information. It is runtime type information, but it may not be the full “type” you’d expect.
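As a small illustration, Python's built-in isinstance plays the role of instanceof. The classes below are bare stand-ins for the article's hierarchy, and the greeting function is a hypothetical example.

```python
class SecurityObject:
    pass


class User(SecurityObject):
    def __init__(self, name: str) -> None:
        self.name = name


class Process(SecurityObject):
    pass


def greeting(actor: SecurityObject) -> str:
    # The check yields only True or False; no further type details are needed.
    if isinstance(actor, User):
        return "Hello, " + actor.name
    return "Not a user"


print(greeting(User("Kesara")))
print(greeting(Process()))
```

The branch needs just enough runtime information to answer one yes-or-no question about the object.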
typeof
    age_type_info = typeof age
    print( age_type_info.name ) // Prints 'Integer'

    function Debug( actor: SecurityObject ):
        actor_type_info = typeof actor
        print( actor_type_info.name ) // Prints 'User' or 'Process'
In both of the above cases, we are using runtime type information — we are printing out a type’s name.
typeof requires that even more type information be available at runtime than the simple instanceof expression. The information may still be limited, since we often only need to compare types, or get their name for debugging.
The above example illustrates a difference between dynamic and runtime type information. In the case of age_type_info, there is nothing dynamic. Since the type of age is statically known, the translator also knows the exact type of age_type_info. I’m using an operator syntax for typeof to demonstrate it’s a built-in aspect of a language. Whether it does a compile-time replacement, or a runtime lookup, of the type is up to the translator.
For typeof actor though, we know it has to be doing some kind of dynamic resolution. There is no way, at compile time, for it to know whether actor is a User or a Process.
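In Python the built-in type function fills the role of typeof, and the name of a type is available through its __name__ attribute. The classes here are empty stand-ins for the article's hierarchy, and debug_name is a made-up helper.

```python
class SecurityObject:
    pass


class User(SecurityObject):
    pass


class Process(SecurityObject):
    pass


age = 17
age_type_info = type(age)
print(age_type_info.__name__)  # Python calls its integer type 'int'


def debug_name(actor: SecurityObject) -> str:
    # The result depends on the object actually passed in, not on the annotation.
    return type(actor).__name__


print(debug_name(User()))
print(debug_name(Process()))
```

Since Python is dynamically typed, both lookups happen at runtime. A static language would be free to resolve the first one at compile time and only the second one dynamically.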
I mentioned the translator knows the type of age_type_info. It gets confusing here, as the result of typeof age needs its own type. Though the typeof part is special, the result of the expression is just another value, much as though you defined your own class. Here, the class that defines types is usually called Type. Type is best expressed as a parametric class, which may help in understanding: the specific type of age_type_info is Type<Integer>. While age is logically an Integer, the data structure that represents this knowledge at runtime is a Type<Integer>.
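Python's typing module happens to have a close analogue of Type<Integer>: the annotation Type[int] describes a value that is the int type object itself. This short sketch shows that the type information is an ordinary value with a type of its own.

```python
from typing import Type

age: int = 17

# The annotation mirrors the article's Type<Integer>: a value describing the int type.
age_type_info: Type[int] = type(age)

# The type information is the int type object itself.
print(age_type_info is int)

# That object is an ordinary value, and its own type is `type`.
print(type(age_type_info) is type)

# Being a value, it can be stored, passed around, and even called to build an int.
print(age_type_info("42") + 1)
```

The annotation on age_type_info is only checked by static tools. At runtime the variable simply holds a reference to the int type object.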
get_property
Some languages offer a way to get at properties, including methods, of an object by their name.
id = get_property( actor, "id" )
Assuming the translator can’t know the type of actor statically here, it’ll need at least a mapping of names to values available at runtime. Often this is paired with the ability to get the type of a property, a get_property_type function for example.
Whereas instanceof and typeof have only needed bits of type information at runtime, something like get_property requires a lot more. The presence of this expression may require that the entire type information for objects be known at runtime: their names, methods, properties, base classes, and more.
In Java, the reflection API is the gateway to this level of runtime type information. Often this extended level of type information is called “type introspection”.
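Python exposes this kind of lookup through the built-ins getattr and vars. The sketch below wraps them in functions named after the article's get_property and get_property_type; those wrapper names and the small User class are assumptions for illustration only.

```python
class User:
    def __init__(self, id: int, name: str) -> None:
        self.id = id
        self.name = name


def get_property(obj: object, prop_name: str) -> object:
    # getattr resolves the name at runtime from the object's own data.
    return getattr(obj, prop_name)


def get_property_type(obj: object, prop_name: str) -> type:
    # The type of a property is found by inspecting the value stored under that name.
    return type(getattr(obj, prop_name))


user = User(123, "Kesara")
print(get_property(user, "id"))
print(get_property_type(user, "name").__name__)

# vars() exposes the name-to-value mapping of the instance's attributes.
print(sorted(vars(user)))
```

The property name is an ordinary string, so it could just as well come from user input or a configuration file. That is exactly why the full mapping has to exist at runtime.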
C++
In C++, the closest counterpart to the instanceof operator is dynamic_cast; the name directly tells us it is a dynamic operation. C++ also has a static_cast operator, which requires that the cast be valid using only static type information.
The interesting part about C++ is that the dynamic_cast operator can only be used on dynamic types. The most common dynamic type is a class that has virtual functions, implying there is a type hierarchy. You cannot use dynamic_cast on objects that don’t have any virtual functions. This isn’t a problem, since you don’t need to dynamically cast non-dynamic objects.
Dynamically Typed Languages
In a dynamically typed language, the translator doesn’t have any static knowledge about the types of variables, or it has very limited static knowledge.
Don’t confuse dynamic types with inferred types. In several statically typed languages, the static type of a variable can be inferred. Not having an explicit type in the code doesn’t mean it’s dynamic.
Let’s drop the type information from our variables and assume we’re using a dynamic language like JavaScript, or untyped Python.
    age = 17
    name = "Kesara"
    user = User(id=123, security=AccessType.read, name=name)
Obviously, the translator knows that 17 is an integer when it parses it, as it also knows that "Kesara" is a string and User(…) is a User object. But it doesn’t track this information statically. Therefore it doesn’t care if you assign them all to the same variable.
    age = 17
    age = "Kesara"
    age = User(...)
So what happens when the translator encounters the expression age + 5? In the static type case, we saw the translator knows from the type which function it needs to call. It knew that age was an integer, and called an integer addition function. But in the dynamic case, the translator doesn’t know what type age has until it attempts to execute the code.
As the addition operator has two operands, the translator will have to check both types at runtime. Maybe it decides which function to call solely based on the first type and then coerces the second type into an appropriate form. For example, we might have this logic for the age + 5 expression.
    function dynamic_add( a, b ):
        if( a instanceof Integer ):
            return integer_add( a, coerce_integer( b ) )
        if( a instanceof String ):
            return string_concatenate( a, coerce_string( b ) )
        raise UnsupportedAddType
That’s one potential approach. The translator could instead take the Python approach and treat every value as an object, where operators are special methods on the object’s class. So age + 5 becomes age.add( 5 ). This still needs runtime information though, since we saw earlier that virtual functions need runtime type information. And the function implementation would still need to coerce the second value to the correct type.
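Both approaches can be tried out in Python. The first function is a direct transcription of the dynamic_add pseudo-code, with the built-in int and str constructors standing in for coerce_integer and coerce_string, which is an assumption about what those helpers would do. The second function shows the method-based approach, where Python's real name for the special method is __add__. Note that Python's own + does not coerce between integers and strings; the coercion here only follows the pseudo-code.

```python
def dynamic_add(a, b):
    # Dispatch on the runtime type of the first operand only.
    if isinstance(a, int):
        return a + int(b)  # int() stands in for coerce_integer
    if isinstance(a, str):
        return a + str(b)  # str() stands in for coerce_string
    raise TypeError("unsupported type for add: " + type(a).__name__)


def method_add(a, b):
    # Look up the special method on the runtime class of the first operand.
    return type(a).__add__(a, b)


age = 17
print(dynamic_add(age, "5"))
print(dynamic_add("Kesara", 5))
print(method_add(age, 5))
print(method_add("Kesara", "Suffix"))
```

In both versions the decision about which addition to perform is made while the program runs, using the type information attached to the first operand.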
In a dynamic language, virtually every expression is using a dynamic operation. This may be a virtual dispatch, a typeof operator, or a get_property function. This contrasts with statically typed languages, where only specific operations require dynamic information.
Stored in memory
Since the translator can’t know how a dynamic value will be used, it needs to track the complete type information at runtime. This generally means that every variable contains both the actual value, and a pointer to the type information for that value. This pointer is real, as is the type information it points to. If you could inspect the raw memory of a dynamic variable, you’d be able to see this information — though it might be optimized and hard to decipher.
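The layout described above can be modelled by hand. In this sketch, TypeInfo and DynamicValue are hypothetical names for the two pieces: one shared record per type, and a value that pairs a reference to that record with its payload. Real runtimes use more compact representations, so treat this as a picture of the idea rather than of any actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeInfo:
    name: str


@dataclass
class DynamicValue:
    type_info: TypeInfo  # reference to the shared type information
    payload: object      # the actual value


# One TypeInfo record exists per type and is shared by all values of that type.
INTEGER = TypeInfo("Integer")
STRING = TypeInfo("String")

age = DynamicValue(INTEGER, 17)
other = DynamicValue(INTEGER, 42)
name = DynamicValue(STRING, "Kesara")

print(age.type_info is other.type_info)  # same record, not a copy
print(name.type_info.name)

# Python's own values behave the same way: each one refers to its type object.
print(type(17) is type(42))
```

Because the type record is shared, each value only pays for one extra reference, no matter how much information the record itself contains.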
Indeed, static languages that support dynamic types tend to also have this same information in memory. This is because it’s hard, when compiling across several units and libraries, to know precisely which information will be required. That said, something like C++’s virtual function tables is stored separately from the generic type information to ensure virtual calls are fast.
Values returned by a typeof operation also have to be stored somewhere, even if the values involved are not dynamic. This is also runtime type information.
Union types also need to distinguish the type stored at runtime. In a dynamic language, since all values are dynamic, union types require no special handling. But something like C++’s std::variant needs to work with non-dynamic types. It therefore stores a tag in the variant object itself, recording which of the types it currently holds. This is technically runtime type information, but usually not what is meant by that term. It gets blurry as languages mix dynamic and static concepts.
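A tagged union of this kind can be imitated with raw bytes in Python. The layout below, one tag byte followed by the payload, is invented for illustration and is not the actual layout of std::variant, which is left to the implementation. The function names are likewise made up.

```python
import struct

INT_TAG, STR_TAG = 0, 1


def make_variant(value) -> bytes:
    # Store a one-byte tag in front of the untyped payload bytes.
    if isinstance(value, int):
        return bytes([INT_TAG]) + struct.pack("<q", value)
    if isinstance(value, str):
        return bytes([STR_TAG]) + value.encode("utf-8")
    raise TypeError("variant holds only int or str")


def read_variant(raw: bytes):
    # The tag is the only thing that says how to interpret the payload.
    tag, payload = raw[0], raw[1:]
    if tag == INT_TAG:
        return struct.unpack("<q", payload)[0]
    if tag == STR_TAG:
        return payload.decode("utf-8")
    raise ValueError("unknown tag")


print(read_variant(make_variant(17)))
print(read_variant(make_variant("Kesara")))
```

The tag answers only one question, namely which alternative is stored. It says nothing about names, methods, or base classes, which is why it sits at the edge of what is usually meant by runtime type information.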
Runtime Type Information
In short, runtime type information is data about the types of values that is accessible at runtime, while the program is executing. This includes data that is necessary for the behaviour of dynamic types, as well as introspective information about the structure of types themselves. In the case of dynamically typed languages, the translator relies heavily on this information for most expressions, whereas in a statically typed language, the translator relies on this information only in specific cases.