The Life of a Programmer


What is Runtime Type Information?

Variables contain values with types, but what uses this information, and when? In this article I look at how languages use type information, in particular, the case where the type information is required at runtime. Understanding type handling gives one a deeper understanding of how a language behaves, and some of its limitations.

I’ll start from the viewpoint of a statically typed language, as it’s easier to understand. Then I’ll show how dynamically typed languages differ.

Some Variables

In the following pseudo-code I introduce a few different typed variables. I’ll refer to these variables throughout this article.

age: Integer = 17
name: String = "Kesara"
user: User = User(id=123, security=AccessType.read, name=name)

In this example, age has an integer type, name has a string type, and user has a User type. We know this since we can see it written in the code. The translator also knows this since it understands the code. Here I’ll use the term translator to refer to the compiler, or interpreter, and the abstract machine on which this code will run.

Static Type Information

Type information that is known while the source code is being processed, prior to execution, is called “static” type information. We know a variable’s type simply from looking at the code. What we call statically typed languages are those where most type information can be known from the code. Let’s assume we have a statically typed language for now.

The translator uses this static information to decide how to process expressions. For example, given the expression age + 5, the translator knows it is adding two integers together. It knows this statically and thus immediately knows how to implement it. The static type information is also what allows the translator to know that name + "Suffix" is string concatenation, and not addition.

To put it another way, the translator knows which functions to call without needing to execute the code. For example, a compiler might allocate 8 bytes for the integer age, then write a value to it. Later it will call a specific function add_integer_8( age, 5 ), and that will assume that the bytes pointed to by age represent an integer. It doesn’t need to explicitly track at runtime that age is an integer: the knowledge of the static types at compile time was enough to call the right function.
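A minimal Python sketch of that idea, using the standard struct module to stand in for raw memory. The helper names add_integer_8 and read_float_8 are hypothetical; the point is only that the 8 bytes carry no type, and the function chosen by the caller decides how they are interpreted.

```python
import struct

# Store the integer 17 in 8 bytes, as a compiler might for `age`.
raw = struct.pack("<q", 17)

def add_integer_8(data: bytes, other: int) -> int:
    """Hypothetical helper: assumes the 8 bytes hold a signed integer."""
    (value,) = struct.unpack("<q", data)
    return value + other

def read_float_8(data: bytes) -> float:
    """Hypothetical helper: assumes the same 8 bytes hold a float."""
    (value,) = struct.unpack("<d", data)
    return value

print(len(raw))                   # 8 bytes, none of them a type tag
print(add_integer_8(raw, 5))      # 22
print(read_float_8(raw) == 17.0)  # False: same bytes, different meaning
```

Nothing in raw says "integer". Calling the wrong function silently produces a meaningless result, which is exactly the mistake static type checking rules out before the code runs.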

Take a look at my article on type conversion for a perspective from a compiler.

With static types, functions and libraries are built to work with specific types. What looks like a single addition operator in the code may be mapped to a variety of functions, such as one for each size of integer and one for each floating point type. The compilation phase already determined the correct function to call, so these functions don’t need to check the type of their arguments. Since nothing needs this type information after compilation, it can be discarded — it won’t be in the compiled, or executed, form for the code.

Dynamic Type Information

Even in static languages, there is also “dynamic” type information, which cannot be known until the code is run. This comes up in type hierarchies and union types.

Let’s create a simple type hierarchy to demonstrate.

abstract class SecurityObject:
	security: AccessType
	id: Integer
	
	abstract function Identify() : String
		
class User extends SecurityObject:
	name: String
	
	function Identify() : String
		return name
	
class Process extends SecurityObject:
	executable: String
	
	function Identify() : String
		return executable

Now consider this function.

function Show( actor: SecurityObject ):
	print( actor.Identify() )

The Show function takes an actor, which we know has the SecurityObject type from reading the code. That is its static type information. However, we can’t yet know whether actor is a User or a Process. Since the concrete type of actor can change from call to call, we say that it is “dynamic”.

This dynamic type information is one kind of runtime type information. The translator can’t know the dynamic type until it executes the function, implying it has to know at runtime which type it is. We should be careful here, though. It technically only needs to know which Identify function to call, which it can know without knowing the type fully.

If you are curious about one way to handle inheritance, look up virtual function tables.
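As a rough illustration of the virtual function table idea, here is a hand-rolled sketch in Python. The names Obj, USER_VTABLE, PROCESS_VTABLE and show are all hypothetical, and real compilers lay this out quite differently; the sketch only shows that the caller needs a table of functions, not full knowledge of the type.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# One table per "class", mapping method names to functions.
VTable = Dict[str, Callable[..., str]]

@dataclass
class Obj:
    vtable: VTable             # the only per-object type data in this sketch
    fields: Dict[str, object]  # the object's own data

USER_VTABLE: VTable = {"Identify": lambda self: self.fields["name"]}
PROCESS_VTABLE: VTable = {"Identify": lambda self: self.fields["executable"]}

def show(actor: Obj) -> str:
    # Dispatch through the table. There is no check of whether
    # actor is a User or a Process.
    return actor.vtable["Identify"](actor)

print(show(Obj(USER_VTABLE, {"name": "Kesara"})))         # Kesara
print(show(Obj(PROCESS_VTABLE, {"executable": "/bin/sh"})))  # /bin/sh
```

Note that show never learns the name of the type or its fields. The table answers the single question "which Identify do I call?", and that is all the runtime information this call needs.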

Union

Union types also require runtime type information. Given a function that takes an argument value: Integer | String, we can’t know statically whether value is an integer or a string. It’s dynamic. If we want to print out that value, we will need to know that information.

type Sample = Integer | String

function print_sample(sample: Sample):
	if sample.is<Integer>():
		print( sample.as<Integer>() + 5 )
	else:
		print( "Prefix " + sample.as<String>() )

The print_sample function first has to ascertain whether it is dealing with an integer or a string before it can print the value. In order for this to work, variables of type Sample must carry additional information that lets the is<Integer> expression work.
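One way to picture that additional information is an explicit tag stored next to the value. The sketch below uses Python with hypothetical names (Tag, Sample, format_sample). Python values already carry their own type, so the tag is redundant in Python itself; it is written out only to show what a statically typed language would have to store.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

class Tag(Enum):
    INTEGER = 1
    STRING = 2

@dataclass(frozen=True)
class Sample:
    tag: Tag                  # the runtime type information
    payload: Union[int, str]  # the value itself

    def is_integer(self) -> bool:
        return self.tag is Tag.INTEGER

def format_sample(sample: Sample) -> str:
    # Check the tag first, then treat the payload accordingly.
    if sample.is_integer():
        return str(sample.payload + 5)
    return "Prefix " + sample.payload

print(format_sample(Sample(Tag.INTEGER, 17)))      # 22
print(format_sample(Sample(Tag.STRING, "Kesara")))  # Prefix Kesara
```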

Operations with dynamic types

We’ve seen that for inheritance and union types, using the value requires runtime information about the type of the value. Let’s look at some other common language operations that also require runtime type information.

instanceof

if( actor instanceof User ):
	...

If we have a branch on the type of an object, then the translator needs to track enough type information to evaluate that condition.

This is a limited form of type information, providing only a true or false value. Languages often offer this check only for specific types, and only where the value could actually be of the checked type. In this example, it makes sense to ask if actor is an instance of User, since that’s one of the possible types it could be. It makes little sense to ask if it’s an Integer, since it couldn’t possibly be one.

This leaves the compiler a lot of room to decide how to represent this type information. It is runtime type information, but it may not be the full “type” you’d expect.

typeof

age_type_info = typeof age
print( age_type_info.name ) // Prints 'Integer'

function Debug( actor: SecurityObject ):
	actor_type_info = typeof actor
	print( actor_type_info.name ) // Prints 'User' or 'Process'

In both of the above cases, we are using runtime type information — we are printing out a type’s name.

typeof requires that even more type information be available at runtime than the simple instanceof expression. The information may still be limited, since we often only need to compare types, or get their name for debugging.

The above example illustrates a difference between dynamic and runtime type information. In the case of age_type_info, there is nothing dynamic. Since the type of age is statically known, the translator also knows the exact type of age_type_info. I’m using an operator syntax for typeof to demonstrate it’s a builtin aspect of a language. Whether it does a compile-time replacement, or runtime lookup, of the type, is up to the translator.

For typeof actor though, we know it has to be doing some kind of dynamic resolution. There is no way, at compile time, for it to know whether actor is a User or a Process.

I mentioned the translator knows the type of age_type_info. It gets confusing here, as the result of typeof age needs its own type. Though the typeof part is special, the result of the expression is just another value, much as though you defined your own class. Here, the class that defines types is usually called Type. Type is best expressed as a parametric class, which may help in understanding: the specific type of age_type_info is Type<Integer>. While age is logically an Integer, the data structure that represents this knowledge at runtime is a Type<Integer>.
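For a concrete comparison, Python’s built-in type() plays the role of typeof. This is only one language’s take: Python’s type objects are instances of a single, non-parametric class called type, rather than the Type<Integer> form described above. The class bodies here are empty placeholders.

```python
class SecurityObject: ...
class User(SecurityObject): ...
class Process(SecurityObject): ...

age = 17
age_type_info = type(age)
print(age_type_info.__name__)  # int

def debug(actor: SecurityObject) -> str:
    # The result depends on the object passed in at runtime.
    return type(actor).__name__

print(debug(User()))     # User
print(debug(Process()))  # Process

# The type information is itself a value, with its own type.
print(type(age_type_info).__name__)  # type
```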

get_property

Some languages offer a way to get at properties, including methods, of an object by their name.

id = get_property( actor, "id" )

Assuming the translator can’t know the type of actor statically here, it’ll need at least a mapping of names to values available at runtime. Often this is paired with the ability to get the type of property, a get_property_type function for example.

Whereas instanceof and typeof have only needed bits of type information at runtime, something like get_property requires a lot more. The presence of this expression may require the entire type information for objects being known at runtime: their names, methods, properties, base classes, and more.

In Java, the reflection API is the gateway to this level of runtime type information. Often this extended level of type information is called "type introspection".
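Python exposes a similar level of introspection through built-ins such as getattr, vars and the __mro__ attribute. The sketch below reuses the article’s User example, with security simplified to a plain string since AccessType is not defined here.

```python
class SecurityObject:
    def __init__(self, id: int, security: str):
        self.id = id
        self.security = security

class User(SecurityObject):
    def __init__(self, id: int, security: str, name: str):
        super().__init__(id, security)
        self.name = name

    def identify(self) -> str:
        return self.name

actor = User(id=123, security="read", name="Kesara")

# Look up a property by name, then ask for its type.
print(getattr(actor, "id"))                 # 123
print(type(getattr(actor, "id")).__name__)  # int

# Property names and base classes are all available at runtime.
print(sorted(vars(actor)))                        # ['id', 'name', 'security']
print([c.__name__ for c in type(actor).__mro__])  # ['User', 'SecurityObject', 'object']

# Methods can be fetched by name as well.
print(getattr(actor, "identify")())  # Kesara
```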

C++

In C++ the instanceof operator is called dynamic_cast; the name directly tells us it is a dynamic operation. C++ also has a static_cast operator, which performs a cast using only static type information.

The interesting part about C++ is that the dynamic_cast operator can only be used on dynamic types. The most common dynamic type is a class that has virtual functions, implying there is a type hierarchy. You cannot use dynamic_cast to check the type of objects that don’t have any virtual functions. This isn’t a problem, since you don’t need to dynamically cast non-dynamic objects.

Dynamically Typed Languages

In a dynamically typed language, the translator doesn’t have any static knowledge about the types of variables, or it has very limited static knowledge.

Don’t confuse dynamic types with inferred types. In several statically typed languages, the static type of a variable can be inferred. Not having an explicit type in the code doesn’t mean it’s dynamic.

Let’s drop the type information from our variables and assume we’re using a dynamic language like JavaScript, or untyped Python.

age = 17
name = "Kesara"
user = User(id=123, security=AccessType.read, name=name)

Obviously, the translator knows that 17 is an integer when it parses it, as it also knows that "Kesara" is a string and User(…) is a User object. But it doesn’t track this information statically. Therefore it doesn’t care if you assign them all to the same variable.

age = 17
age = "Kesara"
age = User(...)

So what happens when the translator encounters the expression age + 5? In the static type case, we saw the translator knows from the type which function it needs to call. It knew that age was an integer, and calls an integer addition function. But in the dynamic case, the translator doesn’t know what type age has until it attempts to execute the code.

As the addition operator has two operands, the translator will have to check both types at runtime. Maybe it decides which function to call solely based on the first type and then coerces the second type into an appropriate form. For example, we might have this logic for the age + 5 expression.

function dynamic_add( a, b ):
	if( a instanceof Integer ):
		return integer_add( a, coerce_integer( b ) )
	if( a instanceof String ):
		return string_concatenate( a, coerce_string( b ) )
	raise UnsupportedAddType

That’s one potential approach. The translator could instead take the Python approach and treat every value as an instance of a class, where operators are special functions on that class. So age + 5 becomes age.add( 5 ). This still needs runtime information though, since we saw earlier that virtual functions need runtime type information. And the function implementation would still need to check the type of the second value, and possibly coerce it to the correct type.
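In actual Python the special function is spelled __add__. The sketch below shows the built-in integer case and a hypothetical Money class that defines its own addition. One difference from the logic described above: Python’s integer addition does not coerce a string operand, it declines by returning NotImplemented.

```python
class Money:
    """Hypothetical value type that defines its own addition."""

    def __init__(self, cents: int):
        self.cents = cents

    def __add__(self, other):
        # The method must inspect the second operand at runtime.
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        if isinstance(other, int):
            return Money(self.cents + other)
        return NotImplemented

age = 17
print(age + 5)                    # 22
print(type(age).__add__(age, 5))  # 22, the same call written out

print((Money(100) + 5).cents)     # 105

# int's own __add__ refuses a string instead of coercing it.
print(type(age).__add__(age, "Kesara") is NotImplemented)  # True
```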

In a dynamic language, virtually every expression is using a dynamic operation. This may be a virtual dispatch, a typeof operator, or a get_property function. This contrasts with statically typed languages, where only specific operations require dynamic information.

Stored in memory

Since the translator can’t know how a dynamic value will be used, it needs to track the complete type information at runtime. This generally means that every variable contains both the actual value, and a pointer to the type information for that value. This pointer is real, as is the type information it points to. If you could inspect the raw memory of a dynamic variable, you’d be able to see this information — though it might be optimized and hard to decipher.
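CPython gives a rough way to observe this overhead. The sketch below is specific to that implementation, and exact sizes vary by version and platform, so it only checks that a small integer occupies more than the 8 bytes its value alone would need. The extra space holds the object header, which in CPython includes a pointer to the type object.

```python
import sys

age = 17

# The value would fit in 8 bytes, but the object is larger because it
# also stores bookkeeping data, including a pointer to its type.
print(sys.getsizeof(age) > 8)

# The type is always reachable from the value itself.
for value in (17, "Kesara", 3.5):
    print(type(value).__name__)
```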

Indeed, static languages that support dynamic types tend to also have this same information in memory. This is because it’s hard, when compiling across several units and libraries, to know precisely which information will be required. That said, C++’s virtual function tables, for example, are stored separately from the generic type information so that dispatch stays fast.

Values returned by a typeof operation also have to be stored somewhere, even if the values involved are not dynamic. This is also runtime type information.

Union types also need to distinguish the type stored at runtime. In a dynamic language, since all values are dynamic, union types require no special handling. But something like C++’s variant class needs to work with non-dynamic types. It therefore stores a discriminator in the variant object itself, a small value saying which of the types it currently holds. This is technically runtime type information, but usually not what is meant by that term. It gets blurry as languages mix dynamic and static concepts.

Runtime Type Information

In short, runtime type information is data about the types of values that is accessible at runtime, while the program is executing. This includes data that is necessary for the behaviour of dynamic types, as well as introspective information about the structure of types themselves. In the case of dynamically typed languages, the translator relies heavily on this information for most expressions, whereas in a statically typed language, the translator relies on this information only in specific cases.

Please join me on Discord to discuss, or ping me on Mastodon.
