Polymorphism: the core of object oriented programming. Most modern languages have some concept of interfaces, virtual functions, and classes. Though each language differs in details, and may have specialized concepts, the core idea remains the same. You define a base class with virtual functions; a derived class can override some, all, or none of those functions. Have you ever stopped to wonder how this works? What overhead, or cost, is involved in such object oriented programming?
I’ll walk us through how polymorphism works. Rather than just explain it, we’ll recreate polymorphism from the ground up. That is, using C, the language without polymorphism, I’ll show how you can create it, and let you discover how languages implement this feature. I won’t jump directly to a full implementation, instead opting to go through a logical set of steps which eventually brings us to an implementation used in common compilers.
The Interface
Rather than start at a base class we’ll start with an interface. You’ll quickly see that every base class actually has an implicit interface, thus it seems like the reasonable place to start. Let’s define a basic interface using Java syntax — it is simple enough that anybody can understand it.
interface PolyFunc { public void oneParam( int a ); public int withReturn( ); }
A class that implements this interface must define both functions. These are the member functions that an implementation defines.
Now consider a function that takes a parameter of PolyFunc. What is that function actually expecting? Step down to C at this point and consider a function with the signature void someFunc( PolyFunc * pf ). It is a function that takes a pointer to a PolyFunc interface. What is this function expecting behind that pointer? It is an interface with functions so it must have some way to call those functions. Since functions can be expressed by pointers themselves it makes even more sense if it were just a set of function pointers.
In C we define the interface as a struct with two members: one for each of the functions. If you haven’t done much C coding you’ll have to excuse the rather ugly syntax for declaring function pointers.
struct PolyFunc { void (*oneParam)(int); int (*withReturn)(); };
Does that look right or does something appear to be missing. In Java when a member function is called it has access to a variable called this, which points to the current object. In C there are no implicit parameters to functions, so we’ll need an explicit way to communicate this to the function. What type is this? The only type we currently have is PolyFunc so we’ll have to assume that is the type of the pointer. Let’s redefine our structure.
struct PolyFunc { void (*oneParam)( struct PolyFunc *, int); int (*withReturn)( struct PolyFunc *); };
Not seeing how such functions are invoked leaves this a bit unclear. So here is a function, called someFunc, which does exactly that.
void someFunc( struct PolyFunc * pf ) { int r = (pf->withReturn)( pf ); (pf->oneParam)( pf, r ); }
In C the (pf->withReturn)( pf ); is saying to call the function pointed to by withReturn from the structure pf. It also says to call it with one parameter, the value of pf itself. The second call adds an additional parameter to show how this is just normal function call syntax.
There you have it. PolyFunc now completely defines the interface we need as a base of polymorphism. Our example shows exactly how one would call functions via this interface. We just need an implementation now.
The Base Class
An interface isn’t very useful if nobody implements it. We’ll define our class again in Java as it has a clear syntax for implementing interfaces. We won’t care too much about making a useful class, but rather just something that demonstrates how implement works.
class PolyBase implements PolyFunc { public int value; public void oneParam( int a ) { System.out.println( "" + value ); } public int withReturn( ) { return value; } }
That has to be converted into C syntax that represents about the same thing. We know that we have the interface PolyFunc in there somewhere. We also have a variable value. Let’s just create a simple structure which contains both.
struct PolyBase { struct PolyFunc interfacePolyFunc; int value; };
Recall that we’ll need a pointer to the PolyFunc interface. This structure has a PolyFunc of which we can take an address. It is also the first member of the structure; this gives us something extra. In C the first member will actually have the same pointer address as the structure itself. This may not seem important now, but remember it for later.
We have to initialize this structure. Before we can initialize it we need to define each of our functions. We’ll do that all together in the below code. To indicate the function relates to our PolyBase class we’ll prefix each name. We won’t define the body of the functions just yet; this is known as a foward declaration.
void PolyBase_oneParam( struct PolyFunc * pf, int a ); int PolyBase_withReturn( struct PolyFunc * pf ); void init_PolyBase( struct PolyBase * pb ) { pb->interfacePolyFunc.oneParam = &PolyBase_oneParam; pb->interfacePolyFunc.withReturn = &PolyBase_withReturn; }
It may look complicated, but we actually haven’t done much. First we declared the functions that will be our member functions in the PolyBase class. The second step is far more important: it is the first part of the actual polymorphism. Here we have populated the PolyFunc struct in a PolyBase with pointers to our member functions. Now, if some code would like to instantiate a PolyBase object it could do so as follows.
struct PolyBase pb; init_PolyBase( &pb ); pb.value= 123; someFunc( &pb.interfacePolyFunc );
In C that declares an instance of PolyBase on the stack and calls the initializer. Once initialized we can pass the pointer to our previously defined someFunc function. We have explicitly passed a pointer to the interface variable to show what we are doing. The point was made previously however that &pb.interfacePolyFunc and &pb will actually be the same address. This is an extremely important point to consider. We have to look at the body of our member functions to understand why.
Consider the signature for our second member function int PolyBase_withReturn( PolyFunc * pf ). The pf parameter is of type PolyFunc and not PolyBase. Yet we can forsee that our member functions, if they intend to access the value variable, will need a pointer to PolyBase. Let’s use what we know and define our function.
int PolyBase_withReturn( struct PolyFunc * pf ) { struct PolyBase * this = (struct PolyBase*)pf; return this->value; }
That was actually quite easy. Since we initialized the interface we know that callers to this function actually have a pointer to PolyBase.interfacePolyFunc. We also know from before that this will actually be the exact same pointer as the PolyBase object itself. Thus we can statically cast it to our type and we have our familiar this pointer.
That’s everything we need to define classes implementing the PolyFunc interface. You can see that we could easily define any number of PolyBase like classes, each with their own member functions. The next step is to show how we can derive a class from PolyBase.
A Derived Class
One of the key features of polymorphism is virtual functions. A dervied class should be able to override functions in a base class, specializing its behaviour. Continuing with our PolyBase example we’ll now create a class called PolyDerived. First we’ll start with what this looks like in Java.
class PolyDerived extends PolyBase { String prefix; PolyDerived() { prefix = "ABC:"; } void oneParam( int a ) { System.out.println( prefix + a ); } }
In this derived class oneParam is overridden to write the value with a prefix. Notice that a constructor has also been added in this class; when we convert to C you’ll see this isn’t really much of a change. Since we haven’t introduced any new functions PolyDesired simply implements the PolyFunc interface. Converting to C we get the below.
struct PolyDerived { struct PolyBase base_PolyBase; char prefix[20]; }; void PolyDerived_oneParam( struct PolyFunc * pf, int a ); void init_PolyDerived( struct PolyDerived * pd ) { init_PolyBase( (struct PolyBase*)pd ); pd->base_PolyBase.interfacePolyFunc.oneParam = &PolyDerived_oneParam; strcpy( pd->prefix, "ABC:" ); }
The derived structure PolyDerived simply contains the PolyBase structure as its first member. Recall that tip about the pointers in PolyBase? In this case we have three equivalent pointers, the pointer to PolyDerived is also a pointer to PolyBase which is in turn a pointer to the interface PolyFunc. Notice how the derived init function simply calls the base class init function and overrides a single function pointer.
In addition to the base class we’ve also defined the additional member prefix. It is also initialized in the init_PolyDerived method. The base class doesn’t initialize any members, though it could, and probably should initialize the value member. In this regards init_PolyDerived has become the constructor for our class.
That completes the derived class. Quite easy wasn’t it? Now we can simply use the same code we had before and substitue PolyDerived for PolyBase. someFunc will now call a function in the base class, and in the derived class, exactly as you’d expect a virtual function to work.
struct PolyDerived pd; init_PolyDerived( &pd ); pd.base_PolyBase.value= 123; someFunc( (struct PolyFunc*)&pd );
First Review
In this article we’ve learned a way to implement simple polymorphism in C. The intent is to expose what is required to get virtual function calls working in a language like Java, or C++. I do not know of any compiler which actually implements virtual functions this way. We’ll get to the reason in the next article about virtual tables. Nonetheless the techniques you’ll encounter are quite similar to this. They simply improve on some aspects, and offer more features.
If you want to analyze the cost of virtual tables this is your starting point. Rather than making a simple function call you see that you first have to look up the value of the function pointer. You also need to pass a this pointer, but that isn’t really an additional cost: you would presume a non-virtual function version would also need a pointer to a data structure. In this simple case we’ve seen that conversion of the this pointer itself is zero-cost, so no overhead there. Our biggest overhead is the size of our structures, and that is what we’ll get to in the next article.