In the article on “How Polymorphism Works: Part 1” we learned how to create virtual functions. The method which we chose has at one significant problem with it: it requires a lot of memory and repeated initialization. In this article we will look at a way to improve on the situation and bring us closer to a modern vtable implementation.
Previous Problem
In our rather simple virtual function implementation we simply put a pointer to each virtual function inside the data for an instance of a class. This looked like the below C structures.
[sourcecode language=”c”]
struct PolyFunc
{
void (*oneParam)(struct PolyFunc *,int);
int (*withReturn)(struct PolyFunc *);
};
struct PolyBase
{
struct PolyFunc interfacePolyFunc;
int value;
};
[/sourcecode]
This means that for each instance of PolyBase we have a complete copy of the member function pointers. That seems like a lot of wasted memory, especially for small objects. It also seems unusual that every addtional virtual function would increase the size of an object. Additionally each time we initialize a new object we have to set all of those pointers. While it may not seem like much for a tiny class, this will add up and create a performance issue.
A Virtual Table
Knowing that the pointers to virtual functions are the same for every instance of the class it seems like we should be able to create that list of pointers just once. That is, we should only ever need one instance of PolyFunc per implementing class. Easily done, we simply modify the definition of PolyBase.
[sourcecode language=”c”]
struct PolyBase
{
struct PolyFunc * interfacePolyFunc;
int value;
};
[/sourcecode]
Now, instead of setting the pointers each time we initialize this class we can define a single global singleton and assign the pointer at construction time.
[sourcecode language=”c”]
struct PolyFunc PolyBase_VTable =
{
&PolyBase_oneParam,
&PolyBase_withReturn
};
void init_PolyBase( struct PolyBase * pb )
{
pb->interfacePolyFunc = &PolyBase_VTable;
}
[/sourcecode]
We’ve called this global instance a VTable, short for Virtual Table, and you’ll find it called that quite often. The C initialization syntax is used here to set each member to the address of the function. With this setup we can add any number of virtual functions and never increase the size of an object nor increase the initialization time.
The Callee
We have a bit of a problem however. The function we are calling is expecting a pointer to the PolyFunc struct, but it is also expecting that pointer to be a pointer to the instance itself. Seen in isolation you may not notice a problem, so below it is shown alongside the calling code and the member function.
[sourcecode language=”c”]
void someFunc( struct PolyFunc * pf )
{
int r = (pf->withReturn)( pf );
(pf->oneParam)( pf, r );
}
int PolyBase_withReturn( struct PolyFunc * pf )
{
struct PolyBase * this = (struct PolyBase*)pf;
return this->value;
}
void calling()
{
struct PolyBase pb;
init_PolyBase( &pb );
someFunc( pb.interfacePolyFunc ); //No & here!
}
[/sourcecode]
Look at calling first. Previously we took the address of interfacePolyFunc since that is what someFunc expects. Now since we already have a pointer we shouldn’t need to do that. This seems okay so far. Inside the someFunc code the first function call will actually work, the PolyFunc object we’ve provided is valid in this regards. The trouble starts with the parameter we pass to the function.
In PolyBase_withReturn the first line expects the PolyFunc pointer to be directly convertible to a PolyBase pointer. This is no longer valid and this function will now fail: it may cause the program to crash, or silently return an invalid value. The problem started with the call to someFunc. By passing the value of the virtual table directly we’ve completely lost the reference to the original object. We need a way to pass the original object and the virtual table.
Pointer to Pointer
We could simple pass two pointers, but this introduces a slight overhead in every function that we do and we’d like to avoid that. It may not be a terrible idea however, as there are likely situations where it is the better thing to do. In fact there is a very common situation where something like this is done: with callback functions. Within many APIs you’ll see functions with signatures that include a callback function and a user defined value to provide to that function.
[sourcecode language=”c”]
void FuncWithCallback( void (*callback)( int ), int userValue )
{
callback( userValue );
}
[/sourcecode]
This is pretty much the standard C way of doing callback functions. You provide both an explicit function and the data to pass to that function. Here only a single function pointer is being passed rather than a table of functions, but that is rather a small detail. Since this method is used often you can’t completely discount the two parameter approach. In our general case however we will try to stick with one parameter.
In order to preserve the reference to the original object and provide a pointer to the virtual table someFunc is going to have to expect something slightly different as a parameter. Let’s first start by making it expect a pointer to the pointer.
[sourcecode language=”c”]
void someFunc( struct PolyFunc * * pf )
{
int r = ((*pf)->withReturn)( pf );
((*pf)->oneParam)( pf, r );
}
void calling()
{
struct PolyBase pb;
init_PolyBase( &pb );
someFunc( &pb.interfacePolyFunc );
}
[/sourcecode]
This works! The calling code is exactly as we had in the last article and someFunc needs only to dereference the pointer. Again, since interfacePolyFunc is the first member in the PolyBase structure it actually has the same address as the instance itself. This allows the member functions to trivially cast this pointer back into their own type as they did before.
The use of a pointer to a pointer may not look so clean however. Here C provides many options to make this cleaner. We could simply make a typedef of this type, or we could instead make a new structure which contains a pointer as its only member. Whichever way we chose won’t alter the actual semantics of the code, so we won’t go over it here.
The Common VTable
This implementation of a virtual table is a rather common one. What essentially defines it is that the first member of every object instance is a pointer to its virtual table. It reduces the overhead of instantiating classes and reduces the amount of memory used. The tradeoff is that functions have one additional dereference before they can call the function. The cost of this dereference is extremely low, generally a single low-cost CPU operation. You’d generally find this cost to be insignificant in most programs.
GCC, MSVC, and many other compilers use this setup for C++ virtual tables. In fact MS defined this as the binary standard for C++ interfaces in windows at one point. I’d suspect it is still the standard in use today. This doesn’t mean a compiler has to use this method (well, on Windows it kind of does). Ultimately any compiler will need to maintain a list of function tables somewhere. Knowing how this works for C++ will be a good step to understanding whatever other method you may encounter.