In the grand scheme of execution costs function calls come very close to the bottom of the list. They don’t cost very much at all — at least not in most compiled languages. It is nonetheless interesting to examine the costs involved. It is even more interesting to find out that a function call could be the dominant cost in your program. Knowing how they work also helps explain why they aren’t so cheap in languages like PHP or Flash, whereas with an optimizing compiler they become free under certain circumstances.
What is a function call
Examining a function call first requires us to define what a function call is. For the most part we’ll assume this involves everything included in evaluating the function syntax. That looks something like this:
result = do_something( param1, param2 );
Exactly what happens here varies greatly depending on the language, OS, and calling convention. In every situation though all these things have to happen:
- resolve the parameters “param1″ and “param2″ into memory addresses and types
- resolve the result variable “result” into a memory address and type
- resolve the function name “do_something”, find the exact function to be called, address and signature
- convert the parameters to the type of the receiving function
- push the converted parameters onto the stack
- push a return address onto the stack
- jump to the function address
- function does something here
- jump to the return address
- pop the return value from the stack
- pop the parameters from the stack
- convert the return value to the type of “result” and store it
That’s a lot of steps for something I said doesn’t cost much! Understanding how it goes from a very long list of steps to a highly efficient function call needs some explaining.
Some of the expensive steps are those that involve resolving a name into an actual variable. Resolution means locating the memory address and the type of the variable. For compiled languages, like C, this is done at compilation time. At runtime only a few simple additions have to be done to get the real location of a variable.
C limits conversion to type promotion. Types may be converted to any larger type that entirely contains the smaller type. This includes things like a short int to an int, or a float to a double. It actually allows the other direction as well, but most compilers will emit a warning in such cases, prompting the user to add an explicit conversion. Integer conversions are extremely cheap, either having native functions or not requiring a function call at all (some versions of the stack push operations do implicit conversion). Floating point conversions however are a bit costlier, whether from integer to float, or float to double. Most modern CPUs again have optimized functions for such things, but not all do. And in any case it is costlier than integer conversion.
Java has a special type of conversion known as boxing. In this case fundamental types are automatically converted to/from their class counterparts, such as int to Integer. We can assume that unboxing is quite cheap, since it simply reads a value from the object. Boxing however involves the instantiation of a new object and incurs all the costs related to that.
C++, Java, and other languages with object hierarchies can incur yet another kind of conversion cost. They need to cast from a derived object to the base object if required by the function. If you have Class B, which inherits from Class A, and you call a function expecting A with a type of B, it needs to be converted to the base type. Since the hierarchies are static however, this can be resolved once and results in a static addition to the address. Explicit downwards casts, from a base to a derive class, can become costlier, but those are explicit conversions and don’t belong in our current discussion.
Pushing and Popping
Pushing values on the stack, and popping them off later, is one of the cheapest things a CPU can do. There are all sorts of special functions, and addressing modes, which make working with the stack very cheap.
Optimization also starts to play a significant role here. In compiled languages the compiler can look at both the caller and callee and do argument passing as efficiently as possible. In some cases this may mean reusing variables on the stack, thus avoiding a certain number of push and pop operations. In other cases it may decide to pass arguments via registers instead of the stack. Under ideal circumstances the pushing and popping of arguments can actually be completely eliminated by the compiler. This should also apply to just-in-time compilers, though they may have less flexibility in this regards. Very dynamic languages also have a hard time taking advantage of such gains. Languages that run exclusively in a VM, never becoming native code, may gain nothing from such optimizations.
At some point the function code actually has to be executed. This is done simply by jumping to the location of that code. This doesn’t change whether the language is compiled or interpreted. Once the function code is complete it jumps back to its call location, or rather just after its call location to continue processing. This return location is other stored on the stack just like any other parameter. The CPU has special functions to do this however as it is so common. We’d assume they are faster than doing the pushing and popping of these addresses on our own.
Compilation provides another option at this point. It may inline the function. In C++ you have a special keyword, but at best this is only a hint; the compiler will decide on its own what it should do. In C, and in any language built on GCC, the compiler will consider what to do and will inline functions as it sees fit. If a function is small enough inlining actually makes the code smaller in addition to faster. In other cases it makes the code a bit larger, but faster. Java’s JIT may also do inlining. It has the added benefit of runtime statistics to determine when this makes sense.
The Function Call
Under ideal circumstances the cost of a function call approaches zero. This is most commonly true for compiled languages with a good optimizer. Interpreted, or runtime languages, suffer additional costs during a function call. In both of these cases one hopes that the cost of the function call is actually insignificant compared to with what the function is doing. If the function calls are expensive it is more than likely the cost of an implicit conversion. In C++ for example the conversion from char * and std::string is quite expensive. Not to mention the creation of temporary containers which may well be the dominant cost in the code!
So there we have it: the function call. In best case zero cost. In worst case the most expensive part of your program. The biggest word of caution are people working in multiple languages. Though the function call always tends to look the same, it may behave quite differently, especially with respect to performance. Since function calls make up such a huge part of most programs, even tiny costs will add up. Thus it makes sense to look carefully and reduce the cost of function calls in any key loops or frequently used code.