Reference counting is a common form of memory management. In some languages, like Python, it’s the natural way objects are tracked, and in others it’s in the standard library, such as shared_ptr
in C++. Like all memory management, reference counting has some definite overhead costs. In this article I’ll show how basic reference counting works and explore the costs associated with it.
How reference counting works
When we talk about reference counting we’re talking about how memory is tracked, in particular when assigning a pointer from one variable to another:
    a = b
In the above example a and b are pointers, references, or some other kind of shared variable type; the assignment results in a and b referring to the same object.
With classic C pointers this assignment is nothing more than a simple memory address assignment: a basic mov operation. With reference counting the variables are wrappers around the “true” object, and each assignment involves incrementing and decrementing the reference count. The pseudo-code for a reference counting assignment looks like this:
    // pseudo-code for the assignment "a = b"
    if (b != null) {
        b.referenceCount++           // take the new reference first; safe even if a and b already match
    }
    if (a != null) {
        if (--a.referenceCount == 0) {
            a.release()              // last reference gone: destroy the object
        }
    }
    a = b
To allow unset pointers (nulls) we need this very defensive coding style. We can also see that it’s the releasing of the previous reference, not the taking of the new one, that involves most of the work.
The costly concern
The call to release isn’t a new cost. This is the destructor call we’d expect to happen at some point anyway. That’s true even in a manually managed system, like C’s free and malloc, or under a scanning GC: at some point we need to actually clean up the objects we’ve initialized. Having this “call” to release scattered throughout the code can however bloat its size, which can negatively impact performance.
The if statements also have some overhead. Both the increase in code size and the comparisons themselves can have a minor impact on speed.
But it’s the increment and decrement that are by far the most troublesome. These aren’t plain integer operations; they must be synchronized operations, since multiple threads need a consistent view of the count. Such instructions can cause cache flushes and stalls, and they prevent several CPU optimizations (refer to CPU memory). It’s this modification of the reference count that carries the highest cost.
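To make that synchronization concrete, here is a minimal C++ sketch of an intrusive, thread-safe count. The object type, the acquire and drop helpers, and the choice of memory orderings are illustrative assumptions of mine, not Leaf’s actual implementation:

    #include <atomic>

    // The wrapped "true" object; the count lives alongside its data.
    struct object {
        std::atomic<int> referenceCount{1}; // the creator holds the first reference
        // ... object data ...
        void release() { delete this; }     // the destructor call discussed above
    };

    // Taking a reference only needs atomicity, so a relaxed increment suffices.
    void acquire(object* o) {
        o->referenceCount.fetch_add(1, std::memory_order_relaxed);
    }

    // Dropping a reference must synchronize: exactly one thread may observe the
    // count reach zero, and it must see all prior writes before releasing.
    void drop(object* o) {
        if (o->referenceCount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            o->release();
        }
    }

Even in the uncontended case each of these read-modify-write operations is more expensive than a plain add, which is why the count traffic dominates the cost.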
How often though
If we actually had that overhead on every logical pointer assignment then reference counting might be slow. We shouldn’t discount how good CPUs are at dealing with this, though; they’ve become very good at handling this kind of cross-thread synchronization without excessive slowdown. So even with the full reference counting overhead we’d still have a usable system.
But there’s a more important point: the vast majority of pointer assignments don’t actually need the reference counting overhead. Consider a simple function call: though it logically copies the pointer, it doesn’t need to modify the reference count.
    defn cone = ( x : shared integer ) -> {
    }

    defn pine = -> {
        var y : shared integer
        cone(y)
    }
When cone(y) is called the compiler knows that y must be a valid shared object (in Leaf the shared type creates a reference counted object). It doesn’t need to increment the count on y to call cone, since it knows the lifetime of y will extend beyond the call. The parameter assignment ends up being the same as it would be in C: just the raw pointer is copied.
Consider how this looks in C++ using shared_ptr. A function with the signature ( shared_ptr<int> x ) incurs the reference counting overhead on the parameter. But a function with the signature ( shared_ptr<int> const & x ) has no reference counting overhead. Unfortunately, in C++ the const reference form is not as flexible as a non-const shared_ptr, due to shared_ptr being a library feature rather than a core language feature.
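A short sketch of the two signatures, with placeholder bodies of my own, shows where the count traffic occurs:

    #include <memory>

    // By value: the copy increments the count on entry, and the matching
    // decrement happens when the parameter is destroyed on exit.
    void by_value(std::shared_ptr<int> x) { /* use *x */ }

    // By const reference: only an address is passed; the count is untouched.
    void by_reference(std::shared_ptr<int> const& x) { /* use *x */ }

    int main() {
        auto y = std::make_shared<int>(5);
        by_value(y);     // one atomic increment + one atomic decrement
        by_reference(y); // no reference counting traffic at all
    }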
Short-lived local assignments, and certain kinds of transitive assignments, don’t need to modify the reference counts either. Indeed, not all variables need to be reference counted at all: if an object’s lifetime is limited to a scope, it can be allocated locally on the stack without any count, as the sketch below shows.
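As a simple illustration of that last point, compare a counted heap object with a plain scope-bound local; the function names here are hypothetical:

    #include <memory>

    // Counted: heap allocation plus count maintenance at scope exit.
    int counted_version() {
        auto y = std::make_shared<int>(5);
        return *y; // the count drops to zero here and the object is freed
    }

    // Scope-bound: the lifetime is statically known, so no count is needed.
    int stack_version() {
        int y = 5;
        return y;
    }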
In short
The primary cost of reference counting is the synchronized increment and decrement operations. These interfere with CPU pipelining and lead to delays. To reduce this cost, compilers optimize by eliminating reference counting in the many places where it would be redundant.