Philosophy

What is reference counting?

Objects are created, live for a while, and then destroyed. While creation is fairly clear, the when and how of destruction is fairly language dependent. In languages like C you’re basically on your own, whereas in a very high level language like Python you don’t even think about destruction. But whether it is manually done, or controlled by the language, there needs to be some way to track what needs to be destroyed. Reference counting is one popular technique.

First we’ll cover the basics of object lifetime — you may nonetheless wish to read “What’s an Object” as quick referesher on objects. Then we’ll move on to the basics of refernce counting.

Scope Lifetime

The most common lifetime for an object is scope-based: the object exists for the duration of a function or bracketed block of code. Even when using references the name of the reference itself tends to be a scope-based variable.

	int function( int a )
	{
		int b = a + 1;
		a_struct c;

		if( b > 2 )
		{
			int d = b;
			a_struct e;
		}
	}

The above C example shows us basic scope based lifetime. The variable “b” has function scope. When the function is entered an integer is allocated to be referenced by the name “b”. When the function exits this object will automatically be destroyed. The same happens for “c”, but here more space will be allocated as it is a structure and not a primitive. The “if” statement introduces a new scope, and in C the life of “d” and “e” will only be until the end of that “if” block.

It is important to note that virtually all languages have some kind of variable scoping. Even in languages where memory management is automatic, the reference variables still have a scope of some kind. While you as the programmer may not be aware you are working with references, the language does. The basic scope rules play an important role in memory management of all languages.

Pointer Lifetime

For a pointer it is essential to see the difference between the reference (the pointer variable itself) and the refered to object. The actual pointers follow the same scope lifetime rules, in that the pointer object itself disappears once the scope returns, but the refered to object persists.

	int* function( )
	{
		int * a = new int( 5 );
		return a;
	}

	void caller()
	{
		int * b = function();
		delete b;
	}

In this basic C++ example the function creates a new integer object. Inside the function there is a pointer called “a” which refers to the new object. The return statement of “function” creates a copy of the pointer, which wil be assigned to the “b” pointer in the caller function. “b” in turn uses the pointer to delete the underlying object.

This is completely manual memory management. You are responsible for tracking and deleting the object.

Reference Counting

It is often difficult to manually track the life of an object. This is particular true when the object is used widely through the program. It would be helpful to have a somewhat automatic lifetime management. Once everything is done using the object it should be deleted. Reference counting is one such technique.

This method is simply keeping an extra counter along with each object that is created. The counter is the number of references that exist to the object, in the C/C++ case this would how many pointers refer to this object. Anytime a pointer is copied we increment the count, and anytime a pointer goes out of scope, or is reset, we decrement the count. When the count hits zero the object is deleted since nothing more is using it.

Invasive Counting

There are two basic approaches to reference counting, either invasive or non-invasive. Let’s first look at invasive as the examples are slightly clearer. In this approach the objects themselves are aware of the reference counting mechanism. Users of the pointers explicitly increment and decrement the count.

	MyType * a = get_object();
	//do something with a
	a->decrement();

The above is a very typical use of an object with invasive reference counting. You call some function which returns an object. Once you are done with the object you are to release it. Here we call “decrement”; if the count reaches zero the object will be deleted. The called function might look ilke this:

	MyType * current_object;
	
	MyType * get_object()
	{
		current_object->increment();
		return current_object;
	}

The function provides access to some global resource, or more likely a member variable. It calls “increment” prior to return to indicate there is a new reference to the object now. This makes the object safe to use by the caller since the count must be greater than zero after being returned.

To implement this approach all these objects simply need to derive from some common base class. A much simplified form of this class is shown below — many essential details are omitted to demonstrate strictly the core reference counting mechanism.

	class ReferenceCount
	{
		int count;

		ReferenceCount()
		{
			count = 1; //start at 1 as creation implies at least once reference is being made
		}

		void increment()
		{
			count++;
		}

		void decrement()
		{
			count--;
			if( count == 0 )
				delete this;
		}
	};

	//any reference counted object simply derives from the above type
	class MyType : public ReferenceCount { ... }

Microsoft’s COM system uses this approach. The base classes is called “IUnknown” and has the two functions “AddRef” and “Release”.

Non-Invasive Counting

Non-invasive reference counting does not require the objects to derive from any common base type. It is called non-invasive since there is nothing added to the objects for the reference counting, it is handled external to the object itself. This allows reference counting to be added after the fact. The fundamentals remain essentially the same.

	Counter<MyType> current_object;
	
	Counter<MyType> get_object()
	{
		current_object.increment();
		return current_object;
	}

	Counter<MyType> a = get_object();
	//use a.getObject() to obtain the object itself
	doSomething( a.getObject() );
	a.decrement();

This is the same logic as the non-invasive case but simply with a wrapper class. Here MyType isn’t aware that it is being reference counted. One approach is not necessarily better than the other. Each has its own set of advantages and disadvantages.

Still Not Automatic

Neither of the above methods manage to avoid the feeling of being manual. While they certainly help track the use of an object, the programmer is still left calling increment and decrement on their own. How automated this is, or can be, depends highly on the language.

PHP does this completely automatically. All variables are implicitly reference counted. When you assign to a new name in PHP it increments the count, if you call “unset” the count is decremented. The counting is completely hidden from the PHP programmer.

C of course does nothing for you: due to the language itself automatic counting is simply not possible. C++ on the other hand provides a library of smart pointers which can do it automatically.

Example: C++ shared_ptr

C++ offers a shared_ptr wrapper which is a non-invasive reference counter. It simplifies reference counting by doing all the increments and decrements automatically based on assignments and scoping rules.

	shared_ptr<int> function()
	{
		shared_ptr<int> a( new int(5) ); //count is now 1
		return a; //count is now 2, one for "a" and one for return temporary (*)
	} //count drops to 1 as "a" goes out of scope

	void caller()
	{
		shared_ptr<int> b( function() ); //count is now 2, one for tempoary and one for b
		//now count is 1 since the temporary is gone
		shared_ptr<int> c = b; //up to 2 since we have another pointer copy
	} //now count is 0 since "b" and "c" go out of scope, thus the object will be deleted

In the example the current count is shown at each step. The key to note here is that any copy of the “shared_ptr” will increment the count on the object. Once that count hits zero the object will be deleted. In the context of C++ the “shared_ptr” is not a special language feature, but rather part of the standard library. There is nothing magical about it, it is merely a structure similar to below (where T is the object type, usually a template parameter in C++).

	struct shared_ptr
	{
		T * obj;
		int * count;
	};

This reference counted pointer is thus no more than a reference to the underlying object and a reference to the count itself. In addtion it has many operators and functions overloaded to enable the automatic counting mechanics.

(*Some of these steps are merely logical, since an optimizer can actually avoid certain copy operations, like a return statement and assignment from temporary.)

4 replies »

  1. Nice explanation – it is this reference counting that forms the basis of all garbage collectors in Java and elsewhere

    • No, this type of reference counting is not the basis of Java style garbage collection. Java uses a scanning garbage collector. It need not explicitly track the references to an individual object, instead it scans the memory every so often looking for objects that don’t have references anymore. I should probably do an article about this as well, to show the contrast. There are also some languages VMs which use a combination of reference counting and scanning.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s