The Life of a Programmer

Why do we need pointers/references?

Below I’ve tried to catalog some of the typical uses of references in programming. My aim is to clarify the role references play in a language to help me decide how they will be handled in Leaf. I’m looking specifically at cases where we want an explicit reference (such as a pointer) to an object as opposed to an implicit value access mechanism (internal to the compiler).

Input/Output Function Parameters

Modiying an object passed to a function is perhaps one of the most common uses for a reference.

[sourcecode]
void inplace_sort( Vector & data )
{ … }
[/sourcecode]

Such functions could of course be written to return the new data instead of modifying it in place. While that may be better at times, situations in which in place modification is desirable will come up from time to time.

Share an Object Between Collections

Often we have data which must exist in multiple collections. Either it is indexed in different ways or it serves multiple roles: We don’t want copies of the object in each collection but a shared reference to the same object.

[sourcecode]
Object a = create_object();
by_name.add( a.name, a );
all.add( a );
[/sourcecode]

Data Structures

Construction of data structures, not the data they contain, usually involves references.

[sourcecode]
//a singly-linked list
struct LinkedListNode<T>
{
T data;
LinkedListNode<T> nextNode;
};
[/sourcecode]

Here we aren’t concerned with the references to the data, but to the structure itself. In the above example ‘data’ in fact need not be a reference at all but ‘nextNode’ does.

Services

Many class types don’t represent data at all: they abstract a service. Only references make sense in such cases since we always wish to refer to the same service.

[sourcecode]
SoundSystem ss = obtainDefaultSoundSystem();
ss.stopSound();
ss.playSound( "file.wav" );
[/sourcecode]

Above we’d expect the same SoundSystem whenever ‘obtainDefaultSoundSystem’ is called.

[sourcecode]
SoundChannel ch = ss.createChannel();
ch.setVolume( 0.5 );
ch.playSound( "beep.ogg" );
[/sourcecode]

In the example above SoundChannel is also a service since we don’t actually control the underlying implementation. We’re just using ‘ch’ to reference the channel that was created. Services are the key situation in which references play a dominant role: you always want a reference to a service, and most often it will be shared (either internally or externally).

Access Data in Member Functions

The ‘this’ pointer is a way to refer to an object’s data within a member function. In Java and C++ ‘this’ is implicit, but in Python you must always specify ‘self’.

[sourcecode]
struct Point
{
float a, b;

float distance()
{
return sqrt((a*a)+(b*b));
}
}
[/sourcecode]

I’m not sure if this is a good example of an explicit reference. We don’t really care how ‘this’ is stored or seen externally. It is merely a way for us to refer to our internal data. A different convention could also be used for this purpose.

Self-Registration from Member Functions

However, in the case of an object that may wish to register or add itself to a collection from within a member function, ‘this’ must be an actual reference. I’ve encountered this kind of scenario quite often in GUI applications where widgets register listeners and validation functions.

[sourcecode]
class GraphWidget : public Widget, public DataListener
{
GraphWidget()
{
Application::instance().installDataWatcher( this );
}

void dataAvailable( … );
}
[/sourcecode]

The GraphWidget is behaving as a service to ‘Application’. In all cases where interfaces are used we tend to think of them as services. The ability of a service to register itself is common and needs special attention because the object itself needs to be aware of how it is being stored/referenced.

Sub-Object Return

In C++ you can provide access to collection data which can be modified directly.

[sourcecode]
T& get( int ndx ) { return array[ndx]; }

get(10) = 15;
[/sourcecode]

This kind of syntax is valuable to produce custom data types which are natural to use.

Optimized Parameter Passing

When reading from a function’s parameter we usually don’t care whether it was passed by value or by reference. Passing large data structures by value is often not efficient though. Even for parameters which would typically be seen as a value only it therefore makes sense to use a reference as an optimization.

[sourcecode]
float product( Collection<float> const & p ) { … }
[/sourcecode]

We can thus avoid creating the copies needed in pass-by-value. In C this can be accomplished using pointers.

I’m not sure this is a strong case in favour of references though, since it is strictly an optimization. A compiler may be better at figuring out when to do this automatically.

Iterator Access

Like sub-object return, data is often accessed via iterators. This is useful for modifying collections in place as opposed to just modifying the data in the collection.

[sourcecode]
for( auto iter = array.begin(); iter != array.end(); ++iter )
{
if( iter->distance() > 10 )
*iter = Point(0,0);
}
[/sourcecode]

The generic C++ mechanism of returning objects by reference and operator overloading make this possible.

Heap Memory

In C, and if you want in C++, structures can be placed in an arbitrary memory location (usually returned via malloc). This is done by reinterpreting the memory address as a reference to the object.

[sourcecode]
struct T * t = malloc( sizeof(struct T) );
[/sourcecode]

In general I don’t find this to be a very good way to create generic heap objects. The ‘new’ operator in C++ and Java is more appropriate. However, I’ve worked on enough projects with raw data to know that the ability to work with arbitrary memory can be of value in some situations. This needn’t be for all class types though, just for a subset of pure data types (in C these are always structs).

Weak Cache Data

Though not used all that frequently, weak references come up often enough that they can’t be ignored. Caching data is one common use of weak references. Objects in the cache should be allowed to disappear during garbage collection or when no longer used.

[sourcecode]
Data calculate( query )
{
weak Data wptr = cache.find( query );
if( wptr )
{
Data t = wptr.lock(); // don’t allow garbage collection when in use outside of the cache
if( t )
return t;
cache.erase( query ); // t has already been garbage collected
}

Data t = expensiveCalculation( query ); //re-generate the data
cache.insert( query, weak(t) ); //store in the cache

return t;
}
[/sourcecode]

There are also other cases where weak references may be helpful of course – this was just one typical example.

Please join me on Discord to discuss, or ping me on Mastadon.

Why do we need pointers/references?

A Harmony of People. Code That Runs the World. And the Individual Behind the Keyboard.

Mailing List

Signup to my mailing list to get notified of each article I publish.

Recent Posts