The Life of a Programmer

Divorcing a value from its name

Understanding “values” is perhaps the most critical part of understanding programming. We are inundated with a variety of terms like “by value”, “by reference”, “member copying”, “binding”, “pointer”, “object reference”, “heap variable” and a myriad of others. To make sense of this, we need a firm understanding of what a “value” actually is.

For my Leaf language project I have been seeking an elegant solution to the problem of “value” versus “reference” semantics. Each has its place, but these two paradigms tend to be somewhat muddled in existing programming languages. I am going to write a series of articles discussing my thoughts on this topic as I progress with my implementation.

A Name and a Value

In statically typed, compiled languages, a declaration introduces a variable. In C, C++, and Java, for example, a simple integer looks like this:

int a;

What does this trivial piece of code actually do? I tend to think of this as having an integer named ‘a’. But what does “having an integer” mean? This code tells the compiler that we wish to have an integer value, and we will refer to it with the name ‘a’.

a = 5; // a now refers to an integer with value '5'
print( a ); // it should print 5 since that is what a refers to

We are not saying where to store that integer value. It is up to the compiler to decide whether it goes into memory, a register, or some other location. To be clear, ‘a’ is not the value itself. It is merely a way to refer to the value. ‘a’ is the name, and the integer itself is the value.

Now let’s look at similar code using an object in Java and a plain integer in Python.

// Java
Integer a = Integer.valueOf(5);
System.out.println( a );
# Python
a = 5
print( a )

Again we have a name ‘a’ which refers to an integer value ‘5’ located somewhere. The underlying mechanics differ wildly between the languages, but at a logical level, these code snippets have the same meaning: the language implementation, whether compiler or interpreter, is responsible for somehow mapping ‘a’ to its integer value.

By value or by reference?

At some point, we may want to assign a new value to ‘a’. Instead of worrying about the specifics of what a given language does, let’s think about the logic at a higher level.

a = 5
a = 17

In the first step, we know ‘a’ refers to some location where the value ‘5’ is stored. For simplicity let’s assume it is in memory at location 0x100. When we assign ‘17’ to ‘a’, one of two things can happen. The first option is to replace the value at 0x100 with ‘17’. In this case, ‘a’ still refers to the same location, but now it stores a new value. Alternatively, the compiler can make ‘a’ refer to a different location. Let’s say it chooses to point ‘a’ to 0x200 where the value ‘17’ resides. In the first case we have changed the actual value, whereas in the second case we have changed the reference.
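The two options can be sketched in Python. This is only an illustration: the `Box` class is a hypothetical stand-in for a storage location, and its `set` method is an assumption made for the example, not something Python integers provide.

```python
class Box:
    """Hypothetical mutable holder standing in for a storage location."""

    def __init__(self, value):
        self.value = value

    def set(self, value):
        # Overwrite the stored value in place; the Box itself stays the same.
        self.value = value


# Option 1: overwrite the value at the existing location.
a = Box(5)
original = a                         # a second name for the same location
a.set(17)
same_location = a is original        # 'a' still refers to the same Box
overwritten_value = original.value   # the stored value itself changed

# Option 2: make the name refer to a different location.
a = Box(5)
original = a
a = Box(17)                          # rebinding: 'a' now refers to a new Box
moved = a is not original            # 'a' refers somewhere else
old_value = original.value           # the old location was left untouched

print(same_location, overwritten_value)  # True 17
print(moved, old_value)                  # True 5
```

Keeping the extra name `original` around is what makes the difference observable: with only one name, both options look identical from the outside.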

What happens when two variables are involved? For this to be interesting, I will introduce a ‘set’ function which modifies the actual value. Regardless of how the compiler locates the value associated with a variable name, it will modify it in place.

a = 5
b = a
a.set( 17 )
print( b )

Let’s first consider the case where each name has a distinct value:

  1. The compiler chooses location 0x100 for ‘a’ and puts the value ‘5’ there
  2. The compiler chooses location 0x200 for ‘b’ and copies into it the value ‘5’ from ‘a’
  3. Location 0x100, referenced by ‘a’, is overwritten with value ‘17’
  4. Print loads the value of ‘b’ at location 0x200 which is still ‘5’

In this scenario, the value ‘5’ is printed since ‘b’ is unaffected by the ‘a.set( 17 )’ statement.
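The steps above can be approximated in Python as a rough sketch. The `Box` class and its `set` method are hypothetical, and `copy.copy` is used here to play the role of an assignment that copies the value into a new location.

```python
import copy


class Box:
    """Hypothetical mutable holder standing in for a storage location."""

    def __init__(self, value):
        self.value = value

    def set(self, value):
        # Overwrite the stored value in place.
        self.value = value


a = Box(5)
b = copy.copy(a)   # 'b' gets its own location holding a copy of 5
a.set(17)          # only the location referenced by 'a' is overwritten

distinct = a is not b
print(distinct, b.value)  # True 5
```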

What about the scenario where assignment copies only the reference, so that both names end up sharing a single value?

  1. The compiler places the value ‘5’ into location 0x100 and refers ‘a’ to that location
  2. The compiler refers ‘b’ to the same location as ‘a’ at 0x100
  3. Location 0x100, referenced by ‘a’, is overwritten with value ‘17’
  4. Print loads the value of ‘b’ which is now ‘17’ too since it also refers to the value at location 0x100

Here ‘17’ is printed for the value of ‘b’ instead of ‘5’.
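Plain assignment in Python behaves this way for objects, so the shared case can be sketched directly. As before, `Box` and its `set` method are hypothetical names introduced only for illustration.

```python
class Box:
    """Hypothetical mutable holder standing in for a storage location."""

    def __init__(self, value):
        self.value = value

    def set(self, value):
        # Overwrite the stored value in place.
        self.value = value


a = Box(5)
b = a        # 'b' refers to the same location as 'a'; nothing is copied
a.set(17)    # the shared location is overwritten

shared = a is b
print(shared, b.value)  # True 17
```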

The difference comes down to whether ‘a’ and ‘b’ both refer to the same value or to distinct values. If they share the same value, then overwriting one will affect the other. If they have distinct values, then overwriting one will leave the other variable’s value unchanged. This makes it rather essential to know precisely what the assignment operator ‘=’ actually does!

Careful Now

In the previous example, I used memory locations as they are a convenient way to explain. But as I noted earlier, values could be stored elsewhere; the compiler determines where values are stored. A value may end up in memory or in a register. Some values may end up embedded in the code itself (the optimizer may embed small values in machine code). All the compiler guarantees is a consistent mapping between the name and the value.

C++ and C are quite specific in how they refer to values. A name does not bind directly to a logical value but instead refers to an “object”. This object is a series of bytes which is capable of holding a value. Assigning a new value changes these bytes, which in turn modifies the value. These languages require a stricter definition as they define a lot of operations that work directly on these raw memory bytes. For most code though, the logic of name to value mapping holds.
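The idea of an object as a run of bytes can be observed from Python through the standard `ctypes` module, which exposes C-compatible types. This is a sketch of the concept rather than a statement about how any particular C compiler lays out variables; the exact byte pattern printed depends on the platform's `int` size and byte order.

```python
import ctypes
import sys

n = ctypes.c_int(5)                   # a C int: a fixed-size run of bytes
address_before = ctypes.addressof(n)
before = bytes(n)                     # snapshot of the raw bytes holding 5

n.value = 17                          # assignment rewrites those bytes in place
address_after = ctypes.addressof(n)
after = bytes(n)                      # snapshot of the raw bytes holding 17

size = ctypes.sizeof(ctypes.c_int)
same_address = address_before == address_after
decoded_before = int.from_bytes(before, sys.byteorder, signed=True)
decoded_after = int.from_bytes(after, sys.byteorder, signed=True)

print(size, same_address)
print(before.hex(), decoded_before)
print(after.hex(), decoded_after)
```

The object stays at the same address throughout; only its bytes, and therefore the value they represent, change.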

Moving On

Different languages use different terms for the same concepts, which can hinder understanding. Ultimately each language must adhere to the same principle: mapping the name of a variable to a value located somewhere. The potential for difference is exemplified in the final example, contrasting shared and distinct values. You must understand the language you are using!

In the coming articles I’ll look further into generic notions like immutability and parameter passing. I will also show how a variety of languages implement these mechanics and how I hope to make some improvements in Leaf over existing languages.