
Understanding “values” is perhaps the most critical part of understanding programming. We are inundated with a variety of terms like “by value”, “by reference”, “member copying”, “binding”, “pointer”, “object reference”, “heap variable” and a myriad of others. To make sense of these we need a firm understanding of what a “value” actually is. I will start with some very elementary ideas, but it’s important to understand these basics deeply before moving on to some of the more complex problems in language implementation I’ve been exploring for Leaf.

For my Leaf language project I have been seeking an elegant solution to the problem of “value” vs. “reference” semantics. Each has its place but these two paradigms tend to be somewhat muddled in existing programming languages. I am going to write a series of articles discussing my thoughts about this topic as I progress with my own implementation.

A Name and a Value

In statically typed, compiled languages, a declaration introduces a variable. In C, C++, and Java, for example, a simple integer looks like this:

int a;

What does this trivial piece of code actually do? I tend to think of this as having an integer named ‘a’. But what does “having an integer” mean? This code tells the compiler that we wish to have an integer value and we will refer to it with the name ‘a’.

a = 5; // a now refers to an integer with value '5'
print( a ); // it should print 5 since that is what a refers to

What we are not telling the compiler is where that integer value should be stored. It is up to the compiler to decide whether it goes into memory, a register, or some other special location. To be clear, ‘a’ is not the value itself. It is merely a way to refer to the value. ‘a’ is the name, and the integer itself is the value.

Now let’s look at similar code using an object in Java and a plain integer in Python.

// Java
Integer a = Integer.valueOf( 5 );
System.out.println( a );

# Python
a = 5
print( a )

Again we have a name ‘a’ which refers to an integer value ‘5’ located somewhere. The underlying mechanics differ wildly between the languages, but at a logical level these code snippets have the same meaning: the language implementation is responsible for somehow mapping ‘a’ to its integer value.

By value or by reference?

At some point we may want to assign a new value to ‘a’. Instead of worrying about the specifics of what a given language does, let’s think about the basic logic at a higher level.

a = 5;
a = 17;

In the first step we know ‘a’ is associated with some location where the value ‘5’ is stored. For simplicity let’s assume it is in memory at location 0x100. When we assign ‘17’ to ‘a’ one of two things can happen. The value at 0x100 can be replaced with ‘17’, in which case ‘a’ still refers to the same location but that location now stores a new value. Alternatively, instead of changing the value at location 0x100, the compiler can make ‘a’ refer to a different location. Let’s say it chooses to point ‘a’ to 0x200 where the value ‘17’ resides. In the first case we have changed the actual value whereas in the second case we have changed the reference.
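Both outcomes can be sketched in Python. This is only an illustration under my own assumptions: the ‘Box’ class is a hypothetical stand-in for “a location holding a value”, and Python’s built-in id() plays the role of the address. It is not how Leaf or any particular compiler works.

```python
class Box:
    """Hypothetical stand-in for a storage location holding one value."""

    def __init__(self, value):
        self.value = value

    def set(self, value):
        # Overwrite the stored value in place; the location is unchanged.
        self.value = value


# Case 1: replace the value at the existing location.
a = Box(5)
location_before = id(a)
a.set(17)
same_location = id(a) == location_before  # True: same box, new value

# Case 2: make the name refer to a different location.
c = Box(5)
old_box = c            # keep a handle on the original location
c = Box(17)            # 'c' now refers to a different box
moved = c is not old_box
untouched = old_box.value  # 5: the original value was never modified
```

In the first case the box keeps its identity and only its contents change. In the second case the original box still holds ‘5’; only the name has moved.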

What happens when two variables are involved? For this to be interesting I will introduce a ‘set’ function which modifies the actual value. That is, regardless of how the compiler locates the value associated with a variable name, it will find it and modify it in place.

a = 5;
b = a;
a.set( 17 );
print( b );

Let’s first consider the case where each name has its own distinct value:

  1. The compiler chooses location 0x100 for ‘a’ and puts the value ‘5’ there
  2. The compiler chooses location 0x200 for ‘b’ and copies into it the value ‘5’ from ‘a’
  3. Location 0x100, referenced by ‘a’, is overwritten with value ‘17’
  4. Print loads the value of ‘b’ at location 0x200 which is still ‘5’

In this scenario the value ‘5’ is printed since ‘b’ is unaffected by the ‘a.set( 17 )’ statement.
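The four steps above can be mimicked in Python. This is a rough sketch: the ‘Box’ class is my own hypothetical stand-in for a storage location, and copy.copy() from the standard library plays the role of “copy the value into a new location”.

```python
import copy


class Box:
    """Hypothetical stand-in for a storage location holding one value."""

    def __init__(self, value):
        self.value = value

    def set(self, value):
        # Overwrite the stored value in place.
        self.value = value


a = Box(5)           # step 1: a location for 'a' holding 5
b = copy.copy(a)     # step 2: a separate location for 'b' with a copy of 5
a.set(17)            # step 3: overwrite the value at the location of 'a'
printed = b.value    # step 4: 'b' still holds 5

distinct = a is not b  # True: the two names refer to different boxes
```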

What about the scenario where the values are left untouched and only the references are changed?

  1. The compiler places the value ‘5’ into location 0x100 and refers ‘a’ to that location
  2. The compiler refers ‘b’ to the same location as ‘a’ at 0x100
  3. Location 0x100, referenced by ‘a’, is overwritten with value ‘17’
  4. Print loads the value of ‘b’ which is now ‘17’ too since it also refers to the value at location 0x100

Here ‘17’ is printed for the value of ‘b’ instead of ‘5’.
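The shared scenario can be sketched the same way. Again, ‘Box’ is a hypothetical stand-in for a storage location. Plain assignment in Python happens to bind a second name to the same object, which is exactly the behaviour described in the steps above.

```python
class Box:
    """Hypothetical stand-in for a storage location holding one value."""

    def __init__(self, value):
        self.value = value

    def set(self, value):
        # Overwrite the stored value in place.
        self.value = value


a = Box(5)           # step 1: one location holding 5, referred to by 'a'
b = a                # step 2: 'b' refers to the same location as 'a'
a.set(17)            # step 3: overwrite the value at that location
printed = b.value    # step 4: 'b' sees 17 as well

shared = a is b      # True: both names refer to the same box
```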

The difference comes down to whether ‘a’ and ‘b’ both refer to the same value or each refers to its own value. If they refer to the same value then overwriting one will affect the other. Otherwise overwriting one will leave the other variable’s value unchanged. This makes it rather essential to know precisely what the assignment operator ‘=’ actually does!

Careful Now

In the previous examples I used memory locations because they are a simple way to explain what is happening. But recall that near the beginning I said values can be stored elsewhere. The compiler ultimately gets to determine where the value is stored. It may end up in memory or in a register. In some cases values may end up embedded in the code itself (the optimizer may embed small values in machine code). Regardless of where the compiler puts its values, it must ensure the logical mapping of name to value holds.

C, C++, and some other languages are somewhat different in how they refer to values. A name does not bind directly to a value, but instead refers to an ‘object’. This object is a series of bytes which is capable of holding a value. Assigning doesn’t change the value in these languages; instead it changes the bytes in the object. I’m still quite undecided on the best abstract way to consider this topic.
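To make the “series of bytes” idea a bit more concrete, here is a loose Python analogy, not actual C semantics. I am assuming a 4-byte little-endian integer, and the name ‘storage’ is purely illustrative. The bytearray plays the role of the object, and assignment rewrites its bytes.

```python
import struct

# The "object": four bytes capable of holding a 32-bit integer.
storage = bytearray(4)

# "a = 5": write the byte representation of 5 into the object.
struct.pack_into("<i", storage, 0, 5)
first = bytes(storage)    # b'\x05\x00\x00\x00'

# "a = 17": the same bytes are overwritten; the object itself is unchanged.
struct.pack_into("<i", storage, 0, 17)
second = bytes(storage)   # b'\x11\x00\x00\x00'

# Reading 'a' means interpreting the bytes as an integer again.
value = struct.unpack_from("<i", storage, 0)[0]
```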

Moving On

I have stayed away from giving language-specific information since different languages use different terms for the same concepts. My key point in this article is to show how the name of a variable is merely a way for the compiler to locate a value somewhere. The final example then shows that different variables could either refer to the same value, or to distinct values.

In the coming articles I’ll look further into generic notions like immutability and parameter passing. I will also show how a variety of languages implement these mechanics and how I hope to make some improvements in Leaf over existing languages.
