A variable is the most fundamental concept in programming. You can’t do anything without variables. Yet most languages let you gloss over what these actually are. Simplicity often hides the truth. For a large part of programming this is actually okay, and indeed quite helpful. But when it comes to understanding memory and managing resources, one has to better understand what a variable actually is.
This article assumes you know the basics and tries to expose the lesser-known aspects of working with variables. It breaks a variable apart into more specific terms like object, name, and reference. It is by no means a definitive guide; it intends only to expose some underlying concepts that many languages obscure.
What is an object?
Programming is a way to manipulate data and control hardware. Data tends to be exposed in discrete blocks and hardware via distinct channels. At the most abstract level we can consider both memory blocks and hardware resources under the same umbrella term: object. In an object-oriented language this could be an instance of a class, whereas in a lower-level language it may be a small structure, or just a single value. For system resources this may be a file or a network stream.
All objects have two things in common: creation and destruction. In order to use an object it first has to be created. This may be a simple declaration, an instantiation statement like new, or a system function. Once created, objects are used until they are no longer needed. At this point they are destroyed. Here the variable may simply go out of scope, the instance may be explicitly deleted, or a system function may be called to release the object.
//allocate/free memory in C
char * data = malloc(123);
free(data);
//open/close socket in Flash
var sock:Socket = new Socket(…);
sock.close();
The terms creation and destruction are often used in this context, perhaps coming from the higher level OOP languages. In some cases we actually have more of an acquire/release pattern. The pattern of use is nonetheless identical.
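To make the shared pattern concrete, here is a minimal C++ sketch that puts a create/destroy pair next to an acquire/release pair. The function names are made up for this illustration, and the temporary file simply stands in for any system resource.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Create a memory object, use it, then destroy it.
// Returns the length of the string stored in it, or 0 on failure.
std::size_t use_memory_block() {
    char* data = static_cast<char*>(std::malloc(123)); // creation
    if (data == nullptr) return 0;
    std::strcpy(data, "hello");                        // use
    std::size_t length = std::strlen(data);
    std::free(data);                                   // destruction
    return length;
}

// Acquire a system resource (a temporary file), use it, then release it.
// Returns the number of characters written, or 0 on failure.
std::size_t use_temporary_file() {
    std::FILE* file = std::tmpfile();                  // acquisition
    if (file == nullptr) return 0;
    int written = std::fprintf(file, "hello");         // use
    std::fclose(file);                                 // release
    return written > 0 ? static_cast<std::size_t>(written) : 0;
}
```

Both functions have the same shape: obtain the object, work with it, give it back. Only the names of the calls differ.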
Names and References
All of those objects we’ve created are not very useful unless we have a way to access them. Languages vary a lot in this area, and unfortunately they also use different and overlapping terminology. Syntax aside, there are a few primary ways in which we access an object.
When you declare a variable your compiler creates the backing object for you and lets you access it via the name you’ve given it. This case is so prevalent that often we don’t even consider there to be a distinction; we look at the variable name as though it were the object itself. This name can only ever refer to one object.
//create an integer object, refer to it with the name "a"
int a;
//assign 5 to the integer object referred to by "a"
a = 5;
In the above code you have an integer object created somewhere in memory. In this case you really don’t care where this object is created. You do however need a way to access this data, thus you have also given it the name “a”. It’s tempting to consider “a” to be synonymous with the object itself, but that manner of thinking will cause problems later. You must consider the integer object and the name “a” by which you access it as two distinct entities.
The second most common way to access an object is by using some kind of reference, other than a name, to this object. We must be careful in how we use terms since every language does something slightly different here. It is easiest to start with C since it has a very straightforward definition of a reference: it is simply the address of the object in memory.
//C memory references are called pointers
int * b = get_integer();
*b = 5;
The above code seems very similar to our example with “a”, but has some important differences. The type of “b” is an “integer pointer” rather than an integer itself. What we are saying is that “b” is actually the address of some integer. The “get_integer” function gives us this address — we don’t care where it is. The second part is the assignment. Here we are not actually assigning to “b”, but rather to the object pointed to by “b”.
What becomes confusing at this point is that you still have the name “b”, and it refers to a distinct object of its own. “b” is a name for an object of type “integer pointer”. You can, like with “a”, assign new values to “b” itself, and thus make it refer to a new object. “b” has its own memory location. Inside this memory is the address of another memory location.
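A small sketch can make the two levels visible. It is written in C++ and uses two local integers in place of “get_integer”; the function and variable names are invented for this example.

```cpp
#include <cassert>

// Points "b" at one integer, then at another.
// Returns true if every claim made in the text holds.
bool pointer_is_its_own_object() {
    int first = 0;
    int second = 0;

    int* b = &first;           // the object named "b" stores the address of "first"
    *b = 5;                    // assigns to the object pointed to by "b", not to "b"

    int** where_b_lives = &b;  // "b" has a memory location of its own
    b = &second;               // assigns to "b" itself: it now refers to another object
    *b = 7;

    return first == 5 && second == 7
        && where_b_lives == &b                               // "b" itself did not move
        && static_cast<void*>(&b) != static_cast<void*>(b);  // "b" is not the thing it points to
}
```

Assigning through “*b” changes the integer; assigning to “b” changes which integer is reached. The location of “b” stays the same throughout.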
Java references are very similar to C pointers. All class type variables are implicitly references and thus no class instances have any direct names. They are addressed by reference only. For example, here we will compare roughly equivalent code in Java and C++.
//Java
MyClass c = new MyClass();
c.setValue( 123 );
//C++
MyClass * c = new MyClass();
c->setValue( 123 );
In both cases a new instance of MyClass is being created. “c” is not the object itself, but rather a way to refer to the object. The new instance is actually anonymous; it has no name.
On top of C pointers, C++ added a reference construct. The standard tries to distinguish a reference from a pointer, but in the end a reference becomes nothing more than a normal pointer with a slightly different syntax and restricted copy semantics. Here is a common, and misleading, example of what a reference is.
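One way to see that the instance is separate from “c” is to reach it through a second pointer. The sketch below is C++; the body of MyClass and the “getValue” method are assumptions added for this example, since the text only shows “setValue”.

```cpp
#include <cassert>

// Hypothetical class standing in for the MyClass used in the text.
class MyClass {
public:
    void setValue(int v) { value = v; }
    int getValue() const { return value; }
private:
    int value = 0;
};

// Returns the value seen through a second pointer to the same instance.
int shared_instance_value() {
    MyClass* c = new MyClass();  // the instance itself has no name
    MyClass* d = c;              // copies the reference, not the instance
    c->setValue(123);
    int seen = d->getValue();    // both pointers reach the same object
    delete c;                    // destroys the instance; "d" must not be used afterwards
    return seen;
}
```

Copying “c” into “d” creates no new MyClass. There is still exactly one anonymous instance, now reachable through two references.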
int & b = a;
The classic explanation at this point is to say that “b” is an alias for “a”. Alias sounds a lot like it is referring to the “name” of the variable. This is wrong: C++ references are not established at the name level, but strictly at the reference/memory address level. Consider what happens if our source object is not a named variable.
int * a = get_integer();
int & b = *a;
“b” now refers to the object pointed to by “a”. There is no way at this point in the code to know what “name”, if any, “b” is bound to. It is no longer related to “a” at all; you can freely assign to “a” and what “b” refers to will not change.
Names also exist at the operating system level and, though they feel slightly different, they are closely related. For example a file has a particular name on the drive, and this name can be used to locate that file. In this sense we have a named object just like the integer above, though most languages require a special construct to access the file.
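This can be checked directly. The following C++ sketch allocates two integers with new in place of “get_integer” (an assumption made so the example is self-contained) and reassigns “a” after the reference has been bound.

```cpp
#include <cassert>

// Returns true if "b" keeps referring to the original object
// after "a" has been pointed somewhere else.
bool reference_follows_object() {
    int* original = new int(1);
    int* other = new int(2);

    int* a = original;
    int& b = *a;   // binds to the object "a" points to at this moment

    a = other;     // "a" now points elsewhere; "b" is unaffected
    b = 5;         // writes to the original object

    bool ok = (*original == 5) && (*other == 2) && (&b == original);
    delete original;
    delete other;
    return ok;
}
```

The write through “b” lands in the original integer even though “a” has moved on, which is what you would expect if the reference holds an address rather than a name.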
First, a basic low-level C example.
//open a file for writing, creating it if needed
//(O_CREAT requires a third argument giving the permissions)
int out = open( "output.txt", O_WRONLY | O_CREAT, 0644 );
//write a string to the file
char const * text = "my text";
write( out, text, strlen(text) );
We provide a name to the “open” command and obtain a reference to the resulting file. Note here that the reference is stored in a plain old “int” type. It should be clear that our file is not actually an integer, simply that the file can be referred to via this integer value. That is, the “name” of the file is still “output.txt”, and the name “out” is an integer which stores a reference to the file. Such references are often referred to as “handles”.
With many OS commands it can be confusing what actually happens. In the “open” example here we’ve actually created two things: the file and the OS file reference. “out” is not a direct reference to the file, but rather a reference to an internal OS structure which in turn knows how to work with the file. So we actually have a reference to a reference. This becomes clear when you call the “close” command: the file isn’t deleted. Only the OS structure is freed, and the file on the disk remains intact.
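The difference between closing a handle and deleting a file can be sketched with the POSIX calls, here compiled as C++. The function name and the path passed to it are hypothetical; the sketch assumes a POSIX system and a writable location.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Writes through one handle, closes it, then reads the file back through a
// second handle. Returns the number of bytes read, or -1 on any failure.
long write_then_reopen(const char* path) {
    const char* text = "my text";

    int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) return -1;
    ssize_t written = write(out, text, std::strlen(text));
    close(out);                     // frees the OS structure; the file itself remains
    if (written != static_cast<ssize_t>(std::strlen(text))) return -1;

    int in = open(path, O_RDONLY);  // a new handle to the same named file
    if (in == -1) return -1;
    char buffer[32];
    ssize_t got = read(in, buffer, sizeof buffer);
    close(in);

    unlink(path);                   // removing the name is a separate, explicit step
    return static_cast<long>(got);
}
```

The data written through “out” is still there when the file is opened again, so closing a handle only ends that one reference to the file. Getting rid of the file itself takes a different call.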