Programmers don’t usually spend a lot of time thinking carefully about names and values. As long as we follow a few basic rules, our code will generally behave as expected. As I develop my language Leaf, however, I need to give these rules a great deal of consideration. The separation between names and values, and the way things are assigned, form the core of a language. Of course I’d like to get this right. Unfortunately it’s not as simple as deciding between “by value” and “by reference” semantics. There are many situations and programming techniques to consider.

In this article I am going to talk about one such related concept, immutability. An immutable is a special type of value which has the property that after construction it never changes. Immutables can play a significant role in language design.

Note: My use of the term “value” here is similar to the term “object” used in the C and C++ standards. Refer to my previous article for further explanation.

The Obvious Case

When you see the token ‘5’ in source code, you assume it will always have the numeric value ‘5’. Writing a statement like the following doesn’t make sense:

    5.set( 6 );

This statement is, generally speaking, syntactically correct: the compiler’s parser will accept it and will form a valid syntax tree. It is when the compiler analyzes the statement semantically that it will be rejected. ‘5’ is an immutable value so the compiler won’t let you change it.

Immutable Values

I’m going to continue now using a vector value. This will help to demonstrate exactly what I mean by the term “immutable”, which isn’t always clear when using trivial data types. I will use a mathematical style vector of integers, which correlates with an array of ‘int’ in C or Java.

Let’s say that in the following code the value ‘[1,2,3]’ is deemed to be immutable and is assigned to ‘a’. Because it’s immutable, any code attempting to modify a part of the value will not be allowed:

    a = [1,2,3];
    a[1] = 4; //error

Immutables don’t have to be compile-time constants, however. Any variable may be used to construct a new immutable. Immutables may also be used in expressions which result in new values:

    t0 = 1;
    t1 = 2;
    t2 = 3;
    a = [t0,t1,t2];
    b = 2 * a; //results in [2,4,6]
    c = a + b; //results in [3,6,9]

In the example above no values change. Rather, new values are created and the variables ‘b’ and ‘c’ refer to those new values. The original vector is left untouched. Use of immutables entails this rather functional approach to programming. For the time being I will not get into whether this is advantageous or not.
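For readers who want to run something, here is a rough Python analogue. It is only a sketch: Python tuples stand in for Leaf’s immutable vectors, and ‘scale’ and ‘add’ are hypothetical helper names chosen for illustration, not part of Leaf or of any library.

```python
def scale(factor, vec):
    """Return a new tuple with every element multiplied by factor."""
    return tuple(factor * x for x in vec)


def add(lhs, rhs):
    """Return a new tuple holding the element-wise sum of lhs and rhs."""
    return tuple(x + y for x, y in zip(lhs, rhs))


t0, t1, t2 = 1, 2, 3
a = (t0, t1, t2)   # built from variables, yet immutable once constructed
b = scale(2, a)    # a new value; 'a' is untouched
c = add(a, b)      # another new value

# Attempting to modify part of the immutable value is rejected at run time.
try:
    a[1] = 4
    rejected = False
except TypeError:
    rejected = True
```

Unlike the compile-time error described above, Python only reports the problem when the offending line executes.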

For contrast we should consider what isn’t present in the above code: in-place operations. These are functions which would alter the value of the vector. In code with mutable data, we encounter such functions all the time. The following code uses mutable vectors to produce the same resulting value in ‘c’.

    v : mutable_vector[3];
    v[0] = 1;
    v[1] = 2;
    v[2] = 3;

    c : mutable_vector[3] = v; // c starts out as a copy of v
    c.scale( 2 );
    c.add( v ); // c is now [3,6,9]
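The in-place style can be sketched in Python as well. The ‘MutableVector’ class below is a hypothetical stand-in written for this illustration. It assumes, as the result ‘[3,6,9]’ requires, that ‘c’ begins as a separate copy of ‘v’ rather than as a second name for the same vector.

```python
class MutableVector:
    """Hypothetical mutable vector whose operations modify it in place."""

    def __init__(self, items):
        self.items = list(items)  # take a private copy of the elements

    def scale(self, factor):
        """Multiply every element by factor, changing this vector."""
        for i in range(len(self.items)):
            self.items[i] *= factor

    def add(self, other):
        """Add other to this vector element-wise, changing this vector."""
        for i, x in enumerate(other.items):
            self.items[i] += x


v = MutableVector([1, 2, 3])

c = MutableVector(v.items)  # a copy, so changing 'c' leaves 'v' alone
c.scale(2)
c.add(v)
```

No new vector is created by ‘scale’ or ‘add’ here; the single value behind ‘c’ changes twice.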

Invariant Names

I’ll come back to these two very different styles of programming many times over the course of developing the Leaf language. For now I’ll continue with a concept closely related to immutables, which I will call an “invariant name”. An invariant name relates to a variable’s name whereas an immutable relates to its value.

    a = [1,2,3];
    a = [4,5,6];

Recalling my previous article about values and references, the above code still works even though our vector is immutable, as long as the second assignment refers ‘a’ to a new value instead of changing the existing one. On the first line, the compiler refers ‘a’ to the value ‘[1,2,3]’. On the second line, it refers ‘a’ to a different value, ‘[4,5,6]’. The assignment doesn’t actually change any values.
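Python happens to use this kind of assignment, so the effect can be observed directly. In this small sketch the extra name ‘first’ is introduced only to keep hold of the original value; it is not part of the example above.

```python
a = (1, 2, 3)
first = a          # a second name for the same immutable value

a = (4, 5, 6)      # refers 'a' to a different value; nothing is modified

print(first)       # prints (1, 2, 3)
print(a)           # prints (4, 5, 6)
print(a is first)  # prints False
```

The value ‘(1, 2, 3)’ is still intact after the second assignment; only the name ‘a’ has moved.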

A similar concept applies to variable names. After construction, an invariant name may never refer to a different value.

    a : invariant = [1,2,3]; // declared to be an invariant name
    a = [4,5,6]; //error

The line where we declare ‘a’ forever binds it to a specific value. Note that this invariance applies only to the name. While we cannot make the name refer to something else, we can still modify the value it refers to. Consider how this looks with our ‘mutable_vector’ from earlier:

    v : mutable_vector[3];

    a : invariant = v;
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;
    a.scale( 2 );

In this code ‘a’ always refers to the same mutable vector ‘v’. The value of that vector is changed, but ‘a’ can never refer to a completely different vector. In Java the keyword ‘final’ can be used to create invariant names. In C you can declare a ‘const’ pointer to get a similar result (more on this further down in the article).

Constants

If you combine an immutable value with an invariant name you get a constant. This is something which has an unchangeable value and a name which always refers to that value. The following are examples of constants.

    pi = 3.14159
    e = 2.71828
    ETIMEDOUT = 110
    riff_block = "RIFF"

Anywhere the compiler sees the name of a constant, it may freely substitute the value itself. This is possible since the name is invariant and the value is immutable.

Language Examples

Java

Fundamental types like int, and their wrapper objects such as Integer, are immutable in Java. After they are created they cannot be changed. Often it is said that fundamentals are stored “by value”, but I believe it’s easier to think of them as immutable objects. Java does not have any explicit way for you to define your own immutables. At a logical level, defining a class such that no function can modify instance attributes after construction is generally sufficient.

    int a = 5;
    Integer immutableInt = Integer.valueOf(5);
    MyImmutableClass immutableObject = new MyImmutableClass(...);
    String text = "Hello";

None of the values in the above code can be changed, but the names can be reassigned to a different value.

Invariant names are available in Java with the ‘final’ keyword. The value to which a final variable refers can only be set during construction (either in the declaration or in a class constructor). In the code snippet below, the ‘collection’ name can only refer to a single vector. However, that vector can be modified.

    final Vector<Integer> collection = new Vector<Integer>(); //the Vector is mutable

Constants are available in Java exactly as I’ve described: simply combine an immutable value with an invariant name.

    final float pi = 3.14159f;
    final String riff_block = "RIFF";

C

In C all non-decorated types are mutable and have invariant names. This applies both to fundamental types and to user-defined structures. The names below always refer to the same values (called objects in the C standard).

    int a;
    my_struct ms;

The value of ‘a’ can change, and the internal state of ‘ms’ can also change, but the variables are permanently linked to their values.

There is a way to create variant names in C: use a pointer. Instead of regarding a pointer as a specific memory address, the variant name concept gives us a different way to look at it.

    int a = 5;
    int b = 6;
    int * c = &a;
    c = &b; //change the value 'c' refers to

Note, however, that pointers are used for more than just this purpose: working with arrays, for example.

Immutables can be created with the ‘const’ keyword, though this can be a bit tricky. If a pointer is declared as a pointer to a ‘const’ value, the value itself doesn’t necessarily have to be immutable.

    int const a = 5; //a is an actual constant
    int b = 6; //b is mutable
    int const * c = &a; //c refers to an immutable
    c = &b; // c refers to a mutable

The first line, ‘int const a = 5’, introduces a true constant. It has both an invariant name and, thanks to the ‘const’ keyword, an immutable value. A pointer to a ‘const int’ can also be pointed at a regular int value, as shown on the last line. In this case, even though the target value is mutable, it cannot be changed via the pointer: the pointer provides read-only access to the value.

It is also possible to create invariant names which refer to a shared value.

    int a = 5;
    int b = 6;
    int * const c = &a;
    *c = 7; //post-condition: *c == a == 7
    c = &b; //error ('c' is an invariant name)

Here ‘c’ will always refer to the value of ‘a’, but we can still modify ‘a’ via the pointer. Note the difference in the syntax of the pointer declaration here. Reading from right to left, ‘int * const’ is a const pointer to a (mutable) int value, not a pointer to a const int as in the previous example. The value of ‘a’ can be changed using the pointer, but the pointer cannot be re-assigned to a different variable (to create a const pointer to a const int, again reading from right to left, you’d use ‘int const * const’ as the declaration type).

C++

C++ offers a distinct way to create invariant names with what it calls “references”. After initialization, a C++ reference can never be made to refer to a different value.

    int a = 5;
    int b = 6;
    int & c = a;

    c = b; //modifies the value of 'a'
    //post-condition: a == b == 6

This is actually a syntactic oddity in C++. When a reference is constructed, the ‘=’ sign indicates which value it refers to, but afterward the same sign modifies that value. Just as in C, marking an invariant name (a reference) as ‘const’ does not make the target value an immutable. This is particularly relevant for function calls and user-defined objects.

    class my_object { };

    void func( my_object const & mo )
    {
        //read-only view of 'mo'
    }

    my_object a;
    func( a ); // 'a' has a mutable value, but 'func' sees a read-only version of it

Python

In Python several fundamental types are immutable, but in general user-defined types are mutable. There also appears to be no way to specify an invariant name. However, given that you can override the `__setattr__` and `__delattr__` functions for a class, you could create your own immutable types. This might also allow you to create an invariant name, but only for member variables.
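As a minimal sketch of that idea, the hypothetical ‘Frozen’ class below rejects every attempt to rebind or delete its attributes after construction. The class name, the ‘items’ attribute and the ‘try_rebind’ helper are all assumptions made for this illustration. The protection is also only a convention: a determined caller can still bypass it by calling ‘object.__setattr__’ directly.

```python
class Frozen:
    """Hypothetical helper: attributes are bound once, in __init__."""

    def __init__(self, items):
        # Our own __setattr__ always raises, so construction has to go
        # through the base-class implementation instead.
        object.__setattr__(self, "items", items)

    def __setattr__(self, name, value):
        raise AttributeError(f"cannot rebind {name!r}")

    def __delattr__(self, name):
        raise AttributeError(f"cannot delete {name!r}")


def try_rebind(obj):
    """Return True if rebinding obj.items succeeded, False if rejected."""
    try:
        obj.items = (7, 8, 9)
        return True
    except AttributeError:
        return False


constant_like = Frozen((1, 2, 3))  # invariant member name, immutable tuple
holder = Frozen([1, 2, 3])         # invariant member name, mutable list

holder.items.append(4)             # allowed: the list itself can still change
```

With a tuple inside, ‘constant_like.items’ behaves much like a constant. With a list inside, ‘holder.items’ is only an invariant name: it always refers to the same list, but that list’s contents can change.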
