
What is a closure?

You’ve heard the term, and you’ve probably even used them, but what exactly is a closure? It’s a combination of code and data, and it has become a staple of modern programming. Closures are a natural functional feature, quite useful even if you don’t fully understand them. Let’s take a closer look and demystify this curious construct.

A simple example

We can start with a simple local form:

defn pine = -> {
    var x = 1

    defn incr = -> {
        print(x)
        x = x + 1
    }

    incr() // prints 1
    incr() // prints 2
}

Here we define a function pine. Inside that function we define another function called incr. The x variable, used inside incr, is not defined there: it’s one of the local variables of the pine function. The function incr is a closure of its code and the variables in the surrounding scope.

By calling incr twice we see that x persists between calls. Moreover, x is shared between the scopes:

    incr() // 1
    print(x) // 2
    x = 4
    incr() // 4

Both incr and the pine scope are referring to the same x. This depends a bit on the language. In Leaf, shown here, this sharing is the default. In C++, and other languages, cloning is common (a C++ lambda that captures by value, for instance); in this case the value is not shared, but a copy is taken when the function is created. For example:

defn pine = -> {
    var x = 5

    //a copy of `x` is made now
    defn incr = -> clone {
        print(x)
        x = x + 1
    }

    incr() // 5
    print(x) // still 5 (would be 6 if shared)
    x = 1
    incr() // 6 (would be 1 if shared)
    print(x) // 1
}

Subsequent calls to incr use the same x, but the one in pine is independent of it. This creates two different x variables. These are both types of closures as they combine code and a data scope. The results are quite different, so it’s important to know what type you’re dealing with.
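If you don’t have Leaf at hand, the two behaviours can be approximated in Python. Treat this as a sketch rather than a translation: Python has no clone keyword, so the copy is simulated with a hypothetical helper, make_incr, whose parameter receives a copy of x at creation time. The out list is also my own addition; it just records what the Leaf code would print.

```python
def pine_shared():
    out = []                # records what the Leaf version would print
    x = 5

    def incr():
        nonlocal x          # use pine's x rather than creating a new local
        out.append(x)
        x = x + 1

    incr()                  # records 5
    out.append(x)           # 6: the closure's change is visible out here
    x = 1
    incr()                  # records 1: the outer assignment is visible inside
    out.append(x)           # 2
    return out


def pine_cloned():
    out = []
    x = 5

    def make_incr(x):       # hypothetical helper: this x is a separate variable
        def incr():
            nonlocal x      # refers to make_incr's x, not pine_cloned's
            out.append(x)
            x = x + 1
        return incr

    incr = make_incr(x)     # a copy of `x` is made now
    incr()                  # records 5
    out.append(x)           # still 5
    x = 1
    incr()                  # records 6
    out.append(x)           # 1
    return out


print(pine_shared())  # [5, 6, 1, 2]
print(pine_cloned())  # [5, 5, 6, 1]
```

The cloned version records the same sequence as the Leaf clone example: 5, 5, 6, 1.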

Higher-order functions

When we combine closures with higher-order functions we get interesting new possibilities. For example, the data scope for a closure can persist outside of the function that creates it.

defn addr = (x) -> {
    var f = (y) -> {
        return x + y
    }

    return f
}

var a5 = addr(5)
print( a5(1) ) // 6
print( a5(3) ) // 8

var a7 = addr(7)
print( a7(1) ) // 8
print( a7(3) ) // 10

addr(5) creates a function that adds 5 to a number. We aren’t calling that function from within addr though; instead we assign it to the variable a5 and then call it. When addr(7) is called it creates a new environment; its x value is not the same as the one from a5.
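The separate environments can be observed directly in Python. This is a rough equivalent of addr, not Leaf; the __closure__ attribute used at the end is how CPython exposes captured variables, and I only use it here to peek at each closure’s own x.

```python
def addr(x):
    def f(y):
        return x + y    # x comes from the enclosing call to addr
    return f


a5 = addr(5)
a7 = addr(7)

print(a5(1), a5(3))  # 6 8
print(a7(1), a7(3))  # 8 10

# Each returned function carries its own captured x.
print(a5.__closure__[0].cell_contents)  # 5
print(a7.__closure__[0].cell_contents)  # 7
```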

A closure can be passed to a function that knows nothing about the surrounding scope.

defn key_sort( data : list「any」, key : string ) {
    sort( data, (a,b) -> {
        return a#key < b#key
    })
}

The sort function will be expecting a comparator for two objects. It does not know how that comparison is done, or that it might be accessing data from the enclosing scope. This code demonstrates that the closure, which we pass to sort, is truly enclosing the code and data; it’s not just some syntax trickery.

A compiler may well opt to use trickery in many cases where closures are used. The most generic approach, one that works for sort and return, has a bit of overhead. The simpler approach, shown with incr earlier, can often be compiled without any runtime closure support.
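To make the call mechanics concrete, here is how key_sort might look in Python. It is an approximation under a few assumptions of mine: the items are dictionaries, the people list is made-up sample data, and functools.cmp_to_key adapts a two-argument comparator to Python’s sort, which otherwise expects a key function.

```python
from functools import cmp_to_key


def key_sort(data, key):
    def compare(a, b):
        # `key` is not a parameter of compare; it is captured from key_sort.
        if a[key] < b[key]:
            return -1
        if b[key] < a[key]:
            return 1
        return 0

    # list.sort knows nothing about `key`; it only calls the comparator.
    data.sort(key=cmp_to_key(compare))


# Made-up sample data for illustration.
people = [
    {"name": "Mia", "age": 31},
    {"name": "Al", "age": 45},
    {"name": "Zed", "age": 19},
]

key_sort(people, "name")
print([p["name"] for p in people])  # ['Al', 'Mia', 'Zed']

key_sort(people, "age")
print([p["name"] for p in people])  # ['Zed', 'Mia', 'Al']
```

Note that key_sort returns nothing. The closure is created, handed to the sort, used, and discarded, all within the one call.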

A technical nit for the highly interested

The above explains basically what a closure is, and how it can be used. If you’re into language theory, you might not be satisfied with some details. If you’re not into language theory, feel free to stop reading now and go about having fun with closures.

Let’s go back to this code:

defn pine = -> {
    var x = 1

    defn incr = -> {
        print(x)
        x = x + 1
    }

    incr() // 1
    incr() // 2
}

I called incr the closure, which may not be technically true; this depends on the language. In Leaf it’s not yet a closure, it’s a function that takes a “context”. It’s not much different from this function in C:

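#include <stdio.h>

/* The locals of pine that incr needs, gathered into one context. */
typedef struct pine_context {
    int x;
} pine_context;

void incr( pine_context * pctx ) {
    printf( "%d\n", pctx->x );
    pctx->x++;
}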

It’s when the function is called, incr(), that a closure is made. The compiler sees that the function is expecting a pine_context environment; it finds one in the immediately enclosing scope, that of pine, which seems logical. It binds the context to the function and then makes the call.

This is one place where a compiler could instead use trickery. It avoids creating any closure object and instead calls the incr function with the pine_context. This optimization allows using local closures without overhead cost.
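The shape of that optimization can be sketched in Python. This is an illustration of the idea, not what the Leaf compiler emits: PineContext and the out list are names I made up, and the point is only that incr is an ordinary function receiving the context as an argument, so no closure object is needed for a local call.

```python
from dataclasses import dataclass


@dataclass
class PineContext:
    """Hypothetical stand-in for the locals of pine that incr uses."""
    x: int


def incr(ctx, out):
    # An ordinary function: everything it needs arrives as an argument.
    out.append(ctx.x)
    ctx.x += 1


def pine():
    out = []                    # records what would be printed
    ctx = PineContext(x=1)
    incr(ctx, out)              # direct call with the context, no closure object
    incr(ctx, out)
    return out, ctx.x


print(pine())  # ([1, 2], 3)
```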

This binding also needs to happen if incr escapes the scope where it’s declared.

defn pine = -> {
    var x = 1

    defn incr = -> {
        print(x)
        x = x + 1
    }

    return incr
}

var q = pine()
q() // 1
q() // 2

When the compiler encounters the return statement, it realizes it needs to bind the incr function to the pine_context before it returns. After the return statement, there will be no further opportunity to find the context.
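In Python terms, that binding step looks a lot like functools.partial. Again this is a sketch of the concept under my own assumptions: PineContext is a hypothetical name, and incr returns the value instead of printing it so the result is easy to check.

```python
from functools import partial


class PineContext:
    """Hypothetical holder for pine's local variable x."""
    def __init__(self, x):
        self.x = x


def incr(ctx):
    value = ctx.x
    ctx.x += 1
    return value


def pine():
    ctx = PineContext(1)
    # Bind the context now: after returning there is no other way to reach it.
    return partial(incr, ctx)


q = pine()
print(q(), q())  # 1 2

r = pine()       # a second call to pine builds a fresh context
print(r())       # 1
```

The bound pair of function and context is the closure; ctx stays alive because the returned object still refers to it.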

Isn’t a closure just a class and instance in disguise? Good observation. Yes, there are usually syntactic and some behavioural differences, but they both model the same relationship between data and code. In the Leaf compiler, they mostly share the same processing code.
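The resemblance is easy to see side by side. The following Python sketch, with the made-up names Counter and make_counter, builds the same counter once as a class instance and once as a closure; both pair some data with the code that works on it.

```python
class Counter:
    """The data lives in an instance attribute."""
    def __init__(self, start):
        self.x = start

    def __call__(self):
        value = self.x
        self.x += 1
        return value


def make_counter(start):
    """The data lives in a captured local variable."""
    x = start

    def incr():
        nonlocal x
        value = x
        x += 1
        return value

    return incr


as_object = Counter(1)
as_closure = make_counter(1)

print([as_object(), as_object()])    # [1, 2]
print([as_closure(), as_closure()])  # [1, 2]
```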

Just the surfaces

Languages can behave quite differently with closures. It’s important to understand that fundamentally a closure is just a combination of a function and some data. Knowing whether the scope is being shared or cloned is an important detail, as is knowing, where a language like C++ allows it, which parts of the context are shared and which are cloned.
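A mixed context can be imitated in Python too, though only as an approximation of what C++ capture lists offer. In this sketch, with made-up names, count is shared through nonlocal while prefix is cloned by using a default argument, which Python evaluates once when the function is defined.

```python
def make_logger():
    count = 0        # shared with the closure
    prefix = "v1"    # cloned into the closure when it is defined

    def log(msg, prefix=prefix):   # the default is evaluated once, right here
        nonlocal count
        count += 1
        return f"{prefix}#{count}: {msg}"

    prefix = "v2"    # not seen by log: it holds its own copy
    count = 10       # seen by log: this variable is shared
    return log


log = make_logger()
print(log("a"))  # v1#11: a
print(log("b"))  # v1#12: b
```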

I’ve strayed a bit into Leaf-specific behaviour, just to give an idea of the complexities involved. If you’d like to dive even deeper into how closures are implemented, then let me know.

3 replies »

  1. can you give an example of the key_sort usage please. I am not sure I understand the call mechanics in that case, I am especially confused about the part where the actual sorter is added.

    it feels like:
    var testKeySet = key_sort(sampleMap, “test_key”);
    testKeySet(/*a,b???*/); //calls the nested sort closure and should effectively sort the sampleMap after “test_key”

    going further into the implementation specifics I feel that the cloning vs nesting behavior may lead to some crazy results, especially combined with the option to return closures. Do we get a clone of the original structure that is created by the compiler to replace the parent function? If not (the shared case), is the exposed closure thread safe?

    Apart from that it is interesting how and if closures can replace run-time generics and templates in C++ while augmenting interfaces.
    I think it is also important to know how the scope propagates (in the case where we return a closure as a value with an embedded context pointer) in larger class hierarchies. Are there any limitations that will trigger a compiler error or will throw in that case?

    • Cloning vs. nesting, and returning the closures, is a big issue for a programming language. In something like C++, where variables don’t survive their enclosing scope, you run the risk of returned functions not being safe to call. This is something you need to know while using C++. In other languages, like JavaScript, where sharing is the only option, you are always safe in returning and passing these closures (where “safe” doesn’t address the confusion of having always shared scopes though).

      In Leaf any variable used as part of a shared closure will automatically be elevated into a shared storage area. It is safe to return closures from functions that use local variables (and this applies to any scope/nesting depth of the functions).

      It doesn’t work for classes however, and some other unsupported scenarios. I will have to emit errors, and I imagine other languages may also emit errors about unsupported scenarios.

  2. I would call key_sort on a map like `key_sort( people, “name” )`. Hmm, I guess sorting a map isn’t a great example. It would make more sense if this returned the sorted list, or I didn’t actually take a map, but just a list. Let me change that to just `list「any」`.

    The `a#key` syntax is a subscription syntax in Leaf, like `a[key]` in a Python object. This assumes there is a subscript operator defined on the type. Normally you’d probably call sort like this instead:

    sort( my_list, (a,b) -> {
        return a.name < b.name
    })

    That won’t use any dynamic field lookup, but also doesn’t demonstrate that it’s a closure then. It’s just a normal lambda in this case.
