You’ve heard the term, you’ve probably even used them, but what exactly is a closure? It’s a combination of data and code, and closures have become a staple of modern programming. They offer a natural functional feature that is quite useful even if you don’t fully understand it. Let’s take a closer look and demystify this curious construct.
A simple example
We can start with a simple local form:
defn pine = -> {
    var x = 1
    defn incr = -> {
        print(x)
        x = x + 1
    }
    incr() // prints 1
    incr() // prints 2
}
Here we define a function `pine`. Inside that function we define another function called `incr`. The `x` variable, used inside `incr`, is not defined there: it’s part of the local variables of the `pine` function. The function `incr` is a closure of its code and the variables in the surrounding scope.
By calling `incr` twice we see that `x` persists between calls. Moreover, `x` is shared between the scopes:
incr() // 1
print(x) // 2
x = 4
incr() // 4
Both `incr` and the `pine` scope are referring to the same `x`. This depends a bit on the language. In Leaf, shown here, this sharing is the default. In C++, and other languages, cloning is the default: the value is not shared, but a copy is taken when the function is created. For example:
defn pine = -> {
    var x = 5
    //a copy of `x` is made now
    defn incr = -> clone {
        print(x)
        x = x + 1
    }
    incr() // 5
    print(x) // still 5 (would be 6 if shared)
    x = 1
    incr() // 6 (would be 1 if shared)
    print(x) // 1
}
Subsequent calls to `incr` use the same `x`, but the one in `pine` is independent of it. This creates two different `x` variables. These are both types of closures as they combine code and a data scope. The results are quite different, so it’s important to know what type you’re dealing with.
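For readers more at home in Python than Leaf, the two behaviours can be approximated as below. This is only a sketch, not Leaf’s semantics: Python shares enclosing variables by default (it needs `nonlocal` to assign to them) and has no `clone` keyword, so the copy is imitated by passing `x` through a helper function. The names `pine_shared`, `pine_cloned`, `make_incr` and the `log` list are made up for the illustration.

```python
def pine_shared():
    """Shared capture: incr and pine_shared use the same x."""
    log = []
    x = 1

    def incr():
        nonlocal x           # assign to the enclosing x, not a new local
        log.append(x)
        x = x + 1

    incr()                   # logs 1
    log.append(x)            # logs 2: the change made by incr is visible here
    x = 4
    incr()                   # logs 4: the change made here is visible to incr
    return log


def pine_cloned():
    """Cloned capture, imitated: the closure counts on its own copy of x."""
    log = []
    x = 5

    def make_incr(x):        # the parameter is a fresh variable, a copy of the outer x
        def incr():
            nonlocal x       # refers to make_incr's x, not pine_cloned's
            log.append(x)
            x = x + 1
        return incr

    incr = make_incr(x)      # the copy is taken now
    incr()                   # logs 5
    log.append(x)            # logs 5: the outer x is untouched
    x = 1
    incr()                   # logs 6: the closure kept its own copy
    log.append(x)            # logs 1
    return log


print(pine_shared())  # [1, 2, 4]
print(pine_cloned())  # [5, 5, 6, 1]
```

The logged values match the comments in the two Leaf listings: the shared version sees every change from both sides, while the cloned version diverges as soon as either side assigns.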
Higher-order functions
When we combine closures with higher-order functions we get interesting new possibilities. For example, the data scope for a closure can persist outside of the function that creates it.
defn addr = (x) -> {
    var f = (y) -> {
        return x + y
    }
    return f
}
var a5 = addr(5)
print( a5(1) ) // 6
print( a5(3) ) // 8
var a7 = addr(7)
print( a7(1) ) // 8
print( a7(3) ) // 10
`addr(5)` creates a function that adds `5` to a number. We aren’t calling that function from within `addr`, though; instead we assign it to the variable `a5` and then call it. When `addr(7)` is called it creates a new environment; its `x` value is not the same as the one from `a5`.
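The same experiment can be tried in Python, where the separate environments are even visible from the outside. This is a sketch for illustration rather than a statement about how Leaf stores things; `__closure__` and `cell_contents` are CPython’s own way of exposing captured variables.

```python
def addr(x):
    def f(y):
        return x + y      # x comes from the enclosing call to addr
    return f


a5 = addr(5)
a7 = addr(7)

print(a5(1), a5(3))  # 6 8
print(a7(1), a7(3))  # 8 10

# Each call to addr created its own environment holding its own x.
print(a5.__closure__[0].cell_contents)  # 5
print(a7.__closure__[0].cell_contents)  # 7
```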
A closure can be passed to a function that knows nothing about the surrounding scope.
defn key_sort( data : list「any」, key : string ) {
    sort( data, (a,b) -> {
        return a#key < b#key
    })
}
The `sort` function will be expecting a comparator for two objects. It does not know how that comparison is done, or that it might be accessing data from the enclosing scope. This code demonstrates that the closure, which we pass to `sort`, is truly enclosing the code and data; it’s not just some syntax trickery.
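To make the call mechanics concrete, here is a rough Python equivalent with a usage example. It is a sketch under a few assumptions: the items are dictionaries, the sample `people` data is invented, and because Python’s `sort` wants a three-way comparator wrapped with `functools.cmp_to_key` rather than a less-than predicate, `compare` returns -1, 0 or 1 instead of a boolean.

```python
from functools import cmp_to_key


def key_sort(data, key):
    """Sort data in place by the given field name."""
    def compare(a, b):
        # key is not a parameter of compare: it is captured from key_sort
        return (a[key] > b[key]) - (a[key] < b[key])

    # sort only ever sees a two-argument comparator; it knows nothing about key
    data.sort(key=cmp_to_key(compare))


people = [
    {"name": "Mona", "age": 31},
    {"name": "Abe", "age": 52},
    {"name": "Zed", "age": 19},
]

key_sort(people, "name")
print([p["name"] for p in people])  # ['Abe', 'Mona', 'Zed']

key_sort(people, "age")
print([p["age"] for p in people])   # [19, 31, 52]
```

Note that `key_sort` returns nothing and the caller never touches the comparator: it is created, handed to `sort`, and used entirely inside the one call.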
A compiler may well opt to use trickery in many cases where closures are used. The most generic approach, one that works for `sort` and `return`, has a bit of overhead. The simpler approach, shown with `incr` earlier, can often be compiled without any runtime closure support.
A technical nit for the highly interested
The above explains basically what a closure is, and how it can be used. If you’re into language theory, you might not be satisfied with some details. If you’re not into language theory, feel free to stop reading now and go about having fun with closures.
Let’s go back to this code:
defn pine = -> {
    var x = 1
    defn incr = -> {
        print(x)
        x = x + 1
    }
    incr() // 1
    incr() // 2
}
I called `incr` the closure, which may not be technically true; this depends on the language. In Leaf it’s not yet a closure, it’s a function that takes a “context”. It’s not much different than this function in C:
#include <stdio.h>

/* the local variables of pine, gathered into one structure */
struct pine_context {
    int x;
};

void incr( struct pine_context * pctx ) {
    printf( "%d\n", pctx->x );
    pctx->x++;
}
It’s when the function is called, `incr()`, that a closure is made. The compiler sees that the function is expecting a `pine_context` environment; it finds one in the immediately enclosing scope, that of `pine`, which seems logical. It binds the context to the function and then makes the call.
This is one place where a compiler could instead use trickery. It avoids creating any closure object and instead calls the `incr` function with the `pine_context`. This optimization allows using local closures without overhead cost.
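Written out by hand in Python, the optimized form might look something like this. It’s a rough sketch of what a compiler could generate, not how Leaf actually does it: `PineContext` is a hypothetical stand-in for the `pine_context` struct, and `incr` returns the value instead of printing it so the result is easy to check.

```python
from dataclasses import dataclass


@dataclass
class PineContext:
    """Hypothetical stand-in for the C pine_context struct."""
    x: int


def incr(pctx):
    """A plain function: it captures nothing and reaches x through the context."""
    seen = pctx.x
    pctx.x += 1
    return seen


def pine():
    ctx = PineContext(x=1)   # pine's local variables live in the context
    first = incr(ctx)        # direct call with the context, no closure object
    second = incr(ctx)
    return [first, second]


print(pine())              # [1, 2]
print(incr.__closure__)    # None: incr really is just a function
```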
This binding also needs to happen if `incr` escapes the scope where it’s declared.
defn pine = -> {
    var x = 1
    defn incr = -> {
        print(x)
        x = x + 1
    }
    return incr
}
var q = pine()
q() // 1
q() // 2
When the compiler encounters the `return` statement, it realizes it needs to bind the `incr` function to the `pine_context` before it returns. After the return statement, there will be no further opportunity to find the context.
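Spelled out manually in Python, binding at the `return` might look roughly like this. It’s only a sketch of the idea, not Leaf’s implementation: `PineContext` is a hypothetical stand-in for `pine_context`, `functools.partial` plays the role of the bound closure object, and `incr` returns the value rather than printing it.

```python
from functools import partial


class PineContext:
    """Hypothetical stand-in for pine_context: holds pine's local x."""
    def __init__(self, x):
        self.x = x


def incr(pctx):
    """Context-taking function; not yet a closure on its own."""
    seen = pctx.x
    pctx.x += 1
    return seen


def pine():
    ctx = PineContext(1)
    # Bind now: once pine returns, nothing else refers to ctx,
    # so this is the last chance to attach it to the function.
    return partial(incr, ctx)


q = pine()
print(q())  # 1
print(q())  # 2
```

The context stays alive after `pine` returns only because the bound object holds a reference to it.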
Isn’t a closure just a class and instance in disguise? Good observation. Yes, there are usually syntactic, and some behavioural differences, but they both model the same relationship between data and code. In the Leaf compiler, they mostly share the same processing code.
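The resemblance is easy to see when the two are placed side by side. Here is a small Python sketch, with `make_counter` and `Counter` as made-up names, where a closure and a class instance model the same pairing of one piece of data with one piece of code:

```python
def make_counter(start):
    """Closure version: x lives in the enclosing scope."""
    x = start

    def incr():
        nonlocal x
        seen = x
        x += 1
        return seen

    return incr


class Counter:
    """Class version: x lives in the instance."""
    def __init__(self, start):
        self.x = start

    def __call__(self):
        seen = self.x
        self.x += 1
        return seen


from_closure = make_counter(1)
from_class = Counter(1)

print(from_closure(), from_closure())  # 1 2
print(from_class(), from_class())      # 1 2
```

The visible difference is mostly syntax: the class names its data explicitly, while the closure picks it up from wherever the function happened to be defined.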
Just the surfaces
Languages can behave quite differently with closures. It’s important to understand that fundamentally a closure is just a combination of a function and some data. Knowing whether the scope is being shared or cloned is an important detail, as is knowing, where a language like C++ allows it, which parts of the context are shared and which are cloned.
I’ve strayed a bit into Leaf-specific behaviour, just to give an idea of the complexities involved. If you’d like to dive even deeper into how closures are implemented, then let me know.
Can you give an example of the key_sort usage, please? I am not sure I understand the call mechanics in that case; I am especially confused about the part where the actual sorter is added.
it feels like:
var testKeySet = key_sort(sampleMap, “test_key”);
testKeySet(/*a,b???*/); //calls the nested sort closure and should effectively sort the sampleMap after “test_key”
Going further into the implementation specifics, I feel that the cloning vs. nesting behavior may lead to some crazy results, especially combined with the option to return closures. Do we get a clone of the original structure that is created by the compiler to replace the parent function? If not (the shared case), is the exposed closure thread safe?
Apart from that, it is interesting how and if closures can replace run-time generics and templates in C++ while augmenting interfaces.
I think it is also important to know how the scope propagates (in the case where we return a closure as a value with an embedded context pointer) in larger class hierarchies. Are there any limitations that will trigger a compiler error or a throw in that case?
Cloning vs. nesting, and returning the closures, is a big issue for a programming language. In something like C++, where variables don’t survive their enclosing scope, you run the risk of returned functions not being safe to call. This is something you need to know while using C++. In other languages, like JavaScript, where sharing is the only option, you are always safe in returning and passing these closures (where “safe” doesn’t address the confusion of having always-shared scopes though).
In Leaf any variable used as part of a shared closure will automatically be elevated into a shared storage area. It is safe to return closures from functions that use local variables (and this applies to any scope/nesting depth of the functions).
It doesn’t work for classes however, and some other unsupported scenarios. I will have to emit errors, and I imagine other languages may also emit errors about unsupported scenarios.
I would call key_sort on a map like `key_sort( people, “name” )`. Hmm, I guess sorting a map isn’t a great example. It would make more sense if this returned the sorted list, or I didn’t actually take a map, but just a list. Let me change that to just `list「any」`.
The `a#key` syntax is a subscription syntax in Leaf, like `a[key]` in a Python object. This assumes there is a subscript operator defined on the type. Normally you’d probably call sort like this instead:
sort( my_list, (a,b) -> {
return a.name < b.name
})
That won’t use any dynamic field lookup, but then it also doesn’t demonstrate that it’s a closure. It’s just a normal lambda in this case.