Tags

, , ,

It is important to write exception-safe code. A truly exception-safe function produces no side-effects when an error occurs. It returns to the caller with the exact state of the system as prior to the call. Not upholding this guarantee leaves the caller wondering about the state of the system. But can we really accomplish this?

Don’t mistake this to be solely about exceptions! “Exception-safety” is just the term it happens to have. It equally applies to any error handling, including the use of return values. While I find exceptions to be the better model, needing to write error-safe code applies to all models.

This very basic example is a function that adds all elements from an iterator to the end of a vector. Assume the function is parametric and deals with any data type and any iterator type.

1
2
3
4
5
func add_all = ( collection, iterator )-> {
    for item in iterator {
        collection.append( item )
    }
}

Do you think you can make this function exception safe? Work it out with pseudo-code to see what you would do.

Isolate the problems

Consider everything that could actually go wrong in this loop. It may seem small, but there are enough trouble points that line-by-line handling may be too difficult. I said it handles any type, thus the ‘append’ function must involve copying items. We don’t know what these items represent, thus copying possibly results in an error*. Perhaps the internal append mechanism itself can fail. While we might trust the loop mechanics itself, that iterator could be doing anything.

The issue isn’t so much where the error occurs, but when the error occurs. After the loop body executes once, the variable ‘collection’ has been modified. If an exception escapes now the safety guarantee fails: ‘collection’ is not the same as prior to the function call.

*I’m discussing this point with a colleague. I now believe that, like destructors, copy constructors should never cause an error. I’ll get back to this in another article.

Three approaches

Functional

The simplest approach is the functional programming approach. If the input is not altered at all by the function then there is no problem at all.

1
2
3
4
5
6
func add_all = ( collection, iterator )->( new_collection ) {
    new_collection = collection.clone()
    for item in iterator {
        new_collection.append( item )
    }
}

At no point is the state of collection modified: an error can safely happen anywhere. An issue here is the change in signature: the calling code must also change. This has effectively pushed off error handling to a higher level, which in general is a good thing.

The drawback to this approach is efficiency. A clone of the input must be created. In some languages the return value will also have to be copied by the calling function. If using a pointer, or by reference mechanism, that isn’t an issue. In C++ move semantics can solve the return value part. The cloning of the input is however still an issue.

Swapping

This is a variant of the previous approach which relies on an atomic ‘swap’ operation. Though many collections may not support them, they likely could with just a little work. This approach still suffers from needing to clone the input, but the function signature doesn’t need to change.

1
2
3
4
5
6
7
8
func add_all = ( collection, iterator )-> {
    var new_collection = collection.clone()
    for item in iterator {
        new_collection.append( item )
    }

    collection.swap( new_collection )
}

In this code we rely on the destruction of objects never causing an error. Otherwise we’d have to worry about what happens during destruction of the old collection. Dealing with errors during cleanup can become very complicated. It makes much more sense to go with common wisdom and require that destruction cannot cause errors. I’m sure there’s a lot written on just this topic, so I won’t go further into it here.

Rollback

In many domains the overhead of cloning the input cannot be tolerated. A rollback mechanism can be used instead. Proceed as though everything will be fine and just undo the work should something go wrong.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
func add_all = ( collection, iterator )-> {
    var prev_size = collection.size()
    defer_error {
        collection.resize( prev_size )
    }

    for item in iterator {
        collection.append( item )
    }
}

This uses the Leaf defer_error mechanism: the code is executed at the end of the function, but only if an error occurs. This is equivalent to putting the code in a catch-all block and rethrowing the exception. The code handles the error by returning ‘collection’ to its previous state. We assume that resizing to a smaller size cannot generate an error; one finds that exception safety is only possible if everybody plays along. This function is now exception safe without having an inefficient clone operation.

That’s nice, but it doesn’t work for all data types? If the input was a map there won’t be any kind of rollback mechanism. One could create a temporary map and swap entries, then swap-back on error. Each data structure will likely require some other mechanism. The clone approach suddenly looks appealing since it generally works on any collection. Now is the time to really question the need for efficiency.

One might question whether the truncated collection is truly in the same state as before. There is more memory allocated now, and objects may have been copied. We only care about the logical/effective state of the system in this context. This new modified collection behaves, for all intents and purposes, the same as the previous one.

Hey, what about the iterator?

Something of course has to be amiss; the article title questions whether a safe function can be made. I’ve cleverly ignored the iterator until now. All the above code has made the assumption that the caller’s iterator is unaffected by iteration in this function.

How realistic this is depends on the language. Passing the iterator by value, or creating a copy of the iterator is perhaps the simplest approach. Again, for efficiency we might wish to pass an iterator by reference and will thus need to rollback.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
func add_all = ( collection, iterator )-> {
    var prev_size = collection.size()
    var prev_loc = iterator.location()
    defer_error {
        collection.resize( prev_size )
        iterator.rewind( prev_loc )
    }

    for item in iterator {
        collection.append( item )
    }
}

What if the iterator cannot be copied, nor is there any rollback functionality? Instead of being rare, this is actually quite common. Python generators (functions which use yield) are a very easy way to create such iterators. More problematic is the entire class of streaming iterators, including actual network streams. None of these are rewindable.

There’s a bit of irony here that makes it even more annoying. There’s a very good chance that the iterator is actually a temporary object. Since the temporary will be lost in any case there is no need to roll it back. A function that knows it is dealing with a temporary could do this differently. I only know that C++ has a way to do this, using an r-value reference type.

1
2
//the caller is often using a temporary iterator, which doesn't need to be rewound
add_all( cur_data, new_data.iter() )

Solutions?

Is there a general solution to this problem? I’m sorry, but I don’t have one. Side-effects and safety really just don’t like each other. Is there at least a way to make the problem easier to identify and perhaps resolve individually? Perhaps. It is something I’m actively researching. I would like a nice solution for Leaf.

Don’t let this discourage you however. A large amount of code can be made exception safe without hitting any fundamental issues. In many cases it also isn’t required to be completely safe (like with temporary iterators). So please keep trying and let me know about the difficult points.