Iterating over the elements of a recursive data structure can be difficult. Given a set of objects, where each object could potentially link to any other object, how do we visit each object without getting stuck in a loop?
The invasively bad approach
My example comes from Leaf’s type system; type structures are recursive, like in most languages. You can create a class that includes itself as a member, either directly or through another class. I wanted to create a validate
function that ensures the type is valid for further compilation. validate
either returns or throws an exception.
A first option is an invasive tracking variable. This is a member variable, say is_traversing
on each object. It’d work something like this:
1 2 3 4 5 6 7 8 |
void type_tuple::validate() { if( is_traversing ) return; is_traversing = true; //logic is_traversing = false; } |
This has a few problems.
- It isn’t thread-safe. You can’t have multiple threads calling validate at one time.
- It isn’t re-entrant.
validate
is a virtual function so we really have no idea at a high-level what it needs to do. Perhaps it must calls other traversal functions on the same objects. - It isn’t exception safe. If anything inside
//logic
throws an exceptionis_traversing
will not be reset. This can however be fixed.
A traversal context
From the code it’s clear that all we really need is an entry guard. Anything that can tell us if this node has already been visited will work.
For an example I’ll show the is_parametric
function. This traverses the type and says whether any component makes it a template or generic type. It returns a boolean value. This is what the code looks like using a context parameter.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
bool type_tuple::is_parametric( traversal_context & ctx ) { if( ctx.in( this ) ) { return false; } ctx.add( this ); for( auto & sub : parts ) { if( sub.is_parametric( ctx ) ) { return true; } } ctx.remove( this ); return false; } |
The context provides a place to track the traversal status of this object. It solves the first and second problem of using an invasive variable: it is re-entrant and thread-friendly. It still isn’t exception safe though. We still have to add a finally block to call ctx.remove
. In C++ it’s easy by using a scoped object:
1 2 3 4 5 6 7 |
bool type_tuple::is_parametric( traversal_context & ctx ) { if( ctx.in( this ) ) { return false; } traversal_scope ts( ctx, this ); ... } |
The constructor of traversal_scope
calls ctx.add
and its destructor calls ctx.remove
. This makes the code exception safe.
It may not be clear that
false
should be returned if this object is already being traversed. Returning values from recursive structures is a difficult problem. Though I can say the typical result here is the zero value, either false, or the integer 0, depending on what the function is doing. We know that a call higher in the stack is executing the loop and doing the actual check.
Calling validate
Calling a function with an added parameter of course requires more effort. This can be a problem if the function is called frequently in the code. For this I’ve used helper functions that setup the context.
1 2 3 4 5 |
template<class T> void validate( T const & what ) { traversal_context vctx; what.validate( vctx ); } |
This template can be used for any type that contains a matching validate
function. The other function is_parametric
does not have a namespace equivalent. I put the helper directly in the base class since it is specific to that type tree. This is the syntax for comparison:
1 2 3 |
validate( object ); object.is_parametric(); |
Just use whichever approach is easier in your code.
Single-traversal for validate
Once I had the traversal_context
in place for validate
I saw an easy optimization. Since validate
does not return a value there is no reason to ever process the same node twice. I made a cover
function inside the traversal_function
.
1 2 3 4 5 6 7 |
void type_tuple::validate( traversal_context & ctx ) { if( ctx.cover( this ) ) { return; } //logic } |
If this
is already in the context cover
returns true
. Otherwise it adds it and returns false. this
is never removed from the context in this setup, thus no traversal_scope
.
A peak inside
Just for completeness I’ll show the full code for the traversal_context
and traversal_scope
. They are very simple classes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
struct traversal_context { std::unordered_set<void const*> traversing; bool in( void const * what ) { return traversing.find( what ) != traversing.end(); } bool cover( void const * what ) { auto result = traversing.insert( what ); //second is true if item was already in the set return !result.second; } }; struct traversal_scope { traversal_context & ctx; void const * what; traversal_scope( traversal_context & ctx, void const * what ) : ctx( ctx ), what( what ) { ctx.traversing.insert( what ); } ~traversal_scope() { ctx.traversing.erase( what ); } }; |
This is one of those cases where void*
can be safely used in C++. The code doesn’t care what type it contains, only that it has a unique value. In other languages you would instead use something like Object
or call a GetID
function. Note you cannot use a hash code here, you require a guarantee of uniqueness.