Tags

, , ,

Iterating over the elements of a recursive data structure can be difficult. Given a set of objects, where each object could potentially link to any other object, how do we visit each object without getting stuck in a loop?

The invasively bad approach

My example comes from Leaf’s type system; type structures are recursive, like in most languages. You can create a class that includes itself as a member, either directly or through another class. I wanted to create a validate function that ensures the type is valid for further compilation. validate either returns or throws an exception.

A first option is an invasive tracking variable. This is a member variable, say is_traversing on each object. It’d work something like this:

1
2
3
4
5
6
7
8
void type_tuple::validate() {
    if( is_traversing ) 
        return;

    is_traversing = true;
    //logic
    is_traversing = false;
}

This has a few problems.

  1. It isn’t thread-safe. You can’t have multiple threads calling validate at one time.
  2. It isn’t re-entrant. validate is a virtual function so we really have no idea at a high-level what it needs to do. Perhaps it must calls other traversal functions on the same objects.
  3. It isn’t exception safe. If anything inside //logic throws an exception is_traversing will not be reset. This can however be fixed.

A traversal context

From the code it’s clear that all we really need is an entry guard. Anything that can tell us if this node has already been visited will work.

For an example I’ll show the is_parametric function. This traverses the type and says whether any component makes it a template or generic type. It returns a boolean value. This is what the code looks like using a context parameter.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
bool type_tuple::is_parametric( traversal_context & ctx ) {
    if( ctx.in( this ) ) {
        return false;
    }
    ctx.add( this );

    for( auto & sub : parts ) {
        if( sub.is_parametric( ctx ) ) {
            return true;
        }
    }

    ctx.remove( this );
    return false;
}

The context provides a place to track the traversal status of this object. It solves the first and second problem of using an invasive variable: it is re-entrant and thread-friendly. It still isn’t exception safe though. We still have to add a finally block to call ctx.remove. In C++ it’s easy by using a scoped object:

1
2
3
4
5
6
7
bool type_tuple::is_parametric( traversal_context & ctx ) {
    if( ctx.in( this ) ) {  
        return false;
    }
    traversal_scope ts( ctx, this );
    ...
}

The constructor of traversal_scope calls ctx.add and its destructor calls ctx.remove. This makes the code exception safe.

It may not be clear that false should be returned if this object is already being traversed. Returning values from recursive structures is a difficult problem. Though I can say the typical result here is the zero value, either false, or the integer 0, depending on what the function is doing. We know that a call higher in the stack is executing the loop and doing the actual check.

Calling validate

Calling a function with an added parameter of course requires more effort. This can be a problem if the function is called frequently in the code. For this I’ve used helper functions that setup the context.

1
2
3
4
5
template<class T>
void validate( T const & what ) {
    traversal_context vctx;
    what.validate( vctx );
}

This template can be used for any type that contains a matching validate function. The other function is_parametric does not have a namespace equivalent. I put the helper directly in the base class since it is specific to that type tree. This is the syntax for comparison:

1
2
3
validate( object );

object.is_parametric();

Just use whichever approach is easier in your code.

Single-traversal for validate

Once I had the traversal_context in place for validate I saw an easy optimization. Since validate does not return a value there is no reason to ever process the same node twice. I made a cover function inside the traversal_function.

1
2
3
4
5
6
7
void type_tuple::validate( traversal_context & ctx ) {
    if( ctx.cover( this ) ) {
        return;
    }

    //logic
}

If this is already in the context cover returns true. Otherwise it adds it and returns false. this is never removed from the context in this setup, thus no traversal_scope.

A peak inside

Just for completeness I’ll show the full code for the traversal_context and traversal_scope. They are very simple classes.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
struct traversal_context {
    std::unordered_set<void const*> traversing;

    bool in( void const * what ) {
        return traversing.find( what ) != traversing.end();
    }

    bool cover( void const * what ) {
        auto result = traversing.insert( what );
        //second is true if item was already in the set
        return !result.second;
    }
};

struct traversal_scope {
    traversal_context & ctx;
    void const * what;

    traversal_scope( traversal_context & ctx, void const * what ) : 
        ctx( ctx ), 
        what( what ) {
        ctx.traversing.insert( what );
    }
    ~traversal_scope() {
        ctx.traversing.erase( what );
    }
};

This is one of those cases where void* can be safely used in C++. The code doesn’t care what type it contains, only that it has a unique value. In other languages you would instead use something like Object or call a GetID function. Note you cannot use a hash code here, you require a guarantee of uniqueness.

Advertisements