I like using enums and type hierarchies. What I don’t like is writing verbose switch statements to use them. In C++ I use macros to significantly reduce the burden of creating visitors. In this article I’ll show how I applied this technique to my Leaf source code.

The direct method

In Leaf I have an expression class hierarchy. Each of the types is identified with an enum value. Originally I had an enum like this (stripped for brevity):

class expression {
public:
    enum form_t {
        f_binary,
        f_funccall,
        f_transform,
    };

    ...
};

I have several visitors in my code. They follow a pattern like this:

switch( expr.get_form() ) {
    case expression::f_binary:
        return call_binary( static_cast<expression_binary &>(expr) );
    case expression::f_funccall:
        return call_funccall( static_cast<expression_funccall &>(expr) );
    case expression::f_transform:
        return call_transform( static_cast<expression_transform &>(expr) );
}

This approach has two problems:

  1. I could accidentally omit a case in that switch statement. GCC can warn about this (-Wswitch, which -Wall enables), but warnings aren’t guarantees, and this option isn’t available in all compilers or languages.
  2. For enums with many more values, as I have, this extra code becomes highly redundant. The meaning is obscured and the chance of making a typo increases.

The macro approach

What I want is a way to add a new enum value and automatically have all my switches updated and compiler errors produced if I’m missing an appropriate function.

Instead of coding the enum values directly, I create a macro:

#define EXPRESSION_FORMS \
    EXPRESSION_FORM(binary) \
    EXPRESSION_FORM(funccall) \
    EXPRESSION_FORM(transform)

This is essentially just a list of the enum value names. At this point I don’t define the single EXPRESSION_FORM; it becomes the callback for users of this list.

My enum statement gets converted to this:

    enum form_t {
        #define EXPRESSION_FORM(form) f_##form,
        EXPRESSION_FORMS
        #undef EXPRESSION_FORM
    };
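
As a sanity check on what the preprocessor generates, the following sketch reuses the same list twice: once for the enum and once to count its entries. The names `expression_form_count` and `form_from_index` are hypothetical, introduced here only for illustration; they are not part of the Leaf code.

```cpp
#include <cassert>

#define EXPRESSION_FORMS \
    EXPRESSION_FORM(binary) \
    EXPRESSION_FORM(funccall) \
    EXPRESSION_FORM(transform)

class expression {
public:
    // Expands to: enum form_t { f_binary, f_funccall, f_transform, };
    enum form_t {
        #define EXPRESSION_FORM(form) f_##form,
        EXPRESSION_FORMS
        #undef EXPRESSION_FORM
    };
};

// Hypothetical helper: each entry expands to "+1", so the initializer
// becomes 0 +1 +1 +1 and follows the list automatically.
#define EXPRESSION_FORM(form) +1
const int expression_form_count = 0 EXPRESSION_FORMS;
#undef EXPRESSION_FORM

// The enumerators are numbered in list order, starting at zero.
static_assert( expression::f_binary == 0, "first entry" );
static_assert( expression::f_transform == 2, "last entry" );

// Hypothetical checked conversion from an integer to the enum.
expression::form_t form_from_index( int index ) {
    assert( index >= 0 && index < expression_form_count );
    return static_cast<expression::form_t>( index );
}
```

Adding a fourth entry to the list would change the count and the valid index range without touching either helper.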

And my switch statement becomes:

switch( expr.get_form() ) {
    #define EXPRESSION_FORM(form) \
        case expression::f_##form: \
            return call_##form( static_cast<expression_##form &>(expr) );
    EXPRESSION_FORMS
    #undef EXPRESSION_FORM
}

Deferring the definition of EXPRESSION_FORM lets me generate different kinds of code from the same list of values.

This code has some clear advantages over the previous approach:

  1. Adding a new enum value is easy. I just update the EXPRESSION_FORMS list, and my enum and all my switch statements automatically include the new value.
  2. If I forget a visitor function the compiler produces an error, refusing to compile.
  3. It forces a standard naming convention onto my enum, types, and functions. The association between the various components is very clear.
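
To see the second advantage in action, here is a minimal self-contained sketch. The class bodies, the `dispatch` wrapper, and the string-returning `call_` handlers are hypothetical stand-ins rather than Leaf code. Removing any one `call_` function from it turns the omission into a compile error.

```cpp
#include <cassert>
#include <string>

#define EXPRESSION_FORMS \
    EXPRESSION_FORM(binary) \
    EXPRESSION_FORM(funccall) \
    EXPRESSION_FORM(transform)

class expression {
public:
    enum form_t {
        #define EXPRESSION_FORM(form) f_##form,
        EXPRESSION_FORMS
        #undef EXPRESSION_FORM
    };
    explicit expression( form_t f ) : form( f ) { }
    virtual ~expression() { }
    form_t get_form() const { return form; }
private:
    form_t form;
};

// One hypothetical, empty derived class per list entry.
#define EXPRESSION_FORM(form) \
    struct expression_##form : expression { \
        expression_##form() : expression( f_##form ) { } \
    };
EXPRESSION_FORMS
#undef EXPRESSION_FORM

// Hand-written handlers. The generated switch names each of them, so
// a missing handler is reported by the compiler.
std::string call_binary( expression_binary & ) { return "binary"; }
std::string call_funccall( expression_funccall & ) { return "funccall"; }
std::string call_transform( expression_transform & ) { return "transform"; }

std::string dispatch( expression & expr ) {
    switch( expr.get_form() ) {
        #define EXPRESSION_FORM(form) \
            case expression::f_##form: \
                return call_##form( static_cast<expression_##form &>(expr) );
        EXPRESSION_FORMS
        #undef EXPRESSION_FORM
    }
    // Only reachable if the stored value is not one of the listed forms.
    assert( false && "form value outside EXPRESSION_FORMS" );
    return std::string();
}
```

Calling `dispatch` on an `expression_funccall` object routes to `call_funccall`, and the same holds for every other entry without any hand-written case.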

Not just visitors

I use the pattern for more than just visitor function dispatching. For example, now I can easily program a missing feature in C++: converting enums into names.

char const * name_of( expression::form_t f ) {
    switch( f ) {
        #define EXPRESSION_FORM(tag) case expression::f_##tag: return #tag;
        EXPRESSION_FORMS
        #undef EXPRESSION_FORM
    }
    return "invalid";
}
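
The same list also supports the reverse lookup, from a name back to the enum value. The sketch below is an illustration only: `form_from_name` and its `bool` return convention are hypothetical, and `name_of` is repeated so the block compiles on its own.

```cpp
#include <cassert>
#include <cstring>

#define EXPRESSION_FORMS \
    EXPRESSION_FORM(binary) \
    EXPRESSION_FORM(funccall) \
    EXPRESSION_FORM(transform)

class expression {
public:
    enum form_t {
        #define EXPRESSION_FORM(form) f_##form,
        EXPRESSION_FORMS
        #undef EXPRESSION_FORM
    };
};

// Enum value to name: #tag turns the list entry into a string literal.
char const * name_of( expression::form_t f ) {
    switch( f ) {
        #define EXPRESSION_FORM(tag) case expression::f_##tag: return #tag;
        EXPRESSION_FORMS
        #undef EXPRESSION_FORM
    }
    return "invalid";
}

// Hypothetical reverse lookup: stores the matching value in 'out' and
// returns true, or returns false when no entry has that name.
bool form_from_name( char const * name, expression::form_t & out ) {
    assert( name != nullptr );
    #define EXPRESSION_FORM(tag) \
        if( std::strcmp( name, #tag ) == 0 ) { \
            out = expression::f_##tag; \
            return true; \
        }
    EXPRESSION_FORMS
    #undef EXPRESSION_FORM
    return false;
}
```

Because both functions are generated from one list, a round trip through `name_of` and `form_from_name` cannot drift out of sync when a value is added.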

The pattern also helps with strict object-oriented programming. If you don’t like using the macros directly, it’s easy to convert to a visitor class based on virtual member functions. This way I can create a visitor base and simply override the functions for the types of interest. I do this in my Leaf code as well, creating an expression walker. It looks something like this:

struct expression_walker {
    shared_ptr<expression> apply_expression( shared_ptr<expression> in ) {
        switch( in->form ) {
            #define EXPRESSION_FORM(form) \
                case expression::f_##form: \
                    return apply_##form( std::static_pointer_cast<expression_##form>(in) );
            EXPRESSION_FORMS
            #undef EXPRESSION_FORM
        }
        return in; // not reached for a valid form
    }

protected:
    #define EXPRESSION_FORM(form) \
        virtual shared_ptr<expression> apply_##form( shared_ptr<expression_##form> in ) { return in; }
    EXPRESSION_FORMS
    #undef EXPRESSION_FORM
};

A derived class overrides just the functions of interest, for example:

struct sample_transform : public expression_walker {
protected:
    shared_ptr<expression> apply_binary( shared_ptr<expression_binary> in ) override;
    shared_ptr<expression> apply_funccall( shared_ptr<expression_funccall> in ) override;
};
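
For a version that compiles on its own, here is a minimal sketch of the walker idea. It makes several assumptions: the expression classes are empty placeholders, the base is named `expression_walker` so that it cannot collide with the `expression_transform` type generated from the list, and `funccall_to_binary` is a hypothetical derived walker invented for the example.

```cpp
#include <cassert>
#include <memory>

#define EXPRESSION_FORMS \
    EXPRESSION_FORM(binary) \
    EXPRESSION_FORM(funccall) \
    EXPRESSION_FORM(transform)

struct expression {
    enum form_t {
        #define EXPRESSION_FORM(form) f_##form,
        EXPRESSION_FORMS
        #undef EXPRESSION_FORM
    };
    explicit expression( form_t f ) : form( f ) { }
    virtual ~expression() { }
    form_t form;
};

// One placeholder class per list entry (hypothetical, empty bodies).
#define EXPRESSION_FORM(name) \
    struct expression_##name : expression { \
        expression_##name() : expression( f_##name ) { } \
    };
EXPRESSION_FORMS
#undef EXPRESSION_FORM

struct expression_walker {
    virtual ~expression_walker() { }

    // Routes to the apply_ function that matches the dynamic form.
    std::shared_ptr<expression> apply_expression( std::shared_ptr<expression> in ) {
        assert( in != nullptr );
        switch( in->form ) {
            #define EXPRESSION_FORM(form) \
                case expression::f_##form: \
                    return apply_##form( std::static_pointer_cast<expression_##form>(in) );
            EXPRESSION_FORMS
            #undef EXPRESSION_FORM
        }
        return in;
    }

protected:
    // Default behavior for every form: pass the node through unchanged.
    #define EXPRESSION_FORM(form) \
        virtual std::shared_ptr<expression> apply_##form( std::shared_ptr<expression_##form> in ) { return in; }
    EXPRESSION_FORMS
    #undef EXPRESSION_FORM
};

// Hypothetical walker: replaces each function call with a fresh binary
// expression and leaves all other forms alone.
struct funccall_to_binary : expression_walker {
protected:
    std::shared_ptr<expression> apply_funccall( std::shared_ptr<expression_funccall> ) override {
        return std::make_shared<expression_binary>();
    }
};
```

The overrides return `shared_ptr<expression>` because `shared_ptr` return types are not covariant; an override must match the base signature exactly.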

Once you get used to using list macros you’ll find all sorts of nice ways to use them.

No Macros?

The major problem with the macro approach is that it requires macros, and not all languages have them. I have the same situation in C# on my game Radial Blitz, but have been unable to find a good solution. I find this very unfortunate. All that redundant code just buries the semantics and adds no value to the project.

In some dynamic languages there is another option: converting the enum value to a string, appending it to a common function prefix, and calling the resulting function by name. This approach provides no guarantee that I haven’t missed a function. I’ll only know at runtime when an exception is thrown. This, of course, has to do with dynamic languages as a whole, rather than just this isolated scenario.

I’m still a firm believer in macros. Sure, they can be abused, but often, like here, they make the code safer, more readable, and easier to maintain. I have yet to see a language with strong enough syntax and semantic constructs to eliminate the need for macros. At the very least they could provide a builtin visitor syntax; perhaps I’ll expand on that idea later.
