Clay McCoy: October 2007

In my last two posts I showed why and how you might hide your Java code with a DSL. I mentioned that one of the strengths of a DSL is that it separates the concern of describing data from the actions taken on that data. Here is a simple example.
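As a minimal sketch, a description of this kind could look like the following. The `priority:` and `due:` options, the dates, and the `TODO_LIST` constant are assumptions made for illustration; the list is held in a Proc only so that the snippet runs before `todo` and `item` have been given any behavior.

```ruby
# Hypothetical todo list description. Nothing executes until the Proc is
# called in a context where todo and item are defined.
TODO_LIST = proc do
  todo do
    item "mow the lawn",          priority: :high, due: "2007-09-29"
    item "write blog about this", due: "2007-10-05"
    item "back up harddrive",     due: "2007-11-30"
  end
end
```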

Two things are obvious. First, this is a todo list. Second, I'm likely to lose some data if my hard drive fails. The todo list reads fairly well. Now let's try associating an action with the data. We'll print the items.

With a Java mindset you would probably parse the data and write it out. Maybe someone who knew you were a Java developer provided you with XML for the same data. Then you just have to deal with one of several painful XML APIs rather than write a parser yourself. If you don't want to parse it at all, then you can be the parser and put the data directly into your code.

Since this todo list description is actually a DSL internal to Ruby, we can deal with it very easily. We just need to associate actions with keywords in the language like "todo" and "item." Here is how we do that.
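A sketch of a print context, under the assumption that `item` takes a description plus hypothetical `priority:` and `due:` options (this context ignores both):

```ruby
# Print context: each keyword of the DSL is bound to a method that prints.
def todo
  puts "To do list:"
  yield # run the block of items that was passed to todo
end

# priority: and due: are hypothetical options, unused in this context.
def item(description, priority: :normal, due: nil)
  puts description
end

# The todo list description, evaluated with the bindings above.
todo do
  item "mow the lawn",          priority: :high, due: "2007-09-29"
  item "write blog about this", due: "2007-10-05"
  item "back up harddrive",     due: "2007-11-30"
end
```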

When we execute the todo list with these methods defined we get this.

To do list:
mow the lawn
write blog about this
back up harddrive

This quickly gave us the results we wanted, without taking the time to parse our custom todo language. We just associated behavior with parts of the DSL and there it was. In Ruby, yield is used to execute the block that was passed in. In this case three calls to item() were made when todo() yields to the block of items in the DSL. We see all three descriptions because item() is defined as printing out the item description. A specific grouping of bindings like this is what we refer to as a context. We can do anything we can think of with this todo list just by evaluating it in a different context.

We can see only high priority items by changing the behavior associated with item().
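For instance, a high priority context might differ from the print context only in `item`. This sketch assumes a hypothetical `priority:` option on each item:

```ruby
def todo
  puts "To do list:"
  yield
end

# Same keyword, different behavior: only high priority items are printed.
def item(description, priority: :normal, due: nil)
  puts description if priority == :high
end

todo do
  item "mow the lawn",          priority: :high, due: "2007-09-29"
  item "write blog about this", due: "2007-10-05"
  item "back up harddrive",     due: "2007-11-30"
end
```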

To do list:
mow the lawn

Maybe we would like to see all past due items.
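A past due context could compare each item's due date with the current date. In this sketch the `due:` option, the ISO date strings, and the fixed `TODAY` constant (pinned so the result is repeatable) are all assumptions:

```ruby
require "date"

# Hypothetical reference date, fixed so the example is deterministic.
TODAY = Date.new(2007, 10, 20)

def todo
  puts "To do list:"
  yield
end

# Only items whose due date lies before TODAY are printed.
def item(description, priority: :normal, due: nil)
  puts description if due && Date.iso8601(due) < TODAY
end

todo do
  item "mow the lawn",          priority: :high, due: "2007-09-29"
  item "write blog about this", due: "2007-10-05"
  item "back up harddrive",     due: "2007-11-30"
end
```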

To do list:
mow the lawn
write blog about this

Let's do something really different and count how many items we have.
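A counting context might look like this sketch, where `item` ignores its arguments entirely and `todo` reports the total. The wording of the printed line is an assumption:

```ruby
# Count context: todo resets a counter, runs the items, then reports.
def todo
  @count = 0
  yield
  puts "#{@count} items"
  @count
end

# The description and any options are irrelevant here; only the call counts.
def item(_description, **_options)
  @count += 1
end

total = todo do
  item "mow the lawn",          priority: :high, due: "2007-09-29"
  item "write blog about this", due: "2007-10-05"
  item "back up harddrive",     due: "2007-11-30"
end
```

With the three items above, `total` is 3.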

As you can see, evaluating this todo list description in different contexts is a pretty easy way to use our data. We have a lot of control over the situation since our data is in the same form as our code. This is much closer to how a LISP hacker would deal with the data than how a Java/XML developer would.

The tree DSL that was the subject of my last two blog entries had only one context. We could build the tree from the description, but we would also like to assert that a structure is part of a tree and query that structure. So you probably see where I am going with this. Let's add an assert context and a query context for our tree DSL.

This is a bit harder than our trivial todo list example. Our tree DSL is a little different from most languages because it doesn't have any keywords that we can associate specific behavior with. All the words are used as tree node names. The build context implementation makes heavy use of method_missing(), which was added directly to the Java TreeNode class with JRuby. We are going to do the same thing here; we just want to have a choice about what method_missing() does. Let's review what we had before.
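As a rough stand-in for that build-only version, the sketch below uses a plain Ruby `TreeNode` rather than the Java class decorated from JRuby. The `get_child` and `add_child` helpers and the node names are hypothetical.

```ruby
# Pure-Ruby stand-in for the Java TreeNode (an assumption for illustration).
class TreeNode
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  # Hypothetical helper: find a direct child by name, or nil.
  def get_child(name)
    children.find { |child| child.name == name }
  end

  # Hypothetical helper: create a child and attach it to this node.
  def add_child(name)
    child = TreeNode.new(name)
    children << child
    child
  end

  # Build context: every unknown word in the DSL names a child node.
  def method_missing(name, *args, &structure)
    child = get_child(name.to_s) || add_child(name.to_s)
    child.instance_eval(&structure) if structure # descend into sub-structure
    child
  end
end

root = TreeNode.new("root")
root.instance_eval do
  plants.trees { oak; pine }
end
```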

Having the behavior on the TreeNode directly keeps us from having to write a tree visitor. Most of this method is centered around recursively walking the tree. First we try to find the child with the name we are looking for. We build it if we can't find it. Then we evaluate any leftover sub-structure on the child. Interestingly enough, for an assert and a query the only change needed is the action to take if the child is missing. Let's make that into an injectable strategy.
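One way to sketch such an injectable strategy, again on a pure-Ruby stand-in for the Java class. The class-level `child_missing_action` accessor is a hypothetical name, and falling through to `super` when no action is set is an assumption about how misuse might be prevented.

```ruby
class TreeNode
  class << self
    # Hypothetical slot for the injected strategy: a proc that receives the
    # parent node and the missing child's name, or nil when nothing is set.
    attr_accessor :child_missing_action
  end

  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  def get_child(name)
    children.find { |child| child.name == name }
  end

  def method_missing(name, *args, &structure)
    action = TreeNode.child_missing_action
    return super if action.nil? # no strategy injected: ordinary NoMethodError

    # Find the named child, or let the injected action decide what happens.
    child = get_child(name.to_s) || action.call(self, name.to_s)
    child.instance_eval(&structure) if child && structure
    child
  end
end

# Injecting a build strategy: create the missing child and attach it.
TreeNode.child_missing_action = proc do |parent, child_name|
  child = TreeNode.new(child_name)
  parent.children << child
  child
end

root = TreeNode.new("root")
root.instance_eval { plants.trees { oak } }

TreeNode.child_missing_action = nil # clear the strategy again
```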

This is a more general purpose TreeNode. We can inject an action to happen when a named child is not found. That action will have the parent and the name of the missing child as parameters. Notice that if no action is injected then method_missing becomes private so that it can't be abused. But how do we inject the actions, and what exactly will they do?

We extend the Tree class that we used to create our structure last time. Build, assert, and query all create procs that will be used as the child missing actions. In each of them three steps occur. First, the action is set on the TreeNode class. Then the structure is evaluated with that action in place. Finally, the action is cleared.

I would like a way of doing this without actually storing the action on the TreeNode class. This implementation wouldn't work well with multi-threading, and I feel dirty causing side-effects like changing the missing child behavior of all TreeNodes temporarily. I could also take some of the repetition out, but I left it because it makes the important concepts more obvious at first glance. For our current requirements this implementation will get us by. Hopefully we have good unit tests so we can easily refactor this later.

The build action does exactly what was done before by creating the missing child. Assert returns a list of discrepancies that were seen while trying to compare the structures. When assert returns an empty list, it means the whole structure that was asserted is indeed in the tree. A query throws an exception because we were purposefully looking for something, and if it wasn't there it is a problem. Let's see this in action.
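The three contexts can be sketched as below, using a pure-Ruby stand-in for the Java TreeNode. The `Tree` module, its method signatures, the `MissingChild` error, and the discrepancy wording are all hypothetical. Returning a detached placeholder node from the assert action, so that the rest of the structure can still be checked, is a design choice of this sketch.

```ruby
class TreeNode
  class << self
    attr_accessor :child_missing_action # hypothetical strategy slot
  end
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  def get_child(name)
    children.find { |child| child.name == name }
  end

  def add_child(name)
    TreeNode.new(name).tap { |child| children << child }
  end

  def method_missing(name, *args, &structure)
    action = TreeNode.child_missing_action
    return super if action.nil?
    child = get_child(name.to_s) || action.call(self, name.to_s)
    child.instance_eval(&structure) if child && structure
    child
  end
end

module Tree
  class MissingChild < StandardError; end

  # Each context sets the action, evaluates the structure, clears the action.
  def self.build(root, &structure)
    TreeNode.child_missing_action = proc { |parent, name| parent.add_child(name) }
    root.instance_eval(&structure)
    root
  ensure
    TreeNode.child_missing_action = nil
  end

  def self.assert(root, &structure)
    discrepancies = []
    TreeNode.child_missing_action = proc do |parent, name|
      discrepancies << "#{parent.name} has no child #{name}"
      TreeNode.new(name) # detached placeholder so evaluation can continue
    end
    root.instance_eval(&structure)
    discrepancies
  ensure
    TreeNode.child_missing_action = nil
  end

  def self.query(root, &structure)
    TreeNode.child_missing_action = proc do |parent, name|
      raise MissingChild, "#{parent.name} has no child #{name}"
    end
    root.instance_eval(&structure)
  ensure
    TreeNode.child_missing_action = nil
  end
end

root = TreeNode.new("root")
Tree.build(root) do
  plants.trees { oak; pine }
  plants.flowers
end

missing = Tree.assert(root) { plants.trees { oak; maple } }
# missing => ["trees has no child maple"]
found = Tree.query(root) { plants.trees.oak }
```

Because the action lives on the class while a structure is evaluated, this sketch shares the threading caveat discussed above.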

After using this a little, I see that I'd like to have a wildcard to use in place of a TreeNode sometimes. This would basically be a call to getChildren() on the Java object backing that node. I can still call the Java method directly, you know.

These all do the same thing. The whole point of the DSL is to make this read well, so I'd really rather see something like this.

This is really only useful for query. I can accomplish this improved readability simply by defining the * method on TreeNode to delegate to getChildren() like so.
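A minimal sketch of that wildcard, with a plain Ruby `children` reader standing in for the Java `getChildren()` call (the stand-in class and the node names are assumptions):

```ruby
class TreeNode
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  # Wildcard: node.* reads as "all children of node".
  def *
    children
  end
end

trees = TreeNode.new("trees", [TreeNode.new("oak"), TreeNode.new("pine")])
wildcard_names = trees.*.map(&:name)
# wildcard_names => ["oak", "pine"]
```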

Now I see an even more convenient way to use this new wildcard feature. I want to be able to use it in the middle of a path rather than just at the end. This will open up a world of new possibilities.

Now I can build, assert, or query similar parts of the tree that belong in parallel forks at once. I implemented this by adding a method to Array itself, which delegates any method called on the Array to all elements in it. Java people tend to have a problem with adding methods to a core class. They just don't know how to handle things like this because Java doesn't let you do anything powerful since it might be dangerous.
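The Array delegation could be sketched like this. To keep the example short, the stand-in `TreeNode` always builds missing children. Restricting the delegation to non-empty arrays of `TreeNode` objects is an extra safeguard assumed here, not something the text describes.

```ruby
class TreeNode
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  # Wildcard: all children of this node.
  def *
    children
  end

  # Build-style behavior only: unknown words create or find a child.
  def method_missing(name, *args, &structure)
    child = children.find { |c| c.name == name.to_s }
    unless child
      child = TreeNode.new(name.to_s)
      children << child
    end
    child.instance_eval(&structure) if structure
    child
  end
end

class Array
  # Forward unknown messages to every element, but only for arrays that
  # hold nothing except TreeNodes; everything else fails as usual.
  def method_missing(name, *args, &block)
    tree_nodes = !empty? && all? { |element| element.is_a?(TreeNode) }
    return super unless tree_nodes

    map { |node| node.public_send(name, *args, &block) }
  end
end

root = TreeNode.new("root")
root.instance_eval do
  plants { trees; shrubs }
  plants.*.leaves # adds a "leaves" child under both trees and shrubs
end
```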

Now I have a very concise and powerful way to manage my trees. I can build, query, and make assertions on them far more effectively than I could in Java. I also use the same tree description DSL for all of these actions. Even if you cannot use Ruby or these techniques in production, they are ideal for prototyping and unit tests. I've greatly increased test coverage using this strategy on a project with a model that was difficult to create, inspect, and manage. Not only that, but the DSL enabled other team members to write tests with much less ramp-up time on the underlying architecture.

Multiple contexts for a DSL.