Wandering Thoughts archives

2023-07-28

The issue with control flow in interpreters using the 'eval' pattern

When I wrote about the 'eval' pattern for interpreting simple languages, where you parse the language into a syntax tree, give each node an 'Eval()' method, and call the top level Eval() to evaluate (interpret) the tree, I mentioned that this approach got increasingly complex if you needed to deal with loops and other control flow. However, I didn't explain what the problem was. That's what this entry is about.

The ideal situation for an 'eval()' style interpreter is where you're evaluating expressions and you only have to return a simple result (a number, a boolean answer, or the like). Life gets a bit more complex if you need to return a multi-option result where there are rules for combining the options together; for example, you might have a DSL where the result of an expression could be 'yes', 'no', or 'we have to defer this because DNS isn't cooperating'. When you have such a multi-option situation, every non-terminal Eval() method may need to specifically handle the third option in some way that makes sense (and may be domain specific).

In theory, control flow by itself doesn't complicate your life too much. However, almost every time you want control flow, you want the control flow to offer the possibility of early exits. You want to be able to do things like 'return', 'continue', and 'break' from loops, if conditions, and so on. Sometimes you want variable-level breaks, so you can break out of several levels of loops at once. Even without multi-level breaks, some of these early exits will need to propagate through nested control structures, like nested ifs, ifs inside of switches inside of loops, and so on.

In a model where your API to every node is an Eval() method that returns a result, including for control flow nodes, this means that the result the Eval() method returns has to be able to signal all of these various different early exits, and every control flow node probably needs to know at least something about them. If you have an 'if' node with a list of sub-nodes to execute in sequence in one condition, at every step in the list it has to check if the Eval() result is an early exit. Your 'loop' node (or nodes) similarly needs to check at every sub-Eval() and distinguish between the various types of early exits, because they change what it wants to return to its caller.

(A loop's Eval() should pass through a 'return', swallow a 'continue' and restart from the top, do a normal return for a 'break' that ends at it, and propagate a multi-level 'break' up but increase the number of levels it's passed through.)

This spreads knowledge of your early exits through many control flow nodes, and specifically their Eval() methods. If you want to add another type of early exit (such as 'raise an exception'), you may need to reach into all sorts of Eval() methods to update them.

One semi-alternative is to borrow the idea of the Lisp 'progn' special operator, which groups a bunch of statements ('forms') together. If you put all multi-statement things inside an implicit 'progn' node, then only the 'progn' Eval() method has to handle the logic of checking the sub-node Eval() result for early exits after every statement. However, your loop nodes and similar things will still have to handle whatever type of early exit they may get handed by their progn child node. You'll probably also need to modify your syntax tree during parsing to insert these implicit progn nodes, moving the syntax tree you interpret further away from the 'natural' tree you get from parsing.

None of this is insurmountable. But it adds complexity and that complexity is distributed through the interpreter. What was a simple and clean idea at the start has grown warts.

programming/InterpreterEvalAndControlFlow written at 21:36:32;


Page tools: See As Normal.
Search:
Login: Password:

This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.