Smacc: a Compiler-Compiler
Grammar Idioms
In this chapter, we share some coding idioms for grammars that help create more compact ASTs.
Managing Lists
Smacc automatically determines if the production rules contain a recursion that represents a list.
In such case, it adds an s
to the name of the generated instance variable and manages it as a list.
Let us take an example.
Here we see that Line is recursive. Smacc will generate a class SmaccTutorialLine
with an instance variable lines
initialized as an ordered collection.
Note that, if the right-hand-side of a rule is completely empty, SmaCC does not recognise the list.
To avoid the empty right-hand-side, you should write this as follows:
Using Shortcuts
You may prefer to define your lists using the shortcuts question mark (?
) for 0 or 1 occurrences,
star (*
) for 0 or more, and plus (+
) for 1 or more, rather than with recursion.
Let's compare the two approaches.
Let's look at a grammar that defines a parameter list recursively.
If the above grammar is used to parse a list of three names, it will generate an AST node class
called SmaccTutorialParameterList
with a params
instance variable that holds an ordered collection.
However, the contents of the ordered collection will not be the three parameters.
Instead, the collection will have two elements: a parameter list (which will contain an ordered collection of two parameters),
and a parameter that contains the third.
Why? Because that's what the grammar specifies!
There is a trick that will instead generate a collection of three elements: remove the name 'param'
from after the
recursive appearace of the non-terminal ParameterList
in the second alternative for ParameterList
:
Now you will get a collection params
containing all the parameters.
You can also specify the same language using +
, like this:
Not only is this grammar easier to read, but the generated AST will contain a single collection of parameters.
If you parse three names, the result will be a SmaccTutorialParameterList
object that contains an instance variable params
that will be initialized to be an OrderedCollection
of three SmaCCTutorialParameter
nodes.
In a similar way, if you use a *
, you will get an ordered collection containing zero or more items.
However, if you use a ?
, you don't get a collection: you get either nil
(if the item was absent), or the generated node (if it was present).
Expressing Optional Features
Often, lists contain separators, which makes specifying them a little more complex.
Here is a grammar in which lists of names can be of arbitrary length, but the list items must be separated with commas.
It expresses this with the ?
shortcut.
SmaCC recognizes this idiom, and will generate an ordered collection of zero or more names.
If you want this behaviour, it is important to use the same instance variable name (here n
) for both the base case and the *
repetition.
If you use different names,
then the generated node will have two instance variables: n1
will be either nil
(if the input list is empty) or will contain the first Name
(if it is not), while n2s
will be a collection containing the remaining Names (zero when the input list has length one).
If you prefer not to use the *
and ?
shortcuts (or are using a verison of SmaCC that does not support them), you can get the same effect using recursion:
Once again, note that no name is given to the recursive use of NonEmptyNameList
.
In general, empty alternatives will be represented as nil
.
This avoids generating many useless objects.
NameList
will return nil when it matches the empty input.
If instead you want an empty NameList
node, use {{}}
for the empty alternative: