Physche: A Little Scheme in Pharo
Introduction
In this booklet we will build together a little interpreter for a subset of the Scheme language, that we called Physche. The idea is to implement it as simply as possible to illustrate the key aspects and share with you the fun of building language interpreters. Doing so we will explore several concepts:
- limited parsing
- basic interpreter, and
- closure concepts and implementation.
As future readings, I suggest Structure and Interpretation of Computer Programs by Abselson, Sussman and Sussman. I simply love it. There is also the excellent book of Jacques Chazarain (which is one of the persons who taught me Lisp) "Programmer avec Scheme" by International Thomson Publishing.
A more personal note. We will implement a subset of Scheme because the language is simple but not trivial, really powerful and also because I love it. And while Pharo is my favorite language, it always has the taste of a Lisp language but with lovely objects. Since I implemented several mini Schemes in Scheme, I got inspired by the (How to Write a (Lisp) Interpreter (in Python)) post of Peter Norvig to write one in Pharo for fun.
Thanks to Clément Béra for feedback on the early version and special thanks for Quentin D. for his great feedback on the understandibility and the copy edit suggestions.
Please contact me if you noticed I wrote something wrong or not fully precise.
S. Ducasse (stephane.ducasse@inria.fr) 7 June 2018
Scheme in a (super) nutshell
We will start with a limited version of Phsyche, our small functional programming language inspired from Scheme. In the first version, we will not support the definition of new functions (also called procedures). In the second version, we will support function definition and closures, in particular. We start by presenting the subset of Scheme that we will implement.
Our objective here is not to write a Scheme following the latest language specification. We will just cover a tiny subset. Purists may not like what I will write but I take this tiny subset as a pretext for a first exploration. I will only present the parts that we will implement. Phsyche does not support vectors, dotted pairs, continuations, macros.
For a fast yet more complete description of Scheme I like Teach yourself Scheme in fixnum days by Dorai Sitaram http://ds26gte.github.io/tyscheme/index.html.
Here is simple function expressing the length of a list and one example.
There is another way to define functions, but for simplicity we will focus for now on definitions making an explicit use of lambda
.
Once we defined our function, we can call it as follows:
S-Expressions
In Scheme, everything is a s-expression. A S-expression can be:
- atomic for booleans (
#t
,#f
), number (1
), symbols (we will treat strings as atomic). - compound as with lists: A list starts with an opening parenthesis
(
and finishes by a closing one)
. Lists also represent function application. - a function: functions can be normal (i.e., evaluating all their arguments) or special-forms (i.e., having special ways to evaluate their arguments. This is needed to build control-flow for example).
Values
We briefly present booleans, numbers and symbols. Then we focus on function application since this is a much more interesting concept.
Still there is one important point to raise: the value of an atomic expression is itself. We will see that the value of list is function application: it applies the value of the first element on the value of the rest of the list.
List operations
Scheme, in the path of Lisp, offers list operations such as list
to create a list, car
to access the first element of a list, cdr
to access the rest of the list, '()
to represent the empty list, and cons
to add an element as first element of a list. Here are some examples showing their use. We do not present dotted list or pairs to keep the implementation to the bare minimum.
Function application
Scheme follows a prefixed syntax (function args ...)
where the first element refers to a function and the rest are arguments whose values are passed to the function. A list represents function application.
In the code above, the function associated with the symbol +
is looked up and the values of the arguments (* 3 2)
and 5
are computed and passed to function.
By default the evaluation of a function application (a list) evaluates all its components. The function returned as value of the first element is applied to the values returned for the rest of the list.
So far, functions evaluate all their arguments before executing the function.
However, we will see later on that some other forms that look like functions should not evaluate all their arguments.
For example, this is the case of define
, lambda
, quote
and if
.
Such functions are called special-forms and we will have to define their semantics one by one.
Variable definition
To define a variable and set its value, we use the define
special form: it sets the value of the second argument to symbol represented by the first argument. Notice that define
does not evaluate its first argument, only its second one.
Defining and applying functions
To define a function we use the lambda
special form. Its first argument is a list representing the function arguments and the second argument the body of the function.
To use this function, we need to apply it an argument using the form (function args)
. The following piece of code shows how we can apply the argument 3.
To reuse a function in a program, we can assign a function to variable using define
.
Closures
Now a closure is more than a function. A closure refers to its definitionPhsyche environment.
The following example illustrates it. It defines a function that returns a function. This function (having y
as a parameter) will add x to y and x is bound to 3 due to the application of the first function.
The function y is a closure that has an environment in which the variable x is bound to 3.
Similarly the following function shows that we can modify this definition environment during function execution.
and now each time the function sy3
is executed its definition environment is modified and the
value of x
is incremented.
Quote
quote
is an interesting special form: it does not evaluate its argument but instead returns it. It is useful when manipulating lists.
For example, the following expressions returns then a list which looks like a function application but it just a list.
This operation is so current that it has a special syntax: (quote x)
can be written 'x
. Phsyche will not support the '
notation.
List as data
To manipulate a list we can simply quote it.
The following primitive functions allow one to manipulate lists: car
(to access the first element), cdr
(to access the rest of the list), cons
(to create a list) and ()
represents the empty list.
Program as data
What is particularly interesting is that the syntax of the language is centered on the syntax of lists and on the unification between lists as a data structure and function application.
In addition, what is really powerful is that quote
is the bridge between function application and data structure. Indeed, the special function quote
we can turn a program into its abstract syntax tree and we can access its elements using
normal list operations such as car
or cdr
.
In the following example, we can access the argument of the +
application by simply quoting the invocation. Then we can access its elements using plain list operators such as cdr
.
Phsyche's overview
Since we do not want to have to build a full parser, we will bend a bit the syntax of Phsyche to be compatible with the one of Pharo.
- Booleans are represented as
true
andfalse
instead of#t
and#f
. - Numbers are the ones of Pharo.
- Symbols and strings will be the ones of Pharo:
#pharo
and'pharo'
. - Lists will be represented as arrays to be able to get benefit from the scanner facilities of Pharo as we will explain right after. The empty list is represented by
#()
.
We are now ready to implement the first version of Phsyche. We will start with a first version that only let programmers to express function use and composition without having the possibility to define their own functions. Then in a second iteration we will add closures and function definition.
A simple parser for Phsyche
We will start to implement an extremely simple parser. Then we will define an interpreter for a limited version of the language.
Simple interpretation architecture
When implementing language compilers, parsing is the process that takes a text as input and produces an abstract representation of the program (see Figure ). This process is often composed of a scanner and a parser phase. The scanner cuts the text into a list of tokens. Then the parser consumes this list of tokens to build an intermediate representation such as an abstract syntax tree. This abstract syntax tree is then analyzed, annotated, and transformed by a compiler to finally generate different outputs (bytecode or assembly). The generated code embeds the semantics of the implemented language.
Besides having a compiler, we can also have an interpreter, i.e., a program that executes programs of the implemented language. The idea is that the interpreter will consume the intermediate representation and act adequately. For example, when it sees a variable definition, it will declare a binding for such variable in a structure (usually an environment).
Note that the view depicted in Figure is naive in sense that the compiler may also emit abstract instructions (for example bytecode) that will be interpreted by an (bytecode) interpreter and may be converted on the fly to assembly code. This is what the Pharo Virtual Machine does.
In our interpreter we will take a simpler route. Since Scheme syntax is simple we will just use a simple scanner and our interpreter will take as input the tokens produced by the scanner. We will use the natural structure of arrays as simple abstract syntax trees.
Starting
Let us start by defining some tests to drive the development of Phsyche's interpreter.
In the following test we see that we use the natural nesting of arrays of Pharo to represent Scheme lists.
Here we check that an empty list is recognised as an empty array.
Parsing input text
Now we are ready to implement the parse:
method. We create the class Phsyche which is the language interpreter.
To implement the parse:
method, we take advantage of the Pharo's Scanner
and the fact that we map list to arrays.
Note that this implementation can really be bent and it is absolutely not robust but it serves our teaching purpose.
As an exercise, we suggest you to represent lists with pairs as in traditional Lisp and Scheme. To do so, you will need to define a better parser.
Now we are ready to interpret the parsed programs.
Limited Phsyche
In this chapter we will build Limited Phsyche: a simple interpreter that can evaluate atomic elements and simple function application. It will not support function definition and closures.
Evaluating elementary elements
Let us start to specify the expected behavior of the language evaluation.
Now we can add the following methods to Phsyche
We define the eval:
method. It is a short cut to the main eval:in:
method.
The eval:in:
method is central to the interpreter.
For now, the eval:in:
returns its argument. Quite limited and trivial so far.
Defining a variable
We add support for the first special form: define
.
We start by supporting variable definition.
Here is a test showing the behavior we expect.
First we should add a dictionary that holds defined variables and their values.
We redefine the eval:
method as follows:
Now we define a better eval:in:
method. If the expression is a symbol, we return the value of the expression in the enviroment. Note that we do not refer to the instance variable environment
but the parameter anEnvironment
because in the future we will show that we may want to look for values in a different environment than the one of the interpreter.
When the expression is a variable definition, we define it. What you should see is that define
is a special form since it does not evaluate its first parameter only the second one.
This is what the method evalDefineSpecialForm:in:
is doing.
The following test shows that a variable points to a value.
Introducing quote
Quote is an interesting special form. It is the one that does not evaluate its argument.
In addition as we mentioned in an earlier chapter, quote
turns function application ((prog args)
)
into a manipulable data-structure: a list.
Setting up the primitives
We will introduce some primitive behavior such as addition, multiplication, and list manipulation. In this implementation of Phsyche we will define them as block closures. A more object-oriented implementation reifying the operations is possible as we will show in the latest chapter of this booklet.
We will use a dictionary to represent the environment that keep primitives names and their respective values. But let us write a test first to specify what we want to get.
We define the multiplication and addition as follows:
The method multBinding
returns a pair containing the primitive name and its associated Pharo closure.
The environment will contain a binding whose value is the primitive name (#+
) and whose value will be the corresponding block.
Primitive environment initialization
To represent the environment, the interpreter defines one instance variable environment
initialized to a dictionary. It has a primitives
collection to keep the names of the primitives.
We redefine the initialize
method to initialize the environment, the primitive name data structure (here we use an ordered collection) and to call the initializeEnvBindings
method.
The initializeEnvBindings
method automatically executes all the methods ending with 'Binding'
.
We use the method perform:
to execute a method whose name is given as argument.
We add a primitive bindings to the environment and the primitive name to the primitive name list.
The following expression returns the list of method selectors that ends with 'Binding'.
We take the opportunity to add the primitive name to the list of primitives since it will help use later during the evaluation.
Handling primitive evaluation
We are ready to implement the evaluation of a call (primitive arg1 ... argn)
. The method evalPrimitive:in:
finds the block closure and executes it with the values of the arguments. The method valueWithPossibleArgs:
executes a block closure paying attention to the possible multiple number of required arguments.
What is interesting in this method is that this is the place that defines the semantics of the evaluation of argument calls. Here we see that the arguments are all evaluated from left to right.
We should change the eval:in:
method to take into account that we have support for primitive call. What is interesting is that we have to be clear about the semantic of primitive execution, obviously. We know that we can get the closure associated to the primitive name in the environment, and a primitive should evaluate all its arguments and pass to the closure.
At this point our tests should all pass.
Some consideration
Note that for now we consider that the mathematical operations are only working on pairs and not list of elements. Another point to consider is that explicit check for primitives in the environment prevents us to overload locally their definition and this could be changed.
Some more arithmetic primitives
Here are definitions for more primitives
Adding substraction and division
We add
Adding not
Adding list primitives
We should add some primitives to manage lists such as the elementary operations cons
, car
, and cdr
.
Here are some tests to make sure that such primitives are acting as expected.
Note that we consider that cons
is only working on lists and does not produce dotted pairs.
Here are the primitives definitions.
Now we can get back to the implementation of more special forms.
Adding if
We are ready to implement if
. It has the following form: (if condition iftrue iffalse)
. if
is a special form since it does not evaluate all its arguments. Indeed, if
should only evaluate the correct argument based on the boolean value.
Well we can do better. Since numbers are auto-evaluating, this test does not verify that only one branch is evaluated.
Let us use a division by zero checks that for us as follows:
And we define the semantics of if
as follows:
And we can introduce this new method to the main eval:in:
method as follows:
All our lovely tests are passing.
As a short conclusion, our interpreter supports the execution of expression but it is really limited in the sense that we cannot add new functions. This is what we will address in the next chapter as well as the definition of closures (functions which refer to the defining environment).
Function definition and application
We will add user function (also called procedure in some languages) and function application to Phsyche. We go step by step to describe the different aspects of function definition and application. In the next chapter we will introduce closures.
About environments
Let us start with a first function definition.
We can execute the function:
Let us analyse the definition and application of the function area
.
What is important to see is that during the application (area 10)
, the argument r
acts as a local variable of the function. During (area 10)
execution, r
gets the value of the argument (here 10).
The following program checks that the argument value takes precedence over variables defined in outer scope.
We define a variable r
with 5 as value and the function area
has an argument with the same name.
The execution of (area 10)
shows several important points:
- During function application, an environment should be created with the parameter and its value. The body of the function should be evaluated within the context of such an environment.
- Second the argument value takes precedence over the value of
r
defined previously. - Third, when executing the body of
area
we should be able to access variable defined in outer scope such aspi
.
We used a simple dictionary to represent an environment to store variables and their value. But this is not sufficient because it makes the lookup of outer variables more cumbersome. We should use a list of bindings: an environment. This way the environment created during the function application can be linked to the global as shown in Figure . With environments, we can also have function application nested into function application without any problems.
Defining an environment class
An environment is kind of linked list of bindings. It can also be seen as a dictionary that when a binding is not found locally continues the binding lookup in another environment (called its parent or outer scope). Figure shows an example of environments.
Let us define some tests first
Our setup makes sure that the inner
environment is pointing to another context called outer
here.
The first test is to check that we can access the values set in each scope.
The second test is to check that we can reach the outer value from the inner scope.
The final test checks that unknown keys are not found.
We will improve the environment implementation to cover the definition of new binding but we will do that when we will add functionality to change the value of binding (i.e., when we will implement the primitive set
for example).
Implement an environment class
There are multiple ways to implement an environment.
Our implementation is just one special kind of dictionary that when it does not find the value associated to a key, looks up in its father dictionary.
We define the class PEnvironment
as a subclass of Dictionary
as follows:
We need an accessor to set the outer context.
We redefine the method at:
so that we first look up locally for a value. If the value is found, we return it, else when there is an outer environment we look the value in this outer environment. If there is no outer environment, we simply execute the default behavior which will lead to raise an error.
Now we are ready to define what a function is.
Function definition
We need a way to represent a function. A function has a list of parameter and a body. Let us start to write a test.
We define the class PFunction
to represent functions. It is for now straightforward.
Define the accessors for its instance variables. Now we will extend the interpreter to create functions. When an expression starts with a lambda keyword, we should return a function.
Note that this implementation of lambda
is not correct as it does not keep a reference to its defining environment but we will do it later when we will introduce closures.
Now Phsyche supports the definition of functions but not their application. Let us look at that now.
Function application
As we saw languages following Lisp-like syntax follow the pattern (proc args)
to mean that the function proc
is applied to the arguments args
. Such arguments are evaluated prior being passed to the function.
Let us write tests to control such a behavior.
Again we will implement function application step by step to understand the different aspects.
What is important to see is that while executing a function body, we have to have access to the interpreter environment, for example, to access primitive definitions. We have to define a new environment where we bind the arguments to their values. Such an environment will only be used for one application. It will be the inner environment of Figure .
A first function application
The not really good implementation is then the following:
The following method evalApplicativeOrder:in:
is responsible for the function application: it evaluates its first element, which will return a function.
It creates an new environment with the parameters and the values of the arguments as values and execute the body of the function within this environment.
Note that for function application, the application environment (newEnv
) has the global interpreter environment as parent. This will not be the case for closures as we will see later.
The method setEnvironmentForParameters:in:
is a simple helper function that creates a new environment based on the function parameters and their values, and an outer environment.
The class method newFromKeys:andValues:
only exists in Pharo 70. Here is its definition.
With such an implementation, all our current tests should pass. Now we are ready to implement closures.
Adding closures to Phsyche
In this chapter we add closures to Phsyche. A closure is a function capturing the environment in which it is defined. We start studying some examples and then we will implement on the closure semantics.
Studying a closure
A closure is a function which contains a reference to the environment at its definition time. Since evaluating a lambda defines a temporary environment in which the parameters are bound, we use this fact to create an environment local to a function. Let us have a look at a simple example:
This expression creates a lambda expression and apply to 7. The execution of this lambda expression leads to the creation of another lambda expression applied to 3.
The following subexpression returns a function lambda (y) ...
adding 3 to its parameter y
.
This is why the first expression result is 10. It does so by executing the first function with 3 as parameter value for x
. This first execution creates an environment in which x is bound to 3.
Then it returns a function taking y
as parameter and referring to this environment. This is why when (+ x y)
is executed, x
is bound to 3.
The following expressions illustrate the same process by defining a function fy
and executing such a function.
About let
In fact defining local environments is so frequent that Scheme and Lisp languages offer the let
special form to define local environment as follow:
You can add let
to Phsyche as an exercise.
Implementing closure
To support closures, the first point is that we should change the lambda
special form to refer to the environment in which it is defined. Indeed at the time the function is created, it holds a reference to its defining environment.
To test that lambda
effectively creates an environment, we use this expression:
Here the nested function (with y as parameter) is simply returning the value of x. And during the function execution, the value of x will be looked up in the created environment. Here is a first test.
Note that it is difficult to test the fact that a function declaration defines a new environment without using function application. This is why we are using basic introspection to access the value of the variable.
The following test uses function application to verify the creation of the environment.
What is important to see is that the function lambda (y)...
will be executed in an environment where y
is bound to 7 and this environment will have as outer environment an environment with x
bound to 3 as shown in Figure
We add the environment instance variable to the class PFunction
and set the current environment of the interpreter when creating the function in during execution of the lambda special form.
Now we should revisit function evaluation to use the function environment instead of the one of the interpreter.
Here we create the newEnv
using the function environment as expressed by proc environment
. What is important to see with closure application is that the execution environment has as outer environment the one of the function.
We have implemented the most important aspect of closure semantics. Now we will add some support to modify environments and conclude with this first version of Phsyche.
Adding set! and begin
To be able to experiment more with closures, we add support for changing the value of a binding using the set!
special form and performing a sequence of instructions using the begin
special form.
The following tests specify that any modification should happen in the environment defining the existing binding. Notice that when a binding is not in the current environment but in one of the outer environment, this is the environment that contains the binding that should be modified.
We implement a new method called lookupAt:put:
as follows:
The following simple test checks that we can change the value of a binding. We will add more complex tests once we get the begin
primitives implemented.
We are ready to implement set!
.
Our tests should pass.
Implementing begin
The special form (begin exp1 exp2 ... expn)
evaluates one after another the expressions in the list and return the value of the last one. The following tests are super simple but makes sure that all the elements are evaluated and that the result of the last one is returned.
Now the implementation of begin
is the following one.
A more complex test
The following test is a bit more complex. It shows that we can change the binding of a variable created over function application. The expression (set x (+ x 2))
modifies the value of x created by the function application. This is why even if the first value of x
is 3, it is then latter sets to 5. Then when the function fy
is applied to 5, it returns 10.
We are done.
Fun with closures
In this section, we will develop super naive and little account objects with an internal state, their balance. For this, we define a function whose captured environment will keep some state (a number) that can only be modified by an internal function.
The function makeAccount
is a function with one parameter (balance
). This function will return a function that has in its scope the balance
variable and will be able to modify it.
We can create several accounts each having its own state.
Using the same principle you can create objects having multiple functions accessing their internal state. For this the returned function should act as a case based and execute corresponding function. In fact we prefer to use real objects such as the ones in Pharo instead of simulating them.
Conclusion
This ends our naive implementation of a subset of Scheme.
In the following chapter we revisit the implementation to reduce the complexity of the eval:in:
method.
Phsycoo
This chapter is optional. In this alternate implementation we treat primitives and special forms the same way. Doing so we reduce the explicit switching logic of the eval:in:
function. We show that the core of an interpreter can take advantage of dynamic dispatch.
Basically we create an object for each case and give the object the possibility to specify the full evaluation.
Reusing tests
To make sure that Phsycoo is implementing the same functionality than Phsyche, we define PhsycooTest
as a subclass of PhsycheTest
.
In the setup we create an instance of the Phsycoo instead of Phsyche.
Now using the TestRunner you will be able to run all the tests on a Phsycoo instance.
Phsycoo interpreter
Phsycoo has the same structure and initialization than Phsyche except that primitives and special forms are uniformly represented by objects, instances of their corresponding class in the PhsycooPrimitives hierarchy as shown in Figure
Modeling primitives and special forms
Primitives and special forms are now expressed as subclasses of the root class PhsycooPrimitives
. Each subclass should define valueFor: aCollection with: anInterpreter environment: anEnvironment
and the class method tag
.
Given a collection representing a primitive or special form application, an interpreter and an environment, the method valueFor:with:environment:
should return the corresponding value or modify the internal state of the interpreter.
A tag is a simple identifier that will be used to define the primitive or special form in the interpreter environment. For plus it will be #+
.
The Plus primitive
For example the primitive plus
is expressed as follows:
The if special form
The special form if
is expressed as a subclass of PhsycooPrimitives
.
The method valueFor:with:environment:
defines the semantics of condition.
Initializing the environment
Once we have defined the primitives and special forms we should add them to the interpreter environment. For this, we specialize the initialization of the interpreter as follows:
This is where we see that the tag
method is used to populate interpreter environment.
Reconsidering eval:in:
And we are ready to have a simpler and more systematic evaluation logic. As you can see it is much more compact and regular.
There are basically four cases
- we get an empty array (list), we return it.
- we get a symbol (value), we look it up and return it.
- we get self-evaluating entities such as #t, number, we return them.
- we get a function application: we distinguish between applicative order (this is a user defined or primitive and special forms execution (non application order).
The method evaluateNonApplicativeOrder:in:
dispatches to the associate instance. Any primitive or special forms are executed via this calling point.
For applicative order, the method is the same as in Phsyche.
If we let aside the possibility to redefine primitives and special forms, there is nothing more than in Phsyche. It is just expressed differently.
Conclusion
We have implemented a more modular implementation. Adding a new behavior is just defining a new subclass with two methods.