Assignment 3: Boa: Adding new operators

8.14

Assignment 3: Boa: Adding new operators🔗

Due: Thursday 01/30 at 8:59pm

git clone

In this compiler, you’ll enhance your existing work with Binary Operators and Arithmetic.

Reminder: Test names cannot have spaces; this is due to the way the Makefile relies on test names being used for filenames.

1 The Boa Language🔗

1.1 Concrete Syntax🔗

The concrete syntax of Boa is:1To read the given parser.mly file, start from the line that says %type <(Lexing.position * Lexing.position) Exprs.expr> program, which means that “the parse result of a program should be an Exprs.expr value” (decorated with a pair of start- and end-positions), and then %start program means that the entirety of the file should be parseable as a program. The %token lines indicate the literals in the program (and, optionally, their corresponding types), and the %left line indicates that the specified infix operators are left-associative. From there, hopefully the grammar should be relatively straightforward to read. The expressions in braces on each line tell the parser “when you successfully parse the grammatical pattern on the left, construct a new value as follows...”

‹expr›: | let ‹bindings› in ‹expr› | if ‹expr› : ‹expr› else: ‹expr› | ‹binop-expr› ‹binop-expr›: | NUMBER | IDENTIFIER | add1 ( ‹expr› ) | sub1 ( ‹expr› ) | ‹expr› + ‹expr› | ‹expr› - ‹expr› | ‹expr› * ‹expr› | ( ‹expr› ) ‹bindings›: | IDENTIFIER = ‹expr› | IDENTIFIER = ‹expr› , ‹bindings›

As in Adder, a let expression can have one or more bindings.

1.2 Abstract Syntax🔗

type prim1 =
  | Add1
  | Sub1

type prim2 =
  | Plus
  | Minus
  | Times

type 'a bind =
  (* the 3rd component is any information specifically about the identifier, like its position *)
  (string * 'a expr * 'a)

and 'a expr =
  | ELet of 'a bind list * 'a expr * 'a
  | EPrim1 of prim1 * 'a expr * 'a
  | EPrim2 of prim2 * 'a expr * 'a expr * 'a
  | EIf of 'a expr * 'a expr * 'a expr * 'a
  | ENumber of int * 'a
  | EId of string * 'a

1.3 Semantics🔗

In addition to the semantics of Adder, we now have infix binary operators (addition, subtraction and multiplication), that are evaluated leftmost-innermost first (i.e., the standard left-to-right order that obeys parentheses), and conditional expressions. An EIf expression evaluates its condition, then evaluates its then-branch if the condition is non-zero, and evaluates its else-branch if the condition was zero.

To compile these expressions, we need a few more assembly instructions:

type instruction = ...
  | ISub of arg * arg
  | IMul of arg * arg
  | ILabel of string
  | ICmp of arg * arg
  | IJne of string
  | IJe of string
  | IJmp of string

The sub and imul instructions are analogous to add, that take two arguments, apply their respective operations, and place their results in RAX.2We use imul rather than mul for signed multiplication. I suppose we should name the variant IImul, then, but that’s clunky and we’re never going to need mul itself anyway... Labels let us name the first of a sequence of instructions, akin to how we label our_code_starts_here: to begin our code. The cmp instruction compares its two arguments, and sets a processor flag to EQUAL or NOT-EQUAL accordingly. This flag is something separate from any registers we’ve worked with so far, and doesn’t have an explicit name. Instead, we can test the value of this flag with the jump instructions: jne will jump control to the named label if the flag is NOT-EQUAL, and je will jump control to the named label when the flag is EQUAL. Finally, jmp will unconditionally jump to the named label.

2 Starter code for this assignment🔗

You’ve been given a starter codebase that has several pieces of infrastructure:

The types for your AST and assembly instructions (types.ml). You don’t need to edit this file.
A lexer and parser for the Boa language (lexer.mll and parser.mly), which together will produce expr values tagged by a pair of source locations (Lexing.position values) indicating the start and end of each expression. You don’t need to edit these files.
A main program (main.ml) that uses the parser and compiler to produce assembly code from an input Boa text file. You don’t need to edit this.
A Makefile that builds main.ml, builds a tester for Boa that you will modify (test.ml), and manipulates assembly programs created by the Boa compiler. You don’t need to edit the Makefile, but you will edit test.ml.3Note: the .PRECIOUS pseudo-target ensures that the Makefile won’t delete your assembly files when it’s finished, so that you can manually inspect the results afterward for debugging purposes.
An OCaml program (runner.ml) that works in concert with the Makefile to allow you to compile and run a Boa program from within OCaml, which is quite useful for testing. You don’t need to edit runner.ml.
A pretty-printer (pretty.ml) for expressions. There are a few pretty-printers in this file of different levels of detail, mostly for your edification. The simplest, string_of_expr, produces a string that’s in Boa syntax, and pretty close to the input program (with some extra parentheses, just for clarity). You should be able to read and understand this function with little difficulty; it’s a straightforward tree-traversing function. Additionally, string_of_pos will print out a textual rendition of a (start,end) pair of source positions.
The two functions ast_of_expr and ast_of_pos_expr will pretty-print the program in a form that’s closer to ML’s syntax, so as to better visualize the actual expression value you’re working with. The _pos variant will also print out the position information, if it’s available. This function is a lot trickier to read, as it relies on ML’s Format library. (Actually writing a pretty-printer is a distraction from our goal, but having a pretty printer is often a welcome debugging aid...) You’re welcome to try to figure out how this function works, in tandem with the library documentation (which, frankly, is much easier to understand when you have an example program that uses it!).
You do not need to modify this file, but you’re welcome to enhance it if you define additional types.

All of your edits — which will be to write the compiler for Boa, and test it — will happen in test.ml and compile.ml.

3 Implementing a Compiler for Boa🔗

Again, the primary task of writing the Boa compiler is simple to state: take an instance of the expr datatype and turn it into a list of assembly instructions. But since we now have more complicated expressions, we need to worry about simplifying the program, first.

3.1 Checking for scoping problems🔗

In Adder, we asked you to confirm that no Let expression bound two identifiers with the same name, and to confirm that no expression referred to an unbound identifier. You likely interspersed that code with the compiler itself. While this was fine for Adder, let’s refactor this code into something more maintainable.

Define the function check_scope (e : (Lexing.position * Lexing.position) expr) : unit that abstracts out these two checks. This function returns nothing of any consequence; it just raises any appropriate BindingErrors as its only side-effect. (Hint: you may find the function ignore : 'a -> unit to be useful.)

3.2 Tagging🔗

Let type tag = int.

Design (and test!) a function tag : 'a expr -> tag expr that gives every expression and let-binding a unique numerical id.
Hint: you will likely want to define a helper function of type 'a expr -> tag -> (tag expr * tag) that takes an expression and the starting tag to use, and returns both the tagged expression and the next available tag for use. You may also want to write a function untag : 'a expr -> unit expr that strips out whatever information is decorating the expression; this is most useful for writing small test cases.

3.3 Renaming🔗

As described in the lecture notes, ANF transformations can inadvertently introduce new shadowing errors.

Design (and test!) a function rename : tag expr -> tag expr that uniquely renames all names in the program such that there are no longer any shadowing collisions, and all name bindings are unique.
Hint: you may pick any naming convention to produce new unique names; a convenient one may be sprintf "%s#%d" name tag, where name is the original name, and tag is the output of your previous function for this node.
Hitnt: you will need a helper function that takes in an environment of renamings from original names to new unique names, which you will update in the ELet case and use in the EId case. Be careful about the ordering of this list, and about the ordering of the bindings you produce when transforming the ELet!

3.4 Converting to A-Normal Form🔗

A-Normal Form asserts that throughout a program, any function-call or operator expression contains arguments that are immediate: that is, are numbers or identifiers, and therefore don’t perform any computation of their own. Additionally, we can think of the decision in an EIf expression to be an operation, so we also need to ensure that the condition is immediate. The following function is a reliable predicate for checking this property:

let rec is_anf (e : 'a expr) : bool =
  match e with
  | EPrim1(_, e, _) -> is_imm e
  | EPrim2(_, e1, e2, _) -> is_imm e1 && is_imm e2
  | ELet(binds, body, _) ->
     List.for_all (fun (_, e, _) -> is_anf e) binds
     && is_anf body
  | EIf(cond, thn, els, _) -> is_imm cond && is_anf thn && is_anf els
  | _ -> is_imm e
and is_imm e =
  match e with
  | ENumber _ -> true
  | EId _ -> true
  | _ -> false
;;

Design (and test!) a function anf : tag expr -> unit expr that takes any tagged expression and produces a new expression that is in A-Normal Form. Because this program will include some new expressions, we’re willing to discard any decorations we had on the expression, which explains why the input could be decorated with tag information, but the output will just be decorated with unit values. You will very likely need to write one or more helper functions.
There are several ways to define this function. The simplest is to start with a helper help : tag expr -> (unit expr * (string * unit expr) list) that takes an expression and returns
- The final answer of the program
- A list of bindings that form the context that need to run as “setup,” in which the answer makes sense.
All of these expressions (the answer and the bound values) must all be in A-Normal Form.
When you need to generate fresh names (i.e., unique names that aren’t used anywhere in the expression), generate names of the form "$reason_###", where reason is “prim1”, “prim2”, “if”, etc., and the numbers come from the expression’s tag.
Then, define a helper function that takes this resulting answer-and-context pair, and collapses it into a single unit expr that is the final, converted program. (You will need to be careful about the order of bindings in your context list, and the order in which they are processed by this second helper.)
(6410 students: required; 4410 students: optional) Once you’ve defined anf as above, you may notice that it produces unnecessarily-verbose output, with extra let-bindings that could be avoided. For example, 1 + 2 should not need any let-bindings, since all the arguments are immediate. Split your help function into two mutually-recursive helpers,
- helpC can return an answer that is any ANF expression
- helpI must return an answer that is an immediate expression
such that you use as few let-bindings as possible. (For what it’s worth, in my reference solution, helpC calls helpI four times and itself three times; conversely helpI calls helpC just once and itself six times.)

3.5 Compilation🔗

Enhance your compile function from Adder to compile the new expression forms in Boa. You may want to restrict its signature to only accept tagged expressions that are in A-Normal Form. Remember that the invariant for compile is that it always leaves its answer in RAX; this invariant will definitely help you!

4 Recommendations🔗

Here’s an order in which you could consider tackling the implementation:

Write some tests for the input and output of anf for nested EPrim1 expressions to understand what the ANF transformation looks like on those examples.
Finish the EIf case for compile_expr (with immediate conditions) so you can run simple programs with if. You will also need to fill in the printing of the relevant assembly instructions, so that they can be output...
Write some tests for the input and output of performing anf on if-expressions, again to get a feel for what the transformation looks like.
Work through both the anf implementation and the EPrim2 case of compile_expr. Write tests as you go.
Work through both the anf implementation and the ELet case of compile_expr. Write tests as you go.

5 Testing the Compiler🔗

The predefined testing helpers for Boa are mostly the same as for Adder; see Assignment 2 for details. I’ve added tprog, which is similar to t except it takes the filename of a Boa input file as input, rather than source text, and teprog, which is similar but expects an error to occur. There are also tanf, which tests your anf function alone, and ta, which attempts to compile a program that’s already in ANF (so that you can test your compiler independently of testing your ANF conversion).

Testing is truly important. My initial draft of a solution for this assignment had an incredibly silly typo, that was causing my program to behave in a very odd manner. I thought I could figure it out just by looking at the final assembly output. Instead I had to go back and test each individual function, at which point the error became immediately obvious.

Write several test suites, one for each major function that you’re trying to test. Sequence them together at the end of test.ml to run them all.

6 Running main🔗

Running your own programs is the same as with Adder, except you’ll give them the .boa file extension.

7 List of Deliverables🔗

your compile.ml
any additional modules you saw fit to write
tests in an OUnit test module (test.ml)
any test input programs (input/*.boa files)

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

8 Grading Standards🔗

For this assignment, you will be graded on

Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage

9 Submission🔗

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.

1To read the given parser.mly file, start from the line that says %type <(Lexing.position * Lexing.position) Exprs.expr> program, which means that “the parse result of a program should be an Exprs.expr value” (decorated with a pair of start- and end-positions), and then %start program means that the entirety of the file should be parseable as a program. The %token lines indicate the literals in the program (and, optionally, their corresponding types), and the %left line indicates that the specified infix operators are left-associative. From there, hopefully the grammar should be relatively straightforward to read. The expressions in braces on each line tell the parser “when you successfully parse the grammatical pattern on the left, construct a new value as follows...”

2We use imul rather than mul for signed multiplication. I suppose we should name the variant IImul, then, but that’s clunky and we’re never going to need mul itself anyway...

3Note: the .PRECIOUS pseudo-target ensures that the Makefile won’t delete your assembly files when it’s finished, so that you can manually inspect the results afterward for debugging purposes.

1	The Boa Language
2	Starter code for this assignment
3	Implementing a Compiler for Boa
4	Recommendations
5	Testing the Compiler
6	Running main
7	List of Deliverables
8	Grading Standards
9	Submission