Assignment 3: Boa:   Adding new operators
1 The Boa Language
1.1 Concrete Syntax
1.2 Abstract Syntax
1.3 Semantics
2 Starter code for this assignment
3 Implementing a Compiler for Boa
3.1 Checking for scoping problems
3.2 Tagging
3.3 Renaming
3.4 Converting to A-Normal Form
3.5 Compilation
4 Recommendations
5 Testing the Compiler
6 Running main
7 List of Deliverables
8 Grading Standards
9 Submission
8.12

Assignment 3: Boa: Adding new operators🔗

Due: Thursday 02/01 at 11:59pm

git clone

In this compiler, you’ll enhance your existing work with Binary Operators and Arithmetic.

Reminder: Test names cannot have spaces; this is due to the way the Makefile relies on test names being used for filenames.

1 The Boa Language🔗

1.1 Concrete Syntax🔗

The concrete syntax of Boa is:1To read the given parser.mly file, start from the line that says %type <(Lexing.position * Lexing.position) Exprs.expr> program, which means that “the parse result of a program should be an Exprs.expr value” (decorated with a pair of start- and end-positions), and then %start program means that the entirety of the file should be parseable as a program. The %token lines indicate the literals in the program (and, optionally, their corresponding types), and the %left line indicates that the specified infix operators are left-associative. From there, hopefully the grammar should be relatively straightforward to read. The expressions in braces on each line tell the parser “when you successfully parse the grammatical pattern on the left, construct a new value as follows...”

‹expr›: | let ‹bindings› in ‹expr› | if ‹expr› : ‹expr› else: ‹expr› | ‹binop-expr› ‹binop-expr›: | NUMBER | IDENTIFIER | add1 ( ‹expr› ) | sub1 ( ‹expr› ) | ‹expr› + ‹expr› | ‹expr› - ‹expr› | ‹expr› * ‹expr› | ( ‹expr› ) ‹bindings›: | IDENTIFIER = ‹expr› | IDENTIFIER = ‹expr› , ‹bindings›

As in Adder, a Let can have one or more bindings.

1.2 Abstract Syntax🔗

type prim1 =
  | Add1
  | Sub1

type prim2 =
  | Plus
  | Minus
  | Times

type 'a bind =
  (* the 3rd component is any information specifically about the identifier, like its position *)
  (string * 'a expr * 'a)

and 'a expr =
  | ELet of 'a bind list * 'a expr * 'a
  | EPrim1 of prim1 * 'a expr * 'a
  | EPrim2 of prim2 * 'a expr * 'a expr * 'a
  | EIf of 'a expr * 'a expr * 'a expr * 'a
  | ENumber of int * 'a
  | EId of string * 'a

1.3 Semantics🔗

In addition to the semantics of Adder, we now have infix binary operators (addition, subtraction and multiplication), that are evaluated leftmost-innermost first (i.e., the standard left-to-right order that obeys parentheses), and conditional expressions. An If expression evaluates its condition, then evaluates its then-branch if the condition is non-zero, and evaluates its else-branch if the condition was zero.

To compile these expressions, we need a few more assembly instructions:

type instruction = ...
  | ISub of arg * arg
  | IMul of arg * arg
  | ILabel of string
  | ICmp of arg * arg
  | IJne of string
  | IJe of string
  | IJmp of string

The sub and imul instructions are analogous to add, that take two arguments, apply their respective operations, and place their results in RAX.2We use imul rather than mul for signed multiplication. I suppose we should name the variant IImul, then, but that’s clunky and we’re never going to need mul itself anyway... Labels let us name the first of a sequence of instructions, akin to how we label our_code_starts_here: to begin our code. The cmp instruction compares its two arguments, and sets a processor flag to EQUAL or NOT-EQUAL accordingly. This flag is something separate from any registers we’ve worked with so far, and doesn’t have an explicit name. Instead, we can test the value of this flag with the jump instructions: jne will jump control to the named label if the flag is NOT-EQUAL, and je will jump control to the named label when the flag is EQUAL. Finally, jmp will unconditionally jump to the named label.

2 Starter code for this assignment🔗

You’ve been given a starter codebase that has several pieces of infrastructure:

All of your edits — which will be to write the compiler for Boa, and test it — will happen in test.ml and compile.ml.

3 Implementing a Compiler for Boa🔗

Again, the primary task of writing the Boa compiler is simple to state: take an instance of the expr datatype and turn it into a list of assembly instructions. But since we now have more complicated expressions, we need to worry about simplifying the program, first.

3.1 Checking for scoping problems🔗

In Adder, we asked you to confirm that no Let expression bound two identifiers with the same name, and to confirm that no expression referred to an unbound identifier. You likely interspersed that code with the compiler itself. While this was fine for Adder, let’s refactor this code into something more maintainable.

  1. Define the function check_scope (e : (Lexing.position * Lexing.position) expr) : unit that abstracts out these two checks. This function returns nothing of any consequence; it just raises any appropriate BindingErrors as its only side-effect. (Hint: you may find the function ignore : 'a -> unit to be useful.)

3.2 Tagging🔗

Let type tag = int.

  1. Design (and test!) a function tag : 'a expr -> tag expr that gives every expression and let-binding a unique numerical id.

    Hint: you will likely want to define a helper function of type 'a expr -> tag -> (tag expr * tag) that takes an expression and the starting tag to use, and returns both the tagged expression and the next available tag for use. You may also want to write a function untag : 'a expr -> unit expr that strips out whatever information is decorating the expression; this is most useful for writing small test cases.

3.3 Renaming🔗

As described in the lecture notes, ANF transformations can inadvertently introduce new shadowing errors.

  1. Design (and test!) a function rename : tag expr -> tag expr that uniquely renames all names in the program such that there are no longer any shadowing collisions, and all name bindings are unique.

    Hint: you may pick any naming convention to produce new unique names; a convenient one may be sprintf "%s#%d" name tag, where name is the original name, and tag is the output of your previous function for this node.

    Hitnt: you will need a helper function that takes in an environment of renamings from original names to new unique names, which you will update in the ELet case and use in the EId case. Be careful about the ordering of this list, and about the ordering of the bindings you produce when transforming the ELet!

3.4 Converting to A-Normal Form🔗

A-Normal Form asserts that throughout a program, any function-call or operator expression contains arguments that are immediate: that is, are numbers or identifiers, and therefore don’t perform any computation of their own. Additionally, we can think of the decision in an If expression to be an operation, so we also need to ensure that the condition is immediate. The following function is a reliable predicate for checking this property:

let rec is_anf (e : 'a expr) : bool =
  match e with
  | EPrim1(_, e, _) -> is_imm e
  | EPrim2(_, e1, e2, _) -> is_imm e1 && is_imm e2
  | ELet(binds, body, _) ->
     List.for_all (fun (_, e, _) -> is_anf e) binds
     && is_anf body
  | EIf(cond, thn, els, _) -> is_imm cond && is_anf thn && is_anf els
  | _ -> is_imm e
and is_imm e =
  match e with
  | ENumber _ -> true
  | EId _ -> true
  | _ -> false
;;

  1. Design (and test!) a function anf : tag expr -> unit expr that takes any tagged expression and produces a new expression that is in A-Normal Form. Because this program will include some new expressions, we’re willing to discard any decorations we had on the expression, which explains why the input could be decorated with tag information, but the output will just be decorated with unit values. You will very likely need to write one or more helper functions.

    There are several ways to define this function. The simplest is to start with a helper help : tag expr -> (unit expr * (string * unit expr) list) that takes an expression and returns

    • The final answer of the program

    • A list of bindings that form the context that need to run as “setup,” in which the answer makes sense.

    All of these expressions (the answer and the bound values) must all be in A-Normal Form.

    When you need to generate fresh names (i.e., unique names that aren’t used anywhere in the expression), generate names of the form "$reason_###", where reason is “prim1”, “prim2”, “if”, etc., and the numbers come from the expression’s tag.

    Then, define a helper function that takes this resulting answer-and-context pair, and collapses it into a single unit expr that is the final, converted program. (You will need to be careful about the order of bindings in your context list, and the order in which they are processed by this second helper.)

  2. (6410 students: required; 4410 students: optional) Once you’ve defined anf as above, you may notice that it produces unnecessarily-verbose output, with extra let-bindings that could be avoided. For example, 1 + 2 should not need any let-bindings, since all the arguments are immediate. Split your help function into two mutually-recursive helpers,

    • helpC can return an answer that is any ANF expression

    • helpI must return an answer that is an immediate expression

    such that you use as few let-bindings as possible. (For what it’s worth, in my reference solution, helpC calls helpI four times and itself three times; conversely helpI calls helpC just once and itself six times.)

3.5 Compilation🔗

  1. Enhance your compile function from Adder to compile the new expression forms in Boa. You may want to restrict its signature to only accept tagged expressions that are in A-Normal Form. Remember that the invariant for compile is that it always leaves its answer in RAX; this invariant will definitely help you!

4 Recommendations🔗

Here’s an order in which you could consider tackling the implementation:

5 Testing the Compiler🔗

The predefined testing helpers for Boa are mostly the same as for Adder; see Assignment 2 for details. I’ve added tprog, which is similar to t except it takes the filename of a Boa input file as input, rather than source text, and teprog, which is similar but expects an error to occur. There are also tanf, which tests your anf function alone, and ta, which attempts to compile a program that’s already in ANF (so that you can test your compiler independently of testing your ANF conversion).

Testing is truly important. My initial draft of a solution for this assignment had an incredibly silly typo, that was causing my program to behave in a very odd manner. I thought I could figure it out just by looking at the final assembly output. Instead I had to go back and test each individual function, at which point the error became immediately obvious.

Write several test suites, one for each major function that you’re trying to test. Sequence them together at the end of test.ml to run them all.

6 Running main🔗

Running your own programs is the same as with Adder, except you’ll give them the .boa file extension.

7 List of Deliverables🔗

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

8 Grading Standards🔗

For this assignment, you will be graded on

9 Submission🔗

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.

1To read the given parser.mly file, start from the line that says %type <(Lexing.position * Lexing.position) Exprs.expr> program, which means that “the parse result of a program should be an Exprs.expr value” (decorated with a pair of start- and end-positions), and then %start program means that the entirety of the file should be parseable as a program. The %token lines indicate the literals in the program (and, optionally, their corresponding types), and the %left line indicates that the specified infix operators are left-associative. From there, hopefully the grammar should be relatively straightforward to read. The expressions in braces on each line tell the parser “when you successfully parse the grammatical pattern on the left, construct a new value as follows...”

2We use imul rather than mul for signed multiplication. I suppose we should name the variant IImul, then, but that’s clunky and we’re never going to need mul itself anyway...

3Note: the .PRECIOUS pseudo-target ensures that the Makefile won’t delete your assembly files when its finished, so that you can manually inspect the results afterward for debugging purposes.