The goal of this project is to practice using trees.
One important use of the tree data structure is in the intermediate representation of programming languages inside a compiler or interpreter. A typical compiler does not merely scan the source file as one giant String and come up with a compiled version directly; modern programming languages and the translation process are much too complex for that. Instead, the compiler will transform the program several times through intermediate representations: ways to represent the program that are "intermediate" in the sense that they are not the final product of the compiler. One of the first representations that the compiler uses is a parse tree.
For example, suppose we have a simple language like we looked at in class and lab:
EXPR ::= NUM | ( EXPR OP EXPR )
Now say we have an expression in this language:
((14 - 3) * ((2 + 51) / ((3 + 1) * 4)))
We can interpret (sub)expressions that fit the first production (NUM) to be leaves in a tree, and other expressions to be non-leaf nodes.
We can represent this expression, then, with the following tree.
                *
               / \
              /   \
             /     \
            -      '/'
           / \     / \
         14   3   /   \
                 /     \
                +       *
               / \     / \
              2  51   /   \
                     +     4
                    / \
                   3   1
Notice that the parentheses don't appear; they were merely used for discerning the tree structure.
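As a rough illustration of how such a tree might be held in memory, here is a minimal Java sketch. The names ExprTreeSketch, Node, num, and op are made up for this example and are not part of the provided files; the tree is built by hand rather than parsed, and printing it adds the parentheses back to show that the structure alone carries that information.

```java
// Illustrative sketch only: ExprTreeSketch and Node are hypothetical names,
// not classes from the provided course files.
class ExprTreeSketch {
    /** A node is either a leaf holding a number or an operator with two children. */
    static class Node {
        final String label;      // "14", "3", ... for leaves; "+", "-", "*", "/" otherwise
        final Node left, right;  // both null for a leaf

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        /** Writes the tree back out in the fully parenthesized source form. */
        @Override public String toString() {
            return isLeaf() ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    static Node num(int n) { return new Node(Integer.toString(n), null, null); }
    static Node op(String o, Node l, Node r) { return new Node(o, l, r); }

    /** Builds the tree for ((14 - 3) * ((2 + 51) / ((3 + 1) * 4))) by hand. */
    static Node example() {
        Node left = op("-", num(14), num(3));
        Node right = op("/", op("+", num(2), num(51)),
                             op("*", op("+", num(3), num(1)), num(4)));
        return op("*", left, right);
    }

    public static void main(String[] args) {
        // prints ((14 - 3) * ((2 + 51) / ((3 + 1) * 4)))
        System.out.println(example());
    }
}
```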
In some simple interpreters, trees of some sort can be used as the final representation. The program can be interpreted during a traversal of the trees.
In this program you will write an interpreter for a simple language. This language will have one-letter (lowercase) variables, and statements and expressions will be made using the following grammar:
    STMT ::= VAR = EXPR
    EXPR ::= NUM | VAR | (EXPR OP EXPR)
    OP   ::= + | - | * | /
Each line of the program will be a statement, except for the last line, which will be a single variable. The output of the program will be the value of the variable in the last line. For example, we could have a program
    t = (32 + 21)
    a = (614 / (t * ((12 + t) - 45)))
    b = (a - t)
    b
With integer division, the output of the program would be -53 (t is 53, a is 614 / 1060 = 0, and b is 0 - 53).
After making the directory for this project, copy the following files from the course public directory:
    cp /homeemp/tvandrun/pub/245/ProcessFile.java .
    cp /homeemp/tvandrun/pub/245/Interpreter.java .
The first file is a class I wrote that will "tokenize" a program in the source language; specifically it will chop up the lines into Strings containing symbols, numbers, or variable names. This is for your convenience; you may decide to modify this file (or, I suppose, not use it at all). The second file contains a skeleton of the interpreter program, which you will complete.
The interpreter has four phases: tokenization (technically called "lexical analysis"), tree building (parsing or syntactical analysis), tree optimization, and tree interpretation.
The tokenization is already done for you (though, as mentioned above, you may decide to modify this).
In the next step, you will need to turn each line into a tree. More accurately, each line is a statement, and a statement has two things: a target variable and an expression tree (or the root node of an expression tree). You will need to design classes to hold this, something like
    public class Statement {
        String targetVariable;
        Tree expr;
    }

    public class Tree {
        Node root;
    }
The entire program, then, may be something like a Vector of these Statements. This phase is about building trees. You need to think about how to do this.
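One possible way to approach tree building is recursive descent, where the parsing method mirrors the EXPR production directly. The sketch below is only an illustration under assumptions: it supposes the tokens for an expression arrive as a list of Strings such as "(", "a", "-", "3", ")", which may not match what ProcessFile actually produces, and the names ParserSketch, Node, parseExpr, and parse are invented here. It also relies on the input being correct, so it does no error checking.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: ParserSketch and Node are hypothetical names.
// The token format (one String per symbol, number, or variable) is an assumption.
class ParserSketch {
    static class Node {
        final String label;      // number, variable name, or operator symbol
        final Node left, right;  // both null for a leaf

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        @Override public String toString() {
            return isLeaf() ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    private final List<String> tokens;
    private int pos = 0;         // index of the next unread token

    ParserSketch(List<String> tokens) { this.tokens = tokens; }

    /** EXPR ::= NUM | VAR | ( EXPR OP EXPR ) */
    Node parseExpr() {
        String tok = tokens.get(pos++);
        if (!tok.equals("(")) {
            return new Node(tok, null, null);   // NUM or VAR: a leaf
        }
        Node left = parseExpr();                // first operand
        String op = tokens.get(pos++);          // operator
        Node right = parseExpr();               // second operand
        pos++;                                  // skip the closing ")"
        return new Node(op, left, right);
    }

    /** Convenience wrapper for parsing one expression from a token sequence. */
    static Node parse(String... toks) {
        return new ParserSketch(Arrays.asList(toks)).parseExpr();
    }

    public static void main(String[] args) {
        // prints (a - (3 * t))
        System.out.println(parse("(", "a", "-", "(", "3", "*", "t", ")", ")"));
    }
}
```

For a whole statement, the first token would be the target variable and the second the "=", with the remaining tokens handed to something like parseExpr.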
In the optimization phase, you will look for inefficiencies in the trees. This phase is about comparing and modifying trees. Specifically, you should change them according to the following arithmetic identities, where tr stands for some subexpression/subtree:

    tr + tr             translates to  tr * 2
    tr + 0  and  0 + tr  translate to  tr
    tr * 1  and  1 * tr  translate to  tr
    tr - 0  and  tr / 1  translate to  tr
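The identities above could be applied in a single bottom-up pass, sketched below in Java. This is one possible shape, not a required design: OptimizerSketch, Node, sameAs, isNum, and simplify are hypothetical names, and the sketch builds new nodes rather than modifying the tree in place. The tr + tr rule needs a structural comparison of two subtrees, which is what sameAs does.

```java
// Illustrative sketch only: all names here are hypothetical.
class OptimizerSketch {
    static class Node {
        final String label;      // number, variable name, or operator symbol
        final Node left, right;  // both null for a leaf

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        /** Structural equality: same labels in the same positions. */
        boolean sameAs(Node o) {
            if (!label.equals(o.label)) return false;
            if (isLeaf() || o.isLeaf()) return isLeaf() && o.isLeaf();
            return left.sameAs(o.left) && right.sameAs(o.right);
        }

        @Override public String toString() {
            return isLeaf() ? label : "(" + left + " " + label + " " + right + ")";
        }
    }

    static Node leaf(String s) { return new Node(s, null, null); }
    static Node op(String o, Node l, Node r) { return new Node(o, l, r); }

    /** True if n is a leaf holding exactly the number v. */
    static boolean isNum(Node n, int v) {
        return n.isLeaf() && n.label.equals(Integer.toString(v));
    }

    /** Simplifies the children first, then checks the identities at this node. */
    static Node simplify(Node n) {
        if (n.isLeaf()) return n;
        Node l = simplify(n.left);
        Node r = simplify(n.right);
        switch (n.label) {
            case "+":
                if (isNum(l, 0)) return r;                         // 0 + tr
                if (isNum(r, 0)) return l;                         // tr + 0
                if (l.sameAs(r)) return op("*", l, leaf("2"));     // tr + tr
                break;
            case "-":
                if (isNum(r, 0)) return l;                         // tr - 0
                break;
            case "*":
                if (isNum(l, 1)) return r;                         // 1 * tr
                if (isNum(r, 1)) return l;                         // tr * 1
                break;
            case "/":
                if (isNum(r, 1)) return l;                         // tr / 1
                break;
        }
        return op(n.label, l, r);   // no identity applies at this node
    }

    public static void main(String[] args) {
        Node e = op("+", op("*", leaf("a"), leaf("1")), op("-", leaf("a"), leaf("0")));
        System.out.println(simplify(e));   // prints (a * 2)
    }
}
```

Because the children are simplified before the parent is examined, ((a * 1) + (a - 0)) first becomes (a + a) and then (a * 2). A node produced by a rewrite is not examined again in this sketch, so (1 + 1) becomes (1 * 2) rather than 2; whether to repeat the pass is a design decision left open here.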
The final phase is about traversing trees. Here you will need to interpret the program by doing a depth-first post-order traversal of the expression trees and storing the results in appropriate variables (use a HashMap for this?).
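A post-order traversal here means evaluating both children of an operator node before applying the operator. The Java sketch below shows one way that might look, with a HashMap from variable names to values. The names EvalSketch, Node, eval, and run are hypothetical, the trees are built by hand rather than parsed, and the two-statement program in run is a made-up example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical.
class EvalSketch {
    static class Node {
        final String label;      // number, variable name, or operator symbol
        final Node left, right;  // both null for a leaf

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    static Node leaf(String s) { return new Node(s, null, null); }
    static Node op(String o, Node l, Node r) { return new Node(o, l, r); }

    /** Post-order evaluation: left subtree, right subtree, then the operator. */
    static int eval(Node n, Map<String, Integer> env) {
        if (n.isLeaf()) {
            if (Character.isLowerCase(n.label.charAt(0))) {
                return env.get(n.label);            // variable: look up its value
            }
            return Integer.parseInt(n.label);       // number literal
        }
        int l = eval(n.left, env);
        int r = eval(n.right, env);
        switch (n.label) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            case "/": return l / r;                 // Java int division truncates toward zero
            default: throw new IllegalArgumentException("unknown operator: " + n.label);
        }
    }

    /** Runs the made-up program  x = (7 + 5)  y = ((x * 2) - 4)  y  and returns y. */
    static int run() {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", eval(op("+", leaf("7"), leaf("5")), env));
        env.put("y", eval(op("-", op("*", leaf("x"), leaf("2")), leaf("4")), env));
        return env.get("y");
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints 20
    }
}
```

Each statement is evaluated in order and its result stored under the target variable, so later statements can look up earlier results.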
In all this, you may assume the input programs are correct (including that no variable is used before it is initialized). All values in the language are integers.
Copy all the files you made or modified to a turn-in directory I've made for you.
    cp filename /homeemp/tvandrun/turnin/245/{jessica,sam,chet,jan}
DUE: Thursday, Mar 29, 5:00 pm.