Lab 14: Parse trees

Finish lab 13 before starting this one.

The goal of this lab is to practice writing operations for binary trees, as an application of and variation on linked lists.

1. Set up

As usual, clone the repository for this lab and then cd into it.

hg clone /cslab/class/csci235/lab14
cd lab14

Note that because it is a repository, you should feel free to commit your changes at any point that you deem appropriate.

2. Introduction

In class we have looked at lists, where each node has one node following it. We can use links to make more general structures, of which the tree is a particularly useful kind. In a tree, a node can have multiple children; the case of at most two children is called a binary tree. If we draw a picture with each node above its children, we get something like this:

        1
      /   \
     2     3
    / \   / \
   4   5 6   7

We call the node at the top, which has no parent, the root, and the nodes at the bottom, that have no children, are called leaves. (Yes, computer scientists grow trees upside down.)

Change into the tree subdirectory and look at the code there. You'll see that there are two classes, TreeNode and TreeDriver. Look at the main method in TreeDriver to see how it builds the tree drawn above. (Note that, in general, either child of a node might be null.) Note that the node 2 is the left child of the node 1, but it is also the root of a subtree made up of it and its children (and its children's children...).

Trees are natually traversed in a recursive manner. Fill in the method count() so that it tells you how many nodes there are in the subtree rooted at the node on which it is called. Note that the number of nodes in a tree is 1 (for the node) plus the number of nodes in its left subtree, plus the number of nodes in its right subtree.

When walking a binary tree recursively, you have three choices for when to process a node relative to its children. If you do the node before the children, it is called preorder; if you do it after the children, it is called postorder; and if you do a node between its children, it is called inorder. Fill in the three display methods so that they print out each node's datum in the appropriate order. (You can print them on one line, but put a space between them.)

Finally, add a few more nodes to the construction of the tree so that there are nodes that have only a left child or only a right child. Make a script of your program running.

3. Parsing

Change into the other directory for this lab; from the tree directory, you can do that with

cd ../parse

(Did you notice that directories for a tree, and that .. names the parent?)

One of the files there is ExprStringSlicer.class, which saves you some work. It would be a good idea if you make it harder to discard, which you can do by taking away your own permission to write it. The command to do that is

chmod u-w ExprStringSlicer.class
(A brief explanation: "chmod" means "change mode"; "mode" is another name for permissions. The next part "u-w" says to take away write (w) permission from the user (u) who owns the file--which is you.)

One use for trees is in the grammatical (or syntactical) analysis (or "parsing") of languages, both human languages and programming languages. As a simple example, the following grammar describes the language of fully parenthesized arithmetic expressions:

expression --> integer | ( expression op expression)

op --> + | - | / | *

This means "An expression is either a single integer, or it is a left parenthesis, a (sub-)expression, an operator, another (sub-)expression, and a right parenthesis. An operator is a plus, a minu, a slash, or a star." According to this grammar, the following are expressions:

  • 15
  • (42 - 17)
  • ((18 * 10) / 147)
  • ((14 * 91) + (22 - 17))

    These can be represented as trees. (We can ignore the parentheses now, since the information they give about grouping is contained in the structure of the tree itself.)

    15      -        /           +
           / \      / \        /   \
         42   17   *   147    *     -
                  / \        / \   / \
                18   10    14  91 22  17
    

    Expression can be modelled by a node-like interface with two implementing classes: one for leaf nodes containing integers, and one for nodes containing an operator and having two children. Evaluation of the expressions can be performed with a depth-first post-order traversal of the tree.

    3a. ExprNode classes

    Your first task is to design classes that model the two kinds of nodes, implementing the ExprNode interface. These nodes will have to have evaluate() methods, but we'll deal with that later.

    So, write two classes, one for leaf/integer nodes, the other for non-leaf/operation nodes. What instance variables will they have?

    (So that your repository knows about the new files, you will want to either use hg copy to make them as copies of ExprNode.java or hg add if you make them some other way.)

    3b. Building expression trees

    Next, you will need to write code to build an expression tree given a String containing a fully-parenthesized expression. The driver program Interpreter is intended to work so that the user can write a expression in the command-line, and the program will build an appropriate expression tree, and then evaluate the expression using the tree. The main method in Interpreter expects the method parse() to turn a string into the appropriately constructed tree of the node classes that you wrote above.

    One thing you will find useful is a method Dr. Van Drunen wrote and included in the code you copied. In the class ExprStringSlicer, the static method slice() takes a String, assumed to contain an expression in our grammar. It will return an array of Strings: if the String passed to it is just an integer, then it will return an array with exactly one element, a String version of that integer; if it is passed a String with a parenthesized operation over subexpressions, it will return an array with three elements, the first subexpression, the operator, and the second subexpression. For example, given "5", it will return { "5" }. Given "((14 * 91) + (22 - 17))", it will return { "(14 * 91)", "+", "(22 - 17")}.

    (Extra credit: Do not use the provided ExprStringSlicer class. Write your own slice() method.)

    Note that ExprStringSlicer.slice() will thrown one of several unchecked exceptions if you hand it a malformed string; the most likely is StringIndexOutOfBoundsException. It is reasonable for your constructors to throw similar exceptions (e.g., NumberFormatException) if the arguments passed to them are not well-formed.

    When you test out your program, you will need to put quotes around the string you give to Interpreter. For example, you could run the program with

    java Interpreter "((14 * 91) + (22 - 17))"
    

    The quotes tell the shell to pass ((14 * 91) + (22 - 17)) to Java as a single String (it will be the zeroth item in args). Otherwise every space would be assumed to begin and end a separate String; furthermore, the parentheses would really mess things up.

    3c. Evaluating expression trees

    Finally, the interpreter calls the evaluate() method on the trees you have built. Write the evaluate methods so that they return the integer value of the given expression.

    Create a script file in which you remove the .class files for your sources (do not remove ExprStringSlicer.class!) and then compile and run your program for several different expressions.

    4. Turn in

    To turn in all of your files, get in the lab14 directory (the one containing directories tree and parse) and hand in the entire directory with
    /cslab/class/csci235/bin/handin lab14 .
    
    Note the dot, which refers to the current directory.
    Thomas VanDrunen, Cary Gray
    Last modified: Tue Nov 8 11:23:49 CST 2011