Project: AVL Trees

1. Introduction to this and the following two projects

[The following talks about "three projects." Only the first two of the three are assigned projects to be turned in. The third (on left-leaning red-black trees) is recommended for practice, but does not need to be turned in.]

The goal of these next three projects (AVL trees, traditional red-black trees, and left-leaning red-black trees) is to understand how these three balanced binary search tree strategies work by implementing them. The three projects will use the same code base, and in some ways can be seen as a single, big project. Splitting them up into three projects, however, will help you spread out the work on them; for example, you are encouraged to start work on the AVL tree project (described here) after we learn AVL trees rather than waiting until we learn all the varieties of trees.

One part that is missing from this series of projects is an experimental section. I haven't developed that part enough yet. You are encouraged to write experiments to compare the running time for your own learning. (Caution: you'll probably find that you need fairly large amounts of data to see the effects.)

2. Set-up of this and the following two projects

As mentioned before, the code base is the same for all three projects, and this section will give an overview of the whole code. Copy the given code from ~tvandrun/Public/cs345/bal-tree and make an Eclipse project for it. As usual, you will find adt, impl, and test packages.

The most important part of the adt package is the Map interface, which has a slight modification from the previous versions we've seen: the remove() method signature has (ironically) been removed.

Because the various kinds of balanced BSTs have a fair amount of code in common, we have a complicated type hierarchy:

BasicIterativeBSTMap does not share any code with the other types, since it takes a completely different approach. In fact, it's not even included with the given code of this project, though we have seen it in class.

The abstract class AbstractRecursiveBSTMap contains all the code for manipulating a binary search tree except for anything that would verify that the tree meets the properties of various balancing strategies and any code that would fix up a tree that is out of balance. That is deferred to the child classes. The child class BasicRecursiveBSTMap implements verification and fixup by... doing nothing.

The abstract classes AbstractAVLBSTMap, AbstractRedBlackTreeMap, and AbstractLLRedBlackTreeMap contain code for verifying the properties of AVL trees, general red-black trees, and left-leaning red-black trees, respectively. The classes AVLBSTMap, TraditionalRedBlackTreeMap, and LLRedBlackTreeMap each provide the fix-up code---or they will, once you finish them, since that is your task in the three projects.

The reason for separating the verification and fixup code into different levels of the class hierarchy is to prevent students from submitting code that is wrong (doesn't rebalance properly) but appears correct (because the code to check if the trees are balanced is wrong). In the setup as given, you will modify and submit the files that do the fix-up, not the files that do the verification. Accordingly, you should not modify the verification code found in the abstract classes or, if you do, know that your changes will not be used in grading your project.

However, since these are implemented recursively in the nodes, the type hierarchy for nodes is just as important and even more complicated:

This mainly mirrors the type hierarchy for the tree classes, but with another dimension: The trees are not going to have actual null references, since that would require extra checks every time we use a link. Instead, "null" links will be references to special objects called null nodes. The advantage is that these objects can respond to the same methods as real nodes. Hence for every kind of tree, we have both a null node class and a "real" node class.
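The null-node idea can be sketched in a few lines. This is only an illustration: the names SketchNode, SketchNullNode, and SketchRealNode are made up for this example and are not the classes in the given code, and the choice that an empty subtree has height 0 is an assumption of the sketch.

```java
// Hypothetical, simplified types; the project's real node classes differ.
interface SketchNode {
    int height();        // height of the subtree rooted at this node
    int countLeaves();   // number of leaves in that subtree
    boolean isNull();
}

// Stands in for a null link, but still answers every method.
final class SketchNullNode implements SketchNode {
    public int height() { return 0; }       // assumption: empty subtree has height 0
    public int countLeaves() { return 0; }
    public boolean isNull() { return true; }
}

final class SketchRealNode implements SketchNode {
    final String key;
    SketchNode left, right;

    SketchRealNode(String key, SketchNode nil) {
        this.key = key;
        this.left = nil;    // links start out pointing at the null node
        this.right = nil;
    }

    // No "if (left != null)" checks: the null node handles the base case.
    public int height() { return 1 + Math.max(left.height(), right.height()); }

    public int countLeaves() {
        if (left.isNull() && right.isNull()) return 1;
        return left.countLeaves() + right.countLeaves();
    }

    public boolean isNull() { return false; }
}

class NullNodeDemo {
    public static void main(String[] args) {
        SketchNode nil = new SketchNullNode();
        SketchRealNode root = new SketchRealNode("m", nil);
        root.left = new SketchRealNode("c", nil);
        System.out.println(root.height());       // prints 2
        System.out.println(root.countLeaves());  // prints 1
    }
}
```

Notice that the recursive methods in the real node never test a link against null; the base cases live entirely in the null node class.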

Take some time to understand how AbstractRecursiveBSTMap and its node classes are set up and how their code works. In the node classes in particular, notice realHeight(), countLeaves(), and totalDepth(), which compute simple statistics about the trees. verify() checks that the tree meets certain conditions, which will be different for each kind of tree. Look carefully at how AbstractNullNode implements these things.

AbstractRealNode, on the other hand, has an additional method signature called fixup(). Most of your work in each of these three projects will be writing implementations for this, to rebalance the tree when it is in violation of the balance properties.

3. AVL trees

Turn your attention to AbstractAVLBSTMap. The interface AVLNode defines some additional operations for the nodes of AVL trees. AVL tree nodes will store information about the size, height, and balance of the subtree rooted at that node. Note that "balance" is defined as an integer which is the left height minus the right height. Thus if the subtrees have the same height, balance is zero. That doesn't mean the subtree is perfectly balanced, since the left and right subtrees might themselves be off balance. But it means that the two subtrees are not out of balance with respect to each other.

These attributes could be computed on demand, recursively. However, that would require traversing the whole (sub-)tree, which would kill performance---it would defeat the purpose of using binary search trees. So instead we store that pre-computed information in the nodes. However, that information could become out of date when an insertion is made or when the tree rotates. We'll need to recompute those values. But even then, we don't want to traverse the whole tree; we recompute a node's height, size, and balance by assuming the node's children's values are correct and recomputing based on those values (for example, subtracting the children's heights to get a new balance value). We'll call that a soft recompute.
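As a rough illustration of a soft recompute, here is a sketch on a stripped-down node. The class SoftNode and its fields are hypothetical, not the project's AVLNode; for brevity it uses plain null for a missing child (the given code uses null nodes instead) and assumes an empty subtree has height 0 and size 0.

```java
// Hypothetical simplified node; names are illustrative only.
class SoftNode {
    SoftNode left, right;   // null means "no child" in this sketch only
    int height = 1;         // a lone node has height 1 (assumption of this sketch)
    int size = 1;
    int balance = 0;

    static int heightOf(SoftNode n) { return n == null ? 0 : n.height; }
    static int sizeOf(SoftNode n) { return n == null ? 0 : n.size; }

    // Soft recompute: constant time, trusts the values stored in the children.
    void softRecompute() {
        height = 1 + Math.max(heightOf(left), heightOf(right));
        size = 1 + sizeOf(left) + sizeOf(right);
        balance = heightOf(left) - heightOf(right);   // left height minus right height
    }
}

class SoftRecomputeDemo {
    public static void main(String[] args) {
        // Build a chain a -> b -> c down the right side, recomputing bottom-up.
        SoftNode a = new SoftNode(), b = new SoftNode(), c = new SoftNode();
        b.right = c;
        b.softRecompute();
        a.right = b;
        a.softRecompute();
        System.out.println(a.height + " " + a.size + " " + a.balance);  // prints 3 3 -2
    }
}
```

The recompute only gives the right answer for a if b was recomputed first, which is why the order of recomputation matters after an insertion or rotation.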

By contrast, a hard recompute is when we traverse the whole tree and recompute all the attributes of all the nodes, brute-force. We would do this only when debugging. When running for performance, we would never need nor want to do this.
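For comparison, a hard recompute might look something like the following sketch. HardNode is again a hypothetical stand-in for the project's node classes, with null for a missing child and height 0 assumed for an empty subtree.

```java
// Hypothetical simplified node; names are illustrative only.
class HardNode {
    HardNode left, right;   // null means "no child" in this sketch only
    int height, size, balance;   // may hold stale values before recomputing

    // Hard recompute: visits every node in the subtree, so it takes linear time.
    // Returns the recomputed height of the subtree rooted at n.
    static int hardRecompute(HardNode n) {
        if (n == null) return 0;
        int leftHeight = hardRecompute(n.left);    // children first,
        int rightHeight = hardRecompute(n.right);  // so their fields are fresh below
        n.height = 1 + Math.max(leftHeight, rightHeight);
        n.size = 1 + (n.left == null ? 0 : n.left.size)
                   + (n.right == null ? 0 : n.right.size);
        n.balance = leftHeight - rightHeight;
        return n.height;
    }
}

class HardRecomputeDemo {
    public static void main(String[] args) {
        HardNode root = new HardNode();
        root.left = new HardNode();
        root.left.left = new HardNode();
        root.height = 42;   // deliberately stale
        HardNode.hardRecompute(root);
        System.out.println(root.height + " " + root.size + " " + root.balance);  // prints 3 3 2
    }
}
```

Unlike the soft version, this does not trust anything stored in the nodes, which is what makes it useful for debugging and too slow for normal operation.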

The verify() method soft-recomputes and then checks that each node (recursively) has a balance no more than one away from zero. Otherwise an ImbalanceException is thrown, with a message that will hopefully help with debugging.

The only thing left undone in AbstractAVLBSTMap and its node classes is fixup(), which is triggered by put() in AbstractRecursiveBSTMap.AbstractRealNode. That's...

4. Your task

Write the body of AVLBSTMap.AVLRealNode.fixup(). This is a hard task; my solution took around 60 lines of code, and there's no other way to do this than work through the details of the various cases. Here's a way to organize it:

This method needs to return a node, which will take the place of the node on which it is called. If no rotation is necessary, the node on which it is called is its own "replacement". That's why replacement is initialized to this.
Is the balance negative and worse than -1? Then
- Determine whether to do a right-left rotation.
- Whether or not you do a right-left rotation, check whether you should (instead or additionally) do a right-right rotation.
Do something analogous but mirror-imaged if the balance is positive and worse than 1.
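To make the idea of a "replacement" node concrete, here is a sketch of one single rotation on a stripped-down node. It is not a solution to the task: RotNode and its methods are hypothetical, it uses plain null instead of null nodes, it tracks only height, and it handles just the simple right-heavy case (what the outline above appears to call a right-right rotation), with no case analysis and no double rotation.

```java
// Hypothetical simplified node; not the project's AVLRealNode.
class RotNode {
    final int key;
    RotNode left, right;   // null means "no child" in this sketch only
    int height = 1;

    RotNode(int key) { this.key = key; }

    static int heightOf(RotNode n) { return n == null ? 0 : n.height; }

    void softRecompute() { height = 1 + Math.max(heightOf(left), heightOf(right)); }

    int balance() { return heightOf(left) - heightOf(right); }

    // Rotates the right child up into this node's position.
    // Precondition: right is not null. Returns the replacement for this node.
    RotNode rotateLeft() {
        RotNode oldRight = right;   // will become the root of this subtree
        right = oldRight.left;      // its left subtree moves under this node
        oldRight.left = this;
        softRecompute();            // this node is now lower, so recompute it first
        oldRight.softRecompute();
        return oldRight;
    }
}

class RotationDemo {
    public static void main(String[] args) {
        // Chain 1 -> 2 -> 3 down the right side: balance at the top is -2.
        RotNode a = new RotNode(1), b = new RotNode(2), c = new RotNode(3);
        b.right = c;
        b.softRecompute();
        a.right = b;
        a.softRecompute();
        System.out.println(a.balance());    // prints -2
        RotNode replacement = a.rotateLeft();
        System.out.println(replacement.key + " " + replacement.balance());  // prints 2 0
    }
}
```

The caller is responsible for storing the returned node wherever the old one was linked. The real fixup() additionally has to decide which case applies, deal with null nodes and casts, and keep size and balance up to date.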

Other hints:

I used variables with names like oldRight and oldLeftRight to keep track of nodes in relation to this while doing rotations.
Don't be afraid to use assert. If you think something has to be a certain way (balance is in or outside a certain range, a node has to have a certain type), then assert it.
Node references will be AVLNode, which means the objects they refer to could be AVLNullNode. This makes a difference because you'll need to get at a node's left and right, but AVLNullNode doesn't have them. If you think you need to get a node's left and right, then you had better be sure it is not a null node. In that case, cast it. Make a new variable of the type AbstractAVLRealNode, and then you can get at its left and right. For example:
```
        AbstractAVLRealNode oldLeft = (AbstractAVLRealNode) left;
        AbstractAVLRealNode oldLeftRight = (AbstractAVLRealNode) (oldLeft.right);
```
Make sure you understand how casting works! This doesn't change any node object to be a different type, and this doesn't make any new objects. This merely asserts that left is an AbstractAVLRealNode and uses it as such. This won't work (i.e., it will throw a ClassCastException) if the object left refers to is not an instance of a subtype of AbstractAVLRealNode.
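The behavior of casts described here can be checked with a small standalone experiment. The types CastNode, CastNullNode, and CastRealNode are hypothetical stand-ins for the project's node types, used only for this demonstration.

```java
// Hypothetical stand-ins for the project's node types.
interface CastNode { }

class CastNullNode implements CastNode { }

class CastRealNode implements CastNode {
    CastNode left = new CastNullNode();
    CastNode right = new CastNullNode();
}

class CastDemo {
    // Returns true if casting n to CastRealNode throws a ClassCastException.
    static boolean castFails(CastNode n) {
        try {
            CastRealNode ignored = (CastRealNode) n;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        CastRealNode real = new CastRealNode();
        CastNode ref = real;   // same object, viewed through the interface type

        // Asserting what we believe before relying on it
        // (assert is only checked when the JVM runs with -ea).
        assert ref instanceof CastRealNode;

        CastRealNode same = (CastRealNode) ref;    // no new object is created
        System.out.println(same == real);          // prints true
        System.out.println(castFails(real.left));  // prints true: left is a null node
    }
}
```

The first line of output shows that the cast yields the very same object; the second shows what happens if you cast a link that actually refers to a null node.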

Test using AVLBSTMTest.

5. Turn in

Copy the file you modified (AVLBSTMap) to your turn-in folder /cslab/class/cs345/(your id)/avl .

To keep up with the course, this should be finished by March 17.

Thomas VanDrunen

Last modified: Wed Mar 1 10:44:38 CST 2017