Project: Tries

1. Introduction

The goal of this project is to understand how to use tries to implement a set (modifying this to implement a map or bag would be pretty simple).

My intention was that this project be lighter-weight, an easier one to finish up the semester with. It easily could have been a nasty one, however. But I'm going to put the hard parts into a series of "bonus" exercises, which you can try for extra credit. [Do not attempt to do the bonus exercises as a replacement for earlier exercises. That is a very bad strategy and violates the intention of "extra credit." Extra credit is for the occupation of students who have already done the required stuff, not to compensate for missing the required stuff.]

2. Set up

Find the project code for this and the next project at /homes/tvandrun/Public/cs345/trie. As usual, the code is organized into the packages adt, impl, and test, but adt and impl each have only one file.

Familiarize yourself with the given code in impl/TrieSet. A trie is a tree whose nodes each have a letter and as many children (potentially) as there are letters. If a string is in the set that this trie represents, then there is a path in the tree from the root to a node following the letters in that string. But the converse isn't true: just because a path in the tree exists following the letters of a certain string doesn't mean that string is in the set; it might merely be a prefix of a string that is in the set. Thus each node needs a tag indicating whether it is terminal, that is, whether it is the end of a path indicating a string in the set. Note that not all terminal nodes are leaves, though all leaves ought to be terminal nodes (otherwise they're wasting space being in the tree).
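To make the terminal-versus-leaf distinction concrete, here is a small sketch that builds the trie for the set {"car", "cart"} by hand. DemoNode and TrieShapeDemo are made-up names for illustration only; the TrieNode class in the given code may store its children and its terminal tag differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type for illustration; NOT the TrieNode from the project code.
class DemoNode {
    // Children keyed by the next letter of the string.
    final Map<Character, DemoNode> children = new HashMap<>();
    // True when the path from the root to this node spells a string in the set.
    boolean terminal;

    // Returns the child for letter c, creating it if it is missing.
    DemoNode child(char c) {
        return children.computeIfAbsent(c, k -> new DemoNode());
    }
}

class TrieShapeDemo {
    // Hand-builds the trie for the set {"car", "cart"}.
    static DemoNode build() {
        DemoNode root = new DemoNode();
        DemoNode r = root.child('c').child('a').child('r');
        r.terminal = true;            // "car" is in the set: terminal, but not a leaf
        r.child('t').terminal = true; // "cart" is in the set: a terminal leaf
        return root;
    }
}
```

In this trie the path c, a exists, but the node for a is not terminal, so "ca" is only a prefix and not a member of the set.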

3. Implementing basic functionality

The four operations you need to implement are add, contains, size, and remove. The first two will be done iteratively in the TrieSet class itself; the others will be done recursively in the TrieNode class.
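The split between iterative and recursive operations might look roughly like the sketch below. This is one possible shape under assumed names (MiniNode, MiniTrieSet, and their fields are hypothetical), not the structure of the given TrieSet and TrieNode classes, so treat it as a picture of the idea rather than something to paste in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical classes; names and fields are assumptions, not the project's API.
class MiniNode {
    final Map<Character, MiniNode> children = new HashMap<>();
    boolean terminal;

    // Recursive: counts the terminal nodes in this subtrie.
    int size() {
        int n = terminal ? 1 : 0;
        for (MiniNode child : children.values()) n += child.size();
        return n;
    }

    // Recursive: removes key[pos..] from this subtrie.
    // Returns true if this node is now useless (not terminal, no children),
    // so that the caller can prune it.
    boolean remove(String key, int pos) {
        if (pos == key.length()) {
            terminal = false;
        } else {
            char c = key.charAt(pos);
            MiniNode child = children.get(c);
            if (child != null && child.remove(key, pos + 1)) children.remove(c);
        }
        return !terminal && children.isEmpty();
    }
}

class MiniTrieSet {
    private final MiniNode root = new MiniNode();

    // Iterative: walk down, creating nodes as needed, then tag the last node.
    void add(String key) {
        MiniNode current = root;
        for (char c : key.toCharArray())
            current = current.children.computeIfAbsent(c, k -> new MiniNode());
        current.terminal = true;
    }

    // Iterative: the path must exist and must end at a terminal node.
    boolean contains(String key) {
        MiniNode current = root;
        for (char c : key.toCharArray()) {
            current = current.children.get(c);
            if (current == null) return false;
        }
        return current.terminal;
    }

    int size() { return root.size(); }

    void remove(String key) { root.remove(key, 0); }
}
```

The pruning in remove is what keeps every leaf terminal: once a node has neither a tag nor children, its parent drops it on the way back up.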

Test using test.TrieTest.

4. Bonus problems

There are a bunch of harder problems that one could solve. Too bad the semester has to end. If you have some extra time after the required part of this project (and all other projects), here are a few nifty problems to trie, I mean, try. I'll give some extra credit for each of these you do (but don't do these in place of completing an earlier project; do them for the fun/challenge).

A. The iterator

Complete the iterator operation for TrieSet. This is set up so that the iterator operation itself is delegated to the TrieNode class. That class's iterator() method returns an iterator over the strings that terminate in the subtrie rooted at that node. This means that the node's iterator will make use of the iterators returned by its children; it will be an iterator over iterators. Also, since a node's subtrie contains only the suffixes of the strings that terminate in it, we need to pass the prefix to the recursive calls to iterator() so that the iterator can return the entire string.
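The "iterator over iterators" idea, with the prefix passed down, could be sketched as follows. IterNode, its insert helper, and the use of a TreeMap for children are all assumptions made for this illustration; the given TrieNode likely differs, and this is only one way to arrange it.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical node type; NOT the project's TrieNode.
class IterNode {
    final Map<Character, IterNode> children = new TreeMap<>();
    boolean terminal;

    // Helper for building small examples.
    void insert(String key) {
        IterNode n = this;
        for (char c : key.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new IterNode());
        n.terminal = true;
    }

    // Iterator over the strings that terminate in this subtrie.
    // prefix is the string spelled by the path from the root to this node.
    Iterator<String> iterator(final String prefix) {
        final Iterator<Map.Entry<Character, IterNode>> kids =
                children.entrySet().iterator();
        return new Iterator<String>() {
            private boolean selfPending = terminal;  // this node's own string, if any
            private Iterator<String> current = null; // child iterator being drained

            public boolean hasNext() {
                if (selfPending) return true;
                // Move on to the next child whose iterator still has something.
                while ((current == null || !current.hasNext()) && kids.hasNext()) {
                    Map.Entry<Character, IterNode> e = kids.next();
                    current = e.getValue().iterator(prefix + e.getKey());
                }
                return current != null && current.hasNext();
            }

            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (selfPending) {
                    selfPending = false;
                    return prefix;
                }
                return current.next();
            }
        };
    }
}
```

Calling iterator("") on a root holding "cart", "car", and "dog" yields car, cart, dog in that order here, but only because this sketch keeps children in a sorted map.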

Test using TrieTestIterator.

B. Finding keys with a prefix

Don't do this one until you have the iterator working, because this will make use of the iterator. You want to return an iterable that will give an iterator over just the keys with a specific prefix. This requires getting to the root node of the subtrie containing all strings that have the given prefix and returning that node's iterator... that is, if that node exists. If there is no such subtrie (which should be the same thing as there being no strings with that prefix), then this method should make a vacuous iterator whose hasNext() returns false the first time.

Oh, and since this method returns an Iterable, not an Iterator, make sure you wrap the iterator appropriately. If that throws you off, here's how to wrap an iterator in an iterable:

return new Iterable<String>() {
   public Iterator<String> iterator() {
        // put the stuff you would put for an iterator method here
   }
};
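Putting the pieces of this problem together (walk to the subtrie, hand back its iterator, fall back to an empty iterator, wrap it all in an Iterable) might look like the sketch below. PrefixNode and PrefixTrie are hypothetical names, and the node's iterator here is an eager stand-in that collects into a list for brevity; it is not the lazy iterator from part A.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical node type; NOT the project's TrieNode.
class PrefixNode {
    final Map<Character, PrefixNode> children = new TreeMap<>();
    boolean terminal;

    // Stand-in for the node iterator from part A: collects eagerly for brevity.
    Iterator<String> iterator(String prefix) {
        List<String> out = new ArrayList<>();
        collect(prefix, out);
        return out.iterator();
    }

    private void collect(String prefix, List<String> out) {
        if (terminal) out.add(prefix);
        for (Map.Entry<Character, PrefixNode> e : children.entrySet())
            e.getValue().collect(prefix + e.getKey(), out);
    }
}

// Hypothetical set class; NOT the project's TrieSet.
class PrefixTrie {
    final PrefixNode root = new PrefixNode();

    void add(String key) {
        PrefixNode n = root;
        for (char c : key.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new PrefixNode());
        n.terminal = true;
    }

    Iterable<String> keysWithPrefix(final String prefix) {
        return new Iterable<String>() {
            public Iterator<String> iterator() {
                // Walk down to the root of the subtrie for this prefix.
                PrefixNode current = root;
                for (char c : prefix.toCharArray()) {
                    current = current.children.get(c);
                    // No such subtrie: a vacuous iterator, hasNext() is false at once.
                    if (current == null) return Collections.emptyIterator();
                }
                return current.iterator(prefix);
            }
        };
    }
}
```

Because the result is an Iterable, a caller can write for (String key : trie.keysWithPrefix("ca")) directly.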

Test using TrieTestKeysWithPrefix.

C. Finding the longest prefix

Given a string (which might not be a key in the set), find the longest string in the set, if any, that is a prefix of the given string.
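One way to think about this: walk down the trie along the given string and remember the last terminal node you passed. The sketch below does that under assumed names (LpNode, LpTrie, longestPrefixOf are hypothetical), and it returns null when no key is a prefix, which is also an assumption; check the given code and tests for what the real method should return in that case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type; NOT the project's TrieNode.
class LpNode {
    final Map<Character, LpNode> children = new HashMap<>();
    boolean terminal;
}

// Hypothetical set class; NOT the project's TrieSet.
class LpTrie {
    private final LpNode root = new LpNode();

    void add(String key) {
        LpNode n = root;
        for (char c : key.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new LpNode());
        n.terminal = true;
    }

    // Returns the longest key that is a prefix of query,
    // or null if there is none (the null is an assumption of this sketch).
    String longestPrefixOf(String query) {
        LpNode current = root;
        // Length of the longest key seen so far; -1 means none yet.
        int best = current.terminal ? 0 : -1;
        for (int i = 0; i < query.length(); i++) {
            current = current.children.get(query.charAt(i));
            if (current == null) break;         // fell off the trie
            if (current.terminal) best = i + 1; // a key ends exactly here
        }
        return best < 0 ? null : query.substring(0, best);
    }
}
```

With the keys "she", "shells", and "sea" in the set, the query "shellsort" gives "shells", the query "shell" gives "she", and the query "sh" gives null.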

Test using TrieTestLongestPrefix.

D. Finding keys that match (very hard)

This problem is similar to the iterator problem (you will, indeed, make a new iterator), except that instead of traversing the entire trie, the iterator will descend only those branches whose strings match the given pattern of characters and wild cards. For example, ELLEN and ELLIE both match the pattern ELL.. (three letters followed by two wild cards).
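The branch-pruning idea can be sketched as below. Two simplifications to note: this version assumes . is the wild card character matching exactly one letter, and it collects the matches eagerly into a list rather than building the lazy iterator the problem asks for. MatchNode, MatchTrie, and keysThatMatch are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical node type; NOT the project's TrieNode.
class MatchNode {
    final Map<Character, MatchNode> children = new TreeMap<>();
    boolean terminal;
}

// Hypothetical set class; NOT the project's TrieSet.
class MatchTrie {
    private final MatchNode root = new MatchNode();

    void add(String key) {
        MatchNode n = root;
        for (char c : key.toCharArray())
            n = n.children.computeIfAbsent(c, k -> new MatchNode());
        n.terminal = true;
    }

    // Eager version: gathers all matches up front (a simplification).
    // Assumes '.' is the wild card and matches exactly one character.
    Iterable<String> keysThatMatch(String pattern) {
        List<String> out = new ArrayList<>();
        collect(root, "", pattern, out);
        return out;
    }

    private static void collect(MatchNode node, String soFar,
                                String pattern, List<String> out) {
        int depth = soFar.length();
        if (depth == pattern.length()) {
            // Pattern used up: a match only if a key ends exactly here.
            if (node.terminal) out.add(soFar);
            return;
        }
        char p = pattern.charAt(depth);
        if (p == '.') {
            // Wild card: descend into every child.
            for (Map.Entry<Character, MatchNode> e : node.children.entrySet())
                collect(e.getValue(), soFar + e.getKey(), pattern, out);
        } else {
            // Literal letter: descend into that one child, if it exists.
            MatchNode child = node.children.get(p);
            if (child != null) collect(child, soFar + p, pattern, out);
        }
    }
}
```

Turning this into a true iterator means replacing the recursion with state that survives between calls to next(), which is where most of the difficulty lies.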

Test using TrieTestKeysThatMatch.

5. Turn in

Copy the file you modified (TrieSet.java) to your turn-in folder /cslab/class/cs345/(your id)/trie .

To keep up with the course, this should be finished by April 27.


Thomas VanDrunen
Last modified: Thu Apr 26 16:52:04 CDT 2018