The goal of this project is to understand how to use tries to implement a set (modifying this to implement a map or bag would be pretty simple).
My intention was that this project be lighter-weight, an easier one to finish up the semester with. It easily could have been a nasty one, however. But I'm going to put the hard parts into a series of "bonus" exercises, which you can try for extra credit. [Do not attempt to do the bonus exercises as a replacement for earlier exercises. That is a very bad strategy and violates the intention of "extra credit." Extra credit is for the occupation of students who have already done the required stuff, not to compensate for missing the required stuff.]
Find the project code for this and the next project at /homes/tvandrun/Public/cs345/trie.
As usual, the code is organized into the packages adt, impl and test,
but adt and impl each have only one file.
Familiarize yourself with the given code in impl/TrieSet.
A trie is a tree whose nodes each have a letter
and as many children (potentially) as there are letters.
If a string is in the set that this trie represents, then
there is a path in the tree from the root to a node
following the letters in that string.
But the converse isn't true: just because a path in the tree
exists following the letters of a certain string doesn't
mean that string is in the set: it might merely be a prefix
of a string that is in the set.
Thus each node needs a tag indicating whether it is
terminal---whether it is the end of a path indicating a
string in the set.
Note that not all terminal nodes are leaves, though all
leaves ought to be terminal nodes---otherwise they're
wasting space being in the tree.
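To make the shape concrete, here is a small sketch of a trie for the set {"oat", "oatmeal"}, built by hand. The names ShapeNode, chain and follow are made up for illustration, and storing children in a TreeMap is just one assumption about the representation; the given TrieNode may well be laid out differently.

```java
import java.util.TreeMap;

// Hypothetical node type for illustration only (not the course's TrieNode).
class ShapeNode {
    // One entry per letter that actually has a child.
    final TreeMap<Character, ShapeNode> children = new TreeMap<>();
    // True if the path from the root to this node spells a string in the set.
    boolean terminal;

    // Hand-construction helper: hangs a fresh chain of nodes spelling
    // `letters` below this node and returns the last node of the chain.
    ShapeNode chain(String letters) {
        ShapeNode current = this;
        for (char c : letters.toCharArray()) {
            ShapeNode next = new ShapeNode();
            current.children.put(c, next);
            current = next;
        }
        return current;
    }

    // Follows `letters` from this node; null if the path breaks off.
    ShapeNode follow(String letters) {
        ShapeNode current = this;
        for (char c : letters.toCharArray()) {
            current = current.children.get(c);
            if (current == null) return null;
        }
        return current;
    }
}

class ShapeDemo {
    // Builds the trie for {"oat", "oatmeal"}.
    static ShapeNode build() {
        ShapeNode root = new ShapeNode();
        ShapeNode oat = root.chain("oat");
        oat.terminal = true;               // terminal, but not a leaf
        oat.chain("meal").terminal = true; // terminal leaf
        return root;
    }
}
```

In this trie the path for "oatm" exists, but its last node is not terminal, so "oatm" is only a prefix and not a member of the set.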
The four operations you need to implement are
add, contains, size, and remove.
The first two will be done iteratively in the TrieSet
class itself; the others will be done recursively in
the TrieNode class.
Although contains() can't be used until
the add() method is implemented, you may want to do this one
first, simply because it is the easiest.
This method will have a loop
that simultaneously navigates a branch of the trie
and steps through the given string (item).
For each character in the string, we move to the appropriate
child node.
If we ever hit a null child, or if we come to the end of the string
and are at a non-terminal node, then
the given item is not in the set.
If we end on a terminal node, on the other hand, then
the item is in the set.
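The loop just described might look roughly like the following. This is only a sketch under assumptions: LookupNode and LookupSketch are invented names, and the map of children stands in for whatever the given node class really uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type for illustration only.
class LookupNode {
    final Map<Character, LookupNode> children = new HashMap<>();
    boolean terminal;
}

class LookupSketch {
    // Walks one branch of the trie while stepping through item.
    static boolean contains(LookupNode root, String item) {
        LookupNode current = root;
        for (int i = 0; i < item.length(); i++) {
            current = current.children.get(item.charAt(i));
            if (current == null) {
                return false; // hit a null child: no such path
            }
        }
        // The path exists; item is in the set only if it ends on a terminal node.
        return current.terminal;
    }
}
```

Note that the empty string is handled for free: the loop body never runs, so the answer is simply whether the root is terminal.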
The add() method, like contains(),
will also navigate the tree as it steps through the characters
in the given item.
However, it will make new nodes as it goes whenever it
would enter a branch that doesn't (yet) exist.
The last node it makes (or visits, if the node is already there) needs
to be set as terminal.
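Here is one way that navigation could go, again as a sketch with made-up names (GrowNode, GrowSketch) and an assumed map-based child table rather than the actual fields in the given code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type for illustration only.
class GrowNode {
    final Map<Character, GrowNode> children = new HashMap<>();
    boolean terminal;
}

class GrowSketch {
    // Walks the path for item, creating any missing nodes along the way.
    static void add(GrowNode root, String item) {
        GrowNode current = root;
        for (int i = 0; i < item.length(); i++) {
            char c = item.charAt(i);
            GrowNode next = current.children.get(c);
            if (next == null) {
                // The branch doesn't exist yet, so make it.
                next = new GrowNode();
                current.children.put(c, next);
            }
            current = next;
        }
        // The last node made or visited marks the end of a string in the set.
        current.terminal = true;
    }
}
```

Adding "oatmeal" and then "oat" creates seven nodes below the root in total; the second call makes no new nodes and only flips the tag on the existing t node.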
The size() method in TrieNode
is a simple depth-first traversal through the tree.
We simply count the number of terminal nodes.
But make sure you understand what the size() method
means for each node: n.size() means, count
the number of terminal nodes in the subtrie rooted at n.
Each path in that subtrie represents not a full string in the whole
trie (unless n is the root), but a suffix of a string
in the whole trie.
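The traversal can be sketched as follows. CountNode is a made-up name, and the map of children is an assumption about the representation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type for illustration only.
class CountNode {
    final Map<Character, CountNode> children = new HashMap<>();
    boolean terminal;

    // Number of terminal nodes in the subtrie rooted at this node.
    int size() {
        int count = terminal ? 1 : 0;
        for (CountNode child : children.values()) {
            count += child.size(); // depth-first: finish each child's subtrie
        }
        return count;
    }
}
```

For a trie holding "oat" and "oats", calling size() on the root gives 2, and so does calling it on the o node, since both strings pass through it; calling it on the s node gives 1.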
The remove() method is the hardest. It's important to
understand what the recursive remove() message to an
individual node means.
If we want to remove "oatmeal" from the (sub)trie rooted at n,
then that means we want to remove "atmeal" from the subtrie rooted
at n's o child.
So as we navigate the tree, we remove a character from the beginning
of the item string for each recursive call.
remove() should also remove nodes
that represent "dead" branches, that is, branches that no longer
lead to any terminal node.
To make that possible, the remove() method returns the node that should take
the place of the one that it is called on.
Basically that means it should return null if that branch is now
dead, or the node itself (this) if it is still live.
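A sketch of that contract is below. PruneNode is an invented name, the map of children is an assumed representation, and the add() method is there only as scaffolding so the example can build a trie to remove from.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type for illustration only.
class PruneNode {
    final Map<Character, PruneNode> children = new HashMap<>();
    boolean terminal;

    // Scaffolding for the example (not the iterative add() the project asks for).
    void add(String item) {
        if (item.isEmpty()) {
            terminal = true;
            return;
        }
        children.computeIfAbsent(item.charAt(0), k -> new PruneNode())
                .add(item.substring(1));
    }

    // Removes item (a suffix, from this node's point of view) and returns
    // the node that should replace this one: this if live, null if dead.
    PruneNode remove(String item) {
        if (item.isEmpty()) {
            terminal = false; // the string ended here; it no longer counts
        } else {
            char c = item.charAt(0);
            PruneNode child = children.get(c);
            if (child != null) {
                // Drop the first character for the recursive call.
                PruneNode replacement = child.remove(item.substring(1));
                if (replacement == null) {
                    children.remove(c); // that branch is now dead
                }
            }
        }
        // Live if this node ends a string or still leads to one.
        return (terminal || !children.isEmpty()) ? this : null;
    }
}
```

One thing this sketch leaves to the caller: if the last string is removed, the root itself reports null, so the set class would need to decide what to do about its root in that case.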
Test using test.TrieTest.
There are a bunch of harder problems that one could solve. Too bad the semester has to end. If you have some extra time after the required part of this project (and all other projects), here are a few nifty problems to trie, I mean, try. I'll give some extra credit for each of these you do (but don't do these in place of completing an earlier project; do them for the fun and the challenge).
Complete the iterator operation for TrieSet.
This is set up so that the iterator operation itself is delegated to
the TrieNode class.
That class's iterator() method returns an iterator
over the strings that terminate in the subtrie rooted at that node.
This means that that node's iterator will make use of the iterators
returned by its children---it will be an iterator over iterators.
Also, since a node's subtrie contains only the suffixes
of the strings that terminate in it,
we need to pass the prefix to the recursive calls to iterator()
in order for the iterator to return the entire string.
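Here is a rough sketch of what an iterator over iterators could look like. WalkNode is an invented name, the TreeMap of children is an assumption (chosen so the strings come out in alphabetical order), and add() is only scaffolding for building an example trie.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical node type for illustration only.
class WalkNode {
    final TreeMap<Character, WalkNode> children = new TreeMap<>();
    boolean terminal;

    // Scaffolding for the example.
    void add(String item) {
        if (item.isEmpty()) {
            terminal = true;
            return;
        }
        children.computeIfAbsent(item.charAt(0), k -> new WalkNode())
                .add(item.substring(1));
    }

    // Iterates over the full strings ending in this subtrie; prefix is
    // the string spelled on the way from the root down to this node.
    Iterator<String> iterator(String prefix) {
        return new Iterator<String>() {
            // True until this node's own string (if any) has been returned.
            private boolean ownPending = terminal;
            private final Iterator<Map.Entry<Character, WalkNode>> remaining =
                    children.entrySet().iterator();
            private Iterator<String> currentChild = null;

            @Override
            public boolean hasNext() {
                if (ownPending) return true;
                // Move on to the next child iterator that still has something.
                while (currentChild == null || !currentChild.hasNext()) {
                    if (!remaining.hasNext()) return false;
                    Map.Entry<Character, WalkNode> e = remaining.next();
                    currentChild = e.getValue().iterator(prefix + e.getKey());
                }
                return true;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (ownPending) {
                    ownPending = false;
                    return prefix;
                }
                return currentChild.next();
            }
        };
    }
}
```

The set's own iterator would start things off by asking the root for iterator(""), since no letters have been consumed at the root.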
Don't do this one until you have the iterator working, because this
will make use of the iterator.
You want to return an iterable that will give an iterator over
just the keys with a specific prefix.
This requires getting to the root node of the subtrie containing all
strings that have the given prefix and returning that node's iterator...
that is, if that node exists.
If there is no such subtrie (which should be the same thing as there being
no strings with that prefix), then this method should make a vacuous
iterator whose hasNext() returns false the first time.
Oh, and since this method returns an Iterable, not
an Iterator, make sure you wrap the iterator
appropriately.
If that throws you off, here's how to wrap an iterator in an iterable:
    return new Iterable<String>() {
        public Iterator<String> iterator() {
            // put the stuff you would put for an iterator method here
        }
    };
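Putting the pieces together, the prefix operation might be sketched like this. PrefixNode, PrefixSketch and withPrefix are invented names. To keep the example short, the node's iterator here is a stand-in that eagerly collects the strings into a list, which is not the lazy iterator over iterators described earlier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical node type for illustration only.
class PrefixNode {
    final TreeMap<Character, PrefixNode> children = new TreeMap<>();
    boolean terminal;

    // Scaffolding for the example.
    void add(String item) {
        if (item.isEmpty()) {
            terminal = true;
            return;
        }
        children.computeIfAbsent(item.charAt(0), k -> new PrefixNode())
                .add(item.substring(1));
    }

    // Stand-in for the real node iterator: gathers everything up front.
    Iterator<String> iterator(String prefix) {
        List<String> found = new ArrayList<>();
        collect(prefix, found);
        return found.iterator();
    }

    private void collect(String prefix, List<String> found) {
        if (terminal) found.add(prefix);
        for (Map.Entry<Character, PrefixNode> e : children.entrySet()) {
            e.getValue().collect(prefix + e.getKey(), found);
        }
    }
}

class PrefixSketch {
    static Iterable<String> withPrefix(PrefixNode root, String prefix) {
        return new Iterable<String>() {
            @Override
            public Iterator<String> iterator() {
                // Descend to the root of the subtrie for this prefix.
                PrefixNode current = root;
                for (int i = 0; i < prefix.length() && current != null; i++) {
                    current = current.children.get(prefix.charAt(i));
                }
                if (current == null) {
                    // No such subtrie: hasNext() is false right away.
                    return Collections.emptyIterator();
                }
                return current.iterator(prefix);
            }
        };
    }
}
```

Because the result is an Iterable, it can be used directly in a for-each loop, for example for (String s : PrefixSketch.withPrefix(root, "oa")).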
This problem is similar to the iterator problem (you will, indeed, make a new iterator), except that instead of traversing the entire trie, the iterator will descend only those branches whose strings match the given pattern.
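To illustrate the idea of descending only the matching branches, here is a sketch that assumes a very simple pattern language of my own choosing for this example: a '.' in the pattern matches any one letter, and every other character matches only itself. The pattern syntax the project code actually expects may be different. MatchNode is an invented name, and the method collects matches into a list rather than producing a lazy iterator.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical node type for illustration only.
class MatchNode {
    final TreeMap<Character, MatchNode> children = new TreeMap<>();
    boolean terminal;

    // Scaffolding for the example.
    void add(String item) {
        if (item.isEmpty()) {
            terminal = true;
            return;
        }
        children.computeIfAbsent(item.charAt(0), k -> new MatchNode())
                .add(item.substring(1));
    }

    // Adds to found every string below this node that matches pattern.
    // Assumed syntax: '.' matches any single letter.
    void match(String pattern, String prefix, List<String> found) {
        if (pattern.isEmpty()) {
            if (terminal) found.add(prefix);
            return;
        }
        char p = pattern.charAt(0);
        String rest = pattern.substring(1);
        if (p == '.') {
            // Wildcard: every child branch is a candidate.
            for (Map.Entry<Character, MatchNode> e : children.entrySet()) {
                e.getValue().match(rest, prefix + e.getKey(), found);
            }
        } else {
            // Literal letter: at most one branch is worth descending.
            MatchNode child = children.get(p);
            if (child != null) child.match(rest, prefix + p, found);
        }
    }
}
```

With "oat", "oak", "eat" and "oats" in the trie, the pattern ".at" finds "eat" and "oat", while "o.t" finds only "oat".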
Copy the file you modified (TrieSet.java)
to your turn-in folder /cslab.all/linux/class/cs345/(your id)/trie.
To keep up with the course, this should be finished by April 29.