Project: Perfect hashing

1. Introduction

The goal of this project is to understand the "perfect hashing" strategy for optimizing hash tables when all the keys are known ahead of time. This project is analogous to the optimal BST project; in both cases we take advantage of knowing the keys of the map before anything else happens, and I anticipate students finding both projects to be among the hardest of the semester. Perfect hashing, however, does not involve dynamic programming.

This is based on Project 6.4 in the book, which you should read for more details.

2. Set up

The code base for this project is the same as in the linear-probing project. In this project your work is in the class PerfectHashMap.

3. Implementing perfect hashing

You are essentially finishing two classes: not only PerfectHashMap itself, but also its member class SecondaryMap. The methods for the map operations (put, get, containsKey, and remove) are implemented already. I considered leaving those for you, but decided they would only take up time without being much of a challenge. Do read those methods before starting on the unfinished code, and make sure you understand how the primary table and the secondary tables interact. What is left for you are the interesting parts: the constructors for the two classes and the iterator.

The constructor for PerfectHashMap involves:

- Finding a prime number greater than all the keys. The provided method findMaskAndGreatestCode() will help.
- Making a hash function for the primary map (see HashFactory.universalHashFunction(), not UniversalHashFactory.makeHashFunction() as the printed version of the book says).
- Determining which keys are going to end up in which buckets.
- Making the secondary maps. Note that the SecondaryMap constructor must be passed the keys that will end up in that secondary map, which is why you first need to calculate which keys go in which bucket. Also, because of the interaction among nested classes, generics, and arrays, to create an array of SecondaryMaps, you'll need to do something like
```
     (SecondaryMap[]) new PerfectHashMap.SecondaryMap[m];
```
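
The bucketing step can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the project's code: keys are represented by nonnegative int codes, the hash function has the common universal form ((a*k + b) mod p) mod m, and every name here (PrimaryBucketsSketch, nextPrimeAbove, hash, bucketize) is hypothetical. In the project itself, use findMaskAndGreatestCode() and HashFactory.universalHashFunction() rather than these stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; not part of the project's code base.
class PrimaryBucketsSketch {

    /** Smallest prime strictly greater than n (trial division; fine for a sketch). */
    static int nextPrimeAbove(int n) {
        int candidate = Math.max(2, n + 1);
        while (!isPrime(candidate)) candidate++;
        return candidate;
    }

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    /** Assumed hash form: ((a*k + b) mod p) mod m, computed in long to avoid overflow. */
    static int hash(int a, int b, int p, int m, int code) {
        return (int) ((((long) a * code + b) % p) % m);
    }

    /** Partition nonnegative key codes into m buckets under the given hash. */
    static List<List<Integer>> bucketize(int[] codes, int a, int b, int p, int m) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < m; i++) buckets.add(new ArrayList<>());
        for (int code : codes) buckets.get(hash(a, b, p, m, code)).add(code);
        return buckets;
    }

    public static void main(String[] args) {
        int[] codes = {10, 22, 37, 40, 52, 60, 70, 72, 75};
        int max = 0;
        for (int c : codes) max = Math.max(max, c);
        int p = nextPrimeAbove(max);       // prime greater than every code
        int m = codes.length;              // one primary bucket per key
        Random rand = new Random();
        int a = 1 + rand.nextInt(p - 1);   // 1 <= a < p
        int b = rand.nextInt(p);           // 0 <= b < p
        List<List<Integer>> buckets = bucketize(codes, a, b, p, m);
        for (int i = 0; i < m; i++) System.out.println(i + ": " + buckets.get(i));
    }
}
```

Each inner list holds the keys destined for one bucket, which is the information the corresponding SecondaryMap constructor needs.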

The constructor of SecondaryMap involves generating new hash functions until you find one that has no collisions for the keys (in addition to initializing the instance variables).
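
The retry loop might look roughly like the sketch below. It makes several assumptions that are not taken from the project: keys are distinct nonnegative int codes smaller than the prime p, the secondary table has n*n slots for n keys (the usual textbook choice, under which a random function is collision-free more often than not), and the hash has the form ((a*k + b) mod p) mod m. The names CollisionFreeSearchSketch, Params, isCollisionFree, and findCollisionFree are hypothetical.

```java
import java.util.Random;

// Hypothetical stand-alone sketch; not part of the project's code base.
class CollisionFreeSearchSketch {

    /** Parameters of an assumed hash function h(k) = ((a*k + b) mod p) mod m. */
    static final class Params {
        final int a, b, p, m;
        Params(int a, int b, int p, int m) { this.a = a; this.b = b; this.p = p; this.m = m; }
        int hash(int code) { return (int) ((((long) a * code + b) % p) % m); }
    }

    /** True if no two codes land in the same slot under h. */
    static boolean isCollisionFree(Params h, int[] codes) {
        boolean[] used = new boolean[h.m];
        for (int code : codes) {
            int slot = h.hash(code);
            if (used[slot]) return false;  // slot already taken: collision
            used[slot] = true;
        }
        return true;
    }

    /** Draw random (a, b) until the resulting function has no collisions on codes. */
    static Params findCollisionFree(int[] codes, int p, Random rand) {
        int m = Math.max(1, codes.length * codes.length);  // assumed n*n slots
        while (true) {
            Params h = new Params(1 + rand.nextInt(p - 1), rand.nextInt(p), p, m);
            if (isCollisionFree(h, codes)) return h;
        }
    }

    public static void main(String[] args) {
        int[] codes = {10, 22, 37};        // keys that fell into one bucket
        Params h = findCollisionFree(codes, 79, new Random());
        for (int code : codes) System.out.println(code + " -> slot " + h.hash(code));
    }
}
```

With a quadratic table size the loop is expected to finish after only a few draws; a smaller table would still work in principle but could take many more attempts.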

The test for this class is PHMapTest. After writing those constructors (i.e., before writing the iterator), all the test cases that don't have the word "iterator" in their name should pass.

The iterator is an interesting problem since it requires you to iterate through an array of (secondary) hash tables. In my own solution, I wrote an iterator for SecondaryMap and made the iterator for PerfectHashMap an "iterator of iterators"; that is, an iterator over the current secondary table is part of the state of the iterator of the primary table. But, as is noted in a comment, it isn't required that SecondaryMap have an iterator at all. You may choose to write your iterator a different way. But don't simply save the big list of all keys given to the constructor and iterate through that. The iterator of PerfectHashMap should return just those keys currently associated with something in the map, not necessarily all possible keys.
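
The "iterator of iterators" idea can be illustrated generically. The sketch below is not the PerfectHashMap iterator: it walks any collection of Iterables standing in for the secondary tables, and the class name NestedIteratorSketch is hypothetical. A real version would also have to deal with however the project represents empty buckets and with keys that are present in a table but not currently mapped to a value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-alone sketch; not part of the project's code base.
class NestedIteratorSketch<K> implements Iterator<K> {
    // Position among the inner collections (the "primary" level).
    private final Iterator<? extends Iterable<K>> outer;
    // Iterator over the current inner collection; starts out empty.
    private Iterator<K> inner = Collections.emptyIterator();

    NestedIteratorSketch(Iterable<? extends Iterable<K>> tables) {
        this.outer = tables.iterator();
    }

    @Override
    public boolean hasNext() {
        // Skip past exhausted or empty inner collections.
        while (!inner.hasNext() && outer.hasNext()) inner = outer.next().iterator();
        return inner.hasNext();
    }

    @Override
    public K next() {
        if (!hasNext()) throw new NoSuchElementException();
        return inner.next();
    }

    public static void main(String[] args) {
        List<List<String>> tables = Arrays.asList(
            Arrays.asList("a", "b"),
            Collections.<String>emptyList(),   // an empty "secondary table"
            Arrays.asList("c"));
        Iterator<String> it = new NestedIteratorSketch<>(tables);
        while (it.hasNext()) System.out.println(it.next());  // prints a, b, c on separate lines
    }
}
```

Doing the skipping inside hasNext() keeps next() trivial, and it means empty inner collections anywhere in the sequence (including at the very end) are handled by the same loop.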

4. Turn in

Copy the file you modified (PerfectHashMap) to your turn-in folder /cslab/class/cs345/(your id)/perfecthash .

To keep up with the course, this should be finished by April 23.

Thomas VanDrunen

Last modified: Mon Apr 16 15:34:02 CDT 2018