Project: Perfect hashing

Special note: This project has the same code base as the linear probing project. However, I made a change to the given code impl/PerfectHashMap.java on April 19. If you grabbed the code for the linear probing project before that date (or before around 2:30 on that date), please grab a fresh copy of /homes/tvandrun/Public/cs345/hash/impl/PerfectHashMap.java before you start.

1. Introduction

The goal of this project is to understand the "perfect hashing" strategy for optimizing hash tables when all the keys are known ahead of time. This project is analogous to the optimal BST project; in both cases we take advantage of knowing the keys of the map before anything else happens, and I anticipate students finding both projects to be among the hardest of the semester. Perfect hashing, however, does not involve dynamic programming.

2. Set up

The code base for this project is the same as in the linear-probing project (except that I made an update to impl/PerfectHashMap.java on April 19). In this project your work is in the class PerfectHashMap.

3. Implementing perfect hashing

You are essentially finishing two classes: not only PerfectHashMap itself, but also its member class SecondaryMap. The methods for the map operations (put, get, containsKey, and remove) are implemented already. Originally I had planned on leaving those for you, but I then decided that they would only take up time and not be much of a challenge. So, read those methods to make sure you get how the primary table and secondary tables interact. What is left for you is the interesting part: the constructors for the two classes and the iterator.

The constructor for PerfectHashMap involves the following steps:

Finding a prime number greater than all the keys. Since you don't know what the keys are (presumably they are not integers), you need to convert them to integers first. Use the Java-given hashCode() method for this. Of course, make that number non-negative using & 0x7fffffff. I also recommend forcing the results into a smaller range by mod'ing them by some arbitrarily chosen upper bound, but make sure that all the resulting values are unique. (I mod'd them by 100 actually; all the keys in the testcases give you a unique value for (key.hashCode() & 0x7fffffff) % 100.)
Making a hash function for the primary map (see UniversalHashFactory.makeHashFunction())
Determining which keys are going to end up in which buckets
Making the secondary maps. Note that in making the secondary maps, you need to pass the SecondaryMap constructor the keys that will end up in that secondary map---which requires you to calculate which keys will end up in which map.
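
The steps above can be sketched roughly as follows. This is only an illustration, not the required solution: the class name, the method names, and the constant BOUND are hypothetical, and the hash() method is a stand-in of the common form ((a*k + b) mod p) mod m. In the project itself the primary hash function should come from UniversalHashFactory.makeHashFunction(), whose exact signature is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class PrimarySetupSketch {

    // Hypothetical upper bound, mirroring the "mod by 100" suggestion.
    static final int BOUND = 100;

    // Convert an arbitrary key to a non-negative integer below BOUND.
    static int toInt(Object key) {
        return (key.hashCode() & 0x7fffffff) % BOUND;
    }

    // Trial-division primality check; fine for numbers this small.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Smallest prime strictly greater than n.
    static int nextPrimeAbove(int n) {
        int candidate = Math.max(n + 1, 2);
        while (!isPrime(candidate)) candidate++;
        return candidate;
    }

    // A prime greater than every converted key.
    static int primeFor(List<?> keys) {
        int max = 0;
        for (Object key : keys) max = Math.max(max, toInt(key));
        return nextPrimeAbove(max);
    }

    // Stand-in hash function (an assumption): ((a*k + b) mod p) mod m.
    static int hash(int a, int b, int p, int m, int k) {
        return (int) (((long) a * k + b) % p % m);
    }

    // Group the keys by the primary bucket they hash to.
    static <K> List<List<K>> bucketize(List<K> keys, int a, int b, int p, int m) {
        List<List<K>> buckets = new ArrayList<>();
        for (int i = 0; i < m; i++) buckets.add(new ArrayList<>());
        for (K key : keys) buckets.get(hash(a, b, p, m, toInt(key))).add(key);
        return buckets;
    }
}
```

Each inner list produced by bucketize is the set of keys you would hand to the SecondaryMap constructor for that bucket.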

The constructor of SecondaryMap involves generating new hash functions until you find one that has no collisions for the keys (in addition to initializing the instance variables).
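
The retry loop at the heart of the SecondaryMap constructor might look something like the sketch below. It assumes the keys have already been converted to distinct integers smaller than the prime p, and it uses the same stand-in hash form ((a*k + b) mod p) mod m rather than the real UniversalHashFactory; the class and method names are hypothetical. In the textbook scheme the secondary table size m is the square of the number of keys in the bucket, which makes a collision-free function likely within a few tries.

```java
import java.util.List;
import java.util.Random;

class SecondarySearchSketch {

    // Returns {a, b} such that ((a*k + b) mod p) mod m sends every key
    // in keys to a different slot. Assumes the keys are distinct and
    // each is smaller than the prime p.
    static int[] findCollisionFree(List<Integer> keys, int p, int m, Random rng) {
        while (true) {
            int a = 1 + rng.nextInt(p - 1);   // a in [1, p-1]
            int b = rng.nextInt(p);           // b in [0, p-1]
            boolean[] used = new boolean[m];  // slots already claimed in this attempt
            boolean collision = false;
            for (int k : keys) {
                int slot = (int) (((long) a * k + b) % p % m);
                if (used[slot]) {             // two keys landed in one slot: try again
                    collision = true;
                    break;
                }
                used[slot] = true;
            }
            if (!collision) return new int[] {a, b};
        }
    }
}
```

For a bucket with three keys, for example, you would call this with m = 9.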

The test for this class is PHMapTest. After writing those constructors (i.e., before writing the iterator), all the test cases that don't have the word iterator in their name should pass.

The iterator is an interesting problem since it requires you to iterate through an array of (secondary) hashtables. In my own solution, I wrote an iterator for SecondaryMap and made the iterator for PerfectHashMap an "iterator of iterators"; that is, an iterator over the current secondary table is part of the state of the iterator of the primary table. But, as is noted in a comment, it isn't required that SecondaryMap have an iterator at all. You may choose to write your iterator a different way. But don't simply save the big list of all keys given to the constructor and iterate through that. The iterator of PerfectHashMap should return just those keys currently associated with something in the map, not necessarily all possible keys.
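
To make the "iterator of iterators" idea concrete, here is a minimal generic sketch, separate from the project code. The class name is hypothetical, and each inner Iterable stands in for a secondary table that yields only the keys currently present in it; a null entry stands in for a bucket with no secondary table.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class NestedIteratorSketch<K> implements Iterator<K> {

    private final Iterator<? extends Iterable<K>> outer; // walks the secondary tables
    private Iterator<K> inner;                            // walks the current table, or null

    NestedIteratorSketch(Iterable<? extends Iterable<K>> tables) {
        outer = tables.iterator();
        inner = null;
        advance();
    }

    // Skip ahead until the current table has something left
    // or there are no more tables.
    private void advance() {
        while ((inner == null || !inner.hasNext()) && outer.hasNext()) {
            Iterable<K> table = outer.next();
            inner = (table == null) ? null : table.iterator();
        }
    }

    public boolean hasNext() {
        return inner != null && inner.hasNext();
    }

    public K next() {
        if (!hasNext()) throw new NoSuchElementException();
        K key = inner.next();
        advance(); // keep the invariant: inner is positioned on the next key, if any
        return key;
    }
}
```

The design choice here is to do the skipping eagerly in advance(), so that hasNext() stays a simple check with no side effects.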

4. Turn in

Copy the file you modified (PerfectHashMap) to your turn-in folder /cslab.all/linux/class/cs345/(your id)/perfecthash .

To keep up with the course, this should be finished by April 27.

Thomas VanDrunen

Last modified: Tue Apr 19 14:31:54 CDT 2016