Project 5: Hashing and hash maps

The goal of this project is to understand hashing and hash tables by implementing the linear probe hash map strategy and perfect hashing, and by comparing their performance experimentally.

As in Project 3, there are three parts: implementing linear probing, implementing perfect hashing, and setting up an experiment.

1. Set up

Find the project code in ~tvandrun/Public/cs345/proj5. As usual, it has three packages: adt, impl, and test. Remember that using the cp command as

cp -r ~tvandrun/Public/cs345/proj5/. .

will also grab the hidden .classpath file, so that Eclipse will put the JUnit libraries in the build path when you start an Eclipse project.

Speaking of which, make a new Eclipse project. Remember to create the project in the folder containing the adt, impl, and test folders.

In addition to the classes you need to modify (along with their interfaces and JUnit tests), I am giving you the "basic" (separate chaining) hash map from class (and from CSCI 245), as well as ListMap, the simple linear implementation of a map. These can be used in the experiments for comparison. Also included are the hash function factory, the prime source, and an implementation of a set using a list (you may find a set useful at some points). Every implementation of every ADT has a JUnit test case (for example, LMTest is the test for the ListMap class); that way you can modify the implementations for the purposes of your experiments and check that your modifications don't break anything.

2. Implementing linear probing

I have already provided the instance variables (you may add to these, but I don't think you'll need to). You need to complete the constructor, the helper method findIndex(), put(), remove(), and iterator(). For my implementation of remove(), I used a helper method compareIdealPlace(); I've provided a stub for it (with thorough documentation) in case you find it useful.
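To make the probing and deletion logic concrete, here is a minimal sketch of a linear-probing map. It is not the project's LinProbHashMap: the class name LinearProbeSketch, the int keys with String values, the floorMod hash, and the "keep the table at most half full" resize rule are all assumptions made for brevity. The cyclic range check in remove() plays roughly the role the text describes for compareIdealPlace(), but the provided stub's documentation defines that helper's actual contract, which may differ.

```java
// Hypothetical minimal linear-probing map from int keys to String values;
// a simplified stand-in, not the project's LinProbHashMap.
public class LinearProbeSketch {
    private Integer[] keys;   // null marks an empty slot
    private String[] values;
    private int size;

    public LinearProbeSketch(int capacity) {
        keys = new Integer[Math.max(2, capacity)];
        values = new String[keys.length];
    }

    // The key's ideal (home) slot; floorMod keeps negative keys in range.
    private int home(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Probe forward from the home slot until we hit the key or an empty slot.
    // Terminates because put() keeps the table at most half full.
    private int findIndex(int key) {
        int i = home(key);
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    public void put(int key, String value) {
        if (2 * (size + 1) > keys.length) resize(2 * keys.length);
        int i = findIndex(key);
        if (keys[i] == null) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    public String get(int key) {
        int i = findIndex(key);
        return keys[i] == null ? null : values[i];
    }

    // Deletes without tombstones by shifting later cluster members back into
    // the hole, but only those whose probe path from home passes through it.
    public void remove(int key) {
        int hole = findIndex(key);
        if (keys[hole] == null) return;
        keys[hole] = null;
        values[hole] = null;
        size--;
        for (int j = (hole + 1) % keys.length; keys[j] != null; j = (j + 1) % keys.length) {
            int h = home(keys[j]);
            // Is h cyclically in (hole, j]? If so, the entry can stay where it is.
            boolean stays = hole <= j ? (hole < h && h <= j) : (hole < h || h <= j);
            if (!stays) {
                keys[hole] = keys[j];
                values[hole] = values[j];
                keys[j] = null;
                values[j] = null;
                hole = j;
            }
        }
    }

    // Rehash every entry into a larger array.
    private void resize(int newCapacity) {
        Integer[] oldKeys = keys;
        String[] oldValues = values;
        keys = new Integer[newCapacity];
        values = new String[newCapacity];
        size = 0;
        for (int s = 0; s < oldKeys.length; s++) {
            if (oldKeys[s] != null) put(oldKeys[s], oldValues[s]);
        }
    }
}
```

For example, with new LinearProbeSketch(8), putting keys 7, 15, and 0 makes a cluster that wraps from slot 7 around to slots 0 and 1; removing 7 then shifts both 15 and 0 back one slot, so later lookups for them still succeed. Shifting entries back, rather than leaving "deleted" markers, keeps clusters from filling up with dead slots.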

I recommend that you develop this incrementally, saving iterator() and remove() for the end. For example, you can check that your implementations of findIndex() and put() work before starting on the harder methods by running the JUnit test cases and looking only at the ones that rely solely on what you've done so far. (I think every test case that relies on the iterator or remove() has "iterator" and/or "remove" in its name.)

3. Implementing perfect hashing

Again, I have provided the instance variables, as well as stubs for the nested classes that act as the secondary maps (and their instance variables). At first glance it may look like a lot needs to be done, but keep in mind that the put, get, containsKey, and even remove methods, in both the PerfectHashMap class and the SecondaryMap class, are very simple. The interesting parts are the constructors. (The iterator is also difficult, but save it until the end.)
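Since the constructors are where the work happens, here is a sketch of the usual two-level (FKS-style) construction. We assume, without having seen it, that H_pm is the universal family h(k) = ((a·k + b) mod p) mod m; the sketch implements that family itself rather than calling the project's class. Other assumptions: the class name PerfectHashSketch, a set of non-negative int keys smaller than p = 2^31 − 1 (a map would keep a parallel value array), and the retry thresholds shown. The primary hash is redrawn until the sum of squared bucket sizes is at most 4n, and each secondary table of size n_j² is redrawn until it has no collisions; for this family each retry loop succeeds with probability greater than 1/2 per attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical two-level (FKS-style) perfect hash set for keys in [0, P);
// a simplified stand-in, not the project's PerfectHashMap.
public class PerfectHashSketch {
    private static final long P = 2_147_483_647L; // the Mersenne prime 2^31 - 1

    // A random member of the family h(k) = ((a*k + b) mod P) mod m.
    private static final class Hash {
        final long a, b;
        final int m;
        Hash(Random rnd, int m) {
            this.a = 1 + Math.floorMod(rnd.nextLong(), P - 1); // a in [1, P-1]
            this.b = Math.floorMod(rnd.nextLong(), P);         // b in [0, P-1]
            this.m = m;
        }
        int apply(int k) { return (int) (((a * k + b) % P) % m); }
    }

    // Secondary table of size n_j^2, rebuilt until it has no collisions.
    private static final class Secondary {
        Hash h;
        Integer[] slots;
        Secondary(List<Integer> keys, Random rnd) {
            int m = Math.max(1, keys.size() * keys.size());
            boolean ok;
            do {
                h = new Hash(rnd, m);
                slots = new Integer[m];
                ok = true;
                for (int k : keys) {
                    int s = h.apply(k);
                    if (slots[s] != null) { ok = false; break; }
                    slots[s] = k;
                }
            } while (!ok);
        }
        boolean contains(int k) {
            Integer x = slots[h.apply(k)];
            return x != null && x == k;
        }
    }

    private final Hash primary;
    private final Secondary[] secondaries;

    public PerfectHashSketch(int[] keys, long seed) {
        Random rnd = new Random(seed);
        int[] distinct = Arrays.stream(keys).distinct().toArray(); // duplicates would never separate
        for (int k : distinct) {
            if (k < 0 || k >= P) throw new IllegalArgumentException("key out of range: " + k);
        }
        int n = Math.max(1, distinct.length);
        Hash h;
        List<List<Integer>> buckets;
        // Redraw the primary hash until total secondary space (sum of squares) is <= 4n.
        do {
            h = new Hash(rnd, n);
            buckets = new ArrayList<>();
            for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());
            for (int k : distinct) buckets.get(h.apply(k)).add(k);
        } while (sumOfSquares(buckets) > 4L * n);
        primary = h;
        secondaries = new Secondary[n];
        for (int i = 0; i < n; i++) secondaries[i] = new Secondary(buckets.get(i), rnd);
    }

    private static long sumOfSquares(List<List<Integer>> buckets) {
        long s = 0;
        for (List<Integer> b : buckets) s += (long) b.size() * b.size();
        return s;
    }

    // Two hash evaluations, no probing: worst-case constant-time lookup.
    public boolean contains(int k) {
        if (k < 0 || k >= P) return false;
        return secondaries[primary.apply(k)].contains(k);
    }
}
```

The expensive, randomized part happens once in the constructor; after that, lookups never probe or walk a chain, which is why perfect hashing suits a key set that is known in advance and does not change.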

4. Experiments

Formulate a question about the performance of these hashing techniques and, as in Project 3, design and implement an experiment, run it, and interpret the results.

Be sure you start by formulating a specific question. There are several variables you could test and several questions you could investigate:

How do the dynamic hash map strategies compare to each other, and/or to the naive list-map approach?
How does perfect hashing compare to the dynamic strategies? (Make sure it's a "fair comparison": consider the circumstances in which perfect hashing makes sense.)
How does choice of hash function affect the performance of the dynamic hash tables? This may involve some research on your part into different kinds of hash functions, implementing them, and parameterizing the hash tables to use a new hash function. (Perfect hashing as we're learning it depends on the H_pm class.)
How does rehashing affect performance? Can you get an improvement in performance by making separate chaining or linear probing rehash more aggressively? This would require you to modify the code to force more rehashes. Make sure you include the cost of rehashing in your comparison.
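For the hash-function question, one simple static measure is the largest number of keys that land in any single bucket. The sketch below compares the division method (k mod m) with Fibonacci hashing, a multiplicative method that keeps the top r bits of k · ⌊2^32/φ⌋ computed mod 2^32, on keys that are all multiples of 16 and a table of 2^6 = 64 buckets. The class name HashSpreadSketch, the key pattern, and the table size are illustrative assumptions; to use a function like this in the project's tables, you would need to parameterize them as the text describes.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical static-measure experiment: how evenly do two hash functions
// spread a patterned key set over 2^R buckets?
public class HashSpreadSketch {
    static final int R = 6;       // table has 2^R = 64 buckets (assumed)
    static final int M = 1 << R;

    // Division method: k mod M. With M a power of two this keeps only the low R bits.
    static final IntUnaryOperator DIVISION = k -> Math.floorMod(k, M);

    // Fibonacci hashing: top R bits of k * 0x9E3779B9 (about 2^32 / golden ratio), mod 2^32.
    static final IntUnaryOperator FIBONACCI = k -> (k * 0x9E3779B9) >>> (32 - R);

    // Largest number of keys that land in any one bucket.
    static int maxLoad(int[] keys, IntUnaryOperator h) {
        int[] counts = new int[M];
        for (int k : keys) counts[h.applyAsInt(k)]++;
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        return max;
    }

    public static void main(String[] args) {
        int[] keys = new int[1000];
        for (int i = 0; i < keys.length; i++) keys[i] = 16 * i; // low 4 bits always zero
        System.out.println("division  max load: " + maxLoad(keys, DIVISION)); // 250
        System.out.println("fibonacci max load: " + maxLoad(keys, FIBONACCI));
    }
}
```

Because every key is a multiple of 16, k mod 64 can only be 0, 16, 32, or 48, so the division method crowds all 1000 keys into 4 buckets of 250 each; the multiplicative method mixes the high-order bits into the bucket index and spreads them much more evenly. Patterned keys like these are exactly where the choice of hash function shows up.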

You may study these with either dynamic (i.e., running time) or static (e.g., how evenly keys are distributed) measures, or a combination of them: whatever makes sense to answer the question you're investigating.
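For dynamic measures, a harness should at least warm up the JVM, repeat each trial, report a median, and time the whole workload, so that any rehashing triggered by the inserts is counted. The sketch below does this with java.util.HashMap as a stand-in for the project's maps, varying its load factor (set through the real HashMap(int initialCapacity, float loadFactor) constructor) as a rough analogue of "rehash more aggressively." The class name TimingSketch, the key count, the seed, and the trial counts are assumptions; in your experiment you would pass suppliers for the project's map classes instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical timing harness; java.util.HashMap stands in for the project's maps.
public class TimingSketch {
    static volatile long sink; // consumes lookup results so the JIT cannot drop them

    // Median time in nanoseconds to insert every key into a fresh map and then look
    // each one up. Rehashes triggered by the inserts fall inside the timed region.
    static long medianNanos(Supplier<Map<Integer, Integer>> factory, int[] keys, int trials) {
        long[] times = new long[trials];
        for (int t = 0; t < trials; t++) {
            Map<Integer, Integer> map = factory.get();
            long start = System.nanoTime();
            for (int k : keys) map.put(k, k);
            long sum = 0;
            for (int k : keys) sum += map.get(k);
            times[t] = System.nanoTime() - start;
            sink += sum;
        }
        Arrays.sort(times);
        return times[trials / 2];
    }

    public static void main(String[] args) {
        int[] keys = new Random(345).ints(200_000).toArray(); // fixed seed: same keys every run
        medianNanos(HashMap::new, keys, 5);                   // warm-up rounds, results discarded
        for (float lf : new float[] {0.25f, 0.5f, 0.75f}) {
            long ns = medianNanos(() -> new HashMap<>(16, lf), keys, 11);
            System.out.printf("load factor %.2f: %.2f ms%n", lf, ns / 1e6);
        }
    }
}
```

Using the median rather than the mean keeps one garbage-collection pause from skewing a result, and starting every configuration from the same small initial capacity means that each one pays for all of its own rehashes.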

Write a report stating the question you investigated, explaining your methodology, presenting your results, and drawing conclusions from them.

It seemed on Project 3 that many of you had trouble identifying a specific question and/or isolating the variables you're testing. Feel free to run your idea by me ahead of time or ask for help on experiment design.

5. Turn-in

Copy the following files:

LinProbHashMap.java
PerfectHashMap.java
The code you wrote to run or support your experiments
Your report (pdf is preferred)

...to /cslab.all/ubuntu/cs345/turnin/(your id)/proj5

Due Monday, Apr 6, 5:00 pm.

Thomas VanDrunen

Last modified: Tue Mar 24 10:56:35 CDT 2015