Lab 11: Implementing a hash table in C

The goal of this lab is to practice implementing a hash table (or hash map), a final topic in the theme of abstract data types and their implementations.

This lab has some similarities to Project 3 (linked list implementation of a map in Java) and Lab 8 (sorted list map with generics, also in Java). The difference is that it will be in C and, generally, a little harder.

1. Introduction

Refer to the pre-lab reading as necessary to review the explanation of this lab.

2. Set up

Copy starter code from the public folder for this lab:

cp -r ~tvandrun/Public/cs245/lab11/* .

You will get the following files:

3. Inspecting the given code

Even though you won't be doing any coding in this phase or getting anything to work, it is very important to work through the code you're given carefully. If you don't understand what's going on in that code, ask about it before you move on to the next phase.

First look at hashmap.h and driver.c to consider the interface of the hashmap and how it is used by the driver. The structs are documented carefully; make sure you understand them thoroughly.

One thing to notice in particular is the prototype for the function keys(). This function allocates and returns an array containing all of the keys in the hashmap. This is analogous to iterating over the key set of Java's HashMap class. But since C doesn't have iterators, keys() instead returns an array of keys.

Now, in hashmap.c, consider the first four functions, already complete. The function hash() takes a string and the number of buckets and computes a hash for that string using the "ASCII sum" approach described earlier.
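For readers who want to see the "ASCII sum" idea in code, here is a minimal sketch. The parameter types (char * and int) are assumptions. The real hash() in hashmap.c is the authority and may differ in details.

```c
/* ASCII-sum hash (sketch): add up the character codes of s, then reduce
   the total modulo the number of buckets so the result is a valid
   bucket index in 0 .. numBuckets-1. */
int hash(char *s, int numBuckets) {
    unsigned int sum = 0;
    for (int i = 0; s[i] != '\0'; i++)
        sum += (unsigned char) s[i];   /* treat each char as 0..255 */
    return (int) (sum % (unsigned int) numBuckets);
}
```

For example, hash("ab", 10) is (97 + 98) % 10 = 5, and hash("ba", 10) is also 5. Any two anagrams land in the same bucket, which is one weakness of this hash.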

The function create() takes a size for the array (or, number of buckets), and allocates the parts of this map. Make sure you understand why and how we are allocating both a hash map and an array of nodes, but no actual nodes. (The bucketSizes array, which is another field of the hashmap, will be used by the monitor() function.)
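The sketch below shows one way the allocation in create() can look. It uses one malloc for the map struct and one for the array of bucket pointers. Assuming create() also sets up bucketSizes, it uses a third for the counts. It allocates no nodes, so every bucket starts as an empty (NULL) chain. The struct layout, the typedef names, and the field names buckets, numBuckets and numKeys are hypothetical stand-ins for whatever hashmap.h actually defines. Only bucketSizes comes from the lab text.

```c
#include <stdlib.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node {
    char *key;           /* points at a statically allocated string */
    char *value;
    struct node *next;   /* next node in the same bucket's chain */
} node;

typedef struct hashmap {
    node **buckets;      /* array of chain heads, one per bucket */
    int *bucketSizes;    /* items per bucket, for monitor() */
    int numBuckets;
    int numKeys;
} hashmap;

/* Allocates the map and its arrays, but no nodes.
   Error checking of malloc is omitted for brevity. */
hashmap *create(int numBuckets) {
    hashmap *map = malloc(sizeof(hashmap));
    map->buckets = malloc(numBuckets * sizeof(node *));
    map->bucketSizes = malloc(numBuckets * sizeof(int));
    for (int i = 0; i < numBuckets; i++) {
        map->buckets[i] = NULL;      /* empty chain */
        map->bucketSizes[i] = 0;
    }
    map->numBuckets = numBuckets;
    map->numKeys = 0;
    return map;
}
```

The buckets are set to NULL in an explicit loop rather than relying on calloc. The C standard does not promise that an all-zero bit pattern is a null pointer, although it is one on common platforms.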

The function getNode() is like the method of the same name you wrote in Project 3. It's a helper function that finds a node, given a key. This one is more complicated than the one in Project 3, though, because we first need to find the right bucket (using hash()) and then search that bucket. Note how strcmp is used to determine whether the key of the current node matches the key we're looking for.

Finally, containsKey() uses getNode(). It determines whether or not there exists an association for a given key.
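For reference, here is a sketch of how getNode() and containsKey() can fit together. It uses a hypothetical struct layout (the field names buckets, numBuckets and numKeys are assumptions) and an ASCII-sum hash(). The given hashmap.c is the authority on the real signatures, for instance whether these functions take the map as their first parameter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node {
    char *key;
    char *value;
    struct node *next;
} node;

typedef struct hashmap {
    node **buckets;
    int *bucketSizes;
    int numBuckets;
    int numKeys;
} hashmap;

/* ASCII-sum hash, as described in the lab. */
int hash(char *s, int numBuckets) {
    unsigned int sum = 0;
    for (int i = 0; s[i] != '\0'; i++)
        sum += (unsigned char) s[i];
    return (int) (sum % (unsigned int) numBuckets);
}

/* Find the bucket the key must be in, then walk only that chain,
   comparing keys by content with strcmp (== would compare addresses). */
node *getNode(hashmap *map, char *key) {
    node *cur = map->buckets[hash(key, map->numBuckets)];
    while (cur != NULL && strcmp(cur->key, key) != 0)
        cur = cur->next;
    return cur;                     /* NULL if the key is absent */
}

/* 1 if there is an association for key, 0 otherwise. */
int containsKey(hashmap *map, char *key) {
    return getNode(map, key) != NULL;
}
```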

You can run the driver program by making and running:

make driver
./driver

...but it doesn't work because you haven't written the hash map yet. (But it does compile and doesn't crash.) You can also compile and run the test program:

make test
./test

The first test, empty containsKey, works out of the box. The others fail. As you do the phases of this lab, make sure earlier tests that were working don't fail, and at no point should your lab crash with a segmentation fault.

4. The put() function

Your turn. Implement put(). If a node for the given key already exists, find it and replace its value. Otherwise, allocate a new node (using malloc), set its fields, and put the node in the appropriate bucket. As usual for a linked list, the easiest place to add is at the head. Also make sure the number of keys is updated.

If you were doing this in Java, it would look something like this:

        Node oldAssoc = getNode(key);
        if (oldAssoc != null)
            oldAssoc.value = val;
        else
            buckets[hash(key)] = new Node(key, val, buckets[hash(key)]);

The line buckets[hash(key)] = new Node(key, val, buckets[hash(key)]); should remind you of lines like head = new Node(item, head);. Of course, you're not writing in Java, so you will have to "translate" all this to C.

If you do this correctly, the second test, put containsKey, will work.
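One possible C translation of the Java sketch above is shown below. It assumes a hypothetical struct layout (the field names buckets, numBuckets and numKeys are assumptions) plus hash() and getNode() helpers like the ones in hashmap.c. It stores the key and value pointers directly rather than copying the strings, because the keys and values are statically allocated. This version does not touch bucketSizes, which only matters for the monitor() option.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node { char *key; char *value; struct node *next; } node;
typedef struct hashmap {
    node **buckets; int *bucketSizes; int numBuckets; int numKeys;
} hashmap;

int hash(char *s, int numBuckets) {            /* ASCII-sum hash */
    unsigned int sum = 0;
    for (int i = 0; s[i] != '\0'; i++)
        sum += (unsigned char) s[i];
    return (int) (sum % (unsigned int) numBuckets);
}

node *getNode(hashmap *map, char *key) {
    node *cur = map->buckets[hash(key, map->numBuckets)];
    while (cur != NULL && strcmp(cur->key, key) != 0)
        cur = cur->next;
    return cur;
}

void put(hashmap *map, char *key, char *value) {
    node *old = getNode(map, key);
    if (old != NULL) {
        old->value = value;              /* existing key: replace value */
        return;
    }
    int b = hash(key, map->numBuckets);
    node *n = malloc(sizeof(node));      /* error check omitted */
    n->key = key;                        /* store pointers, not copies */
    n->value = value;
    n->next = map->buckets[b];           /* like new Node(key, val, buckets[b]) */
    map->buckets[b] = n;                 /* new node becomes the chain head */
    map->numKeys++;
}
```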

5. The get() function

The next function, get(), is much easier. Again you can use getNode(). If there is no association for the given key, get() should return NULL.

If you do this correctly, tests 1 through 4 should work, the new ones being put get and put replace. Note that these tests also further exercise your code for put(); if they fail, the problem might be in the previous step.
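A sketch of get() in terms of getNode() follows. It uses the same kind of hypothetical struct layout and ASCII-sum hash() as the lab describes; the field names are assumptions. Because get() returns NULL for a missing key, a caller cannot tell "absent" apart from "mapped to NULL". That is one reason containsKey() exists as a separate function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node { char *key; char *value; struct node *next; } node;
typedef struct hashmap {
    node **buckets; int *bucketSizes; int numBuckets; int numKeys;
} hashmap;

int hash(char *s, int numBuckets) {            /* ASCII-sum hash */
    unsigned int sum = 0;
    for (int i = 0; s[i] != '\0'; i++)
        sum += (unsigned char) s[i];
    return (int) (sum % (unsigned int) numBuckets);
}

node *getNode(hashmap *map, char *key) {
    node *cur = map->buckets[hash(key, map->numBuckets)];
    while (cur != NULL && strcmp(cur->key, key) != 0)
        cur = cur->next;
    return cur;
}

/* The value associated with key, or NULL when there is none. */
char *get(hashmap *map, char *key) {
    node *n = getNode(map, key);
    return n == NULL ? NULL : n->value;
}
```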

6. The keys() function

Now write keys(), which allocates (with calloc) an array to hold strings, populates it with the keys, and returns it. Note the return type, char**: a pointer to pointer to char, that is, an array of pointers to the beginnings of arrays of chars. Filling the array means looping through the array of buckets and, for each bucket, looping through all of its nodes.

Note it is the responsibility of the code calling this function to deallocate the array. You can see that deallocation in driver.c, for example.

When this is done, tests 1 through 5 should work, the new one being populated keys.
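Here is a minimal sketch of keys(), assuming a hypothetical struct layout whose field names (buckets, numBuckets, numKeys) are guesses at what hashmap.h provides. The array holds the same key pointers the nodes hold; no strings are copied.

```c
#include <stdlib.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node { char *key; char *value; struct node *next; } node;
typedef struct hashmap {
    node **buckets; int *bucketSizes; int numBuckets; int numKeys;
} hashmap;

/* Returns a newly allocated array of numKeys key pointers. The caller
   must free() the array itself, but not the strings it points to. */
char **keys(hashmap *map) {
    char **result = calloc(map->numKeys, sizeof(char *));
    int k = 0;
    for (int i = 0; i < map->numBuckets; i++)            /* each bucket */
        for (node *cur = map->buckets[i]; cur != NULL; cur = cur->next)
            result[k++] = cur->key;                      /* each node in it */
    return result;
}
```

The keys come out bucket by bucket, so they are in no particular order. A caller loops over the first numKeys entries and then calls free() once on the array.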

7. The destroy() function

Now write a function that will undo everything done in create(). This function must deallocate all the parts of the hashmap (the individual nodes and the bucket array) as well as the map itself. But do not deallocate the keys and values: these are pointers to strings that are allocated statically.

No new tests will pass, but run the tests and make sure none of them crash.
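A sketch of destroy() follows, under the same hypothetical struct layout (field names are assumptions). Freeing bucketSizes assumes that create() allocated it. The important detail is saving cur->next before freeing cur. Reading a field of a node after free() is undefined behavior and a classic source of crashes in this step.

```c
#include <stdlib.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node { char *key; char *value; struct node *next; } node;
typedef struct hashmap {
    node **buckets; int *bucketSizes; int numBuckets; int numKeys;
} hashmap;

/* Frees every node, then the arrays, then the map struct itself.
   Keys and values are not freed: they were never malloc'd. */
void destroy(hashmap *map) {
    for (int i = 0; i < map->numBuckets; i++) {
        node *cur = map->buckets[i];
        while (cur != NULL) {
            node *next = cur->next;   /* read next before freeing cur */
            free(cur);
            cur = next;
        }
    }
    free(map->buckets);
    free(map->bucketSizes);           /* assumes create() allocated it */
    free(map);                        /* last: the fields above live in it */
}
```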

8. One more, your choice

Finally, complete one of the following three tasks. For extra credit, complete two or all of them.

A. The removeKey() function

Remove the association for a given key. You can't use getNode() for this one because you need to find the node that comes before the node containing the key you're looking for. Instead you need to find the bucket where the key is (or would be), and remove the node from that bucket. You'll need to handle the special case where the node you want to remove is the head of the chain in that bucket, and otherwise loop through the list, always looking one step ahead. Again you may want to refer to your code from Project 3. Don't forget to use strcmp() to compare keys.

removeKey() should return the value of the association being removed, or NULL if the key doesn't exist. It should also deallocate the node being removed (but not the key or value).

Tests 6 and 7 (empty remove and populated remove) will test that this is done correctly.
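Below is a sketch of the head-special-case, look-one-step-ahead approach described above. It assumes a hypothetical struct layout (field names are guesses) and an ASCII-sum hash(). The name removeKey() rather than remove() also avoids clashing with remove() from the C standard library's stdio.h. This version leaves bucketSizes alone.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node { char *key; char *value; struct node *next; } node;
typedef struct hashmap {
    node **buckets; int *bucketSizes; int numBuckets; int numKeys;
} hashmap;

int hash(char *s, int numBuckets) {            /* ASCII-sum hash */
    unsigned int sum = 0;
    for (int i = 0; s[i] != '\0'; i++)
        sum += (unsigned char) s[i];
    return (int) (sum % (unsigned int) numBuckets);
}

/* Unlinks and frees the node for key, returning its value, or NULL if
   the key is absent. Only the node is freed, not the key or value. */
char *removeKey(hashmap *map, char *key) {
    int b = hash(key, map->numBuckets);
    node *cur = map->buckets[b];
    if (cur == NULL)
        return NULL;                        /* empty bucket */
    if (strcmp(cur->key, key) == 0) {
        map->buckets[b] = cur->next;        /* special case: the head */
    } else {
        /* look one step ahead so cur stays on the predecessor */
        while (cur->next != NULL && strcmp(cur->next->key, key) != 0)
            cur = cur->next;
        if (cur->next == NULL)
            return NULL;                    /* key not in this chain */
        node *victim = cur->next;
        cur->next = victim->next;           /* bypass the victim */
        cur = victim;
    }
    char *value = cur->value;
    free(cur);
    map->numKeys--;
    return value;
}
```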

B. The rehash() function

Write a function that will make a bigger array to replace the current buckets array, and redistribute the keys into that new buckets array.

Here's how I recommend doing it: Make a new map using create(), giving it an initial size larger than the current size. (How much larger? That's up to you.) Then iterate through all the keys (getting the keys from keys()), adding each key and value to the new map (using put() and get()). Finally, perform "transplant surgery": make the new map's bucket array and other guts the new guts of the old map. Deallocate everything that's no longer in use, especially the old map's nodes and old bucket array, the array returned by keys(), and the new map struct itself. But don't call destroy() on the new map, since that would also deallocate the new map's bucket array, which now belongs to the old map.

The 8th test, populated rehash, will test this.
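The sketch below follows the recommended create/put/get/transplant recipe. Doubling the bucket count is an assumption, since the lab leaves the growth factor up to you. The struct layout and field names are hypothetical. Condensed versions of the helper functions are included only so the block compiles on its own; they are not the lab's given code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node { char *key; char *value; struct node *next; } node;
typedef struct hashmap {
    node **buckets; int *bucketSizes; int numBuckets; int numKeys;
} hashmap;

int hash(char *s, int n) {                     /* ASCII-sum hash */
    unsigned int sum = 0;
    for (int i = 0; s[i] != '\0'; i++) sum += (unsigned char) s[i];
    return (int) (sum % (unsigned int) n);
}

hashmap *create(int n) {
    hashmap *map = malloc(sizeof(hashmap));
    map->buckets = malloc(n * sizeof(node *));
    map->bucketSizes = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) { map->buckets[i] = NULL; map->bucketSizes[i] = 0; }
    map->numBuckets = n;
    map->numKeys = 0;
    return map;
}

node *getNode(hashmap *map, char *key) {
    node *cur = map->buckets[hash(key, map->numBuckets)];
    while (cur != NULL && strcmp(cur->key, key) != 0) cur = cur->next;
    return cur;
}

char *get(hashmap *map, char *key) {
    node *n = getNode(map, key);
    return n == NULL ? NULL : n->value;
}

void put(hashmap *map, char *key, char *value) {
    node *old = getNode(map, key);
    if (old != NULL) { old->value = value; return; }
    int b = hash(key, map->numBuckets);
    node *n = malloc(sizeof(node));
    n->key = key; n->value = value; n->next = map->buckets[b];
    map->buckets[b] = n;
    map->numKeys++;
}

char **keys(hashmap *map) {
    char **result = calloc(map->numKeys, sizeof(char *));
    int k = 0;
    for (int i = 0; i < map->numBuckets; i++)
        for (node *cur = map->buckets[i]; cur != NULL; cur = cur->next)
            result[k++] = cur->key;
    return result;
}

void rehash(hashmap *map) {
    hashmap *bigger = create(2 * map->numBuckets);  /* doubling: an assumption */
    char **ks = keys(map);
    for (int i = 0; i < map->numKeys; i++)
        put(bigger, ks[i], get(map, ks[i]));
    free(ks);                                       /* the keys array is ours */

    for (int i = 0; i < map->numBuckets; i++) {     /* old nodes are now unused */
        node *cur = map->buckets[i];
        while (cur != NULL) { node *next = cur->next; free(cur); cur = next; }
    }
    free(map->buckets);
    free(map->bucketSizes);

    map->buckets = bigger->buckets;                 /* transplant the guts */
    map->bucketSizes = bigger->bucketSizes;
    map->numBuckets = bigger->numBuckets;
    map->numKeys = bigger->numKeys;
    free(bigger);            /* not destroy(): its arrays now belong to map */
}
```

After the transplant, code holding a pointer to the old map keeps working, because the map struct itself never moves. Only its fields change.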

C. The monitor() function

Write the function monitor() that will print out to the screen the number of items in each bucket and the maximum number of items in any bucket. This could be used to monitor how well the items are distributed among the buckets.

You'll notice that the hashmap struct has an array called bucketSizes that can be used to keep track of the number of items in each bucket. Revise the functions put() and removeKey() so that this array is kept current as the hashmap changes. Then it is a simple loop in monitor() to print out the sizes of these buckets as recorded in bucketSizes.

Test 9, populated monitor, should produce output something like this:

9. Populated monitor
0: 2
1: 4
2: 3
3: 1
4: 6
max: 6

It won't indicate pass or fail explicitly.
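A sketch of monitor() that prints in the format of the sample output above is shown below. It assumes that put() and removeKey() keep bucketSizes current: put() adds 1 to bucketSizes[b] when it inserts a new node into bucket b, and removeKey() subtracts 1 when it unlinks one. The helper maxBucketSize() is hypothetical, split out only so the maximum can be checked on its own. The struct layout is likewise a hypothetical stand-in for hashmap.h.

```c
#include <stdio.h>

/* Hypothetical layout standing in for the structs in hashmap.h. */
typedef struct node { char *key; char *value; struct node *next; } node;
typedef struct hashmap {
    node **buckets; int *bucketSizes; int numBuckets; int numKeys;
} hashmap;

/* Hypothetical helper: the largest count recorded in bucketSizes. */
int maxBucketSize(hashmap *map) {
    int max = 0;
    for (int i = 0; i < map->numBuckets; i++)
        if (map->bucketSizes[i] > max)
            max = map->bucketSizes[i];
    return max;
}

/* Prints "bucket: count" for each bucket, then the maximum count. */
void monitor(hashmap *map) {
    for (int i = 0; i < map->numBuckets; i++)
        printf("%d: %d\n", i, map->bucketSizes[i]);
    printf("max: %d\n", maxBucketSize(map));
}
```

With bucketSizes holding 2, 4, 3, 1, 6 across five buckets, this prints exactly the sample output above. Reading the counts from bucketSizes keeps monitor() a single pass over an int array instead of a walk through every chain.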


Thomas VanDrunen
Last modified: Mon Nov 16 10:43:39 CST 2015