Lab 10: Implementing a hash map in C

The goal of this lab is to practice implementing a hash table and to reinforce your understanding of pointers and dynamic memory in C.

1. Set up

Copy the given code to an appropriate directory:

cp ~tvandrun/Public/cs245/lab10/* .

2. Introduction

In this lab, you will implement a hash map in C, similar to the StringHashMap written in Java that we saw in class.

However, your hash map will have one extra feature. When your code detects that the buckets are too full, either because they are imbalanced or simply because the hashmap as a whole is too full, it will rehash: make a bigger array of buckets and redistribute the items into it.

First you need to become familiar with the given code (there is a lot of it). The program driver.c exercises your hashmap by using it to associate countries with their capitals. It then tests containsKey() positively and negatively, tests rem() (remove), and iterates over the hashmap.

(If you look at the list of countries, you'll notice some entries like Northern Ireland, Puerto Rico, Transnistria, and Palestine, which for various reasons (status, independence, recognition, etc.) raise the question of how we define "country." I simply took this list from Wikipedia's list of national capitals, which you can check yourself if you're interested in the political status of any of the nations included. No political message is intended on my part by including or excluding any entity.)

3. The struct

Your first task is to understand the structs hashmap_t and node_t in the file hashmap.h. They differ somewhat from the StringHashMap class we saw in class. For example, since C arrays do not carry their own length, the struct must store the number of buckets itself. It also holds the number of items in each bucket (itself an array) and the total number of items, so that we can monitor how balanced the hashmap is.
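Since hashmap.h is not reproduced here, the sketch below assumes one plausible layout; every field name (buckets, bucketSizes, numBuckets, numItems) and the helper hm_counts_consistent() are hypothetical, not the lab's actual code. It shows why the extra bookkeeping matters: the per-bucket counts and the total must stay in agreement with the linked lists, and a small checker like this can be handy while debugging put(), rem(), and rehash().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; field names are illustrative, not from hashmap.h. */
typedef struct node {
    char *key;            /* e.g. a country name */
    char *value;          /* e.g. its capital */
    struct node *next;    /* next node in the same bucket */
} node_t;

typedef struct {
    node_t **buckets;     /* array of linked-list heads */
    int *bucketSizes;     /* number of items in each bucket */
    int numBuckets;       /* length of both arrays (C arrays don't store it) */
    int numItems;         /* total number of items in the map */
} hashmap_t;

/* Returns 1 if every bucket's stored size matches its list length and the
 * sizes add up to numItems; returns 0 otherwise. */
int hm_counts_consistent(const hashmap_t *map) {
    int total = 0;
    for (int i = 0; i < map->numBuckets; i++) {
        int len = 0;
        for (const node_t *n = map->buckets[i]; n != NULL; n = n->next)
            len++;
        if (len != map->bucketSizes[i])
            return 0;
        total += len;
    }
    return total == map->numItems;
}
```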

4. create()

Read and understand the create() function, comparing it with the hashmap_t struct.
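For comparison while reading, here is a minimal sketch of what a create() function for the hypothetical layout above might do; hm_create() and its single numBuckets parameter are assumptions, not the lab's signature. The key points are that the struct and both arrays are separate allocations, and that every bucket must start out as an empty list with a count of zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout, repeated so this block stands alone. */
typedef struct node { char *key; char *value; struct node *next; } node_t;
typedef struct {
    node_t **buckets;     /* array of linked-list heads */
    int *bucketSizes;     /* number of items in each bucket */
    int numBuckets;
    int numItems;
} hashmap_t;

/* Allocate an empty map with the given number of buckets.
 * Returns NULL if any allocation fails. */
hashmap_t *hm_create(int numBuckets) {
    hashmap_t *map = malloc(sizeof *map);
    if (map == NULL)
        return NULL;
    /* calloc zero-fills: each bucket head starts as a null pointer
     * (all-bits-zero is NULL on every mainstream platform) and each
     * bucket size starts at 0. */
    map->buckets = calloc((size_t)numBuckets, sizeof *map->buckets);
    map->bucketSizes = calloc((size_t)numBuckets, sizeof *map->bucketSizes);
    if (map->buckets == NULL || map->bucketSizes == NULL) {
        free(map->buckets);       /* free(NULL) is a harmless no-op */
        free(map->bucketSizes);
        free(map);
        return NULL;
    }
    map->numBuckets = numBuckets;
    map->numItems = 0;
    return map;
}
```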

5. hash()

Read and understand the hash() function, which is similar to the hash() method from StringHashMap.
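As a point of reference, the sketch below assumes a hash in the style of Java's String.hashCode() (a polynomial with multiplier 31) reduced modulo the number of buckets; whether the lab's hash() uses exactly this formula is an assumption, and hm_hash() is a hypothetical name. One C-specific detail: signed integer overflow is undefined behavior in C, so the accumulator is unsigned, which wraps around predictably and never goes negative. That also means no absolute value is needed before taking the remainder.

```c
#include <assert.h>

/* Java-style string hash (multiplier 31), reduced to a bucket index.
 * Unsigned arithmetic wraps on overflow instead of going negative,
 * so the result is always in 0 .. numBuckets-1. */
int hm_hash(const char *key, int numBuckets) {
    unsigned int h = 0;
    for (const char *p = key; *p != '\0'; p++)
        h = 31u * h + (unsigned char)*p;
    return (int)(h % (unsigned int)numBuckets);
}
```

For example, hm_hash("ab", 1000) computes 31 * 97 + 98 = 3105 and returns 105.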

6. getNode(), put(), get(), rem(), and containsKey()

In some ways these are the "easy" ones, because they are similar to the versions in the Java example from class. Read them carefully and ask if there is anything you don't understand. Note that at the end of put(), rehash() is called if the number of items exceeds five times the number of buckets or if the bucket we just added to now holds more than 10 items (we maintain the invariant that no bucket holds more than 10).
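The rehash condition at the end of put() can be isolated as a small predicate, sketched below; hm_needs_rehash() and the two constant names are hypothetical. Note that both comparisons are strict, matching "more than": a map with exactly five items per bucket on average, or a bucket with exactly 10 items, does not trigger a rehash.

```c
#include <assert.h>

#define MAX_AVG_LOAD   5   /* items per bucket, on average */
#define MAX_BUCKET_LEN 10  /* invariant: no bucket holds more than this */

/* Decide, right after an insertion, whether the map must be rehashed.
 * bucketSize is the size of the bucket that just received the item. */
int hm_needs_rehash(int numItems, int numBuckets, int bucketSize) {
    return numItems > MAX_AVG_LOAD * numBuckets || bucketSize > MAX_BUCKET_LEN;
}
```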

7. rehash()

Now for your task, and it's a hard one: write rehash(). Think carefully about how you can make a new set of buckets and redistribute the items.

(It might make your job easier to create a "temporary" hashmap and use your put(), rem(), get(), and keys() functions, but be careful. Ending your function with map = temp will not work, because the parameter map is only a local copy of the caller's pointer. You need to modify the hashmap "object" that the parameter map points to.)
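The pitfall with map = temp can be seen in a tiny, self-contained sketch. Here map_t, make_map(), replace_wrong(), and replace_right() are hypothetical stand-ins, stripped down to one array and its length. The wrong version reassigns only its local copy of the pointer; the right version writes through the pointer, copying temp's fields into the caller's struct and then freeing just temp's struct, since its array now belongs to map.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for hashmap_t: just enough to show the point. */
typedef struct {
    int *buckets;     /* stand-in for the bucket array */
    int numBuckets;
} map_t;

map_t *make_map(int numBuckets) {
    map_t *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->numBuckets = numBuckets;
    m->buckets = calloc((size_t)numBuckets, sizeof *m->buckets);
    return m;
}

/* Does NOT work: 'map' is a copy of the caller's pointer, so this
 * assignment changes only the local copy. */
void replace_wrong(map_t *map, map_t *temp) {
    map = temp;
    (void)map;
}

/* Works: modify the struct the caller's pointer refers to. */
void replace_right(map_t *map, map_t *temp) {
    free(map->buckets);   /* the old array is no longer in use */
    *map = *temp;         /* copy temp's fields, including its array pointer */
    free(temp);           /* free temp's struct only; map now owns the array */
}
```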

Also, don't forget to free things no longer in use.
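One possible approach is sketched below under the same hypothetical layout; it is an alternative to the temporary-hashmap idea, not the required solution. Instead of calling put() again, it relinks each existing node into a new, larger bucket array, so no nodes or strings are reallocated, then frees the old arrays and updates the caller's struct through the pointer. The growth policy (2n + 1 buckets) and the names hm_hash() and hm_rehash() are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout, repeated so this block stands alone. */
typedef struct node { char *key; char *value; struct node *next; } node_t;
typedef struct {
    node_t **buckets;
    int *bucketSizes;
    int numBuckets;
    int numItems;
} hashmap_t;

/* Assumed Java-style hash, reduced to a bucket index. */
static int hm_hash(const char *key, int numBuckets) {
    unsigned int h = 0;
    for (const char *p = key; *p != '\0'; p++)
        h = 31u * h + (unsigned char)*p;
    return (int)(h % (unsigned int)numBuckets);
}

/* Move every node into a larger bucket array by relinking it. */
void hm_rehash(hashmap_t *map) {
    int newCount = map->numBuckets * 2 + 1;   /* assumed growth policy */
    node_t **newBuckets = calloc((size_t)newCount, sizeof *newBuckets);
    int *newSizes = calloc((size_t)newCount, sizeof *newSizes);
    if (newBuckets == NULL || newSizes == NULL) {
        free(newBuckets);                     /* leave map unchanged */
        free(newSizes);
        return;
    }
    for (int i = 0; i < map->numBuckets; i++) {
        node_t *cur = map->buckets[i];
        while (cur != NULL) {
            node_t *next = cur->next;         /* save before relinking */
            int j = hm_hash(cur->key, newCount);
            cur->next = newBuckets[j];        /* push onto front of bucket j */
            newBuckets[j] = cur;
            newSizes[j]++;
            cur = next;
        }
    }
    free(map->buckets);                       /* old arrays no longer in use */
    free(map->bucketSizes);
    map->buckets = newBuckets;                /* write through the pointer */
    map->bucketSizes = newSizes;
    map->numBuckets = newCount;               /* numItems is unchanged */
}
```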

8. keys()

Since C has no equivalent to iterators (unless you're really clever), I've specified this project to include a function that returns an array of all the keys. Look at the driver to see how it is used. Note that this function must allocate a new array, and it is the driver's responsibility to free it.
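A minimal sketch of such a function, again under the hypothetical layout, is below; hm_keys() is an illustrative name. It assumes the array holds pointers to the keys already stored in the nodes rather than copies, so the caller frees only the array, not the strings. Because the map tracks numItems, the caller knows the array's length without a sentinel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout, repeated so this block stands alone. */
typedef struct node { char *key; char *value; struct node *next; } node_t;
typedef struct {
    node_t **buckets;
    int *bucketSizes;
    int numBuckets;
    int numItems;
} hashmap_t;

/* Return a newly allocated array of the map's numItems keys, in bucket
 * order. Entries point at the keys stored in the nodes (not copies).
 * The caller must free() the array, but not the keys. */
char **hm_keys(const hashmap_t *map) {
    char **keys = malloc((size_t)map->numItems * sizeof *keys);
    if (keys == NULL && map->numItems > 0)
        return NULL;
    int k = 0;
    for (int i = 0; i < map->numBuckets; i++)
        for (node_t *n = map->buckets[i]; n != NULL; n = n->next)
            keys[k++] = n->key;
    return keys;
}
```

A caller would loop from 0 to numItems - 1 over the result and then call free() on the array once it is done.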

9. destroy()

Finally, there's a lot to clean up (and null out): the nodes, the array (or arrays), and the entire struct.
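The cleanup order matters, as the sketch below shows under the hypothetical layout (hm_destroy() is an illustrative name). Each node's next pointer must be read before the node is freed, the bucket lists must be freed before the arrays that hold them, and the struct goes last. It assumes the map does not own its key and value strings, so they are not freed here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout, repeated so this block stands alone. */
typedef struct node { char *key; char *value; struct node *next; } node_t;
typedef struct {
    node_t **buckets;
    int *bucketSizes;
    int numBuckets;
    int numItems;
} hashmap_t;

/* Free every node, both arrays, and the struct itself.
 * Assumption: key/value strings are not owned by the map. */
void hm_destroy(hashmap_t *map) {
    if (map == NULL)
        return;
    for (int i = 0; i < map->numBuckets; i++) {
        node_t *cur = map->buckets[i];
        while (cur != NULL) {
            node_t *next = cur->next;   /* read before freeing cur */
            free(cur);
            cur = next;
        }
    }
    free(map->buckets);
    free(map->bucketSizes);
    free(map);
}
```

For the same reason that map = temp fails in rehash(), hm_destroy() cannot null out the caller's variable, so the caller should do it: hm_destroy(map); map = NULL;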


Thomas VanDrunen
Last modified: Thu Mar 29 10:01:30 CDT 2012