The goal of this lab is to practice implementing a hash table (or hash map), a final topic in the theme of abstract data types and their implementations.
This lab has some similarities to Project 3 (linked list implementation of a map in Java) and Lab 8 (sorted list map with generics, also in Java). The difference is that it will be in C and, generally, a little harder.
Refer to the pre-lab reading as necessary to review the explanation of this lab.
Copy starter code from the public folder for this lab:
cp -r ~tvandrun/Public/cs245/lab11/* .
You will get the following files:
hashmap.h
, a header file containing the struct for
the map itself (holding the array itself, the length, etc.),
the struct for the nodes, and the prototypes for the operations.
hashmap.c
, the implementation file, which is the
file you will work with.
driver.c
, a program to test out the hashmap.
This is included mainly to help with your intuition by showing you
the hashmap in use.
You also can use it for hand-testing and debugging.
test.c
, a program to do the "real" testing of
the hashmap, analogous to a JUnit testcase for a Java lab.
makefile
, a makefile for this lab
Even though you won't be doing any coding in this phase or getting anything to work, it is very important to work through the code you're given carefully. If you don't understand what's going on in the code you're given, ask about it before you start on the next stuff.
First look at hashmap.h
and driver.c
to consider the interface of the hashmap and how it is
used by the driver.
The structs are documented carefully; make sure you understand
them thoroughly.
One thing to notice in particular is the prototype for the
function keys()
.
This function allocates and returns an array containing all of
the keys in the hashmap.
This is analogous to an iterator of the keyset in
Java's HashMap
class.
But since C doesn't have iterators, we instead pass an array
of keys.
Now, in hashmap.c
, consider the first
four functions, already complete.
The function hash()
takes a string and the
number of buckets and computes a hash for that string using
the "ASCII sum" approach described earlier.
The function create()
takes a size for
the array (or, number of buckets), and allocates the parts of
this map.
Make sure you understand why and how we are allocating both a hash
map and an array of nodes, but no actual nodes.
(The bucketSizes
array, which is another field of
the hashmap, will be used by the monitor()
function.)
The function getNode()
is like the method of the same
name you wrote in Project 3---it's a helper function
that finds a node, given a key.
This is more complicated than the one in Project 3, though,
because we first need to find the right bucket (using hash()
),
and then search that bucket.
Note how strcmp
is used to determine whether
the key of the current node matches the key we're looking for.
Finally, containsKey()
uses getNode()
.
It determines whether or not there exists an association for a given key.
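To make the description above concrete, here is a rough sketch of what those four functions might look like. The struct and function names (sketch_node, sketch_map, sketch_hash, and so on) are made up for illustration and are not the ones in hashmap.h; the real structs also have a bucketSizes field, which is left out here. Treat the code in hashmap.c as the authority.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the structs in hashmap.h; the real field
   names may differ. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

typedef struct {
    sketch_node **buckets;  /* array of chain heads, one per bucket */
    int numBuckets;
    int numKeys;
} sketch_map;

/* "ASCII sum" hash: add up the character codes, then reduce the sum
   modulo the number of buckets. */
int sketch_hash(const char *key, int numBuckets) {
    unsigned int sum = 0;
    for (const char *p = key; *p != '\0'; p++)
        sum += (unsigned char) *p;
    return (int) (sum % (unsigned int) numBuckets);
}

/* Allocate the map and a zeroed array of chain heads, but no nodes yet.
   This sketch simply asserts if an allocation fails. */
sketch_map *sketch_create(int numBuckets) {
    sketch_map *map = malloc(sizeof(sketch_map));
    assert(map != NULL);
    map->buckets = calloc(numBuckets, sizeof(sketch_node *));
    assert(map->buckets != NULL);
    map->numBuckets = numBuckets;
    map->numKeys = 0;
    return map;
}

/* Find the right bucket, then walk its chain, comparing keys by content
   with strcmp rather than by pointer. */
sketch_node *sketch_getNode(sketch_map *map, const char *key) {
    sketch_node *current = map->buckets[sketch_hash(key, map->numBuckets)];
    while (current != NULL && strcmp(current->key, key) != 0)
        current = current->next;
    return current;  /* NULL if the key is absent */
}

int sketch_containsKey(sketch_map *map, const char *key) {
    return sketch_getNode(map, key) != NULL;
}
```

Notice that every bucket starts out as NULL (calloc zeroes the array), which is why containsKey() on an empty map can work before any nodes exist.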
You can run the driver program by making and running:
make driver
./driver
...but it doesn't work because you haven't written the hash map yet. (But it does compile and doesn't crash.) You can also compile and run the test program:
make test
./test
The first test, empty containsKey
, works out of the box.
The others fail.
As you do the phases of this lab, make sure earlier tests
that were working don't fail, and at no point should
your lab crash with a segmentation fault.
put() function
Your turn.
Implement put()
.
If a node for the given key already exists, find it
and replace the value.
Otherwise, allocate (using malloc
)
and set the variables for a new node,
and put the node in the appropriate bucket.
As usual for a linked list, the easiest place to add is at the head.
Also make sure the number of keys is updated.
If you were doing this in Java, it would look something like this:
Node oldAssoc = getNode(key);
if (oldAssoc != null)
    oldAssoc.value = val;
else
    buckets[hash(key)] = new Node(key, val, buckets[hash(key)]);
Where the line
buckets[hash(key)] = new Node(key, val, buckets[hash(key)]);
should remind you of lines like head = new Node(item, head);
.
Of course, you're not writing in Java, so you will have to "translate"
all this to C.
If you do this correctly, the second test,
put containsKey
, will work.
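As a hint for the translation, here is a sketch of how the Java idiom head = new Node(item, head) might look in C on a plain linked list. It is not put() itself: the names sketch_node, sketch_newNode and sketch_pushFront are made up for illustration, and the node struct in hashmap.h may have different fields.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; not the one declared in hashmap.h. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

/* C counterpart of Java's `new Node(key, val, next)`: there is no
   constructor, so malloc the node and set each field by hand.
   This sketch simply asserts if the allocation fails. */
sketch_node *sketch_newNode(char *key, char *value, sketch_node *next) {
    sketch_node *node = malloc(sizeof(sketch_node));
    assert(node != NULL);
    node->key = key;
    node->value = value;
    node->next = next;
    return node;
}

/* Counterpart of `head = new Node(item, head)`. The caller passes the
   address of the head pointer, which could be one slot of a bucket array. */
void sketch_pushFront(sketch_node **head, char *key, char *value) {
    *head = sketch_newNode(key, value, *head);
}
```

In the hash map, the "head" is the entry of the bucket array selected by hash(), so the same pattern applies with that array slot in place of head.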
get() function
The next function, get(), is much easier.
Again you can use getNode()
.
If there is no such association for the given key,
this should return NULL
.
If you do this correctly, the tests through #4 should work,
the new ones being put get
and put replace
.
But note that these tests also further exercise your code for put();
if they fail, the problem might be in the previous step.
keys() function
Now write a function that will
allocate (with calloc
) an array to hold strings
(note the return type
is char**
---pointer to pointer to char, that is,
array of pointers to (beginnings of) arrays of chars);
populate it with the keys, and return it.
This will mean looping through the array of buckets and, for each
bucket, looping through all the nodes.
Note it is the responsibility of the code calling this function
to deallocate the array.
You can see that deallocation in driver.c
, for example.
When this is done, the tests through 5, populated keys
,
should work.
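The nested loop described above might be sketched like this. The function sketch_keys and its parameters are hypothetical (the real keys() presumably takes the hashmap itself), so take it as an illustration of the two loops and of who owns the array, not as the required signature.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; not the one declared in hashmap.h. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

/* Collect every key from an array of chains into a freshly calloc'd array.
   The caller owns the returned array and must free() it, but must not
   free the strings it points to. */
char **sketch_keys(sketch_node **buckets, int numBuckets, int numKeys) {
    /* Ask for at least one slot so calloc is never called with a zero count. */
    char **result = calloc(numKeys > 0 ? numKeys : 1, sizeof(char *));
    assert(result != NULL);
    int next = 0;  /* next free position in result */
    for (int b = 0; b < numBuckets; b++)                          /* each bucket */
        for (sketch_node *n = buckets[b]; n != NULL; n = n->next)  /* each node */
            result[next++] = n->key;
    assert(next == numKeys);  /* the key count and the chains should agree */
    return result;
}
```

The keys come out in bucket order, not in the order they were added, so code that uses the array should not depend on any particular ordering.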
destroy() function
Now write a function that will undo everything done in create().
This function must deallocate all the parts of the hashmap
(the individual nodes and the bucket array) as well as the
map itself.
But do not deallocate the keys and values--these are pointers
to strings that are allocated statically.
No new tests will pass, but run the tests and make sure none of them crash.
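The one trap in freeing a linked chain is reading a node's next pointer after the node has been freed. Here is a sketch of a safe order of operations, using made-up types (sketch_node, sketch_map) that only approximate the ones in hashmap.h. The return value is there only so the sketch can be checked; the real destroy() need not return anything.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the structs in hashmap.h. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

typedef struct {
    sketch_node **buckets;
    int numBuckets;
    int numKeys;
} sketch_map;

/* Free every node, then the bucket array, then the map itself. The keys
   and values are left alone. Returns the number of nodes freed. */
int sketch_destroy(sketch_map *map) {
    assert(map != NULL);
    int freed = 0;
    for (int b = 0; b < map->numBuckets; b++) {
        sketch_node *current = map->buckets[b];
        while (current != NULL) {
            sketch_node *next = current->next;  /* read next BEFORE freeing */
            free(current);
            current = next;
            freed++;
        }
    }
    free(map->buckets);  /* the array of chain heads */
    free(map);           /* finally, the map struct itself */
    return freed;
}
```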
Finally, complete one of the following three tasks. For extra credit, complete two or all of them.
remove() function
Remove the association for a given key.
You can't use getNode()
for this one because
you need to find the node that comes before the node
containing the key you're looking for.
Instead you need to find the bucket where the key is (or would be),
and remove the node from that bucket.
You'll need to handle the special case where the node you want to
remove is the head of the chain in that bucket, and otherwise loop
through the list, always looking one step ahead.
Again you may want to refer to your code from Project 3.
Don't forget to use strcmp()
to compare keys.
removeKey() should return the value of the association
being removed, or NULL if the key doesn't exist.
It should also deallocate the node being removed
(but not the key or value).
Tests 6 and 7 (empty remove
and populated remove
)
will test that this is done correctly.
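The "special case for the head, otherwise look one step ahead" pattern might be sketched as follows for a single chain. The names sketch_node, sketch_push and sketch_removeFromChain are hypothetical; in the lab, the chain would be the bucket that hash() selects, and you would also update the number of keys.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; not the one declared in hashmap.h. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

/* Helper for building a chain: add at the head and return the new head. */
sketch_node *sketch_push(sketch_node *head, char *key, char *value) {
    sketch_node *node = malloc(sizeof(sketch_node));
    assert(node != NULL);
    node->key = key;
    node->value = value;
    node->next = head;
    return node;
}

/* Unlink and free the node with the given key; return its value, or NULL
   if the key is not in this chain. The key and value are not freed. */
char *sketch_removeFromChain(sketch_node **head, const char *key) {
    if (*head == NULL) return NULL;          /* empty chain */
    sketch_node *doomed;
    if (strcmp((*head)->key, key) == 0) {    /* special case: the head */
        doomed = *head;
        *head = doomed->next;
    } else {                                 /* general case: look one step ahead */
        sketch_node *prev = *head;
        while (prev->next != NULL && strcmp(prev->next->key, key) != 0)
            prev = prev->next;
        if (prev->next == NULL) return NULL; /* ran off the end: not found */
        doomed = prev->next;
        prev->next = doomed->next;
    }
    char *value = doomed->value;  /* save the value before freeing the node */
    free(doomed);
    return value;
}
```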
rehash() function
Write a function that will make a bigger array to replace the current buckets array, and redistribute the keys into that new buckets array.
Here's how I recommend doing it:
Make a new map using create()
,
giving it an initial size larger than the current size.
(How much larger? That's up to you.)
Then iterate through all the keys
(getting the keys from keys()
),
adding each key and value to the new map (using put()
and get()
).
Finally, perform "transplant surgery", making the new
map's bucket array and other guts the new guts of the
old map.
Deallocate everything that's not in use any more---especially the
old map's old bucket array and the new map itself.
But don't call destroy()
on the new map, since
that will also deallocate the new map's bucket array, which
now belongs to the old map...
The 8th test, populated rehash
, will test this.
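Here is one way the recommended steps might fit together, written against a stripped-down map with made-up names (sketch_map, sketch_put, sketch_rehash, and so on). It is a sketch under those assumptions: the real structs have more fields (bucketSizes, for one), doubling the size is an arbitrary choice, and the value lookup goes through the node helper where the lab would use get().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the structs in hashmap.h. */
typedef struct sketch_node {
    char *key, *value;
    struct sketch_node *next;
} sketch_node;

typedef struct {
    sketch_node **buckets;
    int numBuckets, numKeys;
} sketch_map;

int sketch_hash(const char *key, int numBuckets) {
    unsigned int sum = 0;
    while (*key != '\0') sum += (unsigned char) *key++;
    return (int) (sum % (unsigned int) numBuckets);
}

sketch_map *sketch_create(int numBuckets) {
    sketch_map *map = malloc(sizeof(sketch_map));
    assert(map != NULL);
    map->buckets = calloc(numBuckets, sizeof(sketch_node *));
    assert(map->buckets != NULL);
    map->numBuckets = numBuckets;
    map->numKeys = 0;
    return map;
}

sketch_node *sketch_getNode(sketch_map *map, const char *key) {
    sketch_node *n = map->buckets[sketch_hash(key, map->numBuckets)];
    while (n != NULL && strcmp(n->key, key) != 0) n = n->next;
    return n;
}

void sketch_put(sketch_map *map, char *key, char *value) {
    sketch_node *n = sketch_getNode(map, key);
    if (n != NULL) { n->value = value; return; }  /* replace existing value */
    int b = sketch_hash(key, map->numBuckets);
    n = malloc(sizeof(sketch_node));
    assert(n != NULL);
    n->key = key; n->value = value; n->next = map->buckets[b];
    map->buckets[b] = n;                          /* add at head of chain */
    map->numKeys++;
}

char **sketch_keys(sketch_map *map) {
    char **result = calloc(map->numKeys > 0 ? map->numKeys : 1, sizeof(char *));
    assert(result != NULL);
    int i = 0;
    for (int b = 0; b < map->numBuckets; b++)
        for (sketch_node *n = map->buckets[b]; n != NULL; n = n->next)
            result[i++] = n->key;
    return result;
}

void sketch_rehash(sketch_map *map) {
    /* Step 1: a bigger, empty map. */
    sketch_map *bigger = sketch_create(2 * map->numBuckets);
    /* Step 2: copy every association across. */
    char **allKeys = sketch_keys(map);
    for (int i = 0; i < map->numKeys; i++)
        sketch_put(bigger, allKeys[i], sketch_getNode(map, allKeys[i])->value);
    free(allKeys);
    /* Step 3: free the old nodes and the old bucket array. */
    for (int b = 0; b < map->numBuckets; b++) {
        sketch_node *n = map->buckets[b];
        while (n != NULL) {
            sketch_node *next = n->next;
            free(n);
            n = next;
        }
    }
    free(map->buckets);
    /* Step 4: transplant the new guts into the old map, then free only the
       new map's shell; its bucket array now belongs to the old map. */
    map->buckets = bigger->buckets;
    map->numBuckets = bigger->numBuckets;
    free(bigger);
}
```

Code that holds a pointer to the old map keeps working after the call, because the map struct itself is never replaced; only its insides are.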
monitor() function
Write the function monitor()
that
will print out to the screen the number of items in each
bucket and the maximum number of items in any bucket.
This could be used to monitor how well the items are distributed
among the buckets.
You'll notice that the hashmap
struct has
an array called bucketSizes
that can be used
to keep track of the number of items in each bucket.
Revise the functions put()
and removeKey()
so that this array is kept current as the hashmap changes.
Then it is a simple loop in monitor()
to print out
the sizes of these buckets as recorded in bucketSizes
.
Test 9, populated monitor, should produce output
something like this:
9. Populated monitor
0: 2
1: 4
2: 3
3: 1
4: 6
max: 6
It won't indicate pass or fail explicitly.
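The printing loop itself might be sketched like this. The function name sketch_monitor and its parameters are made up (the real monitor() presumably takes the hashmap), the one-entry-per-line layout is an assumption about the expected format, and the return value exists only so the sketch can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Print each bucket's recorded size and then the largest size.
   Returns the maximum, purely so the sketch can be checked. */
int sketch_monitor(const int *bucketSizes, int numBuckets) {
    assert(bucketSizes != NULL);
    int max = 0;
    for (int b = 0; b < numBuckets; b++) {
        printf("%d: %d\n", b, bucketSizes[b]);
        if (bucketSizes[b] > max)
            max = bucketSizes[b];
    }
    printf("max: %d\n", max);
    return max;
}
```

A loop this simple only works if put() and removeKey() keep bucketSizes accurate, which is where the real effort in this task goes.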