The goal of this project is for you to practice using pointers and dynamic memory in C. Along the way, it will help you to think through how strings work in C, and also to think carefully about hashing and maps.
This is a longer project, so be sure you leave plenty of time for it.
Find the given code for this project in ~tvandrun/Public/cs245/proj5.
There are two sub-directories, one for each of the two parts of this project, so you will need to use the -r flag when you do the copying:
cp -r ~tvandrun/Public/cs245/proj5 .
In the hmstring directory, write "homemade" versions of standard C string functions.
In all of these, use the end-of-string marker to determine the boundaries of the strings.
In all of them, you may assume there is enough memory allocated to do what is required; that is the responsibility of the code calling these functions.
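As a reminder of what the end-of-string marker is, here is a minimal sketch that walks a char array until it reaches the '\0' character. The name count_to_marker is made up for illustration and is not part of the given code.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: counts the characters
   that come before the end-of-string marker '\0'. */
size_t count_to_marker(const char *str)
{
    const char *p = str;
    while (*p != '\0')   /* stop at the marker, not at the end of the array */
        p++;
    return (size_t)(p - str);
}
```

Note that the count depends only on where the marker sits, not on how much memory was allocated for the array.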
hmstrcpy(): Copy all of the characters in src into dst. Use the end-of-string marker to determine the boundary of the string.
hmstrlen(): Determine the number of characters in str.
hmstrcmp(): Determine which string comes first in "lexicographical" order.
hmstrcat(): Modify s by adding the characters found in append.
hmstrcat2(): Implement a different kind of function, unlike the C functions but more like what you would find in Java, that leaves the two given char arrays unchanged but allocates a new char array and populates it with all the characters of pre followed by all the characters of app. In other words, it should make a new string which is the concatenation of the two.
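The allocate-and-populate idea behind hmstrcat2() might look roughly like the sketch below. The name concat_new and its signature are assumptions made for illustration; the real signature is whatever the given code declares.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the idea behind hmstrcat2().
   Leaves pre and app unchanged and returns a newly allocated string. */
char *concat_new(const char *pre, const char *app)
{
    size_t pre_len = 0, app_len = 0;
    while (pre[pre_len] != '\0') pre_len++;
    while (app[app_len] != '\0') app_len++;

    /* One extra byte for the end-of-string marker. */
    char *result = malloc(pre_len + app_len + 1);
    if (result == NULL)
        return NULL;

    for (size_t i = 0; i < pre_len; i++)
        result[i] = pre[i];
    for (size_t i = 0; i < app_len; i++)
        result[pre_len + i] = app[i];
    result[pre_len + app_len] = '\0';
    return result;   /* the caller is responsible for calling free() */
}
```

Unlike the other functions in this part, a function of this kind cannot rely on the caller to have allocated enough memory, so it has to compute the needed size itself.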
In this part, you will implement a hash map in C, similar to the StringHashMap written in Java that we saw in class.
However, your hash map will have one extra feature: when your code detects that the buckets are full, either because there is imbalance or simply because the hashmap itself is full, you will rehash. That is, you will make a bigger array of buckets and redistribute the items.
The given code can be found in the pt2 directory.
The program driver.c exercises your hashmap, using it to associate countries with capitals.
It then tests containsKey() positively and negatively, tests rem() (remove), and tests an iteration over the hashmap.
(If you look at the list of countries, you'll notice some entries like Northern Ireland, Puerto Rico, Transnistria, and Palestine, which for various reasons (status, independence, recognition, etc.) raise the question of how we define country. I simply grabbed this list from Wikipedia's list of national capitals, which you can check out yourself if you're interested in the political status of any of the nations included. No political message is intended on my part by including or excluding any entity.)
Your first task is to finish the struct hashmap_t.
You will need to think through this task carefully, as it will be a little different from the StringHashMap class we saw in class; for example, C arrays do not carry their own length.
You may find that your first attempt is incorrect and that you will have to revise this struct as you go along.
One thing in particular to think about is that you will need to know how many items are in each bucket to monitor how balanced the hashmap is.
I have provided a node_t struct, the one that I used in my solution.
You may choose to modify it, however.
create()
Write the create() function, thinking carefully about all the things in your struct that need to be initialized.
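To make the initialization concerns concrete, here is one possible layout and constructor. Everything in it is an assumption made for illustration: the names sketch_node, sketch_map, and sketch_create, the field names, and the choice of a separate array of per-bucket counts. The given node_t and the hashmap_t you design may well differ.

```c
#include <stdlib.h>

/* Assumed node layout, for illustration; the given node_t may differ. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

/* Assumed map layout, for illustration only. */
typedef struct {
    sketch_node **buckets;  /* array of list heads */
    int *bucket_sizes;      /* number of items in each bucket */
    int num_buckets;        /* C arrays do not carry their own length */
    int num_items;          /* total number of associations */
} sketch_map;

/* Assumes num_buckets > 0. Returns NULL if any allocation fails. */
sketch_map *sketch_create(int num_buckets)
{
    sketch_map *map = malloc(sizeof *map);
    if (map == NULL)
        return NULL;
    map->buckets = malloc((size_t)num_buckets * sizeof *map->buckets);
    map->bucket_sizes = malloc((size_t)num_buckets * sizeof *map->bucket_sizes);
    if (map->buckets == NULL || map->bucket_sizes == NULL) {
        free(map->buckets);        /* free(NULL) is harmless */
        free(map->bucket_sizes);
        free(map);
        return NULL;
    }
    /* malloc does not initialize memory, so set every field explicitly. */
    for (int i = 0; i < num_buckets; i++) {
        map->buckets[i] = NULL;
        map->bucket_sizes[i] = 0;
    }
    map->num_buckets = num_buckets;
    map->num_items = 0;
    return map;
}
```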
hash()
The function hash() does not appear in hashmap.h because the client code does not need to use it.
You'll need to write it for use in hashmap.c, though.
It requires the number of buckets, in order to compute the index properly.
You may use the hash algorithm found in the in-class example, or you may research and implement a better one (if you do, document it).
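As one example of a documented alternative, the sketch below uses the well-known djb2 string hash (attributed to Dan Bernstein) and reduces the result modulo the number of buckets. This is not the in-class algorithm, and the name sketch_hash and its parameter types are assumptions made for illustration.

```c
/* djb2 string hash, reduced modulo the number of buckets.
   NOT the in-class algorithm; shown only as one documented option.
   Assumes num_buckets > 0. */
unsigned int sketch_hash(const char *key, unsigned int num_buckets)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++)
        h = h * 33 + *p;   /* unsigned arithmetic wraps around on overflow */
    return (unsigned int)(h % num_buckets);
}
```

Because the bucket count is a parameter, the same key can land at a different index after the number of buckets changes, which is why every item must be redistributed when rehashing.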
The rehash() function comes next in the file, but I recommend putting that one off until you have more experience from writing some of the other functions.
getNode(), put(), get(), rem(), and containsKey()
In some ways, these are the "easy" ones, because they will be somewhat similar to the versions in the Java example from class. Note, however,
StringHashMap
did.
Make sure you keep this information up to date with the
other changes effected on the hash map.
rem() is to remove items, but one thing that is different is that this should also return the value of the association being removed.
Moreover, in rem(), don't forget to free anything that is no longer in use.
Use strcmp from string.h to do comparisons.
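The remove-and-return behavior described for rem() can be sketched on a single bucket's linked list as follows. The node layout and the name sketch_remove are assumptions made for illustration, and the sketch assumes the nodes do not own their key strings; if yours hold their own copies, those need to be freed as well.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed node layout, for illustration; the given node_t may differ. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

/* Unlinks the node whose key matches, frees the node, and returns its
   value. Returns NULL if the key is not in the list. */
char *sketch_remove(sketch_node **head, const char *key)
{
    sketch_node **link = head;   /* the pointer that leads to the current node */
    while (*link != NULL) {
        sketch_node *cur = *link;
        if (strcmp(cur->key, key) == 0) {
            char *value = cur->value;  /* save the value before freeing */
            *link = cur->next;         /* bypass the node */
            free(cur);
            return value;
        }
        link = &cur->next;
    }
    return NULL;
}
```

In a full hash map, a successful removal would also need to update whatever counts the struct keeps.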
put() will be different: after you add the new association, you need to determine whether it's time to rehash.
Here are the criteria: if the total number of items exceeds five times the number of buckets or if any individual bucket contains more than 10 associations, then rehash.
(Hint: If you maintain an invariant that no bucket exceeds 10 associations, then all you need to do is check the bucket you just added to, since it's the only one that could exceed 10.)
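The two criteria, combined with the hint, reduce to a single test on three numbers. The sketch below expresses it as a small predicate; the function name, parameter names, and constant names are hypothetical.

```c
#include <stdbool.h>

#define SKETCH_MAX_LOAD 5         /* items allowed per bucket on average */
#define SKETCH_MAX_BUCKET_SIZE 10 /* items allowed in any single bucket */

/* bucket_size is the size of the bucket that just received an item;
   by the invariant in the hint, it is the only bucket that can exceed
   the limit. */
bool sketch_needs_rehash(int num_items, int num_buckets, int bucket_size)
{
    return num_items > SKETCH_MAX_LOAD * num_buckets
        || bucket_size > SKETCH_MAX_BUCKET_SIZE;
}
```

Both comparisons are strict: exactly five times the number of buckets, or exactly 10 associations in one bucket, does not yet trigger a rehash.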
rehash()
This is one of the hardest parts of the project. Think carefully about how you can make a new set of buckets and redistribute the items.
(It might make your job easier to make a "temporary" hashmap and make use of your put(), rem(), get(), and keys() functions, but be careful.
Ending your function with map = temp will not work.
You need to modify the hashmap "object" that the parameter map points to.)
Also, don't forget to free things no longer in use.
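The reason map = temp fails is that C passes the pointer by value, so the assignment changes only the function's local copy. The sketch below contrasts that with overwriting the fields of the struct that map points to. The struct sketch_table is a simplified stand-in for the hashmap struct, and both function names are made up for illustration; the second function assumes temp and both arrays were allocated with malloc or calloc.

```c
#include <stdlib.h>

/* Simplified stand-in for the hashmap struct, for illustration only. */
typedef struct {
    int *slots;
    int num_slots;
} sketch_table;

/* Has no lasting effect: only the local copy of the pointer changes. */
void grow_wrong(sketch_table *map, sketch_table *temp)
{
    map = temp;
    (void)map;   /* silences "set but not used" warnings */
}

/* Changes the caller's object by overwriting the fields map points to. */
void grow_right(sketch_table *map, sketch_table *temp)
{
    free(map->slots);              /* the old array is no longer in use */
    map->slots = temp->slots;      /* take over the new array */
    map->num_slots = temp->num_slots;
    free(temp);                    /* free only the temporary struct itself */
}
```

After grow_right, the caller's pointer is unchanged but the struct it points to describes the larger array.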
numKeys()
This is an extra little function so the client code can determine the number of keys; it's necessary for iteration over the keys.
keys()
Since there is no equivalent to iterators in C (unless you're really clever), I've specified this project so that there will be this function which returns an array of all the keys. Look at the driver to see how this is used. Notice that this function must allocate a new array, and it is the driver's responsibility to free it.
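A function of this kind might gather keys as in the sketch below. The node layout, the name sketch_keys, and its parameters are assumptions made for illustration, as is the choice to return pointers to the stored key strings rather than copies; check the driver and header for what is actually expected.

```c
#include <stdlib.h>

/* Assumed node layout, for illustration; the given node_t may differ. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

/* Returns a newly allocated array holding one pointer per key.
   Assumes num_items > 0 and that it matches the number of nodes.
   The caller frees the array, but not the strings it points to. */
char **sketch_keys(sketch_node **buckets, int num_buckets, int num_items)
{
    char **result = malloc((size_t)num_items * sizeof *result);
    if (result == NULL)
        return NULL;
    int next = 0;
    for (int i = 0; i < num_buckets; i++)
        for (sketch_node *cur = buckets[i];
             cur != NULL && next < num_items;
             cur = cur->next)
            result[next++] = cur->key;
    return result;
}
```

Since the returned array does not carry its own length, the client needs the count from a separate call in order to iterate over it.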
destroy()
Finally, there's a lot to clean up (and null-out): nodes, array (or arrays), and the entire struct.
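For the node part of the cleanup, the main pitfall is reading a node's next pointer after the node has been freed. The sketch below frees one bucket's list safely. The node layout and the name sketch_free_bucket are assumptions made for illustration, and the sketch assumes nodes do not own their key and value strings; if yours do, free those before freeing the node.

```c
#include <stdlib.h>

/* Assumed node layout, for illustration; the given node_t may differ. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node;

/* Frees every node in one bucket's list and returns how many were freed. */
int sketch_free_bucket(sketch_node *head)
{
    int freed = 0;
    while (head != NULL) {
        sketch_node *next = head->next;  /* read next BEFORE freeing */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

A complete cleanup would apply something like this to every bucket, then free the array or arrays, and free the struct last.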
Turn in a hard copy of a script showing all your code and the results of running the drivers.
DUE: Wed, Apr 6, at 5 PM.