Project 5: Dynamic memory in C

The goal of this project is for you to practice using pointers and dynamic memory in C. Along the way, it will help you to think through how strings work in C, and also to think carefully about hashing and maps.

This is a longer project, so be sure you leave plenty of time for it.

Set up

Find the given code for this project in ~tvandrun/Public/cs245/proj5 . There are two sub-directories, one for each of the two parts of this project. So, you will need to use the -r flag when you do the copying.

cp -r ~tvandrun/Public/cs245/proj5 .

Part I: Homemade strings in C

In the hmstring directory, write "homemade" versions of standard C string functions. In all of these, use the end-of-string marker to determine the boundaries of the strings. In all of them, you may assume there is enough memory allocated to do what is required; that is the responsibility of the code calling these functions.

hmstrcpy(): Copy all of the characters in src into dst. Use the end-of-string marker to determine the boundary of the string.
hmstrlen(): Determine the number of characters in str, not counting the end-of-string marker.
hmstrcmp(): Determine which string comes first in "lexicographical" order.
hmstrcat(): Modify s by appending the characters found in append to the end of it.
hmstrcat2(): Implement a different kind of function---unlike the C functions but more like you would find in Java---that leaves the two given char arrays unchanged but allocates a new char array and populates it with all the characters of pre followed by all the characters of app. In other words, it should make a new string which is the concatenation of the two.
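
To illustrate the idea behind hmstrcat2(), here is a minimal sketch of allocating a new string for a concatenation. The names concat_sketch and sketch_len and the signatures are assumptions made for illustration; the given header in the hmstring directory may declare things differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: count characters up to, but not including, '\0'. */
static size_t sketch_len(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical name and signature (not taken from the given code).
 * Leaves pre and app unchanged and returns a newly allocated string;
 * the caller is responsible for freeing it. */
char *concat_sketch(const char *pre, const char *app) {
    assert(pre != NULL && app != NULL);
    size_t pre_len = sketch_len(pre);
    size_t app_len = sketch_len(app);

    /* One extra byte for the end-of-string marker. */
    char *result = malloc(pre_len + app_len + 1);
    if (result == NULL)
        return NULL;

    for (size_t i = 0; i < pre_len; i++)
        result[i] = pre[i];
    /* i <= app_len so that the '\0' at the end of app is copied too. */
    for (size_t i = 0; i <= app_len; i++)
        result[pre_len + i] = app[i];
    return result;
}
```

Unlike the other functions in this part, the caller cannot supply the memory here, so the function itself must allocate it and the caller must eventually free it.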

Part II: Implementing a hash map in C

1. Introduction

In this part, you will implement a hash map in C, similar to the StringHashMap written in Java that we saw in class.

However, your hash map will have one extra feature: When your code detects that the buckets are full, either because there is imbalance or simply because the hashmap itself is full, you will rehash---make a bigger array of buckets, and redistribute the items.

The given code can be found in the pt2 directory. The program driver.c exercises your hashmap, using it to associate countries with capitals. It then tests containsKey() positively and negatively, tests rem() (remove), and tests iteration over the hashmap.

(If you look at the list of countries, you'll notice some entries like Northern Ireland, Puerto Rico, Transnistria, and Palestine, which for various reasons (status, independence, recognition, etc) raise the question of how we define country. I simply grabbed this list from Wikipedia's list of national capitals, which you can check out yourself if you're interested in the political status of any of the nations included. No political message is intended on my part by including or excluding any entity.)

2. The struct

Your first task is to finish the struct hashmap_t. You will need to think through this task carefully, as it will be a little different from the StringHashMap class we saw in class. For example, C arrays do not carry their own length, so the struct must keep track of it. You may find that your first attempt is incorrect and that you will have to revise this struct as you go along. One thing in particular to think about is that you will need to know how many items are in each bucket to monitor how balanced the hashmap is.

I have provided a node_t struct, the one that I used in my solution. You may choose to modify it, however.

3. `create()`

Write the create() function, thinking carefully about all the things in your struct that need to be initialized.
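
As a rough illustration of the kind of bookkeeping the struct and create() involve, here is a minimal sketch. Every name in it (sketch_node_t, sketch_map_t, create_sketch, and all of the fields) is hypothetical, as is the assumption that the number of buckets is passed in; the provided node_t and the signature in hashmap.h may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout; the provided node_t may differ. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node_t;

/* Hypothetical map layout. */
typedef struct {
    sketch_node_t **buckets;  /* array of list heads, one per bucket */
    size_t *bucket_sizes;     /* number of items currently in each bucket */
    size_t num_buckets;       /* C arrays do not carry their own length */
    size_t num_items;         /* total number of associations */
} sketch_map_t;

/* Hypothetical signature: assumes the caller chooses the bucket count. */
sketch_map_t *create_sketch(size_t num_buckets) {
    assert(num_buckets > 0);
    sketch_map_t *map = malloc(sizeof *map);
    if (map == NULL)
        return NULL;

    map->buckets = malloc(num_buckets * sizeof *map->buckets);
    map->bucket_sizes = malloc(num_buckets * sizeof *map->bucket_sizes);
    if (map->buckets == NULL || map->bucket_sizes == NULL) {
        free(map->buckets);
        free(map->bucket_sizes);
        free(map);
        return NULL;
    }

    /* malloc does not initialize memory, so every slot is set explicitly. */
    for (size_t i = 0; i < num_buckets; i++) {
        map->buckets[i] = NULL;
        map->bucket_sizes[i] = 0;
    }
    map->num_buckets = num_buckets;
    map->num_items = 0;
    return map;
}
```

The point of the sketch is that every field, and every element of every array, needs a defined starting value before any other function touches the map.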

4. `hash()`

The function hash() does not appear in hashmap.h because the client code does not need to use it. You'll need to write it for use in hashmap.c, though. It requires the number of buckets, in order to compute the index properly. You may use the hash algorithm found in the in-class example, or you may research and implement a better one (if you do, document it).
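
For a sense of what a string hash that takes the number of buckets can look like, here is a minimal sketch using the well-known djb2 scheme (start at 5381, multiply by 33, add each character). This is not the in-class algorithm, and the name hash_sketch and its signature are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name and signature. Uses the djb2 string hash,
 * then reduces the result to a valid bucket index. */
size_t hash_sketch(const char *key, size_t num_buckets) {
    assert(key != NULL && num_buckets > 0);
    unsigned long h = 5381;
    /* Read characters as unsigned so that bytes above 127 stay non-negative. */
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++)
        h = h * 33 + *p;  /* unsigned overflow wraps around, which is fine here */
    return (size_t)(h % num_buckets);
}
```

Because the index depends on the number of buckets, the same key will generally land in a different bucket once the array of buckets grows.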

The rehash() function comes next in the file, but I recommend putting that one off until you have more experience from writing some of the other functions.

5. `getNode()`, `put()`, `get()`, `rem()`, and `containsKey()`

In some ways, these are the "easy" ones, because they will be somewhat similar to the versions in the Java example from class. Note, however, the following:

Your struct will carry around more information than the class StringHashMap did. Make sure you keep this information up to date with the other changes effected on the hash map.
The function rem() is to remove items, but one thing that is different is that this should also return the value of the association being removed. Moreover, in rem(), don't forget to free anything that is no longer in use.
Use strcmp from string.h to do comparisons.
put() will be different: after you add the new association, you need to determine whether it's time to rehash. Here are the criteria: If the total number of items exceeds five times the number of buckets or if any individual bucket contains more than 10 associations, then rehash. (Hint: If you maintain an invariant that no bucket exceeds 10 associations, then all you need to do is check the bucket you just added to, since it's the only one that could exceed 10.)
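
The rehash criteria for put() can be sketched as a small predicate. The function name and parameters below are hypothetical; in your code the three numbers would come from whatever fields your struct maintains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper. bucket_size is the size of the bucket that was
 * just added to, which by the hint is the only one that can exceed 10. */
bool needs_rehash_sketch(size_t num_items, size_t num_buckets, size_t bucket_size) {
    assert(num_buckets > 0);
    return num_items > 5 * num_buckets || bucket_size > 10;
}
```

Note that both comparisons are strict: exactly five times the number of buckets, or exactly 10 associations in a bucket, does not yet trigger a rehash.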

6. `rehash()`

This is one of the hardest parts of the project. Think carefully about how you can make a new set of buckets and redistribute the items.

(It might make your job easier to make a "temporary" hashmap and make use of your put(), rem(), get(), and keys() functions, but be careful. Ending your function with map = temp will not work. You need to modify the hashmap "object" that the parameter map points to.)

Also, don't forget to free things no longer in use.
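
To see why ending with map = temp has no effect, consider this minimal sketch. It uses a deliberately simplified, hypothetical struct (an int array and a length) rather than a real hashmap, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for a hashmap struct. */
typedef struct {
    int *slots;
    size_t num_slots;
} sketch_map_t;

/* Does nothing useful: map is a local copy of the caller's pointer,
 * so reassigning it is invisible outside this function. */
void grow_wrong(sketch_map_t *map, sketch_map_t *temp) {
    map = temp;
    (void)map;  /* silence "unused value" warnings */
}

/* Overwrites the struct that map points to, so the caller sees the change. */
void grow_right(sketch_map_t *map, sketch_map_t *temp) {
    free(map->slots);  /* the old array is no longer in use */
    *map = *temp;      /* copy every field, including the new array pointer */
    free(temp);        /* free only the temporary shell, not its array */
}
```

After grow_right(), the caller's pointer is unchanged but the struct it points to now holds the larger array, which is the effect rehash() needs.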

7. `numKeys()`

This is an extra little function so the client code can determine the number of keys; it's necessary for iteration over the keys.

8. `keys()`

Since there is no equivalent to iterators in C (unless you're really clever), I've specified this project so that there will be this function which returns an array of all the keys. Look at the driver to see how this is used. Notice that this function must allocate a new array, and it is the driver's responsibility to free it.
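
Here is a minimal sketch of the allocate-and-return pattern that keys() follows. The node layout, the name keys_sketch, and its parameters are all hypothetical, and the sketch assumes the returned array holds pointers to the map's own key strings rather than copies; check the driver and hashmap.h for what is actually expected.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout with only the fields this sketch needs. */
typedef struct sketch_node {
    const char *key;
    struct sketch_node *next;
} sketch_node_t;

/* Hypothetical signature. Returns a newly allocated array of num_items
 * key pointers. The caller frees the array, but not the strings. */
const char **keys_sketch(sketch_node_t **buckets, size_t num_buckets,
                         size_t num_items) {
    assert(buckets != NULL);
    /* Allocate at least one slot so an empty map does not call malloc(0). */
    size_t capacity = num_items > 0 ? num_items : 1;
    const char **result = malloc(capacity * sizeof *result);
    if (result == NULL)
        return NULL;

    size_t next = 0;
    for (size_t b = 0; b < num_buckets; b++) {
        for (sketch_node_t *n = buckets[b]; n != NULL; n = n->next) {
            assert(next < num_items);  /* the item count must be accurate */
            result[next++] = n->key;
        }
    }
    return result;
}
```

This also shows why the client needs the number of keys separately: the returned array, like any C array, does not know its own length.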

9. `destroy()`

Finally, there's a lot to clean up (and null-out): nodes, array (or arrays), and the entire struct.
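
The order of clean-up can be sketched as follows. The struct layouts and the name destroy_sketch are hypothetical, and the sketch assumes the map owns heap-allocated copies of its keys and values; if your map does not, the two free() calls on the strings would be wrong. It returns the number of nodes freed purely so that the sketch can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layouts; the provided node_t and your hashmap_t may differ. */
typedef struct sketch_node {
    char *key;
    char *value;
    struct sketch_node *next;
} sketch_node_t;

typedef struct {
    sketch_node_t **buckets;
    size_t num_buckets;
} sketch_map_t;

/* Frees nodes first, then the bucket array, then the struct itself. */
size_t destroy_sketch(sketch_map_t *map) {
    if (map == NULL)
        return 0;
    assert(map->buckets != NULL);

    size_t freed = 0;
    for (size_t b = 0; b < map->num_buckets; b++) {
        sketch_node_t *n = map->buckets[b];
        while (n != NULL) {
            sketch_node_t *next = n->next;  /* save before freeing n */
            free(n->key);    /* assumes the map owns this string */
            free(n->value);  /* assumes the map owns this string */
            free(n);
            freed++;
            n = next;
        }
        map->buckets[b] = NULL;  /* null-out the dangling list head */
    }
    free(map->buckets);
    map->buckets = NULL;
    free(map);
    return freed;
}
```

Freeing from the inside out matters: once the struct or the bucket array is gone, there is no longer any way to reach the nodes.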

To turn in

Turn in a hard copy of a script showing all your code and the results of running the drivers.

DUE: Wed, Apr 6, at 5 PM.

Thomas VanDrunen

Last modified: Mon Apr 4 16:38:44 CDT 2011
