The goal of this project is to exercise several things we have learned in C recently, particularly dynamic allocation of memory and bit operations (especially as would be used in bit vectors). This project consists of three distinct parts; the third part is dependent on the second part, but the first is completely independent of the other two.
Also, note that part B is the longest. Part A will go pretty quickly (hopefully!---ask for help if it doesn't; don't spend time spinning your wheels on it), and part C will be a fairly simple application of part B.
After making a directory for this project, copy the code for this project from the course directory.
cp ~tvandrun/Public/cs245/proj5/* .
This semester we have used strings in C (of course, they're actually char arrays), but we've never learned them systematically. The one important thing to know is that C strings are terminated by a special "end-of-string" character, which you can make using \0. (Note this is similar to how you make a newline character with \n and a tab character with \t.)
Note that if you have a string with, say, 10 characters, it's important that you allocate an array of at least size 11 to hold that string---the extra array position is for the end-of-string character. Why at least size 11? Because there's no problem with the array that holds the string being larger than the string that it holds (apart from wasting a little bit of memory). You may remember way back in the Chatter lab that your structs had character arrays of length 141 because the maximum message length was 140. Many messages were shorter.
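A small sketch of the sizing rule, using two hypothetical helpers that are not part of the project files: minArraySize() computes the smallest array that can hold a given string (its length plus one slot for \0), and isTerminated() checks whether an array actually contains the end-of-string marker. The only real library call is strlen() from string.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the smallest array that can hold s is its
   characters plus one slot for the '\0' end-of-string marker. */
size_t minArraySize(const char *s) {
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Hypothetical helper: returns 1 if some element of buf[0..cap-1]
   is '\0' (the array really holds a terminated string), else 0. */
int isTerminated(const char *buf, size_t cap) {
    assert(buf != NULL);
    for (size_t i = 0; i < cap; i++)
        if (buf[i] == '\0')
            return 1;
    return 0;
}
```

For example, char exact[11] = "0123456789"; fits exactly, and char roomy[141] = "short message"; is also fine because the array may be larger than the string. An array initialized element by element, as in { 'a', 'b', 'c' }, stores no end-of-string marker unless you include '\0' yourself.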
C has a string library called string.h. However, in this project you will write some functions for a "homemade string" library. See hmstring.h and hmstring.c.
Consider the two functions that are written for you.
hmstrcpy(), "homemade string copy." The standard strcpy(), which we're imitating here, takes two pointers, which it interprets as the starting addresses of two areas of memory, and copies the contents of the second area of memory into the first area. Note how we know when we're done---we reach the end-of-string marker. We have no way of knowing when we reach the end of the array---the array holding the string may be longer than the string it is holding. (Or it can be shorter... but that's bad.) Notice that we do copy the end-of-string marker itself.
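For illustration, here is one way a strcpy()-style copy loop can be written. It is a sketch, not necessarily what hmstrcpy() in hmstring.c looks like, and the name copyString is hypothetical. As described above, the loop stops at the end-of-string marker, copies that marker too, and never knows the size of either array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strcpy()-style copy. The caller must guarantee that
   dest has room for all of src plus its '\0'; nothing here can check
   that, since only the terminator tells us where src ends. */
void copyString(char *dest, const char *src) {
    assert(dest != NULL && src != NULL);
    size_t i = 0;
    while (src[i] != '\0') {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';   /* copy the end-of-string marker as well */
}
```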
hmstrcmp(), "homemade string compare." The standard strcmp(), which, again, we're imitating, takes two pointers to the beginning of strings and compares the contents of the two strings, returning a negative number if the first comes first, a positive number if the second comes first, and 0 if they are equivalent (just like String.compareTo() in Java). Notice the handling of the end-of-string character.
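A sketch of how the end-of-string character can drive a strcmp()-style comparison; compareStrings is a hypothetical name and this is not necessarily the provided hmstrcmp(). Because '\0' is the smallest character value, a string that is a prefix of another automatically compares as "first". Comparing the characters as unsigned char follows how the C standard defines the sign of strcmp()'s result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strcmp()-style comparison: negative if a comes first,
   positive if b comes first, 0 if the strings are equal. */
int compareStrings(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t i = 0;
    /* Inside the loop a[i] == b[i], so checking a[i] for '\0' is enough. */
    while (a[i] != '\0' && a[i] == b[i])
        i++;
    /* First mismatch (or both terminators) decides the order. */
    return (unsigned char)a[i] - (unsigned char)b[i];
}
```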
hmstrlen(). Write the homemade equivalent of strlen(), which takes a pointer to the beginning of the string and computes its length---the number of characters before the end-of-string marker.
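One possible shape for a strlen()-style function, as a sketch only; stringLength is a hypothetical name. It walks a pointer to the end-of-string marker and uses pointer subtraction to count the characters passed over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strlen()-style function: counts characters before '\0'. */
size_t stringLength(const char *s) {
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);   /* the marker itself is not counted */
}
```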
hmstrcat(). Write the homemade equivalent of strcat(). Like strcat(), it does not make a new string (notice the return type here is void). Instead it modifies the area of memory pointed to by the first parameter. Specifically, it finds the end of the string that starts at that first address and then copies the contents of the string pointed to by the second parameter to the end of the first string. This means the array holding the first string must be large enough for both strings plus one end-of-string marker.
Write an equivalent implementation of hmstrcat(), called hmstrcat2().
I have provided a makefile (for the whole project) and a driver program hmstrexample.c (for this part). Compile and run it with
make hmstrexample
./hmstrexample
Use this to verify that your functions in hmstring are working.
A dynamic set is a way of containing data that is similar to a mathematical set except that it can change, that is, be updated. The most familiar implementation is Java's HashSet. Specifically, a dynamic set is an unordered collection of uniform-typed items with the operations insert, remove, and contains (a membership test).
(It is worth pointing out what's dynamic about a dynamic set. In the mathematical concept of set, a set cannot change. If you have a set, say A = {1, 4, 5}, and take its union with another set, say B = {2, 4, 6}, you produce a new set, call it C = {1, 2, 4, 5, 6}; you do not make any change to the set A, just as adding 7 to 5 makes 12---it doesn't change "5". A dynamic set, on the other hand, is a mutable data structure.)
Some applications of dynamic sets also require operations modeled on those of mathematical sets: union, intersection, difference, and complement.
Our specific task is to implement sets of numbers, that is, subsets of the set {0, 1, ..., n}. Here's the interface if we were writing in Java:
public interface NSet {
    boolean contains(int i);
    void insert(int i);
    void remove(int i);
    NSet complement();
    NSet union(NSet other);
    NSet intersection(NSet other);
    NSet difference(NSet other);
}
and an implementation using an array of booleans:
public class InefficientSet implements NSet {
    private boolean[] array;

    public InefficientSet(int size) { array = new boolean[size]; }

    public boolean contains(int i) { return array[i]; }

    public void insert(int i) { array[i] = true; }

    public void remove(int i) { array[i] = false; }

    public NSet complement() {
        InefficientSet toReturn = new InefficientSet(array.length);
        for (int i = 0; i < array.length; i++)
            toReturn.array[i] = !array[i];
        return toReturn;
    }

    public NSet union(NSet other) {
        InefficientSet toReturn = new InefficientSet(array.length);
        for (int i = 0; i < array.length; i++)
            toReturn.array[i] = array[i] || other.contains(i);
        return toReturn;
    }

    public NSet intersection(NSet other) {
        InefficientSet toReturn = new InefficientSet(array.length);
        for (int i = 0; i < array.length; i++)
            toReturn.array[i] = array[i] && other.contains(i);
        return toReturn;
    }

    public NSet difference(NSet other) {
        InefficientSet toReturn = new InefficientSet(array.length);
        for (int i = 0; i < array.length; i++)
            toReturn.array[i] = array[i] && !other.contains(i);
        return toReturn;
    }
}
We, however, are going to use bit operations to make a very fast and space-efficient implementation of a dynamic set. This works only under specific circumstances:
The main idea is simple. For each dynamic set we keep a sequence of bits numbered from 0 to n. If the ith bit is set to true or 1, that indicates that i is in the set; false or 0 indicates that it is not.
Conceptually we want an array of bits, or, as it is traditionally called, a bit vector. We can't literally use a traditional array because we don't have addresses for a single bit. We could make an array of, say, chars and use only one bit from each array location, but that would use 8 times as much memory as we really need. Instead, we will employ the bit manipulation operations we learned in class to implement a bit vector, which will then be used to implement a dynamic set.
For this part, you are given the library files bitvector.h and bitvector.c and the driver vectest.c. The file bitvector.h contains the definition of the bit-vector type and the prototypes for the functions you have to write. bitvector.c contains stubs, and vectest.c runs a driver program. To compile your code and the driver and to run the driver, do
make vectest
./vectest
Note that vectest.c has code commented out which you need to un-comment as you go along, as it tests different parts.
Open bitvector.h and look at the struct type BitVector_t. Since we don't know in advance how many bits we'll need, we have a pointer called vector that refers to the first byte of the memory area we'll use. The field size keeps track of the actual number of bits (not bytes) we're using. If we need to store 10 bits, we will allocate 2 bytes; all eight bits of the first byte will be used (for bits 0-7), and the first two bits of the second will be used (for bits 8 and 9); the other bits of the second byte will simply be left unused. The type of the vector pointer is unsigned char *. It will be easier to think of this as an array.
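The 10-bit example can be made concrete with two hypothetical helpers (not from bitvector.h): bit i lives in array element i / 8, and i % 8 says which bit inside that byte. This sketch assumes bit positions count from the least significant bit, so bit 0 of the vector is mask 0x01 of byte 0; ordering from the most significant bit works just as well if used consistently.

```c
#include <assert.h>

/* Hypothetical: which unsigned char in the array holds bit i. */
int byteOf(int i) {
    assert(i >= 0);
    return i / 8;
}

/* Hypothetical: a mask with only bit i's position set in its byte
   (least-significant-bit-first ordering assumed). */
unsigned char maskOf(int i) {
    assert(i >= 0);
    return (unsigned char)(1u << (i % 8));
}
```

With 10 bits, bits 0-7 map to byte 0 (masks 0x01 through 0x80), and bits 8 and 9 map to byte 1 with masks 0x01 and 0x02.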
Before you go on, there are a couple of important things you need to understand about the way the bit vector type is set up. Check your understanding of these before you move on, and ask for help if they don't make sense. You'll waste a lot of time in confused coding if you don't understand them:
First, note that the bit vector struct contains a pointer variable to the unsigned char array. That means when you create a new bit vector, there are two allocations: the unsigned char array and the bit vector struct value.
Second, one of the things that makes this task difficult is that there are three layers of abstraction. Going from most abstract to most concrete, they are: Conceptually we are modeling a set (level 1); we model that set using a bit vector (level 2); we implement that bit vector using an array of unsigned chars (level 3).
In bitvector.c, implement the function createBitVector(). You'll need to allocate the struct object as well as the vector (array) itself. Think about how to determine the number of bytes you'll need given the desired number of bits. Also, make sure the set is initially empty (that is, every bit is 0). At the same time, implement the destroy() function. Just make sure you free everything that was allocated.
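A sketch of the two allocations and the matching frees. SketchVec, sketchCreate, and sketchDestroy are hypothetical stand-ins; the real BitVector_t in bitvector.h may declare its fields differently (for example, the type of size). This version uses calloc(), which zero-fills the bytes so the set starts empty; malloc() followed by explicitly zeroing each byte is an equally valid choice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for BitVector_t. */
typedef struct {
    unsigned char *vector;   /* first byte of the bit storage */
    int size;                /* number of bits, not bytes */
} SketchVec;

/* Allocation 1: the struct. Allocation 2: the byte array it points to. */
SketchVec *sketchCreate(int bits) {
    assert(bits > 0);
    SketchVec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->vector = calloc((size_t)(bits + 7) / 8, 1);  /* all bits 0: empty set */
    if (v->vector == NULL) {
        free(v);             /* don't leak the struct on failure */
        return NULL;
    }
    v->size = bits;
    return v;
}

/* Free the array first, while it is still reachable through the
   struct, then the struct itself. */
void sketchDestroy(SketchVec *v) {
    if (v == NULL)
        return;
    free(v->vector);
    free(v);
}
```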
Compile bitvector.c and vectest.c and test. Right now vectest.c does not do very much, but test that everything compiles and does not crash.
Now write contains(). This requires you to pick out the right byte offset from the vector pointer (i.e., get the right unsigned char from the array), and then isolate the correct bit from that byte. Uncomment the relevant section in vectest.c, compile, and run. The driver will print out the contents of the set.
Write insert(). Now you will need to modify one of the bits (in one of the bytes). Think carefully about how to do this using bit operations. Notice that the driver uses a two-byte vector and inserts 5 and 9, thus setting one bit in the first byte and one bit in the second byte. Uncomment, compile, and test.
By now, removing an element shouldn't be that bad. (The function is called removeV() because one of the libraries we include already has a remove() function.)
One thing that sets apart the operations union, intersection, difference, and complement is that they do not modify their operand bit vectors, but rather create new ones. For each of these, you will have to allocate new bit vectors to represent the results.
We'll do complement first, since it needs only one operand. Make a new bit vector to return (the same size as the operand), and set each of its bits to the opposite of the corresponding bit in the operand. For efficiency, don't do this one bit at a time. Do it for each byte as a whole using C's bit-wise negation operator, ~. In other words, loop through the bytes, setting each byte in the new bit vector to be the bit-wise negation of the equivalent byte in the old bit vector. Don't worry that this will also affect the unused bits in the last byte---since they're unused, it won't do any harm to negate them as well.
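A sketch of the byte-at-a-time loop on raw arrays; complementBytes is a hypothetical name, and allocating the result bit vector is left out. The cast back to unsigned char matters because ~ operates on the promoted int value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complement: out[k] = ~in[k] for every byte. Unused
   high bits in the last byte get flipped too, which is harmless as
   long as nothing reads them. */
void complementBytes(unsigned char *out, const unsigned char *in,
                     size_t nbytes) {
    assert(out != NULL && in != NULL);
    for (size_t k = 0; k < nbytes; k++)
        out[k] = (unsigned char)~in[k];
}
```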
Notice that you had to calculate the number of bytes again, based on the number of bits. When you do something twice, it's a sign there should be a separate function for it. Write a function numBytes() that takes a number of bits and calculates the number of bytes required. Replace that calculation in complement() and createBitVector() with calls to that function.
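One common way to round a bit count up to whole bytes, sketched under a hypothetical name so it is not mistaken for your numBytes(). Adding 7 before the integer division makes any partial byte count as a full one, so 10 bits need 2 bytes.

```c
#include <assert.h>

/* Hypothetical helper: bytes needed to hold `bits` bits. */
int numBytesSketch(int bits) {
    assert(bits >= 0);
    return (bits + 7) / 8;
}
```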
Uncomment, compile, test.
Next, implement union. This also can be done efficiently with a bitwise operator. (Why is the function called unionV()? Well, union is actually a reserved word in C. I called this function unionV() for "vector".)
Uncomment, compile, test.
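A sketch of union on raw byte arrays with bitwise OR, one byte per iteration; unionBytes is a hypothetical name, and both operands are assumed to have the same number of bytes. A bit is 1 in the result if it is 1 in either operand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union of two same-sized bit vectors. */
void unionBytes(unsigned char *out, const unsigned char *a,
                const unsigned char *b, size_t nbytes) {
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t k = 0; k < nbytes; k++)
        out[k] = (unsigned char)(a[k] | b[k]);
}
```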
Are you getting the hang of this? Intersection should be easy now.
Finally, compute set difference. Uncomment, compile, test.
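Intersection and difference follow the same byte-wise pattern, sketched here with hypothetical names on same-sized raw arrays. Intersection keeps bits set in both operands (a & b); difference keeps bits set in the first but not the second (a & ~b).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intersection: bit set in both a and b. */
void intersectBytes(unsigned char *out, const unsigned char *a,
                    const unsigned char *b, size_t nbytes) {
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t k = 0; k < nbytes; k++)
        out[k] = (unsigned char)(a[k] & b[k]);
}

/* Hypothetical difference: bit set in a but not in b. */
void differenceBytes(unsigned char *out, const unsigned char *a,
                     const unsigned char *b, size_t nbytes) {
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t k = 0; k < nbytes; k++)
        out[k] = (unsigned char)(a[k] & ~b[k]);
}
```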
Now let's use the bit-vector set you wrote in part B in a real application. The Sieve of Eratosthenes is a method for finding prime numbers. One makes a list of integers from 2 up to some specified largest number. We will cross off numbers as we find them not to be prime. Initially assume all numbers are prime, which is true at least for the first number in the list, 2. Then, starting with 2, repeatedly
Thus in the first iteration, we'll cross off every even number other than 2; in the second iteration, we'll cross off every multiple of 3 other than 3 itself; and so on.
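A self-contained sketch of the sieve, written directly on raw bytes rather than with your bitvector.c functions (in sieve.c you would use your own bit vector operations instead). countPrimes is a hypothetical name. A bit set to 1 means "crossed off"; all bits start at 0, matching "initially assume all numbers are prime". Crossing off starts at p * p, a standard shortcut: smaller multiples of p have a smaller prime factor and were already crossed off, so the result is the same as crossing off every multiple.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: number of primes in [2, n], or -1 if memory
   cannot be allocated. Bit k of `crossed` is 1 once k is known
   not to be prime. */
int countPrimes(int n) {
    assert(n >= 0);
    if (n < 2)
        return 0;

    /* Bits 0..n need n + 1 bits, which is n / 8 + 1 bytes. */
    unsigned char *crossed = calloc((size_t)n / 8 + 1, 1);
    if (crossed == NULL)
        return -1;

    /* p <= n / p means p * p <= n, without risking overflow. */
    for (int p = 2; p <= n / p; p++) {
        if (crossed[p / 8] & (1u << (p % 8)))
            continue;                 /* p was crossed off: not prime */
        for (long m = (long)p * p; m <= n; m += p)
            crossed[m / 8] |= (unsigned char)(1u << (m % 8));
    }

    int count = 0;
    for (int k = 2; k <= n; k++)
        if (!(crossed[k / 8] & (1u << (k % 8))))
            count++;                  /* never crossed off: prime */

    free(crossed);
    return count;
}
```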
Complete the program sieve.c so that it uses one of your bit vectors to keep track of the numbers in the sieve.
Your program should
Please turn in all the files you modified (hmstring.c, bitvector.c, and sieve.c) to the turn-in directory for this project:
cp (some file) /cslab.all/ubuntu/cs245/turnin/(your user id)/proj5
DUE: 5:00 pm Wednesday, Nov 12, 2014. This project will overlap with project 6 for a few days.