The goal of this lab is to practice using bit operations; in this case, we will use an ordered collection of bits to represent a dynamic set.
A few weeks ago we talked briefly about the dictionary
data structure, the most familiar implementation of which is
Java's HashMap.
A similar data structure is the dynamic set, as implemented by
Java's HashSet.
A dynamic set is an unordered collection of uniform-typed items with
the operations
(It is worth pointing out what's dynamic about a dynamic set.
In the mathematical concept of set, a set cannot change.
If you have a set, say A = {1, 4, 5}, and take its union with another,
say B = {2, 4, 6}, you produce a new set, call it C = {1, 2, 4, 5, 6};
you do not make any change to the set A,
just as adding 7 to 5 makes 12 without changing "5".
A dynamic set, on the other hand, is a mutable data structure.)
Some applications of dynamic sets also require operations like those on mathematical sets:
Bit operations can be used to make a very fast and space-efficient implementation of a dynamic set, but only under limited circumstances:
For our purposes today, we will assume we want to work with
subsets of the set {0, 1, 2, ..., n}.
The main idea is simple. For each dynamic set we keep a sequence of bits numbered from 0 to n. If the ith bit is set to true or 1, that indicates that i is in the set; false or 0 indicates that it is not.
Conceptually we want an array of bits, or as it is traditionally called,
a bit vector.
We can't literally use a traditional array because we don't have addresses
for a single bit.
We could make an array of, say, chars and use only one
bit from each array location, but that would use 8 times as much memory as
we really need.
Instead, we will employ the bit manipulation operations we learned in class
on Monday to implement a bit vector, which will then be used to implement
a dynamic set.
Make a new directory for this lab, and then copy the files found in the class directory for this project.
cp /cslab/class/cs245/lab14/* .
The file bitvector.h contains
the definition of the bit-vector type and the
prototypes for the functions you have to write.
bitvector.c contains stubs, and vectest.c
runs a driver program.
Open bitvector.h and look at the
struct type BitVector_t.
Since we don't know how many bits we'll need,
we have a pointer called vector that
refers to the first byte of the memory area we'll use.
size keeps track of the actual number of
bits (not bytes) we're using.
If we need to store 10 bits, we will allocate 2 bytes;
all eight bits of the first byte will be used (for bits 0-7),
and the first two bits of the second will be used (for bits 8 and 9);
the other bits of the second byte will simply be left unused.
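The arithmetic in that example can be sketched as follows. This is only an illustration: the helper names are hypothetical (they are not in the lab files), and counting bits from the low-order end of each byte is an assumption, since any consistent numbering would work.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold nbits bits,
   rounding up to a whole byte (10 bits -> 2 bytes). */
size_t bytes_for_bits(size_t nbits) {
    return (nbits + 7) / 8;
}

/* Hypothetical helper: index of the byte that holds bit i. */
size_t byte_index(size_t i) {
    return i / 8;
}

/* Hypothetical helper: position of bit i inside its byte,
   counted from the low-order bit (an assumption). */
unsigned bit_offset(size_t i) {
    return (unsigned)(i % 8);
}
```

With these, bit 9 lives in byte 1 at offset 1, which matches "the first two bits of the second byte" holding bits 8 and 9.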
The type of the vector pointer is unsigned char *.
We chose char because it is small
(so we have the tightest control on how much memory we use).
The idea of an "unsigned" char sounds strange.
The explanation is that there are 128 ASCII codes (0 through 127),
which means that only seven bits are needed;
the eighth, leftover bit can function as a sign bit, even though
there aren't any negative ASCII characters.
(Whether a plain char is signed is left to the implementation in C.)
Remember that chars are still basically integers.
By modifying the type to be unsigned char, we
indicate that we wish to use the eighth bit for non-sign information.
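To see the difference on your own machine, a small sketch like the one below prints the ranges from limits.h. The function names are made up for this illustration; the macros are standard C.

```c
#include <limits.h>
#include <stdio.h>

/* Print the ranges of the character types on this platform. */
void print_char_ranges(void) {
    printf("bits per char: %d\n", CHAR_BIT);
    printf("signed char:   %d to %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char: 0 to %d\n", UCHAR_MAX);
    printf("plain char:    %d to %d\n", CHAR_MIN, CHAR_MAX);
}

/* With unsigned char, turning on the eighth (high) bit gives 128,
   not a negative value. */
unsigned char high_bit(void) {
    return (unsigned char)(1u << 7);
}
```

The line for plain char will match either the signed or the unsigned line, depending on the compiler.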
In bitvector.c, implement the function createBitVector().
You'll need to allocate the struct object, as well
as allocating the vector.
Think about how to determine the number of bytes you'll need
given the desired number of bits.
Also, this set should be initially empty; think carefully
about how to do that.
At the same time, implement the destroy() function.
Just make sure you free everything that was allocated.
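One possible shape for this pair of functions is sketched below. It is not the lab's code: BitVec is a stand-in struct that mirrors the two fields described above, the int type for size is a guess, and bv_create and bv_destroy are hypothetical names. Check bitvector.h for the real type and prototypes.

```c
#include <stdlib.h>

/* Stand-in for BitVector_t (assumed layout, see bitvector.h). */
typedef struct {
    unsigned char *vector; /* first byte of the bit storage */
    int size;              /* number of bits (not bytes) in use */
} BitVec;

/* Allocate a vector of nbits bits with every bit 0 (the empty set).
   Returns NULL if nbits is negative or an allocation fails. */
BitVec *bv_create(int nbits) {
    if (nbits < 0) return NULL;
    BitVec *bv = malloc(sizeof *bv);
    if (bv == NULL) return NULL;
    size_t nbytes = ((size_t)nbits + 7) / 8;
    /* calloc zero-fills; ask for at least one byte so a
       zero-bit vector still gets a valid pointer. */
    bv->vector = calloc(nbytes ? nbytes : 1, 1);
    if (bv->vector == NULL) {
        free(bv);
        return NULL;
    }
    bv->size = nbits;
    return bv;
}

/* Free the bit storage first, then the struct itself. */
void bv_destroy(BitVec *bv) {
    if (bv == NULL) return;
    free(bv->vector);
    free(bv);
}
```

Using calloc rather than malloc is one way to get the "initially empty" property, since a freshly malloc'd block has unspecified contents.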
Compile bitvector.c and vectest.c and test.
Right now vectest.c does not do very much, but test
that everything compiles and does not crash.
Now write contains().
This requires you to pick out the right byte offset from the
vector pointer, and then isolate the correct bit
from that byte.
Uncomment the relevant section in vectest.c,
compile, and run.
The driver will print out whether or not each number from the
universe (0-9) is in the set, and in each case it should be "no."
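The "pick the byte, then isolate the bit" idea might look like the sketch below. BitVec and bv_contains are hypothetical stand-ins for the lab's own type and function, and numbering bits from the low-order end of each byte is an assumption.

```c
/* Stand-in for BitVector_t (assumed layout, see bitvector.h). */
typedef struct {
    unsigned char *vector; /* first byte of the bit storage */
    int size;              /* number of bits in use */
} BitVec;

/* Return 1 if i is in the set, 0 otherwise
   (including when i is outside the universe). */
int bv_contains(const BitVec *bv, int i) {
    if (i < 0 || i >= bv->size) return 0;
    /* i / 8 selects the byte; shifting right by i % 8 moves the
       wanted bit to position 0, and & 1 discards the rest. */
    return (bv->vector[i / 8] >> (i % 8)) & 1;
}
```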
Write insert().
Now you will need to modify one of the bits (in one of the bytes).
Think carefully about how to do this using bit operations.
Notice the driver requires a two-byte vector, and it inserts 5 and 9; thus,
one bit in the first byte and one bit in the second byte.
Uncomment, compile, and test.
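A sketch of turning on a single bit with a mask and bit-wise OR follows. As before, BitVec and bv_insert are hypothetical names, and the low-order-first bit numbering is an assumption rather than something the lab files are known to require.

```c
/* Stand-in for BitVector_t (assumed layout, see bitvector.h). */
typedef struct {
    unsigned char *vector; /* first byte of the bit storage */
    int size;              /* number of bits in use */
} BitVec;

/* Add i to the set; values outside the universe are ignored. */
void bv_insert(BitVec *bv, int i) {
    if (i < 0 || i >= bv->size) return;
    /* 1u << (i % 8) is a mask with only the wanted bit on;
       OR-ing it in sets that bit and leaves the others alone. */
    bv->vector[i / 8] |= (unsigned char)(1u << (i % 8));
}
```

Under this numbering, inserting 5 and 9 into an empty 10-bit vector leaves the first byte as 0x20 and the second as 0x02.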
By now, removing an element shouldn't be that bad.
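Turning a bit off is the mirror image of turning it on: AND with a mask that has every bit on except the one to clear. The sketch below uses the same hypothetical BitVec stand-in and an invented name, bv_remove.

```c
/* Stand-in for BitVector_t (assumed layout, see bitvector.h). */
typedef struct {
    unsigned char *vector; /* first byte of the bit storage */
    int size;              /* number of bits in use */
} BitVec;

/* Remove i from the set; values outside the universe are ignored. */
void bv_remove(BitVec *bv, int i) {
    if (i < 0 || i >= bv->size) return;
    /* ~(1u << (i % 8)) has a 0 only at the wanted position,
       so the AND clears that bit and keeps the others. */
    bv->vector[i / 8] &= (unsigned char)~(1u << (i % 8));
}
```

Removing an element that is not in the set leaves the vector unchanged.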
One thing that sets apart the operations union, intersection, difference, and complement is that they do not modify their operand bit vectors, but rather create new ones. For each of these, you will have to allocate new bit vectors to represent the results.
We'll do complement first, since it needs only one operand.
Make a new bit vector to return (same size as the operand), and set each of
its bits to the opposite of the corresponding bit in the operand.
For efficiency, don't do this one bit at a time.
Do it for each byte as a whole using C's bit-wise
negation operator, ~.
In other words, loop through the bytes, setting
each byte in the new bit vector to be the bit-wise negation
of the equivalent byte in the old bit vector.
Don't worry that this will also affect the unused bits in the
last byte; since they're unused, it won't do any harm to
negate them as well.
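The byte-at-a-time loop might be sketched like this. It works on plain byte buffers rather than the lab's struct, and complement_bytes is a hypothetical name; allocating the result vector is left out.

```c
#include <stddef.h>

/* Write the bit-wise negation of each of the nbytes bytes
   in src into the matching byte of dst. */
void complement_bytes(unsigned char *dst, const unsigned char *src,
                      size_t nbytes) {
    for (size_t k = 0; k < nbytes; k++) {
        /* ~ promotes its operand to int, so cast the result
           back down to a single byte. */
        dst[k] = (unsigned char)~src[k];
    }
}
```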
Notice that you had to calculate the number of bytes again,
based on the number of bits.
When you do something twice, it's a sign there should be a separate
function to calculate that.
Write a function numBytes() that takes a number
of bits and calculates the bytes required.
Replace that calculation in complement() and
createBitVector() with calls to that function.
Uncomment, compile, test.
Next, implement union.
This also can be done efficiently with a bitwise operator.
(Why is the method called unionV()?
Well, union is actually a reserved word in C.
I called this function unionV() for "vector".)
Uncomment, compile, test.
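As an illustration of the bitwise approach, the sketch below combines two byte buffers with |. The name union_bytes is hypothetical, the buffers are assumed to have the same length, and bits are assumed to be numbered from the low-order end.

```c
#include <stddef.h>

/* dst[k] gets every bit that is on in a[k] or in b[k]. */
void union_bytes(unsigned char *dst, const unsigned char *a,
                 const unsigned char *b, size_t nbytes) {
    for (size_t k = 0; k < nbytes; k++) {
        dst[k] = (unsigned char)(a[k] | b[k]);
    }
}
```

Taking the sets from the start of the handout, A = {1, 4, 5} is the byte 0x32 and B = {2, 4, 6} is 0x54; their OR is 0x76, which encodes {1, 2, 4, 5, 6}.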
Are you getting the hang of this? Intersection should be easy now.
Use calls to your other functions to implement difference. It's ok if you make a temporary bit vector besides the one you return-- just make sure you destroy it when you're done.
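The idea behind that hint is that A minus B equals A intersected with the complement of B. A sketch on plain byte buffers, using hypothetical helper names and a temporary buffer that is freed before returning:

```c
#include <stddef.h>
#include <stdlib.h>

/* dst = bit-wise negation of src, byte by byte. */
void complement_bytes(unsigned char *dst, const unsigned char *src,
                      size_t nbytes) {
    for (size_t k = 0; k < nbytes; k++)
        dst[k] = (unsigned char)~src[k];
}

/* dst = bits that are on in both a and b. */
void intersect_bytes(unsigned char *dst, const unsigned char *a,
                     const unsigned char *b, size_t nbytes) {
    for (size_t k = 0; k < nbytes; k++)
        dst[k] = (unsigned char)(a[k] & b[k]);
}

/* dst = a minus b, built from the two helpers above.
   Returns 0 on success, -1 if the temporary cannot be allocated. */
int difference_bytes(unsigned char *dst, const unsigned char *a,
                     const unsigned char *b, size_t nbytes) {
    unsigned char *not_b = malloc(nbytes ? nbytes : 1);
    if (not_b == NULL) return -1;
    complement_bytes(not_b, b, nbytes);
    intersect_bytes(dst, a, not_b, nbytes);
    free(not_b); /* the temporary is no longer needed */
    return 0;
}
```

With A = {1, 4, 5} (0x32) and B = {2, 4, 6} (0x54), the result is 0x22, which encodes {1, 5}.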
Now let's use this in a real application. The Sieve of Eratosthenes is a method for finding prime numbers. One makes a list of integers from 2 up to some specified largest number. We will cross off numbers as we find them not to be prime. Initially assume all numbers are prime, which is true at least for the first number in the list, 2. Then, starting with 2, repeatedly
Thus in the first iteration, we'll cross off every even number; in the second iteration, we'll cross off every multiple of 3; etc.
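For reference, one common formulation of the sieve over a packed bit array is sketched below. It does not use the lab's BitVector_t functions, all names are invented for the illustration, and it assumes a modest n (well below INT_MAX). A set bit means "still considered prime".

```c
#include <stdlib.h>
#include <string.h>

/* Return bit i of the packed array v (low-order bit first). */
int sieve_test_bit(const unsigned char *v, int i) {
    return (v[i / 8] >> (i % 8)) & 1;
}

/* Turn off bit i of the packed array v. */
void sieve_clear_bit(unsigned char *v, int i) {
    v[i / 8] &= (unsigned char)~(1u << (i % 8));
}

/* Build a bit array covering 0..n in which bit i is on exactly
   when i is prime. The caller frees the result. Returns NULL
   if n is negative or the allocation fails. */
unsigned char *sieve(int n) {
    if (n < 0) return NULL;
    size_t nbytes = (size_t)n / 8 + 1; /* room for n + 1 bits */
    unsigned char *v = malloc(nbytes);
    if (v == NULL) return NULL;
    memset(v, 0xFF, nbytes);           /* assume everything is prime */
    sieve_clear_bit(v, 0);             /* 0 and 1 are not prime */
    if (n >= 1) sieve_clear_bit(v, 1);
    for (int p = 2; (long long)p * p <= n; p++) {
        if (!sieve_test_bit(v, p)) continue;  /* already crossed off */
        for (int m = 2 * p; m <= n; m += p)   /* cross off multiples */
            sieve_clear_bit(v, m);
    }
    return v;
}
```

Stopping the outer loop once p * p exceeds n is a standard shortcut: any composite number up to n has a factor no larger than its square root, so it has already been crossed off.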
Write a program that uses one of your bit vectors to keep track of the numbers in the sieve. Your program should
Cat all your files in a typescript which also shows you running the programs. Turn in a hard copy.