The goal of this lab is to practice using bit operations in one last exercise with C; in this case, we will use an ordered collection of bits to represent a dynamic set.
A dynamic set is a way of containing data similar
to a mathematical set except that it can change, that is, be
updated.
The most familiar implementation is
Java's HashSet
.
Specifically, a
dynamic set is an unordered collection of uniform-typed items with
the operations
(It is worth pointing out what's dynamic about a dynamic set.
In the mathematical concept of set, a set cannot change.
If you have a set, say A = { 1, 4, 5}
,
and union to it, say B = {2, 4, 6}
,
you produce a new set, call it C = {1, 2, 4, 5, 6}
;
you do no make any change to the set A,
just as adding 7 to 5 makes 12--- it doesn't change "5".
A dynamic set, on the other hand, is a mutable data structure.)
Some applications of dynamic sets require a few other additional operations like the operations on mathematical sets:
Our specific task is to implement sets of numbers,
subsets of the set { 0, 1,
... n}
.
Here's the interface if we were writing in Java:
public interface NSet { boolean contains(int i); void insert(int i); void remove(int i); NSet complement(); NSet union(NSet other); NSet intersection(NSet other); NSet difference(NSet other); }
and an implementation using an array of booleans:
public class InefficientSet implements NSet { private boolean[] array; public InefficientSet(int size) { array = new boolean[size]; } public boolean contains(int i ) { return array[i]; } public void insert(int i) { array[i] = true; } public void remove(int i) { array[i] = false; } public NSet complement() { InefficientSet toReturn = new InefficientSet(array.length); for (int i = 0; i < array.length; i++) toReturn.array[i] = !array[i]; return toReturn; } public NSet union(NSet other) { InefficientSet toReturn = new InefficientSet(array.length); for (int i = 0; i < array.length; i++) toReturn.array[i] = array[i] || other.contains(i); return toReturn; } public NSet intersection(NSet other) { InefficientSet toReturn = new InefficientSet(array.length); for (int i = 0; i < array.length; i++) toReturn.array[i] = array[i] && other.contains(i); return toReturn; } public NSet difference(NSet other) { InefficientSet toReturn = new InefficientSet(array.length); for (int i = 0; i < array.length; i++) toReturn.array[i] = array[i] && ! other.contains(i); return toReturn; } }
We, however, are going to use bit operations to make a very fast and space-efficient implementation of a dynamic set; this reduces the storage for each Boolean value to a single bit. This works only under specific circumstances:
The main idea is simple. For each dynamic set we keep a sequence of bits numbered from 0 to n. If the ith bit is set to true or 1, that indicates that i is in the set; false or 0 indicates that it is not.
Conceptually we want an array of bits, or as it is traditionally called,
a bit vector.
We can't simply use a traditional array because we don't have addresses
for a single bit.
We could make an array of, say, char
s and use only one
bit from each array location, but that would use 8 times as much memory as
we really need.
Instead, we will employ the bit manipulation operations in C (and in Java)
to implement a bit vector, which will then be used to implement
a dynamic set.
You should be comfortable with the logical operators from Java (and C), which operate on individual Boolean values. Those operators are
&&
and (binary)
||
or (binary)
!
not (unary)
C and Java provide a second set of operators that work on integer values, but perform the Boolean operations bitwise—that is, on the corresponding bits in the two operands. The bitwise operators include
&
and (binary)
|
or (binary)
~
not (unary), also called complement
^
exclusive or (binary)
The meaning of A exclusive-or B is "either A or B, but not both".
There are also operators for shifting the bits within an
integer's binary representation to the left or to the right. If I
have an unsigned character i
that contains the integer
value one, its bits are 0000 0001
. If you shift those to
the left two places with j = i << i
, then
j
contains the bits 0000 0100
. A left shift
fills in with zeros on the right, and the bits shifted out on the left
are discarded.
A right shift works similarly; on unsigned values, it shifts in
zeros on the left end. So k = j >> 1
gives
k
the value 0000 0010
.
Shifts can be combined with bitwise logical operators to extract
parts of an integer value. For example, in the disassembler code you
saw in class, the register operand was in the second byte (from the
left) of the 32-bit
instruction. So we could extract that byte from an unsigned integer
u
by first shifting it to the left 16 places (two 8-bit
bytes), then anding the result with a value that has only the last 8
bits as ones. In C, that becomes
(u >> 24) & 0x000000ffOr, you could and first, then shift
(u & 0x00ff0000) >> 24
Make a new directory for this lab, and then clone the repository to get the starting code for this lab.
hg clone /cslab/class/csci245/lab12
The file bitvector.h
contains
the definition of the bit-vector type and the
prototype for the functions you have to write.
bitvector.c
contains stubs, and vectest.c
runs a driver program.
The file makefile
is a makefile to help you manage this
project.
Open bitvector.h
and look at the
struct type BitVector_t
.
Since we don't know how many bits we'll need,
we have an array (of unsigned char
s,
or, for our purposes, bytes) called vector
that
refers to the first byte of the memory area we'll
use.
size
keeps track of the actual number of
bits (not bytes) we're using.
If we need to store 10 bits, we will allocate 2 bytes;
all eight bits of the first byte will be used (for bits 0-7),
and the first two bits of the second will be used (for bits 8 and 9);
the other bits of the second byte will simply be left unused.
Note that bitvector.h
also contains a
typedef
on line 8, which makes BitVector
another name for the type struct BitVector_t
.
In bitvector.c
, implement the function
createBitVector()
.
This will return a BitVector
value;
don't think of this as an object, because it is not
returning a reference but a complete value.
The array contained in the struct is a reference, however.
Think about how to determine the number of bytes you'll need
given the desired number of bits.
Also, make sure the set is initially empty. You can use
createUCArray()
to allocate an array of unsigned characters.
(It may be helpful to
know that calloc()
, which createUCArray()
calls to get memory, guarantees that the memory that comes back is
filled with zero bits.)
Compile bitvector.c
and vectest.c
with
the command
makeand test by running
./vectestRight now
vectest.c
does not do very much: after
the program prints an incorrect answer for 3C, the first call to
insert()
will politely die.
Now write contains()
.
This requires you to pick out the right byte offset from the
vector
array, and then isolate the correct bit from
from that byte.
Compile again, then run.
The driver will now print out the contents of the (empty) set.
Write insert()
.
Now you will need to modify one of the bits (in one of the bytes).
Think carefully about how to do this using bit operations.
Notice the driver requires a two-byte vector, and it inserts 5 and 9--thus,
one bit in the first byte and one bit in the second byte.
Compile and test.
By now, removing and element shouldn't be that bad.
(The function is called removeV()
because
one of the libraries we include already had a remove()
function.)
One thing that sets apart the operations union, intersection, difference, and complement is that they do not modify their operand bit vectors, but rather create new ones. For each of these, you will have to make new bit vectors (and allocate new arrays) to represent the results.
We'll do complement first, since it needs only one operand.
Make a new bit vector to return (same size as the operand), and make all
its bits to be
the opposite of the bits in the operand.
For efficiency, don't do this one bit at a time.
Do it for each byte as a whole using C's bit-wise
negation operator, ~
In other words, loop through the bytes, setting
each byte in the new bit vector to be the bit-wise negation
of the equivalent byte in the old bit vector.
Don't worry that this will also affect the unused bits in the
last byte--- since they're unused, it won't do any harm to
negate them as well.
Notice that you had to calculate the number of bytes again,
based on the number of bits.
When you do something twice, it's a sign there should be a separate
function to calculate that.
Write a function numBytes()
that takes a number
of bits and calculates the bytes required.
Replace that calculation in complement()
and
createBitVector()
with calls to that function.
Next, implement union.
This also can be done efficiently with a bitwise operator.
(Why is the method called unionV()
?
Well, union
is actually a reserved word in C.
I called this function unionV()
for "vector".)
Are you getting the hang of this? Intersection should be easy now.
Finally, compute set difference.
Now let's use this in a real application. The Sieve of Eratosthenes is a method for finding prime numbers. One makes a list of integers from 2 up to some specified largest number. We will cross off numbers as we find them not to be prime. Initially assume all numbers are prime, which is true at least for the first number in the list, 2. Then, starting with 2, repeatedly
Thus in the first iteration, we'll cross off every even number; in the second iteration, we'll cross off every multiple of 3; etc.
Write a program that uses one of your bit vectors to keep track of the numbers in the sieve. Your program should
The provide makefile will compile and link your program if you
make sieve
bitvector.c
and sieve.c
.
script
) and run
make clean make all ./vectest ./sieve(End the script with the command
exit
.)
/cslab/class/csci245/bin/handin lab12 bitvector.c typescript