Lab 11: Implementing sets with bit vectors

The goal of this lab is to practice using bit operations; in this case, we will use an ordered collection of bits to represent a dynamic set.

1. Introduction

See the pre-lab reading to review the background information.

2. Setup

Make a new directory for this lab, and then copy the files found in the class directory for this project.

cp ~tvandrun/Public/cs245/lab11/* .

The file bitvector.h contains the definition of the bit-vector type and the prototypes for the functions you have to write. bitvector.c contains stubs, and vectest.c runs a driver program. The file makefile is a makefile to help you manage this project.

3. Basic operations

Open bitvector.h and look at the struct type BitVector_t. Since we don't know how many bits (and hence how many bytes) we'll need, we have a pointer called vector that refers to the first byte of the memory area we'll use. size keeps track of the actual number of bits (not bytes) we're using. If we need to store 10 bits, we will allocate 2 bytes; all eight bits of the first byte will be used (for bits 0-7), and the first two bits of the second will be used (for bits 8 and 9); the other bits of the second byte will simply be left unused.

The type of the vector pointer is unsigned char *. It will be easier to think of this as an array.
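The arithmetic behind the 10-bit example can be sketched as below. The helper names are hypothetical (they are not in bitvector.h), and the sketch assumes that offset 0 within a byte means the least significant bit; the lab does not specify the bit order, so either convention works as long as you use it consistently.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of bitvector.h. */

/* Bytes needed to hold nbits bits, rounding up to a whole byte. */
size_t bytes_for_bits(size_t nbits) { return (nbits + 7) / 8; }

/* Index of the array element (byte) that holds bit i. */
size_t byte_index(size_t i) { return i / 8; }

/* Position of bit i inside its byte, 0..7.
   Assumption: 0 is the least significant bit. */
size_t bit_offset(size_t i) { return i % 8; }
```

With these, 10 bits need 2 bytes, and bit 9 lives in array element 1 at offset 1.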

Before you go on, there are a couple of important things you need to understand about the way the bit vector type is set up. Check your understanding of these before you move on, and ask for help if they don't make sense. You'll waste a lot of time in confused coding if you don't understand them:

First, note that the bit vector struct contains a pointer variable to the unsigned char array. That means when you create a new bit vector, there are two allocations: the unsigned char array and the bit vector struct value.

Second, one of the things that makes this task difficult is that there are three layers of abstraction. Going from most abstract to most concrete, they are: Conceptually we are modeling a set (level 1); we model that set using a bit vector (level 2); we implement that bit vector using an array of unsigned chars (level 3).

A. Creating a new bit vector

In bitvector.c, implement the function createBitVector(). You'll need to allocate the struct object, as well as allocating the vector (array) itself. Think about how to determine the number of bytes you'll need given the desired number of bits. Also, make sure the set is initially empty.

B. Destroying a bit vector

At the same time, implement the destroy() function. Just make sure you free everything that was allocated.

Compile bitvector.c and vectest.c and test. Right now vectest.c does not do very much, but test that everything compiles and does not crash.
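The two allocations (and the two matching frees) might look roughly like the sketch below. The struct here is a hypothetical stand-in: only the field names vector and size come from the description above, while the type of size, the function names, and the signatures are assumptions, so follow the prototypes in bitvector.h in your own code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the struct in bitvector.h;
   the real field types may differ. */
typedef struct {
    unsigned char *vector;  /* first byte of the bit storage */
    int size;               /* number of bits in use, not bytes */
} SketchBitVector;

/* Allocate the struct and its byte array; every bit starts at 0 (empty set). */
SketchBitVector *sketch_create(int nbits)
{
    SketchBitVector *bv = malloc(sizeof *bv);        /* allocation 1: the struct */
    if (bv == NULL)
        return NULL;
    size_t nbytes = ((size_t)nbits + 7) / 8;         /* round up to whole bytes */
    bv->vector = calloc(nbytes, 1);                  /* allocation 2: zeroed bytes */
    if (bv->vector == NULL) {
        free(bv);
        return NULL;
    }
    bv->size = nbits;
    return bv;
}

/* Free in the reverse order: the array first, then the struct. */
void sketch_destroy(SketchBitVector *bv)
{
    if (bv == NULL)
        return;
    free(bv->vector);
    free(bv);
}
```

Using calloc() rather than malloc() is one way to guarantee the set starts out empty, since it zeroes the bytes it returns.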

C. Testing for containment

Now write contains(). This requires you to pick out the right byte offset from the vector pointer (i.e., get the right unsigned char from the array), and then isolate the correct bit from that byte. Uncomment the relevant section in vectest.c, compile, and run. The driver will print out the contents of the set.
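One possible way to isolate a bit is to shift it down to the lowest position and mask off everything else. The sketch below works directly on a byte array rather than on the struct; the function name is hypothetical, and it assumes bit 0 of each byte is the least significant bit.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if bit i is set in the byte array, else 0.
   Assumption: within a byte, offset 0 is the least significant bit. */
int bit_is_set(const unsigned char *bytes, size_t i)
{
    unsigned char byte = bytes[i / 8];   /* pick the right array element */
    return (byte >> (i % 8)) & 1u;       /* move the bit to position 0, mask the rest */
}
```

For a two-byte array holding 0x20 and 0x02, this reports bits 5 and 9 as present and every other bit as absent.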

D. Inserting

Write insert(). Now you will need to modify one of the bits (in one of the bytes). Think carefully about how to do this using bit operations. Notice the driver requires a two-byte vector, and it inserts 5 and 9--thus, one bit in the first byte and one bit in the second byte. Uncomment, compile, and test.
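Setting a single bit without disturbing its neighbors is commonly done by OR-ing the byte with a one-bit mask. This is a sketch on a raw byte array under the same assumptions as before (hypothetical function name, least significant bit at offset 0), not the required signature of insert().

```c
#include <stddef.h>

/* Hypothetical helper: turn bit i on, leaving all other bits unchanged.
   Assumption: within a byte, offset 0 is the least significant bit. */
void set_bit(unsigned char *bytes, size_t i)
{
    unsigned char mask = (unsigned char)(1u << (i % 8));  /* a single 1 at the offset */
    bytes[i / 8] |= mask;                                  /* OR keeps existing bits */
}
```

Starting from two zero bytes, setting bits 5 and 9 leaves 0x20 in the first byte and 0x02 in the second. Setting a bit that is already on changes nothing.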

E. Removing

By now, removing an element shouldn't be that bad. (The function is called removeV() because one of the libraries we include already has a remove() function.)
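Clearing a bit is the mirror image of setting one: AND the byte with a mask that has a 0 only at the target position. Again this is a sketch on a raw byte array with a hypothetical function name, assuming the least significant bit is at offset 0.

```c
#include <stddef.h>

/* Hypothetical helper: turn bit i off, leaving all other bits unchanged.
   Assumption: within a byte, offset 0 is the least significant bit. */
void clear_bit(unsigned char *bytes, size_t i)
{
    /* ~ flips the one-bit mask, giving 1s everywhere except the offset. */
    unsigned char mask = (unsigned char)~(1u << (i % 8));
    bytes[i / 8] &= mask;                /* AND forces only that bit to 0 */
}
```

Clearing a bit that is already off leaves the byte unchanged, so removing an absent element needs no special case.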

4. Whole-set operations

One thing that sets apart the operations union, intersection, difference, and complement is that they do not modify their operand bit vectors, but rather create new ones. For each of these, you will have to allocate new bit vectors to represent the results.

A. Complement

We'll do complement first, since it needs only one operand. Make a new bit vector to return (the same size as the operand), and set each of its bits to the opposite of the corresponding bit in the operand. For efficiency, don't do this one bit at a time. Do it for each byte as a whole using C's bitwise negation operator, ~. In other words, loop through the bytes, setting each byte in the new bit vector to the bitwise negation of the corresponding byte in the old bit vector. Don't worry that this will also affect the unused bits in the last byte: since they're unused, it won't do any harm to negate them as well.

Notice that you had to calculate the number of bytes again, based on the number of bits. When you do something twice, it's a sign there should be a separate function to calculate that. Write a function numBytes() that takes a number of bits and calculates the bytes required. Replace that calculation in complement() and createBitVector() with calls to that function.

Uncomment, compile, test.
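The byte-at-a-time loop, together with a byte-count helper, could be sketched as follows. Both function names are hypothetical, and the sketch writes into a caller-supplied destination array instead of allocating a new bit vector, so it shows only the core loop and not the full complement().

```c
#include <stddef.h>

/* Hypothetical byte-count helper: bytes needed for nbits bits, rounded up. */
size_t num_bytes_sketch(size_t nbits) { return (nbits + 7) / 8; }

/* Hypothetical helper: dst receives the bitwise negation of src, byte by byte.
   dst must have room for num_bytes_sketch(nbits) bytes. */
void complement_bytes(const unsigned char *src, unsigned char *dst, size_t nbits)
{
    size_t n = num_bytes_sketch(nbits);
    for (size_t k = 0; k < n; k++)
        dst[k] = (unsigned char)~src[k];  /* flips all 8 bits, unused ones included */
}
```

The cast matters for readability more than correctness: ~ promotes the byte to int, and the cast makes the truncation back to one byte explicit.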

B. Union

Next, implement union. This also can be done efficiently with a bitwise operator. (Why is the method called unionV()? Well, union is actually a reserved word in C. I called this function unionV() for "vector".) Uncomment, compile, test.
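An element is in the union if its bit is set in either operand, which corresponds to bitwise OR. Below is a minimal sketch of the byte loop only; the function name is hypothetical, the operands are assumed to have the same number of bytes, and the result goes into a caller-supplied array rather than a newly allocated bit vector.

```c
#include <stddef.h>

/* Hypothetical helper: out[k] = a[k] OR b[k] for each of nbytes bytes.
   Assumption: a, b, and out all hold at least nbytes bytes. */
void union_bytes(const unsigned char *a, const unsigned char *b,
                 unsigned char *out, size_t nbytes)
{
    for (size_t k = 0; k < nbytes; k++)
        out[k] = (unsigned char)(a[k] | b[k]);  /* bit set if set in either operand */
}
```

Neither operand is modified, which matches the requirement that whole-set operations produce new results.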

C. Intersection

Are you getting the hang of this? Intersection should be easy now.

D. Difference

Finally, compute set difference.
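Set difference has no single C operator, but A minus B contains exactly the elements that are in A and not in B, so one way to compute it is to AND each byte of A with the negation of the matching byte of B. The sketch below shows just that loop, with a hypothetical function name and a caller-supplied output array.

```c
#include <stddef.h>

/* Hypothetical helper: out[k] = a[k] AND (NOT b[k]) for each of nbytes bytes.
   Assumption: a, b, and out all hold at least nbytes bytes. */
void difference_bytes(const unsigned char *a, const unsigned char *b,
                      unsigned char *out, size_t nbytes)
{
    for (size_t k = 0; k < nbytes; k++)
        out[k] = (unsigned char)(a[k] & ~b[k]);  /* keep a's bits that b lacks */
}
```

If A holds {0, 5, 9} and B holds {5}, the result holds {0, 9}.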

5. The Sieve of Eratosthenes

Now let's use this in a real application. The Sieve of Eratosthenes is a method for finding prime numbers. One makes a list of integers from 2 up to some specified largest number. We will cross off numbers as we find them not to be prime. Initially assume all numbers are prime, which is true at least for the first number in the list, 2. Then, starting with 2, repeatedly cross off every larger multiple of the current number and move on to the next number that has not been crossed off.

Thus in the first iteration, we'll cross off every even number; in the second iteration, we'll cross off every multiple of 3; etc.
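To make the crossing-off process concrete, here is a self-contained sketch of the sieve on a raw byte array rather than on your BitVector_t. The function name and the choice to return a count of primes are assumptions made for illustration; your program should use your own bit vector functions and produce whatever output the assignment asks for.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the primes in 2..limit with a bit-per-number sieve.
   Bit i set means "i is still assumed prime". Returns 0 if limit < 2 or if
   allocation fails. */
size_t count_primes(size_t limit)
{
    if (limit < 2)
        return 0;

    size_t nbytes = (limit + 8) / 8;          /* enough bytes for bits 0..limit */
    unsigned char *candidate = malloc(nbytes);
    if (candidate == NULL)
        return 0;
    memset(candidate, 0xFF, nbytes);          /* initially assume every number is prime */

    size_t count = 0;
    for (size_t p = 2; p <= limit; p++) {
        if (!((candidate[p / 8] >> (p % 8)) & 1u))
            continue;                         /* p was crossed off earlier */
        count++;                              /* p survived, so it is prime */
        for (size_t m = 2 * p; m <= limit; m += p)
            candidate[m / 8] &= (unsigned char)~(1u << (m % 8));  /* cross off m */
    }

    free(candidate);
    return count;
}
```

The outer loop visits the numbers in order, and the inner loop is the crossing-off step: for p = 2 it clears every even number above 2, for p = 3 every larger multiple of 3, and so on.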

Write a program that uses one of your bit vectors to keep track of the numbers in the sieve. Your program should


Thomas VanDrunen
Last modified: Tue Nov 13 16:37:28 CST 2012