Lab 12: Implementing sets with bit vectors

The goal of this lab is to practice using bit operations; in this case, we will use an ordered collection of bits to represent a dynamic set.

1. Introduction

A dynamic set is a data structure similar to a mathematical set, except that it can change, that is, be updated. A familiar implementation is Java's HashSet. Specifically, a dynamic set is an unordered collection of items of a uniform type with the operations

insert a new item into the set
remove an item from the set
test to see if the set contains a given item

(It is worth pointing out what is dynamic about a dynamic set. In the mathematical concept of a set, a set cannot change. If you have a set, say A = {1, 4, 5}, and take its union with another set, say B = {2, 4, 6}, you produce a new set, call it C = {1, 2, 4, 5, 6}; you do not make any change to the set A, just as adding 7 to 5 makes 12 but doesn't change "5". A dynamic set, on the other hand, is a mutable data structure.)
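
As a small illustration of that distinction, here is a sketch using java.util.HashSet. The class name DynamicVsMathematical and its two helper methods are made up for this example and are not part of the lab's interface.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
class DynamicVsMathematical {

    // Mathematical style: build and return a new set C.
    // Neither a nor b is modified.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> c = new HashSet<>(a);  // copy of a
        c.addAll(b);                        // only the copy changes
        return c;
    }

    // Dynamic style: the set that is passed in is itself updated.
    static void insertInPlace(Set<Integer> a, int item) {
        a.add(item);
    }
}
```

With A = {1, 4, 5} and B = {2, 4, 6}, union(A, B) gives back a separate set {1, 2, 4, 5, 6} and leaves A as it was, whereas insertInPlace(A, 7) changes A itself.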

Some applications of dynamic sets require a few additional operations, like the operations on mathematical sets:

create a set that is the union of two others
create a set that is the intersection of two others
create a set that is the difference of two others
create a set that is the complement of another

Our specific task is to implement sets of numbers, subsets of the set {0, 1, ..., n}. Here's the interface if we were writing in Java:

public interface NSet {

    boolean contains(int i);
    void insert(int i);
    void remove(int i);
    NSet complement();
    NSet union(NSet other);
    NSet intersection(NSet other);
    NSet difference(NSet other);

}

and an implementation using an array of booleans:

public class InefficientSet implements NSet {

    private boolean[] array;

    public InefficientSet(int size) { array = new boolean[size]; }

    public boolean contains(int i) { return array[i]; }
    public void insert(int i) { array[i] = true; }
    public void remove(int i) { array[i] = false; }

    public NSet complement() {
        InefficientSet toReturn = new InefficientSet(array.length);
        for (int i = 0; i < array.length; i++)
            toReturn.array[i] = !array[i];
        return toReturn;
    }

    public NSet union(NSet other) {
        InefficientSet toReturn = new InefficientSet(array.length);
        for (int i = 0; i < array.length; i++)
            toReturn.array[i] = array[i] || other.contains(i);
        return toReturn;
    }


    public NSet intersection(NSet other) {
        InefficientSet toReturn = new InefficientSet(array.length);
        for (int i = 0; i < array.length; i++)
            toReturn.array[i] = array[i] && other.contains(i);
        return toReturn;
    }

    public NSet difference(NSet other) {
        InefficientSet toReturn = new InefficientSet(array.length);
        for (int i = 0; i < array.length; i++)
            toReturn.array[i] = array[i] && !other.contains(i);
        return toReturn;
    }

}

We, however, are going to use bit operations to make a very fast and space-efficient implementation of a dynamic set. This works only under specific circumstances:

The elements of the set (or sets) must be drawn from a pre-determined, finite (and presumably small) universe.
The elements of the universe must be (or be representable by) a contiguous range of integers, as we're assuming.

The main idea is simple. For each dynamic set we keep a sequence of bits numbered from 0 to n. If the ith bit is set to true or 1, that indicates that i is in the set; false or 0 indicates that it is not.
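
To make the idea concrete, here is a minimal sketch in Java in which one long holds a set drawn from the universe {0, 1, ..., 63}. The class name WordSet and its static methods are assumptions made for this illustration; they are not the lab's required design, and each method assumes 0 <= i <= 63.

```java
// Hypothetical helper class: one 64-bit word used as a set over {0, ..., 63}.
class WordSet {

    // i is in the set exactly when bit i of the word is 1.
    static boolean contains(long word, int i) {
        return ((word >>> i) & 1L) != 0;
    }

    // Turn bit i on; the other bits are unchanged.
    static long insert(long word, int i) {
        return word | (1L << i);
    }

    // Turn bit i off; the other bits are unchanged.
    static long remove(long word, int i) {
        return word & ~(1L << i);
    }

    // Each set operation handles all 64 elements with one bitwise instruction.
    static long union(long a, long b)        { return a | b; }
    static long intersection(long a, long b) { return a & b; }
    static long difference(long a, long b)   { return a & ~b; }
    static long complement(long a)           { return ~a; }
}
```

For example, inserting 1, 4 and 5 into the empty word 0 yields binary 110010, and the union of that word with the word for {2, 4, 6} (binary 1010100) is binary 1110110, which stands for {1, 2, 4, 5, 6}. Note that no loop over the elements is needed, in contrast to the boolean-array version.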

Conceptually we want an array of bits, or as it is traditionally called, a bit vector. We can't literally use a traditional array because we don't have addresses for a single bit. We could make an array of, say, chars and use only one bit from each array location, but that would use 8 times as much memory as we really need. Instead, we will employ the bit manipulation operations we learned in class to implement a bit vector, which will then be used to implement a dynamic set.
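
One possible way to pack the bits, sketched in Java with a byte array: bit i lives in byte i / 8, at position i % 8 within that byte. The class name BitVector, its method names, and the choice that a vector of a given size holds bits 0 through size - 1 (as the InefficientSet constructor does) are all assumptions for this illustration, not the layout the lab requires.

```java
// Hypothetical bit vector: 8 bits packed into each byte of the array.
class BitVector {

    private final byte[] bytes;
    private final int size;   // number of bits; valid indices are 0 .. size - 1

    BitVector(int size) {
        this.size = size;
        this.bytes = new byte[(size + 7) / 8];  // round up to whole bytes
    }

    // Read bit i: select its byte, then mask out the one bit we want.
    boolean get(int i) {
        check(i);
        return (bytes[i / 8] & (1 << (i % 8))) != 0;
    }

    // Set bit i to 1 with OR.
    void set(int i) {
        check(i);
        bytes[i / 8] |= (1 << (i % 8));
    }

    // Set bit i to 0 with AND against the inverted mask.
    void clear(int i) {
        check(i);
        bytes[i / 8] &= ~(1 << (i % 8));
    }

    // Number of bytes actually allocated.
    int byteCount() {
        return bytes.length;
    }

    private void check(int i) {
        if (i < 0 || i >= size)
            throw new IndexOutOfBoundsException("bit index " + i);
    }
}
```

Under these assumptions a vector of 20 bits occupies 3 bytes, where an array with one char or boolean per element would need 20 array locations.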

Last modified: Thu Oct 20 10:21:20 CDT 2011