The goal of this project is to implement the two major algorithms for computing a minimum spanning tree of a weighted graph: Kruskal's Algorithm and Prim's Algorithm. You will also compare them using a simple experiment.
This project also lets you see how other ADTs and data structures support algorithms and affect their running time.
Copy the given code from ~tvandrun/Public/cs345/mst and make an Eclipse project for it.
You will find five packages: in addition to adt, impl, and test, which have their usual purposes, there are also alg, for (externally implemented) algorithms over graphs, and exper, for an experiment.
Before going on, familiarize yourself with the structure of the various classes, interfaces, and packages, since it is more complicated than in previous projects and I am giving you less guidance (although I will talk or have talked about it in class).
Specifically, understand how the two MST algorithms are encapsulated in classes that implement the MinSpanTree interface.
Note also that there is a simple class WeightedEdge to represent an edge. This is used in the result of minSpanTree(), since the method will represent the spanning tree as a set of edges.
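To make that representation concrete, here is a small, self-contained sketch of working with a spanning tree given as a set of edges. It uses a stand-in Edge record rather than the project's actual WeightedEdge, whose fields and accessors may differ, so treat it only as an illustration of the idea.

    import java.util.Set;

    // A minimal stand-in for the project's WeightedEdge, used only to
    // illustrate the "spanning tree as a set of edges" representation.
    record Edge(int u, int v, double weight) {}

    public class MSTResultDemo {
        // The total weight of a spanning tree is just the sum of the
        // weights of the edges in the set.
        static double totalWeight(Set<Edge> tree) {
            double total = 0;
            for (Edge e : tree)
                total += e.weight();
            return total;
        }

        public static void main(String[] args) {
            Set<Edge> tree = Set.of(new Edge(0, 1, 2.0), new Edge(1, 2, 1.5));
            System.out.println(totalWeight(tree));   // prints 3.5
        }
    }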
Both MST algorithms will rely on a heap-based priority queue. Naturally I'm not giving that code to you, because it would expose a solution to an earlier project. Instead, you should use your solution to the Heaps Project to fill in the methods impl.Heap.heapify(), impl.HeapPriorityQueue.insert(), and impl.HeapPriorityQueue.extractMax(). The two classes are given almost exactly as they were in the prior project, so this is just a matter of copying your earlier code.
But note two things: HeapPriorityQueue has two new constructors, which will allow us to optimize just a bit in some circumstances. So don't copy your entire file from the earlier project over HeapPriorityQueue.java; you need to copy the appropriate method bodies individually. Use test.HeapifyTest and HPQTest as a sanity check.
Finish the method alg.KruskalMinSpanTree.minSpanTree().
The code for setting up the data structures is given to you. Specifically:
- treeEdges is a set of edges, a subset of a minimum spanning tree, to which we continually add edges (until it is a minimum spanning tree).
- allEdges is a priority queue of edges. Kruskal's algorithm is usually described in terms of sorting the edges ahead of time. We won't do exactly that but will, equivalently, keep the edges in a priority queue using their weight as the priority: repeatedly removing an edge from the priority queue is equivalent to iterating through a sorted list of edges.
- vertexConnections is a disjoint set of vertices, representing the connected components. Initially each vertex is its own trivial set or component; at the end, all vertices are part of one big component.
What is left for you is to code up the algorithm---as we talked about in class---in this context. My solution was 9 lines long, including closing curly braces.
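For reference, here is the classic textbook form of Kruskal's main loop as a self-contained sketch. It uses java.util.PriorityQueue and a bare-bones union-find with invented names (Edge, find, kruskal), so it is not the project's code: your version must be phrased in terms of treeEdges, allEdges, and vertexConnections and the operations their ADTs actually provide.

    import java.util.*;

    // Generic, textbook Kruskal's algorithm, self-contained for illustration only.
    public class KruskalSketch {
        record Edge(int u, int v, double weight) {}

        // Bare-bones union-find lookup with path halving.
        static int find(int[] parent, int x) {
            while (parent[x] != x) {
                parent[x] = parent[parent[x]];
                x = parent[x];
            }
            return x;
        }

        static List<Edge> kruskal(int numVertices, List<Edge> edges) {
            // Removing edges from a min-priority queue keyed on weight is
            // equivalent to iterating over the edges in sorted order.
            PriorityQueue<Edge> pq =
                new PriorityQueue<>(Comparator.comparingDouble(Edge::weight));
            pq.addAll(edges);

            int[] parent = new int[numVertices];
            for (int i = 0; i < numVertices; i++)
                parent[i] = i;                    // each vertex starts as its own component

            List<Edge> tree = new ArrayList<>();
            while (!pq.isEmpty() && tree.size() < numVertices - 1) {
                Edge e = pq.poll();               // cheapest remaining edge
                int a = find(parent, e.u()), b = find(parent, e.v());
                if (a != b) {                     // endpoints in different components:
                    parent[a] = b;                // union the two components
                    tree.add(e);                  // and keep the edge in the tree
                }
            }
            return tree;
        }

        public static void main(String[] args) {
            List<Edge> edges = List.of(new Edge(0, 1, 1.0), new Edge(1, 2, 2.0),
                                       new Edge(0, 2, 2.5), new Edge(2, 3, 0.5));
            System.out.println(kruskal(4, edges));
        }
    }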
Test this using KMSTTest.
Now turn your attention to PrimMinSpanTree.java.
As with Kruskal's algorithm, the code for setting up the
information structures is given to you.
However, it will take some work on your part to understand them,
and you must do this before attempting to complete the algorithm.
Some guidance:
- VertexRecord. The important part of this class is that it provides a comparator, which will be used to give a priority to vertices (see the sketch after this list).
- records, the array of VertexRecords. Think of these as satellite data for the vertices.
- pq, the priority queue of vertices---except it's not really true that we keep vertices; really we keep VertexRecords.
- mstEdges, the set of edges that will make up the spanning tree.
- parents, an array that is another example of using an array for satellite data.
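As a rough picture of the comparator-gives-priority idea, here is a minimal stand-in vertex record. The real VertexRecord's fields, types, and comparator almost certainly differ in detail, so treat this only as an illustration of pairing a vertex with satellite data and an ordering.

    import java.util.Comparator;

    // A minimal stand-in for a vertex record: a vertex id plus satellite
    // data (here, the weight of the cheapest known edge to the growing tree).
    class VR {
        final int id;          // which vertex this record describes
        double distance;       // cheapest known connection to the tree so far

        VR(int id, double distance) {
            this.id = id;
            this.distance = distance;
        }

        // A comparator lets a priority queue order the records. In Prim's
        // algorithm a smaller distance should mean a higher priority, so a
        // max-oriented heap would use the reverse of this ordering.
        static final Comparator<VR> BY_DISTANCE =
            Comparator.comparingDouble(r -> r.distance);
    }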
Fill in the rest. My solution had 14 lines.
Test this using PMSTTest.
As we talked about in class, the version of Prim from the previous section is handicapped by its reliance on linear search when testing for containment in a priority queue or increasing a key's priority. We could fix this if we had a faster way of knowing where an element is in the priority queue's underlying heap.
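To see that cost concretely, here is a hedged sketch, with invented names, of the linear scan a plain array-backed heap needs just to locate an element; the containment test and the priority increase each pay for this O(n) search before any O(log n) sifting even starts.

    // Hypothetical fragment, not the project's code.
    class PlainHeapLookup {
        // Locating an element in a plain binary heap means scanning the
        // backing array: O(n) work before the heap can do anything with it.
        static int positionOf(Object[] heap, int size, Object x) {
            for (int i = 0; i < size; i++)
                if (heap[i].equals(x))
                    return i;
            return -1;   // not present
        }
    }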
Solution: make the elements in a priority queue "heap aware",
that is, let them all record their index.
This brazenly breaks encapsulation, but sometimes you've got
to do what you've got to do.
Familiarize yourself with the interface impl.HeapPositionAware.
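I don't want to prescribe the actual interface here, so the following is only a guess at the shape of a position-aware element: an object that remembers whatever index the heap last told it about. The method names (setPosition, getPosition) are assumptions and may not match impl.HeapPositionAware.

    // A guess at the shape of a heap-position-aware element; the actual
    // impl.HeapPositionAware interface may declare different methods.
    interface PositionAware {
        void setPosition(int index);   // called by the heap whenever it moves the element
        int getPosition();             // so the heap can find the element in O(1)
    }

    // An element simply stores the index it was last assigned.
    class AwareItem implements PositionAware {
        private int position = -1;     // -1 means "not currently in any heap"

        public void setPosition(int index) { position = index; }
        public int getPosition() { return position; }
    }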
Now we are going to write a new version of our heap priority queue that takes advantage of its elements' awareness of their own indices. The other side of that is that our new, optimized heap priority queue must inform its elements whenever they are moved.
Finish the class impl.OptimizedHeapPriorityQueue. It has a private helper method set() that should be used instead of writing to an array index directly; it ensures that the elements are informed of the index to which they are moving.
This will involve writing the methods heapify() (OptimizedHeapPriorityQueue doesn't extend Heap---it's all from scratch), insert(), and extractMax() yet again. However, it will be slightly different this time: make sure you use set() instead of writing directly to the array.
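As a sketch of what such a helper typically looks like (with invented names like internal and PositionAware, which may not match the given class), every write to the backing array goes through set(), so an element always knows where it currently lives; a swap during sifting is then two calls to set() rather than two direct array assignments.

    // Sketch only; the given OptimizedHeapPriorityQueue will differ in detail.
    class PositionAwareHeapSketch {
        interface PositionAware { void setPosition(int index); }

        private PositionAware[] internal = new PositionAware[16];

        // Place an element in the array and immediately tell it its new index.
        private void set(int index, PositionAware item) {
            internal[index] = item;
            item.setPosition(index);
        }

        // A swap during sifting, written with set() on both slots instead of
        // assigning to internal[i] and internal[j] directly.
        private void swap(int i, int j) {
            PositionAware tmp = internal[i];
            set(i, internal[j]);
            set(j, tmp);
        }
    }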
Test this using OptimizedHeapPriorityQueueTest.
Almost there.
Now finish the class alg.OptimizedPrimMinSpanTree with (pretty much) identical code to what you wrote in part 5, except that now our records are HPAVertexRecords (i.e., "heap position aware" vertex records). Test this using OPMSTTest.
Now read through exper.MSTExperiment to see how the experiment is set up.
Run it.
What do you observe?
You are encouraged to modify the experiment for further learning.
Copy the files you modified (Heap, HeapPriorityQueue, KruskalMinSpanTree, PrimMinSpanTree, OptimizedHeapPriorityQueue, and OptimizedPrimMinSpanTree) to your turn-in folder /cslab.all/linux/class/cs345/(your id)/mst.
To keep up with the course, this should be finished by Feb 17.