Project: Minimum spanning trees

1. Introduction

The goal of this project is to implement the two major algorithms to compute a minimum spanning tree for weighted graphs: Kruskal's Algorithm and Prim's Algorithm. You will also compare them using a simple experiment.

This project is also useful in that it allows you to see how other ADTs and data structures support algorithms and affect their running time.

This project corresponds to the material in Section 3.4, which doesn't yet have a project in it.

2. Setup

Copy the given code from ~tvandrun/Public/cs345/mst and make an Eclipse project for it. You will find five packages: in addition to adt, impl, and test, which have their usual purposes, we also have alg for (externally implemented) algorithms over graphs and exper for an experiment.

Before going on, familiarize yourself with the structure of the various classes, interfaces, and packages, since it is more complicated than in previous projects and I am giving you less guidance (although I will talk or have talked about it in class). Specifically, understand how the two MST algorithms are encapsulated in classes that implement the MinSpanTree interface.

Note also that there is a simple class WeightedEdge to represent an edge. This is used in the result of minSpanTree, since it will represent the spanning tree as a set of edges.

3. Heaps, revisited

Both MST algorithms will rely on a heap-based priority queue. Naturally I'm not giving that code to you because it would expose a solution to an earlier project. Instead, you should use your solution to the heap project: replace the given impl/Heap.java and impl/HeapPriorityQueue.java with the ones you wrote for that project. Then run test.HeapTest and test.HPQTest as sanity checks.

4. Kruskal's algorithm

Finish the method alg.KruskalMinSpanTree.minSpanTree(). The code for setting up the data structures is given to you. Specifically,

treeEdges is a set of edges, a subset of a minimum spanning tree that we continually add edges to (until it is a minimum spanning tree).
allEdges is a priority queue of edges. Kruskal's algorithm usually is described in terms of sorting the edges ahead of time. We won't do that exactly but will, equivalently, keep them in a priority queue using their weight as a priority. Repeatedly removing an edge from the priority queue is equivalent to iterating through a sorted list of edges.
vertexConnections is a disjoint-set structure over the vertices, representing the connected components. Initially each vertex is its own trivial set or component; at the end all vertices are part of one big component.
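
To make the roles of these three structures concrete, here is a minimal standalone sketch. It is not the project's code: it uses java.util.PriorityQueue in place of your heap-based priority queue, and the Edge and DisjointSet classes are hypothetical stand-ins for the given adt/impl classes. Only the variable names treeEdges, allEdges, and vertexConnections are borrowed from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Standalone illustration only; class names here are hypothetical and
// are not the project's adt/impl classes.
class KruskalSketch {

    // A weighted, undirected edge between vertices u and v.
    static final class Edge {
        final int u, v;
        final double weight;
        Edge(int u, int v, double weight) { this.u = u; this.v = v; this.weight = weight; }
    }

    // Minimal union-find with path halving (no union by rank).
    static final class DisjointSet {
        private final int[] parent;
        DisjointSet(int n) {
            parent = new int[n];
            for (int i = 0; i < n; i++) parent[i] = i; // each vertex is its own component
        }
        int find(int x) {
            while (parent[x] != x) {
                parent[x] = parent[parent[x]]; // path halving
                x = parent[x];
            }
            return x;
        }
        boolean connected(int a, int b) { return find(a) == find(b); }
        void union(int a, int b) { parent[find(a)] = find(b); }
    }

    static List<Edge> minSpanTree(int numVertices, List<Edge> edges) {
        // Polling a priority queue keyed on weight visits edges in the same
        // order as iterating over a list sorted by weight.
        PriorityQueue<Edge> allEdges =
                new PriorityQueue<>(Comparator.comparingDouble((Edge e) -> e.weight));
        allEdges.addAll(edges);
        DisjointSet vertexConnections = new DisjointSet(numVertices);
        List<Edge> treeEdges = new ArrayList<>();

        while (!allEdges.isEmpty() && treeEdges.size() < numVertices - 1) {
            Edge e = allEdges.poll();
            // Keep the edge only if it joins two different components.
            if (!vertexConnections.connected(e.u, e.v)) {
                treeEdges.add(e);
                vertexConnections.union(e.u, e.v);
            }
        }
        return treeEdges;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge(0, 1, 1), new Edge(1, 2, 2), new Edge(0, 2, 3),
                new Edge(2, 3, 4), new Edge(1, 3, 5));
        double total = 0;
        for (Edge e : minSpanTree(4, edges)) total += e.weight;
        System.out.println("MST weight: " + total); // prints "MST weight: 7.0"
    }
}
```

In the example graph the edge 0-2 (weight 3) is rejected because vertices 0 and 2 are already in the same component by the time it is polled.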

What is left for you is to code up the algorithm---as we talked about in class---in this context. My solution was 9 lines long, including closing curly braces.

Test this using KMSTTest.

5. Prim's algorithm

Now turn your attention to PrimMinSpanTree.java. As with Kruskal's algorithm, the code for setting up the data structures is given to you. However, it will take some work on your part to understand them, and you must do this before attempting to complete the algorithm. Some guidance:

We keep track of the smallest known cost of connecting each vertex to the tree built so far (an upper bound on its true cost) with a simple class called VertexRecord. The important part of this class is that it provides a comparator, which is used to assign priorities to vertices. Think of records, the array of VertexRecords, as satellite data for the vertices.
We keep the vertices in a priority queue pq. Strictly speaking, the queue holds VertexRecords rather than the vertices themselves.
Similarly to what we had with Kruskal's algorithm, we keep a set of edges, mstEdges.
Finally, we keep track of each node's parent (if any, yet) in the minimum spanning tree with the parents array, another example of using an array for satellite data.
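
The way these pieces fit together can be sketched in a standalone form. This is a rough illustration under assumptions, not the given code: the graph is a plain adjacency matrix, java.util.PriorityQueue stands in for your heap priority queue, and this VertexRecord is a hypothetical simplification of the project's class. Note that contains() and remove() on java.util.PriorityQueue are linear-time searches.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Standalone illustration only; not the project's PrimMinSpanTree.
class PrimSketch {

    // Satellite data for one vertex: the cheapest known cost of attaching it to the tree.
    static final class VertexRecord {
        final int id;
        double cost = Double.POSITIVE_INFINITY;
        VertexRecord(int id) { this.id = id; }
    }

    // weights[u][v] is the weight of edge u-v, or Double.POSITIVE_INFINITY if
    // there is no such edge (the matrix is assumed symmetric).
    // Returns parents, where parents[v] is v's parent in the tree, or -1 if none
    // (the start vertex 0, or a vertex unreachable from it).
    static int[] minSpanTree(double[][] weights) {
        int n = weights.length;
        VertexRecord[] records = new VertexRecord[n];
        int[] parents = new int[n];
        PriorityQueue<VertexRecord> pq =
                new PriorityQueue<>(Comparator.comparingDouble((VertexRecord r) -> r.cost));

        for (int v = 0; v < n; v++) {
            records[v] = new VertexRecord(v);
            parents[v] = -1;
        }
        if (n > 0) records[0].cost = 0.0; // start the tree at vertex 0
        for (int v = 0; v < n; v++) pq.add(records[v]);

        while (!pq.isEmpty()) {
            VertexRecord u = pq.poll(); // cheapest vertex not yet in the tree
            for (int v = 0; v < n; v++) {
                double w = weights[u.id][v];
                // contains() is a linear search through the queue.
                if (w < records[v].cost && pq.contains(records[v])) {
                    // Remove (linear), update the key, and re-insert so the
                    // queue never holds an element whose key changed in place.
                    pq.remove(records[v]);
                    records[v].cost = w;
                    parents[v] = u.id;
                    pq.add(records[v]);
                }
            }
        }
        return parents;
    }
}
```

Each entry parents[v] != -1 corresponds to one tree edge (parents[v], v), which is how a set such as mstEdges could be assembled from the array.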

Fill in the rest. My solution had 14 lines.

Test this using PMSTTest.

6. Optimizing a heap priority queue to optimize Prim

As we talked about in class, the version of Prim from the previous section is handicapped by its reliance on linear search when testing for containment in a priority queue or increasing a key's priority. We could fix this if we had a faster way of knowing where an element is in the priority queue's underlying heap.

Solution: make the elements in a priority queue "heap aware", that is, let them all record their index. This brazenly breaks encapsulation, but sometimes you've got to do what you've got to do. Familiarize yourself with the interface impl.HeapPositionAware. Now we are going to write a new version of our heap priority queue that takes advantage of its elements' awareness of their index. The other side of that is that our new, optimized heap priority queue must inform its elements when they are moved.

Inspect the class impl.OptimizedHeapPriorityQueue. It extends HeapPriorityQueue mainly by overriding swap(). You don't need to modify this class, but it will expose whether you wrote HeapPriorityQueue properly---that is, using swap() to move keys around instead of manipulating the array directly. The new version of swap() uses a method set() that informs a (heap-aware) key of its new position.
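
The idea can be sketched independently of the project's classes. In the illustration below, the PositionAware interface, its setPosition()/getPosition() methods, and the Item and MinHeap classes are all hypothetical names chosen for this example; the real impl.HeapPositionAware and OptimizedHeapPriorityQueue may look different. What matters is that every move goes through one place that tells the element its new index, so containment checks and key changes need no search.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration only; names are hypothetical, not the project's API.
class PositionAwareHeapSketch {

    interface PositionAware {
        void setPosition(int index);
        int getPosition();
    }

    static final class Item implements PositionAware {
        final String name;
        double cost;
        private int position = -1; // -1 means "not in any heap"
        Item(String name, double cost) { this.name = name; this.cost = cost; }
        public void setPosition(int index) { position = index; }
        public int getPosition() { return position; }
    }

    // Min-heap on cost; every write to the list goes through place().
    static final class MinHeap {
        private final List<Item> items = new ArrayList<>();

        private void place(int i, Item x) { items.set(i, x); x.setPosition(i); }

        private void swap(int i, int j) {
            Item a = items.get(i), b = items.get(j);
            place(i, b);
            place(j, a);
        }

        void insert(Item x) {
            items.add(x);
            x.setPosition(items.size() - 1);
            siftUp(items.size() - 1);
        }

        // Constant time: ask the element where it is, then verify.
        boolean contains(Item x) {
            int p = x.getPosition();
            return p >= 0 && p < items.size() && items.get(p) == x;
        }

        // Lowering the cost raises the priority; no search is needed.
        void decreaseCost(Item x, double newCost) {
            x.cost = newCost;
            siftUp(x.getPosition());
        }

        Item extractMin() {
            Item min = items.get(0);
            Item last = items.remove(items.size() - 1);
            if (!items.isEmpty()) { place(0, last); siftDown(0); }
            min.setPosition(-1);
            return min;
        }

        private void siftUp(int i) {
            while (i > 0) {
                int parent = (i - 1) / 2;
                if (items.get(parent).cost <= items.get(i).cost) break;
                swap(i, parent);
                i = parent;
            }
        }

        private void siftDown(int i) {
            int n = items.size();
            while (true) {
                int left = 2 * i + 1, right = left + 1, smallest = i;
                if (left < n && items.get(left).cost < items.get(smallest).cost) smallest = left;
                if (right < n && items.get(right).cost < items.get(smallest).cost) smallest = right;
                if (smallest == i) return;
                swap(i, smallest);
                i = smallest;
            }
        }
    }
}
```

If any code path wrote to the underlying list without going through place() or swap(), the recorded positions would go stale, which is the kind of mistake the optimized queue's tests are meant to expose.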

Test this using OptimizedHeapPriorityQueueTest. (Don't skip this test just because you didn't write anything. It indirectly tests code you did write.)

7. Optimized Prim's algorithm

Almost there. Now finish the class alg.OptimizedPrimMinSpanTree with (pretty much) identical code to what you wrote in part 5. Now our records are HPAVertexRecords (i.e., "heap position aware").

Test this using OPMSTTest.

8. Experiments

Now read through exper.MSTExperiment to see how the experiment is set up. Run it (without assertions enabled). What do you observe?

You are encouraged to modify the experiment for further learning.

9. Turn in

Copy the files you modified (Heap, HeapPriorityQueue, KruskalMinSpanTree, PrimMinSpanTree, and OptimizedPrimMinSpanTree) to your turn-in folder /cslab/class/cs345/(your id)/mst.

To keep up with the course, this should be finished by Feb 16.

Thomas VanDrunen

Last modified: Tue Jan 31 14:18:10 CST 2017