Lab 1: Evaluating sorting algorithms

The goal of this lab is to practice writing sorting algorithms on arrays and to experiment with one metric for comparing the efficiency of different algorithms.

1. Introduction

There are many ways to compare algorithms and their implementations, to consider how efficiently they make use of resources. The main resources of interest are time and space (that is, computer memory), but occasionally other resources (such as network bandwidth) are of concern.

Furthermore, there are several ways to study an algorithm's efficiency with respect to a specific resource: it can be studied experimentally, theoretically, or probabilistically; in the best case, worst case, average case, or expected case; and with various degrees of precision. One crude (but not altogether useless) way to study the efficiency of sorting algorithms is to count the number of comparisons of data, since most (but not all) sorting algorithms rearrange the data based on comparing pairs. In this lab you will experiment with several sorting algorithms and compare their efficiency based on the number of comparisons they need to make to sort arrays.

2. Set up

Make a directory for this lab and move into it.

mkdir lab1
cd lab1

As in most labs and projects, I am giving you some code base to work on. Copy the following files from the course directory for this lab.

cp /cslab.all/ubuntu/cs245/lab1/* .

SortArray.java is a driver program that lets you test parts of your lab as you go along. (Eventually you will write a couple of other programs with main() methods.) If you run it on the command line like this:

java SortArray Classname 20

it will look for a method with signature int sort(int[]) in class Classname and invoke it, passing it a randomly generated array. The number following the class name specifies the size of the array; if you leave it off, the size defaults to 10. The sort method should return the number of comparisons it took to sort the array.

You do not need to look at the file SortArray.java; it uses some Java features that we won't even get to in this course.

(The driver can also read an array from a file; this feature will be described later.)

The driver will report the number of comparisons and, if the array has 20 or fewer entries, it will display the array itself before and after sorting, for debugging purposes.

3. Selection sort and Insertion sort on arrays

Open the file SelectionA.java in xemacs. The algorithm we derived in class is encapsulated in the sort() method. However, the method does not yet count comparisons. Your first task is to complete this method so that it does: add a variable to tally the comparisons, increment it for every comparison, and return it in place of the 0 that is currently returned.

Specifically, we are interested in the number of times we compare two data items from the array, that is, the number of times the expression min > array[j] is evaluated. Expressions like i < array.length don't count because they don't compare items in the array.
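As a rough illustration of where the counter goes, here is a minimal sketch of a counting selection sort. It assumes a textbook loop layout; the class name SelectionCountSketch, the variable names other than min and array, and the choice to make sort() static are all assumptions made for illustration, and the provided SelectionA.java may be organized differently. Only the expression min > array[j] comes from the lab text.

```java
import java.util.Arrays;

// Hypothetical sketch; not the contents of SelectionA.java.
class SelectionCountSketch {
    // Sorts array in place and returns the number of data comparisons made.
    static int sort(int[] array) {
        int comparisons = 0;
        for (int i = 0; i < array.length - 1; i++) {     // index test: not counted
            int min = array[i];
            int minPos = i;
            for (int j = i + 1; j < array.length; j++) { // index test: not counted
                comparisons++;                           // one tick per evaluation of min > array[j]
                if (min > array[j]) {
                    min = array[j];
                    minPos = j;
                }
            }
            // Swap the smallest remaining item into position i.
            array[minPos] = array[i];
            array[i] = min;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 4, 1, 3};
        int count = sort(data);
        System.out.println(Arrays.toString(data) + " sorted with " + count + " comparisons");
    }
}
```

The increment sits immediately before the if so that it fires once per evaluation of the data comparison, whether or not the comparison turns out true.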

Then compile and test the revised selection sort. It should already sort correctly; check that its number of comparisons looks reasonable.

Next we want to do the same thing to InsertionA.java, except that its sort() method isn't complete. Finish the method by writing the body of the inner loop (if you need a review of how insertion sort works, see here, section 5). Also count and return the number of comparisons.
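Counting is slightly trickier for insertion sort, because the data comparison usually shares a loop condition with an index test. The following is one possible sketch, not the structure of the provided InsertionA.java: the class name InsertionCountSketch, the variable names, and the static sort() are assumptions for illustration. It separates the index test from the data comparison so that each data comparison is counted exactly once.

```java
import java.util.Arrays;

// Hypothetical sketch; the skeleton in InsertionA.java may look different.
class InsertionCountSketch {
    // Sorts array in place and returns the number of data comparisons made.
    static int sort(int[] array) {
        int comparisons = 0;
        for (int i = 1; i < array.length; i++) {
            int key = array[i];       // item to insert into the sorted prefix array[0..i-1]
            int j = i;
            while (j > 0) {           // index test: not counted
                comparisons++;        // about to compare array[j - 1] with key
                if (array[j - 1] <= key) {
                    break;            // key belongs at position j
                }
                array[j] = array[j - 1];  // shift the larger item one slot right
                j--;
            }
            array[j] = key;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] data = {4, 3, 2, 1};
        int count = sort(data);
        System.out.println(Arrays.toString(data) + " sorted with " + count + " comparisons");
    }
}
```

If the index test and the data comparison were joined with && in a single loop condition, a counter incremented only inside the loop body would miss the final comparison that ends the loop.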

4. Bubble sort on arrays

You have probably seen another sorting algorithm called bubble sort. While not a very good sort in terms of efficiency, it is easy to program and understand. This algorithm's strategy is to iterate through the array, swapping adjacent values that are out of order.

Clearly one pass of this sort of swapping won't sort the array; many passes are necessary to put all the elements in the right order. There are two ways to control the repeated passes. First, one could keep track of whether any changes were made to the array (whether any actual swaps happened) on the current pass; if a pass completes without any swaps, then the array is sorted and we can quit. Second, we can observe that after the first pass through the array, the largest element has made it all the way to the end, and so the next pass can stop one element short. The second pass will put the second largest element in the right place, and so the third pass doesn't need to examine the last two positions. An outer loop, therefore, can count down the ending point of the potentially unsorted portion of the array until that portion is empty.
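The two strategies can be sketched side by side as follows. This is an illustrative sketch under assumptions, not the code in the provided Bubble sort classes: the class name BubbleSketch and the method names sortWithFlag and sortWithShrinkingEnd are made up here.

```java
import java.util.Arrays;

// Hypothetical sketch of the two pass-control strategies described above.
class BubbleSketch {
    // Strategy 1: repeat full passes until a pass makes no swaps.
    static int sortWithFlag(int[] a) {
        int comparisons = 0;
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int j = 0; j + 1 < a.length; j++) {
                comparisons++;                  // data comparison a[j] > a[j + 1]
                if (a[j] > a[j + 1]) {
                    swap(a, j, j + 1);
                    swapped = true;
                }
            }
        }
        return comparisons;
    }

    // Strategy 2: each pass stops one position earlier than the previous one.
    static int sortWithShrinkingEnd(int[] a) {
        int comparisons = 0;
        for (int end = a.length - 1; end > 0; end--) {
            for (int j = 0; j < end; j++) {
                comparisons++;                  // data comparison a[j] > a[j + 1]
                if (a[j] > a[j + 1]) {
                    swap(a, j, j + 1);
                }
            }
        }
        return comparisons;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] first = {4, 3, 2, 1};
        int[] second = first.clone();
        System.out.println("flag: " + sortWithFlag(first) + " " + Arrays.toString(first));
        System.out.println("shrinking end: " + sortWithShrinkingEnd(second) + " " + Arrays.toString(second));
    }
}
```

Note that the first strategy always needs one final pass with no swaps to confirm that the array is sorted, while the second makes the same number of passes regardless of the input.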

The best version of Bubble sort would incorporate both of these ideas, but for our purposes, we would like to compare them against each other. Accordingly, complete the two classes BubbleA1 and BubbleA2 so that they implement these two versions of Bubble sort, and also count comparisons. Make sure your code both sorts correctly and gives a reasonable-looking report on the number of comparisons. (The algorithm in BubbleA2, the second version, is completed for you; you need only add the counting of comparisons. In BubbleA1, you also need to complete the algorithm.)

5. Experiments

Now we want to run some experiments to determine which algorithms require more or fewer comparisons to sort, and also how those counts may vary. We will conduct two sets of experiments; you will write two short programs (classes with just a main() method) to run these experiments.

A. Experiment 1: Vary the algorithm, vary the array.

In this experiment, we ask three questions:

We can easily address all three of these questions in the same experiment. Write a program that generates several (say, 5) arrays of the same size (say, 50 items). Then, for each sorting algorithm, it sorts each array twice and displays the number of comparisons.

Use the methods from SortUtil to help. createRandomArray() will generate an array of a given size with random integers between 0 and 100. Make sure that when you repeat the sort of an array, you sort the original, unsorted sequence, not the sorted version. To do this, I recommend you first generate a "master" array, then make copies of it using SortUtil.copyArray() and sort the copies.
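The master-and-copy pattern might look like the sketch below. Because the signatures of the SortUtil methods are not shown in this handout, the sketch uses stand-ins: randomArray() in place of SortUtil.createRandomArray(), clone() in place of SortUtil.copyArray(), and a small counting insertion sort in place of the lab's sort classes. All of these names, and the class name CopyBeforeSortSketch, are assumptions for illustration.

```java
import java.util.Random;

// Hypothetical sketch of sorting copies of a master array; helper names are stand-ins.
class CopyBeforeSortSketch {
    // Stand-in for SortUtil.createRandomArray(); here values range over 0..100 inclusive.
    static int[] randomArray(int size, Random rng) {
        int[] a = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = rng.nextInt(101);
        }
        return a;
    }

    // Stand-in counting sort (insertion sort); substitute any class with int sort(int[]).
    static int sortAndCount(int[] a) {
        int comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i;
            while (j > 0) {
                comparisons++;
                if (a[j - 1] <= key) break;
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
        return comparisons;
    }

    // Sorts two independent copies of master; master itself is never modified.
    static int[] runTwice(int[] master) {
        int first = sortAndCount(master.clone());
        int second = sortAndCount(master.clone());
        return new int[]{first, second};
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int k = 1; k <= 5; k++) {
            int[] master = randomArray(50, rng);
            int[] counts = runTwice(master);
            System.out.println("Array " + k + ":");
            System.out.println("Stand-in sort: " + counts[0] + " " + counts[1]);
            System.out.println();
        }
    }
}
```

Had the second call sorted the already-sorted result of the first instead of a fresh copy, the two counts could differ for reasons unrelated to the algorithm itself.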

Your program should generate readable output, something like

Array 1:
Insertion: 500 500
Selection: 550 550
Bubble 1: 625 625
Bubble 2: 550 550

Array 2:
...

What do you observe?

B. Experiment 2: The growth of the algorithm

Another interesting question is how the number of comparisons that an algorithm makes grows as the size of the array grows. If you give it an array twice as big, does it require twice as many comparisons, or perhaps four times as many?
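One way to read the doubling question, as a sketch: write C(n) for the number of comparisons on an array of size n (the symbols C and c are notation introduced here, not part of the lab code). A linear function doubles when n doubles, while a quadratic one quadruples:

```latex
C(n) = c\,n \;\Rightarrow\; \frac{C(2n)}{C(n)} = \frac{2cn}{cn} = 2,
\qquad
C(n) = c\,n^2 \;\Rightarrow\; \frac{C(2n)}{C(n)} = \frac{4cn^2}{cn^2} = 4.
```

Which of these shapes, if either, matches each algorithm is what the experiment is meant to reveal.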

Write a program that conducts this experiment: Loop through several sizes (for example, 10, 50, 100, 250, 500, 1000). For each size, for each algorithm, generate five random arrays, and find the average number of comparisons the algorithm makes when sorting those arrays.
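The loop structure of such a program could be sketched as follows. As before, randomArray() and sortAndCount() are stand-ins (assumptions) for SortUtil.createRandomArray() and for the lab's sort classes, and the class name GrowthExperimentSketch is made up; a real solution would repeat the inner work for each of the four algorithms.

```java
import java.util.Random;

// Hypothetical sketch of the averaging loop; helper names are stand-ins.
class GrowthExperimentSketch {
    // Stand-in for SortUtil.createRandomArray(); here values range over 0..100 inclusive.
    static int[] randomArray(int size, Random rng) {
        int[] a = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = rng.nextInt(101);
        }
        return a;
    }

    // Stand-in counting sort (insertion sort); substitute any class with int sort(int[]).
    static int sortAndCount(int[] a) {
        int comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i;
            while (j > 0) {
                comparisons++;
                if (a[j - 1] <= key) break;
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
        return comparisons;
    }

    // Average number of comparisons over the given number of fresh random arrays.
    static double averageComparisons(int size, int trials, Random rng) {
        long total = 0;
        for (int t = 0; t < trials; t++) {
            total += sortAndCount(randomArray(size, rng));
        }
        return (double) total / trials;   // cast avoids integer division
    }

    public static void main(String[] args) {
        int[] sizes = {10, 50, 100, 250, 500, 1000};
        Random rng = new Random();
        for (int size : sizes) {
            System.out.printf("%5d  %.1f%n", size, averageComparisons(size, 5, rng));
        }
    }
}
```

Printing one size and one average per line keeps the output easy to paste into a spreadsheet for graphing.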

When you have that data, see if you can find a pattern. If you have time, launch the OpenOffice.org Spreadsheet program and generate a graph that plots each algorithm's average number of comparisons versus array size; otherwise, do your best to eyeball it.

Can you guess what sort of functions these are? Is there an algorithm that grows most slowly (and therefore is the fastest)?

6. To turn in

Turn in hard copies of the files you wrote or modified. Also, run your experiment programs to show the results. Finally, write a short report (one paragraph for each experiment) describing your conclusions.

The command to print files neatly, two to a page, is a2ps.


Thomas VanDrunen
Last modified: Tue Jan 13 09:00:07 CST 2009