Lab 1: Sorting

The goal of this lab is to practice writing sorting algorithms on arrays and to experiment with one metric for comparing the efficiency of different algorithms.

1. Introduction

There are many ways to compare algorithms and their implementations according to how efficiently they use resources. The main resources of interest are time and space (that is, computer memory), but occasionally other resources (such as network bandwidth) are of concern.

Furthermore, there are several ways to study an algorithm's efficiency with respect to a specific resource: it can be studied experimentally, theoretically, or probabilistically; in the best case, worst case, average case, or expected case; and with various degrees of precision. One crude (but not altogether useless) way to study the efficiency of sorting algorithms is to count the number of comparisons of data, since most (but not all) sorting algorithms rearrange the data based on comparing pairs. In this lab you will experiment with several sorting algorithms and compare their efficiency based on the number of comparisons they need to make to sort arrays and lists.

2. Set up

Make a directory for this course (245) and one for this lab, and move into the lab1 directory. Copy the following files from the course public directory.

cp /homeemp/tvandrun/pub/245/lab1/* .

SortArray.java is a driver program. It generates a random array of integers and runs a sort method on it. If you use it in the following way on the command line,

java SortArray Classname 20

it will look for a method with the signature int sort(int[]) in the class Classname and invoke it, passing the random array. The number following the name of the class specifies the size of the array; if you leave it off, the size defaults to 10. The sort method should return the number of comparisons it took to sort the array.

(The driver can also read an array from a file; this feature is described below.)

The driver reports the number of comparisons and, if the array has 20 or fewer entries, displays the array before and after sorting for debugging purposes.

3. Selection sort

Open the file Selection.java in xemacs. The algorithm we derived in class is encapsulated in the sort method. However, the method does not yet count the comparisons. Your first task is to complete this method so that it does. Basically, add one variable to tabulate the comparisons, increment that variable for every comparison, and return it in place of the 0 that is currently returned.

Specifically, we're interested in the number of times we compare two data items from the array, that is, the number of times the expression min > array[j] is evaluated. Expressions like i < array.length don't count because they don't compare items in the array.
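As an illustration of which evaluations count, here is a minimal sketch of the counting pattern applied to a single scan for the smallest value. The class CountDemo and the method countComparisonsInMinScan are hypothetical names made up for this example; they are not part of the lab files, and the code in Selection.java may be organized differently.

```java
class CountDemo {
    // Hypothetical helper (not part of the lab files): scans array[start..]
    // for the smallest value and returns how many data comparisons it made.
    static int countComparisonsInMinScan(int[] array, int start) {
        int comparisons = 0;
        int min = array[start];
        // The loop test j < array.length compares indices, not data,
        // so it is not counted.
        for (int j = start + 1; j < array.length; j++) {
            comparisons++;              // one evaluation of min > array[j]
            if (min > array[j]) {
                min = array[j];
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1};
        System.out.println(countComparisonsInMinScan(data, 0)); // prints 3
    }
}
```

Note that the counter is incremented every time the expression is evaluated, whether or not it turns out to be true.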

Then compile and test the revised selection sort.

3. Other sorting algorithms

Next, add code to count comparisons in the other two sorting algorithms (Insertion.java and Merge.java), and test them.

Time permitting: You have probably seen another sorting algorithm called bubble sort. While not a very good sort in terms of efficiency, it is easy to program and understand. This algorithm's strategy is to iterate through the array, swapping adjacent values that are out of order. It repeats this until the array is sorted. Make a new class for this sort and implement it from scratch, with comparison counting, so that it can be used with our driver.
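To make the strategy concrete, a single pass over the array might look like the following sketch. The class BubblePassDemo and the method onePass are hypothetical names chosen for this example. The sketch deliberately covers only one sweep: repeating passes until the array is sorted, counting comparisons, and providing the sort method the driver expects are left for your own class.

```java
import java.util.Arrays;

class BubblePassDemo {
    // One left-to-right sweep: swap each adjacent pair that is out of order.
    // Returns true if any swap was made, meaning another pass may be needed.
    static boolean onePass(int[] array) {
        boolean swapped = false;
        for (int i = 0; i + 1 < array.length; i++) {
            if (array[i] > array[i + 1]) {
                int tmp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = tmp;
                swapped = true;
            }
        }
        return swapped;
    }

    public static void main(String[] args) {
        int[] data = {4, 2, 5, 1};
        onePass(data);
        System.out.println(Arrays.toString(data)); // prints [2, 4, 1, 5]
    }
}
```

After one pass the largest value has moved to the end of the array, but the rest may still be out of order.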

4. Experiments

Now you will use this code to run systematic experiments to compare sorting algorithms. To make these experiments scientific and rigorous, you need to consider the variables in the system.

The variables you can control are the sorting algorithm and the size of the array, since both are given to the driver on the command line.

The permutedness of the array is also a variable. You can either let the permutedness vary (i.e., don't control it) by allowing the driver to generate a random array each time, or make your own array, store it in a file, and sort the same array each time.

You can specify a file to read by using the -f flag with the driver. Executing

java SortArray Selection -f somearray

will read data for the array from the file somearray. These data files should contain only integers (don't use punctuation to separate the numbers), but you may separate the numbers by spaces, newlines, or a combination of both.
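Typing a large data file by hand is tedious, so you may prefer to generate one. The following is a minimal sketch that writes a backwards-sorted array in the format described above. The class MakeArrayFile, its descending method, and its command-line arguments are hypothetical and are not part of the lab files.

```java
import java.io.IOException;
import java.io.PrintWriter;

class MakeArrayFile {
    // Builds the file contents: n, n-1, ..., 1 separated by single spaces.
    static String descending(int n) {
        StringBuilder sb = new StringBuilder();
        for (int value = n; value >= 1; value--) {
            sb.append(value);
            if (value > 1) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    // Usage (hypothetical): java MakeArrayFile 100 backwards100
    public static void main(String[] args) throws IOException {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        String filename = args.length > 1 ? args[1] : "somearray";
        try (PrintWriter out = new PrintWriter(filename)) {
            out.println(descending(n));
        }
    }
}
```

The resulting file can then be passed to the driver with the -f flag, for example java SortArray Selection -f backwards100.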

This last part of the lab is open-ended: Choose for yourself two or three questions to ask (for example, "How does the number of comparisons in selection sort increase as the size of the array increases?" "For arrays of size 100, in which algorithm is there the biggest difference between best and worst case?" "Which algorithms improve their performance when given a pre-sorted or backwards-sorted array?") and design a set of experiments to address those questions. Then perform your experiments and record your results.

Suppose you wanted to address the first example ("How does the number of comparisons in selection sort increase as the size of the array increases?"). You might choose several specific sizes (say 10, 100, 1000, 10000), and then run selection sort 10 times at each size and average the number of comparisons at each size; that way you can account for the variation between runs.
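Running the driver by hand and averaging the reported counts works fine. If you would rather automate the repetition, the following sketch shows one way the averaging could be done. Everything in it is an assumption made for illustration: the class AverageRuns is hypothetical, and librarySort is only a stand-in that counts the comparisons made by Java's library sort. To run a real experiment you would pass one of your own counting sort methods instead.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToIntFunction;

class AverageRuns {
    // Stand-in sorter (not one of the lab's classes): sorts with the library
    // sort and counts how many times its comparator is called.
    static int librarySort(int[] array) {
        Integer[] boxed = new Integer[array.length];
        for (int i = 0; i < array.length; i++) boxed[i] = array[i];
        int[] count = {0};
        Arrays.sort(boxed, (a, b) -> { count[0]++; return a.compareTo(b); });
        for (int i = 0; i < array.length; i++) array[i] = boxed[i];
        return count[0];
    }

    // Runs the sorter on `runs` fresh random arrays of the given size and
    // returns the mean number of comparisons it reported.
    static double averageComparisons(ToIntFunction<int[]> sorter,
                                     int size, int runs, Random rng) {
        long total = 0;
        for (int r = 0; r < runs; r++) {
            int[] array = new int[size];
            for (int i = 0; i < size; i++) array[i] = rng.nextInt(1000);
            total += sorter.applyAsInt(array);
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // One line per size: the size, a tab, and the average over 10 runs.
        for (int size : new int[] {10, 100, 1000, 10000}) {
            double avg = averageComparisons(AverageRuns::librarySort, size, 10, rng);
            System.out.println(size + "\t" + avg);
        }
    }
}
```

Because the arrays are random, the averages will differ from one execution to the next; record the sizes and the number of runs in your report so that someone else can repeat the experiment.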

Finally, write a short report on your experiments, describing your methodology (precisely enough that someone else could replicate your experiments) and reporting the results in a table. I recommend typing the report as a text file in xemacs.

5. To turn in

Turn in hard copies of the files you wrote or modified, in addition to your report. The command to print files neatly, two to a page, is a2ps. Give this command the name of a printer using the -P flag (the printer in the computer science lab is called sp), and list all the files you want printed. You will probably want to execute the line

a2ps -P sp Selection.java Insertion.java Merge.java 

or

a2ps -P sp Selection.java Insertion.java Bubble.java Merge.java --file-align=fill

The flag --file-align=fill tells it not to start each file on a new page, so less paper is wasted.


Thomas VanDrunen
Last modified: Tue Sep 4 09:45:21 CDT 2007