Lab 2: Instrumentation

The goal of this lab is to give you more practice programming in C and working with the C compiler, to demonstrate another sorting algorithm, and to study the efficiency of sorting algorithms experimentally, comparing the results with the theoretical findings of complexity analysis.

1. Set up

Make a directory for this lab and move into it. Then copy the given code from the course directory.

mkdir lab2
cd lab2
cp /homes/tvandrun/Public/cs245/lab2/* .

You'll notice that this looks like the same set of files as last week. However, I've made some changes to them.

In addition to the tools from arrayUtil, I'm giving you the code for the sorting algorithms we've already studied.

2. Finishing Merge sort

In sorts.c, you'll find an unfinished version of merge sort based on what we saw in class yesterday. Finish this. What is left to do is the merging of the sorted sub arrays. I have suggested some parts of the implementation by declaring and initializing a variable aux for an auxiliary array.

Your implementation should also count comparisons, as your sorts last week did.

Compile this, including sDriver, so you can test it using

./sDriver merge

3. Shell sort

Your next task is to implement yet another sorting algorithm, called Shell sort. Like merge sort, Shell sort works by sorting sections of the array in isolation. However, the "sections" used by Shell sort are not contiguous. It works like this. Suppose we have the array

49 7 83 22 8 45 72 91 22 80 53 88 43 29 14 35 55 24 37 84

First consider the items separated by a gap of 7, starting at index 0: positions 0, 7, and 14, holding the values 49, 91, and 14.

49 7 83 22 8 45 72 91 22 80 53 88 43 29 14 35 55 24 37 84

Sort them.

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84

The items separated by 7 starting at index 1 (positions 1, 8, and 15: the values 7, 22, and 35) are already sorted.

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84

Sort the next slice (positions 2, 9, and 16: the values 83, 80, and 55).

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84
14 7 55 22 8 45 72 49 22 80 53 88 43 29 91 35 83 24 37 84

Keep doing that until all the array slices with gap 7 are sorted. The actual sorting can be done using a modified insertion sort. Then, we decrease the gap and repeat the process, say sorting all the slices with gap 3. Finally, sort with gap 1, which is just insertion sort, except that this should be close to the best case for insertion sort because the items by now are nearly sorted.

(It might be tempting to call these sections "shells" and pretend that's where the name of the sort comes from. Actually the algorithm was invented by someone named Donald Shell.)

Implement Shell sort using the stub in sorts.c. You can use your code from insertion sort as a starting point. You'll need to wrap it in two more loops: one to iterate through the various starting points for the current gap, and an outermost one to iterate through the gaps.

In theory, any decreasing sequence can be used for the gaps, but the last gap must be 1. Also, make sure that you use gap 1 only once (or you'll probably end up with an infinite loop) and don't use gap 0.

Your implementation should also count comparisons.

Compile and test. Make sure you test not only on arrays of size 10 but also on large arrays.

4. Experiments

Read all of the experimental questions below. Then pick two of them to experiment on (if you have time left over, then pick a third, just for fun). Write programs to conduct experiments to answer the questions. Write programs that actually automate the experiment. For example, if you decided to run selection sort on 10 arrays for each size 10, 50, 100, 500, 1000, and 5000, you might write something like

     int i, j;
     int* array;
     int sizes[] = {10, 50, 100, 500, 1000, 5000};
     for (i = 0; i < 6; i++)
          for (j = 0; j < 10; j++)
          {
               array = randomArray(sizes[i]);
               selectionSort(array);
               free(array);
          }

Use your program also to do things like calculate averages and high/low and generate tables, where appropriate.

  1. Running time vs. comparisons. How good a predictor of running time is the number of comparisons for insertion sort? Do the number of comparisons and the running time increase with size at equivalent rates? At a given size, are the number of comparisons and the running time correlated?

  2. Theory vs. experiments. In theory, selection sort (for example) should run in O(n^2) time. Is this the case experimentally? Pick a sorting algorithm and compare how the running time increases with the input size with the rate predicted by the theory.

  3. Best case vs. worst case. The version of bubble sort that monitors whether or not a change has been made differs in complexity between best case and worst case. Is there much variance experimentally? (Pick one algorithm and one size; time the algorithm on many random arrays of that size.)

  4. Information vs. noise. If you sort the same array several times, is there any difference in the running time? Pick one sort and experiment. Be careful how you repeat this on the same array; you should first generate a random one, and then make a copy before each sort, so you'll always be sorting the same original array.

5. To turn in

For each experiment, turn in your code and a brief (paragraph-sized) write-up of your methodology, results, and conclusions. Include a table and/or (if you think it illustrates the case for your conclusions) a graph.


Thomas VanDrunen
Last modified: Tue Jan 18 14:01:12 CST 2011