Lab 2: Shell sort, merge sort, and experimentation

The goal of this lab is to give you more practice programming in C and working with the C compiler, to introduce two more sorting algorithms, and to study the efficiency of sorting algorithms experimentally, comparing the results with the theoretical findings of complexity analysis.

We will be working with five sorting algorithms. You will be given the code for three of them (selection, insertion, and bubble), and you will need to implement two others (described below). Then we will run experiments on the sorting algorithms and observe their relative performance.

Part 1: Shell sort

Your first task is to implement yet another sorting algorithm, called Shell sort. Suppose we have the array

49 7 83 22 8 45 72 91 22 80 53 88 43 29 14 35 55 24 37 84

First consider the items separated by a gap of 7, starting at position 0. These are the items at positions 0, 7, and 14: the values 49, 91, and 14.

49 7 83 22 8 45 72 91 22 80 53 88 43 29 14 35 55 24 37 84

Sort them.

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84

The items separated by 7 starting at position 1 (the values 7, 22, and 35) are already sorted.

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84

Sort the next bunch, the items at positions 2, 9, and 16 (the values 83, 80, and 55).

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84

14 7 55 22 8 45 72 49 22 80 53 88 43 29 91 35 83 24 37 84

Keep doing that until all the array slices with gap 7 are sorted. The actual sorting can be done using a modified insertion sort. Then, we decrease the gap and repeat the process, say sorting all the slices with gap 3. Finally, sort with gap 1, which is just insertion sort, except that this should be close to the best case for insertion sort because the items by now are nearly sorted.
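To make the gap idea concrete, here is a minimal sketch in C, not the required solution. The function names gap_insertion_sort and shell_sort_sketch and the fixed gap sequence 7, 3, 1 are assumptions made for illustration (the gaps are taken from the example above). Note that this version interleaves the slices instead of finishing one slice before starting the next; the array after each pass is the same either way.

```c
#include <stddef.h>

/* One pass: insertion sort in which neighbors are gap positions apart.
   After the pass, every slice a[s], a[s+gap], a[s+2*gap], ... is sorted. */
void gap_insertion_sort(int a[], int n, int gap)
{
    for (int i = gap; i < n; i++) {
        int item = a[i];
        int j = i;
        /* Shift larger items of the same slice to the right by gap. */
        while (j >= gap && a[j - gap] > item) {
            a[j] = a[j - gap];
            j -= gap;
        }
        a[j] = item;
    }
}

/* Shell sort with the illustrative gap sequence 7, 3, 1 (an assumption;
   other gap sequences work too, as long as the last gap is 1). */
void shell_sort_sketch(int a[], int n)
{
    static const int gaps[] = {7, 3, 1};
    for (size_t g = 0; g < sizeof gaps / sizeof gaps[0]; g++)
        gap_insertion_sort(a, n, gaps[g]);
}
```

With gap 1 the first function is ordinary insertion sort, which is why the final pass always leaves the array fully sorted.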

(It might be tempting to call these sections "shells" and pretend that's where the name of the sort comes from. Actually the algorithm was invented by someone named Donald Shell.)

Part 2: Merge sort

It is to be hoped that you have seen merge sort before, in Programming I or whatever prior experience you have. In brief, the algorithm sorts by

dividing the array in half
sorting each half (recursively)
merging the two sorted halves together into a sorted array

The recursive structure is pretty simple, once you're able to wrap your mind around recursion in general. To use the "pile of cards" analogy, suppose you take an unsorted pile of cards. Split the pile into two halves. Take a nap; wake up to find the two halves each sorted. Now you need to collate those two sorted halves. Start a new sorted pile (initially empty). Compare the top cards of the two half-piles and move the smaller one to the new sorted pile. Repeat until one half-pile runs out, then move the rest of the other half-pile over, so that all the cards end up in the unified sorted pile.

The tricky part of all this is getting the merging part right.
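As a rough guide to the merging step, here is one possible sketch in C. The names merge_halves, merge_sort_range, and merge_sort_sketch, and the choice of a single scratch array allocated once up front, are assumptions for illustration; your own version may be organized differently.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted ranges a[lo..mid) and a[mid..hi) into tmp,
   then copy the merged result back into a. */
void merge_halves(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    /* Take the smaller "top card"; on ties take from the left half. */
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    /* One half has run out; move over whatever remains of the other. */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof a[0]);
}

/* Sort a[lo..hi) recursively. */
void merge_sort_range(int a[], int tmp[], int lo, int hi)
{
    if (hi - lo < 2) return;          /* 0 or 1 items: already sorted */
    int mid = lo + (hi - lo) / 2;
    merge_sort_range(a, tmp, lo, mid);
    merge_sort_range(a, tmp, mid, hi);
    merge_halves(a, tmp, lo, mid, hi);
}

/* Entry point: allocates the scratch array once. */
void merge_sort_sketch(int a[], int n)
{
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return;          /* array left unsorted on failure */
    merge_sort_range(a, tmp, 0, n);
    free(tmp);
}
```

The two trailing loops in merge_halves are the part that is easiest to forget: when one half is exhausted, the leftover items of the other half still have to be moved.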

Part 3: Experiments

Finally we are going to compare these two sorting algorithms with the other three we have looked at already (selection, insertion, and bubble) using experiments: we run the algorithms on some sample arrays and see how well they do.

I am providing a library of utility functions to help you work with arrays. These functions do things like populate an array with random values, copy the contents of one array to another, etc. The library is found in the files array_util.h and array_util.c.

There are several ways we could measure the runtime performance of the sorting algorithms. We will use two: counting the number of comparisons they make and timing how long they take.

To count the number of comparisons, we need to add code to the sorting algorithms to implement a counter that will be incremented every time we compare two elements of the array we're sorting. (We won't count comparisons of indices, such as i < n.)
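One way the counter might look is sketched below on insertion sort. The global comparison_count, the helper counted_less, and the function name insertion_sort_counted are all hypothetical names chosen for this illustration; the provided code may keep its counter differently.

```c
/* Hypothetical global counter; reset it to 0 before each experiment. */
long comparison_count = 0;

/* Compare two array elements and record that a comparison happened. */
int counted_less(int x, int y)
{
    comparison_count++;
    return x < y;
}

void insertion_sort_counted(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        int item = a[i];
        int j = i;
        /* j > 0 is an index comparison, so it is not counted; because
           of short-circuit evaluation, the element comparison is
           counted only when it is actually performed. */
        while (j > 0 && counted_less(item, a[j - 1])) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = item;
    }
}
```

On an already sorted array of 5 items this makes 4 comparisons, and on a reversed array of 5 items it makes 1 + 2 + 3 + 4 = 10, matching the best and worst cases for insertion sort.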

To measure the running time of the algorithms, we will need to read from the computer's clock and simply time them. A common way to keep track of time on a computer is as the number of milliseconds since midnight, Jan 1, 1970. array_util.h provides a function get_time_millis() which reads the current time from the clock. Thus we can time a call of selection sort by doing the following:

      fore = get_time_millis();
      selectionSort(copy, sizes[i]);
      aft = get_time_millis();

Then aft - fore is the number of milliseconds it took.
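If you want to try the timing idea outside the provided library, the following is a sketch under assumptions. now_millis is a hypothetical stand-in for get_time_millis() (whose actual implementation is not shown here), built on the POSIX function gettimeofday; selection_sort_sketch and time_sort_millis are likewise illustrative names, not part of array_util.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for get_time_millis(): milliseconds since
   midnight, Jan 1, 1970, read with POSIX gettimeofday. */
long long now_millis(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

/* A plain selection sort, included only so there is something to time. */
void selection_sort_sketch(int a[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min]) min = j;
        int t = a[i]; a[i] = a[min]; a[min] = t;
    }
}

/* Time one call of any sort with the signature void f(int[], int). */
long long time_sort_millis(void (*sort)(int[], int), int a[], int n)
{
    long long fore = now_millis();
    sort(a, n);
    long long aft = now_millis();
    return aft - fore;   /* elapsed milliseconds */
}
```

For example, time_sort_millis(selection_sort_sketch, copy, n) returns the elapsed milliseconds for sorting copy. For a small array the result will usually be 0.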

In our experiment, we will count comparisons for small arrays (because the number of milliseconds would be too small to measure) and use real time for large arrays (because the number of comparisons would be too large). I am providing the code for running the experiment, but you will need to code up your own experiments in an expanded version of this exercise in Project 1.

Thomas VanDrunen

Last modified: Wed Jan 18 14:59:26 CST 2012