Lab 2: Shell sort, merge sort, and experimentation

The goal of this lab is to give you more practice programming in C and working with the C compiler; to demonstrate two other sorting algorithms; and to study the efficiency of sorting algorithms experimentally, comparing the results with the theoretical findings of algorithmic analysis.

We will be working with five sorting algorithms. You will be given the code for three of them (selection, insertion, and bubble), and you will need to implement two others (described below). Then we will run experiments on the sorting algorithms and observe their relative performance.

Part 1: Shell sort

Your first task is to implement yet another sorting algorithm, called Shell sort. Suppose we have the array

49 7 83 22 8 45 72 91 22 80 53 88 43 29 14 35 55 24 37 84

First consider the items separated by 7 spaces, starting at 0.

49 7 83 22 8 45 72 91 22 80 53 88 43 29 14 35 55 24 37 84

Sort them.

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84

The items separated by 7 starting at 1 are already sorted.

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84

Sort the next bunch.

14 7 83 22 8 45 72 49 22 80 53 88 43 29 91 35 55 24 37 84
14 7 55 22 8 45 72 49 22 80 53 88 43 29 91 35 83 24 37 84

Keep doing that until all the array slices with gap 7 are sorted. The actual sorting can be done using a modified insertion sort. Then, we decrease the gap and repeat the process, say sorting all the slices with gap 3. Finally, sort with gap 1, which is just insertion sort, except that this should be close to the best case for insertion sort because the items by now are nearly sorted.
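Putting those steps together, a Shell sort over the whole array might be sketched in C as follows. (This is only a sketch: the gap sequence 7, 3, 1 matches the example above, but your implementation may use a different sequence, and the function name is an assumption.)

```c
#include <assert.h>

/* Sketch of Shell sort. The inner loop is a gap-modified insertion sort:
 * for each gap, it sorts every slice a[start], a[start+gap], a[start+2*gap], ...
 * The gap sequence {7, 3, 1} is an assumption matching the lab's example. */
void shellSort(int a[], int n) {
    int gaps[] = {7, 3, 1};
    for (int g = 0; g < 3; g++) {
        int gap = gaps[g];
        for (int i = gap; i < n; i++) {
            int key = a[i];
            int j = i;
            /* shift earlier items in this slice right until key fits */
            while (j >= gap && a[j - gap] > key) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = key;
        }
    }
}
```

Note that the final pass with gap 1 is exactly insertion sort, as described above, so the array is guaranteed to end up sorted no matter what gap sequence precedes it.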

(It might be tempting to call these sections "shells" and pretend that's where the name of the sort comes from. Actually the algorithm was invented by someone named Donald Shell.)

Part 2: Merge sort and insertion sort

It is to be hoped that you have seen merge sort and insertion sort before, in Programming I or whatever prior experience you have. In brief, the merge sort algorithm sorts by splitting the array into two halves, recursively sorting each half, and then merging the two sorted halves.

The recursive structure is pretty simple, once you're able to wrap your mind around recursion in general. To use the "pile of cards" analogy, suppose you take an unsorted pile of cards. Split the pile into two halves. Take a nap; wake up to find the two halves each sorted. Now you need to collate those two sorted halves. Start a new sorted pile (initially empty). Take the top card from each half-pile and add the smaller one to the new sorted pile. Repeat until all the cards have been moved to the unified sorted pile.

The tricky part of all this is getting the merging part right, but we did that in class on Friday.
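The card-pile collation might be sketched in C like this. (A sketch only: the function names, the half-open index convention, and the use of a scratch array are assumptions; the version from class may differ.)

```c
#include <assert.h>

/* Merge the two sorted runs a[lo..mid-1] and a[mid..hi-1] into temp,
 * then copy the collated result back into a. */
void merge(int a[], int temp[], int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)                 /* take the smaller "top card" */
        temp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) temp[k++] = a[i++];       /* left run leftovers */
    while (j < hi)  temp[k++] = a[j++];       /* right run leftovers */
    for (k = lo; k < hi; k++) a[k] = temp[k];
}

/* Recursively sort a[lo..hi-1]: split, sort each half, merge. */
void mergeSortHelper(int a[], int temp[], int lo, int hi) {
    if (hi - lo < 2) return;                  /* 0 or 1 items: already sorted */
    int mid = lo + (hi - lo) / 2;
    mergeSortHelper(a, temp, lo, mid);
    mergeSortHelper(a, temp, mid, hi);
    merge(a, temp, lo, mid, hi);
}
```

Using `a[i] <= a[j]` (rather than `<`) when the top cards tie keeps equal items in their original order, which makes the sort stable.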

Insertion sort, as we mentioned briefly on Wednesday, is analogous to selection sort in that it also maintains a sorted section and an unsorted section. But instead of finding the smallest value in the unsorted section and placing it at the end of the sorted section, insertion sort takes the first value of the unsorted section and places it in the right place in the sorted section.
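That idea might be sketched in C as follows (a sketch only; the provided lab code will have its own names and details):

```c
#include <assert.h>

/* Sketch of insertion sort. After iteration i, the section a[0..i] is
 * sorted; each pass inserts a[i] into its right place in that section. */
void insertionSort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i];          /* first value of the unsorted section */
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];     /* shift larger sorted items right */
            j--;
        }
        a[j + 1] = key;          /* drop key into the gap */
    }
}
```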

In this second part of the lab, you will be given partially written implementations of merge sort and insertion sort. Your task is to finish them.

Part 3: Experiments

Finally we are going to compare these two sorting algorithms with the other three we have looked at already (selection, insertion, and bubble) using experiments: we run the algorithms on some sample arrays and see how well they do.

I am providing a library of utility functions to help you work with arrays. These functions do things like populate an array with random values, copy the contents of one array to another, etc. The library is found in the files array_util.h and array_util.c.

There are several ways to measure runtime performance. We will use two: counting the number of comparisons and timing how long the algorithms take.

To count the number of comparisons, we need to add code to the sorting algorithms to implement a counter that will be incremented every time we compare two elements of the array we're sorting. (We won't count comparisons of indices, such as i < n.)

To measure their running time, we will need to read from the computer's clock and simply time them. The standard reference point for time on a computer is midnight (UTC), Jan 1, 1970, known as the Unix epoch; we will measure time as the number of milliseconds since then. array_util.h provides a function get_time_millis() which reads the current time from the clock. Thus we can time a call of selection sort by doing the following:

      fore = get_time_millis();
      selectionSort(copy, sizes[i]);
      aft = get_time_millis();

Then aft - fore is the number of milliseconds it took.

In our experiment, we will count comparisons for small arrays (because the number of milliseconds would be too small) and measure real time for large arrays (because the number of comparisons would be too large). I am providing the code for running the experiment, but you will need to code up your own experiments in an expanded version of this exercise in Project 1.

I read the pre-lab

Thomas VanDrunen
Last modified: Wed Jan 18 14:59:26 CST 2012