Project 3: Sorting arrays

The goal of this lab is to implement basic sorting algorithms.

1. Introduction

Sorting is a fundamental problem in computer science. Not only do many applications require putting data in order, but sorting is also a good case study for problems and algorithms. There are many ways to sort a set of data: some ways are more intuitive (that is, easier for a human to understand) than others, some are faster than others, and some require more computer memory than others. In this project you will implement four algorithms to sort arrays of integers.

To get used to the project, you may find it helpful to try sorting numbers by hand. For example, take a pile of index cards and write a number on each of them. Then sort the pile. Pay careful attention to the steps you take. How would you instruct another person to sort a pile of cards using the same process that you used? In this lab you will think about how a computer should sort an array of numbers.

2. Setup

As usual, move into your cs235 directory, make a directory for this assignment, and change into it.

cd cs235
mkdir proj3
cd proj3

Then copy the file Sort.java from the class directory.

cp /homeemp/tvandrun/pub/235/Sort.java .

You will not need to modify this file at all; in fact, you will be able to understand very little of it (but just for the fun of it, open it in xemacs and look it over; marvel at all the code you will be able to understand, and write yourself, some day if you stick with it). What it does is test drive a sort algorithm by creating an array of size 10 randomly filled with integers from 0 to 99. Compile it, but don't bother running it yet.
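To make the driver's test data less mysterious, here is a minimal sketch of how an array of 10 random integers from 0 to 99 could be built. This is not the code in Sort.java; the class name RandomArrayDemo and the method randomArray are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; not taken from Sort.java.
class RandomArrayDemo {
    // Builds an array of the given size filled with random integers
    // from 0 (inclusive) to bound (exclusive).
    static int[] randomArray(int size, int bound, Random rng) {
        int[] array = new int[size];
        for (int i = 0; i < size; i++) {
            array[i] = rng.nextInt(bound); // a value in 0 .. bound - 1
        }
        return array;
    }

    public static void main(String[] args) {
        // Size 10, values 0 to 99, as the driver is described to use.
        int[] array = randomArray(10, 100, new Random());
        System.out.println(Arrays.toString(array));
    }
}
```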

You will write the implementation of each sorting algorithm in its own file.

3. Bubble sort

The algorithm called Bubble sort is not a particularly efficient algorithm (in fact, it's one of the worst), but it is very easy to understand and to program, so we will start with it. The idea is to iterate through the array and to compare adjacent elements, swapping them if they are out of order. Consider this example on an array containing the values 33, 11, 44, 22.

In the left column, we see what happens during one pass through the array. 33 is compared with 11; since they are out of order, they are swapped. Next, 33 is compared with 44; since they are in order, they are not swapped. Finally, 44 is compared with 22; since they are out of order, they are swapped. 44, the largest number, has made it all the way to the end of the array, so it is in the right place. The shaded area of the array is still unsorted, but the white area is sorted.
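The single pass just described can be sketched as follows. This is one possible way to write it, not the required solution; the class name BubblePassDemo and the method onePass are hypothetical, and only the array name array follows the handout.

```java
import java.util.Arrays;

// Hypothetical illustration of one pass over the example array.
class BubblePassDemo {
    // One pass: compare each adjacent pair and swap the pair if it is out of order.
    static void onePass(int[] array) {
        for (int i = 0; i < array.length - 1; i++) {
            if (array[i] > array[i + 1]) {
                int temp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = temp;
            }
        }
    }

    public static void main(String[] args) {
        int[] array = {33, 11, 44, 22};
        onePass(array);
        System.out.println(Arrays.toString(array)); // prints [11, 33, 22, 44]
    }
}
```

After the pass, 44 sits in the last position, matching the walkthrough above, while the first three values are still unsorted.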

Notice that one pass through the array is not enough. In fact, we've placed only a single element in the right place. The right column shows the state of the array after successive rounds of the process we just described. If we pass through the array a second time, we will place 33 in the second-to-last position. If we pass through again, we will place 22 in the third-to-last position.

Notice that this requires a loop within a loop. You will need to write an inner loop which conducts the pass over the array and an outer loop which repeats those passes.

Copy the file Bubble.java from the class directory.

cp /homeemp/tvandrun/pub/235/Bubble.java .

Open it, and notice that it does not have a main method. Instead it has a sort method, which you will fill in. All you need to know is that inside the sort method (where you see "insert code here"), there is an array of integers called array. (Do not write a main method. The driver will take care of that.)

Compile the file as usual. To run it, do

java Sort Bubble

This will print the array before and after your sort runs.

Some tips on doing this part:

Write the inner loop first, and test just that. Obviously it will not sort the array correctly, but you can tell if it is working because it will move the largest item to the last position.
Notice that if there are n items in the array, there are n - 1 adjacent pairs of items. How will this affect your inner loop?
Notice that after you have correctly placed n-1 items, the final, smallest item is already in place. How will this affect your outer loop?
Notice that since the first run of the inner loop moves the largest item all the way to the back, you will not have to pass all the way to the end next time around (because you do not need to compare anything with the last item; it's already in the right place). Likewise, on the third time around, you will not have to check either of the last two items, because they're both correctly placed. How will this affect your inner loop? This is a matter of efficiency, so you may want to get the program working first without worrying about this and then come back to see if you can improve it.
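Taken together, the tips above suggest a loop structure like the one sketched here. Treat it as one possible answer to the questions in the tips rather than the expected solution; the class name BubbleSketch and the method signature are assumptions, since the real signature is fixed by Bubble.java.

```java
// Hypothetical sketch; the real sort method lives in Bubble.java.
class BubbleSketch {
    static void sort(int[] array) {
        int n = array.length;
        // Outer loop: n - 1 passes are enough, because once n - 1 items
        // are in place the remaining (smallest) item must be too.
        for (int pass = 0; pass < n - 1; pass++) {
            // Inner loop: the last `pass` items are already placed,
            // so only the first n - 1 - pass adjacent pairs need checking.
            for (int i = 0; i < n - 1 - pass; i++) {
                if (array[i] > array[i + 1]) {
                    int temp = array[i];
                    array[i] = array[i + 1];
                    array[i + 1] = temp;
                }
            }
        }
    }
}
```

Replacing the inner bound n - 1 - pass with n - 1 gives the simpler, unoptimized version; both sort correctly, and the shorter bound only saves comparisons.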

4. Selection sort

To set up this part, copy the file Selection.java from the course directory.

cp /homeemp/tvandrun/pub/235/Selection.java .

Selection sort works this way. Suppose again that you have a pile of cards in your hand, each with a number on it, which you would like to set in order. You might try flipping through the pile searching for the smallest. Once you find the smallest number, you could remove its card and set it in a new pile by itself: the new, sorted pile. Then you would flip through the original pile to find the next smallest number, and put its card next on the pile. By continuing this process, eventually you will transfer all the cards from the old, unsorted pile to the new, sorted pile.

Now think about how to do that in a computer program on arrays. The array is like a pile. The obvious, analogical way to do selection sort would be to create a second array (standing for the new pile) and repeatedly copy values from the old array to the new array, starting from smallest and continuing to the largest. However, to be efficient, we want to avoid making that extra array. Here's how:

Assume that the sorted part comes first and the unsorted part is the rest. Then we search the unsorted part of the array looking for the smallest number. When we find it, we swap that value with the value in the first position in the unsorted part. That way the sorted part of the array grows by one element and the unsorted part shrinks by one element. Follow how it works in this illustration; the unsorted portion of the array is in gray.

Initially everything is unsorted. We identify 11 as the smallest and swap it with the first unsorted value, 33. Now 11 is in the correct position, so the sorted portion of the array is the first position. Next we find 22 as the smallest element in the unsorted part. We swap it with the first unsorted position, and so the sorted portion grows by one. Eventually, all elements are sorted.

Follow these steps to write a selection sort program:

Write a loop that simply finds the smallest value and prints it out. Since finding a smallest value is part of selection sort, this is a way to break down the problem--- plus it's review.
Next change what you've done so that your program does not ever record the smallest value, but instead records (and prints) the position (i.e., index) of the smallest value.
Finally, the loop you have written so far must become the inner loop of the larger algorithm. Write the outer loop, which iterates over the array and, at each iteration, finds the position of the smallest unsorted value and swaps it with the value at the current position (notice that for each iteration of the outer loop, the current position is the first position in the unsorted section). Hint: you will have to change the starting index of the inner loop for this to work.
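The three steps above might come together as in the sketch below. It is a sketch under assumptions, not the expected solution: the class name SelectionSketch and the helper indexOfSmallest are hypothetical (the handout asks for an inner loop, which is pulled into a helper here only to keep the two loops easy to tell apart).

```java
// Hypothetical sketch; the real sort method lives in Selection.java.
class SelectionSketch {
    // Returns the index of the smallest value in array[start .. array.length - 1].
    static int indexOfSmallest(int[] array, int start) {
        int minIndex = start;
        for (int i = start + 1; i < array.length; i++) {
            if (array[i] < array[minIndex]) {
                minIndex = i; // remember the position, not the value
            }
        }
        return minIndex;
    }

    static void sort(int[] array) {
        // `current` is the first position of the unsorted section.
        for (int current = 0; current < array.length - 1; current++) {
            int minIndex = indexOfSmallest(array, current);
            // Swap the smallest unsorted value into the current position.
            int temp = array[current];
            array[current] = array[minIndex];
            array[minIndex] = temp;
        }
    }
}
```

On 33, 11, 44, 22 the outer loop produces 11, 33, 44, 22, then 11, 22, 44, 33, then 11, 22, 33, 44.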

Compile, test, run, and when you're confident it's working right, move on.

5. Insertion sort

To set up this part, copy the file Insertion.java from the course directory.

cp /homeemp/tvandrun/pub/235/Insertion.java .

Insertion sort works this way: If you were using insertion sort over a pile of cards, you would place cards in a new, sorted pile. You would take the next card on the unsorted pile and search for the correct position in the sorted pile. For example, if you had cards with numbers 33, 11, 44, 22, initially they would all be in the unsorted pile. First, you pick up the card on top of that pile, 33, and place it in a new, sorted pile. Since 33 is all by itself, it is "sorted." Next you pick the next card in the pile, 11. You put 11 into the correct position in the sorted pile by putting it on top of the 33. Next, pick up 44. Since it is larger than 11 and 33, you put it under the 33. Finally, you pick up 22, and since it is between 11 and 33, you insert it between those cards in the pile. In the end, the unsorted pile is empty, and the sorted pile contains 11, 22, 33, 44.

For efficiency, we do not use two arrays, as you would use two piles. Instead, again consider the array to have two parts at any moment in the computation, a sorted part and an unsorted part. Initially, the sorted part is empty and the unsorted part is the entire array.

So, insertion sort maintains sorted and unsorted sections of the same array. The sorted section grows at each step until eventually the whole array is sorted. At each step, the algorithm takes the first element in the unsorted section and shifts over part of the sorted section to make room for it. Consider this illustration. The left side shows a "big-step" view; the right side shows the same process in slow-motion.

Notice that in the "slow motion" version of the illustration, the way we move numbers over to make room is that we actually shift the new number over one space at a time (by swapping, like you did for Bubble sort) until it is in the right position.
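The swap-based shifting described above can be sketched like this. It is one way to express the idea, not the expected solution; the class name InsertionSketch and the method signature are assumptions, since the real signature is fixed by Insertion.java.

```java
// Hypothetical sketch; the real sort method lives in Insertion.java.
class InsertionSketch {
    static void sort(int[] array) {
        // `next` is the first position of the unsorted section.
        // Starting at 0 matches the description: the sorted part begins empty.
        for (int next = 0; next < array.length; next++) {
            // Move the new element left one space at a time, by swapping,
            // until the element before it is no larger than it.
            for (int j = next; j > 0 && array[j - 1] > array[j]; j--) {
                int temp = array[j];
                array[j] = array[j - 1];
                array[j - 1] = temp;
            }
        }
    }
}
```

The condition j > 0 is checked first so that array[j - 1] is never read when the element has already reached the front of the array.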

Code insertion sort in your file and test it.

6. Merge sort

The final sort works this way: Suppose you have an unsorted pile of cards in your hand. First you split that pile up into two smaller piles. Then you sort each pile. Finally, you merge the two smaller (now sorted) piles into a full, sorted pile. This raises the question, how do we sort the smaller piles? The answer is: the same way we sort the big one. We split each of those into two even smaller piles, sort those smaller piles, and merge them into a sorted small pile. We repeat the process (recursively) until we get to a pile of size 1, which is trivially sorted.

This sort is more efficient than the other three sorts in its use of time (i.e., it's faster), but less efficient in its use of space, since this is not an "in-place" sort. It will use helper arrays to store some of the data temporarily.

First, copy the file for this sort as you did in the other parts of this project.

cp /homeemp/tvandrun/pub/235/Merge.java .

We break down the task of writing this sort into a few manageable pieces; each "piece" will involve writing a method. You should test these pieces individually by writing a main method for this file. You will not be graded for the main method (in fact, you may delete it before turning the project in), but you will have an easier time with this if you test the pieces as you go. For example, the first piece is to write a method that will make an array from a portion of another array; you should test this by making an array, printing it out, calling the method you've written on that array, and printing out what the method returns.

A. Making a subarray

Recall the String method substring(). Now we want to write a method that works similarly for arrays. Fill in the body for the method subArray(). It should take an array, a starting index (inclusive), and a stopping index (exclusive); it should create a new array the size of the range specified (stop - start), copy the elements in that range from the given array to the new array, and return the new array. After you have written this, document and test.
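A minimal sketch of such a method, together with the kind of test described earlier, might look like this. The parameter names and order are assumptions based on the description above (the real signature is whatever Merge.java declares), and the class name SubArraySketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; the real subArray() stub lives in Merge.java.
class SubArraySketch {
    // Returns a new array holding array[start], ..., array[stop - 1].
    static int[] subArray(int[] array, int start, int stop) {
        int[] result = new int[stop - start];
        for (int i = 0; i < result.length; i++) {
            result[i] = array[start + i]; // position i in the copy is start + i in the original
        }
        return result;
    }

    public static void main(String[] args) {
        int[] original = {5, 6, 7, 8, 9};
        System.out.println(Arrays.toString(original));                 // prints [5, 6, 7, 8, 9]
        System.out.println(Arrays.toString(subArray(original, 1, 4))); // prints [6, 7, 8]
    }
}
```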

B. Merging two sorted arrays

Next, we want to write a method that takes two arrays (which it assumes to be already sorted), creates a new array the size of the two given arrays combined, and fills the new array with the elements from the two in sorted order. For example, given {2, 6, 8, 9} and {3, 4, 5, 10}, it should return an array {2, 3, 4, 5, 6, 8, 9, 10}. This is tricky. Think carefully about this, fill in the body of merge(), and test.
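One common way to approach the merge is to keep an index into each input array and repeatedly copy the smaller of the two front elements. The sketch below follows that idea; it is not necessarily the intended solution, and the class name MergeSketch and the parameter names left and right are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; the real merge() stub lives in Merge.java.
class MergeSketch {
    // Assumes both inputs are already sorted in increasing order.
    static int[] merge(int[] left, int[] right) {
        int[] result = new int[left.length + right.length];
        int i = 0; // next unused position in left
        int j = 0; // next unused position in right
        int k = 0; // next free position in result
        // While both arrays have elements left, copy the smaller front element.
        while (i < left.length && j < right.length) {
            if (left[i] <= right[j]) {
                result[k++] = left[i++];
            } else {
                result[k++] = right[j++];
            }
        }
        // One array is used up; copy whatever remains of the other.
        while (i < left.length) {
            result[k++] = left[i++];
        }
        while (j < right.length) {
            result[k++] = right[j++];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] merged = merge(new int[]{2, 6, 8, 9}, new int[]{3, 4, 5, 10});
        System.out.println(Arrays.toString(merged)); // prints [2, 3, 4, 5, 6, 8, 9, 10]
    }
}
```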

C. The merge sort algorithm

Now, we want to implement the actual merge sort algorithm. This will be in the body of mergeSort(). This method receives an array; it should (1) split the array into two arrays representing its halves (using subArray()), (2) sort each half recursively (using mergeSort()), (3) merge the sorted halves together using merge(), and (4) return the result of merge().
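The four steps can be sketched as below. To keep the example self-contained, it includes compact stand-in versions of subArray() and merge(); those, the class name MergeSortSketch, and all signatures are assumptions rather than the contents of Merge.java. The sketch also adds the base case mentioned earlier: an array of size 1 (or 0) is trivially sorted, so the recursion stops there.

```java
// Hypothetical sketch; the real method stubs live in Merge.java.
class MergeSortSketch {
    static int[] mergeSort(int[] array) {
        // Base case: nothing to split, already sorted.
        if (array.length <= 1) {
            return array;
        }
        int middle = array.length / 2;
        // (1) split into halves and (2) sort each half recursively
        int[] left = mergeSort(subArray(array, 0, middle));
        int[] right = mergeSort(subArray(array, middle, array.length));
        // (3) merge the sorted halves and (4) return the result
        return merge(left, right);
    }

    // Stand-in helper: copy of array[start .. stop - 1].
    static int[] subArray(int[] array, int start, int stop) {
        int[] result = new int[stop - start];
        for (int i = 0; i < result.length; i++) {
            result[i] = array[start + i];
        }
        return result;
    }

    // Stand-in helper: merges two sorted arrays into one sorted array.
    static int[] merge(int[] left, int[] right) {
        int[] result = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            result[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) {
            result[k++] = left[i++];
        }
        while (j < right.length) {
            result[k++] = right[j++];
        }
        return result;
    }
}
```

Without the base case, the method would keep calling itself on ever-smaller arrays and never return.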

D. Making it work as if it were in-place

The problem with what we have so far is that our driver program expects the sort methods to be in-place sorts. It just passes an array and expects the method to mutate it. We can simulate this by writing the sort() method so that it (1) calls mergeSort() on the given array, storing the result in another variable, and (2) copies the elements from the result array back into the given array. Fill in sort() so that it works that way.
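The copy-back step can be sketched as follows. So that the example runs on its own, mergeSort() here is only a stand-in that returns a sorted copy using the library method Arrays.sort; in your file it would be the method you wrote in part C. The class name InPlaceSketch and the signatures are likewise assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch; the real sort() stub lives in Merge.java.
class InPlaceSketch {
    // Stand-in only: returns a new sorted array and leaves the input untouched,
    // which is the behavior the copy-back step relies on.
    static int[] mergeSort(int[] array) {
        int[] copy = array.clone();
        Arrays.sort(copy);
        return copy;
    }

    static void sort(int[] array) {
        // (1) sort into a separate result array
        int[] sorted = mergeSort(array);
        // (2) copy each element back so the caller's array is mutated
        for (int i = 0; i < array.length; i++) {
            array[i] = sorted[i];
        }
    }

    public static void main(String[] args) {
        int[] array = {33, 11, 44, 22};
        sort(array);
        System.out.println(Arrays.toString(array)); // prints [11, 22, 33, 44]
    }
}
```

Note that writing array = sorted; inside sort() would not work: it would only change which array the local variable refers to, and the caller's array would be left as it was.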

7. Turn in

Create the script file as before (cat, rm, compile, and run). Run each sort a couple of times.

 > a2ps -P sp (the name of the script file)

DUE: Friday, Feb 16, at 5:00 PM.

Thomas VanDrunen

Last modified: Mon Oct 9 11:05:43 CDT 2006