Lab 2: Instrumentation

The goal of this lab is to study the efficiency of sorting algorithms experimentally and to compare the results with the theoretical findings of complexity analysis.

1. Introduction

In engineering, instrumentation refers to the methods and techniques of measuring and controlling physical systems. Software engineering has its own methods for instrumenting applications and other pieces of software. As was mentioned last week, the two main attributes that software developers measure are time and space (i.e., computer memory usage).

The main use of instrumentation in software development is in optimizing software for which we already have a working version (so, this comes after design, implementation, and testing in the software development process). Usually software developers are interested in finding the parts of the application that would most benefit from optimization. To do this, parts of the program are monitored to keep track of how much time is spent (in a given method, for example), or how much memory is consumed (by a given class or group of classes).

Suppose an application contains a method a whose algorithm runs in O(n^3) time and a method b whose algorithm runs in O(n) time. From a theoretical analysis, it may seem like method a should be subject to closer inspection. However, after running experiments, a developer might discover that 80% of the application's running time is spent in b (because it is called frequently with large amounts of data) but only 2% in a (because it is called rarely, and always with small amounts of data). These experimental results suggest that the developer's efforts would be much better spent improving b's complexity even a little bit than improving a's complexity a lot.

Similarly, there are tools that monitor which classes are instantiated most often and take up the most memory in total, in order to identify which classes would most benefit from being streamlined.

Last week we measured the number of comparisons that our methods made. This time we will use a more concrete measure: actual running time in milliseconds. While this might seem like the definitive measurement, some care needs to be taken to ensure that the data collected is useful, something you will need to think about as you conduct these experiments.
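Measuring wall-clock time in Java typically means recording a timestamp before and after the operation of interest. A minimal sketch is below; it uses Arrays.sort as a stand-in for the lab's own sort methods (which you would substitute in), and System.nanoTime, which is generally preferable to System.currentTimeMillis for measuring short intervals:

```java
import java.util.Arrays;
import java.util.Random;

public class TimingSketch {
    public static void main(String[] args) {
        // Generate a random array of 5000 ints (a stand-in for SortUtil.createRandomArray)
        int[] array = new Random().ints(5000, 0, 100000).toArray();

        long start = System.nanoTime();
        Arrays.sort(array);          // substitute the sort method under test here
        long elapsedNanos = System.nanoTime() - start;

        // Convert to milliseconds for reporting
        System.out.println("Sorted " + array.length + " elements in "
                           + (elapsedNanos / 1_000_000) + " ms");
    }
}
```

Note that a single measurement of a small array may round down to 0 ms, which is one reason to run many trials at larger sizes.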

2. Set up

Make a directory for this lab. Copy the sorting algorithms and helper programs from the course directory:

cp /cslab.all/ubuntu/cs245/lab2/* .

3. The tools

You have the following tools at your disposal for use in the experiments assigned below.

4. Experiments

Read all of the experimental questions below. Then pick two of them to experiment on (if you have time left over, pick a third, just for fun). Write programs to conduct experiments that answer the questions; the programs should actually automate the experiment. For example, if you decided to run selection sort on 10 arrays for each of the sizes 10, 50, 100, 500, 1000, and 5000, you might write something like

     int[] sizes = {10, 50, 100, 500, 1000, 5000};
     for (int i = 0; i < sizes.length; i++) {
          for (int j = 0; j < 10; j++) {
               int[] array = SortUtil.createRandomArray(sizes[i]);
               SelectionArray.sort(array);
          }
     }

Use your program also to do things like calculate averages and high/low and generate tables, where appropriate.

  1. Running time vs. comparisons. How good a predictor of running time is the number of comparisons for insertion sort? Does the number of comparisons increase with size at the same rate as the running time? At a given size, are the number of comparisons and the running time correlated? (Pick either arrays or lists to work with on this one.)

  2. Arrays vs. lists. Pick a sorting algorithm. Which takes longer to sort, arrays or lists? Does size make a difference, or is one data structure faster for both big and small amounts of data?

  3. Theory vs. experiments. In theory, selection sort should run in O(n^2) time. Is this the case experimentally? (Pick either arrays or lists to work with on this one).

  4. Best case vs. worst case. The version of bubble sort that monitors whether or not a change has been made differs in complexity between best case and worst case. Is there much variance experimentally? (Experiment on a single size; choose either arrays or lists to work with.)

  5. Information vs. noise. If you sort the same array several times, is there any difference in the running time? (Pick a size, a sort, and either lists or arrays (the opposite of what you chose in question 4).) Be careful how you repeat this on the same array/list; you should first generate a random one, and then make a copy before each sort, so that you are always sorting the same original array.
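For question 5, the copy-before-each-sort idea can be sketched with Arrays.copyOf, which leaves the original untouched so every trial sorts identical data (again using Arrays.sort as a stand-in for the lab's sort methods):

```java
import java.util.Arrays;
import java.util.Random;

public class RepeatSameArray {
    public static void main(String[] args) {
        // Generate one random array; this original is never sorted in place
        int[] original = new Random().ints(1000, 0, 100000).toArray();

        for (int trial = 0; trial < 5; trial++) {
            int[] copy = Arrays.copyOf(original, original.length); // fresh copy each trial
            long start = System.nanoTime();
            Arrays.sort(copy);   // substitute the sort method under test here
            System.out.println("trial " + trial + ": "
                               + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
}
```

Sorting the copy rather than the original matters: if you sorted the same array object repeatedly, every trial after the first would run on already-sorted input, which changes the behavior of some sorts.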

5. To turn in

For each experiment, turn in your code and a brief (paragraph-sized) write-up of your methodology, results, and conclusions. Include a table and/or (if you think it illustrates the case for your conclusions) a graph.


Thomas VanDrunen
Last modified: Fri Jan 16 11:35:16 CST 2009