Lab 2: Instrumentation

The goal of this lab is to study the efficiency of sorting algorithms experimentally and to compare the results with the theoretical findings of complexity analysis.

In engineering, instrumentation refers to the methods and techniques of measuring and controlling physical systems. Software engineering has its own methods for instrumenting applications and other pieces of software. The two main attributes that software developers measure are time and space (i.e., computer memory usage).

The main use of instrumentation in software development is in optimizing software for which we already have a working version (so, this would come after design, implementation, and testing in the software development process). Usually software developers are interested in finding parts of the application that would most benefit from optimization. To do this, parts of the program are monitored to keep track of how much time is spent (in a given method, for example), or how much memory is consumed (by a given class or group of classes).

Suppose an application contains a method a whose algorithm runs in O(n^3) time and a method b whose algorithm runs in O(n) time. From a theoretical analysis, it may seem like method a should be subject to closer inspection. However, after running experiments, a developer might discover that 80% of the application's running time is spent in b (because it is called frequently with large amounts of data) but only 2% in a (because it is called rarely, and always with small amounts of data). These experimental results suggest that the developer's efforts would be much better spent improving b's complexity even a little bit than improving a's complexity a lot.
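To illustrate, here is a rough sketch of how such per-method timing could be collected by hand. The methods workA and workB are hypothetical stand-ins for the a and b of the example above; they are not part of this lab's code.

```java
// A sketch of hand-rolled per-method instrumentation: each method adds
// its own elapsed time to an accumulator, so we can see where the
// application's running time actually goes.
public class Profile {
    static long timeInA = 0, timeInB = 0;   // accumulated nanoseconds
    static long sink = 0;                   // keeps the loops from being optimized away

    static void workA(int n) {              // stands in for the O(n^3) method a
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    sum += i + j + k;
        sink += sum;
        timeInA += System.nanoTime() - start;
    }

    static void workB(int n) {              // stands in for the O(n) method b
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++)
            sum += i;
        sink += sum;
        timeInB += System.nanoTime() - start;
    }

    public static void main(String[] args) {
        workA(50);                          // called rarely, small input
        for (int i = 0; i < 1000; i++)
            workB(100000);                  // called often, large input
        System.out.println("time in A (ns): " + timeInA);
        System.out.println("time in B (ns): " + timeInB);
    }
}
```

Even though workA has the worse complexity, the frequently-called workB can dominate the totals, which is exactly the situation the example describes.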

Similarly, there are tools that monitor program executions to determine which classes are instantiated most often and in total take up the most memory, in order to identify which classes are the best candidates for streamlining.

Last week we measured the number of comparisons that our methods made. This time we will use a more concrete measure: actual running time in milliseconds. While this might seem like the definitive measurement, some care needs to be taken to ensure that the data collected are useful, something you will need to be thinking about as you conduct these experiments.
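As a minimal sketch, elapsed wall-clock time in milliseconds can be read before and after a sort call. Here java.util.Arrays.sort stands in for whichever sort method you are actually measuring; note that the millisecond clock is coarse, so a sort of a small array may report 0 ms.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal timing sketch: read the clock before and after one sort call.
// Arrays.sort is a stand-in for the sort method under measurement.
public class TimeOneSort {
    public static long timeSort(int[] array) {
        long before = System.currentTimeMillis();
        Arrays.sort(array);
        long after = System.currentTimeMillis();
        return after - before;      // elapsed wall-clock time in ms
    }

    public static void main(String[] args) {
        Random rand = new Random();
        int[] data = new int[1000000];
        for (int i = 0; i < data.length; i++)
            data[i] = rand.nextInt();
        System.out.println("sorted in " + timeSort(data) + " ms");
    }
}
```

Because the reading includes everything the machine happens to be doing at that moment, timing many trials and looking at totals or averages is usually more trustworthy than a single reading.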

You have the following tools/pieces at your disposal for use in the experiments assigned below.

In this lab, you will perform two experiments (or three, if time permits). You and your partner will propose a question to ask and an experiment which will answer that question, then write code to perform the experiments, run the experiments, and analyze the results. Here are the questions from which you should choose:

  1. Running time vs. comparisons. How good a predictor of running time is the number of comparisons for insertion sort? Do the number of comparisons and the running time increase with size at equivalent rates? At a given size, are the number of comparisons and the running time correlated?

  2. Theory vs. experiments. In theory, selection sort (for example) should run in O(n^2) time. Is this the case experimentally? Pick a sorting algorithm and compare the rate at which its running time increases with the input size to the rate predicted by the theory.

  3. Best case vs. worst case. The version of bubble sort that monitors whether or not a change has been made differs in complexity between best case and worst case. Is there much variance experimentally? (Pick one algorithm and one size; time the algorithm on many random arrays of that size.)

  4. Information vs. noise. If you sort the same array several times, is there any difference in the running time? Pick one sort and experiment. Be careful how you repeat this on the same array; you should first generate a random one, and then make a copy before each sort, so you'll always be sorting the same original array.
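The repeated-trials setup in question 4 can be sketched as follows. Again Arrays.sort is only a stand-in for your own sort, and the sizes and trial counts are arbitrary; the essential point is that each trial sorts a fresh copy of the same original array.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch for experiment 4: time repeated sorts of the *same* original
// array by copying it before each trial (sorting the array in place
// would make every trial after the first a best-case, already-sorted run).
public class RepeatedTrials {
    public static long[] runTrials(int size, int trials) {
        Random rand = new Random();
        int[] original = new int[size];
        for (int i = 0; i < size; i++)
            original[i] = rand.nextInt();

        long[] times = new long[trials];
        for (int t = 0; t < trials; t++) {
            int[] copy = Arrays.copyOf(original, original.length);
            long before = System.currentTimeMillis();
            Arrays.sort(copy);               // stand-in for your own sort
            times[t] = System.currentTimeMillis() - before;
        }
        return times;
    }

    public static void main(String[] args) {
        long[] times = runTrials(500000, 5);
        System.out.println(Arrays.toString(times));
    }
}
```

Any variation among the printed times on identical input is noise (clock granularity, other processes, the JVM warming up), which is exactly what this experiment is probing.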
Read the pre-lab.

Thomas VanDrunen
Last modified: Wed Jan 18 14:59:26 CST 2012