For this lab, we're going to implement a few sorting algorithms and compare their running time on inputs of varying length. How does each algorithm do on small, medium, and large arrays?
Determining how long a program runs can be tricky. A system, particularly one like romulus, may be running many programs at once. Therefore, just using a stopwatch (that is, measuring what we call wall-clock time) is insufficient, since it reflects not only how long your program took to run, but also how many other programs were running concurrently.
What we want to measure is the virtual time spent running your program. That is, if your program takes 100 seconds of wall-clock time, we want to know how much of those 100 seconds the system spent actually running your program.
To obtain this measurement, we will use the time command. It can be applied to any command that you type at the command-line -- you simply prefix the command with time, like so:
time java Life X-pattern.init 8
The program will run and produce its normal output. However, at the end of the output, a line like the following will be appended:
0.390u 0.140s 0:00.67 79.1% 0+0k 0+0io 4802pf+0w
We are interested in the first four values (don't worry about the others), which have the following meanings:
User time: The amount of time your code spent running, given in seconds.
System time: The amount of time the operating system spent supporting your program, also given in seconds.
Wall clock time: The amount of real time that passed between the start and finish of your program, given as minutes:seconds.
CPU percentage: The fraction of the system's attention dedicated to running your program (as opposed to other programs). In other words, the wall-clock time multiplied by this percentage should roughly equal the sum of the user and system times.
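As a sanity check on the sample line above: the user and system times sum to 0.390 + 0.140 = 0.53 seconds, and the wall-clock time multiplied by the CPU percentage gives 0.67 × 0.791 ≈ 0.53 seconds, just as expected.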
In our case, we will be particularly interested in user time. While the system time also matters, a sorting algorithm should not demand much work from the operating system (e.g., allocating lots of new objects), so the system time should be small.
We should also beware of timing resolution. Our utility only reports hundredths of a second (10^-2). However, typical CPUs can perform an instruction in roughly one nanosecond (10^-9). Therefore, a program that requires only 10^7 instructions (about 10^-2 seconds of work) will barely register as having taken any time at all. Ideally, the program should take substantially longer than 10^-2 seconds for the measurement to be meaningful.
You must write a program that creates an array of random integer values and then invokes one sorting algorithm on that array. Here are the specific requirements for this program (a sketch of one possible structure appears after the sample output below):
It must be named SortingTest.
It must accept three command-line arguments:
Array size: The size of the array that the program should create and then sort.
Output array flag: The user must specify output here if the sorted array should be printed at the end of the program, and no-output otherwise.
Sorting algorithm: The sorting algorithm that should be used to sort the array, where the following choices are allowed:
nosort: Do not actually sort the array. We will use this option to factor out time spent on array creation.
insertion: Use insertion sort.
selection: Use selection sort.
bubble: Use bubble sort.
merge: Use merge sort.
counting: Use counting sort.
It must implement ONE of insertion, selection, or bubble sort. For the other two, it should print an error message and exit.
It must also implement BOTH merge and counting sort.
Given the array size provided by the command-line arguments, it must create an array holding that many int values. It must then assign a random int value to each array element.
After creating the array of random integer values, it must call the desired sorting algorithm.
If the user requested, at the command line, that the array be output, then the program must print the contents of each array element, providing one value per line of text, like so:
-15
-3
1
22
38
777
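To make these requirements concrete, here is a minimal sketch of one way SortingTest might be structured. It assumes insertion sort is your chosen simple sort and leaves merge and counting sort as stubs for you to fill in; none of its internal names or details are required by the assignment.

import java.util.Random;

public class SortingTest {
    public static void main(String[] args) {
        int size = Integer.parseInt(args[0]);      // array size
        boolean print = args[1].equals("output");  // "output" or "no-output"
        String algorithm = args[2];                // which sort to run

        // Create the array and fill it with random int values.
        Random random = new Random();
        int[] data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = random.nextInt();
        }

        // Dispatch to the requested sorting algorithm.
        if (algorithm.equals("nosort")) {
            // Leave the array unsorted.
        } else if (algorithm.equals("insertion")) {
            insertionSort(data);
        } else if (algorithm.equals("merge")) {
            // Call your merge sort here.
        } else if (algorithm.equals("counting")) {
            // Call your counting sort here.
        } else if (algorithm.equals("selection") || algorithm.equals("bubble")) {
            System.err.println(algorithm + " sort is not implemented");
            System.exit(1);
        } else {
            System.err.println("Unknown algorithm: " + algorithm);
            System.exit(1);
        }

        // Print one value per line, if requested.
        if (print) {
            for (int value : data) {
                System.out.println(value);
            }
        }
    }

    // Insertion sort: shift larger elements right until the
    // current value's proper position is found.
    private static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }
}

One design note: counting sort is only practical over a bounded range of values, so you may want to generate your random values with a bound (e.g., random.nextInt(bound) for some bound you choose) rather than drawing from the full int range.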
You must then run your program under the time command, for each of the implemented sorting algorithms, on a range of array sizes with output disabled, thus measuring how long each takes. You should plot the results of your timings for each algorithm (feel free to use a spreadsheet). You should start at a small array size and gradually increase the size (but not too gradually!) until the trend for each algorithm is clear.
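For example, a timing run for merge sort on an array of one million elements (the size here is arbitrary) would look like this:

time java SortingTest 1000000 no-output merge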
What about the nosort option? Some time will be required to create the array and assign it random values, and we don't want to include that time in our measurements. Thus, for each array size, you should first measure the time required with the nosort "algorithm". That indicates how much time the other runs, using real sorting algorithms, spent doing non-sorting work. You can then subtract this time from the results of the real sorting runs, yielding only the sorting time.
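For example (with made-up numbers), if merge sort on a given array size shows 1.32 seconds of user time and the nosort run on the same size shows 0.25 seconds, then the time actually spent sorting is about 1.32 - 0.25 = 1.07 seconds.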
Once you have written and tested SortingTest, you must use the cs12-submit command to turn in your code. If you plot your results by hand, then submit them by hand. If you use a spreadsheet, submit the spreadsheet file, which should include a plot. Submit this assignment as lab-6 when you use the cs12-submit command.