PRINCIPLES OF ALGORITHM ANALYSIS
Sedgewick: Chapter 2
COMP1927 Principles of Algorithm Analysis
PROBLEMS, ALGORITHMS, PROGRAMS AND PROCESSES
Problem: a task that needs to be solved
Algorithm: well-defined instructions for solving the problem
Program: implementation of the algorithm in a particular programming language
Process: an instance of the program as it is being executed on a particular machine
ANALYSIS OF SOFTWARE
What makes software "good"?
returns the expected result for all valid inputs
behaves "sensibly" for non-valid inputs
clear code, easy to maintain/modify
interface is clear and consistent (API or GUI)
returns results quickly (even for large inputs), i.e. it is efficient
We may sometimes also be interested in other measures: memory/disk space, network traffic, disk I/O, etc.
ALGORITHM EFFICIENCY
The algorithm is by far the most important determinant of the efficiency of a program
Improvements from better operating systems, compilers, faster computers or implementation details matter far less:
they may give speed ups, but usually only by a small constant factor
DETERMINING ALGORITHM EFFICIENCY
At the design stage:
theoretical approach, using complexity theory
After the testing stage:
once the program is implemented and correct, you can evaluate its performance empirically, e.g. using the time command
Note that we are interested less in the absolute time it takes to run than in how the running time grows as the problem size increases
absolute times differ on different machines and with different languages
COMPLEXITY THEORY EXAMPLE
    1. int linearSearch(int a[], int n, int key) {
    2.     for (int i = 0; i < n; i++)     // for indexes from 0 to n-1
    3.         if (a[i] == key)            // key equals current element
    4.             return i;               // return current index
    5.     return -1;                      // key not in the array
    6. }
What is the worst case cost?
Under what situation does this occur?
What is the best case cost?
What is the average case cost?
How many comparisons between data instances were made?
COMPLEXITY EXAMPLE SOLUTION
How many times does each line run in the worst case (the key is not in the array)?
C0: line 2: loop test runs n+1 times
C1: line 3: n comparisons
C2: line 4: 0 times (worst case)
C3: line 5: 1 time (worst case)
Total: C0(n+1) + C1(n) + C3 = O(n)
For an unsorted sequence, this is the best we can do
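To check the line counts empirically, the linearSearch example can be instrumented to count how often the data comparison on line 3 runs. This is a minimal sketch: the name linearSearchCount and its extra `comparisons` output parameter are hypothetical additions for illustration, not part of the course code.

```c
/* Sketch: linearSearch instrumented to count the data comparisons
   (a[i] == key) it performs. linearSearchCount and the `comparisons`
   parameter are hypothetical names used only for illustration. */
int linearSearchCount(const int a[], int n, int key, int *comparisons) {
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;          /* one data comparison per iteration */
        if (a[i] == key)
            return i;              /* best case: key at index 0, 1 comparison */
    }
    return -1;                     /* worst case: key absent, n comparisons */
}
```

Searching an array of 5 elements for a value that is not present makes 5 comparisons, matching the n comparisons counted for line 3.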
INFORMAL DEFINITION OF BIG-O NOTATION
We express complexity using big-O notation
It represents the asymptotic worst-case time complexity (unless stated otherwise)
Big-O expressions do not have constants or low-order terms, because as n gets larger these do not matter
For example, for a problem of size n, if the cost of the worst case is $1.5n^2 + 3n + 10$, then in big-O notation it would be $O(n^2)$
BIG-O NOTATION: FORMAL DEFINITION
The big-O notation is used to classify the work complexity of algorithms
Definition: A function f(n) is said to be in (the set) O(g(n)) if there exist constants c and N0 such that f(n) < c * g(n) for all n > N0
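As a worked check of the definition, take the worst-case cost $f(n) = 1.5n^2 + 3n + 10$ from the informal example and $g(n) = n^2$. The constants below are one valid choice, not the only one:

```latex
\begin{align*}
f(n) &= 1.5n^2 + 3n + 10 \\
     &\le 1.5n^2 + 3n^2 + 10n^2 && \text{since } n \le n^2 \text{ and } 1 \le n^2 \text{ for } n \ge 1 \\
     &= 14.5\,n^2 \\
     &< 15\,n^2 && \text{since } n^2 > 0
\end{align*}
```

So c = 15 and N0 = 1 satisfy f(n) < c * g(n) for all n > N0, hence f(n) is in O(n^2).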
EMPIRICAL ANALYSIS OF LINEAR SEARCH
Use the time command in Linux and run the program on inputs of different sizes:
    time ./prog < input > /dev/null
We are not interested in the real (wall-clock) time, but in the user time

    Size of input (n)    Time
    100000
    1000000
    10000000
    100000000

What is the relationship between input size and time?
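The same measurement can also be taken from inside a C program. This sketch assumes the standard `clock()` function from <time.h>, which reports processor time used by the program (closer to the user time reported by `time` than to the real time); the helper name timeWorstCaseSearch is hypothetical.

```c
#include <stdlib.h>
#include <time.h>

/* Linear search as in the complexity theory example. */
static int linearSearch(int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}

/* Hypothetical helper: times one worst-case search (key absent) on an
   array of n > 0 elements and returns the processor time in seconds,
   or -1.0 if the array cannot be allocated. */
double timeWorstCaseSearch(int n) {
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return -1.0;
    for (int i = 0; i < n; i++)
        a[i] = i;                          /* values 0..n-1, so -1 is absent */
    clock_t start = clock();
    volatile int result = linearSearch(a, n, -1);  /* volatile: keep the call */
    clock_t end = clock();
    (void)result;
    free(a);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling timeWorstCaseSearch for each size in the table and printing the results should show the time growing roughly in proportion to n; very small inputs may report 0 because `clock()` has limited resolution.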
PREDICTING RUNTIME
If my algorithm is quadratic and it takes 1.2 seconds to run on a data set of size 1000, approximately how long would you expect to wait for a data set of size 2000?
What about 10000?
What about 100000?
What about 1000000?
What about 10000000?
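The arithmetic behind these predictions is a ratio: if the running time is proportional to n^2, multiplying the input size by k multiplies the time by k^2. A minimal sketch, where predictQuadratic is a hypothetical helper name:

```c
/* Sketch: predict the running time at size n2 from a measured time t1
   at size n1, assuming the running time is proportional to n^2.
   predictQuadratic is a hypothetical helper name. */
double predictQuadratic(double t1, double n1, double n2) {
    double ratio = n2 / n1;        /* how many times larger the input is */
    return t1 * ratio * ratio;     /* quadratic: time grows by ratio^2 */
}
```

For example, predictQuadratic(1.2, 1000, 2000) is about 4.8 seconds (doubling n quadruples the time), and predictQuadratic(1.2, 1000, 10000) is about 120 seconds. For another growth rate, replace `ratio * ratio` with the corresponding function of the ratio.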
SEARCHING IN A SORTED ARRAY
Given an array a of N elements, with a[i] <= a[j] for any pair of indices i,j with i <= j < N, search for an element e in the array
    int a[N];          // array with N items
    int found = 0;
    int i = 0;
    while ((i < N) && (!found)) {
        found = (a[i] == e);
        i++;
    }
SEARCHING IN A SORTED ARRAY
    int a[N];          // array with N items
    int found = 0;
    int finished = 0;
    int i = 0;
    while ((i < N) && (!found) && (!finished)) {
        found = (a[i] == e);
        // exploit the fact that a is sorted
        finished = (e < a[i]);
        i++;
    }
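The early-exit loop above can be written as a complete function that returns an index instead of setting flags. This is a sketch under the same assumption (a is sorted in ascending order); the name sortedLinearSearch is hypothetical.

```c
/* Sketch: linear search over an ascending sorted array that stops as
   soon as a[i] > e, because e cannot appear further to the right.
   Returns the index of e, or -1 if e is not present.
   sortedLinearSearch is a hypothetical name. */
int sortedLinearSearch(const int a[], int n, int e) {
    for (int i = 0; i < n; i++) {
        if (a[i] == e)
            return i;       /* found */
        if (e < a[i])
            return -1;      /* passed the place where e would be: stop */
    }
    return -1;              /* reached the end without finding e */
}
```

Stopping early helps when e is small, but when e is larger than every element the loop still inspects all N elements.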
SEARCHING IN A SORTED ARRAY
How many steps are required to search an array of N elements?
Best case: $T_N = 1$
Worst case: $T_N = N$
Average: $T_N = N/2$
This is still a linear algorithm, like searching in an unsorted array
BINARY SEARCH
We start in the middle of the array and, if necessary, split the array in half to continue the search:
if a[N/2] == e, we found the element and we're done
if a[N/2] < e, continue the search on a[N/2+1] to a[N-1]
if a[N/2] > e, continue the search on a[0] to a[N/2-1]
This algorithm is called binary search.
IMPLEMENTING BINARY SEARCH
See binary.c from the examples for an implementation
We maintain two indices, l and r, to denote the leftmost and rightmost array index of the current part of the array
initially l = 0 and r = N-1
iteration stops when:
the left and right indices define an empty array (e.g. l > r): element not found
a[(l+r)/2] holds the element we're looking for
if a[(l+r)/2] is:
larger than the element, continue the search on the left, a[l]..a[(l+r)/2-1]
smaller than the element, continue the search on the right, a[(l+r)/2+1]..a[r]
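The steps above can be sketched as an iterative C function. This is an illustrative version, not necessarily identical to binary.c from the examples; the name binarySearch is an assumption. It computes the midpoint as l + (r - l) / 2, which equals (l+r)/2 for non-negative indices but cannot overflow for very large arrays.

```c
/* Sketch of iterative binary search on an ascending sorted array.
   Returns the index of e, or -1 if e is not present. */
int binarySearch(const int a[], int n, int e) {
    int l = 0;
    int r = n - 1;
    while (l <= r) {                  /* stop when a[l..r] is empty */
        int m = l + (r - l) / 2;      /* same as (l+r)/2, without overflow */
        if (a[m] == e)
            return m;                 /* found */
        else if (a[m] > e)
            r = m - 1;                /* continue on the left half */
        else
            l = m + 1;                /* continue on the right half */
    }
    return -1;                        /* empty range: not found */
}
```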
SEARCHING IN A SORTED ARRAY WITH BINARY SEARCH
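How many comparisons do we need for an array of size N?
Best case: $T_1 = 1$ is not needed here; the element is in the middle, so $T_N = 1$
Worst case: $T_1 = 1$ and $T_N = 1 + T_{N/2}$, which gives $T_N = \log_2 N + 1$ (for N a power of 2), i.e. $O(\log n)$
Binary search is a logarithmic algorithm
[Plot: number of comparisons for linear vs. log(N) growth, N from 10 to 100]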
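The worst-case recurrence can be checked by counting loop iterations ("probes") in a binary search. This sketch assumes one probe per iteration of the loop and uses hypothetical helper names; log2Plus1 computes floor(log2 n) + 1 by repeated halving, mirroring $T_N = 1 + T_{N/2}$.

```c
#include <stdlib.h>

/* Binary search that also counts loop iterations (probes). */
static int binarySearchProbes(const int a[], int n, int e, int *probes) {
    int l = 0, r = n - 1;
    *probes = 0;
    while (l <= r) {
        int m = l + (r - l) / 2;
        (*probes)++;
        if (a[m] == e) return m;
        else if (a[m] > e) r = m - 1;
        else l = m + 1;
    }
    return -1;
}

/* floor(log2(n)) + 1 for n >= 1, following T_1 = 1, T_N = 1 + T_{N/2}. */
int log2Plus1(int n) {
    int t = 1;
    while (n > 1) {
        n /= 2;
        t++;
    }
    return t;
}

/* Probes used to search a[0..n-1] = 0..n-1 for the absent key n, which
   always continues on the right half. Returns -1 if allocation fails. */
int probesForMissingLargeKey(int n) {
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL) return -1;
    for (int i = 0; i < n; i++) a[i] = i;
    int probes;
    binarySearchProbes(a, n, n, &probes);
    free(a);
    return probes;
}
```

For n = 1024 this search makes 11 probes, which equals log2(1024) + 1.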
BIG-O NOTATION
All constant functions are in O(1)
All linear functions are in O(n)
All logarithmic functions are in the same class O(log(n)):
$O(\log_2 n) = O(\log_3 n) = \dots$, since $\log_b(a) \cdot \log_a(n) = \log_b(n)$; e.g. $\log_2 n = \log_2 3 \cdot \log_3 n \approx 1.585 \log_3 n$, so different bases differ only by a constant factor
We say an algorithm is O(g(n)) if, for an input of size n, the algorithm requires T(n) steps, with T(n) in O(g(n)), and O(g(n)) is minimal
binary search is O(log(n))
linear search is O(n)
We say a problem is O(g(n)) if the best algorithm for it is O(g(n))
finding the maximum in an unsorted sequence is O(n)
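A sketch of why finding the maximum is O(n): every element has to be examined, since any unexamined one could be the largest, and a single pass with n-1 comparisons suffices. The name findMax is hypothetical, and the function assumes n >= 1.

```c
/* Sketch: maximum of an unsorted sequence in one pass (n-1 comparisons).
   Assumes n >= 1. */
int findMax(const int a[], int n) {
    int max = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > max)
            max = a[i];
    return max;
}
```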
ANALYSING ALGORITHMS
Most algorithms fall into one of the following categories, having running times proportional to the functions:
f(N) = 1: constant. Instructions in the program are executed a fixed number of times, independent of the size of the input
f(N) = log(N): logarithmic. Some divide & conquer algorithms with trivial splitting and combining operations
f(N) = N: linear. Every element of the input has to be processed, usually in a straightforward way
f(N) = N log(N): divide & conquer algorithms where the splitting or combining operation is proportional to the input
$f(N) = N^2$: quadratic. Algorithms which have to compare each input value with every other input value. Problematic for large inputs
$f(N) = N^3$: cubic. Only feasible for very small problem sizes
$f(N) = 2^N$: exponential. Of almost no practical use
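As an illustration of the quadratic category, the sketch below checks whether any value occurs twice by comparing each element with every later one, which takes N(N-1)/2 comparisons in the worst case (no duplicates). The name hasDuplicate is hypothetical, and this is just one example of a quadratic algorithm.

```c
#include <stdbool.h>

/* Sketch of a quadratic algorithm: compare every pair of elements.
   Worst case (no duplicates): n(n-1)/2 comparisons. */
bool hasDuplicate(const int a[], int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (a[i] == a[j])
                return true;
    return false;
}
```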
WHY COMPLEXITY MATTERS
    n          log n    n log n     n^2              2^n
    10         4        40          100              1024
    100        7        700         10000            1.3E+30
    1000       10       10000       1000000          REALLY BIG
    10000      14       140000      100000000
    100000     17       1700000     10000000000
    1000000    20       20000000    1000000000000