Data Structures & Analysis of Algorithms
(COMP4120)
Lecture # 5
Algorithm Analysis
Course Instructor
Dr. Aftab Akram
PhD CS
Assistant Professor
Department of Information Sciences, Division of Science &
Technology
University of Education, Lahore
Comparing Algorithms
• How do you compare two algorithms?
• Run these algorithms are computer programs
• Compare the resource used by them
• This approach may not work due to following
reasons:
• Have to write two programs but want to keep just one
• “Better Written” program
• The choice of empirical test cases might unfairly favor
one algorithm
• What if both algorithm did not fall into resource budget?
Asymptotic Analysis
• Asymptotic analysis measures the efficiency of an
algorithm
• Actually an estimating technique
• Asymptotic analysis has proved useful to computer
scientists who must determine if a particular algorithm
is worth considering for implementation.
• The critical resource for a program is most often its
running time.
• Typically you will analyze the time required for an
algorithm (or the instantiation of an algorithm in the
form of a program), and the space required for a data
structure.
Factor affecting Running Time
• The environment in which program is compiled and
run
• Speed of CPU
• Bus Speed
• Peripheral Hardware, etc.
• Competition for network resources
• The programming languages and quality of code
generated by compiler
• The Coding Efficiency of Computer Programmer
Estimation Algorithm’s Performance
• Estimate Algorithm’s Performance
• number of basic operations required by the algorithm to
process an input of a certain size.
• For example, when comparing sorting algorithms, the
size of the problem is typically measured by the
number of records to be sorted.
• A basic operation must have the property that its time
to complete does not depend on the particular values
of its operands.
• Adding or comparing two integer variables are
examples of basic operations in most programming
languages
• Summing the contents of an array containing integers
is not, because the cost depends on the value of
Determining Running Time
• The most important factor affecting running time is
normally size of the input
• For a given input size we often express the time to
run the algorithm as a function of , written as
• We will always assume is a non-negative value.
• Let us call the amount of time required to compare
two integers in function largest.
• The total time to run largest is therefore
approximately , because we must make
comparisons, with each comparison costing time.
•
• This equation describes the growth rate for the running
time of the largest value sequential search algorithm.
Algorithm Growth Rate
• The growth rate for an algorithm is the rate at which
the cost of the algorithm grows as the size of its input
grows.
• Linear Growth Rate: as the value of n grows, the
running time of the algorithm grows in the same
proportion
• Quadratic Growth Rate: algorithm’s running-time
equation has a highest-order term containing a factor
of
• Logarithmic Growth Rate: algorithm having a
factor in running-time equation
• Exponential Growth Rate: an algorithm having
running time
Input Sizes
• Finding the factorial of
• Just one number as input
• Largest-value sequential search algorithm discussed
in Example 3.1
• This algorithm examines every elements in array
• Array size varies but still algorithm processes
every element in the array
• Even if element is found at the start of array
• So, in fact no best or worst case scenarios
Input Sizes
• For some algorithms, different inputs of a given size
require different amounts of time.
• For example, consider the problem of searching an
array containing integers to find the one with a
particular value (assume that appears exactly
once in the array).
• The sequential search algorithm begins at the first
position in the array and looks at each value in turn
until is found.
• Once is found, the algorithm stops.
Sequential Search
Best, worst and average cases
• There is a wide range of possible running times for
the sequential search algorithm.
• Best Case: The first integer in the array could have
value , and so only one integer is examined.
• Worst Case: if the last position in the array contains
, then the running time is relatively long, because
the algorithm must examine values
• Average Case: is find somewhat in the middle of
the array. On average, the algorithm examines
about values.
What scenario to consider?
• To discuss the algorithm’s performance, what scenario
should be considered?
• Best case rarely happen, and too optimistic to consider
• The advantage to analyzing the worst case is that you know
for certain that the algorithm must perform at least that
well.
• Average cases represent typical behavior of an algorithm
• Unfortunately, average-case analysis is not always possible.
• Average-case analysis first requires that we understand how
the actual inputs to the program (and their costs) are
distributed with respect to the set of all possible inputs to
the program.
• Real-time applications prefer worst-case scenarios.
Otherwise, we resort to average cases.
A Faster Computer, or a Faster
Algorithm?
• To explain this concept, we will take example of
sorting.
• Sorting is a fundamental problem related to data
structures and algorithm analysis.
• In sorting, we have to arrange elements of a data
structure, i.e., array or list in ascending or
descending order
• There are many algorithms that are designed to
solve sorting problem.
• However, not all of them have same performance.
A Faster Computer, or a Faster
Algorithm?
• To explain, our topic we take two famous sorting
algorithms
• Insertion Sort, which take to sort elements
• Merge Sort, which take to sort elements
• Both and are constants which do not depend on
• If we write Insertion sort running time as and
merge sort running time as
• We can see merge sort is faster since is much
smaller than
• For example, when , is
approximately 10, and when equals one million,
is approximately only 20.)
A Faster Computer, or a Faster
Algorithm?
• Although insertion sort usually runs faster than merge
sort for small input sizes, once the input size becomes
large enough, merge sort’s advantage of vs. will
more than compensate for the difference in constant
factors.
• For a concrete example, let us pit a faster computer
(computer A) running insertion sort against a slower
computer (computer B) running merge sort.
• They each must sort an array of 10 million numbers.
(Although 10 million numbers might seem like a lot, if
the numbers are eight-byte integers, then the input
occupies about 80 megabytes, which fits in the memory
of even an inexpensive laptop computer many times
over.)
A Faster Computer, or a Faster
Algorithm?
• Suppose that computer A executes 10 billion instructions
per second (faster than any single sequential computer at
the time of this writing) and computer B executes only 10
million instructions per second, so that computer A is 1000
times faster than computer B in raw computing power.
• To make the difference even more dramatic, suppose that
the world’s craftiest programmer codes insertion sort in
machine language for computer A, and the resulting code
requires instructions to sort numbers.
• Suppose further that just an average programmer
implements merge sort, using a high-level language with an
inefficient compiler, with the resulting code taking
instructions.
A Faster Computer, or a Faster
Algorithm?
• To sort 10 million numbers, computer A takes:
• while computer B takes
A Faster Computer, or a Faster
Algorithm?
• By using an algorithm whose running time grows
more slowly, even with a poor compiler, computer
B runs more than 17 times faster than computer A.
• The advantage of merge sort is even more
pronounced when we sort 100 million numbers:
where insertion sort takes more than 23 days,
merge sort takes under four hours.
• In general, as the problem size increases, so does
the relative advantage of merge sort.
Asymptotic Analysis
• The study of an algorithm as the input size “gets
big” or reaches a limit (in the calculus sense).
• Asymptotic analysis is a form of “back of the
envelope” estimation for algorithm resource
consumption.
• It provides a simplified model of the running time
or other resource needs of an algorithm.
• This simplification usually helps you understand the
behavior of your algorithms.
• Generally, problems with smaller input sizes
consider constants. However, for larger problems
the constants does not matter.
Upper Bounds
• It indicates the upper or highest growth rate that the
algorithm can have.
• Big-Oh Notation used for this purpose.
• If the upper bound for an algorithm’s growth rate (for,
say, the worst case) is , then we would write that
this algorithm is “in the set in the worst case”
(or just “in in the worst case”).
• For example, if grows as fast as (the running
time of our algorithm) for the worst-case input, we
would say the algorithm is “in in the worst case.”
Upper Bounds
• represents the true running time of the algorithm.
• is some expression for the upper bound.
• For a non-negative valued function, is in set
if there exist two positive constants and
such that
• The definition says that for all inputs of the type in
question (such as the worst case for all inputs of size )
that are large enough (i.e., ), the algorithm
always executes in less than steps for some
constant .
• Constant is the smallest value of for which the
claim of an upper bound holds true. Usually, is 1
(but not always)
Upper Bounds
• Some algorithms have the same behavior no matter
which input instance they receive, e.g., searching
maximum in an array.
• But for many algorithms, it makes a big difference, e.g.,
searching an unsorted array for a particular value.
• So any statement about the upper bound of an
algorithm must be in the context of some class of
inputs of size .
• We measure this upper bound nearly always on the
best-case, average-case, or worst-case inputs.
• We always seek to define the running time of an
algorithm with the tightest (lowest) possible upper
bound.
Lower Bounds
• Big-Oh notation describes an upper bound.
• In other words, big-Oh notation states a claim about
the greatest amount of some resource (usually time)
that is required by an algorithm for some class of inputs
of size
• Similar notation is used to describe the least amount
of a resource that an algorithm needs for some class of
input.
• The lower bound for an algorithm (or a problem, as
explained later) is denoted by the symbol ,
pronounced “big-Omega” or just “Omega.”
Lower Bounds
• The following definition for is symmetric with
the definition of big-Oh.
• For a non-negatively valued function, is
in set if there exist two positive constants
and such that for all
Notation
• The definitions for big-Oh and give us ways to describe
the upper bound for an algorithm
• if we can find an equation for the maximum cost of a particular
class of inputs of size
• the lower bound for an algorithm
• if we can find an equation for the minimum cost for a particular
class of inputs of size
• When the upper and lower bounds are the same within a
constant factor, we indicate this by using (Big-Theta)
notation.
• An algorithm is said to be if it is in and it is
in
• Note that we drop the word “in” for notation, because
there is a strict equality for two equations with the same .
• In other words, if is then is
Notation
• The sequential search algorithm is both in
and in in the average case, we say it is
in the average case.
Simplifying Rules
• Once you determine the running-time equation for
an algorithm, it really is a simple matter to derive
the about Big-O, Omega ( ) and Theta ( )
expressions from the equation.
• Following four rules are used to determine the
simplest form.
Rule#1
If is in and is
in then is
in
• The first rule says that if some function is an
upper bound for your cost function, then any upper
bound for is also an upper bound for your cost
function.
• A similar property holds true for notation: If is a
lower bound for your cost function, then any lower
bound for is also a lower bound for your cost
function.
Rule#2
If is in for any
constant then is in
• The significance of rule (2) is that you can ignore
any multiplicative constants in your equations
when using big-Oh notation.
• This rule also holds true for and notations.
Rule#3
If is in and
is in then
is in
• Rule (3) says that given two parts of a program run
in sequence (whether two statements or two
sections of code), you need consider only the more
expensive part.
• This rule applies to and notations as well: For
both, you need consider only the more expensive
part.
Rule#4
If is in and
is in then is
in ).
• Rule (4) is used to analyze simple loops in program
• If some action is repeated some number of times,
and each repetition has the same cost, then the
total cost is the cost of the action multiplied by the
number of times that the action takes place.
• This rule applies to and notations as well.
Ignoring Lower Order Terms
• Taking the first three rules collectively, you can ignore
all constants and all lower-order terms to determine
the asymptotic growth rate for any cost function.
• Ignoring lower-order terms is reasonable when
performing an asymptotic analysis.
• The higher-order terms soon swamp the lower-order
terms in their contribution to the total cost as
becomes larger.
• If then is in
• The terms contributes relatively little to the total
cost for large
Classifying Function
• Given functions and whose growth rates are
expressed as algebraic equations, we might like to
determine if one grows faster than the other.
• The best way to do this is to take the limit of the two
functions as grows towards infinity,
• If the limit goes to ∞, then is in because
grows faster.
• If the limit goes to zero, then is in
because grows faster.
• If the limit goes to some constant other than zero, then
because both grow at the same rate.
Example
Calculating Running Time:
Constant Running Time
Running Time of for loop
Running Time of Several for
loops
Running Time of Several for
loops
Nested for loops
Nested for loops
Nested for loops-Another
Example
Nested for loops-Another
Example
Running Time of while loop
• While loops are analyzed in a manner similar to for
loops.
• The cost of an if statement in the worst case is the
greater of the costs for the then and else clauses.
• This is also true for the average case, assuming that the
size of does not affect the probability of executing
one of the clauses (which is usually, but not necessarily,
true).
• For switch statements, the worst-case cost is that of
the most expensive branch.
• For subroutine calls, simply add the cost of executing
the subroutine.
Binary Search
Estimating Running Time for
Binary Search
Estimating Running Time for
Binary Search