Algorithm Design and Complexity - Course 1&2


Introduction


Algorithms and problems



We need to be able to provide solutions to problems





Any domain has problems that require an algorithmic solution
Find the best solution from a wide range of choices
Learn methods to develop solutions



Problem => Idea => Solution => Algorithm =>
Pseudocode => Code => Compiled Program



We need to design algorithms

Introduction (2)


Any non-trivial problem accepts a wide range of solutions



Need to compare these solutions in order to find the best
one => Complexity



Need to show that the devised solution solves the
problem => Correctness



Not all problems have an algorithmic solution!

Introduction (3)


Some problems are similar (or slight variations of one another)





Accept similar solutions
Need to learn to recognize when two problems are similar

Some methods used for designing algorithms provide
solutions for different problems



Need to understand these methods in order to know how to use
them for as many problems as possible
The problems that can be solved using one method have some
common properties!

Course Info


Lectures: Traian Rebedea














Ph.D. @ University Politehnica of Bucharest, CS Dept.
traian.rebedea@cs.pub.ro / trebedea@gmail.com
Lecturer @ Computer Science Department
Interests: NLP, IR/IE, ML, TEL/CSCL, AI in general
Published over 25 papers at important conferences
Published 4 book chapters (2 in important international books)
Worked at 3 companies, founded 1 company
http://www.informatik.uni-trier.de/~ley/db/indices/atree/r/Rebedea:Traian.html
http://ro.linkedin.com/in/trebedea

Course Website: http://adcfils.wordpress.com/

Course Overview


Introduction
    Problems and Decidability
Complexity of Algorithms
Correctness of Algorithms (short)
Algorithm Design Techniques
Graph Algorithms & Applications
    Searches, topological sort, articulation points, bridges, strongly
    connected components, minimum spanning trees, shortest
    paths, network flow
Classes of Problems

Grading



Exam: 40p
Lab & Assignments: 60p



Assignments: 45p (3 x 15p)
Lab activity: 15p




The lab assistant decides how to grade the lab activity

Rules:






Minimum 30p for lab & assignments
Minimum 15p for the exam
Minimum 50p for total
You are not allowed to copy solutions from colleagues or
from the Web (checked with MOSS, Measure of Software Similarity)

Textbooks & More Info



Cormen T.H, Leiserson C.E, Rivest R.L and Stein C, Introduction to
Algorithms, Second Edition. MIT Press, 2001
Baase S and A. Van Gelder. Computer Algorithms. Introduction to
Design & Analysis, Addison-Wesley, 2000



References for each chapter



Introduction to Algorithms @ MIT
    also has video lectures
Coursera: Algorithms: Design and Analysis (part 1 and part 2)
    https://www.coursera.org/course/algo,
    https://www.coursera.org/course/algo2

Websites for programming exercises: TopCoder, Infoarena, Talent
Buddy, HackerRank, Project Euler

Problems and Algorithms


Problems 1 – n Algorithms
    1+ algorithms to solve each problem
    An algorithm usually solves only 1 problem

Problem: Sorting
    Given an array A[1..n] with n numbers, arrange the elements in
    the array such that any two consecutive elements are in order
    (A[i] <= A[i + 1] for i = 1..n-1)
    Arrays are written A[1..n]

Algorithms:


Quick Sort, Merge Sort, Heap Sort, Bubble Sort, Insertion Sort,
Selection Sort, Radix Sort, Bucket Sort, …
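
The sortedness condition in the problem statement can be checked directly. A minimal Python sketch, assuming 0-based lists (so the condition becomes a[i] <= a[i + 1] for i = 0..n-2); the helper name `is_sorted` is illustrative and does not come from the slides:

```python
def is_sorted(a):
    """Return True if every pair of consecutive elements is in order."""
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))


print(is_sorted([1, 2, 2, 5]))  # True
print(is_sorted([3, 1, 2]))     # False
```

Any of the sorting algorithms listed above must produce an output for which this check succeeds.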

Problems & Computability


There are a lot of problems


We would like to find solutions for all of them



Not all the problems can be solved!



The problems that can be solved are called computable
or decidable problems



The problems that cannot be solved are:



Very difficult
Not clear enough (need for subjective reasoning)

Problems & Computability (2)


We would all like to know:



Which stocks or bonds will rise tomorrow?
Which football team will win a game?
Whom should I marry?
Is there a God?
Are there any aliens in the universe?



Who is the most beautiful girl in the world?








Need for subjective thinking: what does “the most beautiful” mean?

Example


This example should be understood as a metaphor
    The physics and astronomy parts of it may be wrong

Problem: Is there any alien life in the universe?

Assumption: the universe is infinite and there are an infinite
number of celestial bodies

Solution:
    Explore all the planets, suns and other celestial bodies
    Use any exploring method, it can be as good as you want
    Explore each celestial body with a perfect scanner
        If you find life on it => ANSWER: YES! Alien life exists
        Else: continue to the next celestial body
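
The exploration procedure can be sketched as a loop over an endless enumeration. This is only an illustration of the metaphor: the function name `alien_life_exists` and the `has_life` "perfect scanner" are hypothetical, and celestial bodies are assumed to be numbered 0, 1, 2, ...

```python
from itertools import count


def alien_life_exists(has_life):
    """Scan celestial bodies 0, 1, 2, ... one after another.

    `has_life` stands in for the hypothetical perfect scanner: it takes
    the index of a celestial body and returns True if that body hosts life.
    The function returns True as soon as life is found. If no body hosts
    life, the loop never ends, so the answer NO is never produced.
    """
    for body in count():      # infinite enumeration: 0, 1, 2, ...
        if has_life(body):
            return True       # ANSWER: YES
        # else: continue to the next celestial body


# Terminates only because this toy scanner reports life on body 42.
print(alien_life_exists(lambda body: body == 42))  # True
```

Calling the function with a scanner that always returns False would run forever, which is exactly the behaviour the metaphor is meant to convey.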

Example (2)


The previous solution has a flaw







It never answers NO!

If the answer to our problem were NO, then we would
have to wait an infinite amount of time
We cannot stop the solution at any moment in time and
conclude for certain that the answer is NO, because
maybe we still have a celestial body with alien life on it
that has not been explored yet!
This kind of problem is called undecidable!

Undecidable Problems


Problems that cannot be solved with an algorithmic solution




We can devise an algorithm, but that algorithm will never finish in
some situations… therefore we cannot know the answer to our
problem!

Quick info:


A decision problem = a problem for which the answer is {yes, no}
    Is n a prime number?

An optimum problem = a problem for which we need to find the
optimum solution out of a set of possible solutions
    Which is the shortest path between two vertices in a graph?
    Optimum = minimum or maximum

Undecidable Problems (2)


Any decision problem can be:



Decidable – the problem can always be solved with an algorithm
that always finishes!
Semi-decidable – we can devise a pseudo-algorithm for solving the
problem that finishes only if the answer is YES, and never
finishes otherwise!
    Therefore, while it is still running we cannot know whether the
    answer is YES or NO, but if the algorithm stops, then the answer
    is certainly YES
Undecidable – cannot know if the answer is YES…

The previous example is a semi-decidable problem
However, most of the time all the problems that are not
decidable (semi-decidable or undecidable) are simply called undecidable



Why is the example a metaphor?
    Because any problem that is not decidable must have an
    infinite problem space!

All the problems that have a finite number of states that
form the space of exploration for that problem are
decidable
    E.g. there is a finite number of ways to arrange the numbers in an
    array
    E.g. there is a finite number of ways to arrange a set of
    queens on a chess board

Very difficult problems have an infinite space that should
be explored in order to find the solution!
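
To illustrate why a finite exploration space implies decidability, here is a deliberately naive Python sketch that sorts by trying every arrangement of the array. The function name `sort_by_exhaustive_search` is an illustrative assumption; the point is only that the loop visits at most n! candidates and therefore always finishes.

```python
from itertools import permutations


def sort_by_exhaustive_search(a):
    """Try every arrangement of `a` and return the first sorted one.

    There are exactly n! arrangements of n elements, so the loop
    always terminates: the search space is finite.
    """
    for candidate in permutations(a):
        if all(candidate[i] <= candidate[i + 1] for i in range(len(candidate) - 1)):
            return list(candidate)


print(sort_by_exhaustive_search([3, 1, 2]))  # [1, 2, 3]
```

This is hopelessly slow for anything but tiny inputs, but it always gives an answer, unlike a search over an infinite space.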



Quick info:




There is an uncountably infinite number of problems in this
world
There is only a countably infinite number of programs in
this world
Alonzo Church's thesis (the Church-Turing thesis) states that the
problems that can be computed (are decidable) are exactly the ones
that can have a program associated to them, that is used for solving them
    Only a countably infinite set of problems is decidable
    The rest are not decidable

Classic Problems that are NOT Decidable


Halting Problem:


Given a program P and an input x, does P(x) halt (finish)?
    YES if P(x) finishes in a finite amount of time
    NO if P(x) never finishes (maybe because it loops forever)



Barber Problem/Paradox



Post’s Correspondence Problem

Barber Paradox





http://en.wikipedia.org/wiki/Barber_paradox
Suppose there is a town with just one male barber; and
that every man in the town keeps himself clean-shaven:
some by shaving themselves, some by attending the
barber. It seems reasonable to imagine that the barber
obeys the following rule: He shaves all and only those men
in town who do not shave themselves.
Does the barber shave himself?

Barber Paradox (2)


The situation presented is in fact impossible:
1. If the barber does not shave himself, he must abide by
   the rule and shave himself.
2. If he does shave himself, according to the rule he will
   not shave himself

Post’s correspondence problem



http://en.wikipedia.org/wiki/Post_correspondence_problem
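The input of the problem consists of two finite lists
$\alpha_1, \ldots, \alpha_N$ and $\beta_1, \ldots, \beta_N$
of words over some alphabet A having at least two symbols.
A solution to this problem is a sequence of indices $(i_k)_{1 \le k \le K}$
with $K \ge 1$ and $1 \le i_k \le N$
for all k, such that
$\alpha_{i_1} \ldots \alpha_{i_K} = \beta_{i_1} \ldots \beta_{i_K}$.

The decision problem then is to decide whether such a
solution exists or not.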
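
A bounded brute-force search makes the problem concrete. The sketch below is an illustration only: the function name `pcp_bounded_search` and the `max_len` cut-off are assumptions, and indices are 0-based. Without the cut-off the search would behave like a semi-decision procedure: it would stop when a solution exists and run forever otherwise.

```python
from itertools import product


def pcp_bounded_search(alphas, betas, max_len):
    """Look for a Post correspondence solution using at most `max_len` indices.

    Returns a tuple of 0-based indices (i_1, ..., i_K) such that the
    concatenation of alphas[i_k] equals the concatenation of betas[i_k],
    or None if no solution of length <= max_len exists.
    """
    n = len(alphas)
    for k in range(1, max_len + 1):                 # K = 1, 2, ..., max_len
        for indices in product(range(n), repeat=k):
            top = "".join(alphas[i] for i in indices)
            bottom = "".join(betas[i] for i in indices)
            if top == bottom:
                return indices
    return None


alphas = ["a", "ab", "bba"]
betas = ["baa", "aa", "bb"]
print(pcp_bounded_search(alphas, betas, max_len=4))  # (2, 1, 2, 0)
```

For this instance both concatenations spell "bbaabbbaa". A result of None only says that no solution was found within the bound, not that none exists.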

Complexity of Algorithms


Need to find the best algorithm for solving a problem
Is algorithm A better than algorithm B ?



A measure for the performance of an algorithm



Simple practical solution:








Implement the algorithms
Measure their running times on a given machine

But we want to measure the performance of an algorithm:



Independent of the machine and language it is implemented in
Without spending time on implementing it
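
The "simple practical solution" can be sketched as follows. This is a minimal illustration, not a benchmarking method: `bubble_sort` and `measure` are hypothetical helper names, the input size of 2000 is arbitrary, and the printed times depend entirely on the machine and the Python version, which is precisely the drawback noted above.

```python
import random
import time


def bubble_sort(a):
    """Simple quadratic sort, used here only as a slow baseline."""
    a = list(a)
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def measure(sort_function, data):
    """Return (result, elapsed seconds) for one run of `sort_function`."""
    start = time.perf_counter()
    result = sort_function(data)
    return result, time.perf_counter() - start


data = [random.randint(0, 10**6) for _ in range(2000)]
for name, function in [("bubble_sort", bubble_sort), ("sorted", sorted)]:
    _, seconds = measure(function, data)
    print(f"{name}: {seconds:.4f} s")
```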

Complexity of Algorithms (2)


We need a theoretical framework for measuring the
performance of an algorithm



Performance






Time = How quickly does the algorithm compute the results?
Space = How much memory does it need?

Focus on time performance



Moore’s Law: processing power evolves less quickly than
storage capacity
Space constraints are rarely an issue: related to RAM size

Running Time


Measure of the time complexity of an algorithm



It is a theoretical measure that depends on the input
data and the processing performed by an algorithm



We define the running time as a function that only
depends on the size of the input data





The size of the input data is measured by positive integers
For arrays: A[n]
For graphs: G(V, E), |V| = n, |E| = m
For multiplying two matrices (3 parameters): lines1, columns1, columns2

Running Time (2)


We shall only discuss in this chapter running times that
depend on a single parameter: T(n)


The discussion can be easily extended to more parameters



T(n) is the running time for an algorithm that has an input
data of size n



T(n) is a function



T(n): N → R+

Example – Insertion Sort





http://en.wikipedia.org/wiki/Insertion_sort
Problem: Sorting an array A[n]
Solution: Insertion sort
Every repetition of the main loop of insertion sort
removes an element from the input data, inserting it into
the correct position in the already-sorted list, until no
input elements remain.



The already-sorted list is the sub-array on the left side
Usually, the removed element is the next one

Insertion Sort - Pseudocode
InsertionSort( A[1..n] )
1. FOR (j = 2 .. n)
2.     x = A[j]                        // element to be inserted
3.     i = j – 1                       // position on the right side of
                                       // the sorted sub-array
4.     WHILE (i > 0 AND x < A[i])      // while not in position
5.         A[i + 1] = A[i]             // move to right
6.         i--                         // continue
7.     A[i + 1] = x
8. RETURN A
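
A runnable counterpart of the pseudocode, as a minimal Python sketch. It assumes 0-based lists, so the outer loop starts at index 1 and the WHILE condition becomes `i >= 0`; the function name `insertion_sort` is illustrative.

```python
def insertion_sort(a):
    """Sort the list `a` in place and return it (0-based indices)."""
    for j in range(1, len(a)):
        x = a[j]                  # element to be inserted
        i = j - 1                 # right end of the sorted sub-array
        while i >= 0 and x < a[i]:
            a[i + 1] = a[i]       # move to the right
            i -= 1
        a[i + 1] = x
    return a


print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```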

Example


From Erik D. Demaine and Charles E. Leiserson –
Introduction to Algorithms@ MIT

Analysis of Complexity


What is the complexity of Insertion Sort ?



General solution for the running time:









Each simple instruction takes a constant amount of time
This is clearly a simplification as the execution of an instruction
depends on the operands
Simple instructions: assignments, logical and arithmetic operations on
numbers, print/scan of a number, return, …
Complex instructions: calls to other functions

T(n) = sum over the running time of each instruction
The running time of an instruction = Running time to execute
it once * Number of times it is executed

Analysis of Complexity – General
Instruction nbr    Running time – execute once    Number of times it is executed
1                  C1 (a constant)                n
2                  C2                             n – 1
3                  C3                             n – 1
4                  C4                             T1
5                  C5                             T2
6                  C6                             T2
7                  C7                             n – 1
8                  C8                             1

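$T(n) = C_1 n + (C_2 + C_3 + C_7)(n - 1) + C_4 T_1 + (C_5 + C_6) T_2 + C_8$
$T_1 = \sum_{j=2}^{n} t_j$, where $t_j$ is the number of times the WHILE test (instruction 4) is executed for a given j
$T_2 = \sum_{j=2}^{n} (t_j - 1)$
Difficult to say more about the general form of the running time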

Analysis of Complexity – General (2)


General form of the running time cannot be expanded
because it depends on the structure of the input data




T1, T2 = ??

However, there are interesting special cases that can be
easily computed:




Worst case
Best case
Average case ?

Worst Case Complexity





Happens when the array is sorted in descending order
In this case, every element x = A[j] is lower than all
the previous elements
Therefore, each one must be moved to the beginning of the
array
Thus:
$t_j = j$ (i goes from j – 1 down to 0)
$T_1 = \sum_{j=2}^{n} j = n(n+1)/2 - 1$
$T_2 = \sum_{j=2}^{n} (j - 1) = n(n-1)/2$
$T_{worst}(n) = a n^2 + b n + c$

quadratic time
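
Substituting the worst-case values of T1 and T2 into the general formula for T(n) gives the coefficients a, b and c explicitly. A worked expansion, assuming the single RETURN (instruction 8 in the table, cost C8) is also counted once:

```latex
\begin{aligned}
T_{worst}(n) &= C_1 n + (C_2 + C_3 + C_7)(n - 1)
              + C_4\left(\frac{n(n+1)}{2} - 1\right)
              + (C_5 + C_6)\,\frac{n(n-1)}{2} + C_8 \\
             &= \underbrace{\frac{C_4 + C_5 + C_6}{2}}_{a}\, n^2
              + \underbrace{\left(C_1 + C_2 + C_3 + C_7 + \frac{C_4 - C_5 - C_6}{2}\right)}_{b}\, n
              + \underbrace{\left(C_8 - C_2 - C_3 - C_4 - C_7\right)}_{c}
\end{aligned}
```

Only the instructions inside the WHILE loop (4, 5 and 6) contribute to the quadratic term.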

Best Case Complexity





Happens when the array is sorted in ascending order
In this case, every element x = A[j] is higher than all
the previous elements
Therefore, they are not moved
Thus:
$t_j = 1$
$T_1 = \sum_{j=2}^{n} 1 = n - 1$
$T_2 = \sum_{j=2}^{n} (1 - 1) = 0$
$T_{best}(n) = b_1 n + c_1$

linear time
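
The worst-case and best-case counts can be checked empirically by instrumenting insertion sort. A minimal Python sketch, assuming 0-based lists; the function name `count_while_tests` is illustrative. It returns T1 (number of evaluations of the WHILE test) and T2 (number of executions of the WHILE body), which should match n(n+1)/2 - 1 and n(n-1)/2 for a descending input, and n - 1 and 0 for an ascending one.

```python
def count_while_tests(a):
    """Run insertion sort on a copy of `a` and return (T1, T2).

    T1 = total number of times the WHILE test is evaluated,
    T2 = total number of times the WHILE body (a shift) is executed.
    """
    a = list(a)
    t1 = t2 = 0
    for j in range(1, len(a)):
        x = a[j]
        i = j - 1
        while True:
            t1 += 1                       # one evaluation of the WHILE test
            if not (i >= 0 and x < a[i]):
                break
            a[i + 1] = a[i]
            i -= 1
            t2 += 1                       # one execution of the WHILE body
        a[i + 1] = x
    return t1, t2


n = 10
print(count_while_tests(range(n, 0, -1)))  # worst case: (54, 45)
print(count_while_tests(range(1, n + 1)))  # best case: (9, 0)
```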

Average Case Complexity





It is interesting to compute it precisely
It is very difficult to compute it precisely!
    Should take into consideration the distribution of the input
    data and sum up over all possible instances of the input data,
    weighted by their probability
    See example of formula on the blackboard
    Not feasible

Simpler solution: on average, an element x = A[j] is inserted in
the middle of the already-sorted list
    Recompute T1 and T2 for this case => still a quadratic solution
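
The recomputation can be sketched as follows, under the simplifying assumption that the WHILE test runs about j/2 times for each j (the element stops halfway through the sorted sub-array). This is an approximation for illustration, not an exact average-case analysis:

```latex
\begin{aligned}
t_j &\approx \frac{j}{2} \\
T_1 &\approx \sum_{j=2}^{n} \frac{j}{2} = \frac{1}{2}\left(\frac{n(n+1)}{2} - 1\right) = \frac{n^2 + n - 2}{4} \\
T_2 &\approx \sum_{j=2}^{n} \left(\frac{j}{2} - 1\right) = \frac{n^2 + n - 2}{4} - (n - 1) = \frac{(n-1)(n-2)}{4}
\end{aligned}
```

Both sums are still quadratic in n, so the average case has the same order of growth as the worst case, with roughly half the leading coefficient.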

Conclusions



General formula for the complexity of an algorithm is
usually incomplete due to the influence of the structure
of the input data
Average case is interesting, but difficult to compute



Solution: only compute worst case!



Makes sense for a lot of applications: ABS braking
algorithm should be good on worst case, the same for
computing reports for your boss, …
On most occasions, average case has the same order of
growth as the worst case complexity





hope you’re not sleeping yet 

Asymptotic Notations


Current simplifications for computing complexity of
algorithms:




Constant amount of time for simple instructions
Interested in worst case most of the time
Only interested in the asymptotic behavior of T(n)



Asymptotic: $T(n)$ as $n \to \infty$



These notations are not used only for running times, but
for any function of the form f(n) : N → R+


|sin(n)|, 1/n, 2sin(n)+1 are functions that cannot be running times

Big-O Notation



http://en.wikipedia.org/wiki/Big_O_notation
Upper asymptotic bound

Big-Ω Notation

Omega
Lower asymptotic bound

Θ Notation



Theta
Order of growth
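
The formal definitions are not reproduced in this text (they were presumably shown as formulas on the slides). For reference, the standard textbook definitions for functions f, g : N → R+ are given below; the constant names c, c_1, c_2 and n_0 follow the usual convention and are not taken from the slides.

```latex
\begin{aligned}
O(g(n))      &= \{\, f(n) : \exists\, c > 0,\ n_0 \in \mathbb{N} \text{ such that } 0 \le f(n) \le c\, g(n) \ \ \forall n \ge n_0 \,\} \\
\Omega(g(n)) &= \{\, f(n) : \exists\, c > 0,\ n_0 \in \mathbb{N} \text{ such that } 0 \le c\, g(n) \le f(n) \ \ \forall n \ge n_0 \,\} \\
\Theta(g(n)) &= \{\, f(n) : \exists\, c_1, c_2 > 0,\ n_0 \in \mathbb{N} \text{ such that } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \ \ \forall n \ge n_0 \,\}
\end{aligned}
```

For example, $2n^2 + n \in \Theta(n^2)$ with $c_1 = 2$, $c_2 = 3$ and $n_0 = 1$, because $2n^2 \le 2n^2 + n \le 3n^2$ for all $n \ge 1$.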

Remarks



It is important to compute the order of growth for
algorithms
The asymptotic notations define sets of functions


See picture on blackboard



Sometimes, the Big-O notation is used as a substitute for
the Θ notation

Θ notation – equivalence relation for functions of the
form f(n) : N → R+
Big-O, Big-Ω notations – partial order relations



Equivalence Relation


Three important properties:
1. Reflexivity
2. Symmetry
3. Transitivity

It partitions the functions into equivalence classes:
    Each class has a representative function
    Obtained by removing all lower degree terms and removing
    any constants from the highest degree term

$\Theta(1)$, $\Theta(\log \log n)$, $\Theta(\log n)$, $\Theta(n \log n)$, $\Theta(n^2)$, $\Theta(n^3)$, …,
$\Theta(2^n)$, $\Theta(n!)$, $\Theta(n^n)$

Partial Order Relations

Three important properties:
1. Reflexivity
2. Anti-symmetry
3. Transitivity

$f(n) \in O(g(n))$ … f(n) “<=” g(n)
$f(n) \in \Omega(g(n))$ … f(n) “>=” g(n)
$f(n) \in \Theta(g(n))$ … f(n) “~=” g(n)
Partial order because some functions cannot be compared:
e.g. $n$ and $n^{\sin(n)+1}$

Small-o and Small-ω Notations

$f(n) \in O(g(n))$ can be:
    Tight: $2n^2 + n \in O(n^2)$
    Loose: $5n \in O(n^2)$

Therefore, small-o is always loose:
    $f(n) \in o(g(n))$ … f(n) “<” g(n)

Similarly:
    $f(n) \in \omega(g(n))$ … f(n) “>” g(n)

Asymptotic Notations Used in Equations



For any function on the left side that is part of the set
defined by that asymptotic notation,
there is a function on the right side that is part of the set
defined by that asymptotic notation, such that the equation holds
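
A standard textbook illustration of this reading (the specific equation is an example chosen here, not one taken from the slides):

```latex
\begin{aligned}
& 2n^2 + \Theta(n) = \Theta(n^2) \\
& \text{means: } \forall f(n) \in \Theta(n)\ \ \exists h(n) \in \Theta(n^2) \text{ such that } 2n^2 + f(n) = h(n) \text{ for all } n
\end{aligned}
```

For instance, taking $f(n) = 3n$ gives $h(n) = 2n^2 + 3n$, which is indeed in $\Theta(n^2)$.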

Exercises – Set 1
What is the complexity of the following algorithms ?


Matrix_add_1 (A[n][n], B[n][n]) {
    for (i = 1,n) {
        for (j = 1,n) {
            C[i][j] = 0
        }
    }
    for (i = 1,n) {
        for (j = 1,n) {
            C[i][j] = A[i][j] + B[i][j]
        }
    }
    return C
}

Matrix_add_2 (A[n][n], B[n][n]) {
    for (i = 1,n) {
        for (j = 1,n) {
            B[i][j] = A[i][j] + B[i][j]
        }
    }
    return B
}

Maximum Subsequence Sum Problem


Given (possibly negative) integers a1, a2, ..., an, find the
maximum value of $\sum_{k=i}^{j} a_k$. The maximum subsequence
sum is defined to be 0 if all the integers are negative

Let A[1] … A[N] be an array of integers that contains a
sequence of length N.
Let sum and maxSum be integers initialized to 0.
For integer i = 1 to N do
    Let sum = 0
    For integer j = i to N do
        Let sum = sum + A[ j ]
        If( sum > maxSum ) then
            Let maxSum = sum
Return maxSum
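
The double loop above examines every pair (i, j). A linear-time alternative is the technique commonly known as Kadane's algorithm, named here explicitly because the slides do not say which linear solution they have in mind. A minimal Python sketch, with the function name `max_subsequence_sum` chosen for illustration:

```python
def max_subsequence_sum(a):
    """Kadane's algorithm: one pass over the input, linear time.

    Returns 0 when all integers are negative (the empty subsequence),
    matching the definition used in the problem statement.
    """
    best = current = 0
    for value in a:
        current = max(0, current + value)  # best sum of a run ending here
        best = max(best, current)
    return best


print(max_subsequence_sum([-2, 11, -4, 13, -5, -2]))  # 20
```

The running sum is reset to 0 whenever it would become negative, since a negative prefix can never improve a later subsequence.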



There also exist solutions in:
    $\Theta(n)$
    $\Theta(n \log n)$
    $\Theta(n^3)$
    http://www.wou.edu/~broegb/Cs345/MaxSubsequenceSum.pdf

Keep in Mind


Are there things more important than the performance
of an algorithm ?



Maybe:









Correctness
Modularity
Maintainability
Robustness
User-friendliness
Programmer time
Extensibility
Reliability

References


CLRS – Chapters 1-3



MIT OCW – Introduction to Algorithms – video lectures
1-2
