LECTURE 03
COMPUTATIONAL COMPLEXITY OF
SEARCHING AND SORTING
ALGORITHMS
MUST KNOW GRAPH THEORY
Graph theory (GT) is closely related to searching and
sorting algorithms
GT provides a useful framework to understand and
analyse the behaviour of both searching and sorting
algorithms
If you can’t remember any of it, please review Graph
Theory in Data Structures and Algorithms
Assumption:
You know about edges and nodes
You know about directed/undirected/weighted graphs
You know about adjacency lists/matrices
You know about trees, spanning trees, forests, subgraphs
Adjacency Matrix vs. Adjacency List
Compare                 Adjacency Matrix                      Adjacency List
Speed                   Faster when the algorithm needs to    Faster when the algorithm processes
                        know the direct connection between    the linked list from node to node
                        two specific vertices
Spatial complexity      O(n²)                                 O(n)
(effect on memory
usage as the problem
size is varied)
Vertex info array       O(n)                                  O(n)
Adjacency Matrix vs. Adjacency List
(cont.)
Both spatial complexities, for the adjacency matrix and
the adjacency list, absorb the O(n) spatial complexity
used in the vertex info array
The spatial complexity of the adjacency list really
depends on whether the graph is sparse or dense
Adjacency Matrix vs. Adjacency List
(cont.)
A sparse graph:
has few edges: it does not have that many edges relative
to the number of vertices
if the number of edges is less than or equal to n,
then the spatial complexity of the adjacency list is
O(n)
A dense graph:
may have close to n² edges (the number of edges is close
to the maximal number of edges)
the spatial complexity of the adjacency list is O(n²),
the same as that for the adjacency matrix
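The trade-off above can be sketched in a few lines of Python (the small graph used here is an assumed example, not one from the slides): the matrix always allocates n² cells, while the list stores one entry per edge endpoint.

```python
# Hypothetical example: an undirected graph on vertices 0..3 with
# edges (0,1), (0,2), (1,2), (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: always n*n entries -> O(n^2) space.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1

# Adjacency list: one entry per edge endpoint -> space grows with the
# number of edges, which is what makes sparse graphs cheap.
adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[0][1])   # O(1) direct-connection lookup in the matrix
print(adj[2])         # neighbours of vertex 2 from the list
```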
Dense vs Sparse Graphs
HOW IMPORTANT IS SEARCHING and
SORTING?
• Among the most common algorithms used in solving problems
• Searching:
• E.g., in deep learning, searching for an optimal objective
function within a short time is important
• Mining outliers among the input data
• Sorting:
• E.g., many searches first require the items to be sorted
SEARCHING ALGORITHMS
• Depth-First Search | Breadth-First Search
Terminology
End vertices (or endpoints) of an edge
U and V are the endpoints of edge a
Edges incident on a vertex
a, d, and b are incident on V
Adjacent vertices: connected by an edge
U and V are adjacent
Degree of a vertex
X has degree 5
Parallel edges
h and i are parallel edges
Self-loop
j is a self-loop
(Figure: an example graph on vertices U, V, W, X, Y, Z with edges a–j)
Properties
Notation
n = number of vertices
m = number of edges
deg(v) = degree of
vertex v
Example
(Figure: a graph with a marked vertex v)
n = 4
m = 6
deg(v) = 3
Properties
Handshaking Theorem - Property 1
Σv deg(v) = 2m
Proof: each edge contributes twice to the total degree
Property 2
In an undirected graph with no self-loops and no
multiple edges:
m ≤ n(n − 1)/2
Proof: each vertex has degree at most (n − 1)
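Both properties are easy to check mechanically; a small Python sketch on an assumed example edge list:

```python
# Quick check of the handshaking theorem on an assumed example graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
n, m = 4, len(edges)

deg = [0] * n
for u, v in edges:          # each edge adds 1 to the degree of both endpoints
    deg[u] += 1
    deg[v] += 1

assert sum(deg) == 2 * m        # Property 1: total degree is twice the edges
assert m <= n * (n - 1) // 2    # Property 2: simple undirected graph bound
```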
Properties
Directed graph
Indegree, deg-(v)
Outdegree, deg+(v)
Theorem:
Let G = (V, E) be a graph with directed edges. Then
Σv∈V deg+(v) = Σv∈V deg−(v) = |E|
Proof: The first sum counts the number of outgoing
edges over all vertices and the second sum counts
the number of incoming edges over all vertices. It
follows that both sums equal the number of edges in
the graph.
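The same mechanical check works for the directed version, again on an assumed example digraph:

```python
# Assumed example digraph: edges are (source, destination) pairs.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]

indeg, outdeg = {}, {}
for u, v in edges:
    outdeg[u] = outdeg.get(u, 0) + 1   # edge leaves u
    indeg[v] = indeg.get(v, 0) + 1     # edge enters v

# Sum of outdegrees = sum of indegrees = |E|.
assert sum(outdeg.values()) == sum(indeg.values()) == len(edges)
```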
Asymptotic Performance
n vertices, m edges; no parallel edges; no self-loops
Bounds are “big-Oh”

                      Adjacency List        Adjacency Matrix
Space                 n + m                 n²
incidentEdges(v)      deg(v)                n
areAdjacent(v, w)     min(deg(v), deg(w))   1
insertVertex(o)       1                     n²
insertEdge(v, w, o)   1                     1
removeVertex(v)       deg(v)                n²
removeEdge(e)         1                     1
Exercise
Given the graph below, identify the following
properties
Degree of vertex 10
Vertices adjacent to 6
Edges incident on 9
Subgraphs
A subgraph S of a graph G is a graph such that
The vertices of S are a subset of the vertices of G
The edges of S are a subset of the edges of G
A spanning subgraph of G is a subgraph that contains
all the vertices of G
Spanning subgraph
Subgraph
Connectivity
A graph is connected if there is a path between every
pair of vertices
A connected component of a graph G is a maximal
connected subgraph of G
(Figures: a connected graph, and a non-connected graph with two connected components)
Graph
To search connectivity between two vertices
To travel from one node to the destination node
How to efficiently perform that?
Graph Traversal
Depth-First Search: stack; explores in depth
Breadth-First Search: queue; explores level by level
Application
DFS                                        BFS
Find a path between two vertices           Determine shortest paths & minimum spanning tree
Detect cycles in a graph                   Used by search-engine crawlers to build indexes of web pages
For topological sorting                    Find available neighbour nodes in a peer-to-peer network
Solve puzzles having only one solution,    Search on social networks
such as a maze
Graph Traversal 1: Depth-First Search
Depth-first search
Algorithm
Example
Properties
Analysis
Depth-First Search
A DFS traversal of graph G
Visits all the vertices and edges of G
Determines whether G is connected
Computes the connected components of G
Computes a spanning forest of G
DFS on a graph with n vertices and m edges takes O(n + m)
time
DFS can be further extended to solve other graph problems
Find and report a path between two given vertices
Find a cycle in the graph
Depth-first search is to graphs what an Euler tour (travel each
edge exactly once; start and end at the same vertex) is to binary
trees
Recursive DFS Algorithm
(Figure: a triangle graph on vertices A, B, C with edges a, b, c)
The algorithm uses a mechanism for setting and
getting “labels” of vertices and edges

Algorithm DFS(G)
  Input: graph G
  Output: labeling of the edges of G as discovery edges and back edges
  for all u in G.vertices()           { O(n) }
    setLabel(u, UNEXPLORED)
  for all e in G.edges()              { O(m) }
    setLabel(e, UNEXPLORED)
  for all v in G.vertices()
    if getLabel(v) = UNEXPLORED
      DFS(G, v)

Algorithm DFS(G, v)
  Input: graph G and a start vertex v of G
  Output: labeling of the edges of G in the connected component of v
          as discovery edges and back edges
  setLabel(v, VISITED)
  for all e in G.incidentEdges(v)     { deg(v) }
    if getLabel(e) = UNEXPLORED
      w ← opposite(v, e)
      if getLabel(w) = UNEXPLORED
        setLabel(e, DISCOVERY)
        DFS(G, w)
      else
        setLabel(e, BACK)

The incidentEdges loop is where DFS searches for neighbour nodes; its time
complexity depends on whether a matrix or a list is applied
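A minimal runnable sketch of the recursive algorithm in Python (the graph, its dict-of-lists representation, and the frozenset edge keys are assumptions for illustration):

```python
# Recursive DFS labelling discovery and back edges, following the
# pseudocode above. Edges of the undirected graph are keyed by the
# frozenset of their endpoints.
def dfs(graph):
    visited = set()
    edge_label = {}                # frozenset({u, v}) -> "DISCOVERY" / "BACK"

    def visit(v):
        visited.add(v)
        for w in graph[v]:
            e = frozenset((v, w))
            if e not in edge_label:        # edge still UNEXPLORED
                if w not in visited:
                    edge_label[e] = "DISCOVERY"
                    visit(w)
                else:
                    edge_label[e] = "BACK"

    for v in graph:                # restart in every unexplored component
        if v not in visited:
            visit(v)
    return edge_label

graph = {"A": ["B", "C", "D"], "B": ["A", "C"],
         "C": ["A", "B"], "D": ["A"]}
labels = dfs(graph)
# Edge C-A closes the cycle A-B-C, so exactly one edge is labelled BACK;
# the three DISCOVERY edges form a spanning tree of the component.
```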
Example
Legend:
  A (white): unexplored vertex
  A (shaded): visited vertex
  unexplored edge / discovery edge / back edge
(Figures: successive DFS steps on a graph with vertices A, B, C, D, E)
DFS and Maze Traversal
The DFS algorithm is similar to a classic strategy for
exploring a maze
We mark each intersection, corner and dead end
(vertex) as visited and each corridor (edge) as traversed
We keep track of the path back to the entrance (start
vertex) by means of a rope (the recursion stack)
Properties of DFS
Property 1
  DFS(G, v) visits all the vertices and edges in
  the connected component of v
Property 2
  The discovery edges labeled by DFS(G, v)
  form a spanning tree of the connected
  component of v
(Figure: a DFS spanning tree on vertices A, B, C, D, E)
Analysis of DFS (Adjacency List)
Setting/getting a vertex/edge label takes O(1) time
Each vertex is labeled twice
  once as UNEXPLORED
  once as VISITED
Each edge is labeled twice
  once as UNEXPLORED
  once as DISCOVERY or BACK
For a directed graph
  each vertex and edge is discovered once, in linear time, thus
  O(V) + O(E) = O(V+E)
For an undirected graph
  same as the directed case, except each edge appears twice
  (once per endpoint); in general O(V) + O(2E) ~ O(V+E)
Analysis of DFS (Adjacency List)
The method incidentEdges is called once for each vertex
DFS runs in O(v + e) time provided the graph is
represented by the adjacency list structure
  Recall that Σv deg(v) = 2m
For space, a stack is used to keep track of the last visited
nodes. In the worst case, the size of the stack equals the
number of nodes, i.e. O(V)
With a matrix, two nested loops are needed (visiting
rows × columns), so the time complexity is O(v²)
What really worries us about DFS?
Asymptotic Time Complexity (Adjacency List):
  O(v + e) – linear growth with v and e, no issue.
  O(b^d), where d is the depth – exponential growth with d, terrible.
Asymptotic Space Complexity (Adjacency List):
  O(v + e) – linear growth with v, no issue.
  O(b·d) – linear growth with d, no issue.
Branching factor b: the out-degree, or number of child nodes.
E.g., a binary tree has a branching factor of 2.
Let v: Number of Nodes; e: Number of Edges;
b: Branching Factor; d: Maximum Depth
Adjacency List
Index   Parent   Nodes
0       A        B, C
1       B        D, E
2       C        F, G
Iterative DFS Algorithm
The stack S holds all adjacent edges; the while loop must
visit each edge and vertex until the stack is empty, thus
causing the time complexity to be O(v + e).
(A vertex is only pushed if there is a neighbouring edge.)
Source: https://en.wikipedia.org/wiki/Depth-first_search
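A compact iterative sketch in Python with an explicit stack; the input uses the adjacency list shown earlier on the slides (A → B, C; B → D, E; C → F, G), padded with empty leaf entries as an assumption:

```python
# Iterative DFS with an explicit stack. Each vertex and edge is handled
# once, giving O(v + e) time; the stack itself is the O(v) space cost.
def dfs_iterative(graph, start):
    visited = []
    stack = [start]
    seen = {start}
    while stack:
        v = stack.pop()
        visited.append(v)
        for w in reversed(graph[v]):   # reversed so neighbours pop in list order
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return visited

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
         "D": [], "E": [], "F": [], "G": []}
print(dfs_iterative(graph, "A"))   # explores B's subtree fully before C's
```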
DFS
Time complexity
  Assume the worst case: the only goal leaf is at the far right-hand side
  DFS will expand all nodes: 1 + b + b² + … + b^d = O(b^d);
  the greater the depth, the more time DFS needs to expand
In terms of space
  At each depth l < d, at most (b − 1) siblings remain on the stack,
  giving (d − 1)(b − 1) nodes; at depth d we have b nodes
  Total: (d − 1)(b − 1) + b = O(b·d)
(Figure: tree levels d = 0, 1, 2)
DFS
(Table: a DFS trace on an example tree, listing for each visited node the
stack contents from bottom to top, the expanded nodes, and the frontier.
Starting from A, the search descends, backtracks from dead ends at depth 2,
and tests 7 nodes before the goal G is found.)
Graph Traversal: Breadth-First Search
Breadth-first search
Algorithm
Example
Properties
Analysis
(Figure: BFS levels — L0: A; L1: B, C, D; L2: E, F)
Breadth-First Search
A BFS traversal of graph G:
  Visits all the vertices and edges of G
  Determines whether G is connected
  Computes the connected components of G
  Computes a spanning forest of G
BFS on a graph with n vertices and m edges takes O(n + m) time
BFS can be further extended to solve other graph problems:
  Find and report a path with the minimum number of edges
  between two given vertices
  Find a simple cycle, if there is one
BFS Algorithm
The algorithm uses a mechanism for setting and getting
“labels” of vertices and edges

Algorithm BFS(G)
  Input: graph G
  Output: labeling of the edges and partition of the vertices of G
  for all u in G.vertices()           { O(n) }
    setLabel(u, UNEXPLORED)
  for all e in G.edges()              { O(m) }
    setLabel(e, UNEXPLORED)
  for all v in G.vertices()
    if getLabel(v) = UNEXPLORED
      BFS(G, v)

Algorithm BFS(G, s)
  let q be a new queue                { O(1) }
  for each node v in G:
    dist[v] = ∞                       { O(n) }
  dist[s] = 0
  enqueue(s, q)                       { O(1) }
  while q is not empty
    let v = dequeue(q)
    for each neighbor u of v:
      if dist[u] = ∞
        dist[u] = dist[v] + 1
        enqueue(u, q)

If using an adjacency matrix, it is O(n²)
If using an adjacency list, it is O(n + m)
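The distance-based pseudocode above translates almost line for line into Python (the example graph is an assumption):

```python
from collections import deque

# BFS distance computation: dist[v] is the minimum number of edges
# from the start vertex s, filled in level by level.
def bfs(graph, s):
    dist = {v: float("inf") for v in graph}
    dist[s] = 0
    q = deque([s])
    while q:
        v = q.popleft()
        for u in graph[v]:
            if dist[u] == float("inf"):   # u seen for the first time
                dist[u] = dist[v] + 1
                q.append(u)
    return dist

graph = {"A": ["B", "C", "D"], "B": ["A", "E"], "C": ["A"],
         "D": ["A", "F"], "E": ["B"], "F": ["D"]}
print(bfs(graph, "A"))   # A at level 0, B/C/D at level 1, E/F at level 2
```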
Example
Legend:
  A (white): unexplored vertex
  A (shaded): visited vertex
  unexplored edge / discovery edge / cross edge
(Figures: successive BFS steps on a graph with levels
L0 = {A}, L1 = {B, C, D}, L2 = {E, F})
Properties
Notation
  Gs: connected component of s
Property 1
  BFS(G, s) visits all the vertices and edges of Gs
Property 2
  The discovery edges labeled by BFS(G, s) form a
  spanning tree Ts of Gs
Property 3
  For each vertex v in Li
    the path of Ts from s to v has i edges
    every path from s to v in Gs has at least i edges
(Figure: a graph and its BFS spanning tree with levels
L0 = {A}, L1 = {B, C, D}, L2 = {E, F})
Analysis
Same as DFS
However, in terms of data structure, BFS uses a
queue to keep track of the visited nodes. The space
complexity is O(V), since in the worst case the queue
holds up to the total number of vertices.
What really worries us about BFS?
Asymptotic Time Complexity (Adjacency List):
  O(n + m) – linear growth with n and m, no issue.
  O(b^d) – exponential growth with d, terrible.
Asymptotic Space Complexity (Adjacency List):
  O(n + m) – linear growth with n and m, no issue.
  O(b^d) – the queue can hold an entire level of up to b^d nodes:
  exponential growth with d, terrible.
Let n: Number of Nodes; m: Number of Edges;
b: Branching Factor; d: Maximum Depth
Applications
Using the template method pattern, we can specialize
the BFS traversal of a graph G to solve the following
problems in O(n + m) time
Compute the connected components of G
Compute a spanning forest of G
Find a simple cycle in G, or report that G is a forest
Given two vertices of G, find a path in G between
them with the minimum number of edges, or report
that no such path exists
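The minimum-edge-path application can be sketched by augmenting BFS with parent pointers (the graph and vertex names below are assumptions):

```python
from collections import deque

# BFS shortest path: record each vertex's parent when first discovered,
# then walk back from the target; returns None if no path exists.
def shortest_path(graph, s, t):
    parent = {s: None}
    q = deque([s])
    while q:
        v = q.popleft()
        if v == t:
            path = []
            while v is not None:      # follow parent pointers back to s
                path.append(v)
                v = parent[v]
            return path[::-1]
        for u in graph[v]:
            if u not in parent:       # first discovery fixes the parent
                parent[u] = v
                q.append(u)
    return None                       # report that no such path exists

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(shortest_path(graph, "A", "D"))   # a minimum 2-edge path
```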
Exercise
Perform a DFS and BFS traversal on the following
graph. Start the traversal from node (A) and draw the
resulting tree
DFS vs. BFS
DFS                                    BFS
Can be implemented with a stack        Can be implemented with a queue
(LIFO)                                 (FIFO)
Two-stage operation                    Single-stage operation
Less memory compared to BFS            More memory compared to DFS
Identifies back edges                  Identifies cross edges
Useful in cycle detection              Useful in shortest-path identification

Both: O(n²) efficiency with an adjacency matrix; O(n + e) with an adjacency list
Both: useful in finding connectivity, spanning trees/forests, path identification
DFS vs. BFS (cont.)
Back edge (v, w)
  w is an ancestor of v in the tree of discovery edges
Cross edge (v, w)
  w is in the same level as v, or in the next level, in the
  tree of discovery edges
(Figures: a DFS tree with a back edge, and a BFS tree with cross
edges and levels L0, L1, L2, on vertices A–F)
Application of DFS: Topological sorting
Topological sorting
Algorithm
Example
Properties
Analysis
DAGs and Topological Sorting
A directed acyclic graph (DAG) is a digraph that has no
directed cycles
A topological sorting/ordering of a digraph is a numbering
v1, …, vn
of the vertices such that for every edge (vi, vj), we have i < j
Example: in a task-scheduling digraph, a topological sorting
gives a task sequence that satisfies the precedence constraints
Theorem
  A digraph admits a topological sorting if and only if it is a DAG
(Figures: a DAG G on vertices A, B, C, D, E, and a topological
ordering of G labeling them v1–v5)
Topological Sorting
Number the vertices, so that (u,v) in E implies u < v
A typical student day:
  1. wake up
  2. eat
  3. study computer sci.
  4. nap
  5. more c.s.
  6. work out
  7. play
  8. write c.s. program
  9. make cookies for professors
  10. sleep
  11. dream about graphs
Topological Sorting Algorithm 1: Using DFS
Simulate the algorithm by using depth-first search; O(n + m) time.

Algorithm topologicalDFS(G)
  Input: dag G
  Output: topological ordering of G
  n ← G.numVertices()
  for all u in G.vertices()
    setLabel(u, UNEXPLORED)
  for all e in G.edges()
    setLabel(e, UNEXPLORED)
  for all v in G.vertices()
    if getLabel(v) = UNEXPLORED
      topologicalDFS(G, v)

Algorithm topologicalDFS(G, v)
  Input: graph G and a start vertex v of G
  Output: labeling of the vertices of G in the connected component of v
  setLabel(v, VISITED)
  for all e in G.incidentEdges(v)
    if getLabel(e) = UNEXPLORED
      w ← opposite(v, e)
      if getLabel(w) = UNEXPLORED
        setLabel(e, DISCOVERY)
        topologicalDFS(G, w)
      else
        { e is a back or cross edge }
  Label v with topological number n
  n ← n − 1
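Algorithm 1 can be sketched in Python: a vertex is numbered only once all of its outgoing edges have been explored, which is the same as collecting vertices in reverse post-order (the example DAG is an assumption):

```python
# DFS-based topological sort: append each vertex when its recursion
# finishes, then reverse. Matches numbering vertices n down to 1.
def topological_sort_dfs(graph):
    order = []                 # finished vertices, in post-order
    visited = set()

    def visit(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                visit(w)
        order.append(v)        # v finishes after all its successors

    for v in graph:
        if v not in visited:
            visit(v)
    return order[::-1]         # reverse post-order = topological order

# Assumed example DAG: edges point from prerequisite to dependent task.
dag = {"wake": ["eat"], "eat": ["study"], "study": ["sleep"], "sleep": []}
print(topological_sort_dfs(dag))
```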
Topological Sorting Algorithm 2
Note: this algorithm is different from the one
in Goodrich–Tamassia (previous slide)

Algorithm TopologicalSort(G)
  H ← G                          { temporary copy of G }
  n ← G.numVertices()
  while H is not empty do
    let v be a vertex with no outgoing edges
    label v ← n
    n ← n − 1
    remove v from H

Running time: O(n + m).
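Algorithm 2 can be sketched in Python: repeatedly pick a vertex with no outgoing edges, give it the highest remaining number, and delete it from the temporary copy H (the example DAG is an assumption):

```python
# Sink-removal topological sort: the last task in any valid ordering is
# always a vertex with no outgoing edges, so it gets the number n.
def topological_sort_sinks(graph):
    h = {v: set(ws) for v, ws in graph.items()}   # temporary copy H of G
    number = {}
    n = len(h)
    while h:
        v = next(u for u, ws in h.items() if not ws)  # a sink vertex
        number[v] = n
        n -= 1
        del h[v]                          # remove v from H ...
        for ws in h.values():
            ws.discard(v)                 # ... along with edges into v
    return number

dag = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
print(topological_sort_sinks(dag))
```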
Topological Sorting: Using DFS Example
(Many Possible Solutions)
(Figures: successive slides run DFS on a nine-vertex DAG, starting from
the top vertex. As each vertex finishes, it receives the next topological
number counting down from 9; the accumulated labels grow from {9} to
{8, 9}, {7, 8, 9}, and so on, until the full ordering 1 2 3 4 5 6 7 8 9
is assigned.)
Exercise
Perform a topological sort for the following digraph
DFS/BFS vs Topological
Unlike DFS/BFS, topological sort sets a requirement
on the order of the components (i.e. almost like a
prerequisite or pre-condition)
Examples of applications:
  Scheduling/timetabling
  Code compilation (i.e. dependency checking)
SORTING ALGORITHMS
•Insertion Sort
INSERTION SORT
Assume that we have distinct elements as input:
  input:  6 3 4 2 5 1
  output: 1 2 3 4 5 6
We want to obtain output with all n elements sorted in
order
INSERTION SORT: PSEUDOCODE
Maintain a growing sorted list. For each element, put
it into its correct position in the growing list
InsertionSort(A):
    for i in range(1, len(A)):
        cur_value = A[i]
        j = i - 1
        while j >= 0 and A[j] > cur_value:
            A[j+1] = A[j]
            j -= 1
        A[j+1] = cur_value

Trace on A = [3, 2, 5, 1, 4] (indices 0–4), for i = 1:
  Initially cur_value = A[1], i.e. 2, and j = 1 - 1 = 0
  In the while loop: j >= 0 and A[0] > 2, so A[1] = A[0]
  (the 3 shifts right), then j = j - 1, i.e. 0 - 1 = -1
  The loop exits and A[j+1] = A[0] = 2: the values have
  swapped, giving [2, 3, 5, 1, 4]
  What is the next value?
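The pseudocode above runs directly as Python; a self-contained version with the slide's example input:

```python
def insertion_sort(a):
    """In-place insertion sort, as in the pseudocode above."""
    for i in range(1, len(a)):
        cur_value = a[i]
        j = i - 1
        while j >= 0 and a[j] > cur_value:
            a[j + 1] = a[j]     # shift larger elements one slot right
            j -= 1
        a[j + 1] = cur_value    # drop cur_value into its correct slot
    return a

print(insertion_sort([6, 3, 4, 2, 5, 1]))   # [1, 2, 3, 4, 5, 6]
```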
Example of insertion sort
  8 2 4 9 3 6
  2 8 4 9 3 6
  2 4 8 9 3 6
  2 4 8 9 3 6    (9 is already in place)
  2 3 4 8 9 6
  2 3 4 6 8 9    done
INSERTION SORT
Algorithm designers always start by testing the
algorithm on a simple list of elements to check that it works
But this is not enough
We need to prove that it works for any possible input
list
Insertion sort involves a loop, so it is an iterative algorithm.
What does each iteration promise?
  By the end of each iteration i, the first i+1 elements
  in the array are sorted.
We can apply Proof by Induction to show the
algorithm works as intended
INSERTION SORT: INDUCTION PROOF
We assume the sorting is in ascending order
Inductive hypothesis (IH):
  After the i-th iteration of the outer for loop, A[:i+1] is
  sorted
Base case: after 0 iterations of the outer for loop, the
list A[:1] is sorted (only 1 element). Thus, the inductive
hypothesis holds for i = 0
INSERTION SORT: INDUCTION STEP
Let k be an integer, where 0 < k < n.
Assume that the IH holds for i = k-1, so A[:k] is sorted after the (k-1)th iteration.
We want to show that the IH holds for i = k, i.e. that A[:k+1] is sorted after the kth iteration.
Let j* be the largest position in {0, …, k-1} such that A[j*] < A[k]. Then, the effect of the inner while-loop is to turn:
[ A[0], A[1], …, A[j*], …, A[k-1], A[k] ] into [ A[0], A[1], …, A[j*], A[k], A[j*+1], …, A[k-1] ]
We claim that the second list on the right is sorted.
This is because A[k] > A[j*], and by the inductive hypothesis,
we have A[j*] ≥ A[j] for all j ≤ j*, so A[k] is larger than everything positioned before it. Similarly, we
also know that A[k] ≤ A[j*+1] ≤ A[j] for all j ≥ j*+1, so A[k] is also smaller than everything that comes
after it.
Thus, A[k] is in the right place, and all the other elements in A[:k+1] were already in the right place.
Thus, after the kth iteration completes, A[:k+1] is sorted, and this establishes the IH for k.
CONCLUSION
By induction, we conclude that the IH holds for all 0 ≤ i ≤ n-1. In particular, after the algorithm ends, A[:n] is sorted.
INSERTION SORT
Now we have the induction proof that this design works.
Next, we want to know: is it fast?
To find out, we need to analyse:
  How many iterations take place?
  How much work happens within each iteration?

InsertionSort(A):
    for i in range(1, len(A)):              # up to len(A) = n iterations
        cur_value = A[i]
        j = i - 1
        while j >= 0 and A[j] > cur_value:  # worst case: up to n iterations
            A[j+1] = A[j]
            j -= 1
        A[j+1] = cur_value

Thus n * n, and we obtain O(n²)
Can we perform better and faster?
The time complexity for insertion sort is O(n²).
Designers improve the time of sorting by using
different kinds of design techniques
E.g.:
  Divide and conquer
  Dynamic programming
  etc.