Lec03 SearchSort - 2023 | PDF | Vertex (Graph Theory) | Applied Mathematics

Lec03 SearchSort - 2023

The document discusses the computational complexity of searching and sorting algorithms, emphasizing the importance of graph theory in understanding these algorithms. It compares adjacency matrices and lists, detailing their spatial complexities in relation to sparse and dense graphs. Additionally, it covers depth-first and breadth-first search algorithms, their properties, and time complexities, while also providing insights into graph traversal and connectivity.

Uploaded by yijoebackup
Copyright © All Rights Reserved

LECTURE 03

COMPUTATIONAL COMPLEXITY OF
SEARCHING AND SORTING
ALGORITHMS
MUST KNOW GRAPH THEORY

• Graph theory (GT) is closely related to searching and
  sorting algorithms
• GT provides a useful framework for understanding and
  analysing the behaviour of both searching and sorting
  algorithms
• If you cannot remember any of it, please review Graph
  Theory from Data Structures and Algorithms
• Assumption:
  ◦ You know about edges and nodes
  ◦ You know about directed/undirected/weighted graphs
  ◦ You know about adjacency lists/matrices
  ◦ You know about trees, spanning trees, forests, subgraphs
Adjacency Matrix vs. Adjacency List

Compare                Adjacency Matrix          Adjacency List
Speed                  Faster when the           Faster when the
                       algorithm needs to know   algorithm processes the
                       the direct connection     linked list from node
                       between two specific      to node
                       vertices
Spatial complexity:    Θ( n^2 )                  Θ( n )
effect on memory
usage as the problem
size is varied
Vertex info array      Θ( n )                    Θ( n )
Adjacency Matrix vs. Adjacency List
(cont.)

• Both of the spatial complexities, for the adjacency matrix
  and for the adjacency list, absorb the Θ( n ) spatial
  complexity used by the vertex info array
• The spatial complexity of the adjacency list really
  depends on whether the graph is sparse or dense
Adjacency Matrix vs. Adjacency List
(cont.)

• A sparse graph:
  ◦ has few edges: it does not have many edges relative
    to the number of vertices
  ◦ if the number of edges is less than or equal to n,
    then the spatial complexity of the adjacency list is
    Θ( n )
• In a dense graph:
  ◦ there may be close to n^2 edges (the number of edges
    is close to the maximal number of edges)
  ◦ the spatial complexity of the adjacency list is
    Θ( n^2 ), the same as that for the adjacency matrix
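As a rough illustration of these space costs, here is a small Python sketch (not from the slides; the path graph and helper names are made up for this example). The matrix always allocates n^2 cells, while the list stores only 2m entries for an undirected graph:

```python
def build_adjacency_matrix(n, edges):
    """n x n matrix: n^2 cells regardless of how many edges exist."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: mirror the entry
    return matrix

def build_adjacency_list(n, edges):
    """One list per vertex: total entries = 2m for an undirected graph."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

n = 5
sparse_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]   # a path: m = n - 1 (sparse)

matrix = build_adjacency_matrix(n, sparse_edges)
adj = build_adjacency_list(n, sparse_edges)

matrix_cells = sum(len(row) for row in matrix)    # always n^2 = 25
list_entries = sum(len(neigh) for neigh in adj)   # 2m = 8 for this sparse graph
print(matrix_cells, list_entries)
```

For this sparse graph the list holds 8 entries against the matrix's 25 cells; as edges approach n^2, the two sizes converge.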
Dense vs Sparse Graphs

[Figures: examples of dense and sparse graphs]
HOW IMPORTANT ARE SEARCHING AND
SORTING?

• The most common algorithms used in solving problems
• Searching:
  ◦ e.g., in deep learning, searching for an optimal
    objective function within a short time is important
  ◦ e.g., mining outliers among the input data
• Sorting:
  ◦ e.g., searching for the items to sort
SEARCHING ALGORITHMS

• Depth-First Search | Breadth-First Search
Terminology

• End vertices (or endpoints) of an edge
  ◦ U and V are the endpoints of edge a
• Edges incident on a vertex
  ◦ a, d, and b are incident on V
• Adjacent vertices: connected by an edge
  ◦ U and V are adjacent
• Degree of a vertex
  ◦ X has degree 5
• Parallel edges
  ◦ h and i are parallel edges
• Self-loop
  ◦ j is a self-loop

[Figure: example graph with vertices U, V, W, X, Y, Z and
edges a-j]
Properties

• Notation
  ◦ n = number of vertices
  ◦ m = number of edges
  ◦ deg(v) = degree of vertex v

• Example
  ◦ n = 4
  ◦ m = 6
  ◦ deg(v) = 3
Properties

• Handshaking Theorem - Property 1
  ◦ Σv deg(v) = 2m
  ◦ Proof: each edge contributes twice to the total degree
• Property 2
  ◦ In an undirected graph with no self-loops and no
    multiple edges:
    m ≤ n (n - 1)/2
  ◦ Proof: each vertex has degree at most (n - 1)
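The Handshaking Theorem is easy to check mechanically. The edge set below is an assumed toy example with m = 4 edges:

```python
from collections import defaultdict

edges = [("u", "v"), ("v", "w"), ("w", "u"), ("w", "x")]  # m = 4

degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1   # each edge contributes to the degree
    degree[b] += 1   # of both of its endpoints

total_degree = sum(degree.values())
print(total_degree, 2 * len(edges))  # both are 8
```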
Properties

• Directed graph
  ◦ Indegree, deg-(v)
  ◦ Outdegree, deg+(v)
• Theorem:
  ◦ Let G = (V, E) be a graph with directed edges; then
    Σv deg+(v) = Σv deg-(v) = |E|
  ◦ Proof: The first sum counts the number of outgoing
    edges over all vertices and the second sum counts the
    number of incoming edges over all vertices. It follows
    that both sums equal the number of edges in the graph.
Asymptotic Performance

n vertices, m edges; no parallel edges; no self-loops.
Bounds are "big-Oh".

Operation             Adjacency List         Adjacency Matrix
Space                 n + m                  n^2
incidentEdges(v)      deg(v)                 n
areAdjacent(v, w)     min(deg(v), deg(w))    1
insertVertex(o)       1                      n^2
insertEdge(v, w, o)   1                      1
removeVertex(v)       deg(v)                 n^2
removeEdge(e)         1                      1
Exercise

• Given the graph below, identify the following properties:
  ◦ Degree of vertex 10
  ◦ Vertices adjacent to 6
  ◦ Edges incident on 9
Subgraphs

• A subgraph S of a graph G is a graph such that
  ◦ The vertices of S are a subset of the vertices of G
  ◦ The edges of S are a subset of the edges of G
• A spanning subgraph of G is a subgraph that contains
  all the vertices of G

[Figures: a subgraph and a spanning subgraph]
Connectivity

• A graph is connected if there is a path between every
  pair of vertices
• A connected component of a graph G is a maximal
  connected subgraph of G

[Figures: a connected graph, and a non-connected graph with
two connected components]
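The definition of a connected component suggests a direct computation: traverse from each unvisited vertex and collect everything reachable. A Python sketch (the two-component graph is an assumed example, and BFS is used here as the traversal):

```python
from collections import deque

def connected_components(adj):
    """adj: dict vertex -> list of neighbours. Returns one sorted
    vertex list per connected component."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        comp = []
        queue = deque([start])
        seen.add(start)
        while queue:                 # BFS over one component
            v = queue.popleft()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        components.append(sorted(comp))
    return components

graph = {
    "A": ["B"], "B": ["A", "C"], "C": ["B"],   # component 1
    "D": ["E"], "E": ["D"],                    # component 2
}
print(connected_components(graph))  # [['A', 'B', 'C'], ['D', 'E']]
```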
Graph

• To search connectivity between two vertices
• To travel from one node to the destination node
• How do we perform that efficiently?

Graph Traversal

Depth First Search: uses a stack to explore in depth
Breadth First Search: uses a queue to explore level by level
Application

DFS                                BFS
Find a path between two vertices   Determine shortest paths and
                                   minimum spanning trees
Detect cycles in a graph           Used by search-engine crawlers
                                   to build indexes of web pages
Topological sorting                Find available neighbour nodes
                                   in a peer-to-peer network
Solve puzzles having only one      Search on social networks
solution, such as a maze
Graph Traversal 1: Depth-First Search

• Depth-first search
  ◦ Algorithm
  ◦ Example
  ◦ Properties
  ◦ Analysis
Depth-First Search

• A DFS traversal of a graph G
  ◦ Visits all the vertices and edges of G
  ◦ Determines whether G is connected
  ◦ Computes the connected components of G
  ◦ Computes a spanning forest of G
• DFS on a graph with n vertices and m edges takes O(n + m)
  time
• DFS can be further extended to solve other graph problems
  ◦ Find and report a path between two given vertices
  ◦ Find a cycle in the graph
• Depth-first search is to graphs what an Euler tour (travel
  each edge exactly once; start and end at the same vertex)
  is to binary trees
Recursive DFS Algorithm

• The algorithm uses a mechanism for setting and
  getting "labels" of vertices and edges

Algorithm DFS(G)
  Input: graph G
  Output: labeling of the edges of G as discovery edges
          and back edges
  for all u ∈ G.vertices()            { O(n) }
    setLabel(u, UNEXPLORED)
  for all e ∈ G.edges()               { O(m) }
    setLabel(e, UNEXPLORED)
  for all v ∈ G.vertices()
    if getLabel(v) = UNEXPLORED
      DFS(G, v)

Algorithm DFS(G, v)
  Input: graph G and a start vertex v of G
  Output: labeling of the edges of G in the connected
          component of v as discovery edges and back edges
  setLabel(v, VISITED)
  for all e ∈ G.incidentEdges(v)      { deg(v) iterations }
    if getLabel(e) = UNEXPLORED
      w ← opposite(v, e)
      if getLabel(w) = UNEXPLORED
        setLabel(e, DISCOVERY)
        DFS(G, w)
      else
        setLabel(e, BACK)

Here is where DFS searches for neighbour nodes; the time
complexity depends on whether a matrix or a list is applied.
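A runnable Python sketch of this recursive labeling scheme (edges are stored as frozensets so both endpoints share one label; the triangle graph is an assumed example):

```python
def dfs_label(adj):
    """adj: dict vertex -> list of neighbours (undirected graph).
    Returns a dict frozenset({u, v}) -> 'DISCOVERY' | 'BACK'."""
    vertex_label = {v: "UNEXPLORED" for v in adj}
    edge_label = {}

    def dfs(v):
        vertex_label[v] = "VISITED"
        for w in adj[v]:
            e = frozenset((v, w))
            if e not in edge_label:              # e is UNEXPLORED
                if vertex_label[w] == "UNEXPLORED":
                    edge_label[e] = "DISCOVERY"
                    dfs(w)
                else:
                    edge_label[e] = "BACK"

    for v in adj:
        if vertex_label[v] == "UNEXPLORED":
            dfs(v)
    return edge_label

triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
labels = dfs_label(triangle)
# A triangle contains exactly one cycle, so one edge comes back as BACK.
print(sorted(labels.values()))  # ['BACK', 'DISCOVERY', 'DISCOVERY']
```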
Example

[Figures: step-by-step DFS on a graph with vertices A, B, C,
D, E. Legend: unexplored vertex, visited vertex, unexplored
edge, discovery edge, back edge]
DFS and Maze Traversal

• The DFS algorithm is similar to a classic strategy for
  exploring a maze
• We mark each intersection, corner and dead end
  (vertex) visited and each corridor (edge) traversed
• We keep track of the path back to the entrance (start
  vertex) by means of a rope (recursion stack)
Properties of DFS

• Property 1
  ◦ DFS(G, v) visits all the vertices and edges in the
    connected component of v
• Property 2
  ◦ The discovery edges labeled by DFS(G, v) form a
    spanning tree of the connected component of v
Analysis of DFS (Adjacency List)

• Setting/getting a vertex/edge label takes O(1) time
• Each vertex is labeled twice
  ◦ once as UNEXPLORED
  ◦ once as VISITED
• Each edge is labeled twice
  ◦ once as UNEXPLORED
  ◦ once as DISCOVERY or BACK
• For a directed graph
  ◦ We discover the neighbours among V and E once, in
    linear time, thus O(V) + O(E) = O(V + E)
• For an undirected graph
  ◦ Same as for a directed graph, except each edge appears
    twice. In general O(V) + O(2E) ~ O(V + E)
Analysis of DFS (Adjacency List) (cont.)

• Method incidentEdges is called once for each vertex
• DFS runs in O(v + e) time provided the graph is
  represented by the adjacency list structure
  ◦ Recall that Σv deg(v) = 2m
• For space, a stack is used to keep track of the last
  visited nodes. In the worst case the stack holds every
  node, i.e. O(V)
• If an adjacency matrix is used instead, we need two
  nested for-loops (visiting rows and columns), so the
  time complexity becomes O(V^2)
What really worries us about DFS?

Asymptotic Time Complexity (Adjacency List):
O(v + e) – Linear growth with v and e, no issue.
O(b^d), where d is the depth – Exponential growth with d, terrible.

Asymptotic Space Complexity (Adjacency List):
O(v + e) – Linear growth with v, no issue.
O(b·d) – Linear growth with d, no issue.

Branching factor b: the out-degree, or number of child nodes.
For example, a binary tree has a branching factor of 2.

Let v: Number of Nodes
    e: Number of Edges
    b: Branching Factor
    d: Maximum Depth
What really worries us about DFS? (cont.)

Adjacency List
Index   Parent Node   Child Nodes
0       A             B, C
1       B             D, E
2       C             F, G
Iterative DFS Algorithm

S (the stack) holds all adjacent edges; the while loop must
visit each edge and vertex until the stack is empty, thus the
time complexity is O(v + e).

The inner condition checks whether there is a neighbouring
edge still to explore.

Source: https://en.wikipedia.org/wiki/Depth-first_search
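A Python sketch of the iterative, stack-based variant referenced above (the example graph is assumed for illustration):

```python
def dfs_iterative(adj, start):
    """adj: dict vertex -> list of neighbours. Returns vertices
    in DFS visit order, using an explicit stack instead of recursion."""
    visited = []
    seen = set()
    stack = [start]
    while stack:                      # each vertex/edge handled until empty: O(v + e)
        v = stack.pop()
        if v in seen:                 # may be pushed more than once; skip repeats
            continue
        seen.add(v)
        visited.append(v)
        for w in reversed(adj[v]):    # reversed so neighbours pop in listed order
            if w not in seen:
                stack.append(w)
    return visited

graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
print(dfs_iterative(graph, "A"))  # ['A', 'B', 'D', 'C']
```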
DFS
• Time complexity
  ◦ Assume the worst case: 1 goal leaf at the right-hand side
  ◦ DFS will expand all nodes: 1 + b + b^2 + ... + b^d = O(b^d);
    with more depth, DFS needs more time to expand
• In terms of space
  ◦ At each depth l < d we keep at most b - 1 sibling nodes,
    giving at most (d - 1)(b - 1) such nodes; at depth d we
    have b nodes
  ◦ Total: (d - 1)(b - 1) + b = O(b·d)

[Figure: search tree with levels d = 0, d = 1, d = 2]
DFS

Trace 1 (stack-based; 6 nodes expanded, 7 nodes tested):

Node visited     Stack (bottom to top)
A                {A}
A (not goal)     {A,B}
B (not goal)     {A,B,D,E}
E (not goal)     {A,B,D,E}
backtrack        {A,B,D}
D (not goal)     {A,B,D}
backtrack        {A,B}
backtrack        {A}
C (not goal)     {A,C}
G (goal)         {A,C,G}

Trace 2 (expanded nodes and frontier, from the initial state):

Expanded node    Frontier
-                {A}
A (not goal)     {A,B}
B (not goal)     {A,B,E,F}
F (not goal)     {A,B,E,F}
E (not goal)     {A,B,E}
C (not goal)     {A,C}
H (not goal)     {A,C,G,H}
G (goal)         {A,C,G}
Graph Traversal: Breadth-First Search

• Breadth-first search
  ◦ Algorithm
  ◦ Example
  ◦ Properties
  ◦ Analysis

[Figure: example graph with levels L0 = {A}, L1 = {B, C, D},
L2 = {E, F}]
Breadth-First Search

• A BFS traversal of a graph G:
  ◦ Visits all the vertices and edges of G
  ◦ Determines whether G is connected
  ◦ Computes the connected components of G
  ◦ Computes a spanning forest of G
• BFS on a graph with n vertices and m edges takes O(n + m)
  time
• BFS can be further extended to solve other graph problems
  ◦ Find and report a path with the minimum number of edges
    between two given vertices
  ◦ Find a simple cycle, if there is one
BFS Algorithm

• The algorithm uses a mechanism for setting and
  getting "labels" of vertices and edges

Algorithm BFS(G)
  Input: graph G
  Output: labeling of the edges and partition of the
          vertices of G
  for all u ∈ G.vertices()            { O(n) }
    setLabel(u, UNEXPLORED)
  for all e ∈ G.edges()               { O(m) }
    setLabel(e, UNEXPLORED)
  for all v ∈ G.vertices()
    if getLabel(v) = UNEXPLORED
      BFS(G, v)

Algorithm BFS(G, s)
  let q be a new queue                { O(1) }
  for each node v in G:
    dist[v] = ∞                       { O(n) }
  dist[s] = 0
  enqueue(s, q)                       { O(1) }
  while q is not empty
    let v = dequeue(q)
    for each neighbor u of v:
      if dist[u] = ∞
        dist[u] = dist[v] + 1
        enqueue(u, q)

If using an adjacency matrix, it is O(n^2);
if using an adjacency list, it is O(n + m).
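The dist-based BFS above runs almost verbatim in Python. The edge set below is assumed so as to reproduce the levelled example (L0 = A; L1 = B, C, D; L2 = E, F):

```python
from collections import deque
import math

def bfs_dist(adj, s):
    """adj: dict vertex -> list of neighbours. Returns the
    minimum edge-count distance from s to every vertex."""
    dist = {v: math.inf for v in adj}  # dist[v] = infinity
    dist[s] = 0
    q = deque([s])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if dist[u] == math.inf:    # u not reached yet
                dist[u] = dist[v] + 1
                q.append(u)
    return dist

graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "E", "F"],
    "C": ["A", "B", "D", "E", "F"],
    "D": ["A", "C", "F"],
    "E": ["B", "C"],
    "F": ["B", "C", "D"],
}
print(bfs_dist(graph, "A"))  # levels: A=0; B,C,D=1; E,F=2
```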
Example

[Figures: step-by-step BFS on a graph with vertices A-F
arranged in levels L0 = {A}, L1 = {B, C, D}, L2 = {E, F}.
Legend: unexplored vertex, visited vertex, unexplored edge,
discovery edge, cross edge]
Properties
• Notation
  ◦ Gs: connected component of s
• Property 1
  ◦ BFS(G, s) visits all the vertices and edges of Gs
• Property 2
  ◦ The discovery edges labeled by BFS(G, s) form a
    spanning tree Ts of Gs
• Property 3
  ◦ For each vertex v in Li
    - The path of Ts from s to v has i edges
    - Every path from s to v in Gs has at least i edges
Analysis

• Same as DFS
• However, in terms of data structure, BFS uses a queue to
  keep track of the visited nodes. The space complexity is
  O(V), since in the worst case the number of nodes kept is
  up to the total number of vertices.
What really worries us about BFS?

Asymptotic Time Complexity (Adjacency List):
O(n + m) – Linear growth with n and m, no issue.
O(b^d) – Exponential growth with d, terrible.

Asymptotic Space Complexity (Adjacency List):
O(n + m) – Linear growth with n and m, no issue.
O(b^d) – The frontier queue can hold an entire level, so
memory also grows exponentially with d.

Let n: Number of Nodes
    m: Number of Edges
    b: Branching Factor
    d: Maximum Depth
Applications

• Using the template method pattern, we can specialize
  the BFS traversal of a graph G to solve the following
  problems in O(n + m) time
  ◦ Compute the connected components of G
  ◦ Compute a spanning forest of G
  ◦ Find a simple cycle in G, or report that G is a forest
  ◦ Given two vertices of G, find a path in G between
    them with the minimum number of edges, or report
    that no such path exists
Exercise

• Perform a DFS and a BFS traversal on the following
  graph. Start the traversal from node (A) and draw the
  resulting trees
DFS vs. BFS
DFS                               BFS
Can be implemented with a         Can be implemented with a
stack (LIFO)                      queue (FIFO)
Two-stage operation               Single-stage operation
Less memory compared to BFS       More memory compared to DFS
Identifies back edges             Identifies cross edges
O( n^2 ) efficiency in an adjacency matrix (both)
O( n + e ) efficiency in an adjacency list (both)
Useful in cycle detection         Useful in shortest-path
                                  identification
Useful in finding connectivity, spanning trees/forests, and
path identification (both)
DFS vs. BFS (cont.)

• Back edge (v,w)
  ◦ w is an ancestor of v in the tree of discovery edges
• Cross edge (v,w)
  ◦ w is in the same level as v or in the next level in
    the tree of discovery edges

[Figures: a DFS tree with a back edge and a BFS tree with
cross edges, on vertices A-F]
Application of DFS: Topological sorting

• Topological sorting
  ◦ Algorithm
  ◦ Example
  ◦ Properties
  ◦ Analysis
DAGs and Topological Sorting
• A directed acyclic graph (DAG) is a digraph that has no
  directed cycles
• A topological sorting/ordering of a digraph is a numbering
  v1, ..., vn of the vertices such that for every edge
  (vi, vj), we have i < j
• Example: in a task-scheduling digraph, a topological
  sorting is a task sequence that satisfies the precedence
  constraints
• Theorem
  ◦ A digraph admits a topological sorting if and only if
    it is a DAG

[Figures: a DAG G on vertices A-E, and a topological ordering
v1, ..., v5 of G]
Topological Sorting

• Number the vertices so that (u,v) in E implies u < v

A typical student day:
1   wake up
2   eat
3   study computer sci.
4   nap
5   more c.s.
6   work out
7   play
8   write c.s. program
9   make cookies for professors
10  sleep
11  dream about graphs
Topological Sorting Algorithm 1: Using DFS

• Simulate the algorithm by using depth-first search
• O(n + m) time

Algorithm topologicalDFS(G)
  Input: dag G
  Output: topological ordering of G
  n ← G.numVertices()
  for all u ∈ G.vertices()
    setLabel(u, UNEXPLORED)
  for all e ∈ G.edges()
    setLabel(e, UNEXPLORED)
  for all v ∈ G.vertices()
    if getLabel(v) = UNEXPLORED
      topologicalDFS(G, v)

Algorithm topologicalDFS(G, v)
  Input: graph G and a start vertex v of G
  Output: labeling of the vertices of G in the connected
          component of v
  setLabel(v, VISITED)
  for all e ∈ G.incidentEdges(v)
    if getLabel(e) = UNEXPLORED
      w ← opposite(v, e)
      if getLabel(w) = UNEXPLORED
        setLabel(e, DISCOVERY)
        topologicalDFS(G, w)
      else
        { e is a back or cross edge }
  Label v with topological number n
  n ← n - 1
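The DFS-based numbering can be sketched in Python: each vertex receives the largest unused number as its DFS call finishes. The DAG below is an assumed example:

```python
def topological_dfs(adj):
    """adj: dict vertex -> list of successors in a DAG.
    Returns a dict vertex -> topological number (1..n)."""
    n = len(adj)
    label = {}
    visited = set()

    def dfs(v):
        nonlocal n
        visited.add(v)
        for w in adj[v]:
            if w not in visited:
                dfs(w)
        label[v] = n    # v finishes: give it the largest free number
        n -= 1

    for v in adj:
        if v not in visited:
            dfs(v)
    return label

dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
order = topological_dfs(dag)
# Every edge (u, v) must satisfy order[u] < order[v]
print(all(order[u] < order[v] for u in dag for v in dag[u]))  # True
```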
Topological Sorting Algorithm 2

• Note: this algorithm is different from the one in
  Goodrich-Tamassia (previous slide)

Algorithm TopologicalSort(G)
  H ← G                        { temporary copy of G }
  n ← G.numVertices()
  while H is not empty do
    Let v be a vertex with no outgoing edges
    Label v ← n
    n ← n - 1
    Remove v from H

• Running time: O(n + m)
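A Python sketch of this second algorithm: repeatedly pick a vertex with no outgoing edges (a sink, which must come last), label it with the largest remaining number, and remove it. The DAG is an assumed example; note this naive version does not achieve the O(n + m) bound, which requires extra bookkeeping of out-degrees:

```python
def topological_sort_sink(adj):
    """adj: dict vertex -> set of successors in a DAG.
    Returns a dict vertex -> topological number (1..n)."""
    H = {v: set(ws) for v, ws in adj.items()}  # temporary copy of G
    n = len(H)
    label = {}
    while H:
        # a vertex with no outgoing edges must come last
        v = next(u for u, ws in H.items() if not ws)
        label[v] = n
        n -= 1
        del H[v]
        for ws in H.values():
            ws.discard(v)      # removing v also removes edges into v
    return label

dag = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
order = topological_sort_sink(dag)
print(all(order[u] < order[v] for u in dag for v in dag[u]))  # True
```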


Topological Sorting: Using DFS Example
(Many Possible Solutions)

[Figures: a DAG on vertices 1-9, traversed step by step
starting from a top vertex. As each DFS call finishes, its
vertex receives the largest unused number, so the labels
appear in the order 9, 8, 7, 6, 5, 4, 3, 2, 1, yielding the
topological ordering 1 2 3 4 5 6 7 8 9]
Exercise

• Perform a topological sort for the following digraph

DFS/BFS vs Topological

• Unlike DFS/BFS, topological sort sets a requirement on
  the order of the components (i.e. almost like a
  prerequisite or pre-condition)
• Examples of applications
  ◦ Scheduling/timetabling
  ◦ Code compilation (i.e. dependency checking)
SORTING ALGORITHMS

• Insertion Sort
INSERTION SORT

• Assume that we have distinct elements as input:

6 3 4 2 5 1

• We want to obtain as output all n elements sorted in
  order:

1 2 3 4 5 6
INSERTION SORT: PSEUDOCODE

• Maintain a growing sorted list. For each element, put
  it into its correct position in the growing list

InsertionSort(A):
    for i in range(1, len(A)):
        cur_value = A[i]
        j = i - 1
        while j >= 0 and A[j] > cur_value:
            A[j+1] = A[j]
            j -= 1
        A[j+1] = cur_value

Trace of one outer iteration on A = [3, 2, 5, 1, 4]
(indices 0 to 4):

Initially i = 1, cur_value = A[1], i.e. 2, and j = 1 - 1 = 0.
In the while loop: j >= 0 and A[0] = 3 > 2, so A[1] = A[0] = 3
and j = j - 1, i.e. 0 - 1 = -1; the values swap.
The loop exits and A[j+1] = A[0] = 2, giving A = [2, 3, 5, 1, 4].
What is the next value?
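The pseudocode runs unchanged as Python; here it is applied to the traced array [3, 2, 5, 1, 4]:

```python
def insertion_sort(A):
    """Sort list A in place, ascending, and return it."""
    for i in range(1, len(A)):
        cur_value = A[i]
        j = i - 1
        while j >= 0 and A[j] > cur_value:
            A[j + 1] = A[j]   # shift larger elements one slot right
            j -= 1
        A[j + 1] = cur_value  # drop cur_value into the gap
    return A

print(insertion_sort([3, 2, 5, 1, 4]))  # [1, 2, 3, 4, 5]
```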
Example of insertion sort

8 2 4 9 3 6

2 8 4 9 3 6

2 4 8 9 3 6

2 4 8 9 3 6

2 3 4 8 9 6

2 3 4 6 8 9   done
INSERTION SORT

• Algorithm designers always start by checking the
  algorithm on simple inputs to test that it works
• But this is not enough
• We need to prove that it works for any possible input
  list
• Insertion sort involves a loop, so it is an iterative
  algorithm. What does each iteration promise?
  ◦ By the end of each iteration i, the first i+1 elements
    in the array are sorted.
• We can apply Proof by Induction to show that the
  algorithm works as intended
INSERTION SORT: INDUCTION PROOF

• We assume the sorting is in ascending order
• Inductive hypothesis (IH):
  ◦ After the i-th iteration of the outer for loop, the
    prefix A[:i+1] is sorted
• Base case: after 0 iterations of the outer for loop, the
  list A[:1] is sorted (only 1 element). Thus, the
  inductive hypothesis holds for i = 0
INSERTION SORT: INDUCTION STEP
Let k be an integer, where 0 < k < n.

Assume that the IH holds for i = k-1, so A[:k] is sorted after the (k-1)-th iteration.

We want to show that the IH holds for i = k, i.e. that A[:k+1] is sorted after the k-th iteration.
Let j* be the largest position in {0, ..., k-1} such that A[j*] < A[k]. Then, the effect of the inner
while loop is to turn

[ A[0], A[1], ..., A[j*], ..., A[k-1], A[k] ]  into  [ A[0], A[1], ..., A[j*], A[k], A[j*+1], ..., A[k-1] ]

We claim that the second list is sorted.
This is because A[k] > A[j*], and by the inductive hypothesis we have A[j*] ≥ A[j] for all j ≤ j*,
so A[k] is larger than everything positioned before it. Similarly, we also know that
A[k] ≤ A[j*+1] ≤ A[j] for all j ≥ j*+1, so A[k] is also smaller than everything that comes after it.
Thus, A[k] is in the right place, and all the other elements in A[:k+1] were already in the right place.
Thus, after the k-th iteration completes, A[:k+1] is sorted, and this establishes the IH for k.

CONCLUSION
By induction, we conclude that the IH holds for all 0 ≤ i ≤ n-1. In particular, after the algorithm ends, A[:n] is sorted.
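The inductive hypothesis can also be checked mechanically: after iteration i, the prefix A[:i+1] must be sorted. This instrumented sketch asserts that invariant on every iteration (a sanity check on concrete inputs, not a substitute for the proof):

```python
def insertion_sort_checked(A):
    """Insertion sort with the loop invariant asserted each iteration."""
    for i in range(1, len(A)):
        cur_value = A[i]
        j = i - 1
        while j >= 0 and A[j] > cur_value:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = cur_value
        # inductive hypothesis: the first i+1 elements are now sorted
        assert A[:i + 1] == sorted(A[:i + 1])
    return A

print(insertion_sort_checked([8, 2, 4, 9, 3, 6]))  # [2, 3, 4, 6, 8, 9]
```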
INSERTION SORT
• We now have the induction proof that this design works.
• Next, we want to know: is it fast?
• To find out, we need to analyse:
  ◦ How many iterations take place?
  ◦ How much work happens within each iteration?

InsertionSort(A):
    for i in range(1, len(A)):          # runs up to n iterations
        cur_value = A[i]
        j = i - 1
        while j >= 0 and A[j] > cur_value:
            A[j+1] = A[j]               # the while loop runs up to n
            j -= 1                      # iterations in the worst case
        A[j+1] = cur_value

• Thus, n * n gives us O(n^2)


Can we perform better and faster?

• The time complexity of insertion sort is O(n^2).
• Designers improve sorting time by using different kinds
  of design techniques
• E.g.:
  ◦ Divide and conquer
  ◦ Dynamic programming
  ◦ etc.
