DAA Notes
Analysis of Algorithm | Set 5 (Amortized Analysis Introduction)
Amortized Analysis is used for algorithms where an occasional operation is very slow, but most of the
other operations are faster. In Amortized Analysis, we analyze a sequence of operations and
guarantee a worst case average time which is lower than the worst case time of a particular expensive
operation.
The example data structures whose operations are analyzed using Amortized Analysis are Hash
Tables, Disjoint Sets and Splay Trees.
Let us consider the example of insertions into a simple dynamic hash table. How do we decide the table size? There is a trade-off between space and time: if we make the hash-table size big, search time becomes fast, but the space required becomes high.
If the table has space available, we simply insert the new item in the available space. If the table is full, we allocate a new table of double the size, copy the existing items over, and then insert the new item. The occasional copy is expensive, but averaged over a sequence of n insertions the amortized cost per insertion is O(1).
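As a minimal sketch (not from the original notes), the following Python code illustrates the doubling strategy used in this amortized-analysis example; the class name DynamicTable and the initial capacity of 1 are illustrative assumptions.

# A minimal sketch of a dynamic table whose insertions are amortized O(1).
# The occasional resize copies all items (expensive), but doubling the
# capacity keeps the total copy work over n insertions at O(n).
class DynamicTable:
    def __init__(self):
        self.capacity = 1          # assumed initial capacity
        self.size = 0
        self.slots = [None] * self.capacity

    def insert(self, item):
        if self.size == self.capacity:          # table full: expensive step
            self.capacity *= 2                  # double the size
            new_slots = [None] * self.capacity
            new_slots[:self.size] = self.slots  # copy existing items
            self.slots = new_slots
        self.slots[self.size] = item            # cheap common case
        self.size += 1

table = DynamicTable()
for x in range(10):
    table.insert(x)   # a few inserts trigger a copy, most take O(1)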
UNIT: 3
Disjoint Set Data Structure
In computing, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets.
It supports the following useful operations:
• Find: Determine which subset a particular element is in. Find typically returns an item from this set that serves as its "representative"; by comparing the result of two Find operations, one can determine whether two elements are in the same subset.
• Union: Join two subsets into a single subset.
• Make Set: Make a set containing only a given element (a singleton).
Applications:
• Partitioning of a set.
• Implementing Kruskal's algorithm to find the minimum spanning tree of a graph.
• Determining the connected components of an undirected graph.
Disjoint Set Operations, Linked List Representation
• A disjoint-set data structure maintains a collection S = {S1, S2, …, Sk} of distinct dynamic sets.
• Each set is identified by a member of the set, called the representative.
Disjoint set operations:
– MAKE-SET(x): create a new set containing only x. Assume x is not already in some other set.
– UNION(x, y): combine the two sets containing x and y into one new set. A new representative is selected.
– FIND-SET(x): return the representative of the set containing x.
Application: Determine the connected components of an undirected graph.
Linked list Representation
• Each set is represented as a linked list with head and tail pointers; each node contains a value, a next-node pointer, and a back-to-representative pointer.
Figure 21.2 (a) Linked-list representations of two sets. Set S1 contains members d, f, and g, with representative f, and set S2 contains members b, c, e, and h, with representative c. Each object in the list contains a set member, a pointer to the next object in the list, and a pointer back to the set object. Each set object has pointers head and tail to the first and last objects, respectively. (b) The result of UNION(g, e), which appends the linked list containing e to the linked list containing g. The representative of the resulting set is f. The set object for e's list, S2, is destroyed.
The total time spent in all UNION operations is thus O(n lg n). Each MAKE-SET and FIND-SET operation takes O(1) time, and there are O(m) of them. The total time for the entire sequence is thus O(m + n lg n).
CONNECTED-COMPONENTS(G)
1. for each vertex v ∈ V[G]
2.     MAKE-SET(v)
3. for each edge (u,v) ∈ E[G]
4.     if FIND-SET(u) ≠ FIND-SET(v)
5.         then UNION(u,v)

SAME-COMPONENT(u,v)
1. if FIND-SET(u) = FIND-SET(v)
2.     then return TRUE
3. else return FALSE
The procedure CONNECTED-COMPONENTS initially places each vertex in its own set. Then, for each edge (u,v), it unites the sets containing u and v. By Exercise 21.1-2, after processing all the edges, two vertices are in the same connected component if and only if the corresponding objects are in the same set. Thus, CONNECTED-COMPONENTS computes sets in such a way that the procedure SAME-COMPONENT can determine whether two vertices are in the same connected component. Figure 21.1(b) illustrates how CONNECTED-COMPONENTS computes the disjoint sets.
Disjoint Forests
In a faster implementation of disjoint sets, we represent sets by rooted trees, with each node containing one member and each tree representing one set. In a disjoint-set forest, illustrated in Figure 21.4(a), each member points only to its parent. The root of each tree contains the representative and is its own parent. As we shall see, although the straightforward algorithms that use this representation are no faster than ones that use the linked-list representation, by introducing two heuristics, "union by rank" and "path compression", we can achieve an asymptotically optimal disjoint-set data structure.
Union by Rank & Path Compression
• Union by Rank: Each node is associated with a rank, which is an upper bound on the height of the node (i.e., the height of the subtree rooted at the node). During a UNION, the root with the smaller rank is made to point to the root with the larger rank.
• Path Compression: Used in the FIND-SET(x) operation; make each node on the path from x to the root point directly to the root, thus reducing the tree height.
Pseudocode for disjoint-set forests
To implement a disjoint-set forest with the union-by-rank heuristic, we must keep track of ranks. With
each node x, we maintain the integer value x.rank, which is an upper bound on the height of x. When
MAKE-SET creates a singleton set, the single node in the corresponding tree has an initial rank of 0. Each
FIND-SET operation leaves all ranks unchanged. The UNION operation has two cases, depending on
whether the roots of the trees have equal rank. If the roots have unequal rank, we make the root with higher
rank the parent of the root with lower rank, but the ranks themselves remain unchanged. If, instead, the
roots have equal ranks, we arbitrarily choose one of the roots as the parent and increment its rank.
Let us put this method into pseudocode. We designate the parent of node x by x.p. The LINK procedure, a subroutine called by UNION, takes pointers to two roots as inputs. The FIND-SET procedure is a two-pass method: as it recurses, it makes one pass up the find path to find the root, and as the recursion unwinds, it makes a second pass back down the find path to update each node to point directly to the root. Each call of FIND-SET(x) returns x.p in line 3. If x is the root, then FIND-SET skips line 2 and instead returns x.p, which is x; this is the case in which the recursion bottoms out. Otherwise, line 2 executes, and the recursive call with parameter x.p returns a pointer to the root. Line 2 updates node x to point directly to the root, and line 3 returns this pointer.
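The MAKE-SET, UNION, LINK and FIND-SET pseudocode referred to above is only in the scanned figures; the following Python sketch, under the assumption that each element is a small node object with p and rank attributes, shows one way the same two heuristics can be implemented.

# A sketch of a disjoint-set forest with union by rank and path compression.
# Node, make_set, find_set, link and union mirror the roles of the procedures
# described above; the names themselves are illustrative.
class Node:
    def __init__(self, value):
        self.value = value
        self.p = self        # a new root is its own parent
        self.rank = 0        # rank of a singleton is 0

def make_set(value):
    return Node(value)

def find_set(x):
    # Two-pass method: recurse up to the root, then point x directly at it.
    if x is not x.p:
        x.p = find_set(x.p)  # path compression
    return x.p

def link(x, y):
    # Union by rank: the root of smaller rank points to the root of larger rank.
    if x.rank > y.rank:
        y.p = x
    else:
        x.p = y
        if x.rank == y.rank:
            y.rank += 1      # equal ranks: pick y as parent and increment its rank

def union(x, y):
    link(find_set(x), find_set(y))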
Graph algorithms
A Graph is a non-linear data structure consisting of nodes and edges. The nodes are sometimes also
referred to as vertices and the edges are lines or arcs that connect any two nodes in the graph. More
formally, a graph can be defined as: a graph consists of a finite set of vertices (or nodes) and a set of edges, each of which connects a pair of nodes.
Types of graphs
Undirected: An undirected graph is a graph in which all the edges are bi-directional i.e. the edges do not
point in any specific direction.
Directed: A directed graph is a graph in which all the edges are uni-directional i.e. the edges point in a
single direction.
Weighted: In a weighted graph, each edge is assigned a weight or cost.
Cyclic: A graph is cyclic if the graph comprises a path that starts from a vertex and ends at the same vertex.
That path is called a cycle. An acyclic graph is a graph that has no cycle.
Adjacency-list representation of a graph
Example :
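The original example here is a figure from the scanned pages; as a stand-in, the following Python sketch shows an adjacency-list representation of a small undirected graph (the specific vertices and edges are illustrative).

from collections import defaultdict

# Adjacency-list representation: each vertex maps to the list of its neighbours.
graph = defaultdict(list)

def add_edge(u, v):
    # For an undirected graph, store the edge in both adjacency lists.
    graph[u].append(v)
    graph[v].append(u)

add_edge('a', 'b')
add_edge('a', 'c')
add_edge('b', 'c')

print(dict(graph))   # {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}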
The procedure BFS works as follows. With the exception of the source vertex s, lines 1–4 paint every vertex
white, set u.d to be infinity for each vertex u, and set the parent of every vertex to be NIL. Line 5 paints s gray,
since we consider it to be discovered as the procedure begins. Line 6 initializes s.d to 0, and line 7 sets the
predecessor of the source to be NIL.
Lines 8–9 initialize Q to the queue containing just the vertex s. The while loop of lines 10–18 iterates as long as
there remain gray vertices, which are discovered vertices that have not yet had their adjacency lists fully
examined.
Line 11 determines the gray vertex u at the head of the queue Q and removes it from Q. The for loop of lines 12–17 considers each vertex v in the adjacency list of u. If v is white, then it has not yet been discovered, and the procedure discovers it by executing lines 14–17. The procedure paints vertex v gray, sets its distance v.d to u.d + 1, records u as its parent v.π, and places it at the tail of the queue Q. Once the procedure has examined all the vertices on u's adjacency list, it blackens u in line 18. The loop invariant is maintained because whenever a vertex is painted gray (in line 14) it is also enqueued (in line 17), and whenever a vertex is dequeued (in line 11) it is also painted black (in line 18).
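The BFS pseudocode (lines 1–18) referenced above appears only in the scanned figures; a minimal Python sketch of the same procedure, assuming an adjacency-list dictionary like the one shown earlier, is:

from collections import deque

def bfs(adj, s):
    # dist and parent play the roles of u.d and u.pi; the visited set stands
    # in for the gray/black colouring described above.
    dist = {v: float('inf') for v in adj}
    parent = {v: None for v in adj}
    dist[s] = 0
    q = deque([s])                      # queue initially holds only the source
    visited = {s}
    while q:
        u = q.popleft()                 # take the vertex at the head of the queue
        for v in adj[u]:
            if v not in visited:        # v is still "white": discover it
                visited.add(v)
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)             # enqueue at the tail
    return dist, parent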
Analysis of BFS
The operations of enqueuing and dequeuing take O(1) time, and so the total time devoted to queue
operations is O(V). Because the procedure scans the adjacency list of each vertex only when the vertex
is dequeued, it scans each adjacency list at most once. The total time spent in scanning adjacency lists is
O(E). The total running time of the BFS procedure is O(V + E).
Example
Figure: The operation of BFS on an undirected graph. Tree edges are shown shaded as they are produced by BFS.
The value of u.d appears within each vertex u. The queue Q is shown at the beginning of each iteration of the while
loop of lines 10–18. Vertex distances appear below vertices in the queue.
Depth-first search
The strategy followed by depth-first search is, as its name implies, to search "deeper" in the graph whenever possible. Depth-first search explores edges out of the most recently discovered vertex v that still has unexplored edges leaving it. Once all of v's edges have been explored, the search "backtracks" to explore edges leaving the vertex from which v was discovered. This process continues until we have discovered all the vertices that are reachable from the original source vertex. If any undiscovered vertices remain, then depth-first search selects one of them as a new source, and it repeats the search from that source. The algorithm repeats this entire process until it has discovered every vertex.
Procedure DFS works as follows. Lines 1–3 paint all vertices white and initialize their π attributes to NIL. Line 4 resets the global time counter. Lines 5–7 check each vertex in V in turn and, when a white vertex is found, visit it using DFS-VISIT. Every time DFS-VISIT(G, u) is called in line 7, vertex u becomes the root of a new tree in the depth-first forest. When DFS returns, every vertex u has been assigned a discovery time u.d and a finishing time u.f. In each call DFS-VISIT(G, u), vertex u is initially white. Line 1 increments the global variable time, line 2 records the new value of time as the discovery time u.d, and line 3 paints u gray. Lines 4–7 examine each vertex v adjacent to u and recursively visit v if it is white. As each vertex v ∈ Adj[u] is considered in line 4, we say that edge (u, v) is explored by the depth-first search. Finally, after every edge leaving u has been explored, lines 8–10 paint u black, increment time, and record the finishing time in u.f.
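As with BFS, the DFS and DFS-VISIT pseudocode itself is in the scanned figures; a minimal Python sketch with discovery and finishing times, assuming the same adjacency-list dictionary, is:

def dfs(adj):
    color = {v: 'white' for v in adj}
    parent = {v: None for v in adj}
    d, f = {}, {}                      # discovery and finishing times
    time = 0

    def dfs_visit(u):
        nonlocal time
        time += 1
        d[u] = time                    # u has just been discovered
        color[u] = 'gray'
        for v in adj[u]:               # explore each edge (u, v)
            if color[v] == 'white':
                parent[v] = u
                dfs_visit(v)
        color[u] = 'black'             # finished: all edges of u explored
        time += 1
        f[u] = time

    for u in adj:                      # restart from every undiscovered vertex
        if color[u] == 'white':
            dfs_visit(u)
    return d, f, parent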
Analysis of DFS: The total running time of the DFS procedure is Θ(V + E).
Example:
Figure: The progress of the depth-first-search algorithm DFS on a directed graph. As edges are explored by the
algorithm, they are shown as either shaded (if they are tree edges) or dashed (otherwise). Nontree edges are labeled
B, C, or F according to whether they are back, cross, or forward edges. Timestamps within vertices indicate
discovery time/finishing times.
Minimum Spanning Trees
Given an undirected and connected graph G= (V,E), a spanning tree of the graph G is a tree that spans G
(that is, it includes every vertex of G) and is a subgraph of G(every edge in the tree belongs to G).
The cost of the spanning tree is the sum of the weights of all the edges in the tree. There can be many
spanning trees. Minimum spanning tree is the spanning tree where the cost is minimum among all the
spanning trees. There also can be many minimum spanning trees. Minimum spanning tree has direct
application in the design of networks.
Given a connected, undirected graph G = (V, E) with a weight w(u, v) on each edge (u, v), we wish to find an acyclic subset T ⊆ E that connects all of the vertices and whose total weight w(T) = Σ(u,v)∈T w(u, v) is minimized. Since T is acyclic and connects all of the vertices, it must form a tree, which we call a spanning tree since it "spans" the graph G. We call the problem of determining the tree T the minimum-spanning-tree problem.
Electronic circuit designs often need to make the pins of several components electrically equivalent by
wiring them together. To interconnect a set of n pins, we can use an arrangement of n-1 wires, each
connecting two pins. Of all such arrangements, the one that uses the least amount of wire is usually the
most desirable.
In this chapter, we shall examine two algorithms for solving the minimum-spanning-tree problem: Kruskal's algorithm and Prim's algorithm.
Kruskal’s Algorithm
Kruskal's Algorithm builds the spanning tree by adding edges one by one to a growing spanning tree. Kruskal's algorithm follows a greedy approach: in each iteration it finds the edge with the least weight and adds it to the growing spanning tree.
Algorithm Steps:
➢ Sort the graph edges with respect to their weights.
➢ Start adding edges to the MST from the edge with the smallest weight until the edge of the largest
weight.
➢ Only add edges which don't form a cycle, i.e., edges which connect only disconnected components.
Example:
Figure: The execution of Kruskal's algorithm. Shaded edges belong to the forest A being grown. The algorithm considers each edge in sorted order by weight. An arrow points to the edge under consideration at each step of the algorithm. If the edge joins two distinct trees in the forest, it is added to the forest, thereby merging the two trees.
Kruskal’s uses a disjoint-set data structure to maintain several disjoint sets of elements. Each set contains
the vertices in one tree of the current forest. The operation FIND-SET(u) returns a representative element
from the set that contains u. Thus, we can determine whether two vertices u and v belong to the same tree
by testing whether FIND-SET(u) equals FIND-SET(v). To combine trees, Kruskal’s algorithm calls the
UNION procedure.
Lines 1–3 initialize the set A to the empty set and create |V| trees, one containing each vertex. The for loop
in lines 5–8 examines edges in order of weight, from lowest to highest. The loop checks, for each edge(u,
v), whether the endpoints u and v belong to the same tree. If they do, then the edge (u, v) cannot be added
to the forest without creating a cycle, and the edge is discarded. Otherwise, the two vertices belong to
different trees. In this case, line 7 adds the edge (u, v) to A, and line 8 merges the vertices in the two trees.
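The MST-KRUSKAL pseudocode whose lines are described above is in the scanned figures; a Python sketch of the same idea, reusing the disjoint-set forest from Unit 3 (make_set, find_set and union are assumed to be the functions defined earlier), is:

def kruskal(vertices, edges):
    # edges is a list of (weight, u, v) tuples.
    nodes = {v: make_set(v) for v in vertices}     # one singleton tree per vertex
    mst = []                                       # the set A of chosen edges
    for w, u, v in sorted(edges):                  # examine edges by increasing weight
        if find_set(nodes[u]) is not find_set(nodes[v]):
            mst.append((u, v, w))                  # safe edge: joins two distinct trees
            union(nodes[u], nodes[v])              # merge the two trees
    return mst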
Time Complexity:
In Kruskal's algorithm, the most time-consuming operation is sorting the edges, which takes O(E lg E) = O(E lg V) time; the disjoint-set operations are asymptotically cheaper, so the overall time complexity of the algorithm is O(E lg V).
Prim’s Algorithm
Prim's Algorithm also uses the greedy approach to find the minimum spanning tree. In Prim's Algorithm we grow the spanning tree from a starting position. Unlike Kruskal's, which adds an edge at each step, in Prim's we add a vertex to the growing spanning tree.
Algorithm Steps:
1) Maintain two disjoint sets of vertices: one containing the vertices that are in the growing spanning tree, and the other containing the vertices that are not.
2) Select the cheapest vertex that is connected to the growing spanning tree and is not already in it, and add it to the growing spanning tree. This can be done using a priority queue: insert the vertices that are connected to the growing spanning tree into the priority queue.
3) Check for cycles. To do that, mark the nodes which have already been selected and insert only those nodes into the priority queue that are not marked.
In the pseudocode, the connected graph G and the root r of the minimum spanning tree to be grown are inputs to the algorithm. During execution of the algorithm, all vertices that are not in the tree reside in a min-priority queue Q based on a key attribute. For each vertex v, the attribute v.key is the minimum weight of any edge connecting v to a vertex in the tree; by convention, v.key = ∞ if there is no such edge. The attribute v.π names the parent of v in the tree. Each vertex is extracted from the min-priority queue and added to the growing set exactly once, so that the while loop of lines 4–8 iterates exactly |V| times.
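The MST-PRIM pseudocode itself is in the scanned figures; a minimal Python sketch using a binary heap as the min-priority queue (heapq with lazy deletion instead of an explicit DECREASE-KEY; adj maps each vertex to (neighbour, weight) pairs) is:

import heapq

def prim(adj, r):
    key = {v: float('inf') for v in adj}
    parent = {v: None for v in adj}
    key[r] = 0
    in_tree = set()
    pq = [(0, r)]                              # min-priority queue keyed by key[v]
    while pq:
        k, u = heapq.heappop(pq)
        if u in in_tree:                       # stale entry left by lazy deletion
            continue
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w                     # cheapest edge connecting v to the tree
                parent[v] = u
                heapq.heappush(pq, (w, v))     # "decrease key" by pushing a new entry
    return parent

Using a heap this way keeps each edge relaxation at O(log V), matching the O(E log V) bound mentioned below.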
Example:
Time Complexity:
Time Complexity of this implementation is O(V²). If the input graph is represented using an adjacency list, it can be reduced to O(E log V) with the help of a binary heap.
Note: Dijkstra’s algorithm doesn’t work for graphs with negative weight edges. For graphs with negative
weight edges, Bellman–Ford algorithm can be used.
Bellman–Ford algorithm
The Bellman-Ford algorithm solves the single-source shortest-paths problem in which edge weights may be negative. It returns a boolean value indicating whether or not there is a negative-weight cycle that is reachable from the source. If there is such a cycle, the algorithm indicates that no solution exists. If there is no such cycle, it produces the shortest paths and their weights.
BELLMAN-FORD(G, w, s)
1 INITIALIZE-SINGLE-SOURCE(G, s)
2 for i = 1 to |G.V| - 1
3     for each edge (u, v) ∈ G.E
4         RELAX(u, v, w)
5 for each edge (u, v) ∈ G.E
6     if v.d > u.d + w(u, v)
7         return FALSE
8 return TRUE
Time Complexity: O(V·E)
Figure 24.4 shows the execution of the Bellman-Ford algorithm on a graph with 5 vertices. After initializing the d and π values of all vertices in line 1, the algorithm makes |V| - 1 passes over the edges of the graph. Each pass is one iteration of the for loop of lines 2–4 and consists of relaxing each edge of the graph once. Figures 24.4(b)–(e) show the state of the algorithm after each of the four passes over the edges. After making |V| - 1 passes, lines 5–8 check for a negative-weight cycle and return the appropriate boolean value. (We'll see a little later why this check works.)
The Bellman-Ford algorithm runs in time O(VE), since the initialization in line 1 takes Θ(V) time, each of the |V| - 1 passes over the edges in lines 2–4 takes Θ(E) time, and the for loop of lines 5–7 takes O(E) time.
To prove the correctness of the Bellman-Ford algorithm, we start by showing
that if there are no negative-weight cycles, the algorithm computes correct
shortest-path weights for all vertices reachable from the source.
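A minimal Python sketch of the procedure above, assuming the graph is given as a vertex list and a list of (u, v, w) edges, is:

def bellman_ford(vertices, edges, s):
    # INITIALIZE-SINGLE-SOURCE: every d is infinity except the source.
    d = {v: float('inf') for v in vertices}
    parent = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):         # |V| - 1 passes
        for u, v, w in edges:                  # relax every edge once per pass
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                parent[v] = u
    for u, v, w in edges:                      # check for a negative-weight cycle
        if d[v] > d[u] + w:
            return False, d, parent            # no solution exists
    return True, d, parent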
The Floyd-Warshall algorithm
The problem is to find the shortest distances between every pair of vertices in a given edge-weighted directed graph.
We initialize the solution matrix the same as the input graph matrix as a first step. Then we update the solution matrix by considering all vertices as intermediate vertices, one by one. The idea is to pick each vertex in turn and update all shortest paths which include the picked vertex as an intermediate vertex. When we pick vertex number k as an intermediate vertex, we have already considered vertices {0, 1, 2, ..., k-1} as intermediate vertices. For every pair (i, j) of source and destination vertices respectively, there are two possible cases:
1) k is not an intermediate vertex in the shortest path from i to j. We keep the value of dist[i][j] as it is.
2) k is an intermediate vertex in the shortest path from i to j. We update the value of dist[i][j] to dist[i][k] + dist[k][j] if dist[i][j] > dist[i][k] + dist[k][j].
The following figure shows the above optimal substructure property in the all-pairs shortest path
problem.
Example 1
The inner loops of the FLOYD-WARSHALL pseudocode compute the recurrence (here D^(k) denotes the matrix of shortest-path weights with intermediate vertices drawn from {1, ..., k}):
5 for i = 1 to n
6     for j = 1 to n
7         d_ij^(k) = min(d_ij^(k-1), d_ik^(k-1) + d_kj^(k-1))
8 return D^(n)
Figure 25.4 shows the matrices D^(k) computed by the Floyd-Warshall algorithm for the graph in Figure 25.1.
The running time of the Floyd-Warshall algorithm is determined by the triply nested for loops of lines 3–7. Because each execution of line 7 takes O(1) time, the algorithm runs in time Θ(n³). As in the final algorithm in Section 25.1, the code is tight, with no elaborate data structures, and so the constant hidden in the Θ-notation is small. Thus, the Floyd-Warshall algorithm is quite practical for even moderate-sized input graphs.
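A minimal Python sketch of the algorithm, assuming the graph is given as an n×n matrix of edge weights with float('inf') where there is no edge and 0 on the diagonal, is:

def floyd_warshall(w):
    n = len(w)
    # Start from the input matrix; dist[i][j] holds the current shortest distance.
    dist = [row[:] for row in w]
    for k in range(n):                 # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

INF = float('inf')
w = [[0, 5, INF, 10],
     [INF, 0, 3, INF],
     [INF, INF, 0, 1],
     [INF, INF, INF, 0]]
print(floyd_warshall(w))    # shortest distances between every pair of vertices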
Figure 25.4 The sequence of matrices D^(k) and Π^(k) computed by the Floyd-Warshall algorithm for the graph in Figure 25.1.
UNIT: 4
In the above figure -> Bounding function: no girl sits between two boys.
Types of Branch and Bound
1. FIFO Branch and Bound Search: This leads to a breadth-first search. For this we will use a data structure called a queue. Initially the queue is empty.
2. LIFO Branch and Bound Search: This leads to a depth-first search. For this we will use a data structure called a stack. Initially the stack is empty.
3. LC (lowest/least cost) Branch and Bound Search: The branch is extended by the node which adds the lowest additional cost, according to a given cost function.
N-Queens Problem
Nauck used an 8×8 chessboard to find the first feasible solution. On an N×N board, N queens need to be placed subject to the condition that no two queens can be placed in the same row, the same column, or the same diagonal. The time complexity of the N-Queen problem is O(n!).
Backtracking Algorithm
The idea is to place queens one by one in different columns, starting from the leftmost column.
When we place a queen in a column, we check for clashes with already placed queens. In the
current column, if we find a row for which there is no clash, we mark this row and column as part
of the solution. If we do not find such a row due to clashes then we backtrack and return false.
1) Start in the leftmost column.
2) If all queens are placed, return true.
3) Try all rows in the current column. Do the following for every tried row:
   a) If the queen can be placed safely in this row, then mark this [row, column] as part of the solution and recursively check if placing the queen here leads to a solution.
   b) If placing the queen in [row, column] leads to a solution, then return true.
   c) If placing the queen doesn't lead to a solution, then unmark this [row, column] (backtrack) and go to step (a) to try other rows.
4) If all rows have been tried and nothing worked, return false to trigger backtracking (see the sketch below).
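A minimal Python sketch of this backtracking procedure, solving column by column and returning one board as a list of row indices, is:

def solve_n_queens(n):
    rows = [-1] * n                      # rows[c] = row of the queen in column c

    def is_safe(row, col):
        # Clash if an earlier queen shares the row or a diagonal.
        for c in range(col):
            if rows[c] == row or abs(rows[c] - row) == col - c:
                return False
        return True

    def place(col):
        if col == n:                     # all queens placed
            return True
        for row in range(n):             # try every row in the current column
            if is_safe(row, col):
                rows[col] = row
                if place(col + 1):       # recurse on the next column
                    return True
                rows[col] = -1           # backtrack
        return False                     # no row worked in this column

    return rows if place(0) else None

print(solve_n_queens(8))   # one valid placement of 8 queens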
Example: 2
W = {4, 5, 6, 3} and M = 13
Recursive backtracking algorithm for Sum of Subset Problem
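The recursive algorithm itself is in the scanned figures; a minimal Python sketch of sum-of-subsets backtracking for the example W = {4, 5, 6, 3}, M = 13 is:

def sum_of_subsets(w, m):
    solutions = []

    def backtrack(i, chosen, remaining):
        s = sum(chosen)
        if s == m:
            solutions.append(list(chosen))       # found a subset with the target sum
            return
        if i == len(w) or s + remaining < m or s > m:
            return                               # bound: cannot reach m from here
        chosen.append(w[i])                      # include w[i]
        backtrack(i + 1, chosen, remaining - w[i])
        chosen.pop()                             # exclude w[i] (backtrack)
        backtrack(i + 1, chosen, remaining - w[i])

    backtrack(0, [], sum(w))
    return solutions

print(sum_of_subsets([4, 5, 6, 3], 13))   # [[4, 6, 3]]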
String Matching
Figure 32.1 An example of the string-matching problem, where we want to find all occurrences of the pattern P = abaa in the text T = abcabaabcabac. The pattern occurs only once in the text, at shift s = 3, which we call a valid shift. A vertical line connects each character of the pattern to its matching character in the text, and all matched characters are shaded.
The naive algorithm finds all valid shifts using a loop that checks the condition P[1..m] = T[s+1..s+m] for each of the n - m + 1 possible values of s.
NAIVE-STRING-MATCHER(T, P)
1 n = T.length
2 m = P.length
3 for s = 0 to n - m
4     if P[1..m] == T[s+1..s+m]
5         print "Pattern occurs with shift" s
txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"
txt[] = "ABABABCABABABCABABABC"
pat[] = "ABABAC" (not a worst case, but a bad case for Naive)
The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more
than once in the pattern) of the pattern and improves the worst case complexity to O(n). The basic idea
behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of
the characters in the text of the next window. We take advantage of this information to avoid matching the
characters that we know will anyway match. Let us consider below example to understand this.
Matching Overview
txt = "AAAAABAAABA"
pat = "AAAA"
Need of Preprocessing?
An important question arises from the above explanation: how do we know how many characters to skip? To know this, we pre-process the pattern and prepare an integer array lps[] that tells us the count of characters to be skipped.
Preprocessing Overview:
KMP algorithm preprocesses pat[] and constructs an auxiliary lps[] of size m (same as size of pattern)
which is used to skip characters while matching.
The name lps indicates the longest proper prefix which is also a suffix. A proper prefix is a prefix that is not the whole string. For example, prefixes of "ABC" are "", "A", "AB" and "ABC". Proper prefixes are "", "A" and "AB". Suffixes of the string are "", "C", "BC" and "ABC".
We search for lps in sub-patterns. More precisely, we focus on sub-strings of the pattern that are both a prefix and a suffix.
For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper
prefix which is also a suffix of the sub-pattern pat[0..i].
lps[i] = the longest proper prefix of pat[0..i]
which is also a suffix of pat[0..i].
Note: lps[i] could also be defined as the longest prefix which is also a proper suffix. We need to use "proper" in one of the two places to make sure that the whole substring is not considered.
Searching Algorithm:
Unlike the Naive algorithm, where we slide the pattern by one and compare all characters at each shift, we use a value from lps[] to decide the next characters to be matched. The idea is to not match a character that we know will anyway match.
How to use lps[] to decide the next positions (or to know the number of characters to be skipped)?
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
lps[] = {0, 1, 2, 3}
i = 0, j = 0
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++

i = 1, j = 1
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++

i = 2, j = 2
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++

i = 3, j = 3
txt[] = "AAAAABAAABA"
pat[] = "AAAA"
txt[i] and pat[j] match, do i++, j++

i = 4, j = 4
Since j == M, print pattern found and reset j,
j = lps[j-1] = lps[3] = 3
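A minimal Python sketch of the lps construction and the KMP search described above is:

def compute_lps(pat):
    # lps[i] = length of the longest proper prefix of pat[0..i] that is also a suffix.
    lps = [0] * len(pat)
    length = 0                     # length of the previous longest prefix-suffix
    i = 1
    while i < len(pat):
        if pat[i] == pat[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]   # fall back without advancing i
        else:
            lps[i] = 0
            i += 1
    return lps

def kmp_search(txt, pat):
    lps = compute_lps(pat)
    matches, i, j = [], 0, 0
    while i < len(txt):
        if txt[i] == pat[j]:
            i += 1
            j += 1
            if j == len(pat):          # full match found
                matches.append(i - j)
                j = lps[j - 1]         # continue searching for further occurrences
        elif j != 0:
            j = lps[j - 1]             # skip characters known to match
        else:
            i += 1
    return matches

print(kmp_search("AAAAABAAABA", "AAAA"))   # [0, 1]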
The Rabin-Karp algorithm. Each character is a decimal digit, and we compute values modulo 13.
(a) A text string. A window of length 5 is shown shaded. The numerical value of the shaded
number, computed modulo 13, yields the value 7. (b) The same text string with values computed
modulo 13 for each possible position of a length-5 window. Assuming the pattern P = 31415, we look for windows whose value modulo 13 is 7, since 31415 ≡ 7 (mod 13). The algorithm finds two such windows, shown shaded in the figure. The first, beginning at text position 7, is indeed an occurrence of the pattern, while the second, beginning at text position 13, is a spurious hit.
Time complexity:
The time complexity is O(m + n) when there are no spurious hits, but in the worst case (many spurious hits) it is O(mn).
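A minimal Python sketch of Rabin-Karp with a rolling hash (here over decimal digits, modulo a small prime such as 13 to match the example above) is:

def rabin_karp(txt, pat, base=10, q=13):
    n, m = len(txt), len(pat)
    h = pow(base, m - 1, q)            # value of the high-order digit position
    p_hash = 0                         # hash of the pattern
    t_hash = 0                         # rolling hash of the current window
    for i in range(m):
        p_hash = (base * p_hash + int(pat[i])) % q
        t_hash = (base * t_hash + int(txt[i])) % q
    matches = []
    for s in range(n - m + 1):
        if p_hash == t_hash:                       # hash hit: may be spurious
            if txt[s:s + m] == pat:                # verify character by character
                matches.append(s)
        if s < n - m:                              # roll the window one position
            t_hash = (base * (t_hash - int(txt[s]) * h) + int(txt[s + m])) % q
    return matches

print(rabin_karp("2359023141526739921", "31415"))  # [6]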
Can all computational problems be solved by a computer? There are computational problems that cannot be solved by algorithms even with unlimited time. For example, the Turing Halting problem (given a program and an input, will the program eventually halt when run with that input, or will it run forever?). Alan Turing proved that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist. A key part of the proof is that the Turing machine was used as a mathematical definition of a computer and program (source: Halting Problem).
The status of NP-complete problems is another failure story: NP-complete problems are problems whose status is unknown. No polynomial-time algorithm has yet been discovered for any NP-complete problem, nor has anybody yet been able to prove that no polynomial-time algorithm can exist for any of them. The interesting part is that if any one of the NP-complete problems can be solved in polynomial time, then all of them can be.
NP-complete problems are the hardest problems in the NP set. A decision problem L is NP-complete if:
1) L is in NP (any given solution for an NP-complete problem can be verified quickly, but there is no known efficient way to find a solution).
2) Every problem in NP is reducible to L in polynomial time (reduction is defined below).
Learning reduction in general is very important. For example, if we have library functions to solve a certain problem and we can reduce a new problem to one of the solved problems, we save a lot of time. Consider the example of a problem where we have to find the minimum product path in a given directed graph, where the product of a path is the multiplication of the weights of the edges along the path. If we have code for Dijkstra's algorithm to find the shortest path, we can take the log of all weights and use Dijkstra's algorithm to find the minimum product path rather than writing fresh code for this new problem.
From the definition of NP-complete, it appears impossible to prove that a problem L is NP-complete. By definition, it requires us to show that every problem in NP is polynomial-time reducible to L. Fortunately, there is an alternate way to prove it. The idea is to take a known NP-complete problem and reduce it to L. If polynomial-time reduction is possible, we can prove that L is NP-complete by transitivity of reduction (if an NP-complete problem is reducible to L in polynomial time, then all problems in NP are reducible to L in polynomial time).
It is always useful to know about NP-completeness, even for engineers. Suppose you are asked to write an efficient algorithm to solve an extremely important problem for your company. After a lot of thinking, you can only come up with an exponential-time approach, which is impractical. If you don't know about NP-completeness, you can only say that you could not come up with an efficient algorithm. If you know about NP-completeness and can prove that the problem is NP-complete, you can instead say that a polynomial-time solution is unlikely to exist; if a polynomial-time solution were possible, it would settle a major open problem of computer science that scientists have been attacking for years.
We will soon be discussing more NP-complete problems and their proofs of NP-completeness.
References:
MIT Video Lecture on Computational Complexity
Introduction to Algorithms, 3rd Edition, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
P, NP-Complete, NP, and NP-Hard (in Short)
NP problems have their own significance in programming, but the discussion becomes quite hot
when we deal with differences between NP, P, NP-Complete and NP-hard.
P (polynomial-time solvable): Problems which can be solved in polynomial time, taking time like O(n), O(n²), O(n³). Examples: finding the maximum element in an array, or checking whether a string is a palindrome. There are many problems which can be solved in polynomial time.
NP (nondeterministic polynomial time): Problems which, as far as we know, cannot be solved in polynomial time, like TSP (the travelling salesman problem); an easy example is the subset sum problem: given a set of numbers, does there exist a subset whose sum is zero? But NP problems are checkable in polynomial time, meaning that given a candidate solution, we can check whether it is correct in polynomial time.
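As a small illustration of polynomial-time verification (not from the original notes), checking a claimed subset-sum certificate takes only linear time, even though finding one may not:

def verify_subset_sum(numbers, candidate_subset):
    # Polynomial-time check of a claimed solution: every chosen element must
    # come from the given multiset and the chosen elements must sum to zero.
    multiset = list(numbers)
    for x in candidate_subset:
        if x in multiset:
            multiset.remove(x)
        else:
            return False
    return len(candidate_subset) > 0 and sum(candidate_subset) == 0

print(verify_subset_sum([3, -9, 2, 7, -5], [3, 2, -5]))   # True: 3 + 2 - 5 == 0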
Now we will discuss NP-Complete and NP-hard. Take two problems A and B, both of which are NP problems.
Reducibility: If we can convert any instance of problem A into an instance of problem B (an NP problem) in polynomial time, then A is reducible to B.
NP-hard: Now suppose we find that A is reducible to B; then it means that B is at least as hard as A.
NP-Complete: The group of problems which are both in NP and NP-hard are known as NP-Complete problems.
Now suppose we have an NP-Complete problem R that is reducible to Q; then Q is at least as hard as R, and since R is NP-hard, Q will also be at least NP-hard (it may be NP-Complete as well).
Approximation Algorithm
Given an optimization problem P, an algorithm A is said to be an approximation algorithm for P, if
for any given instance I, it returns an approximate solution that is a feasible solution.
Approximation algorithms
1. Guaranteed to run in polynomial time.
2. Guaranteed to get a solution which is close to the optimal solution.
3. Obstacle: need to prove a solution’s value is close to optimum value, without even knowing what
the optimum value is!
Types of approximation
P: an optimization problem; A: an approximation algorithm; I: an instance of P;
A*(I): optimal value for the instance I; A(I): value for the instance I generated by A.
1. Absolute approximation: A is an absolute approximation algorithm if there exists a constant k such that, for every instance I of P, |A*(I) − A(I)| ≤ k. Example: planar graph coloring.
2. Relative approximation: A is a relative approximation algorithm if there exists a constant k such that, for every instance I of P, max{A*(I)/A(I), A(I)/A*(I)} ≤ k. Example: vertex cover.
The full walk of the above tree would be 1-2-1-4-1-3-1. Applying the Hamiltonian-cycle shortcut, the output is 1-2-4-3-1, with cost = 10 + 25 + 30 + 15 = 80. In this case, the approximate algorithm produces the optimal tour, but it may not produce the optimal tour in all cases.
A full walk lists all vertices when they are first visited in preorder, and it also lists vertices when they are returned to after a subtree is visited in preorder. The full walk of the above tree would be 1-2-1-4-1-3-1.
Following are some important facts that prove the 2-approximateness.
1) The cost of the best possible Travelling Salesman tour is never less than the cost of the MST. (The definition of MST says it is a minimum-cost tree that connects all vertices.)
2) The total cost of the full walk is at most twice the cost of the MST (every edge of the MST is visited at most twice).
3) The output of the above algorithm is at most the cost of the full walk. In the above algorithm, we print the preorder walk as output. In the preorder walk, two or more edges of the full walk are replaced with a single edge. For example, 2-1 and 1-4 are replaced by the single edge 2-4. So if the graph satisfies the triangle inequality, then this is always true.
From the above three statements, we can conclude that the cost of the output produced by the approximate algorithm is never more than twice the cost of the best possible solution.
A Hamiltonian cycle, also called a Hamiltonian circuit, Hamilton cycle, or Hamilton circuit, is a graph cycle (i.e., closed loop) through a graph that visits each node exactly once. Determining whether such paths and cycles exist in graphs is the Hamiltonian path problem, which is NP-complete.
Example: 2
Time Complexity of APPROX-TSP-TOUR: Θ(V²)
Figure 35.2 The operation of APPROX-TSP-TOUR. (a) A complete undirected graph. Vertices lie on intersections of integer grid lines. For example, f is one unit to the right and two units up from h. The cost function between two points is the ordinary Euclidean distance. (b) A minimum spanning tree T of the complete graph, as computed by MST-PRIM. Vertex a is the root vertex. Only edges in the minimum spanning tree are shown. The vertices happen to be labeled in such a way that they are added to the main tree by MST-PRIM in alphabetical order. (c) A walk of T, starting at a. A full walk of the tree visits the vertices in the order a, b, c, b, h, b, a, d, e, f, e, g, e, d, a. A preorder walk of T lists a vertex just when it is first encountered, as indicated by the dot next to each vertex, yielding the ordering a, b, c, h, d, e, f, g. (d) A tour obtained by visiting the vertices in the order given by the preorder walk, which is the tour H returned by APPROX-TSP-TOUR. Its total cost is approximately 19.074. (e) An optimal tour H* for the original complete graph. Its total cost is approximately 14.715.
Recall from Section 12.1 that a preorder tree walk recursively visits every vertex
in the tree, listing a vertex when it is first encountered, before visiting any of its
children.
Figure 35.2 illustrates the operation of APPROX-TSP-TOUR. Part (a) of the figure
shows a complete undirected graph, and part (b) shows the minimum spanning
tree T grown from root vertex a by MST-PRIM. Part (c) shows how a preorder walk
of T visits the vertices, and part (d) displays the corresponding tour, which is the
tour returned by APPROX-TSP-TOUR. Part (e) displays an optimal tour, which is
about 23% shorter.
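A minimal Python sketch of APPROX-TSP-TOUR, under the assumption that the input is a complete graph given as a symmetric cost matrix satisfying the triangle inequality, is shown below; it grows an MST with Prim's algorithm and returns the preorder walk of the tree as the tour.

import heapq

def approx_tsp_tour(cost, root=0):
    n = len(cost)
    # 1) Grow a minimum spanning tree from the root using Prim's algorithm.
    parent = [None] * n
    key = [float('inf')] * n
    key[root] = 0
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    pq = [(0, root)]
    while pq:
        _, u = heapq.heappop(pq)
        if in_tree[u]:
            continue
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if v != u and not in_tree[v] and cost[u][v] < key[v]:
                key[v] = cost[u][v]
                parent[v] = u
                heapq.heappush(pq, (cost[u][v], v))
    # 2) A preorder walk of the MST gives the tour; finally return to the root.
    tour = []
    def preorder(u):
        tour.append(u)
        for c in children[u]:
            preorder(c)
    preorder(root)
    tour.append(root)
    return tour

As argued above, whenever the cost matrix satisfies the triangle inequality, the cost of this tour is at most twice the cost of an optimal tour.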
Vertex Cover Problem
https://www.linkedin.com/in/bhabanishankarpradhan07/