KEMBAR78
Introduction to Algorithms and Data Structures | PDF | Time Complexity | Computational Complexity Theory
0% found this document useful (0 votes)
209 views44 pages

Introduction to Algorithms and Data Structures

The document discusses algorithms and their characteristics. It defines an algorithm as a step-by-step procedure to solve a problem and get a desired output. Key characteristics of algorithms include being unambiguous, having well-defined inputs and outputs, terminating in a finite number of steps, and being independent of programming language. Common algorithm categories include search, sort, insert, update, and delete operations. Dynamic programming is also discussed as an approach that breaks problems into overlapping sub-problems and remembers solutions to avoid recomputing them.

Uploaded by

soban shahid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
209 views44 pages

Introduction to Algorithms and Data Structures

The document discusses algorithms and their characteristics. It defines an algorithm as a step-by-step procedure to solve a problem and get a desired output. Key characteristics of algorithms include being unambiguous, having well-defined inputs and outputs, terminating in a finite number of steps, and being independent of programming language. Common algorithm categories include search, sort, insert, update, and delete operations. Dynamic programming is also discussed as an approach that breaks problems into overlapping sub-problems and remembers solutions to avoid recomputing them.

Uploaded by

soban shahid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 44

Algorithm

Algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a


certain order to get the desired output. Algorithms are generally created independent of underlying
languages, i.e. an algorithm can be implemented in more than one programming language.

From the data structure point of view, following are some important categories of algorithms −

 Search − Algorithm to search an item in a data structure.


 Sort − Algorithm to sort items in a certain order.
 Insert − Algorithm to insert item in a data structure.
 Update − Algorithm to update an existing item in a data structure.
 Delete − Algorithm to delete an existing item from a data structure.

Characteristics of an Algorithm
Not all procedures can be called an algorithm. An algorithm should have the following
characteristics −0

 Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or
phases), and their inputs/outputs should be clear and must lead to only one meaning.
 Input − An algorithm should have 0 or more well-defined inputs.
 Output − An algorithm should have 1 or more well-defined outputs, and should match
the desired output.
 Finiteness − Algorithms must terminate after a finite number of steps.
 Feasibility − Should be feasible with the available resources.
 Independent − An algorithm should have step-by-step directions, which should be
independent of any programming code.

More Characteristics of an Algorithms:

The main characteristics of algorithms are as follows −

 Algorithms must have a unique name


 Algorithms should have explicitly defined set of inputs and outputs
 Algorithms are well-ordered with unambiguous operations
 Algorithms halt in a finite amount of time. Algorithms should not run for infinity, i.e., an
algorithm must end at some point

Dynamic Programming

Dynamic programming approach is similar to divide and conquer in breaking down the problem
into smaller and yet smaller possible sub-problems. But unlike, divide and conquer, these sub-
problems are not solved independently. Rather, results of these smaller sub-problems are
remembered and used for similar or overlapping sub-problems.
Dynamic programming is used where we have problems, which can be divided into similar sub-
problems, so that their results can be re-used. Mostly, these algorithms are used for optimization.
Before solving the in-hand sub-problem, dynamic algorithm will try to examine the results of the
previously solved sub-problems. The solutions of sub-problems are combined in order to achieve
the best solution.

So we can say that −

 The problem should be able to be divided into smaller overlapping sub-problem.


 An optimum solution can be achieved by using an optimum solution of smaller sub-
problems.
 Dynamic algorithms use Memoization.

Comparison
In contrast to greedy algorithms, where local optimization is addressed, dynamic algorithms are
motivated for an overall optimization of the problem.

In contrast to divide and conquer algorithms, where solutions are combined to achieve an overall
solution, dynamic algorithms use the output of a smaller sub-problem and then try to optimize a
bigger sub-problem. Dynamic algorithms use Memoization to remember the output of already
solved sub-problems.

Example

The following computer problems can be solved using dynamic programming approach −

 Fibonacci number series


 Knapsack problem
 Tower of Hanoi
 All pair shortest path by Floyd-Warshall
 Shortest path by Dijkstra
 Project scheduling

Dynamic programming can be used in both top-down and bottom-up manner. And of course,
most of the times, referring to the previous solution output is cheaper than recomputing in terms
of CPU cycles.

Data Structures & Algorithm Basic Concepts


This chapter explains the basic terms related to data structure.
Data Definition
Data Definition defines a particular data with the following characteristics.

 Atomic − Definition should define a single concept.


 Traceable − Definition should be able to be mapped to some data element.
 Accurate − Definition should be unambiguous.
 Clear and Concise − Definition should be understandable.

Data Object
Data Object represents an object having a data.

Data Type
Data type is a way to classify various types of data such as integer, string, etc. which determines
the values that can be used with the corresponding type of data, the type of operations that can be
performed on the corresponding type of data. There are two data types −

 Built-in Data Type


 Derived Data Type

Built-in Data Type

Those data types for which a language has built-in support are known as Built-in Data types. For
example, most of the languages provide the following built-in data types.

 Integers
 Boolean (true, false)
 Floating (Decimal numbers)
 Character and Strings

Derived Data Type

Those data types which are implementation independent as they can be implemented in one or
the other way are known as derived data types. These data types are normally built by the
combination of primary or built-in data types and associated operations on them. For example −

 List
 Array
 Stack
 Queue
Basic Operations
The data in the data structures are processed by certain operations. The particular data structure
chosen largely depends on the frequency of the operation that needs to be performed on the data
structure.

 Traversing
 Searching
 Insertion
 Deletion
 Sorting
 Merging

Array is a container which can hold a fix number of items and these items should be of the same
type. Most of the data structures make use of arrays to implement their algorithms. Following are
the important terms to understand the concept of Array.

 Element − Each item stored in an array is called an element.


 Index − Each location of an element in an array has a numerical index, which is used to
identify the element.

Array Representation
Arrays can be declared in various ways in different languages. For illustration, let's take C array
declaration.

Arrays can be declared in various ways in different languages. For illustration, let's take C array
declaration.

As per the above illustration, following are the important points to be considered.

 Index starts with 0.


 Array length is 10 which means it can store 10 elements.
 Each element can be accessed via its index. For example, we can fetch an element at
index 6 as 9.

Basic Operations
Following are the basic operations supported by an array.

 Traverse − print all the array elements one by one.


 Insertion − Adds an element at the given index.
 Deletion − Deletes an element at the given index.
 Search − Searches an element using the given index or by the value.
 Update − Updates an element at the given index.

In C, when an array is initialized with size, then it assigns defaults values to its elements in
following order.

Data Type Default Value

Bool false

Char 0

Int 0

Float 0.0

double 0.0f

Void

wchar_t 0

Insertion Operation
Insert operation is to insert one or more data elements into an array. Based on the requirement, a
new element can be added at the beginning, end, or any given index of array.

Here, we see a practical implementation of insertion operation, where we add data at the end of
the array −

Algorithm

Let Array be a linear unordered array of MAX elements.


Example

Result

Let LA be a Linear Array (unordered) with N elements and K is a positive integer such that
K<=N. Following is the algorithm where ITEM is inserted into the Kth position of LA −

1. Start
2. Set J = N
3. Set N = N+1
4. Repeat steps 5 and 6 while J >= K
5. Set LA[J+1] = LA[J]
6. Set J = J-1
7. Set LA[K] = ITEM
8. Stop

Pseudocode
Pseudocode gives a high-level description of an algorithm without the ambiguity associated with
plain text but also without the need to know the syntax of a particular programming language.

The running time can be estimated in a more general manner by using Pseudocode to represent
the algorithm as a set of fundamental operations which can then be counted.

Difference between Algorithm and Pseudocode


An algorithm is a formal definition with some specific characteristics that describes a process,
which could be executed by a Turing-complete computer machine to perform a specific task.
Generally, the word "algorithm" can be used to describe any high level task in computer science.

On the other hand, pseudocode is an informal and (often rudimentary) human readable
description of an algorithm leaving many granular details of it. Writing a pseudocode has no
restriction of styles and its only objective is to describe the high level steps of algorithm in a
much realistic manner in natural language.

For example, following is an algorithm for Insertion Sort.

Algorithm: Insertion-Sort
Input: A list L of integers of length n
Output: A sorted list L1 containing those integers present in L
Step 1: Keep a sorted list L1 which starts off empty
Step 2: Perform Step 3 for each element in the original list L
Step 3: Insert it into the correct position in the sorted list L1.
Step 4: Return the sorted list
Step 5: Stop

Here is a pseudocode which describes how the high level abstract process mentioned above in
the algorithm Insertion-Sort could be described in a more realistic way.
for i <- 1 to length(A)
x <- A[i]
j <- i
while j > 0 and A[j-1] > x
A[j] <- A[j-1]
j <- j - 1
A[j] <- x

In this tutorial, algorithms will be presented in the form of pseudocode, that is similar in many respects
to C, C++, Java, Python, and other programming languages.

How to Write an Algorithm?


There are no well-defined standards for writing algorithms. Rather, it is problem and resource
dependent. Algorithms are never written to support a particular programming code.

As we know that all programming languages share basic code constructs like loops (do, for,
while), flow-control (if-else), etc. These common constructs can be used to write an algorithm.

We write algorithms in a step-by-step manner, but it is not always the case. Algorithm writing is
a process and is executed after the problem domain is well-defined. That is, we should know the
problem domain, for which we are designing a solution.

Example

Let's try to learn algorithm-writing by using an example.

Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP

Algorithms tell the programmers how to code the program. Alternatively, the algorithm can be
written as −

Step 1 − START ADD


Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
In design and analysis of algorithms, usually the second method is used to describe an algorithm.
It makes it easy for the analyst to analyze the algorithm ignoring all unwanted definitions. He
can observe what operations are being used and how the process is flowing.

Writing step numbers, is optional.

We design an algorithm to get a solution of a given problem. A problem can be solved in more
than one ways.

Hence, many solution algorithms can be derived for a given problem. The next step is to analyze those
proposed solution algorithms and implement the best suitable solution.

Algorithm Design
The important aspects of algorithm design include creating an efficient algorithm to solve a
problem in an efficient way using minimum time and space.

To solve a problem, different approaches can be followed. Some of them can be efficient with
respect to time consumption, whereas other approaches may be memory efficient. However, one
has to keep in mind that both time consumption and memory usage cannot be optimized
simultaneously. If we require an algorithm to run in lesser time, we have to invest in more
memory and if we require an algorithm to run with lesser memory, we need to have more time.

Problem Development Steps


The following steps are involved in solving computational problems.

 Problem definition
 Development of a model
 Specification of an Algorithm
 Designing an Algorithm
 Checking the correctness of an Algorithm
 Analysis of an Algorithm
 Implementation of an Algorithm
 Program testing
 Documentation

Algorithm Analysis
Efficiency of an algorithm can be analyzed at two different stages, before implementation and
after implementation. They are the following −

 A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an


algorithm is measured by assuming that all other factors, for example, processor speed,
are constant and have no effect on the implementation.
 A Posterior Analysis − This is an empirical analysis of an algorithm. The selected
algorithm is implemented using programming language. This is then executed on target
computer machine. In this analysis, actual statistics like running time and space required,
are collected.

We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution or
running time of various operations involved. The running time of an operation can be defined as
the number of computer instructions executed per operation.

Analysis of Algorithms
In theoretical analysis of algorithms, it is common to estimate their complexity in the asymptotic
sense, i.e., to estimate the complexity function for arbitrarily large input. The term "analysis of
algorithms" was coined by Donald Knuth.

Algorithm analysis is an important part of computational complexity theory, which provides


theoretical estimation for the required resources of an algorithm to solve a specific computational
problem. Most algorithms are designed to work with inputs of arbitrary length. Analysis of
algorithms is the determination of the amount of time and space resources required to execute it.

Usually, the efficiency or running time of an algorithm is stated as a function relating the input
length to the number of steps, known as time complexity, or volume of memory, known as
space complexity.

The Need for Analysis


In this chapter, we will discuss the need for analysis of algorithms and how to choose a better
algorithm for a particular problem as one computational problem can be solved by different
algorithms.
By considering an algorithm for a specific problem, we can begin to develop pattern recognition
so that similar types of problems can be solved by the help of this algorithm.

Algorithms are often quite different from one another, though the objective of these algorithms
are the same. For example, we know that a set of numbers can be sorted using different
algorithms. Number of comparisons performed by one algorithm may vary with others for the
same input. Hence, time complexity of those algorithms may differ. At the same time, we need to
calculate the memory space required by each algorithm.

Analysis of algorithm is the process of analyzing the problem-solving capability of the algorithm
in terms of the time and size required (the size of memory for storage while implementation).
However, the main concern of analysis of algorithms is the required time or performance.
Generally, we perform the following types of analysis −

 Worst-case − The maximum number of steps taken on any instance of size a.


 Best-case − The minimum number of steps taken on any instance of size a.
 Average case − An average number of steps taken on any instance of size a.
 Amortized − A sequence of operations applied to the input of size a averaged over time.

To solve a problem, we need to consider time as well as space complexity as the program may run on a
system where memory is limited but adequate space is available or may be vice-versa. In this context, if
we compare bubble sort and merge sort. Bubble sort does not require additional memory, but merge
sort requires additional space. Though time complexity of bubble sort is higher compared to merge sort,
we may need to apply bubble sort if the program needs to run in an environment, where memory is very
limited.

Asymptotic Analysis
The asymptotic behavior of a function f(n) refers to the growth of f(n) as n gets large.

We typically ignore small values of n, since we are usually interested in estimating how slow the
program will be on large inputs.

A good rule of thumb is that the slower the asymptotic growth rate, the better the algorithm.
Though it’s not always true.

For example, a linear algorithm f(n)=d∗n+k

is always asymptotically better than a quadratic one, f(n)=c.n2+q.

Solving Recurrence Equations


A recurrence is an equation or inequality that describes a function in terms of its value on smaller
inputs. Recurrences are generally used in divide-and-conquer paradigm.
Let us consider T(n) to be the running time on a problem of size n.

If the problem size is small enough, say n < c where c is a constant, the straightforward solution
takes constant time, which is written as θ(1). If the division of the problem yields a number of
sub-problems with size nb

To solve the problem, the required time is a.T(n/b). If we consider the time required for division
is D(n) and the time required for combining the results of sub-problems is C(n), the recurrence
relation can be represented as −

T(n)={θ(1)aT(nb)+D(n)+C(n)ifn⩽cotherwise
A recurrence relation can be solved using the following methods −

 Substitution Method − In this method, we guess a bound and using mathematical


induction we prove that our assumption was correct.
 Recursion Tree Method − In this method, a recurrence tree is formed where each node
represents the cost.
 Master’s Theorem − This is another important technique to find the complexity of a
recurrence relation.

Amortized Analysis
Amortized analysis is generally used for certain algorithms where a sequence of similar
operations are performed.

 Amortized analysis provides a bound on the actual cost of the entire sequence, instead of
bounding the cost of sequence of operations separately.
 Amortized analysis differs from average-case analysis; probability is not involved in
amortized analysis. Amortized analysis guarantees the average performance of each
operation in the worst case.

It is not just a tool for analysis, it’s a way of thinking about the design, since designing and
analysis are closely related.

Aggregate Method

The aggregate method gives a global view of a problem. In this method, if n operations takes
worst-case time T(n) in total. Then the amortized cost of each operation is T(n)/n. Though
different operations may take different time, in this method varying cost is neglected.
Accounting Method

In this method, different charges are assigned to different operations according to their actual
cost. If the amortized cost of an operation exceeds its actual cost, the difference is assigned to the
object as credit. This credit helps to pay for later operations for which the amortized cost less
than actual cost.

If the actual cost and the amortized cost of ith operation are ci

and cl^ , then

∑i=1ncl^⩾∑i=1nci
Potential Method

This method represents the prepaid work as potential energy, instead of considering prepaid
work as credit. This energy can be released to pay for future operations.

If we perform n operations starting with an initial data structure D0. Let us consider, ci as the
actual cost and Di as data structure of ith operation. The potential function Ф maps to a real
number Ф(Di), the associated potential of Di. The amortized cost cl^

can be defined by

cl^=ci+Φ(Di)−Φ(Di−1)
Hence, the total amortized cost is

∑i=1ncl^=∑i=1n(ci+Φ(Di)−Φ(Di−1))=∑i=1nci+Φ(Dn)−Φ(D0)
Dynamic Table
If the allocated space for the table is not enough, we must copy the table into larger size table.
Similarly, if large number of members are erased from the table, it is a good idea to reallocate the
table with a smaller size.

Using amortized analysis, we can show that the amortized cost of insertion and deletion is
constant and unused space in a dynamic table never exceeds a constant fraction of the total
space.

In the next chapter of this tutorial, we will discuss Asymptotic Notations in brief.
Algorithm Complexity
In designing of Algorithm, complexity analysis of an algorithm is an essential aspect. Mainly,
algorithmic complexity is concerned about its performance, how fast or slow it works.

The complexity of an algorithm describes the efficiency of the algorithm in terms of the amount
of the memory required to process the data and the processing time.

Complexity of an algorithm is analyzed in two perspectives: Time and Space.

Algorithm Complexity
Suppose X is an algorithm and n is the size of input data, the time and space used by the
algorithm X are the two main factors, which decide the efficiency of X.

 Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
 Space Factor − Space is measured by counting the maximum memory space required by
the algorithm.

The complexity of an algorithm f(n) gives the running time and/or the storage space required by
the algorithm in terms of n as the size of input data.

Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the
following two components −

 A fixed part that is a space required to store certain data and variables, that are
independent of the size of the problem. For example, simple variables and constants used,
program size, etc.
 A variable part is a space required by variables, whose size depends on the size of the
problem. For example, dynamic memory allocation, recursion stack space, etc.

Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part and S(I)
is the variable part of the algorithm, which depends on instance characteristic I. Following is a
simple example that tries to explain the concept −

Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop

Here we have three variables A, B, and C and one constant. Hence S(P) = 1 + 3. Now, space
depends on data types of given variables and constant types and it will be multiplied accordingly.
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to run
to completion. Time requirements can be defined as a numerical function T(n), where T(n) can
be measured as the number of steps, provided each step consumes constant time.

For example, addition of two n-bit integers takes n steps. Consequently, the total computational
time is T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here, we observe that
T(n) grows linearly as the input size increases.

Asymptotic Analysis
Asymptotic analysis of an algorithm refers to defining the mathematical boundation/framing of
its run-time performance. Using asymptotic analysis, we can very well conclude the best case,
average case, and worst case scenario of an algorithm.

Asymptotic analysis is input bound i.e., if there's no input to the algorithm, it is concluded to
work in a constant time. Other than the "input" all other factors are considered constant.

Asymptotic analysis refers to computing the running time of any operation in mathematical units
of computation. For example, the running time of one operation is computed as f(n) and may be
for another operation it is computed as g(n2). This means the first operation running time will
increase linearly with the increase in n and the running time of the second operation will increase
exponentially when n increases. Similarly, the running time of both operations will be nearly the
same if n is significantly small.

Usually, the time required by an algorithm falls under three types −

 Best Case − Minimum time required for program execution.


 Average Case − Average time required for program execution.
 Worst Case − Maximum time required for program execution.

Asymptotic Notations
Following are the commonly used asymptotic notations to calculate the running time complexity
of an algorithm.

 Ο Notation
 Ω Notation
 θ Notation
Big Oh Notation, Ο

The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time. It
measures the worst case time complexity or the longest amount of time an algorithm can
possibly take to complete.

For example, for a function f(n)

Ο(f(n)) = { g(n) : there exists c > 0 and n0 such that f(n) ≤ c.g(n) for all n
> n0. }
Omega Notation, Ω

The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time. It
measures the best case time complexity or the best amount of time an algorithm can possibly
take to complete.

For example, for a function f(n)

Ω(f(n)) ≥ { g(n) : there exists c > 0 and n0 such that g(n) ≤ c.f(n) for all n
> n0. }
Theta Notation, θ

The notation θ(n) is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows −

θ(f(n)) = { g(n) if and only if g(n) = Ο(f(n)) and g(n) = Ω(f(n)) for all n
> n0. }

Common Asymptotic Notations


Following is a list of some common asymptotic notations −

constant − Ο(1)

logarithmic − Ο(log n)

linear − Ο(n)

n log n − Ο(n log n)

quadratic − Ο(n2)

cubic − Ο(n3)

polynomial − nΟ(1)

exponential − 2Ο(n)

An algorithm is designed to achieve optimum solution for a given problem. In greedy algorithm
approach, decisions are made from the given solution domain. As being greedy, the closest
solution that seems to provide an optimum solution is chosen.
Greedy algorithms try to find a localized optimum solution, which may eventually lead to
globally optimized solutions. However, generally greedy algorithms do not provide globally
optimized solutions.

What is Space Complexity?


Space complexity is a function describing the amount of memory (space) an algorithm takes in
terms of the amount of input to the algorithm.

We often speak of extra memory needed, not counting the memory needed to store the input
itself. Again, we use natural (but fixed-length) units to measure this.

We can use bytes, but it's easier to use, say, the number of integers used, the number of fixed-
sized structures, etc.

In the end, the function we come up with will be independent of the actual number of bytes
needed to represent the unit.

Space complexity is sometimes ignored because the space used is minimal and/or obvious,
however sometimes it becomes as important issue as time complexity

Definition

Let M be a deterministic Turing machine (TM) that halts on all inputs. The space complexity of
M is the function f:N→N

, where f(n) is the maximum number of cells of tape and M scans any input of length M. If the
space complexity of M is f(n), we can say that M runs in space f(n).

We estimate the space complexity of Turing machine by using asymptotic notation.

Let f:N→R+

be a function. The space complexity classes can be defined as follows −

SPACE = {L | L is a language decided by an O(f(n)) space deterministic TM}

SPACE = {L | L is a language decided by an O(f(n)) space non-deterministic TM}

PSPACE is the class of languages that are decidable in polynomial space on a deterministic
Turing machine.

In other words, PSPACE = Uk SPACE (nk)


Savitch’s Theorem
One of the earliest theorem related to space complexity is Savitch’s theorem. According to this
theorem, a deterministic machine can simulate non-deterministic machines by using a small
amount of space.

For time complexity, such a simulation seems to require an exponential increase in time. For
space complexity, this theorem shows that any non-deterministic Turing machine that uses f(n)
space can be converted to a deterministic TM that uses f2(n) space.

Hence, Savitch’s theorem states that, for any function, f:N→R+

, where f(n)⩾n

NSPACE(f(n)) ⊆ SPACE(f(n))

Relationship Among Complexity Classes

The following diagram depicts the relationship among different complexity classes.

Divide and Conquer


Many algorithms are recursive in nature to solve a given problem recursively dealing with sub-
problems.

In divide and conquer approach, a problem is divided into smaller problems, then the smaller
problems are solved independently, and finally the solutions of smaller problems are combined
into a solution for the large problem.

Generally, divide-and-conquer algorithms have three parts −

 Divide the problem into a number of sub-problems that are smaller instances of the same
problem.
 Conquer the sub-problems by solving them recursively. If they are small enough, solve
the sub-problems as base cases.
 Combine the solutions to the sub-problems into the solution for the original problem.

Pros and cons of Divide and Conquer Approach


Divide and conquer approach supports parallelism as sub-problems are independent. Hence, an
algorithm, which is designed using this technique, can run on the multiprocessor system or in
different machines simultaneously.

In this approach, most of the algorithms are designed using recursion, hence memory
management is very high. For recursive function stack is used, where function state needs to be
stored.

Application of Divide and Conquer Approach


Following are some problems, which are solved using divide and conquer approach.

 Finding the maximum and minimum of a sequence of numbers


 Strassen’s matrix multiplication
 Merge sort
 Binary search

Let us consider a simple problem that can be solved by divide and conquer technique.

Problem Statement

The Max-Min Problem in algorithm analysis is finding the maximum and minimum value in an
array.

Solution

To find the maximum and minimum numbers in a given array numbers[] of size n, the following
algorithm can be used. First we are representing the naive method and then we will present
divide and conquer approach.

Naïve Method
Naïve method is a basic method to solve any problem. In this method, the maximum and
minimum number can be found separately. To find the maximum and minimum numbers, the
following straightforward algorithm can be used.

Algorithm: Max-Min-Element (numbers[])


max := numbers[1]
min := numbers[1]

for i = 2 to n do
if numbers[i] > max then
max := numbers[i]
if numbers[i] < min then
min := numbers[i]
return (max, min)
Analysis

The number of comparison in Naive method is 2n - 2.

The number of comparisons can be reduced using the divide and conquer approach. Following is
the technique.

Divide and Conquer Approach


In this approach, the array is divided into two halves. Then using recursive approach maximum
and minimum numbers in each halves are found. Later, return the maximum of two maxima of
each half and the minimum of two minima of each half.

In this given problem, the number of elements in an array is y−x+1

, where y is greater than or equal to x.

Max−Min(x,y)

will return the maximum and minimum values of an array numbers[x...y]


Algorithm: Max - Min(x, y)
if y – x ≤ 1 then
return (max(numbers[x], numbers[y]), min((numbers[x], numbers[y]))
else
(max1, min1):= maxmin(x, ⌊((x + y)/2)⌋)
(max2, min2):= maxmin(⌊((x + y)/2) + 1)⌋,y)
return (max(max1, max2), min(min1, min2))
Analysis

Let T(n) be the number of comparisons made by Max−Min(x,y)

, where the number of elements n=y−x+1

If T(n) represents the numbers, then the recurrence relation can be represented as
T(n)=⎧⎩⎨⎪⎪T(⌊ n2⌋ )+T(⌈ n2⌉ )+210forn>2forn=2forn=1
Let us assume that n is in the form of power of 2. Hence, n = 2k where k is height of the
recursion tree.

So,

T(n)=2.T(n2)+2=2.(2.T(n4)+2)+2.....=3n2−2
Compared to Naïve method, in divide and conquer approach, the number of comparisons is less.
However, using the asymptotic notation both of the approaches are represented by O(n).

In divide and conquer approach, the problem in hand, is divided into smaller sub-problems and
then each problem is solved independently. When we keep on dividing the subproblems into
even smaller sub-problems, we may eventually reach a stage where no more division is possible.
Those "atomic" smallest possible sub-problem (fractions) are solved. The solution of all sub-
problems is finally merged in order to obtain the solution of an original problem.

Broadly, we can understand divide-and-conquer approach in a three-step process.

Divide/Break
This step involves breaking the problem into smaller sub-problems. Sub-problems should
represent a part of the original problem. This step generally takes a recursive approach to divide
the problem until no sub-problem is further divisible. At this stage, sub-problems become atomic
in nature but still represent some part of the actual problem.
Conquer/Solve
This step receives a lot of smaller sub-problems to be solved. Generally, at this level, the
problems are considered 'solved' on their own.

Merge/Combine
When the smaller sub-problems are solved, this stage recursively combines them until they
formulate a solution of the original problem. This algorithmic approach works recursively and
conquer & merge steps works so close that they appear as one.

Examples

The following computer algorithms are based on divide-and-conquer programming approach −

 Merge Sort
 Quick Sort
 Binary Search
 Strassen's Matrix Multiplication
 Closest pair (points)

There are various ways available to solve any computer problem, but the mentioned are a good
example of divide and conquer approach.

Problem Statement
The problem of sorting a list of numbers lends itself immediately to a divide-and-conquer
strategy: split the list into two halves, recursively sort each half, and then merge the two sorted
sub-lists.

Solution
In this algorithm, the numbers are stored in an array numbers[]. Here, p and q represents the start
and end index of a sub-array.

Algorithm: Merge-Sort (numbers[], p, r)


if p < r then
q = ⌊(p + r) / 2⌋
Merge-Sort (numbers[], p, q)
Merge-Sort (numbers[], q + 1, r)
Merge (numbers[], p, q, r)
Function: Merge (numbers[], p, q, r)
n1 = q – p + 1
n2 = r – q
declare leftnums[1…n1 + 1] and rightnums[1…n2 + 1] temporary arrays
for i = 1 to n1
leftnums[i] = numbers[p + i - 1]
for j = 1 to n2
rightnums[j] = numbers[q+ j]
leftnums[n1 + 1] = ∞
rightnums[n2 + 1] = ∞
i = 1
j = 1
for k = p to r
if leftnums[i] ≤ rightnums[j]
numbers[k] = leftnums[i]
i = i + 1
else
numbers[k] = rightnums[j]
j = j + 1

Analysis
Let us consider, the running time of Merge-Sort as T(n). Hence,

T(n)={c2xT(n2)+dxnifn⩽1otherwise
where c and d are constants

Therefore, using this recurrence relation,

T(n)=2iT(n2i)+i.d.n

As, i=logn,T(n)=2lognT(n2logn)+logn.d.n

=c.n+d.n.logn

Therefore, T(n)=O(nlogn)

Example
In the following example, we have shown Merge-Sort algorithm step by step. First, every
iteration array is divided into two sub-arrays, until the sub-array contains only one element.
When these sub-arrays cannot be divided further, then merge operations are performed.
Problem Statement
Binary search can be performed on a sorted array. In this approach, the index of an element x is
determined if the element belongs to the list of elements. If the array is unsorted, linear search is
used to determine the position.

Solution
In this algorithm, we want to find whether element x belongs to a set of numbers stored in an
array numbers[]. Where l and r represent the left and right index of a sub-array in which
searching operation should be performed.

Algorithm: Binary-Search(numbers[], x, l, r)
if l = r then
return l
else
m := ⌊(l + r) / 2⌋
if x ≤ numbers[m] then
return Binary-Search(numbers[], x, l, m)
else
return Binary-Search(numbers[], x, m+1, r)

Analysis
Linear search runs in O(n) time. Whereas binary search produces the result in O(log n) time

Let T(n) be the number of comparisons in worst-case in an array of n elements.

Hence,
T(n)={0T(n2)+1ifn=1otherwise

Using this recurrence relation T(n)=logn

Therefore, binary search uses O(logn)

time.

Example
In this example, we are going to search element 63.

Problem Statement

Let us consider two matrices X and Y. We want to calculate the resultant matrix Z by multiplying
X and Y.

Naïve Method
First, we will discuss naïve method and its complexity. Here, we are calculating Z = X × Y.
Using Naïve method, two matrices (X and Y) can be multiplied if the order of these matrices are
p × q and q × r. Following is the algorithm.
Algorithm: Matrix-Multiplication (X, Y, Z)
for i = 1 to p do
for j = 1 to r do
Z[i,j] := 0
for k = 1 to q do
Z[i,j] := Z[i,j] + X[i,k] × Y[k,j]
Complexity

Here, we assume that integer operations take O(1) time. There are three for loops in this
algorithm and one is nested in other. Hence, the algorithm takes O(n3) time to execute.

Strassen’s Matrix Multiplication Algorithm


In this context, using Strassen’s Matrix multiplication algorithm, the time consumption can be
improved a little bit.

Strassen’s Matrix multiplication can be performed only on square matrices where n is a power
of 2. Order of both of the matrices are n × n.

Divide X, Y and Z into four (n/2)×(n/2) matrices as represented below −

Z=[IKJL]

X=[ACBD] and Y=[EGFH]


Using Strassen’s Algorithm compute the following −

M1:=(A+C)×(E+F)
M2:=(B+D)×(G+H)
M3:=(A−D)×(E+H)
M4:=A×(F−H)
M5:=(C+D)×(E)
M6:=(A+B)×(H)
M7:=D×(G−E)
Then,

I:=M2+M3−M6−M7
J:=M4+M6
K:=M5+M7
L:=M1−M3−M4−M5
Analysis

T(n)={c7xT(n2)+dxn2ifn=1otherwise
where c and d are constants

Using this recurrence relation, we get T(n)=O(nlog7)

Hence, the complexity of Strassen’s matrix multiplication algorithm is O(nlog7).

Among all the algorithmic approaches, the simplest and straightforward approach is the Greedy
method. In this approach, the decision is taken on the basis of current available information
without worrying about the effect of the current decision in future.

Greedy algorithms build a solution part by part, choosing the next part in such a way, that it
gives an immediate benefit. This approach never reconsiders the choices taken previously. This
approach is mainly used to solve optimization problems. Greedy method is easy to implement
and quite efficient in most of the cases. Hence, we can say that Greedy algorithm is an
algorithmic paradigm based on heuristic that follows local optimal choice at each step with the
hope of finding global optimal solution.

In many problems, it does not produce an optimal solution though it gives an approximate (near
optimal) solution in a reasonable time.

Components of Greedy Algorithm


Greedy algorithms have the following five components −

 A candidate set − A solution is created from this set.


 A selection function − Used to choose the best candidate to be added to the solution.
 A feasibility function − Used to determine whether a candidate can be used to contribute
to the solution.
 An objective function − Used to assign a value to a solution or a partial solution.
 A solution function − Used to indicate whether a complete solution has been reached.

Areas of Application
Greedy approach is used to solve many problems, such as

 Finding the shortest path between two vertices using Dijkstra’s algorithm.
 Finding the minimal spanning tree in a graph using Prim’s /Kruskal’s algorithm, etc.

Where Greedy Approach Fails


In many problems, Greedy algorithm fails to find an optimal solution, moreover it may produce a
worst solution. Problems like Travelling Salesman and Knapsack cannot be solved using this
approach.

The Greedy algorithm could be understood very well with a well-known problem referred to as
Knapsack problem. Although the same problem could be solved by employing other algorithmic
approaches, Greedy approach solves Fractional Knapsack problem reasonably in a good time.
Let us discuss the Knapsack problem in detail.

Knapsack Problem
Given a set of items, each with a weight and a value, determine a subset of items to include in a
collection so that the total weight is less than or equal to a given limit and the total value is as
large as possible.

The knapsack problem is in combinatorial optimization problem. It appears as a subproblem in


many, more complex mathematical models of real-world problems. One general approach to
difficult problems is to identify the most restrictive constraint, ignore the others, solve a
knapsack problem, and somehow adjust the solution to satisfy the ignored constraints.

Applications

In many cases of resource allocation along with some constraint, the problem can be derived in a
similar way of Knapsack problem. Following is a set of example.

 Finding the least wasteful way to cut raw materials


 portfolio optimization
 Cutting stock problems

Problem Scenario

A thief is robbing a store and can carry a maximal weight of W into his knapsack. There are n
items available in the store and weight of ith item is wi and its profit is pi. What items should the
thief take?

In this context, the items should be selected in such a way that the thief will carry those items for
which he will gain maximum profit. Hence, the objective of the thief is to maximize the profit.

Based on the nature of the items, Knapsack problems are categorized as

 Fractional Knapsack
 Knapsack

Fractional Knapsack
In this case, items can be broken into smaller pieces, hence the thief can select fractions of items.

According to the problem statement,

 There are n items in the store


 Weight of ith item wi>0

  Profit for ith item pi>0

 and
 Capacity of the Knapsack is W

In this version of Knapsack problem, items can be broken into smaller pieces. So, the thief may
take only a fraction xi of ith item.

0⩽xi⩽1

The ith item contributes the weight xi.wi

to the total weight in the knapsack and profit xi.pi

to the total profit.

Hence, the objective of this algorithm is to

maximize∑n=1n(xi.pi)
subject to constraint,

∑n=1n(xi.wi)⩽W
It is clear that an optimal solution must fill the knapsack exactly, otherwise we could add a
fraction of one of the remaining items and increase the overall profit.

Thus, an optimal solution can be obtained by

∑n=1n(xi.wi)=W
In this context, first we need to sort those items according to the value of piwi
, so that pi+1wi+1 ≤ piwi

. Here, x is an array to store the fraction of items.

Algorithm: Greedy-Fractional-Knapsack (w[1..n], p[1..n], W)


for i = 1 to n
do x[i] = 0
weight = 0
for i = 1 to n
if weight + w[i] ≤ W then
x[i] = 1
weight = weight + w[i]
else
x[i] = (W - weight) / w[i]
weight = W
break
return x
Analysis

If the provided items are already sorted into a decreasing order of piwi

, then the whileloop takes a time in O(n); Therefore, the total time including the sort is in O(n
logn).

Example

Let us consider that the capacity of the knapsack W = 60 and the list of provided items are shown
in the following table −

Item A B C D

Profit 280 100 120 120

Weight 40 10 20 24

Ratio (piwi)

7 10 6 5

As the provided items are not sorted based on piwi

. After sorting, the items are as shown in the following table.

Item B A C D

Profit 100 280 120 120


Weight 10 40 20 24

Ratio (piwi)

10 7 6 5

Solution

After sorting all the items according to piwi

. First all of B is chosen as weight of B is less than the capacity of the knapsack. Next, item A is
chosen, as the available capacity of the knapsack is greater than the weight of A. Now, C is
chosen as the next item. However, the whole item cannot be chosen as the remaining capacity of
the knapsack is less than the weight of C.

Hence, fraction of C (i.e. (60 − 50)/20) is chosen.

Now, the capacity of the Knapsack is equal to the selected items. Hence, no more item can be
selected.

The total weight of the selected items is 10 + 40 + 20 * (10/20) = 60

And the total profit is 100 + 280 + 120 * (10/20) = 380 + 60 = 440

This is the optimal solution. We cannot gain more profit selecting any different combination of
items.

Problem Statement
In job sequencing problem, the objective is to find a sequence of jobs, which is completed within
their deadlines and gives maximum profit.

Solution
Let us consider, a set of n given jobs which are associated with deadlines and profit is earned, if
a job is completed by its deadline. These jobs need to be ordered in such a way that there is
maximum profit.

It may happen that all of the given jobs may not be completed within their deadlines.

Assume, deadline of ith job Ji is di and the profit received from this job is pi. Hence, the optimal
solution of this algorithm is a feasible solution with maximum profit.

Thus, D(i)>0
for 1⩽i⩽n

Initially, these jobs are ordered according to profit, i.e. p1⩾p2⩾p3⩾...⩾pn

Algorithm: Job-Sequencing-With-Deadline (D, J, n, k)


D(0) := J(0) := 0
k := 1
J(1) := 1 // means first job is selected
for i = 2 … n do
r := k
while D(J(r)) > D(i) and D(J(r)) ≠ r do
r := r – 1
if D(J(r)) ≤ D(i) and D(i) > r then
for l = k … r + 1 by -1 do
J(l + 1) := J(l)
J(r + 1) := i
k := k + 1

Analysis
In this algorithm, we are using two loops, one is within another. Hence, the complexity of this
algorithm is O(n2)

Example
Let us consider a set of given jobs as shown in the following table. We have to find a sequence
of jobs, which will be completed within their deadlines and will give maximum profit. Each job
is associated with a deadline and profit.

Job J1 J2 J3 J4 J5

Deadline 2 1 3 2 1

Profit 60 100 20 40 20

Solution
To solve this problem, the given jobs are sorted according to their profit in a descending order.
Hence, after sorting, the jobs are ordered as shown in the following table.
Job J2 J1 J4 J3 J5

Deadline 1 2 2 3 1

Profit 100 60 40 20 20

From this set of jobs, first we select J2, as it can be completed within its deadline and contributes
maximum profit.

 Next, J1 is selected as it gives more profit compared to J4.


 In the next clock, J4 cannot be selected as its deadline is over, hence J3 is selected as it
executes within its deadline.
 The job J5 is discarded as it cannot be executed within its deadline.

Thus, the solution is the sequence of jobs (J2, J1, J3), which are being executed within their
deadline and gives maximum profit.

Total profit of this sequence is 100 + 60 + 20 = 180.

Merge a set of sorted files of different length into a single sorted file. We need to find an optimal
solution, where the resultant file will be generated in minimum time.

If the number of sorted files are given, there are many ways to merge them into a single sorted
file. This merge can be performed pair wise. Hence, this type of merging is called as 2-way
merge patterns.

As, different pairings require different amounts of time, in this strategy we want to determine an
optimal way of merging many files together. At each step, two shortest sequences are merged.

To merge a p-record file and a q-record file requires possibly p + q record moves, the obvious
choice being, merge the two smallest files together at each step.

Two-way merge patterns can be represented by binary merge trees. Let us consider a set of n
sorted files {f1, f2, f3, …, fn}. Initially, each element of this is considered as a single node binary
tree. To find this optimal solution, the following algorithm is used.

Algorithm: TREE (n)


for i := 1 to n – 1 do
declare new node
node.leftchild := least (list)
node.rightchild := least (list)
node.weight) := ((node.leftchild).weight) + ((node.rightchild).weight)
insert (list, node);
return least (list);

At the end of this algorithm, the weight of the root node represents the optimal cost.
Example
Let us consider the given files, f1, f2, f3, f4 and f5 with 20, 30, 10, 5 and 30 number of elements
respectively.

If merge operations are performed according to the provided sequence, then

M1 = merge f1 and f2 => 20 + 30 = 50

M2 = merge M1 and f3 => 50 + 10 = 60

M3 = merge M2 and f4 => 60 + 5 = 65

M4 = merge M3 and f5 => 65 + 30 = 95

Hence, the total number of operations is

50 + 60 + 65 + 95 = 270

Now, the question arises is there any better solution?

Sorting the numbers according to their size in an ascending order, we get the following sequence

f4, f3, f1, f2, f5

Hence, merge operations can be performed on this sequence

M1 = merge f4 and f3 => 5 + 10 = 15

M2 = merge M1 and f1 => 15 + 20 = 35

M3 = merge M2 and f2 => 35 + 30 = 65

M4 = merge M3 and f5 => 65 + 30 = 95

Therefore, the total number of operations is

15 + 35 + 65 + 95 = 210

Obviously, this is better than the previous one.

In this context, we are now going to solve the problem using this algorithm.
Initial Set

Step-1

Step-2

Step-3

Step-4

Hence, the solution takes 15 + 35 + 60 + 95 = 205 number of comparisons.


Dynamic Programming is also used in optimization problems. Like divide-and-conquer method,
Dynamic Programming solves problems by combining the solutions of subproblems. Moreover,
Dynamic Programming algorithm solves each sub-problem just once and then saves its answer in
a table, thereby avoiding the work of re-computing the answer every time.

Two main properties of a problem suggest that the given problem can be solved using Dynamic
Programming. These properties are overlapping sub-problems and optimal substructure.

Overlapping Sub-Problems
Similar to Divide-and-Conquer approach, Dynamic Programming also combines solutions to
sub-problems. It is mainly used where the solution of one sub-problem is needed repeatedly. The
computed solutions are stored in a table, so that these don’t have to be re-computed. Hence, this
technique is needed where overlapping sub-problem exists.

For example, Binary Search does not have overlapping sub-problem. Whereas recursive program
of Fibonacci numbers have many overlapping sub-problems.

Optimal Sub-Structure
A given problem has Optimal Substructure Property, if the optimal solution of the given problem
can be obtained using optimal solutions of its sub-problems.

For example, the Shortest Path problem has the following optimal substructure property −

If a node x lies in the shortest path from a source node u to destination node v, then the shortest
path from u to v is the combination of the shortest path from u to x, and the shortest path from x
to v.

The standard All Pair Shortest Path algorithms like Floyd-Warshall and Bellman-Ford are typical
examples of Dynamic Programming.

Steps of Dynamic Programming Approach


Dynamic Programming algorithm is designed using the following four steps −

 Characterize the structure of an optimal solution.


 Recursively define the value of an optimal solution.
 Compute the value of an optimal solution, typically in a bottom-up fashion.
 Construct an optimal solution from the computed information.

Applications of Dynamic Programming Approach


 Matrix Chain Multiplication
 Longest Common Subsequence
 Travelling Salesman Problem

earlier we have discussed Fractional Knapsack problem using Greedy approach. We have shown
that Greedy approach gives an optimal solution for Fractional Knapsack. However, this chapter
will cover 0-1 Knapsack problem and its analysis.

In 0-1 Knapsack, items cannot be broken which means the thief should take the item as a whole
or should leave it. This is reason behind calling it as 0-1 Knapsack.

Hence, in case of 0-1 Knapsack, the value of xi can be either 0 or 1, where other constraints
remain the same.

0-1 Knapsack cannot be solved by Greedy approach. Greedy approach does not ensure an
optimal solution. In many instances, Greedy approach may give an optimal solution.

The following examples will establish our statement.

Example-1

Let us consider that the capacity of the knapsack is W = 25 and the items are as shown in the
following table.

Item A B C D

Profit 24 18 18 10

Weight 24 10 10 7

Without considering the profit per unit weight (pi/wi), if we apply Greedy approach to solve this
problem, first item A will be selected as it will contribute maximum profit among all the
elements.

After selecting item A, no more item will be selected. Hence, for this given set of items total
profit is 24. Whereas, the optimal solution can be achieved by selecting items, B and C, where
the total profit is 18 + 18 = 36.

Example-2

Instead of selecting the items based on the overall benefit, in this example the items are selected
based on ratio pi/wi. Let us consider that the capacity of the knapsack is W = 60 and the items are
as shown in the following table.

Item A B C
Price 100 280 120

Weight 10 40 20

Ratio 10 7 6

Using the Greedy approach, first item A is selected. Then, the next item B is chosen. Hence, the
total profit is 100 + 280 = 380. However, the optimal solution of this instance can be achieved by
selecting items, B and C, where the total profit is 280 + 120 = 400.

Hence, it can be concluded that Greedy approach may not give an optimal solution.

To solve 0-1 Knapsack, Dynamic Programming approach is required.

Problem Statement

A thief is robbing a store and can carry a maximal weight of W into his knapsack. There are n
items and weight of ith item is wi and the profit of selecting this item is pi. What items should the
thief take?

Dynamic-Programming Approach
Let i be the highest-numbered item in an optimal solution S for W dollars. Then S' = S - {i} is an
optimal solution for W - wi dollars and the value to the solution S is Vi plus the value of the sub-
problem.

We can express this fact in the following formula: define c[i, w] to be the solution for items 1,2,
… , i and the maximum weight w.

The algorithm takes the following inputs

 The maximum weight W


 The number of items n
 The two sequences v = <v1, v2, …, vn> and w = <w1, w2, …, wn>

Dynamic-0-1-knapsack (v, w, n, W)
for w = 0 to W do
c[0, w] = 0
for i = 1 to n do
c[i, 0] = 0
for w = 1 to W do
if wi ≤ w then
if vi + c[i-1, w-wi] then
c[i, w] = vi + c[i-1, w-wi]
else c[i, w] = c[i-1, w]
else
c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, w] and tracing backwards
where the optimal values came from.

If c[i, w] = c[i-1, w], then item i is not part of the solution, and we continue tracing with c[i-1,
w]. Otherwise, item i is part of the solution, and we continue tracing with c[i-1, w-W].

Analysis

This algorithm takes θ(n, w) times as table c has (n + 1).(w + 1) entries, where each entry
requires θ(1) time to compute.

The longest common subsequence problem is finding the longest sequence which exists in both
the given strings.

Subsequence
Let us consider a sequence S = <s1, s2, s3, s4, …,sn>.

A sequence Z = <z1, z2, z3, z4, …,zm> over S is called a subsequence of S, if and only if it can be
derived from S deletion of some elements.

Common Subsequence
Suppose, X and Y are two sequences over a finite set of elements. We can say that Z is a
common subsequence of X and Y, if Z is a subsequence of both X and Y.

Longest Common Subsequence


If a set of sequences are given, the longest common subsequence problem is to find a common
subsequence of all the sequences that is of maximal length.

The longest common subsequence problem is a classic computer science problem, the basis of
data comparison programs such as the diff-utility, and has applications in bioinformatics. It is
also widely used by revision control systems, such as SVN and Git, for reconciling multiple
changes made to a revision-controlled collection of files.

Naïve Method
Let X be a sequence of length m and Y a sequence of length n. Check for every subsequence of X
whether it is a subsequence of Y, and return the longest common subsequence found.

There are 2m subsequences of X. Testing sequences whether or not it is a subsequence of Y takes


O(n) time. Thus, the naïve algorithm would take O(n2m) time.
Dynamic Programming
Let X = < x1, x2, x3,…, xm > and Y = < y1, y2, y3,…, yn > be the sequences. To compute the length
of an element the following algorithm is used.

In this procedure, table C[m, n] is computed in row major order and another table B[m,n] is
computed to construct optimal solution.

Algorithm: LCS-Length-Table-Formulation (X, Y)


m := length(X)
n := length(Y)
for i = 1 to m do
C[i, 0] := 0
for j = 1 to n do
C[0, j] := 0
for i = 1 to m do
for j = 1 to n do
if xi = yj
C[i, j] := C[i - 1, j - 1] + 1
B[i, j] := ‘D’
else
if C[i -1, j] ≥ C[i, j -1]
C[i, j] := C[i - 1, j] + 1
B[i, j] := ‘U’
else
C[i, j] := C[i, j - 1]
B[i, j] := ‘L’
return C and B
Algorithm: Print-LCS (B, X, i, j)
if i = 0 and j = 0
return
if B[i, j] = ‘D’
Print-LCS(B, X, i-1, j-1)
Print(xi)
else if B[i, j] = ‘U’
Print-LCS(B, X, i-1, j)
else
Print-LCS(B, X, i, j-1)

This algorithm will print the longest common subsequence of X and Y.

Analysis
To populate the table, the outer for loop iterates m times and the inner for loop iterates n times.
Hence, the complexity of the algorithm is O(m, n), where m and n are the length of two strings.

Example
In this example, we have two strings X = BACDB and Y = BDCB to find the longest common
subsequence.
Following the algorithm LCS-Length-Table-Formulation (as stated above), we have calculated
table C (shown on the left hand side) and table B (shown on the right hand side).

In table B, instead of ‘D’, ‘L’ and ‘U’, we are using the diagonal arrow, left arrow and up arrow,
respectively. After generating table B, the LCS is determined by function LCS-Print. The result
is BCB.

A spanning tree is a subset of an undirected Graph that has all the vertices connected by
minimum number of edges.

If all the vertices are connected in a graph, then there exists at least one spanning tree. In a graph,
there may exist more than one spanning tree.

Properties

 A spanning tree does not have any cycle.


 Any vertex can be reached from any other vertex.

Example

In the following graph, the highlighted edges form a spanning tree.


Minimum Spanning Tree
A Minimum Spanning Tree (MST) is a subset of edges of a connected weighted undirected
graph that connects all the vertices together with the minimum possible total edge weight. To
derive an MST, Prim’s algorithm or Kruskal’s algorithm can be used. Hence, we will discuss
Prim’s algorithm in this chapter.

As we have discussed, one graph may have more than one spanning tree. If there are n number of
vertices, the spanning tree should have n - 1 number of edges. In this context, if each edge of the
graph is associated with a weight and there exists more than one spanning tree, we need to find
the minimum spanning tree of the graph.

Moreover, if there exist any duplicate weighted edges, the graph may have multiple minimum
spanning tree.

In the above graph, we have shown a spanning tree though it’s not the minimum spanning tree.
The cost of this spanning tree is (5 + 7 + 3 + 3 + 5 + 8 + 3 + 4) = 38.

We will use Prim’s algorithm to find the minimum spanning tree.


Prim’s Algorithm
Prim’s algorithm is a greedy approach to find the minimum spanning tree. In this algorithm, to
form a MST we can start from an arbitrary vertex.

Algorithm: MST-Prim’s (G, w, r)


for each u є G.V
u.key = ∞
u.∏ = NIL
r.key = 0
Q = G.V
while Q ≠ Ф
u = Extract-Min (Q)
for each v є G.adj[u]
if each v є Q and w(u, v) < v.key
v.∏ = u
v.key = w(u, v)

The function Extract-Min returns the vertex with minimum edge cost. This function works on
min-heap.

Example

Using Prim’s algorithm, we can start from any vertex, let us start from vertex 1.

Vertex 3 is connected to vertex 1 with minimum edge cost, hence edge (1, 2) is added to the
spanning tree.

Next, edge (2, 3) is considered as this is the minimum among edges {(1, 2), (2, 3), (3, 4), (3, 7)}.

In the next step, we get edge (3, 4) and (2, 4) with minimum cost. Edge (3, 4) is selected at
random.

In a similar way, edges (4, 5), (5, 7), (7, 8), (6, 8) and (6, 9) are selected. As all the vertices are
visited, now the algorithm stops.

The cost of the spanning tree is (2 + 2 + 3 + 2 + 5 + 2 + 3 + 4) = 23. There is no more spanning


tree in this graph with cost less than 23.
Hence, the minimum distance between vertex s and vertex d is 20.

Based on the predecessor information, the path is s→ h→ e→ g→ c→ d

You might also like