Analysis and Design of Algorithms UNIT - 1
(UNIT-I)
Prepared by:
Neeraj Sharma
What are Data Structures?
• Data Structure can be defined as a grouping scheme for data elements that
provides an efficient way of storing, organising, and accessing data elements.
• Some examples of Data Structures are arrays, linked lists, stacks, queues, etc. Data
Structures are widely used in almost every area of Computer Science, e.g. Operating
Systems, Compiler Design, Artificial Intelligence, Graphics and many more.
• Data Structures are the main part of many computer science algorithms as they enable the
programmers to handle the data in an efficient way.
• It plays a vital role in enhancing the performance of software or a program, as the main
function of the software is to store and retrieve the user's data as fast as possible.
Basic Terminology
• Data structures are the building blocks of any program or the software.
• Choosing the appropriate data structure for a program is one of the most difficult tasks for a
programmer. The following terminology is used as far as data structures are concerned.
– Data: Data can be defined as an elementary value or a collection of values, for
example, a student's name and id are data about the student.
– Group Items: Data items which have subordinate data items are called Group item, for
example, name of a student can have first name and the last name.
– Record: Record can be defined as the collection of various data items, for example, if
we talk about the student entity, then its name, address, course and marks can be
grouped together to form the record for the student.
– File: A File is a collection of various records of one type of entity, for example, if there
are 60 employees in an organisation, then there will be 60 records in the related file where
each record contains the data about one employee.
– Attribute and Entity: An entity represents a class of certain objects. It contains
various attributes. Each attribute represents a particular property of that entity.
– Field: Field is a single elementary unit of information representing the attribute of an
entity.
Types of Data Structures
Linear Data Structures
• A data structure is called linear if all of its elements are arranged in a linear order. In
linear data structures, the elements are stored in a non-hierarchical way where each element
has a successor and a predecessor, except the first element (no predecessor) and the last
element (no successor).
– Arrays: An array is a collection of similar type of data items and each data item is
called an element of the array. The data type of the element may be any valid data type
like char, int, float or double. The elements of array share the same variable name but
each one carries a different index number known as subscript. The array can be one
dimensional, two dimensional or multidimensional.
For example, given a declaration such as int age[100];, the individual elements of the
array age are:
age[0], age[1], age[2], age[3],......... age[98], age[99].
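To tie this to code, here is a minimal C sketch that declares and accesses such an age array (the values assigned are arbitrary, chosen only for illustration):

#include <stdio.h>

int main(void)
{
    int age[100];                 /* one-dimensional array of 100 ints */
    for (int i = 0; i < 100; i++)
        age[i] = 18 + i % 50;     /* each element is accessed by its subscript */
    printf("first = %d, last = %d\n", age[0], age[99]);
    return 0;
}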
Linear Data Structures
• Linked List: A linked list is a linear data structure used to maintain a list in
memory. It can be seen as a collection of nodes stored at non-contiguous memory
locations, where each node contains a pointer to its adjacent node (a minimal C
sketch appears at the end of this list).
• Stack: Stack is a linear list in which insertion and deletions are allowed only at one end,
called top.
• A stack is an abstract data type (ADT) that can be implemented in most programming
languages. It is named a stack because it behaves like a real-world stack, for example,
a pile of plates or a deck of cards (a minimal C sketch appears at the end of this list).
• Queue: A queue is a linear list in which elements can be inserted only at one end,
called the rear, and deleted only at the other end, called the front.
It is an abstract data structure, similar to a stack. A queue is open at both ends and
therefore follows First-In-First-Out (FIFO) methodology for storing data items (a
minimal C sketch appears after this list).
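As referenced above, a minimal C sketch of a singly linked list node with insertion at the front; the names Node and push are illustrative, not from the slides:

#include <stdio.h>
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;   /* pointer to the adjacent node */
};

/* Insert a new node at the front of the list; returns the new head. */
struct Node *push(struct Node *head, int value)
{
    struct Node *n = malloc(sizeof *n);
    n->data = value;
    n->next = head;      /* nodes need not be contiguous in memory */
    return n;
}

int main(void)
{
    struct Node *head = NULL;
    head = push(head, 3);
    head = push(head, 2);
    head = push(head, 1);
    for (struct Node *p = head; p != NULL; p = p->next)
        printf("%d ", p->data);   /* prints: 1 2 3 */
    return 0;
}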
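Likewise, a minimal fixed-capacity, array-based stack sketch in C; MAX, push and pop are illustrative names, and a production version would size the stack dynamically:

#define MAX 100

int stack[MAX];
int top = -1;                       /* -1 means the stack is empty */

int push(int x)                     /* insertion allowed only at the top */
{
    if (top == MAX - 1) return 0;   /* overflow: stack is full */
    stack[++top] = x;
    return 1;
}

int pop(int *x)                     /* deletion allowed only at the top */
{
    if (top == -1) return 0;        /* underflow: stack is empty */
    *x = stack[top--];
    return 1;
}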
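And a minimal circular-array queue sketch in C, again with illustrative names (enqueue/dequeue) and a fixed capacity:

#define MAX 100

int queue[MAX];
int front = 0, rear = 0, count = 0;

int enqueue(int x)                  /* insertion only at the rear */
{
    if (count == MAX) return 0;     /* overflow: queue is full */
    queue[rear] = x;
    rear = (rear + 1) % MAX;
    count++;
    return 1;
}

int dequeue(int *x)                 /* deletion only at the front: FIFO */
{
    if (count == 0) return 0;       /* underflow: queue is empty */
    *x = queue[front];
    front = (front + 1) % MAX;
    count--;
    return 1;
}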
Data Structures Operations
• Traversing: Every data structure contains the set of data elements. Traversing the data
structure means visiting each element of the data structure in order to perform some
specific operation like searching or sorting.
• Insertion: Insertion can be defined as the process of adding elements to the data
structure at any location.
If the size of the data structure is n, then it can hold at most n data elements; inserting
into a full data structure causes overflow.
Data Structures Operations
• Deletion: The process of removing an element from the data structure is called Deletion.
We can delete an element from the data structure at any random location.
If we try to delete an element from an empty data structure then underflow occurs.
• Searching: The process of finding the location of an element within the data structure is
called Searching. Two commonly used algorithms for searching are Linear Search and Binary
Search. We will discuss each of them later in this unit.
• Sorting: The process of arranging the data structure in a specific order is known as Sorting.
There are many algorithms that can be used to perform sorting, for example, insertion sort,
selection sort, bubble sort, etc.
• Merging: When two lists, List A and List B, of sizes M and N respectively and of a similar
type of elements, are clubbed or joined to produce a third list, List C, of size (M+N), this
process is called merging.
What is Algorithm?
• A finite set of instructions that specifies a sequence of operations to be carried out in order
to solve a specific problem or class of problems is called an Algorithm.
Why study Algorithm?
• As the speed of processor increases, performance is frequently said to be less central than
other software quality characteristics (e.g. security, extensibility, reusability etc.).
• However, large problem sizes are commonplace in the area of computational science,
which makes performance a very important factor.
• This is because longer computation times mean, to name a few consequences, slower results,
less thorough research, and a higher cost of computation (if buying CPU hours from an
external party).
• Analysis of algorithms is the determination of the amount of time and space resources
required to execute it.
Why Analysis of Algorithm is important?
• It is much more convenient to have simple measures for the efficiency of an algorithm than
to implement the algorithm and test the efficiency every time a certain parameter in the
underlying computer system changes.
• It is impossible to predict the exact behaviour of an algorithm. There are too many
influencing factors.
• Best case: Defines the input for which the algorithm takes the minimum time. In the
best case we calculate the lower bound of an algorithm. Example: In linear search, the
best case occurs when the search data is present at the first location of a large data set.
• Worst case: Defines the input for which the algorithm takes the maximum time. In
the worst case we calculate the upper bound of an algorithm. Example: In linear search,
the worst case occurs when the search data is not present at all.
• Average case: In the average case, we take all random inputs and calculate the
computation time averaged over all inputs (a minimal linear-search sketch follows this list).
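As referenced above, a minimal linear-search sketch in C to make the three cases concrete (the name linearSearch is illustrative): the best case is a hit at index 0, the worst case scans all n elements without a hit.

/* Return the index of key in a[0..n-1], or -1 if absent.
   Best case:  key at a[0]   -> one comparison.
   Worst case: key not found -> n comparisons. */
int linearSearch(const int a[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}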
• An algorithm can be defined as a finite set of steps which have to be followed while
solving a particular problem.
• An algorithm is a distinct computational procedure that takes input as a set of values and
results in the output as a set of values by solving the problem.
• Easy and Efficient Coding: An algorithm is nothing but a blueprint of a program that
helps develop a program.
• Logarithmic Complexity:
It imposes a complexity of O(log(N)). It undergoes the execution of the order of log(N)
steps. For operations on N elements, the logarithm is usually taken to base 2.
For N = 1,000,000, an algorithm that has a complexity of O(log(N)) would undergo about 20
steps (log₂ 1,000,000 ≈ 20). Here, the logarithmic base does not affect the order of the
operation count, so it is usually omitted.
Typical Complexity of an Algorithm
• Linear Complexity:
– It imposes a complexity of O(N). It encompasses the same number of steps as that of
the total number of elements to implement an operation on N elements.
For example, if there exist 500 elements, then it will take about 500 steps. Basically, in
linear complexity, the number of elements linearly depends on the number of steps. For
example, the number of steps for N elements can be N/2 or 3*N.
– N·log(N) Complexity: It imposes a run time of O(N*log(N)). It undergoes the execution
of the order of N*log(N) steps on N elements to solve the given problem.
For a given 1000 elements, an O(N*log(N)) algorithm will execute about 10,000 steps to
solve the problem.
Typical Complexity of an Algorithm
• Quadratic Complexity: It imposes a complexity of O(N²). For input size N, it
undergoes the order of N² operations on N elements to solve a given
problem. If N = 100, it will endure 10,000 steps. In other words, whenever the order of
operations tends to have a quadratic relation with the input size, the result is quadratic
complexity. For example, for N elements, the steps may be found to be of the order
3·N²/2.
• Cubic Complexity: It imposes a complexity of O(N³). For input size N, it executes the
order of N³ steps on N elements to solve a given problem.
For example, if there exist 100 elements, it is going to execute 1,000,000 steps.
1. Divide and Conquer Approach: It is a top-down approach. The algorithms which follow
the divide & conquer technique involve three steps: divide the original problem into
subproblems, conquer (solve) each subproblem recursively, and combine the subproblem
solutions into a solution for the whole problem.
2. Greedy Technique: A greedy algorithm always makes the choice (the greedy criterion)
that looks best at the moment, to optimize a given objective.
The greedy algorithm doesn't always guarantee the optimal solution; however, it
generally produces a solution that is very close in value to the optimal (a minimal greedy
sketch appears after this list).
Algorithm Design Techniques
3. Dynamic Programming: Dynamic Programming is a bottom-up approach: we solve all
possible small problems and then combine them to obtain solutions for bigger problems.
This is particularly helpful when the number of overlapping subproblems is exponentially
large. Dynamic Programming is frequently related to Optimization Problems.
4. Branch and Bound: In a Branch & Bound algorithm, a given subproblem which cannot be
bounded has to be divided into at least two new restricted subproblems. Branch and Bound
algorithms are methods for global optimization in non-convex problems. Branch and Bound
algorithms can be slow; in the worst case they require effort that grows
exponentially with problem size, but in some cases we are lucky and the method converges
with much less effort.
5. Backtracking Algorithm: A Backtracking Algorithm tries each possibility until it finds the
right one. It is a depth-first search of the set of possible solutions. During the search, if an
alternative doesn't work, the algorithm backtracks to the choice point, the place which
presented different alternatives, and tries the next alternative.
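As referenced in the greedy item above, a minimal greedy sketch in C: coin change over the canonical denominations {25, 10, 5, 1}, for which the greedy criterion happens to give the optimum; the names coins and coinChange are illustrative.

#include <stdio.h>

/* Greedy coin change: repeatedly take the largest coin that still fits.
   Optimal for canonical coin systems such as {25, 10, 5, 1}. */
void coinChange(int amount)
{
    int coins[] = {25, 10, 5, 1};
    for (int i = 0; i < 4; i++) {
        int k = amount / coins[i];      /* greedy criterion: best-looking choice now */
        amount -= k * coins[i];
        if (k > 0) printf("%d x %d\n", k, coins[i]);
    }
}

int main(void)
{
    coinChange(67);   /* 2 x 25, 1 x 10, 1 x 5, 2 x 1 */
    return 0;
}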
Asymptotic Notation
• The word Asymptotic means approaching a value or curve arbitrarily closely. Asymptotic
notations are used to write fastest and slowest possible running time for an algorithm.
These are also referred to as 'best case' and 'worst case' scenarios respectively.
• In asymptotic notations, we derive the complexity concerning the size of the input
(for example, in terms of n).
• These notations are important because they give an approximation about the cost of running
the algorithm.
1. Big-oh (O) Notation: The function f (n) = O (g (n)) [read as "f of n is big-oh of g of n"] if
and only if there exist positive constants c and n0 such that f (n) ≤ c·g (n) for all n ≥ n0.
Hence, function g (n) is an upper bound for function f (n), as g (n) grows at least as fast as f (n).
For Example:
1. 3n+2 = O(n) as 3n+2 ≤ 4n for all n ≥ 2
2. 3n+3 = O(n) as 3n+3 ≤ 4n for all n ≥ 3
Hence, the complexity of f (n) can be represented as
O (g (n))
Asymptotic Notation
2. Omega (Ω) Notation: The function f (n) = Ω (g (n)) [read as "f of n is omega of g of n"] if
and only if there exist positive constants c and n0 such that f (n) ≥ c·g (n) for all n ≥ n0.
For Example:
f (n) = 8n² + 2n − 3 ≥ 8n² − 3
= 7n² + (n² − 3) ≥ 7n² for all n ≥ 2, with g(n) = n²
Thus, c = 7 and n0 = 2.
3. Theta (θ): The function f (n) = θ (g (n)) [read as "f is the theta of g of n"] if and only if there
exist positive constants k1, k2 and n0 such that k1·g (n) ≤ f (n) ≤ k2·g (n) for all n ≥ n0.
The Theta Notation is more precise than both the big-oh and Omega notation. The function
f(n) = θ (g (n)) if g(n) is both an upper and lower bound.
Growth of function
• Resources for an algorithm are usually expressed as a function of the input size. Often this
function is messy and complicated to work with. To study function growth efficiently, we
reduce the function down to its important part.
For example, in a running time such as T(n) = a·n² + b·n + c, the n² term dominates the
function when n gets sufficiently large.
Dominant terms are what we are interested in when reducing a function; for this, we ignore
all constants and coefficients and look at the highest-order term concerning n.
Recurrence Relation
• A recurrence is an equation or inequality that describes a function in terms of its values on
smaller inputs.
• To solve a Recurrence Relation means to obtain a function defined on the natural numbers
that satisfies the recurrence.
For example, the worst-case running time T(n) of the MERGE SORT procedure is
described by the recurrence:
T (n) = θ(1) if n = 1
T (n) = 2T(n/2) + θ(n) if n > 1
There are four methods for solving Recurrence:
– Substitution Method
– Iteration Method
– Recursion Tree Method
– Master Method
Substitution Method:
• The Substitution Method Consists of two main steps:
– Guess the Solution.
– Use the mathematical induction to find the boundary condition and shows that the
guess is correct.
For Example 1, solve the following equation by the Substitution Method:
T (n) = T(n/2) + 1
We have to show that it is asymptotically bounded by O(log n).
Solution: Guess T (n) = O(log n), i.e. T (n) ≤ c log n.
Put this in the given recurrence equation:
T (n) ≤ c log(n/2) + 1
= c log n − c log 2 + 1
≤ c log n for c ≥ 1 (taking logs to base 2, so log 2 = 1)
Thus T (n) = O(log n).
Substitution Method:
Example2 Consider the Recurrence
T (n) = 2T(n/2) + n, n > 1
Find an Asymptotic bound on T.
Solution:
We guess the solution is O(n log n). Thus for constant 'c':
T (n) ≤ c n log n
Put this in the given recurrence equation.
Now,
T (n) ≤ 2c(n/2) log(n/2) + n
≤ c n log n − c n log 2 + n
= c n log n − n(c log 2 − 1)
≤ c n log n for c ≥ 1
Thus T (n) = O(n log n).
Iteration Method:
• It means to expand the recurrence and express it as a summation of terms of n and initial
condition.
Example1: Consider the Recurrence
T (n) = 1 if n=1
= 2T (n-1) if n>1
Solution:
T (n) = 2T(n−1)
= 2[2T(n−2)] = 2²T(n−2)
= 2²[2T(n−3)] = 2³T(n−3)
= 2³[2T(n−4)] = 2⁴T(n−4)    ...(Eq. 1)
Repeat the procedure for i times:
T (n) = 2ⁱ T(n−i)
Put n − i = 1, i.e. i = n − 1, in (Eq. 1):
T (n) = 2ⁿ⁻¹ T(1)
= 2ⁿ⁻¹ · 1    {T(1) = 1 .....given}
= 2ⁿ⁻¹
Recursion Tree Method:
• It is sometimes difficult to come up with a good guess. In a recursion tree, each node
represents the cost of a single subproblem.
• We sum the costs within each level of the tree to obtain a set of per-level costs and
then sum all per-level costs to determine the total cost of all levels of the recursion.
• A Recursion Tree is best used to generate a good guess, which can be verified by the
Substitution Method.
Recursion Tree Method:
Example 1
Consider T (n) = 2T(n/2) + n²
We have to obtain the asymptotic bound using the recursion tree method.
Solution: The recursion tree for the above recurrence has a root of cost n² with two
subtrees for the two subproblems of size n/2, expanded level by level; the per-level sums
are sketched below.
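A sketch of the per-level sums, following the standard recursion-tree construction (logs to base 2):

T(n) = n² + 2(n/2)² + 4(n/4)² + ...    (one term per level of the tree)
= n² · (1 + 1/2 + 1/4 + ...)
≤ 2n²

Each level costs half of the level above it, and the geometric series is bounded by 2, so the tree suggests the guess T(n) = O(n²) (in fact θ(n²), since the root alone costs n²); this guess can then be verified by the Substitution Method.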
Master Method:
The Master Method is used for solving the following type of recurrence:
T (n) = a·T(n/b) + f(n), where a ≥ 1 and b > 1 are constants and f(n) is a function; n/b can be
interpreted as ⌊n/b⌋ or ⌈n/b⌉.
In applying it to the analysis of a recursive algorithm, the constants and function take on the
following significance:
Master Method:
• n is the size of the problem.
• n/b is the size of each subproblem. (Here it is assumed that all subproblems are essentially
the same size.)
• f (n) is the sum of the work done outside the recursive calls, which includes the sum of
dividing the problem and the sum of combining the solutions to the subproblems.
• It is not always possible to bound the function as required, so we make three
cases, which tell us what kind of bound we can apply to the function; the standard
statement of these cases is given below.
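For reference, a sketch of the three standard cases as they are usually stated (e.g. in CLRS), with log_b a denoting the logarithm of a to base b:

For T(n) = a·T(n/b) + f(n) with a ≥ 1 and b > 1:
Case 1: If f(n) = O(n^(log_b a − ε)) for some ε > 0, then T(n) = θ(n^(log_b a)).
Case 2: If f(n) = θ(n^(log_b a)), then T(n) = θ(n^(log_b a) · log n).
Case 3: If f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a·f(n/b) ≤ c·f(n) for some constant
c < 1 and all sufficiently large n, then T(n) = θ(f(n)).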
Divide and Conquer Introduction
• Divide and Conquer is an algorithmic pattern. In this design, we take a problem on a huge
input, break the input into minor pieces, solve the problem on each of
the small pieces, and then merge the piecewise solutions into a global solution. This
mechanism of solving the problem is called the Divide & Conquer Strategy.
• A Divide and Conquer algorithm solves a problem using the following three steps:
– Divide the original problem into a set of subproblems.
– Conquer: Solve every subproblem individually, recursively.
– Combine: Put together the solutions of the subproblems to get the solution to the whole
problem.
Divide and Conquer Introduction
Examples: The specific computer algorithms are based on the Divide & Conquer approach:
– Binary Search
– Sorting (merge sort, quick sort)
– Tower of Hanoi.
Fundamental Divide and Conquer Strategy
• There are two fundamentals of the Divide & Conquer Strategy:
– Recurrence Formula
– Stopping Condition
1. Recurrence Formula: It is the formula that we generate from the given technique. After
generating the formula, we apply the D&C Strategy, i.e. we break the problem recursively &
solve the broken subproblems.
2. Stopping Condition: When we break the problem using the Divide & Conquer Strategy,
we need to know how long we should keep applying Divide & Conquer. The
condition at which we need to stop our recursion steps of D&C is called the Stopping
Condition.
Application of Divide and Conquer Approach
• Following algorithms are based on the concept of the Divide and Conquer Technique:
• Binary Search: The binary search algorithm is a searching algorithm, which is also called
a half-interval search or logarithmic search. It works by comparing the target value with the
middle element of a sorted array. After making the comparison, if the value differs,
then the half that cannot contain the target is eliminated, and the search continues
on the other half. We again consider the middle element of that half and
compare it with the target value. The process keeps on repeating until the target value is
found. If the remaining half is empty after ending the search, it can be concluded
that the target is not present in the array (a minimal sketch appears after this list).
• Quicksort: It is one of the most efficient sorting algorithms, and it is also known as partition-
exchange sort. It starts by selecting a pivot value from the array and then divides the
rest of the array elements into two sub-arrays. The partition is made by comparing each of
the elements with the pivot value: it checks whether the element holds a greater or
lesser value than the pivot, and then the sub-arrays are sorted recursively.
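As referenced in the binary search item above, a minimal iterative sketch in C (the name binarySearch is illustrative):

/* Return the index of key in the sorted array a[0..n-1], or -1 if absent. */
int binarySearch(const int a[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* middle element, overflow-safe */
        if (a[mid] == key)
            return mid;
        else if (a[mid] < key)
            low = mid + 1;    /* eliminate the half that cannot contain key */
        else
            high = mid - 1;
    }
    return -1;                /* remaining half is empty: key not present */
}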
Application of Divide and Conquer Approach
• Merge Sort: It is a sorting algorithm that sorts an array by making comparisons. It starts by
dividing the array into sub-arrays and then recursively sorts each of them. After the sorting is
done, it merges them back.
• Closest Pair of Points: It is a problem of computational geometry. This algorithm
emphasizes finding out the closest pair of points in a metric space, given n points, such that
the distance between the pair of points should be minimal.
• Strassen's Algorithm: It is an algorithm for matrix multiplication, which is named after
Volker Strassen. It has proven to be much faster than the traditional algorithm when working
on large matrices.
• Cooley-Tukey Fast Fourier Transform (FFT) algorithm: The Fast Fourier Transform
algorithm is named after J. W. Cooley and John Tukey. It follows the Divide and Conquer
approach and imposes a complexity of O(n log n).
• Karatsuba algorithm for fast multiplication: It is one of the fastest multiplication
algorithms, invented by Anatoly Karatsuba in 1960 and published in 1962. It reduces the
multiplication of two n-digit numbers to at most n^(log₂ 3) ≈ n^1.585 single-digit
multiplications.
Advantages of Divide and Conquer
• Divide and Conquer successfully solves some famously hard problems, such as the
Tower of Hanoi, a mathematical puzzle. It is challenging to solve complicated problems for
which you have no basic idea, but the divide and conquer approach lessens the effort, as it
works by dividing the main problem into smaller subproblems and then solving them
recursively. The resulting algorithms are often much faster than naive alternatives.
• It efficiently uses cache memory without occupying much space because it solves simple
subproblems within the cache memory instead of accessing the slower main memory.
• Since these algorithms exhibit parallelism, they can be handled by systems incorporating
parallel processing without requiring much modification.
Disadvantages of Divide and Conquer
• Since most of its algorithms are designed by incorporating recursion, they necessitate
careful memory management.
• It may even crash the system if the recursion depth exceeds the stack space
available.
DIVIDE AND CONQUER: GENERAL METHOD
• Merge sort is yet another sorting algorithm that falls under the category of the Divide and
Conquer technique. It is one of the best sorting techniques, and it is built as a
recursive algorithm.
• The Merge-Sort function keeps on splitting an array into two halves until a condition is met
where we try to perform Merge-Sort on a subarray of size 1, i.e., p == r.
• And then, it combines the individually sorted sub arrays into larger arrays until the whole
array is merged.
ALGORITHM-MERGE SORT
MERGE-SORT (A, p, r)
1. if p < r
2.     then q ← ⌊(p + r)/2⌋
3.     MERGE-SORT (A, p, q)
4.     MERGE-SORT (A, q+1, r)
5.     MERGE (A, p, q, r)
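A runnable C sketch of the above, assuming a standard MERGE implementation (the names merge and mergeSort are illustrative):

#include <stdio.h>
#include <string.h>

/* Merge the two sorted halves A[p..q] and A[q+1..r]. */
void merge(int A[], int p, int q, int r)
{
    int n1 = q - p + 1, n2 = r - q;
    int L[n1], R[n2];                    /* temporary halves (C99 VLAs) */
    memcpy(L, &A[p], n1 * sizeof(int));
    memcpy(R, &A[q + 1], n2 * sizeof(int));

    int i = 0, j = 0, k = p;
    while (i < n1 && j < n2)             /* pick the smaller head element */
        A[k++] = (L[i] <= R[j]) ? L[i++] : R[j++];
    while (i < n1) A[k++] = L[i++];      /* copy any leftovers */
    while (j < n2) A[k++] = R[j++];
}

void mergeSort(int A[], int p, int r)
{
    if (p < r) {                         /* stop when the subarray has one element */
        int q = (p + r) / 2;
        mergeSort(A, p, q);
        mergeSort(A, q + 1, r);
        merge(A, p, q, r);
    }
}

int main(void)
{
    int A[] = {5, 2, 4, 7, 1, 3, 2, 6};
    int n = sizeof(A) / sizeof(A[0]);
    mergeSort(A, 0, n - 1);
    for (int i = 0; i < n; i++) printf("%d ", A[i]);  /* 1 2 2 3 4 5 6 7 */
    return 0;
}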
Quick Sort
Divide: Rearrange the elements and split the array into two sub-arrays with a pivot element
in between, such that each element in the left sub-array is less than or equal to the pivot
element and each element in the right sub-array is larger than the pivot element.
Conquer: Recursively sort the two sub-arrays.
Combine: Combine the already sorted arrays. (optional)
Quick Sort Algorithm
void quickSort(int arr[], int low, int high)
{
    if (low < high)
    {
        int pi = partition(arr, low, high);   /* index of the pivot after partitioning */
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

The partition algorithm rearranges the sub-array in place:

PARTITION (array A, int m, int n)
1. x ← A[m]
2. o ← m
3. for p ← m + 1 to n
4.     do if (A[p] < x)
5.         then o ← o + 1
6.             swap A[o] with A[p]
7. swap A[m] with A[o]
8. return o
Strassen’s Matrix Multiplication Algorithm
1. SPACE COMPLEXITY:
The space complexity of an algorithm is the amount of memory space
required by the algorithm during its course of execution. There are three types of space:
a) Instruction space: required to store the executable program.
b) Data space: required to store all the constant and variable data.
c) Environment space: required to store the environment information needed to resume
the suspended function.
2. TIME COMPLEXITY:
The time complexity of an algorithm is the total amount of time required by an
algorithm to complete its execution.
Space Complexity
1. Constant space complexity: a fixed amount of space is used for all
input values.

Algorithm sum(a, b, c)
{
    a = 10;       // a : 1 word
    b = 20;       // b : 1 word
    c = a + b;    // c : 1 word
}

S(p) = C + Sp = 3 + 0 = 3, so the space complexity is constant: O(1).

2. Linear (variable) space complexity: the space grows with the input size.

Algorithm sum(a, n)
{
    total = 0;                 // total : 1 word
    for i = 1 to n do          // i : 1 word
        total = total + a[i];  // a : n words
    return total;
}

S(p) = C + Sp = 3 + n, so the space complexity is O(n).
Time Complexity
The time T(p) taken by a program P is the sum of the compile time
and the run time (execution time).

Statement                   s/e   Frequency   Total
1. Algorithm Sum(a, n)       0        -         0
2. {                         0        -         0
3.   s = 0.0;                1        1         1
4.   for i = 1 to n do       1       n+1       n+1
5.     s = s + a[i];         2        n         2n
6.   return s;               1        1         1
7. }                         0        -         0
Total                                          3n+3
KINDS OF ANALYSIS
1.Worst-case: (usually)
• T(n) = maximum time of algorithm on any input of size n.
2.Average-case: (sometimes)
• T(n) = expected time of algorithm over all inputs of size n.
• Need assumption of statistical distribution of inputs.
3.Best-case:
• T(n) = minimum time of algorithm on any input of size n.
COMPLEXITY:
Complexity refers to the rate at which the required storage or time grows as a function of
the problem size.
Analysis of an Algorithm
The goal of analysis of an algorithm is to compare algorithms in terms of running
time and also memory management.
The running time of an algorithm depends on how long it takes a
computer to run the lines of code of the algorithm. Asymptotic analysis, by contrast:
1. is independent of machine constants;
2. doesn't require the algorithm to be implemented;
3. estimates the time before the program is prepared.
ASYMPTOTIC NOTATION
1. Big oh (O)notation
2. Big omega (Ω) notation
3. Theta(Θ) notation
4. Little oh (o) notation
5. Little omega(ω) notation
1. Big oh (O) notation:
Asymptotic "less than or equal" (f grows no faster than g). This notation mainly represents
the upper bound of an algorithm's run time. Big oh (O) notation is useful to calculate the
maximum amount of time of execution.
The condition is f(n) ≤ c·g(n) for all n ≥ n0; when this condition is satisfied, the notation
bounds the maximum amount of time the algorithm takes to execute, so it is
called the worst-case complexity.
Ω-Omega notation
Ω-Omega notation: Asymptotic "greater than or equal" (f grows at least as fast as g).
It represents the lower bound of an algorithm's run time. By using Big Omega notation we can
calculate the minimum amount of time. We can say that it gives the best-case time complexity.
Problem:-
Find the upper bound, lower bound & tight bound range for the
function: f(n) = 2n+5
Solution:-
We are given f(n) = 2n+5; take g(n) = n.
lower bound = 2n, upper bound = 3n, tight bound = between 2n and 3n
For Big-oh notation (O):- according to the definition,
f(n) <= c·g(n). For Big oh we use the upper bound, so
f(n) = 2n+5, g(n) = n and c = 3. According to the definition:
2n+5 <= 3n
Put n=1: 7<=3, false. Put n=2: 9<=6, false. Put n=3: 11<=9, false. Put n=4: 13<=12,
false. Put n=5: 15<=15, true.
Now for all values of n>=5 the above condition is satisfied. c=3, n0=5.
2. Big-omega notation:- f(n) >= c·g(n). We know that this
notation is the lower bound notation, so take c=2.
Let f(n) = 2n+5 and c·g(n) = 2n.
Now 2n+5 >= 2n; put n=1:
we get 7 >= 2, true. For all values of n>=1, c=2, the condition is satisfied.
3. Theta notation:- according to the definition,
c1·g(n) <= f(n) <= c2·g(n). Here 2n <= 2n+5 <= 3n for all n >= 5, so c1=2, c2=3, n0=5
and f(n) = θ(n).
Thank You