CST 201 Data Structures
Prepared by
Anishamol Abraham
AP, CSE, AJCE
Module 1
Basic Concepts of Data Structures
• System Life Cycle, Algorithms, Performance Analysis,
Space Complexity, Time Complexity, Asymptotic
Notation, Complexity Calculation of Simple Algorithms
Module 2
Arrays and Searching
• Polynomial representation using Arrays, Sparse matrix, Stacks, Queues-
Circular Queues, Priority Queues, Double Ended Queues, Evaluation of
Expressions
• Linear Search and Binary Search
Module 3
Linked List and Memory Management
• Self Referential Structures, Dynamic Memory Allocation, Singly Linked List-
Operations on Linked List. Doubly Linked List, Circular Linked List, Stacks
and Queues using Linked List, Polynomial representation using Linked List
• Memory allocation and de-allocation-First-fit, Best-fit and Worst-fit
allocation schemes
Module 4
Trees and Graphs
• Trees, Binary Trees-Tree Operations, Binary Tree Representation, Tree
Traversals, Binary Search Trees- Binary Search Tree Operations
• Graphs, Representation of Graphs, Depth First Search and Breadth First
Search on Graphs, Applications of Graphs
Module 5
Sorting and Hashing
• Sorting Techniques – Selection Sort, Insertion Sort, Quick Sort, Merge Sort
and Heap Sort
• Hashing- Hashing Techniques, Collision Resolution, Overflow handling,
Hashing functions – Mid square, Division, Folding, Digit Analysis
Text Book
• Ellis Horowitz, Sartaj Sahni and Susan Anderson-Freed, Universities Press,
Fundamentals of Data Structures in C
• Samanta D., Classic Data Structures, Prentice Hall India.
System Life Cycle
Module 1
System Life Cycle
• The system life cycle is a series of stages that are worked through during the
development of a new information system.
• A lot of time and money can be wasted if a system is developed that doesn’t
work properly or do exactly what is required of it.
System Life Cycle-5 Phases
• Requirements
• Analysis
• Design
• Refinement & Coding
• Verification
System Life Cycle
Requirements- Understanding the information you are given (the input) and
what results you are to produce (the output).
• A rigorous description of the input and output which covers all cases.
System Life Cycle
Analysis- Problem breakdown
• Bottom up approach
• Top down approach
System Life Cycle
Design-
• Designer approaches with program needs and operations
• Program need leads to Abstract Data types
• Operations leads to Algorithm specification and algorithm design strategies
• Both are language independent
System Life Cycle
Refinement and coding
• Choose representations for data objects
• Write algorithms for each of the operations on these objects.
• Refine the algorithm
• The order in which you do this may be crucial, because once you choose a
representation, the resulting algorithms may be inefficient.
System Life Cycle
Verification
• Develop correctness proofs for the program
• Testing the program with all variety of inputs
• Remove errors, if any
How to create programs
• Requirements
• Analysis: bottom-up vs. top-down
• Design: data objects and operations
• Refinement and Coding
• Verification
• Program Proving
• Testing
• Debugging
Algorithm
• Definition
An algorithm is a finite set of instructions that accomplishes a particular task.
• Criteria
1. input
2. output
3. definiteness: clear and unambiguous
4. finiteness: terminate after a finite number of steps
5. effectiveness: instruction is basic enough to be carried out
Algorithm Example
• You are given a set of numbers . Youhaveto find the largest value in
thatset.
• Problem Statement : Find the largestnumber in the given list ofnumbers?
• Input : Alist of positive integer numbers. (List must contain at least one
number).
• Output : The largest number in the givenlist of positive integer
numbers.
Algorithm I
• Consider the given list of numbers as 'L' (input), and the largest number as
'max' (Output).
Step 1: Define a variable 'max' and initialize with '0'.
Step 2: Compare first number (say 'x') in the list 'L' with 'max', if 'x' is larger
than 'max', set 'max' to 'x'.
Step 3: Repeat step 2 for all numbers in the list 'L'.
Step 4: Display the value of 'max' as a result.
Algorithm II
• Consider the given list of numbers as 'L' (input), and the largest number as
'max' (Output).
Step 1: Sort the given list in descending order .
Step 2: Assign first element in the list to ‘max’
Step 3: Display the value of 'max' as a result.
Performance Analysis
7. Is the program’s running time acceptable for the task?
Performance Analysis
• In computer science, there may bemultiple algorithms to solve aproblem.
• When there are multiple alternative algorithmsto solve a problem, we analyze them
and pick the one which is best suitable for our requirements.
• Performanceanalysis helps us to select the best algorithm among the multiple
algorithms designed to solve aproblem.
• Performance analysis of an algorithm means predicting the resources which are
required to an algorithm to perform its task.
Performance Analysis
• The formal definition is asfollows...
– Performance analysis of an algorithm is a process of making evaluative judgment
about algorithms.
• Two main evaluating criteria are space and time required by that particular
algorithm
• Based on this, performance analysis of an algorithm can also be defined asfollows…
• Performance analysis of an algorithm isthe process of calculating
space and time required by that algorithm.
Complexity of anAlgorithm
• Performance of an algorithm is expressed asComplexity of Algorithm.
• Usually the complexity of an algorithm is measured in terms of the input data size.
• The complexity of an algorithm is the function f(n) which gives the running time and/or storage
space requirement of the algorithm in terms of the input datasize 'n'.
• Spacerequired to complete the task of an algorithm is termed asSpace Complexity.
• Time required to complete the task of an algorithm is termed asTime Complexity of that
algorithm
• Mostly, the storage space required by an algorithm is simply a multiple of the data sizen. Hence,
generally, complexity shall refer to the running time of thealgorithm.
Best , Average and Worst case complexities
• The complexity of an algorithm can be measured in 3 cases:
• Best case complexity : - It gives the minimum possible complexity f(n).
• Example : in the case of linear search, if the item to be searched is the first element in the list itself
then it can be searched with the minimum number of iteration. It will be the best case complexity of
linear search.
• Average case complexity :- It gives the average complexity f(n).
• In linear search the element to be searched will be in somewhat middle.
• Worst case complexity : -It gives maximum possible complexity f(n) of an algorithm.
• In the case of linear search, if the element to be searched is at the last position then maximum number
of iteration has to be performed.
Space complexity
• Total amount of computer memory required by an algorithm to complete its execution is called as space
complexity of that algorithm.
• The space requirement of an algorithm/ program depends on the following components..
• Instruction Space: It is the amount of memory used to store compiled version of instructions.
• Environmental Stack: It is the amount of memory used to store information of partially executed
functions at the time of function call.
• Data Space: It is the amount of memory used to store all the variables and constants. Data space has two
components:
• Space needed for constants and simple variables in program.
• Space needed by fixed sized structural variables, such as arrays and structures.
• Space needed for dynamically allocated objects such as arrays and class instances.
Space Complexity
S(P)=C+SP(I)
Fixed Space Requirements (C)
Independent of the characteristics of the inputs and outputs
– instruction space
– space for simple variables, fixed-size structured variable, constants
Variable Space Requirements (SP(I))
depend on the instance characteristic I
– number, size, values of inputs and outputs associated with I
– recursive stack space, formal parameters, local variables, return
address
*Program 1.9: Simple arithmetic function (p.19)
float abc(float a, float b, float c)
{
return a + b + b * c + (a + b - c) / (a + b) + 4.00;
}
Sabc(I) = 0
*Program 1.10: Iterative function for summing a list of numbers (p.20)
float sum(float list[ ], int n)
{
Ssum(I) = 0
float tempsum = 0;
int i; Recall: pass the address of the
for (i = 0; i<n; i++) first element of the array &
tempsum += list [i]; pass by value
return tempsum;
}
*Program 1.11: Recursive function for summing a list of numbers (p.20)
float rsum(float list[ ], int n)
{
if (n) return rsum(list, n-1) + list[n-1];
return 0;
} Ssum(I)=Ssum(n)=6n
Assumptions:
*Figure 1.1: Space needed for one recursive call of Program 1.11 (p.21)
Type Name Number of bytes
parameter: float list [ ] 2
parameter: integer n 2
return address:(used internally) 2(unless a far address)
TOTAL per recursive call 6
Time complexity
• Every algorithm requires some amount of computer time to execute its instruction to
perform the task.
• The time complexity of an algorithm is the total amount of time required by an algorithm
to complete its execution.
• Generally, the running time of an algorithm depends upon the following...
• Whether it is running on Single processor machine or Multi processor machine.
• Whether it is a 32 bit machine or 64 bit machine.
• Read and Write speed of the machine.
• The amount of time required by an algorithm to perform Arithmetic operations, logical operations,
return value and assignment operations etc.,
• Input data
Time complexity
• All these factors except input data depends on the system configuration in which
we are executing the program.
• Calculating Time Complexity of an algorithm based on the system configuration is
a very difficult task because the configuration changes from one system to another .
• Actual execution time of an algorithm can be computed only by implementing and
executing the program in a computer.
• It may varies from system to systems.
• It does not seems to be an effective method for analyzing algorithms.
• Hence, requires a unified system model to analyze the algorithm.
Time Complexity
T(P)=C+TP(I)
Compile time (C)
independent of instance characteristics
run (execution) time TP
Definition: TP(n)=caADD(n)+csSUB(n)+clLDA(n)+cstSTA(n)
A program step is a syntactically or semantically meaningful
program segment whose execution time is independent of the
instance characteristics.
Example
– abc = a + b + b * c + (a + b - c) / (a + b) + 4.0
– abc = a + b + c Regard as the same unit
machine independent
Methods to compute the step count
Introduce variable count into programs
Tabular method
– Determine the total number of steps contributed by each statement
step per execution frequency
– add up the contribution of all statements
Iterative summing of a list of numbers
*Program 1.12: Program 1.10 with count statements (p.23)
float sum(float list[ ], int n)
{
float tempsum = 0; count++; /* for assignment */
int i;
for (i = 0; i < n; i++) {
count++; /*for the for loop */
tempsum += list[i]; count++; /* for assignment */
}
count++; /* last execution of for */
return tempsum;
count++; /* for return */
}
2n + 3 steps
Recursive summing of a list of numbers
*Program 1.14: Program 1.11 with count statements added (p.24)
float rsum(float list[ ], int n)
{
count++; /*for if conditional */
if (n) {
count++; /* for return and rsum invocation */
return rsum(list, n-1) + list[n-1];
}
count++;
return list[0];
}
2n+2
Matrix addition
*Program 1.15: Matrix addition (p.25)
void add( int a[ ] [MAX_SIZE], int b[ ] [MAX_SIZE],
int c [ ] [MAX_SIZE], int rows, int cols)
{
int i, j;
for (i = 0; i < rows; i++)
for (j= 0; j < cols; j++)
c[i][j] = a[i][j] +b[i][j];
}
*Program 1.16: Matrix addition with count statements (p.25)
void add(int a[ ][MAX_SIZE], int b[ ][MAX_SIZE],
int c[ ][MAX_SIZE], int row, int cols )
{
int i, j; 2rows * cols + 2 rows + 1
for (i = 0; i < rows; i++){
count++; /* for i for loop */
for (j = 0; j < cols; j++) {
count++; /* for j for loop */
c[i][j] = a[i][j] + b[i][j];
count++; /* for assignment statement */
}
count++; /* last time of j for loop */
}
count++; /* last time of i for loop */
}
Tabular Method
*Figure 1.2: Step count table for Program 1.10 (p.26)
Iterative function to sum a list of numbers
steps/execution
Statement s/e Frequency Total steps
float sum(float list[ ], int n) 0 0 0
{ 0 0 0
float tempsum = 0; 1 1 1
int i; 0 0 0
for(i=0; i <n; i++) 1 n+1 n+1
tempsum += list[i]; 1 n n
return tempsum; 1 1 1
} 0 0 0
Total 2n+3
Recursive Function to sum of a list of numbers
*Figure 1.3: Step count table for recursive summing function (p.27)
Statement s/e Frequency Total steps
float rsum(float list[ ], int n) 0 0 0
{ 0 0 0
if (n) 1 n+1 n+1
return rsum(list, n-1)+list[n-1]; 1 n n
return list[0]; 1 1 1
} 0 0 0
Total 2n+2
Matrix Addition
*Figure 1.4: Step count table for matrix addition (p.27)
Statement s/e Frequency Total steps
Void add (int a[ ][MAX_SIZE]‧ ‧ ‧ ) 0 0 0
{ 0 0 0
int i, j; 0 0 0
for (i = 0; i < row; i++) 1 rows+1 rows+1
for (j=0; j< cols; j++) 1 rows‧ (cols+1) rows‧ cols+rows
c[i][j] = a[i][j] + b[i][j]; 1 rows‧ cols rows‧ cols
} 0 0 0
Total 2rows‧ cols+2rows+1
Time complexity: Frequency counting method
Algorithm for computing the sum of ‘n’ numbers.
• The time complexityf(n)= c1 +
c2(n + 1) + c3n + c4 . Assume
that all the instructions are having
unit execution cost. Then
• f(n) = 1 + (n + 1) + n + 1 = 2n +
3
ASYMPTOTIC
NOTATIONS
ASYMPTOTIC NOTATIONS
• The most common method of analyzing an algorithm is to express the asymptotic
growth rate of algorithms with respect to its input data size.
• Asymptotic behavior describes a function (f (n)) with a defined limit (Another
function – g(n) ). The function may approach this limit, getting closer and closer as
the size of function’s input changes, but will never reach it.
• Asymptotic analysis of an algorithm, refers to defining the mathematical bound of
its run-time performance.
• The notations used to represent the asymptotic growth rate of algorithms are
called asymptotic notations.
ASYMPTOTIC NOTATIONS
• Asymptotic notations are the mathematical notations used to describe
the running time of an algorithm when the input tends towards a
particular value or a limiting value.
• For example: In bubble sort, when the input array is already sorted, the
time taken by the algorithm is linear i.e. the best case.
• But, when the input array is in reverse condition, the algorithm takes the
maximum time (quadratic) to sort the elements i.e. the worst case.
• When the input array is neither sorted nor in reverse order, then it takes
average time. These durations are denoted using asymptotic notations.
ASYMPTOTIC NOTATIONS
• Asymptotic notations are used to analyze an algorithm's running time by identifying its
behavior as the input size for the algorithm increases.
• It gives the rate at which the complexity of an algorithm increases as the input data size
increase.
• Using asymptotic analysis, we can very well conclude the best case, average case and worst
case scenario of an algorithm Behavior of this function is usually expressed in terms of
one or more standard functions.
• Expressing the complexity function with reference to other known functions is called
asymptote.
• Commonly used Asymptotic notations are : Big-oh (O) Notation, Big- omega(Ω) Notation
and Big-theta(Ø)Notation.
ASYMPTOTIC NOTATIONS
• Let f(n) be the actual time complexity of an algorithm (Eg:- f(n) = 3n2 + 2n + 4)
• Let g(n) be the asymptote, by which the growth rate of f(n) is to be expressed.
• Big-oh (O) : The upper limit (worst case ) growth rate
• f(n ) = O(g(n))
• Big-theta(Ø) : Growth in between two boundaries (average case)
• f(n ) = Ø (g(n))
• Big- omega(Ω) : Asymptotic lower bound (Best case )
• f(n) = Ω(g(n))
Big-oh (O) Notation
• The Big-oh notation determines the upper limit of f(n).
• Let g(n) be the said upper limit of the function f(n)
• Formal definition: f(n) = O(g(n)) if and only if, for any two positive constants c and n0, the inequality f(n)
<= c * g(n) holds for any input size n > n0.
• Example:
• f(n) = 2n + 10
• let g(n) = n, We can write f(n) = O(n) only when the 2n + 10 <= c * n hold.
• 2n+ 10 – c*n =0
• (2-c)* n = -10
• let n =1 , -c = -10-2 =-12 => c = 12
• This inequality hold for all the value n0>1 and c >= 12 hence 2n + 10 an be represented as O(n)
Big-oh (O) Notation
• The Big-oh notation can be represented graphically as :
Big-Oh Notation: Asymptotic Upper Bound
• T(n) = f(n) = O(g(n))
• if f(n) <= c*g(n) for all n >= n0, where c & n0 are constants > 0
c*g(n)
f(n)
n
n0
– Example: T(n) = 2n + 5 is O(n). Why?
– 2n+5 <= 3n, for all n >= 5
– Worst case complexity
Big- omega(Ω) Notation
• Big - Omega notation is used to define the lower bound of an algorithm in terms of Time
Complexity.
• Big-Omega notation always indicates the minimum time required by an algorithm for all
input values.
• That means Big-Omega notation describes the best case of an algorithm time complexity.
• Formal Definition: f(n) = Ω(g(n)) if and only if, for any two positive constants c and n0,
the inequality f(n) >= c * g(n) holds for any input size n > n0.
• Explanation : Consider function f(n) as time complexity of an algorithm and g(n) is the
most significant term. Then we can represent f(n) as Ω(g(n)) if and only if f(n) >= C *g(n)
holds for all he input data size n >= n0 where C >=1 and n0 >= 1 are any two
constants
Big- omega(Ω) Notation
• The Big-Omega notation can be represented graphically as
• Example:: Let the complexity f(n) = 3n + 2 and g(n) = n
• If we want to represent f(n) as Ω(g(n)) then it must satisfy f(n) >= C
g(n) for C>0 and n0>= 1
• f(n) >= C g(n)
• ⇒3n + 2 >= C n
• Above condition is always TRUE for all values of C = 1 and n >= 1.
• By using Big - Omega notation we can represent the time complexity
as:
• 3n + 2 = Ω(n)
W Notation: Asymptotic Lower Bound
• T(n) = f(n) = W(g(n))
• if f(n) >= c*g(n) for all n >= n0, where c and n0 are constants > 0
f(n)
c*g(n)
n
n0
– Example: T(n) = 2n + 5 is W(n). Why?
– 2n+5 >= n, for all n > 0
– Best case complexity
Big-theta(Ø)Notation.
• Big - Theta notation is used to define the average bound of an algorithm in
terms of Time Complexity.
• Big - Theta notation always indicates the average time required by an
algorithm for all input values.
• Big - Theta notation describes the average case of an algorithm time
complexity.
Big-theta(Ø)Notation.
• Definition : f(n) = Ø(g(n)) if and only if, for any three positive constants
c1,c2and n0, the inequality c1*g(n) <= f(n) <= c2 * g(n) holds for any input
size n > n0.
• Explanation : - Consider function f(n) as time complexity of an algorithm
and g(n) is the most significant term. Then we can represent f(n) as Ø(g(n))
if and only if C1 g(n) <= f(n) <= C2 g(n) hold for all input data size n >=
n0. where C1, C2 and n0 are positive constants.
Big-theta(Ø)Notation.
• The Big-Theta notation can be represented graphically as
• Example: Let the complexity f(n) = 3n + 2 and g(n) = n
• f(n) = 3n + 2 g(n) = n
• If we want to represent f(n) as Ø(g(n)) then it must satisfy C1 g(n) <= f(n) <=
C2 g(n) for all values of C1 > 0, C2 > 0 and n0>= 1
• C1 g(n) <= f(n) <= C2 g(n) ⇒C1 n <= 3n + 2 <= C2 n
• Above condition is always TRUE for all values of C1 = 1, C2 = 4 and n >= 2.
• By using Big - Theta notation we can represent the time complexity as follows...
• 3n + 2 = Ø(n)
Ø Notation: Asymptotic Tight Bound
• T(n) = f(n) = Ø(g(n))
• if c1*g(n) <= f(n) <= c2*g(n) for all n >= n0, where c1, c2 and n0 are constants > 0
c2*g(n)
f(n)
c1*g(n)
n
n0
– Example: T(n) = 2n + 5 is Ø(n). Why?
2n <= 2n+5 <= 3n, for all n >= 5
Average case complexity
More examples
1. Big ‘oh’: The function f(n)=O(g(n)) iff there exist positive
constants c and n0 such that
f(n) ≤ c*g(n) for all n , n ≥ n0.
Eg1:- 3n+2
f(n)=3n+2 ≤ 4n for all n≥2
≤ 4*n
≤ O(n)
More examples
1. Big ‘oh’: The function f(n)=O(g(n)) iff there exist positive
constants c and n0 such that f(n) ≤ c*g(n) for all n , n ≥ n0.
Eg2:- 10n2 +4n+2
f(n)= 10n2 +4n+2 ≤ 11n2 for all n≥5
= O(n2)
More examples
2. Omega: The function f(n)=Ω(g(n)) iff there exist positive
constants c and n0 such that
f(n) ≥ c*g(n) for all n, n ≥ n0.
Eg1:- 3n+2
f(n)=3n+2 ≥ 3n for all n≥1
≥ 3*n
≥ Ω(n)
More examples
2. Omega: The function f(n)=Ω(g(n)) iff there exist positive
constants c and n0 such that
f(n) ≥ c*g(n) for all n, n ≥ n0.
Eg2:- 10n2 +4n+2
f(n)= 10n2 +4n+2 ≥ n2 for all n≥1
= Ω(n2)
More examples
3. Theta: The function f(n)= Ø(g(n)) iff there exist positive constants c1,c2 and n0
such that
c1g(n) ≤ f(n) ≤ c2 g(n) for all n, n ≥ no.
Eg 1:- 3n+2
f(n)=3n+2
3n+2 ≤4n for all n≥2
3n+2 ≥3n for all n≥2
3n ≤ 3n+2 ≤ 4n
= Ø(n)
More examples
3. Theta: The function f(n)= Ø(g(n)) iff there exist positive constants c1,c2 and n0
such that
c1g(n) ≤ f(n) ≤ c2 g(n) for all n, n ≥ no.
Eg 2:- 10n2 +4n+2
f(n)= 10n2 +4n+2
10n2 +4n+2 ≥ n2 for all n≥5
10n2 +4n+2 ≤ 11n2 for all n≥5
n2≤ 10n2 +4n+2 ≤ 11n2
= Ø(n2)
Asymptotic Notations
► O-notation ---Less than
equal to (“≤”)
► Ω-notation --- Greater
than equal to(“≥”)
► Ø-notation ---Equal to
(“=“)
Asymptotic notations are used to formalize that an algorithm has
running time or storage requirements that are ``less than,'' ``always
greater than,'‘ or ``exactly'' some amount
Compute time complexity of linear search algorithm using frequency count method
Algorithm LinearSearch(A[1..n], n, key)
{
step 1 : loc= -1 1
step 2 : repeat step 3 for i = 1 to n n+1
step 3: if ( A[i]= key) n
step 4: loc = i 1
step 5: break
end if 1
end repeat
step6: if loc == -1 then 1
step7: print “Element not found”
step 8: else 1
step 9: print “ Elementfound at “ loc
step10: stop
} 2n +6
What is the purpose of calculating frequency count? Compute the frequency count of
the followingcode fragment.
for(i=0;i<n;i++)
for(j=0;j<n;j++)
printf(“%d”,a[i][j]);
for(i=0;i<n;i++)
for(j=0;j<n;j++)
printf(“%d”,a[i][j]);
Derive the Big O notation for f(n) = n2+2n+5.
f(n) = n2+2n+5. let g(n) = n2 , f(n) = Og(n) iff.
f(n) <= c*g(n)
n2+2n+5 <= c* n2 n2+2n+5 - c* n2 =0
(1-c) n2+2n+5=0
let n =1
(1-c)+2+5 =0
-c +8 =0 => c = 8.
n2+2n+5 = O(n2 ) , for all the values of c >=8
and n0 >=1
N2+ N = O (N3) Justify your answer..
N2+ N = O (N3) iff
N2+ N <= c*N3
N2+ N - c*N3 = 0
let N =1
1+1-c =0
c =2
Hence we can say that N2+ N = O (N3) for all c >=2and
N0 >= 1