Data Structures Material

BCAS Data Structures

SYLLABUS
DATA STRUCTURES
UNIT I
Introduction: Introduction of Algorithms, Analyzing Algorithms. Arrays: Sparse Matrices - Representation of Arrays. Stacks and Queues: Fundamentals - Evaluation of Expression - Infix to Postfix Conversion - Multiple Stacks and Queues.

UNIT II
Linked List: Singly Linked List - Linked Stacks and Queues - Polynomial Addition -
More on Linked Lists - Sparse Matrices - Doubly Linked List and Dynamic - Storage
Management - Garbage Collection and Compaction.

UNIT III
Trees: Basic Terminology - Binary Trees - Binary Tree Representations - Binary Tree Traversal - More on Binary Trees - Threaded Binary Trees - Binary Tree Representation of Trees - Counting Binary Trees. Graphs: Terminology and Representations - Traversals, Connected Components and Spanning Trees - Shortest Paths and Transitive Closure.

UNIT IV
External Sorting: Storage Devices -Sorting with Disks: K-Way Merging - Sorting with
Tapes. Symbol Tables: Static Tree Tables - Dynamic Tree Tables - Hash Tables: Hashing
Functions - Overflow Handling.

UNIT V
Internal Sorting: Insertion Sort - Quick Sort - 2 Way Merge Sort - Heap Sort - Shell Sort -
Sorting on Several Keys. Files: Files, Queries and Sequential organizations - Index Techniques -
File Organizations.


Material Prepared by: Mrs. D. Kalpana, Assistant Professor of CS
Text Book(s)
1. Ellis Horowitz, Sartaj Sahni, Data Structures, Galgotia Publication.
2. Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran, Computer Algorithms, Galgotia Publication.
3. S. Lovelyn Rose, R. Venkatesan, Data Structures, Wiley India Private Limited, 2015, 1st Edition.

Unit I - Introduction of Algorithms

• Algorithm
• Use of the Algorithms
• Need for algorithms
• Characteristics of an Algorithm
• Properties of Algorithm
• Types of Algorithms
• Design an Algorithm
• Two Factors of Algorithm Complexity
• Space Complexity
• Time Complexity
• Advantages
• Disadvantages

UNIT 1 - ALGORITHM
• An Algorithm is a set of finite rules or instructions to be followed in
calculations or other problem-solving operations.
• A procedure for solving a mathematical problem in a finite number of steps
that frequently involves recursive operations.
• Algorithm refers to a sequence of finite steps to solve a particular problem.
• Algorithms can be simple or complex depending on the user's needs.

Use of the Algorithms


• Computer Science: Algorithms form the basis of computer programming
and are used to solve problems ranging from simple sorting and searching to
complex tasks such as artificial intelligence and machine learning.
• Mathematics: Algorithms are used to solve mathematical problems, such as
finding the optimal solution to a linear equation or finding the shortest path
in a graph.
• Operations Research: Algorithms are used to optimize and make decisions
in fields such as transportation, logistics, and resource allocation.
• Artificial Intelligence: Algorithms are the foundation of artificial intelligence
and machine learning, and are used to develop intelligent systems that can
perform tasks such as image recognition, natural language processing, and
decision-making.
• Data Science: Algorithms are used to analyze, process, and extract insights
from large amounts of data in fields such as marketing, finance, and
healthcare.

Need for algorithms


• Algorithms are necessary for solving complex problems efficiently and
effectively.
• They help to automate processes and make them more reliable, faster, and
easier to perform.
• They are used in various fields such as mathematics, computer science,
engineering, finance, and many others to optimize processes, analyze data,
make predictions, and provide solutions to problems.

Characteristics of an Algorithm

• Clear and Unambiguous - The algorithm should be clear and simple.
• Well-Defined Inputs - An algorithm has well-defined inputs and takes zero
or more inputs.
• Well-Defined Outputs - The algorithm defines its output and produces at
least one output.
• Finiteness - An algorithm must terminate after a finite number of steps in
all test cases. Every instruction containing a fundamental operator must
complete within a finite amount of time; infinite loops or recursive
functions without base conditions do not possess finiteness.
• Feasible - The algorithm must be simple, generic, and practical; it can be
executed with the available resources.
• Language Independent - Its instructions can be implemented in any
language, and the output will be the same.
• Definiteness - All instructions in an algorithm must be unambiguous,
accurate, and easy to interpret.
• Effectiveness - An algorithm must be developed from basic, simple, and
feasible operations.
Properties of Algorithm
• It should terminate after a finite time.
• It should produce at least one output.
• It should take zero or more input.
• It should be deterministic, meaning it gives the same output for the same
input.
• Every step in the algorithm must be effective i.e. every step should do
some work.
Types of Algorithms
• Brute Force Algorithm - A brute force algorithm is the first, most direct
approach to a problem: try every possibility.

Example - cracking a 4-digit PIN lock, the traveling salesman problem,
NP (Non-deterministic Polynomial time) hard problems

• Recursive Algorithm - The problem is broken into several sub-parts, and
the same function is called again and again.
Example - Fibonacci sequence, factorial of numbers
• Backtracking Algorithm - The backtracking algorithm builds the solution by
searching among all possible solutions. Whenever a candidate solution fails,
it traces back to the failure point, builds the next candidate, and continues
this process until a solution, or all possible solutions, is found.
Example - Puzzles such as eight queens puzzle, crosswords, verbal
arithmetic, Sudoku, and Peg Solitaire
• Searching Algorithm - Searching algorithms are used for searching elements
or groups of elements. Different searching algorithms exist depending on how
the data is organized.
Example - linear and binary search
• Sorting Algorithm - The algorithms are used to sort groups of data in an
increasing or decreasing order.
Examples - bubble sort, insertion sort, quicksort, merge sort, and heap
sort
• Hashing Algorithm - Hashing algorithms are similar to the searching
algorithm. Hashing refers to the process of generating a fixed-size output
from an input of variable size using the mathematical formulas known as hash
functions. This technique determines an index or location for the storage of
an item in a data structure.
Examples – Message Digest for Security Purposes.
• Divide and Conquer Algorithm - This algorithm breaks a problem into
sub-problems, solves the sub-problems, and merges their solutions to get the
final solution.
Three steps
• Divide
• Solve
• Combine
Examples – Binary Search, Merge Sort, Quick Sort, Strassen's
Matrix multiplication and Karatsuba Algorithm

• Greedy Algorithm - The solution is built part by part, choosing at each
step the option that gives the most immediate benefit.
Examples - Kruskal’s algorithm and Prim's algorithm for finding
minimum spanning trees and the algorithm for finding optimum
Huffman trees.
• Dynamic Programming Algorithm - This algorithm reuses already-found
solutions to avoid repeated calculation of the same part of the problem. It
divides the problem into smaller overlapping subproblems and solves them.

Examples - Google maps to find paths and search engines
• Randomized Algorithm - A randomized algorithm uses a random number to
help decide the expected outcome. Randomized algorithms are used in critical
areas like cryptography, network algorithms, machine learning, and data
analysis.
Examples - Throwing Two Dice
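As a concrete illustration of two of the types above, here is a small Python sketch (not from the source material) of a recursive factorial and a binary search:

```python
def factorial(n):
    # Recursive algorithm: the function calls itself on a smaller
    # sub-problem; the base condition guarantees finiteness.
    if n <= 1:
        return 1
    return n * factorial(n - 1)

def binary_search(items, target):
    # Searching algorithm on a sorted list: each comparison halves
    # the remaining search range.
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # target not present
```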
Design an Algorithm
• The problem to be solved by the algorithm must be stated clearly.
• The constraints of the problem must be considered while solving it.
• The input to be taken to solve the problem must be identified.
• The output expected when the problem is solved must be identified.
• The solution must lie within the given constraints.

Example - To add two numbers


Step 1: Start
Step 2: Declare three variables a, b, and sum.
Step 3: Enter the values of a and b.
Step 4: Add the values of a and b and store the result in the sum variable,
i.e., sum=a+b.
Step 5: Print the sum
Step 6: Stop
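The steps above can be translated directly into a short Python sketch (the function name is illustrative):

```python
def add_two_numbers(a, b):
    # Step 4: add the values of a and b and store the result
    total = a + b
    return total

# Step 3 and Step 5: supply the values and print the sum
print(add_two_numbers(2, 3))  # prints 5
```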
Two Factors of Algorithm Complexity
• Time Complexity - Time is measured by counting the number of key
operations such as comparisons in the sorting algorithm.
• Space Complexity - Space is measured by counting the maximum
memory space required by the algorithm to run and execute.

Space Complexity - The space complexity of an algorithm refers to the amount of
memory required by the algorithm to store the variables and get the result. This
can be for inputs, temporary operations, or outputs.

Calculating Space Complexity

The space complexity S(P) of any algorithm P is

S(P) = C + Sp(I)

where C is the fixed part and Sp(I) is the variable part of the algorithm, which
depends on the instance characteristic I.

The space complexity is calculated from two components:

• Fixed Part: This refers to the space that is always required by the
algorithm, independent of the input.
Example - input variables, output variables, program size
• Variable Part: This refers to the space that can differ based on the
implementation of the algorithm.
Example - temporary variables, dynamic memory allocation,
recursion stack space, etc.
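As a small illustrative sketch of these two components, summing a list uses only a constant amount of auxiliary space regardless of the input size:

```python
def sum_list(values):
    # Fixed part: the code itself plus the accumulator and loop
    # variables -- a constant amount of space.
    # Variable part: nothing here grows with n, so the auxiliary
    # space is O(1); the input list itself occupies O(n).
    total = 0
    for v in values:
        total += v
    return total
```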

Time Complexity - The time complexity of an algorithm refers to the amount of
time required by the algorithm to execute and get the result. This can be for
normal operations, conditional if-else statements, loop statements, etc.

The time complexity T(P) of any algorithm P is

T(P) = C + Tp(I)

where C is the constant (fixed) time part and Tp(I) is the variable part of the
algorithm, which depends on the instance characteristic I.

Calculating Time Complexity

The time complexity of an algorithm is calculated from two components:
• Fixed time part: Any instruction that is executed just once comes in this
part.
Example - input, output, if-else, switch, arithmetic operations, etc.
• Variable Time Part: Any instruction that is executed more than once, say
n times, comes in this part.
Example - loops, recursion, etc.
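For instance, in a linear search (an illustrative sketch), the loop is the variable time part and the single return is the fixed part:

```python
def linear_search(items, target):
    # Variable time part: the loop body may execute up to n times,
    # so T(P) = C + c1*n, i.e. O(n) in the worst case.
    for i, value in enumerate(items):
        if value == target:
            return i
    # Fixed time part: this return executes at most once.
    return -1
```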
Advantages
• It is easy to understand.
• An algorithm is a step-wise representation of a solution to a given
problem.
• In an Algorithm the problem is broken down into smaller pieces or steps
hence, it is easier for the programmer to convert it into an actual
program.
Disadvantages
• Writing an algorithm takes a long time, so it is time-consuming.
• Understanding complex logic through algorithms can be very difficult.
• Branching and Looping statements are difficult to show in Algorithms.
================================

Analysis of Asymptotic Notation or Complexity Analysis of Algorithms

• Three asymptotic notations
• Big-O Notation (O-notation) - worst-case
• Omega Notation (Ω-notation) - best-case
• Theta Notation (Θ-notation) - average-case

• Asymptotic analysis measures the efficiency of algorithms in a way that does
not depend on machine-specific constants and does not require the algorithms to
be implemented; it allows the time taken by programs to be compared.
• Asymptotic notations are mathematical tools used to represent the time (or
space) complexity of algorithms for asymptotic analysis, and to understand how
their efficiency changes as the input size grows.
• These notations express the behavior of an algorithm's time or space
complexity as the input size approaches infinity.
• Rather than comparing algorithms directly, asymptotic analysis focuses on the
relative growth rates of the algorithms' complexities as the input size varies.

Three asymptotic notations for time or space complexities

• Big-O Notation (O-notation) - worst-case
• Omega Notation (Ω-notation) - best-case
• Theta Notation (Θ-notation) - average-case

Big O (<=) - worst case - Upper Bound
• The rate of growth of an algorithm is less than or equal to the value.
• The upper bound of the algorithm is represented by Big O notation; only the
function from above is bounded by Big O. The asymptotic upper bound is given by
Big O notation.
• It is defined as the upper bound, and the upper bound on an algorithm is the
most amount of time required, so it is the worst-case performance.
• Mathematically: 0 <= f(n) <= C*g(n) for all n >= n0.

Big Omega (Ω) (>=) - best case - Lower Bound
• The rate of growth is greater than or equal to the value.
• The algorithm's lower bound is represented by Omega notation; the asymptotic
lower bound is given by Omega notation.
• It is defined as the lower bound, and the lower bound on an algorithm is the
least amount of time required, so it is the best case.
• Mathematically: 0 <= C*g(n) <= f(n) for all n >= n0.

Big Theta (Θ) (==) - average case - Tight Bound
• The rate of growth is equal to the value.
• The bounding of the function from above and below is represented by Theta
notation; the exact asymptotic behavior is given by Theta notation.
• It is defined as the tightest bound, so it is the average-case time that the
algorithm can take.
• Mathematically: 0 <= C2*g(n) <= f(n) <= C1*g(n) for all n >= n0.
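As a worked instance of the Big O definition, take f(n) = 3n + 2 with g(n) = n: choosing C = 4 and n0 = 2 satisfies 0 <= f(n) <= C*g(n) for all n >= n0, so f(n) is O(n). A quick sketch checking this numerically:

```python
def f(n):
    return 3 * n + 2

# Big O definition: f is O(g) if 0 <= f(n) <= C*g(n) for all n >= n0.
C, n0 = 4, 2
holds = all(0 <= f(n) <= C * n for n in range(n0, 10_000))
print(holds)  # prints True
```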

===============

Array

• Arrays in Data Structures
• Need for Arrays in Data Structures
• Types of Arrays
• Advantages of Arrays
• Disadvantages of Arrays

• An array is a linear data structure that collects elements of the same data type
and stores them in contiguous memory locations.
• Arrays work on an index system starting from 0 to (n - 1), where n is the size of
the array.

Arrays in Data Structures

Need for Arrays in Data Structures

A class consists of ten students, and the class has to publish their results. If the
user had declared all ten variables individually, it would be challenging to manipulate
and maintain the data.

If more students were to join, it would become more difficult to declare all the
variables and keep track of them. Arrays came into the picture to overcome this problem.

Types of Arrays

• One-Dimensional Array - One-dimensional arrays, also known as single arrays,
have only one dimension, like a single row or a single column.
• Two-Dimensional Array - A two-dimensional array is a collection of elements
arranged in rows and columns. It can be visualized as a table with rows and
columns forming cells.
• Multi-Dimensional Array - A multi-dimensional array is an array with more
than one level or dimension.

Advantages of Arrays

• Arrays store multiple elements of the same type with the same name.
• You can randomly access elements in the array using an index number.
• Array memory is predefined, so there is no extra memory loss.
• Arrays avoid memory overflow.
• 2D arrays can efficiently represent the tabular data.

Disadvantages of Arrays

• The number of elements in an array should be predefined


• An array is static. It cannot alter its size after declaration.
• Insertion and deletion operation in an array is quite tricky as the array stores
elements in continuous form.
• Allocating excess memory than required may lead to memory wastage.

=======================

Sparse Matrix

• Use of Sparse Matrix


• Two representations
• Arrays Representation
• Linked List Representation
• Advantages
• Disadvantages

• A matrix is a two-dimensional data object made of m rows and n columns,
therefore having m x n total values.
• If most of the elements of the matrix have a 0 value, then it is called a sparse
matrix.
Use of Sparse Matrix
• Storage: There are fewer non-zero elements than zeros and thus less
memory can be used to store only those elements.
• Computing time: Computing time can be saved by logically designing a
data structure traversing only non-zero elements.

Example
0 0 3 0 4
0 0 5 7 0
0 0 0 0 0
0 2 6 0 0

• Representing a sparse matrix by a 2D array leads to the wastage of lots of
memory, as the zeroes in the matrix are of no use.
• Instead of storing the zeroes along with the non-zero elements, the user can
store only the non-zero elements.
• This means storing each non-zero element as a triple: (Row, Column, Value).
Two representations

• Array Representation or Triplet Representation


• Linked List Representation

Arrays Representation

The two-dimensional array is used to represent a sparse matrix with three rows:
• Row - Index of the row where the non-zero element is located
• Column - Index of the column where the non-zero element is located
• Value - Value of the non-zero element located at that row and column index

• Time Complexity - O(NM), where N is the number of rows in the sparse
matrix and M is the number of columns in the sparse matrix.
• Auxiliary Space - O(NM), where N is the number of rows in the sparse
matrix and M is the number of columns in the sparse matrix.
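The triplet representation can be sketched in Python (an illustrative sketch using the example matrix from this section):

```python
def to_triplet(matrix):
    # Array (triplet) representation: store (row, column, value)
    # for every non-zero element of the matrix.
    triples = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                triples.append((r, c, value))
    return triples

sparse = [
    [0, 0, 3, 0, 4],
    [0, 0, 5, 7, 0],
    [0, 0, 0, 0, 0],
    [0, 2, 6, 0, 0],
]
print(to_triplet(sparse))
# prints [(0, 2, 3), (0, 4, 4), (1, 2, 5), (1, 3, 7), (3, 1, 2), (3, 2, 6)]
```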
Linked List Representation
Each node in the linked list has four fields. These four fields are
• Row: Index of the row, where the non-zero element is located
• Column: Index of the column, where the non-zero element is located
• Value: Value of the non-zero element located at the index in row and
column.
• Next node: Address of the next node

• Time Complexity - O(N*M), where N is the number of rows in the sparse


matrix, and M is the number of columns in the sparse matrix.
• Auxiliary Space - O(K), where K is the number of non-zero elements in the
array.
Advantages
• Memory Management
• Using sparse matrices to store data that contains a large number of zero-
valued elements can save a significant amount of memory and speed up
the processing of that data.
• Store only the nonzero elements of the matrix, together with their
indices.
• Reduce computation time by eliminating operations on zero elements.
• Computational Efficiency
• operations with sparse matrices do not perform unnecessary low-level
arithmetic, such as zero-adds.

• Store only the nonzero elements of the matrix, together with their indices.
• Reduce computation time by eliminating operations on zero elements.
• The efficiencies in execution time for programs working with large amounts
of sparse data.
Disadvantages
• Not everything can be made into a sparse matrix for representation.
• Holds less information.
=================

Stack and Queue

• Stack and Queue


• Two similarities of the stack and queue
• Stacks
• Operations on Stack
• Two ways to implement the stack
• Uses of Stack
• Applications of Stack
• Advantages
• Disadvantages
• Queue
• Operations on queue
• Uses of queue
• Applications of Queue
• Advantages
• Disadvantages

Data structures are fundamental concepts for organizing and storing data
efficiently.

Two Structures used in programming and algorithm design

• Stack
• Queue

They form the backbone of many complex systems and applications.

Two similarities of the stack and queue

• Linear data structure - Both the stack and queue are the linear data structure,
which means that the elements are stored sequentially and accessed in a single
run.

• Flexible in size - Both the stack and queue are flexible in size, which means
they can grow and shrink according to the requirements at the run-time.

Stacks
• A stack is a linear data structure that follows the Last In First Out (LIFO)
principle.
• This means that the last element added to the stack is the first one to be
removed.
Example - a pile of plates can only add or remove the top plate

Operations on Stack
• Push - Adds an element to the top of the stack.
• Pop - Removes and returns the top element of the stack.
• Peek or Top - Returns the top element of the stack without removing it.
• IsEmpty - Checks if the stack is empty.
• Size - Returns the number of elements in the stack.
In a stack, the top is a pointer used to keep track of the last inserted element.
To implement the stack, the user should know the size of the stack, since memory
must be allocated for it.

Two ways to implement the stack


o Static: The static implementation of the stack can be done with the help of
arrays.
o Dynamic: The dynamic implementation of the stack can be done with the help
of a linked list.
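A minimal sketch of the static (array-backed) approach, with a fixed capacity to model the overflow condition (the class and method names are illustrative):

```python
class Stack:
    # Static implementation: a fixed capacity backs the stack.
    def __init__(self, capacity):
        self.items = []
        self.capacity = capacity

    def push(self, value):
        # Adding beyond the capacity is the overflow condition.
        if len(self.items) == self.capacity:
            raise OverflowError("stack overflow")
        self.items.append(value)

    def pop(self):
        # Removing from an empty stack is the underflow condition.
        if not self.items:
            raise IndexError("stack underflow")
        return self.items.pop()

    def peek(self):
        # Return the top element without removing it.
        return self.items[-1]

    def is_empty(self):
        return len(self.items) == 0

    def size(self):
        return len(self.items)
```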

Uses of Stack
• Function Call Management: The call stack in programming languages
keeps track of function calls and returns.
• Expression Evaluation: Used in parsing expressions and evaluating
postfix or prefix notations.
• Backtracking: Helps in algorithms that require exploring all possibilities,
such as maze solving and depth-first search.

Applications of Stacks
• Function calls - Stacks are used to keep track of the return addresses of
function calls, allowing the program to return to the correct location after
a function has finished executing.
• Recursion - Stacks are used to store the local variables and return
addresses of recursive function calls, allowing the program to keep track
of the current state of the recursion.
• Expression evaluation - Stacks are used to evaluate expressions in
postfix notation (Reverse Polish Notation).
• Syntax parsing - Stacks are used to check the validity of syntax in
programming languages and other formal languages.
• Memory management - Stacks are used to allocate and manage memory
in some operating systems and programming languages.
Advantages of Stacks
• Simplicity - Stacks are a simple and easy-to-understand data structure,
making them suitable for a wide range of applications.
• Efficiency - Push and pop operations on a stack can be performed in
constant time (O(1)), providing efficient access to data.
• Last-in, First-out (LIFO) - Stacks follow the LIFO principle, ensuring that
the last element added to the stack is the first one removed. This behavior
is useful in many scenarios, such as function calls and expression
evaluation.
• Limited memory usage - Stacks only need to store the elements that have
been pushed onto them, making them memory-efficient compared to other
data structures.
Disadvantages of Stacks
• Limited access - Elements in a stack can only be accessed from the top,
making it difficult to retrieve or modify elements in the middle of the
stack.
• Overflow - If more elements are pushed onto a stack than it can hold, an
overflow error will occur, resulting in a loss of data.
• Not suitable for random access - Stacks do not allow for random access
to elements, making them unsuitable for applications where elements
need to be accessed in a specific order.
• Limited capacity - Stacks have a fixed capacity, which can be a limitation
if the number of elements that need to be stored is unknown or highly
variable.

Queue

• A Queue is a linear data structure.
• It is an ordered list that follows the FIFO (First In, First Out) principle.
• A Queue is a structure with restrictions on insertion and deletion.
• In a Queue, insertion is performed at one end, known as the rear end. Deletion
is performed at the other end, known as the front end.

• In Queue, the technical words for insertion and deletion
are enqueue() and dequeue().

Operations on Queue
• Enqueue - Adds an element to the end (rear) of the queue.
• Dequeue - Removes and returns the front element of the queue.
• Front or Peek - Returns the front element of the queue without removing
it.
• IsEmpty - Checks if the queue is empty.
• Size - Returns the number of elements in the queue.
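The operations above can be sketched in Python using collections.deque, which provides O(1) insertion at the rear and removal from the front (the class name is illustrative):

```python
from collections import deque

class Queue:
    # FIFO queue sketch backed by a double-ended queue.
    def __init__(self):
        self.items = deque()

    def enqueue(self, value):
        # Insert at the rear end.
        self.items.append(value)

    def dequeue(self):
        # Remove from the front end.
        if not self.items:
            raise IndexError("queue underflow")
        return self.items.popleft()

    def front(self):
        # View the front element without removing it.
        return self.items[0]

    def is_empty(self):
        return len(self.items) == 0

    def size(self):
        return len(self.items)
```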

Uses of Queue
• Task Scheduling - Operating systems use queues to manage tasks and
processes.
• Breadth-First Search (BFS) - In graph traversal algorithms, queues help
in exploring nodes level by level.
• Buffering - Used in situations where data is transferred asynchronously,
such as IO buffers and print spooling.
Applications of Queue
• Multiprogramming - Multiprogramming means multiple programs are running
in main memory. It is essential to organize these multiple programs, and they
are organized as queues.
• Network - In a network, a queue is used in devices such as a router or a
switch. Another application of a queue is a mail queue, which is a directory
that stores data and control files for mail messages.
• Job Scheduling - The computer has a task to execute a particular number
of jobs that are scheduled to be executed one after another. These jobs are
assigned to the processor one by one which is organized using a queue.
• Shared resources - Queues are used as waiting lists for a single shared
resource.
Advantages of Queue
• A large amount of data can be managed efficiently with ease.
• Operations such as insertion and deletion can be performed with ease as
they follow the first in first out rule.
• Queues are useful when a particular service is used by multiple
consumers.

• Queues are fast for inter-process data communication.
• Queues can be used in the implementation of other data structures.

Disadvantages of Queue
• The operations such as insertion and deletion of elements from the
middle are time-consuming.
• In a classical queue, a new element can only be inserted when the
existing elements are deleted from the queue.
• Searching an element takes O(N) time.
• Maximum size of a queue must be defined prior in case of array
implementation.

Difference between Stack and Queue

Stack
• Principle - A linear data structure that follows the Last In First Out (LIFO)
principle.
• Operations - Push (add an item), Pop (remove an item), Peek (view the top item).
• Access - Elements are added and removed from the same end (the top).
• Applications - Function call management (call stack), expression evaluation and
syntax parsing, undo mechanisms in text editors.
• Examples - Browser history (back button), reversing a word.
• Analogy - A stack of plates: you add and remove plates from the top.
• Time complexity - Push: O(1), Pop: O(1), Peek: O(1).
• Memory - Typically uses a contiguous block of memory or a linked list.
• Implementation - Can be implemented using arrays or linked lists.

Queue
• Principle - A linear data structure that follows the First In First Out (FIFO)
principle.
• Operations - Enqueue (add an item), Dequeue (remove an item), Front (view the
first item), Rear (view the last item).
• Access - Elements are added at the rear and removed from the front.
• Applications - Scheduling processes in operating systems, managing requests in
a printer queue, breadth-first search in graphs.
• Examples - Customer service lines, CPU task scheduling.
• Analogy - A queue at a ticket counter: the first person in line is the first to
be served.
• Time complexity - Enqueue: O(1), Dequeue: O(1), Front: O(1), Rear: O(1).
• Memory - Typically uses a circular buffer or a linked list.
• Implementation - Can be implemented using arrays, linked lists, or circular
buffers.

==================

Evaluation of expressions

Evaluation of expressions
• Three different types of arithmetic expression
• Infix Notation
• Prefix Notation
• Postfix Notation
• Parsing Expressions
• Precedence
• Associativity

• An expression is any word or group of words or symbols that generates a value
on evaluation. Parsing an expression means analyzing the expression for its words
or symbols according to a particular criterion.
• Expression parsing is a term used in programming languages for evaluating
arithmetic and logical expressions.
• The way an arithmetic expression is written is known as a notation.

Three different types of arithmetic expression

• Infix Notation
• Prefix (Polish) Notation
• Postfix (Reverse-Polish) Notation

These notations differ in where the operator is written relative to its operands.

Infix Notation

• An expression in infix notation looks like a - b + c, where operators are used
in between operands.
• It is easy for humans to read, write, and speak in infix notation, but the same
does not go well with computing devices.
• An algorithm to process infix notation could be difficult and costly in terms of
time and space consumption.

Prefix Notation

• In this notation, the operator is prefixed to the operands, i.e., the operator is
written ahead of the operands.

Example - +ab. This is equivalent to its infix notation a + b.

• Prefix notation is also known as Polish Notation.

Postfix Notation

• This notation style is known as Reversed Polish Notation.


• In this notation style, the operator is postfixed to the operands, i.e., the
operator is written after the operands.

Example - ab+ - This is equivalent to its infix notation a + b.

Difference in three notations

Infix Notation Prefix Notation Postfix Notation


a+b +ab ab+
(a + b) ∗ c ∗+abc ab+c∗
a ∗ (b + c) ∗a+bc abc+∗
a/b+c/d +/ab/cd ab/cd/+
(a + b) ∗ (c + d) ∗+ab+cd ab+cd+∗
((a + b) ∗ c) - d -∗+abcd ab+c∗d-
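Postfix notation is what makes stack-based evaluation simple: scanning left to right, operands are pushed, and each operator pops its two operands and pushes the result. An illustrative sketch (assuming nonnegative numeric tokens):

```python
def eval_postfix(tokens):
    # Evaluate a postfix expression given as a list of tokens:
    # push operands; on an operator, pop two operands, apply, push.
    stack = []
    for tok in tokens:
        if tok == "+":
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif tok == "-":
            b, a = stack.pop(), stack.pop(); stack.append(a - b)
        elif tok == "*":
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif tok == "/":
            b, a = stack.pop(), stack.pop(); stack.append(a / b)
        else:
            stack.append(float(tok))
    return stack.pop()

print(eval_postfix(["2", "3", "+", "4", "*"]))  # (2 + 3) * 4, prints 20.0
```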

Parsing Expressions

It is not very efficient to design an algorithm or program that parses infix
notation directly. Instead, infix notation is first converted into either postfix or
prefix notation and then computed.

To parse any arithmetic expression the user needs operator precedence and
associativity.

Precedence

When an operand is in between two different operators, which operator takes the
operand first is decided by the precedence of one operator over the others.

Example - In a + b * c, as multiplication has precedence over addition, b * c will
be evaluated first.

Associativity

Associativity describes the rule where operators with the same precedence
appear in an expression.

Example - In expression a + b − c, both + and − have the same precedence, then
which part of the expression will be evaluated first, is determined by associativity of
those operators. Here, both + and − are left-associative, so the expression will be
evaluated as (a + b) − c.

Precedence and associativity determine the order of evaluation of an expression.

Following is an operator precedence and associativity table from highest to lowest.

Operator Precedence Associativity


Exponentiation ^ Highest Right Associative
Multiplication ( ∗ ) & Division ( / ) Second Highest Left Associative
Addition ( + ) & Subtraction ( − ) Lowest Left Associative

Example − In a + b*c, the expression part b*c will be evaluated first, as
multiplication has precedence over addition. To force a + b to be evaluated first,
the user can use parentheses: (a + b)*c.

==========================

Infix to Postfix Conversion

Operands - variables: a, b, x, y, p, q, k
Operators - symbols: +, -, *, %

Infix - operators are placed between the two operands. Example: A + B
Postfix - operators are placed after the operands. Example: AB+
Prefix - operators are placed before the operands. Example: +AB
Rules for infix to Postfix
1. Priority of operators
• Exponent has the highest priority
• Multiplication and division are the next priority
• Addition and subtraction have the lowest priority
2. No two operators that have the same priority can stay together in the stack
column.
Example - "+-" === two operators of the same priority cannot stay together
3. An operator of lower priority cannot be placed on top of one of higher priority.
Example - "*+" === the higher-priority operator must be popped before the
lower-priority operator is pushed
4. If an operator is closed within parentheses, pop it.
Example – "(*)" === pop or remove the multiplication within the
parentheses

Conversion of Infix to Postfix

A + B / C * (D + E) – F

Symbol | Stack (only operators or symbols) | Postfix result (only variables and popped operators) | Remarks
( (
A ( A
+ (+ A
B (+ AB
/ (+/ AB
C (+/ ABC
At the top of the stack, there is division.
* (+* ABC/ Multiplication and Division have the same
priority so pop the division – Rule 2
( (+*( ABC/
D (+*( ABC/D
+ (+*(+ ABC/D
E (+*(+ ABC/DE
) (+*(+) ABC/DE+ If the operator is closed within the
parenthesis, then need to pop the addition
operator – Rule 4
- (- ABC/DE+*+ Multiplication is the highest priority and
Minus is the lowest priority so the lowest
priority cannot be placed before the
highest priority. Then pop * Rule – 3.
Then + and – have the same priority so
pop the + - Rule 2
F (-) ABC/DE+*+F If the operator is closed within the
parenthesis, then need to pop the Minus
operator – Rule 4
ABC/DE+*+F-

Conversion of Infix to Prefix

A*B–C/D+E
Expression Remarks
(A*B)-(C/D)+E Place parentheses around each expression based on
priority. Multiplication and division have the
highest priority.
((*AB) – (/CD))+E Apply the prefix form inside the parentheses: the
operator must be placed before the operands.
+((*AB) – (/CD))E Apply the prefix form inside the parentheses: the
operator must be placed before the operands.
+– (*AB) (/CD)E Apply the prefix form inside the parentheses: the
operator must be placed before the operands.
+– *AB/CDE Remove the parentheses

(A+B) / C * D - E
Expression Remarks
(+AB) / C * D - E Consider the parentheses in the expression.
((+AB) / C) * D - E Apply the prefix form inside the parentheses: the
operator must be placed before the operands.
((/ +ABC) * D ) - E Apply the prefix form inside the parentheses: the
operator must be placed before the operands.
(* / +ABCD ) - E Apply the prefix form inside the parentheses: the
operator must be placed before the operands.
- * / +ABCDE Remove the parentheses
- * / +ABCDE Remove the parenthesis

Conversion of Postfix to Infix

AB – DE +F */
Reading of Postfix Stack Top Remarks
A A
B B
- (A – B) Consider the Operators as the
Expression 1 (A – B) as the top
of the stack
D D
E E
+ D+E Consider the Operators as the
Expression 2
F F
* (D + E) * F Consider the Operators as the
Expression 3
/ (A – B) / ((D + E) * F) Consider the Operators as the
Expression 4

Conversion of Prefix to Infix
+-*AB / CDE
Step 1 – First reverse the prefix expression – EDC / BA * - +

Step 2 – Reading of prefix and consider as stack top

Reading of Prefix | Stack Top | Remarks
E E
D D
C C
/ C/D Consider the Operators as the
Expression 1
B B
A A
* A * B Consider the Operators as the
Expression 2
- (A * B) – (C / D) Consider the Operators as the
Expression 3
+ ((A * B) – (C / D)) + E Consider the Operators as the
Expression 4

Conversion of Postfix to Prefix

ABC / - AK /L -*
Reading of Stack Top Remarks
Postfix
A
B
C
/ /BC Consider the Operators as the
Expression 1
- -A/BC Consider the Operators as the
Expression 2
A
K
/ -A/BC /AK Consider the Operators as the
Expression 3
L -A/BC /AK L
- -A/BC -/AKL Consider the Operators as the
Expression 4
* * - A/BC -/AKL Consider the Operators as the
Expression 5

Unit: II LINKED LIST 8 hours
Linked List: Singly Linked List – Linked Stack and Queue - Polynomial Addition –
More on Linked List – Sparse Matrix – Doubly Linked List - Dynamic Storage
Management – Garbage Collection and Compaction

 Linked List
 Polynomial Addition
 Doubly Linked List
 Dynamic Storage Management
 Garbage Collection and Compaction

Linked List

Linked List
 Two Fields
 Uses of Linked List
 Types of Linked List
 Singly Linked Lists
 Structure of Singly Linked List
 Creation and Traversal of Singly Linked List
 Doubly Linked Lists
 Structure of Doubly Linked List
 Advantage
 Creation and Traversal of Doubly Linked List
 Circular Linked Lists
 Structure of Circular Linked List
 Advantage
 Creation and Traversal of Circular Linked List.
 Two Types
 Applications
 Two ways to create a circular linked list
 Two ways to traverse a circular linked list
 Circular Doubly Linked List
 Structure of Doubly Circular Linked List
 Creation and Traversal of Doubly Circular Linked List
Operations in Linked List
 Insertion Operation Linked List
 Three Ways
 Deletion Operation Linked List
 Three Ways
 Reverse Operation Linked List

Unit II Linked List Page 1


 A linked list is a linear data structure which can store a collection of nodes
connected together through links like pointers.
 Linked list nodes are not stored at contiguous locations; they are linked using
pointers to different memory locations.
 A node consists of the data value and a pointer to the address of the next node
within the linked list.
 A linked list is a dynamic linear data structure whose memory size can be
allocated or de-allocated at run time based on the insertion or deletion
operation, using memory efficiently.
 Linked lists can be used to implement various data structures like a stack, queue,
graph, hash maps, etc.
 A linked list stores a sequence of elements.

o A linked list starts with a head node which points to the first node.
o Every node consists of data which holds the actual data value associated with the
node and a next pointer which holds the memory address of the next node in the
linked list.
o The last node is called the tail node in the list which points to null indicating the
end of the list.

Two Fields

o Data field - stores the data value at that particular node.

o Pointer (address) field - stores the address of the next node in the
memory; the last node of the list points to null.

Uses of Linked List


o The list is not required to be contiguously present in the memory. The nodes can
reside anywhere in the memory and are linked together to make a list. This
optimizes the utilization of space.
o List size is limited only by the memory size and doesn't need to be declared in advance.
o An empty node cannot be present in the linked list.



o The user can store values of primitive types or objects in the singly linked list.

Linked Lists
 Linked lists are dynamic in size; any number of nodes can be added to a linked
list dynamically.
 Linked lists can store various nodes of different data types.
 Data Structure: Non-contiguous
 Memory Allocation: Allocated one by one to individual elements
 Insertion/Deletion: Efficient
 Access: Sequential

Arrays
 The size is given at the time of creation, so arrays are of fixed length.
 An array can accommodate only similar types of data.
 Data Structure: Contiguous
 Memory Allocation: Allocated to the whole array
 Insertion/Deletion: Inefficient
 Access: Random

Advantages
 Dynamic data structure - A linked list can grow and shrink at runtime by
allocating and deallocating memory.
 No memory wastage - In a linked list, efficient memory utilization can be
achieved since the size of the linked list increases or decreases at run time, so
there is no memory wastage and no need to pre-allocate the memory.
 Implementation - Linear data structures like stacks and queues are easily
implemented using a linked list.
 Insertion and Deletion Operations - Insertion and deletion operations are easier.
There is no need to shift elements after the insertion or deletion of an element;
only the address present in the next pointer needs to be updated.
 Flexible - The elements in a linked list are not stored in contiguous memory
locations.
 Efficient for large data - Suitable when working with large datasets, since the
list can grow and shrink dynamically.
 Scalability - Elements can be added or removed at any position.
Disadvantages
 Memory usage - More memory is required, since each node stores a pointer
holding the address of the next element.
 Traversal - Traversal is more time-consuming compared to an array.
 Reverse Traversing - Not possible in a singly linked list; in a doubly linked list it
is possible using a pointer to the previously connected node in each node, at the
cost of extra memory.
 Random Access - Random access is not possible due to dynamic memory
allocation.
 Lower efficiency at times - Searching for an element can be slower in a linked list.
 Complex implementation - More complex than an array to implement for
complex programming tasks.



 Difficult to share data - It is not possible to directly access the memory address
of an element in a linked list.
 Not suited for small datasets - Compared to an array, a linked list is not well
suited for small datasets.

Types of Linked List

Singly Linked Lists

Singly linked lists contain two buckets in one node. They are

 One bucket holds the data


 Other bucket holds the address of the next node of the list.

Traversals can be done in one direction only as there is only a single link between
two nodes of the same list.

It is a unidirectional linked list and can traverse it in one direction from head to tail
node.

Structure of Singly Linked List


 One application is to store a list of items that need to be processed in order.



Example - To store the list of tasks that need to complete with the head node that
represents the first task to be completed and the tail node represents the last task to be
completed.
 Singly linked lists are also used in algorithms that need to process a list of items
in reverse order.
Example – Sorting algorithms. Quick sort can use a singly linked list to store the
list of items that need to be sorted; by processing the list in reverse order, quick
sort can sort the list more efficiently.

Creation and Traversal of Singly Linked List


 Each element in the list is called a node and each node refers to the next
node in the list. The first node in the list is called the head and the last node
in the list is called the tail.
 To create a singly linked list, the user first needs to create a node class. Each
node has two data members
 An integer value
 A reference to the next node in the list.
 Next, the user creates a Linked List class. This class has two data members
 A head node - The head node store the first element in the list
 A tail node - The tail node will store the last element in the list.
 Add an element to the list - The user needs to create a new node and set the
next reference of the previous tail node to point to the new node. Then, the
user can set the new node as the new tail of the list.
 Remove an element - To find the node that contains the value that the user
wants to remove. The user can traverse the list until find the node with the
matching value.
 Find the node – The user needs to set the next reference of the previous node
to point to the next node.
 Search for an element in the list – The user needs to traverse the list until the
node with the matching value is found.
 Traverse the list – The user can start at the head node and follow the next
references until reach the tail node.

Doubly Linked Lists

Doubly Linked Lists contain three buckets in one node. They are

 One bucket holds the data


 The other buckets hold the addresses of the previous
 The next nodes in the list.



The list is traversed twice as the nodes in the list are connected to each other from
both sides.

A doubly linked list is a bi-directional linked list. The user can traverse it in both
directions. Unlike singly linked lists, its nodes contain one extra pointer called the
previous pointer. This pointer points to the previous node.

Structure of Doubly Linked List

A doubly linked list is a data structure that consists of a set of sequentially linked
nodes, each of which is doubly linked. It is used to store data in a way that allows for
fast insertion and deletion of elements.

Each doubly linked list is made up of three parts

 A head - Contains a pointer to the first element in the list
 Data – The values stored in the nodes
 A tail - Contains a pointer to the last element
Advantages

 The data structures allows for quick insertion and deletion of elements.
 It is easy to implement and can be used in a variety of applications.

Creation and Traversal of Doubly Linked List

A doubly linked list is a type of data structure that allows for the storage of data in
a linear like a singly linked list. A doubly linked list allows for both forward and
backward traversal of the data stored within it. This makes it an ideal data structure for
applications that require the ability to move both forward and backward through a list of
data.



To create a doubly linked list, the user first needs to create a Node class used to
store data. This Node class has three attributes
 Data - The data attribute used to store the data in the list
 Next - The next attribute used to store a reference to the next node in the
list
 Prev - The prev attribute used to store a reference to the previous node in
the list
Once the user has the Node class then can create doubly linked list. This class has
two attributes
 Head - The head attribute used to store a reference to the first node in the
list.
 Tail. - The tail attribute used to store a reference to the last node in the list.
After the list class is created, the user can add data to it. To add data to the list,
create a new node and set its data attribute to the value to be added. Then, the user
needs to set the next attribute of the new node to point to the head node of the list.
Finally, the user sets the head node of the list to point to the new node.

To display the data in the list, the user can write a function to traverse it. To do
this, the user needs to create a variable that will keep track of the current node being
traversed, set initially to the head node of the list. Then, the user needs to create a
while loop that will continue until the current node is None, which indicates that the
end of the list has been reached.

Circular Linked Lists

Circular linked lists can exist in both singly linked list and doubly linked list. Since
the last node and the first node of the circular linked list are connected, the traversal in
this linked list will go on forever until it is broken.

A circular linked list is a unidirectional linked list. The user can traverse it in only
one direction. But this type of linked list has its last node pointing to the head node. So
while traversing, need to be careful and stop traversing when you revisit the head node.



Structure of Circular Linked List

A circular linked list is to store data in a linear, sequential fashion. A circular


linked list has no beginning or end. It is essentially a ring of nodes. This makes circular
linked lists ideal for applications where data needs to be processed in a continuous loop,
such as in real-time applications or simulations.

Circular linked lists are implemented using a singly linked list data structure. This
means that each node in the list is connected to the next node via a pointer. The last node
in the list is then connected back to the first node, creating the ring-like structure.

Accessing data in a circular linked list is similar to accessing data in an ordinary
linked list. However, there is no defined beginning or end to the list, so it can be
difficult to know where to start traversing. As a result, many implementations of
circular linked lists use a head pointer that points to the first node in the list.

Advantages
 There is no defined beginning or end to the list, data can be added and removed
from the list at any time. This makes circular linked lists ideal for applications
where data needs to be constantly added or removed, such as in a real-time
application.
 The data is stored in a ring structure. It can be accessed in a continuous loop. This
makes circular linked lists ideal for applications where data needs to be processed
in a continuous loop such as in a real-time application or simulation.
 Since there is no defined beginning or end to the list, circular linked lists are
more efficient when it comes to memory usage: no separate pointers to the
beginning and end of the list are needed.
 Circular linked lists require only a single pointer, the head pointer, to be stored
in memory.
 Circular linked lists are easier to implement without the use of additional data
structures, such as stacks and queues, to keep track of the list's beginning and end.
Creation and Traversal of Circular Linked List
A circular linked list is a type of data structure that can store a collection of items.
It is related to both the singly linked list and the doubly linked list. Unlike a singly linked
list, which has a NULL pointer at the end of the list, a circular linked list has a pointer
that points back to the first node in the list. This makes it possible to traverse the entire
list without having to keep track of the end of the list.
Two types of circular linked lists
 Singly linked - Each node has a pointer that points to the next node in the list. The
last node in the list points back to the first node.
 Doubly linked - Each node has pointers that point to both the next node and the
previous node.



Applications
 They can be used to implement queues, stacks, or deques.
 They can also be used for applications that require circular buffers or circular
arrays.
Two ways to create a circular linked list
 Create a singly linked list and make the last node point to the first node.
 Create a doubly linked list and make the last node point to the first node and the
first node point to the last node.
Two ways to traverse a circular linked list:
 Traverse the list until reach the head node again.
 Keep track of the nodes visited.

Doubly Circular Linked List (DCL)


A circular doubly linked list is a mixture of a doubly linked list and a circular
linked list. Like the doubly linked list, it has an extra pointer called the previous pointer,
and similar to the circular linked list, its last node points at the head node. This type of
linked list is the bi-directional list and can traverse it in both directions.

Structure of Doubly Circular Linked List


A doubly circular linked list is a variation of the standard doubly linked list. In a
DCL, the first and last nodes are linked together, forming a circle. This allows for quick
and easy traversal of the entire list without the need for special case handling at the
beginning or end of the list.
 DCLs are often used in applications where data needs to be processed in a
sequential fashion, such as in a video or audio player.
 The circular nature of the list allows for quick and easy movement from one
node to the next without the need for special case handling.
 DCLs are also sometimes used in applications where data needs to be
accessed randomly, such as in a database.
 The list allows for quick and easy movement to any node in the list without
the need for special case handling.



Creation and Traversal of Doubly Circular Linked List
 The user first needs to create a node structure that will hold the data and the
pointers, then create a head pointer and initialize it to null.
 Next, to create a function to insert new nodes into the list. This function
should take as input the data that we want to insert and the head pointer. It
should then create a new node with this data and insert it into the list.
 Finally to write a function to traverse the list and print out the data
contained in each node. This function should take as input the head pointer.
It should then loop through each node in the list, printing out the data
contained in that node.

Operations in Linked List

 Insertion − Adds an element in the list.


 Deletion − Deletes an element from the list.
 Display − Displays the complete list.
 Search − Searches an element using the given key.

Insertion Operation Linked List

Adding a new node to a linked list is a more-than-one-step activity. Create a node
with the same structure and find the location where it is to be inserted.

NewNode.next -> RightNode;



The next node at the left should point to the new node.

LeftNode.next -> NewNode;

The new node is now in the middle of the two.

Three ways of Insertion in linked list

The insertion into a singly linked list can be performed at different positions.
Based on the position of the new node being inserted, the insertion is categorized into the
following categories.

SN Operation Description
1 Insertion at beginning - It involves inserting an element at the front of the list.
The user needs to make the new node the head of the list, adding an element at
the beginning of the list.
2 Insertion at end of the list - It involves insertion at the last of the linked list. The
new node can be inserted as the only node in the list or inserted as the last one,
adding an element at the ending of the list.
3 Insertion after specified node - It involves insertion after the specified node of
the linked list. The user traverses the desired number of nodes to reach the node
after which the new node will be inserted, adding an element at any position
within the list.

Deletion Operation Linked List

Deletion is a more-than-one-step process. First, locate the target node to be
removed, by using a searching algorithm.



The left or previous node of the target node now should point to the next node of
the target node.

LeftNode.next -> TargetNode.next

This removes the link that points to the target node. Next, remove what the
target node is pointing at:

TargetNode.next -> NULL;

The user then decides what to do with the deleted node: it can be kept in
memory for reuse, or its memory can be deallocated to wipe off the target node
completely.

The node is inserted at the beginning of the list. While inserting it at the end, the
second last node of the list should point to the new node and the new node will point to
NULL.



Three ways of Deletion in linked lists

Operation Description
Deletion at beginning - It involves deletion of a node from the beginning of the
list, i.e., deleting an element from the beginning of the list. For this, the head
moves to the second node.
Deletion at the end of the list - It involves deleting the last node of the list. The
list can either be empty or full. It deletes an element from the ending of the list.
Deletion after specified node - It involves deleting the node after the specified
node in the list, i.e., deleting an element at any position of the list.
Traversing - In traversing, visit each node of the list at least once in order to
perform some specific operation on it.
Example - printing the data part of each node present in the list.
Searching - In searching, the user matches each element of the list with the
given element. If the element is found at any location, then the location of that
element is returned; otherwise null is returned.

Reversal Operation Linked List

The user needs to make the last node be pointed to by the head node, reversing
the whole linked list. This operation is a thorough one.

First, traverse to the end of the list. The last node should be pointing to NULL;
the user shall make it point to its previous node instead.

The user has to make sure that the last node is not lost. The head node keeps
pointing to the last node while making all left-side nodes point to their previous
nodes one by one.



All nodes should point to their predecessor, making it their new successor,
except the first node (the one originally pointed to by the head node), which will
point to NULL.

Finally, make the head node point to the new first node by using the temp node.

==========

Linked list of stack


Linked list of stack
Stack Operations
Push operation
POP operation
Peek Operation
Applications of Stacks
Advantages of Stacks
Disadvantages of Stacks
Display Operation

A linked list allocates the memory dynamically, and the push, pop, and peek
operations all run in constant time.

In linked list of stack the nodes are maintained non-contiguously in the memory.
Each node contains a pointer to its immediate successor node in the stack. Stack is said to
be overflown if there is no space in the memory to create a node.



The bottom-most node of the stack contains null in its address field. Each
operation is then performed on the linked list representation of the stack.

Stack Operations
Stack Operation | Description | Time Complexity | Space Complexity
Push() | Insert a new element at the beginning of the list. | O(1) | O(1)
Pop() | Return the top element of the stack and delete the first element from the list. | O(1) | O(1)
isEmpty() | Returns true if the stack is empty, false otherwise. | O(1) | O(1)
Peep() or Peek() | Return the top element without removing it. | O(1) | O(1)
Display() or Traverse() | Print all elements in the stack. | O(n) | O(1)

 Push operation -Adding a node to the stack


Adding a node to the stack is the push operation. Pushing an element onto a
linked stack involves the following steps
 Create a node first and allocate memory to it.
 Make the new node's next pointer point to the current top, and make the
new node the top of the stack.



 POP operation - Deleting a node from the stack

Deleting a node from the top of the stack is the pop operation. The steps to pop
an element from a linked stack are
 Check for the underflow condition - The underflow condition occurs to
pop from an already empty stack. The stack is empty if the head pointer of
the list points to null.
 Adjust the head pointer - In stack, the elements are popped only from one
end then the value stored in the head pointer must be deleted and the node
must be freed. The next node of the head node becomes the head node.
 Peek Operation
 Check if there is node present or not, if not then return.
 Otherwise return the value of top node of the linked list
 Display Operation - Traversing the nodes
Displaying all the nodes of a stack needs traversing all the nodes of the
linked list organized in the form of stack are
 Copy the head pointer into a temporary pointer.
 Move the temporary pointer through all the nodes of the list and print the
value field attached to every node.
Applications of Stacks
 Function calls - Stacks are used to keep track of the return addresses of function
calls for execution.
 Recursion - Stacks are used to store the local variables and return addresses of
recursive function calls.
 Expression evaluation - Stacks are used to evaluate expressions in postfix
notation are called as Reverse Polish Notation.
 Syntax parsing - Stacks are used to check the validity of syntax in programming
languages.
 Memory management - Stacks are used to allocate and manage memory in
operating systems and programming languages.
Advantages of Stacks
 Simplicity - Stacks are a simple and easy-to-understand structure, suitable for a
wide range of applications.
 Efficiency - Push and pop operations on a stack can be performed in constant
time (O(1)) and providing efficient access to data.
 Last-in First-out (LIFO) - The last element added to the stack is the first one
removed for function calls and expression evaluation.
 Limited memory usage - Stacks only need to store the elements pushed onto
them, which makes them memory-efficient.
Disadvantages of Stacks
 Limited access - Elements in a stack can only be accessed from the top and
difficult to retrieve or modify elements in the middle of the stack.



 Potential for overflow - If more elements are pushed onto a stack than it can
hold, an overflow error results in a loss of data.
 Not suitable for random access - Stacks do not allow for random access to
elements.
 Limited capacity - Stacks have a fixed capacity limitation.
===========
Linked List of Queue
Linked List of Queue
Linked List
Queue
Linked Queue
Two parts of each node of the queue
Operation on Linked Queue
 Front pointer
 Rear pointer
Insert Operation
Delete Operation
Application of Queue
Advantages of Queue
Disadvantages of Queue

The array implementation cannot be used for large applications where queues
are needed. One alternative to the array implementation is the linked list
implementation of a queue.

Linked List
A linked list is a node-based linear data structure used to store data elements. Each
node in a linked list is made up of two key parts
 data - stores the data value
 next - stores the address of the next node in the linked list.

Queue

 A queue is a linear data structure with both ends open for operations, following
First In First Out (FIFO) order.
 The first element enqueued into a queue is the first element to be dequeued.
 The last element enqueued is the last element out of the queue.



Linked Queue

The storage requirement of the linked representation of a queue with n elements is
O(n), while the time requirement for each operation is O(1).

Two parts of each node of the queue


 Data part - The values stored in the node
 Link part - Link to the next node.

Each element of the queue points to its immediate next element in the memory.

Operation on Linked Queue


In the linked list implementation of a queue, two pointers are maintained in memory

 Front pointer
 The front points to the first item of the queue.
 It holds the address of the starting element of the queue.
 Deletion is performed at the front, removing the first element of the
queue.
 deQueue() - This operation removes the front node and moves the front
to the next node.



 Rear pointer
 The rear points to the last item.
 It holds the address of the last element of the queue. Insertion is
performed at the rear, by adding an element to the end of the queue.
 The new element becomes the last element of the queue.
 enQueue() - This operation adds a new node after the rear and moves the
rear to the next node.

Both front and rear being NULL indicates that the queue is empty.

Insert Operation
The insert operation is used to insert an element into a queue. The new element is
added as the last element of the queue.
Example

 To insert an element with value 9, first check if FRONT=NULL. If the condition
holds, then the queue is empty: allocate memory for a new node, store the value
in its data part and NULL in its next part. The new node is then both FRONT
and REAR.
 If FRONT != NULL, then insert the new node at the rear end of the linked
queue and make this new node the new REAR.

Delete Operation

The delete operation is used to delete the element that was first inserted in the
queue, i.e., the element whose address is stored in FRONT. Before deleting the value,
first check if FRONT=NULL, because if the queue is empty no more deletions can be
done. Trying to delete a value from an empty queue is an underflow.



 To delete an element first check if FRONT=NULL. If the condition is false,
then delete the first node pointed by FRONT.
 The FRONT will now point to the second element of the linked queue.

Application of Queue
 Operating Systems - Queues are used by CPU scheduling algorithms within the operating system to decide the order in which jobs execute.
 Data Traffic Management - Devices like routers and switches use queues to manage data buffers for data packet transmission.
 Web Servers - Web servers use a queue data structure to manage and optimize incoming requests. All requests to the server are added to the queue and processed in FIFO order.
 Printers - Printing multiple documents uses queues to manage the workload and process each printing task in the queue.
 Banking Systems - Banking transactions like RTGS are managed using queues and processed one at a time.
 Call Center Systems - The queue data structure manages customer requests to support staff and works on a FIFO basis.
 Television Broadcast - Television stations manage the broadcast queue, ensuring that programs or advertisements are played in the intended sequence.

Advantages of Queue
 A large amount of data can be managed efficiently.
 Operations such as insertion and deletion are performed in FIFO order.
 Queues are useful when a particular service is used by multiple operations.
 Queues are fast for inter-process data communication.

Disadvantages of Queue
 The operations such as insertion and deletion of elements from the middle are
time consuming.
 A new element can only be inserted when the existing elements are deleted from
the queue.
 Searching an element takes O(N) time.
 The maximum size of a queue must be defined beforehand in an array implementation.
==============



Polynomial Addition

Polynomial Addition
 Applications
 Problem Statement
 Pseudocode
 Time and Space Complexities

Polynomial addition is a classic use of linked lists in data structures. Algorithms on polynomials make it possible to address real-world situations. Polynomial addition combines two or more polynomials, sequenced in an organized way: the usual method is to record the coefficient of each term and its related exponent in arrays or linked lists.

Applications
 Scientific Computing
 Computer Graphics
 Signal Processing
 Cryptography.

Problem Statement
A linked list is being used to represent two polynomial expressions. Create a
function to combine these lists and print the result on the screen.

Example
Input:
1st number = 5x^2 + 4x^1 + 2x^0
2nd number = -5x^1 - 5x^0
Output = 5x^2 - 1x^1 - 3x^0



Pseudocode
 Declare variables that point to the heads of both linked lists.
 Compare the powers of the current terms of both lists.
 If they are the same, add their coefficients and push the result onto the resultant list. Also, increment both variables so that they point to the next terms.
 Else, push the term of the list whose power is greater than the other, and increment that particular list.
 Keep repeating the comparison step until one of the variables reaches the end of its list. Then check for the remaining terms in both lists and add them to the resultant list.
Time and Space Complexities
 The time complexity is O(n) where n is the highest degree of the polynomials. This
linear time complexity results because each term in both polynomials is visited
once during addition.
 The space complexity is O(n), where n is the maximum degree of the polynomials: the result list stores one term for each power appearing in the longer polynomial.
============

Sparse Matrix using Linked List


A sparse matrix can be stored as a linked structure that holds only its non-zero elements.
Linked List
The representation is a collection of linked lists, one per row, and each linked list is a collection of nodes. Each node contains a column number and a non-zero element.

The node structure contains 3 things: a column number, a non-zero element, and a pointer to the next node (null if there is none). We can define the structure of this node in C language as,

The linked-list representation of a sparse matrix is shown below.
Example - a 5×6 matrix (5 rows and 6 columns)



The six non-zero elements to be represented are 5, 3, 4, 2, 8, and 7. The row and column numbers giving the position of each non-zero element are:
Row Column Value
0 2 5
1 4 3
2 2 4
3 1 2
3 3 8
4 0 7

Here the 0th row has a node that contains column number 2, the non-zero element 5, and a null next pointer ('/'). All the non-zero elements are added to the linked list for their row number in the array of row heads.

The above diagram represents all the non-zero elements of the given matrix in the
array through the linked list.
=================
Memory Allocation

Memory Allocation
 Two types of memory allocations
 Dynamic Storage Allocation
 Use of Dynamic Memory Allocation
 Advantages
 Disadvantages

Memory allocation is a process by which computer programs and services are assigned physical or virtual memory space. Memory allocation is done either before or at the time of program execution.



Two types of memory allocations

Static Memory Allocation (Compile Time) vs. Dynamic Memory Allocation (Run Time)

 In static memory allocation, variables are allocated permanently, until the program or the function call finishes. In dynamic memory allocation, the memory is controlled by the programmer: it is allocated whenever malloc() is executed and deallocated whenever free() is executed.
 Static memory allocation is done before program execution; dynamic memory allocation is done during program execution.
 Static allocation uses the stack for managing memory; dynamic allocation uses the heap.
 Static allocation is less efficient; dynamic allocation is more efficient.
 In static memory allocation there is no memory re-usability; in dynamic memory allocation there is memory re-usability, and memory can be freed when not required.
 In static memory allocation, once the memory is allocated, its size cannot change; in dynamic memory allocation, the size of allocated memory can be changed.
 The static scheme cannot reuse the unused memory; the dynamic scheme allows reusing memory: the user can allocate more memory or release it when no longer needed.
 Execution with static allocation is faster than with dynamic memory allocation.
 Static memory is allocated at compile time; dynamic memory is allocated at run time.
 Statically allocated memory remains from the start to the end of the program; dynamically allocated memory can be released at any time during the program.
 Example - Array (static); Linked List (dynamic).
Dynamic Storage Allocation
Dynamic storage allocation allocates an object or array from the free store at
runtime. This dynamic storage remains allocated until explicitly deallocated or until the
program ends.
Dynamic memory allocation is a memory management technique that involves
reserving memory for variables at runtime. This means that memory is allocated and
deallocated as required during the program execution. Dynamic memory allocation is
commonly used for creating data structures such as linked lists, trees, and dynamic
arrays.
The dynamic memory allocation process involves using functions such as
 malloc() - Allocates memory in bytes and returns a pointer to the allocated
memory.



 calloc() - Allocates memory and initializes it to zero.
 realloc() - To change the size of an already allocated memory block.
 free() - Deallocates the memory previously allocated by malloc() or calloc().
Use of Dynamic Memory Allocation
 Dynamic memory allocation is used when the size of the data structure is not known in advance and needs to change during program execution.
 It is also useful for situations where memory needs to be allocated and
deallocated frequently.
 Dynamic memory allocation should be used when flexibility and efficiency
are important, and when memory usage needs to be optimized.
Advantages
 Flexible Memory Usage - Dynamic memory allocation allows the size of the data
structure to be changed dynamically during program execution
 Efficient Memory Usage - Dynamic memory allocation allows memory to be
allocated only when it is needed and results in less wastage of memory.
 Global Access - Dynamic memory can be accessed globally and can be shared between different functions.
Disadvantages
 Slower Access - The memory address is not known at compile time. The memory
address must be looked up during program execution.
 Memory Leaks - Dynamic memory allocation can result in memory leaks if memory is not deallocated properly. This can cause the program to crash or slow down.
 Fragmentation - Dynamic memory allocation can result in memory fragmentation
if the memory is not allocated and deallocated properly. Memory fragmentation
occurs when there are small unused gaps between allocated memory blocks. These
gaps can prevent larger memory blocks from being allocated.
========
Garbage collection

Garbage collection
Three Approaches
 Mark and Sweep Algorithm
 Advantages
 Disadvantages
 Reference Counting Algorithm
 Advantages
 Disadvantages
 Copy Collecting Algorithm
 Advantages
 Disadvantages

Garbage collection is a dynamic approach to automatic memory management and
heap allocation that processes and identifies dead memory blocks and reallocates storage
for reuse. The primary purpose of garbage collection is to reduce memory leaks.
Three approaches
 Mark-and-sweep – When memory runs out the GC locates all accessible memory and
then regains available memory.
 Reference counting – When an object's reference count reaches zero, the object is garbage and is destroyed. The freed memory returns to the memory heap.
 Copy collection – There are two memory partitions. When the first partition is full, the GC locates all accessible data structures and copies them into the second partition, compacting them and leaving continuous free memory.

Mark and Sweep Algorithm


The mark-and-sweep algorithm is called a tracing garbage collector because it
traces out the entire collection of objects that are directly or indirectly accessible by the
program.
When memory runs out, the garbage collection process locates all accessible memory and then reclaims the available memory.

Example - objects in memory at the time the garbage collector takes over



The roots can be static or frame-allocated variables. These objects are
represented as follows in memory as consecutive bytes.

1 6 e 3
2 0 i 4
3 0 d 5
4 0 g 0
5 0 a 0
6 8 b 7
7 10 k 0
8 0 c 0
9 0 j 10
10 0 f 0
11 0 h 12
12 11 l 0

Here, 0 denotes a null pointer while the other pointer values are memory addresses.
The free list is 0 (empty) and the roots are 1, 6, and 9. After the mark-and-sweep
garbage collection the memory layout becomes

1 6 e 3
2 0
3 0 d 5
4 2
5 0 a 0
6 8 b 7
7 10 k 0
8 0 c 0
9 0 j 10
10 0 f 0
11 4
12 11



Advantages
 It handles the case with cyclic references that never ends up in an infinite
loop.
 There are no additional overheads incurred during the execution of the
algorithm.
Disadvantages
 The program execution is suspended while the garbage collection algorithm
runs.
 After the Mark and Sweep Algorithm has run several times on a program, reachable objects end up separated by many small unused memory regions.

 White blocks denote the free memory and the grey blocks denote the memory taken by all the reachable objects.
 The free segments (white) are of varying sizes; here the 5 free segments have sizes 1, 1, 2, 3, and 5 units.
 To create an object that takes 10 units of memory, the memory can be allocated only as one contiguous block. The creation of the object is therefore not possible, even though 12 units of memory are available in total, and it will cause an Out of Memory error.
This problem is termed Fragmentation. Fragmentation refers to an unwanted
problem in which a process is unloaded and loaded from memory and the free memory
space gets fragmented.
Fragmentation can be reduced by compaction, which reduces the loss of memory.
Example - A contiguous block of free memory of size 12 units and allocate memory
to an object of size 10 units.

Reference Counting
Reference counting is based on the idea that an object is alive as long as there is at
least one reference to it. Whenever a new reference to an object is created, the reference
count of that object is incremented. Whenever a reference to an object is removed, the
reference count of that object is decremented. When the reference count of an object
reaches zero, it means that the object is no longer accessible and can be freed from
memory. Reference counting can be implemented in various ways, such as using smart
pointers, weak references, or explicit reference count manipulation.
Advantages of Reference Counting



 Reference counting reclaims memory from unreachable objects immediately, without waiting for a garbage collection cycle. This can reduce memory consumption, avoid memory fragmentation and improve responsiveness.
 It does not depend on external factors such as heap size, allocation patterns,
or execution time. This can make it easier to reason about the memory and
to debug memory issues.
 It is compatible with most Object-Oriented Design principles like
encapsulation, inheritance and polymorphism.
Disadvantages of Reference Counting
 Reference counting incurs a significant overhead of time and space.
 Every time a reference is created or destroyed, the reference count of the
corresponding object has to be updated which can include atomic
operations. This can slow down the execution and increase the memory.
 It can fail to collect cyclic references, which are references that form a
loop among objects.
Example - if object A points to object B and object B points to object A, the reference counts will never reach zero. To solve this problem, reference counting needs to be combined with mark-and-sweep or weak references.
Copy Collector
A copying collector starts from a set of roots in the operand stack and traverses all of the reachable memory-allocated objects, copying them from one half of memory into the other half. The area of memory copied from is called old space and the area of memory copied to is called new space. When copied, the reachable data is compacted into a contiguous chunk.
The holes left behind are the memory that the garbage data occupied. After the copy and compaction we end up with a compacted copy of the data in new space and a large, contiguous area of free memory in new space in which we can quickly and easily allocate new objects. The next time garbage collection runs, the roles of old space and new space are reversed.

Example - the memory layout looks as follows: the boxes represent different objects and the bar in the middle represents the half-way point in memory.

Obj 1 Obj 2 Obj 3 Obj 4 Obj 5

We have filled up half of memory and initiate a collection. Old space is on the left and new space is on the right. Suppose further that only objects 2 and 4 are reachable from the stack. After copying and compacting we have:



Obj 1 Obj 2 Obj 3 Obj 4 Obj 5 Obj 2' Obj 4'

The unreachable data remains in the first half, and we "throw away" the first half of memory:
Obj 2 Obj 4

After copying the data into new space we restart the computation where it left off. The computation continues allocating objects, but this time allocates them in the other half of memory, which is now the new space.
Obj 2 Obj 4 Obj 6 Obj 7 Obj 8
Eventually the new space fills up and we are ready to do another collection: the reachable objects are copied into the other half of memory and compacted, throwing away the old data
Obj 4 Obj 6 Obj 8

=====================
Difference between Garbage collection and Compaction

Garbage collection - Marks unreachable memory blocks (garbage) as free. The garbage collector releases garbage memory without any need for the programmer to actively free it. This lifts some cognitive burden from the programmer and makes it easier to write correct code.

Compaction - Moves reachable memory blocks close together, so that there are no free memory blocks between them. It is usually implemented in combination with garbage collection. Compaction eliminates fragmentation and reduces wasted memory.

Example - Two free fragments totaling 900 MB exist, but allocating a 900 MB memory block will fail, since there is no single consecutive free block of 900 MB. Compaction will move object 2 to the left, producing one contiguous free block.



Difference between Fragmentation and Compaction

 Fragmentation is due to the creation of holes (small chunks of free memory space). Compaction manages those holes.
 Fragmentation creates a loss of memory. Compaction reduces the loss of memory.
 With fragmentation, sometimes a process cannot be accommodated. With compaction, memory is optimized to accommodate the process.
 External fragmentation may cause major problems. Compaction solves the issue of external fragmentation.
 Fragmentation depends on the amount of memory storage and the average process size. Compaction depends on the number of blocks occupied and the number of holes left.
 Fragmentation occurs in both contiguous and non-contiguous allocation. Compaction works only if relocation is dynamic.

============



Unit III - Trees

Basic Terminology - Binary Trees - Binary Tree Representations - Binary Tree Traversal - More on Binary Trees - Threaded Binary Trees - Binary Tree Representation of Trees - Counting Binary Trees. Graphs: Terminology and Representations - Traversals, Connected Components and Spanning Trees, Shortest Paths and Transitive Closure

Binary Trees
Binary Tree Traversal
Threaded Binary Tree
Binary Tree Representations
Counting Binary Trees
Graphs
Graph Traversals
Connected Components
Spanning Trees
Shortest Paths
Transitive Closure

Binary Trees

Binary Trees
 Components of a Binary Tree
 Benefits of two child nodes
 Binary Trees, Arrays and Linked Lists
 Five Types of Binary Trees
 Full Binary Tree
 Properties of Full Binary Tree
 Complete Binary Tree
 Properties of Complete Binary Tree
 Perfect Binary Tree
 Balanced Binary Tree
 Degenerate Binary Tree
 Binary Tree Implementation
 Application of Binary Trees
 Advantages of Binary Tree
 Disadvantages of Binary Tree

Unit III Binary Tree and Graph 1


A binary tree in data structures is used to represent or store hierarchical data. A binary tree is a tree where every node has at most two child nodes, which form the tree branches. These child nodes are called the left and right child nodes. A binary tree is made of the root node, a left sub tree and a right sub tree.

Examples

Components of a Binary Tree


 Node - A node consists of data and links to its both child nodes
 Root - A root node is the first node of the tree
 Leaf - Nodes that don't have any children
 Parent node - Any node apart from the root node, any node with at least one
child is parent to that child node
 Child node - Any node with a parent node is called a child node
 Internal node - Any node with a child and a parent
 Height - The height of a binary tree is the longest path from the root to any leaf
 Depth - A depth of a node is the total number of edges from the root node to the
target node
Benefits of two child nodes
 Algorithms like traversal, searching, insertion and deletion are easier to implement, and run faster.
 Keeping data sorted in a Binary Search Tree (BST) makes searching very
efficient.
 Balancing trees is easier to do with a limited number of child nodes, using an
AVL Binary Tree.
 Binary Trees can be represented as arrays, making the tree more memory
efficient.

Example



 A parent node or internal node is a node with one or two child nodes.
 The left child node is the child node to the left; the right child node is the child node to the right.
 The tree height is the maximum number of edges from the root node to a leaf node.

Example - for a binary tree with nodes R, A, B, C, D, E, F, G:
Root node - R
A's left child - C
A's right child - D
B's subtree - B, E, F, G
Child nodes - A, C, D, B, E, F, G
Parent / internal nodes - R, A, B, F

Binary Trees, Array and Linked Lists

Binary Trees vs. Arrays vs. Linked Lists

Binary Trees - Binary Search Trees and AVL Trees are great compared to arrays and linked lists because they are BOTH fast at accessing a node AND fast when it comes to deleting or inserting a node, with no shifts in memory needed.

Arrays - fast at accessing an element directly, like element number 700 in an array of 1000 elements, but inserting and deleting elements requires other elements to shift in memory to make place for the new element or to take the deleted element's place, and that is time consuming.

Linked Lists - fast when inserting or deleting nodes, with no memory shifting needed, but to access an element inside the list, the list must be traversed, and that takes time.

A tree can branch out in different directions (non-linear), so there are different ways of traversing trees. Arrays and linked lists are linear data structures: there is only one way to traverse them, starting at the first element or node and visiting the next until all have been visited.

Five Types of Binary Trees



 Full Binary Tree – full, proper, or strict binary tree

A full binary tree is a tree structure in which each node has either 0 or 2 child nodes; no node has exactly one child. It is also known as a strict binary tree, and can equivalently be defined as the tree in which each node must contain 2 children except the leaf nodes.

Properties of Full Binary Tree

 The number of leaf nodes is equal to the number of internal nodes plus 1. In the above example, the number of internal nodes is 5 and the number of leaf nodes is 6.
 The maximum number of nodes is the same as in any binary tree, i.e., 2^(h+1) - 1.
 The minimum number of nodes in the full binary tree is 2*h - 1, where h is the number of levels.
 The minimum height of the full binary tree is log2(n+1) - 1.
 The maximum height of the full binary tree can be computed from the minimum node count:
 n = 2*h - 1
 n + 1 = 2*h
 h = (n + 1)/2
 Complete Binary Tree

A complete binary tree is another specific binary tree where each node on all levels except the last has two children, and at the lowest level all leaves reside as far left as possible. A complete binary tree has all levels full of nodes, except the last level, which can also be full, or filled from left to right. These properties mean a complete binary tree is also balanced.



The complete binary tree is a tree in which all the nodes are completely
filled except the last level. In the last level, all the nodes must be as left as
possible. In a complete binary tree, the nodes should be added from the left.

Properties of Complete Binary Tree

 The maximum number of nodes in a complete binary tree is 2^(h+1) - 1.
 The minimum number of nodes in a complete binary tree is 2^h.
 The minimum height of a complete binary tree is log2(n+1) - 1.
 The maximum height of a complete binary tree is log2(n), since a tree of height h has at least 2^h nodes.
 Perfect Binary Tree

A binary tree is said to be perfect if every internal node has exactly two children and every leaf is present on the same level. This means that all levels are full of nodes. The properties of a perfect binary tree mean it is also full, balanced, and complete.

 Balanced Binary Tree

The balanced binary tree is a tree in which, for every node, the heights of the left and right sub trees differ by at most 1. For example, AVL and Red-Black trees are balanced binary trees.



A balanced Binary Tree has at most 1 in difference between its left and
right sub tree heights, for each node in the tree.

Balance factor = height (left sub tree) – height (right sub tree)

A binary tree is balanced if, for each node, its balance factor is either -1, 0 or 1; the height of the left sub tree and that of the right sub tree can vary by at most one.

 Degenerate Binary Tree

A binary tree is referred to as a degenerate binary tree only if every internal node has exactly one child. Such a tree effectively behaves like a linked list.

Binary Tree Implementation

A binary tree can be implemented with linked nodes, much as in a singly linked list: each node holds its data and is linked to both its left and right child nodes.



Application of Binary Trees
 Search algorithms
 Sorting algorithms
 Database systems
 File systems
 Compression algorithms
 Decision trees
 Game AI
 DOM in HTML.
 File explorer.
 Used as the basic data structure in Microsoft Excel and spreadsheets.
 Evaluate an expression
 Routing Algorithms
Advantages of Binary Tree
 Efficient searching - Searching for a specific element can be performed in O(log n) time, because each node has at most two child nodes.
 Ordered traversal - The tree can be traversed in order, pre-order and post-order.
 Memory efficient - Binary trees are relatively memory-efficient because they only require two child pointers per node.
 Fast insertion and deletion - Binary trees can be used to perform insertions and deletions in O(log n) time complexity.
 Useful for sorting - Sorting algorithms like heap sort and binary search tree sort are built on binary trees.
Disadvantages of Binary Tree
 Limited structure - Binary trees are limited to two child nodes per node.
 Unbalanced trees - An unbalanced binary tree has one subtree much larger than the other, which degrades performance.
 Space inefficiency - Each node requires two child pointers, which adds up to a significant amount of memory for large trees.
 Slow performance in worst case - Search operations can degrade to O(n) time complexity, where n is the number of nodes in the tree.
 Complex balancing algorithms - Ensuring that binary trees stay balanced requires complex schemes like AVL trees and red-black trees.
==============



Binary Tree Traversal

Binary Tree Traversal


 Two categories of Tree traversal
 Depth First Search
 Breadth First Search
 Three different types of DFS traversals
 Pre-order Traversal of Binary Trees
 In-order Traversal of Binary Trees
 Post-order Traversal

Visiting every node of a tree, one node at a time, is called traversal. Because a tree can branch out in different directions (it is non-linear), there are different ways of traversing it.

Two categories of Tree traversal
 Depth First Search (DFS)
 Breadth First Search (BFS)



Three different types of DFS traversals

 Pre-order
 In-order
 Post-order



Pre-order Traversal of Binary Trees
Pre-order Traversal is a type of Depth First Search, where each node is visited in
a certain order. This traversal is pre order because the node is visited before the
recursive pre-order traversal of the left and right sub trees.

Pre-order Traversal is done by visiting the root node first and then recursively
do a pre-order traversal of the left sub tree, followed by a recursive pre-order traversal
of the right sub tree. It is used for creating a copy of the tree and for the prefix notation of an expression tree.

In-order Traversal of Binary Trees

In-order Traversal is a type of Depth First Search, where each node is visited in a certain order. The traversal is "in order" because the node is visited in between the recursive function calls: after the In-order Traversal of the left sub tree, and before the In-order Traversal of the right sub tree.



In-order Traversal does a recursive In-order Traversal of the left subtree, visits
the root node, and finally, does a recursive In-order Traversal of the right subtree. This
traversal is mainly used for Binary Search Trees where it returns values in ascending
order.

Post-order Traversal

Post-order Traversal is a type of Depth First Search, where each node is visited in a certain order. The traversal is "post order" because visiting a node is done after the left and right child nodes are traversed recursively.

Post-order Traversal works by recursively doing a Post-order Traversal of the


left sub tree and the right sub tree, followed by a visit to the root node. It is used for deleting a tree and for the postfix notation of an expression tree.

===============

Threaded Binary Tree

Threaded Binary Tree


 Two Types of Threaded Binary Tree
 One-way threaded Binary trees
 Two-way threaded Binary Trees
 Applications of threaded binary tree
 Advantages of Threaded Binary Tree
 Disadvantages of Threaded Binary Tree



In the linked representation of binary trees, more than one half of the link fields
contain NULL values which results in wastage of storage space. If a binary tree consists
of n nodes then n+1 link fields contain NULL values.

In order to effectively manage the space, a method was devised by Perlis and
Thornton in which the NULL links are replaced with special links known as threads.
Such binary trees with threads are known as threaded binary trees.

Each node in a threaded binary tree contains either a link to its child node or a thread to another node in the tree.

Two Types of Threaded Binary Tree


 One-way threaded Binary Tree
 Two-way threaded Binary Tree

One-way threaded Binary trees

In one-way threaded binary trees, a thread will appear either in the right or left
link field of a node. If it appears in the right link field of a node then it will point to the
next node that will appear on performing in order traversal. Such trees are called Right
threaded binary trees.

If a thread appears in the left field of a node then it will point to the node's in-order predecessor. Such trees are called Left threaded binary trees.



In one-way threaded binary trees, the right link field of last node and left link
field of first node contain a NULL. In order to distinguish threads from normal links
they are represented by dotted lines.

The in-order traversal of this binary tree yields D, B, E, A, C, F. When this tree is represented as a right threaded binary tree, the right link field of leaf node D, which contains a NULL value, is replaced with a thread that points to node B, the in-order successor of node D. In the same way, the NULL values in the right link fields of the other nodes are replaced with threads.

Two-way threaded Binary Trees

In two-way threaded binary trees, the right link field of a node containing a NULL value is replaced by a thread that points to the node's in-order successor, and the left link field of a node containing a NULL value is replaced by a thread that points to the node's in-order predecessor.



The in-order traversal of this binary tree yields D, B, E, G, A, C, F. The node E
whose left field contains NULL is replaced by a thread pointing to its in-order
predecessor i.e. node B. Similarly, for node G whose right and left linked fields contain
NULL values is replaced by threads such that right link field points to its in-order
successor and left link field points to its in-order predecessor. In the same way, other
nodes containing NULL values in their link fields are filled with threads.

In a two-way threaded binary tree, no left thread is possible for the first node and no right thread is possible for the last node, because they have no in-order predecessor or successor respectively. These threads would have nothing to point to. So, in order to maintain the uniformity of threads, we maintain a special node called the header node.



The header node does not contain any data part and its left link field points to
the root node and its right link field points to itself. If this header node is included in
the two-way threaded Binary tree then this node becomes the in-order predecessor of
the first node and in-order successor of the last node. Now the threads of left link fields
of the first node and right link fields of the last node will point to the header node.

Applications of threaded binary tree


 Expression evaluation - Threaded binary trees can be used to evaluate arithmetic expressions without recursion or an explicit stack. The tree is traversed in-order or pre-order to perform the evaluation.
 Database indexing - The tree can be constructed with the indexed values as
keys, and then traversed in-order to retrieve the data in sorted order.
 Symbol table management - In a compiler or interpreter, the tree constructed
with the symbols as keys and then traversed in-order or pre-order to perform
various operations on the symbol table.
 Disk-based data structures - Threaded binary trees can be used in disk-based
data structures to improve performance.
 Navigation of hierarchical data - The tree can be constructed from the
hierarchical data and then traversed in-order or pre-order to efficiently access
the data in a specific order.

Advantages of Threaded Binary Tree


 The tree enables linear traversal of elements.
 It eliminates the use of a stack, since the linear traversal saves that memory.
 It enables finding the parent node without explicit use of a parent pointer.
 A threaded tree gives forward and backward traversal of nodes by in-order.
 Nodes contain pointers to their in-order predecessor and successor.
 In a threaded binary tree there are no NULL pointers, so the memory otherwise
wasted in NULL links is avoided.
 The threads point to successor and predecessor nodes, which lets us obtain the
predecessor and successor of any node quickly.

 There is no need of a stack while traversing the tree, because using thread links
we can reach previously visited nodes.
Disadvantages of Threaded Binary Tree
 Every node in a threaded binary tree needs extra information (extra memory) to
indicate whether its left or right link points to a child node or to its in-order
predecessor or successor, so each node consumes extra memory.
 Insertion and deletion are more complex and time consuming than in an
ordinary binary tree, since both threads and ordinary links must be maintained.
 Implementing threads for every possible node is complicated.
 Increased complexity - Implementing a threaded binary tree requires more
complex algorithms.
 Extra memory usage - If the tree is not fully balanced, threading a skewed tree
can result in a large number of additional pointers.
 Limited flexibility - Threaded binary trees are specialized data structures that
are optimized for specific types of traversal.
 Difficulty in parallelizing - It can be challenging to parallelize operations on a
threaded binary tree, since the threads make it difficult to process nodes independently.
===========

Binary Tree Representations

Binary Tree Representations


Two methods
 Array Representation
 Linked List Representation

The binary tree can be implemented as a linked structure, where each node is
linked to both its left and right child nodes.

Array Representation of Binary Tree

In array representation of a binary tree, we use one-dimensional array (1-D

Array) to represent a binary tree.

Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10 5 16 - 8 15 20 - - - - - - - 23

The index 1 holds the root value 10; its two children 5 and 16 are placed at
locations 2 and 3. Where children are missing, their places are left blank. A binary tree
of depth n is represented in a one-dimensional array with a maximum size of 2^n - 1.
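The parent/child index arithmetic behind this layout can be sketched as follows (illustrative Python; the helper names are assumptions):

```python
# Index arithmetic for the 1-based array representation used above:
# root at index 1, children of node i at 2i and 2i+1, parent at i // 2.
# None marks an absent node, matching the '-' slots in the example table.

tree = [None, 10, 5, 16, None, 8, 15, 20] + [None] * 7 + [23]

def left(i):   return 2 * i
def right(i):  return 2 * i + 1
def parent(i): return i // 2

print(tree[left(1)], tree[right(1)])   # children of the root 10 -> 5 16
print(tree[parent(5)])                 # parent of 8 (index 5) -> 5
```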

Linked List Representation of Binary Tree


A doubly linked list can represent a binary tree. In this representation, every
node consists of three fields:


 First field for storing the left child address
 Second field for storing the actual data
 Third field for storing the right child address

Example of the binary tree represented using Linked list
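A minimal sketch of the three-field node (illustrative Python; the class name is an assumption):

```python
# Sketch of the three-field node described above: a left child link,
# the data, and a right child link.

class BTNode:
    def __init__(self, data, left=None, right=None):
        self.left = left      # address of the left child
        self.data = data      # actual data
        self.right = right    # address of the right child

# Small tree:   10
#              /  \
#             5    16
root = BTNode(10, BTNode(5), BTNode(16))
print(root.data, root.left.data, root.right.data)   # 10 5 16
```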

================

Graph

Graph
 Graph Properties
 Graph Representations
 Two vertices
 Adjacency Matrix
 Adjacency List
 Operations on Graphs
 Graph Terminology
 Applications of Graph
 Advantages of Graph
 Disadvantages of Graph

A Graph is a non-linear data structure that consists of nodes (or vertices) and
edges. The edges connect pairs of nodes in the graph. A graph can have different paths
to get from one vertex to another.

This graph has a set of vertices V= {1, 2, 3, 4, 5} and a set of edges


E= { (1,2), (1,3), (2,3), (2,4), (2,5), (3,5), (4,5) }.

 A vertex called a node is a point or an object in the Graph.


 An edge is used to connect two vertices with each other.

Graph Properties
The properties of the graph are as follows
 A weighted Graph is a Graph where the edges have values. The weight value of
an edge can represent things like distance, capacity, time or probability.
 A connected Graph is when all the vertices are connected through edges
somehow. A Graph that is not connected is a Graph with isolated or disjoint sub
graphs or single isolated vertices.

 A directed Graph known as a digraph is when the edges between the vertex
pairs have a direction. The direction of an edge can represent things like
hierarchy or flow.
 A cyclic Graph is defined differently depending on whether it is a directed or
an undirected Graph.

Undirected Graph | Directed Graph
It is simple to manipulate. | It provides a clear representation of relationships with direction.
Traversal in a particular direction is inefficient. | Traversal in the edge direction is efficient.
It requires less memory, hence it is memory efficient. | It may require increased memory.
It provides limited modeling. | It is suitable for modeling processes or workflows.
It has balanced relationships. | It has a lack of balance.
It is difficult to represent specific nodes. | It is complex as compared to undirected graphs.

 A loop, also called a self-loop, is an edge that begins and ends on the same vertex.
A loop is a cycle that consists of only one edge.
Graph Representation
A Graph representation is how the graph is stored in memory, and different Graph
representations can
 take up more or less space;
 be faster or slower to search or manipulate;
 be better suited depending on what type of Graph it is (weighted, directed);
 be easier to understand and implement than others.
The Adjacency Matrix is the representation used for Graphs moving forward.
Graph representations store information about which vertices are adjacent and about the
edges between the vertices. Graph representations are slightly different if the edges are
directed or weighted.
Graphs are used to represent and solve problems where the data consists of objects
and relationships between them.

Two vertices

Two vertices are said to be adjacent, or neighbors, if an edge connects them.
Adjacency Matrix
The Adjacency Matrix is a graph representation using a 2D array (matrix), where
each cell at index (i,j) stores information about the edge from vertex i to vertex j.

A Graph with the Adjacency Matrix representation next to it

The adjacency matrix above represents an undirected Graph, so the values '1'
marking the edges are symmetrical, because the edges go both ways in an undirected
Graph.

To create a directed Graph with an adjacency matrix, we decide which vertices the
edges go from and to by inserting the value at the correct index (i, j). To represent a
weighted Graph, we can put values other than '1' inside the adjacency matrix.

Adjacency Matrix representation next to it with a directed and weighted Graph

In the adjacency matrix the value 3 on index (0,1) tells us there is an edge from
vertex A to vertex B, and the weight for that edge is 3.

The weights are placed directly into the adjacency matrix for the correct edge,
and for a directed Graph, the adjacency matrix does not have to be symmetric.
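Both variants can be sketched as follows (illustrative Python; the helper names and the small example graphs are assumptions):

```python
# Building the adjacency matrices described above: '1' marks an edge in
# the undirected case; a weight replaces '1' in the directed weighted case.

def undirected_matrix(n, edges):
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = 1    # symmetric: edges go both ways
    return m

def directed_weighted_matrix(n, edges):
    m = [[0] * n for _ in range(n)]
    for i, j, w in edges:
        m[i][j] = w              # only (i, j) is set, so not necessarily symmetric
    return m

u = undirected_matrix(4, [(0, 1), (0, 2), (1, 3)])
d = directed_weighted_matrix(4, [(0, 1, 3), (1, 2, 5)])
print(u[0][1], u[1][0])   # 1 1  (symmetric)
print(d[0][1], d[1][0])   # 3 0  (directed: only the 0 -> 1 edge exists, weight 3)
```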

Adjacency List

An Adjacency List has an array that contains all the vertices in the Graph and
each vertex has a Linked List or Array with the vertex edges.

For a sparse Graph with many vertices, we can save space by using an Adjacency
List compared to an Adjacency Matrix, because an Adjacency Matrix would
reserve a lot of memory on empty array elements for edges that don't exist.

A sparse Graph is a Graph where each vertex only has edges to a small portion
of the other vertices in the Graph.

In the adjacency list the vertices A to D are placed in an Array and each vertex
in the array has its index written right next to it.
Each vertex in the Array has a pointer to a Linked List that represents that
vertex's edges. More specifically, the Linked List contains the indexes to the adjacent or
neighbor vertices.
Example -Vertex A has a link to a Linked List with values 3, 1, and 2. These
values are the indexes to A's adjacent vertices D, B, and C.
An Adjacency List can represent a directed and weighted Graph

In the Adjacency List, vertices are stored in an Array. Each vertex has a pointer
to a Linked List with edges stored as (i, w), where i is the index of the vertex the edge
leads to, and w is the weight of that edge.

Example - Node D has a pointer to a Linked List with an edge to vertex A. The
values (0, 4) mean that vertex D has an edge to the vertex at index 0 (vertex A), and the
weight of that edge is 4.
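The structure described above can be sketched as follows (illustrative Python, using ordinary lists in place of linked lists; the names are assumptions):

```python
# Adjacency-list sketch: an array of vertices, each holding a list of
# (index, weight) pairs for its outgoing edges.

vertices = ['A', 'B', 'C', 'D']
adj = [[] for _ in vertices]

def add_edge(i, j, w):        # directed edge i -> j with weight w
    adj[i].append((j, w))

add_edge(3, 0, 4)             # D -> A, weight 4 (the example above)
add_edge(0, 1, 3)             # A -> B, weight 3

for j, w in adj[3]:
    print(vertices[3], '->', vertices[j], 'weight', w)   # D -> A weight 4
```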

Operations on Graphs

 Insertion of Nodes or Edges in the graph – Insert a node into the graph.
 Deletion of Nodes or Edges in the graph – Delete a node from the graph.
 Searching on Graphs – Search an entity in the graph.
 Traversal of Graphs – Traversing all the nodes in the graph.
 Shortest Paths - From a source to a destination, a source to all other nodes
and between all pairs.
 Minimum Spanning Tree - In a weighted, connected undirected graph,
finding the minimum weight edges to connect all.

Graph Terminology
 Graph - A Graph G is a non-empty set of vertices or nodes V and a set of
edges E, where each edge connects a pair of vertices. Formally, a graph can be
represented as G= (V, E). Graphs can be classified based on various properties,
such as directedness of edges and connectivity.

 Vertex or Node -A Vertex, often referred to as a Node, is a fundamental unit of
a graph. It represents an entity within the graph. In applications like social
networks, vertices can represent individuals, while in road networks they can
represent intersections or locations.
 Edge - An Edge is a connection between two vertices in a graph. It can be
either directed or undirected. In a directed graph, edges have a specific
direction, indicating a one-way connection between vertices. In contrast,
undirected graphs have edges that do not have a direction and represent
bidirectional connections.
 Degree of a Vertex - The Degree of a Vertex in a graph is the number of edges
incident to that vertex. In a directed graph, the degree is divided into
 The in-degree, for incoming edges
 The out-degree, for outgoing edges of the vertex

 Path - A Path in a graph is a sequence of vertices where each adjacent pair is
connected by an edge. Paths can be of varying lengths and may or may not visit
the same vertex more than once.
A path is also a trail, and thus it is also an open walk. A path is a walk with
no repeated vertex. This directly implies that no edge will ever be repeated, so
stating that separately in the definition of a path is redundant.

 Cycle - A Cycle in a graph is a path that starts and ends at the same vertex, with
no other repetition of vertices. A cycle is a closed path in which neither edges
nor intermediate vertices are repeated.

Walk - A walk is a sequence of vertices and edges of a graph; traversing a graph gives
a walk. Both edges and vertices can repeat, and a walk can be open or closed. An open
walk starts and ends at different vertices, e.g. 1->2->3->4->5->3. A closed walk starts
and ends at the same vertex, e.g. 1->2->3->4->5->3->1.

Trail - A trail is a walk in which no edge is repeated, though a vertex can be repeated.
An open trail starts and ends at different vertices, e.g. 1->3->8->6->3->2. A closed trail
starts and ends at the same vertex.

Circuit - A circuit is a closed trail: no edge is repeated, and only vertices can be
repeated, e.g. 1->2->4->3->6->8->3->1.

Types of Graphs
The types of graphs are based on the graph terminology above.

Applications of Graph
 Social networks - Each person is a vertex, and relationships are the edges.
 Neural Networks - Vertices represent neurons and edges represent the
synapse between them.
 Compilers - They are also used in compilers for query optimization in
database languages.
 Robot planning - Vertices represent states of the robot and the edges the
possible transitions between the states.
 GPS - For problems like finding the closest route.
 Minimum Spanning Tree - Connecting all devices at minimum cost is a Graph problem.
 Topology of networks -The connections between routers and switches.
 Graph topological sorting algorithm- Generating a sequence to solve all
tasks.
 Internet -Graph with web pages as vertices and hyperlinks as edges.
 Biology - Graphs used in neural networks or the spread of diseases.

Advantages of Graph
 Representing complex data - The relationships between the data points are
not direct.
 Efficient data processing - To perform complex operations on large datasets
quickly and effectively.
 Network analysis - Graphs are useful in social sciences, business and
marketing.
 Path finding - To find the shortest path for logistics and transportation
planning.
 Visualization -To communicate complex data and for presentations, reports
and data analysis.
 Machine learning -To find the model of complex relationships between
variables like reference systems or fraud detection.
 Users on Facebook - Users are referred to as vertices, and a friendship is an
edge connecting them. The friend suggestion system on Facebook is based
on graph theory.
 Resource allocation in the Operating System - Edges are drawn from
resources to the processes they are assigned to, or from a requesting process
to the desired resources.
 Web pages - Pages are referred to as vertices on the World Wide Web. A link
from page A to page B can represent an edge in a directed graph.
Disadvantages of Graph
 Limited representation - Graphs can represent relationships between
objects, but not their properties or attributes.
 Difficulty in analysis - Graphs can be difficult to interpret when large or
complex.
 Scalability issues - As the number of nodes and edges in a graph increases,
the processing time and memory also increase. This can make it difficult to
work with large or complex graphs.
 Data quality issues - If the data is incomplete, inconsistent, or inaccurate,
then the graph may not accurately reflect the relationships between objects.
 Privacy loss - Graphs can expose sensitive information about
individuals or organizations in social networks or marketing.

==========
Spanning tree

Spanning tree
 Applications of the spanning tree
 Properties of spanning-tree
 Minimum Spanning tree
 Applications of minimum spanning tree
 Two Algorithms for Minimum spanning tree
 Example
 Prim's Algorithm
 Steps of Prim’s Algorithm
 Key Characteristics of Prim's Algorithm
 Applications of Prim's Algorithm
 Advantages of Prim's Algorithm
 Disadvantages of Prim's Algorithm
 Kruskal's Algorithm
 Steps of Kruskal's Algorithm
 Key Characteristics of Kruskal's Algorithm
 Applications of Kruskal's Algorithm
 Advantages of Kruskal's Algorithm
 Disadvantages of Kruskal's Algorithm

 Before moving directly to the minimum spanning tree, the spanning tree
itself is defined first.
 A spanning tree can be defined as the sub graph of an undirected connected
graph. It includes all the vertices along with the least possible number of
edges.
 If any vertex is missed, it is not a spanning tree. A spanning tree is a subset of
the graph that does not have cycles, and it also cannot be disconnected.
 A spanning tree consists of (n-1) edges, where 'n' is the number of vertices or
nodes. Edges of the spanning tree may or may not have weights.
 All the possible spanning trees created from the given graph G have the same
number of vertices, but the number of edges in the spanning tree would be
equal to the number of vertices in the given graph minus 1.

A complete undirected graph has n^(n-2) spanning trees, where n is the
number of vertices in the graph.

Example - for n = 5, the number of maximum possible spanning trees is 5^(5-2) = 5^3 = 125.
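This count, known as Cayley's formula, can be checked directly (illustrative Python):

```python
# Cayley's formula: a complete graph on n vertices has n^(n-2) spanning trees.

def spanning_tree_count(n):
    return n ** (n - 2)

print(spanning_tree_count(5))   # 125
```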

Applications of the spanning tree


 Cluster Analysis
 Civil network planning
 Computer network routing protocol

Properties of spanning tree

 There can be more than one spanning tree of a connected graph G.


 A spanning tree does not have any cycles or loop.
 A spanning tree is minimally connected, so removing one edge from the tree
will make the graph disconnected.
 A spanning tree is maximally acyclic, so adding one edge to the tree will create a
loop.
 There can be a maximum of n^(n-2) spanning trees created from
a complete graph.
 A spanning tree has n-1 edges, where 'n' is the number of nodes.
 If the graph is a complete graph, then the spanning tree can be constructed by
removing maximum (e-n+1) edges, where 'e' is the number of edges and 'n' is
the number of vertices.

A spanning tree is a subset of connected graph G and there is no spanning tree of a


disconnected graph. A spanning tree is used to find a minimum path to connect all
nodes of the graph.

Minimum Spanning tree

A minimum spanning tree can be defined as the spanning tree in which the sum
of the weights of the edges is minimum. The weight of the spanning tree is the sum of
the weights given to the edges of the spanning tree.

Example of minimum spanning tree

The sum of the edges of the above graph is 16. The possible spanning trees are

The minimum spanning tree that is selected from the above spanning trees for
the given weighted graph is

Applications of minimum spanning tree


 The edge weights can represent the distance, traffic load, congestion or any
random value.
 To design water-supply networks, telecommunication networks and electrical
grids.
 It can be used to find paths in the map.

Two Algorithms for Minimum spanning tree
A minimum spanning tree can be found from a weighted graph by using the
algorithms
 Prim's algorithm - It is a greedy algorithm that starts with an empty spanning
tree. It is used to find the minimum spanning tree from the graph. This
algorithm finds the subset of edges that includes every vertex of the graph such
that the sum of the weights of the edges can be minimized.
 Kruskal's algorithm - This algorithm is also used to find the minimum spanning
tree for a connected weighted graph. Kruskal's algorithm also follows greedy
approach, which finds an optimum solution at every stage instead of focusing
on a global optimum.
Example
Two well-known algorithms for finding the minimum spanning tree (MST) of a graph are
Prim's and Kruskal's algorithms. While both algorithms aim to find the minimum
spanning tree of a graph, they differ in their approaches and principles.

Prim's Algorithm
Prim's algorithm is a greedy algorithm that incrementally grows the minimum
spanning tree from a starting vertex. The algorithm maintains two sets of vertices are
 One set contains the vertices already included in the MST
 Other set contains the remaining vertices.
Prim's algorithm iteratively selects the vertex with the smallest edge weight that
connects the two sets and adds it to the MST.
 Initialization - Choose a starting vertex and mark it as visited.
 Find the minimum-weight edge connected to the visited vertices.

To accomplish this, examine all edges connected to the visited vertices
and select the one with the lowest weight.
 Select the edge with the lowest weight and add it to the MST.
 Mark the newly added vertex as visited.
 Repeat steps 2-4 until all vertices are visited.
Steps of Prim’s Algorithm
 Initialization - Start with an arbitrary vertex and mark it as part of the MST.
 Edge Selection - From the set of edges that connect vertices in the MST to
vertices outside the MST, select the edge with the minimum weight.
 Update - Add the selected edge and the connected vertex to the MST.
 Repeat - Repeat the edge selection and update steps until all vertices are
included in the MST.
Key Characteristics of Prim's Algorithm
 Prim's algorithm is a greedy approach, as it makes locally optimal choices at
each step to construct the MST.
 It guarantees the generation of a connected and acyclic MST.
 The time complexity of Prim's algorithm is O (V^2) with an adjacency matrix. A
priority queue can reduce the complexity to O (E log V).
 Time Complexity – O (V^2) with a simple implementation and O(E log V) with
a priority queue.
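The steps above can be sketched with a priority queue (illustrative Python; the example graph is an assumption):

```python
# A sketch of Prim's algorithm with a priority queue (heapq), following
# the steps above. The graph is an adjacency list {vertex: [(neighbor,
# weight), ...]}.
import heapq

def prim(graph, start):
    visited = {start}
    # heap of (weight, u, v): candidate edges crossing out of the visited set
    heap = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(heap)
    mst = []
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)       # smallest crossing edge
        if v in visited:
            continue
        visited.add(v)                       # update: vertex joins the MST
        mst.append((u, v, w))
        for x, wx in graph[v]:
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return mst

g = {
    'A': [('B', 1), ('C', 4)],
    'B': [('A', 1), ('C', 2)],
    'C': [('A', 4), ('B', 2)],
}
print(prim(g, 'A'))   # [('A', 'B', 1), ('B', 'C', 2)]
```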
Applications of Prim's Algorithm
 Network Design - Prim's algorithm is commonly used in network design
scenarios to find the minimum cost network that connects various locations,
minimizing the overall connection cost.
 Cluster Analysis - It can be applied to identify clusters or communities in a
network, where each cluster is represented by a sub tree of the minimum
spanning tree.
Advantages of Prim's Algorithm
 Efficiency - Prim's algorithm performs well on dense graphs where the number
of edges is close to the maximum possible. Its time complexity is O(V^2) with an
adjacency matrix representation.
 Guaranteed MST - Prim's algorithm guarantees that the MST is found within V-1
iterations, where V is the number of vertices in the graph.

 Simplicity - Prim's algorithm is relatively easy to understand and implement,
making it a popular choice for educational purposes.
Disadvantages of Prim's Algorithm
 Requirement of Connected Graphs - Prim's algorithm assumes a connected
graph. If the graph has disconnected components, the algorithm needs to be
applied to each component to find minimum spanning trees.
 Inability to Handle Negative Weights - Prim's algorithm cannot handle graphs
with negative edge weights since it may lead to incorrect MST results.
 Performance on Sparse Graphs - For sparse graphs with a significantly smaller
number of edges, Prim's algorithm may be less efficient compared to Kruskal's
algorithm.

Kruskal's Algorithm
Kruskal's algorithm is another greedy algorithm used to find the minimum
spanning tree. Unlike Prim's algorithm, Kruskal's algorithm processes the edges of the
graph in ascending order of their weights. It incrementally adds edges to the MST as
long as they do not create a cycle.

Steps of Kruskal's Algorithm


 Initialization - Sort all the edges in the graph by their weight in non-decreasing
order.
 Edge Selection - Starting from the smallest edge, add each edge to the MST if it
does not create a cycle.
 Cycle Detection - Use disjoint sets to detect and prevent cycles in the MST.
 Repeat - Continue adding edges until the MST contains exactly (V-1) edges,
where V is the number of vertices.
Key Characteristics of Kruskal's Algorithm
 Kruskal's algorithm uses the concept of disjoint sets to detect cycles efficiently.
 It does not require a starting vertex and is not restricted to a connected graph.
 The time complexity of Kruskal's algorithm is O (E log E) or O (E log V) with
efficient sorting algorithms, where E represents the number of edges and V the
number of vertices.
 Time complexity is O (E log E) or O (E log V) where E is the number of edges
and V is the number of vertices.
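The steps above can be sketched with a simple union-find (illustrative Python; the example edges are an assumption):

```python
# A sketch of Kruskal's algorithm following the steps above.
# Edges are (weight, u, v) tuples; union-find detects cycles.

def kruskal(n, edges):
    parent = list(range(n))

    def find(x):                       # find the set representative
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):      # ascending order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # no cycle: endpoints in different components
            parent[ru] = rv            # union the two components
            mst.append((u, v, w))
        if len(mst) == n - 1:          # MST complete with V-1 edges
            break
    return mst

edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2)]
print(kruskal(3, edges))   # [(0, 1, 1), (1, 2, 2)]
```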

Applications of Kruskal's Algorithm
 Network Connectivity - Kruskal's algorithm is used to check whether a network
is fully connected by finding the minimum spanning forest.
 Image Segmentation - It can be applied in image processing to partition an
image into distinct regions by treating pixels as vertices and pixel similarities
as edge weights.
Advantages of Kruskal's Algorithm
 Handling Disconnected Graphs - Kruskal's algorithm handles disconnected
graphs and produces a minimum spanning forest which consists of multiple
MST for each connected component.
 Handling Negative Weights - Kruskal's algorithm can handle graphs
with negative edge weights, provided no negative cycles are present in the graph.
 Efficiency for Sparse Graphs - Kruskal's algorithm performs better on sparse
graphs where the number of edges is small.
Disadvantages of Kruskal's Algorithm
 Sorting Overhead - Kruskal's algorithm requires sorting the edges based on
their weights.
 Potential Forest Output - In disconnected graphs, Kruskal's algorithm may
produce a forest of multiple minimum spanning trees, which may not be applicable
for some applications.

Prim's Algorithm | Kruskal's Algorithm
Vertex-based; grows the MST one vertex at a time | Edge-based; adds edges in increasing order of weight
Priority queue (min-heap) | Union-Find data structure
Adjacency matrix or adjacency list | Edge list
Starts from an arbitrary vertex | Starts with all vertices as separate trees (a forest)
Chooses the minimum weight edge from the connected vertices | Chooses the minimum weight edge from all edges
Cycles are not explicitly managed; it grows one connected component | Uses Union-Find to avoid cycles
O(V^2) with an adjacency matrix; O((E + V) log V) with a priority queue | O(E log E) or O(E log V) due to edge sorting
Relatively simpler on dense graphs | More complex due to cycle management
Difficult to parallelize | Easier to parallelize
More memory for the priority queue | Less memory if edges can be sorted externally
Network design, clustering with dense connections | Road networks, telecommunications with sparse connections
Requires a starting vertex | No specific starting point; operates on global edges
Dense graphs where an adjacency list is used | Sparse graphs where an edge list is efficient
===============

Shortest Path - Dijkstra's Algorithm

Dijkstra's Algorithm
 History of Dijkstra's Algorithm
 Working of Dijkstra's Algorithm
 Two properties of each Vertex
 Examples
 Advantages
 Disadvantages

Dijkstra's Algorithm is a graph algorithm that finds the shortest path from a
source vertex to all other vertices in the graph (the single-source shortest path
problem). It is a type of Greedy Algorithm that only works on weighted graphs with
non-negative weights.
The time complexity
 The adjacency matrix of the graph - O(V^2)
 The adjacency list of the graph - O((V + E) log V)
Where V- The number of vertices in the graph
E - The number of edges in the graph

History of Dijkstra's Algorithm


Dijkstra's Algorithm was designed and published by Dr. Edsger W. Dijkstra, a
Dutch Computer Scientist, Software Engineer, Programmer, Science Essayist, and
Systems Scientist.
Fundamentals of Dijkstra's Algorithm
 Dijkstra's Algorithm begins at the source node and examines the graph to find
the shortest path between that node and all the other nodes in the graph.
 The Algorithm keeps a record of the currently known shortest distance from
each node to the source node, and it updates these values when it finds a
shorter path.
 Once the Algorithm has retrieved the shortest path between the source and
another node, that node is marked as visited and added to the path.
 The procedure continues until all the nodes in the graph have been visited. The
result is a path connecting the source node to all other nodes, following the
shortest possible path to reach each node.

Working of Dijkstra's Algorithm

A graph and a source vertex are the requirements for Dijkstra's Algorithm. The
Algorithm is based on the Greedy Approach and makes the locally best choice at each
step.

Two properties of each Vertex
Visited Property
 The visited property shows whether a node has been visited.
 This property ensures no node is revisited.
 A node is marked visited only when the shortest path to it has been found.

Path Property
 The path property stores the value of the minimum path to the node.
 The minimum path implies the shortest way to reach the node.
 This property is revised whenever any neighbor of the node is visited.
 This property stores the final shortest path of each node.

Examples

Advantages
 Dijkstra's algorithm has considerably low complexity, which is almost linear.

Disadvantages
 Dijkstra's algorithm cannot work with negative weights.
 When working with dense graphs, Dijkstra's algorithm is not a good option to
calculate the shortest path between every pair of nodes.

=============

Graph transitive closure

Graph transitive closure


 The stages of the algorithm
 Complexity of Transitive closure

Transitive Closure is the reachability matrix that tells whether vertex v can be
reached from vertex u in a graph. Given a graph, the task is to find, for all vertex
pairs (u, v), whether vertex v is reachable from vertex u.

The final matrix is of Boolean type. When there is a value 1 for vertex u to
vertex v, it means that there is at least one path from u to v.

Given a directed graph, determine if a vertex j is reachable from another vertex i
for all vertex pairs (i, j) in the given graph. Here reachable means that there is a path
from vertex i to j. The reachability matrix is called the transitive closure of a graph.

The stages of the algorithm

 The input adjacency matrix of the graph is used to initialise the solution matrix
in the first phase.
 The next step is to treat each node in the range 0 to n as a starting vertex,
iterating over them one by one. The nodes 1, 2, ..., n are iterated over once again,
taking the terminating vertex into account.
 When vertex k has been chosen as an intermediate vertex, the paths considered
so far use only intermediate vertices from 1, 2, ..., k-1.
 There are two potential outcomes for each pair of starting and finishing
vertices (i, j). If

shortest_path[i][j] > shortest_path[i][k] + shortest_path[k][j]

then shortest_path[i][j] is updated to the new, shorter path through k.

 Otherwise, if k does not give a shorter path as an intermediate vertex, nothing is
changed and the loop continues.

Complexity of Transitive closure


 Time Complexity - O(V^3), where V is the number of vertices.
 Space complexity - O(V^2), where V is the number of vertices.

The Floyd-Warshall algorithm is a dynamic-programming algorithm. It divides the
problem into more manageable sub problems and then integrates the solutions to
those sub problems to address the larger problem, so the transitive closure of a graph
can be computed using the Floyd-Warshall Algorithm.
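The scheme above can be sketched as follows (illustrative Python; the example matrix is an assumption):

```python
# A sketch of transitive closure with the Floyd-Warshall scheme above:
# reach[i][j] becomes 1 if some intermediate vertex k links i to j.

def transitive_closure(adj):
    n = len(adj)
    reach = [row[:] for row in adj]   # start from the adjacency matrix
    for i in range(n):
        reach[i][i] = 1               # every vertex reaches itself
    for k in range(n):                # intermediate vertex
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return reach

adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(transitive_closure(adj))
# [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
```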

================

Summary
 An edge is one of the two primary units used to form graphs. Each edge has two
ends which are vertices to which it is attached.
 If two vertices are endpoints of the same edge, they are adjacent.
 A vertex's outgoing edges are directed edges that have the vertex as their origin.
 A vertex's incoming edges are directed edges that have the vertex as their
destination.
 The total number of edges incident to a vertex in a graph is its degree.
 The out-degree of a vertex in a directed graph is the total number of outgoing
edges, whereas the in-degree is the total number of incoming edges.
 A vertex with an in-degree of zero is referred to as a source vertex, while one
with an out-degree of zero is known as sink vertex.
 An isolated vertex is a zero-degree vertex that is not an edge's endpoint.
 A path is an alternating sequence of vertices and edges, with each consecutive
pair of vertices connected by an edge.
 The path that starts and finishes at the same vertex is known as a cycle.
 A path with unique vertices is called a simple path.
 For each pair of vertices x, y, a graph is strongly connected if it contains a
directed path from x to y and a directed path from y to x.
 A directed graph is weakly connected if replacing all of its directed edges with
undirected edges results in a connected graph. Every vertex of a weakly connected
graph has at least one outgoing or incoming edge.
 A tree is a connected forest, i.e., a connected acyclic graph. A tree without a
designated root is called a free tree; one with a designated root is a rooted tree.
 A spanning sub graph that is also a tree is known as a spanning tree.
 A connected component is a maximal connected subgraph of a graph.
 A bridge is an edge whose removal would disconnect the graph.
 A forest is a graph without cycles.

================



Unit – IV - Storage Devices

Storage Devices - Sorting with Disks: K-Way Merging - Sorting with Tapes. Symbol
Tables: Static Tree Tables - Dynamic Tree Tables - Hash Tables: Hashing Functions - Overflow
Handling.

Storage Devices
K-way merge sort
Difference between Static Data Structure and Dynamic Data Structure
Hash Table

Storage Devices

Storage Devices
Types of Computer Storage Devices
 Primary Storage Devices
 Secondary Storage Devices
 Magnetic Storage Devices
 Flash memory Devices
 Optical Storage Devices
 Cloud and Virtual Storage
Characteristics of Computer Storage Devices

The storage unit is a part of the computer system which is employed to store the
information and instructions to be processed. A storage device is an integral part of the
computer hardware which stores information/data to process the result of any computational
work. Without a storage device, a computer would not be able to run or even boot up. The
user can say that a storage device is hardware that is used for storing, porting, or extracting
data files. It can also store information/data both temporarily and permanently.
Types of Computer Storage Devices

Each of these storage devices has its own specifications. The types of storage devices are:
 Primary Storage Devices
 Secondary Storage Devices
 Magnetic Storage Devices
 Flash memory Devices
 Optical Storage Devices
 Cloud and Virtual Storage

Unit – IV – Storage Device Page 1


Primary Memory vs. Secondary Memory
 The primary memory of a computer is the main memory, utilized to store data
temporarily; secondary memory consists of additional storage devices utilized to
store data permanently.
 Primary memory is directly accessible by the processor; secondary memory is not.
 Primary memory includes both volatile and non-volatile memory; secondary memory
is non-volatile.
 Primary memory is also known as the main memory; secondary memory is also known
as auxiliary memory.
 Primary memory is composed of semiconductors; secondary memory is composed of
magnetic and optical materials.
 Data that must be executed is copied to the main memory; secondary memory is
utilized to store data that should be kept permanently.
 The speed of accessing data is faster in primary memory and slower in secondary
memory.
 Primary memory is costly; secondary memory is cheaper.
 The size of primary memory is small; the size of secondary memory is large.
 Primary memory is internal memory; secondary memory is external memory.
 Primary memory is of two types, RAM and ROM; magnetic memory, semiconductor
memory, and optical memory are the types of secondary memory.

Primary Storage Devices


Primary memory holds only those data and instructions on which the
computer is currently working. It has a limited capacity and data is lost when
power is switched off. These memories are not as fast as registers. The data and
instructions required to be processed reside in the main memory. It is divided
into two subcategories RAM and ROM.



Magnetic Storage Devices
 Floppy Disk: Floppy Disk is also known as a floppy diskette. It is generally used on a
personal computer to store data externally. A Floppy disk is made up of a plastic
cartridge and secured with a protective case. Nowadays floppy disk is replaced by new
and effective storage devices like USB, etc.
 Hard Disk – A hard disk drive (HDD) is a storage device that stores and retrieves data
using magnetic storage. It is a non-volatile storage device whose contents can be
modified or deleted any number of times without any problem. Most computers and laptops
have HDDs as their secondary storage device. It is actually a set of stacked disks, just
like phonograph records. On every hard disk, the data is recorded electromagnetically in
concentric circles, called tracks, present on the hard disk, and a head (much like a
phonograph arm, but fixed in position) reads the information present on the track. The
read/write speed of HDDs is decent, though not especially fast. Capacities range from a
few GB to several TB.
 Magnetic Card - It is a card in which data is stored by modifying or rearranging the
magnetism of tiny iron-based magnetic particles present on the band of the card. It is
also known as a swipe card. It is used like a passcode (to enter the house or hotel
room), credit card, identity card, etc.
 Tape Cassette - It is also known as a music cassette. It is a rectangular flat container in
which the data is stored in an analog magnetic tape. It is generally used to store audio
recordings.
 Super Disk - It is also called LS-240 and LS-120. It was introduced by Imation
Corporation and is popular with OEM computers. It can store data up to 240 MB.

Flash Memory Devices


It is a cheaper and more portable storage device. It is the most commonly used type of
device for storing data because it is more reliable and efficient compared to other
storage devices. Some of the commonly used flash memory devices are:
 Pen Drive - It is also known as a USB flash drive that includes flash memory with an
integrated USB interface. The user can directly connect these devices to our computers and
laptops and read/write data into them in a much faster and more efficient way. These
devices are very portable. It ranges from 1GB to 256GB generally.
 SSD - It stands for Solid State Drive, a mass storage device like an HDD. It is more
durable because it does not contain moving parts like the spinning platters inside hard
disks. It needs less power than a hard disk, is lightweight, and has read/write speeds
up to 10x faster than hard disks. But these are costly as well. While SSDs serve an
equivalent function to hard drives, their internal components are much different. Unlike
hard drives, SSDs don't have any moving parts, which is why they're called solid-state
drives. Instead of storing data on magnetic platters, SSDs store data using non-volatile
flash storage. Since SSDs haven't any moving parts, they do not need to "spin up".
Capacities range from 150GB to a few TB.
 SD Card - It is known as a Secure Digital Card. It is generally used with electronic
devices like phones, digital cameras, etc. to store larger data. It is portable and the size of
the SD card is also small so that it can easily fit into electronic devices. It is available in
different sizes like 2GB, 4GB, 8GB, etc.
 Memory Card - It is generally used in digital cameras, printers, game consoles, etc. It is
also used to store large amounts of data and is available in different sizes. To run a
memory card on a computer the user requires a separate memory card reader.



 Multimedia Card - It is also known as MMC. It is an integrated circuit that is generally
used in-car radios, digital cameras, etc. It is an external device to store data/information.

Optical Storage Devices

Optical storage devices are also secondary storage devices, and they are removable.
Some optical storage devices are:
 CD - It is known as Compact Disc. It contains tracks and sectors on its surface to store
data. It is made up of polycarbonate plastic and is circular in shape. CD can store data up
to 700MB. It is of two types:
 CD-R: It stands for Compact Disc Recordable. On this type of CD, once the data
is written it cannot be erased; the disc becomes read-only.
 CD-RW: It stands for Compact Disc Read Write. In this type of CD, the user can
easily write or erase data multiple times.
 DVD - It is known as Digital Versatile Disc. DVDs are circular flat optical discs used to
store data. It comes in two different sizes one is 4.7GB single-layer discs and another one
is 8.5GB double-layer discs. DVDs look like CDs but the storage capacity of DVDs is more
than as compared to CDs. It is of two types:
 DVD-R - It stands for Digital Versatile Disc Recordable. On this type of DVD,
once the data is written it cannot be erased; the disc becomes read-only. It is
generally used to write movies, etc.
 DVD-RW - It stands for Digital Versatile Disc Read Write. In this type of DVD,
the user can easily write or erase data multiple times.
 Blu-ray Disc - It is just like a CD or DVD, but the storage capacity of a Blu-ray
disc is up to 25GB. To run a Blu-ray disc the user needs a separate Blu-ray reader.
Blu-ray technology reads a disc with a blue-violet laser, whose shorter wavelength
allows information to be stored at greater density.



Cloud and Virtual Storage
Secondary memory has been upgraded to virtual or cloud storage devices. The user
can store files and other data in the cloud, where the data is kept for as long as the
user pays for the cloud storage. Many companies provide cloud services, notably
Google, Amazon, and Microsoft. The user pays rent for the amount of space needed and
gets multiple benefits out of it. Though the data is actually stored in a physical
device located in the service provider's data centers, the user doesn't interact with
the physical device or its maintenance.
Example - Amazon Web Services offers AWS S3 as a type of storage where users
can store data virtually instead of on physical hard drive devices. These sorts of
innovations represent the frontier of where storage media goes.

Characteristics of Computer Storage Devices


 Data stored in a storage device can be changed or replaced when required, and the
storage devices themselves are portable.
 Storage Devices validate that saved data can be replaced or deleted as per the
requirements because the storage devices are easily readable, writeable, and rewritable.
 Storage Devices are easy and convenient to access because they do not require much
skill set to handle these resources.
 The storage capacity of these devices is an extra advantage to the system.



 Storage Devices have better performance and data can be easily transferred from one
device to another.
==============

K-way merge sort

K-way merge sort


K-way merge sort algorithm
Time and Space Complexity
Sorting with Disc
K-Way Merging

K-way merge sort is a sophisticated sorting algorithm that extends the merge sort
approach. The goal of the k-way merge problem is to combine k-sorted arrays into one sorted
array that has the same elements. While the traditional merge sort algorithm merges two sub
arrays at a time, the K-way merge sort merges K sub arrays, which leads to better performance
when dealing with vast amounts of data.

This algorithm is handy when handling large datasets, as it can efficiently sort and
merge multiple sub arrays, leading to improved efficiency and faster processing times. The K-
way merge sort algorithm is used in various applications that require sorting large data sets.

K-way merge sort algorithm

1. Divide: The original array is divided into K sub arrays of approximately equal size.
2. Conquer: Each of these sub arrays is sorted individually. This can be done using any
sorting algorithm, but for efficiency, the user typically uses a recursive application of the
K-way merge sort.
3. Combine: The sorted sub arrays are merged; the K sorted arrays are merged using a
min-heap data structure.



K-way merge sort is an advanced sorting algorithm that breaks the input dataset
into K smaller sub-lists, each of which is sorted individually. These sub-lists
are then merged to obtain the final sorted output. This approach helps to reduce the
overall time of the sorting process and makes it particularly useful when
dealing with large datasets. By combining divide-and-conquer with the efficiency of
a min-heap, K-way merge sort performs well even for large values of K.
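The combine step with a min-heap can be sketched as follows — a minimal illustration of merging k already-sorted lists, with the function name chosen for this example:

```python
import heapq

def k_way_merge(sorted_lists):
    """Merge k sorted lists into one sorted list using a min-heap.
    Each heap entry is (value, list_index, element_index)."""
    heap = []
    # Seed the heap with the first element of every non-empty list.
    for li, lst in enumerate(sorted_lists):
        if lst:
            heapq.heappush(heap, (lst[0], li, 0))
    result = []
    while heap:
        # The heap root is always the smallest remaining element.
        value, li, ei = heapq.heappop(heap)
        result.append(value)
        # Replace it with the next element from the same list, if any.
        if ei + 1 < len(sorted_lists[li]):
            heapq.heappush(heap, (sorted_lists[li][ei + 1], li, ei + 1))
    return result
```

Because the heap never holds more than k entries, each of the N elements costs O(log k) to push and pop, giving the O(N log k) merge time discussed below.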



Time and Space Complexity

 Time Complexity - Merging K sorted runs that contain N elements in total takes
O(N log K) time, since every element passes through a min-heap of size K. The full
K-way merge sort runs in O(N log N) time overall (log_K N levels of recursion, each
costing O(N log K)). The benefit over 2-way merging is particularly noticeable when
K is greater than 2 and there is a significant amount of data.

 Space Complexity - During the K-way merge sort, multiple sub arrays are created,
and a min-heap is used to merge the sub arrays efficiently. The heap needs O(K)
space to hold one element from each run, in addition to the space required to store
the sub arrays.



Sorting with Disc

Sorting with disks is the external sort-merge algorithm used in database systems.
Sorting means arranging the data in either ascending or descending order. Sorting is
used not only for generating sequenced output but also for satisfying the conditions
of various database algorithms. In query processing, sorting is used to perform
various relational operations efficiently, but those operations require sorted input.
To sort a relation, the user builds an index on the sort key and uses that index to
read the relation in sorted order. With an index, the relation is sorted logically,
not physically. Two cases arise:

 Relations that are smaller than, or of medium size relative to, main memory.
 Relations whose size is larger than the memory size.

All the sorting algorithms discussed so far assume that the data is in memory and
that data elements can be swapped efficiently. Sometimes, because of the size of the
data, all of it cannot fit in memory. In that case an on-disk sorting algorithm is
used, designed to use the disk efficiently to sort large data sets. The sorting
algorithm works in two phases:

 First, the sorting algorithm breaks the data into chunks and sorts each of these
individual chunks. This is accomplished by reading a chunk of data, sorting it,
writing it to a file, then reading more data, and so on. At the end of this phase,
there are a number of files on disk that are each sorted.
 Second, all of these files are merged into one large file. This is accomplished
by pair-wise merging of the files and writing the result out to a new, larger
merged file, until everything has been merged into one large file. This can be done
memory-efficiently.
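The two phases can be sketched as follows — a minimal illustration that assumes one integer per line in the input file, and uses Python's heapq.merge for the merge phase (repeated pair-wise merging would work equally well); the function and parameter names are chosen for this example:

```python
import heapq
import os
import tempfile

def external_sort(input_path, output_path, chunk_size=1000):
    """Phase 1: read the input in chunks of chunk_size lines, sort each
    chunk, and write it out as a sorted run file.
    Phase 2: merge all the run files into one sorted output file."""
    run_paths = []
    with open(input_path) as f:
        while True:
            # Read up to chunk_size lines (one integer per line).
            chunk = [line for _, line in zip(range(chunk_size), f)]
            if not chunk:
                break
            chunk.sort(key=int)
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(chunk)
            run_paths.append(path)
    # Merge all sorted runs; heapq.merge streams them lazily, so only
    # one line per run needs to be in memory at a time.
    runs = [open(p) for p in run_paths]
    with open(output_path, "w") as out:
        out.writelines(heapq.merge(*runs, key=int))
    for r in runs:
        r.close()
    for p in run_paths:
        os.remove(p)
```

Only chunk_size records are held in memory during phase 1, and only one record per run during phase 2, which is what makes the approach disk-friendly.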



Difference between Static Data Structure and Dynamic Data Structure

Static Data Structures vs. Dynamic Data Structures
 A static data structure has a fixed size; a dynamic data structure has a dynamic
size, which means it can be increased and decreased.
 Static: memory is allocated at compile time; dynamic: memory is allocated at
runtime.
 Static: no dynamic memory allocation or deallocation; dynamic: dynamic memory
allocation and deallocation.
 Static: not efficient for frequent insertions/deletions; dynamic: efficient for
frequent insertions/deletions.
 Static examples: arrays, stacks, queues; dynamic examples: linked lists, trees,
hash tables.

Features of Static Data Structures
 The size of a static data structure is fixed and determined at compile time.
 Memory allocation for static data structures is done at compile time and cannot be
changed during runtime.
 Static data structures can be accessed randomly or sequentially, and access time is
usually faster than for dynamic data structures.
 Static data structures are best suited for applications that have a known maximum
size and a fixed number of elements.
 Insertion and deletion operations can be slower in static data structures,
especially if the structure needs to be resized or elements need to be shifted.

Features of Dynamic Data Structures
 The size of a dynamic data structure is not fixed and can change during runtime.
 Memory allocation for dynamic data structures is done during runtime using
techniques such as heap memory allocation or pointer-based data structures.
 Dynamic data structures can grow or shrink as needed and are best suited for
applications that have varying sizes or a changing number of elements.
 Insertion and deletion operations are generally faster in dynamic data structures
since elements can be added or removed without the need to shift other elements.
 Accessing elements in a dynamic data structure may be slower than in a static data
structure since memory may be spread out in different locations.

Advantages of Static Data Structures
 Accessing elements is faster than in dynamic data structures.
 Memory is allocated at compile time, eliminating the need for additional memory
management during runtime.
 They can be more efficient for smaller datasets with fixed sizes.
 They are easier to implement and require less overhead compared to dynamic data
structures.

Disadvantages of Static Data Structures
 They have a fixed size, which can lead to memory wastage for larger or
variable-sized data sets.
 They are not flexible; adding or removing elements can be inefficient and may
require resizing the entire structure.
 They can lead to memory fragmentation and may not be suitable for long-term use.
 They can be challenging to implement for complex or dynamic applications.

Advantages of Dynamic Data Structures
 They can change in size and shape during runtime, making them flexible and
adaptable to different applications.
 Memory usage is optimized, as it can grow or shrink based on actual data
requirements.
 Insertion and deletion of elements are faster and more efficient than in static
data structures.
 They are well-suited for large and complex data sets.

Disadvantages of Dynamic Data Structures
 Memory allocation and deallocation can lead to performance issues, such as
fragmentation, overhead, or memory leaks.
 They may require more memory than static data structures for bookkeeping and
pointer storage.
 Accessing elements can be slower than in static data structures due to the need
for pointer traversal or indirection.
 They can be more challenging to implement and debug than static data structures.

=============



Hash Table

Hash Table
 Uses of Hash Tables
 Drawback of Hash function
 Hashing
 Terminologies in hashing
 Two techniques for searching
 Linear Searching
 Binary Searching
 Methods of Hash functions
 Division Method
 Multiplication Method
 Universal Hashing
 Collision
 Causes of Hash Collisions
 Poor Hash Function
 High Load Factor
 Similar Keys
 Two types Collision Resolution Techniques
 Open Addressing
 Linear probing
 Quadratic probing
 Double Hashing technique
 Close Addressing
 Advantages of Hashing
 Disadvantages of Hashing
 Applications of Hashing

A hash table is a data structure that uses a special function, known as a hash
function, to map a given key to an index so that elements can be accessed faster.
A hash table is also referred to as a hash map (for key-value pairs) or a hash set
(for keys only). It uses a hash function to map keys to a fixed-size array.
This allows faster search, insertion, and deletion operations.
A Hash table stores some information, and the information has basically two
main components
 key
 value



The hash table can be implemented with the help of an associative array. The
efficiency of mapping depends upon the efficiency of the hash function used for
mapping.
Example - If the key is Arun and the value is the phone number, then
 the key is passed to the hash function, which returns an index - Hash(key) = index;
 for this key, the hash function gives the index - Hash(Arun) = 3.
The entry for Arun is therefore stored at index 3.

Uses of Hash Tables

 Checking if something is in a collection (like finding a book in a library).


 Storing unique items and quickly finding them (like storing phone numbers).
 Connecting values to keys (like linking names to phone numbers).

Drawback of Hash function


Ideally, a hash function assigns each key a unique index. Sometimes a hash table
uses an imperfect hash function that causes a collision, because the hash function
generates the same index for two different keys.
Hashing
Hashing is a technique used in data structures that efficiently stores and
retrieves data in a way that allows for quick access. It involves mapping data to a
specific index in a hash table using a hash function that enables fast retrieval of
information based on its key. This method is commonly used in databases, caching
systems, and various programming applications to optimize search and retrieval
operations. The great thing about hashing is, the user can achieve all three operations
(search, insert and delete) in O(1) time on average.

Hashing is a technique used in data structures to store and retrieve data efficiently.
It involves using a hash function to map data items to a fixed-size array which is
called a hash table.



Terminologies in hashing
 Hash Function – Data items (keys) are provided as input to the hash function.
 Hash Code - The hash function processes the key and gives a hash code. This hash
code is an integer value that is used as an index.
 Hash Table - The hash code points to a specific location within the hash table.

Hashing is one of the searching techniques that uses a constant time. The
time complexity in hashing is O(1).

Two techniques for searching


 Linear searching - The worst time complexity is O(n)
 Binary searching - The worst time complexity is O(log n)

In both searching techniques, the search time depends upon the number of
elements, but a technique that takes constant time is desired; the hashing
technique provides constant time.
In the hashing technique, a hash table and a hash function are used. Using the hash
function, the address at which a value is to be stored can be calculated.
Hashing creates key-value pairs. If the key is given, then the algorithm computes
the index at which the value is stored.
Index = hash(key)

The hash function is a function that takes a key and returns an index into the hash
table.
The goal of a hash function is to distribute keys evenly across the hash table,
minimizing collisions when two keys map to the same index.
Hashing is a technique to convert a range of key values into a range of indexes of an
array. The modulo operator is used to map key values into that range.

Example - A hash table of size 20 and the following items to be stored. Items are
in the (key, value) format.



 (1,20)
 (2,70)
 (42,80)
 (4,25)
 (12,44)
 (14,32)
 (17,11)
 (13,78)
 (37,98)

S. No. Key Hash Array Index


1 1 1 % 20 = 1 1
2 2 2 % 20 = 2 2
3 42 42 % 20 = 2 2
4 4 4 % 20 = 4 4
5 12 12 % 20 = 12 12
6 14 14 % 20 = 14 14
7 17 17 % 20 = 17 17
8 13 13 % 20 = 13 13
9 37 37 % 20 = 17 17
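The index column above can be reproduced with a one-line division-method hash (the function name is chosen for this example):

```python
def hash_index(key, table_size=20):
    """Division-method hash: map a key to an array index via modulo."""
    return key % table_size

# Keys 2 and 42 both map to index 2, and 17 and 37 both map to
# index 17 — these are the collisions visible in the table above.
keys = [1, 2, 42, 4, 12, 14, 17, 13, 37]
indices = {k: hash_index(k) for k in keys}
```

Note that the modulo operation cannot avoid collisions by itself; keys that differ by a multiple of the table size always share an index.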

Methods of Hash functions


 Division Method - Key % Hash Table Size
 Multiplication Method - (Key * Constant) % Hash Table Size
 Universal Hashing - A family of hash functions designed to minimize collisions
Division Method
In the division method, the hash function is h(ki) = ki % m and where m is the size of
the hash table.
Example - if the key value is 6 and the size of the hash table is 10, applying the
hash function to key 6 gives the index



h(6) = 6%10 = 6
The index is 6 at which the value is stored.
In the above example, the value is stored at index 6. If the key value is 26, then
the index is h(26) = 26 % 10 = 6.
Therefore, two values map to the same index, 6, and this leads to the
collision problem. To resolve such collisions, there are techniques known as
collision resolution techniques.

Collision
A hash collision occurs when two different keys map to the same index in a
hash table. This can happen even with a good hash function, especially if the hash
table is full or the keys are similar.

Causes of Hash Collisions


 Poor Hash Function - A hash function that does not distribute keys evenly
across the hash table can lead to more collisions.
 High Load Factor - A high load factor (ratio of keys to hash table size) increases
the probability of collisions.
 Similar Keys - Keys that are similar in value or structure are more likely to
collide.
When two different keys hash to the same index, the resulting conflict between the
two values is known as a collision.

Two types Collision Resolution Techniques

 Open Addressing - It is also known as Closed Hashing


 Linear Probing - Search for an empty slot sequentially
 Quadratic Probing - Search for an empty slot using a quadratic function
 Closed Addressing - It is also known as Open Hashing
 Chaining - Store colliding keys in a linked list or binary search tree at
each index
 Cuckoo Hashing - Use multiple hash functions to distribute keys



Separate Chaining vs. Open Addressing
 In separate chaining, keys are stored inside the hash table as well as outside
it; in open addressing, all keys are stored only inside the hash table, and no key
is present outside it.
 In separate chaining, the number of keys to be stored can even exceed the size of
the hash table; in open addressing, the number of keys can never exceed the size of
the hash table.
 Deletion is easier in separate chaining and difficult in open addressing.
 Separate chaining requires extra space for the pointers that store keys outside
the hash table; open addressing requires no extra space.
 Cache performance is poor in separate chaining, because of the linked lists that
store keys outside the table; it is better in open addressing, because no linked
lists are used.
 In separate chaining, some buckets of the hash table are never used, which leads
to wastage of space; in open addressing, buckets may be used even if no key maps
directly to them.

Open Hashing
In open hashing, collisions are resolved using the chaining method.



The list of key values
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3

In this case, the simple division method h(k) = k % m cannot be used directly; the
index is computed as h(k) = (2k + 3) % 10.

The index of each key value:
Key 3: index = h(3) = (2(3)+3) % 10 = 9, so the value 3 is stored at index 9.
Key 2: index = h(2) = (2(2)+3) % 10 = 7, so the value 2 is stored at index 7.
Key 9: index = h(9) = (2(9)+3) % 10 = 1, so the value 9 is stored at index 1.
Key 6: index = h(6) = (2(6)+3) % 10 = 5, so the value 6 is stored at index 5.
Key 11: index = h(11) = (2(11)+3) % 10 = 5, so the value 11 would also be stored at
index 5. Now two values (6, 11) map to the same index, 5. This leads to the collision
problem, so the chaining method is used to avoid the collision: one more list is
created, the value 11 is added to it, and the newly created list is linked to the
list having value 6.



Key 13: index = h(13) = (2(13)+3) % 10 = 9, so the value 13 would be stored at
index 9. Now two values (3, 13) map to the same index, 9. This leads to the
collision problem, so the chaining method is used: one more list is created, the
value 13 is added to it, and the newly created list is linked to the list having
value 3.
Key 7: index = h(7) = (2(7)+3) % 10 = 7, so the value 7 would be stored at index 7.
Now two values (2, 7) map to the same index, 7, so one more list is created, the
value 7 is added to it, and the newly created list is linked to the list having
value 2.
Key 12: index = h(12) = (2(12)+3) % 10 = 7, so the value 12 must be stored at
index 7, but the value 2 already exists there. A new list is created, 12 is added
to it, and the newly created list is linked at the end of the chain having value 7.

The calculated index value associated with each key value

key Location(u)
3 ((2*3)+3)%10 = 9
2 ((2*2)+3)%10 = 7
9 ((2*9)+3)%10 = 1
6 ((2*6)+3)%10 = 5
11 ((2*11)+3)%10 = 5
13 ((2*13)+3)%10 = 9
7 ((2*7)+3)%10 = 7
12 ((2*12)+3)%10 = 7
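The worked example above can be reproduced with a small separate-chaining sketch, using Python lists as the chains (the function name is chosen for this example):

```python
def chained_hash_insert(keys, m=10):
    """Separate chaining with h(k) = (2k + 3) % m: every key goes to
    its computed index, and colliding keys are appended to the chain
    (a list) already at that index."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[(2 * k + 3) % m].append(k)
    return table

# Same keys as the worked example: chains form at indexes 5, 7, and 9.
table = chained_hash_insert([3, 2, 9, 6, 11, 13, 7, 12])
```

Running this yields exactly the chains derived by hand: (3, 13) at index 9, (2, 7, 12) at index 7, (6, 11) at index 5, and 9 alone at index 1.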
Closed Hashing

Like open hashing, closed hashing is also a technique used for collision
resolution in hash tables. Unlike open hashing, where collisions are resolved by
chaining elements in separate chains, closed hashing aims to store all elements directly
within the hash table itself without the use of additional data structures like linked lists.



In closed hashing, when a collision occurs and a hash slot is already occupied, the
algorithm probes for the next available slot in the hash table until an empty slot
is found. Search, insert, and delete operations are conducted in closed hashing as
follows:

 Search: To search for an element in closed hashing, the user calculate its hash
value and probe through the slots until the user find the desired element or an
empty slot. If the element is found, the user can retrieve it. If an empty slot is
encountered during probing, the element is not present in the hash table.
 Insert: When inserting a new element, its hash value is calculated and the slots
are probed until an empty slot is found. Once an empty slot is found, the element
is inserted into that slot.
 Delete: Deleting an element in closed hashing typically involves marking the slot
as "deleted" or using a special marker to indicate that the slot was once occupied
but is now vacant.

The probing process can be done in various ways, such as linear probing, quadratic
probing, or double hashing.

 Linear probing
 Quadratic probing
 Double Hashing technique

Linear Probing
Linear probing is one of the forms of open addressing. Each cell in the hash table
contains a key-value pair, so when a collision occurs because a new key maps to a cell
already occupied by another key, the linear probing technique searches for the closest
free location and adds the new key to that empty cell. In this case, searching is
performed sequentially, starting from the position where the collision occurred, until
an empty cell is found.
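The sequential search for a free cell can be sketched as follows — a minimal illustration that uses the division method k % m for the initial index and assumes the table never fills up (a full table would loop forever without an extra check):

```python
def linear_probe_insert(keys, m):
    """Open addressing with linear probing: on a collision, scan
    forward (wrapping around the table) to the next empty slot.
    Assumes len(keys) < m so an empty slot always exists."""
    table = [None] * m
    for k in keys:
        idx = k % m
        while table[idx] is not None:   # slot occupied: probe the next one
            idx = (idx + 1) % m
        table[idx] = k
    return table
```

For example, with m = 7 the keys 9, 2, and 16 all hash to index 2, so probing places them in consecutive slots 2, 3, and 4.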



Quadratic Probing

In linear probing, searching is performed linearly. In contrast, quadratic probing is
an open addressing technique that uses a quadratic polynomial for searching until an
empty slot is found.
It can be defined as: insert ki at the first free location from (u + i^2) % m,
where i = 0 to m-1 and u is the index computed by the hash function.



Double Hashing
Double hashing is an open addressing technique which is used to avoid the
collisions. When the collision occurs then this technique uses the secondary hash of the
key. It uses one hash value as an index to move forward until the empty location is
found.
In double hashing, two hash functions are used. Suppose h1(k) is one of the hash
functions used to calculate the locations whereas h2(k) is another hash function. It can
be defined as "insert ki at first free place from (u + v*i) % m where i= (0 to m-1)". In this
case, u is the location computed using the hash function and v is equal to (h2 (k) % m).
Consider an example of the same kind as the one used for quadratic probing.
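The scheme described above can be sketched as follows; the particular choices h1(k) = k % m and h2(k) = 1 + (k % (m - 1)) are illustrative assumptions for this example, not fixed by the text:

```python
def double_hash_insert(keys, m):
    """Double hashing: the probe step size v comes from a second hash
    function, so colliding keys follow different probe sequences.
    Assumes len(keys) < m so an empty slot always exists."""
    table = [None] * m
    for k in keys:
        u = k % m                 # primary hash gives the starting index
        v = 1 + (k % (m - 1))     # secondary hash gives the step size
        i = 0
        while table[(u + v * i) % m] is not None:
            i += 1                # probe (u + v*i) % m for i = 0, 1, 2, ...
        table[(u + v * i) % m] = k
    return table
```

For example, with m = 7 the keys 5, 12, and 19 all start at index 5, but their differing step sizes place 12 at index 6 and 19 at index 0 instead of clustering them.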



Applications of Hashing

 Dictionaries - To implement a dictionary so that the user can quickly search a


word
 Databases - Hashing is used in database indexing. There are two popular ways
to implement indexing, search trees (B or B+ Tree) and hashing.
 Cryptography - When the user create a password on a website, they typically
store it after applying a hash function rather than plain text
 Caching - Storing frequently accessed data for faster retrieval. For example
browser caches, the user can use URL as keys and find the local storage of the
URL.
 Symbol Tables - Mapping identifiers to their values in programming languages



 Network Routing - Determining the best path for data packets
 Associative Arrays - SQL library functions allow the user to retrieve data as
associative arrays so that the retrieved data in RAM can be quickly searched for a
key. A hash table is a data structure which stores data in an associative manner:
data is stored in an array format, where each data value has its own unique index
value, so access becomes very fast if the user knows the index of the desired data.
It is thus a data structure in which insertion and search operations are very fast
irrespective of the size of the data. A hash table uses an array as the storage
medium and a hash technique to generate the index where an element is to be
inserted or located.

Advantages of Hashing
 Data Retrieval Speed - Hashing allows for fast data retrieval by mapping a key (or
data) to a unique hash value, which serves as an index. This enables efficient
lookup operations, typically with constant time complexity O(1).
 Data Integrity - Hashing is commonly used to verify data integrity. By comparing
the hash value of received data with the original hash value, the user can detect if
the data has been altered or corrupted during transmission.
 Password Storage - Hashing is widely used for securely storing passwords.
Instead of storing plain-text passwords, systems store the hash values of
passwords. Even if the hash is compromised, it’s challenging for attackers to
reverse it to obtain the original password.
 Data Structures - Hashing is a fundamental component of hash tables, which are
versatile data structures used in various applications like databases, caching, and
search engines for efficient data retrieval.
 Cryptography - Hash functions are crucial in cryptographic applications, such as
digital signatures, message authentication codes (MACs), and data encryption
algorithms. They provide security by transforming data into a fixed-size hash that
is difficult to reverse engineer.
 Data De-duplication - Hashing is used in data de-duplication techniques to
identify and eliminate duplicate data in storage systems, saving storage space.

Disadvantages of Hashing
 Collision Risk - Collisions occur when two different inputs produce the same
hash value. While good hash functions aim to minimize collisions, they are still
possible. Collisions can have security implications and impact the efficiency of
hash tables.
 Non-Reversible - Hash functions are designed to be one-way functions, meaning
it’s computationally infeasible to reverse the process and obtain the original input
data. This can be a disadvantage when reverse lookup is required.



 Deterministic - Hash functions are deterministic, meaning the same input will
always produce the same hash value. This can be problematic for security if an
attacker knows the input values and hash function.
 Limited Range - Hash functions have a limited output range (fixed length), which
means that no matter how large the input data is, the hash value will always be of
a fixed size. This can lead to hash collisions in situations with many possible
inputs.
 Performance Impact - Computing hash values can be computationally intensive
for complex data structures or large datasets. This can impact the performance of
applications that heavily rely on hashing.
 Security Vulnerabilities - If a weak or poorly designed hash function is used, it
can be vulnerable to various attacks, such as collision attacks, rainbow table
attacks, and preimage attacks.

| Technique         | Advantages                                         | Disadvantages                                 | Performance                                                                                            |
|-------------------|----------------------------------------------------|-----------------------------------------------|--------------------------------------------------------------------------------------------------------|
| Chaining          | Efficient for high collision scenarios.            | Requires extra memory for linked lists.       | Lookup, insertion, and deletion have O(1 + k/n) average complexity, where k is the number of elements.   |
| Linear Probing    | Simple implementation and cache-friendly.          | Prone to primary clustering.                  | Lookup, insertion, and deletion have O(1) average complexity, but can degrade to O(n) in worst cases.    |
| Quadratic Probing | Reduces primary clustering.                        | Suffers from secondary clustering.            | Lookup, insertion, and deletion have O(1) average complexity, but can degrade to O(n) in worst cases.    |
| Double Hashing    | Uniform distribution and avoids clustering issues. | Requires careful selection of hash functions. | Lookup, insertion, and deletion have O(1) average complexity, but performance can degrade if hash functions produce many collisions. |

=============



Unit – V - Internal Sorting

Insertion Sort - Quick Sort - 2 Way Merge Sort - Heap Sort – Shell Sort - Sorting on
Several Keys. Files: Files, Queries and Sequential organizations – Index Techniques -File
Organizations.

------------------------

Internal Sorting
File Organization
Indexing
Hash File Organization
Merge Sort

Internal Sorting

Internal Sorting
• Types of Internal Sorting
• Bubble Sort
• Quick Sort
• Insertion Sort
• Heap Sort
• Shell sort
• Merge Sort
• Disadvantages of Merge Sort
• Advantages of Merge Sort
• Applications of Merge Sort
• Complexity Analysis of Merge Sort
• Recurrence Relation of Merge Sort
• Difference between Internal and External Sort
• Importance of Internal Sorting

Sorting plays an important role in efficiently organising and arranging data.
Sorting is a process that is required in many places and must be handled properly in
order to be used efficiently. Sorting is the process of arranging two or more numbers in
order, ascending or descending, typically by repeatedly swapping elements.

UNIT - V - Internal Sorting 1


Internal sorting is the type of sorting that takes place entirely inside the main
memory of the computer; it is possible only when the data to be sorted is small enough
to be managed by the main memory. By contrast, when data must be read from and written
to slower external media, the sorting process slows down significantly.

Types of Internal Sorting

Bubble Sort: A simple and intuitive sorting algorithm that repeatedly steps through the
list, compares adjacent elements, and swaps them if they are in the wrong order. Bubble
sort has limited practical applications due to its inefficiency on large datasets.
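The stepping-and-swapping behaviour described above can be written as a short sketch:

```python
def bubble_sort(a):
    """Repeatedly compare adjacent pairs and swap those that are out of order."""
    n = len(a)
    for i in range(n - 1):
        # After each pass, the largest unsorted item has bubbled to the right.
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

bubble_sort([5, 1, 4, 2, 8])   # -> [1, 2, 4, 5, 8]
```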

Quick Sort - A divide-and-conquer algorithm that works by selecting a 'pivot' element
and partitioning the array into two sub-arrays, according to whether elements are less
than or greater than the pivot. Quick sort is efficient and often the preferred choice
for sorting large datasets.



Advantages
• Average-case performance is O(n log n).
• In-place sorting, meaning that it does not require additional memory space for
sorting.
• Easy to implement and understand.
Disadvantages
• The worst-case performance is O(n^2), which is non-optimal.
• Not stable, meaning that it does not preserve the order of equal elements in the
input data.
• Not as efficient as merge sort for very large data sets.
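A minimal sketch of the pivot-and-partition idea follows. Note that for clarity this version builds new lists, so unlike the in-place variant described in the advantages above, it uses extra memory:

```python
def quick_sort(a):
    """Pick a pivot, partition into smaller/equal/greater, recurse on each side."""
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quick_sort(less) + equal + quick_sort(greater)

quick_sort([33, 10, 42, 19])   # -> [10, 19, 33, 42]
```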
Insertion Sort - This algorithm builds the final sorted array one item at a time. It is much
less efficient on large lists compared to more advanced algorithms such as quick sort and
merge sort.



Advantages
• Simple and easy to implement.
• Efficient for small data sets.
• In-place sorting, meaning that it does not require additional memory space for
sorting.
Disadvantages
• The worst-case performance is O(n^2), which is non-optimal.
• Not efficient for large data sets.
• Not as adaptive as other algorithms, meaning that it does not adjust its performance
depending on the input data.
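The build-the-sorted-array-one-item-at-a-time behaviour can be sketched as:

```python
def insertion_sort(a):
    """Grow a sorted prefix: shift larger items right and insert each new key."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]      # shift the larger element one slot right
            j -= 1
        a[j + 1] = key           # drop the key into its correct position
    return a

insertion_sort([35, 14, 33, 19])   # -> [14, 19, 33, 35]
```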
Heap Sort - A comparison-based sorting algorithm uses a binary heap data structure.
While not as popular as quick sort or merge sort, heap sort has its applications in specific
scenarios.
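The binary-heap idea can be illustrated with Python's standard heapq module. Classic heap sort works in place with a max-heap, so this min-heap version is a simplified sketch rather than the textbook algorithm:

```python
import heapq

def heap_sort(a):
    """Build a binary min-heap, then pop the smallest remaining item repeatedly."""
    heapq.heapify(a)                              # O(n) heap construction, in place
    return [heapq.heappop(a) for _ in range(len(a))]

heap_sort([35, 14, 33, 19, 42])   # -> [14, 19, 33, 35, 42]
```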

Shell sort

Shell sort is a highly efficient sorting algorithm based on the insertion sort
algorithm. It avoids the large shifts that occur in insertion sort when a smaller
value is on the far right and has to be moved to the far left.

This algorithm first uses insertion sort on widely spread elements to sort them, and
then sorts the less widely spaced elements. This spacing is termed the interval. The
interval is calculated based on Knuth's formula as

h = h * 3 + 1

where h is the interval, with initial value 1.

This algorithm is quite efficient for medium-sized data sets; its average-case
complexity is about O(n^1.5) for common gap sequences, while the worst case is
O(n^2), where n is the number of items.

Shell Sort Algorithm


• Initialize the value of h.

• Divide the list into smaller sub-list of equal interval h.

• Sort these sub-lists using insertion sort.

• Repeat until complete list is sorted.

• For example, with the list {35, 33, 42, 10, 14, 19, 27, 44} and interval h = 4, the sub-lists are {35, 14}, {33, 19}, {42, 27} and {10, 44}
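The steps above can be sketched as follows; the interval update follows the h = h*3 + 1 Knuth formula given earlier:

```python
def shell_sort(a):
    """Insertion-sort elements h apart, then shrink the interval h each pass."""
    # Grow h using Knuth's formula h = h*3 + 1 while it stays below len(a).
    h = 1
    while h * 3 + 1 < len(a):
        h = h * 3 + 1
    while h >= 1:
        # Gapped insertion sort: each h-spaced sub-list becomes sorted.
        for i in range(h, len(a)):
            key, j = a[i], i
            while j >= h and a[j - h] > key:
                a[j] = a[j - h]
                j -= h
            a[j] = key
        h //= 3                  # shrink the interval (13 -> 4 -> 1 -> stop)
    return a

shell_sort([35, 33, 42, 10, 14, 19, 27, 44])
```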





Advantages:
• Efficient for medium-sized data sets.
• Adaptive, meaning that it can adjust its performance depending on the input data.
• In-place sorting, meaning that it does not require additional memory space for
sorting.
Disadvantages:
• The complexity of implementation.
• Not stable, meaning that it does not preserve the order of equal elements in the
input data.
• Non-optimal worst-case performance of O(n^2).

Merge Sort

Merge sort is a sorting algorithm that follows the divide and conquer approach. It
works by recursively dividing the input array into smaller sub-arrays, sorting those
sub-arrays, and then merging them back together to obtain the sorted array.
In simple terms, the process of merge sort is to divide the array into two halves,
sort each half, and then merge the sorted halves back together. This process is
repeated until the entire array is sorted.



Merge sort is a popular sorting algorithm known for its efficiency and stability. It
follows the divide-and-conquer approach to sort a given array of elements:

▪ Divide: Divide the list or array recursively into two halves until it can be
divided no further.
▪ Conquer: Each sub array is sorted individually using the merge sort algorithm.
▪ Merge: The sorted sub arrays are merged back together in sorted order. The
process continues until all elements from both sub arrays have been merged.

Merge sort is one of the most optimized and most widely used sorting algorithms in
industry, but there are other sorting algorithms that are better optimized for
different cases.
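The divide, conquer, and merge steps can be sketched as:

```python
def merge_sort(a):
    """Split in half, sort each half recursively, then merge the sorted halves."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge step: repeatedly take the smaller front element of the two halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # <= keeps equal elements in order (stable)
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

merge_sort([38, 27, 43, 3, 9, 82, 10])   # -> [3, 9, 10, 27, 38, 43, 82]
```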

Recurrence Relation of Merge Sort

The recurrence relation of merge sort is T(n) = 2T(n/2) + O(n), where:

▪ T(n) represents the total time taken by the algorithm to sort an array of size n.
▪ 2T(n/2) represents the time taken by the algorithm to recursively sort the two
halves of the array. Since each half has n/2 elements, we have two recursive calls
with input size (n/2).
▪ O(n) represents the time taken to merge the two sorted halves.
Complexity Analysis of Merge Sort
• Time Complexity:
▪ Best Case: O(n log n), When the array is already sorted or nearly sorted.
▪ Average Case: O(n log n), When the array is randomly ordered.
▪ Worst Case: O(n log n), When the array is sorted in reverse order.
• Auxiliary Space: O(n), Additional space is required for the temporary array used
during merging.
Applications of Merge Sort
• Sorting large datasets
• External sorting (when the dataset is too large to fit in memory)
• Inversion counting
• Merge Sort and its variations are used in library methods of programming
languages.
For example, its variation Tim Sort is used in Python, Java, Android and Swift. The
main reason it is preferred for sorting non-primitive types is its stability, which
quick sort lacks.
• It is a preferred algorithm for sorting Linked lists.
• It can be easily parallelized as we can independently sort sub arrays and then
merge.



• The merge function of merge sort can be used to efficiently solve problems like
the union and intersection of two sorted arrays.
Advantages of Merge Sort
• Stability - Merge sort is a stable sorting algorithm, which means it maintains the
relative order of equal elements in the input array.
• Guaranteed worst-case performance - Merge sort has a worst-case time
complexity of O(N logN) , which means it performs well even on large datasets.
• Simple to implement - The divide-and-conquer approach is straightforward.
• Naturally Parallel - We independently merge sub arrays that makes it suitable
for parallel processing.
Disadvantages of Merge Sort
• Space complexity - Merge sort requires additional memory to store the merged
sub-arrays during the sorting process.
• Not in-place - Merge sort is not in-place sorting algorithm, which means it
requires additional memory to store the sorted data. This can be a disadvantage in
applications where memory usage is a concern.
• Slower than quick sort in practice: quick sort is more cache-friendly because it works in-place.

Importance of Internal Sorting

Internal sorting plays a crucial role in the efficient operation of computer systems and
databases.

• Data Retrieval Efficiency - When data is sorted, searching for specific information
becomes faster and more efficient. This is crucial in applications like search
engines and database management systems.
• Resource Optimization - Internal sorting reduces the need for frequent data
transfer between primary and secondary storage, which helps conserve resources
and enhance the overall system performance.
• User Experience - In user-facing applications, such as e-commerce websites,
internal sorting ensures that search results are presented quickly and in a user-
friendly manner, enhancing the user experience.
• Memory Constraints - Sorting large datasets in primary memory can be
challenging if memory resources are limited.
• Algorithm Selection - Choosing the right sorting algorithm is critical. The selection
depends on the data size, distribution, and the desired efficiency.
• Stability and preservation – Some sorting algorithms alter the relative order of
equal elements, which matters in applications where stability and preservation of
the original order are required.



Difference between Internal and External Sort

| Internal Sort | External Sort |
|---------------|---------------|
| Internal sorting is used when the input data can be held in the main memory all at once. | External sorting is used when the input data cannot be held in the main memory all at once. |
| The data to be sorted should be small enough to fit in the main memory. | It is used when the data to be sorted cannot fit into the main memory all at once. |
| The storage device used in this method is only main memory (RAM). | Both secondary memory (hard disk) and main memory are used in this method. |
| While sorting is in progress, all the data to be sorted is stored in the main memory at all times. | While sorting, data is loaded into the main memory in small chunks; all the data is stored outside the memory, e.g. on a hard disk. |
| Algorithms used for internal sort are Bubble sort, Insertion sort, Quick sort, Heap sort, etc. | Algorithms used for external sort are Merge sort, Tape sort, External radix sort, etc. |
==========
File Organization

File Organization
• There are two important features of file:
• File Organization
• Three types of organizing the file
• Sequential access file organization
• Advantages of sequential file
• Disadvantages of sequential file
• Direct access files organization
• Advantages of direct access file organization
• Disadvantages of direct access file organization
• Indexed sequential access file organization
• Advantages of Indexed sequential access file organization
• Disadvantages of Indexed sequential access file organization



File is a collection of records related to each other. The file size is limited by the
size of memory and storage medium.
There are two important features of file:

• File Activity - specifies the percentage of actual records processed in a single run.

• File Volatility - addresses the properties of record changes. It helps to
increase the efficiency of disk design compared to tape.
File Organization
File organization ensures that records are available for processing. It is used to
determine an efficient file organization for each base relation.

For example, if we want to retrieve employee records in alphabetical order of name,
sorting the file by employee name is a good file organization. However, if we want to
retrieve all employees whose marks are in a certain range, a file ordered by employee
name would not be a good file organization.

Types of File Organization

Three types of organizing the file

• Sequential access file organization


• Direct access files organization
• Indexed sequential access file organization

Sequential access file organization


• Storing and sorting records in contiguous blocks within files on tape or disk is
called sequential access file organization.
• In sequential access file organization, all records are stored in a sequential order.
The records are arranged in the ascending or descending order of a key field.
• Sequential file search starts from the beginning of the file and the records can be
added at the end of the file.
• In sequential file, it is not possible to add a record in the middle of the file
without rewriting the file.
Advantages of sequential file

• It is simple to program and easy to design.

• A sequential file makes the best use of storage space.



Disadvantages of sequential file

• Searching a sequential file is a time-consuming process.


• It has high data redundancy.
• Random searching is not possible.
Direct access file organization
• Direct access file is also known as random access or relative file organization.
• In direct access file, all records are stored in direct access storage device
(DASD), such as hard disk. The records are randomly placed throughout the
file.
• The records do not need to be in sequence because they are updated directly
and rewritten back to the same location.
• This file organization is useful for immediate access to large amounts of
information. It is used in accessing large databases.
• It is also called hashing.
Advantages of direct access file organization

• Direct access file helps in online transaction processing system (OLTP) like online
railway reservation system.
• In direct access file, sorting of the records are not required.
• It accesses the desired records immediately.
• It updates several files quickly.
• It has better control over record allocation.
Disadvantages of direct access file organization

• Direct access file does not provide back up facility.


• It is expensive.
• It has less storage space as compared to sequential file.
Indexed sequential access file organization
• Indexed sequential access file combines both sequential file and direct access file
organization.
• In indexed sequential access file, records are stored randomly on a direct access
device such as magnetic disk by a primary key.
• These files have multiple keys. These keys can be alphanumeric in which the
records are ordered is called primary key.



• The data can be accessed either sequentially or randomly using the index. The index
is stored in a file and read into memory when the file is opened.
Advantages of Indexed sequential access file organization

• In indexed sequential access file, sequential file and random file access is
possible.
• It accesses the records very fast if the index table is properly organized.
• The records can be inserted in the middle of the file.
• It provides quick access for sequential and direct processing.
• It reduces the degree of the sequential search.
Disadvantages of Indexed sequential access file organization

• Indexed sequential access file requires unique keys and periodic reorganization.
• Indexed sequential access file takes longer time to search the index for the data
access or retrieval.
• It requires more storage space.
• It is expensive because it requires special software.
• It is less efficient in the use of storage space as compared to other file
organizations.
================

Indexing

Indexing
• Attributes of Indexing
• Two types of file organization
• Sequential File Organization
• Dense Index

Indexing improves database performance by minimizing the number of disk visits
required to fulfil a query. It is a data structure technique used to locate and quickly
access data in databases. Indexes are generated from several database fields. The first
column, the search key, duplicates the primary key or candidate key of the table; to
speed up data retrieval, its values are kept in sorted order (sorting the underlying
data itself is not required). The second column, the data reference or pointer,
contains a set of pointers holding the address of the disk block where that particular
key value can be found.

Attributes of Indexing

• Access Types: This refers to the type of access such as value-based search, range
access, etc.

• Access Time: It refers to the time needed to find a particular data element or set of
elements.

• Insertion Time: It refers to the time taken to find the appropriate space and insert
new data.

• Deletion Time: Time taken to find an item and delete it as well as update the index
structure.

• Space Overhead: It refers to the additional space required by the index.



Two types of file organization are followed by the indexing methods to store the data:

• Sequential File Organization or Ordered Index File - In this, the indices are based on
a sorted ordering of the values. These are generally fast and a more traditional type
of storing mechanism. These ordered or sequential file organizations may store the
data in a dense or sparse format.

• Dense Index - For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with
that search key value.

Sparse Index

The index record appears only for a few items in the data file; each index entry points
to a block of records.

To locate a record, we find the index record with the largest search key value less than
or equal to the search key value we are looking for. We start at the record pointed to
by that index record and proceed along the pointers in the file, that is, sequentially,
until we find the desired record.

Number of accesses required = log₂(n) + 1, where n is the number of blocks occupied by
the index file.
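The locate-then-scan rule can be sketched in Python. The index contents below are hypothetical, with one entry holding the smallest search key of each block:

```python
import bisect

# Hypothetical sparse index: one (first-key, block-number) entry per data block.
index_keys   = [100, 200, 300, 400]   # smallest search key stored in each block
index_blocks = [0, 1, 2, 3]

def locate_block(search_key):
    """Find the index entry with the largest key <= search_key (binary search)."""
    pos = bisect.bisect_right(index_keys, search_key) - 1
    return index_blocks[pos]          # the block is then scanned sequentially

locate_block(250)    # key 250 falls in the block whose first key is 200
```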



========

Hash File Organization

Hash File Organization


• Types of Hash File Organization
• Clustered Indexing or Primary Indexing
• Non-clustered or Secondary Indexing
• Advantages of Indexing
• Disadvantages of Indexing
• Features of Indexing
• Summary

Indices are based on the values being distributed uniformly across a range of
buckets. The buckets to which a value is assigned are determined by a function called a
hash function. There are primarily three methods of indexing

▪ Clustered Indexing - When more than two records are stored in the same file, this
type of storing is known as cluster indexing. By using cluster indexing we can
reduce the cost of searching, since multiple records related to the same thing are
stored in one place; it also makes frequent joins of two or more tables (records)
efficient.
The clustering index is defined on an ordered data file. The data file is ordered on a
non-key field. In some cases, the index is created on non-primary-key columns which
may not be unique for each record. In such cases, in order to identify the records
faster, we group two or more columns together to get unique values and create an
index out of them. This method is known as the clustering index. Essentially, records
with similar properties are grouped together, and indexes for these groupings are
formed.

▪ Students studying each semester, for example, are grouped together. First-semester
students, second-semester students, third-semester students, and so on are
categorized.

▪ Primary Indexing - This is a type of Clustered Indexing wherein the data is sorted
according to the search key and the primary key of the database table is used to
create the index. It is a default format of indexing where it induces sequential file
organization. As primary keys are unique and are stored in a sorted manner, the
performance of the searching operation is quite efficient.

▪ Non-clustered or Secondary Indexing - A non-clustered index gives us a list of
virtual pointers or references to the locations where the data is actually stored.
Data is not physically stored in the order of the index; instead, the data is
present in the leaf nodes.

Example - The contents page of a book. Each entry gives us the page number or
location of the information stored. The actual data on each page of the book is
not ordered, but we have an ordered reference (the contents page) to where the
data actually lies. Only dense ordering is possible in a non-clustered index, as
sparse ordering is not possible because the data is not physically organized
accordingly. It requires more time than a clustered index, because some extra
work is done to extract the data by following the pointer. In the case of a
clustered index, the data is directly present in front of the index.



Non-Clustered Indexing

Multilevel Indexing - With the growth of the size of the database, indices also grow.
As the index is stored in main memory, a single-level index might become too large to
store without multiple disk accesses. Multilevel indexing segregates the main block
into various smaller blocks so that each can be stored in a single block. The outer
blocks are divided into inner blocks, which in turn point to the data blocks. This can
be stored easily in main memory with fewer overheads.



Advantages of Indexing

• Improved Query Performance - Indexing enables faster data retrieval from the
database. The database may rapidly discover rows that match a specific value or
collection of values by generating an index on a column, minimizing the amount of
time it takes to perform a query.

• Efficient Data Access - Indexing can enhance data access efficiency by lowering the
amount of disk I/O required to retrieve data. The database can maintain the data
pages for frequently visited columns in memory by generating an index on those
columns, decreasing the requirement to read from disk.

• Optimized Data Sorting - Indexing can also improve the performance of sorting
operations. By creating an index on the columns used for sorting, the database can
avoid sorting the entire table and instead sort only the relevant rows.

• Consistent Data Performance - Indexing can assist ensure that the database
performs consistently even as the amount of data in the database rises. Without
indexing, queries may take longer to run as the number of rows in the table grows,
while indexing maintains a roughly consistent speed.

• Enforced Data Integrity - By ensuring that only unique values are inserted into
columns that have been indexed as unique, indexing can also be utilized to ensure the
integrity of data. This avoids storing duplicate data in the database, which might
lead to issues when running queries or reports.

• Overall, indexing in databases provides significant benefits for improving query
performance, efficient data access, optimized data sorting, consistent data
performance, and enforced data integrity.

Disadvantages of Indexing

• Increased storage space: Indexing necessitates more storage space to hold the index
data structure, which might increase the total size of the database.

• Increased database maintenance overhead: Indexes must be maintained as data is
added, deleted, or modified in the table, which might raise database maintenance
overhead.

• Slower writes: Indexing can reduce insert and update performance, since the index
data structure must be updated each time data is modified.



• Choosing an index can be difficult: selecting the right indexes for a specific
query or application can be challenging, and may call for a detailed examination of
the data and access patterns.

Features of Indexing

• The development of data structures, such as B-Trees or hash table, that provide
quick access to certain data items is known as indexing. The data structures
themselves are built on the values of the indexed columns, which are utilized to
quickly find the data objects.

• The columns to index are selected based on how frequently they are used and the
sorts of queries they are subjected to. The cardinality, selectivity, and
uniqueness of the indexing columns can also be considered.

• There are several different index types used by databases, including primary,
secondary, clustered, and non-clustered indexes. Based on the particular needs of
the database system, each form of index offers benefits and drawbacks.

• For the database system to function at its best, periodic index maintenance is
required. According to changes in the data and usage patterns, maintenance work
involves building, updating, and removing indexes.

• Database query optimization involves indexing, which is essential. The query
optimizer utilizes the indexes to choose the best execution strategy for a particular
query based on the cost of accessing the data and the selectivity of the indexing
columns.

• Databases make use of a range of indexing strategies, including covering indexes,
index-only scans, and partial indexes. These techniques maximize the utilization of
indexes for particular types of queries and data access.

• When non-contiguous data blocks are stored in an index, it can result in index
fragmentation, which makes the index less effective. Regular index maintenance,
such as defragmentation and reorganization, can decrease fragmentation.

Summary

• Indexing is a very useful technique that helps in optimizing the search time
in database queries.
• The table of database indexing consists of a search key and pointer.
• There are four types of indexing: Primary, Secondary, Clustering and Multivalued
Indexing.



• Primary indexing is divided into two types, dense and sparse.
▪ Dense indexing is used when the index table contains records for every
search key.
▪ Sparse indexing is used when the index table does not use a search key for
every record.
• Multilevel indexing uses B+ Tree and the purpose of indexing is to provide better
performance for data retrieval.

Clustered Index vs. Non-Clustered Index

• Speed: a clustered index is generally faster; a non-clustered index is slower,
since it needs an extra lookup to reach the actual row.
• Memory: a clustered index requires less memory for operations; a non-clustered
index requires more.
• Data: in a clustered index, the index is the main data itself; a non-clustered
index is a separate copy of the indexed columns.
• Count: a table can have only one clustered index but multiple non-clustered
indexes.
• Storage: a clustered index has the inherent ability to determine how the table's
rows are stored on disk; a non-clustered index does not.
• Leaf nodes: the leaf nodes of a clustered index hold the actual data rows; the
leaf nodes of a non-clustered index hold only key values, pointers to the actual
rows, and any included columns.
• Ordering: the clustered key defines the physical order of the rows within the
table; a non-clustered key defines only the order of entries within the index.
• Definition: a clustered index physically reorders the table's records to match
the index; in a non-clustered index, the logical order of the index does not match
the physical order of the rows on disk.
• Size: a clustered index is comparatively large; a non-clustered index is
comparatively small.
• Default: a table's primary key is a clustered index by default; a unique
constraint (for example, on a composite key) acts as a non-clustered index.
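The distinction can be seen concretely with Python's built-in sqlite3 module. This is a small sketch with made-up table and index names: in SQLite, an `INTEGER PRIMARY KEY` column aliases the rowid, so the table's rows are physically kept in that key's order (clustered-like), while `CREATE INDEX` builds a separate structure of key values plus row pointers (non-clustered).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# id aliases the rowid, so rows are stored in id order (the clustered-like key).
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO emp VALUES (?, ?)",
                [(i, "emp%03d" % i) for i in range(100)])

# A secondary index: a separate structure of (name, row-pointer) entries.
cur.execute("CREATE INDEX idx_emp_name ON emp(name)")

# The planner answers name lookups through the non-clustered index,
# then follows the pointer back to the clustered row.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE name = 'emp042'").fetchall()
print(plan)
```

The query plan output mentions `idx_emp_name`, confirming the lookup goes through the secondary index rather than scanning the table.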



Time and Space Complexity of Sorting Algorithms

Algorithm        Best          Average       Worst         Space (Worst)
Selection Sort   O(n²)         O(n²)         O(n²)         O(1)
Bubble Sort      O(n)          O(n²)         O(n²)         O(1)
Insertion Sort   O(n)          O(n²)         O(n²)         O(1)
Heap Sort        O(n log n)    O(n log n)    O(n log n)    O(1)
Quick Sort       O(n log n)    O(n log n)    O(n²)         O(n)
Merge Sort       O(n log n)    O(n log n)    O(n log n)    O(n)
Bucket Sort      O(n + k)      O(n + k)      O(n²)         O(n)
Radix Sort       O(nk)         O(nk)         O(nk)         O(n + k)
Count Sort       O(n + k)      O(n + k)      O(n + k)      O(k)
Shell Sort       O(n log n)    O(n log n)    O(n²)         O(1)
Tim Sort         O(n)          O(n log n)    O(n log n)    O(n)
Tree Sort        O(n log n)    O(n log n)    O(n²)         O(n)
Cube Sort        O(n)          O(n log n)    O(n log n)    O(n)
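The table's best-versus-worst columns can be demonstrated by counting comparisons in a short insertion sort (an illustrative sketch; the comparison counter is added only for the demonstration). On already-sorted input, insertion sort makes one comparison per element, about n, matching its O(n) best case; on reverse-sorted input it makes roughly n²/2 comparisons, matching its O(n²) worst case.

```python
def insertion_sort(a):
    """Insertion sort; returns (sorted list, number of key comparisons)."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # compare key against a[j]
            if a[j] <= key:
                break                 # key is in place; stop shifting
            a[j + 1] = a[j]           # shift the larger element right
            j -= 1
        a[j + 1] = key
    return a, comparisons

n = 8
best,  c_best  = insertion_sort(list(range(n)))          # already sorted
worst, c_worst = insertion_sort(list(range(n, 0, -1)))   # reverse sorted
print(c_best, c_worst)    # 7 28  -> (n-1) vs n(n-1)/2 comparisons
```

For n = 8 the sorted input costs 7 comparisons and the reversed input 28, i.e. n(n−1)/2, which is why the same algorithm appears as O(n) in the Best column and O(n²) in the Worst column.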



The efficiency of an algorithm depends on two parameters, both expressed as
functions of the input size n:
• Space Complexity - the total memory space required by the program for its
execution.
• Time Complexity - the number of times a particular instruction set is executed,
rather than the total elapsed time. Elapsed time is not used because it also
depends on external factors such as the compiler used and the processor's speed.

Types of Time Complexity


• Best-case time complexity - the input for which the algorithm takes the
least time; the best case gives a lower bound on the algorithm's running
time.
Example: in linear search, the best case occurs when the search key is
at the first location of the data.
• Average-case time complexity - take all possible (random) inputs, compute
the running time for each, and divide the total by the number of inputs.
• Worst-case time complexity - the input for which the algorithm takes the
most time; the worst case gives an upper bound on the algorithm's running
time. Example: in linear search, the worst case occurs when the search key
is at the last location of the data, or is absent.
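The linear-search example above can be made concrete by counting probes (a small illustrative sketch; the probe counter exists only for the demonstration). One comparison in the best case, n comparisons in the worst case:

```python
def linear_search(a, target):
    """Scan a left to right; return (index, probes), or (-1, len(a)) if absent."""
    for probes, value in enumerate(a, start=1):
        if value == target:
            return probes - 1, probes
    return -1, len(a)

data = [4, 8, 15, 16, 23, 42]

# Best case: target at the first position -- a single comparison.
print(linear_search(data, 4))     # (0, 1)

# Worst case: target at the last position, or not present -- n comparisons.
print(linear_search(data, 42))    # (5, 6)
print(linear_search(data, 99))    # (-1, 6)
```

Averaged over all positions, a successful search probes about (n + 1)/2 elements, which is where the average-case figure comes from.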

