Data Structures Material
SYLLABUS
DATA STRUCTURES
UNIT I
Introduction: Introduction of Algorithms, Analyzing Algorithms. Arrays: Sparse Matrices
- Representation of Arrays. Stacks and Queues. Fundamentals - Evaluation of Expression Infix to
Postfix Conversion - Multiple Stacks and Queues
UNIT II
Linked List: Singly Linked List - Linked Stacks and Queues - Polynomial Addition -
More on Linked Lists - Sparse Matrices - Doubly Linked Lists and Dynamic Storage
Management - Garbage Collection and Compaction.
UNIT III
Trees: Basic Terminology - Binary Trees - Binary Tree Representations - Binary Tree
Traversal - More on Binary Trees - Threaded Binary Trees - Binary Tree Representation of Trees
- Counting Binary Trees. Graphs: Terminology and Representations - Traversals, Connected
Components and Spanning Trees - Shortest Paths and Transitive Closure
UNIT IV
External Sorting: Storage Devices -Sorting with Disks: K-Way Merging - Sorting with
Tapes. Symbol Tables: Static Tree Tables - Dynamic Tree Tables - Hash Tables: Hashing
Functions - Overflow Handling.
UNIT V
Internal Sorting: Insertion Sort - Quick Sort - 2 Way Merge Sort - Heap Sort - Shell Sort -
Sorting on Several Keys. Files: Files, Queries and Sequential organizations - Index Techniques -
File Organizations.
TEXT BOOKS
1. Ellis Horowitz, Sartaj Sahni, Data and File Structures, Galgotia Publication.
2. Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran, Computer Algorithms,
Galgotia Publication.
Material Prepared by: Mrs. D. KALPANA, Assistant Professor of CS
Text Book(s)
1. Ellis Horowitz, Sartaj Sahni, Data Structures, Galgotia Publication.
2. Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran, Computer Algorithms,
Galgotia Publication.
3. S. Lovelyn Rose, R. Venkatesan, Data Structures, Wiley India Private
Limited, 2015, 1st Edition.
Algorithm
• Algorithm
• Use of the Algorithms
• Need for Algorithms
• Characteristics of an Algorithm
• Properties of an Algorithm
• Types of Algorithms
• Design an Algorithm
• Two Factors of Algorithm Complexity
• Space Complexity
• Time Complexity
• Advantages
• Disadvantages
UNIT 1 - ALGORITHM 2
• An Algorithm is a set of finite rules or instructions to be followed in
calculations or other problem-solving operations.
• A procedure for solving a mathematical problem in a finite number of steps
that frequently involves recursive operations.
• An algorithm refers to a sequence of finite steps to solve a particular problem.
• Algorithms can be simple or complex depending on the user's needs.
Characteristics of an Algorithm
Example - cracking the lock of a 4-digit PIN, the traveling salesman problem, NP
(Non-deterministic Polynomial time) hard problems
• Greedy Algorithm - The solution is built piece by piece, choosing for the next
part the option that gives the most immediate benefit, without reconsidering
earlier choices.
Examples - Kruskal’s algorithm and Prim's algorithm for finding
minimum spanning trees and the algorithm for finding optimum
Huffman trees.
• Dynamic Programming Algorithm - This algorithm uses the concept of
using the already found solution to avoid repetitive calculation of the same
part of the problem. It divides the problem into smaller overlapping
subproblems and solves them.
Examples - Google maps to find paths and search engines
• Randomized Algorithm - A randomized algorithm uses a random number at
some step of its logic. The random number helps in deciding the expected
outcome. Randomized algorithms are used in critical areas like cryptography,
network algorithms, machine learning, and data analysis.
Examples - Throwing Two Dice
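The overlapping-subproblem idea behind dynamic programming can be sketched in a few lines of Python (an illustrative choice; the notes do not prescribe a language). A memoized Fibonacci function stores each already-found solution so the same subproblem is never recomputed:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # fib(n-1) and fib(n-2) are overlapping subproblems:
    # the cache ensures each one is solved only once and then reused.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

Without the cache the naive recursion recomputes the same values exponentially many times; with it, each of the n subproblems is solved once.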
Design an Algorithm
• The problem to be solved by the algorithm must be stated clearly.
• The constraints of the problem must be considered while solving it.
• The input to be taken to solve the problem.
• The output expected when the problem is solved.
• The solution must lie within the given constraints.
1. Space Complexity - The space complexity of an algorithm refers to the amount of
memory required by the algorithm to execute and get the result. It has two parts.
• Fixed Part: This refers to the space that is always required by the algorithm.
Example - Input variables, output variables, program size,
• Variable Part: This refers to the space that can be different based on the
implementation of the algorithm.
Example - Temporary variables, dynamic memory allocation,
recursion stack space, etc.
2. Time Complexity - The time complexity of an algorithm refers to the amount of
time required by the algorithm to execute and get the result. This can be for normal
operations, conditional if-else statements, loop statements, etc.
T(P) = C + TP(I)
where C is the constant time part and TP(I) is the variable part of the algorithm,
which depends on the instance characteristic I.
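The split T(P) = C + TP(I) can be seen in a small Python sketch (the function name and the operation counter are illustrative, not from the notes): summing a list costs one constant setup step plus one step per element of the input.

```python
def sum_with_count(nums):
    ops = 1            # initialisation: the constant part C
    total = 0
    for x in nums:
        total += x
        ops += 1       # one addition per element: the variable part TP(I)
    return total, ops
```

For a list of n elements the counter ends at 1 + n, matching C + TP(I) with TP(I) proportional to the input size.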
• Asymptotic analysis measures the efficiency of algorithms in a way that does
not depend on machine-specific constants and does not require the algorithms
to be implemented.
• It allows the time taken by different programs to be compared.
• Asymptotic notation and analysis are based on input size in the complexity
analysis of algorithms.
• Asymptotic notations are mathematical tools to represent the time complexity
of algorithms for asymptotic analysis.
• Asymptotic Notations are mathematical tools used to analyze the performance
of algorithms by understanding how their efficiency changes as the input size
grows.
• These notations provide a way to express the behavior of an algorithm’s time or
space complexity as the input size approaches infinity.
• Rather than comparing algorithms directly, asymptotic analysis focuses on
understanding the relative growth rates of algorithms’ complexities.
• Asymptotic analysis allows for the comparison of algorithms’ space and time
complexities by examining their performance characteristics as the input size
varies.
Big O (<=) - Worst case - Upper Bound
The rate of growth of an algorithm is less than or equal to the value. The upper
bound of the algorithm is represented by Big O notation; only the function from
above is bounded by Big O. The asymptotic upper bound is given by Big O
notation. It is defined as the upper bound, and the upper bound on an algorithm is
the most amount of time required, so it is the worst-case performance.
Mathematically: Big O is 0 <= f(n) <= Cg(n) for all n >= n0.

Big Omega (Ω) (>=) - Best case - Lower Bound
The rate of growth is greater than or equal to the value. The algorithm’s lower
bound is represented by Omega notation; the asymptotic lower bound is given by
Omega notation. It is defined as the lower bound, and the lower bound on an
algorithm is the least amount of time required, so it is the best case.
Mathematically: Big Omega is 0 <= Cg(n) <= f(n) for all n >= n0.

Big Theta (Θ) (==) - Average case - Tight Bound
The rate of growth is equal to the value. The bounding of the function from above
and below is represented by Theta notation; the exact asymptotic behavior is given
by Theta notation. It is defined as the tightest bound, so it is the average-case time
that the algorithm can perform.
Mathematically: Big Theta is 0 <= C2g(n) <= f(n) <= C1g(n) for all n >= n0.
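The Big O inequality 0 <= f(n) <= Cg(n) for all n >= n0 can be checked numerically for a concrete pair of functions. The choice f(n) = 3n + 2, g(n) = n with witnesses C = 4 and n0 = 2 is an illustrative assumption, not from the notes:

```python
def f(n):
    return 3 * n + 2   # the running-time function being bounded

def g(n):
    return n           # the bounding function: f(n) is O(n)

C, n0 = 4, 2           # witness constants for the Big O definition

# 0 <= f(n) <= C * g(n) must hold for every n >= n0
holds = all(0 <= f(n) <= C * g(n) for n in range(n0, 10_000))
```

Any single pair (C, n0) that makes the inequality hold for all larger n is enough to conclude f(n) = O(g(n)).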
===============
Array
Array
• Arrays in Data Structures
• Need an Array of Data Structures
• Types of Arrays
• Advantages of Arrays
• Disadvantages of Arrays
• An array is a linear data structure that collects elements of the same data type
and stores them in contiguous memory locations.
• Arrays work on an index system starting from 0 to (n-1), where n is the size of
the array.
A class consists of ten students, and the class has to publish their results. If the
user had declared all ten variables individually, it would be challenging to manipulate
and maintain the data.
If more students were to join, it would become more difficult to declare all the
variables and keep track of them. To overcome this problem, arrays came into the picture.
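The ten-student example above can be written directly as an array (a Python list here; the marks themselves are made up for illustration):

```python
# Ten students' marks in one array instead of ten separate variables.
marks = [72, 85, 90, 66, 78, 95, 59, 81, 74, 88]

first = marks[0]                   # index system starts at 0
last = marks[len(marks) - 1]       # ... and ends at n - 1
average = sum(marks) / len(marks)  # easy to manipulate once the data is in one place
```

Adding an eleventh student is a single `append`, rather than declaring a new variable everywhere the data is used.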
Types of Arrays
Advantages of Arrays
• Arrays store multiple elements of the same type with the same name.
• You can randomly access elements in the array using an index number.
• Array memory is predefined, so there is no extra memory loss.
• Arrays avoid memory overflow.
• 2D arrays can efficiently represent the tabular data.
Disadvantages of Arrays
=======================
Sparse Matrix
A sparse matrix is a matrix in which most of the elements are zero.
Example
0 0 3 0 4
0 0 5 7 0
0 0 0 0 0
0 2 6 0 0
Arrays Representation
The two-dimensional array is used to represent a sparse matrix in which there
are three rows as
• Row - Index of the row, where the non-zero element is located
• Column - Index of the column, where the non-zero element is located
• Value - Value of the non-zero element located at the index in row and
column.
• Time Complexity - O(NM), where N is the number of rows in the sparse
matrix and M is the number of columns in the sparse matrix.
• Auxiliary Space - O(NM), where N is the number of rows in the sparse
matrix and M is the number of columns in the sparse matrix.
Linked List Representation
Each node in the linked list has four fields. These four fields are
• Row: Index of the row, where the non-zero element is located
• Column: Index of the column, where the non-zero element is located
• Value: Value of the non-zero element located at the index in row and
column.
• Next node: Address of the next node
Advantages
• Store only the nonzero elements of the matrix, together with their indices.
• Reduce computation time by eliminating operations on zero elements.
• Improve execution time for programs working with large amounts of
sparse data.
Disadvantages
• Not everything can be made into a sparse matrix for representation.
• Holds less information.
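The (row, column, value) representation described above can be sketched in a few lines of Python (the function name is illustrative):

```python
def to_triplet(matrix):
    """Collect (row, column, value) for every non-zero element."""
    triplet = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                triplet.append((i, j, value))
    return triplet

sparse = [[0, 0, 3, 0, 4],
          [0, 0, 5, 7, 0],
          [0, 0, 0, 0, 0],
          [0, 2, 6, 0, 0]]
# Only 6 of the 20 entries need to be stored.
```

Building the triplet list scans every cell once, which is the O(NM) time and auxiliary space noted above.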
=================
Data structures are fundamental concepts for organizing and storing data
efficiently.
• Stack
• Queue
• Linear data structure - Both the stack and the queue are linear data structures,
which means that the elements are stored sequentially and accessed in a single
run.
• Flexible in size - Both the stack and queue are flexible in size, which means
they can grow and shrink according to the requirements at the run-time.
Stacks
• A stack is a linear data structure that follows the Last In First Out -LIFO
principle.
• This means that the last element added to the stack is the first one to be
removed.
Example - in a pile of plates, one can only add or remove the top plate
Operations on Stack
• Push - Adds an element to the top of the stack.
• Pop - Removes and returns the top element of the stack.
• Peek or Top - Returns the top element of the stack without removing it.
• IsEmpty - Checks if the stack is empty.
• Size - Returns the number of elements in the stack.
In a stack, the top is a pointer which is used to keep track of the last inserted element.
To implement a stack with an array, the user should know the maximum size of the
stack in advance, so that the required memory can be allocated.
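The five stack operations can be sketched with a fixed-capacity, array-backed stack in Python (class and method names are illustrative):

```python
class Stack:
    def __init__(self, size):
        self.items = []        # the end of the list is the top of the stack
        self.size = size       # capacity known in advance, as noted above

    def push(self, x):
        if len(self.items) == self.size:
            raise OverflowError("stack overflow")
        self.items.append(x)   # add to the top

    def pop(self):
        if self.is_empty():
            raise IndexError("stack underflow")
        return self.items.pop()  # remove and return the top element

    def peek(self):
        return self.items[-1]  # top element, without removing it

    def is_empty(self):
        return len(self.items) == 0

    def __len__(self):         # Size operation
        return len(self.items)

s = Stack(3)
s.push(10)
s.push(20)
```

Appending to and popping from the end of a Python list are both O(1), matching the constant-time push and pop claimed for stacks.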
Uses of Stack
• Function Call Management: The call stack in programming languages
keeps track of function calls and returns.
• Expression Evaluation: Used in parsing expressions and evaluating
postfix or prefix notations.
• Backtracking: Helps in algorithms that require exploring all possibilities,
such as maze solving and depth-first search.
Applications of Stacks
• Function calls - Stacks are used to keep track of the return addresses of
function calls, allowing the program to return to the correct location after
a function has finished executing.
• Recursion - Stacks are used to store the local variables and return
addresses of recursive function calls, allowing the program to keep track
of the current state of the recursion.
• Expression evaluation - Stacks are used to evaluate expressions in
postfix notation (Reverse Polish Notation).
• Syntax parsing - Stacks are used to check the validity of syntax in
programming languages and other formal languages.
• Memory management - Stacks are used to allocate and manage memory
in some operating systems and programming languages.
Advantages of Stacks
• Simplicity - Stacks are a simple and easy-to-understand data structure,
making them suitable for a wide range of applications.
• Efficiency - Push and pop operations on a stack can be performed in
constant time (O(1)), providing efficient access to data.
• Last-in, First-out (LIFO) - Stacks follow the LIFO principle, ensuring that
the last element added to the stack is the first one removed. This behavior
is useful in many scenarios, such as function calls and expression
evaluation.
• Limited memory usage - Stacks only need to store the elements that have
been pushed onto them, making them memory-efficient compared to other
data structures.
Disadvantages of Stacks
• Limited access - Elements in a stack can only be accessed from the top,
making it difficult to retrieve or modify elements in the middle of the
stack.
• Overflow - If more elements are pushed onto a stack than it can hold, an
overflow error will occur, resulting in a loss of data.
• Not suitable for random access - Stacks do not allow for random access
to elements, making them unsuitable for applications where elements
need to be accessed in a specific order.
• Limited capacity - Stacks have a fixed capacity, which can be a limitation
if the number of elements that need to be stored is unknown or highly
variable.
Queue
• A queue is a linear data structure that follows the First In First Out (FIFO)
principle.
• In a queue, the technical words for insertion and deletion
are enqueue() and dequeue().
Operations on Queue
• Enqueue - Adds an element to the end (rear) of the queue.
• Dequeue - Removes and returns the front element of the queue.
• Front or Peek - Returns the front element of the queue without removing
it.
• IsEmpty - Checks if the queue is empty.
• Size - Returns the number of elements in the queue.
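The queue operations map directly onto Python's `collections.deque`, whose `append` and `popleft` are both O(1); this is one possible sketch, not the only implementation:

```python
from collections import deque

q = deque()
q.append(10)          # Enqueue: add at the rear
q.append(20)
q.append(30)

front = q[0]          # Front/Peek: view the front without removing it
served = q.popleft()  # Dequeue: remove and return the front element
```

A plain Python list would also work, but `pop(0)` on a list is O(n) because every remaining element must shift, which is why a deque (or circular buffer) is the usual choice.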
Uses of Queue
• Task Scheduling - Operating systems use queues to manage tasks and
processes.
• Breadth-First Search (BFS) - In graph traversal algorithms, queues help
in exploring nodes level by level.
• Buffering - Used in situations where data is transferred asynchronously,
such as IO buffers and print spooling.
Applications of Queue
• Multiprogramming - Multiprogramming means multiple programs running
in the main memory at the same time. It is essential to organize these
multiple programs, and they are organized as queues.
• Network - In a network, a queue is used in devices such as a router or a
switch. Another application of a queue is a mail queue, which is a directory
that stores data and control files for mail messages.
• Job Scheduling - The computer has a task to execute a particular number
of jobs that are scheduled to be executed one after another. These jobs are
assigned to the processor one by one which is organized using a queue.
• Shared resources - Queues are used as waiting lists for a single shared
resource.
Advantages of Queue
• A large amount of data can be managed efficiently with ease.
• Operations such as insertion and deletion can be performed with ease as
they follow the first in first out rule.
• Queues are useful when a particular service is used by multiple
consumers.
• Queues are fast for inter-process data communication.
• Queues can be used in the implementation of other data structures.
Disadvantages of Queue
• The operations such as insertion and deletion of elements from the
middle are time-consuming.
• In a classical queue, a new element can only be inserted when the
existing elements are deleted from the queue.
• Searching an element takes O(N) time.
• Maximum size of a queue must be defined prior in case of array
implementation.
Stack vs Queue
Definition:
Stack - A linear data structure that follows the Last In First Out (LIFO) principle.
Queue - A linear data structure that follows the First In First Out (FIFO) principle.
Operations:
Stack - Push (add an item), Pop (remove an item), Peek (view the top item).
Queue - Enqueue (add an item), Dequeue (remove an item), Front (view the first item),
Rear (view the last item).
Access:
Stack - Elements are added and removed from the same end (the top).
Queue - Elements are added at the rear and removed from the front.
Applications:
Stack - Function call management (call stack), expression evaluation and syntax
parsing, undo mechanisms in text editors.
Queue - Scheduling processes in operating systems, managing requests in a printer
queue, breadth-first search in graphs.
Everyday examples:
Stack - Browser history (back button), reversing a word; a stack of plates: you add
and remove plates from the top.
Queue - Customer service lines, CPU task scheduling; a queue at a ticket counter:
the first person in line is the first to be served.
Time complexity:
Stack - Push: O(1), Pop: O(1), Peek: O(1).
Queue - Enqueue: O(1), Dequeue: O(1), Front: O(1), Rear: O(1).
Memory:
Stack - Typically uses a contiguous block of memory or a linked list.
Queue - Typically uses a circular buffer or a linked list.
Implementation:
Stack - Can be implemented using arrays or linked lists.
Queue - Can be implemented using arrays, linked lists, or circular buffers.
==================
Evaluation of Expressions
• Three different types of arithmetic expressions
• Infix Notation
• Prefix Notation
• Postfix Notation
• Parsing Expressions
• Precedence
• Associativity
• Infix Notation
• Prefix (Polish) Notation
• Postfix (Reverse-Polish) Notation
Infix Notation
• An expression in infix notation, like a - b + c, places operators in between
operands.
• It is easy for us humans to read, write, and speak in infix notation but the same
does not go well with computing devices.
• An algorithm to process infix notation could be difficult and costly in terms of
time and space consumption.
Prefix Notation
Postfix Notation
Parsing Expressions
To parse any arithmetic expression the user needs operator precedence and
associativity.
Precedence
Example
Associativity
Associativity describes the rule applied when operators with the same precedence
appear in an expression.
Example - In expression a + b − c, both + and − have the same precedence, then
which part of the expression will be evaluated first, is determined by associativity of
those operators. Here, both + and − are left-associative, so the expression will be
evaluated as (a + b) − c.
Example − In a + b*c, the expression part b*c will be evaluated first, since
multiplication has precedence over addition. To have a + b evaluated first, the user
must use parentheses, as in (a + b)*c.
==========================
Operands - Variables – a, b, x, y, p, q, k
Operators – Symbols - + , -, *, %
Conversion of Infix to Postfix
A + B / C * (D + E) – F
Result

Symbol | Stack (operators only) | Postfix (operands only) | Remarks
(      | (                      |                         | An opening parenthesis is pushed first as a marker.
A      | (                      | A                       |
+      | (+                     | A                       |
B      | (+                     | AB                      |
/      | (+/                    | AB                      |
C      | (+/                    | ABC                     |
*      | (+*                    | ABC/                    | At the top of the stack there is division. Multiplication and division have the same priority, so pop the division - Rule 2.
(      | (+*(                   | ABC/                    |
D      | (+*(                   | ABC/D                   |
+      | (+*(+                  | ABC/D                   |
E      | (+*(+                  | ABC/DE                  |
)      | (+*                    | ABC/DE+                 | The closing parenthesis pops operators back to the matching opening parenthesis, so the addition operator is popped - Rule 4.
-      | (-                     | ABC/DE+*+               | Multiplication has a higher priority than minus, and a lower priority cannot be placed on top of a higher one, so pop * - Rule 3. Then + and - have the same priority, so pop the + - Rule 2.
F      | (-                     | ABC/DE+*+F              |
)      |                        | ABC/DE+*+F-             | The closing parenthesis at the end pops the remaining minus operator - Rule 4.
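The popping rules used in the table (Rules 2-4) are the shunting-yard discipline. A minimal Python sketch for single-character operands and the four operators (function name illustrative):

```python
def infix_to_postfix(expr):
    prec = {'+': 1, '-': 1, '*': 2, '/': 2}
    out, stack = [], []
    for tok in expr:
        if tok.isalnum():                      # operand: goes straight to the output
            out.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':                       # Rule 4: pop back to the matching '('
            while stack[-1] != '(':
                out.append(stack.pop())
            stack.pop()                        # discard the '(' itself
        elif tok in prec:                      # Rules 2 and 3: pop equal/higher priority
            while stack and stack[-1] != '(' and prec[stack[-1]] >= prec[tok]:
                out.append(stack.pop())
            stack.append(tok)
    while stack:                               # pop whatever operators remain
        out.append(stack.pop())
    return ''.join(out)
```

Running it on the worked example, `infix_to_postfix("A+B/C*(D+E)-F")` reproduces the table's result `ABC/DE+*+F-`.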
Conversion of Infix to Prefix
A*B–C/D+E
Expression Remarks
(A*B)-(C/D)+E Place the parentheses for each sub-expression, based on the
higher priority. Multiplication and division have the highest
priority.
((*AB) – (/CD))+E Apply the prefix in the parenthesis that is operator
must place before the operands.
+(*AB) – (/CD)E Apply the prefix in the parenthesis that is operator
must place before the operands.
+– (*AB) (/CD)E Apply the prefix in the parenthesis that is operator
must place before the operands.
+– *AB/CDE Remove the parenthesis
(A+B) / C * D - E
Expression Remarks
(+AB) / C * D - E Consider the parenthesis for the Expression.
((+AB) / C) * D - E Apply the prefix in the parenthesis that is operator
must place before the operands.
((/ +ABC) * D ) - E Apply the prefix in the parenthesis that is operator
must place before the operands.
(* / +ABCD ) - E Apply the prefix in the parenthesis that is operator
must place before the operands.
- * / +ABCDE Remove the parenthesis
Conversion of Postfix to Infix
A B - D E + F * /

Reading of Postfix | Stack Top               | Remarks
A                  | A                       |
B                  | B                       |
-                  | (A - B)                 | The operator pops two operands and forms Expression 1, (A - B), at the top of the stack.
D                  | D                       |
E                  | E                       |
+                  | (D + E)                 | The operator forms Expression 2.
F                  | F                       |
*                  | (D + E) * F             | The operator forms Expression 3.
/                  | (A - B) / ((D + E) * F) | The operator forms Expression 4, the final infix expression.
Conversion of Prefix to Infix
+ - * A B / C D E
Step 1 - First reverse the prefix expression: E D C / B A * - +

Worked example - scanning A B C / - A K / L - * and combining two operands for
each operator:

Reading | Stack             | Remarks
A       | A                 |
B       | A  B              |
C       | A  B  C           |
/       | A  /BC            | The operator pops two operands and forms Expression 1.
-       | -A/BC             | Expression 2.
A       | -A/BC  A          |
K       | -A/BC  A  K       |
/       | -A/BC  /AK        | Expression 3.
L       | -A/BC  /AK  L     |
-       | -A/BC  -/AKL      | Expression 4.
*       | * -A/BC -/AKL     | Expression 5, the final expression.
Unit II: LINKED LIST (8 hours)
Linked List: Singly Linked List – Linked Stack and Queue - Polynomial Addition –
More on Linked List – Sparse Matrix – Doubly Linked List - Dynamic Storage
Management – Garbage Collection and Compaction
Linked List
Polynomial Addition
Doubly Linked List
Dynamic Storage Management
Garbage Collection and Compaction
Linked List
Linked List
Two Fields
Uses of Linked List
Types of Linked List
Singly Linked Lists
Structure of Singly Linked List
Creation and Traversal of Singly Linked List
Doubly Linked Lists
Structure of Doubly Linked List
Advantage
Creation and Traversal of Doubly Linked List
Circular Linked Lists
Structure of Circular Linked List
Advantage
Creation and Traversal of Circular Linked List.
Two Types
Applications
Two ways to create a circular linked list
Two ways to traverse a circular linked list
Circular Doubly Linked List
Structure of Doubly Circular Linked List
Creation and Traversal of Doubly Circular Linked List
Operations in Linked List
Insertion Operation Linked List
Three Ways
Deletion Operation Linked List
Three Ways
Reverse Operation Linked List
o A linked list starts with a head node which points to the first node.
o Every node consists of data which holds the actual data value associated with the
node and a next pointer which holds the memory address of the next node in the
linked list.
o The last node is called the tail node in the list which points to null indicating the
end of the list.
Two Fields - data (the value stored in the node) and next (the address of the next node)
Advantages
Dynamic data structure - A linked list is a dynamic arrangement at runtime by
allocating and deallocating memory.
No memory wastage - In the linked list, efficient memory utilization can be
achieved since the size of the linked list increases or decreases at run time, so there
is no memory wastage and there is no need to pre-allocate the memory.
Implementation - Linear data structures like stacks and queues are often easily
implemented using a linked list.
Insertion and Deletion Operations - Insertion and deletion operations are easier.
There is no need to shift elements after the insertion or deletion of an element only
the address present in the next pointer needs to be updated.
Flexible - The elements in Linked List are not stored in contiguous memory
locations.
Efficient for large data - When working with large datasets and can grow and
shrink dynamically.
Scalability - Contains the ability to add or remove elements at any position.
Disadvantages
Memory usage - More memory is required, since each node stores a pointer to
the address of the next element.
Traversal - It is more time-consuming as compared to an array.
Reverse traversing - Not possible in a singly linked list; in a doubly linked list it
is possible using a pointer to the previous node in each node, but this extra
pointer is a wastage of memory.
Random access - Random access is not possible due to dynamic memory
allocation.
Lower efficiency at times - Searching for an element can be slower in a linked list.
Complex implementation - More complex than an array for complex programs.
Singly linked lists contain two buckets in one node. They are the data field and the next pointer.
Traversals can be done in one direction only as there is only a single link between
two nodes of the same list.
It is a unidirectional linked list and can traverse it in one direction from head to tail
node.
Doubly linked lists contain three buckets in one node. They are the previous pointer, the data field, and the next pointer.
A doubly linked list is a bi-directional linked list. The user can traverse it in both
directions. Unlike singly linked lists, its nodes contain one extra pointer called the
previous pointer. This pointer points to the previous node.
A doubly linked list stores the same kind of node chain as a singly linked list, but with
each node doubly linked to its neighbours. This structure allows for fast insertion and
deletion of elements, is easy to implement, and can be used in a variety of applications.
A doubly linked list is a type of data structure that stores data in a linear manner,
like a singly linked list, but allows both forward and backward traversal of the data
stored within it. This makes it an ideal data structure for applications that require the
ability to move both forward and backward through a list of data.
To add data to the list, the user can write a function to traverse the list. To do this,
create a variable that keeps track of the current node being visited, and set it to the
head node of the list to start. Then use a while loop that continues until the current
node is None, which indicates that the end of the list has been reached.
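The traversal loop just described, together with the extra previous pointer, can be sketched in Python (all names are illustrative):

```python
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None       # extra pointer to the previous node
        self.next = None

def append(head, data):
    """Traverse to the tail (current node's next is None) and link a new node."""
    node = DNode(data)
    if head is None:
        return node
    cur = head
    while cur.next is not None:
        cur = cur.next
    cur.next = node
    node.prev = cur
    return head

def forward(head):
    out, cur = [], head
    while cur:
        out.append(cur.data)
        cur = cur.next
    return out

def backward(head):
    cur = head
    while cur and cur.next:    # walk to the tail first
        cur = cur.next
    out = []
    while cur:                 # then follow prev pointers back
        out.append(cur.data)
        cur = cur.prev
    return out

head = None
for v in [1, 2, 3]:
    head = append(head, v)     # builds 1 <-> 2 <-> 3
```

The `backward` helper is what the singly linked version cannot do: it relies on the `prev` pointer that only doubly linked nodes carry.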
Circular linked lists can exist in both singly linked list and doubly linked list. Since
the last node and the first node of the circular linked list are connected, the traversal in
this linked list will go on forever until it is broken.
A circular linked list is a unidirectional linked list. The user can traverse it in only
one direction. But this type of linked list has its last node pointing to the head node. So
while traversing, need to be careful and stop traversing when you revisit the head node.
Circular linked lists are implemented using a singly linked list data structure. This
means that each node in the list is connected to the next node via a pointer. The last node
in the list is then connected back to the first node, creating the ring-like structure.
Accessing data in a circular linked list is similar to accessing data in an ordinary
linked list, but there is no defined beginning or end to the list, so it can be difficult to
know where to start traversing. As a result, many implementations of circular linked
lists use a head pointer that points to the first node in the list.
Advantages
There is no defined beginning or end to the list, data can be added and removed
from the list at any time. This makes circular linked lists ideal for applications
where data needs to be constantly added or removed, such as in a real-time
application.
The data is stored in a ring structure. It can be accessed in a continuous loop. This
makes circular linked lists ideal for applications where data needs to be processed
in a continuous loop such as in a real-time application or simulation.
Since there is no defined beginning or end to the list, circular linked lists are more
efficient in memory usage: no extra pointers to the beginning and end of the list
are needed.
Circular linked lists require only a single pointer to be stored in memory, the head
pointer.
Circular linked lists are easier to implement without the use of additional data
structures, such as stacks and queues, to keep track of the list's beginning and end.
Creation and Traversal of Circular Linked List
A circular linked list is a type of data structure that can store a collection of items.
It is related to both the singly linked list and the doubly linked list. Unlike a singly linked
list, which has a NULL pointer at the end of the list, a circular linked list has a pointer
that points back to the first node in the list. This makes it possible to traverse the entire
list without having to keep track of the end of the list.
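A minimal sketch of creating and traversing a singly circular linked list, stopping when the head is revisited (function and class names are illustrative):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def make_circular(values):
    """Build a ring: the last node points back to the first."""
    head = Node(values[0])
    cur = head
    for v in values[1:]:
        cur.next = Node(v)
        cur = cur.next
    cur.next = head        # no NULL at the end: close the ring
    return head

def traverse_once(head):
    """Visit each node exactly once; stop on revisiting the head."""
    out, cur = [], head
    while True:
        out.append(cur.data)
        cur = cur.next
        if cur is head:    # careful: the loop would otherwise run forever
            break
    return out

ring = make_circular([1, 2, 3])
```

Note the stopping test compares node identity with the head; a plain `while cur` loop would never terminate, which is the pitfall the text warns about.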
Two types of circular linked lists
Singly linked - Each node has a pointer that points to the next node in the list. The
last node in the list points back to the first node.
Doubly linked - Each node has pointers that point to both the next node and the
previous node.
Adding a new node to a linked list is a multi-step activity: create a node with the
same structure, then find the location where it is to be inserted.
The insertion into a singly linked list can be performed at different positions.
Based on the position of the new node being inserted, the insertion is categorized into the
following categories.
SN | Operation                      | Description
1  | Insertion at beginning         | It involves inserting an element at the front of the list. The user makes the new node the head of the list. Adding an element at the beginning of the list.
2  | Insertion at end of the list   | It involves insertion at the last position of the linked list. The new node can be inserted as the only node in the list or after the last one. Adding an element at the end of the list.
3  | Insertion after specified node | It involves insertion after a specified node of the linked list. The user skips the desired number of nodes to reach the node after which the new node will be inserted. Adding an element at any position within the list.
Deletion is a multi-step process. First, locate the target node to be removed, by
using a searching algorithm.
Then remove the link pointing to the target node, by making its previous node
point to the node the target node is pointing at.
Finally, the user decides what to do with the deleted node: either keep it in memory,
or deallocate its memory and wipe off the target node completely.
When a node is inserted at the beginning of the list, the new node becomes the head.
When inserting at the end, the second-last node of the list should point to the new
node and the new node will point to NULL.
Operation                       | Description
Deletion at beginning           | It involves deletion of a node from the beginning of the list. For this, the head is moved to the second node.
Deletion at the end of the list | It involves deleting the last node of the list. The list can either be empty or full.
Deletion after specified node   | It involves deleting the node after the specified node in the list; an element at any position of the list can be deleted.
Traversing                      | In traversing, visit each node of the list at least once in order to perform some specific operation on it. Example - printing the data part of each node present in the list.
Searching                       | In searching, the user matches each element of the list with the given element. If the element is found at any location, that location is returned; otherwise null is returned.
To reverse the linked list, the user makes the head node point to the last node and
reverses every link in the list. This operation is a thorough one.
First, traverse to the end of the list; its next pointer should be pointing to NULL.
Make it point to its previous node instead.
Repeat this along the list, making each node point to its previous node one by one,
taking care with the node that was originally last.
Finally, make the head point to the new first node by using a temp node.
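The pointer manipulation just described is the classic iterative reversal; a Python sketch (all names are illustrative):

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def reverse(head):
    prev, cur = None, head
    while cur is not None:
        nxt = cur.next       # save the next node before breaking the link
        cur.next = prev      # point the current node back to its previous node
        prev, cur = cur, nxt
    return prev              # prev is the new head (the old tail)

def to_list(head):
    out = []
    while head:
        out.append(head.data)
        head = head.next
    return out

# 1 -> 2 -> 3 becomes 3 -> 2 -> 1
rev = reverse(Node(1, Node(2, Node(3))))
```

Only three pointers (`prev`, `cur`, `nxt`) are needed, so the whole list is reversed in O(n) time and O(1) extra space.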
==========
A linked list allocates memory dynamically, and the time complexity for the push,
pop and peek operations is O(1).
In the linked-list implementation of a stack, the nodes are maintained non-contiguously
in memory. Each node contains a pointer to its immediate successor node in the stack.
The stack is said to be overflown if there is no space in memory to create a node.
Stack Operations

Stack Operation | Description | Time Complexity | Space Complexity
Push() | Insert a new element at the beginning of the list. | O(1) | O(1)
Pop() | Return the top element of the stack and delete the first node from the list. | O(1) | O(1)
isEmpty() | Return true if the stack is empty, false otherwise. | O(1) | O(1)
Peep() or Peek() | Return the top element without removing it. | O(1) | O(1)
Pop Operation
Deleting a node from the top of the stack is the pop operation. Deleting a node from the linked list and popping an element from the stack are the same operation.
Check for the underflow condition - The underflow condition occurs on an attempt to pop from an already empty stack. The stack is empty if the head pointer of the list points to NULL.
Adjust the head pointer - In a stack, elements are popped only from one end, so the value stored in the head pointer must be deleted and the node must be freed. The next node of the head node becomes the new head node.
Peek Operation
Check if there is a node present or not; if not, then return.
Otherwise return the value of the top node of the linked list.
Display Operation - Traversing the nodes
Displaying all the nodes of a stack needs traversing all the nodes of the linked list organized in the form of a stack:
Copy the head pointer into a temporary pointer.
Move the temporary pointer through all the nodes of the list and print the value field attached to every node.
Applications of Stacks
Function calls - Stacks are used to keep track of the return addresses of function
calls for execution.
Recursion - Stacks are used to store the local variables and return addresses of
recursive function calls.
Expression evaluation - Stacks are used to evaluate expressions in postfix
notation, also called Reverse Polish Notation.
Syntax parsing - Stacks are used to check the validity of syntax in programming
languages.
Memory management - Stacks are used to allocate and manage memory in
operating systems and programming languages.
Advantages of Stacks
Simplicity - Stacks are simple and easy to understand, which suits a wide range of
applications.
Efficiency - Push and pop operations on a stack can be performed in constant
time (O(1)), providing efficient access to data.
Last-in First-out (LIFO) - The last element added to the stack is the first one
removed, which is exactly what function calls and expression evaluation need.
Limited memory usage - Stacks only need to store the elements pushed onto them,
which makes them memory-efficient.
Disadvantages of Stacks
Limited access - Elements in a stack can only be accessed from the top, making it
difficult to retrieve or modify elements in the middle of the stack.
The array implementation cannot be used for large applications where queues are
implemented. One of the alternatives to the array implementation is the linked list
implementation of a queue.
Linked List
A linked list is a node-based linear data structure used to store data elements. Each
node in a linked list is made up of two key parts:
data - stores the data element.
next - stores the address of the next node in the linked list.
Queue
A queue is a linear data structure with both ends open for operations, following First
In First Out (FIFO) order.
The first element enqueued into a queue is the first element to come out of it.
The last element enqueued is the last one out of the queue.
Each element of the queue points to its immediate next element in the memory.
Front pointer
The front points to the first item of the queue and holds the address of the
starting element of the queue.
Rear pointer
Insertion is performed at the rear by adding an element to the end of the
queue. The new element will be the last element of the queue.
enQueue() - This operation adds a new node after the rear and moves the
rear to the next node.
Both front and rear being NULL indicates that the queue is empty.
Insert Operation
The insert operation is used to insert an element into a queue. The new element is
added as the last element of the queue.
Example
Delete Operation
The delete operation is used to delete the element that was first inserted in a queue,
i.e., the element whose address is stored in FRONT. Before deleting the value, we must
first check if FRONT = NULL, because if the queue is empty no more deletions can be
done. Trying to delete a value from an empty queue is an underflow.
Application of Queue
Operating Systems - Queues are used by CPU scheduling algorithms within the
operating system to decide the order in which processes execute.
Data Traffic Management - Devices like Routers and switches use queues to
manage data buffers for data packet transmission.
Web Servers - Web servers use a queue data structure to manage and optimize
incoming requests. All requests to the server are added to the queue and are
processed in FIFO order.
Printers - Printing multiple documents use queues to manage the workload and
process each printing task on the queue.
Banking Systems - Banking transactions like RTGS are managed using queues and
processed one at a time.
Call Center Systems - The queue data structure manages customer connection
requests to the support staff and works on a FIFO basis.
Television Broadcast - Television stations manage the broadcast queue, ensuring
programs or advertisements are played in the intended sequence.
Advantages of Queue
A large amount of data can be managed efficiently with ease.
Operations such as insertion and deletion can be performed with FIFO.
Queues are useful when a particular service is used by multiple operations.
Queues are fast in speed for data inter-process communication.
Disadvantages of Queue
The operations such as insertion and deletion of elements from the middle are
time consuming.
A new element can only be inserted when the existing elements are deleted from
the queue.
Searching an element takes O(N) time.
Maximum size of a queue must be defined prior in array.
==============
Polynomial Addition
Applications
Problem Statement
Pseudocode
Time and Space Complexities
Linked lists can be used in data structures to represent polynomials. Polynomial
addition combines two or more polynomials whose terms are sequenced in an organized
way, and such algorithms make it possible to address real-world situations. A common
method is to record the coefficients of each term and their related exponents in arrays
or linked lists.
Applications
Scientific Computing
Computer Graphics
Signal Processing
Cryptography.
Problem Statement
A linked list is being used to represent two polynomial expressions. Create a
function to combine these lists and print the result on the screen.
Example
Input:
1st number = 5x^2 + 4x^1 + 2x^0
2nd number = -5x^1 - 5x^0
Output - 5x^2 - 1x^1 - 3x^0
The node structure contains 3 things: the column number, a non-zero element,
and a pointer to the next node (or null if there is none). We can define the structure of
this node in C language.
Here the 0th row has a node that contains column number 2, the non-zero element 5,
and a null ('/') pointer to the next node. All the non-zero elements are added under their
row number in the array through the linked list.
The above diagram represents all the non-zero elements of the given matrix in the
array through the linked list.
=================
Memory Allocation
Memory Allocation
Two types of memory allocations
Dynamic Storage Allocation
Use of Dynamic Memory Allocation
Advantages
Disadvantages
Garbage collection
Three Approaches
Mark and Sweep Algorithm
Advantages
Disadvantages
Reference Counting Algorithm
Advantages
Disadvantages
Copy Collecting Algorithm
Advantages
Disadvantages
Garbage collection is a dynamic approach to automatic memory management and
heap allocation that processes and identifies dead memory blocks and reallocates storage
for reuse. The primary purpose of garbage collection is to reduce memory leaks.
Three approaches
Mark-and-sweep – When memory runs out, the GC locates all accessible memory and
then reclaims the available memory.
Reference counting – When an object's reference count drops to zero, the object is
garbage and is then destroyed. The freed memory returns to the memory heap.
Copy collection – There are two memory partitions.
When the first partition is full, the GC locates all accessible data structures and
copies them to the second partition, compacting them in the process and
allowing continuous free memory.
Three Approaches
Mark and Sweep Algorithm
Reference Counting Algorithm
Copy Collecting Algorithm
Example - objects in memory at the time the garbage collector takes over
1 6 e 3
2 0 i 4
3 0 d 5
4 0 g 0
5 0 a 0
6 8 b 7
7 10 k 0
8 0 c 0
9 0 j 10
10 0 f 0
11 0 h 12
12 11 l 0
Here, 0 denotes a null pointer, while the other pointer values are memory addresses.
The free list is empty (0) and the roots are 1, 6, and 9. After the mark-and-sweep
garbage collection the memory layout becomes
1 6 e 3
2 0
3 0 d 5
4 2
5 0 a 0
6 8 b 7
7 10 k 0
8 0 c 0
9 0 j 10
10 0 f 0
11 4
12 11
White blocks denote the free memory and the grey blocks denote the memory
taken by all the reachable objects.
Now the free segments, which are denoted by white colour, are of varying sizes:
the 5 free segments are of sizes 1, 1, 2, 3 and 5 units.
To create an object which takes 10 units of memory, that memory can be
allocated only as a contiguous block. The creation of the object is therefore
not possible, although we have an available memory space of 12 units, and it will
cause an Out of Memory error.
This problem is termed fragmentation. Fragmentation refers to an unwanted
problem in which processes are loaded and unloaded from memory and the free memory
space gets fragmented. The user can reduce the fragmentation by compaction, which
reduces the loss of memory.
Example - A contiguous block of free memory of size 12 units and allocate memory
to an object of size 10 units.
Reference Counting
Reference counting is based on the idea that an object is alive as long as there is at
least one reference to it. Whenever a new reference to an object is created, the reference
count of that object is incremented. Whenever a reference to an object is removed, the
reference count of that object is decremented. When the reference count of an object
reaches zero, it means that the object is no longer accessible and can be freed from
memory. Reference counting can be implemented in various ways, such as using smart
pointers, weak references, or explicit reference count manipulation.
Copy Collecting Algorithm
Example - the coloured boxes represent different objects in memory, and the
black box in the middle represents the half-way point in memory.
Suppose we have filled up half of memory and initiate a collection. Old space is on the
left and new space is on the right. Suppose further that only the red and light-blue boxes
(objects 2 and 4) are reachable from the stack. After copying and compacting, the
unreachable data is left behind and we "throw away" the first half of memory:
Obj 2 Obj 4
After copying the data into the new space, the computation restarts where it left
off. The computation continues allocating objects, but this time allocates them in the other
half of memory, which is now the new space.
Obj 2 Obj 4 Obj 6 Obj 7 Obj 8
Eventually the new space fills up and we are ready to do another collection: the
reachable objects are copied into the other half of memory and compacted, throwing
away the old data.
Obj 4 Obj 6 Obj 8
=====================
Difference between Fragmentation and Compaction

Fragmentation | Compaction
It is due to the creation of holes (small chunks of memory space). | It is used to manage the holes.
It creates a loss of memory. | It reduces the loss of memory.
Sometimes, the process cannot be accommodated. | Memory is optimized to accommodate the process.
External fragmentation may cause major problems. | It solves the issue of external fragmentation.
It depends on the amount of memory storage and the average process size. | It depends on the number of blocks occupied and the number of holes left.
It occurs in contiguous and non-contiguous allocation. | It works only if the relocation is dynamic.
============
Binary Trees
Binary Tree Traversal
Threaded Binary Tree
Binary Tree Representations
Counting Binary Trees
Graphs
Graph Traversals
Connected Components
Spanning Trees
Shortest Paths
Transitive Closure
Binary Trees
Binary Trees
Components of a Binary Tree
Benefits of two child nodes
Binary Trees , Array and Linked Lists
Five Types of Binary Trees
Full Binary Tree
Properties of Full Binary Tree
Complete Binary Tree
Properties of Complete Binary Tree
Perfect Binary Tree
Balanced Binary Tree
Degenerate Binary Tree
Binary Tree Implementation
Application of Binary Trees
Advantages of Binary Tree
Disadvantages of Binary Tree
Examples
Example
Binary Tree – R A B C D E F G
Root Node – R
A's Left Child – C
A's Right Child – D
B's Subtree – B E F G
Child Nodes – A, C, D, B, E, F, G
The left child node is the child node to the left; the right child node is the child
node to the right.
Parent / Internal Node – a node that has one or more child nodes.
Tree Height – the maximum number of edges from the root node to a leaf node.
A full binary tree is a tree structure in which a binary tree node can have
either two children or no child. A full Binary Tree is a kind of tree where each
node has either 0 or 2 child nodes.
The full binary tree is also known as a strict binary tree. The tree can only
be considered a full binary tree if each node contains either 0 or 2
children. The full binary tree can also be defined as the tree in which each node
must contain 2 children except the leaf nodes.
The number of leaf nodes is equal to the number of internal nodes plus 1.
In the above example, the number of internal nodes is 5 and the number
of leaf nodes is equal to 6.
The maximum number of nodes is the same as in any binary tree of
height h, i.e., 2^(h+1) - 1.
The minimum number of nodes in the full binary tree is 2h + 1.
The minimum height of the full binary tree is log2(n+1) - 1.
The maximum height of the full binary tree can be computed from the
minimum node count:
n = 2h + 1
n - 1 = 2h
h = (n - 1)/2
Complete Binary Tree
A complete binary tree is another specific binary tree where each node on
all levels except the last level has two children. And at the lowest level, all
leaves should reside as far to the left as possible. A complete Binary Tree has all
levels full of nodes, except the last level, which can also be full, or filled from
left to right. The properties of a complete Binary Tree mean it is also balanced.
A binary tree is said to be perfect if every node must have two children
and every leaf is present on the same level. A perfect Binary Tree has all leaf
nodes on the same level, which means that all levels are full of nodes, and all
internal nodes have two child nodes. The properties of a perfect Binary Tree
means it is also full, balanced, and complete. A tree is a perfect binary tree if all
the internal nodes have 2 children, and all the leaf nodes are at the same level.
The balanced binary tree is a tree in which the heights of the left and right
subtrees differ by at most 1. For example, AVL and Red-Black trees are balanced
binary trees.
Balance factor = height(left subtree) – height(right subtree)
A binary tree is balanced if, for each node, its balance factor is either -1, 0 or
1. The height of the left subtree and that of the right subtree can vary by at most
one.
The Binary Tree can be implemented in a Singly Linked List. It is used to create
a structure where each node can be linked to both its left and right child nodes.
Visiting every node of a tree, one node at a time, is called traversal. Because a tree
can branch out in different directions (it is non-linear), there are different ways of
traversing trees.
Pre-order
In-order
Post-order
Pre-order Traversal is done by visiting the root node first, then recursively
doing a pre-order traversal of the left subtree, followed by a recursive pre-order traversal
of the right subtree. It is used for creating a copy of the tree and for the prefix notation
of an expression tree.
In-order Traversal is a type of Depth First Search where each node is visited in
a certain order. In in-order traversal the node is visited in between the
recursive function calls: the node is visited after the in-order traversal of the left
subtree, and before the in-order traversal of the right subtree.
Post-order Traversal
Post-order Traversal is a type of Depth First Search, where each node is visited
in a certain order. In post-order traversal, visiting a node is done "after" the left and
right child nodes are called recursively.
===============
In order to effectively manage the space, a method was devised by Perlis and
Thornton in which the NULL links are replaced with special links known as threads.
Such binary trees with threads are known as threaded binary trees.
Each node in a threaded binary tree either contains a link to its child node or
thread to other nodes in the tree.
In one-way threaded binary trees, a thread will appear either in the right or left
link field of a node. If it appears in the right link field of a node then it will point to the
next node that will appear on performing in order traversal. Such trees are called Right
threaded binary trees.
If thread appears in the left field of a node then it will point to the nodes in-
order predecessor. Such trees are called Left threaded binary trees.
The in-order traversal of this binary tree yields D, B, E, A, C, F. When this tree is
represented as a right threaded binary tree, the right link field of leaf node D which
contains a NULL value is replaced with a thread that points to node B, which is the
in-order successor of node D. In the same way, the other nodes whose right link field
contains a NULL value are threaded to their in-order successors.
In two-way threaded Binary trees, the right link field of a node containing
NULL values is replaced by a thread that points to nodes in-order successor and left
field of a node containing NULL values is replaced by a thread that points to nodes in-
order predecessor.
Binary Tree Representations
Two methods
Array Representation
Linked List Representation
Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
10 5 16 - 8 15 20 - - - - - - - 23
The index 1 holds the root; it has two children 5 and 16, which are placed at
locations 2 and 3. Some children are missing, so their places are left blank. A binary tree
of depth n can be represented in this one-dimensional array representation with a maximum
array size of 2^(n+1) - 1.
================
Graph
Graph Properties
Graph Representations
Two vertices
Adjacency Matrix
Adjacency List
Operations on Graphs
Graph Terminology
Applications of Graph
Advantages of Graph
Disadvantages of Graph
Graph Properties
The properties of the graph are as follows
A weighted Graph is a Graph where the edges have values. The weight value of
an edge can represent things like distance, capacity, time or probability.
A connected Graph is when all the vertices are connected through edges
somehow. A Graph that is not connected is a Graph with isolated or disjoint sub
graphs or single isolated vertices.
A loop, also called a self-loop, is an edge that begins and ends on the same vertex.
A loop is a cycle that only consists of one edge.
Graph Representation
A Graph representation is stored in memory and different Graph representations
are
Take up more or less space.
Be faster or slower to search or manipulate.
Be better suited depending on what type of Graph like weighted, directed.
Be easier to understand and implement than others.
Graph representations store information about which vertices are adjacent and how
the edges between them are stored. The Adjacency Matrix is the representation used for
Graphs moving forward.
The adjacency matrix above represents an undirected Graph, so the values '1'
mark where the edges are, and the values in the adjacency matrix are symmetrical because
each edge goes both ways in an undirected Graph.
To create a directed Graph with an adjacency matrix, we choose which vertices the
edges go from and to by inserting the value at the correct index (i, j). To represent a
weighted Graph we can put values other than '1' inside the adjacency matrix.
In the adjacency matrix, the value 3 at index (0,1) tells us there is an edge from
vertex A to vertex B, and the weight for that edge is 3. The weights are placed directly
into the adjacency matrix for the correct edge.
Adjacency List
An Adjacency List has an array that contains all the vertices in the Graph and
each vertex has a Linked List or Array with the vertex edges.
A sparse Graph with many vertices and can save space by using an Adjacency
List compared to using an Adjacency Matrix, because an Adjacency Matrix would
reserve a lot of memory on empty Array elements for edges that don't exist.
A sparse Graph is a Graph where each vertex only has edges to a small portion
of the other vertices in the Graph.
In the adjacency list the vertices A to D are placed in an Array and each vertex
in the array has its index written right next to it.
Each vertex in the Array has a pointer to a Linked List that represents that
vertex's edges. More specifically, the Linked List contains the indexes to the adjacent or
neighbor vertices.
Example -Vertex A has a link to a Linked List with values 3, 1, and 2. These
values are the indexes to A's adjacent vertices D, B, and C.
An Adjacency List can represent a directed and weighted Graph
In the Adjacency List, vertices are stored in an Array. Each vertex has a pointer
to a Linked List with edges stored as (i, w), where i is the index of the vertex the edge
goes to, and w is the weight of that edge.
Operations on Graphs
Insertion of Nodes or Edges in the graph – Insert a node into the graph.
Deletion of Nodes or Edges in the graph – Delete a node from the graph.
Searching on Graphs – Search an entity in the graph.
Traversal of Graphs – Traversing all the nodes in the graph.
Shortest Paths - From a source to a destination, a source to all other nodes
and between all pairs.
Minimum Spanning Tree - In a weighted, connected undirected graph,
finding the minimum weight edges to connect all.
Graph Terminology
Graph - A Graph G is a non-empty set of vertices or nodes V and a set of
edges E, where each edge connects a pair of vertices. Formally, a graph can be
represented as G= (V, E). Graphs can be classified based on various properties,
such as directedness of edges and connectivity.
Cycle - A Cycle in a graph is a path that starts and ends at the same vertex, with
no repetitions of vertices. A cycle is closed path in which both edges and
vertices cannot be repeated.
Advantages of Graph
Representing complex data - The relationships between the data points are
not direct.
Efficient data processing - To perform complex operations on large datasets
quickly and effectively.
Network analysis - Graphs are useful in social sciences, business and
marketing.
Path finding - To find the shortest path for logistics and transportation
planning.
Visualization -To communicate complex data and for presentations, reports
and data analysis.
Machine learning -To find the model of complex relationships between
variables like reference systems or fraud detection.
Applications of Graph
Users on Facebook – Users are referred to as vertices, and a friendship between
two users is an edge connecting them. The Friend Suggestion system on Facebook
is based on graph theory.
Resource Allocation in the Operating System - Edges are drawn from
resources to the functions they are assigned to, or from a requesting process to the
desired resources.
Web pages - Web pages are referred to as vertices on the World Wide Web. A link
from page A to page B can represent an edge in a directed graph.
Disadvantages of Graph
Limited representation - Graphs can represent relationships between
objects but not their properties or attributes.
Difficulty in analysis - Graphs can be difficult to interpret when they are large
or complex.
Scalability issues -The number of nodes and edges in a graph increases then
the processing time and memory also increases. This can make it difficult to
work with large or complex graphs.
Data quality issues - If the data is incomplete, inconsistent, or inaccurate, the
resulting graph will be unreliable.
==========
Spanning tree
Spanning tree
Applications of the spanning tree
Properties of spanning-tree
Minimum Spanning tree
Applications of minimum spanning tree
Two Algorithms for Minimum spanning tree
Example
Prim's Algorithm
Steps of Prim’s Algorithm
Key Characteristics of Prim's Algorithm
Applications of Prim's Algorithm
Advantages of Prim's Algorithm
Disadvantages of Prim's Algorithm
Kruskal's Algorithm
Steps of Kruskal's Algorithm
Key Characteristics of Kruskal's Algorithm
Applications of Kruskal's Algorithm
Advantages of Kruskal's Algorithm
Disadvantages of Kruskal's Algorithm
The most important kind of spanning tree is the minimum spanning tree. But before
moving directly towards the minimum spanning tree, let us first see what a spanning tree is.
A spanning tree can be defined as the sub graph of an undirected connected
graph. It includes all the vertices along with the least possible number of
edges.
If any vertex is missed, it is not a spanning tree. A spanning tree is a subset of
the graph that does not have cycles, and it also cannot be disconnected.
A spanning tree consists of (n-1) edges, where 'n' is the number of vertices or
nodes. Edges of the spanning tree may or may not have weights.
All the possible spanning trees created from the given graph G have the same
number of vertices, but the number of edges in the spanning tree would be
equal to the number of vertices in the given graph minus 1.
A complete undirected graph has n^(n-2) spanning trees, where n is the
number of vertices in the graph.
A minimum spanning tree can be defined as the spanning tree in which the sum
of the weights of the edges is minimum. The weight of the spanning tree is the sum of
the weights given to the edges of the spanning tree.
The sum of the edges of the above graph is 16. The possible spanning trees are
The minimum spanning tree that is selected from the above spanning trees for
the given weighted graph is
Prim's Algorithm
Prim's algorithm is a greedy algorithm that incrementally grows the minimum
spanning tree from a starting vertex. The algorithm maintains two sets of vertices:
One set contains the vertices already included in the MST.
The other set contains the remaining vertices.
Prim's algorithm iteratively selects the vertex with the smallest edge weight that
connects the two sets and adds it to the MST.
Initialization - Choose a starting vertex and mark it as visited.
Then repeatedly find the minimum-weight edge connecting a visited vertex to an
unvisited vertex, and add that edge and vertex to the MST.
Kruskal's Algorithm
Kruskal's algorithm is another greedy algorithm used to find the minimum
spanning tree. Unlike Prim's algorithm, Kruskal's algorithm processes the edges of the
graph in ascending order of their weights. It incrementally adds edges to the MST as
long as they do not create a cycle.
Dijkstras Algorithm
History of Dijkstra's Algorithm
Working of Dijkstra's Algorithm
Two properties of each Vertex
Examples
Advantages
Disadvantages
Dijkstra's Algorithm is a graph algorithm that finds the shortest path from a
source vertex to all other vertices in the graph (single source shortest path). It is a
type of greedy algorithm that only works on weighted graphs having positive
weights.
The time complexity
Using the adjacency matrix of the graph - O(V^2)
Using the adjacency list of the graph - O((V + E) log V)
where V is the number of vertices in the graph
and E is the number of edges in the graph.
Visited property
The visited property shows whether the node has been visited.
This property ensures that no node is revisited.
A node is marked visited only when the shortest path to it has been found.
Path property
The path property stores the value of the minimum path to the node. The
minimum path implies the shortest way to reach the node.
This property is revised when any neighbour of the node is visited.
This property stores the final shortest path of each node.
Examples
Disadvantages
Dijkstra's algorithm cannot work with negative weights.
When working with dense graphs, Dijkstra's algorithm is not a good option for
calculating the shortest path between every pair of nodes.
=============
Transitive Closure is the reachability matrix of a graph: it tells whether vertex v
can be reached from vertex u. Given a graph, the task is to find, for all vertex
pairs (u, v), whether vertex v is reachable from vertex u.
The final matrix is of Boolean type: there is a value 1 for (u, v) when vertex v is
reachable from vertex u.
The input adjacency matrix of the graph is used to initialise the solution matrix
in the first phase.
The next step is to treat each node in the range 0–n as an intermediate vertex and
iterate over them one by one. The nodes 1, 2, ..., N are iterated over once again as
the starting and terminating vertices.
When vertex k has been chosen as the intermediate vertex, the paths considered so
far may pass only through the intermediate vertices 1, 2, ..., k-1.
There are two potential outcomes for each pair of starting and finishing
vertices (i, j): the path either passes through k, or it does not.
================
Summary
An edge is one of the two primary units used to form graphs. Each edge has two
ends which are vertices to which it is attached.
If two vertices are endpoints of the same edge, they are adjacent.
A vertex's outgoing edges are directed edges that originate at that vertex.
A vertex's incoming edges are directed edges whose destination is that
vertex.
The total number of edges incident to a vertex in a graph is its degree.
The out-degree of a vertex in a directed graph is the total number of outgoing
edges, whereas the in-degree is the total number of incoming edges.
A vertex with an in-degree of zero is referred to as a source vertex, while one
with an out-degree of zero is known as sink vertex.
An isolated vertex is a zero-degree vertex that is not an edge's endpoint.
A path is a sequence of alternating vertices and edges, with each consecutive
pair of vertices connected by an edge.
The path that starts and finishes at the same vertex is known as a cycle.
A path with unique vertices is called a simple path.
For each pair of vertices x, y, a graph is strongly connected if it contains a
directed path from x to y and a directed path from y to x.
A directed graph is weakly connected if all of its directed edges are replaced
with undirected edges, resulting in a connected graph. A weakly linked graph's
vertices have at least one out-degree or in-degree.
A tree is a connected forest. A tree with a designated root is called a rooted tree;
a tree without a root is a free tree.
A spanning sub graph that is also a tree is known as a spanning tree.
A connected component is a maximal connected subgraph of a disconnected graph.
A bridge is an edge whose removal would disconnect the graph.
Forest is a graph without a cycle.
================
Storage Devices -Sorting with Disks: K-Way Merging – Sorting with Tapes Symbol
Tables: Static Tree Tables - Dynamic Tree Tables - Hash Tables: Hashing Functions - Overflow
Handling.
Storage Devices
K-way merge sort
Difference between Static Data Structure and Dynamic Data Structure
Hash Table
Storage Devices
Storage Devices
Types of Computer Storage Devices
Primary Storage Devices
Secondary Storage Devices
Magnetic Storage Devices
Flash memory Devices
Optical Storage Devices
Cloud and Virtual Storage
Characteristics of Computer Storage Devices
The storage unit is a part of the computer system which is employed to store the
information and instructions to be processed. A storage device is an integral part of the
computer hardware which stores information/data to process the result of any computational
work. Without a storage device, a computer would not be able to run or even boot up. The
user can say that a storage device is hardware that is used for storing, porting, or extracting
data files. It can also store information/data both temporarily and permanently.
Types of Computer Storage Devices
Each of these storage devices has its own specifications and uses. The types of storage devices are
Primary Storage Devices
Secondary Storage Devices
Magnetic Storage Devices
Flash memory Devices
Optical Storage Devices
Cloud and Virtual Storage
Optical Storage Devices is also secondary storage device. It is a removable storage device.
Following are some optical storage devices:
CD - It is known as Compact Disc. It contains tracks and sectors on its surface to store
data, is made up of polycarbonate plastic, and is circular in shape. A CD can store data up
to 700 MB. It is of two types:
CD-R: It stands for Compact Disc Recordable. On this type of CD, once the data is
written it cannot be erased; the disc becomes read-only.
CD-RW: It stands for Compact Disc Re-Writable. On this type of CD, the user can
easily write or erase data multiple times.
DVD - It is known as Digital Versatile Disc. DVDs are circular flat optical discs used to
store data. They come in two sizes: 4.7 GB single-layer discs and 8.5 GB double-layer
discs. DVDs look like CDs, but the storage capacity of a DVD is greater than that of a
CD. It is of two types:
DVD-R - It stands for Digital Versatile Disc Recordable. On this type of DVD,
once the data is written it cannot be erased. It is generally used to
distribute movies, etc.
DVD-RW - It stands for Digital Versatile Disc Re-Writable. On this type of DVD,
the user can easily write or erase data multiple times.
Blu-ray Disc - It is just like a CD or DVD, but the storage capacity of a Blu-ray disc is up
to 25 GB. To play a Blu-ray disc, the user needs a separate Blu-ray reader. Blu-ray
technology reads the disc with a blue-violet laser whose shorter wavelength allows
information to be stored at a greater density.
K-way merge sort is a sophisticated sorting algorithm that extends the merge sort
approach. The goal of the k-way merge problem is to combine k-sorted arrays into one sorted
array that has the same elements. While the traditional merge sort algorithm merges two sub
arrays at a time, the K-way merge sort merges K sub arrays, which leads to better performance
when dealing with vast amounts of data.
This algorithm is handy when handling large datasets, as it can efficiently sort and
merge multiple sub arrays, leading to improved efficiency and faster processing times. The K-
way merge sort algorithm is used in various applications that require sorting large data sets.
1. Divide: The original array is divided into K sub arrays of approximately equal size.
2. Conquer: Each of these sub arrays is sorted individually. This can be done using any
sorting algorithm, but for efficiency, the user typically uses a recursive application of the
K-way merge sort.
3. Combine: The sorted sub arrays are merged; all K arrays are merged at once using a
min-heap data structure.
Time Complexity - Merging K sorted arrays containing n elements in total with a
min-heap takes O(n log K) time, and the full K-way merge sort runs in O(n log n),
where n is the number of elements in the array. The benefit over 2-way merge sort
is most noticeable when K is greater than 2 and there is a significant amount of
data, because fewer merge passes over the data are needed.
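The combine step can be sketched with Python's heapq module. This is a minimal illustration (not the textbook's own code) that merges k sorted lists into one sorted list using a min-heap of (value, list index, element index) tuples:

```python
import heapq

def k_way_merge(arrays):
    """Merge k sorted lists into one sorted list using a min-heap."""
    heap = []
    # Seed the heap with the first element of each non-empty list.
    for i, arr in enumerate(arrays):
        if arr:
            heapq.heappush(heap, (arr[0], i, 0))
    result = []
    while heap:
        value, i, j = heapq.heappop(heap)
        result.append(value)
        # Push the next element from the list the value came from.
        if j + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return result
```

Each of the n elements is pushed and popped once on a heap of at most k entries, which is where the O(n log k) merge cost comes from.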
Internal sorting handles relations that are smaller than main memory; external
sorting handles relations whose size is larger than main memory.
Most sorting algorithms assume that the data is in memory and that data
elements can be swapped efficiently. Sometimes, because of the size of the data,
all of it cannot fit in memory at once. An on-disk sorting algorithm is therefore
designed to use the disk efficiently to sort large data sets. The sorting
algorithm works in two phases:
First, the sorting algorithm breaks the data into chunks and sorts each of these
individual chunks. This is accomplished by reading a chunk of data, sorting it,
writing it to a file, then reading more data, and so on. At the end of this
phase, there are a number of files on disk that are each sorted.
Second, all of these files are merged into one large file. This is
accomplished by merging the sorted run files and writing out the result to
a new, larger merged file; with a k-way merge, all of the run files can be
combined into one large file memory-efficiently.
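The two phases can be sketched in Python. This is a minimal sketch under stated assumptions: the data is integers, each run file stores one value per line in a temporary file, and `chunk_size` is the number of elements assumed to fit in memory; the helper names are illustrative, not from the text:

```python
import heapq
import os
import tempfile

def _write_run(values):
    """Phase 1 helper: write one sorted chunk to a temporary run file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as fh:
        for v in values:
            fh.write(f"{v}\n")
    return path

def _read_run(path):
    """Phase 2 helper: stream a run file back as integers."""
    with open(path) as fh:
        for line in fh:
            yield int(line)

def external_sort(data_iter, chunk_size):
    """Two-phase external merge sort sketch."""
    # Phase 1: read, sort, and write fixed-size chunks as sorted run files.
    run_files, chunk = [], []
    for item in data_iter:
        chunk.append(item)
        if len(chunk) == chunk_size:
            run_files.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        run_files.append(_write_run(sorted(chunk)))
    # Phase 2: k-way merge all run files; heapq.merge streams them lazily.
    merged = list(heapq.merge(*(_read_run(f) for f in run_files)))
    for f in run_files:
        os.remove(f)
    return merged
```

Only one chunk plus one buffered line per run file is held in memory at a time, which is the point of the two-phase design.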
=============
Hash Table
Uses of Hash Tables
Drawback of Hash function
Hashing
Terminologies in hashing
Two techniques for searching
Linear Searching
Binary Searching
Methods of Hash functions
Division Method
Multiplication Method
Universal Hashing
Collision
Causes of Hash Collisions
Poor Hash Function
High Load Factor
Similar Keys
Two types Collision Resolution Techniques
Open Addressing
Linear probing
Quadratic probing
Double Hashing technique
Close Addressing
Advantages of Hashing
Disadvantages of Hashing
Applications of Hashing
A hash table is a data structure that uses a special function, known as a hash
function, to map a given key to an index so that elements can be accessed faster.
A hash table is also referred to as a hash map (for key-value pairs) or a hash set
(for keys only). It uses a hash function to map keys into a fixed-size array.
This allows faster search, insertion, and deletion operations.
A Hash table stores some information, and the information has basically two
main components
key
value
Hashing is a technique used in data structures to store and retrieve data efficiently.
It involves using a hash function to map data items to a fixed-size array which is
called a hash table.
Hashing is a searching technique that takes constant time: its time
complexity is O(1).
In both linear and binary searching, the search time depends on the number of
elements, but the user wants a technique that takes constant time. The hashing
technique provides that constant time.
In hashing technique, the hash table and hash function are used. Using the hash
function, the user can calculate the address at which the value can be stored.
Hashing works on key-value pairs. Given a key, the algorithm
computes the index at which the value is stored:
Index = hash(key)
The hash function is a function that takes a key and returns an index into the hash
table.
The goal of a hash function is to distribute keys evenly across the hash table,
minimizing collisions when two keys map to the same index.
Hashing is a technique to convert a range of key values into a range of indexes of an
array. The modulo operator is used to map a key to an index.
Example - a table of size 20, with items to be stored in (key,
value) format.
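As a sketch of the modulo (division) method with a table of size 20 (the sample items below are assumed for illustration, since the original item list is not shown):

```python
def hash_index(key, table_size=20):
    """Map an integer key to a slot using the modulo (division) method."""
    return key % table_size

# Assumed sample (key, value) items for illustration.
items = [(1, "A"), (42, "B"), (55, "C")]
table = [None] * 20
for key, value in items:
    table[hash_index(key)] = (key, value)  # store each pair at its computed slot
```

With these keys, 1 maps to slot 1, 42 to slot 2, and 55 to slot 15; no two keys share a slot, so no collision occurs yet.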
Collision
A hash collision occurs when two different keys map to the same index in a
hash table. This can happen even with a good hash function, especially if the hash
table is full or the keys are similar.
Open Hashing
Open Hashing, one of the methods used to resolve the collision is known as a
chaining method to resolve the collision.
In this case, the user cannot use h(k) = k mod m directly; the hash function is
h(k) = (2k + 3) mod 10, and the hash table has m = 10 slots. The keys
3, 2, 9, 6, 11, 13, 7, 12 are inserted in that order:
key Location
3 ((2*3)+3)%10 = 9
2 ((2*2)+3)%10 = 7
9 ((2*9)+3)%10 = 1
6 ((2*6)+3)%10 = 5
11 ((2*11)+3)%10 = 5
13 ((2*13)+3)%10 = 9
7 ((2*7)+3)%10 = 7
12 ((2*12)+3)%10 = 7
The value 3 is stored at index 9, the value 2 at index 7, the value 9 at index 1,
and the value 6 at index 5.
The key 11 also hashes to index 5, so two values (6 and 11) map to the same
index: this is the collision problem, and the chaining method is used to resolve
it. One more list node is created for the value 11, and after its creation the new
node is linked to the list already holding the value 6 at index 5. In the same
way, 13 chains with 3 at index 9, and 7 and 12 chain with 2 at index 7.
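The chaining method can be sketched as a small Python class, using the same hash function h(k) = (2k + 3) mod 10 (the class name and method names are illustrative):

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of the keys that hash to it."""

    def __init__(self, size=10):
        self.size = size
        self.slots = [[] for _ in range(size)]  # one chain per slot

    def hash(self, key):
        return (2 * key + 3) % self.size

    def insert(self, key):
        # A collision simply appends to the chain at the computed slot.
        self.slots[self.hash(key)].append(key)

    def search(self, key):
        # Only the chain at h(key) needs to be scanned.
        return key in self.slots[self.hash(key)]
```

Inserting the keys 3, 2, 9, 6, 11, 13, 7, 12 reproduces the table above: slot 5 ends up chaining [6, 11], slot 9 chains [3, 13], and slot 7 chains [2, 7, 12].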
Closed Hashing
Like open hashing, closed hashing is also a technique used for collision
resolution in hash tables. Unlike open hashing, where collisions are resolved by
chaining elements in separate chains, closed hashing aims to store all elements directly
within the hash table itself without the use of additional data structures like linked lists.
Search: To search for an element in closed hashing, the user calculates its hash
value and probes through the slots until the desired element or an
empty slot is found. If the element is found, it can be retrieved. If an empty
slot is encountered during probing, the element is not present in the hash table.
Insert: When inserting a new element, the user calculates its hash value and probes
through the slots until an empty slot is found, then places the
element into that slot.
Delete: Deleting an element in closed hashing typically involves marking the slot
as "deleted" or using a special marker to indicate that the slot was once occupied
but is now vacant.
The probing process can be done in various ways, such as linear probing, quadratic
probing, or double hashing.
Linear probing
Quadratic probing
Double Hashing technique
Linear Probing
Linear probing is one of the forms of open addressing. Each cell in the hash table
contains a key-value pair, so when the collision occurs by mapping a new key to the cell
already occupied by another key, then linear probing technique searches for the closest
free locations and adds a new key to that empty cell. In this case, searching is performed
sequentially, starting from the position where the collision occurs till the empty cell is
not found.
Advantages of Hashing
Data Retrieval Speed - Hashing allows for fast data retrieval by mapping a key (or
data) to a unique hash value, which serves as an index. This enables efficient
lookup operations, typically with constant time complexity O(1).
Data Integrity - Hashing is commonly used to verify data integrity. By comparing
the hash value of received data with the original hash value, the user can detect if
the data has been altered or corrupted during transmission.
Password Storage - Hashing is widely used for securely storing passwords.
Instead of storing plain-text passwords, systems store the hash values of
passwords. Even if the hash is compromised, it’s challenging for attackers to
reverse it to obtain the original password.
Data Structures - Hashing is a fundamental component of hash tables, which are
versatile data structures used in various applications like databases, caching, and
search engines for efficient data retrieval.
Cryptography - Hash functions are crucial in cryptographic applications, such as
digital signatures, message authentication codes (MACs), and data encryption
algorithms. They provide security by transforming data into a fixed-size hash that
is difficult to reverse engineer.
Data De-duplication - Hashing is used in data de-duplication techniques to
identify and eliminate duplicate data in storage systems, saving storage space.
Disadvantages of Hashing
Collision Risk - Collisions occur when two different inputs produce the same
hash value. While good hash functions aim to minimize collisions, they are still
possible. Collisions can have security implications and impact the efficiency of
hash tables.
Non-Reversible - Hash functions are designed to be one-way functions, meaning
it’s computationally infeasible to reverse the process and obtain the original input
data. This can be a disadvantage when reverse lookup is required.
=============
Insertion Sort - Quick Sort - 2 Way Merge Sort - Heap Sort – Shell Sort - Sorting on
Several Keys. Files: Files, Queries and Sequential organizations – Index Techniques -File
Organizations.
------------------------
Internal Sorting
File Organization
Indexing
Hash File Organization
Merge Sort
Internal Sorting
• Types of Internal Sorting
• Bubble Sort
• Quick Sort
• Insertion Sort
• Heap Sort
• Shell sort
• Merge Sort
• Disadvantages of Merge Sort
• Advantages of Merge Sort
• Applications of Merge Sort
• Complexity Analysis of Merge Sort
• Recurrence Relation of Merge Sort
• Difference between Internal and External Sort
• Importance of Internal Sorting
Bubble Sort: A simple and intuitive sorting algorithm that repeatedly steps through the
list, compares adjacent elements, and swaps them if they are in the wrong order. Bubble
sort has limited practical applications due to its inefficiency on large datasets.
Shell sort
Shell sort is a highly efficient sorting algorithm based on insertion sort.
This algorithm avoids the large shifts insertion sort makes when a smaller
value sits far to the right and has to be moved far to the left.
The algorithm first applies insertion sort to widely spaced elements, then
sorts progressively less widely spaced elements. This spacing is termed the
interval, and it can be calculated with Knuth's formula:
h = h * 3 + 1
This algorithm is quite efficient for medium-sized data sets; with Knuth's
sequence its average and worst-case complexity is about O(n^1.5), where n is
the number of items.
• Here these values are {35, 14}, {33, 19}, {42, 27} and {10, 14}
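A sketch of shell sort using Knuth's gap sequence h = 3h + 1 (the example array used below, {35, 33, 42, 10, 14, 19, 27, 44}, is the classic illustration assumed to match the pairs above):

```python
def shell_sort(arr):
    """Shell sort in place using Knuth's gap sequence h = 3h + 1."""
    n = len(arr)
    # Pick the largest gap in the sequence 1, 4, 13, 40, ... below n/3.
    h = 1
    while h < n // 3:
        h = 3 * h + 1
    while h >= 1:
        # Gapped insertion sort: sort elements h apart from each other.
        for i in range(h, n):
            key = arr[i]
            j = i
            while j >= h and arr[j - h] > key:
                arr[j] = arr[j - h]  # shift larger element one gap right
                j -= h
            arr[j] = key
        h //= 3  # shrink the interval for the next pass
    return arr
```

With n = 8 the initial interval is 4, so the first pass compares exactly the widely spaced pairs listed above before a final pass with interval 1 finishes the sort.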
Merge Sort
Merge sort is a sorting algorithm that follows the divide and conquer approach. It
works by recursively dividing the input array into smaller sub arrays, sorting those
sub arrays, and then merging them back together to obtain the sorted array.
In simple terms, the process of merge sort is to divide the array into two
halves, sort each half, and then merge the sorted halves back together. This process is
repeated until the entire array is sorted.
▪ Divide: Divide the list or array recursively into two halves until it can be
divided no further.
▪ Conquer: Each sub array is sorted individually using the merge sort algorithm.
▪ Merge: The sorted sub arrays are merged back together in sorted order. The
process continues until all elements from both sub arrays have been merged.
Merge sort is one of the most optimized and most used sorting algorithms in the
industry but there are more sorting algorithms that are more optimized in different
cases.
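The divide, conquer, and merge steps above can be sketched as:

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list (stable)."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps equal elements in order (stability)
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One of the lists is exhausted; append the remainder of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(arr):
    """Divide the array in half, sort each half recursively, then merge."""
    if len(arr) <= 1:
        return arr                # base case: already sorted
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)
```

The `<=` comparison in `merge` is what makes the sort stable, which is the property cited below as the reason merge sort variants are preferred for non-primitive types.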
The recurrence relation of merge sort is T(n) = 2T(n/2) + O(n), where:
▪ T(n) represents the total time taken by the algorithm to sort an array of size n.
▪ 2T(n/2) represents the time taken by the algorithm to recursively sort the two
halves of the array. Since each half has n/2 elements, there are two recursive
calls with input size n/2.
▪ O(n) represents the time taken to merge the two sorted halves.
Complexity Analysis of Merge Sort
• Time Complexity:
▪ Best Case: O(n log n), When the array is already sorted or nearly sorted.
▪ Average Case: O(n log n), When the array is randomly ordered.
▪ Worst Case: O(n log n), When the array is sorted in reverse order.
• Auxiliary Space: O(n), Additional space is required for the temporary array used
during merging.
Applications of Merge Sort
• Sorting large datasets
• External sorting (when the dataset is too large to fit in memory)
• Inversion counting
• Merge Sort and its variations are used in library methods of programming
languages.
For example, its variation Tim Sort is used in Python, Java, Android, and Swift.
The main reason merge sort is preferred for sorting non-primitive types is its
stability, which Quick Sort lacks.
• It is a preferred algorithm for sorting Linked lists.
• It can be easily parallelized as we can independently sort sub arrays and then
merge.
Internal sorting plays a crucial role in the efficient operation of computer systems and
databases.
• Data Retrieval Efficiency - When data is sorted, searching for specific information
becomes faster and more efficient. This is crucial in applications like search
engines and database management systems.
• Resource Optimization - Internal sorting reduces the need for frequent data
transfer between primary and secondary storage, which helps conserve resources
and enhance the overall system performance.
• User Experience - In user-facing applications, such as e-commerce websites,
internal sorting ensures that search results are presented quickly and in a user-
friendly manner, enhancing the user experience.
• Memory Constraints - Sorting large datasets in primary memory can be
challenging if memory resources are limited.
• Algorithm Selection - Choosing the right sorting algorithm is critical. The selection
depends on the data size, distribution, and the desired efficiency.
• Stability and preservation - Some sorting algorithms alter the relative order of
equal elements, which matters in applications where stability (preservation of
the original order) is required.
File Organization
• There are two important features of file:
• File Organization
• Three types of organizing the file
• Sequential access file organization
• Advantages of sequential file
• Disadvantages of sequential file
• Direct access files organization
• Advantages of direct access file organization
• Disadvantages of direct access file organization
• Indexed sequential access file organization
• Advantages of Indexed sequential access file organization
• Disadvantages of Indexed sequential access file organization
Advantages of direct access file organization
• Direct access files help in online transaction processing systems (OLTP) like an
online railway reservation system.
• In direct access file, sorting of the records are not required.
• It accesses the desired records immediately.
• It updates several files quickly.
• It has better control over record allocation.
Advantages of Indexed sequential access file organization
• In indexed sequential access file, sequential file and random file access is
possible.
• It accesses the records very fast if the index table is properly organized.
• The records can be inserted in the middle of the file.
• It provides quick access for sequential and direct processing.
• It reduces the degree of the sequential search.
Disadvantages of Indexed sequential access file organization
• Indexed sequential access file requires unique keys and periodic reorganization.
• Indexed sequential access file takes longer time to search the index for the data
access or retrieval.
• It requires more storage space.
• It is expensive because it requires special software.
• It is less efficient in the use of storage space as compared to other file
organizations.
================
Indexing
• Attributes of Indexing
• Two types of file organization
• Sequential File Organization
• Dense Index
Attributes of Indexing
• Access Types: This refers to the type of access such as value-based search, range
access, etc.
• Access Time: It refers to the time needed to find a particular data element or set of
elements.
• Insertion Time: It refers to the time taken to find the appropriate space and insert
new data.
• Deletion Time: Time taken to find an item and delete it as well as update the index
structure.
• Sequential File Organization or Ordered Index File - In this, the indices are based on
a sorted ordering of the values. These are generally fast and a more traditional type
of storing mechanism. These Ordered or Sequential file organizations might store
the data in a dense or sparse format.
• Dense Index - For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with
that search key value.
Sparse Index
An index record appears only for some of the items in the data file. Each index
record points to a block of records.
To locate a record, we find the index record with the largest search key value less
than or equal to the search key value we are looking for.
The search starts at the record pointed to by that index record and proceeds
sequentially along the pointers in the file until the desired record is found.
Indices are based on the values being distributed uniformly across a range of
buckets. The buckets to which a value is assigned are determined by a function called a
hash function. There are primarily three methods of indexing
▪ Clustered Indexing - When more than two records are stored in the same file this
type of storing is known as cluster indexing. By using cluster indexing we can
reduce the cost of searching reason being multiple records related to the same thing
are stored in one place and it also gives the frequent joining of more than two tables
(records).
The clustering index is defined on an ordered data file, where the data file is
ordered on a non-key field. In some cases, the index is created on non-primary-key
columns, which may not be unique for each record. In such cases, in order to
identify the records faster, two or more columns are grouped together to obtain
a unique value.
▪ Students studying each semester, for example, are grouped together. First-semester
students, second-semester students, third-semester students, and so on are
categorized.
▪ Primary Indexing - This is a type of Clustered Indexing wherein the data is sorted
according to the search key and the primary key of the database table is used to
create the index. It is a default format of indexing where it induces sequential file
organization. As primary keys are unique and are stored in a sorted manner, the
performance of the searching operation is quite efficient.
▪ Non-clustered or Secondary Indexing - Example: the contents page of a
book. Each entry gives us the page number or location of the information
stored. The actual data on each page of the book is not reorganized, but
there is an ordered reference (the contents page) to where the data
actually lies. Only dense ordering is possible in a non-clustered index;
sparse ordering is not, because the data is not physically organized
accordingly. It requires more time than a clustered index because some
extra work is done to extract the data by further following the pointer.
In the case of a clustered index, data is directly present in front of
the index.
▪ Multilevel Indexing - With the growth of the size of the database, indices also
grow. Since the index is stored in main memory, a single-level index might become
too large to hold there, requiring multiple disk accesses. Multilevel indexing
segregates the main block into various smaller blocks so that each can be stored
in a single block. The outer blocks are divided into inner blocks, which in turn
point to the data blocks. These can be easily stored in main memory with fewer
overheads.
Advantages of Indexing
• Improved Query Performance - Indexing enables faster data retrieval from the
database. By generating an index on a column, the database can rapidly discover
rows that match a specific value or collection of values, minimizing the amount of
time it takes to perform a query.
• Efficient Data Access - Indexing can enhance data access efficiency by lowering the
amount of disk I/O required to retrieve data. The database can maintain the data
pages for frequently visited columns in memory by generating an index on those
columns, decreasing the requirement to read from disk.
• Optimized Data Sorting - Indexing can also improve the performance of sorting
operations. By creating an index on the columns used for sorting, the database can
avoid sorting the entire table and instead sort only the relevant rows.
• Consistent Data Performance - Indexing helps ensure that the database
performs consistently even as the amount of data in the database rises. Without
indexing, queries may take longer to run as the number of rows in the table grows,
while indexing maintains a roughly consistent speed.
• By ensuring that only unique values are inserted into columns that have been
indexed as unique, indexing can also be utilized to ensure the integrity of data. This
avoids storing duplicate data in the database, which might lead to issues when
performing queries or reports.
Disadvantages of Indexing
• Indexing necessitates more storage space to hold the index data structure, which
might increase the total size of the database.
• Indexing can reduce insert and update performance since the index data structure
must be updated each time data is modified.
Features of Indexing
• The development of data structures, such as B-Trees or hash table, that provide
quick access to certain data items is known as indexing. The data structures
themselves are built on the values of the indexed columns, which are utilized to
quickly find the data objects.
• The columns most important to index are selected based on how
frequently they are used and the sorts of queries they are subjected to.
The cardinality, selectivity, and uniqueness of the indexed columns can also be
considered.
• There are several different index types used by databases, including primary,
secondary, clustered, and non-clustered indexes. Based on the particular needs of
the database system, each form of index offers benefits and drawbacks.
• For the database system to function at its best, periodic index maintenance is
required. According to changes in the data and usage patterns, maintenance work
involves building, updating, and removing indexes.
• When non-contiguous data blocks are stored in an index, it can result in index
fragmentation, which makes the index less effective. Regular index maintenance,
such as defragmentation and reorganization, can decrease fragmentation.
Summary
• Indexing is a very useful technique that helps in optimizing the search time
in database queries.
• The table of database indexing consists of a search key and pointer.
• There are four types of indexing: Primary, Secondary, Clustering, and Multivalued
Indexing.