Fundamentals of Data Structures
A data structure is a specialized format for arranging and storing data so that a user can
easily access and work with the appropriate data to run a program efficiently. Organizing data
in computer memory, logically or mathematically, is what a data structure does. In general,
selecting a particular format depends on two factors. The structure must be rich enough to
represent the real-world relationships among the data items. On the other hand, the structure
should be simple enough that the data can be processed easily when it needs to be used.
Characteristics of data structures:
i. Linear: A linear data structure arranges its data items in sequential form, like an
array.
ii. Non-Linear: A non-linear data structure holds data items that are not arranged in
sequential form, such as a tree or a graph.
iii. Static: A static data structure has a size, structure, and associated memory
locations that are fixed at compile time. Example: Array.
iv. Homogeneous: In a homogeneous data structure, all elements are of the same data
type. Example: Array.
v. Non-Homogeneous: In a non-homogeneous data structure, the elements may or
may not be of the same data type.
vi. Dynamic: A dynamic data structure can shrink and expand as data items are added
or removed at run time, during the program's execution. Its memory utilization can
also change at run time. Example: Linked list.
vii. It has some rules that define how the data items relate to each other.
viii. It defines some rules that describe the relationships between data items and how
they interact with each other.
ix. It supports operations performed on data items, such as insertion, searching,
deletion, etc.
x. It helps in reducing the usage of memory resources.
xi. Time Complexity: The execution time of operations on a data structure should be
as small as possible.
xii. Space Complexity: The memory used by all data items in a data structure should
be as little as possible.
Basic Operations of Data Structures
Some basic operations are performed on the data items stored in any data structure.
i. Traversing: It is used to visit each element once. It is also referred to as visiting
the elements of the data structure.
ii. Searching: It is used to find the location of a given element in the whole data
structure, for example, in an array.
iii. Insertion: It is used to insert an element at a specified position among the data
elements.
iv. Deletion: A deletion operation is used to delete an element from a specified
location.
v. Sorting: It is used to arrange the data elements in ascending or descending
order. The data items are arranged in some logical order, such as by name,
account number, etc.
vi. Merging: The merge operation is used to join two or more sorted sets of data
elements into a single data structure.
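As a quick sketch, the operations above can be demonstrated with a plain Python list (the variable names here are illustrative, not part of any standard API):

```python
# A minimal sketch of the basic operations, using a Python list.
data = [40, 10, 30]

# Traversing: visit each element once.
for item in data:
    print(item)

# Searching: find the location of a given element.
position = data.index(30)        # index of 30 in [40, 10, 30] is 2

# Insertion: insert an element at a specified position.
data.insert(1, 20)               # data becomes [40, 20, 10, 30]

# Deletion: delete an element from a specified location.
data.pop(0)                      # data becomes [20, 10, 30]

# Sorting: arrange the elements in ascending order.
data.sort()                      # data becomes [10, 20, 30]

# Merging: combine two sorted sequences into one sorted list.
merged = sorted(data + [15, 25]) # [10, 15, 20, 25, 30]
```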
Need for data structures
Data is a basic fact or entity of information that is used in calculation or manipulation. Two
types of data are used in data structures: numerical data and alphanumeric data. The data
structure specifies the nature of the data items undergoing some function. Integers and
floating-point numbers are examples of numeric data types. The data can be a single value or a
set of values organized in a particular fashion. Organizing the data in memory creates logical
relationships between data items, and this makes it necessary to use data structures.
Advantages of Data Structures
There are some advantages of data structure:
1. Efficiency: The efficiency and organization of a program depend on selecting the
right data structures. Suppose we want to search for a particular item in a collection of
data records. If the data is organized linearly, as in an array, we can perform a
sequential search, element by element. This works, but it is time-consuming because in
the worst case we must examine every element. Hence, we may choose a data structure
that makes the search process more efficient, such as a binary search tree or a hash table.
2. Reusability: Data structures make programs reusable. For example, once we write a
program that implements a particular data structure, we can use it from any other
location or program to get the same results.
3. Abstraction: A data structure is specified by an ADT (abstract data type), which provides
different levels of abstraction. The client interacts with the data structure only through its interface.
4. The data structure helps to simplify the processing of data collected by software
systems.
5. It is used to store collections of data on a computer so that they can be used by various
programs.
Disadvantages of data structures
1. Only a user who has deep knowledge of the functionality of a data structure can safely
make changes to it.
2. If there is an error in the data structure, only an expert can detect the bug; an ordinary
user cannot solve the problem and fix it on their own.
The common data structures and their uses are as follows:
Array: An array is a collection of elements of the same data type stored in contiguous
memory locations. It is a fixed-size collection of data elements, and its size cannot be
changed at the runtime of a program. It is mostly used in computer programs to organize data
so that related items or values can be easily searched or sorted.
Linked List: It is a collection of data links called nodes, where each node contains a data value
and the address of the next link. The elements of a linked list are not stored in neighbouring
memory locations. In simple words, it is a sequence of data nodes connected through
links. Each node of a list consists of two items: a data part, where we store values, and a
pointer that indicates where the next node can be found. The starting point of the linked list is
called the head of the list, and the last node is called the tail.
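The node structure described above can be sketched in Python as follows (the class and method names are illustrative assumptions, not a standard library API):

```python
# A minimal singly linked list: each node stores a value and a
# reference to the next node; the head marks the start of the list.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next is not None:  # walk to the tail
            current = current.next
        current.next = node

    def to_list(self):
        values, current = [], self.head
        while current is not None:
            values.append(current.value)
            current = current.next
        return values

lst = LinkedList()
for v in (1, 2, 3):
    lst.append(v)
print(lst.to_list())  # [1, 2, 3]
```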
Stack: It is a linear data structure that enables the elements to be inserted and deleted from one
end, called the Top of Stack (TOS). A stack data structure follows the last in first out (LIFO)
operation to insert and remove an element from the stack list. It is an abstract data type with two
possible operations for inserting and deleting elements: push and pop. A push operation
inserts an element at the top of the list, hiding any elements already in the list, or initializes
the stack if it is empty. A pop operation deletes a data item from the top of the list.
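A minimal sketch of this LIFO behaviour, using a Python list as the stack:

```python
# Push and pop on a stack, sketched with a Python list.
stack = []
stack.append(10)   # push: 10 is on top
stack.append(20)   # push: 20 now hides 10
top = stack.pop()  # pop: removes and returns the top element, 20
```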
Queue: It is a linear data structure in which insertion takes place at one end of the list, known
as the rear, and deletion occurs at the other end of the list, called the front. It is
a sequential collection of data elements based on the First in First out (FIFO) data structure,
which means the first inserted elements in a queue will be removed first from the queue list.
These queue operations are described below:
1. Enqueue(): It is a queue operation used to insert an element into the list.
2. Dequeue(): It is a queue operation used to delete an item from the list.
3. Peek(): It is used to get the first element of the queue list without removing it.
4. IsFull(): It is an IsFull operation that indicates whether the list is full.
5. IsEmpty(): It is an IsEmpty operation that represents whether the list is empty.
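A sketch of these five operations, using Python's collections.deque as the underlying list (the helper function names mirror the list above and are illustrative assumptions):

```python
from collections import deque

# FIFO queue operations sketched over collections.deque.
queue = deque()

def enqueue(q, item):
    q.append(item)        # insert at the rear

def dequeue(q):
    return q.popleft()    # delete from the front

def peek(q):
    return q[0]           # front element, without removing it

def is_empty(q):
    return len(q) == 0

def is_full(q):
    # Only meaningful for a bounded queue created with deque(maxlen=n).
    return q.maxlen is not None and len(q) == q.maxlen

enqueue(queue, 'a')
enqueue(queue, 'b')
front = peek(queue)     # 'a'
first = dequeue(queue)  # 'a': first in, first out
```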
Graphs: A graph is a non-linear data structure consisting of finite sets of vertices (nodes)
and edges that create an illustrated representation of a set of objects. An edge connects any
two nodes in the graph. The connections can form a directed or undirected graph. In a
directed graph, each edge connects two nodes in only one direction. In an undirected graph,
each edge connects two nodes in both directions; hence it is also known as a bidirectional
graph.
Trees: A tree is a type of non-linear data structure representing hierarchical data. It is a finite
set of elements in which one node is called the root node, and the remaining elements form
subtrees. Every node of the tree maintains a parent-child relationship: a node can have
multiple child nodes but only a single parent node.
There are some types of trees, such as a binary tree, binary search tree, expression trees, AVL
tree and B trees.
Heap: A heap is a special type of complete binary tree that satisfies the heap property and
arranges its elements in a specific order. Max heap and min heap are the two types of heap
data structures. In a max heap, the root node's value is always greater than or equal to the
values of all existing child nodes in the heap tree. In a min heap, the root node's value is
always smaller than or equal to the values of the existing elements of the heap; each child
node's value in the min-heap is equal to or larger than its parent node's value.
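Python's heapq module maintains a min heap over a plain list; a short sketch:

```python
import heapq

# heapq keeps the smallest element at the root (index 0) of the list.
heap = []
for value in (5, 1, 9, 3):
    heapq.heappush(heap, value)

smallest = heap[0]  # the root of a min heap is the minimum
order = [heapq.heappop(heap) for _ in range(len(heap))]
print(order)  # values come out in ascending order: [1, 3, 5, 9]
```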
Hash Table: It is a data structure that holds and organizes data in key-value pairs so that
particular data elements can be accessed by their keys. A key is a non-null value that is
mapped or linked to an element. Hashing makes our data structure much simpler and faster
when performing insertion and search operations on various data elements, regardless of the
data's size.
Dictionary: A dictionary is a data structure that holds data elements as a group of objects
and is similar to a hash table, except that it is an ordered or unordered collection of data
elements in key-value pairs. Each key is associated with a single value. When we look up a
key, the dictionary returns the value associated with that key.
For example, students = {'James': 25, 'Jasmine': 17, 'Rosy': 19, 'Margret': 24, 'Price': 28}
Hash Table
Hash table is one of the most important data structures that uses a special function known as a
hash function that maps a given value with a key to access the elements faster.
A hash table is a data structure that stores information with two main components: a key and
a value. The hash table can be implemented with the help of an associative array. The
efficiency of the mapping depends upon the efficiency of the hash function used.
For example, suppose the key is John and the value is his phone number. We pass the key to
the hash function as shown below:
Hash(key)= index;
When we pass the key to the hash function, it gives the index.
Hash(john) = 3;
The above example stores john at index 3.
Drawback of Hash function
A hash function should assign each value a unique key. Sometimes a hash table uses an
imperfect hash function that causes a collision: the hash function generates the same index
for two different keys.
Hashing
Hashing is a searching technique that takes constant time. The time complexity of hashing
is O(1). So far, we have seen two techniques for searching, i.e., linear search and binary
search. The worst-case time complexity is O(n) in linear search and O(log n) in binary
search. In both searching techniques, the search time depends upon the number of elements,
but we want a technique that takes constant time. Hashing provides this constant time.
In Hashing technique, the hash table and hash function are used. Using the hash function, we can
calculate the address at which the value can be stored.
The main idea behind the hashing is to create the (key/value) pairs. If the key is given, then the
algorithm computes the index at which the value would be stored. It can be written as:
Index = hash(key)
There are three ways of calculating the hash function:
1. Division method
2. Folding method
3. Mid square method
In the division method, the hash function can be defined as:
h(ki) = ki % m;
where m is the size of the hash table.
For example, if the key value is 6 and the size of the hash table is 10. When we apply the hash
function to key 6 then the index would be:
h(6) = 6%10 = 6
The index is 6 at which the value is stored.
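The division method above can be written directly in Python:

```python
# Division-method hash: h(k) = k % m, with table size m = 10
# as in the example above.
m = 10

def h(k):
    return k % m

index = h(6)  # 6 % 10 = 6
```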
Collision
When two different keys map to the same index, the conflict between the two
values is known as a collision. In the above example, the value is stored at index 6. If the key
value is 26, then the index would be:
h(26) = 26%10 = 6
Therefore, two values are stored at the same index, i.e., 6, and this leads to the collision problem.
To resolve these collisions, we have some techniques known as collision resolution techniques.
The following are the collision resolution techniques:
1. Open Hashing: It is also known as closed addressing.
2. Closed Hashing: It is also known as open addressing.
Open Hashing
In Open Hashing, one of the methods used to resolve the collision is known as a chaining
method.
Let's first understand the chaining to resolve the collision.
Suppose we have a list of key values
A = 3, 2, 9, 6, 11, 13, 7, 12, where m = 10 and h(k) = (2k+3) % m
In this case, we cannot directly use the division method h(k) = k % m, because the given
hash function is h(k) = (2k+3) % m.
The index of key value 3 is:
index = h(3) = (2(3)+3)%10 = 9
The value 3 would be stored at the index 9.
The index of key value 2 is:
index = h(2) = (2(2)+3)%10 = 7
The value 2 would be stored at the index 7.
The index of key value 9 is:
index = h(9) = (2(9)+3)%10 = 1
The value 9 would be stored at the index 1.
The index of key value 6 is:
index = h(6) = (2(6)+3)%10 = 5
The value 6 would be stored at the index 5.
The index of key value 11 is:
index = h(11) = (2(11)+3)%10 = 5
The value 11 would be stored at the index 5. Now, we have two values (6, 11) stored at the same
index, i.e., 5. This leads to the collision problem, so we will use the chaining method to avoid the
collision. We will create one more list and add the value 11 to this list. After the creation of the
new list, the newly created list will be linked to the list having value 6.
The index of key value 13 is:
index = h(13) = (2(13)+3)%10 = 9
The value 13 would be stored at index 9. Now, we have two values (3, 13) stored at the same
index, i.e., 9. This leads to the collision problem, so we will use the chaining method to avoid the
collision. We will create one more list and add the value 13 to this list. After the creation of the
new list, the newly created list will be linked to the list having value 3.
The index of key value 7 is:
index = h(7) = (2(7)+3)%10 = 7
The value 7 would be stored at index 7. Now, we have two values (2, 7) stored at the same index,
i.e., 7. This leads to the collision problem, so we will use the chaining method to avoid the
collision. We will create one more list and add the value 7 to this list. After the creation of the
new list, the newly created list will be linked to the list having value 2.
The index of key value 12 is:
index = h(12) = (2(12)+3)%10 = 7
According to the above calculation, the value 12 must be stored at index 7, but the value 2 exists
at index 7. So, we will create a new list and add 12 to the list. The newly created list will be
linked to the list having a value 7.
The calculated index value associated with each key value is shown in the below table:
key    Location (u)
3      ((2*3)+3)%10 = 9
2      ((2*2)+3)%10 = 7
9      ((2*9)+3)%10 = 1
6      ((2*6)+3)%10 = 5
11     ((2*11)+3)%10 = 5
13     ((2*13)+3)%10 = 9
7      ((2*7)+3)%10 = 7
12     ((2*12)+3)%10 = 7
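The chaining walkthrough above can be sketched in Python; for simplicity, each chain is a Python list rather than a separate linked list:

```python
# Open hashing (chaining): m = 10, h(k) = (2k + 3) % m.
# Colliding keys are appended to the list stored in the same slot.
m = 10

def h(k):
    return (2 * k + 3) % m

table = [[] for _ in range(m)]
for key in (3, 2, 9, 6, 11, 13, 7, 12):
    table[h(key)].append(key)

print(table[5])  # [6, 11]  - both keys hashed to index 5
print(table[9])  # [3, 13]
print(table[7])  # [2, 7, 12]
```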
Closed Hashing
In Closed hashing, three techniques are used to resolve the collision:
1. Linear probing
2. Quadratic probing
3. Double Hashing technique
Linear Probing
Linear probing is one of the forms of open addressing. Each cell in the hash table contains a
key-value pair, so when a collision occurs because a new key maps to a cell already occupied
by another key, the linear probing technique searches for the closest free location and adds
the new key to that empty cell. In this case, searching is performed sequentially, starting
from the position where the collision occurs, until an empty cell is found.
Let's understand the linear probing through an example.
Consider the above example for the linear probing:
A = 3, 2, 9, 6, 11, 13, 7, 12, where m = 10 and h(k) = (2k+3) % m
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5 respectively. The calculated index
value of 11 is 5 which is already occupied by another key value, i.e., 6. When linear probing is
applied, the nearest empty cell to the index 5 is 6; therefore, the value 11 will be added at the
index 6.
The next key value is 13. The index value associated with this key value is 9 when hash function
is applied. The cell is already filled at index 9. When linear probing is applied, the nearest empty
cell to the index 9 is 0; therefore, the value 13 will be added at the index 0.
The next key value is 7. The index value associated with the key value is 7 when hash function is
applied. The cell is already filled at index 7. When linear probing is applied, the nearest empty
cell to the index 7 is 8; therefore, the value 7 will be added at the index 8.
The next key value is 12. The index value associated with the key value is 7 when hash function
is applied. The cell is already filled at index 7. When linear probing is applied, the nearest empty
cell to the index 7 is 2; therefore, the value 12 will be added at the index 2.
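The same walkthrough can be sketched as code; on a collision, the probe moves to the next cell, wrapping around the table:

```python
# Linear probing: m = 10, h(k) = (2k + 3) % m.
m = 10

def h(k):
    return (2 * k + 3) % m

table = [None] * m
for key in (3, 2, 9, 6, 11, 13, 7, 12):
    index = h(key)
    while table[index] is not None:  # occupied: try the next cell
        index = (index + 1) % m
    table[index] = key

print(table)  # [13, 9, 12, None, None, 6, 11, 2, 7, 3]
```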
Quadratic Probing
In linear probing, searching is performed linearly. In contrast, quadratic probing is an
open addressing technique that uses a quadratic polynomial to search until an empty slot is
found.
It can also be defined as follows: insert key k at the first free location from (u + i²) % m,
where i = 0 to m-1 and u = h(k).
Let's understand the quadratic probing through an example.
Consider the same example which we discussed in the linear probing.
A = 3, 2, 9, 6, 11, 13, 7, 12, where m = 10 and h(k) = (2k+3) % m
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5, respectively. We do not need to
apply the quadratic probing technique on these key values as there is no occurrence of the
collision.
The index value of 11 is 5, but this location is already occupied by the 6. So, we apply the
quadratic probing technique.
When i = 0
Index = (5+0²)%10 = 5
When i = 1
Index = (5+1²)%10 = 6
Since location 6 is empty, the value 11 will be added at index 6.
The next element is 13. When the hash function is applied on 13, then the index value comes out
to be 9, which we already discussed in the chaining method. At index 9, the cell is occupied by
another value, i.e., 3. So, we will apply the quadratic probing technique to calculate the free
location.
When i = 0
Index = (9+0²)%10 = 9
When i = 1
Index = (9+1²)%10 = 0
Since location 0 is empty, the value 13 will be added at index 0.
The next element is 7. When the hash function is applied to 7, the index value comes out to
be 7, as we already discussed in the chaining method. At index 7, the cell is occupied by
another value, i.e., 2. So, we will apply the quadratic probing technique to calculate a free
location.
When i = 0
Index = (7+0²)%10 = 7
When i = 1
Index = (7+1²)%10 = 8
Since location 8 is empty, the value 7 will be added at index 8.
The next element is 12. When the hash function is applied on 12, then the index value comes out
to be 7. When we observe the hash table then we will get to know that the cell at index 7 is
already occupied by the value 2. So, we apply the Quadratic probing technique on 12 to
determine the free location.
When i = 0
Index = (7+0²)%10 = 7
When i = 1
Index = (7+1²)%10 = 8
When i = 2
Index = (7+2²)%10 = 1
When i = 3
Index = (7+3²)%10 = 6
When i = 4
Index = (7+4²)%10 = 3
Since location 3 is empty, the value 12 will be stored at index 3.
The final hash table would be:
Therefore, the order of the elements (from index 0 to 9, with _ denoting an empty cell) is
13, 9, _, 12, _, 6, 11, 2, 7, 3.
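The quadratic probing walkthrough can be sketched as follows; the probe sequence for a key is (u + i²) % m for i = 0, 1, 2, …:

```python
# Quadratic probing: m = 10, h(k) = (2k + 3) % m.
m = 10

def h(k):
    return (2 * k + 3) % m

table = [None] * m
for key in (3, 2, 9, 6, 11, 13, 7, 12):
    u = h(key)
    for i in range(m):
        index = (u + i * i) % m
        if table[index] is None:  # first free probe position
            table[index] = key
            break

print(table)  # [13, 9, None, 12, None, 6, 11, 2, 7, 3]
```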
Double Hashing
Double hashing is an open addressing technique used to resolve collisions. When a
collision occurs, this technique uses a secondary hash of the key. It uses one hash value as
an index and moves forward until an empty location is found.
In double hashing, two hash functions are used. Suppose h1(k) is the hash function used
to calculate the initial location, and h2(k) is another hash function. It can be defined as:
insert key k at the first free place from (u + v*i) % m, where i = 0 to m-1. In this case, u is
the location computed using the first hash function, and v is equal to h2(k) % m.
Consider the same example that we use in quadratic probing.
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and
h1(k) = 2k+3
h2(k) = 3k+1
key    Location (u)          v                     probes
3      ((2*3)+3)%10 = 9      -                     1
2      ((2*2)+3)%10 = 7      -                     1
9      ((2*9)+3)%10 = 1      -                     1
6      ((2*6)+3)%10 = 5      -                     1
11     ((2*11)+3)%10 = 5     ((3*11)+1)%10 = 4     3
13     ((2*13)+3)%10 = 9     ((3*13)+1)%10 = 0     -
7      ((2*7)+3)%10 = 7      ((3*7)+1)%10 = 2      -
12     ((2*12)+3)%10 = 7     ((3*12)+1)%10 = 7     2
No collision occurs while inserting the keys (3, 2, 9, 6), so we do not apply double hashing
to these key values.
On inserting the key 11 into the hash table, a collision occurs because the calculated index
value of 11 is 5, which is already occupied by another value. Therefore, we apply the double
hashing technique to key 11. When the key value is 11, the value of v is 4.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (5+4*0)%10 =5
When i=1
Index = (5+4*1)%10 = 9
When i=2
Index = (5+4*2)%10 = 3
Since location 3 in the hash table is empty, the key 11 is added at index 3.
The next element is 13. The calculated index value of 13 is 9, which is already occupied by
another key value. So, we use the double hashing technique to find a free location. The value
of v is 0.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (9+0*0)%10 = 9
We will get the value 9 in every iteration from 0 to m-1 because the value of v is zero.
Therefore, we cannot insert 13 into the hash table.
The next element is 7. The calculated index value of 7 is 7, which is already occupied by
another key value. So, we use the double hashing technique to find a free location. The value
of v is 2.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (7 + 2*0)%10 = 7
When i=1
Index = (7+2*1)%10 = 9
When i=2
Index = (7+2*2)%10 = 1
When i=3
Index = (7+2*3)%10 = 3
When i=4
Index = (7+2*4)%10 = 5
When i=5
Index = (7+2*5)%10 = 7
When i=6
Index = (7+2*6)%10 = 9
When i=7
Index = (7+2*7)%10 = 1
When i=8
Index = (7+2*8)%10 = 3
When i=9
Index = (7+2*9)%10 = 5
We have checked all the cases of i (from 0 to 9) but found no suitable place to insert 7.
Therefore, key 7 cannot be inserted into the hash table.
The next element is 12. The calculated index value of 12 is 7, which is already occupied by
another key value. So, we use the double hashing technique to find a free location. The value
of v is 7.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (7+7*0)%10 = 7
When i=1
Index = (7+7*1)%10 = 4
Since location 4 is empty, the key 12 is inserted at index 4.
The final hash table would be:
The order of the elements (from index 0 to 9, with _ denoting an empty cell) is
_, 9, _, 11, 12, 6, _, 2, _, 3.
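The double hashing walkthrough can be sketched as code; note that keys 13 and 7 cannot be placed, exactly as derived above:

```python
# Double hashing: m = 10, h1(k) = (2k + 3) % m, v = (3k + 1) % m.
# Probe sequence: (u + v*i) % m for i = 0 .. m-1.
m = 10

def h1(k):
    return (2 * k + 3) % m

def h2(k):
    return (3 * k + 1) % m

table = [None] * m
failed = []
for key in (3, 2, 9, 6, 11, 13, 7, 12):
    u, v = h1(key), h2(key) % m
    for i in range(m):
        index = (u + v * i) % m
        if table[index] is None:
            table[index] = key
            break
    else:
        failed.append(key)  # no free probe position was found

print(table)   # [None, 9, None, 11, 12, 6, None, 2, None, 3]
print(failed)  # [13, 7]
```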