Indexing in DBMS
o Indexing is used to optimize the performance of a database by minimizing the
number of disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a
database table quickly.
Index structure:
Indexes can be created using some database columns.
o The first column of the database is the search key that contains a copy of the
primary key or candidate key of the table. The values of the primary key are stored
in sorted order so that the corresponding data can be accessed easily.
o The second column of the database is the data reference. It contains a set of
pointers holding the address of the disk block where the value of the particular key
can be found.
Advantages of Indexing
Important pros/ advantage of Indexing are:
It helps you to reduce the total number of I/O operations needed to
retrieve that data, so you don't need to access a row in the database
from an index structure.
Offers Faster search and retrieval of data to users.
Indexing also helps you to reduce tablespace as you don't need to link
to a row in a table, as there is no need to store the ROWID in the Index.
Thus you will able to reduce the tablespace.
You can't sort data in the lead nodes as the value of the primary key
classifies it.
Disadvantages of Indexing
Important drawbacks/cons of Indexing are:
To perform the indexing database management system, you need a
primary key on the table with a unique value.
You can't perform any other indexes on the Indexed data.
You are not allowed to partition an index-organized table.
SQL Indexing Decrease performance in INSERT, DELETE, and
UPDATE query.
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are
known as ordered indices.
Example: Suppose we have an employee table with thousands of record and each of which
is 10 bytes long. If their IDs start with 1, 2, 3....and so on and we have to search student
with ID-543.
o In the case of a database with no index, we have to search the disk block from
starting till it reaches 543. The DBMS will read the record after reading
543*10=5430 bytes.
o In the case of an index, we will search using indexes and the DBMS will read the
record after reading 542*2= 1084 bytes which are very less compared to the
previous case.
Primary Index
o If the index is created on the basis of the primary key of the table, then it is known
as primary indexing. These primary keys are unique to each record and contain 1:1
relation between the records.
o As primary keys are stored in sorted order, the performance of the searching
operation is quite efficient.
o The primary index can be classified into two types: Dense index and Sparse index.
Dense Index
In dense index, there is an index record for every search key value in the database.
This makes searching faster but requires more space to store index records itself.
In this, the number of records in the index table is same as the number of records in the
main table. It needs more space to store index record itself. The index records have the
search key and a pointer to the actual record on the disk.
Sparse Index
In sparse index, index records are not created for every search key. An index record
here contains a search key and an actual pointer to the data on the disk. To search a
record, we first proceed by index record and reach at the actual location of the data. If
the data we are looking for is not where we directly reach by following the index, then
the system starts sequential search until the desired data is found.
In the data file, index record appears only for a few items. Each item points to a block.
In this, instead of pointing to each record in the main table, the index points to the records
in the main table in a gap.
Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored
on the disk along with the actual database files. As the size of the database grows, so
does the size of the indices. There is an immense need to keep the index records in
the main memory so as to speed up the search operations. If single-level index is used,
then a large size index cannot be kept in memory which leads to multiple disk
accesses.
Multi-level Index helps in breaking down the index into several smaller indices in order
to make the outermost level so small that it can be saved in a single disk block, which
can easily be accommodated anywhere in the main memory.
Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is
created on non-primary key columns which may not be unique for each record.
o In this case, to identify the record faster, we will group two or more columns to get
the unique value and create index out of them. This method is called a clustering
index.
o The records which have similar characteristics are grouped, and indexes are created
for these group.
Example: suppose a company contains several employees in each department. Suppose we
use a clustering index, where all employees which belong to the same Dept_ID are
considered within a single cluster, and index pointers point to the cluster as a whole. Here
Dept_Id is a non-unique key.
The previous schema is little confusing because one disk block is shared by records which
belong to the different cluster. If we use separate disk block for separate clusters, then it is
called better technique.
Secondary Index
In the sparse indexing, as the size of the table grows, the size of mapping also grows.
These mappings are usually kept in the primary memory so that address fetch should be
faster. Then the secondary memory searches the actual data based on the address got from
mapping. If the mapping size grows then fetching the address itself becomes slower. In this
case, the sparse index will not be efficient. To overcome this problem, secondary indexing is
introduced.
In secondary indexing, to reduce the size of mapping, another level of indexing is
introduced. In this method, the huge range for the columns is selected initially so that the
mapping size of the first level becomes small. Then each range is further divided into
smaller ranges. The mapping of the first level is stored in the primary memory, so that
address fetch is faster. The mapping of the second level and actual data are stored in the
secondary memory (hard disk).
For example:
o If you want to find the record of roll 111 in the diagram, then it will search the
highest entry which is smaller than or equal to 111 in the first level index. It will get
100 at this level.
o Then in the second index level, again it does max (111) <= 111 and gets 110. Now
using the address 110, it goes to the data block and starts searching each record till
it gets 111.
o This is how a search is performed in this method. Inserting, updating or deleting is
also done in the same manner.
Short note on B+ Tree:
o The B+ tree is a balanced binary search tree. It follows a multi-level index format.
o In the B+ tree, leaf nodes denote actual data pointers. B+ tree ensures that all leaf
nodes remain at the same height.
o In the B+ tree, the leaf nodes are linked using a link list. Therefore, a B+ tree can
support random access as well as sequential access.
Structure of B+ Tree
o In the B+ tree, every leaf node is at equal distance from the root node. The B+ tree
is of the order n where n is fixed for every B+ tree.
o It contains an internal node and leaf node.
Internal node
o An internal node of the B+ tree can contain at least n/2 record pointers except the
root node.
o At most, an internal node of the tree contains n pointers.
Leaf node
o The leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key
values.
o At most, a leaf node contains n record pointer and n key values.
o Every leaf node of the B+ tree contains one block pointer P to point to next leaf
node.
Searching a record in B+ Tree
Suppose we have to search 55 in the below B+ tree structure. First, we will fetch for the
intermediary node which will direct to the leaf node that can contain a record for 55.
So, in the intermediary node, we will find a branch between 50 and 75 nodes. Then at the
end, we will be redirected to the third leaf node. Here DBMS will perform a sequential
search to find 55.
B+ Tree Insertion
Suppose we want to insert a record 60 in the below structure. It will go to the 3rd leaf node
after 55. It is a balanced tree, and a leaf node of this tree is already full, so we cannot
insert 60 there.
In this case, we have to split the leaf node, so that it can be inserted into tree without
affecting the fill factor, balance and order.
The 3rd leaf node has the values (50, 55, 60, 65, 70) and its current root node is 50. We will
split the leaf node of the tree in the middle so that its balance is not altered. So we can
group (50, 55) and (60, 65, 70) into 2 leaf nodes.
If these two has to be leaf nodes, the intermediate node cannot branch from 50. It should
have 60 added to it, and then we can have pointers to a new leaf node.
This is how we can insert an entry when there is overflow. In a normal scenario, it is very
easy to find the node where it fits and then place it in that leaf node.
Summary:
Indexing is a small table which is consist of two columns.
Two main types of indexing methods are 1)Primary Indexing 2)
Secondary Indexing.
Primary Index is an ordered file which is fixed length size with two fields.
The primary Indexing is also further divided into two types 1)Dense
Index 2)Sparse Index.
In a dense index, a record is created for every search key valued in the
database.
A sparse indexing method helps you to resolve the issues of dense
Indexing.
The secondary Index is an indexing method whose search key specifies
an order different from the sequential order of the file.
Clustering index is defined as an order data file.
Multilevel Indexing is created when a primary index does not fit in
memory.
The biggest benefit of Indexing is that it helps you to reduce the total
number of I/O operations needed to retrieve that data.
The biggest drawback to performing the indexing database
management system, you need a primary key on the table with a unique
value.
https://www.exploredatabase.com/2016/11/find-blocking-factor-and-number-
of-blocks-need-to-store-table-in-dbms.html
https://myexamnote.com/how-to-solve-blocking-factor-questions-in-dbms/
https://www.youtube.com/watch?v=TrTSTU_J3ys