Indexing in DBMS
o Indexing is used to optimize the performance of a database by minimizing the number
of disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a
database table quickly.
Index structure:
Indexes can be created using some database columns.
o The first column of the database is the search key that contains a copy of the primary
key or candidate key of the table. The values of the primary key are stored in sorted
order so that the corresponding data can be accessed easily.
o The second column of the database is the data reference. It contains a set of pointers
holding the address of the disk block where the value of the particular key can be
found.
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are
known as ordered indices.
Example: Suppose we have an employee table with thousands of records and each of which
is 10 bytes long. If their IDs start with 1, 2, 3....and so on and we have to search student with
ID-543.
o In the case of a database with no index, we have to search the disk block from starting
till it reaches 543. The DBMS will read the record after reading 543*10=5430 bytes.
o In the case of an index, we will search using indexes and the DBMS will read the
record after reading 542*2= 1084 bytes which are very less compared to the previous
case.
Primary Index
o If the index is created on the basis of the primary key of the table, then it is known as
primary indexing. These primary keys are unique to each record and contain 1:1
relation between the records.
o As primary keys are stored in sorted order, the performance of the searching operation
is quite efficient.
o The primary index can be classified into two types: Dense index and Sparse index.
Dense index
o The dense index contains an index record for every search key value in the data file. It
makes searching faster.
o In this, the number of records in the index table is same as the number of records in
the main table.
o It needs more space to store index record itself. The index records have the search key
and a pointer to the actual record on the disk.
Sparse index
o In the data file, index record appears only for a few items. Each item points to a block.
o In this, instead of pointing to each record in the main table, the index points to the
records in the main table in a gap.
Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is
created on non-primary key columns which may not be unique for each record.
o In this case, to identify the record faster, we will group two or more columns to get
the unique value and create index out of them. This method is called a clustering
index.
o The records which have similar characteristics are grouped, and indexes are created
for these group.
Example: suppose a company contains several employees in each department. Suppose we
use a clustering index, where all employees which belong to the same Dept_ID are
considered within a single cluster, and index pointers point to the cluster as a whole. Here
Dept_Id is a non-unique key.
The previous schema is little confusing because one disk block is shared by records which
belong to the different cluster. If we use separate disk block for separate clusters, then it is
called better technique.
Secondary Index
In the sparse indexing, as the size of the table grows, the size of mapping also grows. These
mappings are usually kept in the primary memory so that address fetch should be faster. Then
the secondary memory searches the actual data based on the address got from mapping. If the
mapping size grows then fetching the address itself becomes slower. In this case, the sparse
index will not be efficient. To overcome this problem, secondary indexing is introduced.
In secondary indexing, to reduce the size of mapping, another level of indexing is introduced.
In this method, the huge range for the columns is selected initially so that the mapping size of
the first level becomes small. Then each range is further divided into smaller ranges. The
mapping of the first level is stored in the primary memory, so that address fetch is faster. The
mapping of the second level and actual data are stored in the secondary memory (hard disk).
For example:
o If you want to find the record of roll 111 in the diagram, then it will search the highest
entry which is smaller than or equal to 111 in the first level index. It will get 100 at
this level.
o Then in the second index level, again it does max (111) <= 111 and gets 110. Now
using the address 110, it goes to the data block and starts searching each record till it
gets 111.
o This is how a search is performed in this method. Inserting, updating or deleting is
also done in the same manner.
B Tree:
Indexing is the first thing that comes to mind when we consider a database's performance. In
this article, we'll learn about B-trees and how database indexing using B-tree in DBMS can
improve the databases' performance.
Insertion operation
The insertion operation for a B Tree is done similar to the Binary Search Tree but the elements
are inserted into the same node until the maximum keys are reached. The insertion is done
using the following procedure −
Step 1 − Calculate the maximum (m−1)(m−1) and, minimum (⌈m2⌉−1)(⌈m2⌉−1) number of
keys a node can hold, where m is denoted by the order of the B Tree.
Step 2 − The data is inserted into the tree using the binary search insertion and once the keys
reach the maximum number, the node is split into half and the median key becomes the internal
node while the left and right keys become its children.
Step 3 − All the leaf nodes must be on the same level.
The keys, 5, 3, 21, 9, 13 are all added into the node according to the binary search property but
if we add the key 22, it will violate the maximum key property. Hence, the node is split in half,
the median key is shifted to the parent node and the insertion is then continued.
Another hiccup occurs during the insertion of 11, so the node is split and median is shifted to
the parent.
While inserting 16, even if the node is split in two parts, the parent node also overflows as it
reached the maximum keys. Hence, the parent node is split first and the median key becomes
the root. Then, the leaf node is split in half the median of leaf node is shifted to its parent.
The final B tree after inserting all the elements is achieved.