An In-Depth Look at Database Indexing
                  Most developers who work with relational databases have used indexes, or at
                  least heard of them; indexing is a very common concept in the database world.
                  The most important part, however, is understanding what to index & how the
                  index is going to improve query response time. To do that, you need to
                  understand how you are going to query your database tables. A proper index can
                  be created only when you know exactly what your queries & data access patterns
                  look like.
                  We will use MySQL with its default InnoDB storage engine, although the concepts
                  explained in this article are more or less the same in other database servers
                  such as Oracle & MSSQL.
                  The Engine column in the above screenshot shows the storage engine used for
                  the table. Here, InnoDB is used.
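A minimal sketch of a table definition consistent with the rest of this article. The table name index_demo and the four column names appear in the text; the column types and sizes are assumptions made for illustration, not the original definition.

```sql
-- Assumed schema: names come from the article, types and sizes are illustrative guesses.
CREATE TABLE index_demo (
    name     VARCHAR(20) NOT NULL,
    age      INT,
    pan_no   VARCHAR(20),
    phone_no VARCHAR(20)
) ENGINE = InnoDB;

-- The Engine column of this output reports the storage engine of the table.
SHOW TABLE STATUS WHERE Name = 'index_demo';
```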
                  Now Insert some random data in the table, my table with 5 rows looks like the
                  following:
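As a sketch, five rows could be inserted like this. Only the kousik row is taken from the article (it is shown later in the composite index section); the other four rows are made-up sample data.

```sql
-- The first row appears in the article; the remaining rows are hypothetical.
INSERT INTO index_demo (name, age, pan_no, phone_no) VALUES
    ('kousik', 28, 'HJKXS9086W', '9090909090'),
    ('anita',  34, 'ABCPD1234E', '9010000001'),
    ('rahul',  41, 'QWERT5678Y', '9015000002'),
    ('meera',  25, 'ZXCVB4321K', '9019000003'),
    ('sanjay', 52, 'LMNOP8765R', '9030000004');

-- Show the table contents.
SELECT * FROM index_demo;
```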
                  I have not created any index on this table so far. Let’s verify this with the
                  command SHOW INDEX. It returns 0 results.
                  At this moment, if we run a simple SELECT query, since there is no user defined
                  index, the query will scan the whole table to find out the result:
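The two checks described above can be sketched as follows. The phone number in the WHERE clause is only an example value (the one from the sample record shown later in the article).

```sql
-- Lists the indexes defined on the table; with no user-defined index this is an empty set.
SHOW INDEX FROM index_demo;

-- EXPLAIN reports the execution plan instead of running the query.
EXPLAIN SELECT * FROM index_demo WHERE phone_no = '9090909090';
```

Without an index, the plan should report a full table scan, with the rows estimate close to the number of rows in the table.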
                  EXPLAIN shows how the query engine plans to execute the query. In the above
                  screenshot, you can see that the rows column returns 5 & possible_keys
                  returns NULL. possible_keys lists the available indices that could be used
                  for this query. The key column shows which index, out of all the possible
                  indices, is actually going to be used for this query.
                  Primary Key:
                  The above query is very inefficient. Let’s optimise it. We will make the
                  phone_no column a PRIMARY KEY, assuming that no two users can exist in our
                  system with the same phone number. Take the following into consideration
                  when creating a primary key:
                       The ideal primary key type should be a number like INT or BIGINT
                       because integer comparisons are faster, so traversing through the index
                       will be very fast.
                  Since we don’t have any primary key defined as of now, let’s see what InnoDB
                  has created for us by default:
                  EXTENDED shows all the indices that are not usable by the user but are managed
                  completely by MySQL.
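A sketch of the command discussed above, assuming MySQL 8.0 or later, where SHOW INDEX accepts the EXTENDED keyword:

```sql
-- Also lists the hidden, engine-managed index entries alongside user-visible ones.
SHOW EXTENDED INDEX FROM index_demo;
```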
                  Here we see that MySQL has defined a composite index (we will discuss
                  composite indices later) on DB_ROW_ID , DB_TRX_ID , DB_ROLL_PTR , & all
                  columns defined in the table. In the absence of a user defined primary key, this
                  index is used to find records uniquely.
                  Let’s now create the primary index on phone_no & examine the created index:
                  Note that CREATE INDEX cannot be used to create a primary index; ALTER TABLE
                  is used instead.
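The step described above can be sketched like this, assuming phone_no holds no NULLs or duplicates:

```sql
-- Make phone_no the primary key; InnoDB uses it as the clustered index.
ALTER TABLE index_demo ADD PRIMARY KEY (phone_no);

-- Inspect the result; the new index is listed with Key_name = PRIMARY.
SHOW INDEX FROM index_demo;
```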
                  In the above screenshot, we see that one primary index is created on the
                  column phone_no. The columns of the output are described as follows:
                  Non_unique: 1 if the index can contain duplicates, 0 if the index is unique.
                  Key_name: The name of the index. The name of the primary index is always
                  PRIMARY in MySQL, irrespective of whether you provided an index name while
                  creating the index.
                  Null: YES if the column may contain NULL values, blank if it may not.
                  Index_type: Indicates which indexing data structure is used for this index.
                  Some possible values are BTREE, HASH, RTREE, or FULLTEXT.
                  Comment: Information about the index not described in its own column.
                  Index_comment: The comment for the index specified when you created the
                  index with the COMMENT attribute.
                  Now let’s see if this index reduces the number of rows which will be searched
                  for a given phone_no in the WHERE clause of a query.
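A sketch of such a check, again using an example phone number rather than the exact value from the original screenshot:

```sql
-- With phone_no as the primary key, the plan should use the PRIMARY index.
EXPLAIN SELECT * FROM index_demo WHERE phone_no = '9090909090';
```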
                  In this snapshot, notice that the rows column has returned only 1, and that
                  possible_keys & key both return PRIMARY. This essentially means that, using
                  the primary index named PRIMARY (the name is auto-assigned when you create
                  the primary key), the query optimizer goes directly to the record & fetches
                  it. It’s very efficient. This is exactly what an index is for: to minimize
                  the search scope at the cost of extra space.
                  Clustered Index:
                  A clustered index is collocated with the data in the same table space or same
                  disk file. You can consider that a clustered index is a B-Tree index whose leaf
                  nodes are the actual data blocks on disk, since the index & data reside together.
                  This kind of index physically organizes the data on disk as per the logical order
                  of the index key.
                       The big yellow rectangle represents a disk block / data block.
                       The blue rectangles represent data stored as rows inside that block.
                       The footer area represents the index of the block, where the small red
                       rectangles reside in sorted order of a particular key. These small
                       blocks are essentially pointers to the offsets of the records.
                  Records are stored on the disk block in arbitrary order. Whenever new
                  records are added, they are placed in the next available space. Whenever an
                  existing record is updated, the storage engine decides whether that record
                  can still fit into the same position or whether a new position has to be
                  allocated for it.
                  In this way, you don’t need to care about physically organizing the records
                  in a certain order; only a small index section is maintained in that order,
                  & fetching or maintaining records becomes very easy.
SELECT * FROM index_demo WHERE phone_no > '9010000000' AND phone_no < '9020000000'
                  A data block is fetched into memory when the query is executed. Say the data
                  block contains phone_no values in the range from 9010000000 to 9030000000. So
                  whatever range you requested in the query is just a subset of the data
                  present in the block. If you now fire another query to get all the phone
                  numbers in the range, say, from 9015000000 to 9019000000, you don’t need to
                  fetch any more blocks from the disk. The complete data can be found in the
                  current block, so the clustered index reduces the number of disk I/O
                  operations by collocating related data as much as possible in the same data
                  block. This reduced disk I/O improves performance.
                  So if you have a well-thought-out primary key & your queries are based on
                  the primary key, the performance will be super fast.
                       When you define a PRIMARY KEY on your table, InnoDB uses it as the
                       clustered index. Define a primary key for each table that you create. If
                       there is no logical unique and non-null column or set of columns, add a
                       new auto-increment column, whose values are filled in automatically.
                       If you do not define a PRIMARY KEY for your table, MySQL locates the
                       first UNIQUE index where all the key columns are NOT NULL and InnoDB
                       uses it as the clustered index.
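The advice about adding an auto-increment column can be sketched as follows. The table page_visit_demo and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table with no natural unique, non-null column,
-- so a surrogate auto-increment key is added and used as the primary key.
CREATE TABLE page_visit_demo (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    visitor    VARCHAR(50),
    visited_at DATETIME,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- id is omitted from the column list and is filled in automatically.
INSERT INTO page_visit_demo (visitor, visited_at) VALUES ('kousik', NOW());
```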
                  In short, the MySQL InnoDB engine manages the primary index as a clustered
                  index to improve performance, so the primary key & the actual record on disk
                  are clustered together.
                  In the following diagram, the left side rectangles represent leaf level index
                  blocks, and the right side rectangles represent the data blocks. Logically the
                  data blocks look to be aligned in a sorted order, but as already described earlier,
                  the actual physical locations may be scattered here & there.
  Is it possible to create a primary index on a non-primary key?
  In MySQL, a primary index is automatically created, and we have already
  described above how MySQL chooses the primary index. But in the database
  world, it’s actually not necessary to create an index on the primary key column
  — the primary index can be created on any non primary key column as well. But
  when created on the primary key, all key entries are unique in the index, while in
  the other case, the primary index may have a duplicated key as well.
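A primary key can also be removed again. A minimal sketch, assuming the index_demo table from this article:

```sql
-- Drops the primary key of the table.
ALTER TABLE index_demo DROP PRIMARY KEY;
```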
- If the primary key does not exist, you get the following error:
"ERROR 1091 (42000): Can't DROP 'PRIMARY'; check that column/key exists"
  Secondary Index:
  Any index other than a clustered index is called a secondary index. Unlike the
  clustered index, secondary indices do not affect the physical storage location of rows.
  To build intuition, you can think of the secondary index as holding a reference to
  the primary key’s address, although that is not literally the case: in InnoDB, each
  secondary index entry stores the primary key value itself. Retrieving data through a
  secondary index therefore means traversing two B+ trees: first the secondary index
  B+ tree, then the primary index B+ tree.
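A sketch of a secondary index and a lookup through it. The name secondary_idx_1 and the indexed column name are the ones this article refers to later; the value in the WHERE clause is an example.

```sql
-- Secondary index on the name column.
CREATE INDEX secondary_idx_1 ON index_demo (name);

-- A lookup by name searches secondary_idx_1 first, obtains the matching
-- primary key value(s), then searches the clustered index for the full rows.
EXPLAIN SELECT * FROM index_demo WHERE name = 'kousik';
```

On a very small table the optimizer may still prefer a full scan, so the plan reported by EXPLAIN can vary.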
  Also, if a primary key is very large like a URL , since secondary indexes contain a
  copy of the primary key column value, it can be inefficient in terms of storage.
  More secondary keys means a greater number of duplicate copies of the
  primary key column value, so more storage in case of a large primary key. Also
  the primary key itself stores the keys, so the combined effect on storage will be
  very high.
  Deleting the primary key is expensive when several secondary indexes exist, because
  every secondary index stores the primary key value & has to be rebuilt. Also, other
  tables may have a foreign key reference to the primary key, so you need to
  delete those foreign key references before you delete the primary key.
  Unlike some other database servers (SQL Server, for example), MySQL allows a unique
  key column to hold any number of NULL values. In the SQL standard, NULL means an
  undefined value. So if MySQL were to allow only one NULL value in a unique key
  column, it would have to assume that all NULL values are the same.
  But logically this is not correct, since NULL means undefined & undefined values
  can’t be compared with each other; that is the nature of NULL. As MySQL can’t assert
  that all NULLs mean the same thing, it allows multiple NULL values in the column.
The following command shows how to create a unique key index in MySQL:
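A sketch of such a command, followed by a check of the NULL behaviour described above. The index name unique_idx_1 and the inserted rows are illustrative, and the example assumes pan_no is nullable and currently holds no duplicate values.

```sql
-- Unique index on pan_no (the index name is illustrative).
CREATE UNIQUE INDEX unique_idx_1 ON index_demo (pan_no);

-- Both inserts succeed: NULLs are not treated as equal for uniqueness checks.
INSERT INTO index_demo (name, age, pan_no, phone_no)
VALUES ('user_a', 40, NULL, '9000000001');
INSERT INTO index_demo (name, age, pan_no, phone_no)
VALUES ('user_b', 41, NULL, '9000000002');
```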
  Composite Index:
  MySQL lets you define indices on multiple columns, up to 16 columns. This
  index is called a Multi-column / Composite / Compound index.
  Let’s say we have an index defined on 4 columns: col1, col2, col3, col4.
  With a composite index, we have search capability on col1, (col1, col2),
  (col1, col2, col3), & (col1, col2, col3, col4). So we can use any left side
  prefix of the indexed columns, but we can’t omit a column from the middle or the
  start & still use the whole index, as in (col1, col3), (col1, col2, col4), col3, or
  col4. For these combinations, the index helps at most up to the last unbroken
  left side prefix (col1 for (col1, col3), nothing at all for col3 or col4 alone).
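The left side prefix rule can be explored with a sketch like the following. The table composite_demo, the payload column, and the index name are hypothetical and only mirror col1 to col4 from the text.

```sql
-- Hypothetical table and 4-column composite index.
CREATE TABLE composite_demo (
    col1 INT, col2 INT, col3 INT, col4 INT,
    payload VARCHAR(50)
) ENGINE = InnoDB;
CREATE INDEX idx_c1_c2_c3_c4 ON composite_demo (col1, col2, col3, col4);

-- Conditions that cover a left side prefix can use the index for lookups.
EXPLAIN SELECT * FROM composite_demo WHERE col1 = 1;
EXPLAIN SELECT * FROM composite_demo WHERE col1 = 1 AND col2 = 2;
EXPLAIN SELECT * FROM composite_demo WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

-- col2 is skipped: only the col1 part of the index can narrow the search.
EXPLAIN SELECT * FROM composite_demo WHERE col1 = 1 AND col3 = 3;

-- col1 is missing: the index cannot be used to look up matching rows.
EXPLAIN SELECT * FROM composite_demo WHERE col3 = 3;
```

Comparing the key and key_len columns across these plans is one way to see how much of the index each query uses.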
  If you have queries containing a WHERE clause on multiple columns, make sure the
  conditions cover a left side prefix of the composite index’s columns; the index
  will then benefit that query. (The order in which the conditions are written
  inside the WHERE clause does not matter to the optimizer.) In fact, while deciding
  the columns for a composite index, you can analyze the different use cases of your
  system & try to come up with the order of columns that will benefit most of them.
  Composite indices can help you in JOIN & SELECT queries as well. Example: in
  the following SELECT * query, composite_index_2 is used.
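A sketch of such a query. It assumes composite_index_2 is the composite index on (pan_no, name, age) mentioned in the covering index section; the original definition is not shown in the text, so the CREATE INDEX statement here is a reconstruction.

```sql
-- Assumed definition of composite_index_2 (skip this if the index already exists).
CREATE INDEX composite_index_2 ON index_demo (pan_no, name, age);

-- The WHERE clause covers the left side prefix (pan_no, name) of the index.
EXPLAIN SELECT * FROM index_demo
WHERE pan_no = 'HJKXS9086W' AND name = 'kousik';
```

If other indexes on pan_no exist, the optimizer may pick one of those instead, so the key column of the plan depends on what else is defined on the table.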
  When several indexes are defined, the MySQL query optimizer chooses the index
  that eliminates the greatest number of rows, i.e. scans as few rows as possible,
  for better efficiency.
  MySQL maintains something called index statistics, which helps it infer what the
  data in the system looks like. Index statistics are only a generalization, but
  based on this metadata, MySQL decides which index is appropriate for the
  current query.
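One way to look at these statistics, sketched with standard MySQL statements:

```sql
-- Recompute the index statistics for the table.
ANALYZE TABLE index_demo;

-- The Cardinality column is an estimate of the number of distinct values per index.
SHOW INDEX FROM index_demo;
```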
  In our example, for the following record, a composite index key is formed by
  concatenating pan_no , name , age — HJKXS9086Wkousik28 .
     +--------+------+------------+------------+
     | name   | age  | pan_no     | phone_no   |
     +--------+------+------------+------------+
     | kousik |   28 | HJKXS9086W | 9090909090 |
     +--------+------+------------+------------+
       If you are creating an index on col1 & a composite index on (col1, col2),
       then the composite index alone should be enough. Queries on col1 alone can be
       served by the composite index, since col1 is a left side prefix of that index.
  Covering Index:
  A covering index is a special kind of composite index where all the columns
  specified in the query exist somewhere in the index. So the query optimizer
  does not need to read the table rows to get the data; it gets the result from
  the index itself. Example: we have already defined a composite index on
  (pan_no, name, age), so now consider the following query:
SELECT age FROM index_demo WHERE pan_no = 'HJKXS9086W' AND name = 'kousik'
  The columns mentioned in the SELECT & WHERE clauses are part of the
  composite index. So in this case, we can actually get the value of the age
  column from the composite index itself. Let’s see what the EXPLAIN command
  shows for this query:
EXPLAIN FORMAT=JSON SELECT age FROM index_demo WHERE pan_no = 'HJKXS9086W' AND name = '111kousik1';
  In the above response, note that there is a key, using_index, which is set to
  true. This signifies that the covering index has been used to answer the query.
  Partial Index:
  We already know that indices speed up our queries at the cost of space. The
  more indices you have, the greater the storage requirement. We have already
  created an index called secondary_idx_1 on the column name. The column
  name can contain large values of any length. Also, in the index, the row locators’
  or row pointers’ metadata have their own size. So overall, an index can impose a
  high storage & memory load.
  In MySQL, it’s possible to create an index on just the first few bytes of the data.
  Example: an index can be created on the first 4 bytes of name.
  Though this method reduces memory overhead by a certain amount, the index
  can’t eliminate many rows, since in this example the first 4 bytes may be
  common across many names. Usually this kind of prefix indexing is supported on
  CHAR, VARCHAR, BINARY, & VARBINARY columns.
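A sketch of such a prefix index. The index name name_prefix_idx is illustrative. Note that for non-binary string columns such as VARCHAR, MySQL counts the prefix length in characters rather than bytes.

```sql
-- Index only the first 4 characters of name.
CREATE INDEX name_prefix_idx ON index_demo (name(4));

-- Rough selectivity check: the closer the ratio is to 1,
-- the more rows a 4-character prefix can eliminate.
SELECT COUNT(DISTINCT LEFT(name, 4)) / COUNT(*) AS prefix_selectivity
FROM index_demo;
```

Running the selectivity query with a few different prefix lengths is a simple way to choose a length that balances index size against usefulness.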
  There are many other indices as well like Spatial index and Full Text Search
  index offered by MySQL. I have not yet experimented with those indices, so I’m
  not discussing them in this post.
       With DML operations, indices are updated, so write operations are quite
       costly with indexes. The more indices you have, the greater the cost.
       Indexes are used to make read operations faster. So if you have a system
       that is write heavy but not read heavy, think hard about whether you
       need an index or not.
       Indices might need some maintenance as well if old data still remains in
       the index. They need to be deleted otherwise memory will be hogged, so
       try to have a monitoring plan for your indices.
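A starting point for such monitoring might look like the sketch below. It assumes the sys schema and the Performance Schema are available (they are by default in recent MySQL versions), and the choice of what to drop or rebuild is up to you.

```sql
-- Indexes on index_demo with no recorded use since the server started.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_name = 'index_demo';

-- Rebuilds the table and its indexes, reclaiming space left behind by deleted rows.
OPTIMIZE TABLE index_demo;
```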
  Please do clap & share with your friends & on social media if you like this
  article. :)
             Kousik Nath
             Engineer @ PayPal, loves to have deep discussion on distributed and scalable systems, system
             architecture, design patterns, algorithmic problem solving. Linkedin:
             https://www.linkedin.com/in/kousikn/