KEMBAR78
Chapter - 4 - Data Warehouse Indexes | PDF | Database Index | Data Warehouse
0% found this document useful (0 votes)
40 views11 pages

Chapter - 4 - Data Warehouse Indexes

Chapter 4 discusses the importance of indexes in data warehouses for optimizing query performance, highlighting their role in improving query response time, data load time, and data quality. It details different types of indexes, including clustered, non-clustered, and bitmap indexes, along with performance metrics and effective indexing tips. The chapter emphasizes that bitmap indexes are particularly beneficial for low-cardinality columns in analytical workloads, while B-tree indexes are suited for high-cardinality data.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
40 views11 pages

Chapter - 4 - Data Warehouse Indexes

Chapter 4 discusses the importance of indexes in data warehouses for optimizing query performance, highlighting their role in improving query response time, data load time, and data quality. It details different types of indexes, including clustered, non-clustered, and bitmap indexes, along with performance metrics and effective indexing tips. The chapter emphasizes that bitmap indexes are particularly beneficial for low-cardinality columns in analytical workloads, while B-tree indexes are suited for high-cardinality data.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Chapter 4

Data Warehouse
Indexes
Amol D. Vibhute (PhD)
Assistant Professor

Email ID:- amol.vibhute@sicsr.ac.in


Roadmap of Chapter:
• Introduction
• Data Warehouse Indexes and their Performance

Friday, February 21, 2025 Dr. Amol 2


Introduction:
• In data warehouses, indexes are crucial for optimizing query performance, and their effectiveness is measured using
metrics like query response time, data load time, and data quality.
• What are Indexes in Data Warehouses?
– Indexes are data structures that store the values of a specific column or a combination of columns, along with pointers
to the rows that contain them.
– They allow the database to quickly locate and retrieve specific data rows, significantly speeding up query execution.
– In data warehouses, which are designed for analytical workloads, efficient data retrieval is vital.
• Why are Indexes Important in Data Warehouses?
– Improved Query Performance: Indexes drastically reduce the time it takes to execute queries, especially those that
involve large datasets.
– Faster Data Retrieval: They enable the database to quickly locate and retrieve specific data rows, which is
– essential for analytical tasks.
– Efficient Analytical Workloads: Data warehouses are used for intensive analyses of large datasets, and indexes are
crucial for making these analyses efficient.
Friday, February 21, 2025 Dr. Amol 3
Cont.…
• Performance Metrics to Measure Index Effectiveness:
– Query Response Time: This is the time it takes for a query to return results, a key indicator of system
performance.
– Data Load Time: How quickly data can be loaded into the warehouse is important for real-time analysis.
– Data Quality: Indexes can impact data integrity, so metrics like accuracy, completeness, and consistency
should be monitored.
– User Satisfaction: If queries are slow, users may become frustrated, so measuring user satisfaction is
important.
– Resource Usage: Monitor CPU, memory, and disk I/O usage to ensure efficient resource utilization.

Friday, February 21, 2025 Dr. Amol 4


Cont.…
• Tips for Effective Indexing:
– Identify Frequently Queried Columns: Focus on indexing columns that are frequently used in WHERE
clauses or JOIN conditions.
– Use Composite Indexes: Create composite indexes for columns that are often used together in queries.
– Use Bitmap Indexes for Data Warehousing: Bitmap indexes are particularly well-suited for data
warehousing environments.
– Regularly Review and Optimize Indexes: As data and query patterns change, periodically review and
optimize indexes to maintain performance.
– Consider Materialized Views: For complex queries, consider using materialized views to store pre-
computed results, which can be indexed for faster retrieval.

Friday, February 21, 2025 Dr. Amol 5


Types of Indexes:
• Clustered Index:
– Concept: Determines the physical order of data in a table, making it ideal for queries that frequently retrieve data in a specific
order.
– Example: In a customer table, if you frequently display customers sorted alphabetically by last name, a clustered index on th e
LastName column would optimize these queries.
– Benefits:
Improved Query Performance: Column-based storage and query processing can result in up to 10 times the query performance compared to traditional
row-oriented storage.
– Data Compression: Columnstore indexes offer high levels of data compression, reducing storage space requirements.
– Efficient for Analytical Queries: Well-suited for queries involving aggregations, joins, and other analytical operations.

• Non-Clustered Index:
– Stores an index structure separate from the data, allowing multiple indexes on a table
– Example: In a sales transaction table, you might have a non-clustered index on the
ProductID column to quickly find sales data for a specific product.
– Benefits:
– Faster Data Retrieval: Improve the speed of queries that filter or sort data based on indexed columns.
– Flexibility: Can be created on multiple columns and can be used for various
query types.

Friday, February 21, 2025 Dr. Amol 6


Bitmap Index:
• Using Bitmap Indexes in Data Warehouses:
– Bitmap indexes are widely used in data warehousing environments. The environments typically have large amounts of data and ad
hoc queries, but a low level of concurrent DML transactions. For such applications, bitmap indexing provides:
• Reduced response time for large classes of ad hoc queries.
• Reduced storage requirements compared to other indexing techniques.
• Dramatic performance gains even on hardware with a relatively small number of CPUs or a small amount of memory.
• Efficient maintenance during parallel DML and loads.
– Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of space because the indexes can
be several times larger than the data in the table. Bitmap indexes are typically only a fraction of the size of the indexed d ata in the
table.
– An index provides pointers to the rows in a table that contain a given key value. A regular index stores a list of row -ids for each key
corresponding to the rows with that key value. In a bitmap index, a bitmap for each key value replaces a list of row -ids.
– Each bit in the bitmap corresponds to a possible row-id, and if the bit is set, it means that the row with the corresponding row-id
contains the key value. A mapping function converts the bit position to an actual row-id, so that the bitmap index provides the same
functionality as a regular index. Bitmap indexes store the bitmaps in a compressed way. If the number of distinct key values is
small, bitmap indexes compress better and the space saving benefit compared to a B-tree index becomes even better.
– Benefits for Data Warehousing Applications:
• Bitmap indexes are primarily intended for data warehousing applications where users query the data rather than update it. The y are not suitable
for OLTP applications with large numbers of concurrent transactions modifying the data.
• Parallel query and parallel DML work with bitmap indexes. Bitmap indexing also supports parallel create indexes and concatenated indexes.
• Bitmap indexes are required to take advantage of Oracle's star transformation capabilities.
– A Bitmap Index is a special type of index used in databases and data warehouses to speed up queries on low-cardinality columns (columns with a
few distinct values, like "Gender" or "Marital Status").

Friday, February 21, 2025 Dr. Amol 7


Cont.…
• Cardinality:
– The advantages of using bitmap indexes are greatest for columns in which the ratio of the number of distinct values
to the number of rows in the table is small. We refer to this ratio as the degree of cardinality. A gender column, which
has only two distinct values (male and female), is optimal for a bitmap index. However, data warehouse
administrators also build bitmap indexes on columns with higher cardinalities.

– For example, on a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index.
A bitmap index on this column can outperform a B-tree index, particularly when this column is often queried in
conjunction with other indexed columns. In fact, in a typical data warehouse environments, a bitmap index can be
considered for any non-unique column.

– B-tree indexes are most effective for high-cardinality data: that is, for data with many possible values, such as
customer_name or phone_number. In a data warehouse, B-tree indexes should be used only for unique columns or
other columns with very high cardinalities (that is, columns that are almost unique). The majority of indexes in a data
warehouse should be bitmap indexes.

– In ad hoc queries and similar situations, bitmap indexes can dramatically improve query performance. AND and OR
conditions in the WHERE clause of a query can be resolved quickly by performing the corresponding Boolean
operations directly on the bitmaps before converting the resulting bitmap to rowids. If the resulting number of rows is
small, the query can be answered quickly without resorting to a full table scan.

Friday, February 21, 2025 Dr. Amol 8


Cont.…
• Example: Bitmap Index

– The following shows a portion of a company's customers table.

– SELECT cust_id, cust_gender, cust_marital_status, cust_income_level


– FROM customers;

– Because cust_gender, cust_marital_status, and cust_income_level are


all low-cardinality columns (there are only three possible values for
marital status and region, two possible values for gender, and 12 for
income level), bitmap indexes are ideal for these columns. Do not create
a bitmap index on cust_id because this is a unique column. Instead, a
unique B-tree index on this column provides the most efficient
representation and retrieval.
– Table llustrates the bitmap index for the cust_gender column in this
example. It consists of two separate bitmaps, one for gender.

– Each entry (or bit) in the bitmap corresponds to a single row of the
customers table. The value of each bit depends upon the values of the
corresponding row in the table. For example, the bitmap cust_gender='F'
contains a one as its first bit because the gender is F in the first row of
the customers table. The bitmap cust_gender='F' has a zero for its third
bit because the gender of the third row is not F.
Friday, February 21, 2025 Dr. Amol 9
Cont.…
• Using B-Tree Indexes in Data Warehouses:
– A B-tree index is organized like an upside-down tree. The bottom level of the index holds the actual data values and
pointers to the corresponding rows, much as the index in a book has a page number associated with each index
entry.
– In general, use B-tree indexes when you know that your typical query refers to the indexed column and retrieves a
few rows. In these queries, it is faster to find the rows by looking at the index. However, using the book index
analogy, if you plan to look at every single topic in a book, you might not want to look in the index for the topic and
then look up the page. It might be faster to read through every chapter in the book. Similarly, if you are retrieving
most of the rows in a table, it might not make sense to look up the index to find the table rows. Instead, you might
want to read or scan the table.
– B-tree indexes are most commonly used in a data warehouse to index unique or near-unique keys. In many cases, it
may not be necessary to index these columns in a data warehouse, because unique constraints can be maintained
without an index, and because typical data warehouse queries may not work better with such indexes. B-tree indexes
are more common in environments using third normal form schemas. In general, bitmap indexes should be more
common than B-tree indexes in most data warehouse environments.
– Multi-level tree structure
– Breaks data into pages or blocks
– Should be used for high-cardinality (unique) coloumns

Friday, February 21, 2025 Dr. Amol 10


Thank You !!!

Friday, February 21, 2025 Dr. Amol 11

You might also like