Data Warehousing and
Data Mining
Data Warehouse Design
Physical Design
Bitmap Indexing
Contain a large number of data
organised in a table structure, i.e. in
the form of rows and columns
Read Write Tables (High volume use
of Insert, Update, Delete statements)
Read Only Tables Static Data
Aggregate Tables
Dimension Tables
Fact Tables
Indexes help to quickly locate pieces of
information about a topic which are
scattered around a number of
resources. For e.g, the index of a book
tells what pages of the book refer to a
concept listed in the indexes.
Indexes are created in Databases/
Datawarehouses for efficient
identification of the requested rows
the act of classifying and providing an index
in order to make items easier to retrieve
B Tree (Binary Tree) Indexing
Hash Table
Bitmap Indexing
Normally used for OLTP applications
(many read-write operations)
B-trees are always height balanced, with all leaf
nodes at the same level
Update and search operations affect only a few disk
pages, so performance is good.
B-trees keep related records on the same disk page,
which takes advantage of locality of reference.
B-trees guarantee that every node in the tree will be
full at least to a certain minimum percentage. This
improves space efficiency while reducing the typical
number of disk fetches necessary during a search or
update operation over many thousands of records.
Contains a fact table, plus a number
of dimension tables whose keys are
foreign keys of the fact table.
How would you efficiently answer the
query: Find the total sales of
raincoats for each location last year.
You may assume suitable indexes.
Number of relations can be large
join size estimation is difficult.
Dimension tables are often small,
especially when considering only the
tuples matching a select operation.
Often data can be assumed to be
static (a snapshot of the operational
data). This means that we have the
option to precompute.
Suppose there are only 4 different
locations in our previous example.
Then we may represent the locations of N
fact table tuples using only 2N bits.
However, a (clustered) index on location
seems to require at least N log N bits.
For each possible value
(of location) and each
tuple, store a bit that is 1
if the tuple contains the
given value.
Store these bits ordered
by column.
In the context of this
example, the bitmap
index is a join index that
can be used when
computing joins.
Suitable for columns having a very small
number of unique values (very low
cardinality)
Normally used where the cardinality is <
0.05%
create bitmap index gender_state on
person (sex)
Create a set of bitmap indexes from
data
Quantize data into buckets
Table with no insert, update , delete
or very few DML statements is a
good choice for bitmap indexes.
For example
Bitmaps can be combined using the
logical operations AND, OR , NOT.
A AND B
A OR B
NOT A
Note: Some RDBMSs like Oracle also implements a MINUS
operation internally
A MINUS B is equivalent to A AND NOT B
Implement one or two bitmap
compression techniques like WordAligned Hybrid (WAH) and Byte-Aligned
Bitmap Code (BBC)
Compare storage requirements with and
without compression for various data
sets (focus on varying cardinality, i.e.
number of buckets per attribute)
Develop Query Engine to execute selection queries
(i.e. Attribute 1 between 3 and 5 AND Attribute 2 =
4)
Identify correct bitmaps to use, appropriate bitmap
AND, OR bitmap operations
Try query execution over both non-compressed
bitmaps and whichever compression you used. If
using BBC then try decompressing appropriate
bitmaps and executing. If using WAH, then
implement query execution over compressed
bitmaps.
Compare query execution times to sequential scan
Bitmap indexes
Must be non-unique
Include NULL values (B*Tree indexes do not)
Maximum 30 columns (B*Tree indexes max 32
columns)
Can be locally partitioned
Cannot be globally partitioned
Can be function-based indexes (built-in or userdefined)
Can be created on global temporary tables
Cannot be reversed
Compact size.
Efficient hardware support for bitmap
operations (AND, OR, XOR, NOT).
Fast search.
Multiple differentiate bitmap indexes
for different kind of queries.
If 3 or more sessions simultaneously
working to INSERT into a Bitmap Indexed
Table, a deadlock situation can arise.
This may result in corruption of the Index.
The solution for this is to REBUILD the
Bitmap Index
(Note: Unlike B Tree Indexes, Bitmap Indexes
cannot be rebuilt online.
<BMI> rebuild)
We need to use Alter
CPU overhead - Require more CPU
cycles
Modification requires more effort than
B Tree and hence not suitable for
large concurrent DML operations
Deadlock