12 Introduction To OLTP and OLAP
12 Introduction To OLTP and OLAP
User gets
instant update
on the account
balance after
withdrawing
the money
TRANSACTIONS
• Single event that changes something
• Different types of transactions
– Customer orders
– Receipts
– Invoices
– Payments
INSERT
INSERT UPDATE
UPDATE RETRIEVE
RETRIEVE
INSERT INSERT
UPDATE UPDATE
RETRIEVE RETRIEVE
TRANSACTIONS
Cash at
register
gone up
Inventory
of video
game gone
down
Ordering of
new video
game for
the store
OLTP Segmentation
• They can be segmented into:
– Real-time Transaction Processing
– Batch Processing
Real-time Transaction processing
• Multiple users can fetch the information
• Very fast response rate
• Transactions processed immediately
• Everything is processed in real time
Batch Processing
• Where information is required in batch
• Offline access to information
• Presorting (sequence) is applied
• Takes time to process information
Day
Day 1 Day 2 Day 3 .......... 30
Monthly
purchase of
Retail Store
Characteristics of OLTP Model
• Online connectivity
• LAN,WAN
• Availability
– Available 24 hours a day
• Response rate
– Rapid response rate
– Load balancing by prioritizing the transactions
Characteristics of OLTP Model
• Cost
– Cost of transactions is less
• Update facility
– Less lock periods
– Instant updates
– Use the full potential of hardware and software
Limitations of Relational Models
• Create and maintain large number of tables
for the voluminous data
• For new functionalities, new tables are added
• Unstructured data cannot be stored in
relational databases
• Very difficult to manage the data with
common denominator (keys)
Answer a Quick Question
• The super market store is deciding on introducing a new product. The key
questions they are debating are: “Which product should they introduce?”
and “Should it be specific to a few customer segments?”
• The super market store is looking at offering some discount on their year-
end sale. The questions here are: “How much discount should they offer?”
and “Should it be different discounts for different customer segments?”
• The supermarket is looking at rewarding its most consistent salesperson.
The question here is:“How to zero in on its most consistent salesperson
(consistent on several parameters)?All the queries stated above have more
to do with analysis than simple reporting”
• Ideally these queries are not meant to be solved by an OLTP system.
OLAP - Online Analytical Processing
OLAP differs from traditional databases in the way data is conceptualized
and stored.
In OLAP data is held in the dimensional form rather than the relational
form.
OLAP’s life blood is multi-dimensional data.
OLAP tools are based on the multi-dimensional data model. The multi-
dimensional data model views data in the form of a data cube.
Online Analytical Processing (OLAP) is a technology that is used to
organize large business databases and support business intelligence.
OLAP databases are divided into one or more cubes. The cubes are
designed in such a way that creating and viewing reports become easy.
OLAP databases are divided into one or more cubes, and each cube is
organized and designed by a cube administrator to fit the way that you
retrieve and analyze data so that it is easier to create and use the PivotTable
reports and PivotChart reports that you need.
OLAP (Online Analytical Processing)
• OLAP is a category of software that allows users to analyze
information from multiple database systems at the same time. It
is a technology that enables analysts to extract and view business
data from different points of view
• Analysts frequently need to group, aggregate and join data.
These operations in relational databases are resource intensive.
With OLAP, data can be pre-calculated and pre-aggregated,
making analysis faster.
• Provides multidimensional view of data
• Used for analysis of data
• Data can be viewed from different perspectives
• Determine why data appears the way it does
• Drill down approach is used to further dig down deep into the
data
OLAP - Example
Let us consider the data of a supermarket store, “AllGoods” store, for the
year “2001”.
This data as captured by the OLTP system is under the following column
headings: Section, Product-CategoryName, YearQuarter, and SalesAmount.
We have a total of 32 records/rows.
The Section column can have one value from amongst “Men”, “Women”,
“Kid”, and “Infant”.
The ProductCategory Name column can have either the value
“Accessories” or the value “Clothing”.
The YearQuarter column can have one value from amongst “Q1”, “Q2”,
“Q3”, and “Q4”.
The SalesAmount column record the sales figures for each Section,
ProductCategory Name, and Year Quarter.
OLAP - Example
Characteristics of OLAP
• Multidimensional analysis
In Table 3.7, data has been plotted along two dimensions as we can now look at the
SalesAmount from two perspectives, i.e. by YearQuarter and ProductCategoryName. The
calendar quarters have been listed along the vertical axis and the product categories have been
listed across the horizontal axis. Each unique pair of values of these two dimensions
corresponds to a single point of SalesAmount data. For example, the Accessories sales for Q2
add up to $9680.00 whereas the Clothing sales for the same quarter total up to $12366.00.
Their sales figures correspond to a single point of SalesAmount data, i.e. $22046.
Three Dimensional
What if the company’s analyst wishes to view the data — all of it — along all the three
dimensions (Year-Quarter, ProductCategoryName, and Section) and all on the same table
at the same time? For this theanalyst needs a three-dimensional view of data as arranged
in Table 3.8. In this table, one can now look atthe data by all the three dimensions/
perspectives, i.e. Section, ProductCategoryName, YearQuarter. If theanalyst wants to
look for the section which recorded maximum Accessories sales in Q2, then by giving
aquick glance to Table 3.8, he can conclude that it is the Kid section.
Can we go beyond Three Dimensional?
Well, if the question is “Can you go beyond the third dimension?” the answer is
YES!
If at all there is any constraint, it is because of the limits of your software. But if
the question is “Should you go beyond the third dimension?” we will say it is
entirely on what data has been captured by your operational transactional systems
and what kind of queries you wish your OLAP system to respond to.
Now that we understand multi-dimensional data, it is time to look at the
functionalities and characteristics of an OLAP system. OLAP systems are
characterized by a low volume of transactions that involve very complex queries.
Some typical applications of OLAP are: budgeting, sales forecasting, sales
reporting, business process manage
Example: Assume a financial analyst reports that the sales by the company have
gone up. The next question is “Which Section is most responsible for this
increase?” The answer to this question is usually followed by a barrage of
questions such as “Which store in this Section is most responsible for the
increase?” or “Which particular product category or categories registered the
maximum incréase?” The answers to these are provided by multidimensional
analysis or OLAP;
Can we go beyond Three Dimensional?
Let us go back to our example of a company’s
(“AllGoods”) sales data viewed along three dimensions:
Section, ProductCategoryName, and YearQuarter.
Given below are a set of queries, related to example,
that a typical OLAP system is capable of responding to:
•What will be the future sales trend for “Accessories” in the “Kid’s” Section?
•Given the customers buying pattern, will it be profitable to launch product
“XYZ” in the “Kid's” Section?
• What impact will a 5% increase in the price of produces have on the
customers?
Advantages of an OLAP System
• Consistency of information.
• “What if ” analysis.
OLTP OLAP
Online Transaction Processing Online Analytical Processing
Database Design Typically normalized tables. OLTP Typically de-normalized tables; uses
system adopts ER (Entity Relationship) star or snowflake schema
model
Operations Read/Write Mostly read
Backup and Recovery Regular backups of operational data are Instead of regular backups, data
mandatory. Requires concurrency control warehouse is refreshed periodically
(locking) and recovery mechanisms using data from operational data
(logging) sources
Joins Many Few
2 OLAP systems are used by knowledge workers such as executives, OLTP systems are used by clerks, DBAs, or
managers and analysts. database professionals.
5 Based on Star Schema, Snowflake, Schema and Fact Constellation Based on Entity Relationship Model.
Schema.
7 Provides summarized and consolidated data. Provides primitive and highly detailed
data.
8 Provides summarized and multidimensional view of data. Provides detailed and flat relational view
of data.
CS 336 40
Decision Support
• Information technology to help the
knowledge worker (executive, manager,
analyst) make faster & better decisions
– “What were the sales volumes by region and product category for
the last year?”
– “How did the share price of comp. manufacturers correlate with
quarterly profits over the past 10 years?”
– “Which orders should we fill to maximize revenues?”
CS 336 41
Three-Tier Decision Support Systems
• Warehouse database server
– Almost always a relational DBMS, rarely flat files
• OLAP servers
– Relational OLAP (ROLAP): extended relational DBMS that maps
operations on multidimensional data to standard relational
operators
– Multidimensional OLAP (MOLAP): special-purpose server that
directly implements multidimensional data and operations
• Clients
– Query and reporting tools
– Analysis tools
– Data mining tools
CS 336 42
The Complete Decision Support
System
Information Sources Data Warehouse OLAP Servers Clients
Server (Tier 2) (Tier 3)
(Tier 1)
e.g., MOLAP
Semistructured Analysis
Sources
Data
Warehouse serve
extract Query/Reporting
transform
load serve
refresh
etc. e.g., ROLAP
Operational
DB’s Data Mining
serve
Data Marts
CS 336 43
Data Warehouse vs. Data Marts
• Enterprise warehouse: collects all information about
subjects (customers,products,sales,assets,
personnel) that span the entire organization
– Requires extensive business modeling (may take years to design
and build)
• Data Marts: Departmental subsets that focus on selected
subjects
– Marketing data mart: customer, product, sales
– Faster roll out, but complex integration in the long run
• Virtual warehouse: views over operational dbs
– Materialize sel. summary views for efficient query processing
– Easy to build but require excess capability on operat. db servers
CS 336 44
Approaches to OLAP Servers
• Relational DBMS as Warehouse Servers
• Two possibilities for OLAP servers
• (1) Relational OLAP (ROLAP)
– Relational and specialized relational DBMS to
store and manage warehouse data
– OLAP middleware to support missing pieces
• (2) Multidimensional OLAP (MOLAP)
– Array-based storage structures
– Direct access to array data structures
CS 336 45
OLAP Server: Query Engine
Requirements
• Aggregates (maintenance and querying)
– Decide what to precompute and when
• Query language to support multidimensional
operations
– Standard SQL falls short
• Scalable query processing
– Data intensive and data selective queries
CS 336 46
OLAP for Decision Support
• OLAP = Online Analytical Processing
• Support (almost) ad-hoc querying for business analyst
• Think in terms of spreadsheets
– View sales data by geography, time, or product
• Extend spreadsheet analysis model to work with
warehouse data
– Large data sets
– Semantically enriched to understand business terms
– Combine interactive queries with reporting functions
• Multidimensional view of data is the foundation of
OLAP
– Data model, operations, etc.
CS 336 47
Warehouse Models & Operators
• Data Models
– relations
– stars & snowflakes
– cubes
• Operators
– slice & dice
– roll-up, drill down
– pivoting
– other
CS 336 48
Multi-Dimensional Data
• Measures - numerical data being tracked
• Dimensions - business parameters that define a
transaction
• Example: Analyst may want to view sales data
(measure) by geography, by time, and by product
(dimensions)
• Dimensional modeling is a technique for
structuring data around the business concepts
• ER models describe “entities” and “relationships”
• Dimensional models describe “measures” and
“dimensions”
CS 336 49
The Multi-Dimensional Model
“Sales by product line over the past six months”
“Sales by store between 1990 and 1995”
Store Info Key columns joining fact table
to dimension tables Numerical Measures
...
CS 336 50
Dimensional Modeling
CS 336 51
Dimension Hierarchies
Store Dimension Product Dimension
Total Total
Region Manufacturer
District Brand
Stores Products
CS 336 52
ROLAP: Dimensional Modeling
Using Relational DBMS
• Special schema design: star, snowflake
• Special indexes: bitmap, multi-table join
• Special tuning: maximize query throughput
• Proven technology (relational model,
DBMS), tend to outperform specialized
MDDB especially on large data sets
• Products
– IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
CS 336 53
MOLAP: Dimensional Modeling
Using the Multi Dimensional Model
• MDDB: a special-purpose data model
• Facts stored in multi-dimensional arrays
• Dimensions used to index array
• Sometimes on top of relational DB
• Products
– Pilot, Arbor Essbase, Gentia
CS 336 54
Star Schema (in RDBMS)
CS 336 55
Star Schema Example
CS 336 56
Star
Schema
with Sample
Data
CS 336 57
The “Classic” Star Schema
Store Dimension Fact Table
A single fact table, with
Time Dimension
STORE KEY STORE KEY
PERIOD KEY
detail and summary data
Store Description PRODUCT KEY
Period Desc
City
State
PERIOD KEY
Dollars
Year Fact table primary key has
District ID Quarter
District Desc.
Region_ID
Units
Price
Month
Day
only one key column per
Region Desc.
Regional Mgr.
Level
Product Dimension Current Flag
Resolution dimension
PRODUCT KEY Sequence
Product Desc.
Brand Each key is generated
Color
Size
Manufacturer
Level
Each dimension is a single
table, highly denormalized
Example:
Select A.STORE_KEY, A.PERIOD_KEY, A.dollars from Level is needed
Fact_Table A whenever aggregates
where A.STORE_KEY in (select STORE_KEY are stored with detail
from Store_Dimension B facts.
where region = “North” and Level = 2)
and etc...CS 336 59
The “Level” Problem
• Level is a problem because because it causes
potential for error. If the query builder, human
or program, forgets about it, perfectly
reasonable looking WRONG answers can occur.
• One alternative: the FACT CONSTELLATION
model...
CS 336 60
The “Fact Constellation” Schema
Store Dimension Fact Table Time Dimension
STORE KEY STORE KEY
PERIOD KEY
Store Description PRODUCT KEY
City PERIOD KEY Period Desc
State Year
Dollars Quarter
District ID
Units
District Desc. Month
Price
Region_ID Day
Region Desc. Current Flag
Regional Mgr.
Product Dimension
Sequence
PRODUCT KEY
Product Desc.
Brand District Fact Table
Color
Region Fact Table
Size District_ID
Manufacturer Region_ID
PRODUCT_KEY
PRODUCT_KEY
PERIOD_KEY PERIOD_KEY
Dollars
Dollars
Units Units
Price Price
CS 336 61
The “Fact Constellation” Schema
Store Dimension Fact Table Time Dimension
STORE KEY STORE KEY
PRODUCT KEY
PERIOD KEY In the Fact Constellations,
Store Description
City
State
PERIOD KEY
Dollars
Period Desc
Year aggregate tables are
District ID Quarter
District Desc.
Region_ID
Units
Price
Month
Day
created
Region Desc.
Product Dimension
Regional Mgr.
PRODUCT KEY
Sequence separately from the detail,
Current Flag
Product Desc.
BrandDist rict Fact Table therefor
Color
Size District_ID
Manufacturer
PRODUCT_KEY
it is impossible to pick up,
Region Fact Table
Region_ID
PRODUCT_KEY
PERIOD_KEY
Dollars
for PERIOD_KEY
Dollars
Units
Price example, Store detail when
Units
Price
querying
the District Fact Table.
Major Advantage: No need for the “Level” indicator in the dimension tables,
since no aggregated data is stored with lower-level detail
Disadvantage: Dimension tables are still very large in some cases, which can
slow performance; front-end must be able to detect existence of aggregate
facts, which requires more extensive metadata
CS 336 62
Another Alternative to “Level”
• Fact Constellation is a good alternative to the
Star, but when dimensions have very high
cardinality, the sub-selects in the dimension
tables can be a source of delay.
• An alternative is to normalize the dimension
tables by attribute level, with each smaller
dimension table pointing to an appropriate
aggregated fact table, the “Snowflake Schema”
...
CS 336 63
The “Snowflake” Schema
Store Dimension
STORE KEY District_ID Region_ID
Store Description District Desc. Region Desc.
City Region_ID Regional Mgr.
State
District ID
District Desc.
Region_ID
Region Desc.
Regional Mgr.
Store Fact Table District Fact Table RegionFact Table
Region_ID
STORE KEY District_ID
PRODUCT_KEY
PRODUCT_KEY PERIOD_KEY
PRODUCT KEY PERIOD_KEY Dollars
PERIOD KEY Dollars Units
Units Price
Dollars Price
Units
Price
CS 336 64
The “Snowflake” Schema
Store Dimension
• No LEVEL in dimension tables
STORE KEY District_ID Region_ID
Store Description
City
District Desc.
Region_ID
Region Desc.
Regional Mgr.
• Dimension tables are normalized by
State
District ID
decomposing at the attribute level
District Desc.
Region_ID • Each dimension table has one key for
Region Desc.
Regional Mgr.
Store Fact Table District Fact Table
District_ID
RegionFact Table
Region_ID
each level of the dimensionís hierarchy
STORE KEY PRODUCT_KEY
PRODUCT_KEY
PRODUCT KEY
PERIOD KEY
PERIOD_KEY
Dollars
PERIOD_KEY
Dollars
Units
• The lowest level key joins the
Dollars
Units
Price
Price
dimension table to both the fact table
Units
Price and the lower level attribute table
How does it work? The best way is for the query to be built by understanding which
summary levels exist, and finding the proper snowflaked attribute tables,
constraining there for keys, then selecting from the fact table.
CS 336 65
The “Snowflake” Schema
Store Dimension
• Additional features: The original Store
STORE KEY District_ID Region_ID
Store Description District Desc. Region Desc. Dimension table, completely de-
City Region_ID Regional Mgr.
State normalized, is kept intact, since certain
District ID
District Desc.
Region_ID
queries can benefit by its all-
Region Desc.
Regional Mgr.
Store Fact Table District Fact Table RegionFact Table encompassing content.
District_ID Region_ID
STORE KEY
PRODUCT KEY
PRODUCT_KEY
PERIOD_KEY
PRODUCT_KEY
PERIOD_KEY
Dollars
• In practice, start with a Star Schema
PERIOD KEY Dollars
Units
Units
Price and create the “snowflakes” with
Dollars Price
Units queries. This eliminates the need to
Price
create separate extracts for each table,
and referential integrity is inherited
from the dimension table.
CS 336 68
Aggregates
Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
CS 336 69
Another Example
Add up amounts by day, product
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date, prodId
sale prodId storeId date amt
p1 s1 1 12 sale prodId date amt
p2 s1 1 11 p1 1 62
p1 s3 1 50 p2 1 19
p2 s2 1 8
p1 s1 2 44 p1 2 48
p1 s2 2 4
rollup
drill-down
CS 336 70
Aggregates
• Operators: sum, count, max, min,
median, ave
• “Having” clause
• Using dimension hierarchy
– average by region (within store)
– maximum by month (within date)
CS 336 71
ROLAP vs. MOLAP
• ROLAP:
Relational On-Line Analytical Processing
• MOLAP:
Multi-Dimensional On-Line Analytical
Processing
CS 336 72
The MOLAP Cube
dimensions = 2
CS 336 73
3-D Cube
Fact table view: Multi-dimensional cube:
dimensions = 3
CS 336 74
Example
roll-up to region
Dimensions:
NY
SF
Time, Product, Store
roll-up to brand
LA
Attributes:
10
Product (upc, price, …)
Juice
Product
34
Store …
Milk
56 …
Coke
Cream 32 Hierarchies:
Soap 12 Product Brand …
Bread 56 roll-up to week Day Week Quarter
M T W Th F S S
Store Region
Country
Time
56 units of bread sold in LA on M
CS 336 75
Cube Aggregation: Roll-up
Example: computing sums
s1 s2 s3
day 2 ...
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8
s1 s2 s3
sum 67 12 50
s1 s2 s3
p1 56 4 50
p2 11 8 129
rollup sum
p1 110
p2 19
drill-down
CS 336 76
Cube Operators for Roll-up
s1 s2 s3
day 2 ...
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8 sale(s1,*,*)
s1 s2 s3
sum 67 12 50
s1 s2 s3
p1 56 4 50
p2 11 8 129
sum
sale(s2,p2,*) p1 110
p2 19 sale(*,*,*)
CS 336 77
Extended Cube
* s1 s2 s3 *
p1 56 4 50 110
p2 11 8 19
day 2 *
s1 67
s2 12
s3 *50 129
p1 44 4 48
p2
s1 s2 s3 * sale(*,p2,*)
day 1 * 44 4 48
p1 12 50 62
p2 11 8 19
* 23 8 50 81
CS 336 78
Aggregation Using Hierarchies
s1 s2 s3 store
day 2
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50 region
p2 11 8
country
region A region B
p1 56 54
p2 11 8
(store s1 in Region A;
stores s2, s3 in Region B)
CS 336 79
Slicing
s1 s2 s3
day 2
p1 44 4
p2 s1 s2 s3
day 1
p1 12 50
p2 11 8
TIME = day 1
s1 s2 s3
p1 12 50
p2 11 8
CS 336 80
Slicing & Sales
($ millions)
Products Time
Pivoting Store s1 Electronics
d1
$5.2
d2
Toys $1.9
Clothing $2.3
Cosmetics $1.1
Store s2 Electronics $8.9
Toys $0.75
Clothing $4.6
Cosmetics $1.5
Sales
($ millions)
Products d1
Store s1 Store s2
Store s1 Electronics $5.2 $8.9
Toys $1.9 $0.75
Clothing $2.3 $4.6
Cosmetics $1.1 $1.5
Store s2 Electronics
Toys
Clothing
CS 336 81
Summary of Operations
• Aggregation (roll-up)
– aggregate (summarize) data to the next higher dimension
element
– e.g., total sales by city, year total sales by region, year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a subcube
– e.g., sales where city =‘Gainesville’ and date = ‘1/15/90’
• Calculation and ranking
– e.g., top 3% of cities by average income
• Visualization operations (e.g., Pivot)
• Time functions
– e.g., time average
CS 336 82
Query & Analysis Tools
• Query Building
• Report Writers (comparisons, growth, graphs,…)
• Spreadsheet Systems
• Web Interfaces
• Data Mining
CS 336 83
MOLAP, ROLAP, HOLAP
• MOLAP
– Multidimensional OLAP
• ROLAP
– Relational OLAP
• HOLAP
– Hybrid OLAP
MOLAP
• Uses multidimensional approach to solve a
problem
• Directly stores the information in cubes
• Used in SSAS (SQL Server Analysis Services)
ROLAP
• Relational databases are used to store the
data
• Translates OLAP queries to appropriate SQL
statements
• Data created by OLTP is directly used
Do it Exercise
Attributes such as num_sold are called measure attributes, since they can be
used to measure some value, and can be aggregated.
Attributes like make, color, size are called dimension attributes, since they
define the dimensions on which measure attributes are viewed.
Data that can be modeled as dimension attributes and measure attributes are
called multi-dimensional data.
Dimension Hierarchies
Cross Tabs and Data Cubes
OLAP systems allow analyst to view different summaries of the data.
The following table can be derived from
sales(make, color, size, num_sold)
Relational representation
make color num_sold
Cross-tab or pivot table Toyota white 8
Toyota red 35
WHITE RED SILVER TOTAL Toyota silver 10
TOYOTA 8 35 10 53 Toyota all 53
Nissan white 20
NISSAN 20 10 5 35 Nissan red 10
HOLDEN 14 7 28 49 Nissan silver 5
Nissan all 35
FORD 20 2 5 27 Holden white 14
TOTAL 62 54 48 164 Holden red 7
Holden silver 28
Holden all 49
Ford white 20
Ford red 2
Ford silver 5
Ford all 27
all white 62
all red 54
all silver 48
all all 164
Data Cubes
The generalization of a cross tab, which is 2-dimensional, to n
dimensions can be visualized as a n-dimensional cube, called
the data cube.
white
color
red
silver
all
MOLAP vs ROLAP
OLAP systems can use multi-dimensional array to store data cubes, called
multidimensional OLAP systems (MOLAP) .
Alternatively, they can stored data as relations in relational databases, called
relational OLAP systems (ROLAP).
ROLAP
The main relation, which relates dimensions to measures, is called the fact table.
e.g., sales(prod_id, date, shop_id, num_sold)
Very large, accumulation of facts such as sales
Each dimension can have additional attributes and an associated dimensional
table.
E.g., product(prod_id, price, color)
prod_id is a foreign key of sales
shops(shop_id, location, manager)
sales
prod_id
prod_id
date
Price
shop_id
color
num_sold shop_id
Location
manager
The Star Schema
Dimension tables are not in 3NF
The snowflake schema
A variation of the star schema where the
dimension tables are normalized.
Fact constellation
A set of fact tables that share some dimension
tables
OLAP Queries
A common operation is to aggregate a measure over one or more dimensions, e.g.,
find total/average sales for a product.
find total sales in each city/state/month etc
find top 2 products by total sales
Roll-up: moving from finer granularity to coarser granularity by means of
aggregation.
E.g., given total sales for each city, find total sales for each state.
Drill-down: The inverse of roll-up
Pivoting: aggregate on selected dimensions
Slicing and dicing:
E.g., from the data cube find the cross-tab on Model and Color for medium
cars . The cross-tab can be viewed as a slice of the data cube.
Query Processing Issues
Expensive aggregations are common
Pre-compute all aggregates? Maybe infeasible!
Materialized views can help.
Which views to materialize?
given a query and some materialized views, can we use the views to answer
the query? How?
How frequently should we refresh the views to make them consistent with the
underlying tables?
What indexes should one use?
SQL:1999 Extended Aggregations*
Example 1
Select make, color, size, sum(number) from sales
group by cube(make, color, size)
Calculates 8 groupings:
(make, color, size), (make, color), (make, size), …., ().
Example 2
Select make, color, sum(number) from sales
Group by rollup(make, color, size)
Calculates 4 groupings:
(make, color, size), (make, color), (make), ().
Examples in Oracle: Rollup
Oracle Rollup Example
OLTP and OLAP
Should OLAP be Performed Directly
on Operational Databases?
• An OLAP system on the other hand requires mostly a read only access to
data records for summarization and aggregation. If concurrency control and
recovery mechanisms are applied for such OLAP operations, it will
severely impact the throughput of an OLAP system.
OLAP Operations on Multi-dimensional Data
• Slice
• Dice
• Roll-up
• Drill down
• Drill through
• Drill across
• Pivot/Rotate
Do It Exercise
Hint: Provide the participants with a sample data sheet (Excel sheet) and
ask them to demonstrate their understanding of the various OLAP
operations on multi-dimensional data.
Data Warehouse
A repository of information gathered from multiple sources, stored under a unified
schema, usually at a single site .
Data may be augmented with additional attributes, such as timestamp, and
summary information.
Data are stored for a long time, permitting access to historical data.
Interactive response times expected for complex queries; ad-hoc updates
uncommon.
Building Data Warehouse
Issues:
– Semantic integration: When getting data from
multiple sources, must eliminate mismatches, e.g.,
different currencies.
– Heterogeneous sources: must access data from a
variety of source formats.
– Load, refresh, purge: Must load data, periodically
refresh it, and purge too old or useless data
– Metadata management: Must keep track of
source, loading time, etc.
Elements of data warehouse EIS/DSS
Apps
4
Data
Data
Replication &
Cleansing Metadata
1
Information
Operational Data Informational Directory
Database
3
Elements of data warehouse
Data Replication Manager
copying & distribution of data across databases
• data that needs to be copied, source/destination, frequency, data
transforms
• refresh copy entire source, propagate changes only
all external data is transformed & cleansed before adding to warehouse
Informational Database
database that stores data copied from multiple sources by data replication
manager
Information Directory
metadata manager - collects metadata from databases on network
EIS/DSS tools
SQL based query tools
some vendors use extended SQL
Query/Reporting tools
Formulate queries without (extended) SQL or other languages
Result displayed as table, graph, report,
Spreadsheet systems
Web interfaces
Vendor-specific tools
Oracle Discoverer:
• http://www.oracle.com/tools/disc/index.html
Column stores
A recently proposed data storage method that
allows more efficient aggregation queries in
data warehouses
stores data as columns rather than as rows.
See http://en.wikipedia.org/wiki/Column-
oriented_DBMS.
OLAP in BI
Answer a Quick Question
ERP provides several business benefits, here we enumerate the top three:
In short ERP systems are adept at capturing, storing and moving the data
across the various units smoothly.