KEMBAR78
Dimensional Modelling-Data Warehouse & Data Mining | PPTX
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Affiliated Institution of G.G.S.IP.U, Delhi
BCA
Data Warehouse & Data Mining
20302
Dimensional Modelling
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Design Requirements
Design of the DW must directly reflect the
way the managers look at the business
2
Should capture the measurements of
importance along with parameters by which
these parameters are viewed
It must facilitate data analysis, i.e.,
answering business questions
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
ER Modeling
• A logical design technique that seeks to eliminate data
redundancy
• Illuminates the microscopic relationships among data
elements
• Perfect for OLTP systems
• Responsible for success of transaction processing in
Relational Databases
3
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Problems with ER Model
ER models are NOT suitable for DW?
• End user cannot understand or remember an ER
Model
• Many DWs have failed because of overly complex ER
designs
• Not optimized for complex, ad-hoc queries
• Data retrieval becomes difficult due to normalization
• Browsing becomes difficult
4
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
ER vs Dimensional Modeling
• ER models are constituted to
–Remove redundant data (normalization)
–Facilitate retrieval of individual records
having certain critical identifiers
–Thereby optimizing OLTP performance
• Dimensional model supports the reporting
and analytical needs of a data warehouse
system. 5
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Dimensional Modeling:
Salient Features
• Represents data in a standard framework
• Framework is easily understandable by end
users
• Contains same information as ER model
• Packages data in symmetric format
• Resilient to change
• Facilitates data retrieval/analysis
6
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Dimensional Modeling: Vocabulary
• Measures or facts
• Facts are “numeric” & “additive”
• For example; Sale Amount, Sale Units
• Factors or dimensions
• Star Schemas
• Snowflake & Starflake Schemas
7
Sales Amt = f (Product, Location,
Fact Dimensions
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Star Schema
8
Sales Fact
Table
Location
Dimension
Promotion
Dimension
Product
Dimension
Time
Dimension
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Dimensional Modeling
• Facts are stored in FACT Tables
• Dimensions are stored in DIMENSION
tables
• Dimension tables contains textual
descriptors of business
• Fact and dimension tables form a Star
Schema
• “BIG” fact table in center surrounded by
“SMALL” dimension tables
9
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Fact Tables
• Contains numerical measurements of the
business
• Each measurement is taken at the intersection
of all dimensions
• Intersection is the composite key
• Represents Many-to-many relationships
between dimensions
• Examples of facts
Sale_amt, Units_sold, Cost, Customer_count
10
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Dimension Tables
• Contains attributes for dimensions
• 50 to 100 attributes common
• Best attributes are textual and descriptive
• DW is only as good as the dimension attributes
• Contains hierarchal information albeit
redundantly
• Entry points into the fact table
11
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
12Star Schema (in RDBMS)
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
13
Star Schema Example
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
– The time independent, textual and descriptive
attributes by which users describe objects.
– Combining all the attributes including hierarchies,
rollups and sub-references into a single dimension
is denormalization.
– Often the “by” word in a query or report
– Not time dependent
Dimensions
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
Facts
– Business Measurements
– Most Facts are Numeric
– Additive, Semi-Additive, Non-Additive
– Built from the lowest level of detail (grain)
– Very Efficient
– Time dependent
TRINITY INSTITUTE OF PROFESSIONAL STUDIES
Sector – 9, Dwarka Institutional Area, New Delhi-75
THANK YOU

Dimensional Modelling-Data Warehouse & Data Mining

  • 1.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Affiliated Institution of G.G.S.IP.U, Delhi BCA Data Warehouse & Data Mining 20302 Dimensional Modelling
  • 2.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Design Requirements Design of the DW must directly reflect the way the managers look at the business 2 Should capture the measurements of importance along with parameters by which these parameters are viewed It must facilitate data analysis, i.e., answering business questions
  • 3.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 ER Modeling • A logical design technique that seeks to eliminate data redundancy • Illuminates the microscopic relationships among data elements • Perfect for OLTP systems • Responsible for success of transaction processing in Relational Databases 3
  • 4.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Problems with ER Model ER models are NOT suitable for DW? • End user cannot understand or remember an ER Model • Many DWs have failed because of overly complex ER designs • Not optimized for complex, ad-hoc queries • Data retrieval becomes difficult due to normalization • Browsing becomes difficult 4
  • 5.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 ER vs Dimensional Modeling • ER models are constituted to –Remove redundant data (normalization) –Facilitate retrieval of individual records having certain critical identifiers –Thereby optimizing OLTP performance • Dimensional model supports the reporting and analytical needs of a data warehouse system. 5
  • 6.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Dimensional Modeling: Salient Features • Represents data in a standard framework • Framework is easily understandable by end users • Contains same information as ER model • Packages data in symmetric format • Resilient to change • Facilitates data retrieval/analysis 6
  • 7.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Dimensional Modeling: Vocabulary • Measures or facts • Facts are “numeric” & “additive” • For example; Sale Amount, Sale Units • Factors or dimensions • Star Schemas • Snowflake & Starflake Schemas 7 Sales Amt = f (Product, Location, Fact Dimensions
  • 8.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Star Schema 8 Sales Fact Table Location Dimension Promotion Dimension Product Dimension Time Dimension
  • 9.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Dimensional Modeling • Facts are stored in FACT Tables • Dimensions are stored in DIMENSION tables • Dimension tables contains textual descriptors of business • Fact and dimension tables form a Star Schema • “BIG” fact table in center surrounded by “SMALL” dimension tables 9
  • 10.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Fact Tables • Contains numerical measurements of the business • Each measurement is taken at the intersection of all dimensions • Intersection is the composite key • Represents Many-to-many relationships between dimensions • Examples of facts Sale_amt, Units_sold, Cost, Customer_count 10
  • 11.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Dimension Tables • Contains attributes for dimensions • 50 to 100 attributes common • Best attributes are textual and descriptive • DW is only as good as the dimension attributes • Contains hierarchal information albeit redundantly • Entry points into the fact table 11
  • 12.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 12Star Schema (in RDBMS)
  • 13.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 13 Star Schema Example
  • 14.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 – The time independent, textual and descriptive attributes by which users describe objects. – Combining all the attributes including hierarchies, rollups and sub-references into a single dimension is denormalization. – Often the “by” word in a query or report – Not time dependent Dimensions
  • 15.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 Facts – Business Measurements – Most Facts are Numeric – Additive, Semi-Additive, Non-Additive – Built from the lowest level of detail (grain) – Very Efficient – Time dependent
  • 16.
    TRINITY INSTITUTE OFPROFESSIONAL STUDIES Sector – 9, Dwarka Institutional Area, New Delhi-75 THANK YOU