Cube Implementations

The document discusses data cube implementation and OLAP systems. A data cube is represented as a lattice of cuboids, where each node is a cuboid containing aggregated measures for a dimension combination. Cuboids allow for drill down and roll up queries. Relational and multidimensional OLAP systems are discussed for storing and querying the cube lattice. Relational OLAP stores cuboids in tables while multidimensional OLAP stores them as multi-dimensional arrays. Large scale OLAP systems like Apache Kylin, Druid, and Presto are also mentioned.

Uploaded by

Parv Agarwal

Data Cube Implementation

pm jat @ daiict

21-09-2023 Data Cube Implementation 1

Summaries in “Data Cube”

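Origin: $\mathcal{F}_{\mathrm{sum}(price)}(fact)$

Item Axis: ${}_{ItemID}\mathcal{F}_{\mathrm{sum}(price)}(fact)$

Store Axis: ${}_{StoreID}\mathcal{F}_{\mathrm{sum}(price)}(fact)$

Customer Axis: ${}_{CustID}\mathcal{F}_{\mathrm{sum}(price)}(fact)$

Floor: ${}_{ItemID,StoreID}\mathcal{F}_{\mathrm{sum}(price)}(fact)$

Front Wall: ${}_{ItemID,CustID}\mathcal{F}_{\mathrm{sum}(price)}(fact)$

Right Side Wall: ${}_{StoreID,CustID}\mathcal{F}_{\mathrm{sum}(price)}(fact)$

Rest: ${}_{CustID,StoreID,ItemID}\mathcal{F}_{\mathrm{sum}(price)}(fact)$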

Data Cube is a set of “cuboids” for “each possible subset of the given dimensions”. Correct?
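One way to check the "each possible subset" claim is to enumerate the subsets directly. The sketch below is illustrative only: the function name `cuboid_groupings` is an assumption, and the dimension names are taken from the summaries above. For three dimensions it yields 2^3 = 8 groupings, matching the origin, three axes, three walls, and the rest.

```python
from itertools import combinations

def cuboid_groupings(dimensions):
    """Return every subset of the dimensions, from 0-D (apex) to n-D (base)."""
    return [combo
            for k in range(len(dimensions) + 1)
            for combo in combinations(dimensions, k)]

groupings = cuboid_groupings(["ItemID", "StoreID", "CustID"])
for g in groupings:
    # The empty subset corresponds to the single grand total (the "origin").
    print(f"{len(g)}-D cuboid: GROUP BY {', '.join(g) or '(nothing)'}")
```

In general a cube over n dimensions (ignoring hierarchies) has 2^n cuboids, which is why full materialization becomes expensive quickly.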
“Data Cube” as Lattice of Cuboids
GROUP BY time

GROUP BY time, item

GROUP BY time, item, location

GROUP BY time, item, location, Supplier

Figure Source: Data Mining Textbook [9]
Data Cube as “lattice of Cuboids”
• A Data Cube is represented as a “lattice of Cuboids”, where
• Node: Cuboid
– which contains “aggregated values” (called measures) for “a dimension (attribute) combination”
• Edge: Parent-Child Relationship, where
– the child node has exactly one more attribute (dimension) than its parent
– the parent cuboid can be computed by aggregating the child cuboid (the reverse is not possible, because the parent has already summed away the extra dimension)
• A Data Cube is a set of “cuboids” for “each possible subset of the given dimensions”. Correct?
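The edge relationship lets one cuboid be derived from another without rescanning the fact table. The following is a minimal sketch, assuming sum as the measure: the cuboid with fewer dimensions is obtained by summing away the extra dimension of the cuboid with more dimensions. The function `roll_up` and the sample data are hypothetical.

```python
from collections import defaultdict

def roll_up(finer, finer_dims, coarser_dims):
    """Aggregate a cuboid into one with fewer dimensions by summing.

    finer: dict mapping a tuple of dimension values (ordered as finer_dims)
    to a sum. coarser_dims must be a subset of finer_dims.
    """
    keep = [finer_dims.index(d) for d in coarser_dims]
    coarser = defaultdict(float)
    for key, total in finer.items():
        coarser[tuple(key[i] for i in keep)] += total
    return dict(coarser)

# Hypothetical (time, item) cuboid holding sum(price).
time_item = {
    ("Q1", "pen"): 10.0,
    ("Q1", "ink"): 5.0,
    ("Q2", "pen"): 7.0,
}
by_time = roll_up(time_item, ["time", "item"], ["time"])  # GROUP BY time
apex = roll_up(by_time, ["time"], [])                     # grand total
```

This works because sum is a distributive measure; a measure such as median could not be rebuilt from partial results this way.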


Data Cube as “4D cuboid”
4D representation of the lattice shown here

Figure Source: Data Mining Textbook [9]
Querying Lattice
• Each node is a view.
• “Dimension hierarchies”
• Roll-up and drill-down?

A query can read a full node or part of a node from the lattice.

Figure Source: Data Mining Textbook [9]
“Cube” “location” “Data Hierarchy”
Lattices can even be maintained for specific attributes (here, the location hierarchy), making the cube's granularity finer still.

Figure Source: Data Mining Textbook [9]
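Rolling up along a dimension hierarchy can be sketched as re-keying a cuboid by the parent level and summing. The example below is only an illustration: the city-to-country mapping, the sales figures, and the function name are all assumptions, not data from the slides.

```python
from collections import defaultdict

# Hypothetical location hierarchy: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

def roll_up_hierarchy(cuboid, mapping):
    """Re-key a 1-D cuboid by the parent level of the hierarchy and sum."""
    coarser = defaultdict(float)
    for value, total in cuboid.items():
        coarser[mapping[value]] += total
    return dict(coarser)

sales_by_city = {"Vancouver": 100.0, "Toronto": 250.0, "Chicago": 400.0}
sales_by_country = roll_up_hierarchy(sales_by_city, CITY_TO_COUNTRY)
```

Drill-down is the opposite direction and needs the finer cuboid (or the base data) to be available, since the coarser totals cannot be split back apart.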
“Cube” “Price” “Data Hierarchy”
Lattices can even be maintained for specific attribute values (here, price ranges), making the cube's granularity finer still.

Figure Source: Data Mining Textbook [9]
A Simple MR approach for Lattice Computation

Questions remain: How efficient would it be? Can the resulting cube be materialized? If yes, where do we store it? Can we have an index on dimensions?
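The slide's figure is not reproduced here, so the following is a sketch of one naive MapReduce-style formulation, not necessarily the one shown in the lecture. It assumes sum(price) as the measure and simulates the map, shuffle, and reduce phases in plain Python; the function names and sample facts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def map_fact(row, dims, measure):
    """Map: emit one (cuboid key, value) pair per subset of the dimensions."""
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            key = tuple((d, row[d]) for d in subset)
            yield key, row[measure]

def reduce_sum(pairs):
    """Shuffle + reduce: group the pairs by key and sum their values."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

facts = [
    {"item": "pen", "store": "S1", "price": 10.0},
    {"item": "pen", "store": "S2", "price": 7.0},
    {"item": "ink", "store": "S1", "price": 5.0},
]
pairs = (pair for row in facts for pair in map_fact(row, ["item", "store"], "price"))
cube = reduce_sum(pairs)
```

Each fact row emits 2^n pairs for n dimensions, so the shuffle volume grows exponentially with the number of dimensions, which bears directly on the efficiency question above.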
A Simple Spark solution for Lattice Computation

How do we materialize, index, search?

BTW, what is the major limitation of Spark SQL w.r.t. “SQL”?
OLAP Systems
• Materialized “pre-computation” and “pre-aggregation” of lattice cuboids remain the key to achieving interactive response times in OLAP systems.
• Many systems offer query optimization based on pre-computed aggregates and automatic maintenance of stored aggregates when the base data is updated.
• Materializing all combinations of aggregates may become infeasible because it takes too much storage and initial computation time.
• Instead, modern OLAP systems adopt the practical pre-aggregation approach of materializing only selected combinations of aggregates and then reusing these to efficiently compute other aggregates.
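Reusing a few stored aggregates to answer other queries can be sketched as follows. This is an assumed, simplified strategy (pick the smallest materialized cuboid whose dimensions cover the query, then sum away the rest); real systems use cost models, and the names `answer` and `materialized` are hypothetical.

```python
from collections import defaultdict

def answer(query_dims, materialized):
    """Answer GROUP BY query_dims from the smallest covering materialized cuboid.

    materialized: dict mapping a tuple of dimension names to a cuboid, where a
    cuboid maps a tuple of dimension values (same order) to sum(price).
    """
    candidates = [dims for dims in materialized if set(query_dims) <= set(dims)]
    if not candidates:
        raise LookupError(f"no materialized cuboid covers {query_dims!r}")
    # Fewer stored rows means less work to re-aggregate.
    source_dims = min(candidates, key=lambda dims: len(materialized[dims]))
    keep = [source_dims.index(d) for d in query_dims]
    result = defaultdict(float)
    for key, total in materialized[source_dims].items():
        result[tuple(key[i] for i in keep)] += total
    return source_dims, dict(result)

materialized = {
    ("item", "store"): {("pen", "S1"): 10.0, ("pen", "S2"): 7.0, ("ink", "S1"): 5.0},
    ("item",): {("pen",): 17.0, ("ink",): 5.0},
}
used, by_store = answer(("store",), materialized)   # only (item, store) covers it
used_apex, apex = answer((), materialized)          # the smaller (item) cuboid suffices
```

Here GROUP BY store was never stored, yet it is answered from the (item, store) cuboid without touching the fact table.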


Types of OLAP Systems
• Relational OLAP (ROLAP) – lattice cuboids are stored in relational tables
• Multidimensional OLAP (MOLAP) [3][4] – lattices are stored as multi-dimensional arrays, or so?
• Hybrid OLAP (HOLAP)


ROLAP: Cube Lattice as a Relation

Identify:
• The base cuboid: what is its dimensionality?
• The 2-D, 1-D, and 0-D cuboids
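The relation on this slide is not reproduced here. As an illustration of the idea, the sketch below stores a whole two-dimensional lattice in one relational table, using the placeholder value 'ALL' for a dimension that has been aggregated away. The table names, column names, and data are assumptions; SQLite is used only because it ships with Python, and since it has no CUBE operator the cuboids are combined with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (item TEXT, store TEXT, price REAL)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [("pen", "S1", 10.0), ("pen", "S2", 7.0), ("ink", "S1", 5.0)],
)

# One relation for the whole lattice; 'ALL' marks an aggregated-away dimension.
conn.execute("""
    CREATE TABLE cube AS
    SELECT item, store, SUM(price) AS total FROM fact GROUP BY item, store  -- 2-D (base)
    UNION ALL
    SELECT item, 'ALL', SUM(price) FROM fact GROUP BY item                  -- 1-D
    UNION ALL
    SELECT 'ALL', store, SUM(price) FROM fact GROUP BY store                -- 1-D
    UNION ALL
    SELECT 'ALL', 'ALL', SUM(price) FROM fact                               -- 0-D (apex)
""")

rows = conn.execute("SELECT item, store, total FROM cube ORDER BY item, store").fetchall()
for row in rows:
    print(row)
```

In such a table, rows with no 'ALL' belong to the base cuboid, rows with one 'ALL' to a 1-D cuboid, and the single row with 'ALL' in every dimension is the 0-D cuboid.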


Relational OLAP (ROLAP)
• “Relational OLAP (ROLAP) systems use relational database technology for storing data, and they also employ specialized index structures, such as bit-mapped indices, to achieve good query performance” [3]
– Here we have schemas like “Star” and “Snowflake”
– Indexes are based on dimension attributes and may span dimension tables
• Materialized cubes are also stored in relational tables.
– The cube may not be stored in full. Say, only the base cuboid is materialized, or only up to N-M levels are stored, and so on.
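To make the bitmap index idea concrete, here is a minimal sketch that uses a Python integer as the bit vector for each distinct value of a dimension attribute. It is a toy illustration, not how any particular DBMS implements it; the function names and rows are hypothetical.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bit vector (bit i = row i)."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index

def matching_rows(bitmap):
    """Decode a bit vector back into row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

rows = [
    {"item": "pen", "store": "S1"},
    {"item": "pen", "store": "S2"},
    {"item": "ink", "store": "S1"},
    {"item": "pen", "store": "S1"},
]
item_idx = build_bitmap_index(rows, "item")
store_idx = build_bitmap_index(rows, "store")

# The predicate item = 'pen' AND store = 'S1' becomes a single bitwise AND.
hits = matching_rows(item_idx["pen"] & store_idx["S1"])
```

Bitmap indexes suit dimension attributes because these usually have few distinct values, so multi-attribute filters reduce to cheap bitwise operations.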


Relational OLAP (ROLAP)
• The strength is that relational systems are “already established and robust”
– Large data volumes can be processed efficiently using robust relational systems, with the help of indexes, particularly “bitmap indexes”.
• Still, the disadvantages are:
– They can be inefficient on ad hoc queries for which we do not have a materialized “Cube”.
– They may not scale, and so are unsuited for “Big Data”.


Multidimensional OLAP (MOLAP) systems
• Cube lattices are represented in some other structure, such as multi-dimensional arrays.
• “Multidimensional data cubes are stored on disks in specialized multidimensional structures”. [3]
• “MOLAP systems typically include provisions for handling sparse arrays and apply advanced indexing and hashing to locate the data when performing queries” [3]
• MOLAP systems generally provide more space-efficient storage as well as faster query response times.
• Options here are
– Proprietary MOLAP systems
– Multi-dimensional databases (MDDB). Some relational systems also support them, for instance, Oracle [11]
– Key-value stores, which are becoming more common in “Big Data”
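The difference between a dense multi-dimensional array and a hashed sparse representation can be sketched in a few lines. This is an illustrative toy, assuming two dimensions with hypothetical member lists; it does not reflect the storage format of any specific MOLAP product.

```python
ITEMS = ["pen", "ink", "pad"]   # hypothetical dimension members
STORES = ["S1", "S2"]

def offset(item, store):
    """Row-major position of a cell in a dense 2-D array flattened to 1-D."""
    return ITEMS.index(item) * len(STORES) + STORES.index(store)

# Dense storage: one slot per possible cell, empty cells included.
dense = [0.0] * (len(ITEMS) * len(STORES))
dense[offset("pen", "S1")] = 10.0
dense[offset("ink", "S2")] = 5.0

# Sparse storage: hash only the non-empty cells by their coordinates.
sparse = {("pen", "S1"): 10.0, ("ink", "S2"): 5.0}

def lookup(item, store):
    """Return the measure for a cell, treating absent cells as 0."""
    return sparse.get((item, store), 0.0)
```

The dense array locates a cell by arithmetic alone but pays for every empty cell, whereas the hashed form stores only populated cells, which is the same shape of data a key-value store would hold.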


Multidimensional OLAP (MOLAP)
• Multidimensional OLAP systems are a natural choice for data cubes,
• BUT, traditionally they have been considered unsuitable for larger data volumes
– They may require a large amount of main memory


[3] Pedersen, Torben Bach, and Christian S. Jensen. "Multidimensional database technology." Computer 34.12 (2001)
Multidimensional Database History

[3] Pedersen, Torben Bach, and Christian S. Jensen. "Multidimensional database technology." Computer 34.12 (2001)
OLAP Systems
• Traditionally:
– Most DBMSs “do support” OLAP
– Special planning is required in terms of “building”, “indexing”, and “updating” data cubes
– Special extensions for extract, transform, load (ETL), or third-party tools!
• Large-scale OLAP systems?
– Apache Kylin on Hadoop,
– Apache Druid,
– Presto,
– Cassandra


Apache Kylin
• The Kylin project was started in 2013 at eBay's R&D in China.
• In November 2014, Kylin joined the Apache Software Foundation incubator.
https://en.wikipedia.org/wiki/Apache_Kylin

• Kylin 4 is the latest stable version:

https://kylin.apache.org/docs/index.html
Apache Kylin Rationale
• Kylin’s core idea is the precomputation of result sets.
• It calculates all possible query results in advance according to the specified dimensions and indicators, and so speeds up OLAP queries with fixed query patterns.

https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
Apache Kylin - Architecture

https://kylin.apache.org/index.html

Apache Kylin

https://kylin.apache.org/docs/index.html
Related Readings
[1] Lee, Suan, et al. "Scalable distributed data cube computation for large-scale multidimensional data analysis on a Spark cluster." Cluster Computing 22 (2019): 2063-2087.
[2] Chen, Wenhao, et al. "An optimized distributed OLAP system for big data." 2017 2nd IEEE International Conference on Computational Intelligence and Applications (ICCIA). IEEE, 2017.
[3] Pedersen, Torben Bach, and Christian S. Jensen. "Multidimensional database technology." Computer 34.12 (2001): 40-46.
[4] Thomsen, Erik. OLAP solutions: building multidimensional information systems. John Wiley & Sons, 2002.
[5] Tao, Jiatao. "Apache Kylin4 - A new storage and compute architecture." Apache Kylin Technical Blog, 2021. https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
[6] Pedersen, Torben Bach, and Christian S. Jensen. "Multidimensional database technology." Computer 34.12 (2001): 40-46.
[7] Li, Yang. "The future of Apache Kylin: More powerful and easy-to-use OLAP." https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
[8] https://cwiki.apache.org/confluence/display/KYLIN/Global+Dictionary+on+Spark
[9] (Book) Han, Jiawei, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. 3rd ed. Morgan Kaufmann, 2012.
[10] (Book) Thomsen, Erik. OLAP solutions: building multidimensional information systems. John Wiley & Sons, 2002.
[11] Oracle, “Understanding Multidimensional Databases” https://docs.oracle.com/cd/E12825_01/epm.111/esb_dbag/frameset.htm?dinconc.htm
