KEMBAR78
Column-vs-Row Databases | PDF | Relational Model | Database Index
0% found this document useful (0 votes)
30 views12 pages

Column-vs-Row Databases

The document compares column-stores and row-stores, highlighting their differences in data storage, efficiency for read-only queries, and optimization techniques. It discusses the advantages of column-stores in terms of I/O efficiency and compression, while also noting potential drawbacks for certain applications. Additionally, it explores methods to emulate column-store benefits in row-stores through vertical partitioning and materialized views.

Uploaded by

Usama Riaz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views12 pages

Column-vs-Row Databases

The document compares column-stores and row-stores, highlighting their differences in data storage, efficiency for read-only queries, and optimization techniques. It discusses the advantages of column-stores in terms of I/O efficiency and compression, while also noting potential drawbacks for certain applications. Additionally, it explores methods to emulate column-store benefits in row-stores through vertical partitioning and materialized views.

Uploaded by

Usama Riaz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

11/30/2024

Column-Stores vs. Row-Stores

Contents

 Column-store introduction
 Column-store data model
 Emulation of Column Store in Row store
 Column store optimization
 Experiment and Results
 Conclusion

1
11/30/2024

Row Store and Column Store

Figure taken form [2]

 In row store data are stored in the disk tuple by tuple.


 Where in column store data are stored in the disk
column by column

Row Store and Column Store


 Most of the queries does not process all the attributes of a
particular relation.

 For example the query


Select c.name and c.address
From CUSTOMES as c
Where c.region=abc;

 Only process three attributes of the relation CUSTOMER. But


the customer relation can have more than three attributes.

 Column-stores are more I/O efficient for read-only queries as


they read, only those attributes which are accessed by a query.

2
11/30/2024

Row Store and Column Store

Row Store Column Store

(+) Easy to add/modify a record (+) Only need to read in relevant data

(-) Might read in unnecessary data (-) Tuple writes require multiple accesses

 So column stores are suitable for read-mostly, read-


intensive, large data repositories

Why Column Stores?


 Can be significantly faster than row stores for some
applications
 Fetch only required columns for a query
 Better cache effects
 Better compression (similar attribute values within a column)
 But can be slower for other applications
 OLTP with many row inserts, ..

3
11/30/2024

Column Stores - Data Model

 Standard relational logical data model


 EMP(name, age, salary, dept)
 DEPT(dname, floor)
 Table – collection of projections
 Projection – set of columns
 Horizontally partitioned into segments with segment
identifier

Column Stores - Data Model

 To answer queries, projections are joined using Storage


keys and join indexes
 Storage Keys:
 Within a segment, every data value of every column is
associated with a unique Skey
 Values from different columns with matching Skey belong to
the same logical row

4
11/30/2024

Column Stores – Data Model

 Join Indexes
 T1 and T2 are projections on T
 M segments in T1 and N segments in T2
 Join Index from T1 to T2 is a table of the form:
 (s: Segment ID in T2, k: Storage key in Segment s)
 Each row in join index matches corresponding row in T1
 Join indexes are built such that T could be efficiently
reconstructed from T1 and T2

Compression
 Trades I/O for CPU
 Increased column-store opportunities:
 Higher data value locality in column stores
 Techniques such as run length encoding far more useful
 Schemes
 Null Suppression
 Dictionary encoding
 Run Length encoding
 Bit-Vector encoding
 Heavyweight schemes

10

5
11/30/2024

Query Execution - Operators

 Select: Same as relational algebra, but produces a bit


string
 Project: Same as relational algebra
 Join: Joins projections according to predicates

 Aggregation: SQL like aggregates


 Sort: Sort all columns of a projection

11

Query Execution - Operators


 Decompress: Converts compressed column to
uncompressed representation
 Mask(Bitstring B, Projection Cs) => emit only those
values whose corresponding bits are 1
 Concat: Combines one or more projections sorted in
the same order into a single projection
 Permute: Permutes a projection according to the
ordering defined by a join index
 Bitstring operators: Band – Bitwise AND, Bor – Bitwise
OR, Bnot – complement

12

6
11/30/2024

Row Store Vs Column Store


 Now the simplistic view about the difference in storage
layout leads to that one can obtain the performance
benefits of a column-store using a row-store by making
some changes to the physical structure of the row store.

 This changes can be


 Vertically partitioning
 Using index-only plans
 Using materialized views

13

Vertical Partitioning
 Process:
 Full Vertical partitioning of each relation
 Each column =1 Physical table
 This can be achieved by adding integer position column to every table
 Adding integer position is better than adding primary key
 Join on Position for multi column fetch

 Problems:
 “Position” - Space and disk bandwidth
 Header for every tuple – further space wastage
 e.g. 24 byte overhead in PostgreSQL

14

7
11/30/2024

Index-only plans: Example

15

Materialized Views
 Process:
 Create ‘optimal' set of MVs for given query workload
 Objective:
 Provide just the required data
 Avoid overheads
 Performs better
 Expected to perform better than other two approach

 Problems:
 Practical only in limited situation
 Require knowledge of query workloads in advance

16

8
11/30/2024

Materialized Views: Example

 Select F.custID
from Facts as F
where F.price>20

17

Optimizing Column oriented Execution

 Different optimization for column oriented database


 Compression
 Late Materialization
 Block Iteration
 Invisible Join

18

9
11/30/2024

Compression

 Low information entropy (high data value locality) leads


to High compression ratio
 Advantage
 Disk Space is saved
 Less I/O
 CPU cost decrease if we can perform operation without
decompressing
 Light weight compression schemes do better

19

Compression

 If data is sorted on one column that column will be


super-compressible in row store

 eg. Run length encoding

Figure taken form [2]


20

10
11/30/2024

Late Materialization

 Most query results entity-at-a-time not column-at-a-time


 So at some point of time multiple column must be
combined
 One simple approach is to join the columns relevant for a
particular query
 But further performance can be improve using late-
materialization

21

Late Materialization

 Delay Tuple Construction


 Might avoid constructing it altogether
 Intermediate position lists might need to be constructed
 Eg: SELECT R.a FROM R WHERE R.c = 5 AND R.b = 10
 Output of each predicate is a bit string
 Perform Bitwise AND
 Use final position list to extract R.a

22

11
11/30/2024

Late Materialization

 Advantages
 Unnecessary construction of tuple is avoided
 Direct operation on compressed data
 Cache performance is improved (PAX)

23

Thank You!

24

12

You might also like