Data Partitioning

Uploaded by

taff

2/5/2020 Data Partitioning

Data Partitioning

Data partitioning is a technique for breaking up a big database (DB) into
many smaller parts. It is the process of splitting up a DB/table across
multiple machines to improve the manageability, performance,
availability, and load balancing of an application. The justification for
data partitioning is that, after a certain scale point, it is cheaper and
more feasible to scale horizontally by adding more machines than to
scale vertically by adding beefier servers.

1. Partitioning Methods

There are many different schemes one could use to decide how to break
up an application database into multiple smaller DBs. Below are three
of the most popular schemes used by various large-scale applications.

a. Horizontal partitioning: In this scheme, we put different rows into
different tables. For example, if we are storing different places in a
table, we can decide that places with ZIP codes less than 10000 are
stored in one table and places with ZIP codes of 10000 or greater are
stored in a separate table. This is also called range-based partitioning,
as we are storing different ranges of data in separate tables. Horizontal
partitioning is also called data sharding.

The key problem with this approach is that if the value whose range is
used for partitioning isn’t chosen carefully, then the partitioning
scheme will lead to unbalanced servers. In the previous example,
splitting places based on their ZIP codes assumes that places will be
evenly distributed across the different ZIP codes. This assumption is not
valid, as there will be many more places in a densely populated area like
Manhattan than in its suburbs.
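
The ZIP code example can be sketched in a few lines of Python. This is only
an illustration: the boundary list, the function name range_partition, and
the sample ZIP codes are made up for the sketch, and ZIP codes are treated
as plain integers.

```python
from bisect import bisect_right

# Hypothetical split points: partition 0 holds ZIP < 10000,
# partition 1 holds ZIP >= 10000.
BOUNDARIES = [10000]


def range_partition(zip_code, boundaries=BOUNDARIES):
    """Return the index of the partition whose range contains zip_code."""
    return bisect_right(boundaries, zip_code)


# Made-up sample: most places share a couple of dense ZIP codes.
places = [10001] * 8 + [10002] * 6 + [501, 2108, 6390]

counts = [0] * (len(BOUNDARIES) + 1)
for zip_code in places:
    counts[range_partition(zip_code)] += 1

print(counts)  # [3, 14]
```

Even with only two partitions, the dense ZIP codes pile up on one side,
which is the imbalance described above.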

b. Vertical partitioning: In this scheme, we divide our data so that
tables related to a specific feature are stored on their own server. For
example, if we are building an Instagram-like application, where we need
to store data related to users, photos they upload, and people they
follow, we can decide to place user profile information on one DB server,
friend lists on another, and photos on a third server.

Vertical partitioning is straightforward to implement and has a low
impact on the application. The main problem with this approach is that
if our application experiences additional growth, then it may be
necessary to further partition a feature-specific DB across various
servers (e.g., it would not be possible for a single server to handle all
the metadata queries for 10 billion photos by 140 million users).

c. Directory-based partitioning: A loosely coupled approach to work
around the issues mentioned in the above schemes is to create a lookup
service that knows the current partitioning scheme and abstracts it
away from the DB access code. So, to find out where a particular data
entity resides, we query the directory server that holds the mapping
from each tuple key to its DB server. This loosely coupled approach
means we can perform tasks like adding servers to the DB pool or
changing our partitioning scheme without having an impact on the
application.
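
A minimal sketch of the idea, assuming an in-memory dictionary stands in for
the directory server. The Directory class, the read function, and the server
names are hypothetical.

```python
class Directory:
    """Hypothetical lookup service: maps each key to the server that owns it."""

    def __init__(self):
        self._location = {}

    def assign(self, key, server):
        self._location[key] = server

    def lookup(self, key):
        return self._location[key]


directory = Directory()
directory.assign("user:42", "db-1")


def read(key):
    # The DB access code only asks the directory where the key lives;
    # it contains no partitioning logic of its own.
    return f"read {key} from {directory.lookup(key)}"


print(read("user:42"))               # read user:42 from db-1
directory.assign("user:42", "db-7")  # the data moved; callers are unchanged
print(read("user:42"))               # read user:42 from db-7
```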

2. Partitioning Criteria

a. Key or hash-based partitioning: Under this scheme, we apply a
hash function to some key attributes of the entity we are storing; that
yields the partition number. For example, suppose we have 100 DB servers
and our ID is a numeric value that gets incremented by one each time a
new record is inserted. In this example, the hash function could be ‘ID
% 100’, which will give us the server number where we can store/read
that record. This approach should ensure a uniform allocation of data
among servers. The fundamental problem with this approach is that it
effectively fixes the total number of DB servers, since adding new
servers means changing the hash function, which would require
redistribution of data and downtime for the service. A workaround for
this problem is to use Consistent Hashing.
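
To see how much data a change of hash function can move, the sketch below
applies ‘ID % 100’ and then ‘ID % 101’ to the same IDs. The function name
server_for and the choice of 10,000 sequential IDs are assumptions made for
the illustration.

```python
def server_for(record_id, num_servers):
    """Hash-based placement: the server number is ID modulo the server count."""
    return record_id % num_servers


ids = range(1, 10001)

# Count the records whose server changes when one server is added.
moved = sum(1 for i in ids if server_for(i, 100) != server_for(i, 101))

print(moved, "of", len(ids), "records change servers")
# 9901 of 10000 records change servers
```

Adding a single server relocates almost every record in this sample, which
is why the scheme effectively fixes the number of servers.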

b. List partitioning: In this scheme, each partition is assigned a list of
values, so whenever we want to insert a new record, we will see which
partition contains our key and then store it there. For example, we can
decide all users living in Iceland, Norway, Sweden, Finland, or Denmark
will be stored in a partition for the Nordic countries.
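
The Nordic example might look like this in code. The partition names and
the catch-all "other" partition are assumptions added for the sketch; the
text above does not say what happens to unlisted values.

```python
# Each partition is assigned an explicit list (here a set) of values.
PARTITION_VALUES = {
    "nordic": {"Iceland", "Norway", "Sweden", "Finland", "Denmark"},
}
DEFAULT_PARTITION = "other"  # assumption: catch-all for unlisted countries


def list_partition(country):
    """Return the name of the partition whose value list contains country."""
    for name, values in PARTITION_VALUES.items():
        if country in values:
            return name
    return DEFAULT_PARTITION


print(list_partition("Norway"))  # nordic
print(list_partition("Japan"))   # other
```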

c. Round-robin partitioning: This is a very simple strategy that ensures
uniform data distribution. With ‘n’ partitions, the ‘i’-th tuple is
assigned to partition (i mod n).
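
A short sketch of the rule, with tuples numbered from zero. The function
name round_robin and the sample data are made up.

```python
def round_robin(tuples, n):
    """Assign the i-th tuple (counting from 0) to partition i mod n."""
    partitions = [[] for _ in range(n)]
    for i, item in enumerate(tuples):
        partitions[i % n].append(item)
    return partitions


parts = round_robin(list("abcdefg"), 3)
print(parts)  # [['a', 'd', 'g'], ['b', 'e'], ['c', 'f']]
```

Partition sizes differ by at most one. Because placement depends only on
arrival order and not on the tuple's value, finding a specific tuple later
means checking every partition.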

d. Composite partitioning: Under this scheme, we combine any of the
above partitioning schemes to devise a new scheme, for example, first
applying a list partitioning scheme and then a hash-based partitioning
scheme. Consistent hashing could be considered a composite of hash and
list partitioning where the hash reduces the key space to a size that can
be listed.
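
One way to picture consistent hashing is a simplified ring without virtual
nodes, as sketched below. The HashRing class, the use of MD5 to compute ring
positions, and the server names are all assumptions of the sketch. The hash
maps servers and keys to positions, and the sorted list of server positions
plays the role of the "list".

```python
import hashlib
from bisect import bisect_right


def _position(value):
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers):
        ring = sorted((_position(s), s) for s in servers)
        self._points = [p for p, _ in ring]
        self._servers = [s for _, s in ring]

    def server_for(self, key):
        # A key belongs to the first server clockwise from its position.
        idx = bisect_right(self._points, _position(key)) % len(self._points)
        return self._servers[idx]


keys = [f"key-{i}" for i in range(1000)]
before = HashRing(["db-0", "db-1", "db-2", "db-3"])
after = HashRing(["db-0", "db-1", "db-2", "db-3", "db-4"])

moved = [k for k in keys if before.server_for(k) != after.server_for(k)]

# Keys only ever move to the newly added server; everything else stays put.
print(len(moved), all(after.server_for(k) == "db-4" for k in moved))
```

Unlike ‘ID % n’, adding a server here relocates only the keys that fall in
the new server's arc of the ring.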

3. Common Problems of Data Partitioning

On a partitioned database, there are certain extra constraints on the
different operations that can be performed. Most of these constraints
are due to the fact that operations across multiple tables or multiple
rows in the same table will no longer run on the same server. Below are
some of the constraints and additional complexities introduced by
partitioning:

a. Joins and denormalization: Performing joins on a database that is
running on one server is straightforward, but once a database is
partitioned and spread across multiple machines, it is often not feasible
to perform joins that span database partitions. Such joins will not be
efficient, since data has to be compiled from multiple
servers. A common workaround for this problem is to denormalize the
database so that queries that previously required joins can be
performed from a single table. Of course, the service now has to deal
with all the perils of denormalization such as data inconsistency.
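
The trade-off can be illustrated with two hypothetical partitions modelled
as dictionaries. The record layout and function names are made up; the
user_name field is the denormalized copy.

```python
# Two hypothetical partitions living on different servers.
users_db = {42: {"name": "alice"}}
photos_db = {
    # user_name duplicates data owned by users_db (denormalization).
    7: {"user_id": 42, "url": "p7.jpg", "user_name": "alice"},
}


def caption_with_join(photo_id):
    photo = photos_db[photo_id]
    user = users_db[photo["user_id"]]  # second lookup on another server
    return f'{user["name"]}: {photo["url"]}'


def caption_denormalized(photo_id):
    photo = photos_db[photo_id]        # single lookup, no cross-server join
    return f'{photo["user_name"]}: {photo["url"]}'


users_db[42]["name"] = "alice_w"       # the rename touches only users_db

print(caption_with_join(7))     # alice_w: p7.jpg
print(caption_denormalized(7))  # alice: p7.jpg (stale copy)
```

The denormalized read avoids the cross-partition lookup, but the copy goes
stale unless every write also updates it.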

b. Referential integrity: Just as performing a cross-partition
query on a partitioned database is often not feasible, trying to
enforce data integrity constraints such as foreign keys in a partitioned
database can be extremely difficult.

Most RDBMSs do not support foreign key constraints across databases
on different database servers. This means that applications that
require referential integrity on partitioned databases often have to
enforce it in application code. Often in such cases, applications have to
run regular SQL jobs to clean up dangling references.
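
The logic of such a cleanup job is sketched here in plain Python rather than
SQL. The two partitions, the follow-relationship layout, and the function
names are hypothetical.

```python
# Users live on one partition, follow relationships on another.
users_db = {1: "alice", 2: "bob"}
follows_db = [(1, 2), (2, 3), (4, 1)]  # (follower_id, followee_id)


def dangling_follows(follows, users):
    """Return follow rows that reference a user ID that no longer exists."""
    return [(a, b) for a, b in follows if a not in users or b not in users]


def cleanup(follows, users):
    """Keep only the rows whose two user IDs both still exist."""
    return [(a, b) for a, b in follows if a in users and b in users]


print(dangling_follows(follows_db, users_db))  # [(2, 3), (4, 1)]
follows_db = cleanup(follows_db, users_db)
print(follows_db)                              # [(1, 2)]
```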

c. Rebalancing: There could be many reasons we have to change our
partitioning scheme:

1. The data distribution is not uniform, e.g., there are a lot of places for a
particular ZIP code that cannot fit into one database partition.
2. There is a lot of load on a partition, e.g., there are too many requests
being handled by the DB partition dedicated to user photos.

In such cases, either we have to create more DB partitions or have to
rebalance existing partitions, which means the partitioning scheme
changes and all existing data moves to new locations. Doing this
without incurring downtime is extremely difficult. Using a scheme like
directory-based partitioning does make rebalancing a more palatable
experience, at the cost of increasing the complexity of the system and
creating a new single point of failure (i.e., the lookup service/database).
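
As a rough sketch of why a directory helps, moving one key can be reduced to
copying its data and then flipping a single mapping entry. The dictionaries,
server names, and the move_key function are hypothetical, and the sketch
ignores writes that arrive during the copy, which is a large part of what
makes zero-downtime rebalancing hard.

```python
# Hypothetical state: two servers and a directory mapping keys to servers.
servers = {
    "db-1": {"zip:10001": ["cafe", "deli"], "zip:501": ["office"]},
    "db-2": {},
}
directory = {"zip:10001": "db-1", "zip:501": "db-1"}


def move_key(key, target):
    source = directory[key]
    servers[target][key] = servers[source][key]  # 1. copy data to the target
    directory[key] = target                      # 2. flip the directory entry
    del servers[source][key]                     # 3. drop the old copy


move_key("zip:10001", "db-2")
print(directory["zip:10001"])   # db-2
print(sorted(servers["db-1"]))  # ['zip:501']
```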

