22 Introduction to Distributed Databases
Intro to Database Systems
15-445/15-645 Fall 2019
Andy Pavlo
Carnegie Mellon University
ADMINISTRIVIA
Homework #5: Monday Dec 2nd @ 11:59pm
Project #4: Monday Dec 9th @ 11:59pm
Extra Credit: Wednesday Dec 11th @ 11:59pm
Final Exam: Monday Dec 9th @ 5:30pm
ADMINISTRIVIA
Monday Dec 2nd – Oracle Lecture
→ Shasank Chavan (VP In-Memory Databases)
Wednesday Dec 4th – Potpourri + Review
→ Vote for what system you want me to talk about.
→ https://cmudb.io/f19-systems
Sunday Nov 24th – Extra Credit Check
→ Submit your extra credit assignment early to get feedback
from me.
UPCOMING DATABASE EVENTS
Oracle Research Talk
→ Tuesday December 3rd @ 12:00pm
→ CIC 4th Floor
PARALLEL VS. DISTRIBUTED
Parallel DBMSs:
→ Nodes are physically close to each other.
→ Nodes connected with high-speed LAN.
→ Communication cost is assumed to be small.
Distributed DBMSs:
→ Nodes can be far from each other.
→ Nodes connected using public network.
→ Communication cost and problems cannot be ignored.
DISTRIBUTED DBMSs
Use the building blocks that we covered in single-node DBMSs to now support transaction processing and query execution in distributed environments.
→ Optimization & Planning
→ Concurrency Control
→ Logging & Recovery
TODAY'S AGENDA
System Architectures
Design Issues
Partitioning Schemes
Distributed Concurrency Control
SYSTEM ARCHITECTURE
A DBMS's system architecture specifies what
shared resources are directly accessible to CPUs.
This affects how CPUs coordinate with each other
and where they retrieve/store objects in the
database.
SYSTEM ARCHITECTURE
[Figure: The four approaches, from most to least sharing: Shared Everything, Shared Memory, Shared Disk, Shared Nothing.]
SHARED MEMORY
CPUs have access to a common memory address space via a fast interconnect.
→ Each processor has a global view of all the in-memory data structures.
→ Each DBMS instance on a processor has to "know" about the other instances.
SHARED DISK
All CPUs can access a single logical disk directly via an interconnect, but each has its own private memory.
→ Can scale the execution layer independently from the storage layer.
→ Must send messages between CPUs to learn about their current state.
SHARED DISK EXAMPLE
[Figure: An application server sends Get Id=101 and Get Id=200 to different compute nodes. Each node pulls the page it needs (Page ABC, Page XYZ) from the shared storage; when a node updates Id=101 it writes Page ABC back to storage so the other nodes see the new version.]
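A minimal Python sketch of that idea (illustrative only; the class and method names such as SharedStorage and ComputeNode are made up, not from the lecture): compute nodes share one storage service but keep private buffers, so a node that updates a page must tell the others that their cached copy is stale.

# Shared-disk sketch: one logical storage service, private per-node caches.
class SharedStorage:
    def __init__(self):
        self.pages = {}                      # page_id -> page contents

    def read(self, page_id):
        return self.pages.get(page_id)

    def write(self, page_id, contents):
        self.pages[page_id] = contents

class ComputeNode:
    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.peers = []                      # other compute nodes
        self.cache = {}                      # private buffer pool

    def get(self, page_id):
        if page_id not in self.cache:        # cache miss -> fetch from shared disk
            self.cache[page_id] = self.storage.read(page_id)
        return self.cache[page_id]

    def update(self, page_id, contents):
        self.storage.write(page_id, contents)
        self.cache[page_id] = contents
        for peer in self.peers:              # others must learn about the new state
            peer.invalidate(page_id)

    def invalidate(self, page_id):
        self.cache.pop(page_id, None)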
SHARED NOTHING
Each DBMS instance has its own CPU, memory, and disk.
Nodes only communicate with each other via the network.
→ Hard to increase capacity.
→ Hard to ensure consistency.
→ Better performance & efficiency.
SHARED NOTHING EXAMPLE
[Figure: The application server sends Get Id=10 and Get Id=200 directly to the nodes that own those keys. Each node stores one partition on its own local storage (P1→ID:1-100, P3→ID:101-200, P2→ID:201-300); adding capacity means repartitioning the data (e.g., P1→ID:1-150, P2→ID:151-300).]
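As a rough illustration of how a request like Get Id=10 finds its node, here is a small Python sketch (the node names and id ranges are just for illustration) of range-based routing in a shared-nothing cluster: a partition map sends each id to the single node that stores it locally.

# Shared-nothing routing sketch: each node owns a key range on its local storage.
import bisect

class Node:
    def __init__(self, name):
        self.name = name
        self.local_data = {}                 # id -> tuple, stored on this node only

nodes = {"P1": Node("P1"), "P2": Node("P2"), "P3": Node("P3")}
# Partition map: (upper bound of id range, owning node), e.g. P1 -> ID:1-100.
partition_map = [(100, "P1"), (200, "P3"), (300, "P2")]

def route(record_id):
    bounds = [upper for upper, _ in partition_map]
    owner = partition_map[bisect.bisect_left(bounds, record_id)][1]
    return nodes[owner]

route(10).local_data[10] = ("example tuple",)
print(route(10).local_data.get(10))          # single-node lookup: only P1 is touched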
EARLY DISTRIBUTED DATABASE SYSTEMS
MUFFIN – UC Berkeley (1979)
SDD-1 – CCA (1979)
System R* – IBM Research (1984)
Gamma – Univ. of Wisconsin (1986)
NonStop SQL – Tandem (1987)
[Photos: Stonebraker, Bernstein, Mohan, DeWitt, Gray]
DESIGN ISSUES
How does the application find data?
How to execute queries on distributed data?
→ Push query to data.
→ Pull data to query.
How does the DBMS ensure correctness?
HOMOGENEOUS VS. HETEROGENEOUS
Approach #1: Homogeneous Nodes
→ Every node in the cluster can perform the same set of
tasks (albeit on potentially different partitions of data).
→ Makes provisioning and failover "easier".
Approach #2: Heterogeneous Nodes
→ Nodes are assigned specific tasks.
→ Can allow a single physical node to host multiple "virtual"
node types for dedicated tasks.
MONGODB HETEROGENEOUS ARCHITECTURE
[Figure: The application server sends Get Id=101 to a router (mongos). The router consults the partition map stored on the config server (mongod): P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400, and forwards the request to the shard (mongod) that owns that range.]
DATA TRANSPARENCY
Users should not be required to know where data is physically located, or how tables are partitioned or replicated.
A SQL query that works on a single-node DBMS
should work the same on a distributed DBMS.
DATABASE PARTITIONING
Split the database across multiple resources:
→ Disks, nodes, processors.
→ Sometimes called "sharding".
The DBMS executes query fragments on each
partition and then combines the results to produce
a single answer.
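For example, here is a minimal sketch of that execute-then-combine pattern in Python (toy data, illustrative only): the same fragment, here a SUM, runs on every partition and the partial results are merged into one answer.

# Scatter/gather sketch: run a query fragment per partition, combine the results.
partitions = [
    [("a", 10), ("b", 25)],                  # partition 1: (key, amount) tuples
    [("c", 5)],                              # partition 2
    [("d", 40), ("e", 20)],                  # partition 3
]

def fragment(partition):
    # Fragment for something like SELECT SUM(amount) FROM t, executed locally.
    return sum(amount for _, amount in partition)

partial_sums = [fragment(p) for p in partitions]   # executed on each partition
print(sum(partial_sums))                           # combined answer: 100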
NAÏVE TABLE PARTITIONING
Each node stores one and only one table.
Assumes that each node has enough storage space
for a table.
NAÏVE TABLE PARTITIONING
[Figure: Table1 and Table2 are each assigned, in their entirety, to different partitions.]
Ideal Query:
SELECT * FROM table
HORIZONTAL PARTITIONING
Split a table's tuples into disjoint subsets.
→ Choose column(s) that divide the database equally in terms of size, load, or usage.
→ Hash Partitioning, Range Partitioning
The DBMS can partition a database physically (shared nothing) or logically (shared disk).
HORIZONTAL PARTITIONING
Partitioning Key
[Figure: Each tuple of Table1 is assigned to one of four partitions by hashing its partitioning key, e.g. hash(a)%4 = P2, hash(b)%4 = P4, hash(c)%4 = P3, hash(d)%4 = P2, hash(e)%4 = P1.]
Ideal Query:
SELECT * FROM table
 WHERE partitionKey = ?
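A small Python sketch of hash partitioning (illustrative; the hash function and partition count are assumptions, so the assignments will not match the figure exactly): the partition is derived from the partitioning key, so a query that supplies the key is sent to exactly one partition.

# Hash partitioning sketch: partition = hash(partitioning key) % #partitions.
import hashlib

NUM_PARTITIONS = 4

def partition_for(key):
    # Use a stable hash so every node computes the same assignment for a key.
    digest = hashlib.sha1(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS           # 0..3 -> P1..P4

for row_id, key in [(101, "a"), (102, "b"), (103, "c"), (104, "d"), (105, "e")]:
    print(row_id, key, "-> P%d" % (partition_for(key) + 1))

# For SELECT * FROM table WHERE partitionKey = ?, the DBMS hashes the bound
# value and contacts only the one partition that can contain matching tuples.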
CONSISTENT HASHING
[Figure: Nodes A, B, C, D, E are placed at positions on a hash ring covering [0, 1). A key is hashed onto the ring (hash(key1), hash(key2)) and assigned to the next node encountered moving clockwise. With Replication Factor = 3, if hash(key)=D then the key is also replicated on the next two nodes after D.]
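A minimal consistent-hashing sketch in Python (illustrative; real systems also use virtual nodes): node names and keys are hashed onto the same ring, a key is owned by the first node clockwise from its hash, and with a replication factor of 3 the next two nodes also keep a copy.

# Consistent hashing sketch: nodes and keys are placed on one hash ring.
import bisect, hashlib

def ring_pos(value):
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((ring_pos(n), n) for n in nodes)   # (position, node)

    def owners(self, key):
        positions = [pos for pos, _ in self.ring]
        start = bisect.bisect_right(positions, ring_pos(key)) % len(self.ring)
        # First node clockwise stores the key; the next rf-1 nodes replicate it.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

ring = Ring(["A", "B", "C", "D", "E"])
print(ring.owners("key1"))        # e.g. ['C', 'E', 'B']: primary plus two replicas

Because only the ring positions adjacent to a node are affected when it joins or leaves, adding a node only moves the keys that now hash between it and its predecessor.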
LOGICAL PARTITIONING
[Figure: Shared-disk logical partitioning. The application server sends Get Id=1 to the node responsible for Id=1 and Id=2, and Get Id=3 to the node responsible for Id=3 and Id=4, but all tuples (Id=1..4) live in the shared storage that both nodes read from.]
PHYSICAL PARTITIONING
[Figure: Shared-nothing physical partitioning. The application server sends Get Id=1 to the node that stores Id=1 and Id=2 on its local storage, and Get Id=3 to the node that stores Id=3 and Id=4.]
SINGLE-NODE VS. DISTRIBUTED
A single-node txn only accesses data that is
contained on one partition.
→ The DBMS does not need to coordinate the behavior of concurrent txns running on other nodes.
A distributed txn accesses data at one or more
partitions.
→ Requires expensive coordination.
TRANSACTION COORDINATION
If our DBMS supports multi-operation and
distributed txns, we need a way to coordinate their
execution in the system.
Two different approaches:
→ Centralized: Global "traffic cop".
→ Decentralized: Nodes organize themselves.
TP MONITORS
Example of a centralized coordinator.
Originally developed in the 1970-80s to provide
txns between terminals and mainframe databases.
→ Examples: ATMs, Airline Reservations.
Many DBMSs now support the same functionality
internally.
CENTRALIZED COORDINATOR
[Figure: The application server sends a lock request for the partitions it wants to access (P1-P4) to the coordinator, which responds with an acknowledgement. When the application later sends the commit request, the coordinator asks the partitions whether it is safe to commit.]
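To make the "safe to commit?" exchange concrete, here is a simplified Python sketch (not the protocol of any particular system; the real commit protocols come in the next lectures, and the class and method names here are made up): the coordinator commits a txn only if every partition it touched votes yes.

# Simplified centralized coordination: all touched partitions must agree to commit.
class Partition:
    def __init__(self, name):
        self.name = name

    def safe_to_commit(self, txn_id):
        # A real partition would check locks, conflicts, and durability here.
        return True

class Coordinator:
    def __init__(self, partitions):
        self.partitions = partitions          # name -> Partition

    def commit(self, txn_id, touched):
        votes = [self.partitions[p].safe_to_commit(txn_id) for p in touched]
        return "COMMIT" if all(votes) else "ABORT"   # one "no" aborts the txn

coordinator = Coordinator({p: Partition(p) for p in ["P1", "P2", "P3", "P4"]})
print(coordinator.commit(txn_id=1, touched=["P1", "P3"]))    # COMMIT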
CENTRALIZED COORDINATOR
[Figure: The application server sends its query requests to a middleware layer that routes them to the partitions using the mapping P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400. On the commit request, the middleware asks the partitions whether it is safe to commit.]
DECENTRALIZED COORDINATOR
[Figure: The application server sends the begin request to one partition, which acts as the coordinator for that txn. Query requests go to the partitions that hold the data. On the commit request, that partition asks the other partitions involved in the txn whether it is safe to commit.]
DISTRIBUTED CONCURRENCY CONTROL
Need to allow multiple txns to execute
simultaneously across multiple nodes.
→ Many of the same protocols from single-node DBMSs
can be adapted.
This is harder because of:
→ Replication.
→ Network Communication Overhead.
→ Node Failures.
→ Clock Skew.
DISTRIBUTED 2PL
[Figure: Waits-For Graph. T1 sets A=2 on Node 1 and then tries to set B=9 on Node 2; T2 sets B=7 on Node 2 and then tries to set A=0 on Node 1. Each txn holds a lock the other needs, so T1 waits for T2 and T2 waits for T1: a cycle that spans both nodes and is invisible to either node alone.]
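One way to see the problem: each node only observes its local lock waits, so the deadlock only becomes visible once the nodes' waits-for edges are combined. A minimal Python sketch (illustrative, not the course's implementation) of building the global waits-for graph and checking it for a cycle:

# Global waits-for graph sketch: union the edges reported by each node,
# then look for a cycle, which indicates a distributed deadlock.
def has_cycle(edges):
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    def reaches(start, target, seen):
        for nxt in graph.get(start, ()):               # follow waits-for edges
            if nxt == target:
                return True
            if nxt not in seen and reaches(nxt, target, seen | {nxt}):
                return True
        return False

    return any(reaches(txn, txn, {txn}) for txn in graph)

node1_edges = [("T2", "T1")]    # on Node 1, T2 waits for T1's lock on A
node2_edges = [("T1", "T2")]    # on Node 2, T1 waits for T2's lock on B
print(has_cycle(node1_edges + node2_edges))            # True -> abort a victim txn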
CONCLUSION
I have barely scratched the surface on distributed
database systems…
It is hard to get right.
More info (and humiliation):
→ Kyle Kingsbury's Jepsen Project
NEXT CLASS
Distributed OLTP Systems
Replication
CAP Theorem
Real-World Examples