22 Introduction to Distributed Databases
Intro to Database Systems
15-445/15-645 Fall 2019
Andy Pavlo
Carnegie Mellon University
ADMINISTRIVIA
Homework #5: Monday Dec 2nd @ 11:59pm
Project #4: Monday Dec 9th @ 11:59pm
Extra Credit: Wednesday Dec 11th @ 11:59pm
Final Exam: Monday Dec 9th @ 5:30pm
ADMINISTRIVIA
Monday Dec 2nd – Oracle Lecture
→ Shasank Chavan (VP In-Memory Databases)
Wednesday Dec 4th – Potpourri + Review
→ Vote for what system you want me to talk about.
→ https://cmudb.io/f19-systems
Sunday Nov 24th – Extra Credit Check
→ Submit your extra credit assignment early to get feedback
from me.
UPCOMING DATABASE EVENTS
Oracle Research Talk
→ Tuesday December 3rd @ 12:00pm
→ CIC 4th Floor
PARALLEL VS. DISTRIBUTED
Parallel DBMSs:
→ Nodes are physically close to each other.
→ Nodes connected with high-speed LAN.
→ Communication cost is assumed to be small.
Distributed DBMSs:
→ Nodes can be far from each other.
→ Nodes connected using public network.
→ Communication cost and problems cannot be ignored.
DISTRIBUTED DBMSs
Use the building blocks that we covered in single-node DBMSs to now support transaction processing and query execution in distributed environments.
→ Optimization & Planning
→ Concurrency Control
→ Logging & Recovery
TODAY'S AGENDA
System Architectures
Design Issues
Partitioning Schemes
Distributed Concurrency Control
SYSTEM ARCHITECTURE
A DBMS's system architecture specifies what
shared resources are directly accessible to CPUs.
This affects how CPUs coordinate with each other
and where they retrieve/store objects in the
database.
SYSTEM ARCHITECTURE
[Figure: The four approaches, from most to least sharing: Shared Everything, Shared Memory, Shared Disk, Shared Nothing.]
SHARED MEMORY
CPUs have access to a common memory address space via a fast interconnect.
→ Each processor has a global view of all the in-memory data structures.
→ Each DBMS instance on a processor has to "know" about the other instances.
SHARED DISK
All CPUs can access a single logical disk directly via an interconnect, but each has its own private memory.
→ Can scale the execution layer independently from the storage layer.
→ Must send messages between CPUs to learn about their current state.
SHARED DISK EXAMPLE
[Figure: An application server sends Get Id=101 and Get Id=200 to different compute nodes. Each node pulls the page it needs (Page ABC, Page XYZ) from the shared storage; when a node updates Id=101 it writes Page ABC back to storage so the other nodes see the new version.]
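A minimal Python sketch of that idea (illustrative only; the class and method names such as SharedStorage and ComputeNode are made up, not from the lecture): compute nodes share one storage service but keep private buffers, so a node that updates a page must tell the others that their cached copy is stale.

# Shared-disk sketch: one logical storage service, private per-node caches.
class SharedStorage:
    def __init__(self):
        self.pages = {}                      # page_id -> page contents

    def read(self, page_id):
        return self.pages.get(page_id)

    def write(self, page_id, contents):
        self.pages[page_id] = contents

class ComputeNode:
    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.peers = []                      # other compute nodes
        self.cache = {}                      # private buffer pool

    def get(self, page_id):
        if page_id not in self.cache:        # cache miss -> fetch from shared disk
            self.cache[page_id] = self.storage.read(page_id)
        return self.cache[page_id]

    def update(self, page_id, contents):
        self.storage.write(page_id, contents)
        self.cache[page_id] = contents
        for peer in self.peers:              # others must learn about the new state
            peer.invalidate(page_id)

    def invalidate(self, page_id):
        self.cache.pop(page_id, None)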
SHARED NOTHING
Each DBMS instance has its own CPU, memory, and disk.
Nodes only communicate with each other via the network.
→ Hard to increase capacity.
→ Hard to ensure consistency.
→ Better performance & efficiency.
SHARED NOTHING EXAMPLE
[Figure: The application server sends Get Id=10 and Get Id=200 directly to the nodes that own those keys. Each node stores one partition on its own local storage (P1→ID:1-100, P3→ID:101-200, P2→ID:201-300); adding capacity means repartitioning the data (e.g., P1→ID:1-150, P2→ID:151-300).]
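As a rough illustration of how a request like Get Id=10 finds its node, here is a small Python sketch (the node names and id ranges are just for illustration) of range-based routing in a shared-nothing cluster: a partition map sends each id to the single node that stores it locally.

# Shared-nothing routing sketch: each node owns a key range on its local storage.
import bisect

class Node:
    def __init__(self, name):
        self.name = name
        self.local_data = {}                 # id -> tuple, stored on this node only

nodes = {"P1": Node("P1"), "P2": Node("P2"), "P3": Node("P3")}
# Partition map: (upper bound of id range, owning node), e.g. P1 -> ID:1-100.
partition_map = [(100, "P1"), (200, "P3"), (300, "P2")]

def route(record_id):
    bounds = [upper for upper, _ in partition_map]
    owner = partition_map[bisect.bisect_left(bounds, record_id)][1]
    return nodes[owner]

route(10).local_data[10] = ("example tuple",)
print(route(10).local_data.get(10))          # single-node lookup: only P1 is touched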
EARLY DISTRIBUTED DATABASE SYSTEMS
MUFFIN – UC Berkeley (1979)
SDD-1 – CCA (1979)
System R* – IBM Research (1984)
Gamma – Univ. of Wisconsin (1986)
NonStop SQL – Tandem (1987)
[Photos: Stonebraker, Bernstein, Mohan, DeWitt, Gray]
DESIGN ISSUES
How does the application find data?
How to execute queries on distributed data?
→ Push query to data.
→ Pull data to query.
How does the DBMS ensure correctness?
HOMOGENEOUS VS. HETEROGENEOUS
Approach #1: Homogeneous Nodes
→ Every node in the cluster can perform the same set of
tasks (albeit on potentially different partitions of data).
→ Makes provisioning and failover "easier".
Approach #2: Heterogeneous Nodes
→ Nodes are assigned specific tasks.
→ Can allow a single physical node to host multiple "virtual"
node types for dedicated tasks.
MONGODB HETEROGENEOUS ARCHITECTURE
[Figure: The application server sends Get Id=101 to a router (mongos). The router consults the partition map stored on the config server (mongod): P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400, and forwards the request to the shard (mongod) that owns that range.]
DATA TRANSPARENCY
Users should not be required to know where data is physically located, or how tables are partitioned or replicated.
A SQL query that works on a single-node DBMS
should work the same on a distributed DBMS.
DATABASE PARTITIONING
Split the database across multiple resources:
→ Disks, nodes, processors.
→ Sometimes called "sharding".
The DBMS executes query fragments on each
partition and then combines the results to produce
a single answer.
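For example, here is a minimal sketch of that execute-then-combine pattern in Python (toy data, illustrative only): the same fragment, here a SUM, runs on every partition and the partial results are merged into one answer.

# Scatter/gather sketch: run a query fragment per partition, combine the results.
partitions = [
    [("a", 10), ("b", 25)],                  # partition 1: (key, amount) tuples
    [("c", 5)],                              # partition 2
    [("d", 40), ("e", 20)],                  # partition 3
]

def fragment(partition):
    # Fragment for something like SELECT SUM(amount) FROM t, executed locally.
    return sum(amount for _, amount in partition)

partial_sums = [fragment(p) for p in partitions]   # executed on each partition
print(sum(partial_sums))                           # combined answer: 100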
NAÏVE TABLE PARTITIONING
Each node stores one and only one table.
Assumes that each node has enough storage space
for a table.
NAÏVE TABLE PARTITIONING
[Figure: Table1 and Table2 are each assigned, in their entirety, to different partitions.]
Ideal Query:
SELECT * FROM table
HORIZONTAL PARTITIONING
Split a table's tuples into disjoint subsets.
→ Choose column(s) that divide the database equally in terms of size, load, or usage.
→ Hash Partitioning, Range Partitioning
The DBMS can partition a database physically (shared nothing) or logically (shared disk).
HORIZONTAL PARTITIONING
Partitioning Key
[Figure: Each tuple of Table1 is assigned to one of four partitions by hashing its partitioning key, e.g. hash(a)%4 = P2, hash(b)%4 = P4, hash(c)%4 = P3, hash(d)%4 = P2, hash(e)%4 = P1.]
Ideal Query:
SELECT * FROM table
 WHERE partitionKey = ?
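A small Python sketch of hash partitioning (illustrative; the hash function and partition count are assumptions, so the assignments will not match the figure exactly): the partition is derived from the partitioning key, so a query that supplies the key is sent to exactly one partition.

# Hash partitioning sketch: partition = hash(partitioning key) % #partitions.
import hashlib

NUM_PARTITIONS = 4

def partition_for(key):
    # Use a stable hash so every node computes the same assignment for a key.
    digest = hashlib.sha1(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS           # 0..3 -> P1..P4

for row_id, key in [(101, "a"), (102, "b"), (103, "c"), (104, "d"), (105, "e")]:
    print(row_id, key, "-> P%d" % (partition_for(key) + 1))

# For SELECT * FROM table WHERE partitionKey = ?, the DBMS hashes the bound
# value and contacts only the one partition that can contain matching tuples.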
CONSISTENT HASHING
[Figure: Nodes A, B, C, D, E are placed at positions on a hash ring covering [0, 1). A key is hashed onto the ring (hash(key1), hash(key2)) and assigned to the next node encountered moving clockwise. With Replication Factor = 3, if hash(key)=D then the key is also replicated on the next two nodes after D.]
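A minimal consistent-hashing sketch in Python (illustrative; real systems also use virtual nodes): node names and keys are hashed onto the same ring, a key is owned by the first node clockwise from its hash, and with a replication factor of 3 the next two nodes also keep a copy.

# Consistent hashing sketch: nodes and keys are placed on one hash ring.
import bisect, hashlib

def ring_pos(value):
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** 32)

class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((ring_pos(n), n) for n in nodes)   # (position, node)

    def owners(self, key):
        positions = [pos for pos, _ in self.ring]
        start = bisect.bisect_right(positions, ring_pos(key)) % len(self.ring)
        # First node clockwise stores the key; the next rf-1 nodes replicate it.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

ring = Ring(["A", "B", "C", "D", "E"])
print(ring.owners("key1"))        # e.g. ['C', 'E', 'B']: primary plus two replicas

Because only the ring positions adjacent to a node are affected when it joins or leaves, adding a node only moves the keys that now hash between it and its predecessor.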
LOGICAL PARTITIONING
[Figure: Shared-disk logical partitioning. The application server sends Get Id=1 to the node responsible for Id=1 and Id=2, and Get Id=3 to the node responsible for Id=3 and Id=4, but all tuples (Id=1..4) live in the shared storage that both nodes read from.]
PHYSICAL PARTITIONING
[Figure: Shared-nothing physical partitioning. The application server sends Get Id=1 to the node that stores Id=1 and Id=2 on its local storage, and Get Id=3 to the node that stores Id=3 and Id=4.]
SINGLE-NODE VS. DISTRIBUTED
A single-node txn only accesses data that is
contained on one partition.
→ The DBMS does not need to coordinate the behavior of concurrent txns running on other nodes.
A distributed txn accesses data at one or more
partitions.
→ Requires expensive coordination.
TRANSACTION COORDINATION
If our DBMS supports multi-operation and
distributed txns, we need a way to coordinate their
execution in the system.
Two different approaches:
→ Centralized: Global "traffic cop".
→ Decentralized: Nodes organize themselves.
TP MONITORS
Example of a centralized coordinator.
Originally developed in the 1970-80s to provide
txns between terminals and mainframe databases.
→ Examples: ATMs, Airline Reservations.
Many DBMSs now support the same functionality
internally.
CENTRALIZED COORDINATOR
[Figure: The application server sends a lock request for the partitions it wants to access (P1-P4) to the coordinator, which responds with an acknowledgement. When the application later sends the commit request, the coordinator asks the partitions whether it is safe to commit.]
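To make the "safe to commit?" exchange concrete, here is a simplified Python sketch (not the protocol of any particular system; the real commit protocols come in the next lectures, and the class and method names here are made up): the coordinator commits a txn only if every partition it touched votes yes.

# Simplified centralized coordination: all touched partitions must agree to commit.
class Partition:
    def __init__(self, name):
        self.name = name

    def safe_to_commit(self, txn_id):
        # A real partition would check locks, conflicts, and durability here.
        return True

class Coordinator:
    def __init__(self, partitions):
        self.partitions = partitions          # name -> Partition

    def commit(self, txn_id, touched):
        votes = [self.partitions[p].safe_to_commit(txn_id) for p in touched]
        return "COMMIT" if all(votes) else "ABORT"   # one "no" aborts the txn

coordinator = Coordinator({p: Partition(p) for p in ["P1", "P2", "P3", "P4"]})
print(coordinator.commit(txn_id=1, touched=["P1", "P3"]))    # COMMIT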
CENTRALIZED COORDINATOR
[Figure: The application server sends its query requests to a middleware layer that routes them to the partitions using the mapping P1→ID:1-100, P2→ID:101-200, P3→ID:201-300, P4→ID:301-400. On the commit request, the middleware asks the partitions whether it is safe to commit.]
DECENTRALIZED COORDINATOR
[Figure: The application server sends the begin request to one partition, which acts as the coordinator for that txn. Query requests go to the partitions that hold the data. On the commit request, that partition asks the other partitions involved in the txn whether it is safe to commit.]
DISTRIBUTED CONCURRENCY CONTROL
Need to allow multiple txns to execute
simultaneously across multiple nodes.
→ Many of the same protocols from single-node DBMSs
can be adapted.
This is harder because of:
→ Replication.
→ Network Communication Overhead.
→ Node Failures.
→ Clock Skew.
DISTRIBUTED 2PL
[Figure: Waits-For Graph. T1 sets A=2 on Node 1 and then tries to set B=9 on Node 2; T2 sets B=7 on Node 2 and then tries to set A=0 on Node 1. Each txn holds a lock the other needs, so T1 waits for T2 and T2 waits for T1: a cycle that spans both nodes and is invisible to either node alone.]
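One way to see the problem: each node only observes its local lock waits, so the deadlock only becomes visible once the nodes' waits-for edges are combined. A minimal Python sketch (illustrative, not the course's implementation) of building the global waits-for graph and checking it for a cycle:

# Global waits-for graph sketch: union the edges reported by each node,
# then look for a cycle, which indicates a distributed deadlock.
def has_cycle(edges):
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    def reaches(start, target, seen):
        for nxt in graph.get(start, ()):               # follow waits-for edges
            if nxt == target:
                return True
            if nxt not in seen and reaches(nxt, target, seen | {nxt}):
                return True
        return False

    return any(reaches(txn, txn, {txn}) for txn in graph)

node1_edges = [("T2", "T1")]    # on Node 1, T2 waits for T1's lock on A
node2_edges = [("T1", "T2")]    # on Node 2, T1 waits for T2's lock on B
print(has_cycle(node1_edges + node2_edges))            # True -> abort a victim txn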
CONCLUSION
I have barely scratched the surface on distributed
database systems…
It is hard to get right.
More info (and humiliation):
→ Kyle Kingsbury's Jepsen Project
NEXT CLASS
Distributed OLTP Systems
Replication
CAP Theorem
Real-World Examples