Transactions
Stats (Out of 90+2)
○ Median: 76.5, Mean: 73.9, High: 92, Standard Deviation: 10.5
○ Regrade requests open for 1 week (next Tuesday)
Exam
Grades
Feedback Themes
- TAs were ultra-responsive during the “24 hours”
- Test was too long. We’ll recalibrate for Finals
- Most liked 24 hour flex time
UPDATE Product
SET Price = Price – 1.99
WHERE pname = ‘Gizmo’
SQL INSERT INTO SmallProduct(name, price)
Writes SELECT pname, price
FROM Product
WHERE price <= 0.99
DELETE Product
WHERE price <=0.99
UPDATE Product
SET Price = Price – 1.99
WHERE pname = ‘Gizmo’
SQL INSERT INTO SmallProduct(name, price)
Writes SELECT pname, price
FROM Product
Readfrom another table
WHERE price <= 0.99
DELETE Product
WHERE price <=0.99
How?
Mobile Game
Report & Share
Example Real-Time
DBMS
Business/Product
Analysis
Game App User Events
DB
DB v0
Q1: 1000 users/sec? Q7: How to model/evolve game data? Q4: Which user cohorts?
Q2: Offline? Q8: How to scale to millions of users? Q5: Next features to build?
(Recap lectures) Q3: Support v1, v1’ versions? Q9: When machines die, restore game Experiments to run?
state gracefully? Q6: Predict ads demand?
App designer Systems designer Product/Biz designer
How?
Mobile Game
Report & Share
Example Real-Time
DBMS
Business/Product
Analysis
Game App User Events
DB
DB v0
Q1: 1000 users/sec? Q7: How to model/evolve game data? Q4: Which user cohorts?
Q2: Offline? Q8: How to scale to millions of users? Q5: Next features to build?
(Recap lectures) Q3: Support v1, v1’ versions? Q9: When machines crash, restore Experiments to run?
game state gracefully? Q6: Predict ads demand?
App designer Systems designer Product/Biz designer
1. Why Transactions?
Today’s
Lecture 2. Transactions
3. Properties of Transactions: ACID
4. Logging
Example
Unpack
ATM DB:
Transaction
Read Balance Read Balance
Give money vs Update Balance
Update Balance Give money
Visa does > 60,000 TXNs/sec with users & merchants
Want your 4$ Starbucks transaction to wait for a stranger’s 10k$ bet in Las Vegas ?
⇒ Transactions can (1) be quick or take a long time, (2) unrelated to you
Transactions are at the core of
-- payment, stock market, banks, ticketing
-- Gmail, Google Docs (e.g., multiple people editing)
Money Money (@4:29 am day+1)
Example
Monthly
bank
interest
transaction
‘T-Monthly-423’
Monthly Interest 10%
4:28 am Starts run on 100M bank accounts
Takes 24 hours to run
UPDATE Money
SET Balance = Balance * 1.1
Money Money (@4:29 am day+1)
Example
Monthly
bank
interest
transaction
Cost to update all data
100M bank accounts → 100M seeks? (worst case)
Performance (@10 msec/seek, that’s 1 million secs)
Problem1: SLOW :(
Money Money (@10:45 am)
Example
??
Monthly ?? Did T-Monthly-423 complete?
?? Which tuples are bad?
bank
interest ??
transaction
Case1: T-Monthly-423 crashed
‘T-Monthly-423’ Case2: T-Monthly-423 completed
With crash Monthly Interest 10% 4002 deposited 20$ at 10:45 am
4:28 am Starts run on 100M bank accounts
Takes 24 hours to run
Network outage at 10:29 am,
System access at 10:45 am
Problem 2: Wrong :(
15
Primary data structures/algorithms
LOGS LOCKS
Big Scale
Roadmap
?????
1. Why Transactions?
Today’s
Lecture 2. Properties of Transactions: ACID
3. Logging
Transactions: Basic Definition
A transaction (“TXN”) is a sequence of In the real world, a TXN
either happened
one or more operations (reads or completely or not at all
writes) which reflects a single real-world (e.g., you withdrew 100$
transition. from bank. Or not.)
START TRANSACTION
UPDATE Product
SET Price = Price – 1.99
WHERE pname = ‘Gizmo’
COMMIT
Transactions in SQL
• In “ad-hoc” SQL, each statement = one transaction
• In a program, multiple statements can be grouped together as a transaction
START TRANSACTION
UPDATE Bank SET amount = amount – 100
WHERE name = ‘Bob’
UPDATE Bank SET amount = amount + 100
WHERE name = ‘Joe’
COMMIT
Motivation for Transactions
Group user actions (reads & writes) into Transactions helps with two goals:
1. Recovery & Durability: Keep the data consistent and durable.
Despite system crashes, user canceling TXN part way, etc.
This lecture!
Idea: Use LOGS. Support to “commit” or “rollback” TXNs
2. Concurrency: Get better performance by parallelizing TXNs
without creating ‘bad data.’ Despite slow disk writes and reads.
Next lecture
Idea: Use LOCKS. Run several user TXNs concurrently.
Example 1: Protection against crashes / aborts
Scenario: Make a CheapProducts table, from a Products table
Client 1:
INSERT INTO CheapProduct(name, price)
SELECT pname, price
FROM Product Crash / abort!
WHERE price <= 0.99
DELETE Product
WHERE price <=0.99
What goes wrong?
Client 1:
START TRANSACTION
INSERT INTO CheapProduct(name, price)
SELECT pname, price
FROM Product
WHERE price <= 0.99
DELETE Product
WHERE price <=0.99
COMMIT
Now we’d be fine! We’ll see how / why this lecture
Example 2: Multiple users: single statements
Client 1: [at 10:01 am] Client 2: [at 10:01 am]
UPDATE Product UPDATE Product
SET Price = Price – 1.99 SET Price = Price*0.5
WHERE pname = ‘Gizmo’ WHERE pname=‘Gizmo’
Two managers attempt to discount products at same time -
What could go wrong?
Client 1: START TRANSACTION Client 2: START TRANSACTION
UPDATE Product UPDATE Product
SET Price = Price – 1.99 SET Price = Price*0.5
WHERE pname = ‘Gizmo’ WHERE pname=‘Gizmo’
COMMIT COMMIT
Now works like a charm- we’ll see how / why next lecture…
3. Properties of Transactions
1. Atomicity
2. Consistency
What you will
learn about in 3. Isolation
this section
4. Durability
ACID: Atomicity
• TXN is all or nothing
• Commits: all the changes are made
• Aborts: no changes are made
ACID: Consistency
• The tables must always satisfy user-specified integrity constraints
• E.g., Account number is unique, Sum of debits and of credits is 0
• How consistency is achieved:
• Programmer writes a TXN to go from one consistent state to a
consistent state
• System makes sure that the TXN is atomic (e.g., if EXCEPTION, rolls
back)
ACID: Isolation
• A TXN executes concurrently with other TXNs
• Effect of TXNs is the same as TXNs running one after another
Conceptually,
• similar to OS “sandboxes”
• E.g. TXNs can’t observe each other’s “partial updates”
ACID: Durability
• The effect of a TXN must persist after the TXN
• And after the whole program has terminated
• And even if there are power failures, crashes, etc.
• ⇒ Write data to durable IO (e.g., disk)
ACID Summary
• Atomic
• State shows either all the effects of TXN, or none of them
• Consistent
• TXN moves from a state where integrity holds, to another where integrity
holds
• Isolated
• Effect of TXNs is the same as TXNs running one after another
• Durable
• Once a TXN has committed, its effects remain in the database
A Note: ACID is one popular option!
• Many debates over ACID, both historically and currently
• Some “NoSQL” DBMSs relax ACID
• In turn, now “NewSQL” reintroduces ACID compliance to
NoSQL-style DBMSs…
⇒ Usually, depends on what consistency and performance your
application needs
ACID is an extremely important & successful paradigm,
but still debated!
4. Atomicity & Durability via Logging
Conceptual
Idea:
Trip to Europe
1. Make TODO list. Buy 2. Actual Visit
tickets
(Much longer than buying tickets)
Recall (on disks)
▹ Sequential reads FASTER than random reads
▹ Sequential writes (aka “appends”) FASTER than random writes
Big Idea Big Idea: LOGs (or log files or ledger)
▹ Any value that changes? Append to LOG!
LOGS! ■ LOG is a compact “todo” list of data updates
▹ Intuition:
(aka TODO/ ■ Data pages: (a) Update in RAM (fast) (b) Update on disk
ledger) later (slow)
■ LOGs: ( c) Append “todo” in LOGs and (d) control when
you Flush LOGs to disk
Many kinds of LOGs. We’ll study a few key ones!
1. How to make/use LOGs?
What you will
learn about in
this section
2. How to make it fast? (Mess with memory
and disk)
Basic Idea: (Physical) Logging
Idea:
• Log consists of an ordered list of Update Records
• Log record contains UNDO information for every update!
<TransactionID, &reference, old value, new value>
(e.g., key)
What DB does?
• Owns the log “service” for all applications/transactions.
• Appends to log. Flush when necessary — force writes to disk
This is sufficient to UNDO any transaction!
Money Money (@4:29 am day+1) WA Log (@4:29 am day+1)
Example
Update
Records
Monthly
bank Commit
interest Record
transaction
‘T-Monthly-423’
Full run
Monthly Interest 10%
4:28 am Starts run on 100M bank accounts
Takes 24 hours to run
START TRANSACTION
UPDATE Money
SET Amt = Amt * 1.10
COMMIT
Money Money (@10:45 am) WA Log (@10:29 am)
Example
??
Monthly ??
??
bank
interest ??
transaction
TXN ‘T-Monthly-423’
Did T-Monthly-423 complete?
With crash Monthly Interest 10%
4:28 am Starts run on 100M bank accounts
Which tuples are bad?
Takes 24 hours to run Case1: T-Monthly-423 was crashed
Network outage at 10:29 am, Case2: T-Monthly-423 completed. 4002
System access at 10:45 am deposited 20$ at 10:45 am
Can you infer from RED log records?
Money (@10:45 am) Money (after recovery) WA Log (@10:29 am)
Example
Monthly
bank
interest
transaction
System recovery (after 10:45 am)
1. Rollback uncommitted transactions
- Restore old values from WAL Log (if any)
Recovery 2.
- Notify developers about aborted TXN
Redo Recent transactions (w/ new values)
3. Back in business; Redo (any pending) transactions
1. How to make/use LOGs?
What you will
learn about in
this section
2. ⇒ How to make it fast? (Mess with
memory and disk)
A picture of logging
Log is a file (like any
A=7 Log data table)
1. Pages updated in
B=5
Main Memory RAM
2. Flushed as DB
blocks on disk
(sequential I/O)
A=7 “Flushing to disk”
= writing to disk
Data on Disk Log on Disk from main
memory
A picture of logging
T: R(A=7), W(A=13) [Update Record]
[T reads A=7, writes A=13] <Tid, &A, 7,13>
Operation
T A=13 Log recorded in update
log in main
B=5
Main Memory memory!
A=7
Data on Disk Log on Disk
Why do we need logging for
atomicity?
• Could we just write TXN updates to disk only once whole TXN
complete?
• Then, if abort / crash and TXN not complete, it has no effect- atomicity!
• With unlimited memory and time, this could work…
• ⇒ We need to log partial results of TXNs because of:
• Memory constraints (e.g. , billions of updates)
• Time constraints (what if one TXN takes very long?)
We need to write partial results to disk!
…And so we need a LOG to (maybe) undo these partial results!
What is the correct way to LOG to disk?
• We’ll look at the Write-Ahead Logging (WAL) protocol
• We’ll see why it works by looking at other protocols which are incorrect!
Remember: Key idea is to ensure durability
while maintaining our ability to “undo”!
Write-Ahead Logging (WAL)
TXN Commit Protocol
Write-ahead Logging (WAL)
Commit Protocol
[Update Record]
Commit after we’ve written
COMMIT
T: R(A), W(A) <Tid, &A, 7,13> record
log to disk but before
we’ve written data to
disk…
T A=13 Log-RAM
B=5
Main Memory
OK, Commit!
A=7
Data on Disk Log-Disk
Write-ahead Logging (WAL)
Commit Protocol
Commit after we’ve written
T: R(A), W(A) log to disk but before
we’ve written data to
disk… this is WAL!
T A=13
Main Memory
OK, Commit!
If we crash now, is T
<Tid, &A, 7,13> durable?
Yes
A=7
A=13
USE THE LOG!
Data on Disk Log-Disk
Write-Ahead Logging (WAL)
Algorithm: WAL
For each tuple update, write Update Record into LOG-RAM
Follow two Flush rules for LOG
● Rule1: Flush Update Record into LOG-Disk before → Durability
corresponding data page goes to storage
● Rule2: Before TXN commits,
- Flush all Update Records to LOG-Disk
- Flush COMMIT Record to LOG-Disk
→ Atomicity
Transaction is committed once COMMIT
record is on stable storage
Time > 0 Time > 0
Flush Update TXN
Flush COMMIT Record
Data Flush
Record to LOG-Disk to LOG-Disk COMMIT
Rule1: For each tuple update Rule2: Before TXN commits
after
shouldhappen
all records ateflushe
to logdisk
Incorrect Commit Protocol #1
Let’s try committing
before we’ve written
either data or LOG to
T: R(A), W(A) A: 7→13 disk…
T A=13 Log-RAM
OK, Commit!
B=5
Main Memory
If we crash now, is T
durable?
No
Lost T’s update!
A=7
Data on Disk Log-Disk
Incorrect Commit Protocol #2
Let’s try committing after
we’ve written data but
before we’ve written LOG
T: R(A), W(A) A: 7→13 to disk…
T A=13 Log-RAM
OK, Commit!
B=5
Main Memory
If we crash now, is T
durable? Yes! Except…
A=13 How do we know
whether T was
Data on Disk Log-Disk
committed??
loseinformation onleg
Money Money (@4:29 am day+1) WAL (@4:29 am day+1)
Example
Monthly
bank
interest
transaction
Cost to update all data Cost to Append to log
100M bank accounts → 100M seeks? (worst + 1 seek to get ‘end of log’
case) + write 100M log entries sequentially
Performance (@10 msec/seek, that’s 1 Million secs) (fast!!! < 10 sec)
[Lazily update data on disk later,
when convenient.]
Speedup for TXN Commit
1 Million secs vs 10 sec!!!
Logging Summary
• If DB says TX commits, TX effect remains after database
crash
• DB can undo actions and help us with atomicity
• This is only half the story…