Chapter Five
Database Recovery Techniques
Chapter 5 Outline
1. Types of Failure
2. Purpose of Database Recovery
3. Recovery concepts
4. Recovery techniques
Samara University Department of Computer Science 2
Types of Failure
• Transaction failure: Transactions may
fail because of errors (integer overflow
and division by zero), incorrect input,
deadlock, incorrect synchronization, etc.
• System failure: System may fail
because of application error, operating
system fault, RAM failure, etc.
• Media failure: Disk head crash, power
disruption, etc.
Samara University Department of Computer Science 3
Purpose of Database Recovery
• To bring the database into the most recent
consistent state just before the time of
failure occurs.
• To ensure the transaction properties of
Atomicity and Durability (a committed
transaction cannot be canceled and all its
updates must be applied permanently to
the database).
• To ensure robustness, After a failure the
DBMS recovery manager is responsible for
bringing the system into a consistent state
before transactions can resume.
Samara University Department of Computer Science 4
Recovery Concepts outline
1. Catastrophic & non-catastrophic failure
2. The Log File
3. Caching (Buffering) of Disk Blocks
4. Write-Ahead Logging, Steal/No-Steal, and
Force/No-Force
5. Checkpoints in the System Log and Fuzzy
Check pointing
6. Transaction Rollback and Cascading
Rollback
7. Transaction Actions That Do Not Affect the
Database
Samara University Department of Computer Science 5
1. Catastrophic & non-catastrophic
failure (1)
• A typical strategy for recovery may be
summarized informally as follows:
1. If there is extensive damage to a wide portion
of the database due to catastrophic failure,
such as a disk crash, the recovery method
restores a past copy of the database that was
backed up to archival storage (typically tape or
other large capacity offline storage media) and
reconstructs a more current state by reapplying
or redoing the operations of committed
transactions from the backed up log, up to the
time of failure. Samara University Department of Computer Science 6
Catastrophic & non-catastrophic
failure (2)
2. When the database on disk is not physically
damaged, and a non-catastrophic failure has
occurred, the recovery strategy is to identify any
changes that may cause an inconsistency in the
database. For example, a transaction that has
updated some database items on disk but has
not been committed needs to have its changes
reversed by undoing its write operations. It may
also be necessary to redo some operations in
order to restore a consistent state of the
database
Samara University Department of Computer Science 7
2. The Log File (1)
The system must keep information about
the changes that were applied to data
items by the various transactions. This
information is typically kept in the system
log.
• Holds the information that is necessary
for the recovery process
• Records all relevant operations in the
order in which they occur
• Is an append-only file.
• Holds various types of log records (or log
entries).
Samara University Department of Computer Science 8
Log File Entries(2)
Types of records (entries) in log file:
• [start_transaction,T]: Records that
transaction T has started execution.
• [write_item,T,X,old_value,new_value]: T has
changed the value of item X from old_value
to new_value.
• [read_item,T,X]: T has read the value of
item X (not needed in many cases).
• [end_transaction,T]: T has ended execution
• [commit,T]: T has completed successfully,
and committed.
• [abort,T]: T has been aborted.
Samara University Department of Computer Science 9
The Log File (3)
For write_item log entry, old value of item before
modification (BFIM - BeFore Image) and the new value
after modification (AFIM – AFter Image) are stored.
BFIM needed for UNDO, AFIM needed for REDO. A
sample log is given below. Back P and Next P point to
the previous and next log records of the same
transaction.
T ID Back P Next P Operation Data item BFIM AFIM
T1 0 1 Begin
T1 1 4 Write X X = 100 X = 200
T2 0 8 Begin
T1 2 5 W Y Y = 50 Y = 100
T1 4 7 R M M = 200 M = 200
T3 0 9 R N N = 400 N = 400
T1 5 nil End
3. Caching (Buffering) of Disk Blocks (1)
• Cache the disk pages (blocks)–containing DB items
to be Updated into main memory buffers.
• Then update the main memory buffers before being
written back to disk.
• For efficient recovery purpose, the caching of disk
pages is handled by the DBMS instead of the OS.
• Typically, a collection of in-memory buffers, called
DBMS cache kept under the control of the DBMS.
• A directory for the cache is used to keep track of
which DB items are in the buffers.
A table of <disk page address, buffer location> entries.
• The DBMS cache holds the database disk blocks
including; Data blocks , Index blocks , Log blocks
Caching (Buffering) of Disk Blocks (2)
• When DBMS requests action on some item
First checks the cache directory to
determine if the corresponding disk page is
in the cache.
If no, the item must be located on disk and
the appropriate disk pages are copied into
the cache.
It may be necessary to replace (flush) some
of cache buffers to make space available for
the new item. if dirty bit value=1(modified)
FIFO or LRU can be used as replacement
strategies.
Caching (Buffering) of Disk Blocks (3)
• Dirty bit
• Associate with each buffer in the cache a dirty
bit. The dirty bit can be included in the directory
entry.
• It indicates whether or not the buffer has been
modified. Set dirty bit=0 when the page is first
read from disk to the buffer cache. Set dirty bit=1
as soon as the corresponding buffer is modified.
• When the buffer content is replaced –flushed-
from the cache, write it back to the
corresponding disk page only if dirty bit=1
Caching (Buffering) of Disk Blocks (4)
• Pin-unpin bit.
– A page is pinned –i.e. pin-unpin bit value=1-, if it
cannot be written back to disk as yet.
• Strategies that can be used when flushing occurs.
– In-place updating: Writes the buffer back to the
same original disk location overwriting the old value on
disk
– Shadowing: Writes the updated buffer at a different
disk location. Multiple versions of data items can be
maintained. The old value called BFIM –before image
The new value AFIM –after image are maintained
• The new value and old value are kept on disk, so no
need of log for recovery.
4. Write-ahead Logging Protocol (1)
The information needed for recovery must be
written to the log file on disk before changes
are made to the database on disk. Write-
Ahead Logging (WAL) protocol consists of two
rules:
For Undo: Before a data item’s AFIM is flushed
to the database on disk (overwriting the BFIM)
its BFIM must be written to the log and the log
must be saved to disk.
For Redo: Before a transaction executes its
commit operation, all its AFIMs must be written
to the log and the log must be saved on a
stable store.
Write-ahead Logging Protocol (2)
• Without ensuring that this BFIM is recorded in the
appropriate log entry and the log is flushed to disk
before the BFIM is overwritten with the AFIM in the
DB on disk, the recovery will not be possible.
• With Write-Ahead Logging, the log blocks
containing the associated log records for a
particular data block update must first be written to
disk before the data block itself can be written
back to disk.
Write-ahead Logging Protocol (3)
• Commercial DBMSs and WAL
– IBM DB2, Informix, Microsoft SQL Server, Oracle 8,
and Sybase ASE all use a WAL scheme for recovery.
• To facilitate the recovery process, the DBMS
recovery subsystem may need to maintain a number
of lists.
List of active transactions: transactions started but
not committed yet
List of committed transactions since last
checkpoint.
List of aborted transactions since last checkpoint.
Steal/No-Steal and Force/No-Force(1)
Specify how to flush database cache buffers to
database on disk:
Steal: Cache buffers updated by a transaction
may be flushed to disk before the transaction
commits (recovery may require UNDO).
Advantage: avoid the need for a very large
buffer space
No-Steal: Cache buffers cannot be flushed
until after transaction commit (NO-UNDO).
(Buffers are pinned till transactions commit).
• Deferred update follows this approach
Steal/No-Steal and Force/No-Force(2)
• Force: Cache is flushed (forced) to disk
before transaction commits (NO-REDO).
• No-Force: Some cache flushing may be
deferred till after transaction commits
(recovery may require REDO).
• These give rise to four different ways for
handling recovery: Steal/No-Force
(Undo/Redo), Steal/Force (Undo/No-redo),
No-Steal/No-Force (Redo/No-undo), No-
Steal/Force (No-undo/No-redo).
• Typical database systems follow a steal/no-
force strategy
Checkpointing(1)
• In case of failure, the recovery manager requires that
the entire log be examined to process recovery→ time
consuming
• A quick way to limit the amount of log to scan on
recovery can be achieved using checkpoints.
• A [checkpoint] record is written into the log periodically
at that point when the system writes out to the DB on
disk all DBMS buffers that have been modified.
• Hence, all transactions with [commit, T] entry in the
log before [checkpoint] entry do not need to have their
WRITE operations redone in case of crash.
• Since all their update have been recorded in the DB
on disk during checkpointing.
Checkpointing (2)
• The recovery manager must decides at what
intervals to take a checkpoint.
• The intervals measured in time, say every m
minutes. Or the intervals measured in the
number of committed transactions since the last
checkpoint, say t transactions.
– m & t are system parameter
• Take a checkpoint consists of the following
1. Suspend execution of transactions temporarily.
2. Force-write all main memory buffers that have been
modified to disk.
3. Write a [checkpoint] record to the log, and force-write
the log to disk.
4. Resume executing transactions.
Checkpointing (3)
• The time needed to force-write all modified memory
buffers may delay transaction processing step 1.
• Use fuzzy checkpointing to reduce this delay.
– The system can resume transaction processing
after [checkpoint] record written to the log without
having to wait for step 2 to finish.
– However, until step 2 is completed, the previous
[checkpoint] record should remain to be valid.
• The system maintains a pointer to the valid
checkpoint which points to the previous [checkpoint]
recording the log
– Once step2 is conducted, that pointer is changed to
point to the new checkpoint in the log.
Example of Checkpoints (4)
Tc Tf
T1
T2
T3
T4
checkpoint system failure
• T1 can be ignored (updates already output to disk due to checkpoint)
• T2 and T3 redone.
• T4 undone
Samara University Department of Computer Science 23
5.Transaction Rollback & Cascading Rollback(1)
• If a transaction fails, roll back this transaction.
Any data item values changed by this
transaction and written to the DB must be
restored to their previous values –BFIM
The UNDO-type log entries are used to
restore the old values of these data items.
Only the write-item operations need to be
undone during transaction rollback.
• If the recovery protocol ensure recoverable
schedules without to ensure strict or cascadless
schedules, cascading rollback can occur
Transaction Rollback & Cascading rollback(2)
If a transaction T is rolled back, roll back every
transaction T’ that read a value of a data item
written by T, and so on.
• Read-item operations are recorded in the log
only to determine whether cascading rollback of
additional transactions is necessary
• Cascading rollback is complex and time
consuming.
– Almost all recovery mechanisms are designed
such that cascading rollback is never required –
they guarantee strict or cascadless schedules
7. Transaction Actions That Do Not Affect DB
• Generating and printing messages or reports from
information retrieved from the database.
• If a transaction fails before completion, we may not want the
user to get these reports, since the transaction has failed to
complete.
• If such erroneous reports are produced, part of the recovery
process would have to inform the user that these reports are
wrong, since the user may take an action based on these
reports that affects the database.
• Hence, such reports should be generated only after the
transaction reaches its commit point.
• A common method of dealing with such actions is to issue
the commands that generate the reports but keep them as
batch jobs, which are executed only after the transaction
reaches its commit point. If the transaction fails, the batch
jobs are canceled.
Recovery techniques outline
1. Recovery algorithm
2. NO-UNDO/REDO Recovery Based on
Deferred Update
3. Recovery Techniques Based on Immediate
Update
4. Shadow Paging
5. The ARIES Recovery Algorithm
6. Recovery in Multi-database Systems
7. Database Backup and Recovery from
Catastrophic Failures
1. Recovery algorithm
• Recovery algorithms have two parts
1.Actions taken during normal transaction
processing to ensure enough information
exists to recover from failures
2.Actions taken after a failure to recover the
database contents to a state that ensures
atomicity, consistency and durability
2. Deferred Update (NO-UNDO/REDO)
Recovery Protocol (1)
Deferred Update: A modified data item in
the cache cannot be written back to disk till
after the transaction commits (buffer is
pinned).
If a transaction fails before reaching its
commit point, it will not have changed the
database in any way, so UNDO is not
needed. It may be necessary to REDO the
effect of the operations of a committed
transaction from the log, because their
effect may not yet have been recorded in
the database on disk.
Hence, deferred update is also known as
the NO-UNDO/REDO algorithm.
Deferred Update (NO-UNDO/REDO)
Recovery Protocol (2)
A typical deferred update protocol can be stated as
follows
1. A transaction cannot change the database on disk
until it reaches its commit point.
2. A transaction does not reach its commit point until
all its update operations are recorded in the log and
the log is force-written to disk –i.e. WAL-
Samara University Department of Computer Science 30
Deferred Update (NO-UNDO/REDO)
Recovery Protocol (3)
• Advantages
1.A transaction does not record any changes in the
DB on disk until it commits –never rollback because
of transaction failure during transaction execution
2.A transaction will never read the value of an item
that is written by an uncommitted transaction, hence
no cascading rollback will occur.
• Drawbacks
–Limits the concurrent execution of transactions
because all items remain locked until the transaction
reaches its commit point
–Require excessive buffer space to hold all updates
Samara University Department of Computer Science 31
3. Recovery Techniques Based on
Immediate Update
Immediate Update: A data item modified in cache can
be written back to disk before transaction commits.
• However, the update operation must still be recorded
in the log (on disk) before it is applied to the database
using WAL protocol- If a transaction fails after
recording some changes in the database on disk but
before reaching its commit point, the effect of its
operations on the database must be undone; that is,
the transaction must be rolled back.
In the general case of immediate update, both undo
and redo may be required during recovery. This
technique, known as the UNDO/REDO algorithm,
requires both operations during recovery, and is
used most often in practice.
UNDO and REDO Recovery Actions
To maintain atomicity and durability, some
transaction’s may have their operations redone or
undone during recovery. UNDO (roll-back) is
needed for transactions that are not committed yet.
REDO (roll-forward) is needed for committed
transactions whose writes may have not yet been
flushed from cache to disk.
Undo: Restore all BFIMs from log to database on
disk. UNDO proceeds backward in log (from most
recent to oldest UNDO).
Redo: Restore all AFIMs from log to database on
disk. REDO proceeds forward in log (from oldest to
most recent REDO).
4. Shadow Paging (NO-UNDO/NO-
REDO)
The AFIM does not overwrite its BFIM but is
recorded at another place (new version) on the
disk. Thus, a data item can have AFIM and
BFIM (Shadow copy of the data item) at two
different places on the disk.
X Y
X' Y'
Database
X and Y: Shadow (old) copies of data items
X` and Y`: Current (new) copies of data items
Samara University Department of Computer Science 34
5. ARIES Database Recovery (1)
Used in practice, it is based on:
1. WAL (Write Ahead Logging)
2. Repeating history during redo: ARIES will
retrace all actions of the database system prior
to the crash to reconstruct the correct database
state.
3. Logging changes during undo: It will prevent
ARIES from repeating the completed undo
operations if a failure occurs during recovery,
which causes a restart of the recovery process.
Samara University Department of Computer Science 36
ARIES Database Recovery (2)
The Log and Log Sequence Number (LSN):
A log record is written for (a) data update (Write),
(b) transaction commit, (c) transaction abort, (d)
undo, and (e) transaction end. In the case of undo
a compensating log record is written.
A unique LSN is associated with every log record.
LSN increases monotonically and indicates the disk
address of the log record it is associated with. In
addition, each data page stores the LSN of the
latest log record corresponding to a change for that
page.
Samara University Department of Computer Science 37
ARIES Database Recovery (3)
The Log and Log Sequence Number (LSN) (cont.)
A log record stores:
1. LSN of previous log record for same transaction:
It links the log records of each transaction.
2. Transaction ID.
3. Type of log record.
For a write operation, additional information is logged:
1. Page ID for the page that includes the item
2. Length of the updated item
3. Its offset from the beginning of the page
4. BFIM of the item
5. AFIM of the item
Samara University Department of Computer Science 38
ARIES Database Recovery (4)
The Transaction table and the Dirty Page table
For efficient recovery, the following tables are also
stored in the log during checkpointing:
Transaction table: Contains an entry for each active
transaction, with information such as transaction ID,
transaction status, and the LSN of the most recent log
record for the transaction.
Dirty Page table: Contains an entry for each dirty
page (buffer) in the cache, which includes the page ID
and the LSN corresponding to the earliest update to
that page.
Samara University Department of Computer Science
39
ARIES Database Recovery (5)
“Fuzzy” Checkpointing
A checkpointing process does the following:
1. Writes a begin_checkpoint record in the log, then
forces updated (dirty) buffers to disk.
2. Writes an end_checkpoint record in the log, along
with the contents of transaction table and dirty
page table.
3. Writes the LSN of the begin_checkpoint record to
a special file. This special file is accessed during
recovery to locate the last checkpoint information.
To allow the system to continue to execute
transactions, ARIES uses “fuzzy checkpointing”.
Samara University Department of Computer Science 40
ARIES Database Recovery (6)
The following steps are performed for recovery
1. Analysis phase: Start at the begin_checkpoint record and
proceed to the end_checkpoint record. Access transaction
table and dirty page table that were appended to the log.
During this phase some other records may be written to
the log and the transaction table may be modified. The
analysis phase compiles the set of redo and undo
operations to be performed and ends.
2. Redo phase: Starts from the point in the log where all
dirty pages have been flushed, and move forward.
Operations of committed transactions are redone.
3. Undo phase: Starts from the end of the log and proceeds
backward while performing appropriate undo. For each
undo it writes a compensating record in the log.
The recovery completes at the end of undo phase.
Samara University Department of Computer Science 41
6. Recovery in Multi-database Systems
• A multi-database transaction require access to
multiple databases.
– The DBs may even be stored on different types of
DBMS.
• Some DBMS may be relational, whereas others
are object oriented, etc.
– Each DBMS involved in the multi-database
transaction may have its own recovery technique
and transaction manager for separate DBMSs.
• Use a two-level recovery mechanism to maintain
the atomicity of a multi-database transaction.
– A global recovery manager, or coordinator.
– The local recovery managers.
Samara University Department of Computer Science 43
Two-phase commit (cont.)
Phase1: Coordinator (usually application
program running in middle-tier of 3-tier
architecture) sends “Ready-to-commit?” query
to each participating database, then waits for
replies. A participating database replies Ready-
to-commit only after saving all actions in its
local log on disk.
Phase2: If coordinator receives Ready-to-
commit signals from all participating databases,
it sends Commit to all; otherwise, it send Abort
to all. This protocol can survive most types of
crashes.
Samara University Department of Computer Science 44
7. Database Backup and Recovery from
Catastrophic Failures (1)
• All the techniques discussed till now apply to non-
catastrophic failures.
– The system log is maintained on the disk and it is not
lost as result of the failure.
– Similarly, the shadow directory is not lost when
shadow paging is used.
• The recovery manager must also have to handle
more catastrophic failures such as disk crashes.
– Main technique used is that of DB backup.
• The whole DB and the log are periodically copied
into a cheap media as tapes.
Samara University Department of Computer Science 45
Database Backup and Recovery from
Catastrophic Failures (2)
• It is customary to backup the system log at more
frequent intervals than full database backup.
– The log is substantially small and hence can be
backed up more frequently –than DB itself
– Thus users do not lose all transactions they have
performed since the last DB backup.
– A new log is started after each DB backup.
• To recover from disk crash
– Recreate the DB on disk from the latest backup
copy on tape.
– Reconstruct the effect of all committed transactions
from the backup copies of log
Samara University Department of Computer Science 46