Module 5
Module 5
Recovery
Dr. Jyotismita Chaki
Introduction to Transaction Processing
• A transaction is an executing program that forms a logical unit of database
processing.
• A transaction includes one or more database access operations—these can
include insertion, deletion, modification (update), or retrieval operations.
• Transaction processing systems are systems with large databases and
hundreds of concurrent users executing database transactions.
• Examples of such systems include airline reservations, banking, credit card
processing, online retail purchasing, stock markets, supermarket checkouts,
and many other applications.
• These systems require high availability and fast response time for hundreds
of concurrent users.
Transaction and System Concepts
• A transaction is an atomic unit of work that should either be completed in its
entirety or not done at all.
• For recovery purposes, the system needs to keep track of when each transaction
starts, terminates, and commits, or aborts.
• Therefore, the recovery manager of the DBMS needs to keep track of the
following operations:
• BEGIN_TRANSACTION. This marks the beginning of transaction execution.
• READ or WRITE. These specify read or write operations on the database items that are
executed as part of a transaction.
• END_TRANSACTION. This specifies that READ and WRITE transaction operations have ended
and marks the end of transaction execution.
• COMMIT_TRANSACTION. This signals a successful end of the transaction so that any changes
(updates) executed by the transaction can be safely committed to the database and will not
be undone.
• ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so that any
changes or effects that the transaction may have applied to the database must be undone.
Transaction and System Concepts
• A transaction goes into an active state immediately after it starts execution,
where it can execute its READ and WRITE operations.
• When the transaction ends, it moves to the partially committed state.
• At this point, some types of concurrency control protocols may do
additional checks to see if the transaction can be committed or not.
• If these checks are successful, the transaction is said to have reached its
commit point and enters the committed state.
• However, a transaction can go to the failed state if one of the checks fails or
if the transaction is aborted during its active state.
• The transaction may then have to be rolled back to undo the effect of its
WRITE operations on the database.
• The terminated state corresponds to the transaction leaving the system.
Transaction and System Concepts
W(A) W(A)
R(B) W(A)
W(B) R(A)
R(A) commit
R(B) commit
Cascadeless Schedule
Types of Schedules
• A schedule is strict if for any two T1 T2
transactions Ti, Tj, if a write operation R(A)
of Ti precedes a conflicting operation
of Tj (either read or write), then the R(A)
Strict Schedule
Characterizing schedules based on
recoverability
• Operations from different transactions can be interleaved in the
schedule S.
• However, for each transaction Ti that participates in the schedule S,
the operations of Ti in S must appear in the same order in which they
occur in Ti.
• The order of operations in S is considered to be a total ordering,
meaning that for any two operations in the schedule, one must occur
before the other.
• It is possible theoretically to deal with schedules whose operations
form partial orders, but we will assume for now total ordering of the
operations in a schedule.
Characterizing schedules based on
recoverability
• For the purpose of recovery and concurrency control, we are mainly interested in
the read_item (Reads a database item named X into a program variable also
named X) and write_item (Writes the value of program variable X into the
database item named X) operations of the transactions, as well as the commit
and abort operations.
• A shorthand notation for describing a schedule uses the symbols b, r, w, e, c, and
a for the operations begin_transaction, read_item, write_item, end_transaction,
commit, and abort, respectively, and appends as a subscript the transaction id
(transaction number) to each operation in the schedule.
• In this notation, the database item X that is read or written follows the r and w
operations in parentheses.
• In some schedules, we will only show the read and write operations, whereas in
other schedules we will show additional operations, such as commit or abort.
Characterizing schedules based on recoverability:
Interpretation of shorthand notations
• b<Transaction id> → E.g: b1 [begin Transaction T1].
• r<Transaction id>(<data item>) → E.g: r1(X) [Transaction T1 is performing
read_item on data item X].
• w<Transaction id>(<data item>) → E.g: w1(X) [Transaction T1 is performing
write_item on data item X].
• c<Transaction id> → E.g: c1 [Transaction T1 has committed].
• a<Transaction id> → E.g: a1 [Transaction T1 has aborted or rolled back].
Characterizing schedules based on
recoverability: read_item and write_item
• Executing a read_item(X) command includes the following steps:
1.Find the address of the disk block that contains item X.
2.Copy the disk block into a buffer in main memory if that disk is not already in some
main memory buffer.
3.Copy item X from the buffer to the program variable named X.
• Executing a write_item(X) command includes the following steps:
1.Find the address of the disk block that contains item X.
2.Copy the disk block into a buffer in main memory if that disk is not already in some
main memory buffer.
3.Copy item X from the program variable named X into its correct location in the
buffer.
4.Store the updated block from the buffer back to disk (either immediately or at some
later point in time).
Characterizing schedules based on
recoverability
• The schedule in the following Figure , which we shall call Sa, can be
written as follows in this notation:
• Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
Characterizing schedules based on
recoverability
• Similarly, the schedule for the
following Figure, which we call
Sb, can be written as follows, if
we assume that transaction T1
aborted after its read_item(Y)
operation:
• Sb: r1(X); w1(X); r2(X); w2(X); r1(Y);
a1;
Characterizing schedules based on recoverability:
Conflicting Operations in a Schedule
• Two operations in a schedule are said to conflict if they satisfy all three of
the following conditions:
1. they belong to different transactions;
2. they access the same item X; and
3. at least one of the operations is a write_item(X).
• For example, in schedule Sa, the operations r1(X) and w2(X) conflict, as do
the operations r2(X) and w1(X), and the operations w1(X) and w2(X).
• However, the operations r1(X) and r2(X) do not conflict, since they are both
read operations; the operations w2(X) and w1(Y) do not conflict because
they operate on distinct data items X and Y; and the operations r1(X) and
w1(X) do not conflict because they belong to the same transaction.
Characterizing schedules based on recoverability:
Conflicting Operations in a Schedule
• Two operations are conflicting if changing their order can result in a
different outcome.
• For example, if we change the order of the two operations r1(X); w2(X) to
w2(X); r1(X), then the value of X that is read by transaction T1 changes,
because in the second ordering the value of X is read by r1(X) after it is
changed by w2(X), whereas in the first ordering the value is read before it is
changed. This is called a read-write conflict.
• The other type is called a write-write conflict where we change the order
of two operations such as w1(X); w2(X) to w2(X); w1(X). For a write-write
conflict, the last value of X will differ because in one case it is written by T2
and in the other case by T1.
• Two read operations are not conflicting because changing their order
makes no difference in outcome.
Characterizing schedules based on
recoverability
• For some schedules it is easy to recover from transaction and system
failures, whereas for other schedules the recovery process can be
quite involved.
• In some cases, it is even not possible to recover correctly after a
failure.
• Hence, it is important to characterize the types of schedules for which
recovery is possible, as well as those for which recovery is relatively
simple.
• These characterizations do not actually provide the recovery
algorithm; they only attempt to theoretically characterize the
different types of schedules.
Characterizing schedules based on
recoverability
• Once a transaction T is committed, it should never be necessary to
roll back T. This ensures that the durability property of transactions is
not violated.
• The schedules that theoretically meet this criterion are called
recoverable schedules.
• A schedule where a committed transaction may have to be rolled
back during recovery is called nonrecoverable and hence should not
be permitted by the DBMS.
Characterizing schedules based on
recoverability
• Consider the following schedule-
• Here,
• T2 performs a dirty read (Reading
from an uncommitted transaction is
called as a dirty read) operation.
• The commit operation of T2 is
delayed till T1 commits or roll backs.
• T1 commits later.
• T2 is now allowed to commit.
• In case, T1 would have failed, T2 has a
chance to recover by rolling back.
Characterizing schedules based on
recoverability
• Consider the following schedule-
• Here
• T2 performs a dirty read operation.
• T2 commits before T1.
• T1 fails later and roll backs.
• The value that T2 read now stands to
be incorrect.
• T2 can not recover since it has
already committed.
Characterizing schedules based on
recoverability
• The condition for a recoverable schedule is as follows:
• A schedule S is recoverable if no transaction T in S commits until all transactions T′
that have written some item X that T reads have committed.
• A transaction T reads from transaction T′ in a schedule S if some item X is first
written by T′ and later read by T.
• In addition, T′ should not have been aborted before T reads item X, and there should
be no transactions that write X after T′ writes it and before T reads it (unless those
transactions, if any, have aborted before T reads X).
• Some recoverable schedules may require a complex recovery process, but
if sufficient information is kept (in the log), a recovery algorithm can be
devised for any recoverable schedule.
• The schedules Sa and Sb from the preceding section are both recoverable,
since they satisfy the above definition.
Characterizing schedules based on
recoverability
• Consider the schedule Sa′ given below, which is the same as schedule
Sa except that two commit operations have been added to Sa:
• Sa′: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
• Sa′ is recoverable, even though it suffers from the lost update
problem.
• However, consider the two schedules Sc and Sd that follow:
• Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
• Sc is not recoverable because T2 reads item X from T1, but T2
commits before T1 commits. The problem occurs if T1 aborts after the
c2 operation in Sc; then the value of X that T2 read is no longer valid
and T2 must be aborted after it is committed, leading to a schedule
that is not recoverable.
Characterizing schedules based on
recoverability
• For the schedule to be recoverable, the c2 operation in Sc must be
postponed until after T1 commits, as shown in Sd.
• Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
• If T1 aborts instead of committing, then T2 should also abort as
shown in Se, because the value of X it read is no longer valid.
• Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;
• In Se, aborting T2 is acceptable since it has not committed yet, which
is not the case for the nonrecoverable schedule Sc.
Characterizing schedules based on
recoverability: Types of recoverable schedule
• In a recoverable schedule, no committed transaction ever needs to be
rolled back, and so the definition of a committed transaction as
durable is not violated.
• However, it is possible for a phenomenon known as cascading
rollback (or cascading abort) to occur in some recoverable schedules,
where an uncommitted transaction has to be rolled back because it
read an item from a transaction that failed.
• This is illustrated in schedule Se, where transaction T2 has to be rolled
back because it read item X from T1, and T1 then aborted.
Characterizing schedules based on
recoverability
• Here,
• Transaction T2 depends on transaction
T1.
• Transaction T3 depends on transaction
T2.
• Transaction T4 depends on transaction
T3.
• In this schedule,
• The failure of transaction T1 causes
the transaction T2 to rollback.
• The rollback of transaction T2 causes
the transaction T3 to rollback.
• The rollback of transaction T3 causes
the transaction T4 to rollback.
• Such a rollback is called as a Cascading
Rollback.
Characterizing schedules based on
recoverability: Types of recoverable schedule
• Because cascading rollback can be time-consuming—since numerous
transactions can be rolled back —it is important to characterize the
schedules where this phenomenon is guaranteed not to occur.
• A schedule is said to be cascadeless, or to avoid cascading rollback, if
every transaction in the schedule reads only items that were written by
committed transactions.
• In this case, all items read will not be discarded because the transactions
that wrote them have committed, so no cascading rollback will occur.
• To satisfy this criterion, the r2(X) command in schedules Sd and Se must be
postponed until after T1 has committed (or aborted), thus delaying T2 but
ensuring no cascading rollback if T1 aborts.
Characterizing schedules based on
recoverability
• Cascadeless schedule allows only
committed read operations.
• However, it allows uncommitted
write operations.
Characterizing schedules based on
recoverability: Types of recoverable schedule
• Finally, there is a third, more restrictive type of schedule, called a
strict schedule, in which transactions can neither read nor write an
item X until the last transaction that wrote X has committed (or
aborted).
• Strict schedules simplify the recovery process.
• In a strict schedule, the process of undoing a write_item(X) operation
of an aborted transaction is simply to restore the before image
(old_value or BFIM) of data item X.
• This simple procedure always works correctly for strict schedules, but
it may not work for recoverable or cascadeless schedules.
Characterizing schedules based on
recoverability
• For example, consider schedule Sf:
• Sf: w1(X, 5); w2(X, 8); a1;
• Suppose that the value of X was originally 5, which is the before image
stored in the system log along with the w1(X, 5) operation.
• If T1 aborts, as in Sf, the recovery procedure that restores the before image
of an aborted write operation will restore the value of X to 5, even though
it has already been changed to 8 by transaction T2, thus leading to
potentially incorrect results.
• Although schedule Sf is cascadeless, it is not a strict schedule, since it
permits T2 to write item X even though the transaction T1 that last wrote X
had not yet committed (or aborted).
• A strict schedule does not have this problem.
Characterizing schedules based on
recoverability
• Any strict schedule is also cascadeless, and any cascadeless schedule is also
recoverable.
• Suppose we have i transactions T1, T2, … , Ti, and their number of
operations are n1, n2, … , ni, respectively.
• If we make a set of all possible schedules of these transactions, we can
divide the schedules into two disjoint subsets: recoverable and
nonrecoverable.
• The cascadeless schedules will be a subset of the recoverable schedules,
and the strict schedules will be a subset of the cascadeless schedules.
• Thus, all strict schedules are cascadeless, and all cascadeless schedules are
recoverable.
Characterizing Schedules Based on
Serializability
• A schedule (or history) S of n
transactions T1, T2, … , Tn is an ordering
of the operations of the transactions.
• Operations from different transactions
can be interleaved in the schedule S.
• We characterize the types of schedules
that are always considered to be correct
when concurrent transactions are
executing.
• Such schedules are known as serializable
schedules. Suppose that two users—for (a) Transaction T1. (b) Transaction T2.
example, two airline reservations
agents—submit to the DBMS
transactions T1 and T2 as shown in
Figure at approximately the same time.
Characterizing Schedules Based on
Serializability
• If no interleaving of operations is permitted, there are only two
possible outcomes:
• Execute all the operations of transaction T1 (in sequence) followed by all the
operations of transaction T2 (in sequence).
• Execute all the operations of transaction T2 (in sequence) followed by all the
operations of transaction T1 (in sequence).
• These two schedules—called serial schedules.
• Therefore, in a serial schedule, only one transaction at a time is
active—the commit (or abort) of the active transaction initiates
execution of the next transaction.
Characterizing Schedules Based on
Serializability
The read and write operations of three transactions T1, T2, and T3.
Testing for Serializability of a Schedule
Testing for Serializability of a Schedule
X = 20 Y = 10 Z = 40 Copy of
X = 20 Y = 10 Z = 40
SD
Shadow paging: Example
Main Memory Main Memory
Current Shadow Current Shadow
Directory Directory Directory Directory
Block Pointer X = 20 Block Pointer Block Pointer X = 10 Block Pointer
No. No. No. No.
1 1 1 1
2 2 2 2
3 3 3 3
Copy of Copy of
X = 20 Y = 10 Z = 40 X = 20 Y = 10 Z = 40
SD SD
Shadow paging: Example
Main Memory Main Memory
Current Shadow Current Shadow
Directory Directory Directory Directory
Block Pointer X = 10 Block Pointer Block Pointer X = 10 Block Pointer
No. No. No. No.
1 Y = 10 1 1 Y = 20 1
2 2 2 2
3 3 3 3
Copy of Copy of
X = 20 Y = 10 Z = 40 X = 20 Y = 10 Z = 40
SD SD
X = 10 X = 10 Y = 20
Shadow paging: Example
Main Memory Main Memory
Current Shadow Current
Directory Directory Directory
Block Pointer X = 10 Block Pointer Block Pointer X = 10
No. No. No.
1 Y = 20 1 1 Y = 20
2 2 2
Commit Commit
3 3 3
Copy of
X = 20 Y = 10 Z = 40 X = 20 Y = 10 Z = 40
SD
X = 10 Y = 20 X = 10 Y = 20
Shadow paging: Example
Main Memory
Current
Directory
Block Pointer X = 10
No.
1 Y = 20
2
Commit
Successful Transaction
3
Z = 40
X = 10 Y = 20
Shadow paging: Example
Main Memory
Main Memory
Shadow
Current Shadow Directory
Transaction Fails
Directory Directory X = 10 Block Pointer
Block Pointer X = 10 Block Pointer No.
No. No. Y = 20 1
1 Y = 20 1 2
2 2 3
3 3
Copy of
X = 20 Y = 10 Z = 40
Copy of SD
X = 20 Y = 10 Z = 40
SD
X = 10
No Undo No Redo