CHAPTER THREE
Transaction Processing
1
Introduction
Single user Vs multiuser systems
◦ One criterion for classifying is a data base system
according to the number of users who can use
the system at the same time
Single-User System:
A DBMS is a single user if at most one user at a time can use the
system.
Multiuser System:
Many users can access the system concurrently.
Concurrency
Interleaved processing:
Concurrent execution of processes is interleaved in a single
CPU
Parallel processing:
Processes are concurrently executed in multiple CPUs.
2
Cont…
A Transaction:
◦ Logical unit of database processing that includes one or more access
operations (read, retrieval, write, insert or update and delete)
◦ An action, or series of actions, carried out by a single user or
application program, which reads or updates the contents of the
database.
◦ A transaction (set of operations) may be stand-alone specified in
a high level language like SQL submitted interactively, or may be
embedded within a program.
Transaction boundaries:
◦ One of specifying transaction boundaries is using explicit Begin
and End transaction .
◦ An application program may contain several transactions
separated by the Begin and End transaction boundaries
Granularity of data – size of a field, a record , or a whole
disk block . Transaction concepts are independent of
granularity
3
Cont…
Basic operations are read and write
◦ read_item(X): Reads a database item named X into a
program variable. To simplify our notation, we assume
that the program variable is also named X.
◦ write_item(X): Writes the value of program variable X
into the database item named X.
Note: x is the filed/column of a record or the whole block
of the disk.
read_item(X) command includes the following steps:
◦ Find the address of the disk block that contains item X.
◦ Copy that disk block into a buffer in main memory (if
that disk block is not already in some main memory
buffer).
◦ Copy item X from the buffer to the program variable
named X.
4
Cont…
write_item(X) command includes the following
steps:
◦ Find the address of the disk block that contains
item X.
◦ Copy that disk block into a buffer in main memory
(if that disk block is not already in some main
memory buffer)
◦ Copy item X from the program variable named X
into its correct location in the buffer.
◦ Store the updated block from the buffer back to
disk (either immediately or at some later point in
time).
The decision about when to store back a
modified disk block that is in main memory is
handled by the recovery manager of the
DBMS in cooperation with the underlying
operating system 5
Cont…
◦ Example of transactions
(a) Transaction T1
(b) Transaction T2
6
Cont…
Concurrency control: The process of managing simultaneous
operations on the database without having them interfere with one
another.
Potential problems caused by concurrency:
Lost update problem
Dirty read/ uncommitted dependency/ Temporary Update
problem
Incorrect summery/ inconsistent analysis problem
Note: it is possible to respond these problems using concurrency
control technique.
To illustrate these problems, we use a simple bank account relation that
contains the staff account balances. In this context, we are using the
transaction as the unit of concurrency control.
A.The Lost Update Problem
◦ This occurs when two transactions that access the same
database items have their operations interleaved in a way
that makes the value of some database item incorrect.
Example: ….see next slide
7
Cont….
Assume T1 is withdrawing £10 from an account with balance balx,
initially £100, and T2 is depositing £100 into the same account. If these
transactions are executed serially, one after the other with no
interleaving of operations, the final balance would be £190.
If T1 and T2 start at nearly the same time, and both read the balance as
£100. T2 increases balx by £100 to £200 and stores the update in the
database. Meanwhile, transaction T1 decrements its copy of balx by
£10 to £90 and stores this value in the database, overwriting the
previous update, and thereby ‘losing’ the £100 previously added to the
balance. see the following table.
8
Cont…
B.The Temporary Update /Dirty Read Problem
This problem occurs when one transaction is allowed to see the
intermediate results of another transaction before it has committed
which will be failed for some reason.
Example : From the example given on slide no 8.
If transaction T4 updates balx to £200, but it aborts the transaction so
that balx should be restored to its original value of £100. However, by
this time transaction T3 has read the new value of balx (£200) and is
using this value as the basis of the £10 reduction, giving a new incorrect
balance of £190, instead of £90. The value of balx read by T3 is called
dirty data.
See the following table.
9
Cont…
C.The Incorrect Summary Problem
◦ If one transaction is calculating an aggregate summary function on a number of
records while other transactions are updating some of these records, the
aggregate function may calculate some values before they are updated and others
after they are updated.
Example: to summarize account a transaction T6 is executing concurrently with
transaction T5. Transaction T6 is totaling the balances of account x (£100), account
y (£50), and account z (£25). However, in the meantime, transaction T5 has
transferred £10 from balx to balz, so that T6 now has the wrong result (£10
too high). See the following table
10
causes of Transaction failure
1. A computer failure (system crash):
A hardware or software error may occur in the computer system
during transaction execution. If the hardware crashes, the contents of
the computer’s internal memory may be lost.
2. A transaction or system error:
Some operation in the transaction may cause it to fail, such as
integer overflow or division by zero.
Transaction failure may also occur because of erroneous parameter
values or because of a logical programming error
3. Local errors or exception conditions detected by the transaction:
Certain conditions necessitate cancellation of the transaction
For example, data for the transaction may not be found or
condition, such as insufficient account balance in a banking
database, may cause a transaction, such as a fund withdrawal
from that account, to be canceled.
A programmed abort in the transaction causes it to fail.
11
Cont…
4. Concurrency control enforcement:
The concurrency control method may decide to abort the
transaction, to be restarted later, because it violates serializability or
because several transactions are in a state of deadlock.
5. Disk failure:
Some disk blocks may lose their data because of a read or write
malfunction or because of a disk read/write head crash.
This may happen during a read or a write operation of the
transaction.
6. Physical problems and catastrophes:
This refers to an endless list of problems that
includes power or air-conditioning failure, fire, theft,
sabotage, overwriting disks or tapes by mistake, and
mounting of a wrong tape by the operator.
12
Transaction and System Concepts
Transaction states and additional operations
◦ Transaction states:
Active state, Partially committed state , Committed state , Failed
state and Terminated State
Fig 1. State transition diagram illustrating the states for transaction execution
13
Transaction and System Concepts (cont…)
Transaction operations
◦ For recovery purposes, the system needs to keep track of when the transaction
starts, terminates, and commits or aborts. Recovery manager keeps track of the
following operations:
begin_transaction: This marks the beginning of transaction
execution
read or write: These specify read or write operations on the
database items that are executed as part of a transaction
End_transaction: This specifies that read and write
transaction operations have ended and marks the end limit of
transaction execution.
◦ commit_transaction:
This signals a successful end of the transaction so that any changes (updates)
executed by the transaction can be safely committed to the database and will
not be undone.
◦ rollback (or abort):
This signals that the transaction has ended unsuccessfully, so that any changes
or effects that the transaction may have applied to the database must be
undone.
14
Transaction and System Concepts (cont…)
The System Log
◦ Log or Journal: The log keeps track of all transaction operations that
affect the values of database items
This information is needed to permit recovery from transaction
failures
The log is kept on disk, so it is not affected by any type of failure
except for disk or catastrophic failure.
In addition, the log is periodically backed up to archival storage
(tape) to guard against such catastrophic failures.
◦ We can use a notation T to refer to a unique transaction-id that is
generated automatically by the system and is used to identify each
transaction:
◦ Types of log record:
[start_transaction,T]: Records that transaction T has started execution.
[write_item,T,X,old_value,new_value]: Records that transaction T has
changed the value of database item X from old_value to new_value.
[read_item,T,X]: Records that transaction T has read the value of database
item X.
15
The System Log (cont):
[commit,T]: Records that transaction T has completed
successfully, and affirms that its effect can be committed
(recorded permanently) to the database.
[abort,T]: Records that transaction T has been aborted.
16
Recovery using log records:
If the system crashes, we can recover to a
consistent database state by examining the log
record and using recovery methods.
1. Because the log contains a record of every write
operation that changes the value of some
database item, it is possible to undo the effect
of these write operations of a transaction T by
tracing backward through the log and resetting
all items changed by a write operation of T to
their old_values.
2. We can also redo the effect of the write
operations of a transaction T by tracing forward
through the log and setting all items changed by
a write operation of T (that did not get done
permanently) to their new_values.
17
Transaction and System Concepts (cont…)
Commit Point of a Transaction:
Definition a Commit Point:
◦ A transaction T reaches its commit point when all its
operations that access the database have been executed
successfully and the effect of all the transaction
operations on the database has been recorded in the log
◦ Beyond the commit point, the transaction is said to be
committed, and its effect is assumed to be permanently
recorded in the database.
The transaction then writes a commit record
[commit,T] in to the log
18
Transaction and System Concepts (cont…)
Undoing transactions
◦ If a system failure occurs, we search back in the log for all
transactions T that have written a [start_transaction,T] entry into
the log but no commit entry [commit,T] record yet
These transactions have to be rolled back to undo
their effects on the database during recovery process
Redoing transactions:
◦ Transactions that have written their commit entry in the log must
also have recorded all their write operations in the log; otherwise
they would not be committed, so their effect on the database can be
redone from the log entries. (Notice that the log file must be kept on
disk.
◦ At the time of a system crash, only the log entries that have been
written back to disk are considered in the recovery process because
the contents of main memory may be lost.)
19
Desirable Properties of Transactions
Transaction should posses several properties. They are often
called the ACID properties and should be enforced by the
concurrency control and recovery methods of the DBMS.
ACID properties:
Atomicity: A transaction is an atomic unit of processing; it is
either performed in its entirety or not performed at all.
Consistency preservation: A correct execution of the
transaction must take the database from one consistent state
to another.
Isolation: A transaction should not make its updates visible
to other transactions until it is committed; this property,
when enforced strictly, solves the temporary update problem
and makes cascading rollbacks of transactions unnecessary
Durability /permanency: Once a transaction changes the
database and the changes are committed, these changes must
never be lost because of subsequent failure.
20
Schedules
Schedule (or history) of transaction
◦ When transactions are executing concurrently in an interleaved fashion,
the order of execution of operations from the various transactions
form what is known as a transaction schedule (or history)
A schedule (or history) S of n transactions T1, T2,.. , Tn:
◦ is an ordering of the operations of the transactions subject to the
constraint that, for each transaction Ti that participates in S, the
operations of Ti in S must appear in the same order in which they
occur in Ti.
Note, however, that operations from other transactions Tj can be
interleaved with the operations of Ti in S.
A shorthand notation for describing a schedule uses the
symbols :
◦ r : for read_item operations ,
◦ w: write_item,
◦ c: commit and
◦ a: abort
21
Schedules (cont…)
Transaction numbers are appended as subscript to each
operation in the schedule
The database item X that is read or written follows the r
and w operations in parenthesis
◦ Example:
Sa: r1(X),r2(x),w1(x), r1(Y),w2(x);w1(Y)
Sb: r1(X),w1(x),r2(x), w2(x), r1(Y),a1
22
Conflicting vs non conflicting operations
Two operations in a schedule are said to conflict if they satisfy all three
of the following conditions:
◦ They belong to different transactions
◦ They access the same item X
◦ At least one of the operations is a write_item(X)
For example, in a schedule Sa, the operations at slide page no.22
◦ r1(x) and w2(X) conflict, as do the operations r2(X) and w1(X) and
the operations w1(X) and w2(X)
While
◦ The operations r1(x) and r2(x) do not conflict since they are both
are read operations
◦ r1(x) and w1(x) do not conflict because they belong to the same
transaction
◦ W2(x) and w1(y) do not conflict since they operate on distinct data
items x and y
23
Complete schedules
◦ A schedule S of n transactions T1, T2, ……..,Tn is said to be a
complete schedule if the following conditions hold:
1. The operations in S are exactly those operations in T1,T2, …Tn
including a commit or abort operations as the last operation for each
transaction in the schedule
2. For any pair of operations from the same transaction Ti, their order of
appearance in S is the same as their order of appearance in T
3. For any two conflicting operations, one of the two must occur before
the other in the schedule (theoretically, it is not necessary to determine
an order b/n pair of non conflicting operations)
❖ Condition (3) above allows for two non conflicting
operations to occur in the schedule without defining which
occurs first leading to the definition of partial order of the
operations in n tractions.
❖ In general, it is difficult to encounter complete schedules in
a transaction processing system, because new transactions
are continually being submitted to the system
24
Characterizing Schedules based on Recoverability
Schedules classified based on recoverability:
◦ Recoverable schedule:
Once a transaction T is committed, it should never be
necessary to rollback T
◼The schedules that theoretically meet this criterion are called
recoverable and those that do not are non recoverable
A schedule S is recoverable if no transaction T in S
commits until all transactions T’ that have written an
item that T reads have committed
◦ A transaction T2 reads from Transaction T1 in a
schedule S if some item X is first written by T1 and latter
read by T2
In addition, T1 should not have been aborted before T2
reads item X and there should be no transaction that
write X after T1 writes it and before T2 reads X 25
Characterizing Schedules based on Recoverability
Consider the schedule given as follow:
Sa’ : r1(X),r2(x),w1(x), r1(Y),w2(x);c2;w1(Y);c1
◦ Sa’ is recoverable despite it suffers from lost update problem
However, consider the two partial schedules Sc and Sd below:
◦ Sc:r1(x);w1(x);r2(x);r1(y);w2(x);c2;a1 not recoverable
Sc is not recoverable because T2 reads X from T1 and then T2
commits before T1 commits.
If T1 aborts after the c2 operations in Sc, then the value of x that
T2 read is no longer valid and T2 must be aborted after it had been
committed, leading to a schedule that is not recoverable
For the above schedule to be recoverable, the c2 operation
in Sc must be postponed until after T1 commits as shown in
Sd
◦ Sd:r1(x);w1(x);r2(x);r1(y);w2(x);w1(y);c1;c2 Recoverable
If T1 aborts instead of committing, then T2 should also abort
as shown in Se because the X it read is no longer valid.
Se:r1(x);w1(x);r2(x);r1(y);w2(x);w1(y);a1;a2 Recoverable
26
Characterizing Schedules based on Serializability
Serial schedule:
◦ A schedule S is serial if, for every transaction T participating in the
schedule, all the operations of T are executed consecutively in the
schedule
Otherwise, the schedule is called non serial schedule.
Serializable schedule:
◦ A schedule S is serializable if it is equivalent to some serial schedule
of the same n transactions
Result equivalent:
◦ Two schedules are called result equivalent if they produce the same
final state of the database.
Conflict equivalent:
◦ Two schedules are said to be conflict equivalent if the order of any two
conflicting operations is the same in both schedules
◦ Conflict serializable:
A schedule S is said to be conflict serializable if it is conflict
equivalent to some serial schedule S’.
27
Cont…
Being serializable is not the same as being serial
Being serializable implies that the schedule is a correct
schedule
◦ It will leave the database in a consistent state.
◦ The interleaving is appropriate and will result in a state as if
the transactions were serially executed, yet will achieve
efficiency due to concurrent execution.
It’s not possible to determine when a
schedule begins and when it ends.
◦ Hence, we reduce the problem of checking the
whole schedule to checking only a committed
project of the schedule (i.e. operations from only
the committed transactions.)
28
Determining conflict serializability
◦ To determine serializability, first identify the pair of conflicting
operations and check if their order is preserved in one of the
possible serial schedules
◦ schedule A:
r1(x);w1(x),r1(y);w1(y);r2(x);w2(x)- serial schedule
◦ schedule B:
r2(x);w2(x); r1(x);w1(x),r1(y);w1(y)- serial schedule
◦ schedule C:
r1(x);r2(x);w1(x);w2(x)w1(y)- (not serializable).
◦ ScheduleD :
r1(x);w1(x);r2(x);w2(x);r1(y);w1(y)-(serializable,
equivalent to schedule A).
29