DBMS Transaction Essentials
A transaction is a set of operations that performs a single logical unit of work; it bundles all the
instructions of one logical operation. A transaction usually means that the data in the database has
changed. One of the major uses of a DBMS is to protect the user’s data from system failures, which it does
by ensuring that all the data is restored to a consistent state when the computer is restarted after a crash.
A transaction is any one execution of a user program in a DBMS, and an important property of a
transaction is that it contains a finite number of steps. Executing the same program multiple times
generates multiple transactions.
Example: Consider the following sequence of operations performed to withdraw cash from an ATM.
1. Transaction Start.
9. Transaction Completed.
Read/Access data (R): Accessing a database item from disk (where the database stores its data)
into a memory variable.
Write/Change data (W): Writing the data item from the memory variable back to disk.
Commit: Commit is a transaction control command that permanently saves the changes
made by a transaction.
Example: Transfer ₹50 from Account A to Account B. Initially A = ₹500 and B = ₹800. This data is brought
into RAM from the hard disk.
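The read, write, and commit steps of this transfer can be sketched in Python as below. This is a minimal illustration, assuming a dictionary stands in for the disk-resident database; the helper names read_item, write_item, and commit are hypothetical, not part of any real DBMS API.

```python
# A minimal sketch of the A -> B transfer, assuming a Python dict stands in
# for the disk-resident database.

disk = {"A": 500, "B": 800}   # database items on "disk"
buffer = {}                   # memory variables (RAM)

def read_item(name):          # R: disk -> memory variable
    buffer[name] = disk[name]

def write_item(name, value):  # W: change the memory variable (flushed on commit)
    buffer[name] = value

def commit():                 # make all buffered changes permanent
    disk.update(buffer)

# Transaction: transfer 50 from A to B
read_item("A")
write_item("A", buffer["A"] - 50)
read_item("B")
write_item("B", buffer["B"] + 50)
commit()                      # A = 450, B = 850 on disk
print(disk)
```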
Stages of Transaction
For a transaction to be performed in a DBMS, it must possess several properties, often called the ACID
properties.
A – Atomicity
C – Consistency
I – Isolation
D – Durability
Transaction States
Transactions can be implemented using SQL queries and servers. A transaction-state diagram (not
reproduced here) shows how a transaction moves between its states.
A transaction has four properties, which are used to maintain consistency in the database before and
after the transaction.
Properties of a Transaction:
Atomicity
Consistency
Isolation
Durability
Atomicity
It states that all operations of a transaction take place as a single unit; if not, the transaction is
aborted.
There is no midway: a transaction cannot occur partially. Each transaction is treated as one
unit and either runs to completion or is not executed at all.
Abort: If a transaction aborts, none of its changes are visible.
Commit: If a transaction commits, all of its changes are visible.
Consistency
The integrity constraints are maintained so that the database is consistent before and after the
transaction.
The execution of a transaction leaves the database in either its prior stable state or a new stable
state.
The consistency property states that every transaction sees a consistent database
instance.
A transaction transforms the database from one consistent state to another
consistent state.
Isolation
It means that the data being used during the execution of one transaction cannot be used by
a second transaction until the first one is completed.
Under isolation, if transaction T1 is being executed and is using data item X, then X
cannot be accessed by any other transaction T2 until T1 ends.
The concurrency-control subsystem of the DBMS enforces the isolation property.
Durability
The durability property guarantees the permanence of the database’s consistent state: once a
transaction completes, the changes it made are permanent.
They cannot be lost by the erroneous operation of a faulty transaction or by a system failure.
When a transaction is completed, the database reaches a consistent state, and that consistent
state cannot be lost, even in the event of a system failure.
The recovery subsystem of the DBMS is responsible for the durability property.
The Two-Phase Locking (2PL) protocol furnishes a robust assurance against conflicts in the database,
managing the overhead of acquiring and releasing data locks so that performance degradation is kept in
check. It enforces serializability: the result of concurrent execution is guaranteed to align with some
serial execution of the transactions, which keeps database operations correct while still allowing
concurrency.
Several algorithms can be used to check whether a schedule is serializable in a DBMS. One such algorithm
is the conflict-serializability algorithm, which checks for potential conflicts between transactions. A
conflict occurs when two transactions try to access the same data item in conflicting ways (e.g., one
transaction reads a data item while another writes to it). If there are no conflicts, the schedule is
guaranteed to be serializable. However, if there are conflicts, the schedule may or may not be serializable.
Another algorithm that can check for serializability is the view-serializability algorithm,
which checks for potential mutual dependencies between transactions. A mutual dependency exists
when two transactions depend on each other for their respective outputs to be correct. The schedule is
guaranteed to be serializable if there are no mutual dependencies. However, if there are mutual
dependencies, the schedule may or may not be serializable.
1) Conflict Serializability
For instance, consider an orders table and a customers table. Every order is linked to
a specific customer, even though a single customer may place multiple orders. Two operations conflict,
and thus constrain conflict serializability, only when the following conditions hold (a check based on
these rules is sketched after this list):
1) The two operations must belong to different transactions.
2) The two operations must access the same data item.
3) At least one of the two operations must be a write operation.
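A conflict-serializability check is usually done by building a precedence graph from these conflicts and testing it for cycles. The following is a minimal sketch, assuming a schedule is represented as a list of (transaction, operation, item) tuples; the function names are illustrative.

```python
# Minimal precedence-graph check for conflict serializability.
# A schedule is a list of (transaction, op, item) tuples, op in {"R", "W"}.
from collections import defaultdict

def is_conflict_serializable(schedule):
    # Build edges Ti -> Tj for every pair of conflicting operations
    # (different transactions, same item, at least one write).
    edges = defaultdict(set)
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op1, op2):
                edges[ti].add(tj)

    # The schedule is conflict serializable iff the graph is acyclic.
    visited, on_stack = set(), set()

    def has_cycle(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in edges[node]:
            if nxt in on_stack or (nxt not in visited and has_cycle(nxt)):
                return True
        on_stack.discard(node)
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(has_cycle(t) for t in nodes if t not in visited)

# Example: T1 and T2 access X in an interleaved order -> not serializable.
s = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]
print(is_conflict_serializable(s))   # False (cycle T1 -> T2 -> T1)
```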
2) View Serializability
View serializability takes a more abstract perspective, considering the results visible to users rather than
the specific operations performed. It emphasizes the order in which transactions' results are presented,
providing a higher-level understanding of consistency.
View serializability is a category of serializability in which each transaction is required to see the same
values, and to leave the same final writes, as in some correct serial execution over the data items. Unlike
conflict serializability, which reasons about individual conflicting operations, view serializability reasons
about what each transaction reads and finally writes; it therefore focuses directly on preventing database
inconsistency and can accept some schedules that conflict serializability would reject.
Concurrency control is a very important concept in DBMS: it ensures that the simultaneous
execution or manipulation of data by several processes or users does not result in data
inconsistency.
Concurrency control provides a procedure that controls the concurrent execution of
operations in the database.
The fundamental goal of database concurrency control is to ensure that concurrent execution of
transactions does not result in a loss of database consistency. The concept of serializability can
be used to achieve this goal, since all serializable schedules preserve the consistency of the
database. However, not all schedules that preserve the consistency of the database are serializable.
Another approach is to treat some operations besides read and write as fundamental low-level
operations and to extend concurrency control to deal with them.
Concurrency Control Problems
Several problems arise when numerous transactions are executed simultaneously in an arbitrary
order. A database transaction consists of two major operations, “Read” and “Write”, and it is very
important to manage these operations during the concurrent execution of transactions in order to
maintain the consistency of the data.
The dirty read problem occurs when one transaction updates an item and then fails due to some
unforeseen event, but before that transaction performs its rollback, some other transaction reads the
updated value. This creates an inconsistency in the database. The dirty read problem arises from a
write-read conflict between transactions in the database.
The dirty read problem can be illustrated with the below scenario between two transactions T1
and T2:
1. T1 updates a data item X.
2. T2 reads the updated (uncommitted) value of X.
3. T1 fails before it can commit.
4. T1 performs rollback.
5. T2 has already read the uncommitted data of T1, which is no longer valid, thus creating
inconsistency in the database.
The lost update problem occurs when two or more transactions modify the same data item, resulting in
one update being overwritten or lost by another transaction. The lost update problem can be illustrated
with the below scenario between two transactions T1 and T2 (a small sketch follows the steps):
1. T1 reads a data item X.
2. T2 reads the same data item X.
3. T1 updates X based on its read and performs commit.
4. T2 updates the same data item based on its initial read and performs commit.
5. As a result, the modification made by T1 is lost, overwritten by T2’s write, which causes a lost update
problem in the database.
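The interleaving above can be reproduced with a tiny Python sketch; this assumes a plain dictionary stands in for the database and writes out the interleaving by hand, so it is purely illustrative.

```python
# A minimal illustration of the lost update problem.
db = {"X": 100}

# Steps 1-2: both transactions read X before either writes.
t1_x = db["X"]          # T1 reads X = 100
t2_x = db["X"]          # T2 reads X = 100

# Step 3: T1 adds 10 and commits.
db["X"] = t1_x + 10     # X = 110

# Step 4: T2 subtracts 20 based on its *initial* read and commits.
db["X"] = t2_x - 20     # X = 80, T1's +10 is lost

# A serial execution would have ended with X = 90.
print(db["X"])          # 80 -> lost update
```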
Concurrency Control Protocols
Concurrency control protocols are the set of rules maintained in order to solve the
concurrency control problems in the database. They ensure that concurrent transactions can execute
properly while maintaining database consistency. The concurrency control protocols provide the
concurrent execution of transactions with atomicity, consistency, isolation, durability, and
serializability.
In a lock-based protocol, each transaction needs to acquire locks before it starts accessing or
modifying data items. There are two types of locks used in databases (a small compatibility sketch
follows this list).
Shared Lock : A shared lock is also known as a read lock; it allows multiple transactions to read
the data item simultaneously. A transaction holding a shared lock can only read the data
item; it cannot modify it.
Exclusive Lock : An exclusive lock is also known as a write lock. An exclusive lock allows a transaction
to update a data item. Only one transaction can hold the exclusive lock on a data item at a time.
While a transaction is holding an exclusive lock on a data item, no other transaction is allowed to
acquire a shared or exclusive lock on the same data item.
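The compatibility between these two lock modes can be captured in a tiny table. The following is an illustrative sketch; the LockManager class and its method names are hypothetical, not a real DBMS API.

```python
# Minimal sketch of shared/exclusive lock compatibility on a single data item.
class LockManager:
    # A requested mode is granted only if compatible with every lock already held.
    COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
                  ("X", "S"): False, ("X", "X"): False}

    def __init__(self):
        self.held = {}                      # item -> list of (txn, mode)

    def request(self, txn, item, mode):
        holders = self.held.setdefault(item, [])
        if all(self.COMPATIBLE[(held_mode, mode)]
               for _, held_mode in holders):
            holders.append((txn, mode))
            return True                     # lock granted
        return False                        # transaction must wait

lm = LockManager()
print(lm.request("T1", "X", "S"))  # True  - first shared lock
print(lm.request("T2", "X", "S"))  # True  - shared locks are compatible
print(lm.request("T3", "X", "X"))  # False - exclusive conflicts with shared
```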
There are two kinds of lock-based protocol mostly used in databases; a short sketch of the two phases
follows this list.
Two Phase Locking Protocol : Two-phase locking is a widely used technique which ensures a strict
ordering of lock acquisition and release. The two-phase locking protocol works in two phases.
o Growing Phase : In this phase, the transaction acquires the locks it needs before performing
any modification of the data items. During this phase the transaction may acquire locks but
cannot release any lock.
o Shrinking Phase : In this phase, the transaction releases the locks it acquired once it has
performed its modifications on the data items. Once the transaction starts releasing
locks, it cannot acquire any further locks.
Strict Two Phase Locking Protocol : It is almost identical to the two-phase locking protocol; the
only difference is that in two-phase locking a transaction can release its locks before it
commits, whereas in strict two-phase locking transactions are allowed to release their locks
only when they commit.
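A minimal sketch of the two phases is below. The Transaction2PL class and its methods are illustrative bookkeeping only (the lock-manager side is omitted): it simply refuses new lock requests once the first lock has been released.

```python
# Minimal sketch of two-phase locking bookkeeping inside one transaction:
# all lock acquisitions (growing phase) must precede any release (shrinking phase).
class Transaction2PL:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False          # becomes True after the first unlock

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after unlocking")
        self.locks.add(item)            # growing phase

    def release(self, item):
        self.shrinking = True           # shrinking phase begins
        self.locks.discard(item)

t1 = Transaction2PL("T1")
t1.acquire("A")
t1.acquire("B")
t1.release("A")                         # shrinking phase
try:
    t1.acquire("C")                     # violates the two-phase rule
except RuntimeError as err:
    print(err)
```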
Timestamp based Protocol
In this protocol, each transaction has a timestamp attached to it. The timestamp is simply the
time at which the transaction enters the system.
Conflicting pairs of operations are resolved by the timestamp-ordering protocol using the
timestamp values of the transactions, thereby guaranteeing that the transactions take effect in
timestamp order. A small sketch of these rules follows.
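The basic timestamp-ordering rules can be sketched as follows. This assumes each data item carries a read timestamp and a write timestamp; the function names read and write are illustrative.

```python
# Minimal sketch of basic timestamp ordering.  Each item keeps the largest
# timestamps of transactions that have read / written it.
read_ts = {}    # item -> largest TS of a reader
write_ts = {}   # item -> largest TS of a writer

def read(ts, item):
    if ts < write_ts.get(item, 0):
        return "rollback"                      # would read an overwritten value
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return "ok"

def write(ts, item):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return "rollback"                      # a younger txn already used item
    write_ts[item] = ts
    return "ok"

print(write(ts=5, item="X"))   # ok:       write_ts[X] = 5
print(read(ts=3, item="X"))    # rollback: TS 3 is older than the last writer
print(read(ts=8, item="X"))    # ok:       read_ts[X] = 8
print(write(ts=6, item="X"))   # rollback: a younger txn (TS 8) already read X
```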
Database recovery techniques are used in database management systems (DBMS) to restore a database
to a consistent state after a failure or error has occurred. The main goal of recovery techniques is to
ensure data integrity and consistency and prevent data loss.
The rollback/undo recovery technique is based on the principle of backing out or undoing the effects of a
transaction that has not been completed successfully due to a system failure or error. This technique is
accomplished by undoing the changes made by the transaction using the log records stored in the
transaction log. The transaction log contains a record of all the transactions that have been performed
on the database. The system uses the log records to undo the changes made by the failed transaction
and restore the database to its previous state.
The commit/redo recovery technique is based on the principle of reapplying the changes made by a
transaction that has completed successfully. This technique is accomplished by
using the log records stored in the transaction log to redo the changes made by transactions that had
committed but whose changes may not yet have reached the database at the time of the failure or error.
The system uses the log records to reapply the changes made by such transactions and restore the
database to its most recent consistent state.
Checkpoint recovery is a technique used to improve data integrity and system stability, especially in
databases and distributed systems. It entails preserving the system’s state at regular intervals, known as
checkpoints, at which all ongoing transactions are either completed or not initiated. This saved state,
which includes memory and CPU registers, is kept in stable, non-volatile storage so that it can withstand
system crashes. In the event of a breakdown, the system can be restored to the most recent checkpoint,
which reduces data loss and downtime. The frequency of checkpoint formation is carefully regulated to
decrease system overhead while ensuring that recent data may be restored quickly.
An update log record describes a single database write and has these fields:
1. Transaction identifier: the unique identifier of the transaction that performed the write operation.
2. Data-item identifier: the unique identifier of the data item that was written.
3. Old value: the value of the data item prior to the write.
4. New value: the value the data item will have after the write.
Because all database modifications must be preceded by the creation of a log record, the system has
available both the old value prior to the modification of the data item and the new value that is to be
written for the data item. This allows the system to perform redo and undo operations as appropriate
(a small sketch follows the two definitions):
1. Undo: using a log record, sets the data item specified in the log record to its old value.
2. Redo: using a log record, sets the data item specified in the log record to its new value.
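The following is a minimal sketch of these two operations, assuming each log record is a tuple (txn, item, old_value, new_value) and a Python dict stands in for the database; the helper names undo and redo are illustrative, not a real DBMS API.

```python
# Minimal sketch of undo/redo from update log records.
# Each log record is (txn, item, old_value, new_value).
db = {"A": 450, "B": 800}
log = [("T1", "A", 500, 450),
       ("T1", "B", 800, 850)]

def undo(records):
    # Restore old values, scanning the log backwards.
    for _, item, old, _ in reversed(records):
        db[item] = old

def redo(records):
    # Re-apply new values, scanning the log forwards.
    for _, item, _, new in records:
        db[item] = new

undo(log)      # db -> {"A": 500, "B": 800}
redo(log)      # db -> {"A": 450, "B": 850}
print(db)
```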
1. Deferred Modification Technique: If the transaction does not modify the database until it has
partially committed, it is said to use the deferred modification technique.
2. Immediate Modification Technique: If database modifications occur while the transaction is still
active, it is said to use the immediate modification technique.
Recovery using Log records
After a system crash has occurred, the system consults the log to determine which transactions need to
be redone and which need to be undone.
1. Transaction Ti needs to be undone if the log contains the record <Ti start> but does not contain
either the record <Ti commit> or the record <Ti abort>.
2. Transaction Ti needs to be redone if the log contains the record <Ti start> and either the record
<Ti commit> or the record <Ti abort>. A small sketch of this classification follows.
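These two rules can be sketched as a simple scan over the log. The following assumes the log is a list of strings like "<T1 start>"; the function name classify is illustrative.

```python
# Minimal sketch: decide which transactions to undo and which to redo
# by scanning the log records.
def classify(log):
    started, finished = set(), set()
    for rec in log:
        txn, action = rec.strip("<>").split()
        if action == "start":
            started.add(txn)
        elif action in ("commit", "abort"):
            finished.add(txn)
    redo_list = started & finished        # has start and commit/abort
    undo_list = started - finished        # has start but no commit/abort
    return undo_list, redo_list

log = ["<T1 start>", "<T2 start>", "<T1 commit>", "<T3 start>", "<T3 abort>"]
print(classify(log))   # undo {'T2'}, redo {'T1', 'T3'}
```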
Use of Checkpoints – When a system crash occurs, the log must be consulted. In principle, the entire log
would need to be searched to determine this information. There are two major difficulties with this
approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need to be redone have already
written their updates into the database. Although redoing them will cause no harm, it will cause
recovery to take longer.
To reduce this overhead, checkpoints are introduced. A log record of the form <checkpoint L>
is used to represent a checkpoint in the log, where L is the list of transactions active at the time of the
checkpoint. When a checkpoint log record is added to the log, all transactions that committed
before the checkpoint have their <Ti commit> log record before the checkpoint record, and any database
modification made by such a transaction Ti has been written to the database either prior to the
checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no need to perform a redo
operation on Ti.
After a system crash has occurred, the system examines the log to find the last <checkpoint L> record.
The redo or undo operations need to be applied only to the transactions in L and to all transactions that
started execution after the checkpoint record was written to the log. Let us denote this set of
transactions as T. The same undo and redo rules given above under Recovery using Log records apply
to T. Note that only the part of the log starting with the last checkpoint log record needs to be examined
to find the set of transactions T, and to find out whether a commit or abort record occurs in the log for
each transaction in T.
For example, consider the set of transactions {T0, T1, . . ., T100}. Suppose that the most recent
checkpoint took place during the execution of transactions T67 and T69, while T68 and all transactions
with subscripts lower than 67 completed before the checkpoint. Thus, only transactions T67, T69, . . .,
T100 need to be considered during the recovery scheme. Each of them needs to be redone if it has
completed (that is, either committed or aborted); otherwise, it was incomplete and needs to be
undone.
Log-based recovery is a technique used in database management systems (DBMS) to recover a database
to a consistent state in the event of a failure or crash. It involves the use of transaction logs, which are
records of all the transactions performed on the database.
In log-based recovery, the DBMS uses the transaction log to reconstruct the database to a consistent
state. The transaction log contains records of all the changes made to the database, including updates,
inserts, and deletes. It also records information about each transaction, such as its start and end times.
When a failure occurs, the DBMS uses the transaction log to determine which transactions were
incomplete at the time of the failure. It then performs a series of operations to undo the incomplete
transactions and redo the completed ones. This process is called the redo/undo recovery algorithm.
The redo operation involves reapplying the changes made by completed transactions that were not yet
saved to the database at the time of the failure. This ensures that all changes are applied to the
database.
The undo operation involves undoing the changes made by incomplete transactions that were saved to
the database at the time of the failure. This restores the database to a consistent state by reversing the
effects of the incomplete transactions.
Once the redo and undo operations are completed, the DBMS can bring the database back online and
resume normal operations.
Log-based recovery is an essential feature of modern DBMSs and provides a reliable mechanism for
recovering from failures and ensuring the consistency of the database.