Dbms Unit 4 Notes
Dbms Unit 4 Notes
                                                                        DBMS
                                                                      UNIT - IV
                                                  TRANSACTIONS PROCESSING CONCEPTS
Transaction Concept
Transaction: A Transaction is a list of actions to perform a single logical unit of work.
or
A transaction is a unit of program execution that accesses and possibly updates various data items.
T1:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Transaction State
Active: The initial state; the transaction stays in this state while it is executing
Failed: After the discovery that normal execution can no longer proceed.
Aborted: After the transaction has been rolled back and the database restored to its state prior to the start of the transaction.
Committed: After successful completion. A transaction is said to have terminated if it has either committed or aborted.
ACID Properties
To preserve the integrity of data the database system must ensure:
Atomicity: Either all operations of the transaction are properly reflected in the database or none are.
 Isolation: Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing
transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.
   i.      That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj, finished execution before Ti started, or Tj started execution
           after Ti finished.
Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
Atomicity requirement
   •     If the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent database state Failure could be due
         to software or hardware
   •     The system should ensure that updates of a partially executed transaction are not reflected in the database
Durability requirement
Once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database
by the transaction must persist even if there are software or hardware failures.
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Isolation requirement:
 If between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the
sum A + B will be less than it should be).
T1 T2
1. read(A)
2. A := A – 50
4. read(B)
5. B := B + 50
6. write(B)
 Isolation can be ensured by running transactions serially that is, one after the other. However, executing multiple transactions concurrently has
significant benefits.
Schedules
Schedule: A sequences of instructions that specify the chronological order (i.e. starting with the earliest date and finishing with most recent) in
which instructions of concurrent transactions are executed
     •   a schedule for a set of transactions must consist of all instructions of those transactions
     •   must preserve the order in which the instructions appear in each individual transaction.
A transaction that successfully completes its execution will have a commit instruction as the last statement
A transaction that fails to successfully complete its execution will have an abort instruction as the last statement.
•     A schedule is the order in which the operations of multiple transactions appear for execution.
•     Serial schedules are always consistent.
•     Non-serial schedules are not always consistent.
    Serial Schedules-
In serial schedules,
Characteristics-
•     Consistent
•     Recoverable
•     Cascadeless
•     Strict
Example-01:
In this schedule,
•     There are two transactions T1 and T2 executing serially one after the other.
•     Transaction T1 executes first.
•     After T1 completes its execution, transaction T2 executes.
•     So, this schedule is an example of a Serial Schedule.
Non-Serial Schedules-
In non-serial schedules,
Characteristics-
•     Consistent
•     Recoverable
•     Cascadeless
•     Strict
Example-01:
In this schedule,
T1
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
T2
1. read(A);
2. temp:= A* 0.1;
3. A := A – temp;
4. write(A);
5. read(B);
6. B := B + temp;
7. write(B).
Schedule 1
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
    Schedule 1.2
                                                                                                              Schedule 1.2
Schedule 2
Serializability
If a given non-serial schedule of ‘n’ transactions is equivalent to some serial schedule of ‘n’ transactions, then it is called as a serializable schedule.
Characteristics-
•     Consistent
•     Recoverable
•     Casacadeless
•     Strict
    Serial Schedules Vs Serializable Schedules-
 Thus, all the transactions necessarily execute       Thus, multiple                   transactions   can   execute
 serially one after the other.                        concurrently.
 Serial schedules lead to less resource utilization   Serializable schedules improve both resource
 and CPU throughput.                                  utilization and CPU throughput.
 Serial Schedules are less efficient as compared to   Serializable Schedules are always better than
 serializable schedules.                              serial schedules.
 (due to above reason)                                (due to above reason)
Types of Serializability-
1. Conflict Serializability
2. View Serializability
Conflict Serializability-
    If a given non-serial schedule can be converted into a serial schedule by swapping its non-conflicting operations, then it is called as a conflict
    serializable schedule.
Conflicting Operations-
Two operations are called as conflicting operations if all the following conditions hold true for them-
Example-
In this schedule,
Follow the following steps to check whether a given non-serial schedule is conflict serializable or not-
Step-01:
Step-02:
Start creating a precedence graph by drawing one node for each transaction.
Step-03:
•     Draw an edge for each conflict pair such that if Xi (V) and Yj (V) forms a conflict pair then draw an edge from Ti to Tj.
•     This ensures that Ti gets executed before Tj.
Step-04:
Problem-01:
Check whether the given schedule S is conflict serializable or not. If yes, then determine all the possible serialized schedules-
Solution-
Step-01:
List all the conflicting operations and determine the dependency between the transactions-
Step-02:
•     All the possible topological orderings of the above precedence graph will be the possible serialized schedules.
•     The topological orderings can be found by performing the Topological Sort of the above precedence graph.
After performing the topological sort, the possible serialized schedules are-
1. T1 → T3 → T4 → T2
2. T1 → T4 → T3 → T2
3. T4 → T1 → T3 → T2
View Serializability-
     If a given schedule is found to be view equivalent to some serial schedule, then it is called as a view
     serializable schedule.
Consider two schedules S1 and S2 each consisting of two transactions T1 and T2.
Schedules S1 and S2 are called view equivalent if the following three conditions hold true for them-
Condition-01:
For each data item X, if transaction Ti reads X from the database initially in schedule S1, then in schedule S2 also, Ti must perform the initial read
of X from the database.
 Thumb Rule
 “Initial readers must be same for all the data items”.
Condition-02:
If transaction Ti reads a data item that has been updated by the transaction Tj in schedule S1, then in schedule S2 also, transaction Ti must read
the same data item that has been updated by the transaction Tj.
     Thumb Rule
     “Write-read sequence must be same.”.
Condition-03:
    For each data item X, if X has been updated at last by transaction Ti in schedule S1, then in schedule S2 also, X must be updated at last by
    transaction Ti.
     Thumb Rule
     “Final writers must be same for all the data items”.
Method-01:
•     If the given schedule is conflict serializable, then it is surely view serializable. Stop and report your answer.
•     If the given schedule is not conflict serializable, then it may or may not be view serializable. Go and check using other methods.
        Thumb Rules
    •     All conflict serializable schedules are view serializable.
    •     All view serializable schedules may or may not be conflict serializable.
Method-02:
•       If there does not exist any blind write, then the schedule is surely not view serializable. Stop and report your answer.
•       If there exists any blind write, then the schedule may or may not be view serializable. Go and check using other methods.
        Thumb Rule
        No blind write means not a view serializable schedule.
Method-03:
Problem-01:
Solution-
Step-01:
List all the conflicting operations and determine the dependency between the transactions-
Step-02:
Now,
•     Since, the given schedule S is not conflict serializable, so, it may or may not be view serializable.
•     To check whether S is view serializable or not, let us use another method.
•     Let us check for blind writes.
Now,
Non-Serializable Schedules-
Characteristics-
Non-serializable schedules-
Irrecoverable Schedules-
If in a schedule,
Example-
Here,
Recoverable Schedules-
If in a schedule,
Here,
•     The commit operation of the transaction that performs the dirty read is delayed.
•     This ensures that it still has a chance to recover if the uncommitted transaction fails later.
Example-
Here,
Method-01:
•       If the given schedule is conflict serializable, then it is surely recoverable. Stop and report your answer.
•       If the given schedule is not conflict serializable, then it may or may not be recoverable. Go and check using other methods.
        Thumb Rules
    •    All conflict serializable schedules are recoverable.
    •    All recoverable schedules may or may not be conflict serializable.
Method-02:
• If there does not exist any dirty read operation, then the schedule is surely recoverable. Stop and report your answer.
• If there exists any dirty read operation, then the schedule may or may not be recoverable.
If there exists a dirty read operation, then follow the following cases-
Case-01:
    If the commit operation of the transaction performing the dirty read occurs before the commit or abort operation of the transaction which
    updated the value, then the schedule is irrecoverable.
Case-02:
    If the commit operation of the transaction performing the dirty read is delayed till the commit or abort operation of the transaction which
    updated the value, then the schedule is recoverable.
    Thumb Rule
    No dirty read means a recoverable schedule.
Recoverable Schedules-
If in a schedule,
1. Cascading Schedule
2. Cascadeless Schedule
3. Strict Schedule
Cascading Schedule-
•    If in a schedule, failure of one transaction causes several other dependent transactions to rollback or abort, then such a schedule is called as
     a Cascading Schedule or Cascading Rollback or Cascading Abort.
•    It simply leads to the wastage of CPU time.
Example-
Here,
In this schedule,
NOTE-
If the transactions T2, T3 and T4 would have committed before the failure of transaction T1, then the schedule would have been irrecoverable.
Cascadeless Schedule-
    If in a schedule, a transaction is not allowed to read a data item until the last transaction that has written it is committed or aborted, then such
    a schedule is called as a Cascadeless Schedule.
In other words,
Example-
NOTE-
Example-
Strict Schedule-
    If in a schedule, a transaction is neither allowed to read nor write a data item until the last transaction that has written it is committed or aborted,
    then such a schedule is called as a Strict Schedule.
In other words,
Example-
Remember-
Equivalence of Schedules-
In DBMS, schedules may have the following three different kinds of equivalence relations among them-
1. Result Equivalence
2. Conflict Equivalence
3. View Equivalence
•     If any two schedules generate the same result after their execution, then they are called as result equivalent schedules.
•     This equivalence relation is considered of least significance.
•     This is because some schedules might produce same results for some set of values and different results for some other set of values.
If any two schedules satisfy the following two conditions, then they are called as conflict equivalent schedules-
Problem-01:
Solution-
Let X = 2 and Y = 5.
      Failure Classification
      Transaction failure:
      Logical errors: transaction cannot complete due to some internal error condition
      System errors: the database system must terminate an active transaction due to an error condition (e.g., deadlock)
      System crash: a power failure or other hardware or software failure causes the system to crash.
  Disk failure: a head crash or similar disk failure destroys all or part of disk storage.
  Recoverability from Transaction Failure
  To ensure atomicity despite failures, we first output information describing the modifications to stable storage without modifying the database
  itself.
  We study two approaches:
            1.  log-based recovery, and
            2. shadow -paging
Log-Based Recovery
A log is kept on stable storage.
The log is a sequence of log records, and maintains a record of update activities on the database.
Each and every log record is given a unique id called log sequence number.
A log record is written for each of the following actions:
1. Updating a page
2. Commit
3. Abort
4. End
5. Undoing an update
Every log record contains the following fields:
 Prev LSN |Trans Id |Type| Page ID| Length| Offset |Before Image |After image
Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or abort
of a transaction. We denote the various types of log records as:
Whenever a transaction performs a write, it is essential that the log record for that write be created before the database is modified. Once a log
record exists, we can output the modification that has already been output to the database. Also we have the ability to undo a modification that has
already      been      output      to      the    database,      by      using     the      old-value     field     in     the    log      records.
For log records to be useful for recovery from system and disk failures, the log must reside on stable storage. However, since the log contains a
complete record of all database activity, the volume of data stored in the log may become unreasonable large.
                   Read (B);
                    B: = B + 50;
                    Write (B).
The immediate-update technique allows database modifications to be output to the database while the transaction is still in the active state.
These modifications are called uncommitted modifications. In the event of a crash or transaction failure, the system must use the old-value
field of the log records to restore the modified data items.
       Transactions T0 and T1 executed one after the other in the order T0 followed by T1. The portion of the log containing the relevant
information concerning these two transactions appears in the following,
Portion of the system log corresponding to T0 and T1
                  < T0 start >
                               < T0, A, 1000, 950 >
                               < T0, B, 2000, 2050 >
                   < T0 commit >
                   < T1 start >
                   < T1, C, 700, 600 >
                    < T0 commit >
Checkpoints
When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone.
Rather than reprocessing the entire log, which is time-consuming and much of it unnecessary, we can use checkpoints:
   •   Output onto stable storage all the log records currently residing in main memory.
   •   Output to the disk all modified buffer blocks.
   •   Output onto stable storage a log record, <checkpoint>.
    •   The recovery system reads the logs backwards from the end to the last checkpoint.
    •   It maintains two lists, an undo-list and a redo-list.
    •   If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in the redo-list.
    •   If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the transaction in undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list and their previous logs are
removed and then redone before saving their logs.
Shadow Paging
Shadow paging is an alternative to log-based recovery techniques, which has both advantages and disadvantages. It may require fewer disk
accesses, but it is hard to extend paging to allow multiple concurrent transactions. The paging is very similar to paging schemes used by the
operating                           system                            for                           memory                           management.
The idea is to maintain two page tables during the life of a transaction: the current page table and the shadow page table. When the transaction
starts, both tables are identical. The shadow page is never changed during the life of the transaction. The current page is updated with
each writeoperation. Each table entry points to a page on the disk. When the transaction is committed, the shadow page entry becomes a copy
of the current page table entry and the disk block with the old data is released. If the shadow is stored in nonvolatile memory and a system crash
occurs, then the shadow page table is copied to the current page table. This guarantees that the shadow page table will point to the database
pages corresponding to the state of the database prior to any transaction that was active at the time of the crash, making aborts
automatic.
   1. Commit overhead. The commit of a single transaction using shadow paging requires multiple blocks to be output -- the current page
      table, the actual data and the disk address of the current page table. Log-based schemes need to output only the log records.
   2. Data fragmentation. Shadow paging causes database pages to change locations (therefore, no longer contiguous.
   3. Garbage collection. Each time that a transaction commits, the database pages containing the old version of data changed by the
      transactions must become inaccessible. Such pages are considered to be garbage since they are not part of the free space and do not
      contain any usable information. Periodically it is necessary to find all of the garbage pages and add them to the list of free pages. This
      process is called garbage collection and imposes additional overhead and complexity on the system.
Deadlock Handling
System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.
Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies:
1. Require that each transaction locks all its data items before it 74 begins execution (predeclaration).
2. Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order
(graph-based protocol
Following schemes use transaction timestamps for the sake of deadlock prevention alone.
wait-die scheme - non-preemptive
1. older transaction may wait for younger one to release data item. Younger transactions never wait for older ones; they are rolled back instead.
2. a transaction may die several times before acquiring needed data item
wound-wait scheme – preemptive
1. older transaction wounds (forces rollback) of younger transaction instead of waiting for it. Younger transactions may wait for older ones.
2. may be fewer rollbacks than wait-die scheme.
Deadlock prevention
Both in wait-die and in wound-wait schemes, a rolled back transaction is restarted with its original timestamp. Older transactions thus have
precedence over newer ones, and starvation is hence avoided.
Deadlock Detection
Deadlocks can be described as a wait-for graph, which consists of a pair G = (V,E), V is a set of vertices (all the transactions in the system) E is a
set of edges; each element is an ordered pair Ti →Tj.
If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that T is waiting for T to release a data item.
When Ti requests a data item currently being held by Tj, then the edge Ti Tj is inserted in the wait-for graph. This edge is removed only when Tj
is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. Must invoke a deadlock-detection algorithm periodically to look for
cycles.
Deadlock Recovery
When deadlock is detected:
Some transaction will have to rolled back (made a victim) to break deadlock. Select that transaction as victim that will incur minimum cost.
Rollback -- determine how far to roll back transaction Total rollback: Abort the transaction and then restart it.
More effective to roll back transaction only as far as necessary to break deadlock.
Starvation happens if same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation
• Improved Performance
• Design Issues
There       are      2      ways        in     which        data        can                         be   stored   on   different   sites.   These    are:
1. Replication
In this approach, the entire relation is stored redundantly at 2 or more sites. If the entire database is available at all sites, it is a fully redundant
database.            Hence,             in          replication,         systems              maintain            copies               of            data.
This is advantageous as it increases the availability of data at different sites. Also, now query requests can be processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated. Any change made at one site needs to be recorded at every
site that relation is stored or else it may lead to inconsistency. This is a lot of overhead. Also, concurrency control becomes way more complex
as concurrent access now needs to be checked over a number of sites.
2. Fragmentation
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of the fragments is stored in different sites where
they’re required. It must be made sure that the fragments are such that they can be used to reconstruct the original relation (i.e, there isn’t any
loss                                                                     of                                                                      data).
Fragmentation       is    advantageous     as     it    doesn’t     create     copies      of   data,     consistency     is   not      a     problem.
Fragmentation of relations can be done in two ways:
  • Horizontal fragmentation – Splitting by rows – The relation is fragmented into groups of tuples so that each tuple is assigned to at least one
      fragment.
  • Vertical fragmentation – Splitting by columns – The schema of the relation is divided into smaller schemas. Each fragment must contain a
      common candidate key so as to ensure lossless join.
Directory System
A Derby database-A Derby database contains dictionary objects such as tables, columns, indexes, and jar files. A Derby database can also store
its own configuration information.
Note: An in-memory database does not use the file system, but the size limits listed in the table later in this topic still apply. For some limits,
the maximum value is determined by the available main memory instead of the available disk space and file system limitations.
• log directory
Contains files that make up the database transaction log, used internally for data recovery (not the same thing as the error log).
• seg0 directory
Contains one file for each user table, system table, and index (known as conglomerates).
• service.properties file
• tmp directory
       (might not exist.) A temporary directory used by Derby for large sorts and deferred updates and deletes. Sorts are used by a variety of
       SQL statements. For databases on read-only media, you might need to set a property to change the location of this directory. See
       "Creating Derby Databases for Read-Only Use".
• jar directory
(might not exist.) A directory in which jar files are stored when you use database class loading.
The following figure shows the files and directories in the Derby database directories that are used by the Derby software.
Derby imposes relatively few limitations on the number and size of databases and database objects. The following table shows some size
limitations of Derby databases and database objects.
                                                           Some operating systems impose a limit to the number of files allowed in a single
                                                           directory.
Indexes in each table                                      32,767 or storage
Columns in each table                                      1,012
Number of columns on an index key                          16