Module 4
Module 4
MODULE-4
SQL: Advanced Queries: More complex SQL retrieval queries, Specifying constraints as
assertions and action triggers, Views in SQL.
Transaction Processing: Introduction to Transaction Processing, Transaction and System
concepts, Desirable properties of Transactions, characterizing schedules based on
recoverability, characterizing schedules based on Serializability, Transaction support in SQL.
Textbook 1: Ch 7.1 to 7.3, Ch 20.1 to 20.6
RBT: L1, L2, L3
S1: More SQL: Complex Queries, Triggers, Views, and Schema
Modification
More Complex SQL Retrieval Queries:
Comparisons Involving NULL and Three-Valued Logic: SQL has various rules for dealing
with NULL values. NULL is used to represent a missing value, but that it usually has one of
three different interpretations—value unknown (value exists but is not known, or it is not
known whether or not the value exists), value not available (value exists but is purposely
withheld), or value not applicable (the attribute does not apply to this tuple or is undefined for
this tuple). Consider the following examples to illustrate each of the meanings of NULL.
1. Unknown value. A person’s date of birth is not known, so it is represented by NULL in
the database. An example of the other case of unknown would be
NULL for a person’s home phone because it is not known whether or not the person has a home
phone.
2. Unavailable or withheld value. A person has a home phone but does not want it to be
listed, so it is withheld and represented as NULL in the database.
3. Not applicable attribute. An attribute LastCollegeDegree would be NULL for a person
who has no college degrees because it does not apply to that person.
Page 1
Database Management System (BCS403)
Nested Queries, Tuples, and Set/Multiset Comparisons:
Some queries require that existing values in the database be fetched and then used in a
comparison condition. Such queries can be conveniently formulated by using nested queries,
which are complete select-from-where blocks within another SQL query. That other query is
called the outer query. These nested queries can also appear in the WHERE clause or the
FROM clause or the SELECT clause or other SQL clauses as needed.
What is a sub-query?
A sub-query is a query which is defined within another SQL statement. Sub-queries can be used
to restrict the amount of data processed as it normally introduces a selection criterion to the
SQL statement. It can also be used as a replacement for table joins. Sub-queries can be used
with other SQL commands like the SELECT, UPDATE, DELETE or INSERT.
Employee table
Project table
If the records of all employees who manage projects with value exceeding 250,000 is required,
then the SQL command that could accomplish this is
SELECT Employee * FROM Employee INNER JOIN
Project ON Employee.EmpID = Project.ProjectManagerID
AND ProjectBudget > 250000.00;
The same result could be achieved using a sub-query. Consider the SQL statement below
SELECT * FROM Employee WHERE EmpID IN
( SELECT ProjectManagerID FROM Project
Page 2
Database Management System (BCS403)
WHERE ProjectBudget > 250000.00 );
Correlated Nested Queries:
Whenever a condition in the WHERE clause of a nested query references some attribute of a
relation declared in the outer query, the two queries are said to be correlated. We can
understand a correlated query better by considering that the nested query is evaluated once for
each tuple (or combination of tuples) in the outer query.
Nested sub-query
A nested sub-query is like the example provided in the previous section. Here, there is no
dependency between the outer and inner queries (sub-query). The sub-query will be executed
first and only once and then the main query will be executed using the results from the sub-
query.
Correlated sub-query
Now consider the next SQL statement which refers also to the tables shown in Figure 1 and
Figure 2 (reproduced below). This is an example of the correlated sub-query where values from
the main query are referenced in the sub-query.
There is a field named Department in the Employee table and this stores the employee's work
department. There is also a field named Department in the Project table and this field stores the
department responsible for the project. The SQL statement provided will list out details of all
employee(s) who is/are managing project(s) that belong to their respective work department.
There are 7 rows in the Employee table and only 6 are returned by the query. The missing row
is for the record with EmpID M1038. It corresponds to ProjectName 'New Accounting System'
which belongs to IT but is managed by employee with EmpID M1038 who is from the Finance
and not the IT department.
Page 3
Database Management System (BCS403)
In sub-queries of this nature, the outer query will execute first, i.e. SELECT * FROM
Employee. Here, all records from the Employee table will be returned.
EXISTS and UNIQUE are Boolean functions that return TRUE or FALSE; hence, they can be
used in a WHERE clause condition. The EXISTS function in SQL is used to check whether the
result of a nested query is empty (contains no tuples) or not. The result of EXISTS is a Boolean
value TRUE if the nested query result contains at least one tuple, or FALSE if the nested query
result contains no tuples.
Query: Retrieve the name of each employee who has a dependent with the same first name
and is the same sex as the employee.
EXISTS and NOT EXISTS are typically used in conjunction with a correlated nested query. In
the nested query references the Ssn, Fname, and Sex attributes of the EMPLOYEE relation
from the outer query. For each EMPLOYEE tuple, evaluate the nested query, which retrieves
Page 4
Database Management System (BCS403)
all DEPENDENT tuples with the same Essn, Sex, and Dependent_name as the EMPLOYEE
tuple; if at least one tuple EXISTS in the result of the nested query, then select that EMPLOYEE
tuple. EXISTS(Q) returns TRUE if there is at least one tuple in the result of the nested query
Q, and returns FALSE otherwise. On the other hand, NOT EXISTS(Q) returns
TRUE if there are no tuples in the result of nested query Q, and returns FALSE otherwise.
Next, we illustrate the use of NOT EXISTS.
S2:Query :Retrieve the names of employees who have no dependents.
SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS
( SELECT *FROM DEPENDENT WHERE Ssn = Essn);
In, the correlated nested query retrieves all DEPENDENT tuples related to a particular
EMPLOYEE tuple. If none exist, the EMPLOYEE tuple is selected because the WHERE-
clause condition will evaluate to TRUE in this case.
Page 5
Database Management System (BCS403)
Query: List the names of managers who have at least one dependent.
Query: Retrieve the name of each employee who works on all the projects controlled by
department number 5 can be written using EXISTS and NOT EXISTS in SQL systems.
Rewrite as
SELECT Lname, Fname
FROM EMPLOYEE
WHERE NOT EXISTS ( SELECT *
FROM WORKS_ON B
WHERE ( B.Pno IN ( SELECT Pnumber
FROM PROJECT
WHERE Dnum = 5 ) AND
NOT EXISTS ( SELECT *
FROM WORKS_ON C
WHERE C.Essn = Ssn
AND C.Pno = B.Pno )));
Explicit Sets and Renaming in SQL:
It is also possible to use an explicit set of values in the WHERE clause, rather than a nested
query. Such a set is enclosed in parentheses in SQL.
Query : Retrieve the Social Security numbers of all employees who work on project
numbers 1, 2, or 3.
Q: SELECT DISTINCT Essn
FROM WORKS_ON
WHERE Pno IN (1, 2, 3);
In SQL, it is possible to rename any attribute that appears in the result of a query by adding
the qualifier AS followed by the desired new name.
Q: SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.Super_ssn = S.Ssn;
Joined Tables in SQL and Outer Joins:
SQL to permit users to specify a table resulting from a join operation in the FROM clause of a
query. For example, retrieves the name and address of every employee who works for the
Page 7
Database Management System (BCS403)
‘Research’ department. It may be easier to specify the join of the EMPLOYEE and
DEPARTMENT relations in the WHERE clause, and then to select the desired tuples and
attributes. This can be written in SQL as
Q: SELECT Fname, Lname, Address
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno = Dnumber)
WHERE Dname = ‘Research’;
The concept of a joined table also allows the user to specify different types of join, such as
NATURAL JOIN and various types of OUTER JOIN . In a NATURAL JOIN on two relations
R and S, no join condition is specified; an implicit EQUIJOIN condition for each pair of
attributes with the same name from R and S is created.
The implied join condition for this NATURAL JOIN is EMPLOYEE.Dno = DEPT.Dno,
because this is the only pair of attributes with the same name after renaming:
Q: SELECT Fname, Lname, Address
FROM (EMPLOYEE NATURAL JOIN
(DEPARTMENT AS DEPT (Dname, Dno, Mssn, Msdate)))
WHERE Dname = ‘Research’;
Aggregate Functions in SQL:
Aggregate functions are used to summarize information from multiple tuples into a single-
tuple summary. Grouping is used to create subgroups of tuples before summarization.
Grouping and aggregation are required in many database applications, and we will introduce
their use in SQL through examples. A numberof built-in aggregate functions exist: COUNT,
SUM, MAX, MIN, and AVG.
The functions SUM, MAX, MIN, and AVG can be applied to a set or multiset of numeric values
and return, respectively, the sum, maximum value, minimum value, and average (mean) of
those values.
Query : Find the sum of the salaries of all employees, the maximum salary, the minimum
salary, and the average salary.
Q: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM EMPLOYEE;
This query returns a single-row summary of all the rows in the EMPLOYEE table. We could
use AS to rename the column names in the resulting single-row table; for example,
Q: SELECT SUM (Salary) AS Total_Sal, MAX (Salary) AS Highest_Sal,
MIN (Salary) AS Lowest_Sal, AVG (Salary) AS Average_Sal
FROM EMPLOYEE;
Page 8
Database Management System (BCS403)
Query : Find the sum of the salaries of all employees of the ‘Research’ department, as
well as the maximum salary, the minimum salary, and the average salary in this
department.
Q: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno = Dnumber)
WHERE Dname = ‘Research’;
Query: Retrieve the total number of employees in the company and the number of
employees in the ‘Research’ department .
Q : SELECT COUNT (*) FROM EMPLOYEE;
Q: SELECT COUNT (*) FROM EMPLOYEE, DEPARTMENT
WHERE DNO = DNUMBER AND DNAME = ‘Research’;
Here the asterisk (*) refers to the rows (tuples), so COUNT (*) returns the number of rows in
the result of the query. We can also use the COUNT function to count values in a column rather
than tuples, as in the next example.
Query: Count the number of distinct salary values in the database.
Q: SELECT COUNT (DISTINCT Salary)
FROM EMPLOYEE;
In general, NULL values are discarded when aggregate functions are applied to a particular
column (attribute); the only exception is for COUNT (*) because tuples instead of values are
counted.
Grouping: The GROUP BY and HAVING Clauses:
In many cases we want to apply the aggregate functions to subgroups of tuples in a relation,
where the subgroups are based on some attribute values. For example, we may want to find the
average salary of employees in each department or the number of employees who work on each
project. In these cases we need to partition the relation into nonoverlapping subsets (or groups)
of tuples. Each group (partition) will consist of the tuples that have the same value of some
attribute(s), called the grouping attribute(s).
SQL has a GROUP BY clause for this purpose. The GROUP BY clause specifies the grouping
attributes, which should also appear in the SELECT clause, so that the value resulting from
applying each aggregate function to a group of tuples appears along with the value of the
grouping attribute(s).
Query: For each department, retrieve the department number, the number of employees
in the department, and their average salary.
Q: SELECT Dno, COUNT (*), AVG (Salary)
Page 9
Database Management System (BCS403)
FROM EMPLOYEE
GROUP BY Dno;
The EMPLOYEE tuples are partitioned into groups—each group having the same value for the
GROUP BY attribute Dno. Hence, each group contains the employees who work in the same
department. The COUNT and AVG functions are applied to each such group of tuples. Notice
that the SELECT clause includes only the grouping attribute and the aggregate functions to be
applied on each group of tuples.
Query: For each project, retrieve the project number, the project name, and the number
of employees who work on that project.
Q: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber = Pno GROUP BY Pnumber, Pname;
Above query shows a join condition in conjunction with GROUP BY. In this case, the grouping
and functions are applied after the joining of the two relations in the WHERE clause.
SQL provides a HAVING clause, which can appear in conjunction with a GROUP BY clause,
for this purpose. HAVING provides a condition on the summary information regarding the
group of tuples associated with each value of the grouping attributes.
Query: For each project on which more than two employees work, retrieve the project
number, the project name, and the number of employees who work on the project.
Q: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber = Pno
GROUP BY Pnumber, Pname
HAVING COUNT (*) > 2;
Page 10
Database Management System (BCS403)
Page 11
Database Management System (BCS403)
condition similar to the WHERE clause of an SQL query. For example, to specify the
constraint that the salary of an employee must not be greater than the salary of the
manager of the department that the employee works for in SQL, we can write the
following assertion:
CREATE ASSERTION SALARY_CONSTRAINT
CHECK ( NOT EXISTS ( SELECT *
FROM EMPLOYEE E, EMPLOYEE M,
DEPARTMENT D
WHERE E.Salary>M.Salary
AND E.Dno = D.Dnumber
AND D.Mgr_ssn = M.Ssn ) );
The constraint name SALARY_CONSTRAINT is followed by the keyword CHECK, which is
followed by a condition in parentheses that must hold true on every database state for the
assertion to be satisfied.
Note that the CHECK clause and constraint condition can also be used to specify constraints
on individual attributes and domains and on individual tuples. A major difference between
CREATE ASSERTION and the individual domain constraints and tuple constraints is that the
CHECK clauses on individual attributes, domains, and tuples are checked in SQL only when
tuples are inserted or updated in a specific table.
Introduction to Triggers in SQL:
In many cases it is convenient to specify the type of action to be taken when certain events
occur and when certain conditions are satisfied. For example, it may be useful to specify a
condition that, if violated, causes some user to be informed of the violation.
A manager may want to be informed if an employee’s travel expenses exceed a certain limit by
receiving a message whenever this occurs. The action that the DBMS must take in this case is
to send an appropriate message to that user. The condition is thus used to monitor the database.
Other actions may be specified, such as executing a specific stored procedure or triggering other
updates. The CREATE TRIGGER statement is used to implement such actions in SQL.
Suppose we want to check whenever an employee’s salary is greater than the salary of his or
her direct supervisor in the COMPANY database. Several events can trigger this rule: inserting
a new employee record, changing an employee’s salary, or changing an employee’s supervisor.
Suppose that the action to take would be to call an external stored procedure
SALARY_VIOLATION,which will notify the supervisor.
CREATE TRIGGER SALARY_VIOLATION
Page 12
Database Management System (BCS403)
BEFORE INSERT OR UPDATE OF SALARY, SUPERVISOR_SSN
ON EMPLOYEE
FOR EACH ROW
WHEN ( NEW.SALARY > ( SELECT SALARY FROM EMPLOYEE
WHERE SSN = NEW.SUPERVISOR_SSN ) )
INFORM_SUPERVISOR(NEW.Supervisor_ssn,NEW.Ssn );
A typical trigger which is regarded as an ECA (Event, Condition, Action) rule has three
components
1. The event(s): These are usually database update operations that are explicitly applied to the
database. In this example the events are: inserting a new employee record, changing an
employee’s salary, or changing an employee’s supervisor. The person who writes the trigger
must make sure that all possible events are accounted for.
These events are specified after the keyword BEFORE in our example, which means that the
trigger should be executed before the triggering operation is executed. An alternative is to use
the keyword AFTER, which specifies that the trigger should be executed after the operation
specified in the event is completed.
2. The condition that determines whether the rule action should be executed: Once the
triggering event has occurred, an optional condition may be evaluated. If no condition is
specified, the action will be executed once the event occurs. If a condition is specified, it is first
evaluated, and only if it evaluates to true will the rule action be executed. The condition is
specified in the WHEN clause of the trigger.
3. The action to be taken: The action is usually a sequence of SQL statements, but it could also
be a database transaction or an external program that will be automatically executed. In this
example, the action is to execute the stored procedure INFORM_SUPERVISOR.
Triggers can be used in various applications, such as maintaining database consistency,
monitoring database updates, and updating derived data automatically.
Questions:
1.
Page 13
Database Management System (BCS403)
2. Consider the following schema and write the relational algebra expressions for the queries
given below:
SAILORS(Sid, Sname, rating, age)
BOATS(bid, bname, color)
RESERVES(sid, bid,day)
(i) Find names of sailors who reserved green boat
(ii) Find the colors of boats reserved by “Ramesh”
(iii) Find names of sailors who have reserved a red or a green boat.
(iv) Find the “sids” of sailors with age over 20 who have not registered a red boat. 22.
3. Consider the following relations for a database that keeps track of business trips of sales
persons in a sales office:
Salesperson (Salespersonid, Name, Start-year, Dept-no)
Trip (Salespersonid, from, to, Departure-date, Return-date, trip-id)
Expense (trp-id, AccountNo, Amount)
Specify the foreign keys for the above schema. Then specify the following queries in relational
algebra.
i. Give the details (all attributes of trip relation) for trip that exceeded 10,000/- in
expenses.
ii. Print the ‘Salespersonid’ and ‘Name’ of the salespersons who took trips to ‘delhi’.
iii. Print the total trip expenses incurred by the salesman with Salespersonid = ‘504’.
4. Consider the following relational database schema
Student (Student-id,Sname,major,GPA)
Faculty (Faculty-id,fname,dept,designation,salary)
Course (Course-id,Cname,Faculty-id) Enrol (Course-id,Student-id,grade)
Write the following queries in SQL:
a.List the names of all students enrolled for the course “IS6T1” and have received “A” grade.
b.List all the departments having an average salary of above Rs. 10,000.
c.Give a 20% raise to salary of all faculty.
d.List the names of all faculty members beginning with “P” and ending with letter “A”.
e. Add ‘John’ as an employee with id = 99, and Major =’Maths’ GPA=’B’.
f. Retrieve the faculty who gets the second highest salary.
S3:Views (Virtual Tables) in SQL:
Concept of a View in SQL:
A view in SQL terminology is a single table that is derived from other tables. These other tables
Page 14
Database Management System (BCS403)
can be base tables or previously defined views. A view does not necessarily exist in physical
form; it is considered to be a virtual table, in contrast to base tables, whose tuples are always
physically stored in the database. This limits the possible update operations that can be applied
to views, but it does not provide any limitations on querying a view.
Specification of Views in SQL:
In SQL, the command to specify a view is CREATE VIEW. The view is given a (virtual) table
name (or view name), a list of attribute names, and a query to specify the contents of the view.
The views in V1 and V2 create virtual tables whose schemas are illustrated in below Figure.
V1: CREATE VIEW WORKS_ON1
AS SELECT Fname, Lname, Pname, Hours
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE Ssn = Essn AND Pno = Pnumber;
V2: CREATE VIEW DEPT_INFO(Dept_name, No_of_emps, Total_sal)
AS SELECT Dname, COUNT (*), SUM (Salary)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber = Dno
GROUP BY Dname;
A view is supposed to be always up-to-date; if we modify the tuples in the base tables on
which the view is defined, the view must automatically reflect these changes.
If we do not need a view anymore, we can use the DROP VIEW command to dispose of it.
For example, to get rid of the view V1, we can use the SQL statement in V1A:
V1A: DROP VIEW WORKS_ON1;
View materialization, involves physically creating a temporary or permanent view table when
the view is first queried or created and keeping that table on the assumption that other queries
on the view will follow.
Different strategies as to when a materialized view is updated are possible. The immediate
update strategy updates a view as soon as the base tables are changed; the lazy update strategy
Page 15
Database Management System (BCS403)
updates the view when needed by a view query; and the periodic update strategy updates the
view periodically.
➢ A view with a single defining table is updatable if the view attributes contain the
primary key of the base relation, as well as all attributes with the NOT NULL constraint
that do not have default values specified
➢ Views defined on multiple tables using joins are generally not updatable.
➢ Views defined using grouping and aggregate functions are not updatable
A transaction is an atomic unit of work that should either be completed in its entirety or not
done at all. For recovery purposes, the system needs to keep track of when each transaction
starts, terminates, and commits, or aborts. Therefore, the recovery manager of the DBMS needs
to keep track of the following operations:
Page 16
Database Management System (BCS403)
Fig: State transition diagram illustrating the states for transaction execution
The above figure shows a state transition diagram that illustrates how a transaction moves
through its execution states.
A transaction goes into an active state immediately after it starts execution, where it can
execute its
READ and WRITE operations.
➢ When the transaction ends, it moves to the partially committed state. At this point, some
types of concurrency control protocols may do additional checks to see if the transaction
can be committed or not.
➢ Also, some recovery protocols need to ensure that a system failure will not result in an
inability to record the changes of the transaction permanently.
➢ If these checks are successful, the transaction is said to have reached its commit point
and enters the committed state.
➢ When a transaction is committed, it has concluded its execution successfully and all its
changes must be recorded permanently in the database, even if a system failure occurs.
However, a transaction can go to the failed state if one of the checks fails or if the transaction
is aborted during its active state. The transaction may then have to be rolled back to undo the
effect of its WRITE operations on the database. The terminated state corresponds to the
transaction leaving the system. The transaction information that is maintained in system tables
while the transaction has been running is removed when the transaction terminates. Failed or
aborted transactions may be restarted later—either automatically or after being resubmitted
by the user—as brand new transactions.
The System Log
➢ To be able to recover from failures that affect transactions, the system maintains a log
to keep track of all transaction operations that affect the values of database items, as
well as other transaction information that may be needed to permit recovery from
Page 17
Database Management System (BCS403)
failures.
➢ The log is a sequential, append-only file that is kept on disk, so it is not affected by
any type of failure except for disk or catastrophic failure.
➢ Typically, one or more main memory buffers, called the log buffers, hold the last part
of the log file, so that log entries are first added to the log main memory buffer.
➢ When the log buffer is filled, or when certain other conditions occur, the log buffer is
appended to the end of the log file on disk. In addition, the log file from disk is
periodically backed up to archival storage to guard against catastrophic failures.
➢ The following are the types of entries—called log records—that are written to the log
file and the corresponding action for each log record. In these entries, T refers to a
unique transaction-id that is generated automatically by the system for each
transaction and that is used to identify each transaction:
[start_transaction, T]: Indicates that transaction T has started execution.
A transaction T reaches its commit point when all its operations that access the database have
been executed successfully and the effect of all the transaction operations on the database have
been recorded in the log.
The transaction then writes a commit record [commit, T] into the log. If a system
failure occurs, we can search back in the log for all transactions T that have written a
[start_transaction, T] record into the log but have not written their [commit, T] record yet;
these transactions may have to be rolled back to undo their effect on the database during the
recovery process.
Transactions that have written their commit record in the log must also have recorded
all their WRITE operations in the log, so their effect on the database can be redone from the log
records.
The log file must be kept on disk. Updating a disk file involves copying the appropriate block
of the file from disk to a buffer in main memory, updating the buffer in main memory, and
Page 18
Database Management System (BCS403)
copying the buffer to disk.
At the time of a system crash, only the log entries that have been written back to disk are
considered in the recovery process if the contents of main memory are lost. Hence, before a
transaction reaches its commit point, any portion of the log that has not been written to the disk
yet must now be written to the disk. This process is called force-writing the log buffer to
disk before committing a transaction.
Q1. Concurrency control is needed in transaction? Justify. Illustrate with example.
Q2. Describe the various states of transaction execution with the suitable diagram.
Q3. With a neat state transition diagram, discuss the different states of transaction.
Transactions should possess several properties, often called the ACID properties, they
should be enforced by the concurrency control and recovery methods of the DBMS. The
following are the ACID properties:
Page 19
Database Management System (BCS403)
of a transaction should not be interfered with by any other transactions executing concurrently.
➢ It is enforced by the concurrency control subsystem of the DBMS.
➢ If every transaction does not make its updates (write operations) visible to other
transactions until it is committed, one form of isolation is enforced that solves the
temporary update problem and eliminates cascading rollbacks but does not eliminate all
other problems.
Durability or permanency: The changes applied to the database by a committed transaction
must persist in the database. These changes must not be lost because of any failure.
It is the responsibility of the recovery subsystem of the DBMS.
Levels of Isolation: There have been attempts to define the level of isolation of a transaction.
➢ A transaction is said to have level 0 (zero) isolation if it does not overwrite the dirty
reads of higher-level transactions.
➢ Level 1 isolation has no lost updates.
➢ Level 2 isolation has no lost updates and no dirty reads.
When transactions are executing concurrently in an interleaved fashion, then the order of
execution of operations from all the various transactions is known as a schedule (or history).
example, the schedule in which we shall call Sa, can be written as follows in this notation:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
Page 20
Database Management System (BCS403)
Similarly, the schedule for fig. which we call Sb, can be written as follows, if we assume that
transaction T1 aborted after its read_item(Y) operation:
Sb: r1(X); w1(X); r2(X); w2(X); r1(Y); a1;
Conflicting Operations in a Schedule:
Two operations in a schedule are said to conflict if they satisfy all three of the following
conditions:
(1) they belong to different transactions;
(2) they access the same item X; and
(3) at least one of the operations is a write_item(X).
Intuitively, two operations are conflicting if changing their order can result in a different
outcome. For example, if we change the order of the two operations r1(X); w2(X) to w2(X);
r1(X), then the value of X that is read by transaction T1 changes, because in the second ordering
the value of X is read by r1(X) after it is changed by w2(X), whereas in the first ordering the
value is read before it is write conflict.
S7: Characterizing Schedules Based on Recoverability
It is easy to recover from transaction and system failures. In some cases, it is even not
possible to recover correctly after a failure. it is important to characterize the types of schedules
for which recovery is possible, as well as those for which recovery is relatively simple.
once a transaction T is committed, it should never be necessary to roll back T. The schedules
that theoretically meet this criterion are called recoverable schedules. A schedule where a
committed transaction may have to be rolled back during recovery is called nonrecoverable
and hence should not be permitted by the DBMS.
A recovery algorithm can be devised for any recoverable schedule. The (partial) schedules Sa
and Sb from the preceding section are both recoverable. Consider the schedule Sa′ given below,
which is the same as schedule Sa except that two commit operations have been added to Sa:
Sa′: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
Sa′ is recoverable, even though it suffers from the lost update problem; this problem is
handled by serializability theory.
Page 21
Database Management System (BCS403)
Sc is not recoverable because T2 reads item X from T1, but T2 commits before T1 commits.
The problem occurs if T1 aborts after the c2 operation in Sc; then the value of X that T2 read
is no longer valid and T2 must be aborted after it is committed, leading to a schedule that is not
recoverable. For the schedule to be recoverable, the c2 operation in Sc must be postponed until
after T1 commits, as shown in Sd. If T1 aborts instead of committing, then T2 should also abort
as shown in Se, because the value of X it read is no longer valid. In Se, aborting T2 is acceptable
since it has not committed yet, which is not the case for the non-recoverable schedule Sc.
In Cascadless Schedules If every Xn in the scheduled reads only items that written by
committed Xn.
strict schedule, in which transactions can neither read nor write an item X until the last
transaction that wrote X has committed (or aborted). For example, consider schedule Sf:
Notes:-Any strict schedule is also cascadeless, and any cascadeless schedule is also
recoverable.
Characterizing Schedules Based on Serializability
The Schedules that are always considered to be correct when concurrent transactions are
executing. Such Schedules are known as serializable schedules.
If no interleaving of operations is permitted there are only two possible arrangements for
executing transactions T1 and T2:
Execute (in sequence) all the operations of transaction T1, followed by all the operations of
transaction T2.
Execute (in sequence) all the operations of transaction T2, followed by all the operations of
transaction T1.
If interleaving of operations is allowed there will be many possible schedules.
The concept of serializability of schedules is used to identify which schedules are correct.
Page 22
Database Management System (BCS403)
Fig: Example of serial and non serial schedules involving transactions T1 and T2.
A schedule S is serial, if for every transaction T participating in the schedule, all the
operations of T are executed consecutively in the schedule; otherwise the schedule is called
non-serial.
In a serial schedule, only one transaction at a time is active—the commit (or abort) of the
active transaction initiates execution of the next transaction. No interleaving occurs in a serial
schedule.
The drawback of serial schedules is that they limit concurrency of interleaving of operations.
Page 23
Database Management System (BCS403)
Two schedules are called result equivalent if they produce the same final state of the
database.
Fig: Two schedules that are result equivalent for the initial value X=100, but are not result
equivalent in general.
The algorithm looks at only the read_item and write_item operations in a schedule to construct
Page 24
Database Management System (BCS403)
a precedence graph (or serialization graph), which is a directed graph G = (N, E)that
consists of a set of nodes N = {T1, T2, ..., Tn } and a set of directed edges E ={e1,e2, ..., em }.
There is one node in the graph for each transaction Ti in the schedule. Each edge ei in the graph
is of the form (Tj → Tk ), 1 ≤ j ≤ n, 1 ≤ k f n, where Tj is the starting node of ei and Tk is the
ending node of ei. Such an edge from node Tj to node Tk is created by the algorithm if one of
the operations in Tj appears in the schedule before some conflicting operation in Tk.
w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)
Step1. We start with an empty graph with five vertices labeled T1, T2, T3, T4, T5. Step 2 and
3.
w1(A): A is subsequently read by T2, so add edge T1 → T2
r2(A): no subsequent writes to A, so no new edges
w1(B): B is subsequently read by T4, so add edge T1 → T4
Page 25
Database Management System (BCS403)
Step 5.This graph has no cycles, so the original schedule must be serializable.
Moreover, since one way to topologically sort the graph is T3–T1–T4–T2–T5, one serial
schedule that is conflict-equivalent is
w3(C), w1(A), w1(B), r4(B), w4(E), r2(A), r2(C), w2(D), r5(D), w5(E)
How Serializibilty is used for Concurrency Control
The approach taken in most commercial DBMS is to design protocols that if followed by every
individual transaction or if enforced by a DBMS concurrency control subsystem will ensure
serializibility of all schedules in which the transaction participate.
When transaction is submitted continuously the system finds difficulty to determine when to
start or end transaction. Serializibility theory can be adapted to deal with this problem by
considering only the committed projection of schedules committed projection C(s) of a
schedule S includes only the operations in S that belong to committed transactions.
Concurrency protocols:
3. Multi-version protocol
4. Optimistic protocols
Two phase locking: Locking the data items to prevent concurrent terms from interfering with
one another and enforcing serializibility.
Time Stamp ordering: Where transaction is assigned a unique timestamp and ensures that
any conflicting operations are executed in order of the transaction time stamp.
Optimistic protocol: Check possible serializibility violations after the transactions terminate
but before they are permitted to commit.
View Equivalence and view Serializibility
Two schedules S and S’ are said to be view equivalent of the following three conditions holds.
Page 26
Database Management System (BCS403)
1. The same set of transactions participate in S and S’, Sand S’ include the same
operations of those transactions.
2. For any operations r1(x) of Ti in S , if the value of X read by the operation has taken
written by an operation wj(x) of Tj the same condition must hold for the value of X
read by operation ri(x) of Ti in S’.
3. If the operation wk(y) of Tk is the last operation to write item Y in S , then wk(y) of
Tk must also be the last operation to write item Y in S’.
The definitions of conflict serializibility and view serializibilty are similar if a condition known
as the constrained write assumption holds on all transactions in the schedule.
The definition of view serializibilty is less restrictive than that of conflict serializibilty under
the unconstrained write assumption where the value written by an operation wi(x) in Ti can be
independent of its old value from database.
Example: Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3:w3(X): Sg: r1(X);
w2(X); w1(X); w3(X); c1; c2; c3;
In Sg the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value
of X. The schedule Sg is view serializable, since it is view equivalent to the serial schedule
T1, T2, T3. However, Sg is not conflict serializable, since it is not conflict equivalent to any
serial schedule.
Other Types of Equivalence of Schedules
Some applications can produce schedules that are correct by satisfying conditions less stringent
than either conflict serializibilty or view serializibilty.
An example is the type of transactions known as debit-credit transactions, for example those
that apply deposits and withdrawals to a data item whose value is the current balance of a bank
account. They update the values of X by adding or subtracting X. Consider the following
transactions each of which may be used to transfer an amount of money between two bank
accounts:
Page 27
Database Management System (BCS403)
are commutative, we know that the order of executing the sequences consisting of (read, update,
write) is not important as long as each (read, update, write) sequence by a particular transaction
Ti on a particular item I is not interrupted by conflicting operations. Hence, the schedule Sh is
considered to be correct even though it is not serializable.
Q1: Check whether each schedule is serializable? If a schedule is serializable then, model
its equivalent serial schedule.
T1:R1(X); R1(Z); W1(X);
T2:R2(X); R2(Y); W2(Z); W2(Y);
T3:R3(X); R3(Y); W3(Y);
S1: R1(X); R2(Z); R1(Z); R3(X); R3(Y); W1(X); W3(Y); R2(Y); W2(Z); W2(Y);
S2: R1(X); R2(Z); R3(X); R1(Z); R2(Y); R3(Y); W1(X); W2(Z); W3(Y); W2(Y);
Q2: How Serializibilty is used for Concurrency Control
Q3: What is a schedule? Explain with an example conflict serializable schedule.
Page 28
Database Management System (BCS403)
SERIALIZABLE.SERIALIZABLE here is based on not allowing violations that cause dirty
read, unrepeatable read, and phantoms, and it is thus not identical to the way serializability. If
a transaction executes at a lower isolation level than SERIALIZABLE, then one or more of the
following three violations may occur:
1. Dirty read. A transaction T1 may read the update of a transaction T2, which has not yet
committed. If T2 fails and is aborted, then T1 would have read a value that does not exist and
is incorrect.
2. Nonrepeatable read. A transaction T1 may read a given value from a table. If another
transaction T2 later updates that value and T1 reads that value again, T1 will see a different
value.
3. Phantoms. A transaction T1 may read a set of rows from a table, perhaps based on some
condition specified in the SQL WHERE-clause. Now suppose that a transaction T2 inserts a
new row r that also satisfies the WHERE-clause condition used in T1, into the table used by
T1. The record r is called a phantom record because it was not there when T1 starts but is there
when T1 ends. T1 may or may not see the phantom, a row that previously did not exist. If the
equivalent serial order is T1 followed by T2, then the record r should not be seen; but if it is T2
followed by T1, then the phantom record should be in the result given to T1. If the system
cannot ensure the correct behavior, then it does not deal with the phantom record problem.
An entry of Yes indicates that a violation is possible and an entry of No indicates that it
is not possible. READ UNCOMMITTED is the most forgiving, and SERIALIZABLE is the
most restrictive in that it avoids all three of the problems mentioned above.
A sample SQL transaction might look like the following:
EXEC SQL WHENEVER SQLERROR GOTO UNDO;
EXEC SQL SET TRANSACTION
READ WRITE
DIAGNOSTIC SIZE 5
ISOLATION LEVEL SERIALIZABLE;
Page 29
Database Management System (BCS403)
EXEC SQL INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno, Salary)
VALUES ('Robert', 'Smith', '991004321', 2, 35000);
EXEC SQL UPDATE EMPLOYEE
SET Salary = Salary * 1.1 WHERE Dno = 2;
EXEC SQL COMMIT;
GOTO THE_END;
UNDO: EXEC SQL ROLLBACK;
THE_END: ... ;
The above transaction consists of first inserting a new row in the EMPLOYEE table and then
updating the salary of all employees who work in department 2. If an error occurs on any of the
SQL statements, the entire transaction is rolled back.
Snapshot Isolation. Another isolation level, known as snapshot isolation, is used in some
commercial DBMSs, and some concurrency control protocols exist that are based on this
concept. The basic definition of snapshot isolation is that a transaction sees the data items that
it reads based on the committed values of the items in the database snapshot (or database state)
when the transaction starts
Q1: Explain transaction support in SQL.
Page 30