KEMBAR78
All Unit Notes | PDF
0% found this document useful (0 votes)
133 views109 pages

All Unit Notes

syllbus

Uploaded by

Pawan Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
133 views109 pages

All Unit Notes

syllbus

Uploaded by

Pawan Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 109
DIGITAL NOTES ON DATABASE MANAGEMENT SYSTEMS 8(S2) The selection operator ¢ specifies the tuples to retain through a selection condition. In general, the selection condition is a boolean combination (i.., an expression using the logical connectives ‘A and V) of ferms that have the form attribute op constant or attribute! op attribute2, where op is one of the comparison operators <, <=, >=, or> The projection operator 7 allows us to extract columns from a relation; for example, we can find out all sailor names and ratings by using x. The expression nsname,rating(S2) Suppose that we wanted to find out only the ages of sailors. The expression age(S2) a single tuple with age=35.0 appears in the result of the projection. This follows from the definition of a relation as a set of tuples. However, our discussion of relational algebra and calculus assumes that duplicate elimination is always done so that relations are always sets of tuples Set Operations The following standard operations on sets are also available in relational algebra: union (U), intersection (1), set-difference (~), and cross-product (*). Union: & 1 S returns a relation instance containing all tuples that occur in either relation instance R or relation instance S (or both). R and S must be union-compatible, and the schema of the result is defined to be identical to the schema of R. Intersection: R 1 $ returns a relation instance containing all tuples that occur in both R and S. The relations R and S must be union-compatible, and the schema of the result is defined to be identical to the schema of B. Set-difference: R ~ S returns a relation instance containing all tuples that occur in R but not in S. The relations R and S must be union-compatible, and the schema of the result is defined to be identical to the schema of R. Cross-product: R * S returns a relation instance whose schema contains all the fields of R (in the same order as they appear in R) followed by all the fields of S (in the same order as they appear in S). The result of R x S contains one tuple (r, s (the concatenation of tuples r and s) for each pair of tuples r © R, s © S. The cross-product opertion is sometimes called Cartesian product. sid] sname | rating] 37 | Lubbe SE] Rusty [TO [ 35.0) Figure 4.9 SI S2 same [rating [age Dustin 7 45.0 Figure 4.10 SI - S2 The result of the cross-product $1 x R1 is shown in Figure 4.11 The fields in SI % RI have the same domains as the corresponding fields in RI and SI. In Figure 4.11 sid is listed in parentheses to emphasize that it is not an inherited field name; only the corresponding domain is inherited Dustin OTL TOTO Dustin | 7 45.0) T03| T112/96 31 Lubber| & 535.5] TOI T0/T0/96' 31 Tubber| 55.5) 703 z/ 3: Rusty [10 3.0 TOT] TOTO? 3: Rusty [10 335.0) T03| T1727 Figure4.11._ SIxRI [eid Tename T rating age bid [day 22 Renaming We introduce a renaming operator p for this purpose. The expression p(R(F ), £) takes an arbitrary relational algebra expression £ and returns an instance of a (new) relation called R. R contains the same tuples as the result of £, and has the same schema as E, but some fields are renamed. The field names in relation R are the same as in E, except for fields renamed in the renaming list F. For example, the expression o(C(1 — sid, 5 — sid2), $1 x R1) returns a relation that contains the tuples shown in Figure 4.11 and has the followi ng schema: C(sid!: integer, sname: string, rating: integer, age: teal, sid2: integer, bid: integerday: dates) Joins The join operation is one of the most useful operations in relational algebra and is the most commonly used way to combine information from two or more relations. Although a join can be defined as a cross-product followed by selections and projections, joins arise much more frequently in practice than plain cross-products.joins have received a lot of attention, and there are several variants of the join operation. Condition Joins The most general version of the join operation accepts a join condition ¢ and a pair of relation instances as arguments, and retums a relation instance. The join condition is identical to a selection condition in form. The operation is defined as follows: RADcS= aclR * S) Thus is defined to be a cross-product followed by a selection. Note that the condition ¢ can (and typically does) refer to attributes of both R and S, wid) [sname [rating] age [isd] bid day Dustin a TG Cpanel 355] 1 Figure 4.12 $1 4DS1.sid § is an equijoin in which equalities are specified on all fields having the same name in R and S. In this case, we can simply omit the join condition; the default is that the join condition is a collection of equalities on all common, fields. Division ‘The division operator is useful for expressing certain kinds of queries, for example: “Find the names of sailors who have reserved all boats.” Understanding how to use the basic operators of the algebra to define division is a useful exercise. (QI) Find the names of sailors who have reserved boat 103. This query can be written as follows: sname((obid=103Reserves) Sailors) We first compute the set of tuples in Reserves with bid = 103 and then take the natural join of this set with Sailors. This expression can be evaluated on instances of Reserves and Sailors. Evaluated on the instances R2 and $3, it yields a relation (Q2) Find the names of sailors who have reserved a red boat. sname((acolor=‘red Boats) I> Reserves a> Sailors This query involves a series of two joins. First we choose (tuples describing) red boats. (Q3) Find the colors of boats reserved by Lubber. color((asname='Lubber Sailors) Reserves Boats) This query is very similar to the query we used to compute sailors who reserved red boats. On instances B1, R2, and $3, the query will return the colors green and red. (Q4) Find the names of sailors who have reserved at least one boat. msname(Sailors Reserves) (Q5) Find the names of sailors who have reserved a red or a green boat. p(T empboats, (acolor'red Boats) U (ecolor~green Boats) asname(Tempboats ID Reserves

Reserves) PT empgreen, xsid((acolor='green Boats) I> Reserves) asname((Tempred 1 Tempgreen) Sailors) (Q7) Find the names of sailors who have reserved at least two boats, p(Reservations, nsid,sname,bid(Sailors Reserves)) plReservationpairs(1 — sid 1, 2—> snamel, 3 bid|, 4 — sid2, 5 sname2,6 — bid2),Reservations x Reservations) msnamelo(sid\=sid2) 1 (bid\=bid2)Reservationpairs (Q8) Find the sids of sailors with age over 20 who have not reserved a red boat. nsid(cage>20Sailors) ~nsid\ (acolor'red-Boats) I> Reserves I> Sailors) This query illustrates the use of the set-difference operator. Again, we use the fact that sid is the key for Sailors. (Q9) Find the names of sailors who have reserved all boats. The use of the word all (or every) is a good indication that the division operation might be applicable: p(T empsids, (nsid, bidReservesnbidBoats)) sname(Tempsids Sailors) (Q10) Find the names of sailors who have reserved all boats called Interlake. p(T empsids, (nsid, bidReserves\inbid(obname='Interlake' Boats))) sname(Tempsids Sailors) RELATIONAL CALCULUS Relational calculus is an alternative to relational algebra. In contrast to the algebra, which is procedural, the calculus is nonprocedural, or declarative, in that it allows us to describe the set of answers without being explicit about how they should be computed, ‘Tuple Relational Calculus A tuple variable is a variable that takes on tuples of a particular relation schema as values. That every value assigned to a given tuple variable has the same number and type of fields, (QI1) Find all sailors with a rating above 7. {SISE Sailors * S. rating > 7} with respect to the given database instance, F evaluates to (or simply ‘is’) true if one of the following holds: " Fisanatomic formula R — Rel, and Ris assigned a tuple in the instance of relation Rel. F is a comparison R.a op S.b, R.a op constant, or constant op R.a, and the tuples assigned to R and S have field values R.a and S.b that make the comparisontrue. Fis of the form 7p, and p is not true; or of the form p “ q, and both p and q are true; or of the form p V q, and one of them is true, or of the formp — qand q is true wheneverd p is true. Fis of the form R(p(R)), and there is some assignment of tuples to the free variables in p(R), including the variable R,6 that makes the formula p(R) true. Fis of the form R(p(R)), and there is some assignment of tuples to the free variables in p(R) that makes the formula p(R) true no matter what tuple is assigned to R. (Q12) Find the names and ages of sailors with a rating above 7 {P| S Sailors(S.rating>7 P.name=S.sname Page = S.age)} This query illustrates a useful convention: P is considered to be a tuple variable with exactly two fields, which are called name and age, because these are the only fields of P that are mentioned and P does not range over any of the relations in the query; that is, there is no subformula of the form P_ Reiname. (Q13) Find the sailor name, boat id, and reservation date for each reservation {P| R Reserves 3S Sailors (Rsid=S.sid P.bid=R.bid P.day=Rday _P.sname = S.sname)} (Ql) Find the names of sailors who have reserved boat 103. {P| S Sailors R_ Reserves(R.sid=S.sid _R.bid=103 sP.sname=S.sname)} This query can be read as follows: “Retrieve all sailor tuples for which there exists a tuple in Reserves, having the same value in the sid field, and with bid = 103.” (Q2) Find the names of sailors who have reserved a red boat. {P| S Sailors3R Reserves(R.sid=S.sid _ P.sname = S.sname B Boats(B.bid=R.bid B.color ~red’))} DATABASE MANGEMENT SYSTEMS This query can be read as follows: “Retrieve all sailor tuples S for which there exist tuples R in Reserves and B in Boats such that S.sid = R.sid, R.bid = B.bid, and B.color =red:” (Q7) Find the names of sailors who have reserved at least two boats. {P| S Sailors Rl Reserves R2 Reserves(S.sid=R\.sid ARL.sid = R2.sid R1.bid# R2.bid — P.sname = S.sname)} (Q9) Find the names of sailors who have reserved all boats. {P| S_ Sailors B Boats ( R_ Reserves(S.sid=R.sid R.bid=B.bid P.sname = S.sname))} (Q14) Find sailors who have reserved all red boats. {S|S Sailors B__ Boats (B.color=red' ( R © Reserves(S.sid=R.sid —R.bid = B.bid)))} Domain Relational Calculus A domain variable is a variable that ranges over the values in the domain of some attribute (e.g., the variable can be assigned an integer if it appears in an attribute whose domain is the set of integers). A DRC query has the form { (x1, 2... , Xo | PC (X1,X2,.. 4 Xn) )}, where each xjis either a domain variable ora constant and p(_(X1,X2,.- -» Xo) ) denotesa DRC formula whose only free variables are thevari-ables among the X,, 1 $iSn. The result of thisqueryisthesetofalltuples (x1, x2,.. ..X») forwhichtheformulaevaluatesto true. (Ql) Find the names of sailors who have reserved boat 103. { (ND | 1, TAC C,N,T,Ad — Sailors Ir, Br, D( (ir,Br,D) Reserves Ir-l Br= 103))} (Q2) Find the names of sailors who have reserved a red boat. {(N) | 1 T,AC CN, T,A) — Sailors |, Br,D) Reserves (Br, BN,red') — Boats)} (Q7) Find the names of sailors who have reserved at least two boats. { (N)| TAC CNT, A) Sailors Br, B2, D1, D2¢_¢I, Bri, D1) Reserves (1, Br2,D2) Reserves Brl = Br2) (Q9) Find the names of sailors who have reserved all boats. { (N)| 1, T,ACCN,T, A) — Sailors B, BN, C(>( (B, BN, C) Boats) ( (ir, Br,D) — Reserves(i= Ir Br=B))))} THE FORM OF A BASIC SQL QUERY This section presents the syntax of a simple SQL query and explains its meaning through a conceptual evaluation strategy. A conceptual evaluation strategy is a way to evaluate the query that is intended to be easy to understand, rather than efficient. A DBMS would typically execute a query ina different and more efficient way Sid_| sname_| rating | age sid | bid | day 22 [Dustin 45.0 101 | 10/10/98 29_[ Brutus 33.0 TO2 [101098 31__|Lubber 35.5 103 | 1078798 32_ [Andy 25.5 T04 [1077798 38__|[ Rust 35.0 31_[102 [1110/98 64_| Horatio 35.0) 103 [117698 71 [Zorba 16.0 3110s T1727 74_| Horatio 35.0 TOT | 9/5798 85_| Art 25.5 TO2 | ORE 95 [Bob__[3 63.5 103 [978/98 Figure 5.1An Instance S 3 of SailorsFigure 5.2 An Instance R2 of Rese ‘bid | bname | color] J01| Interlake | blue 102| Interlake | red 103] Clipper _| greenl 704| Marine | rei (Q15) Find the names and ages of all sailors. SELECT DISTINCT S.sname, S.age FROMSailors § The answer to this query with and without the keyword DISTINCT on instance $3 of Sailors is shown in Figures 5.4 and 5.5. The only difference is that the tuple for Horatio appears twice if DISTINCT is omitted; this is because there are two sailors called Horatio and age 35. (QI1) Find all sailors with a rating above 7. SELECT S.sid, S.sname, S.rating, S.age FROM Sailors AS S WHERE S.rating > 7 (Q16) Find the sids of sailors who have reserved a red boat. SELECTR sid FROMBoats B, Reserves R WHERE B.bid = R-bid AND B.color = ‘red’ (Q2) Find the names of sailors who have reserved a red boat. SELECT S.sname FROM Sailors 8, Reserves R, Boats B WHERE S.sid = R.sid AND Rbid = B.bid AND B.color = “red” (Q3) Find the colors of boats reserved by Lubber. SELECT B.color FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND S.sname = ‘Lubber’ (Q4) Find the names of sailors who have reserved at least one boat. SELECT S.sname FROMSailors S, Reserves R WHERES sid = R.sid Expressions and Strings in the SELECT Command SQL supports a more general version of the select-list than just a list of columns. Each item in a select-list can be of the form expression AS column name, where expression is any arithmetic or string expression over column names (possibly prefixed by range variables) and constants. (QS) Compute increments for the ratings of persons who have sailed two different boats on the same day. SELECT S.sname, S.rating+1 AS rating FROM Sailors S, Reserves R1, Reserves R2 WHERE S.sid = R1.sid AND S.sid = R2.sid AND RI.day = R2.day AND RL.bid <> R2.bid Also, each item in a qualification can be as general as expression! = expression? SELECT SI.sname AS namel, $2.sname AS name2 FROM Sailors S1, Sailors S82 WHERE 2*S1 rating = S2.rating-1. (Q6) Find the ages of sailors whose name begins and ends with B and has at least three characters. SELECT S.age FROMSailors S WHERES.sname LIKE ‘B %B” The only such sailor is Bob, and his age is 63.5. UNION, INTERSECT, AND EXCEPT SQL provides three set-manipulation constructs that extend the basic query form pre-sented carlier. Since the answer to a query is a multiset of rows, it is natural to consider the use of operations such as union, intersection, and difference. SQL supports these operations under the names UNION, INTERSECT, and EXCEPT. 4 SQL also provides other set operations: IN (to check if an element is in a given set),op ANY,op ALL (tocom-pare a value with the elements in a given set, using comparison operator op), and EXISTS (to check if a set is empty). IN and EXISTS can be prefixed by NOT, with the obvious modification to their meaning. We cover UNION, INTERSECT, and EXCEPT in this section, Consider the following query: (Ql) Find the names of sailors who have reserved both a red and a green boat. SELECT S.sname FROM Sailors S, Reserves R1, Boats B1, Reserves R2, Boats B2 WHERE S.sid = RI.sid AND RI.bid = B1.bid AND S.sid = R2.sid AND R2.bid = B2.bid AND B1.color='ted’ AND B2.color = ‘green’ (Q2) Find the sids of all sailors who have reserved red boats but not green boats. SELECT S.sid FROMSailors §, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = ‘red’ EXCEPT SELECT S2.sid FROM Sailors $2, Reserves R2, Boats B2Z WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = ‘green’ NESTED QUERIES A nested query is a querythat has another query embedded within it; the embedded query is called a subquery. (Ql) Find the names of sailors who have reserved boat 103. SELECT S.sname FROMSailors S WHERES.sid IN (SELECT Rosid FROMReserves R. WHERER bid = 103 ) (Q2) Find the names of sailors who have reserved a red boat. SELECT S.sname FROM _ Sailors S WHERE S.sid IN (SELECT R.sid FROMReserves R WHERER.bid IN ( SELECT B.bid FROMBoats B WHEREB.color = ‘red’ ) (Q3) Find the names of sailors who have not reserved a red boat. SELECTS.sname FROMSailors $ WHERE S.sid NOT IN ( SELECT R.sid FROMReserves R WHERER bid IN ( SELECT B.bid FROMBoats B WHEREB.color = ‘red’ ) Correlated Nested Queries In the nested queries that we have seen thus far, the inner subquery has been completely independent of the outer query: (QD) Find the names of sailors who have reserved boat number 103. SELECT S.sname FROMSailors S WHEREEXISTS ( SELECT * FROMReserves R WHERER bid = 103 AND Rssid = S.sid ) Set-Comparison Operators (QI) Find sailors whose rating is better than some sailor called Horatio. SELECT S.sid FROMSailors $ WHERES rating > ANY ( SELECT S2.zating FROMSailors $2 WHERES2.sname = “Horatio” ) (Q2) Find the sailors with the highest rating SELECT S.sid FROM Sailors S WHERE —S.rating >= ALL ( SELECT S2.rating FROM Sailors S2) More Examples of Nested Queries (QI) Find the names of sailors who have reserved both a red and a green boat. SELECT S.sname FROMSGailors S, Reserves R, Boats B WHERES sid = R.sid AND R.bid = B.bid AND B.color = ‘red” AND S.sid IN (SELECT S2.sid FROMSailors $2, Boats B2, Reserves R2 WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = ‘green’ ) ( Q9) Find the names of sailors who have reserved all boats. SELECT S.sname FROMSailors S WHERENOT EXISTS (( SELECT B.bid FROMBoats B ) EXCEPT (SELECT Rbid FROMReserves R WHERER sid = S.sid )) AGGREGATE OPERATORS We now consider a powerful class of constructs for computing aggregate values such as MIN and SUM. 1. COUNT ([DISTINCT] A): The number of (unique) values in the A column 2.SUM ([DISTINCT] A): The sum of all (unique) values in the A column, 3.AVG ([DISTINCT] A): The average of all (unique) values in the A column, 4,MAX (A): The maximum value in the A column. 5.MIN (A): The minimum value in the A column. (Ql) Find the average age of all sailors. SELECT AVG (S.age) FROMSailors $ (Q2) Find the average age of sailors with a rating of 10. SELECT AVG (S.age) FROMSailors $ WHERES rating = 10 SELECTS. sname,MAX(S.age) FROMSailors S 03) Count the number of sailors. SELECT COUNT (*) FROMSailors $ The GROUP BY and HAVING Clauses ‘we want to apply aggregate operations to each of a number of groups of rows in a relation, where the number of groups depends on the relation instance (ie, is not known in advance), (Q31) Find the age of the youngest sailor for each rating level. SELECT MIN (S.age) FROMSailors S WHERES . rating = i 032) Find the age of the youngest sailor who is eligible to vote (i.e, is at least 18 years old) for each rating level with at least two such sailors. SELECTS.rating, MIN (S.age) AS minageGROUP BY S.rating HAVING COUNT (*)>1 More Examples of Aggregate Queries Q3) For each red boat, find the number of reservations for this boat. SELECT B.bid, COUNT (*) AS sailorcount FROM Boats B, Reserves R WHERE R.bid = B.bid AND B.color = ‘red’ GROUP BY B.bid SELECT B.bid, COUNT (*) AS sailorcount FROM Boats B, Reserves R WHERE R.bid = B.bid GROUP BY B.bid HAVING B.color = ‘red’ (Q4) Find the average age of sailors for each rating level that has at least two sailors. SELECTS.rating, AVG (S.age) AS avgage FROMSailors $ GROUP BY S.rating HAVING COUNT (*)>1 (Q5) Find the average age of sailors who are of voting age (i-e., at least 18 years old) for each rating level that has at least two sailors. SELECTS.rating, AVG (S.age ) AS avgage FROMSailors S WHERE S. age >= 18 GROUP BY S.rating HAVING 1 <( SELECT COUNT (*) FROMSailors $2 WHERE S.rating = S2.rating (Q6) Find the average age of sailors who are of voting age (ie., at least 18 years old) for each rating level that has at least two such sailors, SELECTS.rating, AVG (S.age ) AS avgage FROMSailors S WHERE S. age > 18 GROUP BY S.rating HAVING 1 <( SELECT COUNT (*) FROMSailors $2 WHERE S.rating ~ S2.rating AND S2.age >= 18 ) The above formulation of the query reflects the fact that it is a variant of Q3S. The answer to Q36 on instance $3 is shown in Figure 5.16, It differs from the answer to Q3S in that there is no tuple for rating 10, since there is only one tuple with rating 10 and age 2 18 SELECTS rating, AVG ( S.age ) AS avgage FROMSailors S WHERE S. age > 18 GROUP BY S.rating HAVING COUNT (*)>1 This formulation of Q36 takes advantage of the fact that the WHERE clause is applied before grouping is done; thus, only sailors with age > 18 are left when grouping is done. It is instructive to consider yet another way of writing this query: SELECT Temp.rating, Temp.avgage FROM (SELECT S.rating, AVG ( S.age ) AS avgage, COUNT (*) AS FROM Sailors S WHERE S. age > 18 GROUP BY S.rating ) AS Temp WHERE Temp.ratingcount > | NULL VALUES we have assumed that column values in a row are always known, In practice column values can be unknown. For example, when a sailor, say Dan, joins a yacht club, he may not yet have a rating assigned, Since the definition for the Sailors table has a rafing column, what row should we insert for Dan? What is needed here is a special value that denotes unknown. SQL provides a special column value called mul! to use in such situations. We use null when the column value is either unknown or inapplicable. Using our Sailor table definition, we might entertherow (98, Dan, null,39) torepresentDan. Thepresence ofnul/valuescomplicates many issues, and we consider the impact of null values on SQL in this section. Comparisons Using Null Values Consider a comparison such as rating = 8. If this is applied to the row for Dan, is this condition true or false? Since Dan’s rating is unknown, it is reasonable to say that this comparison should evaluate to the value unknown, SQL also provides a special comparison operator IS NULL to test whether a column value is null; for example, we can say rating |S NULL, which would evaluate to true on the row representing Dan. We can also say rating IS NOT NULL, which would evaluate to false on the row for Dan. Logical Connectives AND, OR, and NOT Now, what about boolean expressions such as rating = 8 OR age < 40 and rating = 8 AND age < 40? Considering the row for Dan again, because age < 40, the first expression evaluates to true regardless of the value of rating, but what about the second? We can only say unknown INTRODUCTION TO VIEWS A view is a table whose rows are not explicitly stored in the database but are computed as needed from a view de nition. Consider the Students and Enrolled relations, CREATE VIEW B-Students (name, sid, cour AS SELECT S.sname, S.sid, E.cid FROMStudents S, Enrolled E WHERES.sid = Essid AND E.grade = °B' This view can be used just like a base table, or explicitly stored table, in de ning new queries or views, DESTROYING/ALTERING TABLES AND VIEWS If we decide that we no longer need a base table and want to destroy it (ic., delete all the rows and remove the table de nition information), we can use the DROP TABLE command, For example, DROP TABLE Students RESTRICT destroys the Students table unless some view or integrity constraint refers to Students; if so, the command fails. Ifthe keyword RESTRICT is replaced by CASCADE, Students is dropped and any ref-erencing views or integrity constraints are (recursively) dropped as well; one of these two keywords must always be speci ed. A view can be dropped using the DROP VIEW command, which is just like DROP TABLE ALTER TABLE modies the structure of an existing table, To add a column called maiden-name to Students, for example, we would use the following command: ALTER TABLE Students ADD COLUMN maiden-name CHAR(10) TRIGGERS, A trigger is a procedure that is automatically invoked by the DBMS in response to specified changes to the database, and is typically specified by the DBA. A database that has a set of associated triggers is called an active database. AA trigger description contains three parts: Event: A change to the database that activates the trigger. Condition: A query or test that is run when the trigger is activated. Action: A procedure that is executed when the trigger is activated and its con-dition is tue, A trigger action can examine the answers to the query in the condition part of the trigger, refer to old and new values of tuples modified by the statement activating the trigger, execute new queries, and make changes to the database Examples of Triggers in SQL. The examples shown in Figure 5.19, written using Oracle 7 Server syntax for defining triggers, illustrate the basic concepts behind triggers, (The SQL:1999 syntax for these triggers is similar; we will see an example using SQL:1999 syntax shortly.) The trigger called init count initializes a counter variable before every execution of an INSERT statement that adds tuples to the Students relation. The trigger called iner count increments the counter for each inserted tuple that satisfies the condition age < 18. CREATE TRIGGER init count BEFORE INSERT ON Students /* Event */ DECLARE - count INTEGER; BEGIN 7* Action */ ~ count := 0; END CREATE TRIGGER incr count AFTER INSERT ON Students /* Event */ WHEN (new.age < 18) _/* Condition; ‘new’ is just-inserted tuple */ FOR EACH ROW BEGIN /* Action; a procedure in Oracle’s PL/SQL syntax */ count := count * 1; END (identifying the modified table, Students, and the kind of modifying statement, an INSERT), and the third field is the number of inserted Students tuples with age < 18. (The trigger in Figure 5.19 only computes the count; an additional trigger is required to insert the appropriate tuple into the statistics table.) CREATE TRIGGER set count AFTER INSERT ON Students /* Event REFERENCING NEW TABLE AS InsertedTuples FOR EACH STATEMENT INSERT/* Action * INTOStatisticsTable(ModifiedTable, ModificationType, Count)SELECT ‘Students’, ‘Insert’, COUNT * FROM InsertedTuples I WHERE Lage < 18 UNIT-I1 NORMALIZATION SCHEMA REFINEMENT We now present an overview of the problems that schema refinement is intended to address and a refinement approach based on decompositions, Redundant storage of information is the root cause of these problems. Although decomposition can eliminate redundancy, it can lead to problems of its own and should be used with caution. Problems Caused by Redundancy Storing the same information redundantly, that is, in more than one place within a database, can lead to several problems: Redundant storage: Some information is stored repeatedly. Update anomalies: If one copy of such repeated data is updated, an inconsistency is created unless all copies are similarly updated. Insertion anomalies: It may not be possible to store some information unless some other information is stored as well Deletion anomalies: It may not be possible to delete some information without losing some other information as well, Consider a relation obtained by translating a variant of the Hourly Emps entity set from Chapter 2 ~ Hourly Emps(ssn, name, lot, rating, hourly wages, hours worked) If we delete all tuples with a given rating value (e.g., we delete the tuples for Smethurst and Guldu) we lose the association between that rating value and its hourly wage value (a deletion anomaly). Ideally, we want schemas that do not permit redundancy, but at the very least we want to be able to identify schemas that do allow redundancy. Even if we choose to accept a schema with some of these drawbacks, perhaps owing to performance considerations, we want to make an informed decision. Use of Decompositions Intuitively, redundaney arises when a relational schema forces an association between attributes that is not natural. Functional dependencies (and, for that matter, other ICs) can be used to identify such situations and to suggest refinements to the schema We can deal with the redundancy in Hourly Emps by decomposing it into two relations: Hourly Emps2(ssn, name, lot, rating, hours worked) Wages(rating, hourly wages) - 5 Unless we are careful, decomposing a relation schema can create ‘more problems than it solves. Two important questions must be asked repeatedly: 1. Do we need to decompose a relation? 2. What problems (if any) does a given decomposition cause? Ifa relation schema is in one of these normal forms, we know that certain kinds of problems cannot arise. Considering the normal form of a given relation schema can help us to decide whether or not to decompose it further. If we decide that a relation schema must be decomposed further, we must choose a particular decomposition (ie,, a particular collection of smaller relations to replace the given relation) [DATABASE MANAGEMENT = SYSTEMS 43 FUNCTIONAL DEPENDENCIES A functional dependeney (FD) is a kind of IC that generalizes the concept of a key. Let R be a relation schema and let X and ¥ be nonempty sets of attributes in R. We say that an instance r of R satisfies the FD X'! ¥ 1 if the following holds for every pair of tuples ¢1 and 22 in r: Wl:X= 2X, then 1:¥=2-¥ A primary key constraint isa ' ‘special case of an FD. The attributes in the key play the role of X, and the set of all attributes in the relation plays the role of Y. Note, however, that the definition of an FD does not require that the set Y be minimal; the additional minimality condition must be met for X to be a key. If X ! ¥ holds, where Y is the set of all attributes, and there is some subset V of X such that V1 Yholds, then X is a super key; if V is a strict subset of X, then Xis not a key - In the rest of this chapter, we will see several examples of FDs that are not key constraints. REASONING ABOUT FUNCTIONAL DEPENDENCIES The discussion up to this point has highlighted the need for techniques that allow us to carefully examine and further re ne relations obtained through ER design (or, for that matter, through other approaches to conceptual design. Given a set of FDs over a relation schema R, there are typically several additional FDs that hold over R whenever all of the given FDs hold. As an example, consider: Workers(ssn, name, lot, did, since) Closure of a Set of FDs The set of all FDs implied by a given set F of FDs is called the closure of F and is denoted as F +. An important question is how we can infer, or compute, the closure of a given set F of FDs. The answer is simple and elegant. The following three rules, called Armstrong's Axioms, can be applied repeatedly to infer all FDs implied by a set F of FDs. We use X,Y, and Z to denote sets of attributes over a relation schema R Reflexivity: If. X Y, then.X/ ¥. Augmentation: If X ! ¥, then XZ ! YZ for any Z. ‘Transitivity: If.X ! Y and Y ! Z, then X !Z. Ammstrong's Axioms are sound in that they generate only FDs in F = when applied to a set F of FDs. They are complete in that repeated application of these rules will generate all FDs in the closure F . (We will not prove these claims.) Tris convenient to use some additional rules while reasoning about F Union: If X/ Yand.X!Z, then. X! YZ. Decomposition: If X / ¥Z, then X ! ¥and X!Z. ¢ additional rules are not essential; their soundness can be proved using Arm-strong’s Axioms. use a more elaborate version of the Contracts relation: Contracts (contractid, supplierid, projectid, deptid, partid, qty, value) The following ICs are known to hold: 1, The contract id Cis a key: C / CSJDPQV. DATABASE MANAGEMENT. SSeS 2. A project purchases a given part using a single contract: JP ! C. 3. A department purchases at most one part from a supplier: SD / P. NORMAL FORMS Given a relation schema, we need to decide whether it is a good design or whether we need to decompose it into smaller relations, Such a decision must be guided by an understanding of what problems, if any, arise from the current schema. To provide such guidance, several normal forms have been proposed. If a relation schema is in one of these normal forms, we know that certain kinds of problems cannot arise. ‘The normal forms based on FDs are first normal form (INE), second normal form (2NE), third normal form (3NF), and Boyce-Codd normal form (BCNF). These forms have increasingly restrictive requirements: Every relation in BCNF is also in 3NF, every relation in 3NF is also in 2NF, and every relation in 2NF is in INF. A. relation is in first normal form if every field contains only atomic values, that is, not lists or sets. This requirement is implicit in our de nition of the relational model, Although some of the newer database systems are relaxing this requirement, in this chapter we will assume that it always holds. 2NF is mainly of historical interest. 3NF and BCNF are important from a database design standpoint. 15.5.1Boyce-Codd Normal Form Let R be a relation schema, X be a subset of the attributes of R, and let A be an attribute of R. R is in Boyce-Codd normal form if for every FD.X ! A that holds over R, one of the following statements is A 2X; that is, itis a trivial FD, or Xis.a super key Note that if we are given a set F of FDs, according to this de nition, ‘we must consider each dependency X'/ A in the closure Fs to determine whether R is in BCNF. However, we can prove that it is sufficient to check whether the left side of each dependency in Fis a super key (by computing the attribute closure and seeing if it includes all attributes of 2). KEYNonkey attrl—~ Nonkey-attr2~Nonkey attrk FDs ina BCNF Relation BCNF ensures that no redundancy can be detected using FD information alone. It is thus the most desirable normal form (from the point of view of redundancy) ‘Thus, if a relation is in BCNF, every field of every tuple records a piece of information that cannot be inferred (using only FDs) from the values in all other elds in (all tuples of) the relation instance. ‘Third Normal Form Let R be a relation schema, X be a subset of the attributes of R, and A be an attribute of R. & is in third normal form if for every FD.X ! 4 that holds over R, one of the following statements is true: A2X; that is, itis a trivial FD, or Xs a super key, or Ais part of some key for R The definition of 3NF is similar to that of BCNF, with the only difference being the third condition, Every BCNF relation is also in 3NF, Partial dependencies are illustrated in Figure 15.9, and transitive dependencies are illustrated in Figure. Note that in Figure 15.10, the set X of attributes may or may not have ‘some attributes in common with KEY; the diagram should be interpreted as indicating only that X is not a subset of KEY. ( KEvattigutes x not in KEY avasaticex amie atuae t: Ard) a aoa KEYAtibiie A ——~<““Atributes x¢ae 2: Ais nkey AT ABASE MANAGEMENT <> SYSTEMS 48 ‘Transitive Dependencies The motivation for 3NF is rather technical. By making an exception for certain de-pendencies involving key attributes, we can ensure that every relation schema can be decomposed into a collection of 3NF relations using only decompositions that have certain desirable properties DECOMPOSITIONS As we have seen, a relation in BCNF is free of redundancy (to be precise, redundancy that can be detected using FD information), and a relation schema in 3NF comes close. If @ relation schema is not in one of these normal forms, the FDs that cause a violation can give us insight into the potential problems. A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas that each contain a subset of the attributes of R and together include all attributes in . Lossless-Join Decomposition Let R be a relation schema and let F Be a set of FDs over R. A decomposition of R into two schemas with attribute sets X and Y is said to be a lossless-join decomposition with respect to F if for every instance r of R that satis es the dependencies in F, «(r)./() = All decompositions used to eliminate redundancy must be lossless. The following simple test is very useful Let R be a relation and F be a set of FDs that hold over R. ‘The decomposition of R into relations with attribute sets R and R; is lossless if and only if F - contains either the FD R, \ Ry {Ry or the FD Ry | RR, Dependency-Preserving Decomposition Consider the Contracts relation with attributes CS/DPQV from Section 15.4.1. The given FDs are C ! CS/DPQV, JP ! C, and SD ! P. Because SD is not a key the dependency SD ! P causes a violation of BCNF. Let R be a relation schema that is decomposed into two schemas with Xand Y, and let F be a set of FDs over R. The projection of F on X is, the set of EDs in the closure F(not just F !) that involve only attributes in X. ‘We will denote the projection of F on attributes X’as Fy . Note that a dependency U Vin Fis in . F,only if all the attributes in U and V are in X. ‘The decomposition of relation schema R with FDs F into schemas with attribute sets Xand Y is dependency-preserving if (F x [F,) «=F +. That is, ifwe take the dependencies in Fy and F; and compute the closure of their union, we get back all dependencies in the closure of F. Therefore, we need to enforce only the dependencies in Fy and F ; all EDs in F sare then sure to be satisfied. To enforce Fr, we need to examine only relation X (on inserts to that relation), To’enforce F; , we need to examine only relation ¥. NORMALIZATION Having covered the concepts needed to understand the role of normal forms and de-compositions in database design, we now consider algorithms for converting relations to BCNF or 3NF. If a relation schema is not in BCNF, it is possible to obtain a lossless-join decomposition into a collection of BCNF relation schemas. Unfortunately, there may not be any dependency-preserving decomposition into a collection of BCNF relation schemas Decomposition into BCNF ‘We now present an algorithm for decomposing a relation schema R into a collection of BCNF relation schemas: 1. Suppose that R is not in BCNF. Let XR, A be a single attribute in Rand X!A be an FD that causes a violation of BCNF. Decompose R into R-> A and X A 2. Ifeither R -> A or X A is not in BCNE, decompose them further by arecursive application of this algorithm, R-> A denotes the set of attributes other than A in R, and XA denotes the union of attributes in X and A. Since X/ A violates BCNF, it is not a trivial dependency; further, 4 is a single attribute. Therefore, A is not in X; that is, X \4'is empty. Thus,each decomposition carried out in Step is lossless-join. 1e set of dependencies associated with R -> A and XA is the projection of F onto their attributes. If one of the new relations is not in BCNE, we decompose it further in Step. Since a decomposition results in relations with strictly fewer attributes, this process will terminate, leaving us with a collection of relation schemas that are all in BCNF. Consider the Contracts relation with attributes CS/DPQV and key C. We are given FDs JP / C and SD ! P. By using the dependency SD ! P to guide the decomposition, we get the two schemas SDP and CSIDQV. SDP is in BCNF. Suppose that we also have the constraint that cach project deals with a single supplier: JS. This means that the schema CSJDOY is not in BCNF. So we decompose it further into JS and CIDQV. C ! JDQV holds over CJDQV; the only other FDs that hold are those obtained from this FD by augmentation, and therefore all FDs contain a key in the left side. Thus, each of the schemas SDP, JS, and CJDQV is in BCNE, and this collection of schemas also represents a lossless-join decomposition of CSJDQY. The steps in this decomposition process can be visualized as a tree, as shown in Figure. The root is the original relation CS/DPQY, and the leaves are the BCNF relations that are the result of the decomposition algorithm, namely, SDP, JS, and CSDOY. Intuitively, each internal node is replaced by its children through a single decomposition step that is guided by the FD shown just below the node. csiorav SYSTEMS 53 Redundancy in BCNF Revisited The decomposition of CSJDQV into SDP, JS, and CJDQV is not dependency-preserving. Intuitively, dependency JP ! C cannot be enforced without a join. One way to deal with this situation is to add a relation with attributes CJP. This is a subtle point: Each of the schemas CJP, SDP, JS, and CIDQV is in BCNF, yet there is some redundancy that can be predicted by FD information. In particular, if we join the relation instances for SDP and C/DQY and project onto the attributes CJP, ‘we must get exactly the instance stored in the relation with schema CIP. Minimal Cover for a Set of FDs A minimal cover for a set F of FDs is a set G of FDs such that: 1, Every dependency in Gis of the form X'! A, where 4 is asingle attribute. 2. The closure F + is equal to the closure G. 3. If we obtain a set H of dependencies from G by deleting one or more dependencies, or by deleting attributes from a dependency in G, then F 6-H Intuitively, a minimal cover for a set F of FDs is an equivalent set of dependencies that is minimal in two respects: (1) Every dependency is as small as possible; that is, each attribute on the left side is necessary and the right side is a single attribute. (2)Every dependency in it is required in order for the closure to be equal to F’=, As an example, let F be the set of dependencies: A!B, ABCD ! E, EF ! G, EF | H,and ACDF ! EG. First, let us rewrite ACDF ! EG so that every right side is a single attribute: ACDF ! E and ACDF !G. Next consider ACDF ! G. This dependency is implied by the following FDs: 4!B, ABCD ! E, and EF ! G. Therefore, we can delete it. Similarly, we can delete ACDF ! E. Next consider ABCD LE. Since A ! B holds, we can replace it with ACD ! E, (At this point, the reader should verify that each remaining FD is minimal and required.) Thus, a minimal cover for F is the set: A!B, ACD ! E, EF ! G,and EF ! H. The preceding example suggests a general algorithm for obtaining a minimal cover of a set F of FDs: 1, Put the FDs in a standard form: Obtain a collection G of equivalent FDs with a single attribute on the right side (using the decomposition axiom). Minimize the left side of each FD: For each FD in G, check each attribute in the left side to see if it can be deleted while preserving equivalence to Fs. Delete redundant FDs: Check each remaining FD in G to see if it can be deleted while preserving equivalence to F's. Dependency-Preserving Decomposition into 3NF Returning to the problem of obtaining a lossless-join, dependency- preserving decom-position into 3NF relations, let R be a relation with a set F of FDs that is a minimal cover, and let Ry; Rs : join decomposition of R. For 1 i'n, supy let F; denote the projection of F onto the attributes of R ,. Do the following: Identify the set N of dependencies in F that are not preserved, that is, not included in the closure of the union of Fs. For each FD.X/ A in N, create a relation schema XA and add it to the decom-position of R. Obviously, every dependency in F is preserved if we replace R by the Rs plus the schemas of the form XA added in this step. The Rs are given to be in 3NF. We can show that each of the schemas XA is in 3NF as follows: Since X ! 4 is in the minimal cover F, ¥ ! A does not hold for any Y that is a strict subset of X. Therefore, is a key for XA, ‘As an optimization, if the set NV contains several FDs with the same left side, say, X! AX! Ay: 2:2; X! Aq we can replace them with a single equivalent FD x14 Ay, Therefore, we produce one relation schema XA, : >: A,, instead of several Schgmas XA,; : +: ; XA,, which is generally preferable. Comparing this decomposition with the one that we obtained earlier in this section, we find that they are quite close, with the only difference being that one of them has CDJPQV instead of CJP and CIDQV. In general, however, there could be significant differences. Database designers typically use a conceptual design methodology (e.g., ER design) to artive at an initial database design. Given this, the approach of repeated decompositions to rectify instances of redundancy is likely to be the most natural use of FDs and normalization techniques. However, a designer can also consider the alternative designs suggested by the synthesis approach. Multivalued Dependencies ‘Suppose that we have a relation with attributes course, teacher, and bapk, which we denote as CTB. The meaning of a tuple is that teacher 7 can teach course C, and book B is a recommended text for the course. There are no FDs; the key is CTB. However, the recommended texts for a course are independent of the instructor. The instance shown in Figure 15.13 illustrates this situation, course | teacher] book Physics? Mechanic ot Green |s Physics? ot Green | Optics Physics? Mechanic 01 Brown |s Physics? o1 Brown | Optics Mechanic Math301 | Green |s Math301 | Green | Vectors Math301 [Green [Geometty] BONF Relation with Redundancy That Is Revealed by MVOs There are three points to note here: The relation schema C7B is in BCNF; thus we would not consider decomposing it further if we looked only at the FDs that hold over C7B. There is redundancy. The fact that Green can teach Physics101 is recorded once per recommended text for the course. Similarly, the fact that Optics is a text for Physics101 is recorded once per potential teacher [DINENBINSE-NINNINGENAES SYSTEMS 57 The redundancy can be eliminated by decomposing C7B into CT = and CB. This table suggests another way to think about MVDs: If X!! ¥ holds over R, then Y Z ( X=x(R)) = ¥ ( X=x(R)) Z ( X=x(R)) in every legal instance of R, for any value x that appears in the X column of R. In other words, consider groups of tuples in R with the same X-value for each X-value. In each such group consider the projection onto the attributes YZ. This projection must be equal to the cross-product of the projections onto Y and Z. That is, for a given X-value, the Y-values and Z-values are independent. (From this de nition it is easy to see that X !! Y must hold whenever X/ Y holds. If the FD XY holds, there is exactly one Y-value for a given X-value, and the conditions in the MVD de nition hold trivially, The converse does not hold, as Figure 15.14 illustrates.) Returning to our CTB example, the constraint that course texts are independent of instructors can be expressed as C ! T. In terms of the de nition of MVDs, this constraint can be read as follows: AW (there is @ tuple showing that) C is taught by teacher T, and (there is a tuple showing that) C has book B as text, then (there is a tuple showing that) C is taught by T and has text B Given a set of FDs and MVDs, in general we can infer that several additional FDs and MVDs hold. A sound and complete set of inference rules consists of the three Armstrong Axioms plus ve additional rules. Three of the additional rules involve only MVDs: ‘DATABASE: MANIAGENIES SYSTEMS 58 MVD Complementation: If X !! Y, then XR - XY MVD Augmentation: If X !/ Y and WZ, then WX! YZ. MVD Transitivity: If X !/ ¥ and Y !!Z, then Xf (Z ~ Y). As an example of the use of these rules, since we have C !! T over CTB, MVD complementation allows us to infer that C !/ CTB - CTas well, that is, C !/ B, The remaining two rules relate FDs and MVDs: Replication: If X!/ Y, then X!/ Y. Coalescence: If X!! ¥ and there is a Wsuch that W1 Yis empty, W/Z, and Y Z, then X!Z. Observe that replication states that every FD is also an MVD. Fourth Normal Form Fourth normal form is a direct generalization of BCNF. Let R be a relation schema, X and Y be nonempty subsets of the attributes of R, and F be a set of dependencies that includes both FDs and MVDs. Ris said to be in fourth normal form (4NF) if for every MVD X !! Y that holds over R, one of the following statements is true: » Y XorXY¥=R,or Xis a Superkey In reading this definition, it is important to understand that the de nition of a key has not changed the key must uniquely determine all attributes through FDs alone. X !! Yis a trivial MVD if YX R or XY = R; such MVDs always hold The relation CTB is not in 4NF because C !! Tis a nontrivial MVD and Cis not a key. We can eliminate the resulting redundancy by decomposing CTB into CT and CB; each of these relations is then in 4NF. To use MVD information fully, we must understand the theory of MVDs. However, the following result due to Date and Fagin identifies conditions detected using only FD information!|under which we can safely ignore MVD information. That is, using MVD information in addition to the FD information will not reveal any redundancy. Therefore, if these conditions hold, we do not even need to identify all MVDs. Ifa relation schema is in BCNF, and at least one of its keys consists of a single attribute, it is also in NF. An important assumption is implicit in any application of the preceding result: The set of FDs identified thus far is indeed the set of all FDs that hold over the relation. This assumption is important because the result relies on the relation being in BCNF, which in tum depends on the set of FDs that hold over the relation. Figure shows three tuples from an instance of ABCD that satisfies the given MVD B NIC. From the definition of an MVD, given tuples 1 and £2, it follows CA [D et [at [ot |jwoiet c [ae [ee | rupted et [ez [a2 [iwoied ‘Three Tuplas from a Legal Instance of ABCD that tuple £3 must also be included in the instance. [DATARS MANAGEMEN $$ SYSTEMS 60 Consider tuples t2 and 3. From the given FD A! BCD and the fact that these tuples have the same A-value, we can deduce that c 1 = ¢ 2. Thus, we see that the FD B/C must hold over ABCD whenever the FD A! BCD and the MVD B !! C hold. If B ! C holds, the relation ABCD is not in BCNF (unless additional FDs hold that make B a key)! Join Dependencies A join dependency is a further generalization of MVDs. A join dependency (JD) J1R1;:: 7; Rngis said to hold over a relation Rif R1; ::2;Rn is a lossless-join decomposition of R. ‘An MVD XI! ¥ over a relation R can be expressed as the join dependency ./ XY, X(R-Y)g. As an example, in the CTB relation, the MVD C !! T can be expressed as the join dependency /fCT, CBg. Unlike FDs and MVDs, there is no set of sound and complete inference rules for JDs. Fifth Normal Form Arelation schema R is said to be in fth normal form (SNF) if for every JD /R4; 2; Rg that holds over , one of the following statements is true: Ri= R for some i, or The JD is implied by the set of those FDs over R in "which the left side is a key for R. ‘DATABASE MANAGEMENT SYSTEMS 61 The second condition deserves some explanation, since we have not presented inference rules for FDs and JDs taken together. Intuitively, we must be able to show that the decomposition of R into fR 1; : :: ; Rng is lossless-join whenever the key dependencies (FDs in which the left side is a key for R) hold. /1R 1 Rngis a trivial JD i R= R for some i; such a JD always holds The following result, also due to Date and Fagin, identifies conditions again, detected using only FD information under which we can safely ignore JD information Ifa relation schema is in 3NF and each of its keys consists of a single attribute, it is also in SNF. The conditions identified in this result are sufficient for a relation to be in SNF, but not necessary. The result can be very useful in practice because it allows us to conclude that a elation is in 5NF without ever identifying the MVDs and JDs that may hold over the relation, [DATABASE MANAGEMEN FS —————_>___ SYSTEMS 62 TRANSACTION MANAGEMENT What is a Transaction? A transaction is an event which occurs on the database. Generally a transaction reads a value from the database or writes a value to the database. If you have any concept of Operating Systems, then wwe can say that a transaction is analogous to processes. Although a transaction can both read and write on the database, there are some fundamental differences between these two classes of operations. A read operation does not change the image of the database in any way. But a write operation, whether performed with the intention of inserting, updating or deleting data from the database, changes the image of the database. That is, we may say that these transactions bring the database from an image which existed before the transaction occurred (called the Before Image or BFIM) to an image which exists after the transaction occurred (called the After Image or AFIM). The Four Properties of Transactions Every transaction, for whatever purpose it is being used, has the following four properties. Taking the initial letters of these four properties we collectively call them the ACID Properties. Here we try to describe them and explain them. Atomicity: This means that either all of the instructions within the tran: e reflected in the database, or none of them will bé reflected. Say for example, we have two accounts A and B, each containing Rs 1000/-. We now start a transaction to deposit Rs 100/- from account A to Account B. Read A; A=A-100; Write A; Read B; B=B+100; Write B; Fine, is not it? The transaction has 6 instructions to extract the amount from A and submit it to B. The AFIM will show Rs 900/- in A and Rs 1100/- in B. Now, suppose there is a power failure just after instruction 3 (Write A) has been complete, What happens now? After the system recovers the AFIM will show Rs 900/- in A, but the same Rs 1000/- in B. It would be said that Rs 100/- evaporated in thin air for the power failure, Clearly such a situation is not acceptable. The solution is to keep every value calculated by the instruction of the transaction not in any stable storage (hard disc) but in a volatile storage (RAM), until the transaction completes its last instruction, When we see that there has not been any error we do something known as a COMMIT operation. Its job is to write every temporarily calculated value from the volatile storage on to the stable storage. In this way, even if power fails at instruction 3, the post recovery image of the database will show accounts A and B both containing Rs 1000/-, as if the failed transaction had never occurred. Consistency: If we execute a particular transaction in isolation or together with other transaction, (ie. presumably in a multi-programming environment), the transaction will yield the same expected result. To give better performance, every database management system supports the execution of multiple transactions at the same time, using CPU Time Sharing. Concurrently executing transactions may have to deal with the problem of sharable resources, i.e, resources that multiple transactions are trying to read/write at the same time. For example, we may have a table or a record on which two transaction are trying to read or write at the same time, Careful mechanisms are created in order to prevent mismanagement of these sharable resources, so that there should not be any change in the way a transaction performs. A transaction which deposits Rs 100/- to account A must deposit the same amount whether it is acting alone or in conjunction with another transaction that may be trying to deposit or withdraw some amount at the same time, Isolation: In case multiple transactions are executing concurrently and trying to access a sharable resource at the same time, the system should create an ordering in their execution so that they should not create any anomaly in the value stored at the sharable resource. There are several ways to achieve this and the most popular one is using some kind of locking mechanism. Again, if you have the concept of Operating Systems, then you should remember the semaphores, how it is used by a process to make a resource busy before starting to use it, and how it is used to release the resource after the usage is over. Other processes intending to access that same resource must wait during this time. Locking is almost similar. It states that a transaction must first lock the data item that it wishes to access, and release the lock when the accessing is no longer required. Once a transaction locks the data item, other transactions wishing to access the same data item must wait until the lock is released. Durability: It states that once a transaction has been complete the changes it has made should be permanent. ‘As we have seen in the explanation of the Atomicity property, the transaction, if completes successfully, is committed. Once the COMMIT is done, the changes which the transaction has made to the database are immediately written into permanent storage. So, after the transaction has been committed successfully, there is no question of any loss of information even if the power fails. Committing a transaction guarantees that the AFIM has been reached. ‘There are several ways Atomicity and Durability can be implemented. One of them is called Shadow Copy. In this scheme a database pointer is used to point to the BFIM of the database. During the transaction, all the temporary changes are recorded into a Shadow Copy, which is an exact copy of the original database plus the changes made by the transaction, which is the AFIM. Now, if the transaction is required to COMMIT, then the database pointer is updated to point to the AFIM copy, and the BFIM copy is discarded. On the other hand, if the transaction is not committed, then the database pointer is not updated. It keeps pointing to the BFIM, and the AFIM is discarded. This is a simple scheme, but takes a lot of memory space and time to implement. If you study carefully, you can understand that Atomicity and Durability is essentially the same thing, just as Consistency and Isolation is essentially the same thing. Transaction States There are the following six states in which a transaction may exist: Active: The initial state when the transaction has just started execution, Partially Committed: At any given point of time if the transaction is executing properly, then it is going towards it COMMIT POINT. The values generated during the execution are all stored in volatile storage. Failed: If the transaction fails for some reason. The temporary values are no longer required, and the transaction is set to ROLLBACK. It means that any change made to the database by this transaction up to the point of the failure must be undone. If the failed transaction has withdrawn Rs. 100/- from account A, then the ROLLBACK operation should add Rs 100/- to account A. Aborted: When the ROLLBACK operation is over, the database reaches the BFIM. The transaction is now said to have been aborted. Committed: If no failure occurs then the transaction reaches the COMMIT POINT. All the temporary values are written to the stable storage and the transaction is said to have been committed. |: Either committed or aborted, the transaction finally reaches this The whole process can be described using the following diagram: PARTIALLY Entry Point COMMITTED ACTIVE TERMINATE D FAILED Concurrent Execution A schedule is a collection of many transactions which is implemented as a unit, Depending upon how these transactions are arranged in within a schedule, a schedule can be of two types: *Serial: The transactions are executed one after another, in a non-preemptive manner Concurrent: The transactions are executed in a preemptive, time shared method, In Serial schedule, there is no question of sharing a single data item among many transactions, because not more than a single transaction is executing at any point of time. However, a serial schedule is inefficient in the sense that the transactions suffer for having a longer waiting time and response time, as well as low amount of resource utilization. In concurrent schedule, CPU time is shared among two or more transactions in order to run them concurrently. However, this creates the possibility that more than one transaction may need to access a single data item for read/write purpose and the database could contain inconsistent value if such accesses are not handled properly. Let us explain with the help of an example Let us consider there are two transactions T1 and T2, whose instruction sets are given as following, T1 is the same as we have seen earlier, while T2 is a new transaction. T1 Read A; A=A-100; Write A; Read B: B=B+ 100; Write B; T Read A; Temp =A * 0.1; Read C; C=C + Temp; Write C; 2 is a new transaction which deposits to account C 10% of the amount in account A. If we prepare a serial schedule, then either T1 will completely finish before T2 can begin, or T2 will completely finish before T1 can begin, However, if we want to create a concurrent schedule, then some Context Switching need to be made, so that some portion of T1 will be executed, then some portion of T2 will be executed and so on. For example say we have prepared the following concurrent schedule, Read A: Temp =A * 0.1; Read C; C=C+Temp; Write C; Read B; B=B+ 100; Write B; No problem here. We have made some Context Switching in this Schedule, the first one after executing the third instruction of T1, and after executing the last statement of T2. T1 first deducts Rs 100/- from A and writes the new value of Rs 900/- into A. T2 reads the value of A, calculates the value of Temp to be Rs 90/- and adds the value to C. The remaining part of TI is executed and Rs 100/- is added to B. It is clear that a proper Context Switching is very important in order to maintain the Consistency and Isolation properties of the transactions. But let us take another example where a wrong Context Switching can bring about disaster. Consider the following example involving the same TI and T2

You might also like