Unit 3 Notes
Unit 3 Notes
Unit-3 notes
Unit-3
Relational Model
In the relational data model, unlike the hierarchical and network models, there are no physical links. All
the data is maintained in the form of tables (generally, known as relations) consisting of rows and
columns.
Oracle, Sybase, DB2, Informix, MS-SQL Server are few of the popular relational DBMSs.
The relational model represents both data and the relationships among those data using relation.
A relation is used to represent information about any entity and its relationship with other entities in the
form of attributes (or columns) and tuples (or rows).
Relation Schema:- A relation schema consists of a relation name R and a set of attributes (or fields) A 1,
A2, …., An.
Relation Instance: - A relation instance (or relation) r of the relation schema R(A 1,A2,…..,An) is a set of n-
tuples t.
It is denoted by r(R).
A relation instance is an ordered set of attribute values v that belongs to the domain D and it can be
denoted as r = {t1,t2,…,tm}
where t = {v1,v2,…,vn}
Table Relation
Column Attribute
Row Tuple
Characteristics Of Relations
Ordering of tuples in a relation r(R) : The tuples are not considered to be ordered, even though
they appear to be in the tabular form.
Ordering of attributes in a relation schema R (and of values within each tuple): We will consider
the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered .
Values in a tuple: All values are considered atomic (indivisible). A special null value is used to
represent values that are unknown or inapplicable to certain tuples.
• Simple: It is a simpler model as it frees the designers from the actual physical data storage
details. This allows them to concentrate on the logical view of the database.
• No anomalies: Unlike hierarchical and network models, relational model does not suffer from
Insert, Delete and Update anomalies.
• Better Query Capability: The relational database model provides very powerful, flexible and
easy-to-use query facilities. It uses fourth-generation language like SQL.
• Easier design, implementation, maintenance and usage: Both data as well as structural
independence are provided by the relational model which makes database design, maintenance,
administration and usage much easier than the other models.
• Hardware Overheads: The relational data models need more powerful computing hardware and
storage devices to perform RDBMS assigned tasks. RDBMS hides the implementation complexities
and the physical data storage details from the users.
• Phenomenon of Information Island : Since relational database systems are easy to implement
and use. This will create a situation where too many people or departments will create their own
databases and applications. These are known as information islands which will prevent the
information integration that is essential for the smooth and efficient functioning of the
organization. These individual databases will also create problems like data inconsistency, data
duplication, data redundancy and so on.
• SQL does not provide an efficient way to browse alphabetically through index. Thus, some systems
cannot provide a simple A-Z browse.
Relational Constraints
The various restrictions on data specified on a relational database in the form of constraints are:-
Domain Constraint
Domain Constraint:- Domain constraint is a constraint which limits the values on an attribute to some
prescribed set of values.
The data type associated with domains are integer, character, string, date etc.
Every attribute is associated with a domain and can have the value from the corresponding domain.
STUDENT
104 Mohan 20
The domain constraints are tested every time a new data is added to the database.
Entity Integrity Constraint:- No attribute of a primary key can have a null value. This is because primary
key values are used to identify the individual tuples. This constraint is known as Entity Integrity
Constraint.
Not Null Constraint:- Other attributes of the relation are also not allowed null values even though they
are not members of the primary key.
Referential Integrity Constraint:- This is used to maintain consistency among tuples of the two relations.
It states that “if a foreign key in relation R1 refers to the primary key of relation R2, then every value of
the foreign key in relation R1 must be available in relation R2.”
ACCOUNT
BRANCH
Distt of relation ACCOUNT is foreign key. Thus Distt Sonipat is not allowed in ACCOUNT relation because
it is not available in BRANCH relation.
Codd Rules
Dr E.F Codd developed the relational data model in 1970.In 1985, Dr Codd published a list of 12 rules
that defines an ideal relational databases.
Rule 0
The system must qualify as relational, as a database, and as a management system.
For a system to qualify as a relational database management system (RDBMS), that system must use its
relational facilities (exclusively) to manage the database.
There should be no way to modify the database structure other than through the multiple row database
language like SQL.
Relational Algebra
Relational Algebra is a procedural query language that consists of a set of operations that take one or
more relations as input and result into a new relation as an output.
These operations are divided into two groups:
Special Relational Operations
Traditional Set Operations
Unary Operations:- The operations operating on a single relation are known as unary operations.
In relational algebra select, project and rename are unary operators.
Select Operation: The select operation retrieves all those tuples from a relation that satisfy a specific
condition.
It can also be considered as a filter that displays only those tuples that satisfy a given condition.
The select operation can be viewed as the horizontal subset of a relation.
The Greek letter sigma (σ) is used as a select operator.
In general, the select operation is specified as:
σ <selection_condition> (R)
where,
selection_condition = the condition on the basis of which subset of tuples is selected.
R = the name of the relation.
e.g. To retrieve those tuples from BOOK relation whose category is Novel, the select operation can be
specified as:-
σ Category=“Novel” (BOOK)
The comparison operators (=,≠,<,≤,>,≥) can be used to specify the conditions for selecting required
tuples from a relation.
Moreover, more than one condition can be concatenated to one another to give a more specific
condition by using any of the logical operators AND (Λ), OR (V), and NOT (¬).
e.g. To retrieve those tuples from BOOK relation, whose Category is Language Book and Page_count is
greater than 400, the select operation can be specified as:-
σ Category=“Language Book” Λ Page_count > 400 (BOOK)
Project Operation: A relation might consist of a number of attributes, which are not always required.
The project operation is used to select some required attributes from a relation while discarding the
other attributes.
It can be viewed as the vertical subset of a relation.
The Greek letter pi (π) is used as project operator.
In general, the project operation is specified as:-
π <attribute_list> (R)
where
attribute_list = list of required attributes separated by comma
R = the name of relation.
e.g. To retrieve only ISBN, Book_Title and Price attributes from BOOK relation, the project operation can
be specifies as:-
π ISBN, Book_Title, Price (BOOK)
Select and Project operations can also be combined together to create more complex queries.
e.g. To retrieve P_ID, Address, and Phone attributes from PUBLISHER relation where State is Georgia, the
combined operation can be specified as:-
π P_ID, Address, Phone (σ State=“Georgia” (PUBLISHER))
Rename Operation: When relational algebra operations are performed, the resultant relations are
unnamed, hence cannot be used for later reference. In addition, it might be required to apply more
operations on the relation obtained from other operations. In such situations, rename operator proves
to be useful.
Rename operation is used to provide name to the relation obtained after applying any relational algebra
operation.
The Greek letter rho (ρ) is used as a rename operator.
Relation obtained from any operation can be renamed by using rename operator as shown here:
ρ (R, E)
where
ρ = rename operator
E = expression representing relational algebra operations
R = name given to relation obtained by applying relational algebra operations specified in expression E.
e.g. ρ (R1, σ Category=“Novel” (BOOK))
Binary Operations:- The operations operating on two relations are known as binary operations.
The set operations, join operations and division operations are all binary operations.
Set Operations: The standard mathematical operations on sets, namely, union, intersection, difference,
and Cartesian product are also available in relational algebra. Of these, the three operations, namely,
union, intersection, and difference operations require two relations to be union compatible with each
other.
The two relations are said to be union compatible, if they satisfy these conditions:
a) The degree of both the operand relations must be same.
b) The domain of nth attribute of relation R1, must be same as that of the nth attribute of relation R2.
However, the cartesian product can be defined on any two relations, that is, they need not be union
compatible.
Union Operation: The union operation, denoted by (U), returns a third relation that contains tuples from
both or either of the operand relations.
Consider two relations R1 and R2, their union is denoted as R1 U R2, which returns a relation containing
tuples from both or either of the given relations. The duplicate tuples are automatically removed from
the resultant relation.
The union operation is, respectively commutative and associative in nature, that is,
R1 U R2 = R2 U R1
and R1 U (R2 U R3) = (R1 U R2) U R3
PUBLISHER_1
PUBLISHER_2
PUBLISHER_1 U PUBLISHER_2
Intersection Operation: The intersection operation, denoted by (∩), returns a third relation that contains
tuples common to both the operand relations.
The intersection of the relations R1 and R2, is denoted by R1 ∩ R2, which returns a relation containing
tuples common to both the given relations.
Like, union operation, intersection operation is, respectively, commutative and associative in nature, that
is,
R1 ∩ R2 = R2 ∩ R1
and R1 ∩ (R2 ∩ R3) = (R1 ∩ R2) ∩ R3
PUBLISHER_1 ∩ PUBLISHER_2
Difference Operation: The difference operation, denoted by –(minus), returns a third relation that
contains all tuples present in one relation, which are not present in the second relation.
The difference of the relations R1 and R2 is denoted by R1 - R2.
The expression R1-R2 results in a relation containing all the tuples from relation R1, which are not
present in relation R2.
Similarly, the expression R2-R1 results in a relation containing all tuples from relation R2, which are not
present in relation R1.
The difference operation is not commutative in nature.
PUBLISHER_1 - PUBLISHER_2
PUBLISHER_2 – PUBLISHER_1
Cartesian Product Operation: The cartesian product, also known as cross product or cross join, returns a
third relation that contains all possible combinations of the tuples from the two operand relations.
The cartesian product is denoted by the symbol x. The cartesian product of the relations R1 and R2 is
denoted by R1xR2
The cartesian product creates tuples with the combined attributes of two relations.
Therefore, the degree of a new relation is the sum of the degree of both the operand relations.
AUTHOR_1
BOOK_1
AUTHOR_1 x BOOK_1
Join Operation: The join operation is one of the most useful and commonly used operations to extract
information from two or more relations.
A join can be defined as a cartesian product followed by select and project operations.
The result of a cartesian product is usually much larger than the result of a join.
The join is denoted by the symbol
The join operation is specified as:
R1 condR2
• Result:
Outer Join
• The outer-join operation is an extension of the join operation to deal with missing information.
• Outer-join operations avoid loss of information. Outer Joins are classified into three types
namely:
Left Outer Join
Right Outer Join
Full Outer Join
• Result:
• Example:
• Result:
Example:
Result:
Division Operation
The division operator divides a dividend relation A of degree (means number of columns in a relation)
m+n by a divisor relation B of degree n, and produces a resultant relation of degree m. It is denoted by ÷
Relation A
Relation B
• Case 1
• Case2
• Case3
A÷B
• Case 1
• Case 2
• Case 3
RELATIONAL CALCULUS
Relational Calculus is a formal query language where we can write one declarative expression to specify
a retrieval request and hence there is no description of how to retrieve it.
A calculus expression specifies what is to be retrieved rather than how to retrieve it.
{t | P (t ) }
It is the set of all tuples t such that predicate P is true for t
A tuple-relational-calculus formula is built up out of atoms. An atom has one of the following forms:
2. s[x] u[y], where s and u are tuple variables, x is an attribute on which s is defined, y is an attribute on
which u is defined, and is a comparison operator
3. s[x] c, where s is a tuple variable, x is an attribute on which s is defined, is a comparison operator, and
c is a constant in the domain of attribute x
An atom is a formula.
If P1 is a formula, then so are ¬P1 and (P1).
If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
Banking Example
1. branch (branch_name, branch_city, assets )
2. customer (customer_name, customer_street, customer_city )
3. account (account_number, branch_name, balance )
4. loan (loan_number, branch_name, amount )
5. depositor (customer_name, account_number )
6. borrower (customer_name, loan_number )
Example Queries
1. Find the loan_number, branch_name, and amount for loans of over $1200
{t | t loan t [amount ] 1200}
2. Find the loan number for each loan of an amount greater than $1200
{t | s loan (t [loan_number ] = s [loan_number ] s [amount ] 1200)}
Notice that a relation on schema [loan_number] is implicitly defined by the query
3. Find the names of all customers having a loan, an account, or both at the bank
{t | s borrower ( t [customer_name ] = s [customer_name ]) u depositor ( t [customer_name ] = u
[customer_name ])
4. Find the names of all customers who have a loan and an account at the bank
{t | s borrower ( t [customer_name ] = s [customer_name ]) u depositor ( t [customer_name ] = u
[customer_name] )
5. Find the names of all customers having a loan at the Perryridge branch
{t | s borrower (t [customer_name ] = s [customer_name ] u loan (u [branch_name ] = ―Perryridge ‖ u
[loan_number ] = s [loan_number ]))}
Safety of Expressions
For example, { t | t loan} results in infinitely many tuples that are not in loan relation.
To guard against the problem, a domain is defined for all tuple relational calculus formula P.
It is denoted by dom(P). it denotes that P can take value only in that domain.
Domain relational calculus is also a nonprocedural query language equivalent in power to the
tuple relational calculus.
It servers as the theoretical basis of widely used Query By Example (QBE) language.
1. < x1, x2, . . . , xn > r, where r is a relation on n attributes and x1, x2, . . . , xn are domain variables or
domain constants.
2. x y, where x and y are domain variables and is a comparison operator
3. x c, where x is a domain variable, is a comparison operator, and c is a constant.
An atom is a formula.
If P1 is a formula, then so are ¬P1 and (P1).
If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
Example Queries
1. Find the loan_number, branch_name, and amount for loans of over $1200
{ l, b, a | l, b, a loan a > 1200}
2. Find the names of all customers who have a loan of over $1200
{ c | l, b, a ( c, l borrower l, b, a loan a > 1200)}
3. Find the names of all customers who have a loan from the Perryridge branch and the loan amount:
Safety of Expressions
The expression: { x1, x2, …, xn | P (x1, x2, …, xn )} is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom (P) (that is, the values appear
either in P or in a tuple of a relation mentioned in P).
2. For every ―there exists‖ subformula of the form x (P1(x)), the subformula is true if and only if there is
a value of x in dom (P1) such that P1(x) is true.
3. For every ―for all‖ subformula of the form x (P1 (x)), the subformula is true if and only if P1(x) is true
for all values x from dom (P1).
operators.
Relational algebra operators are used as a yardstick A language is said to be complete if it is at least as
for measuring the expressive power of any given powerful as the calculus that is, if any relation
language. definable by some expression of the calculus is
also definable by some expression of the language
in question.
The queries are domain independent The queries are domain dependent.
Functional Dependency
Here each of the attributes Item_Name, Price, Quantity is functionally dependent on the attributes
Item_Code.
Thus,
Item_Codeà Item_Name
Item_Codeà Price
Item_Codeà Quantity
The fully functional dependency diagram of relation RESULT can be shown as.
A functional dependency in which one or more non key attribute are functionally dependent on a part of
the primary key is called partial functional dependency.
e.g. Consider a relation R
ABàCDE
where Primary Key is AB
then
AàC
AàD
AàE
BàC
all are partial dependencies
Eno àEname
Pno àPname
So, Partial F.D.
{X-(A)} à Y
When one non key attribute depends upon another non key attribute, then it is called Transitive
Functional Dependency.
e.g. AàB & BàC
then AàC
where B & C are non key attributes
Bookno à AuthorName
AuthorName àAuthorNationality
Bookno à AuthorNationality
Multi valued dependency is the dependency where one attribute value is potentially a multivalued fact
about another.
CourseààStudentName
CourseààTextBook
And StudentName and TextBook are independent of each other
Given a set of functional dependencies F, there are certain other functional dependencies that are
logically implied by F.
ABCDàE
ACDFàEF, EFàG
Ques: Given R={A,B,C,G,H,I} and Set of FD F={Aàb, AàC, CGàH, CGàI, BàH}
Compute AG+
Find AC+
Compute AB+
Ques: For the following FDs find Super Key and Candidate Key
Solution:
Candidate Key:
{A, B, C, D, J, G }
{A, B, C, D, J}
Ques: For the following FDs find Super Key and Candidate Key
Solution:
Candidate Key:
{A, B, C, E, F}
{A, C, E, F}
Ques: What is a Closure of a set of Functional dependencies? Determine the closure of the following
given FD’s:
A CD
B C
F DE
F A
Solution
Rule of Decomposition
AàC
AàD
FàD
FàE
Rule of Union
FàADE
Rule of Reflexivity
AàA
BàB
CàC
DàD
EàE
FàF
Rule of Union
AàACD
BàBC
FàFADE
Closure:-
F+= F U {AàA, BàB, CàC, DàD, EàE, FàF, AàC, AàD, FàD, FàE, FàADE, AàACD, BàBC,
FàFADE}
Normalization
Normalization is the process that removes the following undesirable properties in a relation:-
1. Repetition of information i.e, Redundancy
2. Inability to store some information i.e, insertion anomaly.
3. Loss of information in deletion i.e, deletion anomaly.
4. Problem in updations i.e, updation anomaly
Need/Purpose of Normalization
• Normalization reduces a complex user view to a set of small and stable subgroups
of fields/relations.
• This process helps to design a logical data model known as conceptual data model.
A relation is said to be in First Normal Form(1 NF) if and only if every entry of the relation has atmost a
single value.
In other words, “A relation is in First Normal Form if and if and only if all underlying domains contain
atomic values or single value only.”
The first approach known as flattening the table removes repeating groups by filling in the missing
entries of each incomplete row of the table with copies of their corresponding non-repeating attributes.
STUDENT(Normalized Table)
In normalized STUDENT table the attribute Course_code no longer identifies uniquely any row. Thus, a
suitable primary key for this table is the composite key(Course_code, Roll_no)
The second approach for normalizing a table requires that the table be decomposed into two new tables
that will replace the original table.
Rule of Decomposition
One of the two tables contains the table identifier of the original table and all the non-repeating
attributes.
The other table contains a copy of the table identifier and all the repeating attributes.
COURSE Table
COURSE_STUDENT Table
Anomalies in 1NF
Insertion Anomaly:
We cannot insert the information about the student until he/she join any course
e.g. we cannot store the information about Roll_no 110 until he join any course, similarly we are unable
to store the information about the course until there is a student who enroll in that course.
e.g. we cannot store C1 course is of visual basic until atleast one student join the course.
Deletion Anomaly:
If we delete the last tuple of the relation then we are not only deleting the course information but also
lose information about the system on which this student works.
Update Anomaly:
This relation is also susceptible to update anomalies because the course in which a student studies may
appear many times in the table. It is this redundancy of information that causes the anomaly because if a
teacher moves to another course, we are now faced with two problems; we either search the entire
table looking for that teacher and update his or her Course_code value or we miss one or more tuples of
that STUDENT and end up with an inconsistent database.
A Relation R is in Second Normal Form (2NF) if and only if it is in 1 NF and every non key attribute is fully
functionally dependent on the primary key.
A resultant database of first normal form COURSE_STUDENT does not satisfy the above rule, because
non-key attributes Name, System_Used and Hourly_Rate are not fully dependent on the primary
key(Course_code, Roll_No) because Name, System_Used and Hourly_Rate are functionally dependent on
Roll_No and Roll_No is a subset of the primary key.
HOURS_ASSIGNED Table
STUDENT_SYSTEM_CHARGE Table
Anomalies in 2NF
Insertion Anomaly: Insertion anomalies occur in the STUDENT_SYSTEM_CHARGE relation.
e.g. Consider a situation where we would like to set in advance the rate to be charged from the students
for a particular system. We cannot insert this information until there is a student assigned to that type
of system.
Suppose we want to store the hourly_rate of laptop we cannot insert it until some student use that type
of system because roll_no is primary key and we cannot insert null into it.
Deletion Anomaly:
The STUDENT_SYSTEM_CHARGE relation is also susceptible to deletion anomalies. This type of anomaly
occurs whenever we delete the tuple of a student who happens to be the only student left which is
working on a particular system. In this case we will also lose the information about the rate that we
charge for that particular system.
e.g. Rollno 102
Updation Anomaly: Update anomalies will occur in the STUDENT_SYSTEM_CHARGE relation because
there may be several type of the system. If the hourly_rate for that particular system changes, we need
to make sure that the corresponding rate is changed for all students that work on that type of system.
Otherwise the database may end up in an inconsistent state.
A relation R is in Third Normal Form (3NF) if and only if the following conditions are satisfied
simultaneously :
• R is already in 2NF
• No non prime(non key) attribute is transitively dependent on the key.
or
No non prime attribute functionally determine any other non prime attribute.
To transform this relation into 3 NF we will decompose it into the following two tables:-
STUDENT_SYSTEM Table
CHARGES Table
Data Anomalies in 3 NF
The 3 NF helped us to get rid of the data anomalies caused either by transitive dependencies on the
primary key or by dependencies of a non prime attribute. However relations in 3NF are still susceptible
to data anomalies particularly when the relations have two overlapping candidate keys or when a non
prime attribute functionally determine a prime attribute.
(Id_No, Item_No)àQuantity
(Name, Item_No)àQuantity
Id_NoàName
Nameà Id_No
The relation has two overlapping candidate keys, because there are two composite candidate
keys(Id_No, Item_No) and (Name, Item_No) out of which Item_No is common attribute in both the
candidate keys, so this is a case of overlapping of candidate keys.
In the above relation, there is only one non key attribute i.e, Quantity and it is FFD and non transitively
dependent on the primary key.
But manufacturer relation is not in BCNF because this relation has four determinants.
Id_No, Item_No [Quantity depends on this combination]
Name, Item_No [Quantity depends on this combination]
Id_No [Name depends on Id_No]
Name [Id_No depends on Name]
Out of these four determinants(Id_No, Item_no) and (Name, Item_no) are unique but Id_No and Name
determinants are not candidate keys.
In order to make this relation in BCNF we non-loss decompose this relation in two projections.
ID_NAME Relation
D_QTY Relation
The above database is in First, Second and Third normal form because for each row-column interaction
we have at-most single entry and primary key is the combination of three columns(Course,
Student_name, Text_book). So, it does not have any non-key attribute. It satisfies second and third
normal form because it only refers to non key attributes. The relation is also in BCNF, since all three
attributes concatenated together constitute its key, yet it clearly contained anomalies and requires
decomposition with the help of fourth normal form.
A Relation R having A,B, and C as attributes can be non loss decomposed into two projections R1(A,B)
and R2(A,C) if and only if the MVD AààB
and
AààC hold in R
CourseààStudent_Name
CourseààText_book
COURSE_STUDENT Table
COURSE_BOOK Table
Now, We can easily check that all the above anomalies of STUDENT_COURSE_BOOK database are
removed.
e.g. if a new student joins a course then we have to make only one insertion in COURSE_STUDENT table
and if a new book introduced for a course then again we have to make a single entry in COURSE_BOOK
table, so modified database eliminate the problem of redundancy which also solves the update problem.
• A Relation R is in fifth normal form if and only if the following conditions are satisfied
simultaneously:-
• R is already in 4 NF
AGENT_COMPANY_PRODUCT
• However, contain an element of redundancy in that it records the fact that Suneet is an agent for
ABC twice.
• Suppose that the table is decomposed into its two projections P1 and P2.
P1
P2
The redundancy has been eliminated but the information about which companies make which products.
The natural join of these projections over Agent column is:
• The table resulting from this join is spurious since the asterisked row of the table contains
incorrect information.
• Now the original table is decomposed into three tables the two projections P1 & P2 which have
already shown and the final possible projection P3
• If a join is taken of all three projections, first P1 & P2 and then this result with P3 over ‘Company’
and ‘Product_Name’ the following table is obtained.
• This still contains a spurious row. It is not simply possible to decompose the
Agent_Company_Product table
P1
P2
P3
All redundancy has been removed if the natural join P1 & P2 is taken, the result is:-
• Now if this result is joined with P3 over the column ‘Company’ and ‘Product_Name’ the
following table is obtained:
• This is a correct recomposition of the original table and a non loss decomposition into the three
projections was achieved.