KEMBAR78
Unit 3 Notes | PDF | Relational Database | Relational Model
0% found this document useful (0 votes)
716 views39 pages

Unit 3 Notes

The document discusses the relational data model and its key concepts. The relational model represents data using tables (relations) made up of rows (tuples) and columns (attributes). A relation has a relation schema defining its structure and a relation instance containing the data. The relational model provides advantages like simplicity, avoidance of data anomalies, and powerful query capabilities. It also discusses relational constraints, Codd's rules for relational databases, and disadvantages of the relational model like hardware overheads and information islands.

Uploaded by

Jyoti Kumari
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
716 views39 pages

Unit 3 Notes

The document discusses the relational data model and its key concepts. The relational model represents data using tables (relations) made up of rows (tuples) and columns (attributes). A relation has a relation schema defining its structure and a relation instance containing the data. The relational model provides advantages like simplicity, avoidance of data anomalies, and powerful query capabilities. It also discusses relational constraints, Codd's rules for relational databases, and disadvantages of the relational model like hardware overheads and information islands.

Uploaded by

Jyoti Kumari
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 39

lOMoARcPSD|22372339

Unit-3 notes

Database Management System (Guru Gobind Singh Indraprastha University)

Studocu is not sponsored or endorsed by any college or university


Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)
lOMoARcPSD|22372339

Unit-3

Relational Model

The relational data model was developed by Dr. E. F. Codd in 1970.

In the relational data model, unlike the hierarchical and network models, there are no physical links. All
the data is maintained in the form of tables (generally, known as relations) consisting of rows and
columns.

Oracle, Sybase, DB2, Informix, MS-SQL Server are few of the popular relational DBMSs.

The relational model represents both data and the relationships among those data using relation.

A relation is used to represent information about any entity and its relationship with other entities in the
form of attributes (or columns) and tuples (or rows).

A relation comprises of a relation schema and a relation instance.

Relation Schema:- A relation schema consists of a relation name R and a set of attributes (or fields) A 1,
A2, …., An.

It is represented by R (A1,A2,…..,An) and is used to describe a relation R.

R is also called intension of a relation.

Relation Instance: - A relation instance (or relation) r of the relation schema R(A 1,A2,…..,An) is a set of n-
tuples t.

It is denoted by r(R).

A relation instance is an ordered set of attribute values v that belongs to the domain D and it can be
denoted as r = {t1,t2,…,tm}

where t = {v1,v2,…,vn}

r is also called the extension of a relation.

Informal Terms Formal Terms

Table Relation

Column Attribute

Row Tuple

Values in a Column Domain

Table Definition Schema of a Relation

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Populated Table Extension

Characteristics Of Relations

 Ordering of tuples in a relation r(R) : The tuples are not considered to be ordered, even though
they appear to be in the tabular form.

 Ordering of attributes in a relation schema R (and of values within each tuple): We will consider
the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered .

 Values in a tuple: All values are considered atomic (indivisible). A special null value is used to
represent values that are unknown or inapplicable to certain tuples.

Advantages of Relational Model

• Simple: It is a simpler model as it frees the designers from the actual physical data storage
details. This allows them to concentrate on the logical view of the database.

• No anomalies: Unlike hierarchical and network models, relational model does not suffer from
Insert, Delete and Update anomalies.

• Better Query Capability: The relational database model provides very powerful, flexible and
easy-to-use query facilities. It uses fourth-generation language like SQL.

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

• Easier design, implementation, maintenance and usage: Both data as well as structural
independence are provided by the relational model which makes database design, maintenance,
administration and usage much easier than the other models.

Disadvantages of Relational Model

• Hardware Overheads: The relational data models need more powerful computing hardware and
storage devices to perform RDBMS assigned tasks. RDBMS hides the implementation complexities
and the physical data storage details from the users.

• Phenomenon of Information Island : Since relational database systems are easy to implement
and use. This will create a situation where too many people or departments will create their own
databases and applications. These are known as information islands which will prevent the
information integration that is essential for the smooth and efficient functioning of the
organization. These individual databases will also create problems like data inconsistency, data
duplication, data redundancy and so on.

• SQL does not provide an efficient way to browse alphabetically through index. Thus, some systems
cannot provide a simple A-Z browse.

Relational Constraints

The various restrictions on data specified on a relational database in the form of constraints are:-

 Domain Constraint

 Entity Integrity Constraint

 Not Null Constraint

 Referential Integrity Constraint

Domain Constraint:- Domain constraint is a constraint which limits the values on an attribute to some
prescribed set of values.

The data type associated with domains are integer, character, string, date etc.

Every attribute is associated with a domain and can have the value from the corresponding domain.

STUDENT

Rollno Name Class

101 Ajay BCA I

102 Ravi BCA II

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

103 Hari BCA I

104 Mohan 20

 20 is not allowed as class is string attribute.

 The domain constraints are tested every time a new data is added to the database.

Entity Integrity Constraint:- No attribute of a primary key can have a null value. This is because primary
key values are used to identify the individual tuples. This constraint is known as Entity Integrity
Constraint.

Not Null Constraint:- Other attributes of the relation are also not allowed null values even though they
are not members of the primary key.

Referential Integrity Constraint:- This is used to maintain consistency among tuples of the two relations.

It states that “if a foreign key in relation R1 refers to the primary key of relation R2, then every value of
the foreign key in relation R1 must be available in relation R2.”

ACCOUNT

BRANCH

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Distt of relation ACCOUNT is foreign key. Thus Distt Sonipat is not allowed in ACCOUNT relation because
it is not available in BRANCH relation.

Codd Rules

Dr E.F Codd developed the relational data model in 1970.In 1985, Dr Codd published a list of 12 rules
that defines an ideal relational databases.

Rule 0
The system must qualify as relational, as a database, and as a management system.
For a system to qualify as a relational database management system (RDBMS), that system must use its
relational facilities (exclusively) to manage the database.

Rule 1: Information Rule


All data should be presented in table form. The table form gives ease and flexibility to access data.

Rule 2: Guaranteed Access Rule


All data should be accessible without ambiguity. Each unique piece of data (atomic value) should be
accessible by : Table Name + Primary Key (Row) +Attribute (Column)

Rule 3: Comprehensive Data Sub Language Rule


The database must support at least one clearly defined language that includes functionality for data
definition, data manipulation, data integrity and database transaction control like SQL.

Rule 4: View Updating Rule


Data can be presented in different logical combination called views. Each view should support the same
full range of data manipulation that has access to a table available.

Rule 5: High level Insert, Update, Delete


Data can be retrieved from a relational database in sets constructed of data from various row and/or
multiple tables.
This rule states that insert, update, and delete operations should be supported for any retrievable set
rather than just for a single row in a single table.

Rule 6: Physical data independence


Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not
require a change to an application based on the structure.

Rule 7: Logical data independence


Changes to the logical level (tables, columns, rows, and so on) must not require a change to the external
view. Logical data independence is more difficult to achieve than physical data independence.

Rule 8: Integrity independence


Integrity constraints must be specified separately from application programs and stored in the catalog. It
must be possible to change such constraints as and when appropriate without unnecessarily affecting
existing applications.

Rule 9: Non Subversion Rule

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

There should be no way to modify the database structure other than through the multiple row database
language like SQL.

Rule 10: Systematic Treatment of Null Values


A field should be allowed to remain empty. This involves the support of a null value, which is different
from zero value.

Rule 11: Data Description Rule


A data dictionary should be present within RDBMS that is constructed from tables which can be
examined using SQL.

Rule 12: Distribution Independence


The distribution of portions of the database to various locations should be invisible to users of the
database.

Relational Algebra
Relational Algebra is a procedural query language that consists of a set of operations that take one or
more relations as input and result into a new relation as an output.
These operations are divided into two groups:
 Special Relational Operations
 Traditional Set Operations

Special Relational Operations


 Select
 Project
 Rename
 Join
 Division

Traditional Set Operations


 Union
 Intersection
 Difference
 Cartesian Product

Unary Operations:- The operations operating on a single relation are known as unary operations.
In relational algebra select, project and rename are unary operators.

Select Operation: The select operation retrieves all those tuples from a relation that satisfy a specific
condition.
It can also be considered as a filter that displays only those tuples that satisfy a given condition.
The select operation can be viewed as the horizontal subset of a relation.
The Greek letter sigma (σ) is used as a select operator.
In general, the select operation is specified as:
σ <selection_condition> (R)
where,
selection_condition = the condition on the basis of which subset of tuples is selected.
R = the name of the relation.

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

e.g. To retrieve those tuples from BOOK relation whose category is Novel, the select operation can be
specified as:-
σ Category=“Novel” (BOOK)
The comparison operators (=,≠,<,≤,>,≥) can be used to specify the conditions for selecting required
tuples from a relation.
Moreover, more than one condition can be concatenated to one another to give a more specific
condition by using any of the logical operators AND (Λ), OR (V), and NOT (¬).
e.g. To retrieve those tuples from BOOK relation, whose Category is Language Book and Page_count is
greater than 400, the select operation can be specified as:-
σ Category=“Language Book” Λ Page_count > 400 (BOOK)

Project Operation: A relation might consist of a number of attributes, which are not always required.
The project operation is used to select some required attributes from a relation while discarding the
other attributes.
It can be viewed as the vertical subset of a relation.
The Greek letter pi (π) is used as project operator.
In general, the project operation is specified as:-
π <attribute_list> (R)
where
attribute_list = list of required attributes separated by comma
R = the name of relation.
e.g. To retrieve only ISBN, Book_Title and Price attributes from BOOK relation, the project operation can
be specifies as:-
π ISBN, Book_Title, Price (BOOK)
Select and Project operations can also be combined together to create more complex queries.
e.g. To retrieve P_ID, Address, and Phone attributes from PUBLISHER relation where State is Georgia, the
combined operation can be specified as:-
π P_ID, Address, Phone (σ State=“Georgia” (PUBLISHER))

Rename Operation: When relational algebra operations are performed, the resultant relations are
unnamed, hence cannot be used for later reference. In addition, it might be required to apply more
operations on the relation obtained from other operations. In such situations, rename operator proves
to be useful.
Rename operation is used to provide name to the relation obtained after applying any relational algebra
operation.
The Greek letter rho (ρ) is used as a rename operator.

Relation obtained from any operation can be renamed by using rename operator as shown here:
ρ (R, E)
where
ρ = rename operator
E = expression representing relational algebra operations
R = name given to relation obtained by applying relational algebra operations specified in expression E.
e.g. ρ (R1, σ Category=“Novel” (BOOK))

Binary Operations:- The operations operating on two relations are known as binary operations.
The set operations, join operations and division operations are all binary operations.

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Set Operations: The standard mathematical operations on sets, namely, union, intersection, difference,
and Cartesian product are also available in relational algebra. Of these, the three operations, namely,
union, intersection, and difference operations require two relations to be union compatible with each
other.

The two relations are said to be union compatible, if they satisfy these conditions:
a) The degree of both the operand relations must be same.
b) The domain of nth attribute of relation R1, must be same as that of the nth attribute of relation R2.
However, the cartesian product can be defined on any two relations, that is, they need not be union
compatible.

Union Operation: The union operation, denoted by (U), returns a third relation that contains tuples from
both or either of the operand relations.

Consider two relations R1 and R2, their union is denoted as R1 U R2, which returns a relation containing
tuples from both or either of the given relations. The duplicate tuples are automatically removed from
the resultant relation.
The union operation is, respectively commutative and associative in nature, that is,
R1 U R2 = R2 U R1
and R1 U (R2 U R3) = (R1 U R2) U R3

PUBLISHER_1

PUBLISHER_2

PUBLISHER_1 U PUBLISHER_2

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Intersection Operation: The intersection operation, denoted by (∩), returns a third relation that contains
tuples common to both the operand relations.
The intersection of the relations R1 and R2, is denoted by R1 ∩ R2, which returns a relation containing
tuples common to both the given relations.
Like, union operation, intersection operation is, respectively, commutative and associative in nature, that
is,
R1 ∩ R2 = R2 ∩ R1
and R1 ∩ (R2 ∩ R3) = (R1 ∩ R2) ∩ R3

PUBLISHER_1 ∩ PUBLISHER_2

Difference Operation: The difference operation, denoted by –(minus), returns a third relation that
contains all tuples present in one relation, which are not present in the second relation.
The difference of the relations R1 and R2 is denoted by R1 - R2.
The expression R1-R2 results in a relation containing all the tuples from relation R1, which are not
present in relation R2.
Similarly, the expression R2-R1 results in a relation containing all tuples from relation R2, which are not
present in relation R1.
The difference operation is not commutative in nature.

PUBLISHER_1 - PUBLISHER_2

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

PUBLISHER_2 – PUBLISHER_1

Cartesian Product Operation: The cartesian product, also known as cross product or cross join, returns a
third relation that contains all possible combinations of the tuples from the two operand relations.
The cartesian product is denoted by the symbol x. The cartesian product of the relations R1 and R2 is
denoted by R1xR2
The cartesian product creates tuples with the combined attributes of two relations.
Therefore, the degree of a new relation is the sum of the degree of both the operand relations.

AUTHOR_1

BOOK_1

AUTHOR_1 x BOOK_1

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Join Operation: The join operation is one of the most useful and commonly used operations to extract
information from two or more relations.
A join can be defined as a cartesian product followed by select and project operations.
The result of a cartesian product is usually much larger than the result of a join.
The join is denoted by the symbol
The join operation is specified as:
R1 condR2

Equi Join: A common case of the join operation


R1 R2 is one in which the join condition consists only of equality condition.
This type of join where the only comparision operator used is, = is known as equi join or inner join.
The resultant relation of an equijoin always has one or more pairs of attributes that have identical values
in every tuple.

Example: Consider the two relations


employee employee.employee-name=ft-works.employee ft-works

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

• Result:

Outer Join
• The outer-join operation is an extension of the join operation to deal with missing information.
• Outer-join operations avoid loss of information. Outer Joins are classified into three types
namely:
 Left Outer Join
 Right Outer Join
 Full Outer Join

Left Outer Join


The left outer join( ) takes all tuples in the left relation that did not match with any tuple in the right
relation, pads the tuples with null values for all other attributes from the right relation, and adds them to
the result of the natural join.

• Result:

Right Outer Join


• The right outer join is symmetric with the left outer join. It pads tuples from the right relation
that did not match any from the left relation with nulls and adds them to the result of the
natural join.

• Example:

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

• Result:

Full Outer Join


The full outer join( ) does both of those operations, padding tuples from the left relation that did not
match any from the right relation, as well as tuples from the right relation that did not match any
from the left relation, and adding them to the result of the join.

Example:
Result:

Division Operation
The division operator divides a dividend relation A of degree (means number of columns in a relation)
m+n by a divisor relation B of degree n, and produces a resultant relation of degree m. It is denoted by ÷

Relation A

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Relation B

• Case 1

• Case2

• Case3

A÷B

• Case 1

• Case 2

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

• Case 3

RELATIONAL CALCULUS

Relational Calculus is a formal query language where we can write one declarative expression to specify
a retrieval request and hence there is no description of how to retrieve it.

A calculus expression specifies what is to be retrieved rather than how to retrieve it.

Relational Calculus is considered to be non procedural language.

Relational Calculus can be divided into

1. Tuple Relational Calculus

2. Domain Relational Calculus

TUPLE RELATIONAL CALCULUS


Tuple Relational Calculus is a nonprocedural query language.

A query in the Tuple Relational Calculus is expressed as follows

{t | P (t ) }
It is the set of all tuples t such that predicate P is true for t

t r denotes that tuple t is in relation r

P is a formula similar to that of the predicate calculus

A tuple variable is said to be a free variable unless it is quanti.ed by a or .


t ∈ loan ∧ ∃ s ∈ customer(t[branch-name] = s[branch-name])

t is a free variable. Tuple variable s is said to be a bound variable.

A tuple-relational-calculus formula is built up out of atoms. An atom has one of the following forms:

1. s r, where s is a tuple variable and r is a relation

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

2. s[x] u[y], where s and u are tuple variables, x is an attribute on which s is defined, y is an attribute on
which u is defined, and is a comparison operator

3. s[x] c, where s is a tuple variable, x is an attribute on which s is defined, is a comparison operator, and
c is a constant in the domain of attribute x

Rules to built formulas from atoms

 An atom is a formula.
 If P1 is a formula, then so are ¬P1 and (P1).
 If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.

 If P1(s) is a formula containing a free tuple variable s, and r is a relation, then


 ∃ s ∈ r (P1(s)) and ∀ s ∈ r (P1(s)) are also formulae.
Equivalence relation in Tuple relational calculus
 P1 ∧ P2 is equivalent to ¬ (¬ (P1) ∨ ¬ (P2)).

 ∀ t ∈ r (P1(t)) is equivalent to ¬ ∃ t ∈ r (¬P1(t)).

 P1 ⇒ P2 is equivalent to ¬ (P1) ∨ P2.

Banking Example
1. branch (branch_name, branch_city, assets )
2. customer (customer_name, customer_street, customer_city )
3. account (account_number, branch_name, balance )
4. loan (loan_number, branch_name, amount )
5. depositor (customer_name, account_number )
6. borrower (customer_name, loan_number )

Example Queries
1. Find the loan_number, branch_name, and amount for loans of over $1200
{t | t loan t [amount ] 1200}

2. Find the loan number for each loan of an amount greater than $1200
{t | s loan (t [loan_number ] = s [loan_number ] s [amount ] 1200)}
Notice that a relation on schema [loan_number] is implicitly defined by the query

3. Find the names of all customers having a loan, an account, or both at the bank
{t | s borrower ( t [customer_name ] = s [customer_name ]) u depositor ( t [customer_name ] = u
[customer_name ])

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

4. Find the names of all customers who have a loan and an account at the bank
{t | s borrower ( t [customer_name ] = s [customer_name ]) u depositor ( t [customer_name ] = u
[customer_name] )

5. Find the names of all customers having a loan at the Perryridge branch
{t | s borrower (t [customer_name ] = s [customer_name ] u loan (u [branch_name ] = ―Perryridge ‖ u
[loan_number ] = s [loan_number ]))}

Safety of Expressions

 A tuple relational calculus may generate an infinite relation.

 For example, { t | t loan} results in infinitely many tuples that are not in loan relation.

 To guard against the problem, a domain is defined for all tuple relational calculus formula P.

 It is denoted by dom(P). it denotes that P can take value only in that domain.

 An expression {t | P (t )} in the tuple relational calculus is safe if every component of t appears in


one of the relations, tuples, or constants that appear in P

DOMAIN RELATIONAL CALCULUS

 Domain relational calculus is also a nonprocedural query language equivalent in power to the
tuple relational calculus.

 It servers as the theoretical basis of widely used Query By Example (QBE) language.

 Domain relational calculus expression is of the form:


{ x1, x2, …, xn | P (x1, x2, …, xn)}
x1, x2, …, xn represent domain variables.
P represents a formula composed of atoms.
 An atom in Domain relational calculus has one of the following form

1. < x1, x2, . . . , xn > r, where r is a relation on n attributes and x1, x2, . . . , xn are domain variables or
domain constants.
2. x y, where x and y are domain variables and is a comparison operator
3. x c, where x is a domain variable, is a comparison operator, and c is a constant.

Rules to built formulas from atoms

 An atom is a formula.
 If P1 is a formula, then so are ¬P1 and (P1).
 If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

 If P1(x) is a formula in x, where x is a free domain variable, then


∃ x (P1(x)) and ∀ x (P1(x)) are also formulae.

Example Queries

1. Find the loan_number, branch_name, and amount for loans of over $1200
{ l, b, a | l, b, a loan a > 1200}

2. Find the names of all customers who have a loan of over $1200
{ c | l, b, a ( c, l borrower l, b, a loan a > 1200)}
3. Find the names of all customers who have a loan from the Perryridge branch and the loan amount:

{ c, a | l ( c, l borrower b ( l, b, a loan b = ―Perryridge ‖))}


{ c, a | l ( c, l borrower l, “ Perryridge”, a loan)}

Safety of Expressions
The expression: { x1, x2, …, xn | P (x1, x2, …, xn )} is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom (P) (that is, the values appear
either in P or in a tuple of a relation mentioned in P).

2. For every ―there exists‖ subformula of the form x (P1(x)), the subformula is true if and only if there is
a value of x in dom (P1) such that P1(x) is true.

3. For every ―for all‖ subformula of the form x (P1 (x)), the subformula is true if and only if P1(x) is true
for all values x from dom (P1).

Relational Algebra Relational Calculus


It is a procedural method of solving the queries It is a non-procedural method of solving the
queries
We specify the sequence of operations to perform We specify the only what is required without
a particular request bothering about the sequence of operations to
perform that request.
It is prescriptive or rigid in nature i.e, it describes It is descriptive or straightforward in nature i.e,
steps to perform a given task. describe desired result.
It specifies operations performed on existing Operations are directly performed on the relations
relations to obtain new relations. in form of formulas.
It is more closely associated with a programming It is more closely associated with a natural
language. language.
The solution to the database access problem using The solution to the database access problem using
a relational algebra is obtained by stating what is a relational calculus is obtained simply by stating
required and what are the steps to obtain that what is required and letting the system find the
information. answer.
It is used as a vehicle for implementation of Relational Calculus queries are converted into
Relational Calculus. equivalent relational algebra format by using
Codd’s Reduction Algorithm and then it is
implemented with the help of relational algebra

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

operators.
Relational algebra operators are used as a yardstick A language is said to be complete if it is at least as
for measuring the expressive power of any given powerful as the calculus that is, if any relation
language. definable by some expression of the calculus is
also definable by some expression of the language
in question.
The queries are domain independent The queries are domain dependent.

Functional Dependency

An attribute A2 of relation R is said to be functionally dependent on another attribute A1 of the same


relation if and only if there is only one value of A2 corresponding to A1.
A1àA2
Saying that A2 is functionally dependent on A1 is the same to say that A1 determines A2.

Consider a relation ITEM

Here each of the attributes Item_Name, Price, Quantity is functionally dependent on the attributes
Item_Code.

Thus,
Item_Codeà Item_Name
Item_Codeà Price
Item_Codeà Quantity

Types of Functional Delendencies

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Full Functional Dependency

• In a relation R, an attribute or a collection of attributes Y is said to be fully functionally


dependent on another collection of attributes X if it is functionally dependent on whole of X but
not functionally dependent on any subset of X.
• {X-(A)} à Y

Consider the following Relation RESULT

Here, it is assumed that:-


A course can be offered to different students in different terms.
A student can take a course only once.
Thus, the attributes Term & Grade are individually fully functional dependent upon the primary
key(Rollno, Coursecode)
Rollno, CoursecodeàTerm
Rollno, CoursecodeàGrade

The fully functional dependency diagram of relation RESULT can be shown as.

Partial Functional Dependency

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

A functional dependency in which one or more non key attribute are functionally dependent on a part of
the primary key is called partial functional dependency.
e.g. Consider a relation R
ABàCDE
where Primary Key is AB
then
AàC
AàD
AàE
BàC
all are partial dependencies

Consider a relation EMP_PRO

Eno,Pno is the Primary Key

Eno àEname
Pno àPname
So, Partial F.D.
{X-(A)} à Y

Transitive Functional Dependency

When one non key attribute depends upon another non key attribute, then it is called Transitive
Functional Dependency.
e.g. AàB & BàC
then AàC
where B & C are non key attributes

Consider a relation BOOK

Bookno à AuthorName
AuthorName àAuthorNationality
Bookno à AuthorNationality

Multivalued Functional Dependency

Multi valued dependency is the dependency where one attribute value is potentially a multivalued fact
about another.

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

MVD can be defined as:-


MVD occurs when two or more independent multivalued facts about the same attribute occur within the
same table. It means that if in a relation R having A, B and C as attributes B & C are multivalue facts
about A which is represented as
Aà àB and
AààC
then multivalue dependency exist only if B & C are independent of each other.
In order for a table to contain MVD it must have three or more attributes.
Consider the following COURSE_STUDENT_BOOK Table

The above table contains a multivalued dependency as shown below:-

CourseààStudentName
CourseààTextBook
And StudentName and TextBook are independent of each other

Given a set of functional dependencies F, there are certain other functional dependencies that are
logically implied by F.

The set of all functional dependencies logically implied by F is the closure of F.

We denote the closure of F by F+.

We can find all F+ by applying Armstrong’s Axioms:

Armstrong Axioms/Inference Rules for Functional Dependencies:

R1: Reflexive Rule


If A is a set of attributes such that B C A then AàB holds true

R2: Augumentation Rule


If AàB

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Then ACàBC holds true

R3: Transitive Rule


If AàB & BàC
Then AàC holds true

R4: Union/Additive Rule


If AàB & AàC
Then AàBC holds true

R5: Decomposition Rule


If AàBC then
AàB and AàC holds true

R6: Pseudotransitive Rule


If AàB, YBàC then YAàC holds true
Ques: Given a set of FD’s as F

AàB, ABCDàE, EFàG

Find whether ACDFàG

Solution: Given AàB

ABCDàE

Then ACDàE {By Pseudotransitivity Rule}

So, ACDFàEF {By Augmentation Rule}

ACDFàEF, EFàG

So ACDFàG {By Transitivity Rule}

Ques: Given R={A,B,C,G,H,I} and Set of FD F={Aàb, AàC, CGàH, CGàI, BàH}

Compute AG+

Solution: For AàB AG+= {A, B, G}

For AàC AG+= {A, B, C, G}

For CGàH AG+= {A, B, C, G, H}

For CGàI AG+= {A, B, C, G, H, I}

For BàH AG+= {A, B, C, G, H, I}

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

So, AG+= {A, B, C, G, H, I}

Ques: Given R={A, B, C, D, E, E, G}

FD’s F={AàB, BCàDE, AEFàG}

Find AC+

Solution: For AàB AC+= {A, B, C}

For BCàDE AC+= {A, B, C, D, E}

For AEFàG AC+= {A, B, C, D, E}

So, AC+= {A, B, C, D, E}

Ques: Given AàBC, EàCF, BàE, CàEF

Compute AB+

Solution: For AàBC AB+= {A, B, C}

For EàCF AB+= {A, B, C}

For CàEF AB+= {A, B, C, E, F}

So, AB+= {A, B, C, E, F}

Ques: For the following FDs find Super Key and Candidate Key

F={ ABCDàE, ABàG, BàF, CàJ, CJàI, GàH}

Solution:

Super Key: {A, B, C, D, J, G}

Candidate Key:

{A, B, C, D, J, G }

{A, B, C, D, J}

{A, B, C, D} is irreducible and Candidate Key

Ques: For the following FDs find Super Key and Candidate Key

F= {AàB, BCàD, BCàE, AEFàG, BàG}

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Solution:

Super Key: {A, B, C, E, F}

Candidate Key:

{A, B, C, E, F}

{A, C, E, F}

{A, C, F} is irreducible and Candidate Key

Ques: What is a Closure of a set of Functional dependencies? Determine the closure of the following
given FD’s:

A CD
B C
F DE
F A

Solution
Rule of Decomposition
AàC
AàD
FàD
FàE

Rule of Union
FàADE

Rule of Reflexivity
AàA
BàB
CàC
DàD
EàE
FàF

Rule of Union
AàACD
BàBC
FàFADE

Closure:-
F+= F U {AàA, BàB, CàC, DàD, EàE, FàF, AàC, AàD, FàD, FàE, FàADE, AàACD, BàBC,
FàFADE}

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Normalization

Normalization is the process that removes the following undesirable properties in a relation:-
1. Repetition of information i.e, Redundancy
2. Inability to store some information i.e, insertion anomaly.
3. Loss of information in deletion i.e, deletion anomaly.
4. Problem in updations i.e, updation anomaly

Need/Purpose of Normalization

• Minimize redundancy in data.

• Remove insert, delete and update anomaly during database activities.

• Reduce the need to reorganize data when it is modified or enhanced.

• Normalization reduces a complex user view to a set of small and stable subgroups
of fields/relations.

• This process helps to design a logical data model known as conceptual data model.

Types of Normal Forms

• First Normal Form (1 NF)

• Second Normal Form (2 NF)

• Third Normal Form (3 NF)

• Boyce-Codd Normal Form (BCNF)

• Fourth Normal Form (4 NF)

• Fifth Normal Form (5 NF)

First Normal Form (1 NF)

A relation is said to be in First Normal Form(1 NF) if and only if every entry of the relation has atmost a
single value.

In other words, “A relation is in First Normal Form if and if and only if all underlying domains contain
atomic values or single value only.”

Consider the following STUDENT (UnnormalizedTable)

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Tables with multivalve entries are called unnormalized tables.

In general there are two basic approaches to normalize tables:-


 Flattening the table
 Decomposition of the table

Flattening the table

The first approach known as flattening the table removes repeating groups by filling in the missing
entries of each incomplete row of the table with copies of their corresponding non-repeating attributes.

STUDENT(Normalized Table)

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

In normalized STUDENT table the attribute Course_code no longer identifies uniquely any row. Thus, a
suitable primary key for this table is the composite key(Course_code, Roll_no)

Decomposition of the table

The second approach for normalizing a table requires that the table be decomposed into two new tables
that will replace the original table.

Rule of Decomposition
One of the two tables contains the table identifier of the original table and all the non-repeating
attributes.
The other table contains a copy of the table identifier and all the repeating attributes.

COURSE Table

COURSE_STUDENT Table

Anomalies in 1NF
Insertion Anomaly:
We cannot insert the information about the student until he/she join any course
e.g. we cannot store the information about Roll_no 110 until he join any course, similarly we are unable
to store the information about the course until there is a student who enroll in that course.
e.g. we cannot store C1 course is of visual basic until atleast one student join the course.

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Deletion Anomaly:
If we delete the last tuple of the relation then we are not only deleting the course information but also
lose information about the system on which this student works.

Update Anomaly:
This relation is also susceptible to update anomalies because the course in which a student studies may
appear many times in the table. It is this redundancy of information that causes the anomaly because if a
teacher moves to another course, we are now faced with two problems; we either search the entire
table looking for that teacher and update his or her Course_code value or we miss one or more tuples of
that STUDENT and end up with an inconsistent database.

Second Normal Form (2 NF)

A Relation R is in Second Normal Form (2NF) if and only if it is in 1 NF and every non key attribute is fully
functionally dependent on the primary key.

A resultant database of first normal form COURSE_STUDENT does not satisfy the above rule, because
non-key attributes Name, System_Used and Hourly_Rate are not fully dependent on the primary
key(Course_code, Roll_No) because Name, System_Used and Hourly_Rate are functionally dependent on
Roll_No and Roll_No is a subset of the primary key.

Now break the COURSE_STUDENT table to two tables

HOURS_ASSIGNED Table

STUDENT_SYSTEM_CHARGE Table

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Anomalies in 2NF
Insertion Anomaly: Insertion anomalies occur in the STUDENT_SYSTEM_CHARGE relation.
e.g. Consider a situation where we would like to set in advance the rate to be charged from the students
for a particular system. We cannot insert this information until there is a student assigned to that type
of system.
Suppose we want to store the hourly_rate of laptop we cannot insert it until some student use that type
of system because roll_no is primary key and we cannot insert null into it.

Deletion Anomaly:
The STUDENT_SYSTEM_CHARGE relation is also susceptible to deletion anomalies. This type of anomaly
occurs whenever we delete the tuple of a student who happens to be the only student left which is
working on a particular system. In this case we will also lose the information about the rate that we
charge for that particular system.
e.g. Rollno 102

Updation Anomaly: Update anomalies will occur in the STUDENT_SYSTEM_CHARGE relation because
there may be several type of the system. If the hourly_rate for that particular system changes, we need
to make sure that the corresponding rate is changed for all students that work on that type of system.
Otherwise the database may end up in an inconsistent state.

3 Normal Form (3NF)

A relation R is in Third Normal Form (3NF) if and only if the following conditions are satisfied
simultaneously :
• R is already in 2NF
• No non prime(non key) attribute is transitively dependent on the key.
or
No non prime attribute functionally determine any other non prime attribute.

The objective of transforming relations into 3 NF is to remove all transitive dependencies.

We can determine that STUDENT_SYSTEM_CHARGE relation is not in 3 NF by noticing that


System_UsedàHourly_Rate, here both attributes are non-prime.

To transform this relation into 3 NF we will decompose it into the following two tables:-

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

STUDENT_SYSTEM Table

CHARGES Table

Data Anomalies in 3 NF

The 3 NF helped us to get rid of the data anomalies caused either by transitive dependencies on the
primary key or by dependencies of a non prime attribute. However relations in 3NF are still susceptible
to data anomalies particularly when the relations have two overlapping candidate keys or when a non
prime attribute functionally determine a prime attribute.

Boyce-Codd Normal Form (BCNF)


BCNF states that:-
A relation R is in Boyce-Codd normal form if and only if every determinant is a candidate key.
Here, determinant is a simple attribute or composite attribute on which some other attribute is fully
functionally dependent.

Consider the Manufacturer relation:-

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

FD of the above relation is:-

(Id_No, Item_No)àQuantity
(Name, Item_No)àQuantity
Id_NoàName
Nameà Id_No

The relation has two overlapping candidate keys, because there are two composite candidate
keys(Id_No, Item_No) and (Name, Item_No) out of which Item_No is common attribute in both the
candidate keys, so this is a case of overlapping of candidate keys.

In the above relation, there is only one non key attribute i.e, Quantity and it is FFD and non transitively
dependent on the primary key.
But manufacturer relation is not in BCNF because this relation has four determinants.
 Id_No, Item_No [Quantity depends on this combination]
 Name, Item_No [Quantity depends on this combination]
 Id_No [Name depends on Id_No]
 Name [Id_No depends on Name]

Out of these four determinants(Id_No, Item_no) and (Name, Item_no) are unique but Id_No and Name
determinants are not candidate keys.
In order to make this relation in BCNF we non-loss decompose this relation in two projections.

ID_NAME Relation

D_QTY Relation

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

FOURTH Normal Form(4NF)

A Relation R is in 4 NF if and only if the following conditions are satisfied simultaneously:

• R is already in 3NF or BCNF

• No multivalued dependency should be there.

Consider the following COURSE_STUDENT_BOOK Table:

The above database is in First, Second and Third normal form because for each row-column interaction
we have at-most single entry and primary key is the combination of three columns(Course,
Student_name, Text_book). So, it does not have any non-key attribute. It satisfies second and third
normal form because it only refers to non key attributes. The relation is also in BCNF, since all three
attributes concatenated together constitute its key, yet it clearly contained anomalies and requires
decomposition with the help of fourth normal form.

Rule to transform into Fourth normal form

A Relation R having A,B, and C as attributes can be non loss decomposed into two projections R1(A,B)
and R2(A,C) if and only if the MVD AààB

and

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

AààC hold in R

Looking again at the COURSE_STUDENT_BOOK table, it contains a multi-valued dependency as shown


below:

CourseààStudent_Name

CourseààText_book

To put it in 4NF, two separate tables are formed as shown below:

COURSE_STUDENT Table

COURSE_BOOK Table

Now, We can easily check that all the above anomalies of STUDENT_COURSE_BOOK database are
removed.

e.g. if a new student joins a course then we have to make only one insertion in COURSE_STUDENT table
and if a new book introduced for a course then again we have to make a single entry in COURSE_BOOK
table, so modified database eliminate the problem of redundancy which also solves the update problem.

Fifth Normal Form (5NF)

• A Relation R is in fifth normal form if and only if the following conditions are satisfied
simultaneously:-

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

• R is already in 4 NF

• It cannot be further non-loss decomposed.

• Consider the table

AGENT_COMPANY_PRODUCT

• The table is in 4 NF because it contains no multivalued dependency.

• However, contain an element of redundancy in that it records the fact that Suneet is an agent for
ABC twice.

• Suppose that the table is decomposed into its two projections P1 and P2.

P1

P2

The redundancy has been eliminated but the information about which companies make which products.
The natural join of these projections over Agent column is:

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

• The table resulting from this join is spurious since the asterisked row of the table contains
incorrect information.

• Now the original table is decomposed into three tables the two projections P1 & P2 which have
already shown and the final possible projection P3

• If a join is taken of all three projections, first P1 & P2 and then this result with P3 over ‘Company’
and ‘Product_Name’ the following table is obtained.

• This still contains a spurious row. It is not simply possible to decompose the
Agent_Company_Product table

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

Now consider a different case

The table can be decomposed into its three projections:-

P1

P2

P3

All redundancy has been removed if the natural join P1 & P2 is taken, the result is:-

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)


lOMoARcPSD|22372339

• The spurious row as asterisked.

• Now if this result is joined with P3 over the column ‘Company’ and ‘Product_Name’ the
following table is obtained:

• This is a correct recomposition of the original table and a non loss decomposition into the three
projections was achieved.

Downloaded by Jyoti Kumari (jyotikumari19110@gmail.com)

You might also like