UNIT 7: RELATIONAL DATABASE DESIGN
INFORMAL DESIGN GUIDELINES FOR RELATIONAL SCHEMAS
Informal guidelines that may be used as measures to determine the quality of relation schema
design are listed below:
• Making sure that the semantics of the attributes is clear in the schema
• Reducing the redundant information in tuples
• Reducing the NULL values in tuples
• Disallowing the possibility of generating spurious tuples
Making sure that the semantics of the attributes is clear in the schema
Whenever we are going to form relational schema there should be some meaning among the
attributes. This meaning is called semantics. This semantics relates one attribute to another with
some relation.
Guideline:
• Informally, each tuple in a relation should represent one entity or relationship instance.
• Attributes of different entities (relations) should not be mixed in the same relation
• Only foreign keys should be used to refer to other entities
• Entity and relationship attributes should be kept apart as much as possible
Reducing the redundant information in tuples
If a table containing attributes of multiple entities may cause redundant information problems.
Information is stored redundantly wasting storage. Problems with update anomalies
• Insertion anomalies
• Deletion anomalies
• Modification anomalies
We can reduce redundant information problem by using normalization. Normalization is the
process of reducing a single table into a multiple simple table.
Example: let’s take two tables,
Student Department
Sid Sname Semester Dept_no Dept_name Address
If we integrate these two relations into a single table i.e. Student Info.
Sid Sname Semester Dept_no Dept_name Address
• Here whenever if we insert the tuples there may be N students in one department, so
Dept_no, Dept_name, Address values are repeated N times which leads to data
redundancy.
• Another problem is update anomalies i.e. if we insert new department that has no
students.
• If we delete the last student of a department, then whole information about the
department will be deleted
1
• If we change the value of one of the attributes of a particular table then we must update
the tuples of all the students belonging to that department else database will become
inconsistent.
So, design the base relation schemas so that no insertion, deletion, or modification anomalies are
present in the relations. If any anomalies are present, note them clearly and make sure that the
programs that update the database will operate correctly.
Reducing Null values in Tuples
In some schema designs we may group many attributes together into a “fat” relation. If many of the
attributes do not apply to all tuples in the relation, we end up with many NULLs in those tuples.
This can waste space at the storage level and may also lead to problems with understanding the
meaning of the attributes and with specifying JOIN operations at the logical level.
Another problem with NULLs is how to account for them when aggregate operations such as
COUNT or SUM are applied. SELECT and JOIN operations involve comparisons; if NULL values
are present, the results may become unpredictable. Reasons for nulls:
• attribute not applicable or invalid
• attribute value unknown (may exist)
• value known to exist, but unavailable
As far as possible, avoid placing attributes in a base relation whose values may frequently be
NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not
apply to a majority of tuples in the relation. Using space efficiently and avoiding joins with NULL
values are the two overriding criteria that determine whether to include the columns that may have
NULLs in a relation or to have a separate relation for those columns (with the appropriate key
columns). For example, if only 15 percent of employees have individual offices, there is little
justification for including an attribute Office_number in the EMPLOYEE relation; rather, a relation
EMP_OFFICES (Essn, Office_number) can be created to include tuples for only the employees with
individual offices.
Relations should be designed such that their tuples will have as few NULL values as possible.
Attributes that are NULL frequently could be placed in separate relations (with the primary key)
Disallowing the possibility of generating spurious tuples
Bad designs for a relational database may result in erroneous results for certain JOIN Operations
the "lossless join" property is used to guarantee meaningful results for join operations. The relations
should be designed to satisfy the lossless join condition. No spurious tuples should be generated by
doing a natural-join of any relations.
Design relation schemas so that they can be joined with equality conditions on attributes that are
appropriately related (primary key, foreign key) pairs in a way that guarantees that no spurious
tuples are generated. Avoid relations that contain matching attributes that are not (foreign key,
primary key) combinations because joining on such attributes may produce spurious tuples.
PROBLEMS WITHOUT NORMALIZATION
If a table is not properly normalized and has data redundancy then it will not only eat up extra
memory space but will also make it difficult to handle and update the database, without facing data
loss. Insertion, updating and Deletion Anomalies are very frequent if database is not normalized. To
understand these anomalies let us take an example of a student table.
2
Student
Sid Sname Branch HOD Office Phone
100 Abin Pokhara branch Mr. Nabin 4434564
101 Aarav Pokhara branch Mr. Nabin 4434564
102 Ashana Pokhara branch Mr. Nabin 4434564
103 Geeta Pokhara branch Mr. Nabin 4434564
104 Umesh Banepa branch Mr. Prabin 5212343
105 Ramesh Banepa branch Mr. Prabin 5212343
In the table above, we have data of 6 Computer Sci. students. As we can see, data for the fields
branch, HOD and Office_Phone is repeated for the students who are in the same branch in the
college, this is Data Redundancy.
Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a branch, data of the student
cannot be inserted, or else we will have to set the branch information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information will be
repeated for all those 100 students. These scenarios are nothing but Insertion anomalies.
Update Anomaly
What if Mr. Prabin leaves the college? Or is no longer the HOD of computer science department? In
that case all the student records will have to be updated, and if by mistake we miss any record, it
will lead to data inconsistency. This is Updating anomaly.
Deletion Anomaly
In our Student table, two different information are kept together, Student information and Branch
information. Hence, at the end of the academic year, if student records are deleted, we will also lose
the branch information. This is Deletion anomaly.
FUNCTIONAL DEPENDENCIES
Functional dependency plays a key role in differentiating good database design from bad database
design. Functional dependency is a relationship that exists when one attribute uniquely determines
another attribute. If R is a relation with attributes X and Y, a functional dependency between the
attributes is represented as X→Y, which specifies Y is functionally dependent on X. Here X is a
determinant set and Y is a dependent attribute. Each value of X is associated with precisely one Y
value.
Functional dependency in a database serves as a constraint between two sets of attributes. Defining
functional dependency is an important part of relational database design and contributes to aspect
normalization.
For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name, Stu_Age. Here
Stu_Id attribute uniquely identifies the Stu_Name attribute of student table because if we know the
student id we can tell the student name associated with it. This is known as functional dependency
and can be written as Stu_Id→Stu_Name or in words we can say Stu_Name is functionally
dependent on Stu_Id.
3
TYPES OF FUNCTIONAL DEPENDENCY
There are many types of functional dependencies, depending on several criteria
• Trivial and Non-trivial dependencies
• Fully functional dependency
• Partial functional dependency
• Transitive dependency
• Multivalued dependency
Trivial functional dependency
The dependency of an attribute on a set of attributes is known as trivial functional dependency if
the set of attributes includes that attribute. Symbolically: A ->B is trivial functional dependency if B
is a subset of A. The following dependencies are also trivial:
• A→A
• B→B
For example: Consider a table with two columns Student_id and Student_Name. {Student_Id,
Student_Name} →S tudent_Id is a trivial functional dependency as Student_Id is a subset of
{Student_Id, Student_Name}. That makes sense because if we know the values of Student_Id and
Student_Name then the value of Student_Id can be uniquely determined. Also,
Student_Id→Student_Id & Student_Name→Student_Name are trivial dependencies too.
Non trivial functional dependency
If a functional dependency X→Y holds true where Y is not a subset of X then this dependency is
called non trivial Functional dependency
For example
An employee table with three attributes: {emp_id, emp_name, emp_address}
The following functional dependencies are non-trivial:
emp_id → emp_name (emp_name is not a subset of emp_id)
emp_id → emp_address (emp_address is not a subset of emp_id)
Fully Functional Dependency
An attribute is fully functionally dependent on a second attribute if and only if it is functionally
dependent on the second attribute but not on any subset of the second attribute. Mathematically, for
a relation schema R, in a functional dependency X→Y, Y is said to be fully functional dependent on
X if Z→Y is false for all Z X.
Partial Functional Dependency
An attribute is partial functionally dependent on a second attribute if and only if it is functionally
dependent on the second attribute and also dependency occur on any subset of the second attribute.
This is the situation that exists if it is necessary to only use a subset of the attributes of the composite
determinant to identify its object. Mathematically, for a relation schema R, in a functional
dependency X→Y, Y is said to be partial functional dependent on X if by removal of some attributes
from X and the dependency still holds.
Example: The dependency {Emp-Id, Projecta-No} → {Emp-Name} is partial because Emp-Id→Emp-
Name also holds.
4
Transitive dependency
A functional dependency is said to be transitive if it is indirectly formed by two functional
dependencies. For example, X →Z is a transitive dependency if the following three functional
dependencies hold true:
• X→Y
• Y→X
• Y→Z
Example: Let’s take an example to understand it better
Movie_ID Listing_ID Listing_Type DVD_Price ($)
M08 L09 Crime 180
M03 L05 Drama 250
M05 L09 Crime 180
Movie_ID →Listing_ID
Listing_ID →Listing_Type
Also, Movie_ID →Listing_Type
Multivalued dependency
Multivalued dependency occurs when there are more than one independent multivalued attributes
in a table. For example: Consider a bike manufacture company, which produces two colors (Black
and white) in each model every year.
Bike_model Manuf_year Color
M1001 2007 Black
M1001 2007 Red
M2012 2008 Black
M2012 2008 Red
M2222 2009 Black
M2222 2009 Red
Here columns manuf_year and color are independent of each other and dependent on bike_model.
In this case these two columns are said to be multivalued dependent on bike_model. These
dependencies can be represented like this:
bike_model ->> manuf_year
INTRODUCTION TO NORMALIZATION
When developing the schema of a relational database, one of the most important aspects to be taken
into account is to ensure that the duplication is minimized. This is done for two purposes:
• Reducing the amount of storage needed to store the data.
• Avoiding unnecessary data conflicts that may creep in because of multiple copies of the
same data getting stored.
Normalization is the process of minimizing redundancy from a relation or set of relations. Redundancy
in relation may cause insertion, deletion and update anomalies. So, it helps to minimize the
redundancy in relations. Normal forms are used to eliminate or reduce redundancy in database tables.
5
Normalization is a database design technique which organizes tables in a manner that reduces
redundancy and dependency of data. It divides larger tables to smaller tables and links them using
relationships.
TYPES OF NORMALIZATION
The degree of normalization is defined by normal forms. The normal forms in an increasing level of
normalization are first normal form (1NF), second normal form (2NF), 3NF, Boyce-Codd normal
form, 4NF and 5NF. Each normal form is a set of conditions on a schema that guarantees certain
properties relating to redundancy and update anomalies. In general 3NF is considered good
enough. In certain instances, a lower level of normalization, that is the instance where queries take
enormous time to execute.
There are various database “Normal” forms. Each normal form has an importance which helps in
optimizing the database to save storage and to reduce redundancies.
Un-normalized
(UDF)
Remove repeating
Groups
First Normal form
(1NF)
Remove partial
Dependencies
Second Normal form
(2NF)
Remove transitive
Dependencies
Third Normal form
(3NF)
Remove remaining
Functional dependency
Boyce Code Normal form anomalies
BCNF
Remove multi-valued
Fourth Normal form
Dependency
(4NF)
Remove remaining
Fifth Normal form
Anomalies
(5NF)
NORMAL FORMS BASED ON PRIMARY KEYS
The normal forms first, second and third are called primary key based normal forms. Here each of
the non-prime attributes of the relation depends upon the prime attribute of the relation.
First Normal Form
If a relation contain composite or multi-valued attribute, it violates first normal form or a relation is
in first normal form if it does not contain any composite or multi-valued attribute.
Simply, A relation R is in 1 NF if it holds following two properties;
• If R has no multivalued attributes
• If R has no composite attributes i.e. it must have simple attributes
6
Example: Relation Student in table below is not in 1NF because of multi-valued attribute Phone. Its
decomposition into 1NF has been shown in table 2.
Sid Sname Phone State Age
1 Ram 9848372632, State 1 23
8905454342
2 Hari 9851093482 State 2 33
3 Ramesh 9848372098 State 1 22
9851123032
4 Anand 9804094302 State 3 29
Now convert given relation into 1NF as below,
Student
Sid Sname State Age
1 Ram State 1 23
2 Hari State 2 33
3 Ramesh State 1 22
4 Anand State 3 29
Phone_detail
Sid Phone
1 9848372632
1 8905454342
2 9851093482
3 9848372098
3 9851123032
4 9804094302
Second normal form (2NF)
A relation R (table) is said to be in 2NF if both the following conditions hold:
• Table is in 1NF (First normal form)
• No non-prime attributes of R fully dependents upon the proper subset of any candidate
key of table.
An attribute that is not part of any candidate key is known as non-prime attribute. Simply a relation
is in 2 NF if there are no partial dependencies between the primary key and non-prime keys of
given relation R.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. They
create a table that looks like this: Since a teacher can teach more than one subjects, the table can
have multiple rows for a same teacher.
7
Teacher
Tid Tsubject Tage
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {Tid, Tsubject}
Non prime attribute: Tage
The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because non
prime attribute Tage is dependent on Tid alone which is a proper subset of candidate key. This
violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset
of any candidate key of the table”.
To make the table complies with 2NF we can break it in two tables like this:
teacher_details table
Tid Tage
111 38
222 38
333 40
teacher_subject table
Tid Tsubject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
Third Normal form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
• Table must be in 2NF
• No transitive functional dependency between the attributes of given table.
In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each
functional dependency X→Y at least one of the following conditions hold:
• X is a super key of table
• Y is a prime attribute of table
Example: Suppose a company wants to store the complete address of each employee, they create a
table named employee_details that looks like this:
8
emp_id emp_name emp_zip emp_state emp_city emp_district
1001 John 282005 State 1 Dhankuta Jhapa
1002 Ajeet 222008 State 2 Janakpur Sarlahi
1006 Lora 282007 State 2 Janakpur Sarlahi
1101 Lilly 292008 State 3 Kathmandu Ktm
1201 Steve 222999 State 7 Dhangadhi Kailali
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any
candidate keys.
Here, emp_state, emp_city & emp_district dependent on emp_zip. And, emp_zip is dependent on
emp_id that makes non-prime attributes (emp_state, emp_city & emp_district) transitively
dependent on super key (emp_id). This violates the rule of 3NF. To make this table complies with
3NF we have to break the table into two tables to remove the transitive dependency:
Employee table
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
Employee_zip table
emp_zip emp_state emp_city emp_district
282005 State 1 Dhankuta Jhapa
222008 State 2 Janakpur Sarlahi
282007 State 2 Janakpur Sarlahi
292008 State 3 Kathmandu Ktm
222999 State 7 Dhangadhi Kailali
Boyce Codd normal form (BCNF)
It is an advance version of 3NF that’s why it is also referred as 3.5NF. BCNF is stricter than 3NF. A
table complies with BCNF if it is in 3NF and for every functional dependency X→Y, X should be the
super key of the table.
Mathematically,
A relation R is said to be in BCNF with respect to all the functional dependencies of the form α→β,
where α⊆ R and β ⊆ R, if at least one of the following holds:
• α → β is a trivial functional dependency (that is, β ⊆ α).
• α is a candidate key for schema R.
Example: Suppose there is a company wherein employees work in more than one department.
They store the data like this:
9
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Nepalese Production and planning D001 200
1001 Nepalese Store department D001 250
1002 Indian Technical support D134 100
1002 Indian Purchasing department D134 600
Functional dependencies in the above table are:
emp_id→emp_nationality
emp_dept→{dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone are keys.
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table
emp_id emp_nationality
1001 Nepalese
1002 Indian
emp_dept_mapping table
emp_id emp_dept
1001 Production and planning
1001 Store department
1002 Technical support
1002 Purchasing department
emp_dept table
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
Store department D001 250
Technical support D134 100
Purchasing department D134 600
Functional dependencies:
emp_id → emp_nationality
emp_dept → {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both the functional dependencies left side part is a key.
10
Differences between 3NF and BCNF
3NF BCNF
A table or a relation is considered to be in BCNF is considered to be the stronger
third normal form only if the table is than 3NF. The relation R to be in BCNF
already in 2NF and there is no non-prime must be in 3NF. And wherever a non-
attribute transitively dependent on the trivial functional dependency A→B
candidate key of a relation. holds in relation R, then A must be a
super key of relation R.
No non-prime attribute must be transitively For any trivial dependency in a relation
dependent on the Candidate key. R say X→Y, X should be a super key of
relation R.
3NF can be obtained without sacrificing all Dependencies may not be preserved in
dependencies. BCNF.
Lossless decomposition can be achieved in Lossless decomposition is hard to
3NF. achieve in BCNF.
Fourth Normal Form
A table R is in fourth normal form (4NF) if and only if it satisfied following two conditions
simultaneously:
• R is already in BCNF or in 3NF
• If it contains no multi-valued attributes
Equivalently, we can that a relation schema R is in 4NF if, for every nontrivial multi-valued
dependency X —>> Y, X is a super key for R. The goal of fourth normal form is to eliminate
nontrivial multi-valued dependencies from a table by projecting them onto separate smaller tables,
thus eliminating the update anomalies associated with the multi-valued dependencies.
Example 1: Consider the following relation subject which has the attributes course, instructor, and
textbook.
Subject (course, instructor, Text-book)
Course Instructor Text-Book
CS1 Indra DBMS
CS1 Bhupi DBMS
CS2 Arjun NM
CS2 Bhupendra NM
CS3 Indra CG
CS3 Bhupi CG
CS3 Arjun CG
In this relation for each course there are many instructors and also for each course there are many
text-books. But instructor and text-book are independent of each other. Thus, the given relation
Subject is not in fourth normal form because of multi-valued dependency between attributes.
Course —>> instructor
Course —>> textbooks
11
The relation subject can be converted to fourth normal form by splitting the relation Subject into
two relations Teacher and Text as below
Teacher (Course, Instructor)
Text (Course, Textbook)
Example 2: Consider the relation Employee with the attributes employee number, project name,
and department name as shown below:
Employee (Eno, Pname, Dname)
Where, Eno stands for Employee number, Pname for project name, and Dname for department
name. The relation Employee has the following multi-valued dependencies:
Eno —>> Pname
Eno —>> Dname
Since, Eno is not the super key of the relation Employee so it is not in 4NF. To convert the relation to
4 NF, we can decompose Employee relation into two relations Emp-Proj and Emp- Dept as shown
below.
Emp-proj (Eno, Pname)
Emp-dept (Eno, Dname)
What is decomposition?
Decomposition is the process of breaking down in parts or elements. It replaces a relation with a
collection of smaller relations. It breaks the table into multiple tables in a database. It should always
be lossless, because it confirms that the information in the original relation can be accurately
reconstructed based on the decomposed relations. If there is no proper decomposition of the
relation, then it may lead to problems like loss of information.
Properties of Relational Decomposition
Following are the properties of Decomposition,
1. Lossless Decomposition
2. Dependency Preservation
3. Lack of Data Redundancy
Lossless Decomposition
Decomposition must be lossless. It means that the information should not get lost from the relation
that is decomposed. It gives a guarantee that the join will result in the same relation as it was
decomposed.
Example: Let's take 'E' is the Relational Schema, with instance 'e'; is decomposed into: E1, E2, E3 . . .
En; with instance: e1, e2, e3 . . . en, If e1 ⋈ e2 ⋈ e3 . . . . ⋈ en, then it is called as 'Lossless Join
Decomposition'.
In the above example, it means that, if natural joins of all the decomposition give the original
relation, then it is said to be lossless join decomposition.
12
Example: Employee_Department
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Ktm 20000 D001 Finance
E002 PQR 30 Ktm 30000 D002 Production
E003 LMN 25 Banepa 5000 D003 Sales
E004 XYZ 24 Banepa 4000 D004 Marketing
E005 STU 32 Pokhara 25000 D005 Human Resource
Decompose the above relation into two relations to check whether decomposition is lossless or
lossy.
Now, we have decomposed the relation that is Employee and Department.
Relation 1: Employee
Eid Ename Age City Salary
E001 ABC 29 Ktm 20000
E002 PQR 30 Ktm 30000
E003 LMN 25 Banepa 5000
E004 XYZ 24 Banepa 4000
E005 STU 32 Pokhara 25000
Employee Schema contains (Eid, Ename, Age, City, Salary)
Relation 2: Department
Deptid Eid DeptName
D001 E001 Finance
D002 E002 Production
D003 E003 Sales
D004 E004 Marketing
D005 E005 Human Resource
• Department Schema contains (Deptid, Eid, DeptName).
• So, the above decomposition is a Lossless Join Decomposition, because the two relations
contain one common field that is 'Eid' and therefore join is possible.
• Now apply natural join on the decomposed relations.
Employee ⋈ Department
Eid Ename Age City Salary Deptid DeptName
E001 ABC 29 Ktm 20000 D001 Finance
E002 PQR 30 Ktm 30000 D002 Production
E003 LMN 25 Banepa 5000 D003 Sales
E004 XYZ 24 Banepa 4000 D004 Marketing
E005 STU 32 Pokhara 25000 D005 Human Resource
Hence, the decomposition is Lossless Join Decomposition.
13
If the Employee table contains (Eid, Ename, Age, City, Salary) and Department table contains
(Deptid and DeptName), then it is not possible to join the two tables or relations, because there is no
common column between them. And it becomes Lossy Join Decomposition.
Dependency Preservation
• Dependency is an important constraint on the database.
• Every dependency must be satisfied by at least one decomposed table.
• If {A → B} holds, then two sets are functional dependent. And, it becomes more useful
for checking the dependency easily if both sets in a same relation.
• This decomposition property can only be done by maintaining the functional
dependency.
• In this property, it allows to check the updates without computing the natural join of the
database structure.
Lack of Data Redundancy
• Lack of Data Redundancy is also known as a Repetition of Information.
• The proper decomposition should not suffer from any data redundancy.
• The careless decomposition may cause a problem with the data.
• The lack of data redundancy property may be achieved by Normalization process.
FUNCTIONAL DEPENDENCY INFERENCE RULES (ARMSTRONG’S AXIOMS)
Reflexivity
If Y ⊆ X then X→Y. The axiom of reflexivity indicates that given a set of attributes the set itself
functionally determines any of its own subsets.
Augmentation
If X→Y and Z is a subset of table R (i.e., Z is any set of attributes in R), then XZ→YZ. The axiom of
augmentation indicates that we can augment the left side of the functional dependency or both
sides conveniently with one or more attributes. The axiom does not allow augmenting the right-
hand side alone. The augmentation rule can be represented as If X→Y then XZ→Y
Transitivity
If X→Y and Y→Z then X→Z. The axiom of transitivity indicates that if one attribute uniquely
determines a second attribute and this, in turn, uniquely determines a third one, then the first
attributes determines the third one. Consider three parallel lines X, Y, and Z. The line X is parallel to
line Y. The line Y is parallel to line Z then it implies that line X is parallel to line Z. This property is
called transitivity.
Pseudo transitivity
If X→Y and YW→Z then XW→Z. Transitivity is a special case of pseudo transitivity when W is
null. The axiom of pseudo transitivity is a generalization of the transitivity axiom.
Union
If X→Y and X→Z then X→YZ. The axiom of union indicates that if there are two functional
dependencies with the same determinant it is possible to form a new functional dependency that
preserves the determinant and has its right-hand side the union of the right-hand sides of the two
functional dependencies. The union rule can be illustrated with the example of PINCODE. The
PINCODE is used to identify city as well as PINCODE is used to identify state. This implies that
PINCODE determines both city and state.
14
Decomposition
If X→YZ then X→Y and X→Z. The axiom of decomposition indicates that the determinant of any
functional dependency can uniquely determine any individual attribute or any combination of
attributes of the right-hand side of the functional dependency. The decomposition can be illustrated
with an example of Book ID. The Book ID determines the title and the author similar to (X→YZ)
which implies that Book ID determines title (X→Y) and Book ID determines Author (X→Z)
Closure of Set of Functional Dependencies
The Closure of Functional Dependency means the complete set of all possible attributes that can be
functionally derived from given functional dependency using the inference rules known as
Armstrong’s Rules. If “F” is a functional dependency then closure of functional dependency can be
denoted using “{F}+”. There are three steps to calculate closure of functional dependency. These are:
• Step-1: Add the attributes which are present on Left Hand Side in the original
functional dependency.
• Step-2: Now, add the attributes present on the Right Hand Side of the functional
dependency.
• Step-3: With the help of attributes present on Right Hand Side, check the other
attributes that can be derived from the other given functional dependencies. Repeat this
process until all the possible attributes which can be derived are added in the closure.
Algorithm: Determining X+, the closure of X under F.
Input: Let F be a set of FDs for relation R.
Steps:
1. X+ = X //initialize X+ to X
2. for each FD: Y → Z in F Do
If Y ⊆ X+ Then //If Y is contained in X+
X+ = X+ ∪ Z //add Z to X+
End If
End For
3. Return X+ //Return closure of X
Output: Closure X+ of X under F
Example 1: Let R = (A, B, C, D)
F= {A→B, A→C, BC→D}. List several member of F+
Solution:
Since A→B, A→C then A→BC [by union rule]
Since A→BC, BC→D then A→D [by transitive rule]
Since A→C, BC→D then AB→D [by pseudo transitivity rule]
Hence F+ = {A→B, A→C, BC→D, A→BC, A→D, AB→D}
15
Example 2: Let R = (A, B, C, G, H, I)
Given F= {A→B, A→C, CG→H, CG→I, B→H}, Compute closure of F (F+).
Solution:
Since A→B, B→H then A→H [by transitivity rule]
Since A→B, A→C then A→BC [by union rule]
Since A→C then AG→CG [by augmentation rule]
Since AG→CG, CG→I then AG→I [by transitivity rule]
Since CG →H, CG→I then CG→HI [by union rule]
Hence, F+ = {A→A, B→B, C→C, H→H, G→G, I→I, A→B, A→C, CG→H, G→I, CG→HI, B→H,
A→H, AG→I, CG→HI}
Here, first six FDs obtain by reflexive axiom.
Example 3: Let R = (A, B, C, D) and F= {A→B, A→C, BC→D} then compute F+.
Solution:
Since A→B and A→C then by union rule A→BC.
Since BC →D, then by decomposition B→D, C→D. Again by transitivity A→B &B →D ⇒ A→D and
A→C and C→D ⇒ A→D.
Hence, F+ = {A→A, B→B, C→C, D→D, A→B, A→C, BC→D, B→D, C→D, A→D}
CLOSURE OF SETS ATTRIBUTES WITH FUNCTIONAL DEPENDENCY
The set of all those attributes which can be functionally determined from an attribute set is called as
a closure of that attribute set. Closure of attribute set {X} is denoted as {X}+.
Steps to Find Closure of an Attribute Set
Following steps are followed to find the closure of an attribute set-
Step 1: Add the attributes contained in the attribute set for which closure is being calculated to the
result set.
Step 2: Recursively add the attributes to the result set which can be functionally determined from
the attributes already contained in the result set.
Example 1: Consider a relation R (A, B, C, D, E, F, G) with the functional dependencies
A → BC
BC → DE
D→F
CF → G
Now, let us find the closure of some attributes and attribute sets-
Closure of attribute A:
A+ = {A }
={A,B,C} (Using A → BC )
={A,B,C,D,E} (Using BC → DE )
={A,B,C,D,E,F} (Using D → F )
={A,B,C,D,E,F,G} (Using CF → G )
Thus, A+ = { A , B , C , D , E , F , G }
16
Closure of attribute D:
D+ = { D }
= { D , F } ( Using D → F )
We cannot determine any other attribute using attributes D and F contained in the result set.
Thus,
D+ = { D , F }
Closure of attribute set {B, C}:
{ B , C }+= { B , C }
={B,C,D,E} ( Using BC → DE )
={B,C,D,E,F} ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
Thus, { B , C }+ = { B , C , D , E , F , G }
Example 2: Let R = {A, B, C, D, E} and F= {AB→C, A→D, D→E, AC→B} then compute (AB) +
Solution:
result = {AB}
AB→C hence result = {ABC}
A→D hence result = {ABCD}
D→E hence result = {ABCDE}
AC→B [B is already in the result]
So, result= {ABCDE}
Compute result for {A} + for above example:
Result = {A}
AB→C [no change] result = {A}
A→D hence result = {AD}
D→E hence result = {ADE}
AC→B [no change] result = {ADE}
So, result= {ADE}
17