Normalization:
Functional Dependency:
In the Relational Model keys are important because they are used to ensure that each row in a
table is uniquely identifiable. They are also used to establish relationships among tables and to
ensure the integrity of the data. A key consists of one or more attributes that determine other
attributes. For example, an invoice number identifies all of the invoice attributes such as invoice
date and the customer name, Invoice_amount. product IDs etc.
The key’s role is based on a concept known as determination. In the context of database table
the statement “A determines B” indicates that if we know the value of attribute A we can look up
the value of attribute B .For example knowing the STUNO in the STUDENT table we look up the
students last name, grade point ,average phone number and so on.
The principle of determination is very important because it is used in the definition of a central
relational database concept known as functional dependence. The term functional dependence
can be defined most easily this way the attribute B is functionally dependent on A, if A
determines B, or the attribute B is functionally dependent on the attribute A if each value in
column A determines one and only one value in column B.
NORMALIZATION PROCESS:
The objective of normalization is to ensure that each table conforms to the concept of well
formed relations that is tables that have the following characteristics.
Each table represents a single subject for example course table will contain only data
that directly pertains to courses. Similarly a student table will contain only student data.
No data item will be unnecessarily stored in more than one table. The reason for this
requirement is to ensure that the data are updated in only one place.
All non prime attributes in a table are dependent on the primary key , the reason for this
requirement is to ensure that the data are uniquely identifiable by a primary key value.
Each table is void of insertion, update, deletion anomalies. This is to ensure the integrity
and consistency of the data.
Definition of Normalization:
Normalization is the process of organizing data in a database. This includes creating tables
and establishing relationships between those tables according to rules designed both to protect
the data and to make the database more flexible by eliminating redundancy and inconsistent
dependency.
Normalization works through a series of stages called normal forms. The first three stages are
described as first normal form second normal form ,and 3rd normal form. For Business purpose
databases it is enough to attain three normal forms.
First Normal Form:
If a relation contain composite or multi-valued attribute, it violates first normal form or a
relation is in first normal form if it does not contain any composite or multi-valued
attribute. A relation is in first normal form if every attribute in that relation is single
valued attribute.
Example 1 – Relation STUDENT in table 1 is not in 1NF because of multi-valued
attribute STUD_PHONE. Its decomposition into 1NF has been shown in table 2.
Second Normal Form
Before we learn about the second normal form, we need to understand the following −
Prime attribute − An attribute, which is a part of the candidate-key, is known as a
prime attribute.
Non-prime attribute − An attribute, which is not a part of the prime-key, is said to
be a non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully
functionally dependent on prime key attribute. That is, if X → A holds, then there should
not be any proper subset Y of X, for which Y → A also holds true.
We see here in Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must
be dependent upon both and not on any of the prime key attribute individually. But we
find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in
Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
NOTE: 2NF tries to reduce the redundant data getting stored in memory. Here Proj_Name
need not write for every student who is working on the same project.
Trivial Functional Dependency
Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it
is called a trivial FD. Trivial FDs always hold.
Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-
trivial FD.
Completely non-trivial − If an FD X → Y holds, where x intersect Y = Φ, it is said to be
a completely non-trivial FD.
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal form and the
following must satisfy −
No non-prime attribute is transitively dependent on prime key attribute. Transitive
dependence means a dependency of one nonprime attribute on another nonprime
attribute.
For any non-trivial functional dependency, X → A, then either −
o X is a superkey or,
A is prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is
a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, i.e to know
the city we need not know stu-id, if we know Zip we can get city,this is known as
transitive dependency..
To bring this relation into third normal form, we break the relation into two relations as
follows −
Boyce-Codd Normal Form:
The table is in Boyce-Codd normal form when every determinant in the table is a candidate key.
When a table contains only one candidate key the 3NFand the BCNF are equivalent. Putting
the proposition another way BCNF can be violated only when the table contains more than one
candidate key. Most designers consider the BCNF to be a special case of the 3NF. A table can
be in 3NF when it is in 2NF and there are no transitive dependencies. But in a case a nonkey
attribute is the determinant of a key attribute, this condition does not violate 3NF, but it fails to
meet the BCNF requirements. BCNF requires that every determinant in the table be a candidate
key.
Example
Consider a relation R with attributes (student, subject, teacher).
Student Teacher Subject
Jhansi P.Naresh Database
jhansi K.Das C
subbu P.Naresh Database
subbu R.Prasad C
F: { (student, Teacher) -> subject
(student, subject) -> Teacher
Teacher -> subject}
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF [since there is no transitive dependency]. A relation R is in BCNF
if for every non-trivial FD X->Y, X must be a key.
The above relation is not in BCNF, because in the FD (teacher->subject), teacher is not a key.
This relation suffers with anomalies −
For example, if we try to delete the student Subbu, we will lose the information that R. Prasad
teaches C. These difficulties are caused by the fact the teacher is determinant but not a candidate
key.
Decomposition for BCNF
Teacher-> subject violates BCNF [since teacher is not a candidate key].
If X->Y violates BCNF then divide R into R1(X, Y) and R2(R-Y).
So R is divided into two relations R1(Teacher, subject) and R2(student, Teacher).
R1
Teacher Subject
P.Naresh database
K.DAS C
R.Prasad C
R2
Student Teacher
Jhansi P.Naresh
Jhansi K.Das
Subbu P.Naresh
Subbu R.Prasad
All the anomalies which were present in R, now removed in the above two relations.