NORMALIZATION
SQL DATABASES
CONTENTS:
1. What is Normalization ↗
2. Anomaly and their types ↗
3. Types of Normalization ↗
4. 1NF ↗
5. 2NF ↗
6. 3NF ↗
7. BCNF ↗
8. 4NF
9. 5NF
Normalization
• Database normalization is a database schema design technique, by
which an existing Schema is modified to minimize redundancy and
dependency of data.
• Normalization split a large table into smaller tables and define
relationships between them to increase the clarity in organizing data.
Why is normalization needed?
• To avoid redundancy.
• To avoid/minimize anomalies and other issues.
Anomaly
• Problems that occur in poorly planned, unnormalized databases where all
the unrelated data is stored in one table.
Types of Anomalies
• insert
• delete
• update
Insert anomaly
• Insert anomaly occurs when data of certain attributes cannot be inserted into the
database without the presence of other attributes.
Delete anomaly
• Delete anomaly exists when data of certain attributes are lost because of the
deletion of other attributes.
Update anomaly
• An update anomaly exists when one or more instances of duplicated data is
updated, but not all.
The following table is used as an example.
Student Table:
stu_id stu_name location course_name course_duration
s01 Anitha Madurai C++ 6 months
s02 Soundharya Madurai C++ 6 months
s03 Maheswari Chennai JAVA 5 months
s04 Sahana Dindigul JAVA 5 months
s05 Malaviga Coimbatore JAVA 5 month
s06 Keerthana Dindigul Python 3 months
s07 Dharmendhini Chennai Python 3 months
s08 Shibana Salem Web designing 4 months
Redundancy:
• In the table, the attributes location, course_name and
course_duration contain repeated values. This leads to storage space wastage
and also creates anomalies.
Insert anomaly:
• If a new course is introduced in the institution, unless a student is admitted to it, it can’t
be included. The same problem exists for the inclusion of new student too. A newly
admitted student details can’t be recorded until he/she joins a course.
Delete anomaly:
• If a course is to be dropped by the institution, it cannot be removed from the table. If we
do so, the details of the students studied that course will also be deleted.
Update anomaly
• If there is a need to change the course_duration of course JAVA from 5 months to 6
months, this will not be updated in all the locations. If you observe the table keenly, you
can find in the 5th record contains month whereas 3rd and 4th record contains months.
The above-discussed redundancies and anomalies can be removed by splitting
the table into two tables which follow.
Course Table:
Course Name Course Duration
C++ 6 months
Java 5 months
Python 3 months
Web Designing 4 months
Types of Normalization
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
FIRST NORMAL FORM
[ 1NF ]
A table is said to be in first normal form, if the following conditions exist:
• The table should contain only atomic-valued attributes.
• Values stored in the column should be of the same domain.
• Every column name should be unique.
• No duplicate rows should exist.
• Primary/Composite key should be defined.
• Ordering of rows and columns doesn't matter.
The following table is taken for example discussion:
Consider p1 and p2 as valid contact numbers.
• In the attributes address and contact_no, multivalued attributes are present. It
should be split into two attributes to convert the table into 1NF.
• The values in the age attribute are not in the same domain. It should be either
numeric or character.
After doing the changes, the table becomes,
stud_id stud_name age City State contact_no
s001 Geetha 18 Chennai Tamilnadu P1
s001 Geetha 18 Chennai Tamilnadu P2
s002 Keerthi 18 Madurai Tamilnadu P1
s003 Sharmina 17 Coimbatore Tamilnadu P1
s003 Sharmina 17 Coimbatore Tamilnadu P2
s004 Gowri 17 Tirunelveli Tamilnadu P1
s005 Sangeetha 18 Theni Tamilnadu P1
If most of the students have more than one contact no, then the table consumes
more storage space. So to minimize the space as well as for better performance,
we split the table into 2 tables having stud_id as the primary key in both the tables
which follows.
CONTACT
stud_id contact_no
s001 P1
s001 P2
s002 P1
s003 P1
s003 P2
s004 P1
s005 P1
SECOND NORMAL FORM
[ 2NF ]
A table is said to be in 2NF, if it satisfies the following two conditions.
1. The table should be in 1NF.
2. The table should not contain any partial dependencies.
(or)
All non-prime attributes should depend on the entire primary key, not on the
part of any primary key. (i.e. candidate key)
(or)
No non-prime attribute is dependent on any proper subset of any candidate
key of the table.
Partial Dependency
If the values of the non-prime attributes of a table is uniquely identifiable by
using the subset of candidate keys, then it is called as Partial Dependency.
Conversion of 1NF to 2NF
If a partial dependency exists, remove the partially dependent attribute(s)
from the relation by placing them in a new relation along with a copy of their
determinant.
Example 1:
EMPLOYEE
eid ename dob salary
• The attribute eid (employee id) is the primary key for the EMPLOYEE table.
• All other non-prime attributes ename (employee name), dob (date of birth) and salary
are completely dependent on the primary key attribute eid which means only by
using eid value, values of all other non-prime attributes in the table are uniquely
identifiable.
Example 2:
stud_no course_no course_fee · In the STUDENT table, the attributes stud_no and course_no
are the primary key attributes because any student can
101 C1 1500
learn any number of courses.
101 C3 1500
· Here the non-prime attribute course_fee is partially dependent
because its value can be identified by using course_fee alone.
102 C1 1500
· Hence the table is not in 2NF.
· To convert this table into 2NF, we split it into two tables which
103 C1 1500 are in 2NF.
· stud_no and course_no are the primary key for the tables
104 C2 2000 STUDENT and COURSE respectively.
104 C3 1500
105 C4 3000
STUDENT
stud_no course_no COURSE
101 C1 course_no course_fee
101 C3
C1 1500
102 C1
C2 2000
103 C1
C3 1500
104 C2
104 C3 C4 3000
105 C4
THIRD NORMAL FORM
[ 3NF ]
A table is said to be in 3NF or third normal form, if the following requirements are
satisfied.
• All 2NF requirements are fulfilled.
• There is no transitive dependency. (i.e.) A non-key column should not depend on
another non-key column.
(OR)
A relation is in 3NF if at least one of the following conditions holds in every non-trivial
functional dependency X -> Y
• X is a super key.
• Y is a Prime attribute.
Conversion to 3NF
Remove the transitively dependent attribute from the relation and place it in a
new relation along with the determinant.
Example: BOOK_DETAILS
Book_ID Category_ID Category_type Price
101 1 Competitive 600
102 2 Sports 350
103 2 Sports 275
104 1 Competitive 750
105 3 Novel 500
106 4 Poetry 350
• In the relation BOOK_DETAILS, the Book_ID attribute is the Primary Key attribute.
Because using BOOK_ID the values of all attributes such as Category_ID,
Category_type and Price of every record can be uniquely identified.
• The Non-prime attributes Price, Category_ID and Category_type are dependent on the
Primary Key.
• But the Category_type is also dependent on the non-prime attribute Category_ID.
• So it is transitively dependent on the Primary Key. i.e. The Primary Key attribute
BOOK_ID can find the Category_type through Category_ID too.
So to convert the relation to 3NF it is decomposed into 2 relations such as given below
BOOK_DETAILS CATEGORY_DETAILS
Book_ID Category_ID Price Category_ID Category_type
101 1 600
1 Competitive
102 2 350
2 Sports
103 2 275
3 Novel
104 1 750
4 Poetry
105 3 500
5 Puzzle
106 4 350
BOYCE CODD NORMAL
FORM
[ BCNF ]
• It is also called as 3.5NF
A table is said to be BCNF, if it satisfies the following conditions.
1. It should satisfy 3nf
2. All determinants must be candidate key.