Normalization in DBMS
Normalization is a systematic process of organizing data to minimize redundancy and dependency.
It breaks down a table into smaller tables and links them using relationships. The primary goal is
to eliminate data anomalies, such as insertion, update, and deletion anomalies, and to improve
database performance.
There are several types of normal forms (NF), with the most common being the 1st through 5th
Normal Forms. Detailed overview of each normal form with examples is given below:
1. First Normal Form (1NF)
Definition: A table is in 1NF if:
1. It contains only atomic (indivisible) values.
2. Each column contains values of a single type.
3. Each column has a unique name.
4. The order in which data is stored does not matter.
Example:
StudentID Name Phone Number Courses
1 Arbaz Shah 1234567 Math, Science
2 Beenish Khan 1122358 History
Problem: Courses contain multiple values, violating the atomicity rule.
Solution (1NF): Split the multiple values into separate rows.
StudentID Name Phone Number Course
1 Arbaz Shah 1234567 Math
1 Arbaz Shah 1234567 Science
2 Beenish Khan 1122358 History
2. Second Normal Form (2NF)
Definition: A table is in 2NF if:
1. It is in 1NF.
2. All non-key attributes are fully functionally dependent on the primary key (i.e., no partial
dependency).
Example: Consider a CourseEnrollments table with a composite primary key (StudentID,
CourseID):
StudentID CourseID StudentName CourseName Instructor
1 101 Arbaz Shah Math Dr. Babar
1 102 Arbaz Shah Science Dr. Gohar
2 101 Beenish Khan Math Dr. Babar
Problem: StudentName and CourseName depend only on StudentID and CourseID,
respectively, causing partial dependency.
Solution (2NF): Split the table into two to remove partial dependency:
StudentCourse Table:
StudentID CourseID Instructor
1 101 Dr. Babar
1 102 Dr. Gohar
2 101 Dr. Babar
Students Table:
StudentID StudentName
1 Arbaz Shah
2 Beenish Khan
Course Table:
CourseID CourseName
101 Math
102 Science
3. Third Normal Form (3NF)
Definition: A table is in 3NF if:
1. It is in 2NF.
2. There is no transitive dependency (non-key attributes should not depend on other non-key
attributes).
Example: Consider a StudentDetails table:
StudentID StudentName DeptID DepartmentName
1 Arbaz Shah D01 Science
2 Beenish Khan D02 Arts
Problem: DepartmentName depends on DeptID, which is not part of the primary key. This is a
transitive dependency.
Solution (3NF): Remove transitive dependencies by splitting the table.
StudentDetails Table:
StudentID StudentName DeptID
1 Arbaz Shah D01
2 Beenish Khan D02
Departments Table:
DeptID DepartmentName
D01 Science
D02 Arts
4. Boyce-Codd Normal Form (BCNF)
Definition: A table is in BCNF if:
1. It is in 3NF.
2. For every functional dependency X→Y, X must be a superkey (a key or combination of attributes
that uniquely identify rows).
Example: Consider a TeacherCourses table:
TeacherID CourseID DeptID
T01 C101 D01
T02 C102 D01
T03 C101 D02
Problem: Here, CourseID and DeptID together are a candidate key, but neither attribute alone
can uniquely identify a row. Since TeacherID depends on a part of the candidate key, it violates
BCNF.
Solution (BCNF): Decompose the table.
TeacherDept Table:
TeacherID DeptID
T01 D01
T02 D01
T03 D02
CourseDept Table:
CourseID DeptID
C101 D01
C102 D01
C101 D02
5. Fourth Normal Form (4NF)
Definition: A table is in 4NF if:
1. It is in BCNF.
2. It has no multi-valued dependencies, which means that if two attributes are independent of
each other but depend on a third attribute, they should be split.
Example: Consider a StudentLanguages table where students can have multiple skills and
languages:
StudentID Skill Language
1 Coding English
1 Coding Spanish
1 Math English
1 Math Spanish
2 Coding French
Problem: Skill and Language are independent attributes, creating a multi-valued dependency
on StudentID.
Solution (4NF): Split into two tables:
StudentSkills Table:
StudentID Skill
1 Coding
1 Math
2 Coding
StudentLanguages Table:
StudentID Language
1 English
1 Spanish
2 French
6. Fifth Normal Form (5NF)
Definition: A table is in 5NF if:
1. It is in 4NF.
2. It has no join dependency, which means that a table should not contain any spurious tuples
after being decomposed.
Example: Suppose we have a SupplierPartsProjects table with suppliers, parts, and projects.
SupplierID PartID ProjectID
S1 P1 PJ1
S1 P2 PJ1
S2 P1 PJ1
S2 P2 PJ2
Problem: Here, SupplierID, PartID, and ProjectID create a join dependency because each
supplier can supply any part for any project, causing redundancy.
Solution (5NF): Split the table into smaller tables to remove join dependency.
SupplierParts Table:
SupplierID PartID
S1 P1
S1 P2
S2 P1
S2 P2
SupplierProjects Table:
SupplierID ProjectID
S1 PJ1
S2 PJ1
S2 PJ2
PartsProjects Table:
PartID ProjectID
P1 PJ1
P2 PJ1
P2 PJ2
Summary of Normal Forms
1. 1NF: Eliminate repeating groups; make sure each column contains atomic values.
2. 2NF: Eliminate partial dependencies on the primary key.
3. 3NF: Eliminate transitive dependencies.
4. BCNF: Every determinant should be a candidate key.
5. 4NF: Eliminate multi-valued dependencies.
6. 5NF: Eliminate join dependencies to avoid redundancy.
Normalization reduces redundancy and dependency, enhancing database performance and
making it easier to maintain consistent and accurate data.