DATA NORMALIZATION
Content Standard The learners demonstrate an understanding of key concepts,
underlying principles and core competencies in database.
Performance Standard The learners shall be able to independently create/provide quality
product database using Microsoft access.
21st Century Learning Critical Thinking-and-Doing, ICT Literacy, Career and Self-
Skills reliance.
Most Essential Learning Familiarize themselves with the database data normalization in
Competencies order to avoid data anomalies and create efficient database
systems.
Core Values Orderliness, Decision, Accuracy, Enthusiasm
REFERENCES: (Please be guided with the given references to help you perform the given
activities. Click the given links and hyperlinks to access the suggested learning resources.)
A. Printed:
Bombase, Lilibeth S., et. al. My Computer Microsoft Access XP, Quezon City,
Philippines: ABIVA Publishing House Incorporated
B. Online:
Peterson, R. (2021, August 27). What is Normalization in DBMS (SQL)? 1NF, 2NF,
3NF, BCNF Database with Example. Guru99. https://www.guru99.com/database-
normalization.html
LESSON
1 TOPIC: DATA NORMALIZATION
Introduction
Database organization and stability are achieved through normalization. Normalization is a
process wherein rules and conditions are established. When a requirement or condition is not
met, the table violating the requirement must be separated into other tables to meet the process of
normalization. The purpose of this is to develop a database that is less susceptible or vulnerable
to update anomalies.
Below are the Learning Targets/ Specific Objectives:
Differentiate dependent from independent fields
Distinguish and discuss the Data Normalization
Relate the lesson to real-life situation
Exploration of Prior Knowledge
Directions: Fill in the K-W-H-L Chart below to assess your prior knowledge and
understanding of the topic, Normalization.
What I Want to How I Can Learn
What I Know What I Have Learned
Find Out More
Skills I expect to use:
PROCESS QUESTIONS/ FOCUS QUESTIONS:
Below are the key guide questions that you should remember as you perform all the activities in
this lesson. You should be able to answer them at the end of the week.
1. What is normalization?
2. How does data normalization help us to create an efficient and effective
database system?
SHORT EXERCISES/DRILLS
Direction: Using the Radial Diagram below, write your ideas once you hear the word
Normalization.
Normaliz
ation
CONTENT DISCUSSION:
Normalization is a progressive process that goes through several phases and usually starts with
abnormal relation. There are actually several forms of normalization, but the first three forms
are sufficient to reach speak efficiency in data organization. Thus, for the purpose of this
courseware, only these three will be discussed.
Normalization is a database design technique that reduces data redundancy and eliminates
undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization rules
divides larger tables into smaller tables and links them using relationships. The purpose of
Normalization in SQL is to eliminate redundant (repetitive) data and ensure data is stored
logically.
The inventor of the relational model Edgar Codd proposed the theory of normalization of data
with the introduction of the First Normal Form, and he continued to extend theory with Second
and Third Normal Form. Later he joined Raymond F. Boyce to develop the theory of Boyce-
Codd Normal Form.
Consider the following form:
T h i s i s a n e x a m
enrollment. We will now create a normalized database based on the raw data that will be
generated from the form. Let’s start by creating a flat file first. A flat file contains all the
information worth keeping in a database.
This is how a flat file would look like based on the previous form:
Student Student Contact College Course Subject Subject Units
ID Name Number Code Description
0145-56 Leo 09775182334 Science Botany B-058 Ecology 2 3
Bernas 09179660002
0145-56 Leo 09997782334 Science Botany B-091 Chemistry 3
Bernas 09179660002 1
0143-59 Ariadne 09996558929 Science Physics L-023 Physics 2 2
Bajar 09189177003 lab
0143-59 Ariadne 09186558290 Science Physics B-024 Physics 2 3
Bajar 09189177002 lec
0143-67 Maricar 09174332209 Arts & Economics E-014 English 1 3
Co Letters
You will notice that some information in the form was not included, like the Registrar’s
Signature and Total Units. This is because the registrar’s signature is not a significant or relevant
attribute in a database about student information. Total units, on the other hand, is computed by
getting the sum of all the units and this could be queried using SQL statements which you will
learn later on.
The previous table is an example of an unnormalized table, which does not satisfy any
conditions related to the three forms of normalization. It also suffers from several data
anomalies that defeat the intentions of the table’s design. Data anomalies are
inconsistencies in the data stored in a database as a result of operation such as update, insertion,
and deletion. It is happes when there is too much redundancy in the database’s information.
Examples of the three different data anomalies are:
1. Insertion Anomalies – happen when inserting vital data into the database is not possible
due to the absence of other data. For example, a new student, Jose, enrolls for the first
time and his information is added in the student information table. The encoder misspells
the course of Jose, entering “Botanee” instead of “Botany.” Because there are other
students taking up Botany in the school, the values for the course title of Jose would be
inconsistent with the other rows of students who are taking up Botany. The same thing
goes if the encoder entered 1 instead of 3 in the Units column. During the end of the
semester, the total units of Jose would be two units less than that of his classmates who
enrolled in the same subjects. This scenario will be very complicated if not corrected
before graduation, since Jose, according to the school database, would then be two units
short of completing his requirements for graduation.
2. Deletion Anomalies – happen when the deletion of unwanted information causes desired
information to be deleted as well. For example, newly introduced subject, Holistic
Medicine 101, has only two enrollees for the semester. One student dropped the subject
and the other student dropped out of school. This means that the information about the
subject is lost in the database. During the next semester, if a student inquires about the
number of credits he/she will get for enrolling in Holistic Medicine 101, there will be no
information in the database.
3. Update Anomalies – are data inconsistency that results from data redundancy and a
partial update. For example, the school board decided to change the subject code of
Ecology 2 from B-058 to B-099. This would prove to be a time-consuming job for the
encoder, who must then manually change all rows containing the subject code B-058. If
this is not carried out on all rows in the table, a case of one subject having two different
subject codes would materialize in the database.
It is apparent that the given table is not a good example of a student information database.
Let’s start the first process of normalization and check if the anomalies we encountered would be
resolved. We are going to begin by converting the table to the firs normal form.
FIRST NORMAL FORM
There are two rules for a table to be in the First Normal Form or 1NF:
The intersection of each row and column must contain one and only one value (atomic
values).
Repeating fields and non-repeating fields are separated into different tables.
Note the difference in highlighting between the field names of the repeating fields and the
non-repeating fields and their subsequent separation into different tables: Student table and
Subject Table.
Recall that, in the original form, the repeating fields (e.g., subject code, subject
description, unit) appear in a table indicating several entries in a form while the non-repeating
fields (e.g., student name, course, etc.) appear only once in the form. This means that each
student may have several subjects, indicating a one-to-many side are repeating fields.
Also, the Contact Table is created for the multivalued field contact number because each
cell must contain only atomic values. Take note that the key Student ID is copied in each table to
establish the relationship among the tables. Given a student ID, you can access the corresponding
data in the three tables.
Although the flat file has been normalized to 1NF, there are still update anomalies. These
are:
1. Insertion Anomaly – a new subject cannot be entered in the Subject Table if there are not
students (Student ID) enrolled for the particular subject.
2. Deletion anomaly – deleting a student’s information may remove all values for course in
the database. This would be the case for a newly introduced course with only one student
enrolled in it.
Update– if the subject code changes for a particular subject, then all values that match the old
subject code must be changed as well.