ICT200
INTRODUCTION TO DATABASE
MANAGEMENT SYSTEM
By :
Miss Noorfadzilah Arifin
Chapter 5 Normalization of Database Tables
Database Tables and Normalization
Normalization
Process for evaluating and correcting table structures to
minimize data redundancies
◼ Reduces data anomalies
Works through a series of stages called normal forms:
◼ First normal form (1NF)
◼ Second normal form (2NF)
◼ Third normal form (3NF)
Database Tables and Normalization
(continued)
2NF is better than 1NF; 3NF is better than 2NF
For most business database design purposes, 3NF is
as high as we need to go in normalization process
Highest level of normalization is not always most
desirable
The Need for Normalization
Example: Construction company that manages
several building projects
Each employee has an employee number, name and job
classification (ex: engineer, computer technician)
The company charges its clients by billing hours spent on
each contract
Hourly billing rate is dependent on employee’s position
Periodically, report is generated that contains information
displayed in Table 5.1
The Need for Normalization
(continued)
The Need for Normalization
(continued)
The Need for Normalization
(continued)
The structure of data in figure 5.1 does not
conform to the requirement.
The PROJ_NUM is apparently intended to be a
primary key or least a part of FK but it contains null
Each table entries invite data inconsistencies. Ex:
JOB_CLASS value “Elec. Engineer” might be entered
as “Elec. Eng”, “EL. Eng.” or “EE”.
The Need for Normalization
(continued)
The table displays data redundancies which yield the
following anomalies:
◼ Update anomalies
◼ Insert anomalies
◼ Deletion anomalies
The Need for Normalization
(continued)
Structure of data set in Figure 5.1 does not handle
data very well
The table structure appears to work; report
generated with ease
Unfortunately, report may yield different results
depending on what data anomaly has occurred
The Normalization Process
Each table represents a single subject (Ex: a course table
will contain only data that directly pertains to courses)
No data item will be unnecessarily stored in more than
one table-to ensure that the data are updated in only 1
place.
All attributes in a table are dependent on the PK-to
ensure that the data are uniquely identifiable by a PK
value
The Normalization Process
(continued)
▪ To accomplish the normalization process, the steps need go
through that lead to successively higher normal forms
▪ The most common normal forms and their basic characteristic
are listed below:
Conversion to First Normal Form
Figure 5.1 contains what is known as Repeating group
Derives its name from the fact that a group of multiple
entries of same type can exist for any single key attribute
occurrence
Relational table must not contain repeating groups
Normalizing table structure will reduce data
redundancies
If repeating groups do exist, they must be eliminated
by making sure that each row defines a single entity
Normalization is three-step procedure
Conversion to First Normal Form
(continued)
Step 1: Eliminate the Repeating Groups
Present data in tabular format, where each cell has
single value and there are no repeating groups.
Eliminate repeating groups, eliminate nulls by making
sure that each repeating group attribute contains an
appropriate data value
Conversion to First Normal Form
(continued)
Conversion to First Normal Form
(continued)
Step 2: Identify the Primary Key
Primary key must uniquely identify attribute value
New key must be composed
Conversion to First Normal Form
(continued)
Step 3: Identify All Dependencies
Dependencies can be depicted with help of a
diagram
Dependency diagram:
◼ Depicts all dependencies found within given table
structure
◼ Helpful in getting bird’s-eye view of all relationships
among table’s attributes
◼ Makes it less likely that will overlook an important
dependency
Conversion to First Normal Form
(continued)
Conversion to First Normal Form
(continued)
First normal form describes tabular format in
which:
All key attributes are defined
There are no repeating groups in the table
All attributes are dependent on primary key
All relational tables satisfy 1NF requirements
Conversion to First Normal Form
(continued)
Some tables contain partial dependencies
Dependencies based on only part of the primary key
Sometimes used for performance reasons, but should
be used with caution
Still subject to data redundancies
Conversion to Second Normal Form
Relational database design can be improved by
converting the database into second normal form
(2NF)
Two steps
Converting to 2NF is done only when the 1NF has
a composite PK
If 1NF has a single attribute PK, the table
automatically in 2NF
Conversion to Second Normal Form
(continued)
Step 1: Write Each Key Component
on a Separate Line
Write each key component on separate line, then
write original (composite) key on last line
Each component will become key in new table
Ex: PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
Conversion to Second Normal Form
(continued)
Step 2: Assign Corresponding Dependent
Attributes
Determine those attributes that are dependent on
other attributes
At this point, most anomalies have been eliminated
The dependencies for the original key components
are found by examining the arrows below the
dependency diagram in Figure 5.3
The 3 new tables (PROJECT, EMPLOYEE and
ASSIGNMENT)
Conversion to Second Normal Form
(continued)
PROJECT (PROJ_NUM , PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME,
JOB_CLASS,CHG_HOUR)
ASSIGNMENT(PROJ_NUM, EMP_NUM, HOURS)
Conversion to Second Normal Form
(continued)
At this point, most anomalies discussed before have been eliminated.
Ex: If u want to add, change or delete a PROJECT record, only need
go to the PROJECT table and make the change to only one row
Conversion to Second Normal Form
(continued)
Figure 5.4 still shows a transitive dependency,
which can generate anomalies
Ex: if the charge per hour changes for a job
classification held by many employees, that change
must be made for each of those employees
Conversion to Second Normal Form
(continued)
Table is in second normal form (2NF) when:
It is in 1NF and
It includes no partial dependencies:
◼ No attribute is dependent on only portion of primary
key
Conversion to Third Normal Form
Data anomalies created are easily eliminated by
completing three steps
Step 1: Identify Each New Determinant
For every transitive dependency, write its determinant
as PK for new table
◼ Determinant
◼ Any attribute whose value determines other values within a
row
◼ EX: JOB_CLASS
Conversion to Third Normal Form
(continued)
Step 2: Identify the Dependent Attributes
Identify attributes dependent on each determinant
identified in Step 1 and identify dependency
Name table to reflect its contents and function
EX: JOB_CLASS CHG_HOUR
Conversion to Third Normal Form
(continued)
Step 3: Remove the Dependent Attributes from
Transitive Dependencies
Eliminate all dependent attributes in transitive
relationship(s) from each of the tables that have such
a transitive relationship
Draw new dependency diagram to show all tables
defined in Steps 1–3
Check new tables as well as tables modified in Step
3 to make sure that each table has determinant and
that no table contains inappropriate dependencies
Conversion to Third Normal Form
(continued)
Conversion to Third Normal Form
(continued)
After the 3NF conversion has been completed,
database contains 4 tables:
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
ASSIGNMENT ( PROJ_NUM, EMP_NUM, HOURS)
Conversion to Third Normal Form
(continued)
A table is in third normal form (3NF) when both of
the following are true:
It is in 2NF
It contains no transitive dependencies
Improving the Design
Table structures are cleaned up to eliminate
troublesome initial partial and transitive
dependencies
Normalization cannot, by itself, be relied on to
make good designs
It is valuable because its use helps eliminate data
redundancies
Improving the Design (continued)
Issues to address in order to produce a good
normalized set of tables:
Evaluate PK Assignments
Evaluate Naming Conventions
Refine Attribute Atomicity
Identify New Attributes
Identify New Relationships
Refine Primary Keys as Required for Data Granularity
Maintain Historical Accuracy
Evaluate Using Derived Attributes
Improving the Design (continued)
Improving the Design (continued)
Surrogate Key Considerations
When primary key is considered to be unsuitable,
designers use surrogate keys
Data entries in Table 5.3 are inappropriate
because they duplicate existing records
Yet there has been no violation of either entity
integrity or referential integrity
Surrogate Key Considerations
(continued)
Normalization and Database Design
Normalization should be part of design process
Make sure that proposed entities meet required normal
form before table structures are created
Many real-world databases have been improperly
designed or burdened with anomalies if improperly
modified during course of time
You may be asked to redesign and modify existing
databases
Normalization and Database Design
(continued)
ER diagram
Provides big picture, or macro view, of an
organization’s data requirements and operations
Created through an iterative process
◼ Identifying relevant entities, their attributes and their
relationship
◼ Use results to identify additional entities and attributes
Normalization and Database Design
(continued)
Normalization procedures
Focus on characteristics of specific entities
Represents micro view of entities within ER diagram
Difficult to separate normalization process from ER
modeling process
Two techniques should be used concurrently
Normalization and Database Design
(continued)
Those operations can be summarized by using the
following rules:
The company manages many projects
Each project requires the services of many employees
An employee may be assigned to several different projects
Some employee are not assigned to a project and perform
duties not specifically related to a project. Some of the
employees are part of a labor pool, to be shared by all project
teams. For example, the company’s executive secretary would
not be assigned to any one particular project
Normalization and Database Design
(continued)
Each employee has a single primary job classification. That job
classification determines the hourly billing rate
Many employees can have the same job classification. For
example, the company employs more than one electrical
engineer
Normalization and Database
Design (continued)
PROJECT is in 3NF and needs no
modification
EMPLOYEE requires additional
scrutiny. JOB_DESCRIPTION
attribute defines job
classifications such as System
Analyst, Database Designer and
Programmer. In turn, those
classifications determine the
billing rate, JOB_CHG_HOUR.
Employee contains transitive
dependency
Normalization and Database
Design (continued)
Normalization and Database
Design (continued)
Normalization and Database
Design (continued)
Normalization and Database
Design (continued)
Denormalization
Creation of normalized relations is important
database design goal
Processing requirements should also be a goal
If tables decomposed to conform to normalization
requirements:
Number of database tables expands
Denormalization (continued)
Joining the larger number of tables takes
additional input/output (I/O) operations and
processing logic, thereby reducing system speed
Conflicts between design efficiency, information
requirements, and processing speed are often
resolved through compromises that may include
denormalization
Denormalization (continued)
Unnormalized tables in production database tend
to suffer from these defects:
Data updates are less efficient because programs
that read and update tables must deal with larger
tables
Indexing is more cumbersome
Unnormalized tables yield no simple strategies for
creating virtual tables known as views
Denormalization (continued)
Use denormalization cautiously
Understand why—under some circumstances—
unnormalized tables are better choice
Summary
Normalization is technique used to design tables in
which data redundancies are minimized
First three normal forms (1NF, 2NF, and 3NF) are
most commonly encountered
Table is in 1NF when all key attributes are defined
and when all remaining attributes are dependent
on primary key
Summary (continued)
Table is in 2NF when it is in 1NF and contains no partial
dependencies
Table is in 3NF when it is in 2NF and contains no
transitive dependencies
Table that is not in 3NF may be split into new tables until
all of the tables meet 3NF requirements
Normalization is important part—but only part—of
design process
Summary (continued)
Summary (continued)
Summary (continued)
Summary (continued)
Table in 3NF may contain multivalued
dependencies that produce either numerous null
values or redundant data
It may be necessary to convert 3NF table to fourth
normal form (4NF) by
Splitting table to remove multivalued dependencies
Tables are sometimes denormalized to yield less
I/O which increases processing speed