NORMALIZATION MUHAMMAD HAMIZ MOHD RADZI
CONTENT
Database Tables and Normalization
The Need for Normalization
Normalization Process
Higher Level Normal Form (BCNF)
Normalization and Database Design
Denormalization
MUHAMMAD HAMIZ MOHD RADZI
OBJECTIVE
At the end of this lesson, you should be able to:
Describe database design and normalization
Explain the need for normalization
Do step by step normalization process
Do the Boyce-Codd Normal Form
Explain the denormalization process
MUHAMMAD HAMIZ MOHD RADZI
NORMALIZATION
When we design a database for an enterprise, the main objective is to
create an accurate representation of the data, relationships between the
data, and constraints on the data that is pertinent to the enterprise.
To help achieve this objective, we can use one or more database design
techniques and one of it is the normalization.
Normalization is the technique for producing a set of relations with
desirable properties, given the data requirements of an enterprise.
The purpose of normalization is to identify a suitable set of relations that
support the data requirements of an enterprise.
MUHAMMAD HAMIZ MOHD RADZI
The characteristics of a suitable set of relations include the following:
the minimal number of attributes necessary to support the data requirements of
the enterprise
attributes with a close logical relationship (described as functional dependency)
are found in the same relation;
minimal redundancy with each attribute represented only once with the important
exception of attributes that form all or part of foreign keys, which are essential for
the joining of related relations.
The benefits of using a database that has a suitable set of relations is that
the database will be easier for the user to access and maintain the data,
and take up minimal storage
MUHAMMAD HAMIZ MOHD RADZI
ANOMALIES
Insert anomaly: occurs when extra data beyond the desired data must be added to
the database. Eg: Salesperson cannot be inserted unless he/she is assigned to a
customer.
Update anomaly: exists when one or more instances of duplicated data is updated,
but not all. Eg: there are 3 duplicated records in database, but only one is updated
with the current data.
Delete anomaly: exists when certain attributes are lost because of the deletion of
other attributes. Eg: if the data is the only information left in the database, once it is
deleted, the information will be forever gone.
MUHAMMAD HAMIZ MOHD RADZI
FUNCTIONAL DEPENDENCIES
An important concept associated with normalization is functional
dependency, which describes the relationship between attributes.
Functional dependency is a property of the meaning or semantics of the
attributes in a relation.
The semantics indicate how attributes relate to one another, and specify the
functional dependencies between attributes.
Determinant refers to the attribute, or group of attributes, on the left-hand
side of the arrow of a functional dependency.
MUHAMMAD HAMIZ MOHD RADZI
MUHAMMAD HAMIZ MOHD RADZI
Tbl_Staff
Tbl_Branch
Full functional dependency - Indicates that if A and B are attributes of a
relation, B is fully functionally dependent on A if B is functionally dependent
on A, but not on any proper subset of A.
staffID →branchID
Here means, branchID is full functional dependent on staffID.
branchID is functionally dependent on staffID but it is not a subset of
STAFF as it have its own BRANCH table.
MUHAMMAD HAMIZ MOHD RADZI
A functional dependency A→B is a partially dependency if there is some
attribute that can be removed from A and yet the dependency still holds.
staffID, sName →branchID
Previously we have:
staffID →branchID
Hence, we know that:
branchID is functionally dependent on a subset of A (StaffID, Name),
namely StaffID.
MUHAMMAD HAMIZ MOHD RADZI
Transitive dependency - A condition where A, B, and C are attributes of a
relation such that if A → B and B → C, then C is transitively dependent on
A via B (provided that A is not functionally dependent on B or C).
staffID →sName, position, salary, branchID, bAddress
branchID →bAddress
So, staffID attribute functionally determines bAddress via branchID.
MUHAMMAD HAMIZ MOHD RADZI
• Table format, • 1 NF and no
no repeating partial
group and PK dependencies
identified
1NF 2 NF
BCNF 3 NF
• Every • 2 NF and no
determinant is a transitive
candidate key dependencies
(special case of
3 NF)
THE NEED FOR NORMALIZATION
Example: company that manages building projects
Charges its clients by billing hours spent on each contract
Hourly billing rate is dependent on employee’s position
Periodically, a report is generated that contains information displayed in Table
5.1
MUHAMMAD HAMIZ MOHD RADZI
A SAMPLE REPORT LAYOUT
MUHAMMAD HAMIZ MOHD RADZI
A TABLE IN THE REPORT FORMAT
MUHAMMAD HAMIZ MOHD RADZI
THE NEED FOR NORMALIZATION ( CONTINUED)
Structure of data set in Figure 5.1 does not handle data very well
The table structure appears to work; report is generated with ease
Unfortunately, the report may yield different results, depending on what data
anomaly has occurred
MUHAMMAD HAMIZ MOHD RADZI
CONVERSION TO FIRST NORMAL FORM
Repeating group
Derives its name from the fact that a group of multiple (related)
entries can exist for any single key attribute occurrence
Relational table must not contain repeating groups
Normalizing the table structure will reduce these data redundancies
Normalization is three-step procedure
MUHAMMAD HAMIZ MOHD RADZI
Step 1:
Eliminate the
Repeating
Groups
1 NF
Step 2:
Step 3:
Identify the
Primary Key Identify All
Dependencies
STEP 1: ELIMINATE THE REPEATING GROUPS
Present data in a tabular format, where each cell has a single value and there
are no repeating groups
Eliminate repeating groups by eliminating nulls, making sure that each
repeating group attribute contains an appropriate data value
MUHAMMAD HAMIZ MOHD RADZI
DATA ORGANIZATION: FIRST NORMAL FORM
MUHAMMAD HAMIZ MOHD RADZI
STEP 2: IDENTIFY THE PRIMARY KEY
Primary key must uniquely identify attribute value
New key must be composed
MUHAMMAD HAMIZ MOHD RADZI
STEP 3: IDENTIFY ALL DEPENDENCIES
Dependencies can be depicted with the help of a diagram
Dependency diagram:
Depicts all dependencies found within a given table structure
Helpful in getting bird’s-eye view of all relationships among a table’s
attributes
Use makes it much less likely that an important dependency will be
overlooked
MUHAMMAD HAMIZ MOHD RADZI
A DEPENDENCY DIAGRAM: FIRST NORMAL FORM (1NF)
MUHAMMAD HAMIZ MOHD RADZI
FIRST NORMAL FORM
Tabular format in which:
All key attributes are defined
There are no repeating groups in the table
All attributes are dependent on primary key
All relational tables satisfy 1NF requirements
Some tables contain partial dependencies
Dependencies based on only part of the primary key
Sometimes used for performance reasons, but should be used with caution
Still subject to data redundancies
MUHAMMAD HAMIZ MOHD RADZI
CONVERSION TO SECOND NORMAL FORM
Relational database design can be improved by converting the database into
second normal form (2NF)
Two steps
Step 2:
Step 1: Write Each
Assign
Corresponding
2 NF Key Component
on a Separate Line
Dependent
Attributes
MUHAMMAD HAMIZ MOHD RADZI
STEP 1: IDENTIFY ALL KEY COMPONENTS
Write each key component on separate line, and then write the original
(composite) key on the last line
Each component will become the key in a new table
MUHAMMAD HAMIZ MOHD RADZI
STEP 2: IDENTIFY THE DEPENDENT ATTRIBUTES
Determine which attributes are dependent on which other attributes
At this point, most anomalies have been eliminated
MUHAMMAD HAMIZ MOHD RADZI
SECOND NORMAL FORM (2NF) CONVERSION RESULTS
MUHAMMAD HAMIZ MOHD RADZI
SECOND NORMAL FORM
Table is in second normal form (2NF) if:
It is in 1NF and
It includes no partial dependencies:
No attribute is dependent on only a portion of the primary key
MUHAMMAD HAMIZ MOHD RADZI
CONVERSION TO THIRD NORMAL FORM
Step 1: Identify Each
New Determinant
3 NF
Step 2: Step 3:
Identify the Dependent Remove the Dependent
Attributes Attributes from
Transitive Dependencies
MUHAMMAD HAMIZ MOHD RADZI
STEP 1: IDENTIFY EACH NEW DETERMINANT
For every transitive dependency, write its determinant as a PK for a new table
Determinant
Any attribute whose value determines other values within a row
MUHAMMAD HAMIZ MOHD RADZI
STEP 2: IDENTIFY THE DEPENDENT ATTRIBUTES
Identify the attributes dependent on each determinant identified in Step 1
and identify the dependency
Name the table to reflect its contents and function
MUHAMMAD HAMIZ MOHD RADZI
STEP 3: REMOVE THE DEPENDENT ATTRIBUTES
FROM TRANSITIVE DEPENDENCIES
Eliminate all dependent attributes in transitive relationship(s) from each table
that has such a transitive relationship
Draw a new dependency diagram to show all tables defined in Steps 1–3
Check new tables and modified tables from Step 3 to make sure that each has
a determinant and does not contain inappropriate dependencies
MUHAMMAD HAMIZ MOHD RADZI
THIRD NORMAL FORM (3NF) CONVERSION RESULTS
MUHAMMAD HAMIZ MOHD RADZI
THIRD NORMAL FORM
A table is in third normal form (3NF) if:
It is in 2NF and
It contains no transitive dependencies
MUHAMMAD HAMIZ MOHD RADZI
IMPROVING THE DESIGN
Table structures are cleaned up to eliminate the troublesome initial partial and
transitive dependencies
Normalization cannot, by itself, be relied on to make good designs
It is valuable because its use helps eliminate data redundancies
MUHAMMAD HAMIZ MOHD RADZI
IMPROVING THE DESIGN ( CONTINUED)
The following changes were made:
PK assignment
Naming conventions
Attribute atomicity
Adding attributes
Adding relationships
Refining PKs
Maintaining historical accuracy
Using derived attributes
MUHAMMAD HAMIZ MOHD RADZI
THE COMPLETED DATABASE
MUHAMMAD HAMIZ MOHD RADZI
THE COMPLETED DATABASE ( CONTINUED)
MUHAMMAD HAMIZ MOHD RADZI
LIMITATIONS ON SYSTEM-ASSIGNED KEYS
System-assigned primary key may not prevent confusing entries
Data entries in Table 5.2 are inappropriate because they duplicate existing
records
Yet there has been no violation of either entity integrity or referential
integrity
MUHAMMAD HAMIZ MOHD RADZI
DUPLICATE ENTRIES IN THE JOB TABLE
MUHAMMAD HAMIZ MOHD RADZI
NORMALIZATION AND DATABASE DESIGN
Normalization should be part of design process
Make sure that proposed entities meet required normal form before table
structures are created
Many real-world databases have been improperly designed or burdened
with anomalies if improperly modified during course of time
You may be asked to redesign and modify existing databases
MUHAMMAD HAMIZ MOHD RADZI
NORMALIZATION AND DATABASE DESIGN ( CONTINUED)
ER diagram
Provides the big picture, or macro view, of an organization’s data
requirements and operations
Created through an iterative process
Identifying relevant entities, their attributes and their relationship
Use results to identify additional entities and attributes
MUHAMMAD HAMIZ MOHD RADZI
NORMALIZATION AND DATABASE DESIGN (CONTINUED)
Normalization procedures
Focus on the characteristics of specific entities
A micro view of the entities within the ER diagram
Difficult to separate normalization process from ER modeling process
Two techniques should be used concurrently
MUHAMMAD HAMIZ MOHD RADZI
THE INITIAL ERD FOR A CONTRACTING COMPANY
MUHAMMAD HAMIZ MOHD RADZI
THE MODIFIED ERD FOR A CONTRACTING COMPANY
MUHAMMAD HAMIZ MOHD RADZI
THE INCORRECT REPRESENTATION OF A M:N
RELATIONSHIP
MUHAMMAD HAMIZ MOHD RADZI
THE FINAL (IMPLEMENTABLE) ERD FOR A CONTRACTING
COMPANY
MUHAMMAD HAMIZ MOHD RADZI
THE IMPLEMENTED DATABASE FOR THE CONTRACTING
COMPANY
MUHAMMAD HAMIZ MOHD RADZI
BOYCE-CODD NORMAL FORM (BCNF)
Every determinant in the table is a candidate key
Determinant is an attribute whose value determines other values within a row
3NF table with one candidate key is already in BCNF
Table is not BCNF if the determinant is not a candidate key.
MUHAMMAD HAMIZ MOHD RADZI
Consider the following relation and determinants.
R(a,b,c,d)
a,c -> b,d
a,d -> b
Here, the first determinant suggests that the primary key of R could be changed from
a,b to a,c. If this change was done all of the non-key attributes present in R could still
be determined, and therefore this change is legal.
However, the second determinant indicates that a,d determines b, but a,d could not
be the key of R as a,d does not determine all of the non key attributes of R (it does
not determine c).
We would say that the first determinate is a candidate key, but the second
determinant is not a candidate key, and thus this relation is not in BCNF (but is in 3rd
normal form).
MUHAMMAD HAMIZ MOHD RADZI
3NF TABLE NOT IN BCNF
MUHAMMAD HAMIZ MOHD RADZI
DECOMPOSITION OF TABLE STRUCTURE TO MEET
BCNF
MUHAMMAD HAMIZ MOHD RADZI
DECOMPOSITION INTO BCNF
MUHAMMAD HAMIZ MOHD RADZI
EXAMPLE OF BCNF
An example table from the University Database might be as follows:
This table is in 3NF already. Transform this into BCNF.
MUHAMMAD HAMIZ MOHD RADZI
If we know the Student Number and Teacher Code we know the Offering (class) the
student is in.
We also know the review date for that student and teacher (Student progress is
reviewed for that class by the teacher and student).
Hence, the dependencies:
S_Num, T_Code Offering#, Review Date
The table is not in BCNF as if we know the offering number we know who the teacher
is.
Each offering can only have one teacher!
Offering# T_Code
A non key attribute is a determinant.
MUHAMMAD HAMIZ MOHD RADZI
TRANSFORMING INTO BCNF
Step 1: The determinant, Offering#, becomes part of the key and the dependant
attribute T_Code, becomes a non key attribute.
So the Dependency diagram is now:
S_Num, Offering# T_Code, Review Date
MUHAMMAD HAMIZ MOHD RADZI
Step 2: There are problems with this structure as T_Code is now dependant on only
part of the key.
This violates the rules for 2NF, so the table needs to be divided with the partial
dependency becoming a new table.
The dependencies would then be:
S_Num, Offering# T_Code, Review Date
Offering# T_Code
MUHAMMAD HAMIZ MOHD RADZI
Step 3: The original table is divided into two new tables. Each is in 3NF and in BCNF.
Dependencies: S_Num, Offering# Review Date
Offering# T_Code
MUHAMMAD HAMIZ MOHD RADZI
DENORMALIZATION
Creation of normalized relations is important database design goal
Processing requirements should also be a goal
If tables decomposed to conform to normalization requirements
Number of database tables expands
Joining larger number of tables takes additional disk input/output (I/O) operations
and processing logic
Reduces system speed
Conflicts among design efficiency, information requirements, and processing speed are
often resolved through compromises that may include denormalization
MUHAMMAD HAMIZ MOHD RADZI
DENORMALIZATION (CONTINUED)
Unnormalized tables in a production database tend to have these defects:
Data updates are less efficient because programs that read and update tables
must deal with larger tables
Indexing is much more cumbersome
Unnormalized tables yield no simple strategies for creating virtual tables known
as views
Use denormalization cautiously
Understand why—under some circumstances—unnormalized tables are a better
choice
MUHAMMAD HAMIZ MOHD RADZI
CASE STUDY 1
Consider this following situation:
MODULE DEPARTMENT LECTURER TEXTS
M1 D1 L1 T1, T2
M2 D1 L1 T1, T3
M3 D1 L2 T4
M4 D2 L3 T1, T5
M5 D2 L4 T6
Normalize this table.
MUHAMMAD HAMIZ MOHD RADZI
CASE STUDY 2
Consider the following FD and
do the normalization.
MUHAMMAD HAMIZ MOHD RADZI
REFERENCES
Database Systems: Design, Implementation, & Management, 7th Edition, Rob & Coronel
Database Systems: A Practical Approach to Design, Implementation, and Management,
Thomas Connolly and Carolyn Begg, 5th Edition, 2010, Pearson.
Fundamental of Database Management Systems, Mark L. G., 2nd Edition, 2012, John
Wiley.
MUHAMMAD HAMIZ MOHD RADZI