Chapter Five
Database Normalization
Database normalization is a series of steps followed to
obtain a database design that allows:
For consistent storage and
Efficient access of data in a relational database.
These steps reduce:
Data redundancy and
The risk of data becoming inconsistent.
Normalization is the process of identifying:
the logical associations between data items
designing a database that will represent such associations
but without suffering the update anomalies.
Normalization of Database
Database Normalisation is a technique of organizing
the data in the database.
Normalization is a systematic approach of
decomposing tables to eliminate data redundancy and
undesirable characteristics like:
Insertion,
Update
and Deletion Anamolies.
Normalization is used for mainly two purpose,
Eliminating reduntant(useless) data.
Ensuring data dependencies make sense i.e data is
logically stored.
The type of problems that could occur in insufficiently
normalized table is called update anomalies which
includes:
Insertion anomalies
Deletion anomalies
Modification anomalies
Insertion anomalies
Failure to place information about a new database entry
into all the places in the database where information about
that new entry needs to be stored
Deletion anomalies
Failure to remove information about an existing database
entry when it is time to remove that entry.
Modification anomalies
In a properly normalized database table the change will be
effected and used accordingly.
Functional Dependency (FD)
functional dependency is a relationship among attributes.
Data Dependency
The logical associations between data items that point the
database designer in the direction of a good database
design are referred to as determinant or dependent
relationships.
Two data items A and B are said to be in a determinant or
dependent relationship if certain values of data item B
always appears with certain values of data item A.
If the data item A is the determinant data item and B the
dependent data item then the direction of the association is
from A to B and not vice versa.
The essence of this idea is that if the existence of
something, call it A, implies that B must exist and have a
certain value, and then we say that "B is functionally
dependent on A.
" We also often express this idea by saying that "A
determines B," or that "B is a function of A, “or that "A
functionally governs B."
Often, the notions of functionality and functional
dependency are expressed briefly by the statement, "If A,
then B.“
It is important to note that the value B must be unique for a
given value of A, i.e., any given value of A must imply just
one and only one value of B, in order for the relationship
to qualify for the name "function.
" (However, this does not necessarily prevent different
values of A from implying the same value of B.)
Example
Dinner Type of Wine
Meat Red
Fish White
Since the type of Wine served depends on the type of Dinner, we say
Wine is functionally dependent on Dinner.
Dinner Wine
Dinner Type of Wine
Types of Fork
Meat Red
Meat fork
Fish White
Fish fork
Since both Wine type and Fork type are determined by the Dinner
type, we say Wine is functionally dependent on Dinner and Fork is
functionally dependent on Dinner.
Dinner Wine
Dinner Fork
Partial Dependency
If an attribute which is not a member of the primary key is
dependent on some part of the primary key (if we have
composite primary key) then that attribute is partially
functionally dependent on the primary key.
Let {A, B} is the Primary Key and C is no key attribute.
Then if {A,B}C and BC or AC Then C is partially
functionally dependent on {A,B}
Full Dependency
If an attribute which is not a member of the primary key is
not dependent on some part of the primary key but the
whole key (if we have composite primary key) then that
attribute is fully functionally dependent on the primary
key.
Let {A,B} is the Primary Key and C is no key attribute
Then if {A,B}C and BC and AC does not hold ( if B
cannot determine C and A cannot determine C).
Then C Fully functionally dependent on {A,B}
Transitive Dependency
A transitive relationship is a relationship of the following
form:
"If A implies B, and if also B implies C, then A implies C.“
Example: If Mr X is a Human, and if every Human is an
Animal, then Mr X must be an Animal.
Generalized way of describing transitive dependency is
that:
If A functionally governs B, AND
If B functionally governs C
THEN A functionally governs C Provided that neither C
nor B determines A i.e. (B / A and C / A)
In the normal notation:
{(AB) AND (BC)} ==> AC provided that B /A and
C /A
levels or steps in normalization called Normal Forms.
The level of complexity, strength of the rule and
decomposition increases as we move from one lower level
Normal Form to the higher.
Normalization towards a logical design consists of the
following steps:
Un-Normalized Form: Identify all data elements
First Normal Form: Find the key with which you can
find all data
Second Normal Form: Remove part-key dependencies.
Make all data dependent on the whole key.
Third Normal Form: Remove non-key dependencies.
Make all data dependent on nothing but the key.
UNNORMALIZED
A table that contains one or more repeating groups.
A repeating group is a field or group of fields that hold
multiple values for a single occurrence of a field.
FIRST NORMAL FORM (1NF)
A table (relation) is in 1NF
If
There are no duplicated rows in the table.
Each cell is single-valued (i.e., there are no repeating
groups).
Entries in a column (attribute, field) are of the same kind.
Example for First Normal form (1NF)
Remove all repeating groups.
Distribute the multi-valued attributes into different rows
and identify a unique identifier for the relation so that is
can be said is a relation in relational database.
SECOND NORMAL FORM (2NF)
No partial dependency of a non-key attribute on part of
the primary key.
This will result in a set of relations with a level of Second
Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., a
non-composite) primary key is automatically in 2NF.
Definition: a table (relation) is in 2NF
If
It is in 1NF and
If all non-key attributes are dependent on the entire
primary key i.e. no partial dependency.
Example for First Normal form (2NF)
Business rule: Whenever an employee participates in a
project, he/she will be entitled for an incentive.
This schema is in its 1NF since we don’t have any repeating
groups or attributes with multi-valued property.
To convert it to a 2NF we need to remove all partial
dependencies of non-key attributes on part of the primary
key.
{EmpID, ProjNo}EmpName, ProjName, ProjLoc,
ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies:
FD1: {EmpID}EmpName
FD2: {ProjNo}ProjName, ProjLoc, ProjFund,
ProjMangID
FD3: {EmpID, ProjNo}Incentive
THIRD NORMAL FORM (3NF)
Eliminate Columns Dependent on another non-Primary
Key - If attributes do not contribute to a description of the
key, remove them to a separate table.
These levels avoid update and delete anomalies.
Definition: a Table (Relation) is in 3NF
If
It is in 2NF and
There are no transitive dependencies between a primary
key and non-primary key attributes.
Example for First Normal form (3NF)
Assumption: Students of same batch (same year) live in
one building or dormitory
This schema is in its 2NF since the primary key is a single
attribute.
Let’s take StudID, Year and Dormitaryand see the
dependencies.
StudIDYear AND YearDormitary
And Year cannot determine StudIDand Dormitary cannot
determine StudID
Then transitively StudIDDormitary
Generally, even though there are other four additional
levels of Normalization, a table is said to be normalized if
it reaches 3NF.
A database with all tables in the 3NF is said to be
Normalized Database.
Mnemonic for remembering the rationale for
normalization up to 3NF could be the following:
No Repeating or Redundancy: no repeating fields in the
table.
The Fields Depend Upon the Key: the table should solely
depend on the key.
The Whole Key: no partial key dependency.
And Nothing But the Key: no inter data dependency.
So Help Me Codd: since Codd came up with these rules.
Other Levels of Normalization
Boyce-Codd Normal Form (BCNF)
Forth Normal form (4NF)