Database I
Methodology
Normalization
1
Normalization ? (1/2)
Normalization is the technique for analyzing
relations based on their primary key (or candidate
keys in the case of BCNF) and functional
dependencies
Normalization is executed as a series of steps
Each step corresponds to a specific normal form
Different normal forms or levels are called as first,
second and so on
Each normalized form has certain requirements or
conditions, which must be fulfilled
2
Normalization ? (2/2)
If a relation fulfills any particular form then it
is said to be in that normal form
The minimum form in which all the tables are
in is called the normal form of entire database
It is performed after the logical database
design
3
Why Normalization ? (1/3)
Main aim of relational database design is to group
attributes into relations so as to minimize data
redundancy and thereby reduce the file storage
space
Consider StaffBranch relation below
It is clear that there is redundant information in
StaffBranch relation
SNo SName Position Salary BNo BAddress PhNo
S123 Asad Manager 23000 B5 Peshawar 20456789
S125 Jamal Programmer 40000 B5 Peshawar 20456789
S130 Ghamgeen Peon 5000 B6 Islamabad 34564567
4
S133 Khamar Sweaper 3000 B7 Mardan 23567890
Why Normalization ? (2/3)
The serious problem with relations having
redundant information is update anomalies
Update anomalies can be classified as:
Insertion Anomalies
There are two insertion anomalies in StaffBranch
relation
To insert the details of new member, we have to also
enter the branch details for new entry, thus we have to
take great care that branch details is consistent
To insert details of a new branch that currently has no
members of staff, the solution could be to insert nulls
for staff, but wait SNo is primary key ?
5
Why Normalization ? (3/3)
Deletion Anomalies
If we delete the row for staff number S133, the
information relating to branch number B7 is also lost
from the database
Modification Anomalies
If we want to change the phone number for branch
number B5, we must update the rows of all staff
located at that branch to avoid data inconsistency
6
Functional Dependencies (1/4)
Functional dependency describes the
relationship between attributes
In a database, we often have the case where
one field defines the other
For example, we can say that Employee ID
(EmpID) defines a name
What does this mean?
It means that if I have a database with
EmpIDs and names, and if I know someone's
EmpID, then I can find their name
7
Functional Dependencies (2/4)
From the word "defines," we means that for
every EmpID we will have one and only one
name
So we will say that we have defined name as
being functionally dependent on EmpID
A formal definition can be:
If A and B are attributes or sets of attributes of
relation R, we say that B is functionally dependent
on A if each value of A in R has associated with it
exactly one value of B in R 8
Functional Dependencies (3/4)
We write a functional dependency (FD) as:
EmpID → Name
Can be read as “EmpID defines Name" or “EmpID
implies Name“ or Name is functionally dependent on
EmpID”
Let us look at a an interesting example:
EmpID Name Job Salary
101 Darya Khan Programmer 40000
104 Shahrukh Designer 50000
103 Gul Jana System Engineer 35000
103 Gul Jana Project Manager 55000 9
Functional Dependencies (4/4)
We have the FD that EmpID → Name
This means that every time we find 103, we
find the name, Gul Jana
Thus something is on the left-hand side of a
FD, it does not imply that you have a key or
that it will be unique in the database
The FD X → Y only means that for every
occurrence of X you will get the same value
of Y
10
FD Inference Axioms or Armstrong
Axioms ( Transitivity Rule )
RNo Name School Location
Let us now consider
101 Rasheed PPS Peshawar
another example 102 Ani FCA Mardan
Here, we will define: 103 Naseer ECS Swabi
RNo → Name 104 Ghafoor FCA Mardan
RNo → School 105 Bilal PPS Peshawar
106 Sohail PPS Peshawar
Name → School
Have we violated any FDs with our data?
Because all RNos are unique, there cannot be a FD
violation of RNo → Name
The same comment is true for RNo → School and
Name → School 11
FD Inference Axioms or Armstrong
Axioms ( Transitivity Rule )
If we define a FD X → Y and we define a FD Y → Z,
then we know by inference that X → Z
So we can infer that RNo → Location
The inference illustrated is called the transitivity rule
of FD inference
To see that the FD RNo→ Location is true in our
data, you can note that given any value of RNo, you
always find a unique location for that person
This prove that transitivity rule exists
12
FD Inference Axioms or Armstrong
Axioms (Pseudo Transitivity Rule )
If A → B and CB → D, then AC → D
If RNo → Name and
Name,School →Location
Then we can write it as
RNo,Name → Location
13
FD Inference Axioms or Armstrong
Axioms ( Reflexive Rule )
If X is a composite, composed of A Name City
and B, then X→ A and X→ B Haleem Peshawar
Example: X= Name, City. Then we Rasheed Mardan
Kaleem Charsadda
are saying that X → Name and
X → City
The rule, says if I give you the combination
<Rasheed, Mardan>, what is this person's Name?
What is this person's City?
14
FD Inference Axioms or Armstrong
Axioms (Augmentation Rule)
If X→ Y, then XZ→ Y
Name City ShoeSize
Now, I claim that because Haleem Peshawar 10
Name→ City, that Rasheed Mardan 6
Name+ShoeSize → City Kaleem Charsadda 7
(i.e., we augmented Name with ShoeSize)
Will there be a contradiction here, ever?
No, because we defined Name → City, Name
plus more information will always identify the
unique City for that individual
We can always add information to the LHS of
an FD and still have the FD be true
15
FD Inference Axioms or Armstrong
Axioms (Decomposition Rule)
The decomposition/projectivity rule says that
if it is given that X → YZ (that is, X defines
both Y and Z), then X → Y and X → Z
Suppose I define Name → City, ShoeSize
This means for every occurrence of Name, I
have a unique value of City and a unique
value of ShoeSize
16
FD Inference Axioms or Armstrong
Axioms (Union Rule)
The union/additivity rule is the reverse of the
decomposition rule in that if X → Y and
X → Z, then X → YZ
If we were given that Name → City and given
that Name → ShowSize
We can immediately write Name→ City, Shoe
Size
17
Keys and Functional
Dependencies (1/4)
The main reason we identify the FDs and
inference rules is to be able to find keys and
develop normal forms for relational
databases
In any table, we want to find out which, if any
attribute(s), will identify the rest of the
attributes
An attribute that will identify all the other
attributes in row is called a "candidate key“
Consider the example on next slide 18
Keys and Functional
Dependencies (2/4)
Now suppose we defineRNo Name School Location
101 Rasheed PPS Peshawar
the following FDs: 102 Ani FCA Mardan
RNo → Name 103 Naseer ECS Swabi
104 Ghafoor FCA Mardan
RNo → School 105 Bilal PPS Peshawar
School → Location 106 Sohail PPS Peshawar
What we want is to find the least number of
attributes that can identify all the rest
attributes (hopefully only one attribute)
Suppose we take RNo as candidate key 19
Keys and Functional
Dependencies (3/4)
Can we show that RNo "defines" all attributes in the
relation?
We can use the transitive, reflexive and union rule
RNo → Name (given)
RNo → School (given)
School → Location (given)
RNo → Location (derived by the transitive rule)
RNo → RNo (reflexive rule)
RNo → RNo, Name, School, Location (union rule)
Therefore RNo is a candidate key for this relation
Finding a candidate key is the finding of a “closure of an
attribute or a set of attributes” that defines all the other
attributes 20
Keys and Functional
Dependencies (4/4)
Are there any other candidate keys?
Of course! By augmentation rule we can augment
RNo and name to form new candidate keys: RNo,
Name
Is School a candidate key? No
Once we have found a set of candidate keys (or
perhaps only one as in this case), we designate one
of the candidate keys as the primary key and move
on to normal forms
FD rules are useful in developing Normal forms
21
Normal Forms
(1N Form)
A relation is in first normal form if and only if
every attribute is single valued for each tuple or
There be no repeating fields or groups in the row
This means that each attribute in each row , or
each cell of the table, contains only one value
First normal form say that the domains of
attributes of a relation are atomic
The concept of 1NF is demonstrated on next
slide
22
Normal Forms
(1N Form)
Employee
Non-1NF: Name Address Dependant
Sohail Dar I-8 Islamabad Jamal,Kamal,Karim
Shoiab Ali Wah Cantt. Haider, Ihsan
Tahira Ijaz Peshawar Gul Mast
There are more than one values for
1NF Tables: Dependant, thus it is not in 1N form
Dependant
DependantName EmployeeName Employee
Jamal Sohail Dar Name Address
Kamal Sohail Dar Sohail Dar I-8 Islamabad
Karim Sohail Dar Shoiab Ali Wah Cantt.
Haider Shoiab Ali Tahira Ijaz Peshawar
Ihsan Shoiab Ali 23
Gul Mast Tahira Ijaz
Normal Forms
(1N Form)
Note that the original table could be
reconstructed by combining these two tables
By recording all the rows in the EMPLOYEE
table and combining them with the
corresponding rows in the EMPLOYEE table
where the names were equal (an equi-join
operation)
24
Normal Forms
(2N Form)
A relation is in second normal form (2NF) if and only
if it is in first normal form and all the non-key
attributes are fully functional dependent on the key
Partial dependencies are not allowed
The only time we have to be concerned about 2NF
is when the key is composite
Second normal form (2NF) addresses the concept
of removing duplicative data
A relation that is not in 2NF exhibits the update,
insertion and deletion anomalies
Consider the example on next slide 25
Normal Forms
(2N Form)
name job salary address
Employee (name, job,
Asad Welder 2300 Peshawar
salary, address) Asad Programmer 30000 Peshawar
FDs Karim Programmer 45000 Lahore
name+job → salary Javad Designer 40000 Lahore
name → address
The problems developing here are redundancy and
anomalies
Here address depends only on the name, not the
job; this is an example of a partial dependency
26
Normal Forms
(2N Form)
As the table is in 1NF
The process for transforming a 1NF table to 2NF is:
1. Identify any partial determinants (attribute on LHS) other
than the composite key, and the columns they determine
2. Create and name a new table for each determinant and
the unique columns it determines
3. Move the determined columns from the original table to
the new table. The determinate becomes the primary key
of the new table
4. Delete the columns you just moved from the original table
except for the determinant which will serve as a foreign
key
5. The original table may be renamed to maintain semantic
meaning 27
Normal Forms
(2N Form)
Consider the non-2NF name job salary address
2 relation Asad Welder 2300 Peshawar
3 Asad Programmer 30000 Peshawar
NameAdd
Karim Programmer 45000 Lahore
name address Javad Designer 40000 Lahore
Asad Peshawar
Karim Lahore 1
NameJob
Javad Lahore
name job salary 5
Asad Welder 2300
4
Asad Programmer 30000
Karim Programmer 45000
28
Javad Designer 40000
Normal Forms
(3N Form)
A relational table is in 3NF if it is already in
2NF and every non-key column is non-
transitively dependent upon its primary key
OR All non-key attributes are functionally
dependent only upon the primary key
Transitive Dependency
Transitive dependency occurs when one non-key
attribute determines another non-key attribute
E.g.
STD(stId, stName, stAdr, prName, prCrdts)
29
Normal Forms
(3N Form)
stId → stName, stAdr, prName, prCrdts
prName → prCrdts
Here stId is the key
prCrdts can be determined by prName
(Transitive Dependency)
Thus transitive dependency exists which
means STUDENT relation is not in 3NF
Transitive dependencies cause insertion,
deletion, and update anomalies 30
Normal Forms
(3N Form)
So for 3NF we will concentrate on relations with
one candidate key
The process for transforming a 2NF table to 3NF is:
1. Identify any determinants, other the primary key, and the
columns they determine
2. Create and name a new table for each determinant and
the unique columns it determines
3. Move the determined columns from the original table to
the new table. The determinate becomes the primary key
of the new table
4. Delete the columns you just moved from the original table
except for the determinate which will serve as a foreign
key
5. The original table may be renamed to maintain semantic
meaning 31
Normal Forms
(3N Form)
Non-3NF stdId stName stAdr prName prCrdts
S1020 Sohail Dar I-8 Islamabad MCS 64
S1038 Shoaib Ali G-6 Islamabad BCS 132
S1015 Tahira Ejaz L Rukh Wah MCS 64
2
1
3 5 4
Program Student
stdId stName stAdr prName
prName prCrdts
MCS 64 S1020 Sohail Dar I-8 Islamabad MCS
BCS 132 S1038 Shoaib Ali G-6 Islamabad BCS
S1015 Tahira Ejaz L Rukh Wah MCS
32
Normal Forms
(Boyce-Codd Normal Form (BCNF))
1NF and 2NF identifies partial and transitive
dependencies on the basis of primary keys
But what if such dependencies remain on other
candidate keys, if any exist
BCNF identifies the functional dependencies on all
candidate keys
It means that if relation has one candidate key and it
is in 3NF, it is also in BCNF
3NF relation is not in BCNF if:
The candidate keys in the relation are composite keys
There is more than one candidate key in the relation 33
Normal Forms
(Boyce-Codd Normal Form (BCNF))
The keys are not disjoint, that is, some attributes in the
keys are common
For example consider the Enroll relation
Enroll (sno, sname, cno, cname, dateofenroll)
Let us assume that the relation has the following
candidate keys:
sno,cno
sno,cname
sname,cno
sname, cname
The relation is in 3NF but not in BCNF because
there are dependencies
34
Normal Forms
(Boyce-Codd Normal Form (BCNF))
sno → sname
cno → cname
Where attributes are part of a candidate key are
dependent on part of another candidate key
By decomposing the Enroll relation in the following
three relations result in BCNF
Std(sno, sname)
Crs(cno, cname)
Std_Crs(sno, cno, dateofenroll)
35
Normal Forms
(Boyce-Codd Normal Form (BCNF))
Non-BCNF sno sname cno cname dateofenroll
S1020 Sohail Dar C104 E-Com 12/02/2007
S1038 Shoaib Ali C104 E-Com 10/01/2008
S1015 Tahira Ejaz H234 English 01/02/2007
Crs
cno cname
Std
C104 E-Com
sno sname H234 English
S1020 Sohail Dar Std_Crs
S1038 Shoaib Ali sno cname datofenroll
S1015 Tahira Ejaz S1020 C104 10/01/2008
S1038 C104 01/02/2007
36
S1015 H234 12/02/2007
Other Normal Forms
5NF deals with multi-valued dependency and
possible loss less decompositions
Domain Key Normal Form (DKNF) reduces
further chances of any possible inconsistency
37