NORMALIZATION
Dr. Nilanjana Dutta Roy
Why normalization??
• Bad database..
A bad database may lead to a problem of redundant & spurious data &
information.
• Solution??
it breaks down the tables into smaller parts just to avoid the different
anomalies.
Definition:
Normalization is process by which we can decompose any relation into more
than one relation to remove anomalies in database. It is a step by step
process & each step is known as Normal Form
Properties of Normalization
Removes Anomalies
Removes use of null values
Removes redundant values
Preserves necessary dependency
Decomposition must be lossless
Types of Anomalies
• Insertion Anomaly:
• Update Anomaly:
• Deletion Anomaly:
Insertion anomaly
Insertion anomaly
Insertion anomaly when new department is
introduced (say, 6 .. HR)
Update anomaly
Deletion anomaly
Informal Design Guidelines:
Informal Design Guidelines:
• Guideline1: Design a relational schema in such a way, so that it is easy
to explain its meaning.
Informal Design Guidelines:
• Guideline1: Design a relational schema in such a way, so that it is easy
to explain its meaning.
• Guideline2: Design the base table so that no Insertion, update or
deletion anomalies are present in the relations.
Informal Design Guidelines:
• Guideline1: Design a relational schema in such a way, so that it is easy
to explain its meaning.
• Guideline2: Design the base table so that no Insertion, update or
deletion anomalies are present in the relations.
• Guideline3: As far as possible, avoid populating tuples that may
probably have null values.
Informal Design Guidelines:
• Guideline1: Design a relational schema in such a way, so that it is easy
to explain its meaning.
• Guideline2: Design the base table so that no Insertion, update or
deletion anomalies are present in the relations.
• Guideline3: As far as possible, avoid populating tuples that may
probably have null values.
• Guideline4: Design a schema in such a way that avoids the repetition
of same information (redundant values).
FUNCTIONAL DEPENDENCIES:
• inter-relationship between attributes or in between tuples in any
relation R
• is denoted as X->Y
• This means that attribute Y is functionally dependant on X.
• The left hand side of the FD is sometimes called as the determinant &
the right hand side is called dependent.
Roll number Name
determinant dependent
To hold functional dependency, two
conditions need to be satisfied
For X-> Y
X Y
t1
t2
for any pair of tuples t1 & t2 in r( R) ,
if t1[X]=t2[X] , then
t1[Y]=t2[Y].
A-> B ??
A-> C ??
Functional Dependency Chart:
It is a graphical representation of functional dependencies among
attributes in any relation. To draw FD chart..
• Find out the primary key attributes
• Make a rectangle & write all primary attributes inside it.
• Write all non-prime key attributes outside the rectangle
• Use arrows to show functional dependency among attributes
Types of Functional Dependencies:
There are four major types of FD’s.
• Partial & Full Functional Dependencies
• Transitive & Non-transitive Dependencies
• Single valued & Multi-valued Dependency
• Trivial & Non trivial Dependency
Partial & Full Functional Dependencies
FD2
FD3
FD1
Transitive & Non-transitive Dependencies:
Transitive & Non-transitive Dependencies:
FD2
FD1
Single valued/ Multivalued Dependency
Name Project Dependents
No relation between project and dependents
X Amit
Smith Smith Nilesh
Y Asha
Trivial/ Non-trivial Dependency
• Closure- Armstrong’s Axioms
• Attribute closure
• Finding Candidate key from relation
• Lossy /Lossless decomposition
• Dependency preservation
• Cover
• Minimal cover
• Decomposition into Normal forms
Closure of a set of FDs
• Given some FDs, new FDs can be inferred.
• The set of all FDs that are implied by a given set of FDs is called
CLOSURE.
• Denoted by F+
• Example: A-> B and B->C imply a new relation between A and C. If B is
dependent on A and C is dependent on B, then C is dependent on A
through B. Hence, A->C is true.
Why do we need to find Closure?
Rules to find FD Closure – Armstrong’s Axioms
• Inference rules used to infer new FDs from a given set of FDs.
• Used to find the Closure of FDs and find the Candidate key from a
relation.
Rules to find FD Closure – Armstrong’s Axioms
• IR-1: Reflexivity: if X Y, then X-> Y
• IR-2: Augmentation: if X->Y, then XZ->YZ
• IR-3: Transitivity: if X->Y and Y->Z, then X->Z
• IR-4: Decomposition: if X->YZ, then X->Y and X->Z
• IR-5: Union: if X->Y and X->Z, then X->YZ
• Pseudotransitivity: if X->Y and AY->Z holds, then XA->Z
Example
• R(A B C D E F) with
A-> BC
B-> E
CD-> EF
Is it possible to hold AD->F as a member of the closure?
Goal is to find AD->F
• 1. A->BC (given)
• 2. A->C (by decomposition of 1)
• 3. AD->CD (augmentation of 2 by adding D)
• 4. CD->EF (given)
• 5. AD->EF (transitivity of 3 and 4)
• 6. AD->F (decomposition of 5)
Exercise: (from Korth)
R(A B C G H I)
A->B
A->C
CG->H
CG->I
B->H
Is the functional dependency A->H logically implied?
Finding Candidate Key from relation
A-> B
A B
A-> B, BC-> D
A B
D
C
A-> B, BC-> D, BC-> E,
A B
D
C
E
A-> B, BC-> D, BC-> E, AEF-> G
A B
D
C
E
F
G
A-> B, BC-> D, BC-> E, AEF-> G, B->G
A B
D
C
E
F
G
Step 2:
Step 3:
Minimal Cover
Minimal Cover
• R (A,B,C) and F={A->BC, B->C, A->B, AB->C}
• Let’s find the minimal cover of F
• Break the RHS
• Find extraneous attribute from LHS if possible
• Reduce redundant values
Step1:
• Break all the right hand sides
A->B,
A->C,
B->C,
A->B,
AB->C
Step2:
• Find Extraneous attributes from left hand side
A->B,
A->C,
B->C,
A->B,
AB->C
Step2:
• Find Extraneous attributes from left hand side
A->B,
A->C,
B->C,
A->B,
AB->C
AB->C
Let’s consider A as extraneous first.
So, B+ should return A.
B+=B
B->BC. But B+ is unable to return A from the given FD. So, A is not
extraneous.
Let’s consider B as extraneous then.
A+=A
A->ABC, here, A+ returns B on the RHS. So, B is extraneous.
Step 3:
A->B,
A->C,
B->C,
A->B,
A->C
Reduce the redundant values
A->B,
A->C,
B->C,
A->B,
A->C
A->B,
B->C,
A->C
A->B,
B->C,
A->C
So the final Minimal cover/canonical cover is
A->B,
B->C
Equivalent sets
F={A->B, A->C}
F+= {A->B, A->C, A->BC}
G={A->B, B->C}
G+={A->B, B->C, A->C, A->BC}
F={A->B, A->C}
F+= {A->B, A->C}
G={A->B, B->C}
G+={A->B, B->C, A->C}
F={A->B, A->C}
F+= {A->B, A->C}
G={A->B, B->C}
G+={A->B, B->C, A->C}
B->C??
Equivalent sets
• F+=G+ and vice versa
1 NF
Roll no Name Phone no.
1 Asha {1234, 2345, 3456}
2 Aman {9876,8765}
3 Akash 5678
..
..
1 NF
Roll no Name Phone no.
1 Asha 1234
1 Asha 2345
1 Asha 3456
2 Aman 9876
2 Aman 8765
Roll no Name Phone 1 Phone 2 Phone 3 Phone 4
1 Asha 1234 2345 3456 NULL
2 Aman 9876 8765 NULL NULL
3 Akash 5678 NULL NULL NULL
2NF
FD2
FD3
FD1
Decomposition into 2NF
Table 1
Roll Game Grade
Table 2
Table 3
Roll Name Game Fee
Contn..
3NF
To be in 3NF
• Should be in 2NF
• No transitive dependency should exist
3NF
FD2
FD1
Table 1
Roll Name Semester
Table 2
Semester Hostel
3NF
Reference: Korth Book, 5th edition, page no. 275
A relation schema R is in 3NF with respect to a set F of functional
dependencies if, for all functional dependencies in F+ of the form α->β,
where α⊆R and β ⊆R, at least one of the following holds:
• α->β is a trivial functional dependency
• α is a superkey of R
• Each attribute A in β - α is contained in a candidate key for R
Lossless decomposition
R1 R2
• R1 R2 = R
• πR1(R) πR2(R)
Extra information added
R1 and R2 form a lossless decomposition of R if at least one of the following
functional dependencies is in F+:
Either R1Ⴖ R2 -> R1 or R1Ⴖ R2 -> R2
Korth Book, 5th Ed, Page no. 285
Either R1Ⴖ R2 -> R1 or R1Ⴖ R2 -> R2
R1Ⴖ R2 = A
R1Ⴖ R2 -> (R1-R2) = A->B or R1Ⴖ R2 -> (R2-R1) = A->C
Either A->B should be present in F+ or A->C should be present in F+
F={A->B, A->C}
F+={A->B, A->C, A->BC}
F+={A->B, A->C, A->BC}
Dependency preservation
(F1 U F2)+ = F+ R1={A,B} with F1={A->B} and R2={A,C} with F2={A->C}
F1 = {A->B}
F2 = {A->C}
(F1 U F2) ={A->B, A->C}
(F1 U F2)+ ={A->B, A->C, A->BC}
F+ ={ A->BC}
F+ ={ A->BC, A->B, A->C}
(F1 U F2)+ = F+ Hence, dependency preserved
Examples to solve
Given R(A,B,C,D) with F={A->B, A->C, C->D}
Decomposed into
• R1(A,B,C) with F1={A->B, A->C} and
• R2(C,D) with F2={C->D}
• Is it preserving dependency?
• Is it lossless?
R=(A,B,C,D,E) with F={A->BC, CD->E, B->D, E->A}
(A,B,C)
(A,D,E)
• Is it lossless?
Korth 5th Ed, page 306, Exercise 7.1
Exercise (Assignment)
1. Consider a relation R(A,B,C,D,E) with F={A->B, BC->E, ED->A}
• List all keys for R
• Is R in 3NF?
• Is R in BCNF?
2. Consider R(A,B,C,D) with F={C->D, C->A, B->C}
• Find the candidate key
• Find the best normal form and decompose, if necessary.
Korth 5th Ed, page 307-310
• Exercise 7.6
• Exercise 7.7
• Exercise 7.23
• Exercise 7.29
BCNF (Boyce-Codd Normal Form), 3.5 NF
To be in BCNF, a table should satisfy two conditions
• Should be in 3NF
• For any dependency A->B, A should be a superkey
In simple words,
for A->B
A can not be a non-prime attribute when B is a prime attribute
Part of primary key/CK non-prime attribute
partial dependency
non-prime attribute non-prime attribute
transitive dependency
non-prime attribute prime attribute??
BCNF doesn’t allow this
Roll Subject Professor
10 DBMS NDR
10 NETWORK WONG
12 DBMS DB
13 OS DRS
14 DBMS NDR
Roll Subject Professor
10 DBMS NDR
10 NETWORK WONG
12 DBMS DB
13 OS DRS
14 DBMS NDR
Roll Subject Professor
10 DBMS NDR
10 NETWORK WONG
12 DBMS DB
13 OS DRS
14 DBMS NDR
Roll Subject Professor
10 DBMS NDR
10 NETWORK WONG
12 DBMS DB
13 OS DRS
14 DBMS NDR
Roll Subject Professor
10 DBMS NDR
10 NETWORK WONG
12 DBMS DB
13 OS DRS
14 DBMS NDR
Roll Subject Professor
10 DBMS NDR
10 NETWORK WONG
12 DBMS DB
13 OS DRS
14 DBMS NDR
Roll Subject Professor
10 DBMS NDR
10 NETWORK WONG
12 DBMS DB
13 OS DRS
14 DBMS NDR
1NF, as all the attributes are atomic
2NF, as there is no partial dependency, F = {Roll, subject -> professor, professor->subject}
3NF, as no transitive dependency
(Roll, subject) professor
professor subject
Prime attribute
Non-prime attribute
Roll P_id P_id Professor Subject
Student Table
Professor Table
BCNF Checking
• (Roll, subject) --> professor
(Roll, subject)+ = {Roll, subject, professor}
• professor --> subject
(professor)+ = {professor, subject}
Since there is no roll present on the right hand side, the table is not in
BCNF
4NF
Name->-> Project and Name->-> Dependent
Name Project
SMITH X
SMITH Y
Name Dependent
SMITH Amit
SMITH Nilesh
SMITH Asha
5NF (Project Join Normal Form)
ACP
AC CP AP
AC CP AP= ACP
ACP
Agent Company Product
Smith Ford Car
Smith Ford Truck
Smith GM Car
Smith GM Truck
John Ford Car
AC CP AP
Agent Company Company Product Agent Product
Smith Ford Ford Car Smith Car
Smith GM Ford Truck Smith Truck
John Ford GM Car John Car
GM Truck