0% found this document useful (0 votes)

11 views64 pages

DBMS Lecture09

Normalization is a technique used in database design to reduce redundancy and update anomalies by decomposing relations into smaller, logically related relations. It aims to create well-designed relational schemas that maintain desirable properties such as lossless-join and dependency preservation. The document discusses the motivation for normalization, types of anomalies, and the role of functional dependencies in ensuring the integrity of the database design.

Uploaded by

umeume3636

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views64 pages

DBMS Lecture09

Uploaded by

umeume3636

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 64

Introduction to Database Systems

Normalization

Dr. Shams Qazi

shams.qazi@seecs.edu.pk
Lecture 09
Database Systems – CS220
Normalization
Normalization is a technique for producing relations with
desirable properties.

Normalization decomposes relations into smaller relations that

contain less redundancy. This decomposition requires that no
information is lost and reconstruction of the original relations
from the smaller relations must be possible.

Normalization is a bottom-up design technique for producing

relations. It pre-dates ER modeling and was developed by
Codd in 1972 and extended by others over the years.
 Normalization can be used after ER modeling or independently.
 Normalization may be especially useful for databases that have
already been designed without using formal techniques.
Normalization Motivation
The purpose of normalization is to develop good relational
schemas that minimize redundancies and update anomalies.

Redundancy occurs when the same data value is stored more

than once in a relation.
 Redundancy wastes space and reduces performance.

Update anomalies are problems that arise when trying to insert,

delete, or update tuples and are often caused by redundancy.

The goal of normalization is to produce a set of relational

schemas R1, R2, …, Rm from a set of attributes A1, A2, … ,An.
 Imagine that the attributes are originally all in one big relation R=
{A1, A2, .., An} which we will call the Universal Relation.
 Normalization divides this relation into R1, R2, …, Rm.
The Power of Normalization
Example Relations
Emp Relation WorksOn Relation
eno ename bdate title salary supereno dno eno pno resp hours
E1 J. Doe 01-05-75 EE 30000 E2 null E1 P1 Manager 12
E2 M. Smith 06-04-66 SA 50000 E5 D3 E2 P1 Analyst 24
E3 A. Lee 07-05-66 ME 40000 E7 D2 E2 P2 Analyst 6
E4 J. Miller 09-01-50 PR 20000 E6 D3 E3 P3 Consultant 10
E5 B. Casey 12-25-71 SA 50000 E8 D3 E3 P4 Engineer 48
E6 L. Chu 11-30-65 EE 30000 E7 D2 E4 P2 Programmer 18
E7 R. Davis 09-08-77 ME 40000 E8 D1 E5 P2 Manager 24
E8 J. Jones 10-11-72 SA 50000 null D1 E6 P4 Manager 48
E7 P3 Engineer 36

Proj Relation Dept Relation

pno pname budget dno dname mgreno
P1 Instruments 150000 D1 Management E8
P2 DB Develop 135000 D2 Consulting E7
P3 Budget 250000 D3 Accounting E5
P4 Maintenance 310000 D4 Development null
P5 CAD/CAM 500000
The Power of Normalization
Universal Relation
A Universal Relation with all attributes:
eno pno resp hours ename bdate title salary supereno dno dname mgreno pname budget
E1 P1 Manager 12 J. Doe 01-05-75 EE 30000 E2 Instruments 150000
E2 P1 Analyst 24 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5 Instruments 150000
E2 P2 Analyst 6 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5 DB Develop 135000
E3 P3 Consultant 10 A. Lee 07-05-66 ME 40000 E6 D2 Consulting E7 Budget 250000
E3 P4 Engineer 48 A. Lee 07-05-66 ME 40000 E6 D2 Consulting E7 Maintenance 310000
E4 P2 Programmer 18 J. Miller 09-01-50 PR 20000 E6 D3 Accounting E5 DB Develop 135000
E5 P2 Manager 24 B. Casey 12-25-71 SA 50000 E8 D3 Accounting E5 DB Develop 135000
E6 P4 Manager 48 L. Chu 11-30-65 EE 30000 E7 D2 Consulting E7 Maintenance 310000
E7 P3 Engineer 36 J. Jones 10-11-72 SA 50000 D1 Management E8 Budget 250000

Universal(eno, pno, resp, hours, ename, bdate, title, salary, supereno,

dno, dname, mgreno, pname, budget)

What are some of the problems with the Universal Relation?

Normalization will allow us to get back to the starting relations.

Update Anomalies
There are three major types of update anomalies:
 Insertion Anomalies - Insertion of a tuple into the relation
either requires insertion of redundant information or cannot be
performed without setting key values to NULL.

 Deletion Anomalies - Deletion of a tuple may lose information

that is still required to be stored.

 Modification Anomalies - Changing an attribute of a tuple may

require changing multiple attribute values in other tuples.
Example Relation Instances
Emp Relation Dept Relation
eno ename bdate title salary supereno dno dno dname mgreno
E1 J. Doe 01-05-75 EE 30000 E2 null D1 Management E8
E2 M. Smith 06-04-66 SA 50000 E5 D3 D2 Consulting E7
E3 A. Lee 07-05-66 ME 40000 E7 D2 D3 Accounting E5
E4 J. Miller 09-01-50 PR 20000 E6 D3 D4 Development null
E5 B. Casey 12-25-71 SA 50000 E8 D3
E6 L. Chu 11-30-65 EE 30000 E7 D2
E7 R. Davis 09-08-77 ME 40000 E8 D1
E8 J. Jones 10-11-72 SA 50000 null D1

EmpDept Relation
eno ename bdate title salary supereno dno dname mgreno
E1 J. Doe 01-05-75 EE 30000 E2 null null null
E2 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A. Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J. Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B. Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L. Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R. Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J. Jones 10-11-72 SA 50000 null D1 Management E8
Insertion Anomaly Example
eno ename bdate title salary supereno dno dname mgreno
E1 J. Doe 01-05-75 EE 30000 E2 null null null
E2 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A. Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J. Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B. Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L. Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R. Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J. Jones 10-11-72 SA 50000 null D1 Management E8

Consider these two types of insertion anomalies:

 1) Insert a new employee E9 working in department D2.
You have to redundantly insert the department name and manager when
adding this record.
 2) Insert a department D4 that has no current employees.
This insertion is not possible without creating a dummy employee id and
record because eno is the primary key of the relation.
Deletion Anomaly Example
eno ename bdate title salary supereno dno dname mgreno
E1 J. Doe 01-05-75 EE 30000 E2 null null null
E2 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A. Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J. Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B. Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L. Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R. Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J. Jones 10-11-72 SA 50000 null D1 Management E8

Consider this deletion anomaly:

 Delete employees E3 and E6 from the database.
 Deleting those two employees removes them from the database,
and we now have lost information about department D2!
Modification Anomaly Example
eno ename bdate title salary supereno dno dname mgreno
E1 J. Doe 01-05-75 EE 30000 E2 null null null
E2 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A. Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J. Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B. Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L. Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R. Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J. Jones 10-11-72 SA 50000 null D1 Management E8

Consider these modification anomalies:

 1) Change the name of department D3 to Research.
You must update the department name in 3 different records.
 2) Change the manager of D1 to E4.
You must update the mgreno field in 2 different records.
Desirable Relational Schema Properties
Relational schemas that are well-designed have several
important properties:
 1) The most basic property is that relations consists of
attributes that are logically related.
The attributes in a relation should belong to only one entity or
relationship.
 2) Lossless-join property ensures that the information
decomposed across many relations can be reconstructed using
natural joins.
 3) Dependency preservation property ensures that
constraints on the original relation can be maintained by
enforcing constraints on the normalized relations.
Normalization Question
Question: How many of the following statements are true?

1)Normalization is a bottom up process.

2) Anomalies with updates often are indicators of a bad design.
3)The lossless-join property means that it is possible to
reconstruct the original relation after normalization using joins.
4)The dependency preservation property means that
constraints should be preserved before and after normalization.

A) 0 B) 1 C) 2 D) 3 E) 4
Functional Dependencies
Functional dependencies represent constraints on the values of
attributes in a relation and are used in normalization.

A functional dependency (abbreviated FD) is a statement

about the relationship between attributes in a relation. We say a
set of attributes X functionally determines an attribute Y if given
the values of X we always know the only possible value of Y.
 Notation: X → Y
 X functionally determines Y
 Y is functionally dependent on X
Example:
 eno → ename
 eno, pno → hours
Notation for Functional Dependencies
A functional dependency has a left-side called the determinant
which is a set of attributes, and one attribute on the right-side.

eno, pno → hours

determinant determined
attribute
Strictly speaking, there is always only one attribute on the RHS,
but we can combine several functional dependencies into one:
eno, pno → hours
eno, pno → resp
eno, pno → hours, resp
Remember that this is really short-hand for two functional
dependencies.
The Semantics of
Functional Dependencies
Functional dependencies are a property of the domain being
modeled NOT of the data instances currently in the database.
 This means that similar to keys you cannot tell if one attribute is
functionally dependent on another by looking at the data.
Example: Emp Relation
eno ename bdate title salary supereno dno
E1 J. Doe 01-05-75 EE 30000 E2 null
E2 M. Smith 06-04-66 SA 50000 E5 D3
E3 A. Lee 07-05-66 ME 40000 E7 D2
E4 J. Miller 09-01-50 PR 20000 E6 D3
E5 B. Casey 12-25-71 SA 50000 E8 D3
E6 L. Chu 11-30-65 EE 30000 E7 D2
E7 R. Davis 09-08-77 ME 40000 E8 D1
E8 J. Jones 10-11-72 SA 50000 null D1

 List the functional dependencies of the attributes in this relation.

The Semantics of Functional
Dependencies (2)
Functional dependencies are directional.
eno → ename does not mean that ename → eno
Example: Emp Relation
eno ename bdate title salary supereno dno
E1 J. Doe 01-05-75 EE 30000 E2 null
E2 M. Smith 06-04-66 SA 50000 E5 D3
E3 A. Lee 07-05-66 ME 40000 E7 D2
E4 J. Miller 09-01-50 PR 20000 E6 D3
E5 B. Casey 12-25-71 SA 50000 E8 D3
E6 L. Chu 11-30-65 EE 30000 E7 D2
E7 R. Davis 09-08-77 ME 40000 E8 D1
E8 J. Jones 10-11-72 SA 50000 null D1

 Given an employee name there may be multiple values for eno if

we have employees in the database with the same name.
 Thus knowing ename, does not uniquely tell us the value of eno.
Trivial Functional Dependencies
A functional dependency is trivial if the attributes on its left-hand
side are a superset of the attributes on its right-hand side.

Examples: eno → eno

eno, ename → eno
eno, pno, hours → eno, hours
Trivial functional dependencies are not interesting because they
do not tell us anything.
 Trivial FDs basically say "If you know the values of these
attributes, then you uniquely know the values of any subset of
those attributes.
We are only interested in nontrivial FDs.
Identifying Functional Dependencies
Identify all non-trivial functional dependencies in the Universal Relation:
eno pno resp hours ename bdate title salary supereno dno dname mgreno pname budget
E1 P1 Manager 12 J. Doe 01-05-75 EE 30000 E2 Instruments 150000
E2 P1 Analyst 24 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5 Instruments 150000
E2 P2 Analyst 6 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5 DB Develop 135000
E3 P3 Consultant 10 A. Lee 07-05-66 ME 40000 E6 D2 Consulting E7 Budget 250000
E3 P4 Engineer 48 A. Lee 07-05-66 ME 40000 E6 D2 Consulting E7 Maintenance 310000
E4 P2 Programmer 18 J. Miller 09-01-50 PR 20000 E6 D3 Accounting E5 DB Develop 135000
E5 P2 Manager 24 B. Casey 12-25-71 SA 50000 E8 D3 Accounting E5 DB Develop 135000
E6 P4 Manager 48 L. Chu 11-30-65 EE 30000 E7 D2 Consulting E7 Maintenance 310000
E7 P3 Engineer 36 J. Jones 10-11-72 SA 50000 D1 Management E8 Budget 250000
Why the Name
"Functional" Dependencies?
Functional dependencies get their name because you could
imagine the existence of some function that takes in the
parameters of the left-hand side and computes the value on the
right-hand side of the dependency.

Example: eno, pno → hours

f(eno, pno) → hours
int f (String eno, String pno)
{ // Do some lookup...
return hours;
}
Remember that no such function exists, but it may be useful to
think of FDs this way.
Functional Dependencies and Keys
Functional dependencies can be used to determine the
candidate and primary keys of a relation.
 For example, if an attribute functionally determines all other
attributes in the relation, that attribute can be a key:

eno → eno, ename, bdate, title, supereno, dno

 eno is a candidate key for the Employee relation.

Alternate definition of keys:

 A set of attributes K is a superkey for a relation R if the set of
attributes K functionally determines all attributes in R.
 A set of attributes K is a candidate key for a relation R if K is a
minimal superkey of R.
Functional Dependencies and Keys (2)
EmpDept Relation
eno ename bdate title salary supereno dno dname mgreno
E1 J. Doe 01-05-75 EE 30000 E2 null null null
E2 M. Smith 06-04-66 SA 50000 E5 D3 Accounting E5
E3 A. Lee 07-05-66 ME 40000 E7 D2 Consulting E7
E4 J. Miller 09-01-50 PR 20000 E6 D3 Accounting E5
E5 B. Casey 12-25-71 SA 50000 E8 D3 Accounting E5
E6 L. Chu 11-30-65 EE 30000 E7 D2 Consulting E7
E7 R. Davis 09-08-77 ME 40000 E8 D1 Management E8
E8 J. Jones 10-11-72 SA 50000 null D1 Management E8

eno → eno, ename, bdate, title, salary, supereno, dno

dno → dname, mgreno
Question: List the candidate key(s) of this relation.
Functional Dependency Rules
Functional dependencies follow rules (Armstrong's Axioms
1974) with the most important being transitivity.

Transitive Rule: If A → B and B → C then A → C

Functional Dependency Question
Question: How many of the following statements are true?

1)Functional dependencies are not directional.

2) A trivial functional dependency has its right side as a subset
of (or the same as) its left side.
3)A key is a set of attributes that functionally determines all
attributes in a relation.
4)Functional dependencies can be determined by examining
the data in a relation.

A) 0 B) 1 C) 2 D) 3 E) 4
Computing the Closure of FDs
The transitivity rule of FDs can be used for three purposes:
 1) To determine if a given FD X → Y follows from a set of FDs F.
 2) To determine if a set of attributes X is a superkey of R.
 3) To determine the set of all FDs (called the closure F+) that
can be inferred from a set of initial functional dependencies F.

The basic idea is that given any set of attributes X, we can

compute the set of all attributes X+ that can be functionally
determined using F. This is called the closure of X under F.
 For purpose #1, we know that X → Y holds if Y is in X+.
 For purpose #2, X is a superkey of R if X+ is all attributes of R.
 For purpose #3, we can compute X+ for all possible subsets X of
R to derive all FDs (the closure F+) .
Computing the Attribute Closure
The algorithm is as follows:
 Given a set of attributes X.
 Let X+ = X
 Repeat
Find a FD in F whose left side is a subset of X+.
Add the right side of F to X+.
 Until (X+ does not change)

After the algorithm completes you have a set of attributes X+ that

can be functionally determined from X. This allows you to
produce FDs of the form:
 X → A where A is in X+
Computing Attribute Closure Example
R (A,B,C,D,E,G)
F = {A → B,C , C → D , D → G}

Compute {A}+:
 {A}+ = {A} (initial step)
 {A}+ = {A,B,C} (by using FD A → B,C)
 {A}+ = {A,B,C,D} (by using FD C → D)
 {A}+ = {A,B,C,D,G} (by using FD D → G)

Similarly we can compute {C}+ and {E,G}+:

 {C}+ = {C,D,G}
 {E,G}+ = {E,G}
Using Attribute Closure to Detect a FD
The attribute closure algorithm can be used to determine if a
particular FD X → Y holds based on the set of given FDs F.
 Compute X+ and see if Y is in X+.

Example:
 R = (A,B,C,D,E,G)
 F = {A → B,C , C → D , D → G}
 Does the FD C,D → G hold?
Compute {C,D}+ = {C,D,G}. Yes, the FD C,D → G holds.
 Does the FD B,C → E hold?
Compute {B,C}+ = {B,C,D,G}. No, the FD B,C → E does not hold.
Function Dependency Closure Question
Question: Given the following, does {A,B → D} hold?
R = (A,B,C,D)
F = { A,B → C , C → D , D → A }

A) yes
B) no
FD Closure Question 2
Question: Given the following, does {B,D → A,C} hold?

R = (A,B,C,D)
F = {A,B → C , B,C → D , C,D → A ,
A,D → B}

A) yes
B) no
Function Dependency Key Question
Question: Given the following FDs, how many keys of R?

R = (A,B,C,D)
F = { A,B → C , C → D , D → A }

A) 0
B) 1
C) 2
D) 3
E) 4
Function Dependency Key Question 2
Question: Given the following FDs, how many keys of R?

R = (A,B,C,D)
F = {A,B → C , B,C → D , C,D → A ,
A,D → B
}
A) 0
B) 1
C) 2
D) 3
E) 4
Minimal Set of FDs
A set of FDs F is minimal if it satisfies these conditions:
 Every FD in F has a single attribute on its right-hand side.
 We cannot replace any FD X → A in F with Y → A where Y ⊂ X
and still have a set of dependencies that is equivalent to F.
 We cannot remove any FD and still have a set that is equivalent
to F.

A minimal set of FDs is like a canonical form of FDs.

Note that there may be many minimal covers for a given set of
functional dependencies F.
Full Functional Dependencies
A functional dependency A → B is a full functional
dependency if removal of any attribute from A results in the
dependency not existing any more.
 We say that B is fully functionally dependent on A.
 If remove an attribute from A and the functional dependency still
exists, we say that B is partially dependent on A.

Example:

eno → ename (full FD)

eno, ename → salary, title (partial FD - can remove ename)
eno, pno → hours, resp (full FD)
Full Functional Dependency Question
Question: True or False: You can determine if certain, valid
functional dependencies are full functional dependencies
without having any domain knowledge.

A) true

B) false
Normal Forms
A relation is in a particular normal form if it satisfies certain
normalization properties.

There are several normal forms defined:

 1NF - First Normal Form
 2NF - Second Normal Form
 3NF - Third Normal Form
 BCNF - Boyce-Codd Normal Form
 4NF - Fourth Normal Form
 5NF - Fifth Normal Form
Each of these normal forms are stricter than the next.
For example, 3NF is better than 2NF because it removes more
redundancy/anomalies from the schema than 2NF.
Normal Forms
All relations (non-normalized)
1NF (First Normal Form)
2NF (Second Normal Form)
3NF (Third Normal Form)
BCNF (Boyce-Codd NF)
4NF (Fourth NF)
5NF (Fifth NF)
First Normal Form (1NF)
A relation is in first normal form (1NF) if all its attribute values
are atomic.

That is, a 1NF relation cannot have an attribute value that is:
 a set of values (multi-valued attribute)
 a set of tuples (nested relation)

1NF is a standard assumption in relational DBMSs.

 However, object-oriented DBMSs and nested relational DBMSs
relax this constraint.

A relation that is not in 1NF is an unnormalized relation.

A non-1NF Relation
eno ename pno resp hours eno ename pno resp hours
E1 J. Doe P1 Manager 12 E1 J. Doe P1 Manager 12
E2 M. Smith P1 Analyst 24 E2 M. Smith {P1,P2} {Analyst,Analyst} {24,6}
P2 Analyst 6
E3 A. Lee P3 Consultant 10 E3 A. Lee {P3,P4} {Consultant,Engineer} {10,48}
P4 Engineer 48
E4 J. Miller P2 Programmer 18
E5 B. Casey P2 Manager 24 Two equivalent E4
E5
J. Miller
B. Casey
P2
P2
Programmer
Manager
18
24
E6 L. Chu P4 Manager 48 representations E6 L. Chu P4 Manager 48
E7 J. Jones P3 Engineer 36 E7 J. Jones P3 Engineer 36

Two ways to convert a non-1NF relation to a 1NF relation:

 1) Splitting Method - Divide the existing relation into two
relations: non-repeating attributes and repeating attributes.
Make a relation consisting of the primary key of the original relation
and the repeating attributes. Determine a primary key for this
new relation.
Remove the repeating attributes from the original relation.
 2) Flattening Method - Create new tuples for the repeating
data combined with the data that does not repeat.
Introduces primary
Determine redundancy thatthis
key for willflattened
be later removed
relation. by normalization.
Converting a non-1NF Relation
to 1NF Using Splitting
eno ename pno resp hours
E1 J. Doe P1 Manager 12
E2 M. Smith P1 Analyst 24
P2 Analyst 6
E3 A. Lee P3 Consultant 10
P4 Engineer 48
E4 J. Miller P2 Programmer 18
E5 B. Casey P2 Manager 24
E6 L. Chu P4 Manager 48
E7 J. Jones P3 Engineer 36

Also need original Repeating group:

primary key: eno (pno, resp, hours)

eno ename eno pno resp hours

E1 J. Doe E1 P1 Manager 12
E2 M. Smith E2 P1 Analyst 24
E3 A. Lee E2 P2 Analyst 6
E4 J. Miller E3 P3 Consultant 10
E5 B. Casey E3 P4 Engineer 48
E6 L. Chu E4 P2 Programmer 18
E7 J. Jones
E5 P2 Manager 24
E6 P4 Manager 48
E7 P3 Engineer 36
Converting a non-1NF Relation
to 1NF Using Flattening
eno ename pno resp hours
E1 J. Doe P1 Manager 12
E2 M. Smith P1 Analyst 24
P2 Analyst 6
E3 A. Lee P3 Consultant 10
P4 Engineer 48
E4 J. Miller P2 Programmer 18
E5 B. Casey P2 Manager 24
E6 L. Chu P4 Manager 48
E7 J. Jones P3 Engineer 36

eno ename pno resp hours

E1 J. Doe P1 Manager 12
E2 M. Smith P1 Analyst 24
E2 M. Smith P2 Analyst 6
E3 A. Lee P3 Consultant 10
E3 A. Lee P4 Engineer 48
E4 J. Miller P2 Programmer 18
E5 B. Casey P2 Manager 24
E6 L. Chu P4 Manager 48
E7 J. Jones P3 Engineer 36
Second Normal Form (2NF)
A relation is in second normal form (2NF) if it is in 1NF and
every non-prime attribute is fully functionally dependent on a
candidate key.
 A prime attribute is an attribute in any candidate key.

If there is a FD X → Y that violates 2NF:

 Compute X+.
 Replace R by relations: R1 = X+ and R2 = (R – X+) U X

Note:
 By definition, any relation with a single key attribute is in 2NF.
Second Normal Form (2NF) Example
EmpProj relation:
eno ename title bdate salary supereno dno pno pname budget resp hours

fd1

fd2

fd3

fd4

fd1 and fd4 are partial functional dependencies. Normalize to:

 Emp (eno, ename, title, bdate, salary, supereno, dno)
 WorksOn (eno, pno, resp, hours)
 Proj (pno, pname, budget)
Second Normal Form (2NF) Example (2)
Emp relation:
eno ename title bdate salary supereno dno

fd1

fd2

WorksOn relation: Proj relation:

eno pno resp hours pno pname budget

fd3 fd4
Third Normal Form (3NF)
A relation is in third normal form (3NF) if it is in 2NF and there
is no non-prime attribute that is transitively dependent on the
primary key.

That is, for all functional dependencies X →Y of R, one of the

following holds:
 Y is a prime attribute of R
 X is a superkey of R
Third Normal Form (3NF) Example
Emp relation:
eno ename title bdate salary supereno dno

fd1

fd2

fd2 results in a transitive dependency eno → salary. Remove it.

Emp Pay
eno ename title bdate supereno dno title salary

fd1 fd2
Normalization Question
Consider the universal relation R(A,B,C,D,E,F,G,H,I,J) and the
set of functional dependencies:
 F= { A,B → C ; A → D,E ; B → F ; F → G,H ; D → I,J }

List the keys for R.

Decompose R into 2NF and then 3NF relations.

Boyce-Codd Normal Form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if and only if
every determinant of a non-trivial FD is a superkey.

To test if a relation is in BCNF, we take the determinant of each

non-trivial FD in the relation and determine if it is a superkey.

The difference between 3NF and BCNF is that 3NF allows a

FD X → Y to remain in the relation if X is a superkey or Y is a
prime attribute. BCNF only allows this FD if X is a superkey.
 Thus, BCNF is more restrictive than 3NF. However, in practice
most relations in 3NF are also in BCNF.
Boyce-Codd Normal Form (BCNF)
Example
Consider the WorksOn relation where we have the added
constraint that given the hours worked, we know exactly the
employee who performed the work. (i.e. each employee is FD
from the hours that they work on projects). Then:
WorksOn eno pno resp hours
Relation is in 3NF but
not BCNF.
fd1
fd2

pno hours resp

eno hours

fd2

Note that we lose the FD eno,pno → resp, hours.

BCNF versus 3NF
We can decompose to BCNF but sometimes we do not want to
if we lose a FD.

The decision to use 3NF or BCNF depends on the amount of

redundancy we are willing to accept and the willingness to lose
a functional dependency.

Note that we can always preserve the lossless-join property

(recovery) with a BCNF decomposition, but we do not always
get dependency preservation.

In contrast, we get both recovery and dependency preservation

with a 3NF decomposition.
BCNF versus 3NF Example
An example of not having dependency preservation with BCNF:
 street,city → zipcode and zipcode → city
 Two keys: {street,city} and {street, zipcode}

street city zipcode

street zipcode

fd1

fd2 zipcode
city

zipcode → city is a fd2

BCNF violation.
BCNF versus 3NF Example (2)
Consider an example instance:
street zip city zip
invalid 545 Tech Sq. 02138 Cambridge 02138 Harvard
MIT 545 Tech Sq. 02139 Cambridge 02139 MIT

Join tuples with equal zipcodes:

street city zip
545 Tech Sq. Cambridge 02138 invalid
545 Tech Sq. Cambridge 02139 MIT

Note that the decomposition did not allow us to enforce the

constraint that street,city → zipcode even though no FDs were
violated in the decomposed relations.
Conversion to BCNF
Algorithm for converting to BCNF given relation R with FDs F:
 Look among the given FDs for a BCNF violation X → Y.
 Compute X+.
Note that X+ cannot be the set of all attributes or else X is a superkey.
 Replace R by relations with schemas:
R1 = X+
R2 = (R – X+) U X
 Project the FDs F onto the two new relations by:
Computing the closure of F and then using only the FDs whose attributes
are all in R1 or R2.
Normalization to BCNF Question
Given this schema normalize into BCNF directly.

StudentCourse
sid ssn cnum grade

fd1

fd2
fd3
Normalization Question
Given this database schema normalize into BCNF.
Lots
PID# county lot# area price taxRate

fd1
fd2
fd3
fd4

 Description:
Lots (parcels of land) are identified within each county.
Each lot can be uniquely identified either by the Property ID# or by the
County name and the Lot#.
 property id’s are unique but lot#’s are unique only with the county
The tax rate is constant within each county.
The price is based on the area of the land.
Normalization Question 2
Given this database schema normalize into BCNF.
Lots
PID# county lot# area price taxRate

fd1

fd2
fd3
fd4
fd5

 New FD5 says that the size of the parcel of land determines
what county it is in.
Normalization to BCNF Question
Given this schema normalize into BCNF:
 R (courseNum, secNum, offeringDept, creditHours,
courseLevel, instructorSSN, semester, year, daysHours,
roomNum, numStudents)
 courseNum → offeringDept,creditHours, courseLevel

 courseNum, secNum, semester, year → daysHours, roomNum,

numStudents, instructorSSN
 roomNum, daysHours, semester, year → instructorSSN,
courseNum, secNum
Fourth Normal Form (4NF) and Fifth
NormalForm (5NF)
Fourth normal form (4NF) and fifth normal formal (5NF) are
rarely used in practice.

A relation is in fourth normal form (4NF) if it is in BCNF and

contains no non-trivial multi-valued dependencies.
 A multi-valued dependency (MVD) occurs when two
independent, multi-valued attributes are present in the schema.

A relation is in fifth normal form (5NF) if the relation has no

join dependency.
 A join dependency implies that spurious tuples are generated
when the relations are natural joined.
Normal Forms in Practice
Normal forms are used to prevent anomalies and redundancy.
However, just because successive normal forms are better in
reducing redundancy that does not mean they always have to
be used.

For example, query execution time may increase because of

normalization as more joins become necessary to answer
queries.
Normal Forms in Practice Example
For example, street and city uniquely determine a zipcode.
Emp
eno street city
eno street city zipcode
fd1
fd1
street city zipcode

fd2
fd2

In this case, reducing redundancy is not as important as the fact

that a join is necessary every time the zipcode is needed.
 When a zipcode does change, it is easy to scan the entire Emp
relation and update it accordingly.
Normal Forms and ER Modeling
Normalization and ER modeling are two independent concepts.

You can use ER modeling to produce an initial relational

schema and then use normalization to remove any remaining
redundancies.
 If you are a good ER modeler, it is rare that much normalization
will be required.

In theory, you can use normalization by itself. This would

involve identifying all attributes, giving them unique names,
discovering all FDs and MVDs, then applying the normalization
algorithms.
 Since this is a lot harder than ER modeling, most people do not
do it.
Normal Forms in Practice Summary
In summary, normalization is typically used in two ways:
 To improve on relational schemas after ER design.
 To improve on an existing relational schemas that are poorly
designed and contain redundancy and potential for anomalies.
In practice, most designers make sure their schemas are in at
least 3NF or BCNF because this removes most anomalies and
redundancies. If multi-valued dependencies exist, they should
definitely be removed to go to 4NF.

There is always a tradeoff in normalization: Normalization

reduces redundancy but fragments the relations which makes it
more expensive to query. Some redundancy may help
performance.
 This is the classic time versus space tradeoff!
Conclusion
Normalization is produces relations with desirable properties
and reduces redundancy and update anomalies.

Normal forms indicate when relations satisfy certain properties.

 1NF - All attributes are atomic.
 2NF - All attributes are fully functionally dependent on a key.
 3NF - There are no transitive dependencies in the relation.
 4NF - There are no multi-valued dependencies in the relation.
 5NF - The relation has no join dependency.
In practice, normalization is used to improve schemas produced
after ER design and existing relational schemas.
 Full normalization is not always beneficial as it may increase
query time.
Objectives
 Explain process of normalization and why it is beneficial.
 Describe the concept of the Universal Relation.
 What are the motivations for normalization?
 What are the 3 types of update anomalies? Be able to give
examples.
 Describe the lossless-join property and dependency
preservation property.
 Define and explain the concept of a functional dependency.
 What is a trivial FD?
 Describe the relationship between FDs and keys.
 Be able to compute the closure of a set of attributes.
Objectives (2)
 Define normal forms, be able to list the normal forms, and draw
a diagram showing their relationship.
 Define 1NF. Know how to convert non-1NF relation to 1NF.
 Define 2NF. Convert 1NF relation to 2NF.
 Define 3NF. Convert 2NF relation to 3NF.
 Define BCNF. Convert 3NF relation to BCNF. Compare BCNF
and 3NF. Be able to convert directly to BCNF using algorithm.
 Explain normalization in terms of the time versus space
tradeoff.
 Explain the relationship between normalization and ER
modeling.

Dbmsi Unit 4
No ratings yet
Dbmsi Unit 4
20 pages
Lec 5 Normalization
No ratings yet
Lec 5 Normalization
25 pages
Lecture 8 - Normalisation
No ratings yet
Lecture 8 - Normalisation
7 pages
4-Database Design Theory-Without Inclass Exercises
No ratings yet
4-Database Design Theory-Without Inclass Exercises
121 pages
Normalization 1
No ratings yet
Normalization 1
21 pages
4.4 Normalization
No ratings yet
4.4 Normalization
55 pages
Normalization
No ratings yet
Normalization
5 pages
Normalization: Dale-Marie Wilson, PH.D
No ratings yet
Normalization: Dale-Marie Wilson, PH.D
47 pages
Lecture 11fjylkgey
No ratings yet
Lecture 11fjylkgey
107 pages
BITWeek7 - L10 - ITE2422 V1
No ratings yet
BITWeek7 - L10 - ITE2422 V1
11 pages
DBMS Module 3
No ratings yet
DBMS Module 3
24 pages
Module 4 - Normalization
No ratings yet
Module 4 - Normalization
141 pages
Lecture 7 - 8 - Normalization
No ratings yet
Lecture 7 - 8 - Normalization
30 pages
Database Normalization Guide
No ratings yet
Database Normalization Guide
31 pages
6 Normalization Part 1
No ratings yet
6 Normalization Part 1
68 pages
Fourth Unit
No ratings yet
Fourth Unit
14 pages
Unit 1 Lesson 7 Normalization
No ratings yet
Unit 1 Lesson 7 Normalization
12 pages
Normalization
No ratings yet
Normalization
31 pages
Database Normalization Guide
No ratings yet
Database Normalization Guide
192 pages
DBMS 3 1
No ratings yet
DBMS 3 1
27 pages
Schema Refinement and Normal Forms: UNIT-4
No ratings yet
Schema Refinement and Normal Forms: UNIT-4
10 pages
Dependency
No ratings yet
Dependency
47 pages
Mod3 Chap2 - Normalization
No ratings yet
Mod3 Chap2 - Normalization
44 pages
Database Normalization Guide
No ratings yet
Database Normalization Guide
5 pages
MYSQL DAY - 20 (Normalization)
No ratings yet
MYSQL DAY - 20 (Normalization)
13 pages
Normalization
No ratings yet
Normalization
30 pages
Unit Iv Data Normalization: Semantics of Attributes Should Be Easy To Interpret
No ratings yet
Unit Iv Data Normalization: Semantics of Attributes Should Be Easy To Interpret
14 pages
Database Normalization
No ratings yet
Database Normalization
18 pages
DBMS Normalization
No ratings yet
DBMS Normalization
15 pages
What Is A Relational Database?
No ratings yet
What Is A Relational Database?
13 pages
Module - 4 Notes - 12-11-2021
No ratings yet
Module - 4 Notes - 12-11-2021
21 pages
Relational Design
No ratings yet
Relational Design
50 pages
DBMS Unit 4
No ratings yet
DBMS Unit 4
54 pages
DBMS
No ratings yet
DBMS
10 pages
Lecture 4
No ratings yet
Lecture 4
49 pages
Co1 Session7
No ratings yet
Co1 Session7
69 pages
Final Explanation
No ratings yet
Final Explanation
211 pages
Database Anamolies
No ratings yet
Database Anamolies
39 pages
Data Normalization
No ratings yet
Data Normalization
25 pages
Database Normalization Essentials
No ratings yet
Database Normalization Essentials
18 pages
Cape Notes Unit 2 Module 1 Content 9
No ratings yet
Cape Notes Unit 2 Module 1 Content 9
21 pages
Media To Upload1700644090
No ratings yet
Media To Upload1700644090
43 pages
Lecture Note 4 - Dependencies and Normalization
No ratings yet
Lecture Note 4 - Dependencies and Normalization
35 pages
Topic 4 Normalization
No ratings yet
Topic 4 Normalization
64 pages
Chapter 4 DB
No ratings yet
Chapter 4 DB
30 pages
Database Normalization Guide
No ratings yet
Database Normalization Guide
44 pages
Normalization
No ratings yet
Normalization
30 pages
IS2511 Module 9
No ratings yet
IS2511 Module 9
32 pages
Logical Database Design Guide
No ratings yet
Logical Database Design Guide
53 pages
Normalization - InClass Demo
No ratings yet
Normalization - InClass Demo
16 pages
Normalization
No ratings yet
Normalization
41 pages
Lec 5 Normalization
No ratings yet
Lec 5 Normalization
55 pages
Unit 3 NEP DBMS
No ratings yet
Unit 3 NEP DBMS
27 pages
Functional Dependency & Normalization Guide
No ratings yet
Functional Dependency & Normalization Guide
46 pages
Graphs 1
No ratings yet
Graphs 1
23 pages
DBMS Lecture13 NoSQL
No ratings yet
DBMS Lecture13 NoSQL
31 pages
PHYSICS 11 Dec
No ratings yet
PHYSICS 11 Dec
3 pages
Genius Physics 2 by Al - Beruni
100% (2)
Genius Physics 2 by Al - Beruni
143 pages
Duty Chart (Course Wise) Report
No ratings yet
Duty Chart (Course Wise) Report
53 pages
Hcr2 Multi Scripts Gemz
No ratings yet
Hcr2 Multi Scripts Gemz
6 pages
Descargar Pergaminos de Cristal Claudio Pasten
No ratings yet
Descargar Pergaminos de Cristal Claudio Pasten
3 pages
4NXCI Software Update Log
No ratings yet
4NXCI Software Update Log
3 pages
Account 421
No ratings yet
Account 421
537 pages
Chapter4 OK
No ratings yet
Chapter4 OK
39 pages
Chapter 4.1 Install openLDAP For Windows PDF
No ratings yet
Chapter 4.1 Install openLDAP For Windows PDF
6 pages
Laser Business Plan
No ratings yet
Laser Business Plan
11 pages
Itsm Self Led Project Plan
No ratings yet
Itsm Self Led Project Plan
44 pages
ZXA10 C320 Product Introduction
100% (2)
ZXA10 C320 Product Introduction
9 pages
3 Best Ways On How To Bypass Google Account Samsung
No ratings yet
3 Best Ways On How To Bypass Google Account Samsung
10 pages
692283-Phone Repair StepbyStep Flowchart Diagrams
No ratings yet
692283-Phone Repair StepbyStep Flowchart Diagrams
52 pages
New Ford Figo Features & Specs
No ratings yet
New Ford Figo Features & Specs
22 pages
L I N K e D: Linked List
100% (1)
L I N K e D: Linked List
11 pages
Mark Sheet
No ratings yet
Mark Sheet
1 page
CMOS 4000 Series IC List
No ratings yet
CMOS 4000 Series IC List
6 pages
Elementary Level Test: 1. Open The Brackets Using The Correct Form of The Verb: Present Simple/continuous
No ratings yet
Elementary Level Test: 1. Open The Brackets Using The Correct Form of The Verb: Present Simple/continuous
2 pages
Final Project New
100% (1)
Final Project New
127 pages
Advanced Driver Assistance Systems
No ratings yet
Advanced Driver Assistance Systems
11 pages
Hays 1808039
No ratings yet
Hays 1808039
124 pages
BGP for Network Professionals
No ratings yet
BGP for Network Professionals
12 pages
UTRAN & UE Connection Protocols
No ratings yet
UTRAN & UE Connection Protocols
33 pages
Serial Communication With ET 200S 1SI Module Via PROFIBUS-PROFINET CP
No ratings yet
Serial Communication With ET 200S 1SI Module Via PROFIBUS-PROFINET CP
57 pages
36 PES POs and Mapping CSE
No ratings yet
36 PES POs and Mapping CSE
3 pages
d2 33 90pa - Manual
No ratings yet
d2 33 90pa - Manual
8 pages
( - AMW - 32P - ) - Owner's - Manual - V0.1 - 2024-11-13T142334.190
No ratings yet
( - AMW - 32P - ) - Owner's - Manual - V0.1 - 2024-11-13T142334.190
13 pages
KIRAN's Resume
No ratings yet
KIRAN's Resume
1 page
Data Sandcone 2021
No ratings yet
Data Sandcone 2021
1 page
Aadhaar Details for Mehtab
No ratings yet
Aadhaar Details for Mehtab
1 page
Avaya CMS Deploying On AWS 19.2 March 2021
No ratings yet
Avaya CMS Deploying On AWS 19.2 March 2021
41 pages

DBMS Lecture09

Uploaded by

DBMS Lecture09

Uploaded by

Introduction to Database Systems

Dr. Shams Qazi

Normalization decomposes relations into smaller relations that

Normalization is a bottom-up design technique for producing

Redundancy occurs when the same data value is stored more

Update anomalies are problems that arise when trying to insert,

The goal of normalization is to produce a set of relational

Proj Relation Dept Relation

Universal(eno, pno, resp, hours, ename, bdate, title, salary, supereno,

What are some of the problems with the Universal Relation?

Normalization will allow us to get back to the starting relations.

 Deletion Anomalies - Deletion of a tuple may lose information

 Modification Anomalies - Changing an attribute of a tuple may

Consider these two types of insertion anomalies:

Consider this deletion anomaly:

Consider these modification anomalies:

1)Normalization is a bottom up process.

A functional dependency (abbreviated FD) is a statement

eno, pno → hours

 List the functional dependencies of the attributes in this relation.

 Given an employee name there may be multiple values for eno if

Examples: eno → eno

Example: eno, pno → hours

eno → eno, ename, bdate, title, supereno, dno

Alternate definition of keys:

eno → eno, ename, bdate, title, salary, supereno, dno

Transitive Rule: If A → B and B → C then A → C

1)Functional dependencies are not directional.

The basic idea is that given any set of attributes X, we can

After the algorithm completes you have a set of attributes X+ that

Similarly we can compute {C}+ and {E,G}+:

A minimal set of FDs is like a canonical form of FDs.

eno → ename (full FD)

There are several normal forms defined:

1NF is a standard assumption in relational DBMSs.

A relation that is not in 1NF is an unnormalized relation.

Two ways to convert a non-1NF relation to a 1NF relation:

Also need original Repeating group:

eno ename eno pno resp hours

eno ename pno resp hours

If there is a FD X → Y that violates 2NF:

fd1 and fd4 are partial functional dependencies. Normalize to:

WorksOn relation: Proj relation:

That is, for all functional dependencies X →Y of R, one of the

fd2 results in a transitive dependency eno → salary. Remove it.

List the keys for R.

Decompose R into 2NF and then 3NF relations.

To test if a relation is in BCNF, we take the determinant of each

The difference between 3NF and BCNF is that 3NF allows a

pno hours resp

Note that we lose the FD eno,pno → resp, hours.

The decision to use 3NF or BCNF depends on the amount of

Note that we can always preserve the lossless-join property

In contrast, we get both recovery and dependency preservation

street city zipcode

zipcode → city is a fd2

Join tuples with equal zipcodes:

Note that the decomposition did not allow us to enforce the

 courseNum, secNum, semester, year → daysHours, roomNum,

A relation is in fourth normal form (4NF) if it is in BCNF and

A relation is in fifth normal form (5NF) if the relation has no

For example, query execution time may increase because of

In this case, reducing redundancy is not as important as the fact

You can use ER modeling to produce an initial relational

In theory, you can use normalization by itself. This would

There is always a tradeoff in normalization: Normalization

Normal forms indicate when relations satisfy certain properties.

You might also like