Normalization
3 Normal Forms Based on Primary Keys
■ 3.1 Normalization of Relations
■ 3.2 Practical Use of Normal Forms
■ 3.3 Definitions of Keys and Attributes Participating
in Keys
■ 3.4 First Normal Form
■ 3.5 Second Normal Form
■ 3.6 Third Normal Form
3.1 Normalization of Relations (1)
■ Normalization:
■ The process of decomposing unsatisfactory "bad"
relations by breaking up their attributes into
smaller relations
■ Normal form:
■ Condition using keys and FDs of a relation to
certify whether a relation schema is in a particular
normal form
Normalization of Relations (2)
■ 2NF, 3NF, BCNF
■ based on keys and FDs of a relation schema
■ 4NF
■ based on keys, multi-valued dependencies :
MVDs; 5NF based on keys, join dependencies :
JDs
■ Additional properties may be needed to ensure
a good relational design
3.2 Practical Use of Normal Forms
■ Normalization is carried out in practice so that the
resulting designs are of high quality and meet the
desirable properties
■ The practical utility of these normal forms becomes
questionable when the constraints on which they are
based are hard to understand or to detect
■ The database designers need not normalize to the highest
possible normal form
■ (usually up to 3NF, BCNF or 4NF)
■ Denormalization:
■ The process of storing the join of higher normal form
relations as a base relation—which is in a lower normal form
3.3 Definitions of Keys and Attributes
Participating in Keys (1)
■ A key of a relation schema R = {A1, A2, ...., An}
is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
Eg: {SSN},{SSN,ENAME}, {SSN,ENAME,SEX}
■ A key K is a superkey with the additional
property that removal of any attribute from K will
cause K not to be a superkey any more.
Eg:{SSN,ENAME}, {SSN,ENAME,SEX}
Definitions of Keys and Attributes
Participating in Keys (2)
■ If a relation schema has more than one key, each is
called a candidate key.
■One of the candidate keys is arbitrarily designated to
be the primary key, and the others are called
secondary keys.
Eg: PRIMARY KEY = {SSN}, Secondary Key
={ESSN},
■ A Prime attribute must be a member of some
candidate key
Eg: SSN, PNO of Works-on {SSN,PNO}
■ A Nonprime attribute is not a prime attribute—that
is, it is not a member of any candidate key.
Problems due to redundancy
Insertion anomaly
Deletion Anamoly
Updation Anamoly
While updating the table, if one
Of the row missed to update,
Data inconsistency can occur.
First Normal Form
■ All the attributes must have atomic values
■ Disallows
■ composite attributes
■ multivalued attributes
■ nested relations; attributes whose values for an
individual tuple are non-atomic
Example
• In the student table, subject column is
multivalued attribute.
• Student table is not in 1NF
Normalize into 1NF
Figure 10.8 Normalization into 1NF
Second Normal Form (1)
■ Uses the concepts of FDs, primary key
■ Definitions
■ Prime attribute: An attribute that is member of the
candidate key K
■ Full functional dependency: a FD X -> Y where removal
of any attribute from X means the FD does not hold any
more
■ Examples:
■ {SSN, PNUMBER} -> HOURS is a full FD since neither SSN
-> HOURS nor PNUMBER -> HOURS hold
■ {SSN, PNUMBER} -> ENAME is not a full FD (it is called a
partial dependency ) since SSN -> ENAME also holds
Second Normal Form (2)
■ A relation should be in 1NF.
■ A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on the primary key
■ R can be decomposed into 2NF relations via the
process of 2NF normalization .
2NF
Third Normal Form (1)
■ Definition:
■ Transitive functional dependency: a FD X -> Z
that can be derived from two FDs X -> Y and Y ->
Z
■ Examples:
■ SSN -> DMGRSSN is a transitive FD
■ Since SSN -> DNUMBER and DNUMBER ->
DMGRSSN hold
■ SSN -> ENAME is non-transitive
■ Since there is no set of attributes X where SSN -> X
and X -> ENAME
Third Normal Form (2)
■ A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively
dependent on the primary key
■ R can be decomposed into 3NF relations via the process
of 3NF normalization
■ NOTE:
■ In X -> Y and Y -> Z, with X as the primary key, we consider
this a problem only if Y is not a part of a candidate key.
■ When Y is a candidate key, there is no problem with the
transitive dependency .
■ E.g., Consider EMP (SSN, Emp#, Salary ).
■ Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Example 1
• Here,
• CK/PK :{rollno}
• FD: Rollno State
State City
• state is a non-prime attribute, which is trivially
dependent.
• Given relation is not in 3NF.
• Decompose relation as:
• Stud(Rollno,City), City(city,state)
Example 2
• Given relation is not in 3NF
• Solution:
• Split the relation into two relations, such as-
Normal Forms Defined Informally
■ 1st normal form
■ All attributes depend on the key
■ 2nd normal form
■ All attributes depend on the whole key
■ 3rd normal form
■ All attributes depend on nothing but the key
General Normal Form Definitions (For
Multiple Keys) (1)
■ The above definitions consider the primary key
only
■ The following more general definitions take into
account relations with multiple candidate keys
■ A relation schema R is in second normal form
(2NF) if every non-prime attribute A in R is fully
functionally dependent on every key of R
General Normal Form Definitions
Third normal form (3NF)
■ Definition:
■ Superkey of relation schema R - a set of attributes S of
R that contains a key of R
■ A relation schema R is in third normal form (3NF) if
whenever a FD X -> A holds in R, then either:
■ (a) X is a key of R, or
■ (b) A is a prime attribute of R
BCNF (Boyce-Codd Normal Form)
■ A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever an FD X -> A holds in R, then X is a
superkey of R
■ In simple terms, for any case (say, X->Y), X can't
be a non-prime attribute.
■ Each normal form is strictly stronger than the previous
one
■ Every 2NF relation is in 1NF
■ Every 3NF relation is in 2NF
■ Every BCNF relation is in 3NF
■ There exist relations that are in 3NF but not in BCNF
■ The goal is to have each relation in BCNF (or 3NF)
Example
• 1NF
• 3NF
• BCNF
• We can decompose the table
Figure 10.12 Boyce-Codd normal form
Figure 10.13 a relation TEACH that is in
3NF but not in BCNF
Achieving the BCNF by Decomposition (1)
■ Two FDs exist in the relation TEACH:
■ fd1: { student, course} -> instructor
■ fd2: instructor -> course
■ {student, course} is a candidate key for this relation and
that the dependencies shown follow the pattern in Figure
10.12 (b).
■ So this relation is in 3NF but not in BCNF
■ A relation NOT in BCNF should be decomposed so as to
meet this property, while possibly forgoing the
preservation of all functional dependencies in the
decomposed relations.
Achieving the BCNF by Decomposition (2)
■ Three possible decompositions for relation TEACH
■ {student, instructor} and {student, course}
■ {course, instructor } and {course, student}
■ {instructor, course } and {instructor, student}
■ All three decompositions will lose fd1.
■ We have to settle for sacrificing the functional dependency
preservation. But we cannot sacrifice the non-additivity property
after decomposition.
■ Out of the above three, only the 3rd decomposition will not generate
spurious tuples after join.(and hence has the non-additivity property).
■ A test to determine whether a binary decomposition (decomposition
into two relations) is non-additive (lossless) is discussed in section
11.1.4 under Property LJ1. Verify that the third decomposition above
meets the property.
Fourth Normal Form (4NF)
• MULTIVALUED DEPENDANCY
• A B, is multivalued dependency if
3 conditions for Multivalued Dependency
1. A B ,for a single value of A, more than one value of B
exists.
2. Tables should have at least 3 columns.
(if table has only 2 columns, we can use 1NF to resolve it).
3. For this table with A,B,C columns, B and C should be
independent.
If ALL THE 3 CONDITIONS ARE TRUE, THEN WE CAN SAY THAT
THE TABLE MAY HAVE MULTI-VALUED DEPENDENCY.
EXAMPLE 1
• DECOMPOSITION of the table:
Example 2
Decomposition
• The process of decomposition in DBMS helps
us remove
– redundancy,
– Inconsistencies,
– anomalies from a database when we divide the
table into numerous tables.
• In simpler words, the process of
decomposition refers to dividing a relation X
into {X1, X2,……Xn}.
Types of Decomposition
• Lossless Join Decomposition (Non-additive):
• Consider there is a relation R which is decomposed into
sub relations R1 , R2 , …. , Rn.
• This decomposition is called lossless join decomposition
when the join of the sub relations results in the same
relation R that was decomposed.
• For lossless join decomposition, we always have
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R, where ⋈ is a natural join
operator
• Lossy Join Decomposition:
• Consider there is a relation R which is decomposed into sub
relations R1 , R2 , …. , Rn.
• This decomposition is called lossy join decomposition when
the join of the sub relations does not result in the same
relation R that was decomposed.
• The natural join of the sub relations is always found to have
some extraneous tuples.
• For lossy join decomposition, we always have-
• R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R, where ⋈ is a natural join operator
Determining Whether Decomposition Is Lossless Or Lossy
• Consider a relation R is decomposed into two sub
relations R1 and R2. Then,
• If all the following conditions satisfy, then the
decomposition is lossless.
• If any of these conditions fail, then the decomposition
is lossy.
• Condition-01
• Union of both the sub relations must contain all the
attributes that are present in the original relation R.
• Thus, R1 ∪ R2 = R
• Condition-02
• Intersection of both the sub relations must not be
null.
• In other words, there must be some common
attribute which is present in both the sub
relations.
• Thus, R1 ∩ R2 ≠ ∅
• Condition-03
• Intersection of both the sub relations must be a
super key of either R1 or R2 or both.
• Thus, R1 ∩ R2 = Super key of R1 or R2
Properties of Decomposition
• Lossless:
• All the decomposition that we perform in Database management system should be
lossless.
• All the information should not be lost while performing the join on the sub-relation
to get back the original relation. It helps to remove the redundant data from the
database.
• Dependency Preservation:
• Dependency Preservation is an important technique in database management
system.
• It ensures that the functional dependencies between the entities is maintained while
performing decomposition.
• It helps to improve the database efficiency, maintain consistency and integrity.
• Lack of Data Redundancy:
• Data Redundancy is generally termed as duplicate data or repeated data.
• This property states that the decomposition performed should not suffer redundant
data.
• It will help us to get rid of unwanted data and focus only on the useful data or
information.