Database design
Part 2: Normalization
Instructor: Vu Tuyet Trinh
Objective
•Upon completion of this lesson, students will be able to:
1. Know why we need normalization in relational DB
2. Identify normal forms such as 1st NF, 2nd NF, 3rd NF
3. Know how to normalize a relational DB into 3NF
1
Outline
• Introduction
• Normal Forms
• Normalization
1. Introduction
1.1. Motivation
1.2. Full & Partial Dependency
1.3. Transitive Dependency
2
1.1. Motivation
• Designing DB: one of the most difficult tasks
• One simplest design approach is to use a big table and store all
data
• But what’s the problem with this?
• Anomalies
• Redundancies
1.1. Motivation: Insertion Anomalies
• PK: (student_id, subject_id)
• We can not insert a new subject if we do not have a student
assigned to it yet
• We can not insert a null value into PK attributes
student_id full_name dob subject_id name result
1234 David Beckham 12/21/1997 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4843 Data integration B
1234 David Beckham 12/21/1997 IT4868 Web mining C
1497 Tony Blair 03/01/1999 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4868 Web mining B
1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C
3
1.1. Motivation: Update anomalies
• An instance where the same information must be updated in
several different places
• If you update the name of subject "Databases", you need to
update in two different places (not efficient)
student_id full_name dob subject_id name result
1234 David Beckham 12/21/1997 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4843 Data integration B
1234 David Beckham 12/21/1997 IT4868 Web mining C
1497 Tony Blair 03/01/1999 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4868 Web mining B
1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C
1.1. Motivation: Deletion Anomalies
• Where deleting one piece of data inadvertently causes other
data to be lost
• If we delete student Margaret Thatcher, then we will lose
information about subject "Introduction to ICT"
student_id full_name dob subject_id name result
1234 David Beckham 12/21/1997 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4843 Data integration B
1234 David Beckham 12/21/1997 IT4868 Web mining C
1497 Tony Blair 03/01/1999 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4868 Web mining B
1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C
4
1.1. Motivation
• Normalization is the process of removing anomalies and
redundancies from DB
1.2. Full & Partial Dependency
student_id full_name dob subject_id name result
1234 David Beckham 12/21/1997 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4843 Data integration B
1234 David Beckham 12/21/1997 IT4868 Web mining C
1497 Tony Blair 03/01/1999 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4868 Web mining B
1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C
Key: {student_id, subject_id}
student_id full_name Full Key Dependency:
{student_id, subject_id} → result
subject_id result Partial Key Dependency:
student_id → full_name
10
10
5
1.3. Transitive dependency
• If A → B and B → C
• Attribute A must be the determinant of C.
• Attribute A transitively determines attribute C or
• C is transitively dependent on A
A B C
11
11
2. Normal Forms
2.1. Introduction
2.2. 1st Normal Form
2.3. 2nd Normal Form
2.4. 3rd Normal Form
12
12
6
2.1. Introduction
• Each form was designed to eliminate one or more of the anomalies: First
NF; Second NF; Third NF
• Unnormalized Form (UNF)
• A table that contains one or more repeating groups. I.e., its cell may
contain multiple values
Multi Value
Or repeating groups
student_id full_name dob subject_id name result
1234 David Beckham 12/21/1997 IT3090, IT4868 Databases, Web mining A, C
1238 Theresa May 08/06/1998 IT4843, IT4868 Data integration, Web mining B, B
1497 Tony Blair 03/01/1999 IT3090 Databases A
1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C
13
13
2.2. First Normal Form (1NF)
• A cell in a relation contains one and only one value.
• Disallows composite attributes, multivalued attributes or nested
relations
student_id full_name dob subject_id name result
1234 David Beckham 12/21/1997 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4843 Data integration B
1234 David Beckham 12/21/1997 IT4868 Web mining C
1497 Tony Blair 03/01/1999 IT3090 Databases A
1238 Theresa May 08/06/1998 IT4868 Web mining B
1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C
14
14
7
2.3. Second Normal Form (2NF)
• Based on the concept of full functional dependency
• A prime attribute
• It is an attribute that is member of some candidate key
• 2NF relation is
• in 1NF and every non-prime attribute is fully functionally dependent on
the primary key
Partial Dependency
student_id full_name
Full Dependency
subject_id result
15
15
2.4. Third Normal Form (3NF)
• A relation that is in 2NF
• No non-prime attribute is transitively dependent on the primary key.
• I.e, all non-prime attributes are fully & directly dependent on the PK.
16
16
8
3. Normalization
3.1. Properties of relational decompositions
3.2. An algorithm decomposes a universal relation into 3NF
3.3. Some examples
17
17
3.1. Properties of relational decompositions
• A single universal relation schema R = {A1, A2, ..., An} that includes all
the attributes of the DB
• F is a set of FDs holds on R
• Using the FDs, the algorithms decompose the universal relation
schema R into a set of relation schemas D = {R1, R2, ..., Rm}; D is
called a decomposition of R
18
18
9
3.1. Properties of relational decompositions
• Attribute preservation
• Each attribute in R will appear in at least one relation
schema Ri in the decomposition so that no attributes are lost
• Dependency preservation
• Each FD X→Y specified in F either appeared directly in one of the R i in
the decomposition D or could be inferred from the dependencies that
appear in some Ri.
• Lossless join
• r = R1(r) ⋈ R2(r) ⋈ … ⋈ Rm (r)
19
19
3.1. Properties of relational decompositions
• An example
• Suppose we have a relation:
Learn(student_id, full_name, dob, subject_id, name, result)
• We split it into two relations:
Student(student_id, full_name, dob)
Subject(subject_id, name)
• This decomposition does not warrant:
• Attribute preservation: Lost information about "result"
• Dependency preservation condition, for instance, (student_id, subject_id)
→ result is loss.
• Lossless join property, i.e., we can join these two relations
20
20
10
3.2. An algorithm decomposes a universal
relation into 3NF
• Input: A universal relation R and a set of FDs F on the attributes of R.
• Find a minimal cover G for F
• For each left-hand-side X of a FD that appears in G, create a relation schema
in D with attributes {X ∪ {A1} ∪ {A2} ... ∪ {Ak} }, where X → A1, X → A2, ..., X →
Ak are the only dependencies in G with X as the left-hand-side (X is
the key of this relation);
• Place any remaining attributes (that have not been placed in any relation)
in a single relation schema to ensure the attribute preservation property.
21
21
3.2. An algorithm decomposes a universal
relation into 3NF
• If none of the relation schemas in D contains a key of R, then create one
more relation schema in D that contains attributes that form a key of R.
• Eliminate redundant relations from the resulting set of relations in the
relational database schema.
• A relation R is considered redundant if R is a projection of another
relation S in the schema; alternately, R is subsumed by S
22
22
11
3.3. Some examples
• Example 1:
• Given R = {A, B, C, D, E, F, G}, F = {A→B; ABCD→E; EF→G;
ACDF→EG}
• A minimal cover of F is G = {A→B, ACD→E, EF→G}
• Find a minimal key: K = ACDF
• We have R1(AB), R2(ACDE), R3(EFG)
• Since K is not a subset of Ri, we have a new relation R4(ACDF)
• In conclusion, we have a decomposition D = {R1, R2, R3, R4}
23
23
3.3. Some examples
• Example 2:
• Given R(student_id, name, birthday, advisor, department, semester, course, grade)
• F = { student_id → (name, birthday); advisor → department; (student_id, semester,
course) → (grade, advisor, department)}
• We denote like this: student_id (A), name (B), birthday (C), advisor (D), department
(E), semester (F), course (G), grade (H)
• F is rewritten as {A →BC; D →E; AFG →HDE}
• A minimal cover of F is G = {A→B; A →C; D →E; AFG →DH}
• Find a minimal key: K = AFG
• We have R1(ABC), R2(DE), R3(AFGDH)
• Since K is a subset of R3, we have a decomposition D = {R1, R2, R3} or
{R1(student_id, name, birthday), R2 (advisor, department), R3 (student_id, semester,
course, advisor, grade)}
24
24
12
Remark
• Motivation of normalization
• Full & Partial Dependency
• Transitive dependency
• 1NF, 2 NF, 3 NF
• Properties of relational decompositions
• An algorithm decomposes a universal relation into 3NF
25
25
Summary
1. Introduction
• Normalization is the process of removing anomalies and redundancies
from DB
• Full & Partial Dependency
• Transitive dependency
2. Normal Forms
• 1NF, 2NF, 3NF
3. Normalization
• Properties of relational decompositions
• An algorithm decomposes a universal relation into 3NF
• Some examples
26
26
13