Database Normalization
To reduce redundancy and
improve integrity of databases
Reminder
1. Use this form to confirm your presence at a lecture
2. Test’s results
3. Review of the previous lecture (Relational Algebra)
4. Laboratory work #2
2
Database design approaches
Top-down design Bottom-up design
Entity-Relationship modelling: Normalization:
- Identifying entities, attributes and - Structuring relations to reduce
relationships redundancy
- Mapping diagrams to relations - Using formal rules of decomposition
(tables) (splitting relations into two)
In practice, both approaches are applied
3
Normalization definition
The process of structuring and removing redundant
data of relations in order to improve storage efficiency
and data integrity.
Normalization divides larger tables into smaller tables
and links them using relationships.
It’s applied only for one relation.
4
The reason to use normalization
1. Minimizing data redundancy
Book
Name Author name Author email
Book1 Author1 author1@email.com
Book2 Author1 author1@email.com
There are two book which are written by the same Author1. The author’s email
address is in the relation twice. It’s redundant.
5
The reason to use normalization
2. Eliminating modification anomalies:
- Update anomaly
- Delete anomaly
- Insert anomaly
6
Update anomaly
Book
Name Author name Author email
Book1 Author1 author1@email.com
Book2 Author1 author1@email.com
What if Author1’s email is going to be changed?
Every record of the Book relation must be updated. It is extra work to be done.
7
Delete anomaly
Book
Name Author name Author email
Book1 Author1 author1@email.com
Book2 Author1 author1@email.com
What if Book1 and Book2 are going to be deleted?
Having deleted both records authors information will be lost. It’s a harmful
side-effect.
8
Insert anomaly
Book
Name Author name Author email
Book1 Author1 author1@email.com
Book2 Author1 author1@email.com
Book3 Author1 author1@email.com
What if Book3 of Author1 is going to be inserted?
Having done that 3rd record of Author1’s email will be inserted. It’s unnecessary
(redundant) information.
9
Normal forms
Codd introduced the concept of normalization and what is now known as the first normal form (1NF) in 1970.
Codd went on to define the second normal form (2NF) and third normal form (3NF) in 1971.
Codd and Raymond F. Boyce defined the Boyce-Codd normal form (BCNF) in 1974.
A relational database relation is often described as "normalized" if it meets third normal form.
Most 3NF relations are free of insertion, update, and deletion anomalies.
10
Normal forms
Normalization is a database design technique, which is used to design a relational database table up to higher
normal form (NF).
The process is progressive, and a higher level of database normalization cannot be achieved unless the
previous levels have been satisfied.
1 NF 2 NF 3 NF 3 BCNF 4 NF 5 NF 6 NF
Used in practice
11
Normal forms - Functional dependency
If one set of attributes in a relation (A) determines another set of attributes in the relation, then the second set of
attributes (B) is said to be functionally dependent (FD) on the first set of attributes.
We write it: A→B
where A is determinant
“If you know A you can always say value of B”
Examples:
Student ID → Student Name
City→Country
Username→Profile
Profile→Username
Email→First Name, Last Name, Age
12
Normal forms - Functional dependency
The following is a complete system of rules for functional dependencies:
F1 (Reflexivity): If Y ⊆ X, then X→Y (parts depend on the whole)
If Author, Book → Pages then Author, Book → Book
F2 (Augmentation): If Z ⊆ W and X→Y, then XW→YZ
Pages ⊆ (Tables, Pages) and Book Name → Pictures
then (Book Name, Tables, Pages) → (Pictures, Pages)
F3 (Transitivity): If X→Y and Y→Z, then X→Z.
Other useful rules can be derived from these.
F4 (Pseudotransitivity): If X → Y and YW → Z, then XW→Z.
F5 (Decomposition): If X → Y, then X → A for every A in Y (right part is just one attribute)
If Author, Book → Tables, Pages, Pictures then
Author, Book → Tables Author, Book → Pages Author, Book → Pictures 13
Normal forms - Functional dependency
Because of F5, we assume, that all the FDs have only one attribute at their right side.
The FD X→Y is called non-trivial when Y is not an element of X. Note that a trivial FD is obeyed by all
instances.
A field Y is "functionally dependent" on a field (or fields) X if it is invalid to have two records with the same
X-value but different Y-values.
That is, a given X-value must always occur with the same Y-value.
It is like 1:1 but only from left to right (sometimes the opposite is true)
When X is a key, then all fields are by definition functionally dependent on X in a trivial way, since there
can't be two records having the same X value.
14
Normal forms - Functional dependency
Examples:
Let’s look at a relation Book:
Book (ID, Name, ISBN, Author Name, Author Email, Pages),
Keys:
ID is the primary key(PK), ISBN is the candidate key(CK) (equals to ID), ID+Name is a super key (not minimal)
Functional dependencies:
ID→Name, ISBN, Author Name, Author Email, Pages (By definition of FD and PK + CK)
ISBN→Name, ID, Author Name, Author Email, Pages (By definition of FD and PK + CK)
ISBN→Name
Author Name→Author Email
Author Email→Author Name (it is not always true that A→B and B→A !)
ID→Author Name, Author Email and because of F5 the next two:
ID→Author Name
ID→Author Email
... and others.. 15
Normal forms - Functional dependency
Exercise:
Student Department Subject Mark Date
S1 FPM Math 5 04.02.2020
S1 FPM History 4 02.03.2018
S2 FPM Math 4 01.04.2016
What are the functional dependencies?
16
Normal forms - Functional dependency
Exercise:
Student Department Subject Mark Date
S1 FPM Math 5 04.02.2020
S1 FPM History 4 02.03.2018
S2 FPM Math 4 01.04.2016
What are the functional dependencies?
Student, Subject, Date→Department, Mark (by definition of primary key)
Student→Department
Student, Subject, Date→Department (F5 rule)
Student, Subject, Date→Mark (F5 rule)
Department→Student (why?)
17
1 Normal form (1NF)
To satisfy 1NF, the values in each column of a table must be atomic:
● Each relation “cell” should contain a single value.
● Each record needs to be unique (it’s a property of relation).
Person Friend Person Friend
Ann Bill, Alice Ann Bill
Alex John, Mary Ann Alice
Ann Bill, Alice Alex John
Alex Mary
18
2 Normal form (2NF)
A relation is said to be in 2NF if both the following conditions hold:
● Table is in 1NF (First normal form)
● Every non-key attribute must functionally depend on the whole key, not a part of it
This 2NF only is an issue for relations with a composite primary key. Composite primary key is a primary key
consisting of two or more attributes.
If a relation has primary key consisting of just only one attribute then a relation satisfies 2NF automatically.
19
2 Normal form (2NF). Example
Parts in warehouses
Parts Warehouse Quantity Warehouse City
P1 W1 10 Kyiv
P2 W1 7 Kyiv
P1 W2 44 Lviv
Primary key:
(Parts, Warehouse)
Functional dependencies:
Parts, Warehouse→Quantity, Warehouse City (OK by Primary Key definition)
Parts, Warehouse→Quantity (OK)
Parts, Warehouse→Warehouse City (OK)
Warehouse→Warehouse City (Fails 2NF because Warehouse is just a part of the Primary Key)
Warehouse City→Warehouse (OK) 20
2 Normal form (2NF). Example
Parts in warehouses
Parts Warehouse Quantity Warehouse City
P1 W1 10 Kyiv
P2 W1 7 Kyiv
P1 W2 44 Lviv
QUANTITY is a fact about the combination of PART and WAREHOUSE,
but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone (part of primary key)
The basic problems with this design are:
● The warehouse City is repeated in every record that refers to a part stored in that warehouse.
● If the City of the warehouse changes, every record referring to a part stored in that warehouse must be updated.
● Because of the redundancy, the data might become inconsistent, with different records showing different addresses for the same
warehouse.
● If at some point in time there are no parts stored in the warehouse, there may be no record in which to keep the warehouse's
address. 21
2 Normal form (2NF). Example
Foreign key (N:1)
Parts in warehouses Warehouse
Parts Warehouse Quantity Warehouse Warehouse City
P1 W1 10 W1 Kyiv
P2 W1 7 W2 Lviv
P1 W2 44
Functional dependencies: Functional dependencies:
Parts, Warehouse→Quantity Warehouse→Warehouse City
(OK by Primary Key definition) (OK by Primary Key definition)
22
2 Normal form (2NF). Exercise
TV programmes
TV Channel TV Channel Genre Program Program Duration
TV1 Sport Football Match 123
TV1 Sport Skiing 75
TV2 Movie Terminator 6 142
The task:
1) What are functional dependencies?
2) Does the relation satisfy 2NF? Why?
23
2 Normal form (2NF). Exercise
TV programmes
TV Channel TV Channel Genre Program Program Duration
TV1 Sport Football Match 123
TV1 Sport Skiing 75
TV2 Movie Terminator 6 142
Functional dependencies:
TV Channel, Program→TV Channel Genre, Program Duration
Program→Program Duration (2NF violation!)
TV Channel→TV Channel Genre (2NF violation!)
24
2 Normal form (2NF). Exercise
TV Channel/Program
Channel
Functional dependencies:
TV Channel TV Channel TV Channel Program
Only trivial (why?)
Genre
TV1 Football Match
TV1 Sport
TV1 Skiing
TV2 Movie
TV2 Terminator 6
Program
Functional dependencies:
TV Channel→TV Channel Genre Program Program Duration
Football Match 123
Functional dependencies: Skiing 75
Program→Program Duration
Terminator 6 142
25
3 Normal form (3NF)
A relation R is said to be in 3NF if both the following conditions hold:
● Table is in 2NF (Second normal form)
● Every non-prime attribute of R is non-transitively dependent on every candidate key of R.
A non-prime attribute of R is an attribute that does not belong to any candidate key of R
F3 (Transitivity): If X→Y and Y→Z, then X→Z.
All non-prime attribute of R must be functionally dependent on a primary/candidate key i.e. there can be
no interdependencies between non-prime attributes.
26
3 Normal form (3NF). Example
Student University University Location
John KPI Ukraine
Ann Oxford England
Paul Stanford USA
Functional dependencies:
Student→University, University Location
Student→University
Student→University Location
University→University Location
Student→University→University Location (transitive via non-key! 3NF violation) 27
3 Normal form (3NF). Example
Student University
Student University University University Location
John KPI KPI Ukraine
Ann Oxford Oxford England
Paul Stanford Stanford USA
Functional dependencies: Functional dependencies:
Student→University University→University Location
No transitive dependencies!
28
3 Normal form (3NF). Exercise
Books
Book Name Author Author Email Book genre
B1 A1 a1@a1.com Criminal
B2 A1 a1@a1.com Romance
B3 A2 a2@a2.com Technology
Assumption:
One book is written by one author
The task:
1) What are functional dependencies?
2) Does the relation satisfy 3NF? Why?
29
3 Normal form (3NF). Exercise
Books
Book Name Author Author Email Book genre
B1 A1 a1@a1.com Criminal
B2 A1 a1@a1.com Romance
B3 A2 a2@a2.com Technology
Functional dependencies: Transitional functional dependencies (TFD):
Book Name→Author, Author Email, Book genre Book Name→Author→Author Email
Book Name→Book genre Book Name→Author Email→Author
Book Name→Author
Book Name→Author Email Both TFDs violate 3NF!
Author→Author Email
Author Email→Author
30
3 Normal form (3NF). Exercise
Books Authors
Book Name Author Email Book genre Author Author Email
B1 a1@a1.com Criminal A1 a1@a1.com
B2 a1@a1.com Romance A2 a2@a2.com
B3 a2@a2.com Technology
Is it possible to use ‘Author’ as the PK?
Functional dependencies: Functional dependencies:
Book Name→Author Email, Book genre Author Email→Author
No transitive dependencies!
31
3 Normal form (3NF). Exercise
User
ID Username Email
1 a1 a1@a1.com
2 a2 a2@a2.com
Does the table satisfies 3NF?
Super, Candidate, Primary Keys:
???
Functional dependencies:
???
32
3 Normal form (3NF). Exercise
User
ID Username Email Username is unique - candidate key!
Email is unique - candidate key!
1 a1 a1@a1.com Keys?
2 a2 a2@a2.com
Functional dependencies:
ID→Username, Email
ID→Username
ID→Email
Username→Email
Email→Username
ID→Username→Email (transitive but not non-key! No problem)
ID→Email→Username (transitive but not non-key! No problem)
33
Homework
Student Faculty Dean Student Dean Age Student
Age Photo
Change the design of the relation performing
normalization:
1. Set the primary key, find the candidate keys
2. Find functional dependencies
3. Ensure third normal form is satisfied
34