Unit IV: Unit IV Database Normalization
Informal Design Guidelines for Relational Schemas
Functional Dependencies
Normal Forms Based on Primary Keys
First, Second and Third Normal Forms
Boyce-Codd Normal Form
Multivalued Dependency and Fourth Normal Form
Properties of Relational Decomposition
Informal guidelines for relation schemas
Semantics of the attributes
Reducing the redundant values in tuples
Reducing null values in tuples
Disallowing the possibility of generating spurious
tuples
The main goal of relational database is to create / find
good collection of relational schemas. Such that
database should allow us to store data / information
without unnecessary redundancy and should also allow
to retrieve information easily. That is the goal of
relational database design should concentrated
to avoid redundant data from database.
to ensure that relationships among attributes are represented.
to facilitate the checking of updates for the violation of
database integrity constraints.
A bad relational database design may lead to:
Repetition of information.
That is, it leads data redundancy in database, so obviously it requires
much space.
Inability to represent certain information.
Example 1: Consider the relational schema
Branch_loan = ( beranch_name, branch_city, assets, customer_name,
loan_no, amount)
branch_name branch city assets customer name loan no amount
kathmandu baneshwor 25000 rohan L - 15 3000
Lalitpur patan 19000 mohan L - 17 5000
baneshwor 25000 raju L - 19 10000
Pokhara Prithibi nagar 17000 manoj L - 10 7000
Lalitpur patan 19000 swikar L - 30 9000
Redundancy:
Data for branch_name, branch_city, assets are repeated for each loan that provides by
bank.
Storing information several times leads waste of storage space / time.
Data redundancy leads problem in Insertion, deletion and update.
Anomalies
Insertion problem
We cannot store information about a branch without loan information, we
can use null values for loan information, but they are difficult to handle.
Deletion problem
In this example, if we delete the information of Manoj (i.e. DELETE FROM
branch_loan WHERE customer_name = ‘Manoj’;), we cannot obtain the
information of pokhara branch (i.e. we don’t know branch city of
pokhara, total asset of pokhara branch etc.
Update problem
Since data are duplicated so multiple copies of same fact need to update
while updating one. It increases the possibility of data inconsistency.
When we made update in one copy there is possibility that only some of
the multiple copies are update but not all, which lead data/database in
inconsistent state.
Normalization
Is the process of minimizing redundancy from a relation or
set of relations
Redundancy in relation may cause insertion, deletion and
update anomalies.
Reducing redundancy is done for mainly two purposes
Reducing the amount of storage needed to store data.
Avoiding unnecessary data conflicts that may creep in because
of multiple copies of this data getting stored.
Normal forms are used to eliminate or reduce redundancy
in a database tables.
It is a database design technique which organizes tables in
manner that reduces redundancy and dependency of data.
It divides larger tables to smaller tables and links them
using relationships.
Advantages of Normalization
More efficient data structure
Avoid redundant fields or columns
More flexible data structure i.e. we should be able to
add new rows and data values easily
Better understanding of data
Ensures that distinct tables exit when necessary
Minimizes data duplication
Close modeling of real world entities, processes and
their relationships
Better performance is ensured as databases become
lesser in size, the passes through the data becomes
faster and shorter thereby improving response time
and speed
Disadvantages of Normalization
More tables to join as by spreading out data into more tables,
the need to joins tables increases and the task become more
tedious.
Tables will contain codes rather than real data as the repeated
data will be stored as lines of codes rather than the true data.
Data model becomes extremely difficult to query against as
the data model is optimized for application
As the normal form type progresses , the performance
become slower
It is very time consuming and difficult process in normalizing
relations to higher degree
Careless decomposition may leads to bad design of data base
which may leads to serious problems
Proper knowledge is required on the various normal forms to
execute the normalization process efficiently.
Functional Dependencies
Functional dependencies are constraints on the set of legal
relations. It defines attributes of relation, how they are related to
each other.
It determines unique value for a certain set of attributes to the
value for another set of attributes That is functional dependency is
a generalization of the notation of key.
Functional dependencies are interrelationship among attributes of a
relation.
Definition:
For a given relation R with attribute X and Y, Y is said to be
functionally dependent on X, if given value for each X uniquely
determines the value of the attribute in Y. X is called determinant of
the functional dependency (FD) and functional dependency denoted
by X→ Y.
Example 1: consider a relation supplier
Supplier(supplier_id#,sname,status,city)
Here, sname, status and city are functionally dependent on
Supplier.supplier_id→supplier.sname
Supplier.supplier_id→supplier.status
Supplier.supplier_id→supplier.city
Or simply,
supplier_id→ sname
supplier_id→ status
supplier_id→city
Example 2: Consider a relation student-info
Student-info(name#,course#,phone_no,major,prof,grade)
That is, {name,course} is composite primary key
This relation has the following functional dependencies
{name→phone_no, name→major, name,course→grage,
course→prof}
Functional dependency X→Y satisfied on the relation R/
hold on R
FD X→Y is satisfied on relation R if the cardinality of
∏Y( x=x(r)) is at most one. That is if, two tuples ti and tj of
R have the same X value then the corresponding value
of Y must identical.
Types of Functional Dependency
Trival and Non Trival dependencies
Fully functional dependency
Partial functional dependency
Transitive dependency
Multi valued dependency
Trival and Non Trival Functional
Dependency
The dependency of an attribute on a set of
attributes is known as trival functional
dependency if the set of attributes includes
that attribute.
AB is trival functional dependency if B is a
subset of A.
AA and BB is also trival
{Student_id , Student_name } student_id
If a functional dependency XY holds true
where Y is not a subset of X then this
dependency is called non trival functional
dependency.
Emp_id emp_name
Fully and Partial
An attribute is fully functional dependent on a
second attribute if and only if it is functionally
dependent on the second attribute but not on
any subset of the second attribute.
an attribute is partial functionally dependent
on a second attribute if and only if it is
functionally dependent on the second
attribute and also dependency occur on any
subset of the second attribute.
{emp_id, project_no} emp_name is partial
FD as emp_id emp_name
Transitive Dependency
A FD is said to be transitive if it is indirectly
formed by two functional dependencies. For
example X Z is transitive dependency if the
following three FDs hold true.
XY
YZ
X Z
Multivalued
Multivalued dependency occurs when there
are more than one independent multivalued
attribues in a table
Example Database: Supplier/Parts
S# SNAME STATUS CITY S# P# QTY
S SP
S1 Smith 20 London S1 P1 300
S2 Jones 10 Paris S1 P2 200
S3 Blake 30 Paris S1 P3 400
S4 Clark 20 London S1 P4 200
S5 Adams 30 Athens S1 P5 100
S1 P6 100
P
P# PNAME COLOUR WEIGHT CITY S2 P1 300
P1 Nut Red 12 London S2 P2 400
P2 Bolt Green 17 Paris S3 P2 200
P3 Screw Blue 17 Rome S4 P2 200
P4 Screw Red 14 London S4 P4 300
P5 Cam Blue 12 Paris S4 P5 400
P6 Cog Red 19 London
18
First Normal Form (1NF)
A relation is in 1 NF if all the underlying
domains contain only atomic values
For any relational system this is always
true of the relations
It is true for SQL, since only one value
allowed in any field
Normalised sometimes means 1NF but
more often a higher form of normalisation
e.g. 3NF or higher
19
Normal Forms
There are lots of Normal Forms and a
higher one implies all lower ones as well
The process is reversible
i.e. going to a higher normal form never loses
information
Main ones in increasing level are:
1NF, 2NF, 3NF, BCNF, 4NF, 5NF
A database should be in 3NF or higher
20
Second Normal Form (2NF)
Suppose in our database there was only one
table called FIRST instead of S and SP
e.g. FIRST (S#, SNAME, STATUS, CITY, P#, QTY)
Primary Key is now (S#,P#) and there are
FD’s other than from a primary key
In addition, add the FD
CITY STATUS
i.e. the status is determined by the location
21
Second Normal Form (cont)
S# STATUS
QTY
P#
CITY
An illustration of the problems this causes is that
we cannot insert a CITY for a Supplier unless they
supply at least one part
There are also problems with deletes and updates
22
Second Normal Form(cont)
Definition of 2NF
A relation is in 2NF iff it is in 1NF and every
nonkey attribute is irreducibly dependent
on the primary key
A nonkey attribute is a key that does not
participate in the primary key
This assumes there is only one candidate
key, which we can then assume is the
primary key
23
Second Normal Form(cont)
A relation in 1NF can always be transformed into a
collection of 2NF relations by means of projections
In general, a relation R{A, B, C,D} with primary
key {A,B} having a FD A D
can always be decomposed into the projections
R1{A,D} with primary key A
and R2{A,B,C} with primary key {A,B} and
foreign key {A} REFERENCES
R1
Thus FIRST(S#.STATUS,CITY,P#,QTY) can be
decomposed into:
SECOND(S#, STATUS, CITY) and SP(S#,P#,QTY)
24
Second Normal Form (cont)
S# S# STATUS
QTY
P#
CITY
This shows the 2NF tables
diagrammatically
The transformation from 1NF to 2NF is an
example of non-loss decomposition
25
Third Normal Form (3NF)
However 2NF still causes problems, in that
although the dependency of STATUS on S# is
functional and irreducible, it is also transitive
S# CITY STATUS
which has the logical consequence that
S# STATUS also holds
In general, if A B C then A C
This causes problems with insert, delete and
update
In fact, we want all the nonkey attributes in a
relation to be mutually independent, so that each
can be updated independently of the rest
26
Third Normal Form(cont)
Definition of 3NF:
A relation is in 3NF iff it is in 2NF and every nonkey
attribute is nontransitively dependent on the
primary key
This implies that there are no mutual
dependencies between the nonkey attributes
A relation that is in 2NF can always be
transformed by means of projections into a
collection of 3NF relations
27
Third Normal Form(cont)
Thus given a relation R(A,B,C) with primary key
{A} and assuming the FD B C, we normalise
taking the projections
R1{B,C} with primary key {B}
and R2{A,B} with primary key {A} and foreign
key {B} references R1
e.g. SECOND{S#, STATUS,CITY} becomes
SC{S#,CITY} and CS{CITY,STATUS}
N.B. The relation SP was already in 3NF
28
Boyce/Codd Normal Form(BCNF)
The definition of 3NF assumed in a
relation that there was only one
candidate key
BCNF is a slightly stronger normal form
that addresses this (named from its
authors)
It deals with the case where a relation
has:
1. Two or more candidate keys such that
2. The candidate keys were composite, and
3. They overlapped (had at least one attribute in
common)
29
Boyce/Codd Normal Form(cont)
Formal Definition: A relation is in BCNF iff every
nontrivial left-irreducible FD has a candidate key
as its determinant (i.e. LHS)
Informal Definition: A relation is in BCNF iff the
only determinants are candidate keys
This means that in a FD diagram, the only
arrows are out of candidate keys
With no reference to 3NF, this is stronger and
implies 3NF
It is conceptually simpler than 3NF
Relations FIRST and SECOND are not in BCNF
Relations SP , SC and CS are
30
Multi-valued Dependency
(MVD)
Consider a relation representing courses, teachers
and texts
There may be several teachers on one course and
several texts, but the set of texts are independent
of the set of teachers
Could represent by a row for every combination of
teacher and text but this contains redundancy
which could lead to update anomalies, i.e.
CTX(COURSE,TEACHER,TEXT)
This is an example of two multi-valued
dependencies (MVDs)
COURSE TEACHER and COURSE TEXT
31
Multi-valued Dependency
(cont)
Definition: Let R be a relation, and let A,B, and C
be subsets of the attributes of R, then
B is multi-dependent on A
AB
iff in every possible legal value of R, the set of B
values matching a given (A value, C value) pair
depends only on the A value (i.e. is independent
of the C value)
Every FD is a MVD (but not the other way round)
MVDs go in pairs: if MVD AB holds then so
does AC often written as A B | C
e.g. COURSETEACHER and COURSETEXT
32
Fourth Normal Form (4NF)
Fagin’s Theorem
Let R{A,B,C} be a relation, where A,B, and C are
sets of attributes. Then R is equal to the join of
its projections on {A,B} and {A,C} iff R satisfies
the MVD A B | C
Definition of 4NF:
A relation R is in 4NF iff whenever there exists
subsets A and B of the attributes of R such that
the nontrivial MVD AB is satisfied, then all
the attributes of R are also functionally
dependent on A
An MVD AB is trivial if either A is a superset of
B or the union of A and B is the entire heading
33 Database [ITC218]:
Fourth Normal Form(cont)
Thus instead of having the relation
CTX(COURSE,TEACHER,TEXT)
which is in BCNF, it is decomposed into two
projections
CT(COURSE,TEACHER)
and CX (COURSE,TEXT)
which are now in 4NF
Notes:
1. 4NF implies BCNF
2. 4NF is always achievable but not always desirable, due
to other constraints
34 Database [ITC218]:
Decomposition
Decompose the relation schema Lending-schema
into:
Branch-schema = (branch-name, branch-city,assets)
Loan-info-schema = (customer-name, loan-number,
branch-name,
amount)
All attributes of an original schema (R) must
appear in the decomposition (R1, R2):
R = R1 R2
Lossless-join decomposition.
For all possible relations r on schema R
r = R1 (r) R2 (r)
Example of Non Lossless-Join Decomposition
Decomposition of R = (A, B)
R1 = (A) R2 = (B)
A B A B
1 1
2 2
1
A(r) B(r)
r
A B
A (r) B (r)
1
2
1
2