A Helpful Hand
DATABASE MANAGEMENT SYSTEMS
UNIT V
SCHEMA REFINEMENT
DEPARTMENT OF COMPUTER SCIENCE
AND ENGINEERING
CSEROCKZ
DATABASE MANAGEMENT SYSTEMS
UNIT-V
INTRODUCTION TO SCHEMA REFINEMENT
Conceptual database design gives us a set of relation schemas and integrity
constraints (ICs) that can be regarded as a good starting point for the final database
design. This initial design must be refined by taking the ICs into account and also
by considering performance criteria and typical workloads.
A major aim of relational database design is to minimie data redundancy. The
problems associated with data redundancy are illustrated as follows!
Problems caused by redudacy
"toring the same information in more than one place within a database is called
redundancy and can lead to several problems!
Reduda! S!ora"e# "ome information is stored repeatedly.
U$da!e Aomal%es# If one copy of such repeated data is updated# an
inconsistency is created unless all copies are similarly updated.
Iser!%o Aomal%es# It may not be possible to store certain information
unless some other# unrelated# information is stored as well.
Dele!%o Aomal%es# It may not be possible to delete certain information
without losing some other# unrelated# information as well.
E&# Consider a relation# $ourly%&mps(ssn, name, lot, rating, hourly_wages,
hours_worked)
The key for $ourly%&mps is ssn. In addition# suppose that the hourly_wages
attribute is determined by the rating attribute. That is# for a given rating value#
there is only one permissible hourly_wages value. This IC is an e'ample of a
functional dependency. It leads to possible redundancy in the relation
$ourly%&mps# as shown below!
ssn name lo
t
ratin
g
hourly%wages hours%worked
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
()* Adithya +, , -. +.
(*) /evesh 00 , -. 1.
(*+ Ayush
"oni
1( ( * 1.
(2* 3ajasekha
r
1( ( * 10
(c- "unil 1( , -. +.
If the same value appears in the rating column of two tuples# the IC tells us that the
same value must appear in the hourly_wages column as well. This redundancy has
the following problems!
Redundant Storage: The rating value , corresponds to the hourly%wage -.#
and this association is repeated three times.
Update Anomalies: The hourly_wage in the first tuple could be updated
without making a similar change in the second tuple.
Insertion Anomalies: 4e cannot insert a tuple for an employee unless we
know the hourly_wage for the employee5s rating value.
Deletion Anomalies: If we delete all tuples with a given rating value (e.g.#
we delete the tuples for Ayush "oni and 3ajasekhar) we lose the association
between that rating value and its hourly_wage value.
Null )alues
6ull values cannot provide a complete solution# but they can provide some help.
Consider the e'ample $ourly%&mps relation. $ere null values cannot help to
eliminate redundant storage# update or deletion anomalies. It appears that they can
address insertion anomalies. 7or instance# we can insert an employee tuple with
null values in the hourly wage field. $owever# null values cannot address all
insertion anomalies. Thus# null values do not provide a general solution to the
problems of redundancy# even though they can help in some cases.
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
Decom$os%!%os
A decom$os%!%o o* rela!%o sc+ema R consists of replacing the relation schema
by two or more relation schemas that each contain a subset of the attributes of R
and together include all attributes in R.
E&# 4e can decompose $ourly%&mps into two relations!
$ourly%&mps0(ssn, name, lot, rating, hours_worked)
4ages(rating, hourly_wages)
Problems Rela!ed !o Decom$os%!%o
Two important 8uestions must be asked during decomposition process!
-. /o we need to decompose a relation9
0. 4hat problems does a given decomposition cause9
To answer a first 8uestion# several normal forms have been proposed for relations.
If a relation schema is in one of these normal forms# we know that certain kinds of
problems cannot arise.
4ith respect second 8uestion# two properties of decomposition are to be
considered!
The lossless!oin property enables us to recover any instance of the relation of the
decomposed relation from corresponding instances of the smaller relations.
The dependencypreser"ation property enables us to enforce any constraint on the
original relation by simply enforcing some constraints on each of the smaller
relations. That is# we need not perform joins of the smaller relations to check
whether a constraint on the original relation is violated.
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
Fuc!%oal De$edec%es
A *uc!%oal de$edecy (7/) is a kind of IC that generalies the concept of a
key.
:et 3 be a relation schema and let ; and < be nonempty sets of attributes in 3. 4e
say that an instance r of 3 satisfies the 7/ ;=<
-
if the following holds for every
pair of tuples t- and t0 in r!
If t-.; > t0.;# then t-.< > t0.<.
An 7/ ;=< says that if two tuples agree on the values in attributes ;# they must
also agree on the values in attributes <.
E&# The 7/ A?=C is satisfied by the following instance!
A ? C /
a- b- c- d-
a- b- c- d0
a- b0 c0 d-
a0 b- c1 d-
$ere# if we add a tuple @a-# b-# c0# d-A to the instance shown in figure# the
resulting instance would violate the 7/.
Reaso%" abou! FDs
1
X Y is read as X functionally determines Y, or si!"# as X
determines Y.
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
Biven a set of 7/s over a relation schema 3# typically several additional 7/s hold
over 3 whenever all of the given 7/s hold.
E&# Consider 4orkers(ssn, name, lot, did, since)
4ith given 7/s ssn # did and did# lot. Then# in any legal instance of 4orkers# if
two tuples have the same ssn value# they must have the same did value# and
because they have the same did value# they must also have the same lot value.
Therefore# the 7/ ssn# lot also holds on 4orkers.
Closure o* a Se! o* FDs
The set of all 7/s implied by a given set $ of 7/s is called the closure o* F,
denoted as $%. The closure $% can be calculated by using the following
Arms!ro"-s A&%oms rules. :et ;# <# and C be the sets of attributes over a relation
schema 3!
Re*le&%.%!y# If ; is a super set of <# then ;=<.
Au"me!a!%o# If ;=<# then ;C=<C for any C.
Tras%!%.%!y# If ;=< and <=C# then ;=C.
U%o# If ;=< and ;=C# then ;=<C.
Decom$os%!%o# If ;=<C# then ;=< and ;=C.
E&/#
Consider a relation schema A?C with 7/s A=? and ?=C.
7rom transitivity# we get A=C.
7rom augmentation# we get AC=?C# A?=AC# A?=C?.
E&0#
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
Contracts(contractid, supplierid, pro!ectid, deptid, partid, &ty, "alue)
4e denote the schema for Contracts as 'S(D)*+.
The following are the given 7/s!
i, '# 'S(D)*+.
ii, ()#'.
iii, SD#).
"everal additional 7/s hold in the closure of the set of given 7/s!
7rom ()#' and '#'S(D)*+, and transitivity# we infer ()#'S(D)*+.
7rom SD#) and augmentation# we infer SD(# ().
7rom SD(#(), ()#'S(D)*+, and transitivity# we infer SD(#'S(D)*+.
Note:
In a !r%.%al FD, the right side contains only attributes that also appear on the
left side. Dsing refle-i"ity, we can generate all trivial dependencies# which
are of the form!
;=<# where < is a subset of ;# ; is a subset of A?C# and < is a subset of
A?C.
7rom augmentation# we get the nontrivial dependencies.
A!!r%bu!e Closure
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
If we want to check whether a given dependency# say# ;=<# is in the closure $%,
we can do so efficiently without computing $%. 4e first compute the a!!r%bu!e
closure 12 with respect to 7# which is the set of attributes A such that ;=A can be
inferred using Armstrong A'ioms. The algorithm for computing the attribute
closure of a set ; of attributes is shown below!
closure . /0
repeat until there is no change! E
if there is an 7/ U#+ in $ such that U 1 closure,
then set closure . closure D F
G
NORMA3IZATION
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
2ormali3ation is a techni8ue for producing a set of relations with desirable
properties# given the data re8uirements of an enterprise.
The process of normaliation was first developed by &.7.Codd.
2ormali3ation is often performed as a series of tests on a relation to
determine whether it satisfies or violates the re8uirements of a given normal
form.
Ad.a!a"es o* Normal%4a!%o#
:ess storage space
3educes data redundancy in a database
It eliminates serious manipulation anomalies.
7le'ible structure
Normal Forms
Biven a relation schema# we need to decide whether it is a good design or we need
to decompose it into smaller relations. To make such a decision# several ormal
*orms have been proposed.
The normal forms based on 7/s are first normal form 452$,, second normal
form462$,, third normal form472$,, and 8oyce'odd normal form 48'2$,.
F%rs! Normal Form 5/NF6#
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
A relation in which the intersection of each row and column contains one and only
one value.
E&am$le#
&mp6um &mpHhone &mp/egrees
--- .+.I
01,+.--0
000 .+.I
012,*)(+
E ?A# ?"c# Hh/
G
111 .+.I
01+()*,2
E ?"c# J"c G
Identification of multi"alued attri9utes:
The above relation in o! in -67 as &mp/egrees is a multiIvalued field!
employee 111 has two degrees! 8Sc and :Sc
employee 000 has three degrees! 8A, 8Sc, )hD
;ransformation into 52$:
In order to make this relation to be in -67# we remove the repeating group
(&mp/egrees details) by placing the repeating data along with a copy of the
original key attribute ( &mp6um) in a separate relation. The format of the resulting
-67 relations are as follows!
&mployee( &mp6um# &mpHhone) &mployee/egree(&mp6um#
&mp/egrees)
&mp6um &mpHhone
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
--- .+.I01,+.--0
000 .+.I012,*)(+
111 .+.I01+()*,2
Secod Normal Form 50NF6#
Par!%al De$edecy# A partial dependency e'ists when an attribute ? is
functionally dependent on an attribute A# and A is a component of a multipart
candidate key.
0NF# A relation is in 067 if it is in -67# and every nonIkey attribute is fully
dependent on each candidate key. (That is# we don5t have any partial functional
dependency.)
A relation in 067 will not have any partial dependencies.
E&am$le#
Consider this I.3%e table (in -67)!
Inv6um :ine6um Hrod6um Kty Inv/ate
&mp6um &mp/egrees
000 ?A
000 ?"c
000 Hh/
111 ?"c
111 J"c
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
The above relation has the following 7/s!
EInv6um# :ine6umG=Hrod6um# Kty
EInv6um# Hrod6umG=:ine6um# Kty
Inv6um=Inv/ate
Identification of partial dependencies:
Inv:ine is o! 0NF since there is a partial dependency of Inv/ate on Inv6um.
;ransformation into 62$:
4e can impro"e the database by decomposing the above relation into relations!
Inv6um :ine6um Hrod6um Kty
Inv6um Inv/ate
T+%rd Normal Form 57NF6#
Tras%!%.e de$edecy# A condition where A# ?# and C are attributes of a relation
such that if A =? and ?=C# then C is transitively dependent on A via ?.
7NF# A relation that is in first and second normal form# and in which no nonIkey
attribute is transitively dependent on the candidate key.
In other words#
A relation in 167 will not have any transitive dependencies of nonIkey attribute on
a candidate key through another nonI key attribute.
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
7ormally it can be given as!
:et 3 be relation schema# 7 be the set of 7/s given to hold over 3# then for every
7/ ;=A in 7# one of the following is true!
-) ; is a candidate key
or
0) A is part of the key.
E&am$le#
Consider an Em$loyee relation!
&mp6um &mp6ame /ept6um /ept6ame
This relation has the following 7/s!
&mp6um=&mp6ame# /ept6um# /ept6ame
/ept6um=/ept6ame
Identifying transiti"e dependencies:
&mp6ame# /ept6um# and /ept6ame are nonIkey attributes.
/ept6um determines /ept6ame# a nonIkey attribute# and /ept6um is not a
candidate key.
;ransformation into 72$:
Is the relation in 1679 L no
Is the relation in 0679 L yes
4e correct the situation by decomposing the original relation into two 167
relations!
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
8oyce9Codd Normal Form 58CNF6#
6amed after its inventors! ?oyce and
Codd
I "tronger than 167
De!erm%a!# 3efers to the attribute or
group of attributes on the left hand side of
the arrow of a functional dependency.
&'! Consider an 7/#
&mp6um=&mp&mail. $ere# &mp6um is
a determinant of &mp&mail.
8CNF# A relation is in ?C67# if and only
if# every determinant is a candidate key.
7ig! &.7.Codd
&mp6um &mp6ame /ept6um
/ept6um /ept6ame
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
7ormally it can be given as!
:et 3 be a relation schema# 7 be the set of 7/s given over 3# then for every 7/
; =A in 7# ; should be a candidate key.
E&am$le#
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
;ransformation into 8'2$:
Note:
any relation that is in ?C67# is in 167
any relation in 167 is in 067
any relation in 067 is in -67
There is a se8uence to normal forms!
-67 is considered the weakest
067 is stronger than -67
167 is stronger than 067# and
?C67 is considered the strongest
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
Mul!%9)alued De$edec%es 5M)D6
The possible e'istence of multiIvalued dependencies in a relation is due to first
normal form (-67)# which disallows an attribute in a tuple from having a set of
values.
7or e'ample# if we have two multiIvalued attributes in a relation# we have to repeat
each value of one of the attributes with every value of the other attribute# to ensure
that tuples of a relation are consistent. This type of constraint is referred to as a
multiIvalued dependency and results in data redundancy.
Consider the <mployee relation which is not in -67 shown in figure!
&mp6um &mpHhone &mp/egrees
---
E .+.I000000#
.+.I111111G E ?A# ?"cG
The result of -67 on above relation is shown below!
&mp6um &mpHhone &mp/egrees
--- .+.I000000 ?A
--- .+.I111111 ?"c
--- .+.I000000 ?"c
--- .+.I111111 ?A
This relation records the <mp)hone and <mpDegrees details of an employee ---.
$owever# the <mpDegrees of an employee are independent of <mp)hone. This
constraint results in data redundancy and is referred to as multi"alued dependency.
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
Mul!%9)alued De$edecy 5M)D6#
3epresents a dependency between attributes (for e'ample# A# ?# and C) in a
relation# such that for each value of A there is a set of values for ? and a set of
values for C. $owever# the set of values for ? and C are independent of each other.
4e represent an JF/ between attributes A# ?# and C in a relation using the
following notation!
A == ?
A == C
For e&am$le, we specify the JF/ in the above <mployee relation as follows!
<mp2um ## <mp)hone
<mp2um ## <mpDegrees
7ormally# an JF/ can be defined as!
:et 3 be a relation schema# and ; and < be disjoint subsets of 3# and C > 3M;<.
If a relation 3 satisfies ; ==<
0
# the following must be true for every legal
instance of r of R!
if for any two tuples t-# t0 and t-(;) > t0(;)# then there e'ist t1 in r such that
t1(;) > t-(;)# t1(<) > t-(<)# t1(C) > t0(C).
?y symmetry# there e'ist t+ in r such that# t+(;) > t-(;)# t+(<) > t0(<)# t+(C) >
t-(C).
2
$%%Y &a' (e read as X multi-determines Y.
; < C
'- y- - t-
'- y0 0 t0
'- y- 0 t1
'- y0 - t+
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
The JF/ ;== < says that the relationship between ; and < is independent of
the relationship between ; and 3M <.
Arms!ro" A&%oms rules rela!e !o M)Ds#
M)D Com$leme!a!%o# If ;==<# then ;==3M;<.
M)D Au"me!a!%o# If ;==< and 4 be super set of C# then
4;==<C.
M)D Tras%!%.%!y# If ;==< and <==C# then ;==(CM<).
Tr%.%al M)D#
An JF/ A==? in relation 3 is defined as being tri"ial if#
a) ? is a subset of A or
b) A D ? > 3.
An JF/ is said to be nontri"ial if neither a) nor b) is satisfied.
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
Four!+ Normal Form 5:NF6#
A relation that is in ?oyceICodd normal form and contains no nonItrivial JF/s.
7ormally# :et 3 be a relation schema# ; and < be nonempty subsets of the
attributes of 3. 3 is said to be in :NF, if# for every JF/ ;==< that holds over
3# one of the following is true!
< is a subset of ; or ;< > 3# or
; is a super key.
E&am$le#
Consider the above <mployee relation.
Identifying nontri"ial :+D:
The <mployee is not in +67 because of the presence of nonItrivial JF/.
;ransformation into =2$:
4e decompose the <mployee relation into <mp5 and <mp6 relations as shown
below!
<mp5 <mp6
&mp6um &mpHhone
--- .+.I
000000
--- .+.I
111111
?oth new relations are in +67 because the <mp5 relation contains the trivial JF/
<mp2um##<mp)hone, and the <mp6 relation contains the trivial JF/
<mp2um##<mpDegrees.
&mp6um &mp/egrees
--- ?A
--- ?"c
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
Pro$er!%es o* Decom$os%!%o
3ossless9;o% Decom$os%!%o#
:et 3 be a relation schema and let 7 be a set of 7/s over 3. A decomposition of 3
into two schemas with attribute sets ; and < is said to be a lossless9<o%
decom$os%!%o =%!+ res$ec! !o F if# for every instance r of 3 that satisfies the
dependencies in 7# N
'
(r) N
y
(r) > r. In other words# we can recover the
original relation from the decomposed relations.
7rom the definition it is easy to see that r is always a subset of natural join of
decomposed relations. If we take projections of a relation and recombine them
using natural join# we typically obtain some tuples that were not in the original
relation.
E&am$le#
?y replacing the instance r shown in figure with the instances N
"H
(r) and N
H/
(r)#
we lose some information.
Instance r >
S)
(r) N
H/
(r)
" H /
s- p- d-
s0 p0 d0
s1 p- d1
" H
s- p-
s0 p0
s1 p-
H /
p- d-
p0 d0
p- d1
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
N
"H
(r) N
H/
(r)
7ig! Instances illustrating :ossy /ecompositions
T+eorem# :et 3 be a relation and 7 be a set of 7/s that hold over 3. The
decomposition of 3 into relations with attribute sets 3- and 30 is lossless if and
only if 7
O
contains either the 7/ 3- P 30 = 3- (or 3-M30) or the 7/ 3- P 30
= 30 (or 30M3-).
Consider the $ourly%&mps relation. It has attributes S2?R@A, and the 7/ 3=4
causes a violation of 167. 4e dealt this violation by decomposing the relation into
S2?RA and R@. "ince 3 is common to both decomposed relations and 3=4
holds# this decomposition is losslessIjoin.
De$edecy9Preser.%" Decom$os%!%o#
Consider the Contracts relation with attributes 'S(D)*+. The given 7/s are
C=C"Q/HKF# QH=C# and "/=H. ?ecause "/ is not a key# the dependency
"/=H causes a violation of ?C67.
4e can decompose Contracts into relations with schemas C"Q/KF and "/H to
address this violation. The decomposition is losslessIjoin. ?ut# there is one
problem. If we want to enforce an integrity constraint QH=C# it re8uires an
e'pensive join of the two relations. 4e say that this decomposition is not
dependencyIpreserving.
:et 3 be a relation schema that is decomposed into two schemas with attributes
sets ; and <# and let 7 be a set of 7/s over 3. The $ro<ec!%o o* F o 1 is the set
" H /
s- p- d-
s0 p0 d0
s1 p- d1
s- p- d1
s1 p- d-
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
of 7/s in the closure 7
O
that involve only attributes in ;. 4e denote the projection
of 7 on attributes ; as 7
;
. 6ote that a dependency D=F in 7
O
is in 7
;
only if all
the attributes in D and F are in ;.
The decomposition of relation schema 3 with 7/s 7 into schemas with attribute
sets ; and < is de$edecy9$reser.%" if (7
;
D 7
<
)
O
> 7
O
.
E&am$le#
Consider the relation 3 with attributes A?C is decomposed into relations with
attributes A? and ?C. The set of 7/s over 3 includes A=?# ?=C# and C=A.
The closure of 7 contains all dependencies in 7 plus A=C# ?=A# and C=?.
Conse8uently 7
A?
contains
A=? and ?=A# and 7
?C
contains ?=C and C=?. Therefore# 7
A?
D 7
?C
contains
A=?# ?=C# ?=A
and C=?. The closure of 7
A?
and 7
?C
now includes C=A (which follows from
C=? and ?=A). Thus
the decomposition preserves the dependency C=A.
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
E1ERCISES
0) i) The following relation is in 167# but not in ?C679 &'plain why
and take an appropriate action9
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
ii) If we reverse the direction between staff%id and class%code# then
determine whether it is violating 067# 167# ?C679 If yes# perform the
necessary action.
1) 6ormalie the following relation!
+) Biven a relation with schema EI/# 6ame# Address# Hostcode#
CardType# Card6umberG# the candidate key EI/G# and the following
functional dependencies!
R EI/G E6ame# Address# Hostcode# CardType# Card6umberG
R EAddressG EHostcodeG
R ECard6umberG ECardTypeG
(i) &'plain why this relation is in second normal form# but not in third
normal form.
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
(ii) "how how this relation can be converted to third normal form. <ou
should show what functional dependencies are being removed# e'plain.
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
A:: T$& ?&"T
EMAI3# cseroc>4?@A"ma%l(com
'''( C S E R O C K Z ( C O M
Page 27
DATABASE MANAGEMENT SYSTEMS
UNIT-V
www.cserockz.com
Kee$ 'a!c+%" *or Re"ular U$da!esB(CC
'''( C S E R O C K Z ( C O M
Page 27