KEMBAR78
DBMS Module5 V1 | PDF | Information Technology Management | Relational Model
0% found this document useful (0 votes)
14 views91 pages

DBMS Module5 V1

Module 5 focuses on relational database design, covering pitfalls, normalization, and functional dependencies. It emphasizes the importance of eliminating redundancy and anomalies through normalization, detailing various normal forms and their significance. The module also discusses decomposition techniques to ensure lossless joins and the preservation of functional dependencies in database schemas.

Uploaded by

yofeg38927
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views91 pages

DBMS Module5 V1

Module 5 focuses on relational database design, covering pitfalls, normalization, and functional dependencies. It emphasizes the importance of eliminating redundancy and anomalies through normalization, detailing various normal forms and their significance. The module also discusses decomposition techniques to ensure lossless joins and the preservation of functional dependencies in database schemas.

Uploaded by

yofeg38927
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 91

Module 5: Relational-Database

Design
Shilpa Ingoley
Module 5: Relational-Database Design

➢ Pitfalls in Relational-Database designs


➢ Concept of normalization
➢ Function Dependencies
➢ First Normal Form, 2NF, 3NF, BCNF.
Semantics of the Relation Attributes
GUIDELINE : Informally, each tuple in a relation should represent
one entity or relationship instance. (Applies to individual
relations and their attributes).
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be
mixed in the same relation
• Only foreign keys should be used to refer to other entities
• Entity and relationship attributes should be kept apart as much as possible.

Bottom Line: Design a schema that can be explained easily


relation by relation. The semantics of attributes should be easy
to interpret.
Guideline to Redundant Information in Tuples and Update
Anomalies
• GUIDELINE : Design a schema that does not suffer from the insertion,
deletion and update anomalies. If there are any present, then note
them so that applications can be made to take them into account
Null Values in Tuples
GUIDELINE : Relations should be designed such that their
tuples will have as few NULL values as possible
• Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
• Reasons for nulls:
• attribute not applicable or invalid
• attribute value unknown (may exist)
• value known to exist, but unavailable
Module 5: Normalization
• Pitfalls in relational-Database designs
• Concept of normalization
• Function Dependencies
• First Normal Form
• 2NF
• 3NF
• BCNF
Pitfalls in relational-Database designs
• Relational database design requires that we find a “good” collection of relation schemas.

• A bad design may have several properties, including:


• Repetition of information.
• Wastage of space due to repetition of information
• Inability to represent certain information.
• Loss of information.
The Need for Normalisation
1) A smaller database can be maintained as normalization eliminates the
duplicate data. Overall size of the database is reduced as a result.

2) Better performance is ensured which can be linked to the above point. As


databases become lesser in size, the passes through the data becomes faster and
shorter thereby improving response time and speed.

3) Narrower tables are possible as normalized tables will be fine-tuned and will
have lesser columns which allows for more data records per page.

4) Fewer indexes per table ensures faster maintenance tasks (index rebuilds).

5) Also realizes the option of joining only the tables that are needed.
Normalization
• Normalization is the process of minimizing redundancy from a
relation or set of relations.
• Redundancy in relation may cause following anomalies:
• insertion
• deletion
• updation
• So, it helps to minimize the redundancy in relations.
• Normal forms are used to eliminate or reduce redundancy in
database tables.
• Database normalization is the process of organizing data into tables in
such a way that the results of using the database are always
unambiguous and as intended.
Various Level of Normalization are:
1.First Normal Form or 1NF
2.Second Normal Form or 2NF
3.Third Normal Form or 3NF
4.Boyce Codd Normal Form or BCNF
5.Fourth normal form or 4NF
6.Fifth normal form or 5NF
7.Domain-key normal form or DKNF
8.Sixth normal form or 6NF
Note: Databases should be in 3NF or BCNF in order to avoid the
database anomalies.
Each higher level is a subset of lower Level
Contd…
• When we use normalization generally –
• Redundancy Decreases
• Number of table increases
• Complexity increases
Types of Anomalies
• Changing the data in some relations can have undesirable consequences
called Modification anomalies.
• Anomalies can be eliminated by changing the structure of the relation.
• Usually relation without modification (deletion and insertion) anomalies
are preferred.
• Consider a relation Activity(SID, Activity, Fee)
Modification Anomalies
SID Activity Fee If fees of swimming changes
100 Hocky 1200
1000 to 900 → changes should be
made at all location
120 swimming 500
155 squash 500
172 swimming 500
Contd…
• Deletion Anomaly
• Students has fixed fees, deleting tuple 100(SID=100), we lose not only
fact that student 100 is hocky player but also the fact that hocky cost
Rs 1200
SID Activity Fee
100 Hocky 1200
120 swimming 500 We loss two facts about
155 squash 500 entities with one deletion
172 swimming 500
Contd…
• Insertion Anomaly
• Scuba diving cost Rs 2000. We can not enter unless student take up
the activity
• This kind of restriction called an insertion anomaly
Contd…
• Eliminating both insertion and deletion anomalies
STUD_ACT(SID, Activity) ACT_COST(Activity, fee)
SID Activity Activity Fee
100 Hocky Hocky 1200
120 swimming swimming 500
155 squash squash 500
172 swimming Scuba diving 2000

• Division of activity into two relations(Tables) namly STUD_ACT and


ACT_COST
Contd…
• Lets say student 200 wants to enroll for Badminton.
• Can we insert this new tuple in STUD_ACT ??
STUD_ACT(SID, Activity) ACT_COST(Activity, fee)

SID Activity Activity Fee


100 Hocky Hocky 1200
120 swimming swimming 500
155 squash squash 500
172 swimming Scuba diving 2000

• Database system must prevent STUD_ACT from being added this


information, if value of activity is not present in ACT_COST table?
• The activities can exist before any student enroll them.(Must be present in
ACT_COST)
• WE can state,
• STUD_ACT(Activity) C (Subset of ) ACT_COST(Activity)
Contd…
• This constraint called the inter-relation constraint(Referential integrity
constraint).

• Identifying and eliminating such modification anomalies is the


essence of the normalization process
Decomposition
Decomposition
• A functional decomposition is the process of breaking down the functions of an organization into progressively
greater (finer and finer) levels of detail.
• In decomposition, one function is described in greater detail by a set of other supporting functions.
• The decomposition of a relation scheme R consists of replacing the relation schema by two or more relation
schemas that each contain a subset of the attributes of R and together include all attributes in R.
• Decomposition helps in eliminating some of the problems of bad design such as redundancy, inconsistencies and
anomalies.
• There are two types of decomposition :
1. Lossy Decomposition
2. Lossless Join Decomposition
Lossy Decomposition
• "The decomposition of relation R into R1 and R2 is lossy when the join of R1 and R2 does not yield the same
relation as in R."
• One of the disadvantages of decomposition into two or more relational schemes (or tables) is that some
information is lost during retrieval of original relation or table.
• Consider that we have table EMPLOYEE with three attribute eno , name and department.
• EMPLOYEE:
Contd…

Decompose the above table into two tables EmpDetails and DeptDetails −

Department can have


many attributes like –
Deptno, location, Head
of dept …etc

• You won’t be able to join the above tables,


since ENO/Department/DNo isn’t part of the DeptDetails relation.
• Therefore, the above relation has lossy decomposition.
Contd…
• This relation is decomposed into two relation :
• EMP1(Eno, name)
• EMP2(name, Department):
Contd…
• In lossy decomposition ,spurious tuples are generated when a natural join is applied to the relations in the
decomposition.
• Natural join of Emp1 and Emp2 :

• The above decomposition is a bad decomposition or Lossy decomposition.


Spurious tuples
are generated
Lossless Join Decomposition :
• "The decomposition of relation R into R1 and R2 is lossless when the join of R1 and R2 yield the same
relation as in R."
• A relational table is decomposed (or factored) into two or more smaller tables, in such a way that the
designer can capture the precise content of the original table by joining the decomposed parts. This is
called lossless-join (or non-additive join) decomposition.
• This is also referred as non-additive decomposition.
• The lossless-join decomposition is always defined with respect to a specific set F of dependencies.
• Consider that we have table EMPLOYEE with three attribute Eno , name and department.
Lossless decomposition
• Relation Employee

• This relation is decomposed into two relation Emp1(Eno, name) and Emp3(Eno, Department) :
Contd…
• Now ,when these two relations are joined on the common attribute Eno, the resultant relation will
look like →.

• In lossless decomposition, no spurious tuples are generated when a natural joined is applied to
the relations in the decomposition.
Spurious Tuples
• Bad designs for a relational database may result in
erroneous results for certain JOIN operations
• The "lossless join" property is used to guarantee
meaningful results for join operations

GUIDELINE : The relations should be designed to satisfy


the lossless join condition. No spurious tuples should
be generated by doing a natural-join of any relations.
Spurious Tuples
There are two important properties of decompositions:
(a) non-additive or losslessness of the corresponding join
(b) preservation of the functional dependencies(FD).

Note that
property (a) is extremely important and cannot be sacrificed.
Property (b) is less stringent and may be sacrificed.
FUNCTIONAL DEPENDANCY:
• Functional dependency (FD) is a set of constraints between two attributes in a relation. Functional
dependency says that if two tuples have same values for attributes A1, A2,..., An, then those two tuples must
have to have same values for attributes B1, B2, ..., Bn.
• Functional dependency is represented by an arrow sign (→) that is,
X→Y, where X functionally determines Y.
Or Y is functionally dependent on X
• The left-hand side attributes determine the values of attributes on the right-hand side.


X is Determinant Y is Dependent

X→Y
Functional Dependencies (Contd…)
• X -> Y holds if whenever two tuples have the same value for X,
they must have the same value for Y
• For any two tuples t1 and t2 in any relation instance r(R): If
t1[X]=t2[X], then t1[Y]=t2[Y]
• X -> Y in R specifies a constraint on all relation instances r(R)
• Written as X -> Y; can be displayed graphically on a relation
schema as in Figures. ( denoted by the arrow: ).
• FDs are derived from the real-world constraints on the
attributes
Functional Dependencies (Contd…)
• Functional dependencies (FDs) are used to specify formal
measures of the "goodness" of relational designs

• FDs and keys are used to define normal forms for relations

• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes

• A set of attributes X functionally determines a set of attributes


Y if the value of X determines a unique value for Y
Types of FUNCTIONAL DEPENDANCY:
• Trivial
• Non-Trivial
• Transitive
• Multivalued
Contd…
• Can ENO→ Name or DOJ ??
• NO (as ENO 115 is repeated )
Eg1: RollNo → Name
Sample Relation (As no repeating value of x
so no need to check value
RollNO Name Marks Dept Elective of y)
1 Ashu 18 CM DBMS
Eg2: name → RollNo ?
2 Raj 16 IT DBMS Eg3: Marks → dept ?
3 Ashu 18 CM Robotics Eg4: Dept → Marks ?
4 Raj 16 IT MIS
5 Mita 20 EXTC MIS
6 Jay 20 BioMed Robotics
A B C D
a1 b1 c1 d1 A→C ?
a1 b2 c1 d2 C→A ?
a2 b2 c2 d2 AB→D

a2 b3 c2 d3
a3 b3 c2 d4
Examples of FD constraints
• social security number determines employee name
SSN -> ENAME
• project number determines project name and location
PNUMBER -> {PNAME, PLOCATION}
• employee ssn and project number determines the hours per
week that the employee works on the project
{SSN, PNUMBER} -> HOURS
Examples of FD constraints
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes
in R (since we never have two distinct tuples with t1[K]=t2[K])
Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that hold
whenever the FDs in F hold
Armstrong's inference rules:
IR1: Rollno → Rollno
IR1. (Reflexive) If Y subset-of X, then X -> Y Rollno,name→name
IR2. (Augmentation) If X -> Y, then XZ -> YZ IR2: Rollno→ name
(Notation: XZ stands for X U Z) Rollno,city→ name,city

IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z IR3: Dno → Dname → HOD
Dno→ HOD

• IR1, IR2, IR3 form a sound and complete set of inference


rules
Inference Rules for FDs Contd…
Some additional inference rules that are useful:
IR4: Rollno→ marks,city
IR4:(Decomposition) If X -> YZ, then X -> Y and X -> Z Rollno → marks
Rollno → city
IR5:(Union) If X -> Y and X -> Z, then X -> YZ
IR6: (Psuedotransitivity) If X -> Y and WY -> Z, then WX -> Z IR5: Rollno → name
Rollno → percentage
Rollno→name,percentage
• The last three inference rules, as well as any other
inference rules, can be deduced from IR1, IR2, and IR3
(completeness property)
IR6: Rollno→ PhoneNo
PhoneNO, name→city
Rollno,name → city
Inference Rules for FDs Contd…

• Closure of a set F of FDs is the set F+ of all FDs that can be


inferred from F

• Closure of a set of attributes X with respect to F is the set X + of


all attributes that are functionally determined by X

• X + can be calculated by repeatedly applying IR1, IR2, IR3 using


the FDs in F
Contd…
Q 1) Find Candidate key, Prime and Non prime attributes
• R(A B C D)
• FD= {A→B, B→C, C→D }
• Step 1: Find Candidate Key(CK)
• Setp 2: Write Prime attributes
• Step 3: Write Non-prime attributes

• A+ = ABCD A →B, B→C,A→C -- Transitive property , A→ A (Reflexive Property)

• B+ = BCD

• C+ = CD

• D+ =D

• CK= A(Candidate key)


• Prime attributes : { A } and Non-Prime attribute = { B,C,D}
Contd…
Q2) Find Candidate key, Prime and Non prime attributes
• R(A B C D)
• FD= {A→B, B→C, C→A }
• ABCD+ = {A B C D}
• = {A B C D}

• = {A C D}

• = {A D}

• AD+ = {A B C D}
• Check proper subset of AD ie A+ and D+ , if no proper subset as SK(Super key) then AD → CK

• Check and find if any other SK is present ???


• Then find Prime attribute and Non- Prime attributes
Minimal Sets of FDs
• A set of FDs is minimal if it satisfies the following
conditions:
(1) Every dependency in F has a single attribute for its RHS.
(2) We cannot remove any dependency from F and have a set of
dependencies that is equivalent to F.
(3) We cannot replace any dependency X -> A in F with a
dependency Y -> A, where Y proper-subset-of X ( Y subset-of X)
and still have a set of dependencies that is equivalent to F.
Normalization of Relations
• Normalization: The process of decomposing unsatisfactory "bad"
relations by breaking up their attributes into smaller relations

• Normal form: Condition using keys and FDs of a relation to certify


whether a relation schema is in a particular normal form
Definitions of Keys and Attributes
Participating in Keys

• A superkey of a relation schema R = {A1, A2, ...., An} is a set of


attributes S subset-of R with the property that no two tuples
t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

• A key K is a superkey with the additional property that


removal of any attribute from K will cause K not to be a
superkey any more.
Contd…

• If a relation schema has more than one key, each is called a candidate
key. One of the candidate keys is arbitrarily designated to be the
primary key, and the others are called secondary keys.
• A Prime attribute must be a member of some candidate key
• A Nonprime attribute is not a prime attribute—that is, it is not a
member of any candidate key.
Normalization of Relations
• Normalization: The process of decomposing unsatisfactory "bad"
relations by breaking up their attributes into smaller relations

• Normal form: Condition using keys and FDs of a relation to certify


whether a relation schema is in a particular normal form
Definitions of Keys and Attributes
Participating in Keys

• A superkey of a relation schema R = {A1, A2, ...., An} is a set of


attributes S subset-of R with the property that no two tuples
t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

• A key K is a superkey with the additional property that


removal of any attribute from K will cause K not to be a
superkey any more.
Contd…

• If a relation schema has more than one key, each is called a candidate
key. One of the candidate keys is arbitrarily designated to be the
primary key, and the others are called secondary keys.
• A Prime attribute must be a member of some candidate key
• A Nonprime attribute is not a prime attribute—that is, it is not a
member of any candidate key.
First normal form (1NF)
• First normal form (1NF). This is the "basic" level of database normalization, and it generally
corresponds to the definition of any database, namely:

● It contains two-dimensional tables with rows and columns.


● Each column corresponds to a sub object or an attribute of the object represented by the entire table.
● Each row represents a unique instance of that sub object or attribute and must be different in some
way from any other row (that is, no duplicate rows are possible).

• 1NF has no repeating groups in it


• The domain of an attribute must include only atomic values (single values i.e. Non- decomposable)
• 1NF disallows “Relation within relation” or “Relations as attributes of tuple”
Name Age subject
Sahil 20 DBMS, Math
Dev 21 MP, CN
• Is the relation in 1NF?
• NO
Contd…
• 1NF Disallows
• composite attributes
• multivalued attributes,
• and nested relations
• attributes whose values for an individual tuple are non-atomic
Contd…

• Dlocation contain atomic as well as set of values so Dlocation not


functionally dependent of DNO
• Dlocation contains set of values and hence is non-atomic
• DNO→ Dlocation
• Department is not in 1NF , does not even qualify relation along with
PK DNO and DLocation
Contd…
There ae 3 main technique to achieve 1NF
• 1) Remove the attribute Dlocation that violate 1NF and place it in
separate relation Dept_Location

Dept_Location
Contd…
2) Expand the key so that there will be a separate tuple in original
DEPARTMENT relation for each location of DEPARTMENT

• In this PK is combination of {DNO,Dlocation}


• This has disadvantage of introducing ‘Redundancy in relation’
Contd…
• 3) If maximum number of values known for the attribute, For eg,
three location exist for department , replace Dlocation by
• Dlocation1
• Dlocation2
• Dlocation3 ( and so on…)

• This solution has disadvantage of NULL values if most of the


department have fewer than 3 Locations. Also Limit place on max no
of locations

• Which is Superior among three ??


One more Example
to convert into 1NF
Second Normal Form
• Uses the concepts of FDs, primary key -PK( actually Candidate key(CK) )
Definitions:
• Prime attribute - attribute that is member of the Candidate key -CK
• Full functional dependency - a FD Y -> Z where removal of any
attribute from Y means the FD does not hold any more

Examples: - {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor
PNUMBER -> HOURS hold
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN
-> ENAME also holds
Contd…

• A relation schema R is in second normal form (2NF) if every


non-prime attribute A in R is fully functionally dependent
on the primary key/CK

• R can be decomposed into 2NF relations via the process of


2NF normalization
Contd…
• ACTIVITY(SID, Activity, Fee)

SID Activity Fee


100 Hocky 1200
120 swimming 500
155 squash 500 One student can take multiple
172 swimming 500 activities
100 Golf 1700
172 Hocky 1200
200 Golf 1700 This has insertion, deletion and
120 squash 500 modification anomalies
Contd…
• NON-prime attribute is dependent on CK(Entire) and NOT part of CK
STUD_ACT(SID, Activity) ACT_COST(Activity, fee)

SID Activity Activity Fee


100 Hocky Hocky 1200
120 swimming swimming 500
155 squash squash 500
172 Swimming Scuba diving 2000
100 Golf Golf 1700
172 Hocky
200 Golf
120 squash
Example to convert
into 2NF
One more Example to
convert into 2NF

• Primary key {stud_id,proj_id}


• Is the relation in 2NF?
• NO
• Partial dependency
Third normal form (3NF)
• At the second normal form, modifications are still possible because a change to one row in a table may affect
data that refers to this information from another table.
• Database normalization's ability to avoid or reduce data anomalies, data redundancies and data duplications,
while improving data integrity, have made it an important part of the data developer's toolkit for many years.
It has been one of the hallmarks of the relational data model.
Third Normal Form
• A relation is said to be in 3NF if there is no transitive functional
dependency(FD) between non-key attributes.
• When non-key attribute can determined with one or more non-key
attributes then there is transitive functional dependency
Third Normal Form
Definition:
• Transitive functional dependency - a FD X -> Z that
can be derived from two FDs X -> Y and Y -> Z
Examples:
- SSN -> DMGRSSN is a transitive FD since
SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive since there is no set of
attributes X where SSN -> X and X -> ENAME
Contd…

• A relation schema R is in third normal form (3NF) if it


is in 2NF and no non-prime attribute A in R is
transitively dependent on the primary key
• R can be decomposed into 3NF relations via the
process of 3NF normalization
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem
only if Y is not a candidate key. When Y is a candidate key, there is no
problem with the transitive dependency .
E.g., Consider EMP (SSN, Emp#, Salary ).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Contd…
• Relation in 2NF also have anomalies consider following relation
• Housing(SID, Building, Fee)
STU_Housing(SID, Building)
Building_Fee(Building, Fee)
SID Building Fee SID Building Building Fee
100 Nehru 12000 100 Nehru Nehru 12000
150 Gandhi 11000 150 Gandhi Gandhi 11000
200 Bose 12000 200 Bose Bose 12000
250 Nehru 12000 250 Nehru
300 Nehru 12000 Removing 300 Nehru
Transitive
dependency
and Converting
into 3NF
• IS the relation in 3NF?
• No
• Neither Zip is a superkey nor is City a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists transitive
dependency.
BCNF (Boyce-Codd Normal Form)
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if
whenever an FD X -> A holds in R, then X is a superkey of R

• Each normal form is strictly stronger than the previous one


• Every 2NF relation is in 1NF
• Every 3NF relation is in 2NF
• Every BCNF relation is in 3NF

• There exist relations that are in 3NF but not in BCNF


• The goal is to have each relation in BCNF (or 3NF)
• Functional dependencies
Contd…

• Anomalies can arises from situations other than FD.


• The 4NF, 5NF and Domain key normal form were proposed to
overcome these anomalies.
Achieving the BCNF by Decomposition (1)
• Two FDs exist in the relation LECTURE:
fd1: { sid, subject} → Faculty
fd2: Faculty → Subject
• {sid,subject} is a candidate key for this relation and that the
dependencies shown follow the pattern in previous figure . So this
relation is in 3NF but not in BCNF
• A relation NOT in BCNF should be decomposed so as to meet this
property, while possibly forgoing the preservation of all functional
dependencies in the decomposed relations.
Achieving the BCNF by Decomposition (2)
• Three possible decompositions for relation TEACH
1. {sid, Faculty} and {sid, subject}
2. {subject, faculty } and {subject, sid}
3. {Faculty, subject } and {Faculty, sid}

• All three decompositions will lose fd1. We have to settle for sacrificing the
functional dependency preservation. But we cannot sacrifice the non-additivity
property after decomposition.
• Out of the above three, only the 3rd decomposition will not generate spurious
tuples after join.(and hence has the non-additivity property).
• A test to determine whether a binary decomposition (decomposition into two
relations) is nonadditive (lossless)
Testing Binary Decompositions for Lossless (Non-
additive) Join Property

A decomposition D = {R1, R2} of R has the lossless join


property with respect to a set of functional dependencies
F on R if and only if either
The f.d. ((R1 ∩ R2) → (R1- R2)) is in F+ or
The f.d. ((R1 ∩ R2) → (R2 - R1)) is in F+ .
4. Boyce-Codd Normal Form (BCNF) –
• A relation R is in BCNF if R is in Third Normal Form and for
every FD, LHS is super key. A relation is in BCNF iff in every non-
trivial functional dependency X –> Y, X is a super key.

• Example 1 – Find the highest normal form of a relation R(A,B,C,D,E)


with FD set as {BC->D, AC->BE, B->E}
Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its subset can
determine all attribute of relation, So AC will be candidate key. A or C
can’t be derived from any other attribute of the relation, so there will be
only 1 candidate key {AC}.

Step 2. Prime attributes are those attribute which are part of candidate
key {A, C} in this example and others will be non-prime {B, D, E} in this
example.

Step 3. The relation R is in 1st normal form as a relational DBMS does
not allow multi-valued or composite attribute.
• The relation is in 2nd normal form because BC->D is in 2nd normal
form (BC is not a proper subset of candidate key AC) and AC->BE is
in 2nd normal form (AC is candidate key) and B->E is in 2nd normal
form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC
is a super key nor D is a prime attribute) and in B->E (neither B is a
super key nor E is a prime attribute) but to satisfy 3rd normal for,
either LHS of an FD should be super key or RHS should be prime
attribute.
So the highest normal form of relation will be 2nd Normal form.
• Example 2 –For example consider relation R(A, B, C)
A -> BC,
B ->
A and B both are super keys so above relation is in BCNF.
• Key Points –
1. BCNF is free from redundancy.
2. If a relation is in BCNF, then 3NF is also also satisfied.
3. If all attributes of relation are prime attribute, then the relation is always in 3NF.
4. A relation in a Relational Database is always and at least in 1NF form.
5. Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
6. If a Relation has only singleton candidate keys( i.e. every candidate key consists
of only 1 attribute), then the Relation is always in 2NF( because no Partial
functional dependency possible).
7. Sometimes going for BCNF form may not preserve functional dependency. In
that case go for BCNF only if the lost FD(s) is not required, else normalize till
3NF only.
8. There are many more Normal forms that exist after BCNF, like 4NF and more.
But in real world database systems it’s generally not required to go beyond
BCNF.
• Exercise 1: Find the highest normal form in R (A, B, C, D, E) under following
functional dependencies.

• Important Points for solving above type of question.


1) It is always a good idea to start checking from BCNF, then 3 NF and so
on.
2) If any functional dependency satisfied a normal form then there is no
need to check for lower normal form. For example, ABC –> D is in BCNF
(Note that ABC is a superkey), so no need to check this dependency for
lower normal forms.
• Candidate keys in the given relation are {ABC, BCD}
• BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a super key so
this dependency is not in BCNF. So, R is not in BCNF.
• 3NF: ABC -> D we don’t need to check for this dependency as it
already satisfied BCNF. Let us consider CD -> AE. Since E is not
a prime attribute, so the relation is not in 3NF.
• 2NF: In 2NF, we need to check for partial dependency. CD which
is a proper subset of a candidate key and it determine E, which
is non-prime attribute. So, given relation is also not in 2 NF. So,
the highest normal form is 1 NF.

You might also like