KEMBAR78
Mod 2 | PDF | Data | Databases
0% found this document useful (0 votes)
19 views79 pages

Mod 2

Uploaded by

Haf hafeefa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views79 pages

Mod 2

Uploaded by

Haf hafeefa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 79

MODULE II

DATABASE DESIGN
● Database normalization is the process of organizing the attributes of the database
to reduce or eliminate data redundancy (having the same data but at different
places).
● Data redundancy unnecessarily increases the size of the database as the same
data is repeated in many places. Inconsistency problems also arise during insert,
delete, and update operations.
● In the relational model, there exist standard methods to quantify how efficient a
databases is. These methods are called normal forms and there are algorithms to
covert a given database into normal forms.
● Normalization generally involves splitting a table into multiple ones which must be
linked each time a query is made requiring data from the split tables.
The primary objective for normalizing the relations is to eliminate the below anomalies.
● Insertion Anomalies: Insertion anomalies occur when it is not possible to insert data
into a database because the required fields are missing or because the data is
incomplete. For example, if a database requires that every record has a primary key,
but no value is provided for a particular record, it cannot be inserted into the database.
● Deletion anomalies: Deletion anomalies occur when deleting a record from a
database and can result in the unintentional loss of data. For example, if a database
contains information about customers and orders, deleting a customer record may also
delete all the orders associated with that customer.
● Updation anomalies: Updation anomalies occur when modifying data in a database
and can result in inconsistencies or errors. For example, if a database contains
information about employees and their salaries, updating an employee’s salary in one
record but not in all related records could lead to incorrect calculations and reporting.
Redundant Data Explanation:

● The branch, hod, and office_tel columns have the same values (CSE, Mr. X, 53337) for all students.
● This repetition indicates data redundancy because the information about the branch, HOD, and office contact
number is stored multiple times.

How to Fix It?

This can be normalized into two tables:

1. Students Table (rollno, name, branch_id)


2. Branch Table (branch_id, branch, hod, office_tel)

This structure will avoid redundancy and improve database efficiency.


Insertion Anomaly: If a tuple is inserted in referencing relation and referencing attribute value is not present in
referenced attribute, it will not allow insertion in referencing relation.
OR
An insertion anomaly occurs when adding a new row to a table leads to inconsistencies.
Example:If we try to insert a record into the STUDENT_COURSE table with STUD_NO = 7, it will not be allowed
because there is no corresponding STUD_NO = 7 in the STUDENT table.
Deletion and Updation Anomaly: If a tuple is deleted or updated from referenced
relation and the referenced attribute value is used by referencing attribute in
referencing relation, it will not allow deleting the tuple from referenced relation.
Example: If we want to update a record from STUDENT_COURSE with STUD_NO
=1, We have to update it in both rows of the table.

If we try to delete a record from the STUDENT table with STUD_NO = 1, it will not be
allowed because there are corresponding records in the STUDENT_COURSE table
referencing STUD_NO = 1.
Deleting the record would violate the foreign key constraint, which ensures data
consistency between the two tables.
x=2 then y=1
x=2 then y=? (x cannot be determined the y value). In this case we
can say there is no functional dependency.That is if X value is same ,Y
value also should same.

How to define functional dependency?

FD X -> Y,then

If t1.X = t2.X (if t1.X != t2.X) ,then no need of checking Y value.

Then t1.Y= t2.Y


Full Functional Dependency and Partial Functional Dependency

Fully Functional Dependency :


If X and Y are an attribute set of a relation, Y is fully functional dependent on X, if Y
is functionally dependent on X but not on any proper subset of X.
Example –
In the relation ABC->D, attribute D is fully functionally dependent on ABC and not
on any proper subset of ABC. That means that subsets of ABC like AB, BC, A, B, etc
cannot determine D.
Ie, AB->D , not valid
BC-> D, not valid
Example

From the table, we can clearly see that neither supplier_id


nor item_id can uniquely determine the price but both
supplier_id and item_id together can do so. So we can say
that price is fully functionally dependent on { supplier_id,
item_id }. This summarizes and gives our fully functional
dependency −
{ supplier_id , item_id } -> price
Partial Functional Dependency :
A functional dependency X->Y is a partial dependency if Y is functionally dependent on X and Y can
be determined by any proper subset of X.

For example, we have a relationship AC->B, A->D, and D->B.

Now if we compute the closure of {A+}=ADB

Here A is alone capable of determining B, which means B is partially dependent on AC.

Let us take another example –

Here, both the attributes name and roll_no alone


are able to uniquely identify a course. Hence the
relationship is partially dependent.
2. Non-trivial functional dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant.
i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.
Example 1 :

● Id -> Name
● Name -> DOB

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional
dependency, since age is not a subset of {roll_no, name}
3.Multivalued Functional Dependency

● In Multivalued functional dependency, entities of the dependent set are not dependent on each other. i.e. If a
→ {b, c} and there exists no functional dependency between b and c, then it is called a multivalued
functional dependency.
In this table:
● X: bike_model
● Y: color
● Z: manuf_year

For each bike model (bike_model):


1. There is a group of colors (color) and a group of manufacturing years (manuf_year).
2. The colors do not depend on the manufacturing year, and the manufacturing year does
not depend on the colors. They are independent.
3. The sets of color and manuf_year are linked only to bike_model.

That’s what makes it a multivalued dependency.


In this case these two columns are said to be multivalued dependent on bike_model.
4. Transitive Functional Dependency
● In relational databases, transitive dependency occurs when one attribute indirectly depends
on another through an intermediate attribute. This problem means that the value of one
column can be determined indirectly from the values of two other columns.
Inference Rules(Properties of Functional Dependency)
● Inference rules( Properties of Functional Dependency)in databases are also known as
Armstrong’s Axioms in Functional Dependency. These rules govern the functional
dependencies in a relational database. From inference rules a new functional
dependency can be derived using other FDs. These rules were introduced by William
W. Armstrong.
● Reflexive Rule: According to this rule, if B is a subset of A then A logically
determines B. Formally, B ⊆ A then A → B.
● Example: Let us take an example of the Address (A) of a house, which contains so
many parameters like House no, Street no, City etc. These all are the subsets of A.
Thus, address (A) → House no. (B).
● Augmentation Rule: It is also known as Partial dependency. According to this rule, If
A logically determines B, then adding any extra attribute doesn't change the basic
functional dependency.
● Example: A → B, then adding any extra attribute let say C will give AC → BC and
doesn't make any change.
● Transitive rule: Transitive rule states that if A determines B and B determines C, then
it can be said that A indirectly determines B.
● Example: If A → B and B → C then A → C.
Secondary Rules –
These rules can be derived from the axioms.
● Union Rule: Union rule states that If A determines B and C, then A determines
BC.
○ Example: If A → B and A → C then A → BC.
● Decomposition Rule: It is perfectly reverse of the above Union rule. According to
this rule, If A determined BC then it can be decomposed as A → B and A → C.
○ Example: If A → BC then A → B and A → C.
● Pseudo Transitive Rule: According to this rule, If A determined B and BC
determines D then AC determines D.
○ Example: If A → B and BC → D then AC → D.
Minimal set of functional dependencies
Third Normal Form (3NF)

A relation is in the third normal form, if there is no transitive dependency for non-prime attributes as well
as it is in the second normal form. A relation is in 3NF if at least one of the following conditions holds in
every non-trivial functional dependency X –> Y.
● X is a super key.
● Y is a prime attribute (each element of Y is part of some candidate key).

In other words,
A relation that is in First and Second Normal Form and in which no non-primary-key attribute is
transitively dependent on the primary key, then it is in Third Normal Form (3NF).

If A->B and B->C are two FDs then A->C is called transitive dependency. The normalization of 2NF
relations to 3NF involves the removal of transitive dependencies.

If a transitive dependency exists, remove the transitively dependent attribute(s) from the relation by
placing the attribute(s) in a new relation along with a copy of the determinant.
● First take the right side of every FDs.

ie, BCDEF

● The attribute that do not present in the right side is A.So A will be a key
or part of a key(No other attributes determines attribute A).
● Take A closure ,ie {A,B,C,D,E,F}
● A closure include all the attributes in a relation.That is A determines every
attributes.
● Thus conclude that A is a candidate key.
1.Find the candidate keys of the below relation.
Boyce-Codd Normal Form (BCNF)
● Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form (3NF) that
ensures a more simplified and efficient database design.
● It enforces that every non-trivial functional dependency must have a superkey on its
left-hand side.
● This approach addresses potential issues with candidate keys and ensures the
database is free from redundancy.
● BCNF eliminates redundancy more effectively than 3NF by strictly requiring that all
functional dependencies originate from super-keys.

Rules for BCNF

● Rule 1: The table should be in the 3rd Normal Form.


● Rule 2: X should be a super-key for every functional dependency (FD) X−>Y in a given
relation.
● Note: To test whether a relation is in BCNF, identify all the determinants and make sure that
● To determine the highest normal form of a given relation R with functional dependencies, the first step is to
check whether the BCNF condition holds. If R is found to be in BCNF, it can be safely deduced that the
relation is also in 3NF, 2NF, and 1NF.
Check whether the below table is in BCNF
Fourth Normal Form (4NF)
● The Fourth Normal Form (4NF) is a level of database normalization
where there are no non-trivial multivalued dependencies other than a
candidate key.
● Properties
● A relation R is in 4NF if and only if the following conditions are
satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any Multi-valued
Dependency.
● A table with a multivalued dependency violates the normalization
standard of the Fourth Normal Form (4NF) because it creates
Multivalued Dependency (MVD)

● In Database Management Systems (DBMS), multivalued dependency


(MVD) deals with complex attribute relationships in which an attribute may
have many independent values while yet depending on another attribute or
group of attributes.
● It improves database structure and consistency and is essential for data
integrity and database normalization.
● MVD or multivalued dependency means that for a single value of
attribute ‘a’ multiple values of attribute ‘b’ exist. We write it as,
a --> --> b
● It is read as a is multi-valued dependent on b.
What is Multivalued Dependency?

When one attribute in a database depends on another attribute and has many independent values, it is said to
have multivalued dependency (MVD). It supports maintaining data accuracy and managing intricate data
interactions.
Multi Valued Dependency (MVD)
We can say that multivalued dependency exists if the following conditions are met.

Conditions for MVD

Any attribute say a multiple define another attribute b; if any legal relation r(R), for all pairs of tuples t1 and t2
in r, such that,
t1[a] = t2[a]
Then there exists t3 and t4 in r such that.
t1[a] = t2[a] = t3[a] = t4[a]
t1[b] = t3[b]; t2[b] = t4[b]
t1 = t4; t2 = t3
● Suppose a person named Geeks is working on 2 projects Microsoft and Oracle and has 2
hobbies namely Reading and Music.
● This can be expressed in a tabular format in the following way.
● Then multivalued (MVD) dependency exists. To check the MVD in given table, we apply the
conditions stated above and we check it with the values in the given table.
● The table above, student with s_id 1 has opted for two courses, Science and Maths, and has two
hobbies, Cricket and Hockey.
● There is no relationship between the columns course and hobby. They are independent of each other.
● So there is multi-value dependency, which leads to un-necessary repetition of data and other
anomalies as well.
To make the above relation satisfy the 4th normal form, we can decompose the table into 2 tables.

Now this relation satisfies the fourth normal form.


Join Dependencies in DBMS

● Join Dependency (JD) can be illustrated as when the relation R is equal to the
join of the sub-relations R1, R2,..., and Rn are present in the database.
● Join Dependency arises when the attributes in one relation are dependent on
attributes in another relation, which means certain rows will exist in the table if
there is the same row in another table.
● Multiple tables are joined to create a single table where one of the attributes is
common in the sub-tables.

Join dependency on a database is denoted by:


R1 ⨝ R2 ⨝ R3 ⨝ ..... ⨝ Rn ;

where R1 , R2, ... , Rn are the relations and ⨝ represents the natural join
operator.
Types of Join Dependency
There are two types of Join Dependencies:
● Lossless Join Dependency: It means that whenever the join occurs between the tables, then no
information should be lost, the new table must have all the content in the original table.
● Lossy Join Dependency: In this type of join dependency, data loss may occur at some point in time
which includes the absence of a tuple from the original table or duplicate tuples within the database.

Example of Join Dependency

● Suppose you have a table having stats of the company, this can be decomposed into sub-tables to
check for the join dependency among them.
● Below is the depiction of a table Company_Stats having attributes Company, Product, Agent. Then we
created sub-tables R1 with attributes Company & Product and R2 with attributes Product & Agent.
● When we join them we should get the exact same attributes as the original table.
Here, we got two additional tuples after performing join i.e.
(C1, TV, Mohan) & (C2, TV, Aman) these tuples are known as
Spurious Tuple, which is not the property of Join Dependency.
Therefore, we will create another relation R3 and perform its
natural join with (R1 ⨝ R2). So, here it is:
Now, we got our original relation, that we had earlier decomposed, in this way
you can decompose the original relation and check for the join dependency
among them.
What is PJNF(Project-Join Normal Form)?

The Fifth Normal Form (5NF), or Project-Join Normal Form (PJNF), is the highest
level of database normalization that is designed to solve problems of data
redundancy. A relation is called 5NF when it is in 4NF and does not have a join
dependency.

Conditions for 5NF


A relation is in 5NF if and only if it meets the following conditions:
● They have to be in Fourth Normal Form, the table’s relation must not
contain any multi-valued dependencies.
● It should not have the possibility to be decomposed in smaller relations
with lossless join dependency.
Properties of Fifth Normal Form (5NF)
A relation R is in 5NF if and only if it satisfies the following conditions:
● No Multi-Valued Dependencies: It should be in 4NF that there can be no set
of values in a relation that is dependent on another independent set of
values.
● Lossless Decomposition: It should not be possible to reduce the extent of
the relation into its components in a manner that would cause the loss of
information. If a relation can be made to undergo a join dependency that
makes it into the same relation after recombination, then the relation is not in
5NF.
combination of all three attributes of the relation.
Example of Fifth Normal Form (5NF)
Explanation:

Table 2, Table 3 and Table 4 when joined yield the original table (Table 1). Hence join
dependency exists in Table 1, therefore Table 1 is not in 5NF or PJNF.
However Table 2, Table 3 and Table 4 satisfy 5NF as it has no multivalued dependency
and cannot be decomposed further (join dependency does not exists).
But this might not be true in all cases i.e., when we combine the decomposed tables, the
resultant table may not be equivalent to the original table, in that case the original table is
said to be in 5NF provided it is already in 4NF.
Thank You

You might also like