Anomalies in Relational Model
Anomalies in the relational model refer to inconsistencies or
errors that can arise when working with relational databases,
specifically in the context of data insertion, deletion, and
modification. There are different types of anomalies that can
occur in referencing and referenced relations which can be
discussed as:
These anomalies can be categorized into three types:
Insertion Anomalies
Deletion Anomalies
Update Anomalies.
Database anomalies are the faults in the database caused due
to poor management of storing everything in the flat database.
It can be removed with the process of Normalization, which
generally splits the database which results in reducing the
anomalies in the database.
STUD_STAT STUD-
STUD_NO STUD_NAME STUD_PHONE E COUNTRY STUD_AGE
97162717 Haryan
1 RAM India 20
21 a
98982912
2 RAM Punjab India 19
81
78982919 Rajasth
3 SUJIT India 18
81 an
4 SURESH Punjab India 21
Table 1
STUDENT_COURSE
STUD_NO COURSE_NO COURSE_NAME
1 C1 DBMS
Computer
2 C2
Networks
Computer
1 C2
Networks
Table 2
Insertion Anomaly: If a tuple is inserted in referencing relation
and referencing attribute value is not present in referenced
attribute, it will not allow insertion in referencing relation.
Example: If we try to insert a record in STUDENT_COURSE with
STUD_NO =7, it will not allow it.
Deletion and Updation Anomaly: If a tuple is deleted or
updated from referenced relation and the referenced attribute
value is used by referencing attribute in referencing relation, it
will not allow deleting the tuple from referenced relation.
Example: If we want to update a record from
STUDENT_COURSE with STUD_NO =1, We have to update it in
both rows of the table. If we try to delete a record from
STUDENT with STUD_NO =1, it will not allow it.
Normalization
Normalization is the process of minimizing redundancy from
a relation or set of relations. Redundancy in relation may
cause insertion, deletion, and update anomalies. So, it helps to
minimize the redundancy in relations. Normal forms are used
to eliminate or reduce redundancy in database tables.
Normalization is the process of organizing the data and the
attributes of a database. It is performed to reduce the data
redundancy in a database and to ensure that data is stored
logically. Data redundancy in DBMS means having the same
data but at multiple places. It is necessary to remove data
redundancy because it causes anomalies in a database which
makes it very hard for a database administrator to maintain it.
Advantages
Reduced data redundancy: Normalization helps to eliminate
duplicate data in tables, reducing the amount of storage
space needed and improving database efficiency.
Improved data consistency: Normalization ensures that
data is stored in a consistent and organized manner,
reducing the risk of data inconsistencies and errors.
Simplified database design: Normalization provides
guidelines for organizing tables and data relationships,
making it easier to design and maintain a database.
Improved query performance: Normalized tables are
typically easier to search and retrieve data from, resulting in
faster query performance.
Easier database maintenance: Normalization reduces the
complexity of a database by breaking it down into smaller,
more manageable tables, making it easier to add, modify,
and delete data.
Functional Dependency
Functional Dependency is the relationship between
attributes(characteristics) of a table related to each other. The
functional dependency of A on B is represented by A → B,
where A and B are the attributes of the relation.
Example:
roll_no name dept_namedept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can conclude some valid functional
dependencies:
roll_no → { name, dept_name, dept_building },→ Here,
roll_no can determine values of fields name, dept_name and
dept_building, hence a valid Functional dependency
roll_no → dept_name , Since, roll_no can determine whole
set of {name, dept_name, dept_building}, it can determine its
subset dept_name also.
dept_name → dept_building , Dept_name can identify the
dept_building accurately, since departments with different
dept_name will also have a different dept_building
More valid functional dependencies: roll_no → name,
{roll_no, name} ⇢ {dept_name, dept_building}, etc.
Types of Functional dependency
1. Trivial functional dependency
o A → B has trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial like: A → A, B → B
Example:
1. Consider a table with two columns Employee_Id and Employee_Name.
2. {Employee_id, Employee_Name} → Employee_Id is a trivial functional
dependency as
3. Employee_Id is a subset of {Employee_Id, Employee_Name}.
4. Also, Employee_Id → Employee_Id and Employee_Name → Employee
_Name are trivial dependencies too.
2. Non-trivial functional dependency
o A → B has a non-trivial functional dependency if B is not a subset
of A.
o When A intersection B is NULL, then A → B is called as complete
non-trivial.
Example:
1. ID → Name,
2. Name → DOB
3. Multivalued Functional Dependency in DBMS
In Multivalued functional dependency, attributes in
the dependent set are not dependent on each other.
For example, X → { Y, Z }, if there exists is no
functional dependency between Y and Z, then it is
called as Multivalued functional dependency.
For example, consider the Employee table below.
Employee_Id Name Age
1 Zayn 24
2 Phobe 34
3 Hikki 26
4 David 29
4 Phobe 24
Here, { Employee_Id } → { Name, Age } is a
Multivalued functional dependency, since the
dependent attributes Name, Age are
not functionally dependent(i.e. Name → Age or Age
→ Name doesn’t exist !).
4. Transitive Functional Dependency in DBMS
Consider two functional dependencies A → B and B
→ C then according to the transitivity axiom A →
C must also exist. This is called a transitive
functional dependency.
In other words, dependent is indirectly dependent
on determinant in Transitive functional dependency.
For example, consider the Employee table below.
Employee_Id Name Department Street Number
1 Zayn CD 11
2 Phobe AB 24
3 Hikki CD 11
4 David PQ 71
5 Phobe LM 21
Here, { Employee_Id → Department
} and { Department → Street Number } holds
true. Hence, according to the axiom of
transitivity, { Employee_Id → Street Number } is a
valid functional dependency.
Types of Normal Forms:
Normalization works through a series of stages called Normal
forms. The normal forms apply to individual relations. The
relation is said to be in particular normal form if it satisfies
constraints.
First Normal Form (1NF)
A relation is in 1NF if every attribute is a single-valued attribute
or it does not contain any multi-valued or composite attribute,
i.e., every attribute is an atomic attribute. If there is a composite
or multi-valued attribute, it violates the 1NF. To solve this, we
can create a new row for each of the values of the multi-valued
attribute to convert the table into the 1NF.
Let’s take an example of a relational table <EmployeeDetail>
that contains the details of the employees of the company.
<EmployeeDetail>
Employee Code Employee Name Employee Phone Number
101 John 98765623,998234123
101 John 89023467
102 Ryan 76213908
103 Stephanie 98132452
Here, the Employee Phone Number is a multi-valued attribute.
So, this relation is not in 1NF.
To convert this table into 1NF, we make new rows with each
Employee Phone Number as a new row as shown below:
<EmployeeDetail>
Employee Code Employee Name Employee Phone Number
101 John 998234123
101 John 98765623
101 John 89023467
102 Ryan 76213908
103 Stephanie 98132452
Second Normal Form (2NF)
The normalization of 1NF relations to 2NF involves the
elimination of partial dependencies. A partial dependency in
DBMS exists when any non-prime attributes, i.e., an attribute
not a part of the candidate key, is not fully functionally
dependent on one of the candidate keys.
For a relational table to be in second normal form, it must
satisfy the following rules:
1. The table must be in first normal form.
2. It must not contain any partial dependency, i.e., all non-prime
attributes are fully functionally dependent on the primary key.
If a partial dependency exists, we can divide the table to
remove the partially dependent attributes and move them to
some other table where they fit in well.
Let us take an example of the following
<EmployeeProjectDetail> table to understand what is partial
dependency and how to normalize the table to the second
normal form:
<EmployeeProjectDetail>
Employee Code Project ID Employee Name Project Name
101 P03 John Project103
101 P01 John Project101
102 P04 Ryan Project104
103 P02 Stephanie Project102
In the above table, the prime attributes of the table are
Employee Code and Project ID. We have partial dependencies in
this table because Employee Name can be determined by
Employee Code and Project Name can be determined by
Project ID. Thus, the above relational table violates the rule of
2NF.
The prime attributes in DBMS are those which are part of one or
more candidate keys.
To remove partial dependencies from this table and normalize
it into second normal form, we can decompose the
<EmployeeProjectDetail> table into the following three tables:
<EmployeeDetail>
Employee Code Employee Name
101 John
101 John
102 Ryan
103 Stephanie
<EmployeeProject>
Employee Code Project ID
101 P03
101 P01
102 P04
103 P02
<ProjectDetail>
Project ID Project Name
P03 Project103
P01 Project101
Project ID Project Name
P04 Project104
P02 Project102
Thus, we’ve converted the <EmployeeProjectDetail> table into
2NF by decomposing it into <EmployeeDetail>, <ProjectDetail>
and <EmployeeProject> tables. As you can see, the above
tables satisfy the following two rules of 2NF as they are in 1NF
and every non-prime attribute is fully dependent on the primary
key.
Third Normal Form (3NF)
The normalization of 2NF relations to 3NF involves the
elimination of transitive dependencies in DBMS.
For a relational table to be in third normal form, it must satisfy
the following rules:
1. The table must be in the second normal form.
2. No non-prime attribute is transitively dependent on the
primary key.
3. For each functional dependency X -> Z at least one of the
following conditions hold:
X is a super key of the table.
Z is a prime attribute of the table.
If a transitive dependency exists, we can divide the table to
remove the transitively dependent attributes and place them to
a new table along with a copy of the determinant.
Let us take an example of the following <EmployeeDetail> table
to understand what is transitive dependency and how to
normalize the table to the third normal form:
<EmployeeDetail>
Employee Employee Employee Employee
Code Name Zipcode City
101 John 110033 Model Town
101 John 110044 Badarpur
102 Ryan 110028 Naraina
103 Stephanie 110064 Hari Nagar
The above table is not in 3NF because it has Employee Code ->
Employee City transitive dependency because:
Employee Code -> Employee Zipcode
Employee Zipcode -> Employee City
Also, Employee Zipcode is not a super key and Employee City is
not a prime attribute.
To remove transitive dependency from this table and normalize
it into the third normal form, we can decompose the
<EmployeeDetail> table into the following two tables:
<EmployeeDetail>
Employee Code Employee Name Employee Zipcode
101 John 110033
101 John 110044
102 Ryan 110028
103 Stephanie 110064
<EmployeeLocation>
Employee Zipcode Employee City
110033 Model Town
110044 Badarpur
110028 Naraina
110064 Hari Nagar
Thus, we’ve converted the <EmployeeDetail> table into 3NF by
decomposing it into <EmployeeDetail> and
<EmployeeLocation> tables as they are in 2NF and they don’t
have any transitive dependency.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form(BCNF) is an advanced version of
3NF as it contains additional constraints compared to 3NF.
For a relational table to be in Boyce-Codd normal form, it must
satisfy the following rules:
1. The table must be in the third normal form.
2. For every non-trivial functional dependency X -> Y, X is the
superkey of the table. That means X cannot be a non-prime
attribute if Y is a prime attribute.
A superkey is a set of one or more attributes that can uniquely
identify a row in a database table.
Let us take an example of the following
<EmployeeProjectLead> table to understand how to normalize
the table to the BCNF:
<EmployeeProjectLead>
Employee Code Project ID Project Leader
101 P03 Grey
101 P01 Christian
102 P04 Hudson
103 P02 Petro
The above table satisfies all the normal forms till 3NF, but it
violates the rules of BCNF because the candidate key of the
above table is {Employee Code, Project ID}. For the non-trivial
functional dependency, Project Leader -> Project ID, Project ID
is a prime attribute but Project Leader is a non-prime attribute.
This is not allowed in BCNF.
To convert the given table into BCNF, we decompose it into
three tables:
<EmployeeProject>
Employee Code Project ID
101 P03
101 P01
102 P04
103 P02
<ProjectLead>
Project Leader Project ID
Grey P03
Christian P01
Hudson P04
Petro P02
Thus, we’ve converted the <EmployeeProjectLead> table into
BCNF by decomposing it into <EmployeeProject> and
<ProjectLead> tables.
Decomposition In DBMS
Decomposition refers to the division of tables into multiple
tables to produce consistency in the data. When we divide a
table into multiple tables or divide a relation into multiple
relations, then this process is termed Decomposition in DBMS.
We perform decomposition in DBMS when we want to process
a particular data set. It is performed in a database
management system when we need to ensure consistency and
remove anomalies and duplicate data present in the database.
When we perform decomposition in DBMS, we must try to
ensure that no information or data is lost.
Types of Decomposition
There are two types of Decomposition:
Lossless Decomposition
Lossy Decomposition
Lossless Decomposition
The process in which where we can regain the original relation
R with the help of joins from the multiple relations formed
after decomposition. This process is termed as lossless
decomposition. It is used to remove the redundant data from
the database while retaining the useful information. The
lossless decomposition tries to ensure following things:
While regaining the original relation, no information should
be lost.
If we perform join operation on the sub-divided relations, we
must get the original relation.
Example:
There is a relation called R(A, B, C)
A B C
5 1 2
5 6 7
A B C
4 5 8
8 2 9
Now we decompose this relation into two sub relations R1 and
R2
R1(A, B)
A B
5 1
5 6
4 5
8 2
R2(B, C)
B C
1 2
6 7
5 8
2 9
After performing the Join operation we get the same original
relation
A B C
5 1 2
5 6 7
4 5 8
8 2 9
Lossy Decomposition
As the name suggests, lossy decomposition means when we
perform join operation on the sub-relations it doesn’t result to
the same relation which was decomposed. After the join
operation, we always found some extraneous tuples. These
extra tuples genrates difficulty for the user to identify the
original tuples.
Example:
We have a relation R(A, B, C)
A B C
1 2 1
2 5 3
3 3 3
Now , we decompose it into sub-relations R1 and R2
R1(A, B)
A B
1 2
2 5
3 3
R2(B, C)
B C
2 1
5 3
B C
3 3
Now After performing join operation
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
Properties of Decomposition
Lossless: All the decomposition that we perform in
Database management system should be lossless. All the
information should not be lost while performing the join on
the sub-relation to get back the original relation. It helps to
remove the redundant data from the database.
Dependency Preservation: Dependency Preservation is an
important technique in database management system. It
ensures that the functional dependencies between the
entities is maintained while performing decomposition. It
helps to improve the database efficiency, maintain
consistency and integrity.
Lack of Data Redundancy: Data Redundancy is generally
termed as duplicate data or repeated data. This property
states that the decomposition performed should not suffer
redundant data. It will help us to get rid of unwanted data
and focus only on the useful data or information.
Multivalued Dependency (MVD) in DBMS
Multivalued dependency (MVD) occurs when two attributes (B
and C) in a table independently depend on a third attribute (A).
This means A's value determines multiple values for both B and
C. It's a complex relationship that can impact database
normalization and integrity.
To illustrate: A person (A) can have multiple projects (B) and
hobbies (C). In this case, Project and Hobby are multivalued
attributes. MVD is written as A -->> B, meaning A determines
multiple values of B.
Let's have an example to understand multi-valued dependency:
The below table shows the details of an office department
exchange event having the
columns, EMPLOYEE_ID, DEPARTMENT, and HOBBY.
EMPLOYEE_ID DEPARTMENT HOBBY
E901 HR Badminton
E901 Sales Reading
E902 Marketing Cricket
E903 Finance Football
As you can see in the above table, Employee E901 is interested
in two departments HR and Sales, and, has two hobbies
Badminton and Reading. This will result in multiple records
for E901 as,
EMPLOYEE_ID DEPARTMENT HOBBY
E901 HR Badminton
E901 Sales Reading
EMPLOYEE_ID DEPARTMENT HOBBY
E901 HR Reading
E901 Sales Badminton
In the above table, you can see that for the Employee E901
multiple records exist in the DEPARTMENT and the HOBBY
attribute. Hence the multi-valued dependencies are,
EMPLOYEE_ID −>−> DEPARTMENT and
EMPLOYEE_ID −>−> HOBBY
Also, the DEPARTMENT and HOBBY attributes are independent
of each other thus leading to a multi-valued dependency in the
above table.
Fourth Normal Form (4NF)
Rules for 4NF
A table is said to be in 4NF if the following conditions are met,
The table is in Boyce-Codd Normal Form (BCNF).
The table does not have multi-valued dependency.
Let's see how we can satisfy the conditions of 4NF in DBMS by
looking at the example we used in Multi-Valued Dependency
EMPLOYEE_ID DEPARTMENT HOBBY
E901 HR Badminton
E901 Sales Reading
E902 Marketing Cricket
E903 Finance Football
The multi-valued dependencies in the above table are,
EMPLOYEE_ID −>−> DEPARTMENT and
EMPLOYEE_ID −>−> HOBBY
To satisfy the fourth normal form, we can decompose the
relation into two tables,
EMPLOYEE_ID DEPARTMENT
E901 HR
E901 Sales
E902 Marketing
E903 Finance
and,
EMPLOYEE_ID HOBBY
E901 Badminton
E901 Reading
E902 Cricket
E903 Football
Now the above relation is in 4NF.
Join Dependency
Join Dependency is similar to Multi-Valued Dependency as Join
Dependency is also a constraint.
A Join Dependency on a relation schema R, specifies a
constraint on states, r of R that every legal state r of R should
have a lossless join decomposition into 𝑅 1R1, 𝑅 2R2,..., 𝑅
𝑛 Rn. In a database management system, join dependency is a
generalization of the idea of multivalued dependency.
Let R be a relation schema and the decompositions of R are 𝑅
1,𝑅 2,𝑅 3,...,𝑅 𝑛 R1,R2,R3,...,Rn. R is said to be in Join
Dependency if and only if every instance of R, r is equal to the
join of its projections on 𝑅 1,𝑅 2,𝑅 3,...,𝑅 𝑛 R1,R2,R3,...,Rn.
Example of Join Dependency
Suppose we have the following table R:
E_Name Company Product
Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt
We can break, or decompose the above table into three tables, this
would mean that the table is not in 5NF!
The three decomposed tables would be:
1. R1: The table with columns E_Name and Company.
E_Name Company
Rohan Comp1
Harpreet Comp2
Anant Comp3
2. R2: The table with columns E_Name and Product.
E_Name Product
Rohan Jeans
Harpreet Jacket
Anant TShirt
3. R3: The table with columns Company and Product.
Company Product
Comp1 Jeans
Company Product
Comp2 Jacket
Comp3 TShirt
Let's try to figure out whether or not R has a join dependency.
Step 1- First, the natural join of R1 and R2:
E_Name Company Product
Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt
Step 2- Next, let's perform the natural join of the above table
with R3:
E_Name Company Product
Rohan Comp1 Jeans
Harpreet Comp2 Jacket
Anant Comp3 TShirt
In the above example, we do get the same table R after
performing the natural joins at both steps, luckily.
Therefore, our join dependency comes out to be: {(E_Name,
Company ), (E_Name, Product), (Company, Product)}
Because the above-mentioned relations are joined dependent,
they are not 5NF. That is, a join relation of the three relations
above is equal to our initial relation table R.
Join Dependencies and Fifth Normal Form (5NF)
If a relation is in 4NF and does not contain any join dependencies,
it is in 5NF.
To avoid redundancy, 5NF is satisfied when all tables are divided
into as many tables as possible.
Conclusion: if a relation has join dependency, it won't be in 5NF.