DBMS Module 3 Referential Integrity
Integrity constraints are a set of rules used to maintain the quality of information. They ensure that insertions, updates, and other operations are performed in such a way that data integrity is not affected. Thus, integrity constraints guard against accidental damage to the database.
1. Domain constraints
Domain constraints define the valid set of values for an attribute. The data type of a domain includes string, character, integer, time, date, currency, etc. The value of the attribute must belong to the corresponding domain.
3. Referential integrity constraints
Referential integrity refers to the relationship between tables. Because each table in a database must have a primary key, this primary key can appear in other tables because of its relationship to data within those tables. When a primary key from one table appears in another table, it is called a foreign key. Foreign keys join tables and establish dependencies between them. Tables can form a hierarchy of dependencies, so that changing or deleting a row in one table can destroy the meaning of rows in other tables.
In Short,
A referential integrity constraint is specified between two tables. If a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be available in Table 2.
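The rule above can be sketched with Python's built-in sqlite3 module; the table and column names here (department, employee, dept_id) are illustrative, not from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute("""CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES department(dept_id))""")

conn.execute("INSERT INTO department VALUES (1, 'CO')")
conn.execute("INSERT INTO employee VALUES (100, 1)")     # OK: 1 exists in department
conn.execute("INSERT INTO employee VALUES (101, NULL)")  # OK: null foreign key is allowed

try:
    conn.execute("INSERT INTO employee VALUES (102, 9)")  # 9 is not in department
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

SQLite only enforces foreign keys when the PRAGMA is switched on for the connection, which is why the example sets it explicitly.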
4. Key constraints
Keys are attributes used to uniquely identify an entity within its entity set. An entity set can have multiple keys, out of which one is chosen as the primary key. A primary key must contain unique values and cannot contain a null value.
Functional Dependency
A functional dependency is a relationship that exists between two sets of attributes. It typically exists between the primary key and non-key attributes within a table. It is written as
X → Y
The left side of the FD is known as the determinant; the right side is known as the dependent.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if an Emp_Id is known, the employee name associated with it can be found out:
Emp_Id → Emp_Name
i.e., Emp_Name is functionally dependent on Emp_Id.
For example, consider the following employee table:
Emp_Id Emp_Name Emp_Age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
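As a sketch, the dependency Emp_Id → Emp_Name can be checked mechanically: an FD X → Y holds when no two rows agree on X but disagree on Y. The helper below is illustrative, and the Emp_Age column name is an assumption:

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The same X-value must always map to the same Y-value.
        if seen.setdefault(x, y) != y:
            return False
    return True

employee = [
    {"Emp_Id": 42, "Emp_Name": "abc", "Emp_Age": 17},
    {"Emp_Id": 43, "Emp_Name": "pqr", "Emp_Age": 18},
    {"Emp_Id": 44, "Emp_Name": "xyz", "Emp_Age": 18},
    {"Emp_Id": 45, "Emp_Name": "abc", "Emp_Age": 19},
]

print(fd_holds(employee, ["Emp_Id"], ["Emp_Name"]))   # True
print(fd_holds(employee, ["Emp_Name"], ["Emp_Id"]))   # False: 'abc' maps to both 42 and 45
```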
Now consider this relation:
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, by the axiom of transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect functional dependency, hence called a transitive functional dependency.
DATABASE NORMALIZATION
Keys
Keys play an important role in the relational database. They are used to uniquely identify any record or row of data in a table, and to establish and identify relationships between tables.
For example: In Student table, ID is used as a key because it is unique for each student. In
PERSON table, passport_number, license_number, SSN are keys since they are unique for
each person.
Types of key:
1. Primary key
o It is the key used to identify one and only one instance of an entity uniquely. An entity can have multiple candidate keys, as in the PERSON table; the key most suitable among them becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. One could also select License_Number or Passport_Number as the primary key, since they are also unique.
o For each entity, the selection of the primary key depends on the requirements and the developer.
2. Candidate key
A candidate key is an attribute or set of attributes that can uniquely identify a tuple. Apart from the chosen primary key, the remaining unique attributes are considered candidate keys. A candidate key is as strong as the primary key.
For example: In the EMPLOYEE table, Id is best suited for the primary key. The rest of the unique attributes, like SSN, Passport_Number, and License_Number, are considered candidate keys.
3. Super Key
A super key is a set of attributes that can uniquely identify a tuple. A super key is a superset of a candidate key.
For example: In the above EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names of two employees can be the same, but their EMPLOYEE_ID cannot be the same. Hence, this combination can also serve as a key.
4. Foreign key
Foreign keys are columns of a table that point to the primary key of another table. In a company, every employee works in a specific department, and employee and department are two different entities, so we can't store the information of the department in the employee table. That's why we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id as a new attribute in
the EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.
Normalization
Normalization is the process of organizing the data in the database. It is used to minimize redundancy in a relation or set of relations, and to eliminate undesirable characteristics like insertion, update, and deletion anomalies. Normalization divides a larger table into smaller tables and links them using relationships. The normal forms are used to reduce redundancy in the database tables.
First Normal Form (1NF)
A relation is in 1NF if every attribute contains only atomic values. An attribute of a table cannot hold multiple values; it must hold only a single value. First normal form disallows multi-valued attributes, composite attributes, and their combinations.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
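The 1NF conversion above can be sketched in Python; the column names (EMP_PHONE, EMP_STATE, etc.) are assumed for illustration:

```python
# Un-normalized row: the phone attribute holds multiple values.
unnormalized = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": ["7272826385", "9064738238"], "EMP_STATE": "UP"},
]

# 1NF: emit one row per phone number so every attribute is atomic.
first_nf = [
    {"EMP_ID": r["EMP_ID"], "EMP_NAME": r["EMP_NAME"],
     "EMP_PHONE": phone, "EMP_STATE": r["EMP_STATE"]}
    for r in unnormalized
    for phone in r["EMP_PHONE"]
]

for row in first_nf:
    print(row)  # each row now holds exactly one phone number
```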
Second Normal Form (2NF)
In 2NF, the relation must be in 1NF, and all non-key attributes must be fully functionally dependent on the primary key.
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER Table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a
proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL Table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
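The 2NF decomposition just shown can be sketched as two projections of the original TEACHER rows:

```python
# Original TEACHER rows: (TEACHER_ID, SUBJECT, TEACHER_AGE).
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology",   30),
    (47, "English",   35),
    (83, "Math",      38),
    (83, "Computer",  38),
]

# TEACHER_DETAIL: TEACHER_AGE depends on TEACHER_ID alone,
# so project it out and drop duplicate rows.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})

# TEACHER_SUBJECT: keyed by the full candidate key (TEACHER_ID, SUBJECT).
teacher_subject = [(tid, sub) for tid, sub, _ in teacher]

print(teacher_detail)   # [(25, 30), (47, 35), (83, 38)]
print(teacher_subject)
```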
Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and does not contain any transitive dependency. 3NF is used to reduce data duplication and to achieve data integrity. If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL Table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID, so the dependency is transitive. Hence the fix is to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.
So the decomposition will be as follows
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
Boyce Codd Normal Form (BCNF)
BCNF is an advanced version of 3NF; it is stricter than 3NF. A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table. That is, for BCNF, the table should be in 3NF and, for every FD, the left-hand side must be a super key.
Example: Let's assume there is a company where employees work in more than one
department.
Employee
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key. To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_DEPT EMP_ID
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the EMP_COUNTRY table: EMP_ID
For the EMP_DEPT table: EMP_DEPT
For the EMP_DEPT_MAPPING table: {EMP_ID, EMP_DEPT}
Now this decomposition is in BCNF, because the left side of each functional dependency is a key of its table.
Fourth Normal Form (4NF)
A relation is in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
For a dependency A →→ B, if for a single value of A multiple values of B exist, then the relation has a multi-valued dependency.
Example
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; there is no relationship between COURSE and HOBBY.
So to make the above table into 4NF, decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
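The 4NF decomposition can likewise be sketched as two independent projections of the STUDENT rows:

```python
# Original STUDENT rows: (STU_ID, COURSE, HOBBY).
student = [
    (21, "Computer",  "Dancing"),
    (21, "Math",      "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology",   "Cricket"),
    (59, "Physics",   "Hockey"),
]

# COURSE and HOBBY are independent facts about a student,
# so each gets its own projection.
student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby  = sorted({(sid, hobby) for sid, _, hobby in student})

print(student_course)
print(student_hobby)
```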
Fifth Normal Form (5NF)
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and all joins are lossless. 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy. 5NF is also known as Project-Join Normal Form (PJ/NF).
So to make the above table into 5NF, it can decompose into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Inference Rule (IR) or Axioms:
Armstrong's axioms are the basic inference rules. They are used to infer functional dependencies on a relational database. An inference rule is a type of assertion that can be applied to a set of functional dependencies to derive other FDs. Using the inference rules, one can derive additional functional dependencies from the initial set.
Reflexive Rule (IR1): If X ⊇ Y, then X → Y.
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Since Y ⊆ X, X → Y holds.
Augmentation Rule (IR2): If X → Y, then XZ → YZ for any set of attributes Z.
Transitive Rule (IR3): If X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z, then X → Z
Union Rule (IR4): If X determines Y and X determines Z, then X must also determine Y and Z together.
If X → Y and X → Z, then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
Decomposition Rule (IR5): Also known as the project rule, this is the reverse of the union rule. If X determines Y and Z together, then X determines Y and X determines Z separately.
If X → YZ, then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1, since YZ ⊇ Y)
3. X → Y (using IR3 on 1 and 2)
4. X → Z follows similarly, using YZ → Z in step 2.
Pseudo-Transitive Rule (IR6): If X → Y and WY → Z, then WX → Z
Proof:
1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1 by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)
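A common use of these inference rules is computing the closure of an attribute set: keep applying every FD whose left side is already contained in the closure. The sketch below reuses the enrol_no/dept/building_no dependencies from earlier and shows transitivity falling out automatically:

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of FDs (lhs_set, rhs_set)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is already in the closure, add the dependent.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [
    ({"enrol_no"}, {"dept"}),
    ({"dept"}, {"building_no"}),
]

print(sorted(closure({"enrol_no"}, fds)))
# ['building_no', 'dept', 'enrol_no'] -- so enrol_no -> building_no follows by transitivity
```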
Decomposition
Decomposition is the process of breaking a relation down into parts. It replaces a relation with a collection of smaller relations, i.e., it breaks one table into multiple tables in a database. Decomposition should always be lossless, because losslessness confirms that the information in the original relation can be accurately reconstructed from the decomposed relations. An improper decomposition of a relation may lead to problems such as loss of information.
Properties of Decomposition
1. Lossless Decomposition
Decomposition must be lossless: information should not be lost from the relation that is decomposed. Losslessness guarantees that joining the decomposed relations produces the same relation as the one that was decomposed.
Example:
Let 'E' be a relational schema with instance 'e', decomposed into E1, E2, E3, . . ., En with instances e1, e2, e3, . . ., en. If e = e1 ⋈ e2 ⋈ e3 ⋈ . . . ⋈ en, then it is called a 'lossless join decomposition'. In other words, if the natural join of all the decomposed instances gives back the original relation, the decomposition is said to be lossless.
Employee_Department Table
Decompose the above relation into two relations to check whether decomposition is lossless or lossy.
Now, we have decomposed the relation that is Employee and Department.
Employee
Department
So the above decomposition is a lossless join decomposition, because the two relations share one common field, 'Eid', and therefore a join is possible. Applying a natural join to the decomposed relations gives back the original Employee_Department table. If instead the <Employee> table contained (Eid, Ename, Age, City, Salary) and the <Department> table contained (Deptid, DeptName), it would not be possible to join the two tables, because there is no common column between them, and the decomposition would be lossy.
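The lossless-join test described above can be sketched in Python. The column names follow the text (Eid, Ename, Deptid, DeptName); the sample rows are invented for illustration:

```python
employee_department = [
    {"Eid": "E1", "Ename": "A", "Deptid": "D1", "DeptName": "CO"},
    {"Eid": "E2", "Ename": "B", "Deptid": "D1", "DeptName": "CO"},
    {"Eid": "E3", "Ename": "C", "Deptid": "D2", "DeptName": "IT"},
]

# Decompose so both relations keep the common attribute Eid.
employee   = [{"Eid": r["Eid"], "Ename": r["Ename"]} for r in employee_department]
department = [{"Eid": r["Eid"], "Deptid": r["Deptid"], "DeptName": r["DeptName"]}
              for r in employee_department]

# Natural join on the shared attribute Eid.
joined = [{**e, **d} for e in employee for d in department if e["Eid"] == d["Eid"]]

print(joined == employee_department)  # True: the decomposition is lossless
```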
2. Attribute Preservation:
Using functional dependencies, the decomposition algorithms decompose the universal relation schema R into a set of relation schemas D = {R1, R2, ….. Rn}, the relational database schema, where 'D' is called the decomposition of R. Every attribute of R must appear in at least one relation schema Ri of the decomposition, i.e., no attribute is lost. This is called the attribute preservation condition of decomposition.
3. Dependency Preservation
Dependency is an important constraint on the database, and every dependency must be satisfied by at least one decomposed table. If {A → B} holds, the two attribute sets are functionally dependent, and checking the dependency is easiest when both sets lie in the same relation. This decomposition property can only be achieved by maintaining the functional dependencies; it allows updates to be checked without computing a natural join over the database structure.
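As a simplified sketch, a dependency is directly checkable after decomposition when all of its attributes land in a single decomposed schema (a sufficient condition, not the full formal test). The schemas below reuse the attribute names from the 3NF example:

```python
def preserved_directly(fd, schemas):
    """True if all attributes of the FD fall inside one decomposed schema."""
    lhs, rhs = fd
    needed = set(lhs) | set(rhs)
    return any(needed <= set(schema) for schema in schemas)

schemas = [
    {"EMP_ID", "EMP_NAME", "EMP_ZIP"},
    {"EMP_ZIP", "EMP_STATE", "EMP_CITY"},
]

print(preserved_directly((["EMP_ID"], ["EMP_NAME"]), schemas))              # True
print(preserved_directly((["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"]), schemas))  # True
print(preserved_directly((["EMP_ID"], ["EMP_CITY"]), schemas))              # False: only via a join
```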
4. Lack of Data Redundancy
Lack of data redundancy is also known as absence of repetition of information. A proper decomposition should not suffer from data redundancy; a careless decomposition may cause problems with the data. The lack-of-redundancy property is usually achieved through the normalization process.
1. Every table should have a primary key that uniquely identifies its rows.
2. Every table should have no columns depending on only some of the primary key. (This only applies if the primary key is composite and there are columns not in the primary key.)
3. Every table should have no columns that do not depend on the primary key at all.
A repeating group is a series of fields/attributes that is repeated throughout a database table. It is a common problem faced by organizations both large and small. The same set of information present in different areas can cause data redundancy and data inconsistency. All of this repeated data can eat up a lot of valuable disk space and take a long time to search through.
Fixing Repeating Groups
The process of converting your data from Un-Normalized Form (UNF) to first normal form (1NF) eliminates repeating groups. Here are the steps for doing that:
1. Identify the repeating group of fields.
2. Remove the repeating group fields to a new table, leaving a copy of the primary key in the table that is left.
3. The original primary key will no longer be unique, so assign a new primary key to the relation, using the original primary key as part of a composite key.
Redundancy means having multiple copies of the same data in the database. This problem arises when a database is not normalized. Suppose a table of student details has the attributes: student id, student name, college name, college rank, and course opted. The values of the attributes college name, college rank, and course are repeated, which can lead to problems. Problems caused by redundancy are: insertion anomaly, deletion anomaly, and updation anomaly.
1. Insertion Anomaly – If a student's details have to be inserted but the student's course has not yet been decided, the insertion is not possible until the course is decided. This problem occurs when a data record cannot be inserted without adding some additional, unrelated data to the record.
2. Deletion Anomaly – If the details of a student in this table are deleted, then the details of the college also get deleted, which should not occur by common sense. This anomaly happens when deleting a data record results in losing some unrelated information that was stored as part of the deleted record: it is not possible to delete some information without losing other information from the table as well.
3. Updation Anomaly – If the rank of a college changes, then the change will have to be made all over the database, which is time-consuming and computationally costly. If the update does not occur at all places, the database will be left in an inconsistent state.
The next database design rule is removing columns not dependent on the key. Like removing redundant data, this rule makes the database more flexible and maintainable throughout the lifecycle of the system.
A basic example of columns not dependent on the key can be shown in the following table:
Book ID , Book Title, Book Publish Date, Publisher Name, Publisher Address, Publisher Phone
There are two separate entities stored in the same table – book information, and for each book, that
book’s publisher information. If every book had a different publisher, this would be fine. More than
likely, however, many books will share the same publisher. If the publisher’s phone number changes,
the users will need to go through every single book from that publisher, and make sure they update the
phone number. This leads to a lot of overhead for users. In addition, if one of the users forgets to
change a row, as they often do, the data integrity will be compromised, and some rows will have
incorrect information.
Another weakness of this approach is that there is no capability to add a publisher that is not associated
with a book. Let’s say that a publisher is connected to a book which is removed from the system. Once
we delete that book, we would lose the publisher’s address and phone number as well.
The solution to this dilemma is to add an ID column, and separate the table into two tables, one for
books, and one for publishers:
Book
Book ID
Book Title
Book Publish Date
Publisher ID
Publisher
Publisher ID
Publisher Name
Publisher Address
Publisher Phone
This database schema update ensures that publisher updates are automatically propagated to any related books. It also removes the storage space wasted by repeating publisher details on each book record, and results in a much more stable foundation for the software.