Dbms Notes Unit II
Dbms Notes Unit II
DATABASE DESIGN
Entity:
∙ An entity can be a real-world object that can be easily identifiable.
∙ Example:
◦ In a school database, students, teachers, classes, and courses offered can be
considered as entities.
∙ All entities have some attributes or properties that give them
their identity. ∙ An entity set is a collection of similar types of
entities.
∙ An entity set may contain entities with attribute sharing similar values.
∙ Example:
◦ A Students set may contain all the students of a school; likewise a Teachers set
may contain all the teachers of a school from all faculties.
Attributes:
∙ Entities are represented by means of their properties, called attributes.
∙ All attributes have values.
∙ Example:
◦ a student entity may have name, class, and age as attributes.
∙ There exists a domain or range of values that can be assigned
to attributes. ∙ Example:
◦ a student's name cannot be a numeric value. It has to be alphabetic. A student's
age cannot be negative, etc.
Types of Attributes:
∙ Simple attribute − Simple attributes are atomic values, which cannot be divided
further. Example: a student's phone number is an atomic value of 10 digits.
∙ Composite attribute − Composite attributes are made of more than one simple attribute.
1
Example: a student's complete name may have first_name and last_name.
∙ Derived attribute − Derived attributes are the attributes that do not exist in the
physical database, but their values are derived from other attributes present in the
database. Example: average_salary in a department should not be saved
Directly in the database, instead it can be derived.
Example: age can be derived from data_of_birth.
∙ Single-value attribute − Single-value attributes contain single value.
Example: Social_Security_Number.
∙ Multi-value attribute − Multi-value attributes may contain more than
one values. Example: a person can have more than one phone
number,email_address, etc.
These attribute types can come together in a way like −
∙ simple single-valued attributes
∙ simple multi-valued attributes
∙ composite single-valued attributes
∙ composite multi-valued attributes
Keys:
Key is an attribute or collection of attributes that uniquely identifies an entity
among entity set.
Example: the roll_number of a student makes him/her identifiable among students.
∙ Super Key − A set of attributes (one or more) that collectively identifies an entity
in an entity set.
Example: consider the student relation, student (rollno,Name ,Age) is a super
key.
∙ Candidate Key − A minimal super key is called a candidate key. An entity set may
have more than one candidate key.
Example: consider the car relation car
(license_no,engine_serial_no,make,model,year)
The candidate keys are license_no and engine_serial_no
∙ Primary Key − An attribute which is unique and not null, can identify an instance
of the entity set is termed as primary key. A primary key is one of the candidate
keys chosen by the database designer to uniquely identify the entity set.
Example: consider the employee relation, Employee
(eno,ename,doj,sal,job,dno) in this eno is the primary key.
∙ Foreign key: An attribute in one relation whose value matches the primary key in
some other relation is called a foreign key.
Example: consider the two relations employee and dept,
Employee(eno,ename,doj,sal,job,dno)
Dept(dno,dname,dloc)
In the above relations, the primary key of dept 'dno' is present in the
employee relation so that employee relation dno is known as foreign
key.
Relationship:
∙ The association among entities is called a relationship.
∙ Example:
◦ An employee works_at a department, a student enrolls in a course. Here,
Works_at and Enrolls are called relationships.
2
Relationship Set:
A set of relationships of similar type is called a relationship set. Like entities,
a relationship too can have attributes. These attributes are called descriptive
attributes.
Degree of Relationship:
The number of participating entities in a relationship defines the degree of the
relationship. ∙ Binary = degree 2
∙ Ternary = degree 3
∙ n-ary = degree n
Mapping Cardinalities:
Cardinality defines the number of entities in one entity set, which can be
associated with the number of entities of other set via relationship set.
∙ One-to-one − One entity from entity set A can be associated with at most one entity
of entity set B and vice versa.
however an entity
from entity set B,
can be associated
with at most one
entity.
∙ Many-to-one − More than one entities from entity set A can be associated with at
most one entity of entity set B, however an entity from entity set B can be
associated with more than one entity from entity set A.
∙ Many-to-many − One entity from A can be associated with more than one entity
from B and vice versa.
attribute and is
directly connected to
its entity (rectangle).
If the attributes are composite, they are further divided in a tree like structure.
Every node is then connected to its attribute. That is, composite attributes are
represented by ellipses that are connected with an ellipse.
Relationship
Relationships are represented by diamond-shaped box. Name of the relationship
is written inside the diamond-box. All the entities (rectangles) participating in a
relationship, are connected to it by a line.
∙ Many-to-one − When more than one instance of entity is associated with the
relationship, it is marked as 'N:1'.
∙ Many
-to
many
− The following image reflects that more than one instance of an entity on the
left and more than one instance of an entity on the right can be associated with
the relationship. It depicts many-to-many relationship.
Participation Constraints:
represe
nted by
single
lines.
6
2.3 ENHANCED-ER MODEL (EER Model):
∙ The EER model includes all the basic modeling concepts of the ER model. ∙ In
addition, it includes the concepts of subclass and superclass and the related
concepts of specialization and generalization.
∙ Another concept included in the EER model is that of a category or union type ,
which is used to represent a collection of objects (entities) that is the union of
objects of different entity types.
∙ EER is used to model concepts more accurately than the ER diagram.
participates. Specialization:
7
The subset symbol on each line connecting a subclass to the circle indicates the
direction of the superclass/subclass relationship.
∙ Attributes that apply only to entities of a particular subclass—such as TypingSpeed
of SECRETARY—are attached to the rectangle representing that subclass.
These are called specific attributes (or local attributes) of the subclass.
∙ A subclass can participate in specific relationship types, such as the
HOURLY_EMPLOYEE subclass participating in the BELONGS_TO
relationship.
Generalization:
8
we can view {CAR, TRUCK} as a specialization of VEHICLE, rather than
viewing VEHICLE as a generalization of CAR and TRUCK.
∙ Several specializations can be defined on the same entity type.In such a case,
entities may belong to subclasses in each of the specializations.
∙ A specialization may also consist of a single subclass only.
∙ In some specializations we can determine exactly the entities that will become
members of each subclass by placing a condition on the value of some attribute
of the superclass. Such subclasses are called predicate-defined (or condition-
defined) subclasses
9
∙ Example:
◦ If the EMPLOYEE entity type has an attribute Job_type we can specify the
condition of membership in the SECRETARY subclass by the condition
(Job_type = ‘Secretary’), which we call the defining predicate of the subclass.
◦ A predicate-defined subclass is indicated by writing the predicate condition next
to the line that connects the subclass to the specialization circle.
∙ If all subclasses in a specialization have their membership condition on the same
attribute of the superclass, the specialization itself is called an attribute-defined
specialization, and the attribute is called the defining attribute of the
specialization.
∙ Two other constraints may apply to a specialization.
1.disjointness (or disjointedness) constraint, which means that an
entity can be a member of at most one of the subclasses of the
specialization.
2. completeness (or totalness) constraint, which may be total or partial.
5. One course is taught by only one instructor. But one instructor teaches many courses.
Hence the cardinality between course and instructor is Many to One (N :1)
Step 3: Identify the key attributes
c. {Ssn, Pnumber}→Hours
These functional dependencies specify that (a) the value of an employee’s
Social Security number (Ssn) uniquely determines the employee name (Ename), (b)
the value of a project’s number (Pnumber) uniquely determines the project name
(Pname) and location (Plocation), and (c) a combination of Ssn and Pnumber values
uniquely determines the number of hours the employee currently works on the project
per week (Hours).
13
Types
(a) Full functional dependency (b)
Partial functional dependency
(c) Transitive functional
dependency
14
(a) Full dependencies
∙ In a relation R, X and Y are attributes. X functionally determines Y. Subset of X
should not functionally determine Y.
∙ Example:
Student_no Course_no
Marks
X
Y
X
∙ Example:
YZZ
❖ Test relations to see if they are legal under a given set of functional dependencies. If a
relation r is legal under a set F of functional dependencies, we say that r satisfies
F.
❖ Specify constraints on the set of legal relations.
❖ A specific instance of a relation schema must satisfy a functional dependency
even if the functional dependency does not hold on all legal instances.
For example, a specific instance of loan-schema may by chance satisfy
Loan_no → customer_name
15
❖ A functional dependency is trivial if it is satisfied by all instance of a relation. For
example, A → A is satisfied by all relations involving attribute A. AB → A is
Armstrong’s Axioms:
AC implies D. Example:
Anomalies:
Anomalies can be classified into insertion anomalies, deletion anomalies, and
modification anomalies.
Insertion Anomalies:
Insertion anomalies can be differentiated into two types, illustrated by the
following examples based on the EMP_DEPT relation:
■ To insert a new employee tuple into EMP_DEPT, we must include either the
attribute values for the department that the employee works for, or NULLs . For
example, to insert a new tuple for an employee who works in department number 5, we
must enter all the attribute values of department 5 correctly so that they are consistent
with the corresponding values for department 5 in other tuples in EMP_DEPT.
■ It is difficult to insert a new department that has no employees as yet in the
EMP_DEPT relation. The only way to do this is to place NULL values in the attributes
for employee. This violates the entity integrity for EMP_DEPT because Ssn is its
primary key.
16
Deletion Anomalies:
The problem of deletion anomalies is related to the second insertion anomaly
situation. If we delete from EMP_DEPT an employee tuple that happens to represent
the last employee working for a particular department, the information concerning that
department is lost from the database.
This problem does not occur in the database were tuples are stored separately.
Modification Anomalies:
In EMP_DEPT, if we change the value of one of the attributes of a particular
department— say, the manager of department 5—we must update the tuples of all
employees who work in that department; otherwise, the database will become
inconsistent.
If we fail to update some tuples, the same department will be shown to have
two different values for manager in different employee tuples, which would be wrong.
17
2.6 DECOMPOSITION
A single decomposition schema R={A ,A ,…..A } that includes all the attributes
1 2 n
of the database. Every attributes name is unique using the FDs, the algorithms
decomposes the universal relation schema R into a set of relation schema D={R , R , 1 2
….R } that will become the relational database schema D is called decomposition of
m
R.
We must make sure that each attribute in R will appear in atleast one relation
schema Ri in the decomposition so that no attributes are lost formally.
m
i=1
This is called attribute preservation condition of decomposition.
Properties of Decomposition
(1) Dependency preservation.
(2) Lossless (or non additive) join property.
Dependency Preservation
If each functional dependency X → Y specified in F either appeared directly in
one of the relation schemas R in the decomposition D or could be from the
i
condition.
Definition
Given a set of dependencies F on R, the projection of F on R i, denoted by Π Ri(F)
where Ri-subset of R, is the set of dependencies X → Y in F+ such that the attributes in XUY are all
condition in Ri.
Hence the projection of F on each relation schema Ri in the decomposition D is
the set of FDs in F , such that all their left and RHS attributes are in Ri.
+
(Π R ,(R))∪.....∪(Π R (F))) =F
(ie)
+ +
im
property with respect to the set of dependencies F on r if, for every relation state r of R
that satisfies F, the following holds, where (πR (r), ……πR (r)) = r is the natural join
i m
2.7 NORMALIZATION:
∙ Normalization is a process of organizing the data in the database.
∙ It is a systematic approach of decomposing tables to eliminate data
redundancy. ∙ It was developed by E. F. Codd.
∙ Normalization is a multi-step process that puts the data into a tabular form by
removing the duplicate data from the relation tables.
∙ It is a step by step decomposition of complex records into simple records.
∙ It is also called as Canonical Synthesis.
18
∙ It is the technique of building database structures to store data.
Definition of Normalization
“Normalization is a process of designing a consistent database by minimizing
redundancy and ensuring data integrity through decomposition which is
lossless.”
Features of Normalization
∙ Normalization avoids the data redundancy.
∙ It is a formal process of developing data structures.
∙ It promotes the data integrity.
∙ It ensures data dependencies make sense that means data is logically stored. ∙ It
eliminates the undesirable characteristics like Insertion, Updation and Deletion
Anomalies.
Types of Normalization
Following are the types of Normalization:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. Fourth Normal Form
5. Fifth Normal Form
6. BCNF (Boyce – Codd Normal Form)
7. DKNF (Domain Key Normal Form)
1) The domain of DLOCATIONS contains atomic values, but some tuples can
have a set of these values. Therefore DLOCATIONS is not functionally dependent on
DNUMBER.
DNAME DNUMBER DMGRSSN DLOCATIONS
19
2) Domain of DLOCATIONS has set of values and hence non-atomic. But
DNUMBER→ DLOCATIONS exist.
DNAME DNUMBER DMGRSSN DLOCATIONS
Research 5 334
DNUMBER DLOCATIONS
5 Bellaire
5 Alaska
5 Newyork
20
E101 P27 90 Raju Systems ABC
21
Pname & PLocation violates 2NF. FD2 and FD3 are partially dependent on primary key.
Normalisation to achieve 2NF:
Test for Second Normal Form
Ename,
For relations where primary key contains multiple attributes, non-key or non-
prime attribute should not be functionally dependent on a part of primary key.
Remedy:
Decompose and set up a new relation for each partial key with its dependent
attribute(s). Make sure to keep a relation with the original primary key and any
attributes that are fully functionally dependent on it.
22
Transitive Dependency:
• Find and remove non-key attributes that are functionally dependent on the
attributes that are not the primary key. Place them in a different table.
• Group the remaining attributes.
E1 Veronica P2 48
23
E2 Anthony P5 100
E3 Mac P6 15
E4 Susan P3 250
E4 Susan P5 75
E1 Veronica P5 40
Employee
Ecode Name
E1 Veronica
E2 Anthony
E3 Mac
E4 Susan
E4 Susan
E1 Veronica
Project
Proj Code Hours
P2 48
P5 100
P6 15
P3 250
P5 75
24
P5 40
Ecode
E1
Proj Code P2
25
E2
E3
E4
E4
E1
P5 P6 P3 P5 P5
Multi-valued Dependency:
A table is said to have multi-valued dependency, if the following conditions are true,
1.For a dependency A → B, if for a single value of A, multiple value of B exists,
then the table may have multi-valued dependency.
2.Also, a table should have at-least 3 columns for it to have a multi-valued
dependency. 3.And, for a relation R(A,B,C), if there is a multi-valued
dependency between, A and B, then B and C should be independent of each
other.
Pizza Delivery:
Restaurant Pizza Variety Delivery Area
The {Restaurant} attribute depends on both Pizza Variety and Delivery Area. The
dependencies are: ∙ {Restaurant} {Pizza Variety}
∙ {Restaurant} {Delivery Area}
A1 Pizza Springfield
26
Fifth normal form (5NF), also known as project-join normal form (PJ/NF)
is a level of database normalization designed to reduce redundancy.
A database is said to be in 5NF, if and only if,
∙ It's in 4NF
∙ If we can decompose table further to eliminate redundancy and anomaly, and when
we re join the decomposed tables by means of candidate keys, we should not be
losing the original data or any new record set should not arise.
Traveling Brand
Salesman
Louis Robusto
Ferguson
Louis Acme
Ferguson
Louis Nimbus
Ferguson
Acme Breadbox
Acme Lava Lamp
Robusto Breadbox
Robusto Telescope
28