KEMBAR78
Dbms Notes | PDF | Databases | Conceptual Model
0% found this document useful (0 votes)
77 views206 pages

Dbms Notes

Uploaded by

gynanvi.swetha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
77 views206 pages

Dbms Notes

Uploaded by

gynanvi.swetha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 206

UNIT-I

Database System Applications: A Historical Perspective, File Systems versus a DBMS, the Data

Model, Levels of Abstraction in a DBMS, Data Independence, Structure of a DBMS

Introduction to Database Design: Database Design and ER Diagrams, Entities, Attributes, and
Entity Sets, Relationships and Relationship Sets, Additional Features of the ER Model,
Conceptual Design With the ER Model

Data:

Data is the known facts or figures that have implicit meaning. It can also be defined as it is the
representation of facts, concepts or instruction in a formal manner, which is suitable for
understanding and processing. Data can be represented in alphabets (A-Z, a-z),in digits(0-9) and
using special characters(+,-.#,$, etc) e. g: 25, “ ajit ” etc.

Information:

Information is the processed data on which decisions and actions are based. Information can be
defined as the organized and classified data to provide meaningful values. Eg: “The age of Ravi
is 25”

Eg: “The age of Ravi is 25”

DBMS process the data stored in the files into information.

File: File is a collection of related data stored in secondary memory.

File oriented approach:

The traditional file oriented approach to information processing has for each application a
separate master file and its own set of personal file. In file oriented approach the program
dependent on the files and files become dependent on the files and files become dependents upon
the programs.

Disadvantages of file oriented approach:

1) Data redundancy and inconsistency: The same information may be written in several files.
This redundancy leads to higher storage and access cost. It may lead data inconsistency that is
the various copies of the same data may longer agree for example a changed customer address
may be reflected in single file but not elsewhere in the system.

2) Difficulty in accessing data: The conventional file processing system does not allow data to
retrieve in a convenient and efficient manner according to user choice.
3) Data isolation: Because data are scattered in various file and files may be in different formats
with new application programs to retrieve the appropriate data is difficult.

4) Integrity Problems: Developers enforce data validation in the system by adding appropriate
code in the various application programs. However when new constraints are added, it is difficult
to change the programs to enforce them.

5) Atomicity: It is difficult to ensure atomicity in a file processing system when transaction


failure occurs due to power failure, networking problems etc. (atomicity: either all operations of
the transaction are reflected properly in the database or non are)

6) Concurrent access: In the file processing system it is not possible to access a same file for
transaction at same time.

7) Security problems: There is no security provided in file processing system to secure the data
from unauthorized user access.

Database:

It is a collection of interrelated data. These can be stored in the form of tables. A database can be
of any size and varying complexity.

Example: Customer database consists the fields as cname, cno, and ccity

Cno Cname Ccity


1 Hari Hyd
2 Sure Ban
Fig: Customer database

Database Management System (DBMS): A database management system consists of collection


of related data and refers to a set of programs for defining, creation, maintenance and
manipulation of a database.

Popular DBMS Software

 My SQL
 Oracle
 IBM DB2
 SQLite
Advantages of DBMS:

Reduction of redundancies: Centralized control of data by the DBA avoids unnecessary


duplication of data and effectively reduces the total amount of data storage required avoiding
duplication in the elimination of the inconsistencies that tend to be present in redundant data
files.

Concurrent Access: A DBMS schedules concurrent access to the data in such manner that users
can think of the data being accessed by only one user at a time.

Ex: Railway reservation

Data Security: The DBA who has the ultimate responsibility for the data in the DBMS can
ensure that proper access procedures are followed including proper authentication schemas for
access to the DBS and additional check before permitting access to sensitive data.

Backup and Recovery: It provides backup and recovery subsystems which create automatic
backup of data, from hardware and software failures and restores the data if required.

Data Independence: Ina database system, the database management system provides the
interface between the application programs and the data. When changes are made to the data
representation, the meta data obtained by the DBMS is changed but the DBMS is continues to
provide the data to application program in the previously used way. The DBMs handles the task
of transformation of data wherever necessary.

Applications of DBMS:

Library Management System: There are thousands of books in the library so it is very difficult
to keep record of all the books in a copy or register. So DBMS used to maintain all the
information relate to book issue dates, name of the book, author and availability of the book.
Banking: For customer information, account activities, payments, deposits, loans, etc.
Airlines: For reservations and schedule information.
Universities: For to store the information about all instructors, students, departments, and course
offerings, colleges, grades.
Telecommunication: It helps to keep call records, monthly bills, maintaining balances, etc.
Manufacturing: It is used for the management of supply chain and for tracking production of
items.
For example distribution centre should keep a track of the product units that supplied into the
centre as well as the products that got delivered out from the distribution centre on each day.
HR Management: For information about employees, salaries, payroll, deduction, generation of
paychecks, etc.
Online Shopping:
Online shopping has become a big trend of these days. No one wants to go to shops and waste
his time. Everyone wants to shop from home. So all these products are added and sold only with
the help of DBMS. Purchase information, invoice bills and payment, all of these are done with
the help of DBMS.

History of DBMS

1960’s:
From the earliest days of computers storing and manipulating data have been a major application
focus. The first general purpose DBMS was designed by Charles Bachman at General Electric
in the early 1960s, called Integrated Data Store. It formed the basis for network data model,
which was standardized by the Conference on Data Systems Languages(CODASYL) type of
DBMS, the hierarchical DBMS.

In the late 1960’s IBM developed the Information Management System(IMS) DBMS, used even
today in many major illustrations. IMS formed as basis for an alternative data representation
framework called the hierarchical data model. The SABRE system for making airline
reservations was jointly developed by American Airlines and IBM at the same time and allows
the people to access the same data through computer network.

In 1970’s, Edger codd, at IBM, proposed a new data representation frame work called the
relational model: It sparked the rapid development of several DBMS’s based on the relational
model.

In 1980’s, the relational model consolidated this position as the dominant DBMS paradigm and
database systems continued to gain widespread use. The SQL query language for relational
databases developed at IBM. SQL was standardized in the late 1980’s and current standard
SQL: 1999, was adopted by ANSI and ISO.

In the late1980’s and 1990’s advances were made in many areas of database systems. Several
vendors (DB2, Oracle 8) extended their systems with ability to store new data types such as
images and text. Specialized systems have been developed by numerous for creating data
warehouses, consolidating data from several databases, and for carrying specialized analysis.

An interesting phenomenon is emergence of several enterprise resource planning (ERP) and


Management resource planning (MRP) packages (Oracle, SAP, People Soft, Siebel) add a
substantial layer of application-oriented features on top of a DBMS.

The Internet Age has perhaps influenced the data models much more. Data models were developed
using object oriented programming features, embedding with scripting languages like Hyper Text
Markup Language (HTML) for queries. With humongous data being available online, DBMS is
gaining more importance as more data is brought online and made over more accessible through
computer networking.

In the early years of computing, punch card was used in unit record machines for input, data
storage and processing this data. Data was entered offline and for both data, and computer
programs input. This input method is similar to voting machines now a days. This was the only
method, where it was fast to enter data, and retrieve it, but not to manipulate or edit it.
After that era, there was the introduction of the file type entries for data, then the DBMS as
hierarchical, network, and relational.

Components of DBMS
The database management system can be divided into five major components, they are:
1. Hardware
2. Software
3. Data
1. Procedures
2. Database Access Language

Fig: Components of DBMS

Hardware:
When we say Hardware, we mean computer, hard disks, I/O channels for data, and any other
physical component involved before any data is successfully stored into the memory.
When we run Oracle or MySQL on our personal computer, then our computer's Hard Disk, our
Keyboard using which we type in all the commands, our computer's RAM, ROM all become a
part of the DBMS hardware.
Software
This is the main component, as this is the program which controls everything. The DBMS
software is more like a wrapper around the physical database, which provides us with an easy-to-
use interface to store, access and update data.
The DBMS software is capable of understanding the Database Access Language and intrepret it
into actual database commands to execute them on the DB.

Data:
Data is that resource, for which DBMS was designed. The motive behind the creation of DBMS
was to store and utilize data.
In a typical Database, the user saved Data is present and meta data is stored.
Metadata is data about the data. This is information stored by the DBMS to better understand
the data stored in it.
For example: When I store my Name in a database, the DBMS will store when the name was
stored in the database, what is the size of the name, is it stored as related data to some other data,
or is it independent, all this information is metadata.
Procedures:
Procedures refer to general instructions to use a database management system. This includes
procedures to setup and install a DBMS, To login and logout of DBMS software, to manage
databases, to take backups, generating reports etc.
Database Access Language
Database Access Language is a simple language designed to write commands to access, insert,
update and delete data stored in any database.
A user can write commands in the Database Access Language and submit it to the DBMS for
execution, which is then translated and executed by the DBMS, can create new databases, tables,
insert data, fetch stored data, update data and delete the data using the access language.
Users

Database Administrators: Database Administrator or DBA is the one who manages the
complete database management system. DBA takes care of the security of the DBMS, it's
availability, managing the license keys, managing user accounts and access etc.

Application Programmer or Software Developer: This user group is involved in developing


and designing the parts of DBMS.

End User: These days all the modern applications, web or mobile, store user data. How do you
think they do it? Yes, applications are programmed in such a way that they collect user data and
store the data on DBMS systems running on their server. End users are the one who store,
retrieve, update and delete data.
Data Model:
Data Model is the modeling of the data description, data semantics, and consistency constraints
of the data. It provides the conceptual tools for describing the design of a database at each level
of data abstraction. Data models are used for to describe the structure of the database.

Hierarchical Model

In a Hierarchical database, model data is organized in a tree-like structure. The hierarchy starts
from the Root data, and expands like a tree, adding child nodes to the parent nodes. Data is
represented using a parent-child relationship. In Hierarchical DBMS parent may have many
children, but children have only one parent.

This model efficiently describes many real-world relationships like index of a book, recipes etc.
In hierarchical model, data is organized into tree-like structure with one one-to-many
relationship between two different types of data, for example, one department can have many
courses, many professors and of-course many students.

Network Model
The network database model allows each child to have multiple parents. It helps you to address
the need to model more complex relationships like as the orders/parts many-to-many
relationship. In this model, entities are organized in a graph which can be accessed through
several paths.
Relational model
Relational DBMS is the most widely used DBMS model because it is one of the easiest. This
model is based on normalizing data in the rows and columns of the tables. Relational model
stored in fixed structures and manipulated using SQL.
Ex: an instance of a student relation.

Sid Name Age Gpa

1201 Suresh 20 3.7

1202 Ramesh 21 3.6

1203 Maesh 19 3.6

1204 Naresh 20 3.8

1205 Haresh 21 3.7

The above relation contains four attributes sid, name, age,gpa and five tuples or records or rows.

Object-Oriented Model
In Object-oriented Model data stored in the form of objects. The structure which is called classes
which display data within it. It defines a database as a collection of objects which stores both
data members values and operations.

Entity Relationship Model: The entity-relationship (E-R) data model uses a collection of basic
objects, called entities, and relationships among these objects. An entity is a “thing” or “object”
in the real world that is distinguishable from other objects. A relationship is an association
among several entities. For example, a depositor relationship associates a customer with each
account that she has

Figure: A sample E-R diagram


View of Data:

A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with an
abstract view of the data. That is, the system hides certain details of how the data are stored and
maintained.

Data Abstraction

Database systems are made-up of complex data structures. To ease the user interaction with
database, the developers hide internal irrelevant details from users. This process of hiding
irrelevant details from user is called data abstraction.

A data definition language (DDL) is used to define the external and conceptual schemas

The data in a DBMS is described at three levels of abstraction

Physical level: This is the lowest level of data abstraction. It the physical schema summarizes
how the relations described in the conceptual schema are actually stored on secondary storage
devices such as disks and tapes.
The access methods like sequential or random access and file organization methods like B+
trees, hashing used for the same.
Suppose we need to store the details of an employee. Blocks of storage and the amount of
memory used for these purposes is kept hidden from the user.

Conceptual level: The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The conceptual schema describes all
relations that are stored in the database. In our sample university database, these relations contain
information about entities, such as students and faculty, and about relationships, such as
students’ enrollment in courses.

Conceptual schema for university describes


Students(sid: string, name: string, login: string,age: integer, gpa: real)
Faculty(fid: string, fname: string, sal: real) Courses(cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled(sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets In(cid: string, rno: integer, time: string)
Fig: Levels of data abstraction

View level:
External schema allows data access to be customized (and authorized) at the level of individual
users or groups of users. Any given database has exactly one conceptual schema and one
physical schema because it has just one set of stored relations, but it may have several external
schemas. Each external schema consists of a collection of one or more views and relations from
the conceptual schema. A view is conceptually a relation, but the records in a view are not stored
in the DBMS, they are computed using a definition for the view, in terms of relations stored in
the DBMS.
For example, we might want to allow students to find out the names of faculty members teaching
courses, as well as course enrollments.
This can be done by defining the following view: Courseinfo (cid: string, fname: string,
enrollment: integer)
A user can treat a view just like a relation and ask questions about the records in the view. Even
though the records in the view are not stored explicitly, they are computed as needed. We did not
include Courseinfo in the conceptual schema because we can compute Courseinfo from the
relations in the conceptual schema, and to store it in addition would be redundant. Such
redundancy, in addition to the wasted space, could lead to inconsistencies.
Data Independence:
A very important advantage of using a DBMS is that it offers data independence. The ability to
modify schema definition in one level without affecting schema of that definition in the next
higher level is called data independence. Data independence is achieved through use of the three
levels of data abstraction;
There are two levels of data independence; they are Physical data independence and Logical data
independence.
Physical data independence is the ability to modify the physical schema without causing
application programs to be rewritten. Modifications at the physical level are occasionally
necessary to improve performance. It means we change the physical storage/level without
affecting the conceptual or external view of the data.
Logical data independence is the ability to modify the logical schema without causing
application program to be rewritten. Modifications at the logical level are necessary whenever
the logical structure of the database is altered.
For example, suppose that the Faculty relation in our university database is replaced by the
following two relations:
Faculty public(fid: string, fname: string, office: integer)
Faculty private(fid: string, sal: real)
The Courseinfo view relation can be redefined in terms of Faculty public and Faculty private,
which together contain all the information in Faculty, so that a user who queries Courseinfo will
get the same answers as before.

Fig: Data Independence


Instance and schema:
DBMS Schema Definition of schema:
Design of a database is called the schema. Schema is of three types: Physical schema, logical
schema and view schema.
For example: In the following diagram, we have a schema that shows the relationship between
three tables: Course, Student and Section. The diagram only shows the design of the database, it
doesn’t show the data present in those tables. Schema is only a structural view(design) of a
database as shown in the diagram below.

The design of a database at physical level is called physical schema, how the data stored in
blocks of storage is described at this level.
Design of database at logical level is called logical schema, programmers and database
administrators work at this level, at this level data can be described as certain types of data
records gets stored in data structures, however the internal details such as implementation of data
structure is hidden at this level (available at physical level).
Design of database at view level is called view schema. This generally describes end user
interaction with database systems.
DBMS Instance
Definition of instance: The data stored in database at a particular moment of time is called
instance of database. Database schema defines the variable declarations in tables that belong to a
particular database; the value of these variables at a moment of time is called the instance of that
database.

For example, lets say we have a single table student in the database, today the table has 100
records, so today the instance of the database has 100 records. Lets say we are going to add
another 100 records in this table by tomorrow so the instance of database tomorrow will have
200 records in table. In short, at a particular moment the data stored in database is called the
instance, that changes over time when we add or delete data from the database.
Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system
on which the database is running. The Database systems can be Centralized, Client-server,
Parallel (multi-processor), Distributed, the architecture database is divided into two types: Two-
tier architecture and Three-tier architecture.
Two-tier architecture: In Two-tier architecture the application resides at the client machine,
where it invokes database system functionality at the server machine through query language
statements. Application program interface standards like ODBC and JDBC are used for
interaction between the client and the server.
Example: Transaction performed by the bank employee after filling the deposit form by the
customer. Another example is railway reservation made by the employee in reservation counter.
Three-tier architecture: In three-tier architecture, the client machine acts as merely a front end
and does not contain any direct database calls. Instead, the client end communicates with an
application server, usually through a forms interface. The application server in turn
communicates with a database system to access data. The business logic of the application,
which says what actions to carry out under what conditions, is embedded in the application
server, instead of being distributed across multiple clients. Three-tier applications are more
appropriate for large applications, and for applications that run on the Worldwide Web.
Example: Amount transferred by the user through application program (Net banking) without
physically going to the bank. Another example is railway reservation made by the user from his
location without going to the reservation counter.
Three tier architecture is provides scalability when more no of clients or users interact with the
database.

Fig: Two-tier and three tier architectures


STRUCTURE OF A DBMS
Database Management System (DBMS) is software that allows access to data stored in a
database and provides an easy and effective method of defining, Storing, Manipulating,
Protecting information from system crashes or data theft, and differentiating access
permissions for different users.
The database system is divided into three components: Query Processor, Storage Manager,
and Disk Storage.

Fig: Structure of a DBMS


Database Users and User Interfaces
There are four different types of database-system users, differentiated by the way they expect to
interact with the system. Different types of user interfaces have been designed for the different
types of users.

Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written by the application programmer previously.

For example, a user who wishes to find her account balance over the World Wide Web. Such a
user may access a form, where she enters her account number. An application program at the
Web server then retrieves the account balance, using the given account number, and passes this
information back to the user.

The typical user interface for naive users is a forms interface, where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated from the
database.
Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
There are also special types of programming languages that combine imperative control
structures (for example, for loops, while loops and if-then-else statements) with statements of the
data manipulation language. These languages, sometimes called fourth-generation languages,
often include special features to facilitate the generation of forms and the display of data on the
screen. Most major commercial database systems include a fourth generation language.

Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the storage manager
understands. Analysts who submit queries to explore data in the database fall in this category.

Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view
summaries of data in different ways. For instance, an analyst can see total sales by region (for
example, North, South, East, and West), or by product, or by a combination of region and
product (that is, total sales of each product in each region). The tools also permit the analyst to
select specific regions, look at data in more detail (for example, sales by city within a region) or
look at the data in less detail (for example, aggregate products together by category). Another
class of tools for analysts is data mining tools, which help them find certain kinds of patterns in
data.
Sophisticated users who write specialized database applications that do not fit into the traditional
data processing framework. Among these applications are computer-aided design systems,
knowledge- base and expert systems, systems that store data with complex data types (for
example, graphics data and audio data), and environment-modeling systems.
Chapters 8 and 9 cover several of these applications.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA).

The functions of a DBA include:

Schema definition: The DBA creates the original database schema by executing a set of data
definition statements in the DDL.

Schema and physical-organization modification: The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.

Granting of authorization for data access: By granting different types of authorization, the
database administrator can regulate which parts of the data base various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.

Routine maintenance:

 Periodically backing up the database, either onto tapes or onto remote servers, to prevent
loss of data in case of disasters such as flooding.
 Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
 Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.

Storage Manager

A storage manager is a program module that provides the interface between the low level data
stored in the database and the application programs and queries submitted to the system. The
storage manager is responsible for the interaction with the file manager. The raw data are stored
on the disk using the file system, which is usually provided by a conventional operating system.
The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in
the database.

The storage manager components include:

Authorization and integrity manager: Tests for the satisfaction of integrity constraints and
checks the authority of users to access data.

Transaction manager: Ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
File manager: Manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.

Buffer manager: It is responsible for fetching data from disk storage into main memory, and
deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.

Data structures implemented by storage manager.

Data files: Stored in the database itself.

Data dictionary: Stores metadata about the structure of the database.

Indices: Provide fast access to data items.

The Query Processor:

The query processor components include


DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. A query can
usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization, that is, it picks the lowest cost
evaluation plan from among the alternatives.

Query evaluation engine, which executes low-level instructions generated by the DML compiler.
Introduction to Database Design

Database Design: The database design process can be divided into six steps.

Requirements Analysis: The very first step in designing a database application is to understand
what data is to be stored in the database, what applications must be built on top of it, and what
operations are most frequent and subject to performance requirements. In other words, we must
find out what the users want from the database.

This is usually an informal process that involves discussions with user groups, a study of the
current operating environment and how it is expected to change, analysis of any available
documentation on existing applications that are expected to be replaced or complemented by the
database, and so on. Several methodologies have been proposed for organizing and presenting
the information gathered in this step, and some automated tools have been developed to support
this process.

Conceptual Database Design: The information gathered in the requirements analysis step is
used to develop a high-level description of the data to be stored in the database, along with the
constraints that are known to hold over this data. This step is often carried out using the ER
model, or a similar high-level data model, and is discussed in the rest of this chapter.

Logical Database Design: We must choose a DBMS to implement our database design, and
convert the conceptual database design into a database schema in the data model of the chosen
DBMS. We will only consider relational DBMSs, and therefore, the task in the logical design
step is to convert an ER schema into a relational database schema.

Schema Refinement: The fourth step in database design is to analyze the collection of relations
in our relational database schema to identify potential problems, and to refine it. In contrast to
the requirements analysis and conceptual design steps, which are essentially subjective, schema
refinement can be guided by some elegant and powerful theory.

Physical Database Design: In this step we must consider typical expected workloads that our
database must support and further refine the database design to ensure that it meets desired
performance criteria. This step may simply involve building indexes on some tables and
clustering some tables, or it may involve a substantial redesign of parts of the database schema
obtained from the earlier design steps.

Security Design: In this step, we identify different user groups and different roles played by
various users (e.g., the development team for a product, the customer support representatives, the
product manager). For each role and user group, we must identify the parts of the database that
they must be able to access and the parts of the database that they should not be allowed to
access, and take steps to ensure that they can access only the necessary parts.
Entity-Relationship Model:

An Entity–relationship model (ER model) describes the structure of a database with the help of
a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER model is a
design or blueprint of a database that can later be implemented as a database.

The entity-relationship (E-R) data model perceives the real world as consisting of basic objects,
called entities, and relationships among these objects. It develops a conceptual design for the
database.

The E-R data model employs three basic notions: entity sets, relationship sets, and attributes.
For example, Suppose we design a school database. In this database, the student will be an
entity with attributes like address, name, id, age, etc.

Fig: ER Model

Entity Sets:
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects.
An entity has a set of properties, and the values for some set of properties may uniquely identify
an entity.
An entity set is a set of entities of the same type that share the same properties, or attributes. An
entity set is represented by a rectangle, and an attribute is represented by an oval.
For example, the Employees entity set could use name, social security number (ssn), and parking
lot (lot) as attributes.

name

ssn lot

Employees

Figure: Employees Entity Set


Weak Entity Type:

An entity which doesn’t have its key attribute, it is uniquely identified by the key attribute of
another entity set is called weak entity. It can be represented by using double rectangle.

For example: weak entity transactions is identified using ATM ID in ATM Transactions.

Fig: Weak entity

Simple and composite:

The attributes have been simple; that is, they are not divided into subparts. Composite attributes,
on the other hand, can be divided into subparts (that is, other attributes).

For example, an attribute address could be structured as a composite attribute consisting of


str_address, city, pincode.

Fig: Simple and composite:


Single-valued and multi valued attributes:

The attributes in our examples all have a single value for a particular entity. For instance, the
CustID attribute for a specific customer entity refers to only one customer. Such attributes are
said to be single valued.

There may be instances where an attribute has a set of values for a specific entity. Consider an
customer entity set with the attribute Custphone. A customer may have more than one phone
numbers. This type of attribute is said to be multi valued. The Multi valued attribute is modeled
in ER using double circle

Fig: Multi Valued Attribute

Derived attribute: The value for this type of attribute can be derived from the values of other
related attributes or entities. Derived attribute is modeled in ER using dotted ellipse.

Suppose that the student entity set has an attribute age, which indicates the students’s age. If the
student entity set also has an attribute date-of-birth, we can calculate age from date-of-birth, age
is a derived attribute.

Fig: Derived attribute


Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.
Example: in student entity set id is the key attribute.

Relationship Sets:

A relationship is an association among several entities. A relationship set is a set of relationships


of the same type. Diamond or rhombus is used to represent the relationship.

The relationship set Works In, in which each relationship indicates a department in which an
employee works. Note that several relationship sets might involve the same entity sets. For
example, we could also have a Manages relationship set involving Employees and Departments.

Figure: Works in Relationship Set

A relationship can also have descriptive attributes. Descriptive attributes are used to record
information about the relationship, rather than about any one of the par- ticipating entities; for
example, we may wish to record that Attishoo works in the pharmacy department as of January
1991. This information is captured by adding an attribute, since, to Works In.

A relationship must be uniquely identified by the participating entities, without reference to


the descriptive attributes. In the Works In relationship set, for example, each Works In
relationship must be uniquely identified by the combination of employee ssn and department
did. Thus, for a given employee-department pair, we cannot have more than one associated
since value.
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called as degree of a
relationship set.
Unary Relationship:
When there is only one entity set participating in a relation, the relationship is called as unary
relationship. For example, one person is married to only one person.

Binary Relationship:
When there are two entities set participating in a relation, the relationship is called as binary
relationship.For example, Student is enrolled in Course.

Ternary Relationship:

When there are 3 entities set participating in a relation, the relationship is called as Ternary

Relationship. In this example three entity sets participates in a relationship works_in3.

Fig: A Ternary Relationship Set


Degree of a relationship (Cordiality):

The number of times an entity of an entity set participates in a relationship set is known as
cardinality. Cardinality can be of different types:

1. One to One Relationship


When a single instance of an entity is associated with a single instance of another entity then it is
called one to one relationship. For example, Let us assume that a male can marry to one female
and a female can marry to one male. So the relationship will be one to one.

Fig: One to One Relationship

Using Sets, it can be represented as:

2. One to-Many Relationship


When a single instance of an entity is associated with more than one instances of another entity
then it is called one to many relationship. For example – a customer can place many orders but a
order cannot be placed by many customers.

Using Sets, it can be represented as:


3. Many to One Relationship
When more than one instances of an entity is associated with a single instance of another entity
then it is called many to one relationship. For example – many students can study in a single
college but a student cannot study in many colleges at the same time.

Using Sets, it can be represented as:

4. Many to-Many Relationship


When more than one instances of an entity is associated with more than one instances of another
entity then it is called many to many relationship. For example, a student can enrolled in multiple
courses and a course can be enrolled by many students.

Using Sets, it can be represented as:


Additional features of ER-Model:
Some of the constructs in the ER model us to describe some fine properties of data.

1. Key constraints:
The values of the attribute, values of an entity such that uniquely identify the entity.
Keys also help to identify relationship uniquely, thus distinguish relationship from each
other.
For example consider the relationship set called manages between the employees and
department’s entity sets such that each department has at most one manager, although
single employee is allowed to manage more than one department. The restriction each
department has at most one manager is the key constraint. This constraint represented in
ER diagram using arrow from Departments to Manages. Here ssn is key attribute of
employees entity set, did is key attribute of departments entity set

ssn
name since did dname budget
lot

Manages Departments
Employees

Fig: Key constraints on Manages

Participation Constraint:
Participation Constraint is applied on the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of student will be total. Total
participation is shown by double line in ER diagram.
2. Partial Participation – The entity in the entity set may or may not participate in the
relationship. If some courses are not enrolled by any of the student, the participation of
course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
Weak entities:
An entity can be identified uniquely only by considering some of its attributes in conjunction
with the primary key of another entity is called weak entity.
It must hold the following restrictions
The owner entity set and the weak entity set must participate in one to-many relationship set.
This relationship set is called identifying relationship set of the weak entity set. The weak entity
set must have the total participation in the identifying relationship set.

For example a dependents entity can be identified uniquely only if we take the key of the owing
entity employees. Here the relationship between dependents and policy is total, it indicates that
each dependent entity is appears in at most one policy relationship

pname age
ssn lot cost
name

Employees Policy Dependents

Fig: Departments weak entity set

Class Hierarchies:
Class hierarchies are used to classify the entities of entity set into sub classes. It represented
using ISA (is a) relationship and can be viewed in two ways.
Generalization: Generalization is the process of extracting common properties from a set of
entities and creates a generalized entity from it. It is a bottom-up approach in which two or more
entities can be generalized to a higher level entity if they have some attributes in common. For
Example,
STUDENT and FACULTY can be generalized to a higher level entity called PERSON as shown
in Figure 1. In this case, common attributes like P_NAME, P_ADD become part of higher entity
(PERSON) and specialized attributes like S_FEE become part of specialized entity (STUDENT).

Specialization: In specialization, an entity is divided into sub-entities based on their


characteristics. It is a top-down approach where higher level entity is specialized into two or
more lower level entities. For Example, EMPLOYEE entity in an Employee management system
can be specialized into DEVELOPER, TESTER etc. as shown in Figure 2. In this case, common
attributes like E_NAME, E_SAL etc. become part of higher entity (EMPLOYEE) and
specialized attributes like TES_TYPE become part of specialized entity (TESTER).

Aggregation: Aggregation allows us to indicate that the relationship set participates in another
relationship set. An ER diagram is not capable of representing relationship between an entity and
a relationship which may be required in some scenarios. In those cases, a relationship with its
corresponding entities is aggregated into a higher level entity.
For Example, Employee working for a project may require some machinery. So, REQUIRE
relationship is needed between relationship WORKS_FOR and entity MACHINERY. Using
aggregation, WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is
aggregated into single entity and relationship REQUIRES is created between aggregated entity
and MACHINERY.
Example2: suppose that we have entity set called projects and each project entity is sponsored
by one or more departments. A department that sponsors a project might assign employees to
monitor the sponsorship.
Ssn name lot

Employees

Monitors until

dname budget
since
did
Started_o
pid pbudget
n

Projects sponsors Departments

Fig: Aggregation
Conceptual Design with the ER Model:

Developing and ER diagram presents several choices

1. Should a concept be modeled as an entity or an attribute?

2. Should a concept be modeled as an entity or relationship?

3. What are the relationship sets and their participating entity sets? Should we use binary or
ternary relationships?

4. Should we use aggregation?

A concept can be modeled as an entity or an attribute:

Sometimes it is not clear whether a property should be modeled as an attribute or an entity set.
We have to record only one value; it can be modeled as attribute. Modeled as Entity set, we
have to record more than one value and we want to capture the structure of property in ER
model.

For example: to model a concept as an entity set rather than an attribute an attribute. Employees
works for a department, it records the interval during an employee works for a department.

did dname
name to e budget
Ssn lot from

Employees Works_in Departments

Fig: Works_In relationship set

Suppose an employee works a department more than one period, the problem is that we want to
record several values for descriptive attributes for each instance of the works_in relationship.
We could define duration as entity set in association with works_in relationship.
did name budget

Ssn name
lot

Departments
Employees Works_in

from Duration to

Fig: Works_In relationship set

A concept can be modeled as an entity or relationship:

Consider the relationship manages suppose each department manager is given a discretionary
budget.

ssn dname budget


name did
lot
since dbudget

Employees Manages Departments

Fig: Entity versus relationships

If the discretionary budget is sum that covers all departments managed by employee, in this case
manages relationship that involves a given employee will have in the dbudget field. It leading to
redundant storage of same information.

This problem can be addressed by adding a new entity set called managers is a hierarchy with
employees entity set, with dbudget is an attribute of manager, and since is an attribute of
manages relationship.
dname
ssn did
name lot budget
since

Employees Manages Departments

ISA

Manager dbudget

Fig: Entity versus relationships

A concept can be modeled as binary or ternary relationships:

Consider the ER diagram an employee can own several policies, each policy can be owned by
several employees, and each dependent covered by several policies, and dependents entity must
be deleted if the owning employees entity is deleted.

ssn lot pname age


name

Employees covers Dependents

pid policy cost

Fig: policy as entity set

Suppose we have the following requirements:


 A policy cannot be owned jointly by two or more employees.
 Every policy must be owned by some employee.
 Dependent is a week entity set, and each dependent entity is uniquely identified by taking
pname in conjunction with policyid of policy entity.
ssn pname age
name lot

Employees Dependents
Benficiary

purchaser

pid policy cost

Fig: policy revisited

A concept can be modeled as ternary or Aggregation:

We can model Concept a project can be sponsored by any number of number of departments, a
department can sponsor one or more projects, and each sponsorship is monitored by one or more
employees using aggregation.

Ssn name lot

Employees

Monitors until

Started_on

pid pbudget dname


did budget
since

Departments
Projects sponsors

Fig: Aggregation
If we don’t need to record the until attribute of monitors, i.e we can’t express the constraint each
sponsorship is monitored by at most one employee, then we might use ternary relationship.

ssn name lot

Employees

dname
did
Started_on budget
pbudget
pid

Projects sponsors Departments

Fig: using a ternary relationship instead of aggregation.

ER Diagrams to practice:
ER diagram of Bank Management System

ER diagram is known as Entity-Relationship diagram. It is used to analyze to structure of the


Database. It shows relationships between entities and their attributes. An ER model provides a
means of communication.
ER diagram of Bank has the following description :
 Bank have Customer.
 Banks are identified by a name, code, address of main office.
 Banks have branches.
 Branches are identified by a branch_no., branch_name, address.
 Customers are identified by name, cust-id, phone number, address.
 Customer can have one or more accounts.
 Accounts are identified by acc_no., acc_type, balance.
 Customer can avail loans.
 Loans are identified by loan_id, loan_type and amount.
 Account and loans are related to bank’s branch.
ER diagram for Library Management system
ER diagram for online shopping portal:
Unit-II

Introduction to the Relational Model: Integrity constraint over relations, enforcing integrity
constraints, querying relational data, logical data base design, introduction to views,
destroying/altering tables and views.
Relational Algebra, Tuple relational Calculus, Domain relational calculus

Relational Model:
The Relational Model was developed by E. F. Codd for IBM in 1970 to model data in the form
of relations or tables
Relational Model represents how data is stored in relational databases. A relational database
stores data in the form of Relations (Tables).
After designing the conceptual model of database using ER we need to convert the conceptual
model in the relational model which can be implemented using any RDBMS languages.
RDBMS Languages like SQL Server, My SQL, and Oracle etc.
RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and
for all modern database systems like My SQL, SQL Server, IBM DB2, and Microsoft Access. It
is based on relational model.
Ex: student relation consists of attributes (sid, name, age, phone)

S_id name age Phone. no


1201 ramesh 19 9999999999
1202 suresh 20 8888888888
1203 hareesh 18 7777777777
1204 afreena 20 9848002310

A relation model can be represented as a relation or table, each table consists of rows and
columns.
Attribute: column of a relation is called as attribute or field.
Ex: S_id, name, age, phone.
Tuple: Each row of a table is called as record or tuple.

Domain: Each attribute has domain, it specifies type and range of values allowed to store in the
attribute

Ex: Student (S-Id numeric(10), name char(10), age number(2), phone. No number(10))
The main construct for representing data in relational model is relation or table. A relation
consists of relation schema and relation instance. Relation instance is a table, that is set of tuples
or records, and relation schema describes the attribute or column name and domain of a relation.

Relation Schema: It specifies the relation’s name, the name of each fteld (or column, or
attribute), and the domain of each field. A domain is referred to in a relation schema by the
domain name and has a set of associated values.

Relation schema of student information in a university database:


Students(sid: string, name: string, age: integer, gpa: real)

This says, for instance, that the field named sid has a domain named string. The set of values
associated with domain string is the set of all character strings.
Relation instance: Instance of a relation is a set of tuples, also called records, in which each
tuple has the same number of fields as the relation schema. A relation instance can be thought of
as a table in which each tuple is a row, and all rows have the same number of fields.
Ex: an instance of a student relation.
Sid Name Age Gpa
1201 Suresh 20 3.7
1202 Ramesh 21 3.6
1203 Maesh 19 3.6
1204 Naresh 20 3.8
1205 Haresh 21 3.7
The above relation contains four attributes sid, name, age,gpa and five tuples or records or rows.
Degree of a relation: number of attributes of a relation is called degree of a relation. Degree of
the above student relation is 4.
Cardinality of a relation: number of tuples or records or rows of a relation is called cardinality
of relation, the above student relation is 5.
Column: the set of values of an attribute.
Null Value: Un Known or un available value is called null value.
Key attribute: is an attribute which is used for unique identification of each row of a table.
Properties of a Relation:
1. The relation has name that should be distinct from all names in the relation schema.
2. Each cell of relation should contain only one value.
3. Each attribute of a relation should be distinct.
4. Each tuple of relation should be distinct.
5. The order of attribute has no significance.
SQL: SQL (structured query language) is the most widely used language for creating,
manipulating the relation in relational database. It was developed by ANSI in 1986.
SQL uses these database languages read, store and update the data in the database.

1. Data Definition Language (DDL)


DDL is used to define database structure or pattern. It is used to create schema, tables,
indexes, constraints, etc. in the database. DDL statements are used to create structure of the
database. Data definition language is used to store the information of metadata like the number
of tables and schemas, their names, indexes, columns in each table, constraints, etc.
DDL commands are:

 CREATE statement or command is used to create a new database. In structured query


language the create command creates an object in a relational database management
system. The commonly used create command is as follows
Syntax:
CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);
Example:
CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DO
B DATE);

 DROP statement destroys or deletes database or table. In structured query language, it


also deletes an object from relational database management system. Typically used
DROP statement is
DROP type of object name of object
Syntax
DROP TABLE ;
Example
DROP TABLE EMPLOYEE;

 ALTER The ALTER TABLE statement is used to add, delete, or modify columns, to add
and drop various constraints on in an existing table

To add a column in a table, use the following syntax:

ALTER TABLE table_name


ADD column_name datatype;

To delete a column in a table, use the following syntax:


ALTER TABLE table_name
DROP COLUMN column_name;
To change the data type of a column in a table, use the following syntax:

ALTER TABLE table_name


MODIFY COLUMN column_name datatype;

SQL statement to change the data type of the column named "DateOfBirth" in the
"Persons" table is

SQL statement:

ALTER TABLE Persons


MODIFY COLUMN DateOfBirth year;

 TRUNCATE: It is used to delete all the rows from the table and free the space
containing the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE EMPLOYEE;

 RENAME statement is used to rename a database. It’s statement is as follows


RENAME TABLE old name of table to new name of table
Syntax:
ALTER TABLE table_name RENAME TO new_table_name;
Columns can be also be given new name with the use of ALTER TABLE.
ALTER TABLE table_name
RENAME COLUMN old_name TO new_name;

Data Manipulation Language (DML)


It is used for accessing and manipulating data in a database. Here are some commands that come
under DML:

 INSERT: The INSERT statement is a SQL query. It is used to insert data into the row of
a table.
Syntax:
INSERT INTO TABLE_NAME
(col1, col2, col3,.... col N)
VALUES (value1, value2, value3, .... valueN);
Or
INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);
For example:
INSERT INTO javatpoint (Author, Subject) VALUES ("Sonoo", "DBMS");

 UPDATE: This command is used to update or modify the value of a column in the table.
Syntax:
UPDATE table_name SET [column_name1= value1,...column_nameN = valueN] [WHE
RE CONDITION]
For example:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3' ;

 DELETE: It is used to remove one or more row from a table.


Syntax:
DELETE FROM table_name [WHERE condition];
For example:
DELETE FROM javatpoint
WHERE Author="Sonoo";

 Select: The SQL SELECT command is used to fetch data from the MySQL database.
Syntax
SELECT field1, field2,...fieldN
FROM table_name1, table_name2...
[WHERE Clause]
[OFFSET M ][LIMIT N]

Ex:
SELECT * from tutorials_tbl
+-------------+----------------+-----------------+-----------------+
| tutorial_id | tutorial_title | tutorial_author | submission_date |
+-------------+----------------+-----------------+-----------------+
| 1 | Learn PHP | John Poul | 2007-05-21 |
| 2 | Learn MySQL | Abdul S | 2007-05-21 |
| 3 | JAVA Tutorial | Sanjay | 2007-05-21 |
+-------------+----------------+-----------------+-----------------+
Integrity constrains over relations:
An integrity constraint is a condition specified on a database schema and restricts the data that
can be stored in an instance of database.
If a database instance satisfies all the integrity constraints specified on the database schema then
it becomes a legal instance, it is used to maintain the quality of information.
DBMS enforces the integrity constraints to permit only the legal instances to be stored in the
database.
Integrity constraints are specified and enforced on database
1. When the DBA or end user defines a database schema.
2. When a database application is run, the DBMS checks for violation and disallow changes
to the data that violates the specified IC’s.
For example no two students have the same id value.

Key constraints:
A key constraint is a statement, a minimal set of attributes of a relation that uniquely identifies a
tuple of relation. A set of attributes that uniquely identifies a tuple in a relation is called a
candidate key for relation.A relation may have several candidate keys, a database designer can
identifies a primary key out of all candidate keys.
Unique key: it allows only unique value to store into the database it does not allows the
duplicate values to store in to the database.
Example: CREATE table StudentS (SID VARCHAR (10), NAME CHAR (10), AGE
INETEGER, EMAIL VARHAR(20) UNIQUE (EMAIL), SID);
SID NAME AGE EMAIL
101 Ramesh 19 ramesh@gmail.com
102 Suresh 20 sursresh@gmail.com
103 Mahesh 18 mahesh@gmail.com
104 Hareesh 19
EMAIL attribute is defined as unique key for the above relation; it may allow null values but
does not allow the duplicate values.
NOT NULL: Is a key constraint it does not allows the NULL values to store into a relation.
Example: Example: CREATE table Students (SID VARCHAR (10) , NAME CHAR (10) NOT
NULL, AGE INETEGER, EMAIL VARHAR(20) UNIQUE (EMAIL), CONSTRAINT
Students key PRIMARYKEY (SID));
SID NAME AGE EMAIL
101 Ramesh 19 ramesh@gmail.com
102 Suresh 20 sursresh@gmail.com
103 Mahesh 18 mahesh@gmail.com
104 Ramesh 19 Ramesh123@gmail.com

Primary key: is a key constraint it allows only unique and not null values to store into a relation.
It won’t allow the duplicate and null values to store into the relation.
Example: CREATE table Students (SID VARCHAR (10), NAME CHAR (10), AGE
INETEGER, EMAIL VARHAR(20) UNIQUE (EMAIL), CONSTRAINT Students key
PRIMARYKEY (SID));
SID NAME AGE EMAIL
101 Ramesh 19 ramesh@gmail.com
102 Suresh 20 sursresh@gmail.com
103 Mahesh 18 mahesh@gmail.com
104 Hareesh 19 Hareesh@gmail.com

SID is the primary key for the above relation; it does not allow the null values and duplicate
values.
Foreign key: foreign key is used to link two tables or relations. The information stored in one
relation is linked to the information stored in another relation. If one of the relation is modified
the other must be checked and modified to keep the database consistent. The key constraint
involving two relations is called foreign key constraint. The foreign key in referencing relation
must matches with primary key of referenced relation.
For example in addition student we have another relation called enrolled, to ensure that only
bonafied students can enroll in course. Any value appear in the SID field of an instance of
enrolled, should also appear in SID field of student relation.
Example: CREATE table Students (SID VARCHAR (10), NAME CHAR (10), AGE
INETEGER, EMAIL VARHAR(20) UNIQUE (EMAIL), CONSTRAINT Students key
PRIMARYKEY (SID));
Ex: CREATE table Enrolled (SID VARCHAR (10), CID CHAR (10), GRADE CHAR(10)
PRIMARY KEY (CID),
FOREIGN KEY (SID) REFERENCES STUDENTS);

CID GRADE SID SID NAME AGE EMAIL


C1 A 101 101 Ramesh 19 ramesh@gmail.com
C2 B 102 102 Suresh 20 sursresh@gmail.com
C3 A 103 103 Mahesh 18 mahesh@gmail.com
C4 C 104 104 Hareesh 19 Hareesh@gmail.com

Table: Enrolled Table: Students


If we try to insert SID in Enrolled, it validates the integrity constraint, SID in students relation it
does not allow a tuple to insert in enrolled relation if SID is not in Students relation and it does
not allows a tuple to delete from students relation.
Domain constraints: defines the valid set of values for an attribute. The type of domain includes
string, integer, char, varchar, time, date and range of values it is allowed to be stored.
EX: if we define an attribute SID int(5), it allows only integer with maximum length 5 to store,
it does not allows the char or string to store into the database.
General constraints: relational database supports the general constraints in the form table
constraints and assertions. Table constraint are associated with single table, although the
conditional expression in the CHECK clause can refer to other tables. Table constraints are
required to hold only if the associated table is nonempty. We can specify table constraints using
CHECK conditional expression.
Ex: to ensure that rating must be an integer in the range 1 to 10.
CREATE TABLE Sailors( sid INTEGER, sname CHAR(10), rating INTEGER, age REAL,
PRIMARY KEY (sid),
CHECK (rating >=1 AND rating <=10));
When a constraint involves two or more tables assertions are used.
Ex: CREATE TABLE Sailors(sid INTEGER, sname CHAR(10), rating INTEGER, age REAL,
PRIMARY KEY (sid),
CHECK(rating>=1 AND rating <=10)
CHECK((SELECT COUNT (s.sid) FROM Sailors S) + (SELECT COUNT (B.bid) FROM Boats
B) < 100));
Enforcing Integrity constraints:
Integrity constraints are specified when relation is created and enforced when relation is
modified. Every integrity constraints violation is generally checked at the end of sql statement
execution.
Ex: INSERT INTO Students (sid , name, age) VALUES (null, ‘mahesh’,20);
This insertion violates the constraint that the primary key cannot contain null.
Ex: UPDATE students S SET s.sid=1204 WHERE S.sid=1203;
This update violates the primary key constraint because there is already a tuple with sid 1204.
Referential integrity enforcement:
Insertion and deletion of tuples do not violate the integrity constraints.
Deletion of enrolled tuples do not violate the referential integrity, but insertions of enrolled
tuples could violate the integrity constraints.
Insertion of student tuples do not violate referential integrity, and deletion of students tuple could
cause violations.
SQL provides several alternative ways to handle foreign key violations:
1. If an enrolled row is inserted, with a sid that column value does not appear in any row of
students table. The INSERT command is simply rejected.
Ex: INSERT INTO Enrolled(cid,grade,sid) VALUES (‘dbms01’,ramesh,1271);
This insertion is illegal, because there is no students tuple with sid 1271.
2. If a students row is deleted
 Delete all enrolled rows that refer to the deleted student row
 Disallow the deletion of the student row if an enrolled row references to it.
3. If the primary key value of student row is updated
We can specify when a student row is deleted all enrolled rows refer to it are to be
deleted as well, but when the sid column of a student row is modified, this update is to be
rejected if an enrolled row refers to the modified students row.
CREATE TABLE Enrolled(sid char(20) cid char(20), grade char(10), PPRIMARY KEY
(sid,cid), FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE ON
UPDATE NO ACTION);
The options are specified as part of foreign key declarations, the default option is NO
ACTION.
SQL also allows the use of null as the default value by specifying ON DELETE SET
NULL.
Transactions and constraints:
A program that runs against a database called a transaction, and it can contains several
statements that access the database. If a statement in a transaction violates the integrity
constraints then that statement is rejected.
To change a constraints validation granularity to the value in another column, we have to
explicitly declare a constraint as deferrable. The constraint deferrable allows a transaction to
differ validation until commit.
Consider the variants of following students and courses relations, every student is required to
have an honors course, and every course is required to have a grader, who is some student.
CREATE TABLE Students(sid char(20), name char(20), age INTEGER, honors CHAR(10)
NOT NULL, gpa REAL, PRIMARY KEY (SID), FOREIGN KEY (honors) REFERENCES
Courses (cid));
CREATE TABLE Courses( cid CHAR(10), cname CHAR(10), credits INTEGER, grader
CHAR(20) NOT NULL,
PRIMARY KEY(cid), FORIGN KEY (grader) REFERENCES Students (sid));
While inserting values into the above relations one cannot be inserted without the other. T he
only way to accomplish this insertion is to defer the constraint checking that would be normally
carried out at the end of an INSERT statement.
SQL allows constraint to be in DEFERRED or IMMEADIATE mode.
SET CONSTRAINT honors DEFERRED.
SET CONSTRAINT grader DEFERRED.
Logical database design: The ER model is convenient for representing in a high level database
design. Data organized in the database in the form of relations in relational model, Logical
database design describes how to convert an E-R Model to Relational Model.
Steps to convert an E-R Model to relational database schema or Relational Model
1. Entity sets to relations
2. Relationship sets (without constraints) to tables
3. Relationship set with key constraints into relation
4. Translating relationship sets with participation constraints
5. Translating weak entity sets into a relation
6. Translating class hierarchies into a relation
7. Translating ER diagram with aggregation into relation
1. Entity sets to relations :
An entity set in a ER diagram is mapped in to relation, each attribute of the entity set
becomes an attribute of the table by specifying the domain of each attribute and key of an
entity set. For example consider an Employee entity set with attributes ssn, name, and lot.

Ssn
name lot

Employees

Figure: The Employees Entity set


The employees entity set is represented in a table format .
ssn name Lot
123-22 suresh 48
321-21 ramesh 30
131-24 maheh 20
Fig: An instance of Entity set
The following SQL statement captures the preceding information, including the domain
constraints and key information:
CREATE TABLE Employees (ssn CHAR(10), name CHAR(20), lot INTEGER, PRIMARY
KEY (ssn));
2. Relationship sets (without constraints) to tables:
A relationship set, like an entity set directly mapped into a relation in the relational model by
identifying each participating entity and giving the values to the descriptive attributes of the
relation.
The attributes of the relation include:
 The primary key attribute of each participating entity set, as foreign key fields.
 The descriptive attributes of the relationship set.
budget
Did dname
Ssn e
name since
lot

Employees Works_in Departments

Address Locations capacity

Fig: A ternary relationship set


All the information about works_in relationship into works-in relation is captured by the
following SQL definition:
CREATE TABLE works_in (ssn CHAR(10), did INTEGER, address CHAR(20), since DATE,
PRIMARY KEY (ssn,did, address), FOREIGN KEY (ssn) REFERENCES Employees,
FOREIGN KEY (did) REFERENCES Departments,
FOREIGN KEY (address) REFERENCES Locations);
3. Relationship set with key constraints into relation
If a relationship involves n entity sets and some m of them are connected with arrow in the ER
diagram, key for any one of these m entity sets constitutes a key for the relation to which the
relationship set is mapped. Hence we have m candidate keys, and one of these should be
designated as the primary key.
Did
ssn dname budget
name lot since

Employees Manages Departments

Fig: Key constraints on Manages


So the manages has the attributes ssn, did, since, each department has at most one manager, no
two tuples can have the same did value but differ on ssn value, so did is act as key for manages
relation.
CREATE TABLE Manages(ssn CHAR(11), did INTEGER, since DATE, PRIMARY KEY
(did), FOREIGN KEY (did) REFERRENES Departments, FOREIGN KEY (ssn)
REFERRENCES Employees.
Manages:
ssn did since
Another approach is instead of creating a distinct table for the relationship set, include the
information about the relationship set in the table corresponding to the entity set with key. In this
example, because department has atmost one manager, we can add the key fields of employees
relation denoting the manager and since attribute to the departments relation.

CREATE TABLE Dept_Mgr (ssn CHAR(11), did INTEGER, since DATE, dname
CHAR(10), Budget REAL, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERRENCES
Employees);
Relation Dept_Mgr:
ssn did since dname budget

4. Translating relationship sets with participation constraints to relations

ssn did dname budget


name s
since
lot

Employees Manages Departments

Works_in

since

Fig: Manages and works_in


CREATE TABLE Dept_Mgr (ssn CHAR(11) NOT NULL, did INTEGER, since DATE,
dname CHAR(10), Budget REAL, PRIMARY KEY (did), FOREIGN KEY (ssn)
REFERRENCES Employees ON DELETE NO ACTION);
It captures the participating constraint that every department must have a manager, because of
ssn cannot take on null values, each tuple of Dept_mgr identifies a tuple in employees(who is
manager).An employee tuple cannot be deleted while it is pointed to dept_mgr tuple. If we wish
to delete such an employee tuple, we must first change the dept_mgr tuple to have a new
employee as manager. (we could have specified CASCADE instead of NO ACTION).
There are many participating constraints that we cannot capture using SQL, short of using table
constraints or assertions. To ensure total participation of Departments in works_in we need an
assertion, to guarantee that every did value in departments appears in a tuple of works_in.
The tuples of works_in must also have not null values.
We can also capture participating constraints using key and foreign key constraints if a
relationship set in which all participating entity sets have key constraints and total participation
by mapping all the entities as well as relationship into a single table.
5. Translating weak entity sets into a relation
A weak entity set always participates in a one-to-many binary relationship and has a key
constraint and total participation. Weak entity has only partial key and also when owner entity is
deleted, we must want all the weak entities to be deleted.
Consider dependents weak entity set with a partial key pname. A dependents entity can be
identified uniquely if we take the key of owning employees entity and the pname of dependents
entity, and dependents entity must be deleted if the owning employees entity is deleted.

pname age

lot
Ssn cost
name

Employees Policy Dependents

Fig: Departments weak entity set


CREATE TABLE Dep_Policy (Pname CHAR(20), age INTEGER, cost REAL, ssn
CHAR(11), PRIMARY KEY (pname,ssn), FOREIGN KEY (ssn) REFERRENCES
Employees ON DELETE CASCADE);
6. Translating class hierarchies into a relation
We can map each of the entity sets Employees, hourly_emps, contract_emps to distinct relation.
The relation for hourly_emps includes hourly_wages and hours_worked attributes and also key
attribute of super class ssn.
The relation for contract_emps includes Contractid attributes and also key attribute of super class
ssn. If supper class employee tuple is deleted the delete must cascade to Hourly_emps relation
and contract_emps relation.
Another approach is we can create to relations hourly_emps, contract_emps by including all the
attributes of super class employees entity set along with the its own attributes.
The second approach is not applicable if we have employees who are neither hourly employees
nor contract employees.

name
Ssn lot

Employees

hours_worked
Contract_id

ISA
houlrly_wages

Hourly_Emps Contract_Emps
oyees
Figure: Class Hierarchy
Employees: ssn name lot
Hourly_Empsoyees: (houlrly_wages, hours_worked, Ssn)
Contract_Emps: (Contract_id, ssn)
7. Translating ER diagram with aggregation into relation

Ssn name lot

Employees
8.

Monitors
until

Started_on
pid dname budget
pbudget since did

Projects sponsors Departments

Fig: Aggregation
Employees, Projects, and departments entity sets and sponsors relationship set are
mapped into relations as like Project(pid,stard_on, pbudget), employees(ssn, name, lot),
Dep_sponsors (did, dname, budget, pid since), for monitors relationship set create a
relation with following attributes, key attributes of employees(ssn), key attributes of
sponsors (pid, did), and descriptive attribute of monitors(until).
Monitors(ssn,pi,did,until)
The multivalued attribute is represented by a separate table:

It is not possible to represent in single column, so we should create a new table for Multivalued
attribute of ER model.

Examples on Logical data base design:

Student (Student_ID, Name, DOB, Door, Street, City, State, Pin, Course_Id)

Lecturer (Lecturer_Id, Lecturer_Name, Course_Id)

Subject (Subject_Id, Subject_Name, Lecturer_Id)

Course (Course_Id, Course_Name)


Stud_Hobby(Student_ID, Hobby)

Attends(Student_ID, Course_Id)

Takes (Lecturer_Id, Course_Id)

Teaches (Lecturer_Id, Subject_Id, Student_ID)

Relational Model:

There are some points for converting the ER diagram to the table:

A key attribute of the entity type represented by the primary key.

In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are


the key attribute of the entity.

Entity type becomes a table.

In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE forms individual
tables.

All single-valued attribute becomes a column for the table.

In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the column of STUDENT
table. Similarly, COURSE_NAME and COURSE_ID form the column of COURSE table and so
on.

Composite attribute represented by components.


In the given ER diagram, student address is a composite attribute. It contains CITY, PIN,
DOOR#, STREET, and STATE. In the STUDENT table, these attributes can merge as an
individual column.
Derived attributes are not considered in the table.
In the STUDENT table, Age is the derived attribute. It can be calculated at any point of time by
calculating the difference between current date and Date of Birth.
Using these rules, you can convert the ER diagram to tables and columns and assign the mapping
between the tables.
Tables for the given ER diagram is as below:
Example2:

Relational Model for above ER Model:


Example3:

Relational Model:
Example4:

Relational Model:
Querying Relational Data
A relational database query (query, for short) is a question about the data, and the
answer consists of a new relation containing the result. For example, we might want to
find all students younger than 20.
A query language is a specialized language for writing queries.

SQL is the most popular commercial query language for a relational DBMS. We can
retrieve rows corresponding to students who are younger than 20 with the following SQL
query:

SELECT *
FROM Students S
WHERE S.age < 20

The symbol * means that we retain all fields of selected tuples in the result. S is a variable
that takes on the value of each tuple in Students, one tuple after the other. The condition
S.age < 20 in the WHERE clause specifies that we want to select only tuples in which the
age field has a value less than 20.
SID NAME AGE EMAIL
101 Ramesh 19 ramesh@gmail.com
103 Mahesh 18 mahesh@gmail.com
104 Hareesh 19 Hareesh@gmail.com

Student realtion:
sid name age marks mobileno
101 Ramesh 21 89 2222555222
102 Mahesh 22 98 8890333322
103 Hareesh 30 92 1233421123
104 Suresh 19 95 1234567321

Enrollmont table

sid cid grade


101 Coo2 A
SELECT Sid, Name, Marks FROM Student;

SELECT Sid, Name, Marks FROM Student where age< 20;


sid name marks
104 sureh 95

SELECT Sid, Name FROM Student WHERE marks>90;

sid name
102 Mahesh
103 Hareesh
104 Suresh

Age is <20 and marks>90

SELECT DISTINCT Sname FROM Student;

AND: The AND operator displays a record if all the conditions separated by AND are TRUE.

SELEC * FROM Student WHERE marks>80 AND marks<95;

OR: The OR operator displays a record if any of the conditions separated by OR is TRUE.

SELEC * FROM Student WHERE age>20 OR marks>90;

The students whose age above 20 or got marks higher than 90

SELEC * FROM Student WHERE marks>80 AND marks<90 AN age<20;

Answer: the students who got marks between 80 to 90 and age is less than 20
IN: The IN operator allows you to specify multiple values in a WHERE clause.

The IN operator is a shorthand for multiple OR conditions.

SELECT * FROM STUDENT WHERE name IN ‘ma’,’he’;

NOT: The NOT operator displays a record if the condition(s) is NOT TRUE.

SELECT * FROM STUDENT WHERE name NOT IN ‘ma’;

SELECT * FROM STUDENT WHERE marks NOT IN ‘89’;

sid name age marks mobileno


101 Ramesh 21 89 2222555222
102 Mahesh 22 98 8890333322
103 Hareesh 30 92 1233421123
104 Suresh 19 95 1234567321

BETWEEN: The BETWEEN operator selects values within a given range. The values can be
numbers, text, or dates.

The BETWEEN operator is inclusive: begin and end values are included.

Example:
SELECT * FROM Student WHERE Marks BETWEEN (70, 100)

sid name age marks mobileno


101 Ramesh 21 89 2222555222
102 Mahesh 22 98 8890333322
103 Hareesh 30 92 1233421123
104 Suresh 19 95 1234567321

Enrollment
sid cid grade
101 C1 A
102 C2 B
We can also combine information in the Students and Enrolled relations. If we want to
obtain the names of all students who obtained an A and the id of the course in which they
got an A, we could write the following query:

SELECT name, cid


FROM Students S , Enrollement E
WHERE S. sid = E.sid AND E.grade = ‘A’

LIKE: The LIKE operator is used in a WHERE clause to search for a specified pattern in a
column.

There are two wildcards often used in conjunction with the LIKE operator:

The percent sign ( % ) represents zero, one, or multiple characters

The underscore sign ( _ ) represents one, single character

Example queries:

SELEC * FROM Student WHERE Name LIKE ‘%n’;

SELEC * FROM Student WHERE Name LIKE ‘n%’;

SELEC * FROM Student WHERE Name LIKE ‘n_’;

SELEC * FROM Student WHERE Name LIKE ‘_n’;

SELEC * FROM Student WHERE Name LIKE ‘S_ _ _ _h’;

SELEC * FROM Student WHERE Name LIKE ‘S%h’;

SELEC * FROM Student WHERE Name LIKE ‘%Sh%’;

SELEC * FROM Student WHERE Name LIKE ‘n%’ OR Mobile no LIKE ‘9%’;
Views: a view is a virtual table whose rows are not explicitly stored in the database, but are
computed as needed from view definition. A view also has rows and columns as they are in a
real table in the database. We can create a view by selecting fields from one or more tables
present in the database. A View can either have all the rows of a table or specific rows based
on certain condition.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
view_name: Name for the View
table_name: Name of the table
Condition: Condition to select rows

Example:

CREATE VIEW StudentNames AS


SELECT S_ID, NAME
FROM StudentDetails
ORDER BY NAME;

Consider students and enrolled relations. Suppose if we want to find names, id of students who
got B grade in some course with course id.
CREATE VIEW B-Students (name,sid,course) AS
SELECT S.sname, S.sid, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.Grade=’B’;
Just like table query, we can query the view to view the data.
SELECT * FROM Viewname;
SELECT * FROM B-Students;
UPDATING VIEWS
There are certain conditions needed to be satisfied to update a view. If any one of these
conditions is not met, then we will not be allowed to update the view.

1. The SELECT statement which is used to create the view should not include GROUP
BY clause or ORDER BY clause.
2. The SELECT statement should not have the DISTINCT keyword.
3. The View should have all NOT NULL values.
4. The view should not be created using nested queries or complex queries.
5. The view should be created from a single table. If the view is created using multiple
tables then we will not be allowed to update the view.
We can use the CREATE OR REPLACE VIEW statement to add or remove fields from a
view.
Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column1,coulmn2,..
FROM table_name
WHERE condition;
For example, if we want to update the view MarksView and add the field AGE to this View
from StudentMarks Table, we can do this as:

CREATE OR REPLACE VIEW MarksView AS


SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.MARKS,
StudentMarks.AGE
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
Example:
CREATE VIEW GoodStudents (sid, gpa)
AS SELECT S.sid, S.gpa
FROM Students S
WHERE S.gpa > 3.0
We can modify the Good Students row, we can delete Good Students row by deleting
corresponding row from base table students. We can insert a Good Students row by inserting a
row into base table students.
A tuple in Clubs denotes that the student called mname has been a member of the club cname
since the date jyear.4 Suppose that we are often interested in _nding thenames and logins of
students with a gpa greater than 3 who belong to at least one club, along with the club name and
the date they joined the club.
We can define a view for this purpose:
CREATE VIEW ActiveStudents (name, login, club, since)
AS SELECT S.sname, S.login, C.cname, C.jyear
FROM Students S, Clubs C
WHERE S.sname = C.mname AND S.gpa > 3
Inserting a row in a view:

We can insert a row in a View in a same way as we do in a table. We can use the INSERT
INTO statement of SQL to insert a row in a View.Syntax:
INSERT INTO view_name(column1, column2 , column3,..)
VALUES(value1, value2, value3..);
Deleting a row from a View:
Deleting rows from a view is also as simple as deleting rows from a table. We can use the
DELETE statement of SQL to delete rows from a view. Also deleting a row from a view first
delete the row from the actual table and the change is then reflected in the view.

Syntax:
DELETE FROM view_name
WHERE condition;
Example:
In this example we will delete the last row from the view DetailsView which we just added in
the above example of inserting rows.
DELETE FROM DetailsView
WHERE NAME="Suresh";
Destroying Views
A view can be dropped using the DROP VIEW command, which is just like DROP TABLE.
for example, we would use the following command:
DROP VIEW Viewname;
Consider the instances of Students and Clubs relations
cname Jyear mname

Sailing 1996 Dave


Hiking 1997 Smith
Rowing 1998 Smith
Figure: An Instance C of Clubs

sid name Login age gpa


50000 Dave dave@cs 19 3.3
53666 Jones jones@cs 18 3.4
53688 Smith smith@ee 18 3.2
53650 Smith smith@math 19 3.8
Figure 3.20 An Instance S3 of Students

name Login Club since


Dave dave@cs Sailing 1996
Smith smith@ee Hiking 1997
Smith smith@ee Rowing 1998
Smith smith@math Hiking 1997
Smith smith@math Rowing 1998
Figure 3.21 Instance of ActiveStudents
Relational Algebra:
Relational algebra is one of the two formal query languages associated with the relational
model. A fundamental property is that every operator in the algebra accepts (one or two) relation
instances as arguments and returns a relation instance as the result. the basic operators of the
algebra (selection, projection, union, cross-product,and difference. It describes a step-by-step
procedure for computing the desired answer, based on the order in which operators are applied in
the query.

Consider these instances of relations for writing queries

sid sname rating age sid sname rating age


22 Dustin 7 45.0 28 yuppy 9 35.0
31 Lubber 8 55.5 31 Lubber 8 55.5
58 Rusty 10 35.0 44 guppy 5 35.0
Figure Instance S1 of Sailors 58 Rusty 10 35.0

Figure: Instance S2 of Sailors


sid bid day
22 101 10/10/96
58 103 11/12/96
Figure: Instance R1 of Reserves

Selection(σ): this selection operator is for to select the tuples from a relation. It is denoted by
sigma(σ).

it can be represented as σ p R. Here σ is selection, p is condition, R is relation.

Ex: find the sailors rating is greater than from the instance of sailor’s relation (s2).
σ rating>8 (s2)

id sname rating age


28 yuppy 9 35.0
58 Rusty 10 35.0

2,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,36,39,40,
41,42,43,44,45,46,47,48,49,50,51,52,54,56,57,58,59,60,LE-1,2,3,4,6
Projection(π): this selection operator is for to select the attributes from a relation. It is denoted
by pie(π).it can be represented as πa1,a2,….an (R). Here π is projection, a1, a2,.. etc, attributes of
relation and R is the relation.

Ex: find the sname, rating from instance of sailors S2.

Π sname, rating(S2)
sname Rating
yuppy 9
Lubber 8
guppy 5
Rusty 10
Ex: find the sname, rating of sailors whose rating is greater than 8 from instance of sailors S2.

Π sname, rating(σ rating>8 (s2))


sname Rating
yuppy 9
Rusty 10
Set Operations
The following standard operations on sets are also available in relational algebra:
union(U),intersection(∩),set-di_erence (−), and cross-product (X).
Union: RUS returns a relation instance containing all tuples that occur in either relation instance
R or relation instance S (or both). R and S must be union compatible,and the schema of the result
is defined to be identical to the schema of R.
Two relation instances are said to be union-compatible if the following conditions hold:
{ they have the same number of the fields, and
{ corresponding fields, taken in order from left to right, have the same domains.

sid sname rating age


22 Dustin 7 45.0
31 Lubber 8 55.5
58 Rusty 10 35.0
28 yuppy 9 35.0
44 guppy 5 35.0

Fig S1US2
Intersection: R∩S returns a relation instance containing all tuples that occur in both R and S.
The relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
sid sname rating age
31 Lubber 8 55.5
58 Rusty 10 35.0

Fig: S1∩S2
Set-difference: R−S returns a relation instance containing all tuples that occur in R but not in S.
The relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.

sid sname rating age


22 Dustin 7 45.0
Fig: S1-S2

Cross-product: RXS returns a relation instance whose schema contains all the fields of R (in the
same order as they appear in R) followed by all the fields of S (in the same order as they appear
in S). The result of R _ S contains one tuple hr; si (the concatenation of tuples r and s) for each
pair of tuples r 2 R; s 2 S. The cross-product opertion is sometimes called Cartesian product.

sid sname rating age sid bid day


22 Dustin 7 45.0 22 101 10/10/96
22 Dustin 7 45.0 58 103 11/12/96
31 Lubber 8 55.5 22 101 10/10/96
31 Lubber 8 55.5 58 103 11/12/96
58 Rusty 10 35.0 22 101 10/10/96
58 Rusty 10 35.0 58 103 11/12/96
Fig S1XR1
Rename: rename operator is used to give another name to relation. It is dented by rho(p).
The expression p(R(F),E) it takes an arbitrary relational algebra expression E and returns an
instance of new relation called R. R contains same tuples as the result of E and same schema as
E, but some field are renamed. The field names in relation R are same as in E, except for fields
renamed in the renaming list, which is list term having the form old namenew name or
position  new name.
Syntax: p(C(1sid1,5sid2), S1 X R1)
It returns the relation contains the tuples( sid1:integer, sname:string,rating:integer,age :real,
sid2:integer, bid: integer, day:date).
For example If we want to create a relation student_names with roll_no and name from student it
can be done using rename operator as:
P(student_names,π(rollnoo,name)(student)).
Division: this operator use full when a query contains ‘all’. For example find the names of
sailors who have reserved all boats. It is denoted slash (/).
Consider two relation instances A and B in which A has two fields x and y, and B has just one
field y with same domain as in A. The division operation (A/B) return set of x values for every y
value (a tuple of) B, there is tuple (x, y) in A.
The division operation (A/B) can be defined as
Π x (A)- Π x(( Π x (A) X B) – A).
Ex: consider the relation student (s), course (C)relations and find sid’s of students enrolled in all
or Every course.

Sid cid
S1 C1 cid
S2 C1 C1
S1 C2 C2
S3 C2 Table: course (C)
Table: Enrolled (E)
E/C= Π sid (E)- Π sid(( Π sid (E) X C) – E).

Sid Cid
Π sid (E) X C =
S1 C1
S1 C2
S2 C1
S2 C2
S3 C1
S3 C2

( Π sid (E) X C) – E is
Sid Cid
S2 C1
S3 C1

Π sid(( Π sid (E) X C) – E is

Sid
S2
Π sid (E) is S3

sid
S1
S2
S3

(E/C) = Π sid (E)- Π sid(( Π sid (E) X C) – E)


=s1
Output: S1
Consider relations for the writing queries
sid sname rating age
22 Dustin 7 45.0 si bid Day
29 Brutus 1 33.0 d
31 Lubber 8 55.5 22 101 10/10/98
32 Andy 8 25.5 22 102 10/10/98
58 Rusty 10 35.0 22 103 10/8/98
64 Horatio 7 35.0 22 104 10/7/98
71 Zorba 10 16.0 31 102 11/10/98
74 Horatio 9 35.0 31 103 11/6/98
85 Art 3 25.5 31 104 11/12/98
95 Bob 3 63.5 64 101 9/5/98
Fig: Instance of Sailors (S3) 64 102 9/8/98
74 103 9/8/98

Fig: Instance of Reserves R2

bid bname color


101 Interlake Blue
102 Interlake Red
103 Clipper Green
104 Marine Red
Fig: instance of Boats (B1)
Relational algebra queries on above tables:
Find the boats reserved by sailors with id 567
Πboats((σ sid=567 sailors)x Ressevesxboats)
1. Find the names of sailors who have reserved at least two boats.
ρ (Reservations, , Πsid,Sname,bid(SailorsX Reserves))
ρ (Reservation pairs(1Sid1, 2Sname1,3 bid1, 4 Sid2, 5Sname2, 6bid2),
Reservation X Reservations)
Πsname1( σ Sid1=Sid2) ∩ (Bid1 <>Bid2) Resevation pairs

Find the boats which have at least two reservations by different saiors
Ρ( reseserves, , Πsid,bid,bname,bcolor(sailorsxreservesxboats))
ρ (Reservation pairs(1Sid1,2 bid1,3bname, 4bcolor, 5 Sid2, 6
bi2,7bname, 8bcolor , 4bid2), Reservation X Reservations)
Πboats( σ Sid1< >Sid2) ∩ (Bid1 < >Bid2) Resevation pairs

2. Find the names of sailors who reserved boat 103.


Πsname ((σ bid=103 Reserves) X Sailors)
Or
Πsname (σ bid=103 (Reserves X Sailors))
Or
ρ (Temp1, σ bid=103 Reserves)
ρ (Temp2, Temp1X Sailors)
Πsname (Temp2)

3. Find the names of sailors who have reserved a red a boat.


Πsname (σ color=’red’ Boats) X Reserves X Sailors)
4. Find the colors of boats reserved by Lubber.
Πcolor((σ sname=’Luber’ Sailors) X reserves X boats)
5. Find the names of sailors who have reserved at least one boat.
Πsname (Sailors X Reserves)
6. Find the names of sailors who have reserved a red or green boat.
ρ (Tempboats,( σ color=’red’Boats) U ,( σ color=’green’Boats))
Πsname (Tempboats X Reserves X Sailors)
Or
ρ (Tempred, Πsid ((σ color=’red’ Boats) X Reserves ))
ρ (Tempgreen, Πsid ((σ color=’green’ Boats) X Reserves ))
Πsname((Tempred U Tempgreen) X Sailors)
7. Find the names of sailors who have reserved a red and green boat.
ρ (Tempboats( σ color=’red’Boats) ∩ ( σ color=’green’Boats))
Πsname (Tempboats X Reserves X Sailors)
8. Find the names of sailors who have reserved at least two boats.
ρ (Reservations, , Πsid,Sname,bid(SailorsX Reserves))
ρ (Reservation pairs(1Sid1, 2Sname1,3 bid1, 4 Sid2, 5Sname2, 6bid2),
Reservation X Reservations)
Πsname1( σ Sid1=Sid2) ∩ (Bid1 <>Bid2) Resevation pairs
9. Find the Sid’s of Sailors with age over 20 and who have not Reserved a red boat.
Πsid(σ age>20 Sailors)- Πsid((σ color=’red’Boats) X Reserves X Saulors)
10. Find names of sailors who have reserved all boats.
ρ (TempSid,( ΠSid,Bid Reserves)/ (ΠBid Boats))
Πsname(TempSid X Sailors)
11. Find the names of sailors who have reserved all boats called Interlake.
ρ (TempSid,,( ΠSid,Bid Reserves)/ (ΠBid (σ bname=’Interlake Boats))
Πsname(TempSids X Sailors)

Relational calculus: it is a non procedural language, it provides only description of the query,
but does not provide the methods to solve it. That is it explains what to do, not how to do.
It has two variants
Tuple relational calculus: a tuple variable is a variable that takes the tuples of particular
relation schema as values. Every value assigned to a given tuple variable has same nuber and
type of fields. A TRC a query can be expressed as {T|P (T)}. Where T is tuple variable, P(T) is a
predicate and these are conditions to fetch T. it returns the set of tuples T, such that predicate
P(T) is true for T.
P (T) may have various conditions
AND(∩), OR(V),NOT(┐), implise(
It also uses quantifiers
There exist Tє r(Q(T)), there exist a tupe T in relation r such that predicate Q(t) is true.
For all Tє r(Q(T)), q(t) is true for all tuples in relation r.
In relational calculus there exist and for all are said to bind the variables R.
A variable is said to be free in a formula or sub formula does not contain an occurrence of a
quantifier that binds it. Every variable in a TRC formula appears in a subformula that is atomic.
Syntax of TRC queries:
Let Rel be the relation name, R and S are tuple variables, a be the attributes of R, and b be the
attribute of S.
 R€ Rel
 R.a op S.b ( here op= <,>,= ,….etc)
 R.a op constant

Semantics of TRC Queries:


A tuple value F evaluates to true if one of the following holds:
 F is an atomic formula R€ Rel, R is assigned a tuple in the instance of relation Rel.
 F is a comparison R.a op S.b, R.a op constant, or constant op R.a, and the tuples assigned
to R and S have the field values R.a and S.b makes the comparison true.
 F is of the form negation P and p is not true, or of the form P∩Q, both p and q are true,
PUQ and one of them or true, P implise Q and q is true when p is true.
 F is of the form ,There exist R(p(R)) p® is true.
 F is of the form for all R(p(R)) and there is some assignment of tuple to the free variables
in p® that makes the formula p(R) true
Tuple relational calculus Queries on above tables:

1. Find sailors with the rating above 7.

{S|S є Sailors c S.rating>7}


2. Find the names and ages of sailors with a rating above 7.

{P|∃ S є Sailors (S.rating>7 ∩ P.sname=S.sname ∩ P.age=S.age)}


12. Find the names of sailors who have reserved a red or green boat.

{P|∃ S є Sailors,B є boats, R є Reserves ((B.color=’red’ U (B.color=green)) (S.sid=


R.sid ∩ R.bid=B.bid) ∩ P.sname=S.sname }

3. Find the sailor name, boat id and reservation date for each reservation.
{P| ∃ R є Reserves ∃ S є Sailors ( R.sid= S.sid ∩ P.bid= R.bid ∩ P. day=R. day ∩
P.sname= S.sname)}
4. Find the names of sailors who have reserved boat 103.
{P|∃ S є Sailors ∃ R є Reserves(R.sid= S.sid ∩ R.bid= 103 ∩ P.sname=S.sname)}
5. Find the names of sailors who have reserved a red boat.
{P|∃ S є Sailors ∃ R є Reserves(R.sid= S.sid ∩ P.sname=S.sname) ∩ ∃ B є
boats(B.bid=R.bid ∩ B.color=’red’)}
6. Find the names of sailors who have reserved at least two boats.
{P|∃ S є Sailors ∃ R1 є Reserves ∃ R2 є Reserves( S.sid=R1.sid ∩ R1.sid= R2.sid ∩
R1.bid != R2.bid ∩ P.sname=S.sname)}
7. Find the names of sailors who have reserved all boats.
{P|∃ S є Sailors ∀ B є boats ∃ R є Reserves(S.sid= R.sid ∩ R.bid=B.bid ∩
P.sname=S.sname)}
8. Find the names of sailors who have reserved all red boats.
{P|∃ S є Sailors ∀ B є boats (B.color=’red’ ⇒(∃ R є Reserves (S.sid= R.sid ∩
R.bid=B.bid))}
Or
{P|∃ S є Sailors ∀B є boats(B.color !=’red’ ∪(∃ R є Reserves (S.sid= R.sid ∩
R.bid=B.bid))}
Domain relational calculus: a domain variable is variable that ranges over the values in the
domain of some attribute.
DRC query can be expressed as {(x1, x2,……, xn )|P(x1, x2,……, xn )}, where x1, x2,……, xn represents
the domain variables, P(x1, x2,……, xn ) represents the condition or formula equivalent to the
predicate calculus.
The predicate calculus formula as TRC it contains:
1. Set of all comparison operators
2. Set of connectivity operators like AND,OR,NOT
3. Set of Quantifiers.
Domain relational calculus Queries on above tables:
1. Find all sailors with a rating above 7.
{(I,N,T,A)| <I,N,T,A> є Sailors ∩ T>7}
2. Find the names of sailors who have reserved boat 103.
{<N> | ∃ I,T,A <I,T,N,A> є Sailors ∩ ∃ Ir, Br, D <Ir, Br, D> є Reserves ( Ir= I ∩ Br=103)}
3. Find the names of sailors who have reserved a red boat.
{<N>|∃ I,T,A <I,T,N,A> є Sailors ∩ <Ir, Br, D> є Reserves ∩ ∃(Br, Bn, ‘red’) є Boats)}
4. Find the names of sailors who have reserved at least two boats.
{<N>|∃ I,T,A <I,T,N,A> є Sailors ∩ ∃ Br1, Br2, D1, D2(I,Br1, D1) є Reserves ∩ (I,Br2, D2) є
Reserves ∩ Br1!= Br2)}
5. Find the names of sailors who have reserved all boats.
{<N> | ∃ I,T,A <I,T,N,A> є Sailors ∩ ∀ B, Bn, C(¬ (B, Bn, C) ) є Boats) ∪ ∃ (<Ir, Br, D> є
Reserves (I=Ir ∩ Br= B)}
6. Find names of sailors who have reserved all red boats.
{(I,N,T,A)| <I,N,T,A> є Sailors ∩ ∀ (B, Bn, C) є Boats (c=’red’ ⇒ ∃ Ir, Br, D <Ir, Br, D> є
Reserves(I=Ir ∩ Br=B)}
Unit-III
SQL: QUERIES, CONSTRAINTS, TRIGGERS: form of basic SQL query, UNION,
INTERSECT, and EXCEPT, Nested Queries, aggregation operators, NULL values, complex
integrity constraints in SQL, triggers and active data bases.
Schema Refinement: Problems caused by redundancy, decompositions, problems related to
decomposition, reasoning about functional dependencies, FIRST, SECOND, THIRD normal
forms, BCNF, lossless join decomposition, multi-valued dependencies, FOURTH normal form,
FIFTH normal form.
SQL: SQL (structured query language) is most widely used commercial database language. It
was developed by IBM in 1970’s, it has several aspects:
DDL(Data Definition Language): this subset of SQL supports the creation, deletion,
modification of tables and views, allows to specify the integrity constraints on tables.
DML(Data Manipulation Language): it allows the user to pose the queries and to insert, delete
and modify the rows on a table.
Embedded and Dynamic SQL: Embedded SQL features allows user to call SQL code from a
host language such as C or COBOL. Dynamic SQL allows a query to construct at runtime.
Trigger: the new SQL1999 standard supports the triggers. Which are the actions executed by
DBMS whenever changes to the database meet the conditions specified in the trigger.
Security: SQL provides the mechanisms to control access to data objects such as tables and
views.
Transaction management: it allows a user to explicitly control how a transaction is to be
executed.
Form of a Basic SQL Query:

SQL is a programming language for Relational Databases. It is designed over relational algebra
and tuple relational calculus. SQL comprises both data definition and data manipulation
languages. Using the data definition properties of SQL, one can design and modify database
schema, whereas data manipulation properties allows SQL to store and retrieve data from
database.
The basic form of SQL query is
SELECT [Distinct] Select-list
FROM From-list
WHERE Condition;
Example: SQL Query for to get the names of students who got marks above 60.
SELECT sname
From student
Where marks>60
This query corresponds to a relational algebra expression involves selection, projection, and
cross product.
The SELECT clause specifies which columns to be retained in the result.
Select-list: it specifies the list of column names.
The FROM clause specifies the cross product of tables
From-list: list of table names.
WHERE clause specifies the selection condition on the tables mentioned in the FROM clause.
Conceptual evaluation strategy:
 Computes the cross product of tables in the from-list.
 Deletes the rows from the cross product that fails the condition.
 Delete the column that does not appear in the select-list.
 Eliminates the duplicate rows.
In this we wrote the queries using following table definitions.
Sailors (sid: integer, sname: string, rating: integer, age: real)
Boats ( bid: integer, bname: string, color: string)
Reserves( sid: integer, bid: integer, day: date)
Id sname rating Age Sid bid Day
22 Dustin 7 45.0 22 101 10/10/98
29 Brutus 1 33.0 22 102 10/10/98
31 Lubber 8 55.5 22 103 10/8/98
32 Andy 8 25.5 22 104 10/7/98
58 Rusty 10 35.0 31 102 11/10/98
64 Horatio 7 35.0 31 103 11/6/98
71 Zorba 10 16.0 31 104 11/12/98
74 Horatio 9 35.0 64 101 9/5/98
85 Art 3 25.5 64 102 9/8/98
95 Bob 3 63.5 74 103 9/8/98
Fig instance of sailors s3
Figure: An Instance R2 of Reserves

bid bname color


101 Interlake blue
102 Interlake Red
103 Clipper green
104 Marine Red
Fig: Instance of boats b1
Examples of Basic SQL queries:
1. Find the names and ages of all sailors.
SELECT S.sname, S.age
FROM Sailors S
2. Find all sailors with rating above 7?
SELECT *
FROM Sailors S
WHERE S.rating>7;
3. Find the names of sailors who reserved boat number 103?
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid= R.sid AND R.bid=103;
4. Find Sid’s of sailors who reserved a red boat?
SELECT Sid
FROM Reserves, Boats
WHERE Rserves.bid= Boates.bid AND Boats. color=’red’;
Or
SELECT S.sid
FROM Reserves R, Boats B
WHERE R.bid= B.bid AND B. color=’red’;
5. Find the colors of boats reserved by sailor name Lubber?
SELECT B. color
FROM Reserves R, Boats B, Sailors S
WHERE S.sid=R.sid AND R.bid=B.bid AND S.sname= ‘Lubber’;
6. Find the Names of sailors who have reserved at least one boat?
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid= R.sid;
7. Find the names of sailors who have reserved at least two boats on same day?
SELECT S.sname
FROM Sailors S, Reserves R1, Reserves R2
WHERE (S.sid=R1.sid AND R1.sid=R2.sid AND R1.bid < > R2.bid AND R1.day=
R2.day);
8. Find the ages of sailors whose name begins and ends with B and has at least three characters.
SELECT S.age
FROM Sailors S
WHERE S.sname LIKE `B %B'.
9. Compute increments for the ratings of persons who have sailed two different boats on the
same day.
SELECT S.sname, S.rating+1 AS rating
FROM Sailors S, Reserves R1, Reserves R2
WHERE S.sid = R1.sid AND S.sid = R2.sid
AND R1.day = R2.day AND R1.bid <> R2.bid
Set operators:
SQL provides three set manipulation constructs (UNION, INTERSECTION, EXCEPT) that
extend the basic form of sql query, these constructs are used when answer to query is multiset of
rows. SQL also provides other set operations: IN, op ANY, op ALL, EXIST, NOT IN, NOT
EXIST. These operators are used to join the results of two (or more) SELECT statements
UNION: this UNION operator, it returns the combined result from all the compounded
SELECT queries, after removing all duplicates and in sorted order (ascending by default),
without ignoring the NULL values.
Example: Find the names of sailors who have reserved a red or a green boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = `red'
UNION
SELECT S2.sname
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = `green'.
Example: Find all Sid’s of sailors who have a rating of 10 or have reserved boat 104.
SELECT S.sid
FROM Sailors S
WHERE S.rating = 10
UNION
SELECT R.sid
FROM Reserves R
WHERE R.bid = 104
INTERSECT: INTERSECT operator, displays the common rows from both the SELECT
statements, with no duplicates and data arranged in sorted order.
Example: Find the names of sailors who have reserved a red and a green boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = `red'
INTERSECT
SELECT S2.sname
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = `green'
EXCEPT: It displays the rows which are present in the first query but not presented in the
second query, with no duplicates and data arranged in ascending order
Example: Find the Sid’s of all sailors who have reserved red boats but not green boats.
SELECT S.sid
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = `red'
EXCEPT
SELECT S2.sid
FROM Sailors S2, Reserves R2, Boats B2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = `green'
Nested Queries: A nested query is a query that has another query embedded within it; the
embedded query is called a sub query. In this inner query is executed first later the outer query
will be executed. A sub query typically appears within the WHERE clause of a query, sometimes
appear in the FROM clause or the HAVING clause.
Example: Find the names of sailors who have reserved boat 103.
SELECT S.sname
FROM Sailors S
WHERE S.sid IN (SELECT R.sid
FROM Reserves R
WHERE R.bid = 103)

Example: Find the names of sailors who have reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = `red' )
Example: Find the names of sailors who have not reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid NOT IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = `red' )
Correlated Nested Queries:
Correlated sub queries are used for row-by-row processing. Each sub query is executed once for
every row of the outer query.
Example: Find the names of sailors who have reserved boat number 103.
SELECT S.sname
FROM Sailors S
WHERE EXISTS ( SELECT *
FROM Reserves R
WHERE R.bid = 103 AND R.sid = S.sid)
Set-Comparison Operators:
SQL support the set operators EXIST, IN, and UNIQUE it also supports op ANY and op ALL,
where op is one of the arithmetic comparison operators (<;<=;=; <>;>=;>). SOME is also
available, but it is just a synonym for ANY.
The ANY and ALL operators are used with a WHERE or HAVING clause.
The ANY operator returns true if any of the subquery values meet the condition.
The ALL operator returns true if all of the subquery values meet the condition.
SOME: This operator is used to compare a value with a single column set of values returned by
the subquery. The SOME operator in SQL must match at least one value in a subquery and that
value must be preceded by comparison operators.

Generally we will use this SOME operator in WHERE clause to check whether the required
column values are matching with the set of values returned by subquery or not.

Example: write a Query to find the sailors whose rating of is greater than 8.

SELECT * FROM Sailors WHERE rating> SOME


(SELECT Sid FROM sailors WHERE rating >8)
Example: Find sailors whose rating is better than some sailor called Horatio.
SELECT S.sid
FROM Sailors S
WHERE S.rating > ANY (SELECT S2.rating
FROM Sailors S2
WHERE S2.sname = `Horatio’)
The sub query computes all rating values in sailors. The outer query WHERE condition is
satisfied when S.rating is greater than or equals to each of these rating values.
Example: find the sailors with highest rating.
SELECT S.sid
FROM Sailors S
WHERE S.rating >= ALL (SELECT S2.rating
FROM Sailors S2)
Example: find the names of sailors who has reserved a red and green boat.
SELECT S3.sname
FROM Sailors S3
WHERE S3.sid IN ((SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = `red’)
INTERSECT
(SELECT R2.sid
FROM Boats B2, Reserves R2
WHERE R2.bid = B2.bid AND B2.color = `green’))
Or
SELECT S3.sname
FROM Sailors S3
WHERE S3.sid IN (( SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = `red' )
INTERSECT
(SELECT R2.sid
FROM Boats B2, Reserves R2
WHERE R2.bid = B2.bid AND B2.color = `green' ))
Example: Find the names of sailors who have reserved all boats.
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS (( SELECT B.bid
FROM Boats B )
EXCEPT
(SELECT R.bid
FROM Reserves R
WHERE R.sid = S.sid ))
Aggregate Operators:
These operators are used to perform the calculation on the multiple rows of the column of a
table. SQL supports five aggregate operations
COUNT (A): it returns the number of (unique) values in column A.
Example: Count the number of sailors.
SELECT COUNT (*)
FROM Sailors S
Example: Count the number of different sailor names.
SELECT COUNT (DISTINCT S.snmae)
FROM Sailors S
SUM (A): it returns the sum of all the values in column A.
Example: Find the sum of ages of all sailors.
SELECT SUM (S.age)
FROM Sailors S
AVG (A): it returns the the average of all values in column A.
Example: Find the average age of sailors with a rating of 10.
SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating = 10
MAX (A): it returns the maximum value in the column A.
Example: Find the name and age of the oldest sailor.
SELECT S.sname, MAX (S.age)
FROM Sailors S
Or
SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX (S2.age)
FROM Sailors S2 )
MIN (A): it returns the minimum value in column A.
Ex: Find the name and age of the youngest sailor.
SELECT S.sname, MIN (S.age)
FROM Sailors S
Example: Find the names of sailors who are older than the oldest sailor with a rating of 10.
SELECT S.sname
FROM Sailors S
WHERE S.age > ( SELECT MAX ( S2.age )
FROM Sailors S2
WHERE S2.rating = 10)
Group by Having Clause:
The aggregate operators can be applied on all the rows of a relation. We want to apply aggregate
operations on groups of rows in relation then this having clause can be used.
The HAVING clause was added to SQL because the WHERE keyword could not be used with
aggregate functions.
ORDER BY:
The ORDER BY keyword is used to sort the result-set in ascending or descending order.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in
descending order, use the DESC keyword.
Syntax:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
Example: find the age of the youngest sailor for each rating level.
SELECT S.rating, MIN (S.age)
FROM Sailor S
Group By S.rating
HAVING SID>58
Example: find the age of the youngest sailor who is eligible to vote for each rating level with
at least two such categories in saaending order.
SELECT S.rating, MIN(S.age) AS minage
FROM Sailors S
WHERE S.age>18
GROUP BY S.rating
HAVING COUNT(*)>1
ORDER By Sid.
Example: for each red boat, find the number of reservations this boat.
SELECT B.bid, COUNT (*) AS reservation count
FROM Boats B, Reserves R
WHERE R.bid==B.bid
GROUP BY B.bid
Having B.color=’red’
Example: find those ratings for which the average age of sailors is the minimum over the
all ratings.
SELECT S.rating
FROM Sailors S
WHERE AVG (S.age) = (SELECT MIN(S2.age))
FROM Sailors S2
Group By S2.rating)
Null Values:
A field with a NULL value is a field with no value. If a field in a table is optional, it is possible
to insert a new record or update a record without adding a value to this field. Then, the field will
be saved with a NULL value.
We can disallow null values by specifying NOT NULL as part of field definition for example
Sname CHAR(20) NOT NULL. There is an implicit NOT NULL constraint for every field listed
in a PRIMARY KEY Constraint.
IS NULL and IS NOT NULL operators used to test the null values.
Syntax: IS NULL
SELECT column_names
FROM table_name
WHERE column_name IS NULL;
Syntax: IS NOT NULL
SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;
Example find the Sid, Sname of sailors whose name is NULL.
SELECT Sid, Sname
FROM Sailors
WHERE Sname IS NULL;
Joins:
A SQL Join statement is used to combine data or rows from two or more tables based on a
common field between them.

Example

Consider the following two tables.


Table 1 − CUSTOMERS Table is as follows.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00
Table 2 − ORDERS Table is as follows.
+-----+---------------------+-------------+--------+
| OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060

Different types of Joins are:


 INNER JOIN
 LEFT JOIN
 RIGHT JOIN
 FULL JOIN

INNER JOIN: The INNER JOIN creates a new result table by combining column values of two
tables (table1 and table2) based upon the join-predicate.

SELECT ID, NAME, AMOUNT, DATE


FROM CUSTOMERS
INNER JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

ID | NAME | AMOUNT | DATE |


+----+----------+--------+---------------------+
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+----+----------+--------+--------------------
LEFT JOIN: returns all rows from the left table, even if there are no matches in the right table.
This means that if the ON clause matches 0 (zero) records in the right table; the join will still
return a row in the result, but with NULL in each column from the right table.

SELECT ID, NAME, AMOUNT, DATE


FROM CUSTOMERS
LEFT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result
+----+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+----+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL

RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left
table. This means that if the ON clause matches 0 (zero) records in the left table; the join will
still return a row in the result, but with NULL in each column from the left table.
This means that a right join returns all the values from the right table, plus matched values from
the left table or NULL in case of no matching join predicate
SQL> SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00
FULL JOIN: combines the results of both left and right outer joins.
The joined table will contain all records from both the tables and fill in NULLs for missing
matches on either side.
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
FULL JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00
Complex integrity constraints in SQL:
Integrity constraints over a single table:
We can specify complex integrity constraints over a single table using table constraints, which
have the form CHECK conditional-expression. When a row is inserted into table or an Existing
row is modified, the conditional expression in CHECK constraint is evaluated. If it evaluates to
false, the command is rejected.
Example:
Ensure that rating must be an integer in the range 1 to 10 only.
CREATE TABLE Sailors( sid INTEGER,
Sname CHAR(10),
Rating INTEGER,
Age REAL, PRIMARY KEY (Sid),
CHECK (rating >=1 AND rating <=10));
Restrict values:
Example: enforce the constraint that Interlake boats cannot be reserved
CREATE TABLE Reserves (Sid INTEGER,
Bid INTEGER (10),
Day DATE,
FOREIGN KEY (sid) REFERENCES Sailors,
FOREIGN KEY (bid) REFERENCES Boats,
CONSTRAINT noInterlakes
CHECK (‘Interlake’ < > (SELECT B.bname
FROM Boats B
WHERE B.bid=Reserves.bid)))
Domain Constraints and distinct types:
A user can define a new domain using the CREATE DOMAIN statement, which uses check
constraints.
CREATE DOMAIN ratingval INTEGER DEFAULT 1
CHECK (VALUE >=1 AND VALUE <=10)
Assertions:
Table constraints are associated with a single table, although the conditional expression in the
CHECK clause can refer to other tables. Table constraints are required to hold only if the
associated table is non empty. Thus when a constraint involves two or more tables, the table
constraint mechanism is not quite desired. To cover such situations, SQL supports the creation of
assertions. Assertions are constraints not associated with any one table.
For example, suppose that we wish to enforce the constraint that the number of
boats plus the number of sailors should be less than 100.
When a constraint involves two or more tables assertions are used.

Ex: CREATE TABLE Sailors (Sid INTEGER,


Sname CHAR (10),
Rating INTEGER,
Age REAL,
PRIMARY KEY (sid),
CHECK (rating>=1 AND rating <=10)
CHECK ((SELECT COUNT (s.sid) FROM Sailors S) + (SELECT COUNT (B.bid) FROM
Boats B) < 100));
This solution is suffers from two draw backs, it iis associated with sailors, although it involves
boats in a completely symmetric way. More important is if the sailors table is empty, this
constraint is defined to hold, even if we have more than 100 rows in Boats! We could extend this
constraint specification to check that Sailors is nonempty but this approach becomes
cumbersome. The best solution is creating an assertion as follows:
CREATE ASSERTION SmallClub
CHECK ((SELECT COUNT (S.sid) FROM Sailors S)
+ (SELECT COUNT (B.bid) FROM Boats B) < 100)
Example2: the salary of employee must not be greater than the salary of the manager of the
department that the employee works for.
CREATE ASSERTION SALARY_CONSTRINT
CHECK (NOT EXISTS (SELECT * FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT
D
WHERE (E.SALARY> M. SALARY AND E. DNO=D.NUMBER AND d. MGRSSN=M.SSN))
Triggers and active databases:
A trigger is a stored procedure in database which is automatically invoked in response to
specified changes to the database. The database that has a set of associated triggers is called an
active database.
Triggers are invoked when the database is modified, when a row is inserted into a specified table,
when column of certain table is updated.
A condition in a trigger is true or false statement. It equals to true then the action associated with
the trigger is executed.
Trigger description consists of three parts.
Event: A change to the database that activates the trigger. Trigger monitors’ the database and is
executed when a database is modified in way that matches the event specification.
Condition: It is a true or false statement, when the condition evaluates to true, the action
associated the trigger is activated.
Action: A procedure that is executed when the trigger is activated and condition is true.
Syntax:
CREATE [OR REPLACE ] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF }
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
WHEN (condition)
DECLARE
Declaration-statements
BEGIN
Executable-statements
EXCEPTION
Exception-handling-statements
END;
Example: Database contains the relation student (Sid, sname, Subject1, Subject2, Subject3,
total ), create trigger to compute total when insert the subject values into a student relation.
Create Trigger T1
BEFORE INSERT
ON Student
For each row
Set new.total = new.subject1 + new.subject2 + new.subject3;

Example: Database contains the CUSTOMERS table (id, name, age, address, salary),
creates a row-level trigger for the customers table to display the salary difference between
the old values and new values, that would fire for INSERT or UPDATE or DELETE
operations performed on the CUSTOMERS table.

CREATE OR REPLACE TRIGGER display_salary_changes


BEFORE DELETE OR INSERT OR UPDATE ON customers
FOR EACH ROW
WHEN (NEW.ID > 0)
DECLARE
sal_diff number;
BEGIN
sal_diff := :NEW.salary - :OLD.salary;
dbms_output.put_line('Old salary: ' || :OLD.salary);
dbms_output.put_line('New salary: ' || :NEW.salary);
dbms_output.put_line('Salary difference: ' || sal_diff);
END;
/

Drop Trigger: we can remove the trigger on database using drop command.
Syntax: DROP TRIGGER [IF EXISTS] [schema_name.]trigger_name
Example: DROP TRIGGER [IF EXISTS] [customers] display_salary_changes
Active Databases:
Active Database is a database consisting of set of triggers. These databases are very difficult to be
maintained because of the complexity that arises in understanding the effect of these triggers. In
such database, DBMS initially verifies whether the particular trigger specified in the statement that
modifies the database is activated or not, prior to executing the statement.
If the trigger is active then DBMS executes the condition part and then executes the action part
only if the specified condition is evaluated to true. It is possible to activate more than one trigger
within a single statement.
In such situation, DBMS processes each of the trigger randomly. The execution of an action part of
a trigger may either activate other triggers or the same trigger that Initialized this action. Such
types of trigger that activates itself is called as ‘recursive trigger’. The DBMS executes such chains
of trigger.

Features of Active Database:


 It possess all the concepts of a conventional database i.e. data modeling facilities, query
language etc.
 It supports all the functions of a traditional database like data definition, data
manipulation, storage management etc.
 It detects event occurrence.
 It must be able to evaluate conditions and to execute actions.
 It means that it has to implement rule execution.
 Enhances traditional database functionalities with powerful rule processing capabilities.
 Enable a uniform and centralized description of the business rules relevant to the
information system.
 Suitable platform for building large and efficient knowledge base and expert systems.
Schema Refinement: It is the last step before considering physical design; it checks the tables
for redundancies and addresses the problems caused by redundancy based on decomposition.
Redundancy means storing the multiple copies of same data in the database, it leads to several
problems.

For example consider Hourly_Emps relation, ssn is the key for this relation, and hourly wages
attribute is determined by rating attribute. That is, for a given rating value, there is only one
permissible hourly_wages value. This integrity constraint is an example of functional
dependency. It leads to possible redundancy in the relation Hourly_Emps.

Redundant storage: the rating value 8 corresponds to the hourly wage 10, this association
repeated three times, the rating value 5 corresponds to the hourly wage 7, this association
repeated two times.

Fig: instance of Hourly_Emps relation

Problems caused due to redundancy are:


Insertion anomalies: it may not possible to store certain information unless some other,
unrelated, information is stored as well.
For example we cannot insert a tuple for an employee unless we know the hourly wages for the
employee’s rating value.
Deletion anomalies: it may not possible to delete certain information without losing some other,
unrelated, information as well.
For example if we delete all tuples with a given rating value we lose the association between that
rating value and its hourly_wage value.
Update anomalies: if one copy of such repeated data is updated, then database will be in
inconsistent state, unless all copies are similarly updated.
For example the hourly_wages in the first tuple (row) could be updated without making a similar
change in the second tuple.

Decompositions: Many problems arising from redundancy can be addressed by replacing a


relation with collection of smaller relations. This decomposition is used to eliminate the
redundancy from relation and to achieve heir normal form.

A decomposition of a relational schema R consists of replacing the relation schema by two(or


more) relation schemas that each contains a subset of the attribute of R and together include all
attributes in R.

Example: we can decompose Hourly_Emps into two relations:

hourly_emps2 (ssn,name,lot,rating,hpurs_worked) and wages( rating, hourly_wages). Where ssn


number acts as primary key for relation hourly_emps2, rating acts as primary key for wages
relation.

Fig: instance of Hourly_Emps relation


Fig: instance of Hourly_Emps2 and Wages relations

Problems related to decomposition: decomposition relation schema can create more problems
than it solve. So we have to understand when we normally decompose a table into n number of
sub tables we have to realize that what the importance of doing decomposition is and problems
that might be we may if we do decomposition.

Two questions must be asked to do decomposition


1. Do we need to decompose a relation?
Several normal forms have been proposed for the relations, the normal form of given relation
schema help us to decide whether or not to decompose it further. If we decide that a relation
schema must be decomposed further, we must choose a particular decomposition.
2. What problems does a given decomposition cause?
Decomposition of a relation is done when a relation in relational model is not in appropriate
normal form. Relation R is decomposed into two or more relations if decomposition is lossless join
as well as dependency preserving.
Two properties of decomposition are important.

Lossless-join decomposition: A decomposition {R1,R2,……..Rn} of relation R is called loss


less decomposition if the natural join of {R1,R2,……..Rn} produces exactly the relation R.
Decomposition is lossless if R1 ⋈ R2 = R

A decomposition is to be lossless it must hold the following conditions.

1. Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of R


must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. Intersection of Attributes of R1 and R2 must not be NULL.
Att(R1) ∩ Att(R2) ≠ Φ
3. Common attribute must be a key must be super key or candidate key of either of the
relation R1 or relation R2 or both R1 and R2
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)

Example: Relation R(ABC) is decomposed into R1(AB) and R2(BC) check whether the
decomposition is lossy or loss less decomposition.

R(ABC)=

A B C
1 2 1
2 2 2
3 3 2
R1(AB)=

A B
1 2
2 2
3 3

R2(BC)=

B C
2 1
2 2
3 2

R1 X R2=

A B B C
1 2 2 1
1 2 2 2
1 2 3 2
2 2 2 1
2 2 2 2
2 2 3 2
3 3 2 1
3 3 2 2
3 3 3 2
R1 ⋈ R2 =

A B C
1 2 1
1 2 2
2 2 1
2 2 2
3 3 2
R1 ⋈ R2 != R, This decomposition is lossy decomposition.

Example: Relation R(ABC) is decomposed into R1(AB) and R2(AC) check whether the
decomposition is lossy or loss less decomposition.

R(ABC)=

A B C
1 1 1
2 1 2
3 2 1
4 3 2
R(ABC) is decomposed into R1(AB) and R2(AC)

R 1=

A B
1 1
2 1
3 2
4 3
R2=

A C
1 1
2 2
3 1
4 2
R1 ⋈ R2 = R

A B C
1 1 1
2 1 2
3 2 1
4 3 2
This decomposition is loss less.

Dependency preserving decomposition:

The decomposition of relation schema R with set of FD’s, F into R 1 an R2 with FD’s F1 and F2
then this decomposition is said to be dependency preserving, the closure of set of functional
dependency set (F) is equals to the closure of the functional dependency sets F1 and F2.

The decomposition is said to be dependency preserving if F+=(F1 U F2)+ .


Example: A relation R (A, B, C, D) with FD set{AB, BC, CD, D B} is decomposed
into R1(AB) and R2(BC) ,R3(BD) check this decomposition dependency preserved or not.
The R (A, B, C, D)
Set of FD’s F= {AB, BC, CD, D B}
F+= {AB, BC, CD, DB, BD,CB}
R is decomposed into R1(AB) , R2(BC), R3(BD).
From R1(AB) we can derive functional dependency set F1={ AB}
From R2(BC) we can derive functional dependency set F 2={ BC , CB(from CD and D
B)}
From R3(BD) we can derive functional dependency set F3={ DB, B D(from BC and CD
),}
(F1 U F2 U F3)+ = {AB, BC , CB, B D, DB} = F+
This decomposition is dependency preserved decomposition.
Example: A relation R (A, B, C, D) with FD set{AB, CD } is decomposed into R1(AC)
and R2(BD) check this decomposition dependency preserved or not.
Relation R (A, B, C, D) and
The set of Functional dependencies of relation R is F= {AB, CD}
R is decomposed into R1 (A,C) and R2(BD)
From R1(AC) we can derive functional dependency set F1={ }
From R2(BD) we can derive functional dependency set F2 ={}
(F1 U F2)+= { } != F+
This decomposition is not dependency preserved.
Functional Dependency:
The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table. Functional Dependency is
represented by  (arrow sign).
If column X of a table uniquely identifies the column Y of same table then it can represented as
XY (Attribute Y is functionally dependent on attribute X, here attribute X determinant, Y is
dependent)
A functional dependency XY in a relation holds if two tuples having same value of attribute X
also have same value for attribute Y.
For Example, in relation STUDENT shown in table 1, Functional Dependencies
STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE hold
But
STUD_NAME->STUD_ADDR do not hold

Types of Functional dependency

Trivial functional dependency

A functional dependency A → B has trivial functional dependency if B is a subset of A.

Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial de
pendencies too.
2. Non-trivial functional dependency
A → B has a non-trivial functional dependency if B is not a subset of A.

When A intersection B is NULL, then A → B is called as complete non-trivial.

Example:

ID → Name,
Name → DOB

Functional Dependency Set: Functional Dependency set or FD set of a relation is the set of all
FDs present in the relation.

For Example, FD set for relation STUDENT shown in table 1 is:


{STUD_NOSTUD_NAME, STUD_NOSTUD_PHONE, STUD_NOSTUD_STATE,
STUD_NOSTUD_COUNTRY,
STUD_NO  STUD_AGE, STUD_STATESTUD_COUNTRY}
Attribute Closure: Attribute closure of an attribute set can be defined as set of attributes which
can be functionally determined from it.
To find attribute closure of an attribute set:
 Add elements of attribute set to the result set.
 Recursively add elements to the result set which can be functionally determined from the
elements of the result set.
Using FD set of table 1, attribute closure can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY,
STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
Candidate Keys and Super Keys using Attribute Closure
Super key: If attribute closure of an attribute set contains all attributes of relation, the attribute
set will be super key of the relation.
Candidate Key: If no subset of this attribute set can functionally determine all attributes of the
relation, the set will be candidate key as well.
For Example, using FD set of table 1,

(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME, STUD_PHONE,


STUD_STATE, STUD_COUNTRY, STUD_AGE}

(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,


STUD_COUNTRY, STUD_AGE}
(STUD_NO, STUD_NAME) will be super key but not candidate key because its subset
(STUD_NO)+ is equal to all attributes of the relation. So, STUD_NO will be a candidate key.
Prime and non-prime attributes
Attributes which are parts of any candidate key of relation are called as prime attribute, others are
non-prime attributes.
For Example, STUD_NO in STUDENT relation is prime attribute, others are non-prime attribute.
Example: relation of attributes R(ABCD) , set of functional dependencies F={AB, BC,
CD} ,then find the candidate key of relation.
Given set of FD’s are F={AB, BC, CD}
To find the candidate key we should find the closure of all the attributes of relation based on
functional dependies. Which attribute of relation R functionally determined all the attributes of the
relation is candidate key for the Relation R.
Attribute closure of A is A+= {A,B,C,D} ( AB, BC then AC, AC and CD then AD)
Attribute closure of B is B+= {BCD}
Attribute closure of C is C+= {C,D}
Attribute closure of D is D+= {D}
So Attribute A is candidate key for the relation R.
Prime attribute is A, Non prime attributes are BCD.
Example: Relation R(ABCDEFGH) of set of functional dependencies F={ABC, BCDE,
CFG, DF} find the
Given set of FD’s are
F= {ABC, BCDE, CFG, DF}
Attribute closure of A (A+) = { A,B,C,D,E,F,G}
Attribute closure of D (D+) = {D,F }
Attribute closure of BC (BC+) = {B,C,D,E,F,G }
Example: Relation R(ABCD) of set of functional dependencies F={AB, BC, CD,
DA} find the candidate key’s and prime attributes for the relation R.
Given set of FD’s are
F= {AB, BC, CD, DA}
To find the candidate key’s we should find attribute closure based on given functional
dependencies.
Attribute closure of A (A+) = {A,B,C,D }, Attribute closure of B (B+) = { B, C, D,A}
Attribute closure of C (C+) = {C,D,A,B}, Attribute closure of D (D+) = {D,A, B, C}
Then candidate key of Relation R = {A, B, C, D}, Prime attributes of relation R= {A, B, C, D}
Example: Relation R(ABCDE) of set of functional dependencies F={AB, BCD, EC,
DA} find the candidate key’s and prime attributes for the relation R.
Given set of FD’s are
F= {AB, BCD, EC, DA}
To find the candidate key’s we should find attribute closure based on given functional
dependencies.
Note: Here attribute E is not in the part of R.H.S of the functional dependencies, so each candidate
key a relation must contain that attribute E.
Attribute closure (ABCDE+) = { A, B, C, D, E}
Attribute closure (ABC+)={A,B,C,D}
Attribute closure (CDE+)={C,D,E,A,B}
Attribute closure (BCE+)={B,C,E,D,A}
Attribute closure (BDE+)={B,D,E,C,A}
Attribute closure (ABE+)={A,B,E,C,D}
Attribute closure (AB+)={A,B}
Attribute closure (BC+)={B,C,D,A}
Attribute closure (CD+)={C,D,A,B}
Attribute closure (DE+)={D,E,A,B,C}
Attribute closure (BE+)={B,E,C,D,A}
Attribute closure (AE+)={A,E,C,B,D}
Attribute closure (A+)={A,B}
Attribute closure (B+)={B}
Attribute closure (C+)={C}
Attribute closure (D+)={D,A,B}
Attribute closure (E+)={E,C}
Super keys={(ABCDE), (CDE),( BCE), (BDE.),( ABE),(AE),(BE),(DE)}
Candidate keys={(AE),(BE),(DE)}
Prime attributes={A,E,B,D}
Non Prime attributes= {C}
Attribute closure of AE (AE+) = {A, B, C, D, E}
Attribute closure of BE (BE+) = {A, B, C, D, E}
Attribute closure of CE (CE+) = {A, B, C, D, E}
AE,BE,CE are the candidate keys.
Example: Relation R(ABCDE) of set of functional dependencies F={AB, DE} find the
candidate key’s and prime attributes for the relation R.
Given set of FD’s are
F= {AB, DE}

Example: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of


functional dependencies {{E, F} {G}, {F}  {I, J}, {E, H}  {K, L}, K  {M}, L  {N}?
Finding attribute closure of all given options, we get:
{E,F}+ = {EFGIJ}
{E,F,H}+ = {EFHGIJKLMN}
{E,F,H,K,L}+ = {{EFHGIJKLMN}
{E}+ = {E}
{EFH}+ and {EFHKL}+ results in set of all attributes, but EFH is minimal. So it will be candidate
key.

To check whether an FD AB can be derived from an FD set F,


Find (A)+ using FD set F.
If B is subset of (A)+, then AB is true else not true.
Closure of set of functional dependencies (F+): closure of set of functional dependencies F+, is
set of all FD’s that include F as well as all dependencies that can be inferred from F

Armstrong Axioms: The Armstrong axioms refer the set of inference rules. These are used to find
the closure of set of functional dependencies (F+), from the given set of functional dependencies
(F).

The Functional dependency has 6 types of inference rules or axioms:


1. Reflexivity: if Y is a subset of X, then X determines Y. If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation: The augmentation is also called as a partial dependency. In augmentation, if
X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitivity
In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union: If X determines Y and X determines Z, then X must also determine Y and Z
If X → Y and X → Z then X → YZ
5. Decomposition
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
6. Pseudo transitivity
In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W then XZ → W
Example: Relation has attributes R(ABCGHI) , set of functional dependencies of the relation
are F={AB, AC, CGH,CGI, BH} find the closure of set of functional
dependencies for relation R.

Given set of functional dependencies (F) = {AB, AC, CGH, CGI, BH}

Here we have
AB, and we also have BH then we can write AH (Transitivity)
CGH, and CGI, then we can write CG HI (Union)
AC, CGH, then we can write AGH (Pseudo transitivity)
AC, CGI, then we can write AGI (Pseudo transitivity)

Closure of set of functional dependencies (F+)= {AB, AC, CGH, CGI, BH, AH ,
CG HI, AGH, AGI }

Example: Relation has attributes R(ABCDEF) , set of functional dependencies of the relation
are F={AB, AC, CDE,CDF, BE} find the closure of set of functional
dependencies for relation R.
Given set of functional dependencies (F) = F={AB, AC, CDE,CDF, BE}
Here we have
AB and AC then we can write ABC(union)
CDE and CDF then we can write CDEF(union)
ABand BE then we can write AE(Transitivity)
AC, and CDE then we can write ADE (Pseudo transitivity)
AC, and CDF then we can write ADF (Pseudo transitivity)

Then closure of set of functional dependencies (F+)= { AB, AC, CDE,CDF, BE,
ABC, CDEF, AE, ADF, ADE }
Minimal cover or canonical cover or irreducible set of functional dependies:

To find the minimal set of functional dependencies we should follow these three rules

1. Single attribute in right hand side (R. H. S).


Ex: ABC ( it can be written as AB and AC )
2. Remove the extraneous attribute in left hand side (L.H.S)
Ex: ABC, AC (here B is extraneous attribute)
3. Remove the redundant functional dependencies.
Ex: AB, BC, AC (here AC is redundant because we can get A C from AB, BC
using transitivity)
Example: The relation R has attributes R (A, B, C), set of FD’s {ABC, BC, AB,
ABC}, find the canonical cover of functional dependencies.

Given set of functional dependencies


F= {ABC, BC, AB, ABC}
Step1: it should have the single attribute in RHS.
According rule one the functional dependencies can be written as
AB,
AC,
BC,
ABC
Step2: it should not have the extraneous attribute in left hand side (L.H.S).
ABC
To find the extraneous attribute we should find the closure of A and Closure of B
Closure of A is (A+) = {ABC}
Closure of B is (B+) = {BC}

Here attribute B is extraneous, because from attribute A we can get attribute B, but from
attribute B we can’t get attribute A.
To removing extraneous attribute B, the functional dependency AB C can be written as AC
After removing the extraneous attribute B the FD’s are
AB,
BC,
AC
Step3: we should remove the redundant FD’s
AB,
BC,
AC

Here AC is redundant because, we can get AC from {AB,


BC} (transitivity)
After removing this redundant functional dependency AC,
We have the functional dependencies
AB,
BC

The canonical cover of FD’s of relation R is

AB,
BC
Example2: find the canonical cover for the given set of functional dependencies
G = {AC, ABC,CDI, CDI, ECAB, EIC,AE}.

Given set of functional dependencies:


G = {AC, ABC, CDI, CDI, ECAB, EIC}

Step1: it should have the single attribute in RHS.


According rule one the functional dependencies can be written as
AC
ABC
CD
CI
CDI
ECA
ECB
EIC
AE
Step2: it should not have the extraneous attribute in left hand side (L.H.S).
ABC
CDI
ECA
EIC
ECB
ABC
To find the extraneous attribute we should find the closure of A and Closure of B

CLOSURE OF A {A+}={A,C,D,I,E,B}
CLOSURE OF B {B+}={B}
Here Attribute B Is Extranious.
AC
CDI

To find the extraneous attribute we should find the closure of C and Closure of D

CLOSURE OF C {C+} ={C,D,I}


CLOSURE OF B {D+}={ D}
Here Attribute D Is Extraneous
ECA

To find the extraneous attribute we should find the closure of C and Closure of D

CLOSURE OF E {E+} = { E}
CLOSURE OF C {C+}= {C,D,I }
HERE THERE IS N EXTRANIOUS ATTRIBUTE
EIC
To find the extraneous attribute we should find the closure of E and Closure of I

CLOSURE OF E {E+} = { E}
CLOSURE OF C {I+}= {I}
Here there is noextranious attribute

ECB
To find the extraneous attribute we should find the closure of E and Closure of I

CLOSURE OF E {E+} = { E}
CLOSURE OF C {C+}= {C,D,I}
HERE THERE IS N EXTRANIOUS ATTRIBUTE

STEP:3 we should remove the redundant FD’s


AC
AC
CI
EIC
ECB
ECA
After Applying Step3 We Got These Functional Dependencies
A-->I
EIC
ECAB

Example2: find the canonical cover for the given set of functional dependencies
F= {AD,EAD,BCAD,CB}

Example3: Find the canonical cover of F = { A  BC, B CE, A E, AC H, D B}

Normalization
Normalization is the process of organizing the data in the database. It is used to minimize the
redundancy from a relation or set of relations. Normalization divides the larger table into the
smaller table and links them using relationship.
Types of Normal Forms
First Normal Form (1NF):
A relation will be 1NF if it contains an atomic value. It states that an attribute of a table cannot
hold multiple values. It must hold only single-valued attribute, it doesn’t allows the multi-valued
attribute, composite attribute, and their combinations.

Example: Relation STUDENT is not in 1NF because of multi-valued attribute STUD_PHONE.

STUD_ID STUD_ NAME STUD_PHONE STUD _STATE STUD_COUNTRY


9716271721,
1 RAM HAYANA INDIA
9871717178
2 RAM 98982972881 PUNJAB INDIA

3 SURESH PUNJAB INDIA

The decomposition of the STUDENTT table into 1NF has been shown below:

STUD_ID STUD_ NAME STUD_PHONE STUD _STATE STUD_COUNTRY


1 RAM 9716271721 HAYANA INDIA
1 RAM 9871717178 HAYANA INDIA
2 RAM 98982972881 PUNJAB INDIA

3 SURESH PUNJAB INDIA


Second Normal Form (2NF):

To be in second normal form,


1. A relation must be in first normal form.
2. A relation must not contain any partial dependency.
A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which
are not part of any candidate key) is dependent on any proper subset of any candidate key of the
table.
Partial Dependency – If the proper subset of candidate key determines non-prime attribute, it is
called partial dependency

Example 1 – Consider table-3 as following below.

STUD_NO COURSE_NO COURSE_FEE


1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000

4 C1 1000

2 C5 2000

{Note that, there are many courses having the same course fee.
Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence,
COURSE_FEE would be a non-prime attribute, as it does not belong to the one only candidate key
{STUD_NO, COURSE_NO} ;
But, COURSE_NO -> COURSE_FEE , i.e., COURSE_FEE is dependent on COURSE_NO,
which is a proper subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on
a proper subset of the candidate key, which is a partial dependency and so this relation is not in
2NF.
To convert the above relation to 2NF, we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO and Table 2: COURSE_NO, COURSE_FEE

STUD_NO COURSE_NO COURSE_NO COURSE_FEE

1 C1 C1 1000
2 C2 C2 1500
1 C4 C4 2000
4 C3 C3 1000

4 C1 C1 1000

2 C5 C5 2000
Table 1: STUD_NO, COURSE_NO Table 2: COURSE_NO, COURSE_FEE
NOTE: 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are
100 students taking C1 course, we dont need to store its Fee as 1000 for all the 100 records, instead
once we can store it in the second table as the course fee for C1 is 1000.

Example 2: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.

TEACHER table

Teacher ID Subject Teacher Age


25 Chemistry 30
25 Biol0gy 30
47 English 35
83 Math 38
83 Computer 38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which


is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

Teacher ID Teacher Age


25 30
47 35
83 38

TEACHER_SUBJECT table:

Teacher ID Subject
25 Chemistry
25 Biol0gy
47 English
83 Math
83 Computer

3. Third Normal Form (3NF):


A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency. 3NF
is used to reduce the data duplication, to achieve the data integrity.

A relation is in 3NF if at least one of the following condition holds in every non-trivial function
dependency X –> Y
1. X is a super key.
2. Y is a prime attribute i.e., each element of Y is part of some candidate key.
Transitive dependency – If A->B and B->C are two FDs then A->C is called transitive
dependency.
Example 1 – In relation STUDENT

STUD_
STUD_ID STUD _ STATE STUD_ COUNTRY STUD_ AGE
NAME
1 RAM HAYANA INDIA 20
2 RAM PUNJAB INDIA 19
3 SURESH PUNJAB INDIA 21
FD set: {STUD_ID -> STUD_NAME, STUD_ID -> STUD_STATE, STUD_STATE ->
STUD_COUNTRY, STUD_ID -> STUD_AGE}
Candidate Key: {STUD_ID}
For this relation, STUD_ID -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are
true. So STUD_COUNTRY is transitively dependent on STUD_ID. It violates the third normal
form.
To convert it in third normal form, we will decompose the relation STUDENT into two relations.
STUDENT (STUD_ID, STUD_NAME, STUD_STATE, STUD_AGE)
STUD_
STUD_ID STUD _ STATE STUD_ AGE
NAME
1 RAM HAYANA 20
2 RAM PUNJAB 19
3 SURESH PUNJAB 21

STATE_COUNTRY (Stud_id, COUNTRY)


STUD _Id STUD_ COUNTRY
1 INDIA
2 INDIA
3 INDIA

Boyce- Codd Normal Form (BCNF):


A relation R is in BCNF, if it is in 3NF, for every non-trivial functional dependency X –> Y, X is
a candidate key or super key of relation R.
Example: Let's assume there is a company where employees work in more than one department.
Employee table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 INDIA Designing D394 283
264 INDIA Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
In the above table Functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP-ID, EMP-DEPT}


The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY
264 INDIA
364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO


Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing

Functional dependencies:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID


For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional dependencies is a key.
Fourth normal form (4NF):

 A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
 For a dependency A → B, if for a single value of A, multiple values of B exists, then the
relation will be a multi-valued dependency.

Example

STUDENT RELATION

STU_ID COURSE HOBBY


21 COMPUTER DANCING
21 MATH SINGING
34 CHEMESTRY DANCING
74 BIALOGY CRICKET
59 PHYSICS HOCKEY

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent
entity. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, a student with STU_ID, 21 contains two


courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-
valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE, STUDENT_HOBBY.

Student_Course

STU_ID COURSE
21 COMPUTER
21 MATH
34 CHEMESTRY
74 BIALOGY
59 PHYSICS
Student_Hobby

STU_ID HOBBY
21 DANCING
21 SINGING
34 DANCING
74 CRICKET
59 HOCKEY
Fifth normal form (5NF):

 A relation is in 5NF if it is in 4NF and not contains any join dependency and joining
should be lossless.
 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
 5NF is also known as Project-join normal form (PJ/NF)

SUBJECT LECTURER SEMESTER


COMPUTER Ansika Semester 1
COMPUTER John Semester 1
MATH John Semester 1
MATH Akash Semester 2
CHEMESTRY Praveen Semester 1

In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take
Math class for Semester 2. In this case, combination of all these fields required to identify a valid
data.Suppose we add a new Semester as Semester 3 but do not know about the subject and who
will be taking that subject so we leave Lecturer and Subject as NULL. But all three columns
together acts as a primary key, so we can't leave other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
Table P1
SEMESTER SUBJECT
Semester 1 COMPUTER
Semester 1 MATH
Semester 2 MATH
Semester 1 CHEMESTRY

Table P2
SUBJECT LECTURER

COMPUTER Ansika

COMPUTER John

MATH John

MATH Akash

CHEMESTRY Praveen

Table P3
SEMESTER LECTURER
Semester 1 Ansika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Example: Relation R has attributes R(ABCDEF) , set of FD’s, {ABC, CD, CE, EF,
FA} check the highest normal for relation R.

The given set of FD’s F = {ABC, CD, CE, EF, FA}

STEP:1 Find the all candidate keys of a relation


Attribute closure of AB is (AB+) = {ABCDEF}
Attribute closure of A is (A+) = {A}
Attribute closure of B is (B+) = {B}
AB is the candidate key
Here A is in RHS of FD (FA), so F can determine A ,
FB+={FBACDE}
FB also becomes the candidate key
Similarly EB is also the candidate key (EF),
EB+= {EBFACD}
CB is also the candidate key (CE)
CB+= {CBDEFA}
Candidate keys of relation R = {AB, FB, EB, CB}
STEP2: write all prime attributes
Prime attributes = {A,B, C,E,F}
STEP3: write all non prime attributes
Non prime attribute = {D}
Given set of FD’s
F = {ABC,
CDE,
EF,
FA}
Check for Highest normal form or BCNF
BCNF: L.H.S. of all FD’s should be candidate key.
It is not in BCNF because C, E and F are not candidate keys.
Check for next highest normal form 3NF
3NF: it should not contain any transitive dependency
It is not in 3NF because there is a transitive dependency ABC and C-DE then ABDE,
EF and FA, then EA. it is not in 3NF.

2NF: L.H.S. of all FD’s should be candidate key or RHS is Prime attribute it should not contain
any partial dependency L.H.S. is proper subset of any candidate key and R.H.S is non prime
attribute)
It is not in 2NF because is C is proper subset of candidate key and D is Non prime attribute.
(CDE)
The highest normal for of given relation R is first normal form (1NF).

Example: check the highest normal form of the Student relation R, set of functional
dependencies F is {RollnoName, RollnoVoterid, Voteridage, Voter Rollno }.

Roll Name Voter id Age


no
1 Ravi K0123 20
2 Varun M034 21
3 Ravi K786 23
4 Rahul D286 21

Given student relation R is

Roll Name Voter id Age


no
1 Ravi K0123 20
2 Varun M034 21
3 Ravi K786 23
4 Rahul D286 21
Functional dependencies set
F={RollnoName,
RollnoVoterid,
Voteridage,
Voterid Rollno }.

To check the highest normal form, we can check from highest to lowest.

BCNF: L.H. S of every functional dependency is the candidate key or super key of a relation.
Candidate keys of a relation are {Roll no, Voter id}

The above table is in BCNF in L.H. S of every functional dependency is the candidate key.

Example: R (A, B, C) and set of FD’s F={ABC, CA} show that R is in 3NF, not in

BCNF? And convert this relation R in 3NF into BCNF.

Given set of FD’s F= {ABC, CA}


Attribute closure of AB is(AB+)={ABC}
Attribute closure of A is (A+)={A}
Attribute closure of B is (B+)={B}

Attribute closure of C is(C+) = {AC}

So Candidate key is {AB} , prime attributes are {A, B}

The above relation is in 3NF because AB is candidate key in FD (AB C) and A is the prime
attribute in FD (CA) of Relation R.

The above relation is not in BCNF because attribute C is not candidate key in FD (CA) of
relation R

The above Relation in 3NF to BCNF


Decompose a relation R (ABC) into R1 (AB) and R2 (AC)
Example : Find the highest normal form of a relation R(A,B,C,D,E) with FD set as {BC->D,
AC->BE, B->E}.

Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its subset can determine all attribute of
relation, So AC will be candidate key. A or C can’t be derived from any other attribute of the
relation, so there will be only 1 candidate key {AC}.
Step 2. Prime attributes are those attribute which are part of candidate key {A, C} in this
example and others will be non-prime {B, D, E} in this example.
Step 3. The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or
composite attribute.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper
subset of candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E
is in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a
prime attribute) and in B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd
normal for, either LHS of an FD should be super key or RHS should be prime attribute.
So the highest normal form of relation will be 2nd Normal form.

Example: Find the highest normal form in R (A, B, C, D, E) under following functional
dependencies.
ABC  D, CD  AE
1) It is always a good idea to start checking from BCNF, then 3 NF and so on.
2) If any functional dependency satisfied a normal form then there is no need to check for
lower normal form.
For example, ABC –> D is in BCNF (Note that ABC is a super key), so no need to check this
dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}
BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a super key so this dependency
is not in BCNF. So, R is not in BCNF.

3NF: ABC -> D we don’t need to check for this dependency as it already satisfied BCNF. Let us
consider CD -> AE. Since E is not a prime attribute, so the relation is not in 3NF.

2NF: In 2NF, we need to check for partial dependency. CD which is a proper subset of a candidate
key and it determine E, which is non-prime attribute. So, given relation is also not in 2 NF. So, the
highest normal form is 1 NF.
UNIT –IV
Syllabus: Transaction Concept, Transaction State, Implementation of Atomicity and Durability,
Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing for
serializability, Lock Based Protocols, Timestamp Based Protocols, Validation-Based Protocols,
Multiple Granularity, Recovery and Atomicity, Log–Based Recovery, Recovery with Concurrent
Transactions.
Transaction:
A set of logically related operations is known as transaction. The main operations of a transaction are:
Read (A): Read operations Read (A) or R (A) reads the value of A from the database and stores it
in a buffer in main memory.
Write (A): Write operation Write (A) or W (A) writes the value back to the database from buffer.
Let us take a debit transaction from an account which consists of following operations:
1. R(A);
2. A=A-1000;
3. W(A);
Assume A’s value before starting of transaction is 5000.
 The first operation reads the value of A from database and stores it in a buffer.
 Second operation will decrease its value by 1000. So buffer will contain 4000.
 Third operation will write the value from buffer to database. So A’s final value will be
4000.
But it may also be possible that transaction may fail after executing some of its operations. The
failure can be because of hardware, software or power etc.
For example, if debit transaction discussed above fails after executing operation 2, the value of A
will remain 5000 in the database which is not acceptable by the bank.
To avoid this, Database has two important operations:
Commit: After all instructions of a transaction are successfully executed, the changes made by
transaction are made permanent in the database.
Rollback: If a transaction is not able to execute all operations successfully, all the changes made
by transaction are undone.
Properties of a transaction: To ensure the consistency of database each transaction must follow
these properties. These are called ACID properties.
Atomicity: it ensures that either the entire transaction takes place at once or doesn’t happen at all.
There is no midway i.e. transactions do not occur partially. Each transaction is considered as one
unit and either runs to completion or is not executed at all.
It involves the following two operations.
Abort: If a transaction aborts, changes made to database are not visible.
Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to
account Y.

If the transaction fails after completion of T1 but before completion of T2.( say, after write(X) but
before write(Y)), then amount has been deducted from X but not added to Y. This results in an
inconsistent database state. Therefore, the transaction must be executed in entirety in order to
ensure correctness of database state.
Consistency:
This means that integrity constraints must be maintained so that the database is consistent before
and after the transaction. It refers to the correctness of a database. Referring to the example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a
result T is incomplete.
Isolation:
This property ensures that multiple transactions can occur concurrently without leading to the
inconsistency of database state. Transactions occur independently without interference. Changes
occurring in a particular transaction will not be visible to any other transaction until that particular
change in that transaction is written to memory or has been committed. This property ensures that
the execution of transactions concurrently will result in a state that is equivalent to a state achieved
these were executed serially in some order.
Let X= 500, Y = 500.
Consider two transactions T and T”.

Suppose T has been executed till Read (Y) and then T’’ starts. As a result , interleaving of
operations takes place due to which T’’ reads correct value of X but incorrect value of Y and sum
computed by
T’’: (X+Y = 50, 000+500=50, 500)
is thus not consistent with the sum at end of transaction:
T: (X+Y = 50, 000 + 450 = 50, 450).
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take
place in isolation and changes should be visible only after they have been made to the main
memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a system
failure occurs. These updates now become permanent and are stored in non-volatile memory. The
effects of the transaction, thus, are never lost.
The ACID properties, in totality, provide a mechanism to ensure correctness and consistency of a
database in a way such that each transaction is a group of operations that acts a single unit,
produces consistent results, acts in isolation from other operations and updates that it makes are
durably stored.
States of Transactions

A transaction in a database can be in one of the following states –

Fig: Transaction States

Active: This is the initial state of every transaction, all the operations of transaction is being
executed in this state.
Partially Committed: When a transaction executes its final operation, it is said to be in a
partially committed state.
Failed: A transaction is said to be in a failed state if any of the checks made by the database
recovery system fails. A failed transaction can no longer proceed further.
Aborted: If any of the checks fails and the transaction has reached a failed state, then the
recovery manager rolls back all its write operations on the database to bring the database back to
its original state where it was prior to the execution of the transaction. Transactions in this state
are called aborted.
The database recovery module can select one of the two operations after a transaction aborts
 Re-start the transaction
 Kill the transaction
Committed: If a transaction executes all its operations successfully, it is said to be committed.
All its effects are now permanently established on the database system.
Implementation of Atomicity and Durability:
The Recovery Manager is responsible for ensuring Atomicity and Durability.
Atomicity is guaranteed by undoing the actions of the transactions that did not commit
(aborted). Durability is guaranteed by making sure that all actions of committed transactions
survive crashes and failures.
The recovery-management component of a database system can support atomicity and durability
by a variety of schemes, one of the simplest schemes is called Shadow copy.
Shadow copy:
In the shadow-copy scheme, a transaction that wants to update the database first creates a
complete copy of the database. All updates are done on the new database copy, leaving the
original copy, the shadow copy, untouched. If at any point the transaction has to be aborted, the
system merely deletes the new copy. The old copy of the database has not been affected.

This scheme is based on making copies of the database, called shadow copies, assumes that only
one transaction is active at a time. The scheme also assumes that the database is simply a file on
disk. A pointer called db-pointer is maintained on disk; it points to the current copy of the
database.
If the transaction completes, it is committed as follows:
 First, the operating system is asked to make sure that all pages of the new copy of the
database have been written out to disk. (Unix systems use the flush command for this
purpose.)
 After the operating system has written all the pages to disk, the database system updates the
pointer db-pointer to point to the new copy of the database; the new copy then becomes the
current copy of the database. The old copy of the database is then deleted.
Fig: Shadow copy technique for atomicity and durability
The transaction is said to have been committed at the point where the updated db pointer is
written to disk.
How the technique handles transaction failures:
 If the transaction fails at any time before db-pointer is updated, the old contents of the
database are not affected. We can abort the transaction by just deleting the new copy of
the database.
 Once the transaction has been committed, all the updates that it performed are in the
database pointed to by db pointer.
 Thus, either all updates of the transaction are reflected, or none of the effects are
reflected, regardless of transaction failure.
How the technique handles system failures:
 Suppose that the system fails at any time before the updated db-pointer is written to disk.
Then, when the system restarts, it will read db-pointer and will thus see the original contents
of the database, and none of the effects of the transaction will be visible on the database.
 Next, suppose that the system fails after db-pointer has been updated on disk. Before the
pointer is updated, all updated pages of the new copy of the database were written to disk.
 Again, we assume that, once a file is written to disk, its contents will not be damaged even if
there is a system failure. Therefore, when the system restarts, it will read db-pointer and will
thus see the contents of the database after all the updates performed by the transaction.
Implementation of Isolation
Isolation determines how transaction integrity is visible to other users and systems. It means that a
transaction should take place in a system in such a way that it is the only transaction that is
accessing the resources in a database system.
Isolation levels define the degree to which a transaction must be isolated from the data
modifications made by any other transaction in the database system. A transaction isolation level is
defined by the following phenomena.
Dirty Read – A Dirty read is the situation when a transaction reads a data that has not yet been
committed. For example, let’s say transaction 1 updates a row and leaves it uncommitted,
meanwhile, Transaction 2 reads the updated row. If transaction 1 rolls back the change,
transaction 2 will have read data that is considered never to have existed.

For example:

Consider two transactions TX and TY in the below diagram performing read/write


operations on account A where the available balance in account A is $300:
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A that becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A that will be read as $350.
o Then at time t5, transaction TX rollbacks due to server problem, and the value
changes back to $300 (as initially).
o But the value for account A remains $350 for transaction TY as committed, which
is the dirty read and therefore known as the Dirty Read Problem.

Non Repeatable read – Non Repeatable read occurs when a transaction reads same row twice,
and get a different value each time. For example, suppose transaction T1 reads data. Due to
concurrency, another transaction T2 updates the same data and commit, Now if transaction T1
rereads the same data, it will retrieve a different value.
A=200
T1 T2
R(A)
W(A)
R(A)
A=A+100
W(A)
COMMIT
R(A)
Phantom Read – Phantom Read occurs when two same queries are executed, but the rows
retrieved by the two, are different. For example, suppose transaction T1 retrieves a set of rows
that satisfy some search criteria. Now, Transaction T2 generates some new rows that match the
search criteria for transaction T1. If transaction T1 re-executes the statement that reads the rows,
it gets a different set of rows this time.
Based on these phenomena, The SQL standard defines four isolation levels:
Read Uncommitted: Read Uncommitted is the lowest isolation level. In this level, one
transaction may read not yet committed changes made by other transaction, thereby allowing
dirty reads. In this level, transactions are not isolated from each other.
Read Committed: This isolation level guarantees that any data read is committed at the
moment it is read. Thus it does not allow dirty read. The transaction holds a read or write
lock on the current row, and thus prevent other transactions from reading, updating or
deleting it.
Repeatable Read: This is the most restrictive isolation level. The transaction holds read
locks on all rows it references and writes locks on all rows it inserts, updates, or deletes.
Since other transaction cannot read, update or delete these rows, consequently it avoids non-
repeatable read.
Serializable: This is the Highest isolation level. A serializable execution is guaranteed to be
serializable. Serializable execution is defined to be an execution of operations in which
concurrently executing transactions appears to be serially executing.
The Table is given below clearly depicts the relationship between isolation levels, read
phenomena and locks:
Schedules in DBMS
The sequence of operation from one transaction to another transaction is known as schedule. It is
used to preserve the order of the operation in each of the individual transaction in a concurrent
execution of multiple transactions.
There are two types of Schedules
1. Serial schedule
2. Non serial schedule
Serial schedule
Schedules in which the transactions are executed non-interleaved. In the serial schedule, when
the first transaction completes its cycle, then the next transaction is executed.
Example: Consider the following schedule involving two transactions T1 and T2.
T1 T2
RA)
W(A)
R(B)
W(B)
R(A)
R(B)
This is a serial schedule since the transactions perform serially in the order T1 T2
Non serial schedule:
This is a type of Scheduling where the operations of multiple transactions are interleaved. This
might lead to a rise in the concurrency problem. in the non-serial schedule, the other transaction
proceeds without waiting for the previous transaction to complete. The transactions are executed
in a non-serial manner, keeping the end result correct and same as the serial schedule.
T1 T2
R(A)
W(A)

W(B)
R(A)
R(B)

R(B)

The Non-Serial Schedule can be divided further into two types


1. Serializable schedules
2. Non-Serializable schedules
Serializable schedules
This is used to maintain the consistency of the database. It is mainly used in the Non-Serial
scheduling to verify whether the scheduling will lead to any inconsistency or not. The non-serial
schedule is said to be in a serializable schedule only if its result is equal to the result of its
transactions executed serially.
Serializable schedule are two types:
Conflict Serializable:
A schedule is called conflict serializable if it can be transformed into a serial schedule by
swapping non-conflicting operations. Two operations are said to be conflicting if all conditions
satisfy:
 They belong to different transactions
 They operate on the same data item
 At Least one of them is a write operation
View Serializable:
A Schedule is called view serializable if it is view equal to a serial schedule (no overlapping
transactions). A conflict schedule is a view serializable but if the serializability contains blind
writes, then the view serializable does not conflict serializable.

Non-Serializable schedules:
The non-serializable schedule is divided into two types, Recoverable and Non-recoverable
Schedule.
Recoverable Schedule:
Schedules in which transactions commit only after all transactions whose changes they read
commit are called recoverable schedules. In other words, if some transaction Tj is reading value
updated or written by some other transaction Ti, then the commit of Tj must occur after the
commit of Ti.
Example – Consider the following schedule involving two transactions T1 and T2.
T1 T2
R(A)
W(A)
W(A)
R(A)
Commit
commit
This is a recoverable schedule since T1 commits before T2, that makes the value read by T2 correct.
Recoverable schedules are three types:
Cascading roll back Schedule: A schedule in which failure in one transaction leads to the rolling
back or aborting other dependent transactions, then such scheduling is referred to as Cascading
rollback or cascading abort
Example:
The table below shows a schedule with two transactions, T1 reads and writes A and that value
is read and written by T2. But later on, T1 fails. So we have to rollback T1. Since T2 has read
the value written by T1, it should also be rollbacked. As it has not committed, we can rollback T2
as well. So it is recoverable with cascading rollback.
Therefore, if Tj is reading value updated by Ti and commit of Tj is delayed till commit of Ti, the
schedule is called recoverable with cascading rollback.

Cascade less Schedule: in which transactions read values only after all transactions whose
changes they are going to read commit are called cascade less schedules. It avoids that a single
transaction abort leads to a series of transaction rollbacks. A strategy to prevent cascading aborts
is to disallow a transaction from reading uncommitted changes from another transaction in the
same schedule.
In other words, if some transaction Tj wants to read value updated or written by some other
transaction Ti, then the commit of Tj must read it after the commit of Ti.
Example: Consider the following schedule involving two transactions T1 and T2.
T1 T2
R(A)
W(A)
W(A)
Commit
R(A)
Commit
This schedule is cascadeless. Since the updated value of A is read by T2 only after the updating
transaction i.e. T1 commits.
Strict Schedule:
A schedule is strict if for any two transactions Ti, Tj, if a write operation of Ti precedes a
conflicting operation of Tj (either read or write), then the commit or abort event of Ti also
precedes that conflicting operation of Tj.
In other words, Tj can read or write updated or written value of Ti only after Ti commits/aborts.
Example: Consider the following schedule involving two transactions T1 and T2.
T1 T2
R(A)

W(A)
commit
W(A)
R(A)
Commit

This is a strict schedule since T2 reads and writes A which is written by T1 only after the commit of
T1
Non-Recoverable Schedule:
When Tj is reading the value updated by Ti and Tj is committed before committing of Ti, the
schedule will be irrecoverable.
Example: Consider the following schedule involving two transactions T1 and T2.
T1 T2
R(A)
W(A)
W(A)
R(A)
Commit
abort
T2 read the value of A written by T1, and committed. T1 later aborted, therefore the value read by
T2 is wrong, but since T2 committed, this schedule is non-recoverable.
Serializability:
When multiple transactions are running concurrently then there is a possibility that the database
may be left in an inconsistent state. Serializability is a concept that helps us to check
which schedules are serializable. A serializable schedule is the one that always leaves the
database in consistent state.
A non-serial schedule of n number of transactions is said to be serializable schedule, if it is
equivalent to the serial schedule of those n transactions.
There are two types of Serializability.
1. Conflict Serializability
2. View Serializability
Conflict Serializability:
A schedule is called conflict serializable if it can be transformed into a serial schedule by
swapping non-conflicting operations. Two operations are said to be conflicting if all conditions
satisfy:
 They belong to different transactions
 They operate on the same data item
 At Least one of them is a write operation
Example1: check whether the given schedule is conflict serializable or non conflict serializable.
T1 T2
R(A)
R(B)
R(A)
R(B)
W(B)
W(A)

To convert this schedule into a serial schedule we must have to swap the R(A) operation of
transaction T2 with the W(A) operation of transaction T1. However we cannot swap these two
operations because they are conflicting operations, thus we can say that this given schedule is not
Conflict Serializable.
Example2: check whether the given schedule is conflict serializable or non conflict serializable.
T1 T2
R(A)
R(A)
R(B)
W(B)
R(B)
W(A)

Sol: After swapping R(A) of T1 and R(A) of T2 we get:


T1 T2
R(A)
R(A)
R(B)
W(B)
R(B)
W(A)

After swapping R(A) of T1 and R(B) of T2 we get:


T1 T2
R(A)
R(B)
R(A)
W(B)
R(B)
W(A)

After swapping R(A) of T1 and W(B) of T2 we get:


T1 T2
R(A)
R(B)
W(B)
R(A)
R(B)
W(A)

We finally got a serial schedule after swapping all the non-conflicting operations so we can say
that the given schedule is Conflict Serializable.
Precedence graph:
A precedence graph, also known as serialization graph, it is used to test Conflict Serializability of
a schedule. It is a directed Graph (V, E) consisting of a set of nodes V = {T1, T2, T3……….Tn} and
a set of directed edges E = {e1, e2, e3………………em}.
The graph contains one node for each Transaction Ti. An edge ei is of the form Tj –> Tk where Tj is
the starting node of ei and Tk is the ending node of ei. An edge ei is constructed between nodes Tj to
Tk if one of the operations in Tj appears in the schedule before some conflicting operation in Tk .
The Algorithm can be written as:
1. Create a node T in the graph for each participating transaction in the schedule.
2. For the conflicting operation read_item(X) and write_item(X) – If a Transaction
Tj executes a read_item (X) after Ti executes a write_item (X), draw an edge from Ti to
Tj in the graph.
3. For the conflicting operation write_item(X) and read_item(X) – If a Transaction
Tj executes a write_item (X) after Ti executes a read_item (X), draw an edge from Ti to
Tj in the graph.
4. For the conflicting operation write_item(X) and write_item(X) – If a Transaction
Tj executes a write_item (X) after Ti executes a write_item (X), draw an edge from Tj to
Ti in the graph.
5. The Schedule S is serializable if there is no cycle in the precedence graph.
Example: Consider the schedule S: R1(x), R1(y), W2(x),W1(x),R2(y) and check whether
schedule S, conflict serializable or non conflict serializable .
The given schedule is
S: R1(x), R1(y), W2(x), W1(x), R2(y)
T1 T2
R(x)
R(y)
W(x)
W(x)
R(y)

Creating Precedence graph:


1. Make two nodes corresponding to Transaction T1 and T2

T1 T2

2. For the conflicting pair r1(x) w2(x), where r1(x) happens before w2(x), draw an edge
from T1 to T2..

T1 T2

3. For the conflicting pair w2(x) w1(x), where w2(x) happens before w1(x), draw an edge
from T2 to T1.

T1 T2

Since the graph is cyclic, we can conclude that it is not conflict serializable
Example2: check whether the given schedule S1: R1(x), R3(y),W2(y) R3(x), W2(x) is conflict
serializable or non conflict serializable
Sol: the given schedule S1: R1(x), R3(y),W2(y) R3(x), W2(x)
The graph for this schedule is :

Since the graph is cyclic, we can conclude that it is not conflict serializable.
View Serializability:
View Serializability is used to check whether a given schedule is view serializable or not. To
check that, we need to check whether the given schedule is view equivalent to its serial
schedule.
View Equivalent:
Two schedules S1 and S2 are said to be view equivalent, if they satisfy all the following
conditions:

Initial Read: the first read operation on a data item is called initial read. An initial read of both
schedules must be the same. For example, there are two schedules S1 and S2. In schedule S1, if a
transaction T1 is reading the data item A, then in S2, transaction T1 should also read A.
Update Read: If in schedule S1, the transaction T1 is reading a data item updated by T2
then in schedule S2 also, T1 should read the value after the write operation of T2 on same
data item. For example, In schedule S1, T3 performs a read operation on A after the write
operation on A by T2 then in S2, T3 should read the A after T2 performs write on A

Above two schedules are not view equal because, in S1, T3 is reading A updated by T2
and in S2, T3 is reading A updated by T1.
Final Write: Final write operations on each data item must match in both the schedules. For
example, In schedule S1, the final write operation on A is done by transaction T2. In S2 also
transaction T2 performs the final write on X.
Above two schedules is view equal because Final write operation in S1 is done by T3 and in S2,
the final write operation is also done by T3.
Example: check whether the given non serial schedule (S1) is view serializable schedule or not.
T1 T2
S1

R(X)
W(X)
R(X)
W(X)

R(Y)
W(Y) R(Y)
W(Y)
Sol: The schedule (S1) is the given schedule, and schedule (S2) is some serial schedule for the
given schedule (S1). If the schedule (S1) is view equivalent to serial schedule (S2) then this
schedule is said to be view serializable schedule.
T1 T2
T1 T2
S1
S2
R(X) R(X)
W(X) W(X)
R(X) R(Y)
W(X) W(Y)
R(Y) R(X)
W(Y) W(X)
R(Y) R(Y)
W(Y) W(Y)

To check for view equivalence follow these steps


Initial Read: In schedule S1, transaction T1 first reads the data item X. In S2 also transaction T1 first
reads the data item X. Let’s check for Y. In schedule S1, transaction T1 first reads the data item Y. In S2
also the first read operation on Y is performed by T1. We checked for both data items X & Y and
the initial read condition is satisfied in S1 & S2.
Final Write: In schedule S1, the final write operation on X is done by transaction T2. In S2 also
transaction T2 performs the final write on X. Let’s check for Y. In schedule S1, the final write operation
on Y is done by transaction T2. In schedule S2, final write on Y is done by T2.
We checked for both data items X & Y and the final write condition is satisfied in S1 & S2.
Update Read
In S1, transaction T2 reads the value of X, written by T1. In S2, the same transaction T2 reads the X after
it is written by T1. In S1, transaction T2 reads the value of Y, written by T1. In S2, the same transaction
T2 reads the value of Y after it is updated by T1. The update read condition is also satisfied for both the
schedules.
Since all the three conditions are satisfied which means S1 and S2 are view equivalent. So we can say that
the schedule S1 is view serializable schedule.
Concurrent Execution

In the transaction process, a system usually allows executing more than one transaction
simultaneously. This process is called a concurrent execution.
Advantages: Advantages of concurrent execution of a transaction

 Decrease waiting time or turnaround time.


 Improve response time
 Increased throughput or resource utilization.

Problems of concurrency
Several problems can occur when concurrent transactions are run in an uncontrolled manner,
such type of problems is known as concurrency problems.
There are following different types of problems or conflicts which occur due to concurrent
execution of transaction:
Lost update problem (Write – Write conflict)
This type of problem occurs when two transactions in database access the same data item and
have their operations in an interleaved manner that makes the value of some database item
incorrect.
If there are two transactions T1 and T2 accessing the same data item value and then update it,
then the second record overwrites the first record.
Example: Let’s take the value of A is 100

Time Transaction T1 Transaction T2


t1 Read(A)
t2 A=A-50
t3 Read(A)
t4 A=A+50
t5 Write(A)
t6 Write(A)
Here,
At t1 time, T1 transaction reads the value of A i.e., 100.
At t2 time, T1 transaction deducts the value of A by 50.
At t3 time, T2 transactions read the value of A i.e., 100.
At t4 time, T2 transaction adds the value of A by 150.
At t5 time, T1 transaction writes the value of A data item on the basis of value seen at time t2
i.e., 50.
At t6 time, T2 transaction writes the value of A based on value seen at time t4 i.e., 150.
So at time T6, the update of Transaction T1 is lost because Transaction T2 overwrites the
value of A without looking at its current value.
Such type of problem is known as the Lost Update Problem.

Dirty read problem (W-R conflict)


This type of problem occurs when one transaction T1 updates a data item of the database, and
then that transaction fails due to some reason, but its updates are accessed by some other
transaction.
Example: Let’s take the value of A is 100

Time Transaction T1 Transaction T2


t1 Read(A)
t2 A=A+20
t3 Write(A)
t4 Read(A)
t5 A=A+30
t6 Write(A)
t7 Write(B)
t8 abort
Here,
At t1 time, T1 transaction reads the value of A i.e., 100.
At t2 time, T1 transaction adds the value of A by 20.
At t3 time, T1transaction writes the value of A (120) in the database.
At t4 time, T2 transactions read the value of A data item i.e., 120.
At t5 time, T2 transaction adds the value of A data item by 30.
At t6 time, T2transaction writes the value of A (150) in the database.
At t7 time, a T1 transaction fails due to power failure then it is rollback according to
atomicity property of transaction (either all or none).
So, transaction T2 at t4 time contains a value which has not been committed in the database.
The value read by the transaction T2 is known as a dirty read.

Unrepeatable read (R-W Conflict)


It is also known as an inconsistent retrieval problem. If a transaction T1 reads a value of data item
twice and the data item is changed by another transaction T2 in between the two read operation.
Hence T1 access two different values for its two read operation of the same data item.
Example: Let’s take the value of A is 100

Time Transaction T1 Transaction T2


t1 Read(A)
t2 Read(A)
t3 A=A+30
t4 Write(A)
t5 Read(A)
Here,
At t1 time, T1 transaction reads the value of A i.e., 100.
At t2 time, T2transaction reads the value of A i.e., 100.
At t3 time, T2 transaction adds the value of A data item by 30.
At t4 time, T2 transaction writes the value of A (130) in the database.
Transaction T2 updates the value of A. Thus, when another read statement is performed by
transaction T1, it accesses the new value of A, which was updated by T2. Such type of
conflict is known as R-W conflict.

Concurrency Control
Concurrency control is the procedure in DBMS for managing simultaneous operations without
conflicting with each another. It is used to address such conflicts which mostly occur with a
multi-user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of respective databases.

Concurrency control protocols ensure atomicity, isolation, and serializability of concurrent


transactions. The concurrency control protocol can be divided into three categories:

1. Lock based protocol


2. Time-stamp protocol
3. Validation based protocol

Lock based protocol

A lock is a data variable which is associated with a data item. This lock signifies that operations
that can be performed on the data item. Locks help synchronize access to the database items by
concurrent transactions.

All lock requests are made to the concurrency-control manager. Transactions proceed only once
the lock request is granted.
There are two types of lock:

1. Shared lock (Lock S (A)):

 It is also known as a Read-only lock. In a shared lock, the data item can only read by the
transaction.
 It can be shared between the transactions because when the transaction holds a lock, then
it can't update the data on the data item.

2. Exclusive lock (Lock X (A)):

 In the exclusive lock, the data item can be both reads as well as written by the
transaction.
 This lock is exclusive, and in this lock, multiple transactions do not modify the same data
simultaneously.

Two-phase locking (2PL) Protocol:


 The two-phase locking protocol divides the execution phase of the transaction into three
parts.
 In the first part, when the execution of the transaction starts, it seeks permission for the
lock it requires.
 In the second part, the transaction acquires all the locks. The third phase is started as soon
as the transaction releases its first lock.
 In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.

Fig: Two-phase locking (2PL) Protocol


In this protocol Locking and Unlocking can be done in two phases.

Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released,
but no new locks can be acquired.

This protocol supports the lock conversions

The upgrading of lock( from S(a) to X(a) ) is allowed in Growing Phase and downgrading of lock
(from X(a) to S(a)) must be done in shrinking phase.
This protocol assures Serializability, if all the transactions are serialized in order of their lock
points (the point where a transaction acquires its final lock), but there are still some drawbacks.
The drawbacks of 2-PL:
 Cascading Rollback is possible under 2-PL.
 Deadlocks and Starvation is possible.

Conservative Two-Phase Locking: In this all the locks should acquire before the operation of
the transaction started. It avoids the dead lock, but cascading roll back is not solved.

Strict Two-Phase Locking


A transaction should not release any of the exclusive locks until that transaction is committed or
abort. It avoids cascading rollback, but still dead lock may occur.

Rigorous Two-Phase Locking


It ensures that all the shared and exclusive locks held by the transaction, released only after the
transaction commit. It gives the freedom from dead lock cascading roll back, but still dead locks
are possible.
Example: see the transaction implementing in two phase locking

T T1 T2
1 Lock X(A)
2 Lock X(A)
3 Lock X(B)
4 Unlock (A)
5 Lock X(C)
6 Unlock (B)
7 Unlock (A)
8 Unlock (C)

Transaction T1 is in

Growing phase step1 to step3,

Shrinking phase step 4 to step 6, and

Lock point is step 3.

Transaction T2 is in

Growing phase step 2 to step 5

Shrinking phase step 7 to step 8

Lock point is step 5.


Timestamp Ordering Protocol
The Timestamp Ordering Protocol is used to order the transactions based on their Timestamps.
The order of transaction is nothing but the ascending order of the transaction creation.
The priority of the older transaction is higher that's why it executes first. To determine the
timestamp of the transaction, this protocol uses system time or logical counter.
The lock-based protocol is used to manage the order between conflicting pairs among
transactions at the execution time. But Timestamp based protocols start working as soon as a
transaction is created.
Let's assume there are two transactions T1 and T2. Suppose the transaction T1 has entered the
system at 007 times and transaction T2 has entered the system at 009 times. T1 has the higher
priority, so it executes first as it is entered the system first.
The timestamp ordering protocol also maintains the timestamp of last 'read' and 'write' operation
on a data.
Basic Timestamp ordering protocol works as follows:

1. Check the following condition whenever a transaction Ti issues a Read (X) operation:

o If W_TS(X) >TS(Ti) then the operation is rejected.


o If W_TS(X) <= TS(Ti) then the operation is executed,
o Timestamps of all the data items are updated.

2. Check the following condition whenever a transaction Ti issues a Write(X) operation:

o If TS(Ti) < R_TS(X) then the operation is rejected.


o If TS(Ti) < W_TS(X) then the operation is rejected and Ti is rolled back otherwise
the operation is executed.

Where,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
Thomas' Write Rule

This rule states if TS(Ti) < W-timestamp(X), then the operation is rejected and Ti is rolled back.
Time-stamp ordering rules can be modified to make the schedule view serializable.
Instead of making Ti rolled back, the 'write' operation itself is ignored.

Example: there are four transactions are running concurrently, time stamp values of these
transactions are 10, 20, 30, and 40.

T1 T2 T3 T4
R1(A)

W1(A)

R3(A)

R2(A)

Abort

W4(A)

Sol: Initially Read time stamp and write time stamp of A is zero i.e. (RTS (A), and WTS (A) = 0)

Here, T1 is issuing a read operation with time stamp value 10

Time stamp (TS) of T1 = 10, WTS (T1) = 0, RTS (T1) = 0

So WTS (A) <=TS (T1) (that is 0 <10),

Then it executes this read operation R1 (A) and updates the RTS (A) =10.

After execution read operation RTS (A) =10, WTS (A) =0.
Then transaction T1 issuing the write operation on data item A
Now TS (T1) =10, RTS (A) =10, WTS (A) =0.
TS (T1) > WTS (A) (that is 10 > 0)
So it execute this write operation, and updates the WTS (A) = 10
After execution write operation RTS (A) =10, WTS (A) = 10.
Then transaction T3 issuing read operation on data item A with time stamp value 30,
Now TS (T3) = 30, RTS (A) =10 and WTS (A) = 10
WTS (A) <=TS (T3) (that is 10 <= 30)
Then it executes this read operation on data item A and updates the RTS (A) = 30
After execution of read operation RTS (A) =30 and WTS (A) = 10
Then transaction T2 issuing read operation on data item A with time stamp value 20
Now TS (T2) =20, RTS (A) =30 and WTS (A) = 10
RTS (A) >TS (T2) (that is 30 >20)
Here it reject the read operation of transaction T2
Then transaction T4 issuing write operation on data item A with time stamp value 40
Now TS (T4) = 40, RTS (A) =30 and WTS (A) = 10
WTS (A) <=TS (T4) (that is 10 < 40)
Then it executes this write operation on data item A and updates the WTS (A) = 40
After execution of write operation
RTS (A) =30 and WTS (A) = 40
Validation Based Protocol:
This protocol is also called optimistic concurrency control technique. This protocol is used in
DBMS for avoiding concurrency in transactions.
It is called optimistic because of very less interference occurs therefore there is no need for
checking while the transaction is executed.
In this technique, no checking is done while the transaction is been executed. Until the transaction
end is reached updates in the transaction are not applied directly to the database. All updates are
applied to local copies of data items kept for transaction.
At the end of transaction execution, while execution of transaction, a validation phase checks
whether any of transaction updates violate serializability. If there is no violation of serializability
the transaction is committed and the database is updated; or else, the transaction is updated and
then restarted.
The transaction is executed in the following three phases:
1. Read phase: In this phase, the transaction T is read and executed. It is used to read the
value of various data items and stores them in temporary local variables. It can perform
all the write operations on temporary variables without an update to the actual database.
2. Validation phase: In this phase, the temporary variable value will be validated against
the actual data to see if it violates the serializability.
3. Write phase: If the validation of the transaction is validated, then the temporary results
are written to the database or system otherwise the transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its validation
phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
The idea behind optimistic concurrency is to do all the checks at once; hence transaction execution
proceeds with a minimum of overhead until the validation phase is reached. If there is not much
interference among transactions most of them will have successful validation, otherwise, results
will be discarded and restarted later.
Validation based protocol is useful for rare conflicts.

Multiple granularity protocol:

Granularity is the size of data item allowed to lock. Multiple granularity means hierarchically
breaking up the database into blocks, can be represented graphically as a tree. In which lower level
nodes are nested within the higher blocks (nodes). When node is locked explicitly, children’s to
that node locked implicitly.

In this locks are acquired from root node to leaf node and released from leaf node to root node. In
this protocol to lock a lower level node no upper level nodes to that node should not be locked by
some other transaction and to acquire a lock on upper level node, its descendent nodes should not
locked by some other transaction.

For example, consider the tree, which consists of four levels of nodes. The highest level represents
the entire database. Below it are nodes of type area; the database consists of exactly these areas.
Area has children nodes which are called files. Every area has those files that are its child nodes.
No file can span more than one area.
Finally, each file has child nodes called records. As before, the file consists of exactly those
records that are its child nodes, and no record can be present in more than one file. Hence, the
levels starting from the top level are:
 database
 area
 file
 record
Figure – Multi Granularity tree Hierarchy

For example, if transaction Ti gets an explicit lock on file Fb in exclusive mode, then it has an
implicit lock in exclusive mode on all the records belonging to that file. It does not need to lock
the individual records of Fb explicitly. This is the main difference between Tree Based
Locking and Hierarchical locking for multiple granularities.
Intention Mode Lock –
To locks the data items of different levels, In addition to S and X lock modes, there are three
additional lock modes with multiple granularity:
 Intention-Shared (IS): if a node is locked in IS mode, the lower level nodes to that node are
locked explicitly in shared mode.
 Intention-Exclusive (IX): It indicates explicit locking at a lower level with exclusive or
shared locks.
 Shared & Intention-Exclusive (SIX): the sub-tree rooted by that node is locked explicitly in
shared mode and explicit locking is being done at a lower level with exclusive mode locks.

The compatibility matrix for these lock modes are described below:
The multiple-granularity locking protocol uses the intention lock modes to ensure serializability. It
requires that a transaction Ti that attempts to lock a node must follow these protocols:
1. Transaction Ti must follow the lock-compatibility matrix.
2. Transaction Ti must lock the root of the tree first, and it can lock it in any mode.
3. Transaction Ti can lock a node in S or IS mode only if Ti currently has the parent of the
node locked in either IX or IS mode.
4. Transaction Ti can lock a node in X, SIX, or IX mode only if Ti currently has the parent
of the node locked in either IX or SIX mode.
5. Transaction Ti can lock a node only if Ti has not previously unlocked any node (i.e., Ti is
two phase).
6. Transaction Ti can unlock a node only if Ti currently has none of the children of the node
locked.
Example: In Multi Granularity, transactions T1 and T2 are running concurrently, Transaction
T1 want to read the data item ra2, and Transaction T2 want to modify the record ra3.

Figure – Multi Granularity tree Hierarchy

T1 reads ra2: in order to read ra2, it need to obtain a shared lock , to obtain the shared lock it
should traverse from that node to the root node to check whether some node is locked in that
path, if not it should acquire IS in upper level node.
T2 modifies the ra3: to modify the data item it need to acquire exclusive lock on that data item,
to obtain the exclusive lock on ra3, it should acquire Ix lock on upper level, here the upper level
nodes are locked in intention shared IS by transaction. Here the IX lock is compatible with IS
locking, so to modify the data item, it acquires the IX intention exclusive lock on the data item.
Recovery and Atomicity

When a system crashes, it may have several transactions being executed and various files
opened for them to modify the data items. Transactions are made of various operations, which
are atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a
whole must be maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
 It should check the states of all the transactions, which were being executed.
 A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
 It should check whether the transaction can be completed now or it needs to be rolled
back.
 No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as maintaining
the atomicity of a transaction −
 Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
 Maintaining shadow paging, where the changes are done on a volatile memory, and later,
the actual database is updated.

Log-based Recovery

Log is a sequence of records, which maintains the records of actions performed by a transaction.
It is important that the logs are written prior to the actual modification and stored on a stable
storage media, which is failsafe.
Log-based recovery works as follows −
 The log file is kept on a stable storage media.
 When a transaction enters the system and starts execution, it writes a log about it.
<Tn, Start>
 When the transaction modifies an item X, it write logs as follows −
<Tn, X, V1, V2>
It reads Tn has changed the value of X, from V1 to V2.

 When the transaction finishes, it logs −


<Tn, commit>
The database can be modified using two approaches −
 Deferred database modification − All logs are written on to the stable storage and the
database is updated when a transaction commits.
 Immediate database modification − Each log follows an actual database modification.
That is, the database is modified immediately after every operation.

Recovery with Concurrent Transactions

When more than one transaction are being executed in parallel, the logs are interleaved. At the
time of recovery, it would become hard for the recovery system to backtrack all logs, and then
start recovering. To ease this situation, most modern DBMS use the concept of 'checkpoints'.

Checkpoint

Keeping and maintaining logs in real time and in real environment may fill out all the memory
space available in the system. As time passes, the log file may grow too big to be handled at all.
Checkpoint is a mechanism where all the previous logs are removed from the system and stored
permanently in a storage disk. Checkpoint declares a point before which the DBMS was in
consistent state, and all the transactions were committed.

Recovery

When a system with concurrent transactions crashes and recovers, it behaves in the following
manner −
 The recovery system reads the logs backwards from the end to the last checkpoint.
 It maintains two lists, an undo-list and a redo-list.
 If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn,
Commit>, it puts the transaction in the redo-list.
 If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it
puts the transaction in undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the
transactions in the redo-list and their previous logs are removed and then redone before saving
their logs.
UNIT – V
Data on External Storage, File Organization and Indexing, Cluster Indexes, Primary and Secondary Indexes, Index data
Structures, Hash Based Indexing, Tree base Indexing, Comparison of File Organizations, Indexes and Performance Tuning,
Intuitions for tree Indexes, Indexed Sequential Access Methods (ISAM), B+ Trees: A Dynamic Index Structure.

Data on External Storage:


A DBMS stores vast quantity of data and the data must persist across program executions.
Therefore, data is stored on external storage devices such as disks and tapes, and fetched into main
memory as needed for processing. In storage hierarchy storage space is increases from top to
bottom (CPU to Tape) and access speed is minimizes and storage capacity is high.
These storage devices can be broadly categorized into three types
o Primary Storage
o Secondary Storage
o Tertiary Storage

Primary Storage

It is the primary area that offers quick access to the stored data. We also know the primary storage
as volatile storage. It is because this type of memory does not permanently store the data. As soon
as the system leads to a power cut or a crash, the data also get lost. Main memory and cache are
the types of primary storage.

o Main Memory: It is the one that is responsible for operating the data that is available by the
storage medium. The main memory handles each instruction of a computer machine. This type of
memory can store gigabytes of data on a system but is small enough to carry the entire database.
At last, the main memory loses the whole content if the system shuts down because of power
failure or other reasons.
o Cache: It is one of the costly storage media. On the other hand, it is the fastest one. A cache is a
tiny storage media which is maintained by the computer hardware usually. While designing the
algorithms and query processors for the data structures, the designers keep concern on the cache
effects.

Secondary Storage

Secondary storage it is also called as online storage. It is the storage area that allows the user to
save and store data permanently. This type of memory does not lose the data due to any power
failure or system crash. That's why we also call it non-volatile storage.

There are some commonly described secondary storage media which are available in almost every
type of computer system:
o Flash Memory: A flash memory stores data in USB (Universal Serial Bus) keys which are further
plugged into the USB slots of a computer system. These USB keys help transfer data to a
computer system, but it varies in size limits. Unlike the main memory, it is possible to get back
the stored data which may be lost due to a power cut or other reasons. This type of memory
storage is most commonly used in the server systems for caching the frequently used data. This
leads the systems towards high performance and is capable of storing large amounts of databases
than the main memory.
o Magnetic Disk Storage: This type of storage media is also known as online storage media. A
magnetic disk is used for storing the data for a long time. It is capable of storing an entire
database. It is the responsibility of the computer system to make availability of the data from a
disk to the main memory for further accessing. Also, if the system performs any operation over
the data, the modified data should be written back to the disk. The tremendous capability of a
magnetic disk is that it does not affect the data due to a system crash or failure, but a disk failure
can easily ruin as well as destroy the stored data.

Tertiary Storage
It is the storage type that is external from the computer system. It has the slowest speed. But it is
capable of storing a large amount of data. It is also known as Offline storage. Tertiary storage is
generally used for data backup. There are following tertiary storage devices available:
o Optical Storage: An optical storage can store megabytes or gigabytes of data. A Compact Disk
(CD) can store 700 megabytes of data with a playtime of around 80 minutes. On the other hand, a
Digital Video Disk or a DVD can store 4.7 or 8.5 gigabytes of data on each side of the disk.
o Tape Storage: It is the cheapest storage medium than disks. Generally, tapes are used for
archiving or backing up the data. It provides slow access to data as it accesses data sequentially
from the start. Thus, tape storage is also known as sequential-access storage. Disk storage is
known as direct-access storage as we can directly access the data from any location on disk.

Fig. Storage Hierarchy


Magnetic Disks

Hard disk drives are the most common secondary storage devices in present computer systems.
These are called magnetic disks because they use the concept of magnetization to store
information. Hard disks consist of metal disks coated with magnetizable material. These disks are
placed vertically on a spindle. A read/write head moves in between the disks and is used to
magnetize or de-magnetize the spot under it. A magnetized spot can be recognized as 0 (zero) or
1 (one).
Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has
many concentric circles on it, called tracks. Every track is further divided into sectors. A sector
on a hard disk typically stores 512 bytes of data.

Redundant Array of Independent Disks

RAID refers to redundancy array of the independent disk. It is a technology which is used to
connect multiple secondary storage devices for increased performance, data redundancy or both. It
gives you the ability to survive one or more drive failure depending upon the RAID level used.

It consists of an array of disks in which multiple disks are connected to achieve different goals.

o Level 0: Non- redundant: RAID level 0 provides data stripping, i.e., a data can place across
multiple disks. It is based on stripping that means if one disk fails then all data in the array is
lost.

o This level doesn't provide fault tolerance but increases the system performance. It doesn't
contain any error detection mechanism.
o In this level, failure of either disk results in complete data loss in respective array.

In our example, the RAID Level 0 solution consists of only four data disks. Independent of the
number of data disks, the effective space utilization for a RAID Level 0 system is always 100 percent.
Level 1: Mirrored:
This level is called mirroring of data as it copies the data from drive 1 to drive 2. It provides 100%
redundancy in case of a failure. Instead of having one copy of the data, two identical copies of the
data on two different disks are maintained. This type of redundancy is often called mirroring.
Only half space of the drive is used to store the data. The other half of drive is just a mirror to the
already stored data.

Level 2: Stripping and Mirroring:


RAID 2 records Error Correction Code using Hamming distance for its data, striped on different
disks. Like level 0, each data bit in a word is recorded on a separate disk and ECC codes of the
data words are stored on different set disks. Due to its complex structure and high cost, RAID 2 is
not commercially available.

Level 3: Error- Correcting Codes:


o RAID 3 consists of byte-level striping with dedicated parity. In this level, the parity
information is stored for each disk section and written to a dedicated parity drive.
o RAID 3 stripes the data onto multiple disks. The parity bit generated for data word is stored
on a different disk. This technique makes it to overcome single disk failures.
o In case of drive failure, the parity drive is accessed, and data is reconstructed from the
remaining devices. Once the failed drive is replaced, the missing data can be restored on the
new drive.
Level 4: Block - Interleaved Parity:
In this level, an entire block of data is written onto data disks and then the parity is generated and
stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-
level striping. Both level 3 and level 4 require at least three disks to implement RAID.
The effective space utilization is 80 percent. The effective space utilization increases with the
number of data disks, since always only one check disk is necessary.

Level 5:
RAID 5 writes whole data blocks onto different disks, but the parity bits generated for data block
stripe are distributed among all the data disks rather than storing them on a different dedicated
disk.

Level 6: RAID 6 is an extension of level 5. In this level, two independent parities are generated
and stored in distributed fashion among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement RAID.
File Organization:
A database consists of a huge amount of data. The data is grouped within a table in RDBMS,
and each table has related records. A user can see that the data is stored in form of tables, but in
actual this huge amount of data is stored in physical memory in form of files.
A file is named collection of related information that is recorded on secondary storage such as
magnetic disks, magnetic tapes and optical disks.
File Organization refers to the logical relationships among various records that constitute the
file, particularly with respect to the means of identification and access to any specific record. In
simple terms, Storing the files in certain order is called file Organization. File Structure refers to
the format of the label and data blocks and of any logical control record.

Types of File Organizations

o Sequential file organization


o Heap file organization
o Hash file organization
o Clustered file organization

Sequential File Organization

This method is the easiest method for file organization. In this method, files are stored sequentially. This
method can be implemented in two ways:

1. Pile File Method:


o It is a simple method. In this we store the record in a sequence, i.e., one after another. Here, the
record will be inserted in the order in which they are inserted into tables.
o In case of updating or deleting of any record, the record will be searched in the memory blocks.
When it is found, then it will be marked for deleting, and the new record is inserted.
Insertion of the new record:

Suppose we have four records R1, R3 and so on up to R9 and R8 in a sequence. Hence, records are
nothing but a row in the table. Suppose we want to insert a new record R2 in the sequence, then it
will be placed at the end of the file. Here, records are nothing but a row in any table.

2. Sorted File Method:


o In this method, the new record is always inserted at the file's end, and then it will sort the
sequence in ascending or descending order. Sorting of records is based on any primary key or
any other key.
o In the case of modification of any record, it will update the record and then sort the file, and
lastly, the updated record is placed in the right place.

Insertion of the new record:

Suppose there is a preexisting sorted sequence of four records R1, R3 and so on up to R6 and R7.
Suppose a new record R2 has to be inserted in the sequence, then it will be inserted at the end of
the file, and then it will sort the sequence.
Pros of sequential file organization
o It contains a fast and efficient method for the huge amount of data.
o In this method, files can be easily stored in cheaper storage mechanism like magnetic tapes.
o It is simple in design. It requires no much effort to store the data.
o This method is used when most of the records have to be accessed like grade calculation of a
student, generating the salary slip, etc.
o This method is used for report generation or statistical calculations.

Cons of sequential file organization


o It will waste time as we cannot jump on a particular record that is required but we have to move
sequentially which takes our time.
o Sorted file method takes more time and space for sorting the records.

Heap file organization


o It is the simplest and most basic type of organization. It works with data blocks. In heap file
organization, the records are inserted at the file's end. When the records are inserted, it doesn't
require the sorting and ordering of records.
o When the data block is full, the new record is stored in some other block. This new data block
need not to be the very next data block, but it can select any data block in the memory to store
new records. The heap file is also known as an unordered file.
o In this file, every record has a unique id, and every page in a file is of the same size. It is the
DBMS responsibility to store and manage the new records.
Insertion of a new record

Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose we want to insert a
new record R2 in a heap. If the data block 3 is full then it will be inserted in any of the data block
selected by the DBMS, let's say data block 1.
If we want to search, update or delete the data in heap file organization, then we need to traverse
the data from staring of the file till we get the requested record.

If the database is very large then searching, updating or deleting of record will be time-consuming
because there is no sorting or ordering of records. In the heap file organization, we need to check
all the data until we get the requested record.

Pros of Heap file organization


o It is a very good method of file organization for bulk insertion. If there is a large number of data
which needs to load into the database at a time, then this method is best suited.
o In case of a small database, storing and retrieving of records is faster than the sequential record.

Cons of Heap file organization


o This method is inefficient for the large database because it takes time to search or modify the
record.
o This method is inefficient for large databases.

Hash File Organization

Hash File Organization uses the computation of hash function on some fields of the records. The
hash function's output determines the location of disk block where the records are to be placed.

When a record has to be received using the hash key columns, then the address is generated, and
the whole record is retrieved using that address. In the same way, when a new record has to be
inserted, then the address is generated using the hash key and record is directly inserted. The same
process is applied in the case of delete and update.
In this method, there is no effort for searching and sorting the entire file. In this method, each
record will be stored randomly in the memory.

Indexing
If the records in the file are in sorted order, then searching will become very fast. But, in
most of the cases, records are placed in the file in the order in which they are inserted, so new
records are inserted at the end of the file. It indicates, the records are not in sorted order. In
order to make searching faster in the files with unsorted records, indexing is used.

Indexing is a data structure technique which allows you to quickly retrieve records from a
database file. An Index is a small table having only two columns. The first column contains a
copy of the primary or candidate key of a table. The second column contains a set of disk block
addresses where the record with that specific key value stored.
Indexing in DBMS can be of the following types:

Indexing

Primary Indexing Secondary Indexing Clustering Indexing

Dense Indexing Sparse Indexing

i. Primary Index
 If the index is created by using the primary key of the table, then it is known as primary
indexing.
 As primary keys are unique and are stored in a sorted manner, the performance of the
searching operation is quite efficient.
 The primary index can be classified into two types: dense index and sparse index. 

Dense index
 If every record in the table has one index entry in the index table, then it is called dense
index. 
 In this, the number of records (rows) in the index table is same as the number of records
(rows) in the main table.
 As every record has one index entry, searching becomes faster.

TS TS Hyderabad KCR
AP AP Amaravathi Jagan
TN TN Madras Palani Swamy
MH MH Bombay Thackray

Sparse index
 If only few records in the table have index entries in the index table, then it is called
sparse index. 
 In this, the number of records (rows) in the index table is less than the number of records
(rows) in the main table.
 As not all the record have index entries, searching becomes slow for records that does not
have index entries. 
TS TS Hyderabad KCR
TN AP Amaravathi Jagan
MH TN Madras Palani Swamy
MH Bombay Thackray

ii. Secondary Index


When the size of the main table grows, then size of index table also grows. If the index table size
grows then fetching the address itself becomes slower. To overcome this problem, secondary
indexing is introduced.

 In secondary indexing, to reduce the size of mapping, another level of indexing is introduced.
 It contains two levels. In the first level each record in the main table has one entry in the first-
level index table.
 The index entries in the fisrt level index table are divided into different groups. For each
group, one index entry is created and added in the second level index table.

Multi-level Index: When the main table size becomes too large, creating secondary level index
improves the search process. Even if the search process is slow; we can add one more level of
indexing and so on. This type of indexing is called multi-level index.
iii. Clustering Index
 Sometimes the index is created on non-primary key columns which may not be unique
for each record.
 In this case, to identify the record faster, we will group two or more columns to get the
unique value and create index out of them. This method is called a clustering index. 
 The records which have similar characteristics are grouped, and indexes are created for
these group.

Example: Consider a college contains many students in each department. All the students belong
to the same Dept_ID are grouped together and treated as a single cluster. One index pointers
point to the one cluster as a whole. The idex pointer points to the first record in each cluster.
Here Dept_ID is a non-unique key.

Index File Records of table in memory


CSE 501 Ajay BCD
ECE 502 Ramya BCA
EEE … … …
… 560 Fathima BCE
401 Vijay Reddy OC
… … …
460 Mallesh ST
201 Jhon SC
… … …
260 Sowmya BCC
… … …
… … …

In above diagram we can see that, indexes are created for each department in the index
file. In the data block, the students of each department are grouped together to form the cluster.
The address in the index file points to the beginning of each cluster.

Hash Based Indexing

Hashing is a technique to directly search the location of desired data on the disk without
using index structure. Hash function is a function which takes a piece of data ( key) as input and
produces a hash value as output which maps the data to a particular location in the hash table.
The concept of hashing and hash table is shown in the below figure

There are mainly two types of hashing methods:


i. Static Hashing
ii. Dynamic Hashing
 Extended hashing
 Linear hashing

Static Hashing:

In static hashing, the hash function produce only fixed number of hash values. For
example consider the hash function
f(x) = x mod 7
For any value of x, the above function produces one of the hash values from {0, 1, 2, 3, 4, 5, 6}.
Itmeans static hashing maps search-key values to a fixed set of bucket addresses.

Example: Inserting 10, 21, 16 and 12 in hash table.

Hash Value Data Record


f(10) = 10 mod 7 = 3 0 21*
f(21) = 21 mod 7 = 0 1
f(16) = 16 mod 7 = 2 2 16*
f(12) = 12 mod 7 = 5 3 10*
4
5 12*
6

Figure 5.1: Static hashing


Suppose, latter if we want to insert 23, it produce hash value as 2 (23 mod 7 = 2). But, in the
above hash table, the location with hash value 2 is not empty (it contains 16*). So, a collision
occurs. To resolve this collision, the following techniques are used.
o Open addressing
o Separate Chaining or Closed addressing

i. Open Addressing:

Open addressing is a collision resolving technique which stores all the keys inside the
hash table. No key is stored outside the hash table. Techniques used for open addressing are:

o Linear Probing
o Quadratic Probing
o Double Hashing

 Linear Probing:

In linear probing, when there is a collision, we scan forwards for the next empty slot to
fill the key’s record. If you reach last slot, then start from beginning.

Example: Consider figure 1. When we try to insert 23, its hash value is 2. But the slot
with 2 is not empty. Then move to next slot (hash value 3), even it is also full, then move
once again to next slot with hash value 4. As it is empty store 23 there. This is shown in
the below diagram.

Hash Value Data Record

0 21*

f(23) = 23 mod 7 = 2 2 16*

3 10*

4 23*

5 12*

Figure 5.2: Linear Probing


 Quadratic Probing:

In quadratic probing, when collision occurs, it compute new hash value by taking the
original hash value and adding successive values of quadratic polynomial until an open
slot is found. If here is a collision, it use the following hash function: h(x) = ( f(x) + i2 )
mod n , where I = 1, 2, 3, 4,….. and f(x) is initial hash value.

Example: Consider figure 1. When we try to insert 23, its hash value is 2. But the slot
with hash value 2 is not empty. Then compute new hash value as (2 +1 2) mod 7 =3, even
it is also full, and then once again compute new hash value as (2 +2 2) mod 7 = 6. As it is
empty store 23 there. This is shown in the below diagram.

Hash
Data Record
Value

0 21*

f(23) = 23 mod 7 = 2 2 16*

3 10*

5 12*

6 23*

Figure 5.3: Quadratic Probing

 Double Hashing

In double hashing, there are two hash functions. The second hash function is used to
provide an offset value in case the first function causes a collision. The following
function is an example of double hashing: (first Hash(key) + i * secondHash(key)) %
table Size. Use i = 1, 2, 3, …

A popular second hash function is : secondHash(key) = PRIME – (key % PRIME)


where PRIME is a prime smaller than the TABLE_SIZE.
Example: Consider figure 1. When we try to insert 23, its hash value is 2. But the slot
with hash value 2 is not empty. Then compute double hashing value as
secondHash (key) = PRIME – (key % PRIME) → secondHash (23) = 5 – (23 % 5) = 2
Double hashing: (firstHash(key) + i * secondHash(key)) % tableSize → ( 2+1*2))%7 =4
As the slot with hash value 4 is empty, store 23 there. This is shown in the below
diagram.
Hash Value Data Record
0 21*
1
f(23) = 23 mod 7 = 2 2 16*
3 10*
4 23*
5 12*
6

Figure 5.4: Double Probing

ii. Separate Chaining or Closed addressing:

To handle the collision, this technique creates a linked list to the slot for which collision
occurs. The new key is then inserted in the linked list. These linked lists to the slots appear like
chains. So, this technique is called as separate chaining. It is also called as closed addressing.

Example: Inserting 10, 21, 16, 12, 23, 19, 28, 30 in hash table.

Hash Value Data Record


f(10) = 10 mod 7 = 3 0 21*

f(21) = 21 mod 7 = 0 1

f(16) = 16 mod 7 = 2 2 16*


23* 30*
f(12) = 12 mod 7 = 5 3 10*

f(23) = 23 mod 7 = 2 4

f(19) = 19 mod 7 = 5 5 12* 19*


f(30) = 30 mod 7 = 2 6

Figure 5.5: Separate chaining example


2. Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the
size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and on-demand. Dynamic hashing can be
implemented using two techniques. They are:

o Extended hashing
o Linear Hashing

i. Extendable hashing
In extendable hashing, a separate directory of pointers to buckets is used. The number
bits used in directory is called global depth (gd) and number entries in directory = 2 gd. Number of
bits used for locating the record in the buckets is called local depth(ld) and each bucket can
stores up to 2ld entries. The hash function use last few binary bits of the key to find the bucket. If
a bucket overflows, it splits, and if local depth greater than global depth, then the table doubles in
size. It is one form of dynamic hashing.
Example: Let global depth (gd) = 2. It means the directory contains four entries. Let the local
depth (ld) of each bucket = 2. It means each bucket need two bits to perform search operation.
Let each Bucket capacity is four. Let us insert 21, 15, 28, 17, 16, 13, 19, 12, 10, 24, 25 and 11.
21 = 10101 19 = 10011
15 = 01111 12 = 01100
28 = 11100 10 = 01010
17 = 10001 24 = 11000
16 = 10000 25 = 11101
13 = 01101 11 = 01011

Local depth 2
Global depth 28* 16* 12* 24* Bucket A
2 2
00 21* 17* 25* 13* Bucket B
01
2
10
10* Bucket C
11
Directory 2
15* 19* 11* Bucket D
Figure 6.1: Extendible hashing example
Now, let us consider insertion of data entry 20* (binary 10100). Looking at directory
element 00, we are led to bucket A, which is already full. So, we have to split the bucket by
allocating a new bucket and redistributing the contents (including the new entry to be inserted)
across the old bucket and its 'split image (new bucket). To redistribute entries across the old
bucket and its split image, we consider the last three bits; the last two bits are 00, indicating a
data entry that belongs to one of these two buckets, and the third bit discriminates between these
buckets. That is if a key’s last three bits are 000, then it belongs to bucket A and if the last three
bits are 100, then the key belongs to Bucket A2. As we are using three bits for A and A2, so the
local depth of these buckets becomes 3. This is illustrated in the below Figure 6.2.

Local depth 3
16* 24* Bucket A
Global depth 3
28* 12* 20* Bucket A2
2
00 2
01 21* 17* 25* 13* Bucket B
10
2
11
10* Bucket C

Directory 2
15* 19* 11* Bucket D
Figure 6.2: After inserting 20 and splitting Bucket A

After split, Bucket A and A2 have local depth greater than global depth (3 > 2), so double
the directory and use three bits as global depth. As Bucket A and A2 has local depth 3, so they
have separate pointers from the directory. But, Buckets B, C and D use only local depth 2, so
they have two pointers each. This is shown in the below diagram.

Local depth 3
16* 24* Bucket A
Global depth 3
28* 12* 20* Bucket A2
3
000
2 Bucket B
001
21* 17* 25* 13*
010
011
100 2 Bucket C
101 10*
110
111 2 Bucket D
15* 19* 11*
Directory

Figure 6.3: After inserting 20 and doubling the directory


An important point that arises is whether splitting a bucket necessitates a directory
doubling. Consider our example, as shown in Figure 6.3. If we now insert 9* (01001), it belongs
in bucket B; this bucket is already full. We can deal with this situation by splitting the bucket and
using directory elements 001 and 101to point to the bucket and its split image. This is shown in
the below diagram. As Bucket B and its split image now have local depth three and it is not
greater than global depth. Hence, a bucket split does not necessarily require a directory doubling.
However, if either bucket A or A2 grows full and an insert then forces a bucket split, we are
forced to double the directory once again.
Local depth 3
16* 24* Bucket A
Global depth 3
28* 12* 20* Bucket A2
3
000 3
001 9*
17* Bucket B
010
011 3
100 21* 25* 13* Bucket B2
101
110 2
111 10* Bucket C
2
Directory 15* 19* 11* Bucket D
Figure 6.4: After inserting 9

Key Observations:
 A Bucket will have more than one pointers pointing to it if its local depth is less than the
global depth.
 When overflow condition occurs in a bucket, all the entries in the bucket are rehashed
with a new local depth.
 If new Local Depth of the overflowing bucket is equal to the global depth, only then the
directories are doubled and the global depth is incremented by 1.
 The size of a bucket cannot be changed after the data insertion process begins.

ii. Linear Hashing


Linear hashing is a dynamic hashing technique that linearly grows or shrinks number of
buckets in a hash file without a directory as used in Extendible Hashing. It uses a family of hash
functions instead of single hash function.
This scheme utilizes a family of hash functions h0, h1, h2, ... , with the property that each
function's range is twice that of its predecessor. That is, if hi maps a data entry into one of N
buckets, hi+1 maps a data entry into one of 2N buckets. One example of such hash function
family can be obtained by: hi+1 (key) = key mod (2i N) where N is the initial number of
buckets and i = 0,1,2,….
Initially it use N buckets labelled 0 through N–1 and an initial hashing function h0(key) =
key % N is used to map any key into one of the N buckets. For each overflow bucket, one of the
buckets in serial order will be splited and its content is redistributed between it and its split
image. That is, for first time overflow in any bucket, bucket 0 will be splited, for second time
overflow in any bucket; bucket 1 will be splited and so on. When number of buckets becomes
2N, then this marks the end of splitting round 0. Hashing function h0 is no longer needed as all
2N buckets can be addressed by hashing function h1. In new round namely splitting-round 1,
bucket split once again starts from bucket 0. A new hash function h2 will be used. This process is
repeated when the hash file grows.

Example: Let N = 4, so we use 4 buckets and hash function h0(key) = key % 4 is used to map
any key into one of the four buckets. Let us initially insert 4, 13, 19, 25, 14, 24, 15, 18, 23, 11,
16, 12 and 10.This is shown in the below figure.

Bucket# h1 h0 Primary pages Overflow pages


0 000 00 4* 24* 16* 12*

1 001 01 13* 25*

2 010 10 14* 18* 10*

3 011 11 19* 15* 23* 11*

Next, when 27 is inserted, an overflow occurs in bucket 3. So, bucket 0 (first bucket) is splited
and its content is distributed between bucket 0 and new bucket. This is shown in below figure.

Bucket# h1 h0 Primary pages Overflow pages


0 000 00 24* 16*

1 001 01 13* 25*


2 010 10 14* 18* 10*
3 011 11 19* 15* 23* 11* 27*

4 100 00 4* 12*
Next, when 30, 31 and 34 is inserted, an overflow occurs in bucket 2. So, bucket 1 is splited and
its content is distributed between bucket 1 and new bucket. This is shown in below figure.
Bucket# h1 h0 Primary pages Overflow pages
0 000 00 24* 16*
1 001 01 13*
2 010 10 14* 18* 10* 30* 34*
3 011 11 19* 15* 23* 11* 27* 31*
4 100 00 4* 12*
5 101 01 25*

When 32, 35, 40 and 48 is inserted, an overflow occurs in bucket 0. So, bucket 2 is splited and its
content is distributed between bucket 2 and new bucket. This is shown in below figure.
Bucket# h1 h0 Primary pages Overflow pages
0 000 00 24* 16* 32* 40* 48*
1 001 01 13*
2 010 10 18* 10* 34*
3 011 11 19* 15* 23* 11* 27* 31* 35*
4 100 00 4* 12*
5 101 01 25*
6 110 10 14* 30*

When 26, 20 and 42 is inserted, an overflow occurs in bucket 0. So, bucket 3 is splited and its
content is distributed between bucket 3 and new bucket. This is shown in below figure.
Bucket# h1 h0 Primary pages Overflow pages
0 000 00 24* 16* 32* 40* 48*
1 001 01 13*
2 010 10 18* 10* 34* 26* 42

3 011 11 19* 11* 27* 35*

4 100 00 4* 12* 20*


5 101 01 25*
6 110 10 14* 30*
7 111 11 15* 23* 31*
This marks the end of splitting round. Hashing function h0 is no longer needed as all 2N
buckets can be addressed by hashing function h1. In new round namely splitting-round 1, bucket
split once again starts from bucket 0. A new hash function h2 will be used. This process is
repeated.

Intuitions for Tree Indexes

We can use tree-like structures as index as well. For example, a binary search tree can
also be used as an index. If we want to find out a particular record from a binary search tree, we
have the added advantage of binary search procedure, that makes searching be performed even
faster. A binary tree can be considered as a 2-way Search Tree, because it has two pointers in
each of its nodes, thereby it can guide you to two distinct ways. Remember that for every node
storing 2 pointers, the number of value to be stored in each node is one less than the number of
pointers, i.e. each node would contain 1 value each.

The abovementioned concept can be further expanded with the notion of the m-Way
Search Tree, where m represents the number of pointers in a particular node. If m = 3, then each
node of the search tree contains 3 pointers, and each node would then contain 2 values.
We use mainly two tree structure indexes in DBMS. They are:

 Indexed Sequential Access Methods (ISAM)


 B+ Tree

INDEXED SEQUENTIAL ACCESS METHODS (ISAM)


ISAM is a tree structure data that allows the DBMS to locate particular record using index
without having to search the entire data set.
 The records in a file are sorted according to the primary key and saved in the disk.
 For each primary key, an index value is generated and mapped with the record. This
index is nothing but the address of record.
 A sorted data file according to primary index is called an indexed sequential file.
 The process of accessing indexed sequential file is called ISAM.
 ISAM makes searching for a record in larger database is easy and quick. But proper
primary key has to be selected to make ISAM efficient.
 ISAM gives flexibility to generate index on other fields also in addition to primary key
fields.
ISAM contain three types of nodes:
 Non-leaf nodes: They store the search index key values.
 Leaf nodes: They store the index of records.
 Overflow nodes: They also store the index of records but after the leaf node is full.

On ISAM, we can perform search, insertion and deletion operations.


Search Operation: It follows binary search process. The record to be searched will be available
in the leaf nodes or in overflow nodes only. The non-leaf nodes are used to search the value.
Insertion operation: First locate a leaf node where the insertion to be take place (use binary
search). After finding leaf node, insert it in that leaf node if space is available, else create an
overflow node and insert the record index in it, and link the overflow node to the leaf node.
Deletion operation: First locate a leaf node where the deletion to be take place (use binary
search). After finding leaf node, if the value to be deleted is in leaf node or in overflow node,
remove it. If the overflow node is empty after removing the deleted value, then delete overflow
node also.
Example: Insert 10, 23, 31, 20, 68, 35, 42, 61, 27, 71, 46 and 59
31

23 68 42 59

10 20 23 27 68 71 31 35 42 46 59 61
After inserting 24, 33, 36, and 39 in the above tree, it looks like
31

23 68 42 59

10 20 23 27 68 71 31 35 42 46 59 61

24 33 36

39

Deletion: From the above figure, after deleting 42, 71, 24 and 36
31

23 68 42 59

10 20 23 27 68 31 35 46 59 61

33

39

B+ Tree:
B+ Tree is an extension of Binary Tree which allows efficient insertion, deletion and search
operations. It is used to implement indexing in DBMS. In B+ tree, data can be stored only on the
leaf nodes while internal nodes can store the search key values.

1. B+ tree of an order m can store max m-1 values at each node.


2. Each node can have a maximum of m children and at least m/2 children (except root).
3. The values in each node are in sorted order.
4. All the nodes must contain at least half full except the root node.
5. Only leaf nodes contain values and non-leaf nodes contain search keys.
B+ tree Search:
Searching for a value in the B+-Tree always starts at the root node and moves downwards
until it reaches a leaf node. The search procedure follows binary tree search procedure.

1. Read the value to be searched. Let us say this value as X.


2. Start the search process from root node
3. At each non-leaf node (including root node),
a. If all the values in the non-leaf node are greater than X, then move to its first child
b. If all the values in the non-leaf node are less than or equal to X, then move to its
last child
c. If for any two consecutive values in the non-leaf node, left value is less and right
value greater than or equal to X, then move to the child node whose pointer is in
between these two consecutive values.
4. Repeat step-3 until a leaf node reaches.
5. At leaf node compare the key with the values in that node from left to right. If the key
value is found then display found. Otherwise display it is not found.

Example: Searching for 35 in the below given B+ tree. The search path is shown in red color.

18

11 31 64

8 15 23 42 59 68

2 5 8 9 11 12 15 16 18 20 23 27 31 35 42 46 59 61 64 66 68 71

B+ tree Insertion:

1. Apply search operation on B+ tree and find a leaf node where the new value has to insert.
2. If the leaf node is not full, then insert the value in the leaf node.
3. If the leaf node is full, then
a. Split that leaf node including newly inserted value into two nodes such that each
contains half of the values (In case of odd, 2nd node contains extra value).
b. Insert smallest value from new right leaf node (2nd node) into the parent node.
Add pointers from these new leaf nodes to their parent node.
c. If the parent is full, split it too. Add the middle key (In case of even,1st value from
2nd part)of this parent node to its parent node.
d. Repeat until a parent is found that need not split.
4. If the root splits, create a new root which has one key and two pointers.

Example: Insert 1,5,3,7,9,2,4,6,8,10 into B+ tree of an order 4.

B+ tree of order 4 indicates there are maximum 3 values in a node.

Initially

After inserting 1
1

After inserting 5
1 5

After inserting 3
1 3 5

5
After inserting 7
1 3 5 7
1 3 5 7

After inserting 9
5

1 3 5 7 9

After inserting 2
5

1 2 3 5 7 9
After inserting 4
5 3 5

1 2 3 4 5 7 9 1 2 3 4 5 7 9

After inserting 6
3 5

1 2 3 4 5 7 9 6

3 5 7

1 2 3 4 5 6 7 9

After inserting 8
3 5 7

1 2 3 4 5 6 7 8 9

After inserting 10
3 5 7

1 2 3 4 5 6 7 8 9 10

3 5 7 9

1 2 3 4 5 6 7 8 9 10

3 5
9

9 10
1 2 3 4 5 6 7 8
B+ tree Deletion

 Identify the leaf node L from where deletion should take place.
 Remove the data value to be deleted from the leaf node L
 If L meets the "half full" criteria, then its done.
 If L does not meets the "half full" criteria, then
o If L's right sibling can give a data value, then move smallest value in right sibling
to L (After giving a data value, the right sibling should satisfy the half full
criteria. Otherwise it should not give)
o Else, if L's left sibling can give a data value, then move largest value in left
sibling to L (After giving a data value, the left sibling should satisfy the half full
criteria. Otherwise it does not give)
o Else, merge L and a sibling
o If any internal nodes (including root) contain key value same as deleted value,
then delete those values and replace with it successor. This deletion may
propagate up to root. (If the changes propagate up to root then tree height
decreases).

Example: Consider the given below tree and delete 19,


19

5 14 24 33

2 3 5 7 14 16 19 20 22 24 27 29 33 34 38 39

Delete 19 : Half full criteria is satisfied even after deleting 19, so just delete 19 from leaf node

19

5 14 24 33

2 3 5 7 14 16 20 22 24 27 29 33 34 38 39
Delete 20: Half full criteria is not satisfied after deleting 20, so bring 24 from its right sibling
and change key values in the internal nodes.
19

5 14 27 33

2 3 5 7 14 16 22 24 27 29 33 34 38 39

Delete 24: Half full criteria is not satisfied after deleting 24, bringing a value from its siblings
also not possible. Therefore merge it with its right sibling and change key values in the internal
nodes.
19

5 14 33

2 3 5 7 14 16 22 27 29 33 34 38 39

Delete 5: a half full criterion is not satisfied after deleting 5, bringing a value from its siblings
also not possible. Therefore merge it with its left sibling (you can also merge with right) and
change key values in the internal nodes.
19

14 33

2 3 7 14 16 22 27 29 33 34 38 39

Delete 7: Half full criteria is satisfied even after deleting 7, so just delete 7 from leaf node.
17

14 33

2 3 14 16 22 27 29 33 34 38 39
Delete 2: Half full criteria is not satisfied after deleting 2, bringing a value from its siblings also
not possible. Therefore merge it with its right sibling and change key values in the internal
nodes.
22 33

3 14 16 22 27 29 33 34 38 39

Indexes and Performance Tuning


Indexing is very important to execute DBMS query more efficiently. Adding indexes to
important tables is a regular part of performance tuning. When we identify a frequently executed
query that is scanning a table or causing an expensive key lookup, then first consideration is if an
index can solve this problem. If yes add index for that table.

While indexes can improve query execution speed, the price we pay is on index
maintenance. Update and insert operations need to update the index with new data. This means
that writes will slow down slightly with each index we add to a table. We also need to monitor
index usage and identify when an existing index is no longer needed. This allows us to keep our
indexing relevant and trim enough to ensure that we don’t waste disk space and I/O on write
operations to any unnecessary indexes. To improve performance of the system, we need to do the
following:

 Identify the unused indexes and remove them.


 Identify the minimally used indexes and remove them.
 An index that is scanned more frequently, but rarely finds the required answer. Modify
the index to reach the answer.
 Identify the indexes that are very similar and combine them.

You might also like