
LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING

(AUTONOMOUS)
Accredited by NAAC & NBA (Under Tier - I) ISO 9001:2015 Certified Institution
Approved by AICTE, New Delhi. and Affiliated to JNTUK, Kakinada
L.B. REDDY NAGAR, MYLAVARAM, KRISHNA DIST., A.P.-521 230.
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

17CI09 Data Base Management Systems


Program & Semester: B.Tech & II SEM
CSE
Academic Year: 2021 - 22

UNIT I
Module 1: INTRODUCTION
An overview of Database Management
System
A database-management system (DBMS) is a collection of interrelated
data and a set of programs to access those data. The collection of data
is usually referred to as the database.
(or)
The DBMS is a general-purpose software system that facilitates the
processes of defining, constructing, manipulating, and sharing databases
among various users and applications.

Database has the following implicit properties:


 A database represents some aspect of the real world, sometimes called
the miniworld or the universe of discourse (UoD).
 A database is a logically coherent collection of data with some inherent
Meaning.
 A database is designed, built, and populated with data for a specific
purpose. It has an intended group of users and some preconceived
applications in which these users are interested.
Database System Applications
✓ Banking
✓ Airlines
✓ Universities
✓ Credit card transactions
✓ Finance
✓ Sales
✓ Manufacturing
✓ Human resources
Evolution of Database Management Systems
❖ File Management System
❖ Hierarchical database System
❖ Network Database System
❖ Relational Database System

Classification of Database Management System


Based on the data model
✓ Hierarchical database
✓ Network database
✓ Relational Database
✓ Object oriented database
✓ Object relational database

Based on the users


✓ Single user
✓ Multiple users
Based on the sites over which the database is distributed
✓ Centralized database system
✓ Parallel network database system
✓ Distributed database system

Database Systems versus File Systems


 One way to keep the information on a computer is to store it in
operating system files. File-processing system is supported by a
conventional operating system.
 The system stores permanent records in various files, and it needs
different application programs to extract records from, and add records
to, the appropriate files.
 A file management system is an abstraction used to store, retrieve,
manage, and update a set of files. A file management system keeps
track of the files and also manages them.
Drawbacks of using file systems
Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
Difficulty in accessing data
Need to write a new program to carry out each new task
Data isolation
multiple files and formats
Integrity problems
 Integrity constraints (e.g. account balance > 0) become part of
program code.
 Hard to add new constraints or change existing ones
Atomicity of updates
 Failures may leave database in an inconsistent state with partial
updates carried out
E.g. transfer of funds from one account to another should either
complete or not happen at all
Concurrent access by multiple users
 Concurrent access is needed for performance
 Uncontrolled concurrent accesses can lead to inconsistencies
E.g. two people reading a balance and updating it at the same time
Security problems
Advantages of DBMS:

•Data Independence
•Efficient data access
•Data Integrity and Security.
•Data Administration.
•Concurrent access and Crash recovery.
•Reduced application development time.
Data Models
A database model shows the logical structure of a database, including
the relationships and constraints that determine how data can be
stored and accessed. Individual database models are designed based
on the rules and concepts of whichever broader data model the
designers adopt. Most data models can be represented by an
accompanying database diagram.
We have different data models
 The Entity-Relationship Model
 Relational Model
 Object oriented data Model
 Object Relational Model
 Network data Model
 Hierarchical data model

The Entity-Relationship(ER) Model:


The entity-relationship (E-R) data model is based on a perception of a
real world that consists of a collection of basic objects, called entities,
and of relationships among these objects.
 An entity is a “thing” or “object” in the real world that is
distinguishable from other objects.
 Entities are described in a database by a set of attributes. A uniquely
identifiable attribute (or set of attributes) must be assigned to each entity.
 The set of all entities of the same type is termed an entity set.
 A relationship is an association among several entities. The set of all
relationships of the same type is termed a relationship set.
 The overall logical structure (schema) of a database can be expressed
graphically by an E-R diagram, which is built up from the following
components:
• Rectangles: which represent entity sets
• Ellipses: which represent attributes
• Diamonds: which represent relationships among entity sets
• Lines: which link attributes to entity sets and entity sets to
relationships
Relational Model
 The relational model uses a collection of tables to represent both data
and the relationships among those data. Each table has multiple
columns, and each column corresponds to an attribute which has a
unique name.
 The set of permitted values of each attribute is called its domain.
 A particular attribute or combination of attributes is chosen as a
primary key that can be referred to in other tables, where it is called a
foreign key.
 Each row, also called a tuple, includes data about a specific instance of
the entity. Relational databases are typically written in Structured
Query Language (SQL). The model was introduced by E.F. Codd in
1970.
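For instance, a minimal SQL sketch (the Student and Department tables and their
columns are illustrative, not taken from the text) showing how relations, attributes,
a primary key, and a foreign key fit together:

-- Each table is a relation; each column is an attribute drawn from a domain (data type)
CREATE TABLE DEPARTMENT (
    Dept_id   INT PRIMARY KEY,      -- primary key of DEPARTMENT
    Dept_name VARCHAR(30)
);

CREATE TABLE STUDENT (
    Roll_no INT PRIMARY KEY,                        -- primary key of STUDENT
    Name    VARCHAR(30),
    Dept_id INT REFERENCES DEPARTMENT(Dept_id)      -- foreign key referring to DEPARTMENT
);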
Object oriented model
This model defines a database as a collection of objects, or reusable
software elements, with associated features and methods. There are
several kinds of object-oriented databases:
A multimedia database incorporates media, such as images, that could
not be stored in a relational database.
A hypertext database allows any object to link to any other object. It’s
useful for organizing lots of disparate data, but it’s not ideal for
numerical analysis.
The object-oriented database model is the best known post-relational
database model, since it incorporates tables, but isn’t limited to tables.

Object-relational model
This hybrid database model combines the simplicity of the relational
model with some of the advanced functionality of the object-oriented
database model. In essence, it allows designers to incorporate objects into
the familiar table structure.
Three Schema Architecture
A database system is a collection of interrelated files and a set of programs
that allow users to access and modify these files. It provides users with an
abstract view of the data. That is, the system hides certain details of how
the data are stored and maintained.
Developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system.
The data in the DBMS is described at three levels of abstraction
Physical Level (Internal Schema)

❖ Logical Level (Conceptual Schema)
❖ View Level (External Schema)
Physical Level
The lowest level of abstraction describes how the data are stored. The
physical level has an internal schema which describes the physical storage
structure of the database.
Logical Level
 Logical level describes what data are stored in the database, and what
relationships exist among those data.
 Logical level has a conceptual schema, which describes the structure of
the whole database for a community of users.
The conceptual schema hides the details of physical storage structures
and concentrates on describing entities, data types, relationships, user
operations, and constraints.
View level
 This is the highest level in data abstraction.
 This level includes number of external schemas or user views. Each
external schema describes the part of the database that a particular user
group is interested in and hides the rest of the database from that user
group.
Database Schema and Instances
The description of a database is called the database schema, which is
specified during database design and is not expected to change
frequently.
 A schema diagram displays only some aspects of a schema, such as the
names of record types and data items, and some types of constraints.
 The data in the database at a particular moment in time is called a
database state or Instance.
 The database schema defines the variable declarations in tables that belong
to a particular database; the values of these variables at a particular
moment of time are called a data instance.
 The distinction between database schema and database state is very
important
 The DBMS stores the descriptions of the schema constructs and
constraints—also called the meta-data
Data Independence
 Data Independence can be defined as the capacity to change the
schema at one level of a database system without having to change the
schema at the next higher level.
 Metadata itself follows a layered architecture, so that when we change
data at one layer, it does not affect the data at another level. This data is
independent but mapped to each other.
 There are two types of Data Independence
❖ Logical Data Independence
❖ Physical Data Independence
Logical data independence is the capacity to change the conceptual
schema without having to change external schemas or application
programs.
Physical data independence is the capacity to change the internal
schema without having to change the conceptual schema. Hence, the
external schemas need not be changed as well.

Database Languages
Database languages are used to read, update, and store data in a
database. There are several such languages that can be used for this
purpose; one of them is SQL (Structured Query Language).
A database system provides a data definition language to specify the
database schema and a data manipulation language to express database
queries and updates.
There are three categories of DBMS Languages. Those are
1) Data Definition Language (DDL)
2) Data Manipulation Language (DML)
3) Data Control Language (DCL)
Data Definition Language (DDL)
DDL or Data Definition Language actually consists of the SQL commands
that can be used to define the database schema. It simply deals with
descriptions of the database schema and is used to create and modify the
structure of database objects in the database.

CREATE – is used to create the database or its objects (like tables, indexes,
functions, views, stored procedures and triggers).
DROP – is used to delete objects from the database.
ALTER-is used to alter the structure of the database.
COMMENT –is used to add comments to the data dictionary.
RENAME –is used to rename an object existing in the database
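A brief illustrative sequence (the table and column names are hypothetical, and the
COMMENT and RENAME forms shown follow Oracle-style syntax) showing these DDL commands
in use:

CREATE TABLE DEPT (Dept_id INT, Dept_name VARCHAR(20));   -- create a new table
COMMENT ON TABLE DEPT IS 'Departments of the college';    -- add a comment to the data dictionary
ALTER TABLE DEPT ADD Location VARCHAR(20);                -- alter the table structure
RENAME DEPT TO DEPARTMENT;                                -- rename the table
DROP TABLE DEPARTMENT;                                    -- delete the table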
Data Manipulation Language (DML)
The SQL commands that deal with the manipulation of data present in the
database belong to DML or Data Manipulation Language, and this includes
most of the SQL statements.
Examples of DML:
INSERT – is used to insert data into a table.
UPDATE – is used to update existing data within a table.
DELETE– is used to delete records from a database table.
TRUNCATE – is used to remove all records from a table, including all the space
allocated for those records.
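A short sketch (assuming a simple STUDENT table, created here only for illustration)
of these DML commands:

CREATE TABLE STUDENT (Roll_no INT, Name VARCHAR(30));
INSERT INTO STUDENT (Roll_no, Name) VALUES (1, 'Ravi');    -- add a new row
UPDATE STUDENT SET Name = 'Ravi Kumar' WHERE Roll_no = 1;  -- modify an existing row
DELETE FROM STUDENT WHERE Roll_no = 1;                     -- delete selected rows
TRUNCATE TABLE STUDENT;                                    -- remove all rows and release the space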

Data Control Language (DCL)


DCL includes commands such as GRANT and REVOKE which mainly
deal with the rights, permissions and other controls of the database
system.

Examples of DCL commands:

GRANT – gives users access privileges to the database.


REVOKE – withdraws users' access privileges that were given using the GRANT
command.
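For example (the user name and table name below are illustrative):

GRANT SELECT, INSERT ON STUDENT TO ravi;    -- give ravi the right to read and insert rows
REVOKE INSERT ON STUDENT FROM ravi;         -- take back the insert privilege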
Database System Structure
A database system is partitioned into modules that deal with each of the
responsibilities of the overall system. The functional components of a
database system can be broadly divided into the storage manager and the
query processor components.

Storage manager
 A storage manager is a program module that provides the interface between
the low-level data stored in the database and the application programs and
queries submitted to the system.
 Thus, the storage manager is responsible for storing, retrieving, and
updating data in the database. The storage manager components include:

Authorization and integrity manager: which tests for the satisfaction of


integrity constraints and checks the authority of users to access data.
Transaction manager: which ensures that the database remains in a
consistent (correct) state despite system failures, and
that concurrent transaction executions proceed
without conflicting.
File manager: which manages the allocation of space on disk storage and the
data structures used to represent information stored on disk.
Buffer manager: which is responsible for fetching data from disk storage
into main memory, and deciding what data to cache in
main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle
data sizes that are much larger than the size of main
memory.
The storage manager implements several data structures as part of the
physical system implementation:
• Data files: which store the database itself.
• Data dictionary: which stores metadata about the structure of the
database, in particular the schema of the database.
• Indices: which provide fast access to data items that hold particular
values.
The Query Processor
The query processor components include
• DML compiler, which translates DML statements in a query language
into an evaluation plan consisting of low-level instructions that the query
evaluation engine understands. A query can usually be translated into any
of a number of alternative evaluation plans that all give the same result.
The DML compiler also performs query optimization, that is, it picks the
lowest cost evaluation plan from among the alternatives.
• DDL interpreter, which interprets DDL statements and records the
definitions in the data dictionary.
• Query evaluation engine, which executes low-level instructions
generated by the DML compiler

UNIT I
Module II: Data Modelling using the Entity
Relationship Model
The database design process can be divided into six (6) steps.
Requirements Analysis
The very first step in designing a database application is to understand:
 What type of data is to be stored in the database.
 What applications must be built on top of it.
 What types of operations could be performed on the database.
Conceptual Database Design
In this we have to develop a high level description of the data which is gathered
from the requirement analysis step. For that purpose, we have to use ER Model
concept.

Logical Database Design


In this step we must choose a DBMS to implement our database design and convert
the conceptual database design into a database schema in the data model of the
chosen DBMS. We will consider only relational DBMSs, so we convert the
given conceptual design into a relational database schema.
Schema Refinement
In this step we analyze all the relations in our relational database
schema to identify potential problems and refine them (normalization).

Physical Database Design


This step involves building indexes on some tables and clustering some
tables.

Application and Security Design


In this we must identify the entities and processes involved in the
application.
ER model concepts
The ER model defines the conceptual view of a database. It works around real-
world entities and the associations among them. At view level, the ER model is
considered a good option for designing databases.
Entity Sets
 An entity is a “thing” or “object” in the real world that is distinguishable from
all other objects.
 An entity is represented by a set of attributes.
 An entity set is a set of entities of the same type that share the same
properties, or attributes.
 Attributes are descriptive properties possessed by each member of an entity
set.
 Entity sets do not need to be disjoint.
 For each attribute, there is a set of permitted values, called the domain, or
value set, of that attribute.
 An attribute, as used in the E-R model, can be characterized by the following
attribute types
✓ Simple and composite attributes
✓ Single-valued and multivalued attributes
✓ Derived attribute
Simple attribute: Simple attributes are atomic values, which cannot be divided
further. For example, a student's phone number is an
atomic value of 10 digits.
Composite attribute: Composite attributes are made of more than one simple
attribute. For example, a student's complete name
may have first_name and last_name.
Derived attribute: Derived attributes are attributes that do not exist in the
physical database; their values are derived from other
attributes present in the database. For example,
average_salary in a department should not be saved
directly in the database; instead, it can be derived. For
another example, age can be derived from date_of_birth.
Single-value attribute: Single-value attributes contain single value. For
example − Social_Security_Number.
Multi-value attribute: Multi-value attributes may contain more than one value.
For example, a person can have more than one
phone number, email_address, etc.
 An attribute takes a null value when an entity does not have a value
for it. The null value may indicate “not applicable”—that is, that the
value does not exist for the entity.
 Null can also designate that an attribute value is unknown.
Keys
A key is an attribute or collection of attributes that uniquely identifies an entity
within an entity set.
For example, the roll_number of a student makes him/her identifiable
among students.
Super Key: A set of attributes (one or more) that collectively identifies an entity
in an entity set.
Candidate Key: A minimal super key is called a candidate key. An entity set may
have more than one candidate key.
Primary Key: A primary key is one of the candidate keys chosen by the database
designer to uniquely identify the entity set.
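A minimal SQL sketch, assuming a hypothetical Student entity in which both Roll_no and
Email are candidate keys and Roll_no is chosen as the primary key:

CREATE TABLE STUDENT (
    Roll_no INT PRIMARY KEY,        -- the candidate key chosen as the primary key
    Email   VARCHAR(50) UNIQUE,     -- another candidate key, enforced with UNIQUE
    Name    VARCHAR(30)             -- ordinary (non-key) attribute
);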

Entity Types

There are two types of entities:


Strong Entity: A Strong entity is an entity which has a primary key as an
attribute.
Weak Entity: A weak entity is an entity that depends on another entity. A weak
entity does not have a key attribute of its own.
Relationship Sets
 A relationship is an association among several entities
 A relationship set is a set of relationships of the same type. Formally, it is
a mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1,
E2, . . .,En are entity sets, then a relationship set R is a subset of
{(e1, e2, . . . , en) | e1 ∈ E1, e2 ∈ E2, . . . , en ∈ En}
where (e1, e2, . . . , en) is a relationship.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the
relationship.
 Binary = degree 2
 Ternary = degree 3
 n-ary = degree n
Types of Relationships: -
 Binary Relationship
 Ternary Relationship
 Recursive Relationship
Binary Relationship:
A Binary Relationship is the relationship between two different Entities i.e. it is a
relationship of role group of one entity with the role group of another entity.
There are three types of cardinalities for Binary Relationships:
 1. One-to-One
 2. One-to-many
 3. Many-to-Many
One-to-One
Here one role group of one entity is mapped to one role group of another entity. In
simple terms one instance of one entity is mapped with only one instance of another
entity. In this type the primary key of one entity must be available as foreign key in
other entity.
Example: consider two entities Person and Driver_License.
One Person should have only one Driver License number.
One-to-Many
One role group of one entity is mapped with many role groups of second entity and
one role group of second entity is mapped with one role group of first entity.
Example: consider two entities Project and Employee. One Project can have many
Employee's working on it but one Employee will always be engaged in only one
Project.
Many-to-Many
One role group of one entity is mapped with many role groups of the second entity,
and one role group of the second entity is mapped with many role groups of the first
entity. In this kind of relationship a third table is always associated that defines the
relationship between the two entities, as sketched after the example below.
Example: Consider two entities Student and Books.
 Many Students can have a Book and many Books can be issued to a Student so
in this way this is a many-to-many relationship.
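A hedged SQL sketch of the Student/Books example; the ISSUES table below is the third
(associative) table, and all names are illustrative:

CREATE TABLE STUDENT (Student_id INT PRIMARY KEY, Name VARCHAR(30));
CREATE TABLE BOOK    (Book_id INT PRIMARY KEY, Title VARCHAR(50));

-- Third table that records which book is issued to which student
CREATE TABLE ISSUES (
    Student_id INT REFERENCES STUDENT(Student_id),
    Book_id    INT REFERENCES BOOK(Book_id),
    PRIMARY KEY (Student_id, Book_id)
);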
Recursive Relationship
A relationship between two entities of the same entity type is called a recursive
relationship.
Here the same entity type participates more than once in a relationship type with
a different role for each instance.

Ternary Relationship
An association among three entities is called a ternary relationship.
Generalization, Specialization and Aggregation in ER model are used for data
abstraction in which abstraction mechanism is used to hide details of a set of
objects.
Generalization
Generalization is the process of extracting common properties from a set of
entities and creating a generalized entity from them. It is a bottom-up approach in
which two or more entities can be generalized to a higher level entity if they have
some attributes in common.
Specialization
In specialization, an entity is divided into sub-entities based on their
characteristics. It is a top-down approach where higher level entity is specialized
into two or more lower level entities.
Aggregation
An ER diagram is not capable of representing a relationship between an entity and
another relationship, which may be required in some scenarios. In those cases, a
relationship with its corresponding entities is aggregated into a higher level entity.

For example, an employee working on a project may require some machinery. So, a
REQUIRE relationship is needed between the relationship WORKS_FOR and the entity
MACHINERY. Using aggregation, the WORKS_FOR relationship with its entities
EMPLOYEE and PROJECT is aggregated into a single entity, and the relationship
REQUIRE is created between the aggregated entity and MACHINERY.
Relationship Constraints
The limitations imposed on a relationship are called relationship
constraints. There are two types of relationship constraints. Those are
1) Cardinality Ratio
2) Participation Constraint
Cardinality Ratio
Cardinality defines the number of entities in one entity set that can be
associated with entities of the other set via the relationship set.
One-to-one − One entity from entity set A can be associated with at most one
entity of entity set B and vice versa.

One-to-many − One entity from entity set A can be associated with more than
one entity of entity set B; however, an entity from entity set B can be associated
with at most one entity of entity set A.

Many-to-one − More than one entity from entity set A can be associated with at
most one entity of entity set B; however, an entity from entity set B can be
associated with more than one entity from entity set A.

Many-to-many − One entity from A can be associated with more than one entity
from B and vice versa.
Participation Constraints
Participation constraints define the least number of relationship instances in
which an entity must compulsorily participate. There are two types of
participation constraint

Total Participation
❑ It specifies that each entity in the entity set must compulsorily participate in
at least one relationship instance in that relationship set.
❑ That is why, it is also called as mandatory participation.
❑ Total participation is represented using a double line between the entity set
and relationship set.
Example of Total participation

Here,
 Double line between the entity set “Student” and relationship set “Enrolled
in” signifies total participation.
 It specifies that each student must be enrolled in at least one course.
Partial Participation
 It specifies that each entity in the entity set may or may not participate in the
relationship instance in that relationship set.
 That is why, it is also called as optional participation.
 Partial participation is represented using a single line between the entity set
and relationship set.
Example of partial participation

Here,
 Single line between the entity set “Course” and relationship set “Enrolled in”
signifies partial participation.
 It specifies that there might exist some courses for which no enrolments are
made.
Relationship between Cardinality and Participation Constraints-
Minimum cardinality tells whether the participation is partial or total.
 If minimum cardinality = 0, then it signifies partial participation.
 If minimum cardinality = 1, then it signifies total participation.

Maximum cardinality tells the maximum number of entities that participate in a
relationship set.
ER-Diagram for College Database
ER-Diagram for Hospital Management System
ER-Diagram for Railway Reservation System
Reduction of ER-Diagram to Tables
The following rules are used for converting an ER diagram into tables:
Rule-01: For Strong Entity Set With Only Simple Attributes
Strong entity set with only simple attributes will require only one table
in relational model.
 Attributes of the table will be the attributes of the entity set.
 The primary key of the table will be the key attribute of the entity set.

Roll_no Name sex

Schema : Student ( Roll_no , Name , Sex )


Rule-02: For Strong Entity Set With Composite Attributes
A strong entity set with any number of composite attributes will require only one
table in relational model. While conversion, simple attributes of the composite
attributes are taken into account and not the composite attribute itself.

Roll_no First_name Last_name City Street House_no

Schema : Student ( Roll_no , First_name , Last_name , House_no , Street , City )


Rule-03: For Strong Entity Set With Multi Valued Attributes
A strong entity set with any number of multi valued attributes will require
two tables in relational model.
 One table will contain all the simple attributes with the primary key.
 Other table will contain the primary key and all the multi valued attributes.

Roll_no  City                              Roll_no  Mobile_no

Schema : Student ( Roll_no , City )        Schema : Student_Mobile ( Roll_no , Mobile_no )
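In SQL this conversion could look like the following sketch (the table name
Student_Mobile is illustrative):

CREATE TABLE STUDENT (
    Roll_no INT PRIMARY KEY,
    City    VARCHAR(30)
);

-- Separate table holding the multi-valued attribute Mobile_no
CREATE TABLE STUDENT_MOBILE (
    Roll_no   INT REFERENCES STUDENT(Roll_no),
    Mobile_no VARCHAR(15),
    PRIMARY KEY (Roll_no, Mobile_no)
);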


Rule-04: Translating Relationship Set into a Table
Relationship set will require one table in the relational model.
Attributes of the table are-
 Primary key attributes of the participating entity sets
 Its own descriptive attributes if any.
 The set of non-descriptive attributes (the primary key attributes of the
participating entity sets) will form the primary key of this table.

If we consider the overall ER diagram, three tables will be required in


relational model-
✓ One table for the entity set “Employee”
✓ One table for the entity set “Department”
✓ One table for the relationship set “Works in”
Emp_no Emp_name Salary

Employee Table

Dept_id Dept_name

Department Table

Emp_no Dept_id since

Works in Table
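One possible SQL translation of the three tables above (a sketch with illustrative
data types):

CREATE TABLE EMPLOYEE   (Emp_no INT PRIMARY KEY, Emp_name VARCHAR(30), Salary DECIMAL(10,2));
CREATE TABLE DEPARTMENT (Dept_id INT PRIMARY KEY, Dept_name VARCHAR(30));

-- Table for the relationship set "Works in": its primary key is formed from the
-- primary keys of the participating entity sets, plus the descriptive attribute Since
CREATE TABLE WORKS_IN (
    Emp_no  INT REFERENCES EMPLOYEE(Emp_no),
    Dept_id INT REFERENCES DEPARTMENT(Dept_id),
    Since   DATE,
    PRIMARY KEY (Emp_no, Dept_id)
);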
Rule-05: For Binary Relationships With Cardinality Ratios
The following four cases are possible

Case-01: Binary relationship with cardinality ratio m:n

Here, three tables will be required-


1. A ( a1 , a2 )
2. R ( a1 , b1 )
3. B ( b1 , b2 )
Case-02: For Binary Relationship With Cardinality Ratio 1:n

Here, two tables will be required-


1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )

Here, combined table will be drawn for the entity set B and relationship set R.
Case-03: For Binary Relationship With Cardinality Ratio m:1

Here, two tables will be required-


1. AR ( a1 , a2 , b1 )
2. B ( b1 , b2 )

Here, combined table will be drawn for the entity set A and relationship set R.
Case-04: For Binary Relationship With Cardinality Ratio 1:1

Here, two tables will be required. Either combine ‘R’ with ‘A’ or ‘B’
Way-01:
1. AR ( a1 , a2 , b1 )
2. B ( b1 , b2 )

Way-02:
 1. A ( a1 , a2 )
 2. BR ( a1 , b1 , b2 )
While determining the minimum number of tables required for binary
relationships with given cardinality ratios, following thumb rules must be kept in
mind-
 For binary relationship with cardinality ratio m : n , separate and individual
tables will be drawn for each entity set and the relationship.
 For binary relationship with cardinality ratio either m : 1 or 1 : n , always
remember “many side will consume the relationship” i.e. a combined table
will be drawn for many side entity set and relationship set.
 For binary relationship with cardinality ratio 1 : 1 , two tables will be
required. You can combine the relationship set with any one of the entity sets.

Rule-06: For Binary Relationship with Both Cardinality


Constraints and Participation Constraints
 Cardinality constraints will be implemented as discussed in Rule-05.
 Because of the total participation constraint, foreign key acquires NOT
NULL constraint i.e. now foreign key can not be null.
Case-01: For Binary Relationship With Cardinality Constraint and Total
Participation Constraint From One Side

Because cardinality ratio = 1 : n , so we will combine the entity set B and


relationship set R.
Then, two tables will be required-
1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )
Because of total participation, foreign key a1 has acquired NOT NULL
constraint, so it can’t be null now.
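A sketch of the combined table BR with the foreign key declared NOT NULL because of
the total participation (the column data types below are assumptions):

CREATE TABLE A (
    a1 INT PRIMARY KEY,
    a2 VARCHAR(20)
);

-- Entity set B combined with relationship set R; a1 is the foreign key and, because
-- B participates totally in R, it is also declared NOT NULL
CREATE TABLE BR (
    b1 INT PRIMARY KEY,
    b2 VARCHAR(20),
    a1 INT NOT NULL REFERENCES A(a1)
);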
Case-02: For Binary Relationship With Cardinality Constraint and Total
Participation Constraint From Both Sides

If there is a key constraint from both sides of an entity set with total
participation, then that binary relationship is represented using only a single table.

Here, Only one table is required.


ARB ( a1 , a2 , b1 , b2 )
Extended Entity Relationship Model
➢ It includes all modelling concepts of basic ER model.
➢ It includes additional concepts like
 Sub Class and Super Class
 Specialization and Generalization
 Union or Category
 Aggregation
 Relationship Inheritance
Sub Class and Super Class
The subclass and superclass relationship leads to the concept of inheritance. The
relationship between a subclass and its superclass is denoted with a special symbol
in EER diagrams.
Super Class
A superclass is an entity type that has a relationship with one or more subtypes.
An entity cannot exist in the database merely by being a member of a superclass.
For example: Shape super class is having sub groups as Square, Circle,
Triangle.
Sub Class
 Sub class is a group of entities with unique attributes.
 Sub class inherits properties and attributes from its super class.
For example: Square, Circle, Triangle are the sub class of Shape super class.

Category or Union
 A category represents a single subclass/superclass relationship in which the
subclass has more than one superclass.
 It can be a total or partial participation.
For example, in car booking, a car owner can be a person, a bank (which holds a
possession on the car), or a company. The category (subclass) Owner is a subset of
the union of the three superclasses: Company, Bank, and Person.
A Category member must exist in at least one of its super classes.
Constraints on Specialization & Generalization
There are two types of constraints on specialization & generalization.
Those are disjointness and completeness.

Disjointness
Disjointness: An entity can be a member of at most one of the subclass
entities.
Non-Disjointness (overlapping): An entity can be a member of more than one of the
subclass entities.

Completeness Constraint:
Total: It specifies that every entity in the super class must be a member
of some sub class. It is indicated by double line.
Partial: It specifies that an entity need not belong to any of the subclasses. It
is indicated with a single line.

UNIT II
Module 1: Relational Data Model and
Language
Relational Data Model Concepts
Codd proposed the relational data model in 1970. Before 1970, most database
systems followed one of two data models. Those are
✓ Hierarchical Data Model
✓ Network Data Model
 The relational model represents the database as a collection of relations.
 A relation is thought of as a table of values; each row in the table represents a
collection of related data values.
 A row represents a fact that typically corresponds to a real-world entity or
relationship.
 The table name and column names are used to help to interpret the meaning
of the values in each row.
Example: Student relationship.

Relational data model is the primary data model, which is used widely around
the world for data storage and processing. This model is simple and it has all the
properties and capabilities required to process data with storage efficiency.

A Relation mainly contains two things


✓ Relation Schema
✓ Relation Instance
Relation Schema
Relation schema specifies the name of the relation and name of each field and
domain of each field. Here domain specifies that the set of values that are
associated to the given field.

A relation schema R, denoted by R(A1, A2, ...,An), is made up of a relation name


R and a list of attributes, A1, A2, ..., An. Each attribute Ai is the name of a role
played by some domain D in the relation schema R. D is called the domain of Ai
and is denoted by dom(Ai).

A relation (or relation state) r of the relation schema R(A1, A2, ..., An), denoted
by r(R), is a set of n-tuples r = {t1, t2, ..., tm}.
 Each n-tuple t is an ordered list of n values t =<v1, v2, ..., vn>, where each
value vi, 1 ≤ i ≤ n, is an element of dom (Ai)

Definition of a relation can be restated more formally using set theory concepts. A
relation (or relation state) r(R) is a mathematical relation of degree n on the
domains dom(A1), dom(A2), ..., dom(An), which is a subset of the Cartesian
product (denoted by ×) of the domains that define R:
r(R) ⊆ (dom(A1) × dom(A2) × ... × dom(An))
The degree (or arity) of a relation is the number of attributes in its
relation schema.
Ex: Student (sid: string, name:string, login:string, age:integer, gpa:real)

Relation Instance
An instance of a relation is a set of tuples, also called records, in which each
tuple has the same number of fields as the relation schema.

Domains, Attributes, Tuples, and Relations


A domain D is a set of atomic values. A data type or format is also specified
for each domain. It is also useful to specify a name for the domain, to help in
interpreting its values.
 Degree or arity of a Relation is Number of fields in a relation.
 Cardinality of a relation instance is number of tuples in it.
Tables
In relational data model, relations are saved in the format of Tables. This format
stores the relation among entities. A table has rows and columns, where rows
represents records and columns represent the attributes.
Tuple
A single row of a table, which contains a single record for that relation is called a
tuple.
Relation instance
A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema
A relation schema describes the relation (table) name, its attributes, and their
names.
Relation key
Each Relation has one or more attributes, known as relation key, which can
identify the row in the relation table uniquely.
Attribute domain
Every attribute has some pre-defined value scope, known as attribute domain.
Integrity Constraints
Integrity constraints are rules or conditions specified on a database schema that
restrict the data that can be stored in an instance of the database. They ensure that
insertion, updating, and other processes are performed in such a way that
data integrity is not affected.
Types of integrity constraints
✓ Key constraints
✓ Domain constraints
✓ Referential integrity constraints
✓ Entity Integrity Constraints
Domain constraints

Domain constraints can be defined as the definition of a valid set of values for an
attribute. The data type of domain includes string, character, integer, time, date,
currency, etc.

The value of the attribute must belong to the corresponding domain.


Entity integrity constraints
 The entity integrity constraint states that primary key value can't be
null.
 This is because the primary key value is used to identify individual
rows in relation and if the primary key has a null value, then we can't
identify those rows.
 A table can contain a null value other than the primary key field.
Referential Integrity Constraints
 A referential integrity constraint is specified between two tables.
 In the Referential integrity constraints, if a foreign key in Table 1 refers to the
Primary Key of Table 2, then every value of the Foreign Key in Table 1 must
be available in Table 2 or NULL.
Key constraints
 Keys are attributes (or sets of attributes) that are used to identify an entity
within its entity set uniquely.
 An entity set can have multiple keys, but out of these one key will be
chosen as the primary key.
 A primary key must hold a unique value for each row of the relational table.
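All four kinds of constraints can be seen together in one hedged CREATE TABLE sketch
(the table and column names are illustrative):

CREATE TABLE DEPARTMENT (
    Dept_id   INT PRIMARY KEY,                   -- key constraint
    Dept_name VARCHAR(30)
);

CREATE TABLE ACCOUNT (
    Acc_no  INT PRIMARY KEY,                     -- entity integrity: cannot be NULL or repeat
    Balance DECIMAL(12,2) CHECK (Balance >= 0),  -- domain constraint on permitted values
    Dept_id INT REFERENCES DEPARTMENT(Dept_id)   -- referential integrity constraint
);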
Relational Algebra
Relational algebra is a procedural query language, which takes relations as input
and generates relations as output. It uses operators to perform queries. An operator
can be either unary or binary.
Relational algebra mainly provides theoretical foundation for relational databases
and SQL.
Relational algebra is performed recursively on a relation and intermediate results
are also considered relations.
We can divide the operations in two categories:
1. Basic Operations
2. Derived Operations
Basic/Fundamental Operations:
1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)
Derived Operations
1. Natural Join (⋈)
2. Left, Right, Full outer join (⟕, ⟖, ⟗)
3. Intersection (∩)

Select Operator (σ)


Select Operator is denoted by sigma (σ) and it is used to find the tuples (or rows)
in a relation (or table) which satisfy the given condition.

If you know a little SQL, you can think of it as the WHERE clause in
SQL, which is used for the same purpose.
Syntax
σ Condition/Predicate (Relation/Table name)
Customer_Id Customer_Name Customer_City
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Example
σ Customer_City="Agra" (CUSTOMER)

Customer_Id Customer_Name Customer_City


C10100 Steve Agra
C10111 Raghu Agra
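The same selection written as SQL (a direct equivalent of the expression above):

-- σ Customer_City = "Agra" (CUSTOMER) expressed in SQL
SELECT *
FROM CUSTOMER
WHERE Customer_City = 'Agra';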

Project Operator (∏)


Project operator is denoted by ∏ symbol and it is used to select desired columns
(or attributes) from a table (or relation).

Project operator in relational algebra is similar to the Select statement in SQL.

Syntax

∏ column_name1, column_name2, ...., column_nameN (table_name)


In this example, we have a table CUSTOMER with three columns, we want to
fetch only two columns of the table, which we can do with the help of Project
Operator ∏.

∏ Customer_Id, Customer_Name (CUSTOMER)

Customer_Id Customer_Name
C10100 Steve
C10111 Raghu
C10115 Chaitanya
C10117 Ajeet
C10118 Carl
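An equivalent SQL query; DISTINCT mirrors the duplicate elimination that relational
algebra projection performs:

-- ∏ Customer_Id, Customer_Name (CUSTOMER) expressed in SQL
SELECT DISTINCT Customer_Id, Customer_Name
FROM CUSTOMER;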
Union Operator (∪)
Union operator is denoted by ∪ symbol and it is used to select all the rows (tuples)
from two tables (relations).

Let's discuss the union operator a bit more. Say we have two relations R1 and R2
that both have the same columns, and we want to select all the tuples (rows) from
these relations; then we can apply the union operator on these relations.

Note: The rows (tuples) that are present in both the tables will only appear once in
the union set. In short you can say that there are no duplicates present after the
union operation.
r ∪ s = { t | t ∈ r or t ∈ s}

For a union operation to be valid, the following conditions must hold


✓ r, and s must have the same number of attributes.
✓ Attribute domains must be compatible.
✓ Duplicate tuples are automatically eliminated.
Syntax
table_name1 ∪ table_name2
Course_Id Student_Name Student_Id
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931

Student_Id Student_Name Student_Age


S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18
∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)

Student_Name
Aditya
Carl
Paul
Lucy
Rick
Steve

As you can see there are no duplicate names present in the output even
though we had a few common names in both tables; the
COURSE table even contained a duplicate name itself.
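The same union can be written in SQL; like the algebra operator, SQL UNION removes
duplicate rows automatically:

SELECT Student_Name FROM COURSE
UNION
SELECT Student_Name FROM STUDENT;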
Set Difference (-)
Set Difference is denoted by the – symbol. Let's say we have two relations R1 and
R2 and we want to select all those tuples (rows) that are present in relation R1
but not present in relation R2; this can be done using the set difference R1 – R2.
Syntax
table_name1 - table_name2
Example
∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)

A query to select those student names that are present in STUDENT table but not
present in COURSE table

Student_Name
Carl
Rick
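The equivalent SQL uses MINUS (Oracle) or EXCEPT (standard SQL / SQL Server):

SELECT Student_Name FROM STUDENT
MINUS
SELECT Student_Name FROM COURSE;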
Cartesian product (X)
Cartesian Product is denoted by X symbol.
Let's say we have two relations R1 and R2; then the Cartesian product of these two
relations (R1 X R2) combines each tuple of the first relation R1 with each
tuple of the second relation R2.
Syntax
R1 X R2
R1
A B
AA 100
BB 200
CC 300
R2

X Y
XX 99
YY 11
ZZ 101
R1 X R2

A B X Y
AA 100 XX 99
AA 100 YY 11
AA 100 ZZ 101
BB 200 XX 99
BB 200 YY 11
BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101
Rename (ρ)
Rename (ρ) operation can be used to rename a relation or an attribute of a
relation.

Syntax
ρ(new_relation_name, old_relation_name)

Example
Customer_Id Customer_Name Customer_City
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
ρ(CUST_NAMES, ∏Customer_Name (CUSTOMER))
CUST_NAMES
Steve
Raghu
Chaitanya
Ajeet
Carl
Derived Operators
Intersection Operator (∩)
Intersection operator is denoted by ∩ symbol and it is used to select
common rows (tuples) from two tables (relations).
Let's say we have two relations R1 and R2 that both have the same columns, and
we want to select all those tuples (rows) that are present in both
relations; in that case we can apply the intersection operation on these
two relations: R1 ∩ R2.

Syntax
table_name1 ∩ table_name2
Example
∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)

Student_Name
Aditya
Steve
Paul
Lucy
JOIN
An SQL Join is used to combine data from two or more tables, based on a
common field between them.
Example
Orders table

OrderID CustomerID OrderDate


10308 2 1996-09-18
10309 37 1996-09-19
10310 77 1996-09-20

Customers table

CustomerID CustomerName                         ContactName      Country

1          Alfreds Futterkiste                  Maria Anders     Germany
2          Ana Trujillo Emparedados y helados   Ana Trujillo     Mexico
3          Antonio Moreno Taquería              Antonio Moreno   Mexico
The "CustomerID" column in the "Orders" table refers to the "CustomerID" in the
"Customers" table. The relationship between the two tables above is the
"CustomerID" column.

SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate


FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;

OrderID CustomerName OrderDate


10308 Ana Trujillo Emparedados y helados 9/18/1996
Different Types of SQL JOINs
Here are the different types of the JOINs in SQL:

(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched
records from the right table
RIGHT (OUTER) JOIN: Returns all records from the right table, and the
matched records from the left table
FULL (OUTER) JOIN: Returns all records when there is a match in either left or
right table.
Left Outer Join
Returns all records from the left table, and the matched records from the right
table
The result of a left outer join (or simply left join) for
tables Employee and Location always contains all records of the "left" table
(Employee), even if the join-condition does not find any matching record in the
"right" table (Location).

select * from employee left outer join location on employee.empID =
location.empID;
Right outer Join
Returns all records from the right table, and the matched records from the
left table.
A right outer join (or right join) closely resembles a left outer join, except
with the treatment of the tables reversed. Every row from the "right" table
(Location) will appear in the joined table at least once. If no matching
row from the "left" table (Employee) exists, NULL will appear in
columns from Employee for those records that have no match in
Location.
select * from employee right outer join location on employee.empID =
location.empID;
Full Outer Join

A full outer join (or full join) retains the nonmatching information by including
nonmatching rows in the result of the join. It includes all rows from both tables,
regardless of whether or not the other table has a matching value.
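A hedged SQL sketch following the Employee/Location example used above (FULL OUTER
JOIN is supported by Oracle, SQL Server, and PostgreSQL, but not by older MySQL versions):

select *
from employee
full outer join location
    on employee.empID = location.empID;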

UNIT II
Module II: Introduction to SQL
✓ SQL means Structured Query Language.
✓ SQL was called SEQUEL (Structured English QUEry Language) and
was designed and implemented at IBM Research
✓ SQL is used to create, store, manipulate, and retrieve data from
a relational database.
✓ SQL is a comprehensive database language: It has statements for data
definitions, queries, and updates.
✓ The first commercial implementation of SQL was released by Oracle Corporation

Characteristics of SQL
✓ SQL is an ANSI and ISO standard computer language for creating
and manipulating databases.
✓ SQL allows the user to create, update, delete, and retrieve data from a
database.
✓ SQL is very simple and easy to learn.
✓ SQL works with database programs like DB2, Oracle, MS Access,
Sybase, MySQL, SQL Server, etc.
Advantages of SQL:
➢ High Speed: SQL Queries can be used to retrieve large amounts of records
from a database quickly and efficiently.
➢ Well Defined Standards Exist: SQL databases use a long-established standard,
which has been adopted by ANSI & ISO. Non-SQL databases do not adhere
to any clear standard.
➢ No Coding Required: Using standard SQL it is easier to manage database
systems without having to write a substantial amount of code.
➢ Emergence of ORDBMS: Previously, SQL databases were synonymous with
relational databases.
➢ With the emergence of Object-Oriented DBMS, object storage capabilities
are extended to relational databases.

Disadvantages of SQL:
➢ Difficulty in Interfacing: Interfacing an SQL database is more complex than
adding a few lines of code.
➢ More Features Implemented in Proprietary Ways: Although SQL databases
conform to ANSI & ISO standards, some databases add proprietary
extensions to standard SQL to ensure vendor lock-in.
SQL Data types & Literals:
✓ SQL Data Type is an attribute that specifies the type of data of any object.
✓ Each column, variable and expression has a related data type in SQL. You can
use these data types while creating your tables. You can choose a data type for
a table column based on your requirement.
SQL Server offers six categories of data types for your use which are listed
below
 Exact Numeric Data Types
 Approximate Numeric Data Types
 Date and Time Data Types
 Character Strings Data Types
 Unicode Character Strings Data Types
 Binary Data Types
Exact Numeric Data Types

DATA TYPE FROM TO
bigint -9,223,372,036,854,775,808 9,223,372,036,854,775,807
int -2,147,483,648 2,147,483,647
smallint -32,768 32,767
tinyint 0 255
bit 0 1
decimal -10^38 +1 10^38 -1
numeric -10^38 +1 10^38 -1

money -922,337,203,685,477.5808 +922,337,203,685,477.5807

smallmoney -214,748.3648 +214,748.3647


Approximate Numeric Data Types

DATA TYPE FROM TO


float -1.79E + 308 1.79E + 308
real -3.40E + 38 3.40E + 38

Date and Time Data Types

DATA TYPE FROM TO


datetime Jan 1, 1753 Dec 31, 9999
smalldatetime Jan 1, 1900 Jun 6, 2079
date Stores a date like June 30, 1991
time Stores a time of day like 12:30 P.M.

Note − Here, datetime has 3.33 milliseconds accuracy, whereas smalldatetime
has 1 minute accuracy.
Character Strings Data Types

DATA TYPE Description


Char Maximum length of 8,000 characters
Varchar Maximum of 8,000 characters
varchar(max) Maximum length of 2^31 - 1 characters
Text Variable-length non-Unicode data with a
maximum length of 2,147,483,647
characters.

Unicode Character Strings Data Types

DATA TYPE Description


nchar Maximum length of 4,000 characters
nvarchar Maximum length of 4,000 characters

nvarchar(max): Maximum length of 2^30 - 1 characters


ntext: Maximum length of 1,073,741,823 characters.
Binary Data Types
DATA TYPE Description
binary: Maximum length of 8,000 bytes(Fixed-length
binary data )
varbinary Maximum length of 8,000 bytes.(Variable
length binary data)

varbinary(max) Maximum length of 2^31 - 1 bytes (SQL Server 2005 only).
(Variable-length binary data)

SQL Literals
In SQL, a literal is the same as a constant. There are several types of literals
✓ String
✓ integer
✓ Decimal
✓ datetime literals.
String Literals
String literals are always surrounded by single quotes ('). These string literal
examples consist of strings enclosed in single quotes.
Example:
'TechOnTheNet.com'
'This is a literal'
'XYZ'
'123'

Integer Literals
Integer literals can be either positive numbers or negative numbers, but do not
contain decimals. If you do not specify a sign, then a positive number is
assumed. Here are some examples:
536
+536
-536
Decimal Literals
Decimal literals can be either positive numbers or negative numbers and contain
decimals. If you do not specify a sign, then a positive number is assumed.
Example:
24.7
+24.7
-24.7
Datetime Literals
Datetime literals are character representations of datetime values that are enclosed
in single quotes.
Examples:
'April 30, 2015'
'2015/04/30'
'2015/04/30 08:34:25'
Character Literals
Character literals are character representations of Character values that are
enclosed in single quotes.
Example:
'A'
'2'
'%'
Constant in SQL
As the name implies, a constant is a value used in a PL/SQL block that remains
unchanged throughout the program. A constant is a user-defined literal value. You
can declare a constant and use it instead of the actual value.
Example
If you want to write a program which will increase the salary of the employees by
25%, you can declare a constant and use it throughout the program.
Next time when you want to increase the salary again you can change the value of
the constant which will be easier than changing the actual value throughout the
program.
Syntax to declare a constant
constant_name CONSTANT datatype := VALUE;
salary_increase CONSTANT number (3) := 25;
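A minimal PL/SQL sketch using such a constant (the EMPLOYEES table and its SALARY
column are assumed here only for illustration):

DECLARE
    salary_increase CONSTANT NUMBER(3) := 25;   -- percentage raise, fixed for the whole block
BEGIN
    -- apply the same raise everywhere; changing the constant changes the whole program
    UPDATE EMPLOYEES
    SET SALARY = SALARY + (SALARY * salary_increase / 100);
    COMMIT;
END;
/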
SQL Commands:
❖ DDL (Data Definition Language)
❖ DML (Data Manipulation Language)
❖ DQL (Data Query Language)
❖ DCL (Data Control Language)
❖ TCL(Transactional control commands)

DDL (Data Definition Language)


Data Definition Language (DDL) is the part of SQL that allows a database
user to create and restructure database objects, such as the creation or the deletion
of a table. Some of the most fundamental DDL commands include the following:
❑ CREATE TABLE
❑ ALTER TABLE
❑ DROP TABLE
❑ CREATE INDEX
❑ ALTER INDEX
❑ DROP INDEX
❑ CREATE VIEW
❑ DROP VIEW
CREATE TABLE
Creating a basic table involves naming the table and defining its columns and each
column's data type. The SQL CREATE TABLE statement is used to create a new
table.
Basic syntax of CREATE TABLE statement is as follows:
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
.....
columnN datatype,
PRIMARY KEY (one or more columns) );
CREATE TABLE is the keyword telling the database system what you want to do.
In this case, you want to create a new table.
The unique name or identifier for the table follows the CREATE TABLE statement.
Then in brackets comes the list defining each column in the table and what sort of
data type it is.
Example:
CREATE TABLE CUSTOMERS (
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID) );
ALTER TABLE
The ALTER TABLE statement is used to add, delete, or modify columns in an
existing table. The ALTER TABLE statement is also used to add and drop various
constraints on an existing table.

ALTER TABLE - ADD Column: To add a column in a table, use the following
syntax:
Syntax: ALTER TABLE table_name ADD column_name datatype;

Example: The following SQL adds an "Email" column to the "Customers" table:
ALTER TABLE Customers ADD Email varchar(255);
ALTER TABLE - DROP COLUMN:
To delete a column in a table, use the following syntax (notice that some database
systems don't allow deleting a column):
Syntax: ALTER TABLE table_name DROP COLUMN column_name;

Example: The following SQL deletes the "Email" column from the "Customers"
table:
ALTER TABLE Customers DROP COLUMN Email;

ALTER TABLE - ALTER/MODIFY COLUMN:


To change the data type of a column in a table, use the following syntax:

Syntax: ALTER TABLE table_name MODIFY column_name datatype;

Example: ALTER TABLE Customers MODIFY Address varchar(40);


DROP TABLE
This command is used to drop the table along with the data and structure of the
table.
Syntax: DROP TABLE table_name;
Example: DROP TABLE employee;

DML (Data Manipulation Language)


Data Manipulation Language (DML), is the part of SQL used to manipulate data
within objects of a relational database. There are three basic DML commands:
➢ INSERT
➢ UPDATE
➢ DELETE
INSERT INTO:
The SQL INSERT INTO Statement is used to add new rows of data to a table in the
database.
Syntax: There are two basic syntaxes of INSERT INTO statement as follows:
1. INSERT INTO TABLE_NAME (column1, column2, column3...columnN)
VALUES
(value1, value2, value3...valueN);
Example:
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00);
Here, column1, column2...columnN are the names of the columns in the table into
which you want to insert data.

If you are adding values for all the columns of the table, you do not need to
specify the column names; the SQL INSERT INTO syntax would be as follows:

INSERT INTO TABLE_NAME VALUES (value1, value2, value3...valueN);


Example:
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh', 32,'Ahmedabad',2000.00);

UPDATE TABLE

The SQL UPDATE Query is used to modify the existing records in a table. You
can use WHERE clause with UPDATE query to update selected rows otherwise
all the rows would be affected.
Syntax: The basic syntax of UPDATE query with WHERE clause is as follows:
Syntax:
UPDATE table_name
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];
You can combine N number of conditions using AND or OR operators.

Example: update ADDRESS for a customer whose ID is 6:


SQL> UPDATE CUSTOMERS SET ADDRESS = 'Pune’ WHERE ID = 6;

If you want to modify all ADDRESS and SALARY column values in


CUSTOMERS table, you do not need to use WHERE clause and UPDATE query
would be as follows:
SQL> UPDATE CUSTOMERS SET ADDRESS = 'Pune', SALARY = 1000.00;
DELETE
The SQL DELETE Query is used to delete the existing records from a table. You
can use WHERE clause with DELETE query to delete selected rows, otherwise
all the records would be deleted.

Syntax:
DELETE FROM table_name
WHERE [condition];
You can combine N number of conditions using AND or OR operators.

Example: DELETE a customer, whose ID is 6:


SQL> DELETE FROM CUSTOMERS WHERE ID = 6;

If you want to DELETE all the records from CUSTOMERS table, you do not
need to use WHERE clause and
DELETE query would be as follows:

SQL> DELETE FROM CUSTOMERS;


Now, CUSTOMERS table would not have any record.
DQL (Data Query Language)
Though comprised of only one command, Data Query Language (DQL) is the most
concentrated focus of SQL for modern relational database users. The base
command is as follows:
SELECT
SQL SELECT statement is used to fetch the data from a database table which
returns data in the form of result table. These result tables are called result-sets.

Syntax:
SELECT column1, column2, columnN FROM table_name;
If you want to fetch all the fields available in the table, then you can use the
following syntax:
SELECT * FROM table_name;

Example:
SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;
If you want to fetch all the fields of CUSTOMERS table, then use the following
query:
SQL> SELECT * FROM CUSTOMERS;
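A WHERE clause can be added to restrict the rows returned; for example, on the CUSTOMERS table used above:

SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000;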
DCL (Data Control Language)
Data control commands in SQL allow you to control access to data within the
database. These DCL commands are normally used to create objects related to user
access and also control the distribution of privileges among users. Some data
control commands are as follows:
✓ GRANT
✓ REVOKE

GRANT
It is used to grant various privileges on tables to user. These privileges can be any
combination of select, insert, update, delete, alter, or all.

Syntax: GRANT PRIVILEGES ON OBJECTS TO USER;

Here Privileges are select, insert, update, delete, all.


Objects are the names of the database objects on which the privileges are being granted.
User is the name of the user receiving the privileges.

Example: GRANT ALL ON Customers TO Ramesh;


REVOKE
It is used to revoke various privileges on tables from the user.

Syntax: REVOKE PRIVILEGES ON OBJECTS FROM USER;

Here Privileges are select, insert, update, delete, all.


Objects are the names of the database objects from which the privileges are being revoked.
User is the name of the user whose privileges are being revoked.

Example: REVOKE DELETE ON Customers FROM Ramesh;

Transactional Control Commands


In addition to the previously introduced categories of commands, there are
commands that allow the user to manage database transactions.

COMMIT: Saves database transactions


ROLLBACK: Undoes database transactions
SAVEPOINT: Creates points within groups of transactions in which to
ROLLBACK
COMMIT
This command is used to end a transaction. By using this the transaction changes
can be made permanent to the database. This command also removes all save
points in the transaction.

When we use any DML command like INSERT, UPDATE or DELETE, the
changes made by these commands are not permanent, until the current session is
closed, the changes made by these commands can be rolled back.
To avoid that, we use the COMMIT command to mark the changes as permanent.

Syntax: commit work; (or) commit;

SAVEPOINT
Savepoints are used to divide a very lengthy transaction into smaller ones. A savepoint
identifies a point in a transaction to which we can roll back later.

Syntax: savepoint savepoint_name;


ROLLBACK
This command restores the database to the last committed state. It is also used with the
SAVEPOINT command to jump to a savepoint in an ongoing transaction.

If we have used the UPDATE command to make some changes into the database,
and realise that those changes were not required, then we can use the ROLLBACK
command to rollback those changes, if they were not committed using the
COMMIT command.

Syntax: ROLLBACK TO savepoint_name;

Example: Consider a CLASS table that initially contains the following rows:

id name
1 Abhi
2 Adam
4 Alex
INSERT INTO class VALUES(5, 'Rahul');
COMMIT;
UPDATE class SET name = 'Abhijit' WHERE id = '5';
SAVEPOINT A;
INSERT INTO class VALUES(6, 'Chris');
SAVEPOINT B;
INSERT INTO class VALUES(7, 'Bravo');
SAVEPOINT C;
SELECT * FROM class;

In addition to the original rows (1, 2 and 4), the table now contains:
5 Abhijit
6 Chris
7 Bravo
ROLLBACK TO B;
SELECT * FROM class;

5 Abhijit
6 Chris

ROLLBACK TO A;
SELECT * FROM class;

5 Abhijit
SQL OPERATORS
An operator is a reserved word, or a character used primarily in an SQL statement's
WHERE clause to perform operations, such as comparisons and arithmetic
operations. Operators are used to specify conditions in an SQL statement and to
serve as conjunctions for multiple conditions in a statement.
❖ Arithmetic operators
❖ Comparison operators
❖ Logical operators
❖ Operators used to negate conditions
SQL Arithmetic Operators
Assume variable a holds 10 and variable b holds 20
+ (Addition): Adds values on either side of the operator. Example: a + b will give 30.
- (Subtraction): Subtracts the right-hand operand from the left-hand operand. Example: a - b will give -10.
* (Multiplication): Multiplies values on either side of the operator. Example: a * b will give 200.
/ (Division): Divides the left-hand operand by the right-hand operand. Example: b / a will give 2.
% (Modulus): Divides the left-hand operand by the right-hand operand and returns the remainder. Example: b % a will give 0.

SQL Comparison Operators


Assume variable a holds 10 and variable b holds 20

= : Checks if the values of two operands are equal or not; if yes, then the condition becomes true. Example: (a = b) is not true.
!= : Checks if the values of two operands are equal or not; if the values are not equal, then the condition becomes true. Example: (a != b) is true.
<> : Checks if the values of two operands are equal or not; if the values are not equal, then the condition becomes true. Example: (a <> b) is true.
> : Checks if the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true. Example: (a > b) is not true.
< : Checks if the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true. Example: (a < b) is true.
>= : Checks if the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true. Example: (a >= b) is not true.
<= : Checks if the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true. Example: (a <= b) is true.
SQL Logical Operators
Here is a list of all the logical operators available in SQL.

Sr.No. Operator & Description
1 ALL: The ALL operator is used to compare a value to all values in another value set.
2 AND: The AND operator allows the existence of multiple conditions
in an SQL statement's WHERE clause.
3 ANY: The ANY operator is used to compare a value to any applicable
value in the list as per the condition.
4 BETWEEN: The BETWEEN operator is used to search for values that
are within a set of values, given the minimum value and the maximum
value.
5 EXISTS: The EXISTS operator is used to search for the presence of a
row in a specified table that meets a certain criterion.
6 IN: The IN operator is used to compare a value to a list of literal values
that have been specified.
7 LIKE: The LIKE operator is used to compare a value to similar values
using wildcard operators.
8 NOT: The NOT operator reverses the meaning of the logical operator
with which it is used. Eg: NOT EXISTS, NOT BETWEEN, NOT IN, etc.
This is a negate operator.
9 OR: The OR operator is used to combine multiple conditions in an
SQL statement’s WHERE clause
10 IS NULL: The IS NULL operator is used to compare a value with a NULL value.
11 UNIQUE: The UNIQUE operator searches every row of a specified
table for uniqueness (no duplicates).

SQL Aggregate Functions:


An aggregate function allows you to perform a calculation on a set of values to
return a single scalar value.

We often use aggregate functions with the GROUP BY and HAVING clauses of
the SELECT statement. The following are the most used SQL aggregate functions:

✓ AVG – calculates the average of a set of values.


✓ COUNT – counts rows in a specified table or view.
✓ MIN – gets the minimum value in a set of values.
✓ MAX – gets the maximum value in a set of values.
✓ SUM – calculates the sum of values.

Syntax: aggregate_function(DISTINCT | ALL expression)
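For example, using the CUSTOMERS table from the earlier examples (the exact values returned depend on the data present):

SQL> SELECT COUNT(*), AVG(SALARY), MIN(SALARY), MAX(SALARY), SUM(SALARY) FROM CUSTOMERS;
SQL> SELECT ADDRESS, COUNT(*) FROM CUSTOMERS GROUP BY ADDRESS HAVING COUNT(*) > 1;

The second query groups customers by ADDRESS and keeps only those addresses that occur more than once.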


INDEXES:
Indexes are special lookup tables that the database search engine can use to speed
up data retrieval. Simply an index is a pointer to data in a table. An index in a
database is very similar to an index in the back of a book.

For example, if you want to reference all pages in a book that discuss a certain
topic, you first refer to the index, which lists all the topics alphabetically, and you are
then referred to one or more specific page numbers.

An index helps to speed up SELECT queries and WHERE clauses, but it slows
down data input, with the UPDATE and the INSERT statements. Indexes can be
created or dropped with no effect on the data.

Creating an index involves the CREATE INDEX statement, which allows you to
name the index, to specify the table and which column or columns to index, and
to indicate whether the index is in an ascending or descending order.

Indexes can also be unique, like the UNIQUE constraint, in that the index
prevents duplicate entries in the column or combination of columns on which
there is an index.
Syntax
CREATE INDEX index_name ON table_name (column_name);

Single-Column Indexes
A single-column index is created based on only one table column. The basic
syntax is as follows.
CREATE INDEX index_name
ON table_name (column_name);

Unique Indexes
Unique indexes are used not only for performance, but also for data integrity. A
unique index does not allow any duplicate values to be inserted into the table. The
basic syntax is as follows.
CREATE UNIQUE INDEX index_name
on table_name (column_name);

Composite Indexes
A composite index is an index on two or more columns of a table. Its basic syntax
is as follows.
CREATE INDEX index_name
on table_name (column1, column2);

Whether to create a single-column index or a composite index, take into


consideration the column(s) that you may use very frequently in a query's WHERE
clause as filter conditions.
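For instance, on the CUSTOMERS table used earlier, a single-column index and a composite index could be created as follows (the index names idx_cust_name and idx_cust_addr_sal are just illustrative choices):

CREATE INDEX idx_cust_name ON CUSTOMERS (NAME);
CREATE INDEX idx_cust_addr_sal ON CUSTOMERS (ADDRESS, SALARY);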
Implicit Indexes
Implicit indexes are indexes that are automatically created by the database server
when an object is created. Indexes are automatically created for primary key
constraints and unique constraints.

DROP INDEX Command


An index can be dropped using SQL DROP command. Care should be taken when
dropping an index because the performance may either slow down or improve.
The basic syntax is as follows
DROP INDEX index_name;
VIEWS
A view is nothing more than a SQL statement that is stored in the database with
an associated name. A view is actually a composition of a table in the form of a
predefined SQL query.

A view can contain all rows of a table or select rows from a table. A view can be
created from one or many tables which depends on the written SQL query to
create a view.

Database views are created using the CREATE VIEW statement. Views can be
created from a single table, multiple tables or another view.

To create a view, a user must have the appropriate system privilege according to
the specific implementation.

The basic CREATE VIEW syntax is as follows


CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in a similar way as you
use them in a normal SQL SELECT query. Following is an example to create a
view from the CUSTOMERS table.

This view would be used to have customer name and age from the CUSTOMERS
table.
CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;
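Once created, the view can be queried just like a table; for example:

SQL> SELECT * FROM CUSTOMERS_VIEW WHERE age > 25;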

Updating Views: To change the definition of an existing view, we use the CREATE OR REPLACE VIEW
command.
Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2......
FROM table_name
WHERE [condition];
Example:
CREATE OR REPLACE VIEW CUSTOMERS_VIEW1 AS
SELECT name, age, SALARY
FROM CUSTOMERS;
Dropping a VIEW: This command is used to drop a view.
Syntax: DROP VIEW View_name;
Example: DROP VIEW CUSTOMERS_VIEW1;

NESTED QUERIES
A Subquery or Inner query or a Nested query is a query within another SQL query
and embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition
to further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE
statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.

There are a few rules that subqueries must follow −


✓ Subqueries must be enclosed within parentheses.
✓ A subquery can have only one column in the SELECT clause, unless multiple
columns are in the main query for the subquery to compare its selected columns.
✓ An ORDER BY command cannot be used in a subquery, although the main
query can use an ORDER BY.
✓ The GROUP BY command can be used to perform the same function as the
ORDER BY in a subquery.
✓ Subqueries that return more than one row can only be used with multiple value
operators such as the IN operator.
✓ A subquery cannot be immediately enclosed in a set function.
Subqueries with the SELECT Statement
Subqueries are most frequently used with the SELECT statement.
SELECT column_name
FROM table_name
WHERE column_name OPERATOR
( SELECT column_name from table_name WHERE ... );

ID NAME AGE ADDRESS SALARY

1 John 20 US 2000.00
2 Stephan 26 Dubai 1500.00
3 David 27 Bangkok 2000.00
4 Alina 29 UK 6500.00
5 Kathrin 34 Bangalore 8500.00
6 Harry 42 China 4500.00
7 Jackson 25 Mizoram 10000.00
Example
SELECT *
FROM EMPLOYEE
WHERE ID IN (SELECT ID
FROM EMPLOYEE
WHERE SALARY > 4500);

ID NAME AGE ADDRESS SALARY

4 Alina 29 UK 6500.00
5 Kathrin 34 Bangalore 8500.00
7 Jackson 25 Mizoram 10000.00

Subqueries with the INSERT Statement

SQL subquery can also be used with the Insert statement. In the insert statement,
data returned from the subquery is used to insert into another table.

In the subquery, the selected data can be modified with any of the character, date or
number functions.
INSERT INTO table_name (column1, column2, column3....)
SELECT * FROM table_name WHERE VALUE OPERATOR

Consider a table EMPLOYEE_BKP with a structure similar to EMPLOYEE.

We can use the following syntax to copy the complete EMPLOYEE table into the
EMPLOYEE_BKP table.
INSERT INTO EMPLOYEE_BKP SELECT * FROM EMPLOYEE
WHERE ID IN (SELECT ID FROM EMPLOYEE);
Select * from EMPLOYEE_BKP ;
Subqueries with the UPDATE Statement
The subquery of SQL can be used in conjunction with the Update statement. When a
subquery is used with the Update statement, then either single or multiple columns in a
table can be updated.
Syntax:

UPDATE table SET column_name = new_value WHERE VALUE OPERATOR


(SELECT COLUMN_NAME FROM TABLE_NAME WHERE condition);

Example

Let's assume we have an EMPLOYEE_BKP table available which is backup of


EMPLOYEE table. The given example multiplies the SALARY by 0.25 in the
EMPLOYEE table for all employees whose AGE is greater than or equal to 29.

UPDATE EMPLOYEE
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM EMPLOYEE_BKP
WHERE AGE >= 29);
ID NAME AGE ADDRESS SALARY

1 John 20 US 2000.00
2 Stephan 26 Dubai 1500.00
3 David 27 Bangkok 2000.00
4 Alina 29 UK 1625.00
5 Kathrin 34 Bangalore 2125.00
6 Harry 42 China 1125.00
7 Jackson 25 Mizoram 10000.00

Subqueries with the DELETE Statement

The subquery of SQL can be used in conjunction with the Delete statement just
like any other statements mentioned above.

Syntax

DELETE FROM TABLE_NAME WHERE VALUE OPERATOR


(SELECT COLUMN_NAME FROM TABLE_NAME WHERE condition);
Example
Let's assume we have an EMPLOYEE_BKP table available which is backup of
EMPLOYEE table.
The given example deletes the records from the EMPLOYEE table for all
EMPLOYEE whose AGE is greater than or equal to 29.

DELETE FROM EMPLOYEE
WHERE AGE IN (SELECT AGE FROM EMPLOYEE_BKP WHERE AGE >= 29);

ID NAME AGE ADDRESS SALARY

1 John 20 US 2000.00
2 Stephan 26 Dubai 1500.00
3 David 27 Bangkok 2000.00
7 Jackson 25 Mizoram 10000.00
SET Operations in SQL
SQL supports few Set operations which can be performed on the table data. These
are used to get meaningful results from data stored in the table, under different
special conditions. Some of the Set operations are
1. UNION
2. UNION ALL
3. INTERSECT
4. MINUS
UNION
UNION is used to combine the results of two or more SELECT statements.
However, it will eliminate duplicate rows from its resultset. In case of union,
number of columns and datatype must be same in both the tables, on which
UNION operation is being applied.

Syntax

SELECT column_name FROM table1


UNION
SELECT column_name FROM table2;
The First table

ID NAME

1 Jack
2 Harry
3 Jackson

The Second table

ID NAME

3 Jackson
4 Stephan
5 David

Example

SELECT * FROM First


UNION
SELECT * FROM Second;
ID NAME

1 Jack
2 Harry
3 Jackson
4 Stephan
5 David

Union All
The UNION ALL operation is similar to UNION, but it returns the result set without
removing duplicates and without sorting the data.
Syntax:
SELECT column_name FROM table1
UNION ALL
SELECT column_name FROM table2;
Example:
SELECT * FROM First
UNION ALL
SELECT * FROM Second;
ID NAME

1 Jack
2 Harry
3 Jackson
3 Jackson
4 Stephan
5 David

Intersect
It is used to combine two SELECT statements. The Intersect operation returns the
common rows from both the SELECT statements.

In the Intersect operation, the number of columns and their datatypes must be the same.
It has no duplicates and it arranges the data in ascending order by default.

Syntax
SELECT column_name FROM table1
INTERSECT
SELECT column_name FROM table2;
Example:
SELECT * FROM First
INTERSECT
SELECT * FROM Second;

ID NAME

3 Jackson

Minus
It combines the result of two SELECT statements. Minus operator is used to
display the rows which are present in the first query but absent in the second
query.
It has no duplicates and data arranged in ascending order by default.
Syntax:
SELECT column_name FROM table1
MINUS
SELECT column_name FROM table2;
Example
SELECT * FROM First
MINUS
SELECT * FROM Second;

ID NAME

1 Jack
2 Harry
CURSORS
Oracle creates a memory area, known as the context area, for processing an SQL
statement, which contains all the information needed for processing the statement;
for example, the number of rows processed, etc.
A cursor is a pointer to this context area. PL/SQL controls the context area
through a cursor. A cursor holds the rows (one or more) returned by a SQL
statement. The set of rows the cursor holds is referred to as the active set.
You can name a cursor so that it could be referred to in a program to fetch and
process the rows returned by the SQL statement, one at a time. There are two
types of cursors
❖ Implicit cursors
❖ Explicit cursors

Implicit Cursors

Implicit cursors are automatically created by Oracle whenever an SQL statement


is executed, when there is no explicit cursor for the statement.
Programmers cannot control the implicit cursors and the information in it.
Whenever a DML statement (INSERT, UPDATE and DELETE) is issued, an
implicit cursor is associated with this statement.
For INSERT operations, the cursor holds the data that needs to be inserted. For
UPDATE and DELETE operations, the cursor identifies the rows that would be
affected.
In PL/SQL, you can refer to the most recent implicit cursor as the SQL cursor,
which always has attributes such as %FOUND, %ISOPEN, %NOTFOUND, and
%ROWCOUNT.
The SQL cursor has additional attributes, %BULK_ROWCOUNT and
%BULK_EXCEPTIONS, designed for use with the FORALL statement. The
following table provides the description of the most used attributes

S.No Attribute & Description


1 %FOUND: Returns TRUE if an INSERT, UPDATE, or
DELETE statement affected one or more rows or a SELECT
INTO statement returned one or more rows. Otherwise, it
returns FALSE
2 %NOTFOUND: The logical opposite of %FOUND. It returns
TRUE if an INSERT, UPDATE, or DELETE statement affected
no rows, or a SELECT INTO statement returned no rows.
Otherwise, it returns FALSE.
S.No Attribute & Description
3 %ISOPEN: Always returns FALSE for implicit cursors, because
Oracle closes the SQL cursor automatically after executing its
associated SQL statement.
4 %ROWCOUNT: Returns the number of rows affected by an
INSERT, UPDATE, or DELETE statement, or returned by a SELECT
INTO statement.
Any SQL cursor attribute will be accessed as sql%attribute_name as shown below
in the example.
The following program will update the table and increase the salary of each
customer by 500 and use the SQL%ROWCOUNT attribute to determine the
number of rows affected
ID NAME AGE ADDRESS SALARY
1 Ramesh 32 Ahmedabad 2000.00

2 Khilan 25 Delhi 1500.00


3 kaushik 23 Kota 2000.00
4 Chaitali 25 Mumbai 6500.00
5 Hardik 27 Bhopal 8500.00
6 Komal 22 MP 4500.0
DECLARE
total_rows number(2);
BEGIN
UPDATE customers
SET salary = salary + 500;
IF sql%notfound THEN
dbms_output.put_line('no customers selected');
ELSIF sql%found THEN
total_rows := sql%rowcount;
dbms_output.put_line( total_rows || ' customers selected ');
END IF;
END;
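When this block is run against the six-row CUSTOMERS table shown above, all six rows are updated, SQL%ROWCOUNT evaluates to 6, and the block prints: 6 customers selected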

Explicit Cursors:
Explicit cursors are programmer-defined cursors for gaining more control over the
context area. An explicit cursor should be defined in the declaration section of the
PL/SQL Block. It is created on a SELECT Statement which returns more than one
row.
Syntax:
CURSOR cursor_name IS select_statement;
Working with an explicit cursor includes the following steps
✓ Declaring the cursor for initializing the memory
✓ Opening the cursor for allocating the memory
✓ Fetching the cursor for retrieving the data
✓ Closing the cursor to release the allocated memory
Declaring the Cursor
Declaring the cursor defines the cursor with a name and the associated SELECT
statement.
CURSOR c_customers IS
SELECT id, name, address FROM customers;
Opening the Cursor
Opening the cursor allocates the memory for the cursor and makes it ready for
fetching the rows returned by the SQL statement into it. For example, we will
open the above defined cursor as follows −
OPEN c_customers;
Fetching the Cursor

Fetching the cursor involves accessing one row at a time. For example, we will
fetch rows from the above opened cursor as follows
FETCH c_customers INTO c_id, c_name, c_addr;

Closing the Cursor


Closing the cursor means releasing the allocated memory. For example, we will
close the above-opened cursor as follows
CLOSE c_customers;

ID NAME AGE ADDRESS SALARY


1 Ramesh 32 Ahmedabad 2000.00

2 Khilan 25 Delhi 1500.00


3 kaushik 23 Kota 2000.00
4 Chaitali 25 Mumbai 6500.00
5 Hardik 27 Bhopal 8500.00
6 Komal 22 MP 4500.0
DECLARE
c_id customers.id%type;
c_name customers.name%type;
c_addr customers.address%type;
CURSOR c_customers is
SELECT id, name, address FROM customers;
BEGIN
OPEN c_customers;
LOOP
FETCH c_customers into c_id, c_name, c_addr;
EXIT WHEN c_customers%notfound;
dbms_output.put_line(c_id || ' ' || c_name || ' ' || c_addr);
END LOOP;
CLOSE c_customers;
END;
1 Ramesh Ahmedabad
2 Khilan Delhi
3 kaushik Kota
4 Chaitali Mumbai
5 Hardik Bhopal
6 Komal MP
PL/SQL procedure successfully completed.
Trigger
A trigger is a stored procedure in database which automatically invokes
whenever a special event in the database occurs.
Triggers are written to be executed in response to any DML statement
or DDL statement.

CREATE [OR REPLACE ] TRIGGER trigger_name


{BEFORE | AFTER | INSTEAD OF }
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[FOR EACH ROW]
WHEN (condition)
DECLARE
Declaration-statements
BEGIN
Executable-statements
EXCEPTION
Exception-handling-statements
END;
create or replace trigger trigger9
before insert or update or delete on department
declare
   v_day varchar2(20);
begin
   -- TO_CHAR(SYSDATE,'DAY') returns the day name padded with spaces, so TRIM it
   v_day := trim(to_char(sysdate, 'DAY'));
   -- restrict DML on Tuesdays and Thursdays (OR is used, since a single day cannot equal both)
   if (v_day = 'TUESDAY' or v_day = 'THURSDAY') then
      if inserting then
         raise_application_error(-20002, 'inserting restricted');
      elsif updating then
         raise_application_error(-20003, 'updating restricted');
      elsif deleting then
         raise_application_error(-20004, 'deleting restricted');
      end if;
   end if;
end;
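With this trigger in place, an INSERT on the department table attempted on a Tuesday or Thursday fails with error -20002 ('inserting restricted'), while an UPDATE or DELETE on those days fails with -20003 or -20004 respectively; on other days the DML statements proceed normally.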

UNIT III
Normalization
The following measures determine the quality of a relation schema design:
✓ Making sure that the semantics of the attributes is clear in the schema
✓ Reducing the redundant information in tuples
✓ Reducing the NULL values in tuples
✓ Disallowing the possibility of generating spurious tuples.

Functional Dependency
A functional dependency, denoted by X → Y, between two sets of attributes X and Y
that are subsets of R specifies a constraint on the possible tuples that can form a
relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have
t1[X] = t2[X], they must also have t1[Y] = t2[Y].

Functional dependency (FD) is a set of constraints between two attributes in a


relation.
This means that the values of the Y component of a tuple in r depend on, or are
determined by, the values of the X component; alternatively, the values of the X
component of a tuple uniquely (or functionally) determine the values of the Y
component.
We also say that there is a functional dependency from X to Y, or that Y is
functionally dependent on X. The abbreviation for functional dependency is FD
or f.d. The set of attributes X is called the Determinant (left-hand side) of the FD,
and Y is called the dependent (right-hand side).

Ex: Zip_code → Area_code

Consider relation schema R = (A, B, C, G, H, I) and the set of functional


dependencies. We can represent functional dependencies as
A →B
A →C
CG→ H
CG→ I
B→H

Let F be a set of functional dependencies. The closure of F, denoted by F+, is the


set of all functional dependencies logically implied by F. Given F, we can
compute F+ directly from the formal definition of functional dependency.
Armstrong's Axioms
Axioms, or rules of inference, provide a simpler technique for reasoning about
functional dependencies.

Reflexivity rule
If α is a set of attributes and β ⊆ α, then α →β holds.
Augmentation rule
If α → β holds and γ is a set of attributes, then γα → γβ holds.
Transitivity rule
If α →β holds and β → γ holds, then α → γ holds.
Union rule
If α → β holds and α → γ holds, then α →βγ holds.
Decomposition rule
If α →βγ holds, then α → β holds and α →γ holds.
Pseudo transitivity rule
If α→β holds and γβ →δ holds, then αγ →δ holds.
Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the
set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}.

We can list several members of F+ here:


A → H. Since A → B and B → H hold, we apply the transitivity rule. Observe that
it is much easier to use Armstrong's axioms to show that A → H holds
than it is to argue directly from the definition of functional dependency.
CG → HI . Since CG → H and CG → I , the union rule implies that CG → HI .
AG → I. Since A → C and CG → I, the pseudo transitivity rule implies that
AG → I holds.

Types of Functional Dependencies


❖ Trivial functional dependency
❖ non-trivial functional dependency
❖ Multivalued dependency
❖ Transitive dependency
Trivial functional dependency
The dependency of an attribute on a set of attributes is known as trivial functional
dependency if the set of determining attributes includes dependent attribute.
Symbolically:
A ->B is trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A->A & B->B

For example:
Consider a table with two columns Student_id and Student_Name.
{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency as
Student_Id is a subset of {Student_Id, Student_Name}.

That makes sense because if we know the values of Student_Id and


Student_Name then the value of Student_Id can be uniquely determined.

Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial
dependencies too.
Non-trivial functional dependency
If a functional dependency X->Y holds true where Y is not a subset of X then this
dependency is called non -trivial Functional dependency.
For example:
An employee table with three attributes: emp_id, emp_name, emp_address. The
following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)

On the other hand, the following dependencies are trivial:


{emp_id, emp_name} -> emp_name

Multivalued dependency:
MVD or multivalued dependency means that for a single value of attribute ‘a’
multiple values of attribute ‘b’ exist. Multivalued dependency occurs when there
are more than one independent multivalued attributes in a table.
For example:
Consider a bike manufacturing company, which produces two colors (Black and
Red) in each model every year.
bike_model manuf_year color
M1001 2007 Black
M1001 2007 Red
M2012 2008 Black
M2012 2008 Red
M2222 2009 Black
M2222 2009 Red

Here columns manuf_year and color are independent of each other and
dependent on bike_model. In this case these two columns are said to be
multivalued dependent on bike_model. These dependencies can be represented
like this:
bike_model ->-> manuf_year
bike_model ->-> color
Transitive dependency:
A functional dependency is said to be transitive if it is indirectly formed by two
functional dependencies.
X -> Z is a transitive dependency if the following three functional dependencies
hold true:
 X->Y
 Y does not ->X
 Y->Z
Example: Let’s take an example to understand it better:
Book Author Author_age
Game of Thrones George R. R. Martin 66
Harry Potter J. K. Rowling 49
Dying of the Light George R. R. Martin 66

{Book} -> {Author} (if we know the book, we know the author's name)
{Author} does not -> {Book}
{Author} -> {Author_age}

Therefore, as per the rule of transitive dependency: {Book} -> {Author_age}


should hold, that makes sense because if we know the book name we can know the
author’s age.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized.
These are Insertion, update and deletion anomaly.

Update anomaly:
Consider an employee table (shown as a figure in the original slides) in which employee
Rick appears in two rows because he belongs to two departments of the company. If we
want to update the address of Rick, then we have to update the same value in both rows,
or the data will become inconsistent. If somehow the correct address gets updated in one
department but not in the other, then as per the database Rick would have two different
addresses, which is not correct and would lead to inconsistent data.
Insert anomaly:
Suppose a new employee joins the company, who is under training and currently
not assigned to any department then we would not be able to insert the data into
the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the
department D890 then deleting the rows that are having emp_dept as D890 would
also delete the information of employee Maggie since she is assigned only to this
department.
To overcome these anomalies, we need to normalize the data.

Normalization
Normalization is a process of organizing the data in database for avoiding data
redundancy which solves the problem of insertion anomaly, update anomaly &
deletion anomaly.

✓ First normal form(1NF)


✓ Second normal form(2NF)
✓ Third normal form(3NF)
✓ Boyce & Codd normal form (BCNF)
First Normal form (1NF)
It states that the domain of an attribute must include only atomic (simple,
indivisible) values and the value of any attribute in a tuple must be a
single value from the domain of that attribute.
As per the rule of first normal form, an attribute (column) of a table cannot hold
multiple values. The only attribute values permitted by 1NF are single atomic (or
indivisible) values.

(The slide figures showing a sample table before and after applying 1NF are not reproduced here.)
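As an illustration (this sample table is made up for explanation and is not the one from the slides), a STUDENT relation that stores two phone numbers in a single column violates 1NF:

roll_no name phone
1 Ravi 9812345678, 9934567890
2 Priya 9845612300

After applying 1NF, each phone number is stored as a separate atomic value:

roll_no name phone
1 Ravi 9812345678
1 Ravi 9934567890
2 Priya 9845612300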


Second Normal Form (2NF)
Second normal form (2NF) is based on the concept of full functional dependency.

A functional dependency X → Y is a full functional dependency if removal of


any attribute value A from X means that the dependency does not hold any more;
that is, for any attribute value A ε X, (X – {A}) does not functionally determine Y.

A functional dependency X→Y is a partial dependency if some attribute value A ε


X can be removed from X and the dependency still holds; that is, for some A ε X,
(X – {A}) → Y.
Partial Dependency
If the proper subset of candidate key determines non-prime attribute, it is called
partial dependency.

Prime attribute: An attribute, which is a part of the candidate-key, is known as a


prime attribute.

Non-prime attribute: An attribute, which is not a part of the candidate-key, is


said to be a non-prime attribute.
Definition
A relation schema R is said to be in 2NF
If it is in 1NF and if every nonprime attribute of R is fully functionally dependent
on the primary key of R. (there should be no partial dependencies).
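For instance, consider a TEACHER relation of the following shape (the data shown here is illustrative; the original slide table is not reproduced):

teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40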

Candidate Keys: {teacher_id, subject}


Primary key: {teacher_id, subject}
Non-prime attribute: teacher_age

The table is in 1 NF because each attribute has atomic values. However, it is not in
2NF because non-prime attribute teacher_age is dependent on teacher_id alone
which is a proper subset of candidate key. i.e., teacher_id → teacher_age.
This violates the rule for 2NF as the rule says “no non-prime attribute is dependent
on the proper subset of any candidate key of the table”.
Third Normal Form (3NF)
According to Codd’s original definition, a relation schema R is in 3NF if it
satisfies 2NF and no nonprime attribute of R is transitively dependent on the
primary key (it should not have transitive dependencies).

A table design is said to be in 3NF if both the following conditions hold:


✓ Table must be in 2NF
✓ Transitive functional dependency of non-prime attribute on any super key
should be removed.

A relation is in 3NF if at least one of the following conditions holds for every non-
trivial functional dependency X -> Y:
✓X is a super key.
✓Y is a prime attribute (each element of Y is part of some candidate key)
emp_id emp_name emp_zip emp_state emp_city emp_district

1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan

Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip
is dependent on emp_id, which makes the non-prime attributes (emp_state, emp_city &
emp_district) transitively dependent on the super key (emp_id). This violates the rule
of 3NF.

To make this table comply with 3NF, we have to break it into two tables to
remove the transitive dependency:
Employee table
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999

Employee_zip

emp_zip emp_state emp_city emp_district

282005 UP Agra Dayal Bagh

222008 TN Chennai M-City

282007 TN Chennai Urrapakkam

292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan


Boyce Codd normal form (BCNF)
It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is
stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every
functional dependency X->Y, X is a super key of the table.

EMP_ID EMP_COUNTRY EMP_DEPT_NO DEPT_TYPE EMP_DEPT

264 India 283 D394 Designing
264 India 300 D394 Testing
364 UK 232 D283 Stores
364 UK 549 D283 Developing

Functional dependencies in the table above:


emp_id -> emp_Country,
emp_dept_no -> {dept_type, emp_dept)
Candidate key: {emp_id, emp_dept_no}
The table is not in BCNF as neither emp_id nor emp_dept_no alone are keys. To
make the table comply with BCNF we can break the table in three tables like this:

EMP_ID EMP_COUNTRY

264 India
364 UK

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283


Testing D394 300
Stores D283 232
Developing D283 549
EMP_ID EMP_DEPT_NO

264 283
264 300
364 232
364 549

Closure of Functional Dependencies


The set of all possible functional dependencies that can be derived from F is called
the closure of F, denoted by F+. To compute the closure, we repeatedly apply the
Armstrong axioms to the given FDs.
Consider a relation R (A, B, C, D) and F= {A→BC, B→CD}.
Let us consider A→BC. From this functional dependency we can derive A→B and
A→C by using decomposition rule.

From B→CD we can derive B→C and B→D by using decomposition rule.
Let us consider A→B and B→C. From these two functional dependencies we
can derive A→C by applying Transitivity rule.
From A→B and B→D we can derive A→D.
So, the final F+ = {A→BC, B→CD, A→B, A→C, B→C, B→D, A→D}.

Attribute Closure
Let X→Y be the Functional Dependency. Attribute closure of X denoted
with X+ with respect to F is the set of all attributes A such that X→A can be
inferred using Armstrong Axioms.

Algorithm
Closure= X;
Repeat until there is no change in closure
{
If there is an FD U→V in F
Such that U ⊆ Closure,
Then set Closure= Closure ∪ V.
}
Example: For R(A, B, C, D) with F = {A→BC, B→CD}, the attribute closures can be computed as shown below.
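A+ : Start with Closure = {A}. The FD A→BC has its left-hand side contained in the closure, so B and C are added, giving {A, B, C}. Now B is in the closure, so B→CD adds D, giving {A, B, C, D}. Hence A+ = ABCD, and A determines every attribute of R.
B+ : Start with {B}. B→CD adds C and D, so B+ = BCD.
C+ = C and D+ = D, since no FD in F has only C or only D on its left-hand side.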
Super Key
Let R be the relational schema, and X be the set of attributes over R. If X+
determine all the attributes of R, then X is said to be super key of R.

To Identify Super keys, we need to follow some steps


❖ Compute Closure for the attributes or combination of attributes on the LHS
of Functional Dependency.
❖ If any closure includes all the attributes, then that can be declared as a key
for the table.

Let R (ABCDE) is a relational schema with following functional dependencies.


AB → C
DE → B
CD → E
Step 1: Identify the closure of LHS of FD -
(AB)+ = ABC
(DE)+ = DEB
(CD)+ = CDE = CDEB {as DE → B}
No Super Key Found in step 1.
Step 2:
If no super key is found in step 1, then follow step 2 to find a new key by applying the augmentation rule.
Apply the augmentation rule until all the attributes are covered by the closure. So, we choose (CD)+ as it contains more attributes than the others, i.e. CDEB:
(ACD)+ = ABCDE {by the augmentation rule}
Hence (ACD)+ determines all the attributes of R. So
ACD is a Super Key.

As ACD is a super key, we combine the remaining attributes with ACD (every superset of a
super key is also a super key). So, the super keys obtained are -
ACDB,
ACDE,
ACDBE.
Step 3:
Follow step 3 to identify more super keys from the new super key (ACD) by applying the pseudo-transitivity rule: check the other functional dependencies whose right-hand side contains an attribute of the new super key; replacing that attribute with the left-hand side of the dependency gives another super key.
There is only one such dependency, i.e. AB → C.
Applying it gives keys that are certainly super keys, but not necessarily irreducible ones:
A (AB) D = ABD: Super key
Other Super Keys will be
ADBE,
ABDC, (Already Found)
ABDCE. (Already Found)
Repeat the procedure again for the new superkey (ABD) till we get all superkey.
Step 3 Continued... (To find Superkey from ABD)
For ABD, We have a functional dependency again i.e. {DE → B}. So,
A (DE) D = ADE: Superkey
Other Super keys will be -
ADEB, (Already Found)
ADEC, (Already Found)
ADEBC. (Already Found)
After finding all the super keys, we need to find the candidate keys. A candidate key is a
super key no proper subset of which is itself a super key.

Problem 1
Given a relation R( A, B, C, D) and Functional Dependency set FD = { AB → CD,
B → C }, determine whether the given R is in 2NF? If not convert it into 2 NF.
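A sketch of the solution to Problem 1: (AB)+ = ABCD while A+ = A and B+ = BC, so AB is the only candidate key and C, D are non-prime attributes. The dependency B → C makes the non-prime attribute C depend on the proper subset B of the candidate key, i.e. a partial dependency, so R is not in 2NF. Decomposing R into R1(A, B, D) with AB → D and R2(B, C) with B → C removes the partial dependency, and both resulting relations are in 2NF.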
Problem 2
Given a relation R( P, Q, R, S, T) and Functional Dependency set FD = { PQ → R,
S → T }, determine whether the given R is in 2NF? If not convert it into 2 NF.
Problem 3
Given a relation R( P, Q, R, S, T, U, V, W, X, Y) and Functional Dependency set FD
= { PQ → R, PS → VW, QS → TU, P → X, W → Y }, determine whether the given
R is in 2NF? If not convert it into 2 NF.
Data Redundancy
Data redundancy in database means that some data fields are repeated in the
database. This data repetition may occur either if a field is repeated in two or more
tables or if the field is repeated within the table.
Disadvantages of data redundancy
1. Increases the size of the database unnecessarily.
2. Causes data inconsistency.
3. Decreases the efficiency of the database.
4. May cause data corruption.

Decomposition:
Decomposition is the process of dividing a relation into smaller relations in order to remove
anomalies. Decomposition makes it easy to find the data in the database and removes
inconsistency and duplication. Decomposition means replacing a relation with a
collection of smaller relations.
Properties of Decomposition: While decomposing a relation R into R1, R2… Rn.
The decomposition must satisfy the following properties. Those are

❖ Lossless Join Decomposition.


❖ Dependency Preserving Decomposition.
Lossless Join Decomposition
Assume that a relation R with set of functional dependencies F. If R is
decomposed into relations R1 and R2, then this decomposition is said to be
lossless decomposition (or lossless-join decomposition) if and only if at least one
of the following functional dependencies holds in the closure of set of functional
dependencies F+ ;
R1 ∩ R2 → R1
or
R1 ∩ R2 → R2
R1 ∩ R2 gives you the attribute or set of attributes that is/are used in joining R1
and R2. The above functional dependencies ensure that the attributes involved in
the natural join of R1 and R2 are candidate keys for at least one of the relations
R1 and R2.
Consider the following relation R( A , B , C )-

R( A , B , C )
A B C

1 2 1
2 5 3
3 3 3
Consider that this relation is decomposed into two sub-relations R1( A , B ) and R2( B , C )-

The two sub relations are-

R1( A , B ) R2( B , C )

A B B C
1 2 2 1
2 5 5 3
3 3 3 3

Here R1 ∩ R2 = {B}, and joining R1 and R2 on B gives back exactly the original tuples of R
(no spurious tuples are generated), so this decomposition illustrates a lossless-join
decomposition.

Dependency Preserving Decomposition
The dependency preservation decomposition is another property of decomposed
relational database schema D in which each functional dependency X -> Y
specified in F either appeared directly in one of the relation schemas Ri in the
decomposed D or could be inferred from the dependencies that appear in some Ri.

The dependencies are preserved because each dependency in F represents a


constraint on the database. If decomposition is not dependency-preserving, some
dependency is lost in the decomposition.
Example:
Let a relation R(A,B,C,D) and set a FDs F = { A -> B , A -> C , C -> D} are given.
A relation R is decomposed into
R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and
R2 = (C, D) with FDs F2 = {C -> D}.
F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}
so, F' = F.
And so, F'+ = F+
Algorithm
Input: X → Y in F and a decomposition of R {R1, R2, …, Rn}
Output: return true if X → Y is in G+, i.e., Y is a subset of Z else return
false
Begin
Z: = X;
while changes to Z occur do
For i: = 1 to n do
Z := Z ∪ ((Z ∩ Ri)+ ∩ Ri) w.r.t. F;
If Y is a subset of Z then return true
Else return false;
End;
Example
R (ABCDEF) has following FD’s F = {A→BCD, A→EF, BC→AD,
BC→E, BC→F, B→F, D→E} D = {ABCD, BF, DE} check whether
decomposition is dependency preserving or not.

Check whether BC → E preserved in the decomposition or not
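Working this out with the algorithm: start with Z = BC.
Pass over R1 = ABCD: Z ∩ R1 = BC, (BC)+ with respect to F = ABCDEF, and ABCDEF ∩ R1 = ABCD, so Z becomes ABCD.
Pass over R2 = BF: Z ∩ R2 = B, (B)+ = BF, and BF ∩ R2 = BF, so Z becomes ABCDF.
Pass over R3 = DE: Z ∩ R3 = D, (D)+ = DE, and DE ∩ R3 = DE, so Z becomes ABCDEF.
Since E is contained in Z, the dependency BC → E is preserved in this decomposition.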


How to find the highest normal form of a relation
Steps to find the highest normal form of a relation:
1. Find all possible candidate keys of the relation.
2. Divide all attributes into two categories: prime attributes and non-prime
attributes.
3. Checkfor 1st normal form then 2nd and so on. If it fails to satisfy nth normal
form condition, highest normal form will be n-1.
Example
Find the highest normal form of a relation R(A,B,C,D,E) with FD set as {BC->D,
AC->BE, B->E}
Step 1.
As we can see, (AC)+ = {A, C,B,E,D} but none of its subset can determine all
attribute of relation, So AC will be candidate key. A or C can’t be derived from
any other attribute of the relation, so there will be only 1 candidate key {AC}.

Step 2.
Prime attribute are those attribute which are part of candidate key {A, C} in this
example and others will be non-prime {B, D, E} in this example.
Step 3.
The relation R is in 1st normal form as a relational DBMS does not allow multi-
valued or composite attribute.

The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is
not proper subset of candidate key AC) and AC->BE is in 2nd normal form (AC is
candidate key) and B->E is in 2nd normal form (B is not a proper subset of
candidate key AC).

The relation is not in 3rd normal form because in BC->D (neither BC is a super
key nor D is a prime attribute) and in B->E (neither B is a super key nor E is a
prime attribute) but to satisfy 3rd normal for, either LHS of an FD should be super
key or RHS should be prime attribute.

So the highest normal form of relation will be 2nd Normal form.


Find the highest normal form of a relation R(A,B,C,D,E) with FD set {A->D,
B->A, BC->D, AC->BE}

Find the highest normal form of a relation R (P, Q, R, S, T) with Functional


Dependency set (Q->P, P->R, QR->S, PR->QT).
Fourth Normal Form (4NF):
A relation R is in Fourth Normal Form (4NF) if and only if the following
conditions are satisfied simultaneously:
✓ R is already in 3NF or BCNF.
✓ If it contains no multi-valued dependencies.

Multi-Valued Dependency (MVD):


MVD is the dependency where one attribute value is potentially a 'multi-
valued fact' about another. MVD occurs when there are more than one
independent multivalued attributes in a table.
To understand it clearly, consider a table with car_model, maf_year and color.
In this example, maf_year and color are independent of each other but dependent on
car_model; these two columns are therefore said to be multivalued dependent on
car_model.

This dependency can be represented like this:

car_model ->-> maf_year
car_model ->-> color

Car_model Maf_year Color


H001 2017 Metallic
H001 2017 Green
H005 2018 Metallic
H005 2018 Blue
H010 2015 Metallic
H033 2012 Gray
To remove the MVD and bring the above table into 4NF, we divide it into
two tables: one with (car_model, maf_year) and another with (car_model, color).
Fifth Normal Form (5NF)
A relation R is said to be in 5Th normal form if and only if it satisfies the
following conditions.
❖ R is already in 4NF
❖ A relation R is decomposed into two relations must have loss less
join dependency. (shouldn’t have join dependency)
or
❖ If we can decompose table further to eliminate redundancy and
anomaly, and when we re-join the decomposed tables by means of
candidate keys, we should not be losing the original data or any new
record set should not arise. In simple words, joining two or more
decomposed table should not lose records nor create new records

Example

Consider the constraint "if a company makes a product and an agent is an agent for that
company, then he always sells that product for the company". Under these circumstances,
the ACP (Agent, Company, Product) table is decomposed into three relations R1, R2 and R3.
The natural join of R1 and R3 over 'Company', followed by the natural join of the result R13
with R2 over 'Agent' and 'Product', reproduces the original ACP table. (The ACP table, its
three decomposed relations and the join results are shown as figures in the original slides
and are not reproduced here.)

Hence, in this example, all the redundancies are eliminated, and the
decomposition of ACP is a lossless join decomposition. Hence the relation is in
5NF as it does not violate the property of lossless join.
Inclusion Dependencies
It is a statement of the form that some columns of a relation contained in another
column.
Example: Foreign Key.
✓ We should not split groups of attributes that participate in an inclusion
dependency.
For example, if AB is inclusion dependent on CD (that is, AB ⊆ CD), then while
decomposing a schema that contains AB we should ensure that at least one of
the schemas obtained in that decomposition contains both A and B.
✓ Most of the inclusion dependencies are key based.
✓ In E-R diagram IS A hierarchies also leads to key-based inclusion
dependencies.

UNIT IV
 A transaction is a unit of program execution that accesses and possibly
updates various data items.
 A transaction is delimited by statements (or function calls) of the form
begin transaction and end transaction. The transaction consists of all
operations executed between the begin transaction and end
transaction.
Let’s take an example of a simple transaction. Suppose a bank employee transfers
Rs 500 from A's account to B's account. This very simple and small transaction
involves several low-level tasks.
A’s Account
Open_Account (A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account (A)
B’s Account
Open_Account (B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account (B)
ACID Properties
A transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability − commonly known as ACID properties − in order to ensure
accuracy, completeness, and data integrity.
Atomicity (A)
This property ensures that either all the operations of a transaction reflect in
database or none.
Example
Suppose Account A has a balance of Rs.400 & B has Rs.700. Account A is
transferring Rs.100 to Account B. This is a transaction that has two operations

a) Debiting Rs.100 from A’s balance


b) Crediting Rs.100 to B's balance.

Let’s say first operation passed successfully while second failed, in this case A’s
balance would be Rs. 300 while B would be having Rs. 700 instead of Rs. 800.
This is unacceptable in a banking system. Either the transaction should fail without
executing any of the operation or it should process both the operations. The
Atomicity property ensures that.
Consistency (C)
Integrity Constraints should be maintained. So, that the database is consistent
before and after the transaction. To preserve the consistency of database, the
execution of transaction should take place in isolation (that means no other
transaction should run concurrently when there is a transaction already running).
Example
Account A has a balance of 400 and it is transferring 100 to each of accounts B and C, so
we have two transactions here. Let's say these transactions run concurrently and both of
them read the balance as 400; in that case the final balance of A would be 300 instead of
200, which is wrong. If the transactions were to run in isolation, then the second
transaction would have read the correct balance of 300 (before debiting 100) after the
first transaction completed successfully.

Isolation
Even though multiple transactions may execute concurrently, the system
guarantees that, for every pair of transactions Ti and Tj , it appears to Ti that either
Tj finished execution before Ti started, or Tj started execution after Ti finished.
Thus, each transaction is unaware of other transactions executing concurrently in
the system.
Durability (D)
Once a transaction completes successfully, the changes it has made into the
database should be permanent even if there is a system failure. The recovery
management component of database systems ensures the durability of transaction.

Transaction States
Any transaction at any point of time must be in one of the following states:
Active: The initial state; the transaction stays in this state while it is executing.
Partially committed: After the final statement has been executed.
Failed: After the discovery that normal execution can no longer proceed.
Aborted: After the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction
Committed: After successful completion.

A transaction that completes its execution successfully is said to be committed. A


committed transaction that has performed updates transforms the database into a
new consistent state, which must persist even if there is a system failure.
Once a transaction has committed, we cannot undo its effects by aborting it. The
only way to undo the effects of a committed transaction is to execute a
compensating transaction.
An aborted transaction has two options
• It can be restarted, but only if the transaction was aborted as a result of
some hardware or software error that was not created through the
internal logic of the transaction. A restarted transaction is considered to
be a new transaction.

• It can be killed. This is usually done because of some internal logical error


that can be corrected only by rewriting the application program, or
because the input was bad, or because the desired data were not found in
the database.
Serializability
Schedule
A chronological execution sequence of a transaction is called a schedule. A
schedule can have many transactions in it and each comprising of number of
instructions/tasks.
In general, there are two types of schedules
✓ Serial Schedules
✓ Equivalence Schedules
Let T1 and T2 be two transactions that transfer funds from one account to another.
Transaction T1 transfers $50 from account A to account B. It is defined as

T1: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It
is defined as
T2: read(A);
temp := A * 0.1;
A := A − temp;
write(A);
read(B);
B := B + temp;
write(B).
Suppose the current values of accounts A and B are $1000 and $2000, respectively.
(Schedule 2, a concurrent schedule of T1 and T2, is shown as a figure in the original slides and is not reproduced here.)
Serial Schedules
It is a schedule in which transactions are aligned in such a way that one transaction
is executed first. When the first transaction completes its cycle, then the next
transaction is executed. Transactions are ordered one after the other. This type of
schedule is called a serial schedule, as transactions are executed in a serial manner.
Equivalence Schedules
An equivalence schedule can be of the following types −
✓Result Equivalence
✓View Equivalence
✓Conflict Equivalence

Result Equivalence:
If two schedules produce the same result after execution, they are said to be
result equivalent. They may yield the same result for some value and different
results for another set of values. That's why this equivalence is not generally
considered significant.
View Equivalence
Two schedules would be view equivalence if the transactions in both the
schedules perform similar actions in a similar manner.
For example −
 If Ti reads the initial data in S1, then it also reads the initial data in S2.
 If Ti reads the value written by Tj in S1, then it also reads the value written by Tj
in S2.
 If Ti performs the final write on the data value in S1, then it also performs the
final write on the data value in S2.
Conflict Equivalence
Two Operations would be conflicting if they have the following properties –
✓ Both belong to separate transactions.
✓ Both accesses the same data item.
✓ At least one of them is "write" operation.
Example:
Conflicting operations pair (R1(A), W2(A)) because they belong to two
different transactions on same data item A and one of them is write operation.
Similarly,
(W1(A), W2(A)) and (W1(A), R2(A)) pairs are also conflicting.
On the other hand, (R1(A), W2(B)) pair is non-conflicting because they
operate on different data item.
Similarly, ((W1(A), W2(B)) pair is non-conflicting.

Consider, for example, the schedule S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y), which is used in the examples below.

Two schedules having multiple transactions with conflicting operations are


said to be conflict equivalent if and only if

❖ Both the schedules contain the same set of Transactions.

❖ The order of conflicting pairs of operation is maintained in both


the schedules.
Check whether the following schedules are conflict equivalent or not.
S1: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
S2:R1(X) W1(X) W1(Y) R1(Y) R2(X) W2(X)

Let us consider S1. In this the conflict operations are


 R1(X) W2(X)
 R2(X) W1(X)
 W1(X) W2(X)

Now, consider S2. In this the conflicting operations are

 R1(X) W2(X)
 W1(X) R2(X)
 W1(X) W2(X)

The pair of operations R2(X) and W1(X) occurs in the order R2(X) W1(X) in S1 but in the
order W1(X) R2(X) in S2, so the order of the conflicting pairs is not the same in both
schedules. Hence S1 and S2 are not conflict equivalent.

Note − View equivalent schedules are view serializable and conflict equivalent
schedules are conflict serializable. All conflict serializable schedules are view
serializable too.
Serializability
A schedule is said to be serializable schedule if it is equivalent to the any one of
the serial schedule.
Conflict Serializable Schedules
A schedule S is said to be conflict serializable schedule if it is conflict equivalent
to the any one of the serial schedule.

Method-1 for checking the conflict Serializability: -


Step-1: Generate all possible serial schedules for the given schedule. If a
schedule contains n transactions, then possible number of serial
schedules are n!
Step-2: Now compare each serial schedule with given schedule for conflict
equivalence. If any serial schedule is conflict equivalent with the given
schedule, then it is conflict serializable schedule otherwise it is not a
conflict serializable schedule.
Example: S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
Step-1: Generate all possible serial schedules for S. Here S contains 2
transactions, so the possible number of serial schedules are 2(2!). Those
are
<T1 T2> (S12)
<T2 T1> (S21)
Step-2: Now compare S with <T1 T2 > for conflict equivalence.
Now S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)
S12: R1(X) W1(X) R1(Y) W1(Y) R2(X) W2(X)

In S the conflicting operations are


R1(X) W2(X),
R2(X) W1(X),
W1(X)W2(X)
In S12 the conflicting operations are
R1(X) W2(X),
W1(X) R2(X)
W1(X)W2(X)

Here S and S12 do not follow the same order for the conflicting operation pairs, so S is
not conflict equivalent to <T1 T2>.
Now compare S with <T2 T1> for conflict equivalence.

S: R1(X) R2(X) W1(X) R1(Y) W2(X) W1(Y)


S21: R2(X)W2(X)R1(X)W1(X) R1(Y) W1(Y)

In S the conflicting operations are


R1(X) W2(X),
R2(X) W1(X),
W1(X)W2(X)

In S21 the conflicting operations are


R2(X) W1(X),
W2(X)W1(X),
W2(X)R1(X)

Here S and S21 also do not follow the same order for the conflicting operation pairs, so S
is not conflict equivalent to <T2 T1> either.
Since S is not conflict equivalent to any serial schedule, the given schedule S is not a
conflict serializable schedule.
Exercise: Check whether the following schedule is conflict serializable or not.
S: R1(A) W1(A) R2(A) W2(A) R1(B) W1(B) R2(B) W2(B)
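A quicker test than enumerating all n! serial schedules is to build a precedence graph (an edge Ti → Tj for every conflicting pair in which Ti's operation comes first) and check it for a cycle; the schedule is conflict serializable exactly when the graph is acyclic. Below is a minimal Python sketch of this idea; the tuple representation of a schedule and the function names are only illustrative assumptions.

    # Sketch: conflict-serializability test using a precedence graph.
    # A schedule is assumed to be a list of (transaction, action, item) tuples.
    def precedence_edges(schedule):
        edges = set()
        for i, (ti, ai, xi) in enumerate(schedule):
            for tj, aj, xj in schedule[i + 1:]:
                if ti != tj and xi == xj and 'W' in (ai, aj):
                    edges.add((ti, tj))          # Ti conflicts with a later operation of Tj
        return edges

    def has_cycle(nodes, edges):
        colour = {n: 'white' for n in nodes}     # simple DFS cycle detection
        def dfs(u):
            colour[u] = 'grey'
            for a, b in edges:
                if a == u and (colour[b] == 'grey' or (colour[b] == 'white' and dfs(b))):
                    return True
            colour[u] = 'black'
            return False
        return any(dfs(n) for n in nodes if colour[n] == 'white')

    S = [('T1', 'R', 'X'), ('T2', 'R', 'X'), ('T1', 'W', 'X'),
         ('T1', 'R', 'Y'), ('T2', 'W', 'X'), ('T1', 'W', 'Y')]
    print(has_cycle({t for t, _, _ in S}, precedence_edges(S)))   # True

For the schedule S analysed above, the graph contains both T1 → T2 and T2 → T1, i.e., a cycle, which agrees with the conclusion that S is not conflict serializable. The same sketch can be applied to the exercise schedule.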
View Serializable Schedules
A schedule S is said to be a view serializable schedule if it is view equivalent to
any one of the serial schedules.
Method for checking view serializability:
Step-1: Generate all possible serial schedules for the given schedule. If the schedule
contains n transactions, then the number of possible serial schedules is n!.
Step-2: Now compare each serial schedule with the given schedule for view
equivalence. If any serial schedule is view equivalent to the given
schedule, then it is a view serializable schedule; otherwise it is not a view
serializable schedule.
Example:
Check whether the schedule is view Serializable or not?
S: R2 (B); R2 (A); R1 (A); R3 (A); W1 (B); W2 (B); W3 (B);
Solution: With 3 transactions, total number of schedules possible = 3! = 6
<T1 T2 T3>
<T1 T3 T2>
<T2 T3 T1>
<T2 T1 T3>
<T3 T1 T2>
<T3 T2 T1>
Step 1: Initial Read
A: T2
B: T2
Remaining Schedules:
< T2 T1 T3>
<T2 T3 T1 >
Step 2: Write-Read (WR) sequence:
No need to check here, since no transaction reads a value written by another transaction.
Step 3: Final write on each data item:
A: − (no transaction writes A)
B: T3
Since the final write on B is made by T3, the transaction T3 must execute last, i.e.,
after transactions T1 and T2.
Now, remove those schedules in which T3 is not executed last.
Hence, view equivalent serial schedule is:
T2 → T1 → T3
Hence, given schedule is view Serializable schedule.
Exercise: Check whether the schedule S: R1(A) W2(A) R3(A) W1(A) W3(A) is view serializable or not.
Recoverability
✓ A computer system, like any other device, is subject to failure from a variety
of causes. In any failure, information may be lost.
✓ Therefore, the database system must take actions in advance to ensure that the
atomicity and durability properties of transactions are preserved.
✓ An integral part of a database system is a recovery scheme that can restore the
database to the consistent state that existed before the failure.

Failure Classification
There are various types of failure that may occur in a system, each of which needs
to be dealt with in a different manner. The failures are classified as follows
✓ Transaction failure.
✓ System crash.
✓ Disk failure.
Transaction failure
There are two types of errors that may cause a transaction to fail:
Logical error
The transaction can no longer continue with its normal execution because of some
internal condition, such as bad input, data not found, overflow, or resource limit
exceeded.
System error
The system has entered an undesirable state (for example, deadlock), as a result of
which a transaction cannot continue with its normal execution. The transaction,
however, can be executed later.

System crash
There is a hardware malfunction, or a bug in the database software or the
operating system, that causes the loss of the content of volatile storage and brings
transaction processing to a halt. The content of non-volatile storage remains intact
and is not corrupted.
Disk failure
✓ A disk block loses its content as a result of either a head crash or failure during
a data transfer operation.

To determine how the system should recover from failures, we need to identify
the failure modes of those devices used for storing data. Next, we must consider
how these failure modes affect the contents of the database.
Storage Structure
The storage structure can be divided into two categories

Volatile storage
✓ As the name suggests, volatile storage cannot survive system crashes.
Examples:
Main memory (RAM) and cache memory are examples of volatile storage.

Non-volatile storage
✓ These memories are made to survive system crashes. They are huge in data
storage capacity, but slower in accessibility.

Examples:
Hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up)
RAM.
Recovery and Atomicity
✓ When a system crashes, it may have several transactions being executed and
various files opened for them to modify the data items.
✓ The atomicity of transactions must be maintained, that is, either all
the operations are executed or none.

When a DBMS recovers from a crash, it should maintain the following

❖ It should check the states of all the transactions, which were being executed.
❖ A transaction may be in the middle of some operation; the DBMS must ensure
the atomicity of the transaction in this case.
❖ It should check whether the transaction can be completed now or it needs to be
rolled back.
❖ No transactions would be allowed to leave the DBMS in an inconsistent state.

There are two types of techniques, which can help a DBMS in recovering as well
as maintaining the atomicity of a transaction
➢ Maintaining the logs of each transaction and writing them onto some stable
storage before modifying the database.
➢ Maintaining shadow paging, where the changes are done on a volatile
memory, and later, the actual database is updated.
Log-based Recovery
Log maintains the records of actions performed by a transaction. It is important
that the logs are written prior to the actual modification and stored on a stable
storage media, which is failsafe.
Log-based recovery works as follows:
 The log file is kept on a stable storage medium.
 When a transaction enters the system and starts execution, it writes a log record
<Tn, Start>
 When the transaction modifies an item X, it writes a log record
<Tn, X, V1, V2>
which records that Tn has changed the value of X from V1 to V2.
 When the transaction finishes, it logs
<Tn, Commit>

The database can be modified using two approaches:
➢ Deferred database modification
➢ Immediate database modification
Deferred database modification
In this approach all the writes of a transaction are deferred until the transaction
partially commits. Two cases arise:
✓ All the operations of the transaction are successful.
✓ Some or all of the operations fail.
If all the operations in the schedule are successful, the deferred writes are applied to the
database at partial commit; otherwise none of the writes reach the database.
✓ In this method a write log record does not need the old value, i.e., it has the form
(write, T, X, new value).
✓ The (abort, T) entry is never used in this method.
✓ If a system crash occurs because of any transaction failure, the recovery scheme scans
the log file. For each transaction T, if a (commit, T) entry is found, all the (write, T)
entries of T are redone, i.e., written into the database; if no (commit, T) entry is found,
nothing needs to be done.
✓ This method is therefore also called the no-undo/redo recovery scheme.
✓ Redo must be done in the order in which the log records were written.
Example:
(Start,T0)
(Write,T0,a,9)
Crash (Nothing is to be done because no commit found)
(commit,T0)
(start, T1)
(write,T1,a,7)
Crash (T0 committed so redo T0)
(commit,T1)
Crash (T0, T1 are committed so redo T0,T1)

Immediate Database Modification:
✓ It allows uncommitted writes to be applied to the database.
✓ Each write log record needs both the old and the new value.
✓ If a system crash occurs because of any transaction failure, the recovery scheme scans
the log file. Transactions for which a (commit, T) entry is found are redone; transactions
without a commit entry are undone.
✓ It is also called the undo/redo recovery scheme.
✓ Undo is done in the reverse order of the log, whereas redo is done in forward order.
✓ The undo operations are performed first, and then the redo operations.

Example:
(Start, T0)
(Write, T0, A, 3, 9)
Crash (Undo T0 so A=3)
(Commit, T0)
(Start, T1)
(Write, T1, B, 2, 5)
(Write, T1, A, 9, 7)
Crash (undo – T1 and Redo T0 so B=2, A=9)
(Commit, T1)
Crash (Redo T0, T1 so B=5, A=7)
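A minimal Python sketch of how the undo/redo scheme processes such a log after a crash; the log record layout (('start', T), ('write', T, item, old, new), ('commit', T)) is an assumption made only for illustration.

    # Sketch: undo/redo recovery for immediate database modification.
    def recover(log, db):
        committed = {rec[1] for rec in log if rec[0] == 'commit'}
        # Undo pass: scan backwards, restoring old values written by uncommitted transactions.
        for rec in reversed(log):
            if rec[0] == 'write' and rec[1] not in committed:
                _, t, item, old, new = rec
                db[item] = old
        # Redo pass: scan forwards, re-applying new values written by committed transactions.
        for rec in log:
            if rec[0] == 'write' and rec[1] in committed:
                _, t, item, old, new = rec
                db[item] = new
        return db

    log = [('start', 'T0'), ('write', 'T0', 'A', 3, 9), ('commit', 'T0'),
           ('start', 'T1'), ('write', 'T1', 'B', 2, 5), ('write', 'T1', 'A', 9, 7)]
    print(recover(log, {'A': 7, 'B': 5}))    # crash before (commit, T1): {'A': 9, 'B': 2}

This reproduces the second crash case of the example above: T1 is undone and T0 is redone, leaving A = 9 and B = 2.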
Recovery with concurrent Transactions
When more than one transaction are being executed in parallel, the logs are
interleaved. At the time of recovery, it would become hard for the recovery system
to backtrack all logs, and then start recovering. To ease this situation, most modern
DBMS use the concept of 'checkpoints’.
Checkpoint
✓ Checkpoint declares a point before which the DBMS was in consistent state,
and all the transactions were committed.
✓ Checkpoint is a mechanism where all the previous logs are removed from the
system and stored permanently in a storage disk.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner
✓ The recovery system reads the logs backwards from the end to the last
checkpoint.
✓ It maintains two lists, an undo-list and a redo-list.
✓ If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just
<Tn, Commit>, it puts the transaction in the redo-list.
✓ If the recovery system sees a log with <Tn, Start> but no commit or abort log
found, it puts the transaction in undo-list.
All the transactions in the undo-list are then undone and their log records are removed.
All the transactions in the redo-list are redone from their log records before their logs
are saved again.
Example:
(Start, T1)
(Write, T1, B, 2, 3)
(Start, T2)
(Commit, T1)
(Write, T2, C, 5, 7)
(Commit, T2)
(Checkpoint, {T2})
(Start, T3)
(Write, T3, A, 1, 9)
(Commit, T3)
(Start, T4)
(Write, T4, C, 7, 2)
In the above example the undo list is – T4
and the redo list is – T2, T3.
First the transactions in the undo list are undone (T4), and then the transactions in the
redo list are redone (T2, T3).
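A compact Python sketch of the rule stated above: scan the log backwards from the end to the most recent checkpoint and classify each transaction into the redo-list or the undo-list. The record layout is assumed only for illustration; a real checkpoint record also carries the list of active transactions and more bookkeeping than shown here.

    # Sketch: build undo/redo lists by scanning the log backwards to the last checkpoint.
    def undo_redo_lists(log):
        undo, redo, committed = [], [], set()
        for rec in reversed(log):
            kind, arg = rec[0], rec[1]
            if kind == 'checkpoint':
                break                              # stop at the most recent checkpoint
            if kind == 'commit':
                committed.add(arg)
                redo.append(arg)                   # commit record seen -> redo
            elif kind == 'start' and arg not in committed:
                undo.append(arg)                   # started but never committed -> undo
        return undo, redo

The transactions placed in the undo-list are rolled back first (newest first), and the transactions in the redo-list are then replayed from their log records.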
Deadlock Handling
Deadlock is a state of a database system having two or more transactions, when
each transaction is waiting for a data item that is being locked by some other
transaction.
A deadlock can be indicated by a cycle in the wait-for-graph. This is a directed
graph in which the vertices denote transactions and the edges denote waits for
data items.

Example
Transaction T1 is waiting for data item X which is locked by T3. T3 is waiting for
Y which is locked by T2 and T2 is waiting for Z which is locked by T1. Hence, a
waiting cycle is formed, and none of the transactions can proceed executing.
[Wait-for graph: T1 → T3 (T1 waits for X held by T3), T3 → T2 (T3 waits for Y held by T2),
T2 → T1 (T2 waits for Z held by T1); the edges form a cycle.]
There are three classical approaches for deadlock handling, namely
❖ Deadlock prevention.
❖ Deadlock avoidance.
❖ Deadlock detection and removal.
All of the three approaches can be incorporated in both a centralized and a
distributed database system.

Deadlock Prevention
 The deadlock prevention approach does not allow any transaction to acquire
locks that will lead to deadlock.
 The convention is that when more than one transaction requests a lock on the
same data item, only one of them is granted the lock.
 One of the most popular deadlock prevention methods is pre-acquisition of all
the locks.
 In this method, a transaction acquires all the locks before starting to execute
and retains the locks for the entire duration of transaction.
 Using this approach, the system is prevented from being deadlocked since
none of the waiting transactions are holding any lock.
There are two algorithms for this purpose, namely wait-die and wound-wait.
Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a
data item which is already locked by T2. The algorithms are as follows
Wait-Die
If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2,
T1 is aborted and later restarted.

Wound-Wait
If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is
younger than T2, T1 is allowed to wait.
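The two schemes can be summarised with transaction timestamps (an older transaction has a smaller timestamp). The Python sketch below is purely illustrative; the function names are assumptions.

    # Sketch: wait-die vs wound-wait when a requester asks for a lock held by another transaction.
    def wait_die(ts_requester, ts_holder):
        # Only an older requester is allowed to wait; a younger requester dies (is aborted and restarted).
        return 'wait' if ts_requester < ts_holder else 'abort the requester'

    def wound_wait(ts_requester, ts_holder):
        # An older requester wounds (aborts) the holder; a younger requester waits.
        return 'abort the holder' if ts_requester < ts_holder else 'wait'

    print(wait_die(1, 2))     # T1 (older) requests a lock held by T2 -> T1 waits
    print(wound_wait(1, 2))   # T1 (older) requests a lock held by T2 -> T2 is aborted and restarted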

Deadlock Avoidance
✓ The deadlock avoidance approach handles deadlocks before they occur. It
analyzes the transactions and the locks to determine whether waiting leads to a
deadlock or not .
✓ When a transaction requests a lock on a data item, the lock manager checks
whether the lock is available. If it is available, the lock manager allocates the
data item and the transaction acquires the lock.
✓ If the item is locked by some other transaction in incompatible mode, the lock
manager runs an algorithm to test whether keeping the transaction in waiting
state will cause a deadlock or not.
Deadlock Detection and Removal
 The deadlock detection and removal approach runs a deadlock detection
algorithm periodically and removes the deadlock in case there is one.
 When a transaction requests a lock, the lock manager checks whether it is
available. If it is available, the transaction is allowed to lock the data item;
otherwise the transaction is allowed to wait.
 To detect deadlocks, the lock manager periodically checks whether the wait-for graph
has cycles.
 If the system is deadlocked, the lock manager chooses a victim transaction
from each cycle. The victim is aborted and rolled back, and then restarted later.
Some of the methods used for victim selection are
➢ Choose the youngest transaction.
➢ Choose the transaction with the fewest data items.
➢ Choose the transaction that has performed the least number of updates.
➢ Choose the transaction having the least restart overhead.
➢ Choose the transaction which is common to two or more cycles.
Concurrency Control
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions.
We have concurrency control protocols to ensure atomicity, isolation, and
serializability of concurrent transactions. Concurrency control protocols can be
broadly divided into two categories
 Lock based protocols
 Time stamp based protocols
Lock based protocol
This protocol requires that all the data items be accessed in a mutually
exclusive manner, i.e., while one transaction is accessing a data item, no other
transaction may modify that item.
To implement a lock based protocol we need 2 types of locks.
Shared Lock: When we take this lock we can only read the item but cannot write it.
Exclusive Lock: With this type of lock we can both read and write the data item.
The table below shows what we can and cannot do while holding a
shared or exclusive lock.
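Lock-compatibility matrix for the two modes (a request is granted only if it is compatible with every lock currently held on the item by other transactions):

                   Held: Shared (S)    Held: Exclusive (X)
    Request S      granted             not granted
    Request X      not granted         not granted

That is, any number of transactions may hold shared locks on the same item at the same time, but an exclusive lock is incompatible with every other lock.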
In general we have 2 types of locking protocols. Those are
❖ Simple locking protocol
❖ 2 – Phase locking Protocol

Simple locking protocol
In this protocol we simply have two operations.
1. Lock the item before accessing it:
for a read operation, Lock-S(A);
for a write (and read) operation, Lock-X(A).
2. Unlock the item after the access:
Unlock-S(A)
Unlock-X(A)
Problem 1: Inconsistency in the database
With simple locking, an interleaving such as the one shown in the schedule above can
leave the database inconsistent, as the following example illustrates.
Output:
Say at the start of the schedule A = 100 and B = 200, so the total should always be
300. But after the interleaved execution, Display(A + B) sees A = 50 and B = 200, a
total of 250 instead of 300, so consistency is violated.

Problem 2: Deadlock problem
For example, T2 is waiting for T1 to unlock (A) while T1 is waiting for T2 to unlock (B),
so neither transaction can proceed.
Two-Phase Locking protocol
A transaction is said to follow the Two-Phase Locking protocol if Locking and
Unlocking can be done in two phases.
1. Growing Phase: New locks on data items may be acquired but none can be
released.
2. Shrinking Phase: Existing locks may be released but no new locks can be
acquired.
LOCK POINT: The Point at which the growing phase ends, i.e., when a
transaction takes the final lock it needs to carry on its work
2 PL suffers from other problems such as cascading rollback and deadlock.
Cascading Rollback Problem:
Failure of one transaction leads to the failure of its subsequent (dependent) transactions,
as in the following example.
So if transaction T1 rolls back, then transactions T2 and T3 also have to roll back.


Deadlock Problem Example

In 2 PL we have seen:
Serializability: It is guaranteed to happen.
Cascading Rollback: It is possible which is bad.
Deadlock: It is possible.

Strict Two-phase locking (Strict-2PL)
The first phase of Strict-2PL is the same as that of 2PL.
✓ After acquiring all the locks in the first phase, the transaction continues to
execute normally.
✓ The only difference between 2PL and Strict-2PL is that Strict-2PL does not
release an exclusive lock as soon as it has finished using it.
✓ Strict-2PL waits until the whole transaction commits, and only then releases all
the exclusive locks at a time.
✓ The Strict-2PL protocol therefore has no shrinking phase for exclusive locks.
Strict 2-PL ensures that our schedule is:
✓ Recoverable
✓ Cascadeless
Rigorous 2-PL
✓ After acquiring all the locks in the first phase, the transaction continues to
execute normally.
✓ The only difference between 2PL and Rigorous 2PL is that Rigorous 2PL does
not release a lock as soon as it has finished using it.
✓ Rigorous 2PL waits until the whole transaction commits, and then releases
all the exclusive locks and shared locks at a time.
Rigorous 2-PL ensures that our schedule is:
✓ Recoverable
✓ Cascadeless
The difference between Strict 2-PL and Rigorous 2-PL is that Rigorous 2-PL is more
restrictive: it requires both exclusive and shared locks to be held until after the
transaction commits, and this is what makes the implementation of Rigorous 2-PL easier.
Conservative 2-PL
✓ This protocol requires the transaction to lock all the items it accesses before the
transaction begins execution, by predeclaring its read-set and write-set.
✓ If any of the required locks are not available, the transaction waits until it gets
them.
✓ Conservative 2-PL is deadlock free, but it does not ensure a strict schedule.
✓ It is difficult to use in practice because of the need to predeclare the read-set
and the write-set, which is not possible in many situations.

The Venn Diagram below shows the classification of schedules that are rigorous
strict and conservative.
Example 1:
Now let us check which of the properties discussed above this schedule satisfies.
2 PL: There is a growing and a shrinking phase, so it is 2 PL.
Strict 2 PL: There is a Lock-X(B) and it is unlocked before the commit, so it is not strict 2 PL.
Rigorous 2 PL: If it is not strict 2 PL, then it cannot be rigorous.
Conservative 2 PL: It is not conservative here either.
Example 2:

2 PL: There is growing and shrinking phase so it is 2 PL.


Strict 2 PL: Exclusive locks are unlocked after commit. So yes it is.
Rigorous: We have unlocked all the locks after commit so it is rigorous.
Conservative: We have not taken all the locks at first then start the transaction so no
conservative.
Example 3:

2 PL: There is growing and shrinking phase so it is 2 PL.


Strict 2 PL: Exclusive locks are unlocked after commit. so it is strict.
Rigorous: We have unlocked all the locks after commit so it is rigorous.
Conservative: We have taken all the locks at first then start the transaction so it is
conservative.
Time Stamp Protocol
✓ The Timestamp Ordering Protocol is used to order the transactions based on
their Timestamps.
✓ The older transaction has a higher priority, which is why it is executed first. To
determine the timestamp of a transaction, this protocol uses the system time or a
logical counter.
✓ The lock-based protocol is used to manage the order between conflicting pairs
among transactions at the execution time. But Timestamp based protocols start
working as soon as a transaction is created. We use two Timestamp Values
relating to each database item X.
✓ W_TS(X) is the largest timestamp of any transaction that
executed write(X) successfully.
✓ R_TS(X) is the largest timestamp of any transaction that
executed read(X) successfully.

The basic timestamp ordering protocol works as follows:
Check the following conditions whenever a transaction Ti issues a Read(X)
operation:
✓ If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
✓ If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to
the maximum of R_TS(X) and TS(Ti).
Check the following conditions whenever a transaction Ti issues
a Write(X) operation:
✓ If R_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
✓ If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back;
otherwise the operation is executed and W_TS(X) is set to TS(Ti).
Where,
TS(Ti) denotes the timestamp of the transaction Ti,
R_TS(X) denotes the read timestamp of data item X, and
W_TS(X) denotes the write timestamp of data item X.
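A compact Python sketch of these checks; the dictionaries and function names used here are assumptions made only for illustration.

    # Sketch: basic timestamp-ordering checks for Read(X) and Write(X).
    # ts[t] is TS(Ti); r_ts[x] and w_ts[x] are R_TS(X) and W_TS(X), both 0 initially.
    def read_item(t, x, ts, r_ts, w_ts):
        if w_ts.get(x, 0) > ts[t]:
            return 'reject: roll back ' + t          # a younger transaction already wrote X
        r_ts[x] = max(r_ts.get(x, 0), ts[t])         # execute the read and update R_TS(X)
        return 'read OK'

    def write_item(t, x, ts, r_ts, w_ts):
        if r_ts.get(x, 0) > ts[t] or w_ts.get(x, 0) > ts[t]:
            return 'reject: roll back ' + t          # a younger transaction already read or wrote X
        w_ts[x] = ts[t]                              # execute the write and update W_TS(X)
        return 'write OK'

    ts, r_ts, w_ts = {'T1': 1, 'T2': 2}, {}, {}
    print(write_item('T2', 'X', ts, r_ts, w_ts))     # write OK, W_TS(X) becomes 2
    print(read_item('T1', 'X', ts, r_ts, w_ts))      # rejected: W_TS(X) = 2 > TS(T1) = 1

Under the Thomas Write Rule discussed below, the case W_TS(X) > TS(T) in write_item would simply be ignored (the obsolete write is skipped) instead of rolling the transaction back.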

Advantages and Disadvantages of TO protocol:


✓ Timestamp Ordering protocol ensures serializability
✓ Timestamp protocol ensures freedom from deadlock as no transaction ever
waits.
✓ But the schedule may not be cascade free, and may not even be
recoverable.
Thomas Write Rule
Timestamp Ordering Protocol states that if Ri(X) and Wj(X) are conflicting
operations then Ri (X) is processed before Wj(X) if and only if TS(Ti) < TS(Tj).
Whenever a schedule does not follow a serializable order according to the
timestamps, the offending operation is rejected and the transaction is rolled back.
✓ Thomas Write Rule allows such operations and is a modification on the Basic
Timestamp Ordering protocol. In Thomas Write Rule users ignore outdated
writes.
Thomas Write Rule
✓ Thomas Write Rule does not enforce Conflict Serializability but rejects fewer
Write Operations by modifying the check Operations for W_item(X)

1. If R_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If W_TS(X) > TS(T), then don’t execute the Write Operation and continue
processing. This is a case of Outdated or Obsolete Writes. Remember, outdated
writes are ignored in Thomas Write Rule but a Transaction following Basic TO
protocol will abort such a Transaction.
3. If neither the condition in 1 or 2 occurs, then and only then execute the
W_item(X) operation of T and set W_TS(X) to TS(T)
Validation Based Protocol (Optimistic Concurrency)
A validation based protocol executes a transaction in three phases:

Read phase
In this phase, the transaction T is read and executed. It is used to read the value of
various data items and stores them in temporary local variables. It can perform all
the write operations on temporary variables without an update to the actual
database.

Validation phase
In this phase, the temporary variable value will be validated against the actual data
to see if it violates the serializability.

Write phase
If the transaction passes validation, then the temporary results are
written to the database or system; otherwise the transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its
validation phase.
Finish(Ti): It contains the time when Ti finishes its write phase.

✓ This protocol is used to determine the time stamp for the transaction for
serialization using the time stamp of the validation phase, as it is the actual
phase which determines if the transaction will commit or rollback.
✓ Hence TS(T) = validation(T).
✓ The serializability is determined during the validation process. It can't be
decided in advance.
✓ While executing the transaction, it ensures a greater degree of concurrency and
fewer conflicts.
✓ Thus fewer transactions need to be rolled back.
Multiple Granularity
Granularity: It is the size of data item allowed to lock.
✓ Multiple Granularity can be defined as hierarchically breaking up the database
into blocks which can be locked.
✓ The Multiple Granularity protocol enhances concurrency and reduces lock
overhead.
✓ It keeps track of what to lock and how to lock.
✓ It makes it easy to decide whether to lock or unlock a data item. This
type of hierarchy can be graphically represented as a tree.
Example
✓ Consider a tree which has four levels of nodes.
✓ The first level or higher level shows the entire database.
✓ The second level represents a node of type area. The higher level database
consists of exactly these areas.
✓ The area consists of children nodes which are known as files. No file can be
present in more than one area.
✓ Finally, each file contains child nodes known as records. The file has exactly
those records that are its child nodes. No records represent in more than one
file.
Hence, the levels of the tree starting from the top level are as follows:
1.Database
2.Area
3.File
4.Record
There are three additional lock modes with multiple granularity:
Intention Mode Lock
Intention-shared (IS): It contains explicit locking at a lower level of the tree but
only with shared locks.
Intention-Exclusive (IX): It contains explicit locking at a lower level with
exclusive or shared locks.
Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared
mode, and some node is locked in exclusive mode by the same transaction.

Compatibility Matrix
It uses the intention lock modes to ensure serializability. It requires that if a
transaction attempts to lock a node, then that node must follow these protocols:

✓ Transaction T1 should follow the lock-compatibility matrix.


✓ Transaction T1 must lock the root of the tree first, and it can lock it in any mode.
✓ T1 can lock a node in S or IS mode only if it currently has the parent of that
node locked in IS or IX mode.
✓ T1 can lock a node in X, SIX, or IX mode only if it currently has the parent of
that node locked in IX or SIX mode.
✓ T1 can lock a node only if it has not previously unlocked any node (the
two-phase rule).
✓ T1 can unlock a node only if it currently has none of the children of that node
locked.
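The standard compatibility matrix for the five lock modes is:

                IS     IX     S      SIX    X
    IS          yes    yes    yes    yes    no
    IX          yes    yes    no     no     no
    S           yes    no     yes    no     no
    SIX         yes    no     no     no     no
    X           no     no     no     no     no

A lock request on a node is granted only if the requested mode is compatible with every mode currently held on that node by other transactions.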

In multiple-granularity, the locks are acquired in top-down order, and locks must
be released in bottom-up order.
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES)

ARIES algorithm follows three principles


Write-ahead logging (WAL)
✓ It provides atomicity and durability (two of the ACID properties) in database
systems.
✓ WAL states that changes are first recorded in the log, and the log records must be
written to stable storage, before the changes are written to the database.

Repeating history during Redo


On restart after a crash, ARIES retraces the actions of a database before the crash
and brings the system back to the exact state that it was in before the crash.

Logging changes during Undo


Changes made to the database while undoing transactions are logged to ensure
such an action isn't repeated in the event of repeated restarts
Log sequence number
✓ It refers to a pointer used to identify the log records.

Dirty page table


✓ It refers to pages with their updated version placed in main memory and disk
version of it is not updated.
✓ A table is maintained which is useful in reducing unnecessary redo operation.

The ARIES recovery procedure consists of three main steps:

Analysis
✓ The analysis step identifies the dirty (updated) pages in the buffer and the set of
transactions active at the time of the crash.
✓ The appropriate point in the log where the REDO operation should start is also
determined

REDO

✓ The REDO phase updates only committed transactions from the log to the
database.
✓ ARIES will have information which provides the start point for REDO
✓ Information stored by ARIES and in the data pages will allow ARIES to determine
whether the operation to be redone had been applied to the database.
✓ Thus only the necessary REDO operations are applied during recovery.
UNDO
✓ During the UNDO phase, the log is scanned backwards and the operations of
transactions that were active at the time of the crash are undone in reverse
order.
✓ The information needed for ARIES to accomplish its recovery procedure
includes the log, the Transaction Table, and the Dirty Page Table.
UNIT V: Physical Database Design


Overview of Physical Storage
Cache
✓ The cache is the fastest and most costly form of storage.
✓ Cache memory is relatively small; cache effects when designing query
processing data structures and algorithms.
Main memory.
✓ The storage medium used for data that are available to be operated on is
main memory.
✓ The general-purpose machine instructions operate on main memory.
✓ Although main memory may contain several gigabytes of data on a personal
computer, or even hundreds of gigabytes of data in large server systems, it is
generally too small (or too expensive) for storing the entire database.
Flash memory.
✓ Flash memory differs from main memory in that stored data are retained even
if power is turned off (or fails).
✓ There are two types of flash memory, called NAND and NOR flash. Of these,
NAND flash has a much higher storage capacity for a given cost, and is widely
used for data storage in devices such as cameras, music players, and cell phones,
and, increasingly, in laptop computers as well.
✓ Flash memory is also widely used for storing data in "USB keys," which can be
plugged into the Universal Serial Bus (USB) slots of computing devices.
Magnetic-disk storage
✓The primary medium for the long-term online storage of data is the magnetic
disk.
✓Usually, the entire database is stored on magnetic disk. The system must move
the data from disk to main memory so that they can be accessed.
Optical storage
✓ The most popular forms of optical storage are the compact disk (CD), and
the digital video disk (DVD).
✓ The optical disks used in read-only compact disks (CD-ROM) or read-only
digital video disks (DVD-ROM) cannot be written, but are supplied with data
pre-recorded.
✓ There are also “record-once” versions of compact disk (called CD-R) and
digital video disk (called DVD-R and DVD+R), which can be written only
once.
Tape storage
Tape storage is used primarily for backup and archival data. Although magnetic
tape is cheaper than disks, access to data is much slower, because the tape must
be accessed sequentially from the beginning. For this reason, tape storage is
referred to as sequential-access storage.
Magnetic Disk
Magnetic disks provide the bulk of secondary storage for modern computer
systems. A very large database may require hundreds of disks. In recent years,
flash-memory storage sizes have grown rapidly, and flash storage is increasingly
becoming a competitor to magnetic disk storage for several applications.
Physical Characteristics of Disks
Each disk platter has a flat, circular shape. Its two surfaces are covered with a
magnetic material, and information is recorded on the surfaces.
There is a read–write head positioned just above the surface of the platter. The
disk surface is logically divided into tracks, which are subdivided into sectors. A
sector is the smallest unit of information that can be read from or written to the
disk.
The read–write head stores information on a sector magnetically.
A disk typically contains many platters, and the read–write heads of all the tracks
are mounted on a single assembly called a disk arm, and move together.
RAID (Redundant Arrays of Independent Disks)
✓ RAID, or “Redundant Arrays of Independent Disks” is a technique which
makes use of a combination of multiple disks instead of using a single disk for
increased performance, data redundancy or both.

✓ Data redundancy, although taking up extra space, adds to disk reliability. This
means, in case of disk failure, if the same data is also backed up onto another
disk, we can retrieve the data and go on with the operation.

✓ If the data is spread across just multiple disks without the RAID technique, the
loss of a single disk can affect the entire data.
✓ RAID is very transparent to the underlying system.
✓ To the host system, it appears as a single big disk presenting itself as a linear
array of blocks. This allows older technologies to be replaced by RAID
without making too many changes in the existing code.
✓ RAID System is evaluated by Reliability, Availability, Performance, Capacity.
✓ There are 7 standard levels of RAID schemes, named RAID 0, RAID 1,
..., RAID 6.
RAID-0 (Striping)
✓ RAID level 0 provides data striping, i.e., the data is spread across multiple
disks.
✓ Instead of placing just one block onto a disk at a time, we can work with two
(or more) blocks placed onto a disk before moving on to the next one.
Evaluation
Reliability: 0
✓ There is no duplication of data. Hence, a block once lost cannot be recovered.
Capacity: N*B
✓ The entire space is being used to store data. Since there is no duplication, N disks
each having B blocks are fully utilized.
RAID-1 (Mirroring)
✓ More than one copy of each block is stored in a separate disk. Thus, every block
has two (or more) copies, lying on different disks.

✓ The above figure shows a RAID-1 system with mirroring level 2.


✓ Every write of a disk block involves a write on both disks.
✓ RAID 0 was unable to tolerate any disk failure, but RAID 1 provides this
reliability.
Evaluation:
Assume a RAID system with mirroring level 2.
Reliability: 1 to N/2
✓ 1 disk failure can be handled for certain, because blocks of that disk would
have duplicates on some other disk.
✓ If we are lucky enough and disks 0 and 2 fail, then again this can be handled as
the blocks of these disks have duplicates on disks 1 and 3. So, in the best case,
N/2 disk failures can be handled.
Capacity: N*B/2
✓ Only half the space is being used to store data. The other half is just a mirror to
the already stored data.
RAID 2(Error- Correcting Code)
✓ RAID 2 consists of bit-level striping using hamming code parity.
✓ In this level, each data bit in a word is recorded on a separate disk and ECC
(Error- Correcting Codes) of data words is stored on different set disks.
✓ Due to its high cost and complex structure, this level is not commercially used.
The same performance can be achieved by RAID 3 at a lower cost.
Evaluation:
✓ This level uses one designated drive to store parity.
✓ It uses the hamming code for error detection.
✓ It requires an additional drive for error detection

RAID 3 (Byte-level striping )


✓ RAID 3 consists of byte-level striping with dedicated parity.
✓ In this level, the parity information is stored for each disk section and written
to a dedicated parity drive.
✓ In case of drive failure, the parity drive is accessed, and data is reconstructed
from the remaining devices. Once the failed drive is replaced, the missing data
can be restored on the new drive.
✓ In this level, data can be transferred in bulk. Thus high-speed data transmission
is possible.

Evaluation:
✓ In this level, data is regenerated using parity drive.
✓ It contains high data transfer rates.
✓ In this level, data is accessed in parallel.
✓ It required an additional drive for parity.
✓ It gives a slow performance for operating on small sized files.

RAID 4 (Block-Level Striping with Dedicated Parity)


Instead of duplicating data, this adopts a parity-based approach.
Parity is calculated using a simple XOR function. If the data bits are
0,0,0,1 the parity bit is XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0 the
parity bit is XOR(0,1,1,0) = 0. A simple approach is that even number of
ones results in parity 0, and an odd number of ones results in parity 1.
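A small Python illustration of how XOR parity allows a single lost block to be rebuilt; the block values and the disk layout here are simplified assumptions, not a real RAID implementation.

    # Sketch: XOR parity as used in RAID-4/5, over one stripe of data blocks.
    from functools import reduce

    data_blocks = [0b1010, 0b0110, 0b1100]             # blocks stored on data disks 0..2
    parity = reduce(lambda a, b: a ^ b, data_blocks)   # stored on the dedicated parity disk

    # Suppose disk 1 fails: its block is rebuilt by XOR-ing the surviving blocks with the parity.
    survivors = [data_blocks[0], data_blocks[2]]
    rebuilt = reduce(lambda a, b: a ^ b, survivors) ^ parity
    print(rebuilt == data_blocks[1])                   # True: the lost block is recovered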

Evaluation:
✓ Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way parity
works). If more than one disk fails, there is no way to recover the data.
✓ Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are
made available for data storage, each disk having B blocks.
RAID-5 (Block-Level Striping with Distributed Parity)
This is a slight modification of the RAID-4 system where the only difference is
that the parity rotates among the drives.

This was introduced to make the random write performance better.


Evaluation:
Reliability: 1
✓ RAID-5 allows recovery of at most 1 disk failure (because of the way parity
works). If more than one disk fails, there is no way to recover the data. This is
identical to RAID-4.
Capacity: (N-1)*B
✓ Overall, space equivalent to one disk is utilized in storing the parity. Hence,
(N-1) disks are made available for data storage, each disk having B blocks.
RAID-6 (Block-Level Striping with Dual Distributed Parity)
✓ This level is an extension of RAID 5. It contains block-level stripping with 2
parity bits.
✓ In RAID 6, you can survive 2 concurrent disk failures.
✓ RAID 6 plays its part where you can survive two concurrent disk failures before
you run out of options
File Organization
A file is organized logically as a sequence of records. These records are mapped
onto disk blocks.
 A page is a collection of slots, each of which contains a record. Each page is
assigned a page id to identify it in the file.
 A record is identified by the pair (page id, slot number); this is the record id
(rid). There are alternative approaches to managing slots on a page.
 In addition to the information about the records on the page, a page usually contains
additional file-level information.

Fixed-Length Records
 If all records on the page are guaranteed to be of the same length, record slots are
uniform and can be arranged consecutively within a page.
 When a record is inserted into the page, we must locate an empty slot and place the
record there.
 To insert a record we must find an empty slot, which can be done in two ways.
 The first alternative is to store the records in the first N slots; whenever a record is
deleted, we move the last record on the page into the vacated slot.
➢ This format allows us to locate the ith record on a page by a simple offset
calculation, and all empty slots appear together at the end of the page.
➢ This approach does not work if there are external references to the record.
 The second alternative is to handle deletions by using an array of bits, one per
slot, to keep track of free slot information.
 Locating records on the page requires scanning the bit array to find slots
whose bit is on; when a record is deleted, its bit is turned off.

Variable-Length Records
 In variable length record type we cannot divide the page into a fixed collection
of slots. The problem is that, when a new record is to be inserted, we have to
find an empty slot of just the right length.
 When a record is deleted, we must move records to fill the hole created by the
deletion, to ensure that all the free space on the page is contiguous
 The most flexible organization for variable-length records is to maintain a
directory of slots for each page, with a (record offset (Location), record
length(size)) pair per slot.
 The first component (record offset) is a 'pointer' to the record.
 Deletion is done by setting the record offset to -1.
 Records can be moved around on the page because the rid, which is the page
number and slot number does not change when the record is moved; only the
record offset stored in the slot changes.
RECORD FORMATS
 While choosing a way to organize the fields of a record, we must consider
whether the fields of the record are of fixed or variable length and consider the
cost of various operations on the record.
 Information common to all records of a given record type (such as the number
of fields and field types) is stored in the system catalog, which can be thought of
as a description of the contents of a database, maintained by the DBMS.

Fixed-Length Records
 In a fixed-length record, each field has a fixed length, and the number of fields
is also fixed.
 The fields of such a record can be stored consecutively, and given the address of
the record, the address of a particular field can be calculated using information
about the lengths of preceding fields.
Variable-Length Records
 In the relational model, every record in a relation contains the same number of
fields. If the number of fields is fixed, a record is of variable length only
because some of its fields are of variable length.
 One possible organization is to store fields consecutively, separated by
delimiters . This organization requires a scan of the record to locate a desired
field.
 Another possible organization is to reserve some space at the beginning of a
record for use as an array of integer offsets.
 The ith integer in this array is the starting address of the ith field value relative
to the start of the record.
we also store an offset to the end of the record; this offset is needed to
recognize where the last field ends.
File Organization
A database consist of a huge amount of data. The data is grouped within a table in
RDBMS, and each table have related records. A user can see that the data is stored
in form of tables, but in actual this huge amount of data is stored in physical
memory in form of files.
File : A file is named collection of related information that is recorded on
secondary storage such as magnetic disks, magnetic tables and optical
disks.
File Organization refers to the logical relationships among various records that
constitute the file, particularly with respect to the means of identification and
access to any specific record.

Types of File Organizations


✓ Sequential File Organization
✓ Heap File Organization
✓ Hash File Organization
✓ B+ Tree File Organization
✓ Clustered File Organization
Sequential File Organization
The simplest method of file organization is the sequential method. In this method the
records are stored one after another in a sequential manner. There are two ways to
implement this method:
✓ Pile Method
✓ Sorted File Method
Pile Method
This method is quite simple: we store the records in a sequence, i.e., one
after the other, in the order in which they are inserted into the tables.
Advantages
✓ Fast and efficient method for huge amount of data.
✓ Simple design.
✓ Files can be easily stored in magnetic tapes i.e cheaper storage mechanism.
Disadvantages
✓ Time wastage as we cannot jump on a particular record that is required, but we
must move in a sequential manner which takes our time.
✓ Sorted file method is inefficient as it takes time and space for sorting records.

Heap File Organization

✓ Heap File Organization works with data blocks.


✓ In this method records are inserted at the end of the file, into the data blocks.
✓ No Sorting or Ordering is required in this method.
✓ If a data block is full, the new record is stored in some other block, Here the
other data block need not be the very next data block, but it can be any block
in the memory.
✓ It is the responsibility of DBMS to store and manage the new records.
Advantages
✓ Fetching and retrieving records is faster than sequential record but only in case
of small databases.
✓ When there is a huge number of data needs to be loaded into the database at a
time, then this method of file Organization is best suited.
Disadvantages
✓ Problem of unused memory blocks.
✓ Inefficient for larger databases.

Hash File Organization


✓ Hashing is an efficient technique to directly search the location of desired data
on the disk without using index structure.
✓ Data is stored at the data blocks whose address is generated by using hash
function.
✓ The memory location where these records are stored is called as data block or
data bucket.
Data bucket
Data buckets are the memory locations where the records are stored. These
buckets are also considered as Unit Of Storage.
Hash Function
✓ Hash function is a mapping function that maps all the set of search keys to
actual record address.
✓ Generally, hash function uses primary key to generate the hash index.
✓ Hash function can be simple mathematical function to any complex
mathematical function.
Hashing is further divided into two sub categories :
✓ Static Hashing
✓ Dynamic Hashing
Static Hashing
In static hashing, when a search-key value is provided, the hash function always
computes the same address.
For example, if we want to generate address for STUDENT_ID = 76 using mod
(5) hash function, it always result in the same bucket address 1. There will not be
any changes to the bucket address here.
Hence number of data buckets in the memory for this static hashing remains
constant throughout.
Insertion
When a new record is inserted into the table, The hash function h generate a
bucket address for the new record based on its hash key K.
Bucket address = h(K)
Searching
When a record needs to be searched, The same hash function is used to retrieve
the bucket address for the record.
If we want to retrieve the whole record for ID 76, and the hash function is mod (5)
on that ID, the bucket address generated is 1. We then go directly to
address 1 and retrieve the whole record for ID 76. Here ID acts as the hash key.
Deletion
If we want to delete a record, Using the hash function we will first fetch the
record which is supposed to be deleted. Then we will remove the records for that
address in memory.
Updation
The data record that needs to be updated is first searched using hash function, and
then the data record is updated.
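A tiny Python sketch of static hashing with the mod (5) function used above; the bucket representation and sample records are simplified assumptions, and overflow handling is ignored here.

    # Sketch: static hashing with h(K) = K mod 5 and a fixed set of 5 buckets.
    buckets = {i: [] for i in range(5)}

    def insert(record):                       # record = (student_id, name)
        buckets[record[0] % 5].append(record)

    def search(student_id):
        return [r for r in buckets[student_id % 5] if r[0] == student_id]

    insert((76, 'Ravi'))
    insert((91, 'Sita'))
    print(search(76))     # looked up directly in bucket 76 mod 5 = 1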
Sometimes we want to insert a new record into the file, but the data bucket
address generated by the hash function is not empty, or the data already exists at
that address. This becomes a critical situation to handle. This situation in
static hashing is called bucket overflow, and it can be handled in the following ways.
 Open Hashing
 Closed hashing
 Quadratic probing
 Double Hashing
Open Hashing
Open hashing method, next available data block is used to enter the new record,
instead of overwriting older one. This method is also called linear probing.

Closed hashing
In Closed hashing method, a new data bucket is allocated with same address and is
linked it after the full data bucket. This method is also known as overflow
chaining.
Quadratic probing
Quadratic probing is an open-addressing scheme where we look for the (i*i)-th slot in
the i-th iteration if the given hash value x collides in the hash table.
How Quadratic Probing is done?
Let hash(x) be the slot index computed using the hash function.
If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
This process is repeated for all the values of i until an empty slot is found
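An illustrative Python sketch of quadratic probing on a small table; the table size and the identity hash function are assumptions chosen only for the example.

    # Sketch: quadratic probing - try (hash(x) + i*i) % S for i = 0, 1, 2, ... until a free slot is found.
    S = 7
    table = [None] * S

    def insert(key):
        for i in range(S):
            slot = (key + i * i) % S          # using hash(x) = x for simplicity
            if table[slot] is None:
                table[slot] = key
                return slot
        raise RuntimeError('no free slot found')

    for k in (10, 17, 24):                    # all three hash to slot 3, so they land in slots 3, 4, 0
        print(k, '->', insert(k))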

Double hashing
Double hashing is a collision resolving technique in Open Addressed Hash tables.
Double hashing uses the idea of applying a second hash function to key when a
collision occurs.

Double hashing can be done using :


(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE is size of hash
table.
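A matching Python sketch of double hashing; hash1, hash2 and the table size below are illustrative assumptions (hash2 must never evaluate to zero).

    # Sketch: double hashing - try (hash1(key) + i * hash2(key)) % TABLE_SIZE for i = 0, 1, 2, ...
    TABLE_SIZE = 7
    table = [None] * TABLE_SIZE

    def hash1(key):
        return key % TABLE_SIZE

    def hash2(key):
        return 5 - (key % 5)                  # always in 1..5, never zero

    def insert(key):
        for i in range(TABLE_SIZE):
            slot = (hash1(key) + i * hash2(key)) % TABLE_SIZE
            if table[slot] is None:
                table[slot] = key
                return slot
        raise RuntimeError('no free slot found')

    for k in (10, 17, 24):                    # 17 and 24 collide with 10 at slot 3 and are re-probed
        print(k, '->', insert(k))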
Dynamic Hashing
In Dynamic hashing, data buckets grows or shrinks (added or removed
dynamically) as the records increases or decreases. Dynamic hashing is also
known as extended hashing.
Extendible Hashing
Extendible Hashing is a dynamic hashing method where in directories, and
buckets are used to hash data. It is an aggressively flexible method in which the
hash function also experiences dynamic changes.
Directories
These containers store pointers to buckets. Each directory is given a unique id
which may change each time when expansion takes place. The hash function
returns this directory id which is used to navigate to the appropriate bucket.
Number of Directories = 2^Global Depth.
Buckets
They store the hashed keys. Directories point to buckets. A bucket may contain
more than one pointers to it if its local depth is less than the global depth.
Global Depth
It is associated with the Directories. They denote the number of bits which are
used by the hash function to categorize the keys. Global Depth = Number of bits
in directory id.
Local Depth
✓ It is the same as that of Global Depth except for the fact that Local Depth is
associated with the buckets and not the directories.
✓ Local depth in accordance with the global depth is used to decide the action
that to be performed in case an overflow occurs.
✓ Local Depth is always less than or equal to the Global Depth.

Bucket Splitting:
When the number of elements in a bucket exceeds a particular size, then the
bucket is split into two parts.
Directory Expansion
Directory Expansion Takes place when a bucket overflows. Directory Expansion
is performed when the local depth of the overflowing bucket is equal to the global
depth.
Example of hashing the following elements: 16,4,6,22,24,10,31,7,9,20,26.
Bucket Size: 3 (Assume)
Hash Function: Suppose the global depth is X. Then the Hash Function returns X
LSBs.
16- 10000 , 4- 00100 , 6- 00110 , 22- 10110 , 24- 11000 , 10- 01010 ,
31- 11111, 7- 00111, 9- 01001 , 20- 10100 , 26- 11010
The drawback of static hashing is that it does not expand or shrink
dynamically as the size of the database grows or shrinks.

B+ Tree File Organization


A B+ Tree, as the name suggests, uses a tree-like structure to store records in a file.
It uses the concept of key indexing, where the primary key is used to sort the
records.
A B+ Tree is very much like a binary search tree, with the only difference that instead
of just two children, a node can have more than two.
All the information is stored in the leaf nodes, and the intermediate nodes act as
pointers to the leaf nodes.
The information in the leaf nodes always remains a sorted sequential linked list.

Advantages
Tree traversal is easier and faster.
Searching becomes easy as all records are stored only in the leaf nodes and form a
sorted sequential linked list.
There is no restriction on the B+ tree size. It may grow/shrink as the size of the data
increases/decreases.
Disadvantages
Inefficient for static tables.
Cluster File Organization
In cluster file organization, two or more related tables/records are stored within the
same file, known as a cluster.
These files will have two or more tables in the same data block, and the key
attributes which are used to map these tables together are stored only once.
Thus it lowers the cost of searching and retrieving various records in different files,
as they are now combined and kept in a single cluster.
For example, suppose we have two related tables, Employee and Department.
These tables can be combined using a join operation and stored together in a
cluster file.

If we must insert, update or delete any record we can directly do so. Data is sorted
based on the primary key or the key with which searching is done. Cluster key is
the key with which joining of the table is performed.

There are two ways to implement this method:


✓ Indexed Clusters
✓ Hash Clusters
Indexed Clusters:
In Indexed clustering the records are group based on the cluster key and
stored together. The above example of the Employee and Department
relationship is an example of Indexed Cluster where the records are based
on the Department ID.
Hash Clusters:
This is very much similar to indexed cluster with only difference that
instead of storing the records based on cluster key, we generate hash key
value and store the records with same hash key value.
Advantages of Cluster file organization
• The cluster file organization is used when there is a frequent request for joining
the tables with same joining condition.
• It provides the efficient result when there is a 1:M mapping between the tables.

Disadvantages of Cluster file organization


• This method has the low performance for the very large database.
• If there is any change in the joining condition, then this method cannot be used. If we
change the joining condition, then traversing the file takes a lot of time.
• This method is not suitable for a table with a 1:1 condition.
Comparison Of File Organizations:
Sequential – Method of storing: records are stored as they come, or sorted as they come.
Types: pile file and sorted file methods. Design: simple. Storage cost: cheap (magnetic tapes).
Heap/Direct – Method of storing: records are stored at the end of the file, but the addresses
in memory are random. Design: simplest. Storage cost: cheap.
Hash – Method of storing: records are stored at the address generated by the hash function.
Types: static and dynamic hashing. Design: medium. Storage cost: medium.
ISAM – Method of storing: an address index is appended to the record. Types: dense, sparse
and multilevel indexing. Design: complex. Storage cost: costlier.
B+ tree – Method of storing: records are stored in a tree-like structure. Design: complex.
Storage cost: costlier.
Cluster – Method of storing: frequently joined tables are clubbed into one file based on the
cluster key. Types: indexed and hash clusters. Design: simple. Storage cost: medium.
Indexing
✓ Indexing is a way to optimize the performance of a database by minimizing
the number of disk accesses required when a query is processed.
✓ It is a data structure technique which is used to quickly locate and access the
data in a database.
Indexes are created using a few database columns.
The first column is the Search key that contains a copy of the primary key or
candidate key of the table.
The second column is the Data Reference or Pointer which contains a set of
pointers holding the address of the disk block where that particular key value can
be found.
Classification Of Index
Dense Index
✓ For every search key value in the data file, there is an index record.
✓ This record contains the search key and also a reference to the first data
record with that search key value.
Sparse Index
The index record appears only for a few items in the data file. Each item
points to a block as shown.
Types of Indexing
1. Single-level Indexing
❖ Primary indexing
❖ Clustering Indexing
❖ Secondary Indexing
2. Multilevel Indexing
❖ B Trees
❖ B+ Trees

Primary indexing
It is defined mainly on the primary key of the data-file, in which the data-
file is already ordered based on the primary key.
 Primary Index is an ordered file whose records are of fixed length with
two fields. The first field of the index replicates the primary key of the
data file in an ordered manner, and the second field of the ordered file
contains a pointer that points to the data-block where a record
containing the key is available.
 The first record of each block is called the Anchor record or Block
anchor. There exists a record in the primary index file for every block of
the data-file.
 The average number of blocks using the Primary Index is = log2B + 1,
where B is the number of index blocks.
Clustered index:
A clustered index is created only when both of the following conditions are satisfied:
1. The data or file being moved into secondary memory should be
in sequential or sorted order.
2. The index should be built on a non-key column, which may have repeated values.

 Whenever you apply clustered indexing in a table, it will perform


sorting in that table only. You can create only one clustered index in a
table like primary key. Clustered index is as same as dictionary where
the data is arranged by alphabetical order.
 In clustered index, index contains pointer to block but not direct data.
Secondary Index or Non-clustered Index:
 The secondary Index in DBMS can be generated by a field which has a
unique value for each record, and it should be a candidate key. It is also
known as a non-clustering index.
 This two-level database indexing technique is used to reduce the
mapping size of the first level. For the first level, a large range of
numbers is selected because of this; the mapping size always remains
small.
Multilevel Indexing:
With the growth of the size of the database, indices also grow. Since the index is
kept in main memory, a single-level index might become too large to store, requiring
multiple disk accesses. Multilevel indexing segregates the main index block into several
smaller blocks so that each piece can be stored in a single block.
The outer blocks are divided into inner blocks, which in turn point to the
data blocks. This can be easily stored in main memory with less overhead.
B Tree
✓ B-Tree is a self-balancing search tree.
✓ B Tree is a specialized m-way tree that can be widely used for disk access.
✓ A B-Tree of order m can have at most m-1 keys and m children.
✓ One of the main reason of using B tree is its capability to store large number of
keys in a single node and large key values by keeping the height of the tree
relatively small.

Properties of B-Tree
❖ A B-Tree is defined by the term degree or Order ‘M’. The value of M depends
upon disk block size.
❖ Every node in a B-Tree except the root node and the leaf node contain at least
M/2 children.
❖ The root node must have at least 2 child nodes.
❖ All nodes (including root) may contain at most M – 1 keys.
❖ Every node in a B-Tree contains at most M children.
❖ All leaves are at the same level.
❖ All keys of a node are sorted in increasing order. The child between two keys
k1 and k2 contains all keys in the range from k1 and k2.
A B tree of order 4 is shown below

While performing some operations on B Tree, any property of B Tree may violate
such as number of minimum children a node can have. To maintain the properties
of B Tree, the tree may split or join.
Operations
Searching
Insertion
Deletion
Searching
Searching in a B Tree is similar to searching in a binary search tree.
For example, if we search for the item 49 in the following B Tree, the process is as follows:
Compare item 49 with the root node 78. Since 49 < 78, move to its left sub-tree.
Since 40 < 49 < 56, traverse the right sub-tree of 40.
Since 49 > 45, move to the right and compare again.
Match found, return.
Searching in a B tree depends upon the height of the tree. The search algorithm
takes O(log n) time to search any element in a B tree.
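A minimal sketch of this search procedure in Python is given below; the node layout (a sorted list of keys plus a list of child pointers) and the sample tree are assumptions made for illustration, not part of the notes above.

# Minimal B-Tree search sketch (illustrative node layout, not a full implementation).
class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted list of keys in this node
        self.children = children or []  # child pointers; empty for a leaf

def btree_search(node, key):
    """Return True if key exists in the subtree rooted at node."""
    i = 0
    # Find the first key greater than or equal to the search key.
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True            # match found in this node
    if not node.children:      # reached a leaf without a match
        return False
    return btree_search(node.children[i], key)

# Usage: a small two-level tree.
leaf1 = BTreeNode([10, 20])
leaf2 = BTreeNode([45, 49, 56])
leaf3 = BTreeNode([80, 90])
root = BTreeNode([40, 78], [leaf1, leaf2, leaf3])
print(btree_search(root, 49))   # True

The while loop inside each node corresponds to the key comparisons described in the worked example above.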
Insertion
Insertions are done at the leaf node level.
The following algorithm needs to be followed in order to insert an item into a B-Tree.
✓ Traverse the B-Tree in order to find the appropriate leaf node at which the new key can be inserted.
✓ If the leaf node contains fewer than m-1 keys, insert the element in increasing order.
✓ Else, if the leaf node already contains m-1 keys, follow these steps:
✓ Insert the new element in the increasing order of elements.
✓ Split the node into two nodes at the median (see the sketch after this list).
✓ Push the median element up to its parent node.
✓ If the parent node also contains m-1 keys, split it too by following the same steps.
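A minimal sketch of the split-at-median step, assuming a node is represented simply as a sorted Python list of keys; child pointers and the parent update are omitted for brevity.

# Split an over-full B-Tree node at its median (keys only, children omitted).
def split_at_median(keys):
    """Given the sorted keys of an over-full node, return
    (median_key, left_keys, right_keys)."""
    mid = len(keys) // 2          # with an even count, the upper median is pushed up
    return keys[mid], keys[:mid], keys[mid + 1:]

# Usage: order m = 4, so a node may hold at most 3 keys.
overfull = [5, 12, 20, 27]            # 4 keys after an insertion
median, left, right = split_at_median(overfull)
print(median, left, right)            # 20 [5, 12] [27]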
Deletion
✓ Deletion from a B-tree is more complicated than insertion, because we can delete a key from any node, not just a leaf.
✓ When we delete a key from an internal node, we will have to rearrange the node’s children.
✓ Just as we had to ensure that a node didn’t get too big due to insertion, we must ensure that a node doesn’t get too small during deletion.
✓ A simple approach to deletion might have to back up if a node (other than the
root) along the path to where the key is to be deleted has the minimum
number of keys.
✓ Since most of the keys in a B-tree are in the leaves, deletion operations are
most often used to delete keys from leaves.
✓ The recursive delete procedure then acts in one downward pass through the
tree, without having to back up.
✓ When deleting a key in an internal node, however, the procedure makes a
downward pass through the tree but may have to return to the node from
which the key was deleted to replace the key with its predecessor or
successor.
✓ The number of children of each node must remain at least ceil(m/2).
After deleting 6, the tree looks like:
After deleting 13, the tree looks like:
After deleting 7, two leaf nodes are combined to form a new node:
For a B-Tree of order M:
✓ Each internal node has up to M-1 keys to search.
✓ Each internal node has between ceil(M/2) and M children.
✓ The depth of a B-Tree storing N items is O(log N), with the logarithm taken to the base ceil(M/2).
Run time:
✓ O(log M) for the binary search that decides which branch to take at each node; but M is small compared to N.
✓ The total time to find an item is O(depth × log M) = O(log N).
The drawback of B-tree is that it stores the data pointer (a pointer to the
disk file block containing the key value), corresponding to a particular
key value, along with that key value in the node of a B-tree.
B+ tree
✓ B+ tree eliminates the above drawback by storing data
pointers only at the leaf nodes of the tree.
✓ The structure of leaf nodes of a B+ tree is quite different
from the structure of internal nodes of the B tree.
✓ The leaf nodes are linked to provide ordered access to the
records.
✓ The leaf nodes form the first level of index, with the
internal nodes forming the other levels of a multilevel
index.
✓ Some of the key values of the leaf nodes also appear in the
internal nodes, to simply act as a medium to control the
searching of a record.
✓ A B+ tree has two orders, ‘a’ and ‘b’: one for the internal nodes and the other for the external (or leaf) nodes.
Internal Node structure
External Node (Leaf Node)
Using the Pnext pointer, it is possible to traverse all the leaf nodes just like a linked list, thereby achieving ordered access to the records stored on the disk.
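A minimal sketch of such ordered (range) access over the linked leaves, assuming each leaf holds its sorted keys and a Pnext-style next pointer; the structures are illustrative, not a complete B+ tree.

# Range scan over the linked leaf level of a B+ tree (illustrative sketch).
class LeafNode:
    def __init__(self, keys):
        self.keys = keys      # sorted keys (stand-ins for key/record-pointer pairs)
        self.next = None      # Pnext: pointer to the next leaf in key order

def range_scan(first_leaf, low, high):
    """Yield all keys in [low, high] by walking the leaf chain."""
    leaf = first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return
            if k >= low:
                yield k
        leaf = leaf.next

# Usage: three chained leaves.
a, b, c = LeafNode([5, 10]), LeafNode([15, 20]), LeafNode([25, 30])
a.next, b.next = b, c
print(list(range_scan(a, 10, 25)))   # [10, 15, 20, 25]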
A B+ tree with ‘l’ levels can store more entries in its internal nodes compared to a B-tree having the same ‘l’ levels. This results in a significant improvement in the search time for any given key.
Operations on B+ Tree
✓ Searching.
✓ Insertion.
✓ Deletion.
Searching
✓ Searching works just like in a binary search tree.
✓ It starts at the root and works down to the leaf level.
✓ It compares the search value with the current “separation value” and goes left or right.
✓ There is no structural change in a B+ tree during a search.
✓ The key value is simply compared with the data in the tree, and the result is returned.
Insertion
✓ A search is first performed, using the value to be added.
✓ After the search is completed, the location for the new value is known.
✓ If the tree is empty, add to the root.
✓ Once the root is full, split the data into 2 leaves, using the root to hold
keys and pointers.
✓ If adding an element will overload a leaf, take the median and split it.
Example
Insert 28 into the order-5 tree shown below.
After inserting 28, the tree looks as below.
Insert 70 into the order-5 tree shown below.
✓ While inserting the value 70 into leaf node 3, the leaf exceeds the maximum number of keys (m-1).
✓ We need to split the leaf node by sending the median value up to the root.
✓ After updating, the tree looks as follows.
Deletion
✓ Deletion, like insertion, begins with a search.
✓ When the item to be deleted is located and removed, the tree must be
checked to make sure no rules are violated.
✓ The rule to focus on is to ensure that each node has at least ceil (n/2)
pointers.
Example
Delete 60 from the below tree
✓ After deleting 60 from the tree, the rules specified for a B+ tree are violated.
✓ To make it a valid B+ tree again, it should be rearranged as follows.
The total time complexity of the B+ tree search operation is O(t log_t n) (logarithm to the base t), where t is the order of the tree and O(t) is the cost of the linear search within each node.
The time complexity of the insertion algorithm is also O(t log_t n).
Advantages of B+ Trees:
1. Records can be fetched in equal number of disk accesses.
2. The height of the tree remains balanced and is smaller compared to a B-tree.
3. We can access the data stored in a B+ tree sequentially as well as
directly.
4. Keys are used for indexing.
5. Faster search queries as the data is stored only on the leaf nodes.
6. Deletion is never a complex process, since an element is always deleted from the leaf nodes in a B+ tree, whereas in a B-tree the deletion of internal-node keys is complicated and time consuming.
Disadvantages of B+ Trees:
 Any search ends at a leaf node only.
 The time complexity of every search is O(h), where h is the height of the tree.
 There is extra insertion and deletion overhead, as well as space overhead.
LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING
(AUTONOMOUS)
Accredited by NAAC & NBA (Under Tier - I) ISO 9001:2015 Certified Institution
Approved by AICTE, New Delhi. and Affiliated to JNTUK, Kakinada
L.B. REDDY NAGAR, MYLAVARAM, KRISHNA DIST., A.P.-521 230.
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
17CI09 Data Base Management Systems
Program & Semester: B.Tech & III SEM
CSE
Academic Year: 2021 - 22
UNIT V: Interfacing And Interacting With NoSQL
Introduction to NoSQL
A NoSQL database, originally referring to “non-SQL” or “non-relational”, is a database that provides a mechanism for the storage and retrieval of data.
NoSQL databases are used in real-time web applications and big data, and their use is increasing over time. NoSQL systems are also sometimes called “Not only SQL” to emphasize the fact that they may support SQL-like query languages.
The data structures used by NoSQL databases are different from those used by default in relational databases, which makes some operations faster in NoSQL.
In a relational database you need to create the table, define the schema, set the data types of the fields, etc., before you can actually insert the data. In NoSQL you don’t have to worry about that: you can insert and update data on the go.
There are certain situations where you would prefer a relational database over NoSQL; however, when you are dealing with huge amounts of data, a NoSQL database is your best choice.
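As a minimal illustration of this schema flexibility, plain Python dictionaries can stand in for documents in a schema-less collection (no particular NoSQL product or API is assumed):

# Two "documents" in the same collection need not share a schema.
users = []   # stand-in for a schema-less collection

users.append({"name": "Alice", "email": "alice@example.com"})
users.append({"name": "Bob", "age": 31, "tags": ["admin", "beta"]})  # extra fields, no ALTER TABLE needed

# Fields can be added to an existing document on the go.
users[0]["last_login"] = "2022-01-15"

for u in users:
    print(u)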
Difference between SQL(RDBMS) and NoSQL.
Type
SQL databases are primarily called Relational Databases (RDBMS).
NoSQL databases are primarily called non-relational or distributed databases.
Language
SQL databases define and manipulate data using the Structured Query Language (SQL). SQL is one of the most versatile and widely used options available, which makes it a safe choice, especially for complex queries.
SQL requires you to use predefined schemas to determine the structure of your data before you work with it. Also, all of your data must follow the same structure.
A NoSQL database has a dynamic schema for unstructured data. Data can be stored in many ways: it can be document-oriented, column-oriented, graph-based, or organized as a key-value store. This flexibility means that documents can be created without having a defined structure first.
Scalability
In almost all situations SQL databases are vertically scalable. This means that
you can increase the load on a single server by increasing things like RAM, CPU
or SSD.
NoSQL databases are horizontally scalable. This means that you handle more traffic by sharding, or by adding more servers to your NoSQL database.
Structure
SQL databases are table-based.
NoSQL databases are either key-value pairs, document-based, graph databases
or wide-column stores.
Property followed
SQL databases follow ACID properties (Atomicity, Consistency, Isolation and
Durability).
NoSQL databases follow Brewer's CAP theorem (Consistency, Availability and Partition tolerance).
Support
Great support is available for all SQL databases from their vendors. There are also many independent consultants who can help you with an SQL database for very large scale deployments.
NoSQL databases still have to rely on community support, and only limited outside experts are available for setting up and deploying large-scale NoSQL deployments.
Examples
SQL databases: PostgreSQL, MySQL, Oracle and Microsoft SQL Server.
NoSQL databases: Redis, RavenDB, Cassandra, MongoDB, BigTable, HBase, Neo4j and CouchDB.
The CAP Theorem
The CAP theorem is used to make system designers aware of the trade-offs involved while designing networked shared-data systems.
It is very important to understand the CAP theorem, as it forms the basis for choosing a NoSQL database according to the requirements.
CAP theorem states that in networked shared-data systems or distributed systems,
we can only achieve at most two out of three guarantees for a
database: Consistency, Availability and Partition Tolerance.
Consistency
Consistency means that the nodes will have the same copies of a replicated data item visible to the various transactions. It is a guarantee that every node in a distributed cluster returns the same, most recent, successful write.
Availability:
Availability means that each read or write request for a data item will either be
processed successfully or will receive a message that the operation cannot be
completed.
Partition tolerance
Partition tolerance means that the system can continue operating if the network
connecting the nodes has a fault that results in two or more partitions, where the
nodes in each partition can only communicate among each other.
The CAP theorem categorizes systems into three categories:
CP (Consistent and Partition Tolerant) database:
A CP database delivers consistency and partition tolerance at the expense of
availability. When a partition occurs between any two nodes, the system has to
shut down the non-consistent node (i.e., make it unavailable) until the partition is
resolved.
AP (Available and Partition Tolerant) database
An AP database delivers availability and partition tolerance at the expense of
consistency. When a partition occurs, all nodes remain available but those at the
wrong end of a partition might return an older version of data than others. When
the partition is resolved, AP databases typically resynchronize the nodes to
repair all inconsistencies in the system.
CA (Consistent and Available) database
A CA database delivers consistency and availability in the absence of any network partition. Often, single-node DB servers are categorized as CA systems.
Single node DB servers do not need to deal with partition tolerance and are thus
considered CA systems.
In any networked shared-data systems or distributed systems partition tolerance
is a must. Network partitions and dropped messages are a fact of life and must
be handled appropriately.
There are four main types of NoSQL databases:
✓ Document databases
✓ Key-value stores
✓ Column-oriented databases
✓ Graph databases
Key-value stores
Data is stored in key/value pairs. Key-value stores are designed to handle large amounts of data and heavy load.
Key-value pair storage databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
Redis, Dynamo, and Riak are some examples of key-value store databases.
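A minimal sketch of the key-value model using an in-memory Python dictionary; real stores such as Redis expose similar get/put-style operations over the network, but the helper functions here are purely illustrative.

import json

# An in-memory stand-in for a key-value store: unique keys, opaque values.
store = {}

def put(key, value):
    store[key] = value          # value may be a string, JSON, a BLOB, etc.

def get(key):
    return store.get(key)       # None if the key is absent

put("session:42", json.dumps({"user": "alice", "cart": [101, 205]}))
put("page:home:hits", "1024")

print(json.loads(get("session:42"))["user"])   # alice
print(get("page:home:hits"))                   # 1024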
Document Databases
A document database stores data in JSON, BSON, or XML documents (not Word documents or Google Docs, of course). In a document database, documents can be nested, and particular elements can be indexed for faster querying.
Amazon SimpleDB, CouchDB, and MongoDB are examples of document databases.
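A minimal sketch of a nested document, written as a Python dictionary with the same shape as the JSON a document store would hold; the field names are made up for illustration.

import json

# A nested "order" document; the customer details and items are embedded, not joined in.
order = {
    "_id": "order-1001",
    "customer": {"name": "Alice", "city": "Mylavaram"},
    "items": [
        {"sku": "A17", "qty": 2, "price": 250.0},
        {"sku": "B03", "qty": 1, "price": 1200.0},
    ],
    "paid": True,
}

print(json.dumps(order, indent=2))   # what the stored JSON document would look like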
Graph-Based
A graph-type database stores entities as well as the relations among those entities. An entity is stored as a node, with the relationships as edges. An edge gives a relationship between nodes, and every node and edge has a unique identifier.
A graph database is a database that is based on graph theory. It consists of a set of
objects, which can be a node or an edge.
✓ Nodes represent entities or instances such as people, businesses, accounts, or
any other item to be tracked.
✓ Edges, also termed graphs or relationships, are the lines that connect nodes to
other nodes; representing the relationship between them. Edges are the key
concept in graph databases.
✓ Properties are information associated with nodes.
Graph-based databases are mostly used for social networks, logistics, and spatial data.
Neo4j, Infinite Graph, and OrientDB are examples of graph databases.
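A minimal sketch of the node/edge/property model in plain Python structures; the identifiers and property names are illustrative and not tied to any particular graph database product.

# Nodes and edges with unique ids and attached properties.
nodes = {
    "n1": {"label": "Person", "name": "Alice"},
    "n2": {"label": "Person", "name": "Bob"},
    "n3": {"label": "Business", "name": "Campus Cafe"},
}
edges = [
    {"id": "e1", "from": "n1", "to": "n2", "type": "FRIEND_OF"},
    {"id": "e2", "from": "n1", "to": "n3", "type": "VISITED", "times": 3},
]

# "Traverse" the graph: what is Alice (n1) directly connected to?
for e in edges:
    if e["from"] == "n1":
        print(e["type"], "->", nodes[e["to"]]["name"])
# FRIEND_OF -> Bob
# VISITED -> Campus Cafe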
Column-based
Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately, and the values of a single column are stored contiguously.
HBase, Cassandra, and Hypertable are NoSQL examples of column-based databases.
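A minimal sketch contrasting a row-oriented layout with a column-oriented one, using plain Python lists to show how the values of a single column end up stored contiguously (purely illustrative).

# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "name": "Alice",  "marks": 91},
    {"id": 2, "name": "Bob",    "marks": 78},
    {"id": 3, "name": "Chitra", "marks": 85},
]

# Column-oriented layout: all values of one column are stored contiguously.
columns = {
    "id":    [1, 2, 3],
    "name":  ["Alice", "Bob", "Chitra"],
    "marks": [91, 78, 85],
}

# An aggregate over one column only touches that column's contiguous values.
print(sum(columns["marks"]) / len(columns["marks"]))   # 84.67 approximately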