DBMS
LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING
(AUTONOMOUS)
Accredited by NAAC & NBA (Under Tier - I) ISO 9001:2015 Certified Institution
Approved by AICTE, New Delhi. and Affiliated to JNTUK, Kakinada
L.B. REDDY NAGAR, MYLAVARAM, KRISHNA DIST., A.P.-521 230.
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
UNIT I
Module 1: INTRODUCTION
An Overview of Database Management Systems
A database-management system (DBMS) is a collection of interrelated
data and a set of programs to access those data. The collection of data
is usually referred to as the database.
(or)
The DBMS is a general-purpose software system that facilitates the
processes of defining, constructing, manipulating, and sharing databases
among various users and applications.
Advantages of a DBMS:
•Data Independence
•Efficient data access
•Data Integrity and Security.
•Data Administration.
•Concurrent access and Crash recovery.
•Reduced application development time.
Data Models
A database model shows the logical structure of a database, including
the relationships and constraints that determine how data can be
stored and accessed. Individual database models are designed based
on the rules and concepts of whichever broader data model the
designers adopt. Most data models can be represented by an
accompanying database diagram.
We have the following data models:
The Entity-Relationship Model
Relational Model
Object oriented data Model
Object Relational Model
Network data Model
Hierarchical data model
Object-relational model
This hybrid database model combines the simplicity of the relational
model with some of the advanced functionality of the object-oriented
database model. In essence, it allows designers to incorporate objects into
the familiar table structure.
Three Schema Architecture
A database system is a collection of interrelated files and a set of programs
that allow users to access and modify these files. It provides users with an
abstract view of the data. That is, the system hides certain details of how
the data are stored and maintained.
Developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system.
The data in the DBMS is described at three levels of abstraction
❖ Physical Level (Internal Schema)
❖ Logical Level (Conceptual Schema)
❖ View Level (External Schema)
Physical Level
The lowest level of abstraction describes how the data are stored. The
physical level has an internal schema which describes the physical storage
structure of the database.
Logical Level
Logical level describes what data are stored in the database, and what
relationships exist among those data.
Logical level has a conceptual schema, which describes the structure of
the whole database for a community of users.
The conceptual schema hides the details of physical storage structures
and concentrates on describing entities, data types, relationships, user
operations, and constraints.
View level
This is the highest level in data abstraction.
This level includes number of external schemas or user views. Each
external schema describes the part of the database that a particular user
group is interested in and hides the rest of the database from that user
group.
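As a rough SQL sketch of how these levels can appear in practice (the Student table and view below are hypothetical, introduced only for illustration), the conceptual schema can be thought of as a base table, while an external schema can be exposed as a view that shows one user group only part of the data; the physical level (files, indexes) is managed by the DBMS itself.
-- Conceptual schema: the whole Student entity with its attributes and constraints
CREATE TABLE Student (
    sid   CHAR(10) PRIMARY KEY,
    name  VARCHAR(50),
    login VARCHAR(30),
    age   INTEGER,
    gpa   REAL
);
-- External schema (view level): one user group sees only names and GPAs
CREATE VIEW Student_Gpa_View AS
SELECT name, gpa
FROM Student;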
Database Schema and Instances
The description of a database is called the database schema, which is
specified during database design and is not expected to change
frequently.
A schema diagram displays only some aspects of a schema, such as the
names of record types and data items, and some types of constraints.
The data in the database at a particular moment in time is called a
database state or Instance.
Database schema defines the variable declarations in tables that belong
to a particular database, the values of these variables at a particular
moment of time is called data instance.
The distinction between database schema and database state is very
important
The DBMS stores the descriptions of the schema constructs and
constraints—also called the meta-data
Data Independence
Data Independence can be defined as the capacity to change the
schema at one level of a database system without having to change the
schema at the next higher level.
Metadata itself follows a layered architecture, so that when we change
data at one layer, it does not affect the data at another level. This data is
independent but mapped to each other.
There are two types of Data Independence
❖ Logical Data Independence
❖ Physical Data Independence
Logical data independence is the capacity to change the conceptual
schema without having to change external schemas or application
programs.
Physical data independence is the capacity to change the internal
schema without having to change the conceptual schema. Hence, the
external schemas need not be changed as well.
Database Languages
Database languages are used to read, update, and store data in a
database. Several such languages can be used for this purpose; one of
them is SQL (Structured Query Language).
A database system provides a data definition language (DDL) to specify the
database schema and a data manipulation language (DML) to express database
queries and updates.
There are three categories of DBMS Languages. Those are
1) Data Definition Language (DDL)
2) Data Manipulation Language (DML)
3) Data Control Language (DCL)
Data Definition Language (DDL)
DDL or Data Definition Language actually consists of the SQL commands
that can be used to define the database schema. It simply deals with
descriptions of the database schema and is used to create and modify the
structure of database objects in the database.
CREATE – is used to create the database or its objects (like table, index,
function, views, store procedure and triggers).
DROP – is used to delete objects from the database.
ALTER-is used to alter the structure of the database.
COMMENT –is used to add comments to the data dictionary.
RENAME –is used to rename an object existing in the database
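A small sketch of these DDL commands (the Employees and Temp_Dept tables and their columns are hypothetical, used only to illustrate the statements above):
CREATE TABLE Employees (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(50),
    salary   DECIMAL(10, 2)
);
ALTER TABLE Employees ADD dept_id INT;   -- ALTER: change the table structure
CREATE TABLE Temp_Dept (dept_id INT);
DROP TABLE Temp_Dept;                    -- DROP: delete an object from the database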
Data Manipulation Language (DML)
The SQL commands that deals with the manipulation of data present in the
database belong to DML or Data Manipulation Language and this includes
most of the SQL statements.
Examples of DML:
INSERT – is used to insert data into a table.
UPDATE – is used to update existing data within a table.
DELETE– is used to delete records from a database table.
TRUNCATE – is used to remove all records from a table; the space allocated for
the records is also released.
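A minimal sketch of these DML commands, assuming the hypothetical Employees table created in the DDL example above:
INSERT INTO Employees (emp_id, emp_name, salary) VALUES (101, 'Ravi', 45000.00);
UPDATE Employees SET salary = 50000.00 WHERE emp_id = 101;
DELETE FROM Employees WHERE emp_id = 101;
TRUNCATE TABLE Employees;   -- removes all remaining rows and releases the allocated space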
Storage manager
A storage manager is a program module that provides the interface between
the low-level data stored in the database and the application programs and
queries submitted to the system.
Thus, the storage manager is responsible for storing, retrieving, and
updating data in the database. The storage manager components include the
authorization and integrity manager, the transaction manager, the file manager,
and the buffer manager.
UNIT I
Module II: Data Modelling using the Entity Relationship Model
The database design process can be divided into six (6) steps.
Requirements Analysis
The very first step in designing a database application is to understand:
What type of data is to be stored in the database.
What applications must be built on top of it.
What types of operations will be performed on the database.
Conceptual Database Design
In this we have to develop a high level description of the data which is gathered
from the requirement analysis step. For that purpose, we have to use ER Model
concept.
Entity Types
Ternary Relationship
An association among three entities is called a ternary relationship.
Generalization, Specialization and Aggregation in ER model are used for data
abstraction in which abstraction mechanism is used to hide details of a set of
objects.
Generalization
Generalization is the process of extracting common properties from a set of
entities and create a generalized entity from it. It is a bottom-up approach in
which two or more entities can be generalized to a higher level entity if they have
some attributes in common
Specialization
In specialization, an entity is divided into sub-entities based on their
characteristics. It is a top-down approach where higher level entity is specialized
into two or more lower level entities.
Aggregation
An ER diagram is not capable of representing a relationship between an entity and
a relationship, which may be required in some scenarios. In those cases, a
relationship with its corresponding entities is aggregated into a higher level entity.
For Example, Employee working for a project may require some machinery. So,
REQUIRE relationship is needed between relationship WORKS_FOR and entity
MACHINERY. Using aggregation, WORKS_FOR relationship with its entities
EMPLOYEE and PROJECT is aggregated into single entity and relationship
REQUIRE is created between aggregated entity and MACHINERY.
Relationship Constraints
The limitations imposed on a relationship are called relationship constraints.
There are two types of relationship constraints:
1) Cardinality Ratio
2) Participation Constraint
Cardinality Ratio
Cardinality defines the number of entities in one entity set, which can be
associated with the number of entities of other set via relationship set.
One-to-one − One entity from entity set A can be associated with at most one
entity of entity set B and vice versa.
One-to-many − One entity from entity set A can be associated with more than
one entity of entity set B; however, an entity from entity set B can be associated
with at most one entity of entity set A.
Many-to-one − More than one entity from entity set A can be associated with at
most one entity of entity set B; however, an entity from entity set B can be
associated with more than one entity from entity set A.
Many-to-many − One entity from A can be associated with more than one entity
from B and vice versa.
Participation Constraints
Participation constraints define the least number of relationship instances in
which an entity must compulsorily participate. There are two types of
participation constraint
Total Participation
❑ It specifies that each entity in the entity set must compulsorily participate in
at least one relationship instance in that relationship set.
❑ That is why, it is also called as mandatory participation.
❑ Total participation is represented using a double line between the entity set
and relationship set.
Example of Total participation
Here,
Double line between the entity set “Student” and relationship set “Enrolled
in” signifies total participation.
It specifies that each student must be enrolled in at least one course.
Partial Participation
It specifies that each entity in the entity set may or may not participate in the
relationship instance in that relationship set.
That is why, it is also called as optional participation.
Partial participation is represented using a single line between the entity set
and relationship set.
Example of partial participation
Here,
Single line between the entity set “Course” and relationship set “Enrolled in”
signifies partial participation.
It specifies that there might exist some courses for which no enrolments are
made.
Relationship between Cardinality and Participation Constraints-
Minimum cardinality tells whether the participation is partial or total.
If minimum cardinality = 0, then it signifies partial participation.
If minimum cardinality = 1, then it signifies total participation.
Example tables for mapping an ER diagram to relations: Employee table,
Department table (Dept_id, Dept_name), and Works_in table.
Rule-05: For Binary Relationships With Cardinality Ratios
The following four cases are possible.
Case-01: For Binary Relationship With Cardinality Ratio m:n
Here, three separate tables will be drawn: one for entity set A, one for entity set B,
and one for the relationship set R.
Case-02: For Binary Relationship With Cardinality Ratio 1:n
Here, a combined table will be drawn for the entity set B and relationship set R.
Case-03: For Binary Relationship With Cardinality Ratio m:1
Here, combined table will be drawn for the entity set A and relationship set R.
Case-04: For Binary Relationship With Cardinality Ratio 1:1
Here, two tables will be required. Either combine ‘R’ with ‘A’ or ‘B’
Way-01:
1. AR ( a1 , a2 , b1 )
2. B ( b1 , b2 )
Way-02:
1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )
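A rough SQL sketch of Way-01 above for the 1:1 case (the attribute names a1, a2, b1, b2 come from the abstract example; this is only one possible mapping, not the only correct one):
-- Way-01: combine the relationship set R with entity set A
CREATE TABLE B (
    b1 INT PRIMARY KEY,
    b2 VARCHAR(50)
);
CREATE TABLE AR (
    a1 INT PRIMARY KEY,
    a2 VARCHAR(50),
    b1 INT UNIQUE,                      -- 1:1, so each B value appears at most once
    FOREIGN KEY (b1) REFERENCES B(b1)
);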
While determining the minimum number of tables required for binary
relationships with given cardinality ratios, following thumb rules must be kept in
mind-
For binary relationship with cardinality ration m : n , separate and individual
tables will be drawn for each entity set and relationship.
For binary relationship with cardinality ratio either m : 1 or 1 : n , always
remember “many side will consume the relationship” i.e. a combined table
will be drawn for many side entity set and relationship set.
For binary relationship with cardinality ratio 1 : 1 , two tables will be
required. You can combine the relationship set with any one of the entity sets.
If there is a key constraint from both the sides of an entity set with total
participation, then that binary relationship is represented using only single table.
Category or Union
Category represents a single super class or sub class relationship with more
than one super class.
It can be a total or partial participation.
For example, in car booking, a car owner can be a person, a bank (which holds
possession of a car), or a company. The category (subclass) Owner is a subset of
the union of the three superclasses Company, Bank, and Person.
A Category member must exist in at least one of its super classes.
Constraints on Specialization & Generalization
There are two types of constraints on specialization and generalization:
disjointness and completeness.
Disjointness
Disjoint: An entity can be a member of at most one of the subclass entities.
Non-disjoint (overlapping): An entity can be a member of more than one of the
subclass entities.
Completeness Constraint:
Total: It specifies that every entity in the super class must be a member
of some sub class. It is indicated by double line.
Partial: It specifies that entity need not belong to any of the subclass. It
is indicated with single line.
UNIT II
Module 1: Relational Data Model and
Language
Relational Data Model Concepts
Codd proposed the relational data model in 1970. Before 1970, most database
systems followed one of two data models. Those are
✓ Hierarchical Data Model
✓ Network Data Model
The relational model represents the database as a collection of relations.
A relation is thought of as a table of values, each row in the table represents a
collection of related data values.
A row represents a fact that typically corresponds to a real-world entity or
relationship.
The table name and column names are used to help to interpret the meaning
of the values in each row.
Example: Student relationship.
Relational data model is the primary data model, which is used widely around
the world for data storage and processing. This model is simple and it has all the
properties and capabilities required to process data with storage efficiency.
A relation (or relation state) r of the relation schema R(A1, A2, ..., An), denoted
by r(R), is a set of n-tuples r = {t1, t2, ..., tm}.
Each n-tuple t is an ordered list of n values t = <v1, v2, ..., vn>, where each
value vi, 1 ≤ i ≤ n, is an element of dom(Ai).
Definition of a relation can be restated more formally using set theory concepts. A
relation (or relation state) r(R) is a mathematical relation of degree n on the
domains dom(A1), dom(A2), ..., dom(An), which is a subset of the Cartesian
product (denoted by ×) of the domains that define R:
r(R) ⊆ (dom(A1) × dom(A2) × ... × dom(An))
The degree (or arity) of a relation is the number of attributes in its
relation schema.
Ex: Student (sid: string, name:string, login:string, age:integer, gpa:real)
Relation Instance
An instance of a relation is a set of tuples, also called records, in which each
tuple has the same number of fields as the relation schema.
Domain constraints can be defined as the definition of a valid set of values for an
attribute. The data type of a domain includes string, character, integer, time, date,
currency, etc.
Select Operator (σ)
The select operator selects tuples from a relation that satisfy a given predicate. If
you understand a little bit of SQL, you can think of it as the WHERE clause, which
is used for the same purpose.
Syntax
σ Condition/Predicate (Relation/Table name)
Customer_Id Customer_Name Customer_City
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
Example
σ Customer_City="Agra" (CUSTOMER)
Project Operator (∏)
The project operator selects certain columns (attributes) from a relation while
discarding the other columns.
Syntax
∏ column_list (Relation/Table name)
Example
∏ Customer_Id, Customer_Name (CUSTOMER)
Customer_Id Customer_Name
C10100 Steve
C10111 Raghu
C10115 Chaitanya
C10117 Ajeet
C10118 Carl
Union Operator (∪)
Union operator is denoted by ∪ symbol and it is used to select all the rows (tuples)
from two tables (relations).
Let us discuss the union operator a bit more. Say we have two relations R1 and R2,
both having the same columns, and we want to select all the tuples (rows) from
these relations; then we can apply the union operator on these relations.
Note: The rows (tuples) that are present in both the tables will appear only once in
the union set. In short, you can say that there are no duplicates present after the
union operation.
r ∪ s = { t | t ∈ r or t ∈ s}
Student_Name
Aditya
Carl
Paul
Lucy
Rick
Steve
As you can see, there are no duplicate names present in the output even
though we had a few common names in both the tables; the COURSE table
also contained a duplicate name itself.
Set Difference (-)
Set Difference is denoted by the – symbol. Say we have two relations R1 and
R2 and we want to select all those tuples (rows) that are present in relation R1
but not present in relation R2; this can be done using the set difference R1 – R2.
Syntax
table_name1 - table_name2
Example
∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)
A query to select those student names that are present in STUDENT table but not
present in COURSE table
Student_Name
Carl
Rick
Cartesian product (X)
Cartesian Product is denoted by X symbol.
Say we have two relations R1 and R2; the Cartesian product of these two
relations (R1 X R2) would combine each tuple of the first relation R1 with each
tuple of the second relation R2.
Syntax
R1 X R2
R1
A B
AA 100
BB 200
CC 300
R2
X Y
XX 99
YY 11
ZZ 101
R1 X R2
A B X Y
AA 100 XX 99
AA 100 YY 11
AA 100 ZZ 101
BB 200 XX 99
BB 200 YY 11
BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101
Rename (ρ)
Rename (ρ) operation can be used to rename a relation or an attribute of a
relation.
Syntax
ρ(new_relation_name, old_relation_name)
Example
Customer_Id Customer_Name Customer_City
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi
ρ(CUST_NAMES, ∏Customer_Name (CUSTOMER))
CUST_NAMES
Steve
Raghu
Chaitanya
Ajeet
Carl
Derived Operators
Intersection Operator (∩)
Intersection operator is denoted by ∩ symbol and it is used to select
common rows (tuples) from two tables (relations).
Say we have two relations R1 and R2, both having the same columns, and
we want to select all those tuples (rows) that are present in both the
relations; in that case we can apply the intersection operation on these
two relations: R1 ∩ R2.
Syntax
table_name1 ∩ table_name2
Example
∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)
Student_Name
Aditya
Steve
Paul
Lucy
JOIN
An SQL Join is used to combine data from two or more tables, based on a
common field between them.
Example
Orders table
Customers table
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched
records from the right table
RIGHT (OUTER) JOIN: Returns all records from the right table, and the
matched records from the left table
FULL (OUTER) JOIN: Returns all records when there is a match in either left or
right table.
Left Outer Join
Returns all records from the left table, and the matched records from the right
table
The result of a left outer join (or simply left join) for
tables Employee and Location always contains all records of the "left" table
(Employee), even if the join-condition does not find any matching record in the
"right" table (Location).
Full Outer Join or Full Join is to retain the nonmatching information by including
nonmatching rows in the results of a join, use a full outer join. It includes all rows
from both tables, regardless of whether or not the other table has a matching value.
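A rough SQL sketch of these joins, assuming hypothetical Employee(emp_id, emp_name, loc_id) and Location(loc_id, city) tables (the column names are illustrative and not taken from the notes; FULL OUTER JOIN support varies by DBMS):
SELECT e.emp_name, l.city
FROM Employee e
INNER JOIN Location l ON e.loc_id = l.loc_id;        -- only matching rows from both tables
SELECT e.emp_name, l.city
FROM Employee e
LEFT OUTER JOIN Location l ON e.loc_id = l.loc_id;   -- all employees, matched locations or NULL
SELECT e.emp_name, l.city
FROM Employee e
FULL OUTER JOIN Location l ON e.loc_id = l.loc_id;   -- all rows from both tables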
UNIT II
Module II: Introduction to SQL:
✓ SQL means Structured Query Language.
✓ SQL was originally called SEQUEL (Structured English QUEry Language) and
was designed and implemented at IBM Research.
✓ SQL is used to create, store, manipulate, and retrieve data from a
relational database.
✓ SQL is a comprehensive database language: it has statements for data
definitions, queries, and updates.
✓ The first commercial implementation of SQL was released by Oracle Corporation.
Characteristics of SQL
✓ SQL is an ANSI and ISO standard computer language for creating
and manipulating databases.
✓ SQL allows the user to create, update, delete, and retrieve data from a
database.
✓ SQL is very simple and easy to learn.
✓ SQL works with database programs like DB2, Oracle, MS Access,
Sybase, My SQL Server etc.
Advantages of SQL:
➢ High Speed: SQL Queries can be used to retrieve large amounts of records
from a database quickly and efficiently.
➢ Well Defined Standards Exist: SQL databases use a long-established standard
that has been adopted by ANSI and ISO. Non-SQL databases do not adhere
to any clear standard.
➢ No Coding Required: Using standard SQL, it is easier to manage database
systems without having to write a substantial amount of code.
➢ Emergence of ORDBMS: Previously SQL databases were synonymous with
relational database.
➢ With the emergence of Object-Oriented DBMS, object storage capabilities
are extended to relational databases.
Disadvantages of SQL:
➢ Difficulty in Interfacing: Interfacing with an SQL database is more complex than
adding a few lines of code.
➢ More Features Implemented in a Proprietary Way: Although SQL databases
conform to ANSI and ISO standards, some databases add proprietary
extensions to standard SQL, which can result in vendor lock-in.
SQL Data types & Literals:
✓ SQL Data Type is an attribute that specifies the type of data of any object.
✓ Each column, variable and expression has a related data type in SQL. You can
use these data types while creating your tables. You can choose a data type for
a table column based on your requirement.
SQL Server offers six categories of data types for your use which are listed
below
Exact Numeric Data Types
Approximate Numeric Data Types
Date and Time Data Types
Character Strings Data Types
Unicode Character Strings Data Types
Binary Data Types
Exact Numeric Data Types
DATA TYPE FROM TO
bigint -9,223,372,036,854,775,808 9,223,372,036,854,775,807
int -2,147,483,648 2,147,483,647
smallint -32,768 32,767
tinyint 0 255
bit 0 1
decimal -10^38 +1 10^38 -1
numeric -10^38 +1 10^38 -1
Note − Here, datetime has 3.33 millisecond accuracy, whereas smalldatetime
has 1 minute accuracy.
Character Strings Data Types
SQL Literals
In SQL, a literal is the same as a constant. There are several types of literals
✓ String
✓ integer
✓ Decimal
✓ datetime literals.
String Literals
String literals are always surrounded by single quotes ('). These string literal
examples consist of strings enclosed in single quotes.
Example:
'TechOnTheNet.com'
'This is a literal'
'XYZ'
'123'
Integer Literals
Integer literals can be either positive or negative numbers, but do not
contain decimals. If you do not specify a sign, then a positive number is
assumed.
Example:
536
+536
-536
Decimal Literals
Decimal literals can be either positive numbers or negative numbers and contain
decimals. If you do not specify a sign, then a positive number is assumed.
Example:
24.7
+24.7
-24.7
Datetime Literals
Datetime literals are character representations of datetime values that are enclosed
in single quotes.
Examples:
'April 30, 2015'
'2015/04/30'
'2015/04/30 08:34:25'
Character Literals
Character literals are character representations of Character values that are
enclosed in single quotes.
Example:
'A'
'2'
'%'
Constant in SQL
As the name implies, a constant is a value used in a PL/SQL block that remains
unchanged throughout the program. A constant is a user-defined literal value. You
can declare a constant and use it instead of the actual value.
Example
If you want to write a program which will increase the salary of the employees by
25%, you can declare a constant and use it throughout the program.
Next time when you want to increase the salary again you can change the value of
the constant which will be easier than changing the actual value throughout the
program.
Syntax to declare a constant
constant_name CONSTANT datatype := VALUE;
salary_increase CONSTANT number (3) := 10;
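A minimal PL/SQL sketch that uses such a constant (the employees table and salary column are hypothetical; the block only illustrates the declaration syntax above):
DECLARE
   salary_increase CONSTANT NUMBER(3) := 10;   -- percentage increase, declared once
BEGIN
   UPDATE employees
   SET salary = salary + (salary * salary_increase / 100);
   COMMIT;
END;
/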
SQL Commands:
❖ DDL (Data Definition Language)
❖ DML (Data Manipulation Language)
❖ DQL (Data Query Language)
❖ DCL (Data Control Language)
❖ TCL(Transactional control commands)
ALTER TABLE - ADD Column: To add a column in a table, use the following
syntax:
Syntax: ALTER TABLE table_name ADD column_name datatype;
Example: The following SQL adds an "Email" column to the "Customers" table:
ALTER TABLE Customers ADD Email varchar(255);
ALTER TABLE - DROP COLUMN:
To delete a column in a table, use the following syntax (notice that some database
systems don't allow deleting a column):
Syntax: ALTER TABLE table_name DROP COLUMN column_name;
Example: The following SQL deletes the "Email" column from the "Customers"
table:
ALTER TABLE Customers DROP COLUMN Email;
UPDATE TABLE
The SQL UPDATE query is used to modify the existing records in a table. You
can use the WHERE clause with the UPDATE query to update selected rows;
otherwise all the rows would be affected.
Syntax: The basic syntax of UPDATE query with WHERE clause is as follows:
Syntax:
UPDATE table_name
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];
You can combine N number of conditions using AND or OR operators.
DELETE TABLE
The SQL DELETE query is used to delete existing records from a table. You can
use the WHERE clause with the DELETE query to delete selected rows; otherwise
all the records would be deleted.
Syntax:
DELETE FROM table_name
WHERE [condition];
You can combine N number of conditions using AND or OR operators.
If you want to DELETE all the records from the CUSTOMERS table, you do not
need to use the WHERE clause and the DELETE query would be as follows:
DELETE FROM CUSTOMERS;
SELECT Statement
The SQL SELECT statement is used to fetch data from a database table.
Syntax:
SELECT column1, column2, columnN FROM table_name;
If you want to fetch all the fields available in the table, then you can use the
following syntax:
SELECT * FROM table_name;
Example:
SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;
If you want to fetch all the fields of CUSTOMERS table, then use the following
query:
SQL> SELECT * FROM CUSTOMERS;
DCL (Data Control Language)
Data control commands in SQL allow you to control access to data within the
database. These DCL commands are normally used to create objects related to user
access and also control the distribution of privileges among users. Some data
control commands are as follows:
✓ GRANT
✓ REVOKE
GRANT
It is used to grant various privileges on tables to user. These privileges can be any
combination of select, insert, update, delete, alter, or all.
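A small illustrative sketch of these privileges (the user name ravi and the Employees table are hypothetical):
GRANT SELECT, INSERT, UPDATE ON Employees TO ravi;
REVOKE INSERT, UPDATE ON Employees FROM ravi;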
REVOKE
It is used to take back the privileges previously granted to a user on database objects.
TCL (Transaction Control Commands)
COMMIT
When we use any DML command like INSERT, UPDATE or DELETE, the
changes made by these commands are not permanent; until the current session is
closed, the changes made by these commands can be rolled back.
To avoid that, we use the COMMIT command to mark the changes as permanent.
SAVE POINT
Save points are used to divide a very lengthy transaction into smaller ones. They
are used to identify a point in a transaction to which we can roll back later.
ROLLBACK
If we have used the UPDATE command to make some changes to the database
and realise that those changes were not required, then we can use the ROLLBACK
command to roll back those changes, if they were not committed using the
COMMIT command.
Example
id name
1 Abhi
2 Adam
4 Alex
INSERT INTO class VALUES(5, 'Rahul');
COMMIT;
UPDATE class SET name = 'Abhijit' WHERE id = '5';
SAVEPOINT A;
INSERT INTO class VALUES(6, 'Chris');
SAVEPOINT B;
INSERT INTO class VALUES(7, 'Bravo');
SAVEPOINT C;
SELECT * FROM class;
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
7 Bravo
ROLLBACK TO B;
SELECT * FROM class;
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
ROLLBACK TO A;
SELECT * FROM class;
1 Abhi
2 Adam
4 Alex
5 Abhijit
SQL OPERATORS
An operator is a reserved word, or a character used primarily in an SQL statement's
WHERE clause to perform operations, such as comparisons and arithmetic
operations. Operators are used to specify conditions in an SQL statement and to
serve as conjunctions for multiple conditions in a statement.
❖ Arithmetic operators
❖ Comparison operators
❖ Logical operators
❖ Operators used to negate conditions
SQL Arithmetic Operators
Assume variable a holds 10 and variable b holds 20.
Operator Description Example
+ Addition: adds values on either side of the operator. a + b = 30
- Subtraction: subtracts the right-hand operand from the left-hand operand. a - b = -10
* Multiplication: multiplies values on either side of the operator. a * b = 200
/ Division: divides the left-hand operand by the right-hand operand. b / a = 2
% Modulus: divides the left-hand operand by the right-hand operand and returns the remainder. b % a = 0
SQL Aggregate Functions
We often use aggregate functions with the GROUP BY and HAVING clauses of
the SELECT statement. The most used SQL aggregate functions are COUNT,
SUM, AVG, MIN, and MAX.
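A small sketch of an aggregate query, assuming a hypothetical EMPLOYEE table with DEPT and SALARY columns (the names are illustrative only):
SELECT DEPT, COUNT(*) AS emp_count, AVG(SALARY) AS avg_salary
FROM EMPLOYEE
GROUP BY DEPT
HAVING AVG(SALARY) > 3000;   -- keep only departments whose average salary exceeds 3000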
SQL Indexes
An index speeds up data retrieval, much like the index of a book. For example, if
you want to reference all pages in a book that discuss a certain topic, you first
refer to the index, which lists all the topics alphabetically, and you are then
referred to one or more specific page numbers.
An index helps to speed up SELECT queries and WHERE clauses, but it slows
down data input, with the UPDATE and the INSERT statements. Indexes can be
created or dropped with no effect on the data.
Creating an index involves the CREATE INDEX statement, which allows you to
name the index, to specify the table and which column or columns to index, and
to indicate whether the index is in an ascending or descending order.
Indexes can also be unique, like the UNIQUE constraint, in that the index
prevents duplicate entries in the column or combination of columns on which
there is an index.
Syntax
CREATE INDEX index_name ON table_name;
Single-Column Indexes
A single-column index is created based on only one table column. The basic
syntax is as follows.
CREATE INDEX index_name
ON table_name (column_name);
Unique Indexes
Unique indexes are used not only for performance, but also for data integrity. A
unique index does not allow any duplicate values to be inserted into the table. The
basic syntax is as follows.
CREATE UNIQUE INDEX index_name
on table_name (column_name);
Composite Indexes
A composite index is an index on two or more columns of a table. Its basic syntax
is as follows.
CREATE INDEX index_name
on table_name (column1, column2);
SQL Views
A view is a virtual table composed of the result set of an SQL query and stored
in the database with an associated name. A view can contain all rows of a table or
selected rows from a table. A view can be created from one or many tables,
depending on the SQL query written to create the view.
Database views are created using the CREATE VIEW statement. Views can be
created from a single table, multiple tables or another view.
To create a view, a user must have the appropriate system privilege according to
the specific implementation.
This view would be used to have customer name and age from the CUSTOMERS
table.
CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;
NESTED QUERIES
A Subquery or Inner query or a Nested query is a query within another SQL query
and embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition
to further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE
statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
ID NAME AGE ADDRESS SALARY
1 John 20 US 2000.00
2 Stephan 26 Dubai 1500.00
3 David 27 Bangkok 2000.00
4 Alina 29 UK 6500.00
5 Kathrin 34 Bangalore 8500.00
6 Harry 42 China 4500.00
7 Jackson 25 Mizoram 10000.00
Example
SELECT *
FROM EMPLOYEE
WHERE ID IN (SELECT ID
FROM EMPLOYEE
WHERE SALARY > 4500);
ID NAME AGE ADDRESS SALARY
4 Alina 29 UK 6500.00
5 Kathrin 34 Bangalore 8500.00
7 Jackson 25 Mizoram 10000.00
SQL subquery can also be used with the Insert statement. In the insert statement,
data returned from the subquery is used to insert into another table.
In the subquery, the selected data can be modified with any of the character, date
functions
INSERT INTO table_name (column1, column2, column3....)
SELECT column1, column2, column3.... FROM table_name [WHERE condition];
We can use the following syntax to copy the complete EMPLOYEE table into the
EMPLOYEE_BKP table.
INSERT INTO EMPLOYEE_BKP SELECT * FROM EMPLOYEE
WHERE ID IN (SELECT ID FROM EMPLOYEE);
Select * from EMPLOYEE_BKP ;
Subqueries with the UPDATE Statement
The subquery of SQL can be used in conjunction with the Update statement. When a
subquery is used with the Update statement, then either single or multiple columns in a
table can be updated.
Syntax:
UPDATE table_name
SET column_name = new_value
[WHERE OPERATOR [VALUE] (SELECT COLUMN_NAME FROM TABLE_NAME WHERE condition)];
Example
UPDATE EMPLOYEE
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM EMPLOYEE_BKP
WHERE AGE >= 29);
ID NAME AGE ADDRESS SALARY
1 John 20 US 2000.00
2 Stephan 26 Dubai 1500.00
3 David 27 Bangkok 2000.00
4 Alina 29 UK 1625.00
5 Kathrin 34 Bangalore 2125.00
6 Harry 42 China 1125.00
7 Jackson 25 Mizoram 10000.00
Subqueries with the DELETE Statement
The subquery of SQL can be used in conjunction with the DELETE statement, just
like with the other statements mentioned above.
Syntax:
DELETE FROM TABLE_NAME
[WHERE OPERATOR [VALUE] (SELECT COLUMN_NAME FROM TABLE_NAME WHERE condition)];
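A plausible example that is consistent with the result shown below (the query is a reconstruction, assuming the EMPLOYEE_BKP copy created in the earlier INSERT example, and is not taken verbatim from the notes):
DELETE FROM EMPLOYEE
WHERE AGE IN (SELECT AGE FROM EMPLOYEE_BKP WHERE AGE >= 29);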
ID NAME AGE ADDRESS SALARY
1 John 20 US 2000.00
2 Stephan 26 Dubai 1500.00
3 David 27 Bangkok 2000.00
7 Jackson 25 Mizoram 10000.00
SET Operations in SQL
SQL supports few Set operations which can be performed on the table data. These
are used to get meaningful results from data stored in the table, under different
special conditions. Some of the Set operations are
1. UNION
2. UNION ALL
3. INTERSECT
4. MINUS
UNION
UNION is used to combine the results of two or more SELECT statements.
However, it will eliminate duplicate rows from its resultset. In case of union,
number of columns and datatype must be same in both the tables, on which
UNION operation is being applied.
Syntax:
SELECT column_name FROM table1
UNION
SELECT column_name FROM table2;
The First table
ID NAME
1 Jack
2 Harry
3 Jackson
The Second table
ID NAME
3 Jackson
4 Stephan
5 David
Example
SELECT * FROM First
UNION
SELECT * FROM Second;
ID NAME
1 Jack
2 Harry
3 Jackson
4 Stephan
5 David
Union All
The Union All operation is similar to the Union operation, but it returns the result
set without removing duplicates and without sorting the data.
Syntax:
SELECT column_name FROM table1
UNION ALL
SELECT column_name FROM table2;
Example:
SELECT * FROM First
UNION ALL
SELECT * FROM Second;
ID NAME
1 Jack
2 Harry
3 Jackson
3 Jackson
4 Stephan
5 David
Intersect
It is used to combine two SELECT statements. The Intersect operation returns the
common rows from both the SELECT statements.
In the Intersect operation, the number of datatype and columns must be the same.
It has no duplicates and it arranges the data in ascending order by default.
Syntax
SELECT column_name FROM table1
INTERSECT
SELECT column_name FROM table2;
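Assuming the same First and Second tables used above, the query that produces the result below might look like this (a reconstruction, not taken verbatim from the notes):
SELECT * FROM First
INTERSECT
SELECT * FROM Second;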
ID NAME
3 Jackson
Minus
It combines the result of two SELECT statements. Minus operator is used to
display the rows which are present in the first query but absent in the second
query.
It has no duplicates and data arranged in ascending order by default.
Syntax:
SELECT column_name FROM table1
MINUS
SELECT column_name FROM table2;
Example
SELECT * FROM First
MINUS
SELECT * FROM Second;
ID NAME
1 Jack
2 Harry
CURSORS
Oracle creates a memory area, known as the context area, for processing an SQL
statement, which contains all the information needed for processing the statement;
for example, the number of rows processed, etc.
A cursor is a pointer to this context area. PL/SQL controls the context area
through a cursor. A cursor holds the rows (one or more) returned by a SQL
statement. The set of rows the cursor holds is referred to as the active set.
You can name a cursor so that it could be referred to in a program to fetch and
process the rows returned by the SQL statement, one at a time. There are two
types of cursors
❖ Implicit cursors
❖ Explicit cursors
Implicit Cursors
Implicit cursors are automatically created by Oracle whenever an SQL statement
(INSERT, UPDATE, DELETE, or a single-row SELECT) is executed, when there is
no explicit cursor for the statement. Programmers cannot control implicit cursors,
but they can use their attributes such as %FOUND, %NOTFOUND and %ROWCOUNT.
Explicit Cursors:
Explicit cursors are programmer-defined cursors for gaining more control over the
context area. An explicit cursor should be defined in the declaration section of the
PL/SQL Block. It is created on a SELECT Statement which returns more than one
row.
Syntax:
CURSOR cursor_name IS select_statement;
Working with an explicit cursor includes the following steps
✓ Declaring the cursor for initializing the memory
✓ Opening the cursor for allocating the memory
✓ Fetching the cursor for retrieving the data
✓ Closing the cursor to release the allocated memory
Declaring the Cursor
Declaring the cursor defines the cursor with a name and the associated SELECT
statement.
CURSOR c_customers IS
SELECT id, name, address FROM customers;
Opening the Cursor
Opening the cursor allocates the memory for the cursor and makes it ready for
fetching the rows returned by the SQL statement into it. For example, we will
open the above defined cursor as follows −
OPEN c_customers;
Fetching the Cursor
Fetching the cursor involves accessing one row at a time. For example, we will
fetch rows from the above opened cursor as follows
FETCH c_customers INTO c_id, c_name, c_addr;
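Putting the four steps together, a minimal sketch of a complete explicit cursor block (it assumes a customers table with id, name and address columns, as used in the declaration above):
DECLARE
   c_id customers.id%TYPE;
   c_name customers.name%TYPE;
   c_addr customers.address%TYPE;
   CURSOR c_customers IS
      SELECT id, name, address FROM customers;      -- declare the cursor
BEGIN
   OPEN c_customers;                                -- open: allocate the memory
   LOOP
      FETCH c_customers INTO c_id, c_name, c_addr;  -- fetch one row at a time
      EXIT WHEN c_customers%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE(c_id || ' ' || c_name || ' ' || c_addr);
   END LOOP;
   CLOSE c_customers;                               -- close: release the memory
END;
/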
UNIT III
Normalization
The following determine the quality of a relation schema design:
✓ Making sure that the semantics of the attributes is clear in the schema
✓ Reducing the redundant information in tuples
✓ Reducing the NULL values in tuples
✓ Disallowing the possibility of generating spurious tuples.
Functional Dependency
A functional dependency, denoted by X → Y, between two sets of attributes X and Y
that are subsets of R specifies a constraint on the possible tuples that can form a
relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have
t1[X] = t2[X], they must also have t1[Y] = t2[Y].
Reflexivity rule
If α is a set of attributes and β ⊆ α, then α →β holds.
Augmentation rule
If α → β holds and γ is a set of attributes, then γα → γβ holds.
Transitivity rule
If α →β holds and β → γ holds, then α → γ holds.
Union rule
If α → β holds and α → γ holds, then α →βγ holds.
Decomposition rule
If α →βγ holds, then α → β holds and α →γ holds.
Pseudo transitivity rule
If α→β holds and γβ →δ holds, then αγ →δ holds.
Let us apply our rules to the example schema R = (A, B, C, G, H, I) and the
set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}.
For instance, from A → B and B → H the transitivity rule gives A → H, and from
CG → H and CG → I the union rule gives CG → HI.
Trivial functional dependency
If a functional dependency X -> Y holds true where Y is a subset of X, then this
dependency is called a trivial functional dependency.
For example:
Consider a table with two columns Student_Id and Student_Name.
{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency as
Student_Id is a subset of {Student_Id, Student_Name}.
Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial
dependencies too.
Non-trivial functional dependency
If a functional dependency X->Y holds true where Y is not a subset of X then this
dependency is called non -trivial Functional dependency.
For example:
An employee table with three attributes: emp_id, emp_name, emp_address. The
following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
Multivalued dependency:
MVD or multivalued dependency means that for a single value of attribute ‘a’
multiple values of attribute ‘b’ exist. Multivalued dependency occurs when there
are more than one independent multivalued attributes in a table.
For example:
Consider a bike manufacturing company which produces two colours (Black and
Red) of each model every year.
bike_model manuf_year color
M1001 2007 Black
M1001 2007 Red
M2012 2008 Black
M2012 2008 Red
M2222 2009 Black
M2222 2009 Red
Here columns manuf_year and color are independent of each other and
dependent on bike_model. In this case these two columns are said to be
multivalued dependent on bike_model. These dependencies can be represented
like this:
bike_model ->-> manuf_year
bike_model ->-> color
Transitive dependency:
A functional dependency is said to be transitive if it is indirectly formed by two
functional dependencies.
X -> Z is a transitive dependency if the following three functional dependencies
hold true:
X->Y
Y does not ->X
Y->Z
Example: Let’s take an example to understand it better:
Book Author Author_age
Game of Thrones George R. R. Martin 66
Harry Potter J. K. Rowling 49
Dying of the Light George R. R. Martin 66
{Book} -> {Author} (if we know the book, we know the author's name)
{Author} does not -> {Book}
{Author} -> {Author_age}
Update anomaly:
In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we
have to update the same in two rows or the data will become inconsistent. If
somehow, the correct address gets updated in one department but not in other then
as per the database, Rick would be having two different addresses, which is not
correct and would lead to inconsistent data.
Insert anomaly:
Suppose a new employee joins the company, who is under training and currently
not assigned to any department then we would not be able to insert the data into
the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the
department D890 then deleting the rows that are having emp_dept as D890 would
also delete the information of employee Maggie since she is assigned only to this
department.
To overcome these anomalies, we need to normalize the data.
Normalization
Normalization is a process of organizing the data in database for avoiding data
redundancy which solves the problem of insertion anomaly, update anomaly &
deletion anomaly.
Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF and no non-prime attribute is dependent on a
proper subset of any candidate key of the table.
The table is in 1NF because each attribute has atomic values. However, it is not in
2NF because the non-prime attribute teacher_age is dependent on teacher_id alone,
which is a proper subset of the candidate key, i.e., teacher_id → teacher_age.
This violates the rule for 2NF, as the rule says “no non-prime attribute is dependent
on a proper subset of any candidate key of the table”.
Third Normal Form (3NF)
According to Codd’s original definition, a relation schema R is in 3NF if it
satisfies 2NF and no nonprime attribute of R is transitively dependent on the
primary key (it should not have transitive dependencies).
A relation is in 3NF if at least one of the following conditions holds for every non-
trivial functional dependency X –> Y:
✓ X is a super key.
✓ Y is a prime attribute (each element of Y is part of some candidate key).
emp_id emp_name emp_zip emp_state emp_city emp_district
To make this table comply with 3NF, we have to break it into two tables to
remove the transitive dependency:
Employee table
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
Employee_zip table
emp_zip emp_state emp_city emp_district
Example: Let F = {A→BC, B→CD}.
From B→CD we can derive B→C and B→D by using the decomposition rule.
Let us consider A→B and B→C. From these two functional dependencies we
can derive A→C by applying Transitivity rule.
From A→B and B→D we can derive A→D.
So, final F+ = {A→BC, B→CD, A→B, A→C, B→C, B→D, A→D}.
Attribute Closure
Let X→Y be the Functional Dependency. Attribute closure of X denoted
with X+ with respect to F is the set of all attributes A such that X→A can be
inferred using Armstrong Axioms.
Algorithm
Closure= X;
Repeat until there is no change in closure
{
If there is an FD U→V in F
Such that U ⊆ Closure,
Then set Closure= Closure ∪ V.
}
Example: Let F = {A → BC, B → CD}.
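A short worked computation for this example, following the algorithm above:
To compute A+, start with Closure = {A}. Using A → BC, add B and C, so Closure = {A, B, C}. Using B → CD, add D, so Closure = {A, B, C, D}. No further change is possible, so A+ = {A, B, C, D}. If the relation is R(A, B, C, D), then A+ contains all of its attributes, so A is a super key (in fact a candidate key) of R.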
Super Key
Let R be the relational schema, and X be a set of attributes over R. If X+
determines all the attributes of R, then X is said to be a super key of R.
Problem 1
Given a relation R( A, B, C, D) and Functional Dependency set FD = { AB → CD,
B → C }, determine whether the given R is in 2NF? If not convert it into 2 NF.
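A worked sketch of Problem 1 (one possible answer):
For R(A, B, C, D) with FD = {AB → CD, B → C}: (AB)+ = {A, B, C, D}, so AB is the candidate key and C, D are the non-prime attributes. The dependency B → C makes the non-prime attribute C depend on B, a proper subset of the candidate key AB, so R is not in 2NF. Decomposing R into R1(A, B, D) with AB → D and R2(B, C) with B → C gives two relations that are each in 2NF.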
Problem 2
Given a relation R( P, Q, R, S, T) and Functional Dependency set FD = { PQ → R,
S → T }, determine whether the given R is in 2NF? If not convert it into 2 NF.
Problem 3
Given a relation R( P, Q, R, S, T, U, V, W, X, Y) and Functional Dependency set FD
= { PQ → R, PS → VW, QS → TU, P → X, W → Y }, determine whether the given
R is in 2NF? If not convert it into 2 NF.
Data Redundancy
Data redundancy in a database means that some data fields are repeated in the
database. This data repetition may occur either when a field is repeated in two or more
tables or when the field is repeated within a table.
Disadvantages of data redundancy
1. It increases the size of the database unnecessarily.
2. It causes data inconsistency.
3. It decreases the efficiency of the database.
4. It may cause data corruption.
Decomposition:
Decomposition is the process of breaking a relation into smaller relations to remove
anomalies. Decomposition makes it easier to find data in the database, and it removes
inconsistency and duplication. Decomposition means replacing a relation with a
collection of smaller relations.
Properties of Decomposition: While decomposing a relation R into R1, R2, ..., Rn,
the decomposition must satisfy the following properties:
1) Lossless-join decomposition
2) Dependency preservation
R( A , B , C )
A B C
1 2 1
2 5 3
3 3 3
Consider this relation decomposed into two sub-relations R1(A, B) and R2(B, C):
R1(A, B)
A B
1 2
2 5
3 3
R2(B, C)
B C
2 1
5 3
3 3
The natural join of R1 and R2 gives back exactly the original relation R, so this
decomposition is a lossless-join decomposition.
The dependency preservation decomposition is another property of decomposed
relational database schema D in which each functional dependency X -> Y
specified in F either appeared directly in one of the relation schemas Ri in the
decomposed D or could be inferred from the dependencies that appear in some Ri.
Example: Consider the relation R(A, B, C, D, E) with the functional dependency
set FD = {AC → BE, BC → D, B → E}.
Step 1.
The candidate key is AC, since (AC)+ = {A, B, C, D, E}.
Step 2.
Prime attributes are those attributes which are part of the candidate key, {A, C} in
this example, and the others are non-prime, {B, D, E} in this example.
Step 3.
The relation R is in 1st normal form, as a relational DBMS does not allow multi-
valued or composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is
not proper subset of candidate key AC) and AC->BE is in 2nd normal form (AC is
candidate key) and B->E is in 2nd normal form (B is not a proper subset of
candidate key AC).
The relation is not in 3rd normal form because in BC -> D (neither BC is a super
key nor D is a prime attribute) and in B -> E (neither B is a super key nor E is a
prime attribute); to satisfy 3rd normal form, either the LHS of an FD should be a
super key or the RHS should be a prime attribute.
Example
Consider the below schema, “if a company makes a product and an agent is an agent
for that company, then he always sells that product for the company”. Under these
circumstances, the ACP table is shown as:
The relation ACP is again decomposed into 3 relations. Now, the natural Join of
all the three relations will be shown as:
Result of natural join of R1 and R3 over ‘Company’ and then natural join of R13
and R2 over ‘Agent’ and ‘Product’ is given below.
Hence, in this example, all the redundancies are eliminated, and the
decomposition of ACP is a lossless join decomposition. Hence the relation is in
5NF as it does not violate the property of lossless join.
Inclusion Dependencies
An inclusion dependency is a statement of the form that some columns of a relation
are contained in columns of another relation.
Example: Foreign Key.
✓ We should not split groups of attributes that participate in an inclusion
dependency.
For example, if AB is inclusion dependent on CD, that means AB ⊆ CD. While
decomposing a schema that contains AB, we should ensure that at least one of
the schemas obtained in the decomposition contains both A and B.
✓ Most of the inclusion dependencies are key based.
✓ In E-R diagram IS A hierarchies also leads to key-based inclusion
dependencies.
UNIT IV
A transaction is a unit of program execution that accesses and possibly
updates various data items.
A transaction is delimited by statements (or function calls) of the form
begin transaction and end transaction. The transaction consists of all
operations executed between the begin transaction and end
transaction.
Let’s take an example of a simple transaction. Suppose a bank employee transfers
Rs 500 from A's account to B's account. This very simple and small transaction
involves several low-level tasks.
A’s Account
Open_Account (A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account (A)
B’s Account
Open_Account (B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account (B)
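The same transfer can be sketched as a single SQL transaction (assuming a hypothetical accounts table with acc_name and balance columns; the exact transaction-control syntax varies by DBMS):
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE acc_name = 'A';   -- debit A
UPDATE accounts SET balance = balance + 500 WHERE acc_name = 'B';   -- credit B
COMMIT;   -- make both changes permanent, or ROLLBACK on failure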
ACID Properties
A transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability − commonly known as ACID properties − in order to ensure
accuracy, completeness, and data integrity.
Atomicity (A)
This property ensures that either all the operations of a transaction reflect in
database or none.
Example
Suppose account A has a balance of Rs. 400 and B has Rs. 700. Account A is
transferring Rs. 100 to account B. This transaction has two operations: debiting
Rs. 100 from A's balance and crediting Rs. 100 to B's balance.
Let us say the first operation passed successfully while the second failed; in this
case A's balance would be Rs. 300 while B would still have Rs. 700 instead of
Rs. 800. This is unacceptable in a banking system. Either the transaction should
fail without executing any of the operations or it should process both operations.
The atomicity property ensures that.
Consistency (C)
Integrity Constraints should be maintained. So, that the database is consistent
before and after the transaction. To preserve the consistency of database, the
execution of transaction should take place in isolation (that means no other
transaction should run concurrently when there is a transaction already running).
Example
Account A has a balance of 400 and is transferring 100 each to accounts B and C,
so we have two transactions here. Let us say these transactions run concurrently
and both read a balance of 400; in that case the final balance of A would be 300
instead of 200.
This is wrong. If the transactions were to run in isolation, then the second
transaction would have read the correct balance of 300 (before debiting 100) once
the first transaction had completed successfully.
Isolation
Even though multiple transactions may execute concurrently, the system
guarantees that, for every pair of transactions Ti and Tj , it appears to Ti that either
Tj finished execution before Ti started, or Tj started execution after Ti finished.
Thus, each transaction is unaware of other transactions executing concurrently in
the system.
Durability (D)
Once a transaction completes successfully, the changes it has made into the
database should be permanent even if there is a system failure. The recovery
management component of database systems ensures the durability of transaction.
Transaction States
Any transaction at any point of time must be in one of the following states:
Active: The initial state; the transaction stays in this state while it is executing.
Partially committed: After the final statement has been executed.
Failed: After the discovery that normal execution can no longer proceed.
Aborted: After the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction
Committed: After successful completion.
Transaction T1 transfers $50 from account A to account B. It is defined as
T1: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Transaction T2 transfers 10 percent of the balance from account A to account B. It
is defined as
T2: read(A);
temp := A * 0.1;
A := A − temp;
write(A);
read(B);
B := B + temp;
write(B).
Suppose the current values of accounts A and B are $1000 and $2000, respectively.
Schedule 2 − a concurrent schedule of T1 and T2.
Serial Schedules
It is a schedule in which transactions are aligned in such a way that one transaction
is executed first. When the first transaction completes its cycle, then the next
transaction is executed. Transactions are ordered one after the other. This type of
schedule is called a serial schedule, as transactions are executed in a serial manner.
Equivalence Schedules
An equivalence schedule can be of the following types −
✓Result Equivalence
✓View Equivalence
✓Conflict Equivalence
Result Equivalence:
If two schedules produce the same result after execution, they are said to be
result equivalent. They may yield the same result for some value and different
results for another set of values. That's why this equivalence is not generally
considered significant.
View Equivalence
Two schedules would be view equivalence if the transactions in both the
schedules perform similar actions in a similar manner.
For example −
If Ti reads the initial data in S1, then it also reads the initial data in S2.
If Ti reads the value written by Tj in S1, then it also reads the value written by Tj
in S2.
If Ti performs the final write on the data value in S1, then it also performs the
final write on the data value in S2.
Conflict Equivalence
Two Operations would be conflicting if they have the following properties –
✓ They belong to separate transactions.
✓ They access the same data item.
✓ At least one of them is a "write" operation.
Example:
Conflicting operations pair (R1(A), W2(A)) because they belong to two
different transactions on same data item A and one of them is write operation.
Similarly,
(W1(A), W2(A)) and (W1(A), R2(A)) pairs are also conflicting.
On the other hand, (R1(A), W2(B)) pair is non-conflicting because they
operate on different data item.
Similarly, ((W1(A), W2(B)) pair is non-conflicting.
Note − View equivalent schedules are view serializable and conflict equivalent
schedules are conflict serializable. All conflict serializable schedules are view
serializable too.
Serializability
A schedule is said to be serializable if it is equivalent to any one of the serial
schedules.
Conflict Serializable Schedules
A schedule S is said to be conflict serializable if it is conflict equivalent to any
one of the serial schedules.
Here S and S12 do not follow the same order for the conflicting operation pairs.
Now compare S with <T2 T1> for conflict equivalence.
Here S and S21 do not follow the same order for the conflicting operation pairs either.
So, the given schedule S is not a conflict serializable schedule.
Example: S: R1(A) W1(A) R2(A) W2(A) R1(B) W1(B) R2(B) W2(B)
In this schedule every conflicting pair, on A as well as on B, has the T1 operation
before the T2 operation, so S is conflict equivalent to the serial schedule <T1 T2>
and is therefore conflict serializable.
View Serializable Schedules
A schedule S is said to be view serializable if it is view equivalent to any one of
the serial schedules.
Step-1: Generate all possible serial schedules for the given schedule. If a schedule
contains n transactions, then the possible number of serial schedules is n!.
Step-2: Now compare each serial schedule with the given schedule for view
equivalence. If any serial schedule is view equivalent to the given
schedule, then it is a view serializable schedule; otherwise it is not a view
serializable schedule.
Example:
Check whether the schedule is view Serializable or not?
S: R2 (B); R2 (A); R1 (A); R3 (A); W1 (B); W2 (B); W3 (B);
Solution: With 3 transactions, total number of schedules possible = 3! = 6
<T1 T2 T3>
<T1 T3 T2>
<T2 T3 T1>
<T2 T1 T3>
<T3 T1 T2>
<T3 T2 T1>
Step 1: Initial Read
A: T2 (R2(A) is the first read of A)
B: T2 (R2(B) is the first read of B)
So T2 must come first in any view equivalent serial schedule.
Remaining Schedules:
<T2 T1 T3>
<T2 T3 T1>
Step 2: Final Write
B: T3 (W3(B) is the last write of B), so T3 must come last.
Remaining Schedule:
<T2 T1 T3>
The given schedule S is view equivalent to the serial schedule <T2 T1 T3>, so it is
view serializable.
Failure Classification
There are various types of failure that may occur in a system, each of which needs
to be dealt with in a different manner. The failures are classified as follows
✓ Transaction failure.
✓ System crash.
✓ Disk failure.
Transaction failure
There are two types of errors that may cause a transaction to fail:
Logical error
The transaction can no longer continue with its normal execution because of some
internal condition, such as bad input, data not found, overflow, or resource limit
exceeded.
System error
The system has entered an undesirable state (for example, deadlock), as a result of
which a transaction cannot continue with its normal execution. The transaction,
however, can be executed later.
System crash
There is a hardware malfunction, or a bug in the database software or the
operating system, that causes the loss of the content of volatile storage and brings
transaction processing to a halt. The content of non-volatile storage remains intact
and is not corrupted.
Disk failure
✓ A disk block loses its content as a result of either a head crash or failure during
a data transfer operation.
To determine how the system should recover from failures, we need to identify
the failure modes of those devices used for storing data. Next, we must consider
how these failure modes affect the contents of the database.
Storage Structure
The storage structure can be divided into two categories
Volatile storage
✓ As the name suggests, volatile storage cannot survive system crashes.
Examples
Main memory (RAM) and cache memory are examples of volatile storage.
Non-volatile storage
✓ These memories are made to survive system crashes. They are huge in data
storage capacity, but slower in access speed.
Examples:
Hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up)
RAM.
Recovery and Atomicity
✓ When a system crashes, it may have several transactions being executed and
various files opened for them to modify data items.
✓ The atomicity of transactions must be maintained: that is, either all the
operations of a transaction are executed or none of them are.
❖ It should check the states of all the transactions, which were being executed.
❖ A transaction may be in the middle of some operation; the DBMS must ensure
the atomicity of the transaction in this case.
❖ It should check whether the transaction can be completed now or it needs to be
rolled back.
❖ No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well
as maintaining the atomicity of a transaction
➢ Maintaining the logs of each transaction and writing them onto some stable
storage before modifying the database.
➢ Maintaining shadow paging, where the changes are done on a volatile
memory, and later, the actual database is updated.
Log-based Recovery
A log maintains a record of the actions performed by a transaction. It is important
that the log records are written prior to the actual modification and are stored on a
stable storage medium, which is failsafe.
When a transaction enters the system and starts execution, it writes a log about
it.
<Tn, Start>
When the transaction modifies an item X, it writes a log record as follows:
<Tn, X, V1, V2>
This record indicates that Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs −
<Tn, commit>
In the deferred database modification technique, all write operations of a transaction
are deferred until the transaction reaches its partial commit point; only then are the
recorded writes applied to the database.
✓ In this method the write log record does not need the old value, i.e., it has the
form (write, T, X, new value).
✓ An (abort, T) entry is never needed in this method, since an aborted transaction
has made no changes to the database.
✓ If a system crash occurs, the recovery manager scans the log: for every transaction
T whose (commit, T) entry is found, it redoes all of T's (write, T, ...) entries
against the database; if no (commit, T) entry is found for T, nothing needs to be
done for T.
✓ This method is also called the (no-undo)/redo recovery scheme.
✓ Redo must be done in the order in which the log records were written.
Example:
(Start,T0)
(Write,T0,a,9)
Crash (Nothing is to be done because no commit found)
(commit,T0)
(start, T1)
(write,T1,a,7)
Crash (T0 committed so redo T0)
(commit,T1)
Crash (T0, T1 are committed so redo T0,T1)
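A minimal Python sketch of the no-undo/redo behaviour illustrated above; the log record tuples used here are an assumed representation, not the notes' notation.

# Minimal sketch of the no-undo/redo (deferred modification) recovery described above.
# Log records assumed as tuples: ('start', T), ('write', T, item, new_value), ('commit', T).

def redo_recovery(log, database):
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    # Redo, in log order, only the writes of committed transactions;
    # writes of uncommitted transactions never reached the database, so nothing to undo.
    for rec in log:
        if rec[0] == 'write' and rec[1] in committed:
            _, t, item, new_value = rec
            database[item] = new_value
    return database

# Usage: crash after (write, T1, a, 7) but before (commit, T1) -> only T0 is redone.
log = [('start', 'T0'), ('write', 'T0', 'a', 9), ('commit', 'T0'),
       ('start', 'T1'), ('write', 'T1', 'a', 7)]
print(redo_recovery(log, {'a': 3}))   # {'a': 9}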
Example (undo/redo recovery scheme):
(Start, T0)
(Write, T0, A, 3, 9)
Crash (Undo T0 so A=3)
(Commit, T0)
(Start, T1)
(Write, T1, B, 2, 5)
(Write, T1, A, 9, 7)
Crash (undo – T1 and Redo T0 so B=2, A=9)
(Commit, T1)
Crash (Redo T0, T1 so B=5, A=7)
Recovery with concurrent Transactions
When more than one transaction are being executed in parallel, the logs are
interleaved. At the time of recovery, it would become hard for the recovery system
to backtrack all logs, and then start recovering. To ease this situation, most modern
DBMS use the concept of 'checkpoints’.
Checkpoint
✓ Checkpoint declares a point before which the DBMS was in consistent state,
and all the transactions were committed.
✓ Checkpoint is a mechanism where all the previous logs are removed from the
system and stored permanently in a storage disk.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner
✓ The recovery system reads the logs backwards from the end to the last
checkpoint.
✓ It maintains two lists, an undo-list and a redo-list.
✓ If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just
<Tn, Commit>, it puts the transaction in the redo-list.
✓ If the recovery system sees a log with <Tn, Start> but no commit or abort log
found, it puts the transaction in undo-list.
All the transactions in the undo-list are undone, and their log records are then
discarded. All the transactions in the redo-list are redone using their log records.
Example:
(Start, T1)
(Write, T1, B, 2, 3)
(Start, T2)
(Commit, T1)
(Write, T2, C, 5, 7)
(Commit, T2)
(Checkpoint, {T2})
(Start, t3)
(Write, T3, A, 1, 9)
(Commit, T3)
(Start, T4)
(Write, T4, C, 7, 2)
In the above example the undo-list is {T4} and the redo-list is {T2, T3}.
First the transactions in the undo-list are undone (here T4), and then the
transactions in the redo-list (T2, T3) are redone.
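A minimal Python sketch of building the undo-list and redo-list for the checkpoint example above; the tuple representation of log records and the treatment of the transactions named in the checkpoint record are assumptions chosen so that the result matches the example.

# Minimal sketch of the undo-list / redo-list classification used above.
# Log records are assumed as tuples:
# ('start', T), ('write', T, item, old, new), ('commit', T), ('checkpoint', [T, ...]).

def undo_redo_lists(log):
    # Locate the last checkpoint; only transactions named in it or appearing after it matter.
    ckpt_pos, ckpt_txns = -1, []
    for i, rec in enumerate(log):
        if rec[0] == 'checkpoint':
            ckpt_pos, ckpt_txns = i, list(rec[1])

    candidates = list(ckpt_txns)
    for rec in log[ckpt_pos + 1:]:
        if rec[0] == 'start':
            candidates.append(rec[1])

    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    redo_list = [t for t in candidates if t in committed]      # redo committed transactions
    undo_list = [t for t in candidates if t not in committed]  # undo incomplete transactions
    return undo_list, redo_list

log = [('start', 'T1'), ('write', 'T1', 'B', 2, 3), ('start', 'T2'), ('commit', 'T1'),
       ('write', 'T2', 'C', 5, 7), ('commit', 'T2'), ('checkpoint', ['T2']),
       ('start', 'T3'), ('write', 'T3', 'A', 1, 9), ('commit', 'T3'),
       ('start', 'T4'), ('write', 'T4', 'C', 7, 2)]
print(undo_redo_lists(log))   # (['T4'], ['T2', 'T3'])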
Deadlock Handling
Deadlock is a state of a database system having two or more transactions, in which
each transaction is waiting for a data item that is locked by some other
transaction.
A deadlock can be indicated by a cycle in the wait-for-graph. This is a directed
graph in which the vertices denote transactions and the edges denote waits for
data items.
Example
Transaction T1 is waiting for data item X which is locked by T3. T3 is waiting for
Y which is locked by T2 and T2 is waiting for Z which is locked by T1. Hence, a
waiting cycle is formed, and none of the transactions can proceed executing.
[Wait-for graph: T1 → T3 (waiting for X), T3 → T2 (waiting for Y), T2 → T1 (waiting for Z), forming a cycle.]
There are three classical approaches for deadlock handling, namely
❖ Deadlock prevention.
❖ Deadlock avoidance.
❖ Deadlock detection and removal.
All of the three approaches can be incorporated in both a centralized and a
distributed database system.
Deadlock Prevention
The deadlock prevention approach does not allow any transaction to acquire
locks that will lead to deadlock.
The convention is that when more than one transaction requests a lock on the
same data item, only one of them is granted the lock.
One of the most popular deadlock prevention methods is pre-acquisition of all
the locks.
In this method, a transaction acquires all the locks before starting to execute
and retains the locks for the entire duration of transaction.
Using this approach, the system is prevented from being deadlocked since
none of the waiting transactions are holding any lock.
There are two algorithms for this purpose, namely wait-die and wound-wait.
Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a
data item which is already locked by T2. The algorithms are as follows
Wait-Die
If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2,
T1 is aborted and later restarted.
Wound-Wait
If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is
younger than T2, T1 is allowed to wait.
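A minimal Python sketch of the two decisions above; it assumes that a smaller timestamp means an older transaction.

# Minimal sketch of the wait-die and wound-wait decisions. Transaction "age" is taken
# from its timestamp: a smaller timestamp means an older transaction (assumption for illustration).

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is aborted and restarted later).
    return "wait" if ts_requester < ts_holder else "abort requester"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (aborts) the holder; younger requester waits.
    return "abort holder" if ts_requester < ts_holder else "wait"

# T1 (timestamp 5) requests an item locked by T2 (timestamp 9): T1 is older.
print(wait_die(5, 9))     # wait
print(wound_wait(5, 9))   # abort holder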
Deadlock Avoidance
✓ The deadlock avoidance approach handles deadlocks before they occur. It
analyzes the transactions and the locks to determine whether waiting leads to a
deadlock or not.
✓ When a transaction requests a lock on a data item, the lock manager checks
whether the lock is available. If it is available, the lock manager allocates the
data item and the transaction acquires the lock.
✓ If the item is locked by some other transaction in incompatible mode, the lock
manager runs an algorithm to test whether keeping the transaction in waiting
state will cause a deadlock or not.
Deadlock Detection and Removal
The deadlock detection and removal approach runs a deadlock detection
algorithm periodically and removes deadlock in case there is one.
To detect deadlocks, the lock manager periodically checks if the wait-for graph
has cycles.
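A minimal Python sketch of deadlock detection on the wait-for graph; the adjacency-list representation and transaction names are assumed for illustration.

# Minimal sketch of deadlock detection: build the wait-for graph (an edge Ti -> Tj means
# Ti is waiting for an item locked by Tj) and look for a cycle.

def has_deadlock(wait_for):
    visiting, finished = set(), set()

    def dfs(txn):
        visiting.add(txn)
        for nxt in wait_for.get(txn, ()):
            if nxt in visiting:
                return True                      # back edge: cycle found
            if nxt not in finished and dfs(nxt):
                return True
        visiting.discard(txn)
        finished.add(txn)
        return False

    return any(dfs(t) for t in wait_for if t not in finished)

# The example above: T1 waits for T3 (item X), T3 waits for T2 (Y), T2 waits for T1 (Z).
wait_for = {'T1': ['T3'], 'T3': ['T2'], 'T2': ['T1']}
print(has_deadlock(wait_for))   # True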
Shared Lock (S): a transaction holding a shared lock on a data item can read the item but cannot write it.
Exclusive Lock (X): a transaction holding an exclusive lock on a data item can both read and write it.
Lock compatibility: two shared locks on the same item are compatible (S with S is allowed), while a shared lock is incompatible with an exclusive lock, and two exclusive locks are incompatible (S with X, X with S, and X with X are not allowed).
In general, we have two types of locking protocols:
❖ Simple locking protocol
❖ Two-phase locking (2PL) protocol
In 2PL we have seen:
Serializability: it is guaranteed.
Cascading rollback: it is possible, which is undesirable.
Deadlock: it is possible.
The Venn diagram (not shown here) classifies schedules under rigorous, strict, and conservative 2PL.
Thomas Write Rule
When a transaction T issues a write_item(X) operation, the following checks are performed:
1. If R_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If W_TS(X) > TS(T), then don’t execute the Write Operation and continue
processing. This is a case of Outdated or Obsolete Writes. Remember, outdated
writes are ignored in Thomas Write Rule but a Transaction following Basic TO
protocol will abort such a Transaction.
3. If neither condition 1 nor condition 2 occurs, then and only then execute the
W_item(X) operation of T and set W_TS(X) to TS(T), as in the sketch below.
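A minimal Python sketch of the three write-rule checks above; the dictionary representation of R_TS(X) and W_TS(X) is an assumption for illustration.

# Minimal sketch of the write rule above (Thomas Write Rule). R_TS and W_TS are the
# read and write timestamps of item X; ts_t is the transaction's timestamp TS(T).

def write_item(ts_t, item):
    if item['R_TS'] > ts_t:
        return "abort and roll back T"          # rule 1: a younger transaction already read X
    if item['W_TS'] > ts_t:
        return "ignore obsolete write"          # rule 2: Thomas Write Rule (basic TO would abort)
    item['W_TS'] = ts_t                         # rule 3: perform the write, update W_TS(X)
    return "write performed"

X = {'R_TS': 4, 'W_TS': 6}
print(write_item(5, X))    # ignore obsolete write (W_TS(X)=6 > TS(T)=5)
print(write_item(7, X))    # write performed; W_TS(X) becomes 7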
Validation Based Protocol (Optimistic Concurrency)
The validation-based protocol executes a transaction in three phases:
Read phase
In this phase, the transaction T is read and executed. It reads the values of the
various data items and stores them in temporary local variables. It performs all
write operations on the temporary variables, without updating the actual
database.
Validation phase
In this phase, the values in the temporary variables are validated against the actual
data to see whether serializability would be violated.
Write phase
If the transaction passes validation, the temporary results are written to the
database; otherwise the transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its
validation phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
✓ This protocol is used to determine the time stamp for the transaction for
serialization using the time stamp of the validation phase, as it is the actual
phase which determines if the transaction will commit or rollback.
✓ Hence TS(T) = validation(T).
✓ The serializability is determined during the validation process. It can't be
decided in advance.
✓ While executing the transaction, it ensures a greater degree of concurrency and
fewer conflicts.
✓ Thus fewer transactions need to be rolled back.
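A minimal Python sketch of a validation test that fits the three timestamps above. The two conditions used here (either the older transaction finished before this one started, or it finished before this one's validation point and its write set does not intersect this one's read set) are the commonly taught ones and are an assumption, since the notes do not spell them out.

# Minimal sketch of a validation test for the validation-based protocol.
# "older" is the list of transactions Tj with TS(Tj) < TS(Ti); validation of Ti succeeds if,
# for every such Tj, one of the two conditions below holds.

def validate(ti, older):
    for tj in older:
        if tj['finish'] < ti['start']:
            continue                                       # condition 1: Tj finished before Ti started
        overlap = tj['write_set'] & ti['read_set']
        if tj['finish'] < ti['validation'] and not overlap:
            continue                                       # condition 2: no read/write overlap
        return False                                       # otherwise Ti must be rolled back
    return True

t1 = {'start': 1, 'validation': 5, 'finish': 6, 'read_set': {'A'}, 'write_set': {'A'}}
t2 = {'start': 2, 'validation': 8, 'finish': 9, 'read_set': {'B'}, 'write_set': {'B'}}
print(validate(t2, [t1]))   # True: t1 wrote only A, while t2 read only B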
Multiple Granularity
Granularity: It is the size of data item allowed to lock.
✓ Multiple Granularity can be defined as hierarchically breaking up the database
into blocks which can be locked.
✓ The Multiple Granularity protocol enhances concurrency and reduces lock
overhead.
✓ It keeps track of what to lock and how to lock.
✓ It makes it easy to decide whether to lock or unlock a data item. This
type of hierarchy can be graphically represented as a tree.
Example
✓ Consider a tree which has four levels of nodes.
✓ The first level or higher level shows the entire database.
✓ The second level represents a node of type area. The higher level database
consists of exactly these areas.
✓ The area consists of children nodes which are known as files. No file can be
present in more than one area.
✓ Finally, each file contains child nodes known as records. The file has exactly
those records that are its child nodes. No record is present in more than one
file.
Hence, the levels of the tree starting from the top level are as follows:
1.Database
2.Area
3.File
4.Record
There are three additional lock modes with multiple granularity:
Intention Mode Lock
Intention-shared (IS): It contains explicit locking at a lower level of the tree but
only with shared locks.
Intention-Exclusive (IX): It contains explicit locking at a lower level with
exclusive or shared locks.
Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared
mode, and some node is locked in exclusive mode by the same transaction.
Compatibility Matrix
The protocol uses the intention lock modes to ensure serializability. It requires that if a
transaction attempts to lock a node, then the following rules are observed:
1. The lock compatibility matrix must be respected.
2. The root of the tree must be locked first, and it may be locked in any mode.
3. A node can be locked in S or IS mode only if its parent is currently locked in IS or IX mode.
4. A node can be locked in X, SIX, or IX mode only if its parent is currently locked in IX or SIX mode.
5. A transaction can lock a node only if it has not previously unlocked any node (two-phase rule).
6. A node can be unlocked only if none of its children are currently locked.
In multiple-granularity locking, the locks are acquired in top-down order, and locks must
be released in bottom-up order.
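A minimal Python sketch of the standard IS/IX/S/SIX/X compatibility matrix referred to above; the matrix entries are the textbook values, reproduced here since the original table is not shown in these notes.

# Minimal sketch of the multiple-granularity lock compatibility matrix.
# compatible(held, requested) tells whether a requested lock on a node can be granted
# while another transaction holds the given lock on the same node.

COMPAT = {
    'IS':  {'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
    'IX':  {'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
    'S':   {'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
    'SIX': {'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
    'X':   {'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
}

def compatible(held, requested):
    return COMPAT[held][requested]

print(compatible('IS', 'IX'))    # True
print(compatible('SIX', 'IX'))   # False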
Algorithm for Recovery and Isolation Exploiting Semantics (ARIES)
Analysis
✓ The analysis step identifies the dirty (updated) pages in the buffer and the set of
transactions active at the time of the crash.
✓ The appropriate point in the log where the REDO operation should start is also
determined
REDO
✓ The REDO phase updates only committed transactions from the log to the
database.
✓ ARIES will have information which provides the start point for REDO
✓ Information stored by ARIES and in the data pages will allow ARIES to determine
whether the operation to be redone had been applied to the database.
✓ Thus only the necessary REDO operations are applied during recovery.
UNDO
✓ During the UNDO phase, the log is scanned backwards and the operations of
transactions that were active at the time of the crash are undone in reverse
order.
✓ The information needed for ARIES to accomplish its recovery procedure
includes the log, the Transaction Table, and the Dirty Page Table.
RAID (Redundant Array of Independent Disks)
✓ Data redundancy, although it takes up extra space, adds to disk reliability. This
means that, in case of a disk failure, if the same data is also backed up onto another
disk, we can retrieve the data and carry on with the operation.
✓ If the data is spread across multiple disks without the RAID technique, the
loss of a single disk can affect all of the data.
✓ RAID is very transparent to the underlying system.
✓ To the host system, it appears as a single big disk presenting itself as a linear
array of blocks. This allows older technologies to be replaced by RAID
without making too many changes in the existing code.
✓ RAID System is evaluated by Reliability, Availability, Performance, Capacity.
✓ There are 7 levels of RAID schemes, referred to as RAID 0, RAID 1, ..., RAID 6.
RAID-0 (Striping)
✓ RAID level 0 provides data striping, i.e., the data is spread (striped) across
multiple disks.
✓ Instead of placing just one block on a disk at a time, we can work with two
(or more) blocks placed on a disk before moving on to the next one.
Evaluation
Reliability: 0
✓ There is no duplication of data. Hence, a block once lost cannot be recovered.
Capacity: N*B
✓ The entire space is being used to store data. Since there is no duplication, N disks
each having B blocks are fully utilized.
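A minimal Python sketch of how block-level striping maps a logical block to a disk and an offset in RAID-0; the round-robin mapping shown is an assumed, simple striping scheme.

# Minimal sketch of block-level striping in RAID-0: logical block i of an array with
# N disks maps to disk (i mod N), at position (i div N) on that disk.

def raid0_map(logical_block, num_disks):
    disk = logical_block % num_disks
    offset = logical_block // num_disks
    return disk, offset

# With N = 4 disks, logical blocks 0..7 are spread round-robin across the disks.
for i in range(8):
    print(i, raid0_map(i, 4))   # blocks 0-3 go to disks 0-3, blocks 4-7 wrap around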
RAID-1 (Mirroring)
✓ More than one copy of each block is stored in a separate disk. Thus, every block
has two (or more) copies, lying on different disks.
RAID-3 (Byte-Level Striping with a Dedicated Parity Disk)
✓ In this level, lost data is regenerated using the parity drive.
✓ It provides high data transfer rates.
✓ In this level, data is accessed in parallel.
✓ It requires an additional drive for parity.
✓ It gives slow performance when operating on small-sized files.
RAID-4 (Block-Level Striping with Dedicated Parity)
Evaluation:
✓ Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way parity
works). If more than one disk fails, there is no way to recover the data.
✓ Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are
made available for data storage, each disk having B blocks.
RAID-5 (Block-Level Striping with Distributed Parity)
This is a slight modification of the RAID-4 system where the only difference is
that the parity rotates among the drives.
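A minimal Python sketch of how parity works in RAID-4/RAID-5: the parity block is the XOR of the data blocks, so a single lost block can be rebuilt from the surviving blocks and the parity.

# Minimal sketch of parity as used in RAID-4/5: the parity block is the bitwise XOR of the
# data blocks, so any single lost block can be reconstructed by XOR-ing the survivors.

from functools import reduce

def parity(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b'\x0f\x10', b'\xf0\x01', b'\x33\x44']
p = parity(data)

# Suppose the second data block is lost: XOR of the remaining blocks and the parity recovers it.
recovered = parity([data[0], data[2], p])
print(recovered == data[1])   # True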
Fixed-Length Records
All records on the page are guaranteed to be of the same length, so record slots are
uniform and can be arranged consecutively within a page.
When a record is inserted into the page, we must locate an empty slot and place the
record in it. Keeping track of empty slots can be done in two ways.
The first alternative is to store records in the first N slots (where N is the number of
records on the page); whenever a record is deleted, we move the last record on the
page into the vacated slot.
➢ This format allows us to locate the ith record on a page by a simple offset
calculation, and all empty slots appear together at the end of the page.
➢ This approach does not work if there are external references to the record that is
moved (since its rid changes).
The second alternative is to handle deletions by using an array of bits, one per
slot, to keep track of free slot information.
Locating records on the page requires scanning the bit array to find slots
whose bit is on; when a record is deleted, its bit is turned off.
Variable-Length Records
In variable length record type we cannot divide the page into a fixed collection
of slots. The problem is that, when a new record is to be inserted, we have to
find an empty slot of just the right length.
When a record is deleted, we must move records to fill the hole created by the
deletion, to ensure that all the free space on the page is contiguous
The most flexible organization for variable-length records is to maintain a
directory of slots for each page, with a (record offset (Location), record
length(size)) pair per slot.
The first component (record offset) is a 'pointer' to the record.
Deletion is done by setting the record offset to -1.
Records can be moved around on the page, because the rid (the page number and
slot number) does not change when the record is moved; only the record offset
stored in the slot changes.
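A minimal Python sketch of such a slot directory; the class and method names are invented for illustration.

# Minimal sketch of a page slot directory for variable-length records: each slot stores a
# (record offset, record length) pair; deletion sets the offset to -1, and the rid
# (page number, slot number) stays valid even if records are moved within the page.

class SlottedPage:
    def __init__(self):
        self.data = bytearray()
        self.slots = []                     # list of [offset, length]

    def insert(self, record: bytes) -> int:
        offset = len(self.data)
        self.data += record
        self.slots.append([offset, len(record)])
        return len(self.slots) - 1          # slot number (part of the rid)

    def delete(self, slot: int):
        self.slots[slot][0] = -1            # mark the slot as empty

    def get(self, slot: int):
        offset, length = self.slots[slot]
        return None if offset == -1 else bytes(self.data[offset:offset + length])

page = SlottedPage()
s0 = page.insert(b'alice,22')
s1 = page.insert(b'bob,31')
page.delete(s0)
print(page.get(s1))   # b'bob,31'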
RECORD FORMATS
While choosing a way to organize the fields of a record, we must consider
whether the fields of the record are of fixed or variable length and consider the
cost of various operations on the record.
Information common to all records of a given record type (such as the number
of fields and field types) is stored in the system catalog, which can be thought of
as a description of the contents of a database, maintained by the DBMS.
Fixed-Length Records
In a fixed-length record, each field has a fixed length, and the number of fields
is also fixed.
The fields of such a record can be stored consecutively, and given the address of
the record, the address of a particular field can be calculated using information
about the lengths of preceding fields.
Variable-Length Records
In the relational model, every record in a relation contains the same number of
fields. If the number of fields is fixed, a record is of variable length only
because some of its fields are of variable length.
One possible organization is to store fields consecutively, separated by
delimiters. This organization requires a scan of the record to locate a desired
field.
Another possible organization is to reserve some space at the beginning of a
record for use as an array of integer offsets.
The ith integer in this array is the starting address of the ith field value relative
to the start of the record.
We also store an offset to the end of the record; this offset is needed to
recognize where the last field ends.
File Organization
A database consists of a huge amount of data. The data is grouped within tables in an
RDBMS, and each table has related records. A user sees the data as stored in the
form of tables, but in reality this huge amount of data is stored in physical
memory in the form of files.
File: A file is a named collection of related information that is recorded on
secondary storage such as magnetic disks, magnetic tapes, and optical
disks.
File Organization refers to the logical relationships among various records that
constitute the file, particularly with respect to the means of identification and
access to any specific record.
Closed hashing
In the closed hashing method, a new data bucket is allocated with the same address and
is linked after the full data bucket. This method is also known as overflow
chaining.
Quadratic probing
Quadratic probing is an open-addressing scheme where we look for the (i*i)-th slot in
the i-th iteration if the given hash value x collides in the hash table.
How Quadratic Probing is done?
Let hash(x) be the slot index computed using the hash function.
If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
This process is repeated for all the values of i until an empty slot is found
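A minimal Python sketch of the probing steps above; it assumes a table of S slots and uses Python's built-in hash, which for small non-negative integers equals the integer itself.

# Minimal sketch of quadratic probing over a table of S slots, following the steps above:
# try (hash(x) + i*i) % S for i = 0, 1, 2, ... until an empty slot is found.

def quadratic_insert(table, key):
    S = len(table)
    for i in range(S):
        slot = (hash(key) + i * i) % S
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found")

table = [None] * 7
for k in [10, 17, 24]:          # these keys all collide at slot 3 when S = 7
    print(k, quadratic_insert(table, k))   # 10 -> 3, 17 -> 4, 24 -> 0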
Double hashing
Double hashing is a collision resolving technique in Open Addressed Hash tables.
Double hashing uses the idea of applying a second hash function to key when a
collision occurs.
Bucket Splitting:
When the number of elements in a bucket exceeds a particular size, then the
bucket is split into two parts.
Directory Expansion
Directory Expansion Takes place when a bucket overflows. Directory Expansion
is performed when the local depth of the overflowing bucket is equal to the global
depth.
Example of hashing the following elements: 16,4,6,22,24,10,31,7,9,20,26.
Bucket Size: 3 (Assume)
Hash Function: Suppose the global depth is X. Then the Hash Function returns X
LSBs.
16- 10000 , 4- 00100 , 6- 00110 , 22- 10110 , 24- 11000 , 10- 01010 ,
31- 11111, 7- 00111, 9- 01001 , 20- 10100 , 26- 11010
The drawback of static hashing is that it does not expand or shrink
dynamically as the size of the database grows or shrinks.
Dynamic Hashing
In dynamic hashing, data buckets grow or shrink as the number of records increases or
decreases. Dynamic hashing is also known as extended hashing.
Advantages (of the B+ tree file organization)
Tree traversal is easier and faster.
Searching becomes easy, as all records are stored only in the leaf nodes and are sorted
in a sequential linked list.
There is no restriction on B+ tree size. It may grow or shrink as the amount of data
increases or decreases.
Disadvantages
Inefficient for static tables.
Cluster File Organization
In cluster file organization, two or more related tables/records are stored within the
same file, known as a cluster.
These files will have two or more tables in the same data block and the key
attributes which are used to map these table together are stored only once.
Thus it lowers the cost of searching and retrieving various records in different files
as they are now combined and kept in a single cluster.
For example, we have two tables or relations, Employee and Department. These tables
are related to each other.
These tables can be combined using a join operation, and the result can be stored in a
cluster file.
If we must insert, update or delete any record we can directly do so. Data is sorted
based on the primary key or the key with which searching is done. Cluster key is
the key with which joining of the table is performed.
Primary indexing
It is defined mainly on the primary key of the data-file, in which the data-
file is already ordered based on the primary key.
Primary Index is an ordered file whose records are of fixed length with
two fields. The first field of the index replicates the primary key of the
data file in an ordered manner, and the second field of the ordered file
contains a pointer that points to the data-block where a record
containing the key is available.
The first record of each block is called the Anchor record or Block
anchor. There exists a record in the primary index file for every block of
the data-file.
The average number of block accesses using the primary index is log2(B) + 1,
where B is the number of index blocks (a binary search over the index blocks plus
one access to the data block).
Clustered index:
A clustered index is created only when both of the following conditions are satisfied:
1. The data or file that you are moving into secondary memory should be
in sequential or sorted order.
2. The index should be on a non-key column, meaning the column can have
repeated (duplicate) values.
Properties of B-Tree
❖ A B-Tree is defined by the term degree or Order ‘M’. The value of M depends
upon disk block size.
❖ Every node in a B-Tree, except the root node and the leaf nodes, contains at least
⌈M/2⌉ children.
❖ The root node must have at least 2 child nodes.
❖ All nodes (including root) may contain at most M – 1 keys.
❖ Every node in a B-Tree contains at most M children.
❖ All leaves are at the same level.
❖ All keys of a node are sorted in increasing order. The child between two keys
k1 and k2 contains all keys in the range from k1 to k2.
A B tree of order 4 is shown below
While performing some operations on B Tree, any property of B Tree may violate
such as number of minimum children a node can have. To maintain the properties
of B Tree, the tree may split or join.
Operations
Searching
Insertion
Deletion
Searching
Searching in B Trees is like that in Binary search tree.
For example, suppose we search for the item 49 in the following B Tree. The process
will be something like the following:
Compare item 49 with the root node 78. Since 49 < 78, move to its left sub-tree.
Since 40 < 49 < 56, traverse the right sub-tree of 40.
Since 49 > 45, move to the right and compare with 49.
A match is found, so return.
Searching in a B tree depends upon the height of the tree. The search algorithm
takes O(log n) time to search any element in a B tree.
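A minimal Python sketch of this search procedure; the node class and the tiny example tree are invented for illustration and loosely follow the example above.

# Minimal sketch of searching a B-Tree: within a node, find the first key >= the search item;
# follow the corresponding child if the key is not found in the node.

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                      # sorted list of keys
        self.children = children or []        # len(children) == len(keys) + 1 for internal nodes

def search(node, item):
    i = 0
    while i < len(node.keys) and item > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == item:
        return True                           # match found in this node
    if not node.children:
        return False                          # reached a leaf without finding the item
    return search(node.children[i], item)     # descend into the appropriate subtree

# Tiny tree loosely modelled on the example: root 78, left subtree contains 40..56 and 45, 49.
leaf = BTreeNode([45, 49, 52])
left = BTreeNode([40, 56], [BTreeNode([10, 20]), leaf, BTreeNode([60, 70])])
root = BTreeNode([78], [left, BTreeNode([80, 90])])
print(search(root, 49))   # True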
Insertion
Insertions are done at the leaf node level.
The following algorithm needs to be followed in order to insert an item into B
Tree.
✓ Traverse the B Tree in order to find the appropriate leaf node at which the node
can be inserted.
✓ If the leaf node contains fewer than m-1 keys, then insert the element in
increasing order.
✓ Else, if the leaf node already contains m-1 keys, then follow these steps:
✓ Insert the new element in the increasing order of elements.
✓ Split the node into two nodes at the median.
✓ Push the median element up to its parent node.
✓ If the parent node also contains m-1 keys, then split it too by
following the same steps.
DELETION
✓ Deletion from a B-tree is more complicated than insertion, because we can
delete a key from any node-not just a leaf.
✓ when we delete a key from an internal node, we will have to rearrange the
node’s children.
✓ Just as we had to ensure that a node didn’t get too big due to insertion. we must
ensure that a node doesn’t get too small during deletion.
✓ A simple approach to deletion might have to back up if a node (other than the
root) along the path to where the key is to be deleted has the minimum
number of keys.
✓ Since most of the keys in a B-tree are in the leaves, deletion operations are
most often used to delete keys from leaves.
✓ The recursive delete procedure then acts in one downward pass through the
tree, without having to back up.
✓ When deleting a key in an internal node, however, the procedure makes a
downward pass through the tree but may have to return to the node from
which the key was deleted to replace the key with its predecessor or
successor.
✓ The number of children of a node must be at least ⌈m/2⌉.
After deleting 6, the tree looks like
After deleting 13, the tree looks like
After deleting 7, two leaf nodes are merged to form the new node
For a B-Tree of order M
✓ Each internal node has up to M-1 keys to search
✓ Each internal node has between M/2 and M children
✓ Depth of a B-Tree storing N items is O(log_(M/2) N)
Run time is:
✓ O(log M) to binary-search which branch to take at each node, but M is small
compared to N.
✓ Total time to find an item is O(depth × log M) = O(log N)
The drawback of B-tree is that it stores the data pointer (a pointer to the
disk file block containing the key value), corresponding to a particular
key value, along with that key value in the node of a B-tree.
B+ tree
✓ B+ tree eliminates the above drawback by storing data
pointers only at the leaf nodes of the tree.
✓ The structure of leaf nodes of a B+ tree is quite different
from the structure of internal nodes of the B tree.
✓ The leaf nodes are linked to provide ordered access to the
records.
✓ The leaf nodes form the first level of index, with the
internal nodes forming the other levels of a multilevel
index.
✓ Some of the key values of the leaf nodes also appear in the
internal nodes, to simply act as a medium to control the
searching of a record.
✓ A B+ tree has two orders, ‘a’ and ‘b’, one for the internal nodes and the other
for the external (or leaf) nodes.
Internal Node structure
A B+ tree with ‘l’ levels can store more entries in its internal nodes compared to a
B-tree having the same ‘l’ levels. This accentuates the significant improvement
made to the search time for any given key.
Operations on B+ Tree
✓ Searching.
✓ Insertion.
✓ Deletion.
Searching
✓ Searching just like in a binary search tree.
✓ Starts at the root, works down to the leaf level.
✓ Does a comparison of the search value and the current “separation
value”, goes left or right.
✓ Since there is no structural change in a B+ tree during a search, we just
compare the key value with the data in the tree and return the
result.
Insertion
✓ A search is first performed, using the value to be added.
✓ After the search is completed, the location for the new value is known.
✓ If the tree is empty, add to the root.
✓ Once the root is full, split the data into 2 leaves, using the root to hold
keys and pointers.
✓ If adding an element will overload a leaf, take the median and split it.
Complexity
The total time complexity of the B+ tree search operation is O(t · log_t n),
where O(t) is the time for the linear search within each node.
The time complexity of the insertion algorithm is also O(t · log_t n).
Advantages of B+ Trees:
1. Records can be fetched in equal number of disk accesses.
2. Height of the tree remains balanced and is less compared to a B-tree.
3. We can access the data stored in a B+ tree sequentially as well as
directly.
4. Keys are used for indexing.
5. Faster search queries as the data is stored only on the leaf nodes.
6. Deletion will never be a complex process, since elements are always
deleted from the leaf nodes in B+ trees, whereas in a B-tree, deletion
of keys from internal nodes is complicated and time consuming.
Disadvantages of B+ Trees:
Any search will end at a leaf node only.
The time complexity of every search is O(h), where h is the height of the tree.
There is extra insertion and deletion overhead, as well as space overhead.
CAP Theorem Properties
Consistency
Consistency means that all nodes will have the same copies of a replicated data
item visible to the various transactions: a guarantee that every node in a distributed
cluster returns the same, most recent, successful write.
Availability:
Availability means that each read or write request for a data item will either be
processed successfully or will receive a message that the operation cannot be
completed.
Partition tolerance
Partition tolerance means that the system can continue operating if the network
connecting the nodes has a fault that results in two or more partitions, where the
nodes in each partition can only communicate among each other.
Graph-Based
A graph type database stores entities as well the relations amongst those entities.
The entity is stored as a node with the relationship as edges. An edge gives a
relationship between nodes. Every node and edge has a unique identifier.
A graph database is a database that is based on graph theory. It consists of a set of
objects, which can be a node or an edge.
✓ Nodes represent entities or instances such as people, businesses, accounts, or
any other item to be tracked.
✓ Edges, also termed graphs or relationships, are the lines that connect nodes to
other nodes; representing the relationship between them. Edges are the key
concept in graph databases.
✓ Properties are information associated with nodes.
Graph-based databases are mostly used for social networks, logistics, and spatial data.
Examples: Neo4J, Infinite Graph, OrientDB
Column-based
Column-oriented databases work on columns and are based on Google's BigTable
paper. Every column is treated separately. The values of a single
column are stored contiguously.