Introduction to
DATABASE MODELLING & DESIGN

Table of Contents

CHAPTER ONE
Object-Oriented Database Management System
1.1 Components of Object-Oriented Data Model
1.1.1 Object Structure
1.1.2 Object Classes
1.2 Relational Database and Object-Oriented
1.3 Object Oriented Analysis and Design
1.4 Advantages of object-oriented databases
1.5 Disadvantages of object-oriented databases

CHAPTER TWO
Object Oriented Database Forms, Triggers and Reports
2.1 Forms
2.1.1 Types of Forms
2.1.2 Form Design
2.2 Trigger
2.2.1 Example of Trigger in SQL Server
2.2.2 Types of SQL Server Triggers
2.2.3 Advantages of Triggers
2.2.4 Disadvantages of Triggers
2.3 A Database Report
2.4 Unified Modeling Language
2.4.1 Object Oriented Concepts Used in UML
2.4.2 Structural UML Diagrams

CHAPTER THREE
Storage System in DBMS
3.1 Types of Data Storage
3.1.1 Primary Storage
3.1.2 Secondary Storage
3.1.3 Tertiary Storage
3.2 Storage Hierarchy
3.3 File Organization Storage
3.3.1 Slotted-page Structure
3.3.2 Object Storage
3.3.3 Storage in OODBs
3.3.4 Hash File Organization
3.3.5 Multiple Key Access
3.3.6 B+ File Organization

CHAPTER FOUR
Query Processing
4.1 Evaluation of Expressions
4.1.1 Materialization
4.1.2 Pipelining
4.2 Join Operations
4.2.1 Types of Join Operations
4.3 SQL SELECT Statement
4.3.1 Syntax of SELECT Statement in SQL
4.3.2 Examples of SELECT Statement in SQL
4.4.1 Operations of Transaction
4.4.2 States of Transaction
Aborted
4.5 Testing of Serializability
4.5.1 Recoverability of Schedule
4.6 Deadlock in DBMS
4.6.2 Deadlock Detection
4.6.3 Deadlock Prevention
4.7 Failure Classification
4.7.1 Transaction Failure
4.7.2 System Crash
4.7.3 Disk Failure
4.8 Log-Based Recovery
4.8.1 Recovery Using Log Records
4.8.2 Checkpoint

CHAPTER FIVE
Concurrency Control Protocols
5.1 Lock-Based Protocol
5.2 Timestamp Ordering Protocol
5.2.1 Advantages and Disadvantages of TO Protocol
5.3 Validation Based Protocol
5.4 Multiple Granularity
5.5 Intention Mode Lock
5.6 Database Buffer
5.6.1 Buffer Manager
5.6.3 Buffer Replacement Strategies

CHAPTER SIX
Data Analytics and Decision Support System
6.1 Data Analytics
6.1.1 Data Analytics Process
6.1.2 Types of Data Analytics
6.1.4 Data Analytics Tools
6.2 Data Mining
6.2.1 Types of Data Mining
6.2.2 Data Warehouses
6.2.3 Data Repositories
6.2.4 Object-Relational Database
6.2.5 Transactional Database
6.2.6 Advantages of Data Mining
6.2.7 Disadvantages of Data Mining
6.2.8 Data Mining Applications
6.2.9 Challenges of Implementation in Data Mining
6.3 Big Data
6.3.1 The Uses of Big Data
6.3.2 Advantages and Disadvantages of Big Data
6.4 Decision Support System (DSS)
6.4.1 Types of DSS
6.5 Multimedia Database
6.5.1 Content of Multimedia Database Management System
6.5.2 Types of Multimedia Applications
6.5.3 Challenges to Multimedia Databases
6.5.4 Areas where multimedia database
6.6 Mobile Database
6.6.1 Components of a Mobile Database Environment
Reference

CHAPTER ONE
Object-Oriented Database Management System

The data model in which data is kept in the form of objects, which are instances of classes, is known as the object-oriented data model, and a database management system built on it is an object-oriented database management system (OODBMS). The object-oriented data model is made up of these classes and objects.

An OODBMS, sometimes also written ODBMS (for object database management system), is a DBMS that facilitates the modelling and generation of data as objects. It includes some sort of support for object classes, as well as the inheritance of class attributes and methods by subclasses and their objects.

1.1 Components of Object-Oriented Data Model:


The OODBMS is based on three major components,
namely: Object structure, Object classes, and Object
identity. These are explained below.
1.1.1 Object Structure:
The structure of an object refers to the properties that an object is made up of. These properties of an object are referred to as attributes. Thus, an object is a real-world entity with certain attributes that make up the object structure. Also, an object encapsulates the data and code into a

single unit, which in turn provides data abstraction by hiding the implementation details from the user.

The object structure is further composed of three types of components: Messages, Methods, and Variables. These are explained below.

1. Messages –
A message provides an interface, or acts as a communication medium, between an object and the outside world. A message can be of two types:
 Read-only message: If the invoked method does not change the value of a variable, then the invoking message is said to be a read-only message.
 Update message: If the invoked method changes the value of a variable, then the invoking message is said to be an update message.

2. Methods –
When a message is passed, the body of code that is executed is known as a method. Whenever a method is executed, it returns a value as output. A method can be of two types:
 Read-only method: When the value of a variable is not affected by a method, it is known as a read-only method.
 Update method: When the value of a variable is changed by a method, it is known as an update method.

3. Variables –
Variables store the data of an object. The data stored in the variables makes one object distinguishable from another.

1.1.2. Object Classes:
An object, which is a real-world entity, is an instance of a class. Hence, we first need to define a class, and then the objects are made, which differ in the values they store but share the same class definition. The objects in turn correspond to the various messages and variables stored in them.

Example

class CLERK
{
    // variables
    string name;
    string address;
    int id;
    int salary;

    // messages
    string get_name();
    string get_address();
    int annual_salary();
};

In the above example, we can see that CLERK is a class that holds the object variables and messages.

An OODBMS also supports inheritance in an extensive manner, as in a database there may be many classes with similar methods, variables and messages. Thus, the concept of the class hierarchy is maintained to depict the similarities among various classes.
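The distinction between read-only and update methods, and the class hierarchy idea, can also be sketched in Java, the language used for the longer example later in this chapter. The class and method names below are illustrative, not part of any particular OODBMS:

class Clerk {
    protected String name;
    protected String address;
    protected int id;
    protected int salary;

    // Read-only method: does not change the value of any variable.
    String getName() { return name; }

    // Read-only method: computes a value without modifying state.
    int annualSalary() { return salary * 12; }

    // Update method: changes the value of the salary variable.
    void raiseSalary(int amount) { salary += amount; }
}

// Inheritance: Manager reuses the variables and methods of Clerk.
class Manager extends Clerk {
    private int bonus;

    @Override
    int annualSalary() { return salary * 12 + bonus; }
}

A read-only message would invoke getName() or annualSalary(); an update message would invoke raiseSalary().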
1.2 Relational Database and Object-Oriented

Relational database management systems (RDBMS) work with tables, with each row in the table representing a record. The columns in a row represent the attributes of an individual record. Associations between records ("A Company has many Employees. An Employee belongs to a Company") are facilitated with foreign keys in one table referencing IDs in another table. These associations make up the "relational" part of relational databases.

Contrast this with the OOD, which typically stores and manages objects directly on the database server's disk. There are no tables, no rows, no columns, no foreign keys. There are only objects.

Associations between objects in an OOD can also be established and persist, which can lead to powerful and fast querying of data across complex relationships.

Figure 1 Representation of object-oriented and relational databases (object-oriented: a company object referencing department objects, which in turn reference manager and employee person objects; relational: companys, departments and employees tables linked through Dep_manager and Dep_employee foreign keys)

1.3 Object Oriented Analysis and Design

Object Oriented analysis is an analysis and identification of objects, and design means combining those identified objects. So, the main purpose of OO analysis is identifying the objects for designing a system. The analysis can also be done for an existing system. The analysis can be more efficient if we can identify the objects. Once we have identified the objects, their relationships are then identified, and the design is also produced.

The purpose of OOA is given below:

 To identify the objects of a system.
 To identify their relationships.
 To make a design that is executable when the concepts of OO are employed.

Following are the steps where OO concepts are applied and implemented:
Step 1: OO Analysis

The main purpose of OO analysis is identifying the objects and describing them correctly. After the objects are identified, the designing step is easily carried out. It is a must to identify the objects with responsibilities. Here, responsibility refers to the functions performed by the objects. Each individual object has its own functions to perform. The purpose of the system is fulfilled by collaborating these responsibilities.

Step 2: OO Design

This phase mainly emphasizes meeting the requirements. In this phase, the objects are joined together as per the intended associations. After the association is completed, the designing phase also gets complete.

Step 3: OO Implementation

This is the last phase that comes after the designing is done. It implements the design using any OO language like C++, Java, etc.

// Java program to illustrate
// Concept of Aggregation

// Importing required classes
import java.io.*;
import java.util.*;

// Class 1
// Student class
class Student {

    // Attributes of Student
    private String studentName;
    private int studentId;

    // Constructor of Student class
    public Student(String studentName, int studentId)
    {
        this.studentName = studentName;
        this.studentId = studentId;
    }

    public int getstudentId() {
        return studentId;
    }

    public String getstudentName() {
        return studentName;
    }
}

// Class 2
// Department class
// Department class contains list of Students
class Department {

    // Attributes of Department class
    private String deptName;
    private List<Student> students;

    // Constructor of Department class
    public Department(String deptName, List<Student> students)
    {
        this.deptName = deptName;
        this.students = students;
    }

    public List<Student> getStudents() {
        return students;
    }

    public void addStudent(Student student)
    {
        students.add(student);
    }
}

// Class 3
// Institute class
// Institute class contains the list of Departments
class Institute {

    // Attributes of Institute
    private String instituteName;
    private List<Department> departments;

    // Constructor of Institute class
    public Institute(String instituteName, List<Department> departments)
    {
        // This keyword refers to current instance itself
        this.instituteName = instituteName;
        this.departments = departments;
    }

    public void addDepartment(Department department)
    {
        departments.add(department);
    }

    // Method of Institute class
    // Counting total students in the institute
    public int getTotalStudentsInInstitute()
    {
        int noOfStudents = 0;
        List<Student> students = null;

        for (Department dept : departments) {
            students = dept.getStudents();

            for (Student s : students) {
                noOfStudents++;
            }
        }
        return noOfStudents;
    }
}

// Class 4
// main class
class AggregationExample {

    // main driver method
    public static void main(String[] args)
    {
        // Creating independent Student objects
        Student s1 = new Student("Janet", 1);
        Student s2 = new Student("Mercy", 2);
Student s3 = new Student("Peter", 1); Institute institute = new
Student s4 = new Student("Paul", 2); Institute("BITS", departments);

// Creating an list of CSE Students // Display message for better


List<Student> cse_students = new readability
ArrayList<Student>(); System.out.print("Total students in
cse_students.add(s1); institute: ");
cse_students.add(s2);
// Calling method to get total number
// Creating an initial list of EE of students
Students // in the institute and printing on
List<Student> ee_students = new console
ArrayList<Student>(); System.out.print(
ee_students.add(s3);
ee_students.add(s4); institute.getTotalStudentsInInstitute());
}
// Creating Department object with a }
Students list
// using Aggregation (Department "has"
students)
Department CSE = new Department("CSE", 1.4 Advantages of object-oriented databases
cse_students);
Department EE = new Department("EE", With all of their complex associations to other objects, and
ee_students);
because complex data objects persist in an OOD, the most
// Creating an initial list of significant advantage of the OOD over RDBMS is the ability
Departments to query across these complex relationships very quickly.
List<Department> departments = new
ArrayList<Department>();  There are no slow “joins” as in an RDBMS. Instead,
departments.add(CSE); you have fast queries with complex data.
departments.add(EE);
 Since the database structure is so close to the
// Creating an Institute object with programming objects, the code is simpler and lighter.
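Since the Institute object aggregates two Department objects, each holding two Student objects, running the program prints: Total students in institute: 4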
1.4 Advantages of object-oriented databases

With all of their complex associations to other objects, and because complex data objects persist in an OOD, the most significant advantage of the OOD over RDBMS is the ability to query across these complex relationships very quickly.

 There are no slow "joins" as in an RDBMS. Instead, you have fast queries with complex data.
 Since the database structure is so close to the programming objects, the code is simpler and lighter.

As another example, consider an application object such as a task object instance, which cannot be stored as-is in MySQL. It needs first to be decomposed into its attributes to be stored in the
table as a row with columns. The reverse process will involve retrieval and composition. Not so with object-oriented or document databases. Have an object? Store the whole thing in the database.

1.5 Disadvantages of object-oriented databases

An OOD may be a great choice if you're using an object-oriented programming language and need to manage complex data with complex object-to-object associations. Designing and optimizing a database system for these kinds of complexities, however, also has its trade-offs.

For one thing, the relative performance of very simple database operations, the ones you might do for a simple lookup of an attribute from a relational database table, may be sub-optimal.

Additionally, while users of RDBMS can enjoy a standard query language (SQL), users of object-oriented database systems may not have widely adopted standards at their disposal. For the most part, each flavor of OOD is coupled tightly to an object-oriented programming language, and querying syntax is very language-dependent.

Lastly, the OOD user community still feels small in comparison to the exploding ecosystem of web development within the RDBMS space. But the community is fast-growing and likely to make up for lost time.

CHAPTER TWO
Object Oriented Database Forms, Triggers and Reports

2.1 Forms

A form is a custom-designed screen for entering new records in, or displaying existing records from, a database table. Forms in Access are like display cases in stores that make it easier to view or get the items that you want. Since forms are objects through which you or other users can add, edit, or display the data stored in your database, the design of your form is an important aspect. There's a lot you can do design-wise with forms.

2.1.1 Types of Forms

i. Bound forms
ii. Unbound forms

2.1.1.1 Bound Forms

Bound forms are connected to some underlying data source such as a table, query, or SQL statement. Bound forms are what people typically think of when they think of the purpose of a form. Forms are to be filled out or used to enter or edit data in a database. Examples of bound forms will typically be what users use to enter, view or edit data in a database.
2.1.1.2 Unbound Forms

These forms are not connected to an underlying record or data source. Unbound forms could be dialog boxes, switchboards, or navigation forms. In other words, unbound forms are typically used to navigate or interact with the database at large, as opposed to the data itself.

Types of Bound Forms

There are many types of bound forms you can create in Access. Let us understand the types.

Single Item Form
This is the most popular one, and this is where the records are displayed — one record at a time.

Multiple Item Form
This displays multiple records at a time from the bound data source.

Split Form
The form is divided into halves, either vertically or horizontally. One half displays a single item or record, and the other half displays a list or provides a datasheet view of multiple records from the underlying data source.

2.1.2 Form Design

When designing forms in an object-oriented database (OODB), there are a few key considerations to keep in mind. The principles of object-oriented design, encapsulation, and inheritance can be applied to create efficient and reusable form components. Here's a step-by-step guide for designing forms in an object-oriented database:

Identify the Entities: Start by identifying the entities or objects that you want to represent in your form. These could be real-world entities like customers, products, or employees.

Define Classes: Create classes to represent each entity. Each class should have attributes (data) and methods (behavior) that define its properties and actions. For example, a "Customer" class might have attributes like name, address, and email, along with methods for updating customer details.

Form Component Classes: Create form component classes that inherit from base component classes. For example, you might have a "TextField" class, a "Dropdown" class, and a "Checkbox" class. These classes should encapsulate the logic and behavior specific to each form component.

Define Relationships: Identify any relationships between entities. For example, if you have a "Customer" entity and an "Order" entity, there might be a one-to-many relationship where a customer can have multiple orders. Represent these relationships using references or associations between the classes.
Create Form Classes: Design form classes that represent the overall structure and behavior of your forms. These classes will contain instances of the form component classes and handle the form's validation, submission, and data retrieval processes.

Implement Event Handling: Use event handling mechanisms to capture user interactions with the form components. For example, when a user enters text in a text field, an event handler should update the corresponding attribute in the underlying class instance.

Validation and Data Manipulation: Implement validation logic within the form classes to ensure data integrity and consistency. Perform checks on user input and handle any errors or exceptions that may occur. Additionally, incorporate methods for manipulating and persisting data in the OODB.

Reusability and Modularity: Design your form components and form classes to be reusable and modular. This allows you to create different forms by combining the appropriate components and promotes code maintenance and scalability.

Testing and Iteration: Test your form design thoroughly to ensure its functionality, usability, and responsiveness. Make necessary iterations based on user feedback and any discovered issues.

By following these steps, you can create a robust and efficient form design in an object-oriented database, leveraging the principles of encapsulation, inheritance, and modularity.
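As a rough illustration of the component and form classes described in these steps, here is a minimal Java sketch; the class names, fields, and validation rules are illustrative assumptions, not a prescribed design:

// Base class: encapsulates what every form component shares.
abstract class FormComponent {
    protected final String label;
    protected String value = "";

    FormComponent(String label) { this.label = label; }

    // An event handler would call this when the user edits the field.
    void setValue(String value) { this.value = value; }

    // Each concrete component encapsulates its own validation logic.
    abstract boolean isValid();
}

// Inheritance: TextField specializes the base component.
class TextField extends FormComponent {
    private final int maxLength;

    TextField(String label, int maxLength) {
        super(label);
        this.maxLength = maxLength;
    }

    @Override
    boolean isValid() {
        return !value.isEmpty() && value.length() <= maxLength;
    }
}

// A form class composed of reusable components.
class CustomerForm {
    private final TextField name = new TextField("Name", 45);
    private final TextField email = new TextField("Email", 100);

    // The form class coordinates validation across its components.
    boolean validate() { return name.isValid() && email.isValid(); }
}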

2.2 Trigger

A trigger is a set of SQL statements that reside in system memory with unique names. It is a specialized category of stored procedure that is called automatically when a database server event occurs. Each trigger is always associated with a table.

A trigger is called a special procedure because it cannot be called directly like a stored procedure. The key distinction between a trigger and a procedure is that a trigger is called automatically when a data modification event occurs against a table. A stored procedure, on the other hand, must be invoked directly.

The following are the main characteristics that distinguish triggers from stored procedures:

 We cannot manually execute/invoke triggers.
 Triggers have no chance of receiving parameters.
 A transaction cannot be committed or rolled back inside a trigger.

We can create a trigger in SQL Server by using the CREATE TRIGGER statement as follows:

CREATE TRIGGER schema.trigger_name
ON table_name
AFTER {INSERT, UPDATE, DELETE}
[NOT FOR REPLICATION]
AS
{SQL_Statements}

schema: It is an optional parameter that defines which schema the new trigger belongs to.
trigger_name: It is a required parameter that defines the name for the new trigger.

table_name: It is a required parameter that defines the table name to which the trigger applies. Next to the table name, we need to write the AFTER clause, where events like INSERT, UPDATE, or DELETE could be listed.

NOT FOR REPLICATION: This option tells SQL Server not to execute the trigger when data is modified as part of a replication process.

SQL_Statements: It contains one or more SQL statements that are used to perform actions in response to an event that occurs.

Triggers are helpful when we need to execute some events automatically in certain desirable scenarios. For example, suppose we have a constantly changing table and need to know the occurrences of changes and when these changes happen. In such scenarios, we could create a trigger that inserts the desired data into a separate table whenever the primary table changes.

2.2.1 Example of Trigger in SQL Server

Let us understand how we can work with triggers in SQL Server. We can do this by first creating a table named 'Employee' using the below statements:

CREATE TABLE Employee
(
    Id INT PRIMARY KEY,
    Name VARCHAR(45),
    Salary INT,
    Gender VARCHAR(12),
    DepartmentId INT
)

Next, we will insert some records into this table as follows:

INSERT INTO Employee VALUES (1, 'Steffan', 82000, 'Male', 3),
(2, 'Amelie', 52000, 'Female', 2),
(3, 'Antonio', 25000, 'Male', 1),
(4, 'Marco', 47000, 'Male', 2),
(5, 'Eliana', 46000, 'Female', 3)

We can verify the insert operation by using the SELECT statement. We will get the below output:

SELECT * FROM Employee;

Table 1 Employee table

Id  Name      Salary  Gender  DepartmentId
1   Martins   72000   Male    3
2   Treasure  63000   Female  4
3   Taiwo     55000   Male    2
4   Miracle   60000   Male    3
5   Ayomide   49000   Female  4
6   Agada     70000   Male    4

We will also create another table named 'Employee_Audit_Test' to automatically store transaction records of each operation, such as INSERT, UPDATE, or DELETE, on the Employee table:

CREATE TABLE Employee_Audit_Test
(
    Id int IDENTITY,
    Audit_Action text
)

Now, we will create a trigger that stores transaction records of each insert operation on the Employee table into the Employee_Audit_Test table. Here we are going to create the insert trigger using the below statement:

CREATE TRIGGER trInsertEmployee
ON Employee
FOR INSERT
AS
BEGIN
    Declare @Id int
    SELECT @Id = Id from inserted
    INSERT INTO Employee_Audit_Test
    VALUES ('New employee with Id = ' + CAST(@Id AS VARCHAR(10)) + ' is added at ' + CAST(Getdate() AS VARCHAR(22)))
END

After creating the trigger, we will try to add the following record into the table:

INSERT INTO Employee VALUES (6, 'Peter', 62000, 'Male', 3)

If no error is found, execute the SELECT statement to check the audit records. We will get the output as shown in Figure 3.

We are going to create another trigger to store transaction records of each delete operation on the Employee table into the Employee_Audit_Test table. We can create the delete trigger using the below statement:

CREATE TRIGGER trDeleteEmployee
ON Employee
FOR DELETE
AS
BEGIN
    Declare @Id int
    SELECT @Id = Id from deleted
    INSERT INTO Employee_Audit_Test
    VALUES ('An existing employee with Id = ' + CAST(@Id AS VARCHAR(10)) + ' is deleted at ' + CAST(Getdate() AS VARCHAR(22)))
END

After creating the trigger, we will delete a record from the Employee table:

DELETE FROM Employee WHERE Id = 2;

If no error is found, it gives the message as below:

Figure 2 SQL delete report log

Finally, execute the SELECT statement to check the audit records:

Figure 3 SQL insert report log

In both triggers' code, you will notice these lines:

SELECT @Id = Id from inserted
SELECT @Id = Id from deleted

Here, inserted and deleted are special tables used by SQL Server. The inserted table keeps a copy of the row when you insert a new row into the actual table. And the deleted table keeps a copy of the row you have just deleted from the actual table.

2.2.2 Types of SQL Server Triggers

We can categorize the triggers in SQL Server into three main types:

1. Data Definition Language (DDL) Triggers
2. Data Manipulation Language (DML) Triggers
3. Logon Triggers

DDL Triggers

DDL triggers are fired in response to DDL events, such as CREATE, ALTER, and DROP statements. We can create these triggers at the database level or server level, depending on the type of DDL event. They can also be executed in response to certain system-defined stored procedures that do DDL-like operations.

The DDL triggers are useful in the following scenarios:

 When we need to prevent the database schema from changing
 When we need to audit changes made in the database schema
 When we need to respond to a change made in the database schema

DML Triggers

DML triggers fire in response to data manipulation events such as INSERT, UPDATE, and DELETE statements. They come in two forms: AFTER triggers and INSTEAD OF triggers.

After Triggers

An AFTER trigger fires after SQL Server successfully completes the triggering action that fired it. Generally, this trigger is executed when a table completes an insert, update or delete operation. It is not supported in views. It is sometimes known as a FOR trigger. It can be further classified into three types:

1. AFTER INSERT Trigger
2. AFTER UPDATE Trigger
3. AFTER DELETE Trigger

Example: When we insert data into a table, the trigger associated with the insert operation on that table will not fire until the row has passed all constraints, such as the primary key constraint. SQL Server cannot fire the AFTER trigger when the data insertion fails. The following is an illustration of the After Trigger syntax in SQL Server:

CREATE TRIGGER schema_name.trigger_name
ON table_name
AFTER {INSERT | UPDATE | DELETE}
AS
BEGIN
    -- Trigger statements
    -- Insert, Update, or Delete statements
END

Instead of Triggers

An INSTEAD OF trigger fires before SQL Server begins to execute the triggering operation that triggered it. It means that no constraint check is performed before the trigger runs. As a result, even if the constraint check fails, this trigger will fire. It is the opposite of the AFTER trigger. We can create INSTEAD OF triggers on a table; such a trigger executes successfully but doesn't carry out the table's actual insert, update, or delete operation.

We can classify this trigger further into three types:

1. INSTEAD OF INSERT Trigger
2. INSTEAD OF UPDATE Trigger
3. INSTEAD OF DELETE Trigger

Example: When we insert data into a table, the trigger associated with the insert operation on that table will fire before the data has passed all constraints, such as the primary key constraint. SQL Server also fires the Instead of Trigger if the data insertion fails.

The following is an illustration of the Instead of Trigger syntax in SQL Server:

CREATE TRIGGER schema_name.trigger_name
ON table_name
INSTEAD OF {INSERT | UPDATE | DELETE}
AS
BEGIN
    -- Trigger statements
    -- Insert, Update, or Delete commands
END

Logon Triggers

Logon triggers fire in response to a LOGON event. The LOGON event occurs when a user session is generated with a SQL Server instance; it is raised after the authentication phase of logging in completes but before the user session is actually established. As a result, the SQL Server error log will display all messages created by the trigger, including error messages and PRINT statement messages. If authentication fails, logon triggers do not execute. These triggers may be used to audit and control server sessions, such as tracking login activity or limiting the number of sessions for a particular login.

UPDATE Triggers in SQL Server

The data stored in a table can change over a period of time. In that case, we may also need to make changes to the triggers. We can do this in two ways in SQL Server: the first is to use SQL Server Management Studio, and the second is a Transact-SQL query.

Modify Triggers using SQL Command

We can use the ALTER TRIGGER statement to modify triggers in MS SQL. The following statement allows us to make modifications to a trigger:

ALTER TRIGGER [dbo].[triggers_in_sql]
ON [dbo].[EmployeeTable]
AFTER INSERT
AS
BEGIN
    -- Modify as per your needs
END

How to DELETE Triggers in SQL Server?

We can remove an existing trigger in SQL Server using the DROP TRIGGER statement. We must be very careful while removing a trigger from a table, because once we have deleted the trigger, it cannot be recovered. If a trigger is not found, the DROP TRIGGER statement throws an error.

The following syntax removes DML triggers:

DROP TRIGGER [IF EXISTS] schema_name.trigger_name;

If we want to remove more than one trigger at once, we must separate the triggers using the comma operator:

DROP TRIGGER schema_name.trigger_name1, trigger_name2, ...;

We can use the DROP TRIGGER statement in the below format to delete one or more LOGON triggers:
DROP TRIGGER [IF EXISTS] trigger_name1, trigger_name2, ...
ON { DATABASE | ALL SERVER };

We can use the DROP TRIGGER statement in the below format to delete one or more DDL triggers:

DROP TRIGGER [IF EXISTS] trigger_name1, trigger_name2, ...
ON ALL SERVER;

2.2.3 Advantages of Triggers

The following are the advantages of using triggers in SQL Server:

 Triggers set database object rules and roll back if any change does not satisfy those rules. The trigger will inspect the data and make changes if necessary.
 Triggers help us to enforce data integrity.
 Triggers help us to validate data before it is inserted or updated.
 Triggers help us to keep a log of records.
 Triggers increase the performance of SQL queries because they do not need to be compiled each time they are executed.
 Triggers reduce the client-side code, which saves time and effort.
 Triggers are easy to maintain.

2.2.4 Disadvantages of Triggers

The following are the disadvantages of using triggers in SQL Server:

 Triggers only allow using extended validations.
 Triggers are invoked automatically, and their execution is invisible to the user. Therefore, it isn't easy to troubleshoot what happens in the database layer.
 Triggers may increase the overhead of the database server.
 We can define the same trigger action for multiple user actions, such as INSERT and UPDATE, in the same CREATE TRIGGER statement.
 We can create a trigger in the current database only, but it can reference objects outside the current database.

2.3 A Database Report

A database report is a structured presentation of data from a database that provides organized and summarized information for analysis, decision-making, or presentation purposes. It typically includes tables, charts, graphs, and other visual elements to convey data in a meaningful way. Here are some key points about creating a database report:

Database reports provide custom information to database users. Reports can be simple documents that only output the
contents of a table, or complex outputs that combine the information from several tables and show selected subsets of database information.

Report Design: Start by defining the purpose and scope of the report. Identify the specific data elements and metrics that need to be included. Determine the layout and format of the report, such as the number of columns, grouping of data, and visualizations to be used.

Data Retrieval: Retrieve the necessary data from the database to populate the report. Construct SQL queries or use reporting tools that allow you to extract the required data based on specified criteria, such as date ranges, filters, or joins between tables.

Data Transformation: Perform any necessary calculations, aggregations, or transformations on the retrieved data to generate meaningful insights. This may involve applying mathematical operations, formatting dates or numbers, or creating derived metrics.

Report Formatting: Format the report layout to present the data in a clear and organized manner. Use headings, subheadings, and section separators to structure the report. Choose appropriate fonts, colors, and styles for readability. Consider incorporating headers, footers, and page numbers for professional presentation.

Visual Elements: Utilize visual elements like tables, charts, graphs, and diagrams to enhance data understanding. Choose the appropriate visualization types based on the nature of the data and the insights you want to convey. Common visualization types include bar charts, line graphs, pie charts, and scatter plots.

Grouping and Summarization: Group related data together to provide a logical structure to the report. Summarize data by aggregating values using functions like sum, average, count, or maximum/minimum. Grouping and summarization help to organize data hierarchically and provide insights at different levels of detail.

Filtering and Sorting: Apply filters to the report to focus on specific subsets of data based on criteria such as date ranges, specific categories, or other conditions. Sort the data in a meaningful order, such as ascending or descending, based on specific columns or metrics.

Report Parameters: Consider adding parameters to allow users to customize the report output. Parameters enable users to specify filters, sorting options, or other preferences at runtime, making the report more flexible and user-friendly.

Report Generation and Distribution: Generate the report in a desired format, such as PDF, Excel, or HTML, depending on the intended use. Distribute the report to relevant stakeholders through email, file sharing platforms, or publishing on a web portal.

Regular Updates and Maintenance: If the report is generated on a recurring basis, set up automated processes to update and distribute the report regularly. Ensure that the report remains accurate and up-to-date by incorporating any necessary data refresh or synchronization mechanisms.

Remember to test the report thoroughly to ensure data accuracy, readability, and adherence to requirements.
Continuously gather feedback from report users to improve
the design and usefulness of future iterations.
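To make the data retrieval and the grouping and summarization steps concrete, the following minimal JDBC sketch in Java runs a grouped summary query against the Employee table created earlier in this chapter. The connection URL and credentials are illustrative placeholders; substitute those of your own SQL Server instance:

import java.sql.*;

public class SalaryReport {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=CompanyDB";

        // Grouping and summarization: one output row per department.
        String sql = "SELECT DepartmentId, COUNT(*) AS Staff, AVG(Salary) AS AvgSalary "
                   + "FROM Employee GROUP BY DepartmentId ORDER BY DepartmentId";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {

            // Each row is one group: a department with its summary metrics.
            while (rs.next()) {
                System.out.printf("Dept %d: %d staff, avg salary %d%n",
                        rs.getInt("DepartmentId"), rs.getInt("Staff"),
                        rs.getInt("AvgSalary"));
            }
        }
    }
}

From here, the same result set could be written to PDF, Excel, or HTML, as described in the report generation step.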
2.4 Unified Modeling Language

Unified Modeling Language (UML) is a general-purpose modelling language. The main aim of UML is to define a standard way to visualize the way a system has been designed. It is quite similar to blueprints used in other fields of engineering.

UML is not a programming language; it is rather a visual language. We use UML diagrams to portray the behavior and structure of a system. UML helps software engineers, businessmen and system architects with modelling, design and analysis. The Object Management Group (OMG) adopted Unified Modelling Language as a standard in 1997, and it has been managed by OMG ever since. The International Organization for Standardization (ISO) published UML as an approved standard in 2005. UML has been revised over the years and is reviewed periodically.

Diagrams in UML can be broadly classified as:

1. Structural Diagrams – Capture static aspects or structure of a system. Structural diagrams include: Component Diagrams, Object Diagrams, Class Diagrams and Deployment Diagrams.
2. Behavior Diagrams – Capture dynamic aspects or behavior of the system. Behavior diagrams include: Use Case Diagrams, State Diagrams, Activity Diagrams and Interaction Diagrams.

The image below shows the hierarchy of diagrams according to UML 2.2.

Figure 4 UML 2.2 hierarchy of diagrams

2.4.1 Object Oriented Concepts Used in UML

1. Class – A class defines the blueprint, i.e. the structure and functions of an object.
2. Objects – Objects help us to decompose large systems and help us to modularize our system. Modularity helps to divide our system into understandable components so that we can build our system piece by piece. An object is the fundamental unit (building block) of a system which is used to depict an entity.
3. Inheritance – Inheritance is a mechanism by which child classes inherit the properties of their parent classes.
4. Abstraction – The mechanism by which implementation details are hidden from the user.
5. Encapsulation – Binding data together and protecting it from the outer world is referred to as encapsulation.
6. Polymorphism – Mechanism by which functions or entities are able to exist in different forms.
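These concepts can be illustrated with a few lines of Java; the Shape, Circle, and Square classes below are illustrative only:

abstract class Shape {
    abstract double area();          // abstraction: no implementation here
}

class Circle extends Shape {         // inheritance
    private final double radius;     // encapsulation: hidden from outside

    Circle(double radius) { this.radius = radius; }

    @Override
    double area() { return Math.PI * radius * radius; }
}

class Square extends Shape {
    private final double side;

    Square(double side) { this.side = side; }

    @Override
    double area() { return side * side; }
}

A variable of type Shape can hold either a Circle or a Square, and a call to area() is dispatched to the right implementation at run time — polymorphism in action.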
2.4.2 Structural UML Diagrams

1. Class Diagram – The most widely used UML diagram is the class diagram. It is the building block of all object-oriented software systems. We use class diagrams to depict the static structure of a system by showing the system's classes, their methods and attributes, and the relationships between different classes or objects.
2. Composite Structure Diagram – Composite structure diagrams are used to represent the internal structure of a class and its interaction points with other parts of the system. A composite structure diagram represents the relationship between parts and their configuration, which determines how the classifier (a class, a component, or a deployment node) behaves. They represent the internal structure of a structured classifier making use of parts, ports, and connectors.
3. Object Diagram – An object diagram can be referred to as a screenshot of the instances in a system and the relationships that exist between them. Since object diagrams depict behaviour when objects have been instantiated, we are able to study the behaviour of the system at a particular instant.
4. Component Diagram – Component diagrams are used to represent how the physical components in a system have been organized. We use them for modelling implementation details. Component diagrams depict the structural relationship between software system elements and help us in understanding whether functional requirements have been covered by planned development.
5. Deployment Diagram – Deployment diagrams are used to represent system hardware and its software. They tell us what hardware components exist and what software components run on them. We illustrate system architecture as the distribution of software artifacts over distributed targets. An artifact is the information that is generated by system software.
6. Package Diagram – Package diagrams are used to depict how packages and their elements have been organized. A package diagram simply shows us the dependencies between different packages and the internal composition of packages.
CHAPTER THREE
Storage System in DBMS

A database system provides an ultimate view of the stored data. However, the data, in the form of bits and bytes, gets stored in different storage devices.

In this section, we will take an overview of the various types of storage devices that are used for accessing and storing data.

3.1 Types of Data Storage

For storing the data, there are different types of storage options available. These storage types differ from one another as per their speed and accessibility. The following types of storage devices are used for storing the data:

 Primary Storage
 Secondary Storage
 Tertiary Storage

Figure 5 Types of storage devices used for storing the data

3.1.1 Primary Storage

It is the primary area that offers quick access to the stored data. We also know primary storage as volatile storage, because this type of memory does not store the data permanently. As soon as the system suffers a power cut or a crash, the data also gets lost. Main memory and cache are the types of primary storage.

1. Main Memory: It is the one that is responsible for operating on the data that is made available by the storage medium. The main memory handles each instruction of a computer machine. This type of memory can store gigabytes of data on a system but is usually too small to store an entire database. Finally, the main memory loses its whole content if the system shuts down because of a power failure or other reasons.
2. Cache: It is one of the most costly storage media; on the other hand, it is the fastest one. A cache is a tiny storage medium which is usually maintained by the computer hardware. While designing the algorithms and query processors for the data structures, the designers keep the cache effects in mind.

3.1.2 Secondary Storage

Secondary storage is also called online storage. It is the storage area that allows the user to save and store data permanently. This type of memory does not lose the data due to a power failure or system crash. That's why we also call it non-volatile storage.

There are some commonly described secondary storage media which are available in almost every type of computer system:

 Flash Memory: A flash memory stores data in USB (Universal Serial Bus) keys which are plugged into the USB slots of a computer system. These USB keys help transfer data to a computer system, but they vary in size limits. Unlike the main memory, it is possible to get back the stored data which may be lost due to a power cut or other reasons. This type of memory storage is most commonly used in server systems for caching frequently used data. This leads the systems towards high performance, and flash memory is capable of storing larger amounts of data than the main memory.
 Magnetic Disk Storage: This type of storage medium is also known as online storage. A magnetic disk is used for storing data for a long time. It is capable of storing an entire database. It is the responsibility of the computer system to make the data available from the disk to the main memory for further access. Also, if the system performs any operation over the data, the modified data should be written back to the disk. The tremendous capability of a magnetic disk is that its data is not affected by a system crash or failure, but a disk failure can easily ruin as well as destroy the stored data.

3.1.3 Tertiary Storage

It is the storage type that is external to the computer system. It has the slowest speed, but it is capable of storing a large amount of data. It is also known as offline storage. Tertiary storage is generally used for data backup. The following tertiary storage devices are available:

 Optical Storage: An optical storage device can store megabytes or gigabytes of data. A Compact Disk (CD) can store 700 megabytes of data with a playtime of around 80 minutes. On the other hand,
a Digital Video Disk (DVD) can store 4.7 or 8.5 gigabytes of data on each side of the disk.
 Tape Storage: It is a cheaper storage medium than disks. Generally, tapes are used for archiving or backing up the data. It provides slow direct and sequential access to data.

3.2 Storage Hierarchy

Besides the above, various other storage devices reside in the computer system. These storage media are organized on the basis of data accessing speed, cost per unit of data to buy the medium, and the medium's reliability. Thus, we can create a hierarchy of storage media on the basis of cost and speed.

Figure 6 Storage device hierarchy

Till now, we have learned about relations and their representation. In the relational database system, the system maintains all information of a relation or table, from its schema to the applied constraints. All the metadata is stored. In general, metadata refers to data about data. So, storing the relational schemas and other metadata about the relations in a structure is known as the Data Dictionary or System Catalog.

A data dictionary is like the A-Z dictionary of the relational database system, holding all information of each relation in the database.

The types of information a system must store are:

 Name of the relations
 Name of the attributes of each relation
 Lengths and domains of attributes
 Name and definitions of the views defined on the database
 Various integrity constraints

With this, the system also keeps the following data based on users of the system:

 Name of authorized users
 Accounting and authorization information about users.
 The authentication information for users, such as passwords or other related information.

In addition to this, the system may also store some statistical and descriptive data about the relations, such as:
 Number of tuples in each relation
 Method of storage for each relation, such as clustered or non-clustered.

A system may also store the storage organization, whether sequential, hash, or heap. It also notes the location where each relation is stored:

 If relations are stored in the files of the operating system, the data dictionary notes and stores the names of the files.
 If the database stores all the relations in a single file, the data dictionary notes and stores the blocks containing the records of each relation in a data structure similar to a linked list.

At last, it also stores the information regarding each index of all the relations:

 Name of the index.
 Name of the relation being indexed.
 Attributes on which the index is defined.
 The type of index formed.

All the above information or metadata is stored in a data dictionary. The data dictionary also maintains updated information whenever changes occur in the relations. Such metadata constitutes a miniature database. Some systems store the metadata in the form of a relation in the database itself. The system designers design the way of representation of the data dictionary. Also, a data dictionary stores the data in a non-formalized manner; it does not use any normal form, so that the data stored in the dictionary can be accessed quickly.

For example, in the data dictionary, an underline below a value represents that the corresponding field contains a primary key.
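As a rough sketch of what such a catalog might look like in code, the following illustrative Java classes model a few of the entries listed above; real systems store this metadata internally, often as relations in the database itself:

import java.util.*;

class AttributeInfo {
    String name;        // name of the attribute
    String domain;      // e.g. "INT", "VARCHAR"
    int length;         // length of the attribute

    AttributeInfo(String name, String domain, int length) {
        this.name = name; this.domain = domain; this.length = length;
    }
}

class RelationInfo {
    String name;                                        // name of the relation
    List<AttributeInfo> attributes = new ArrayList<>();
    List<String> integrityConstraints = new ArrayList<>();
    String storageMethod;                               // e.g. "clustered", "heap"
    long tupleCount;                                    // statistical data
}

class DataDictionary {
    // One entry per relation, looked up by relation name.
    private final Map<String, RelationInfo> catalog = new HashMap<>();

    void register(RelationInfo info) { catalog.put(info.name, info); }

    RelationInfo describe(String relationName) { return catalog.get(relationName); }
}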

3.3 File Organization Storage

There are different ways of storing data in the database. Storing data in files is one of them. A user can store the data in files in an organized manner. These files are organized logically as a sequence of records and reside permanently on disks. Each file is divided into fixed-length storage units known as Blocks. These blocks are the units of storage allocation as well as data transfer. Although the default block size in a database is 4 to 8 kilobytes, many databases allow specifying the size at the time of creating the database instance.

Usually, the record size is smaller than the block size. But for large data items, such as images, the size can vary. For accessing the data quickly, it is required that one complete record should reside in one block only; it should not be partially divided between two blocks. Records in a file can be of two types:

 Fixed-length records
 Variable-length records

Fixed-Length Records

Fixed-length records mean setting a length and storing the records into the file. If the record size exceeds the fixed size, it gets divided into more than
one block. Due to the fixed size, the following two problems occur:

1. Partially storing subparts of a record in more than one block requires access to all the blocks containing the subparts to read or write it.
2. It is difficult to delete a record in such a file organization, because if the size of the existing record is smaller than the block size, then another record or a part of one fills up the block.

Variable-Length Records

Variable-length records are records that vary in size. They require the creation of multiple blocks of multiple sizes to store them. Variable-length records arise in the database system in the following ways:

1. Storage of multiple record types in a file.
2. Record types that allow repeating fields, like multisets or arrays.
3. Record types that allow variable lengths for one or more fields.

With variable-length records, there exist the following two problems:

1. Defining a way of representing a single record so that the individual attributes can be extracted easily.
2. Defining a way of storing variable-length records within a block so that a record in a block can be extracted easily.

Thus, the representation of a variable-length record can be divided into two parts:

1. An initial part of the record with fixed-length attributes, such as numeric values, dates, and fixed-length character attributes, for storing their values.
2. The data for variable-length attributes, such as a varchar type, represented in the initial part of the record by an (offset, length) pair. The offset refers to the place where that attribute's data begins, and the length refers to the length of the variable-size attribute. Thus, the initial part stores fixed-size information about each attribute, i.e., whether it is a fixed-length or variable-length attribute.

3.3.1 Slotted-page Structure

There is a problem with storing variable-length records within a block. Thus, such records are organized in a slotted-page structure within the block. In the slotted-page structure, a header is present at the start of each block. This header holds information such as:

1. The number of record entries in the header
2. The end of free space in the block
3. An array containing information on the location and size of the records

Figure 7 Slotted page structure

Inserting and Deleting Method

The variable-length records reside in a contiguous manner within the block.

When a new record is to be inserted, it gets its place at the end of the free space. This works because the free space is contiguous as well. Also, the header fills an entry with the size and location information of the newly inserted record.

When an existing record is deleted, its space is freed, and its header entry is set to deleted. The records before the deleted one are then moved so that the freed space gets occupied, and the end-of-free-space pointer gets updated. All the free space then again lies between the final header entry and the first record.

The primary technique of the slotted-page structure is that no pointer should point directly at a record. Instead, a pointer should point to the header entry that contains the information on the record's location. This prevents fragmentation of the space inside the block while still supporting indirect pointers to the record.
instances of predefined classes. The storage structure of an
46 47
The storage structure of an object-oriented database can vary depending on the specific implementation and design choices.

Objects encapsulate both data and the behavior associated with that data. Each object is an instance of a class and represents a specific entity or concept within the database. Objects can have attributes (data) and methods (functions or procedures) that operate on the data.

Classes: Classes define the structure and behavior of objects. They serve as blueprints or templates for creating objects. Class definitions include the attributes and methods that objects of that class will possess. The class definitions are stored in a schema or class hierarchy, which represents the organization of classes and their relationships.

Object Identifiers: Each object in an OODB is assigned a unique identifier known as an object identifier or OID. The OID serves as a reference to the object and is used to locate and retrieve the object from the database. OIDs are typically implemented as numeric values or unique identifiers such as GUIDs (Globally Unique Identifiers).

Object Store: The object store is the primary storage area in an OODB. It is responsible for storing and managing the objects themselves. Objects are typically stored in a binary format, which includes the object's attributes and any associated metadata. The object store can be implemented using various data structures such as B-trees, hash tables, or object-oriented file systems.

Relationships: OODBs support relationships between objects. Objects can have references to other objects, forming associations or links. These relationships can be one-to-one, one-to-many, or many-to-many. To represent relationships, OODBs use object references or pointers, which are stored as attributes within objects. The references allow navigation between related objects.

Indexes: OODBs may utilize indexes to improve the efficiency of object retrieval and querying. Indexes are data structures that provide fast access to objects based on specific attributes or criteria. Indexes can be created on individual attributes or combinations of attributes. Common index types used in OODBs include B-trees, hash indexes, and spatial indexes.

Query Optimization: OODBs employ query optimization techniques to improve query performance. These techniques involve analyzing query execution plans, considering indexes, and applying various optimization strategies. Query optimization aims to minimize the number of object accesses and reduce the overall execution time.

It's important to note that the specific storage structure and implementation details can vary between different OODB systems. The above description provides a general understanding of the storage structure in object-oriented databases.

3.3.4 Hash File Organization

Hash File Organization uses the computation of a hash function on some fields of the records. The hash function's output determines the location of the disk block where the records are to be placed.
Figure 8 Hash file organisation in memory

When a record has to be retrieved using the hash key columns, the address is generated, and the whole record is retrieved using that address. In the same way, when a new record has to be inserted, the address is generated using the hash key and the record is inserted directly at that address. The same process is applied in the case of delete and update.

In this method, there is no effort spent searching and sorting the entire file. In this method, each record will be stored at an essentially random location in memory.

Figure 9 Hash file organisation in memory with a new record

3.3.4.1 Static Hashing

In static hashing, the resultant data bucket address will always be the same. That means if we generate an address for EMP_ID = 103 using the hash function mod(5), then it will always result in the same bucket address, 3. There will be no change in the bucket address.

Hence, in static hashing, the number of data buckets in memory remains constant throughout. In this example, we will have five data buckets in memory used to store the data.

Figure 10 Static hashing structure

Operations of Static Hashing

 Searching a record
When a record needs to be searched, the same hash function retrieves the address of the bucket where the data is stored.

 Insert a Record

When a new record is inserted into the table, an address is generated for the new record based on the hash key, and the record is stored in that location.

 Delete a Record

To delete a record, we first fetch the record which is to be deleted. Then we delete the record at that address in memory.

 Update a Record

To update a record, we first search for it using the hash function, and then the data record is updated.

Suppose we want to insert a new record into the file, but the address of the data bucket generated by the hash function is not empty, or data already exists at that address. This situation in static hashing is known as bucket overflow. This is a critical situation in this method.

To overcome this situation, there are various methods. Some commonly used methods are as follows:

1. Open Hashing

When the hash function generates an address at which data is already stored, the next bucket is allocated to the record. This mechanism is called Linear Probing.

For example: Suppose R3 is a new record which needs to be inserted, and the hash function generates address 112 for R3. But the generated address is already full, so the system searches the next available data bucket, 113, and assigns R3 to it.

Figure 11 Adding records with Static hashing

2. Close Hashing

When a bucket is full, a new data bucket is allocated for the same hash result and is linked after the previous one. This mechanism is known as Overflow Chaining.

For example: Suppose R3 is a new record which needs to be inserted into the table, and the hash function generates address 110 for it. But this bucket is too full to store the new data. In this case, a new bucket is inserted at the end of bucket 110 and is linked to it.
Figure 12 Adding records with Static hashing
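The two overflow-handling methods just described can be sketched in a few lines of Python. This is an illustrative toy (the bucket count, capacity and record values are made up, and insert_linear / insert_chained are our own names):

```python
# Illustrative sketch of static hashing with mod-5 buckets and two overflow policies.
N_BUCKETS = 5

def bucket_address(key: int) -> int:
    return key % N_BUCKETS            # e.g. EMP_ID 103 always hashes to bucket 3

# Open hashing (linear probing): on overflow, use the next free bucket.
def insert_linear(buckets, key, record, capacity=1):
    addr = bucket_address(key)
    while len(buckets[addr]) >= capacity:     # bucket full -> probe the next one
        addr = (addr + 1) % N_BUCKETS
    buckets[addr].append((key, record))
    return addr

# Close hashing (overflow chaining): on overflow, link a new bucket after the old one.
def insert_chained(buckets, key, record, capacity=1):
    chain = buckets[bucket_address(key)]      # list of linked overflow pages
    if len(chain[-1]) >= capacity:
        chain.append([])                      # allocate and link a fresh bucket
    chain[-1].append((key, record))

linear = [[] for _ in range(N_BUCKETS)]
print(insert_linear(linear, 103, "R1"))       # goes to bucket 3
print(insert_linear(linear, 108, "R2"))       # 108 % 5 = 3 is full -> probed to 4

chained = [[[]] for _ in range(N_BUCKETS)]
insert_chained(chained, 103, "R1")
insert_chained(chained, 108, "R2")            # linked after bucket 3's first page
print(chained[3])                             # [[(103, 'R1')], [(108, 'R2')]]
```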
3.3.4.2 Dynamic Hashing

The dynamic hashing method is used to overcome the problems of static hashing, like bucket overflow. In this method, data buckets grow or shrink as the number of records increases or decreases. This method is also known as the extendable hashing method.

This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in poor performance.

How to search a key

First, calculate the hash address of the key. Check how many bits are used in the directory; these bits are called i. Take the least significant i bits of the hash address. This gives an index into the directory. Now, using the index, go to the directory and find the bucket address where the record might be.

How to insert a new record

Firstly, you have to follow the same procedure as for retrieval, ending up in some bucket. If there is still space in that bucket, place the record in it. If the bucket is full, we split the bucket and redistribute the records.

For example:

Consider the following grouping of keys into buckets, depending on the least significant bits of their hash addresses:

Figure 13 Keys and hash address

The last two bits of 2 and 4 are 00, so they will go into bucket B0. The last two bits of 5 and 6 are 01, so they will go into bucket B1. The last two bits of 1 and 3 are 10, so they will go into bucket B2. The last two bits of 7 are 11, so it will go into bucket B3.
Figure 14 Key insertion with hashing

Insert key 9 with hash address 10001 into the above structure:

Since key 9 has hash address 10001, it must go into bucket B1 (its last two bits are 01). But bucket B1 is full, so it will be split. The split separates 5 and 9 from 6: the last three bits of 5 and 9 are 001, so they go into bucket B1, while the last three bits of 6 are 101, so it goes into bucket B5. Keys 2 and 4 are still in B0; the records in B0 are pointed to by the 000 and 100 directory entries, because the last two bits of both entries are 00. Keys 1 and 3 are still in B2; the records in B2 are pointed to by the 010 and 110 entries, because the last two bits of both entries are 10. Key 7 is still in B3; the record in B3 is pointed to by the 111 and 011 entries, because the last two bits of both entries are 11.

Figure 15 Dynamic hashing structure
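The mechanics of the directory and bucket split can be sketched as follows. This is a minimal, illustrative implementation (class names and the capacity of 2 are our own choices, and here the key itself serves as the hash address, so the groupings differ from the figure above; a full implementation would also recurse if all redistributed keys landed in one half):

```python
# Compact sketch of extendable (dynamic) hashing: the directory is indexed by
# the least significant `i` bits; buckets split and the directory doubles on overflow.
BUCKET_CAPACITY = 2

class Bucket:
    def __init__(self, depth):
        self.depth = depth                       # local depth of this bucket
        self.keys = []

class ExtendableHash:
    def __init__(self):
        self.i = 1                               # global depth (directory bits)
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return key & ((1 << self.i) - 1)         # least significant i bits

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_CAPACITY:
            bucket.keys.append(key)
            return
        if bucket.depth == self.i:               # directory too small: double it
            self.directory += self.directory
            self.i += 1
        bucket.depth += 1                        # split on one more bit
        sibling = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        for j, b in enumerate(self.directory):
            if b is bucket and j & high_bit:
                self.directory[j] = sibling
        old, bucket.keys = bucket.keys + [key], []
        for k in old:                            # redistribute the records
            self.directory[self._index(k)].keys.append(k)

h = ExtendableHash()
for k in (2, 4, 5, 6, 1, 3, 7, 9):
    h.insert(k)
print(h.i, h.search(9))                          # directory has grown; 9 is found
```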
3.3.4.3 Advantages of dynamic hashing

 In this method, the performance does not decrease as the data grows in the system. It simply increases the size of memory to accommodate the data.
 In this method, memory is well utilized as it grows and shrinks with the data. There will not be any unused memory lying idle.
 This method is good for dynamic databases where data grows and shrinks frequently.

3.3.4.4 Disadvantages of dynamic hashing

 In this method, if the data size increases, the bucket size is also increased. The addresses of the data are maintained in the bucket address table, because the data addresses keep changing as buckets grow and shrink. If there is a huge increase in data, maintaining the bucket address table becomes tedious.
 In this case, the bucket overflow situation can also occur. But it may take longer to reach this situation than with static hashing.

3.3.5 Multiple Key Access

Consider the following example:

"Find all staff names in DNo = 5, with a salary equal to 400000."

Select SName
From STAFF
Where DNo = 5 and Salary = 400000;

There are three ways to do this:

 Use the index on DNo to find all records pertaining to DNo = 5. Examine each such record to see whether Salary = 400000.
 Use the index on Salary to find all records pertaining to a salary equal to naira 400000. Examine each such record to see whether it belongs to DNo = 5.
 Use the index on Salary to find all records pertaining to a salary equal to naira 400000. Also, use the index on the DNo attribute to find all records with DNo = 5. Take the intersection of these two sets of records, which gives the result.
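The third strategy can be sketched in a few lines of Python. This is illustrative only (the staff data and the dictionary-of-sets index layout are made up for the example): each secondary index maps an attribute value to the set of record ids, and intersecting the two sets answers the conjunctive query without scanning the file.

```python
# Sketch of index intersection for a conjunctive (multiple-key) query.
staff = {
    1: {"SName": "Odion", "DNo": 5, "Salary": 400000},
    2: {"SName": "Nneka", "DNo": 5, "Salary": 250000},
    3: {"SName": "Efosa", "DNo": 2, "Salary": 400000},
    4: {"SName": "Chike", "DNo": 5, "Salary": 400000},
}

# Secondary indexes: attribute value -> set of record ids.
dno_index, salary_index = {}, {}
for rid, row in staff.items():
    dno_index.setdefault(row["DNo"], set()).add(rid)
    salary_index.setdefault(row["Salary"], set()).add(rid)

# The intersection of the two index lookups gives the qualifying records.
hits = dno_index.get(5, set()) & salary_index.get(400000, set())
print(sorted(staff[rid]["SName"] for rid in hits))   # ['Chike', 'Odion']
```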
3.3.6 B+ File Organization

B+ tree file organization is an advanced form of the indexed sequential access method. It uses a tree-like structure to store records in a file. It uses the same concept of key-index, where the primary key is used to sort the records. For each primary key, an index value is generated and mapped with the record. The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this method, all the records are stored only at the leaf nodes. Intermediate nodes act as pointers to the leaf nodes; they do not contain any records.

Figure 16 B+ file organisation structure
The above B+ tree shows that:

 There is one root node of the tree, i.e., 25.
 There is an intermediary layer of nodes. They do not store the actual records; they hold only pointers to the leaf nodes.
 The nodes to the left of the root contain values before the root, and the nodes to the right contain values after the root, i.e., 15 and 30 respectively.
 The leaf nodes hold only the values, i.e., 10, 12, 17, 20, 24, 27 and 29.
 Searching for any record is easier as all the leaf nodes are balanced.
 In this method, any record can be reached along a single path and accessed easily.

Pros of B+ tree file organization

 In this method, searching becomes very easy, as all the records are stored only in the leaf nodes and are sorted in a sequential linked list.
 Traversing through the tree structure is easier and faster.
 The size of the B+ tree has no restriction, so the number of records can increase or decrease and the B+ tree structure can grow or shrink accordingly.
 It is a balanced tree structure, and any insert/update/delete does not affect the performance of the tree.

Cons of B+ tree file organization

 This method is inefficient for static tables.
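A search down such a tree only touches routing keys until it reaches a leaf. The following minimal sketch models the tree pictured above with plain dictionaries (the node layout and the extra filler leaf for keys ≥ 30 are our own simplifications, not a real storage format):

```python
# Minimal walk of a B+ tree: internal nodes hold routing keys only,
# all record values sit in the leaves.
tree = {
    "root":      {"keys": [25], "children": ["mid-left", "mid-right"]},
    "mid-left":  {"keys": [15], "children": ["leaf-1", "leaf-2"]},
    "mid-right": {"keys": [30], "children": ["leaf-3", "leaf-4"]},
    "leaf-1": {"leaf": True, "keys": [10, 12]},
    "leaf-2": {"leaf": True, "keys": [17, 20, 24]},
    "leaf-3": {"leaf": True, "keys": [27, 29]},
    "leaf-4": {"leaf": True, "keys": [30]},      # filler so every child exists
}

def search(tree, key):
    node = tree["root"]
    while not node.get("leaf"):                  # descend via routing keys only
        child = 0
        while child < len(node["keys"]) and key >= node["keys"][child]:
            child += 1
        node = tree[node["children"][child]]
    return key in node["keys"]                   # records live only in leaves

print(search(tree, 24), search(tree, 26))        # True False
```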
CHAPTER FOUR

4. Query Processing

In our previous sections, we understood various concepts in query processing. We learned about the query processing steps, selection operations, and also several types of algorithms used for performing the join operation, with their cost estimations. We are already aware of computing and representing the individual relational operations for a given user query or expression. Here, we will get to know how to compute and evaluate an expression with multiple operations.

4.1 Evaluation of Expressions

For evaluating an expression that carries multiple operations in it, we can perform the computation of each operation one by one. However, in the query processing system, we use two methods for evaluating an expression carrying multiple operations. These methods are:

1. Materialization
2. Pipelining

Let's take a brief look at each of these methods.
4.1.1 Materialization

In this method, the given expression evaluates one relational operation at a time. Also, each operation is evaluated in an appropriate sequence or order. After an operation is evaluated, its output is materialized in a temporary relation for subsequent use. This leads the materialization method to a disadvantage: it needs to construct temporary relations for materializing the results of the evaluated operations, and these temporary relations are written to disk unless they are small in size.

4.1.2 Pipelining

Pipelining is an alternative approach to the materialization method. Pipelining enables us to evaluate the relational operations of the expression simultaneously, in a pipeline: after one operation produces output, it is passed on to the next operation, and the chain continues until all the relational operations are evaluated. Thus, there is no requirement for storing a temporary relation in pipelining. This advantage makes pipelining a better approach than materialization, and the costs of the two approaches can differ substantially. Still, each approach performs best in different cases, so both are feasible in their place. We describe and discuss the materialization and pipelining methods more deeply in the next sections, one by one.
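The contrast is easy to see in miniature. In the Python sketch below (toy data; the "temporary relations" are just lists standing in for on-disk relations), the materialized version builds each intermediate result in full, while the pipelined version uses generators so each tuple flows straight from one operation to the next:

```python
# Toy expression: select salary > 20000, then project the name.
rows = [("Dele", 50000), ("Chike", 30000), ("Osaro", 25000), ("Ada", 15000)]

# Materialization: each operation writes its full result to a temporary
# relation before the next operation starts.
temp1 = [r for r in rows if r[1] > 20000]        # first temporary relation
temp2 = [name for name, _ in temp1]              # second temporary relation
print(temp2)

# Pipelining: generators hand each tuple straight to the next operation,
# so no temporary relation is ever materialized.
selected = (r for r in rows if r[1] > 20000)     # nothing computed yet
projected = (name for name, _ in selected)
print(list(projected))                           # tuples flow through the pipeline
```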
4.2 Join Operations

A join operation combines related tuples from different relations if and only if a given join condition is satisfied. It is denoted by ⋈.

Example:

Table 2 Employee table

EMP_CODE   EMP_NAME
101        Dele
102        Chike
103        Osaro

Table 3 Salary table

EMP_CODE   SALARY
101        50000
102        30000
103        25000
1. Result of join operation: (EMPLOYEE ⋈ SALARY)

Table 4 Join operation of employee and salary table

EMP_CODE   EMP_NAME   SALARY
101        Dele       50000
102        Chike      30000
103        Osaro      25000

4.2.1 Types of Join operations

Figure 17 Types of join operation

4.2.1.1 Natural Join

 A natural join is the set of tuples of all combinations in R and S that are equal on their common attribute names.
 It is denoted by ⋈.

Example: Let's use the above EMPLOYEE table and SALARY table:

Input:

∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)

Output:

Table 5 Natural join of employee and salary table

EMP_NAME   SALARY
Dele       50000
Chike      30000
Osaro      25000

4.2.1.2 Outer Join

The outer join operation is an extension of the join operation. It is used to deal with missing information.
Example

Table 6 Employee table

EMP_NAME   STREET          CITY
Mohammed   Edo Street      Benin
Odion      Dova Street     Ekpoma
Ameachi    Sawmill Street  Enugu
Oloye      Bode Street     Ibadan

Table 7 Fact_workers table

EMP_NAME   FIRM        SALARY
Mohammed   UBA         100000
Odion      Flour Mill  200000
Nneka      Conoil      300000
Oloye      Innoson     500000

Input:

1. (EMPLOYEE ⋈ FACT_WORKERS)

Output:

Table 8 Join of employees and fact_workers tables

EMP_NAME   STREET       CITY    FIRM        SALARY
Mohammed   Edo Street   Benin   UBA         100000
Odion      Dova Street  Ekpoma  Flour Mill  200000
Oloye      Bode Street  Ibadan  Innoson     500000

An outer join is basically of three types:

a. Left outer join
b. Right outer join
c. Full outer join

a. Left outer join:

 Left outer join contains the set of tuples of all combinations in R and S that are equal on their common attribute names.
 In addition, tuples in R that have no matching tuples in S are also included, padded with NULLs.
 It is denoted by ⟕.
Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

1. EMPLOYEE ⟕ FACT_WORKERS

Output:

Table 9 Left outer join of employees and fact_workers

EMP_NAME   STREET          CITY    FIRM        SALARY
Mohammed   Edo Street      Benin   UBA         100000
Odion      Dova Street     Ekpoma  Flour Mill  200000
Oloye      Bode Street     Ibadan  Innoson     500000
Ameachi    Sawmill Street  Enugu   NULL        NULL

b. Right outer join:

 Right outer join contains the set of tuples of all combinations in R and S that are equal on their common attribute names.
 In addition, tuples in S that have no matching tuples in R are also included, padded with NULLs.
 It is denoted by ⟖.

Example: Using the above EMPLOYEE table and FACT_WORKERS relation

Input:

1. EMPLOYEE ⟖ FACT_WORKERS

Output:

Table 10 Right outer join

EMP_NAME   FIRM        SALARY  STREET       CITY
Mohammed   UBA         100000  Edo Street   Benin
Odion      Flour Mill  200000  Dova Street  Ekpoma
Oloye      Innoson     500000  Bode Street  Ibadan
Nneka      Conoil      300000  NULL         NULL

c. Full outer join:

 Full outer join is like a left or right join, except that it contains all rows from both tables.
 In a full outer join, tuples in R that have no matching tuples in S, and tuples in S that have no matching tuples in R, are also included, padded with NULLs on the missing attributes.
 It is denoted by ⟗.
Example: Using the above EMPLOYEE table and FACT_WORKERS table as input:

Input:

1. EMPLOYEE ⟗ FACT_WORKERS

Output:

Table 11 Full outer join

EMP_NAME   STREET          CITY    FIRM        SALARY
Mohammed   Edo Street      Benin   UBA         100000
Odion      Dova Street     Ekpoma  Flour Mill  200000
Oloye      Bode Street     Ibadan  Innoson     500000
Ameachi    Sawmill Street  Enugu   NULL        NULL
Nneka      NULL            NULL    Conoil      300000
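How the three outer joins pad non-matching tuples with NULLs can be sketched compactly in Python (the dictionaries keyed by the common attribute EMP_NAME are a simplification of the relations above; function names are our own):

```python
# Sketch of outer-join padding over the EMPLOYEE / FACT_WORKERS data.
employee = {"Mohammed": ("Edo Street", "Benin"),
            "Odion":    ("Dova Street", "Ekpoma"),
            "Ameachi":  ("Sawmill Street", "Enugu"),
            "Oloye":    ("Bode Street", "Ibadan")}
fact_workers = {"Mohammed": ("UBA", 100000),
                "Odion":    ("Flour Mill", 200000),
                "Nneka":    ("Conoil", 300000),
                "Oloye":    ("Innoson", 500000)}
NULL = None

def left_outer(r, s):
    # every tuple of r appears; missing s-attributes become NULL
    return {k: r[k] + s.get(k, (NULL, NULL)) for k in r}

def full_outer(r, s):
    out = left_outer(r, s)
    for k in s:                       # add s-only tuples, r side padded with NULL
        out.setdefault(k, (NULL, NULL) + s[k])
    return out

print(left_outer(employee, fact_workers)["Ameachi"])  # ('Sawmill Street', 'Enugu', None, None)
print(full_outer(employee, fact_workers)["Nneka"])    # (None, None, 'Conoil', 300000)
# A right outer join is simply left_outer(s, r) with the column order swapped.
```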
4.2.1.3 Equi Join

It is also known as an inner join. It is the most common join. It is based on matched data as per the equality condition. The equi join uses the comparison operator (=).

Example:

Table 12 Customer relation

CLASS_ID   NAME
1          Joshua
2          Onose
3          Ighalo

Table 13 Product relation

PRODUCT_ID   CITY
1            Lagos
2            Benin
3            Ekpoma

Input:

1. CUSTOMER ⋈ PRODUCT
Output:

Table 14 Equi join of customer and product

CLASS_ID   NAME     PRODUCT_ID   CITY
1          Joshua   1            Lagos
2          Onose    2            Benin
3          Ighalo   3            Ekpoma

4.2.1.4 SQL Cross Join

The join operation in SQL is used to combine multiple tables into a single result. If we use a cross join to combine two different tables, we get the Cartesian product of the sets of rows from the joined tables. When each row of the first table is combined with each row from the second table, it is known as a Cartesian join or cross join. After performing the cross join operation, the total number of rows present in the final table will be equal to the product of the number of rows present in table 1 and the number of rows present in table 2.

For example:

If there are two records in table 1 and three records in table 2, then after performing the cross join operation, we will get six records in the final table.

Let us take a look at the syntax of writing a query to perform the cross join operation in SQL:

1. SELECT TableName1.columnName1, TableName2.columnName2 FROM TableName1 CROSS JOIN TableName2;

Now let us take a deeper dive into the cross join in SQL with the help of examples. All the queries in the examples will be written using the MySQL database.

Consider we have the following tables with the given data:

Table 15 MatchScore

Player   Department_id   Goals
Okoro    1               2
Ade      1               3
Omos     2               2
Isa      3               5
Table 16 Departments

Department_id   Department_name
1               IT
2               HR
3               Marketing

Table 17 Cross join of matchscore and department

DepartmentID   Department_name   Employee_ID
1              Production        1
2              Sales             3
3              Marketing         4
4              Accounts          5
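The Cartesian-product claim above is easy to check in a few lines of Python (the row values below are made up for illustration):

```python
# 2 rows x 3 rows = 6 rows after a cross join.
from itertools import product

table1 = [("Okoro", 1), ("Ade", 2)]                 # 2 records
table2 = [("IT",), ("HR",), ("Marketing",)]         # 3 records

cross = [r1 + r2 for r1, r2 in product(table1, table2)]
print(len(cross))   # 6 - every row of table1 paired with every row of table2
```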
4.3 SQL SELECT Statement

The SELECT statement is the most commonly used command in Structured Query Language. It is used to access the records from one or more database tables and views. It also retrieves the selected data that follow the conditions we want.

By using this command, we can also access a particular record from a particular column of the table. The table which stores the records returned by the SELECT statement is called a result-set table.

4.3.1 Syntax of SELECT Statement in SQL

1. SELECT Column_Name_1, Column_Name_2, ....., Column_Name_N FROM Table_Name;

In this SELECT syntax, Column_Name_1, Column_Name_2, ....., Column_Name_N are the names of those columns in the table whose data we want to read.

If you want to access all rows from all fields of the table, use the following SQL SELECT syntax with the * asterisk sign:

2. SELECT * FROM table_name;

4.3.2 Examples of SELECT Statement in SQL

Here, we take the following SQL example, which will help you to execute the SELECT statement for retrieving records:

Example 1:

Firstly, we have to create a new table and then insert some dummy records into it.

Use the following query to create the Student_Records table in SQL:

1. CREATE TABLE Student_Records
2. (
3. Student_Id Int PRIMARY KEY,
4. First_Name VARCHAR (20),
5. Address VARCHAR (20),
6. Age Int NOT NULL,
7. Percentage Int NOT NULL,
8. Grade VARCHAR (10)
9. );

The following query inserts the records of students into the Student_Records table:

1. INSERT INTO Student_Records VALUES (201, 'Sam', 'Warri', 18, 89, 'A2'),
2. (202, 'Olumide', 'Ibadan', 19, 93, 'A1'),
3. (203, 'Efosa', 'Warri', 20, 89, 'A2'),
4. (204, 'Chike', 'Warri', 19, 78, 'B1'),
5. (205, 'Isa', 'Zaria', 20, 75, 'B1'),
6. (206, 'Sade', 'Ijebu-ode', 19, 91, 'C1'),
7. (207, 'Efe', 'Benin', 20, 80, 'B2');

The following SQL query displays all the values of each column from the above Student_Records table:

1. SELECT * FROM Student_Records;

The output of the above query is:

Table 18 Student_Records

ID    FName     Address    Age   Percentage   Grade
201   Sam       Warri      18    89           A2
202   Olumide   Ibadan     19    93           A1
203   Efosa     Warri      20    89           A2
204   Chike     Warri      19    78           B1
205   Isa       Zaria      20    75           B1
206   Sade      Ijebu-ode  19    91           C1
207   Efe       Benin      20    80           B2

4.4 Transaction

A transaction is a set of logically related operations. It contains a group of tasks. A transaction is an action, or series of actions, performed by a single user to carry out operations for accessing the contents of the database.

Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account. This small transaction contains several low-level tasks:

X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)

4.4.1 Operations of Transaction

Following are the main operations of a transaction:

Read(X): The read operation is used to read the value of X from the database and store it in a buffer in main memory.

Write(X): The write operation is used to write the value back to the database from the buffer.

Let's take an example of a debit transaction from an account, which consists of the following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before the start of the transaction is 4000.

 The first operation reads X's value from the database and stores it in a buffer.
 The second operation decreases the value of X by 500, so the buffer will contain 3500.
 The third operation writes the buffer's value to the database, so X's final value will be 3500.

But it may be possible that, because of a failure of hardware, software or power, the transaction fails before finishing all the operations in the set.

For example: If in the above transaction the debit transaction fails after executing operation 2, then X's value will remain 4000 in the database, which is not acceptable to the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
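The buffer-based behaviour described above can be modelled in a few lines of Python. This is a toy sketch only (the database is a dictionary, and the read/commit/rollback functions are our own simplifications): updates live in the buffer until commit makes them durable, and rollback discards them.

```python
# Toy model of the debit transaction: reads pull values into a main-memory
# buffer; commit writes the buffer back; rollback discards the buffered work.
database = {"X": 4000}
buffer = {}

def read(item):  buffer[item] = database[item]           # R(X)
def commit():    database.update(buffer); buffer.clear() # save work permanently
def rollback():  buffer.clear()                          # undo the work done

read("X")                 # operation 1: buffer holds 4000
buffer["X"] -= 500        # operation 2: buffer now holds 3500
# Suppose the transaction fails here - the update never reached the database:
rollback()
print(database["X"])      # 4000

read("X"); buffer["X"] -= 500; commit()
print(database["X"])      # 3500 - W(X) made durable by the commit
```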
4.4.2 States of Transaction

In a database, a transaction can be in one of the following states:
Figure 18 Transaction states

Active state

 The active state is the first state of every transaction. In this state, the transaction is being executed.
 For example: inserting, deleting or updating a record is done here. But the records are not yet saved to the database.

Partially committed

 In the partially committed state, a transaction has executed its final operation, but the data is still not saved to the database.
 In the total mark calculation example, a final display of the total marks step is executed in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all the effects are now permanently saved on the database system.

Failed state

 If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed state.
 In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the transaction will fail to execute.

Aborted

 If any of the checks fail and the transaction has reached a failed state, then the database recovery system will make sure that the database is in its previous consistent state. If not, it will abort or roll back the transaction to bring the database into a consistent state.
 If the transaction fails in the middle of execution, then all the operations executed so far are rolled back to restore the consistent state.
 After aborting the transaction, the database recovery module will select one of two operations:

1. Re-start the transaction
2. Kill the transaction

4.5 Testing of Serializability

A serialization graph is used to test the serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as a precedence graph. This graph is a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices contains all the transactions participating in the schedule. The set of edges contains all edges Ti → Tj for which one of the following three conditions holds:

1. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).
2. Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).
3. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).

Figure 19 Precedence graph

 If a precedence graph contains a single edge Ti → Tj, then all the instructions of Ti are executed before the first instruction of Tj is executed.
 If the precedence graph for schedule S contains a cycle, then S is non-serializable. If the precedence graph has no cycle, then S is serializable.
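The cycle test can be sketched directly from the three rules above. The following Python fragment is illustrative only (the schedule encoding as (transaction, action, item) tuples is our own): it builds the edge set from conflicting operation pairs and then looks for a cycle with a depth-first search.

```python
# Build the precedence graph from a schedule, then test it for a cycle.
def precedence_edges(schedule):
    """schedule: list of (transaction, action, item) in execution order."""
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if qi == qj and ti != tj and "write" in (ai, aj):
                edges.add((ti, tj))          # conflicting pair: earlier op first
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def dfs(node, stack):
        stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in stack or dfs(nxt, stack):
                return True
        stack.discard(node)
        return False
    return any(dfs(n, set()) for n in graph)

s1 = [("T1", "read", "A"), ("T2", "write", "A"), ("T1", "write", "A")]
print(has_cycle(precedence_edges(s1)))   # True  -> non-serializable

s2 = [("T1", "read", "A"), ("T1", "write", "A"), ("T2", "read", "A")]
print(has_cycle(precedence_edges(s2)))   # False -> serializable
```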
For example:

Figure 20 Serialisable and non-serialisable transactions
Explanation:

Read(A): In T1, there are no subsequent writes to A, so no new edges.
Read(B): In T2, there are no subsequent writes to B, so no new edges.
Read(C): In T3, there are no subsequent writes to C, so no new edges.
Write(B): B is subsequently read by T3, so add edge T2 → T3.
Write(C): C is subsequently read by T1, so add edge T3 → T1.
Write(A): A is subsequently read by T2, so add edge T1 → T2.
Write(A): In T2, there are no subsequent reads of A, so no new edges.
Write(C): In T1, there are no subsequent reads of C, so no new edges.
Write(B): In T3, there are no subsequent reads of B, so no new edges.

Figure 21 Precedence graph for schedule S1

The precedence graph for schedule S1 contains a cycle; that is why schedule S1 is non-serializable.

Figure 22 Schedule S2

Explanation:

Read(A): In T4, there are no subsequent writes to A, so no new edges.
Read(C): In T4, there are no subsequent writes to C, so no new edges.
Write(A): A is subsequently read by T5, so add edge T4 → T5.
Read(B): In T5, there are no subsequent writes to B, so no new edges.
Write(C): C is subsequently read by T6, so add edge T4 → T6.
Write(B): B is subsequently read by T6, so add edge T5 → T6.
Write(C): In T6, there are no subsequent reads of C, so no new edges.
Write(A): In T5, there are no subsequent reads of A, so no new edges.
Write(B): In T6, there are no subsequent reads of B, so no new edges.
Figure 23 Precedence graph for schedule S2

The precedence graph for schedule S2 contains no cycle; that is why schedule S2 is serializable.

4.5.1 Recoverability of Schedule

Sometimes a transaction may not execute completely due to a software issue, system crash or hardware failure. In that case, the failed transaction has to be rolled back. But some other transaction may also have used the value produced by the failed transaction. So we also have to roll back those transactions.

Figure 24 Transaction schedule

The above figure shows a schedule with two transactions. T1 reads and writes the value of A, and that value is read and written by T2. T2 commits, but later on T1 fails. Due to the failure, we have to roll back T1. T2 should also be rolled back because it read the value written by T1, but T2 can't be rolled back because it has already committed. So this type of schedule is known as an irrecoverable schedule.

Irrecoverable schedule: The schedule is irrecoverable if Tj reads the updated value of Ti and Tj commits before Ti commits.
Figure 25 Schedule with two transactions

The above figure shows a schedule with two transactions. Transaction T1 reads and writes A, and that value is read and written by transaction T2. But later on, T1 fails. Due to this, we have to roll back T1. T2 should be rolled back because T2 has read the value written by T1. As it has not committed before T1 commits, we can roll back transaction T2 as well. So it is recoverable with cascading rollback.

Recoverable with cascading rollback: The schedule is recoverable with cascading rollback if Tj reads the updated value of Ti and the commit of Tj is delayed till the commit of Ti.

Figure 26 Recoverable with cascading rollback

The above figure shows a schedule with two transactions. Transaction T1 reads and writes A and commits, and that value is read and written by T2. So this is a cascadeless recoverable schedule.

4.6 Deadlock in DBMS

A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up locks. Deadlock is said to be one of the most feared complications in DBMS, as no task ever gets finished and remains in a waiting state forever.

For example: In the student table, transaction T1 holds a lock on some rows and needs to update some rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the grade table and needs to update the rows in the student table held by transaction T1.

Now the main problem arises. Transaction T1 is waiting for T2 to release its lock and, similarly, transaction T2 is waiting for T1 to release its lock.
All activities come to a halt and remain at a standstill until the DBMS detects the deadlock and aborts one of the transactions.

Figure 27 Deadlock in DBMS

4.6.1 Deadlock Avoidance

When a database is stuck in a deadlock, it is better to avoid the deadlock rather than abort or restart the database, as that is a waste of time and resources. A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method like the "wait-for graph" is used for detecting deadlock situations, but this method is suitable only for smaller databases. For larger databases, the deadlock prevention method can be used.

4.6.2 Deadlock Detection

In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect whether the transaction is involved in a deadlock or not. The lock manager maintains a wait-for graph to detect deadlock cycles in the database.

Wait-for Graph

 This is a suitable method for deadlock detection. In this method, a graph is created based on the transactions and their locks. If the created graph has a cycle or closed loop, then there is a deadlock.
 The wait-for graph is maintained by the system for every transaction which is waiting for some data held by the others. The system keeps checking the graph for cycles.

The wait-for graph for the above scenario is shown below:
4.6.3 Deadlock Prevention

The deadlock prevention method is suitable for a large database. If the resources are allocated in such a way that deadlock never occurs, then deadlock can be prevented. The database management system analyzes the operations of a transaction to determine whether they can create a deadlock situation. If they can, the DBMS never allows that transaction to be executed.

4.6.3.1 Wait-Die scheme

In this scheme, if a transaction requests a resource which is already held with a conflicting lock by another transaction, then the DBMS simply checks the timestamps of both transactions. It allows the older transaction to wait until the resource is available for execution.

Let's assume there are two transactions Ti and Tj, and let TS(T) be the timestamp of any transaction T. If Tj holds a lock on some resource and Ti is requesting that resource, then the following actions are performed by the DBMS:

1. Check if TS(Ti) < TS(Tj). If Ti is the older transaction and Tj has held some resource, then Ti is allowed to wait until the data item is available for execution. That means that if an older transaction is waiting for a resource which is locked by a younger transaction, the older transaction is allowed to wait for the resource until it is available.
2. Check if TS(Ti) < TS(Tj). If Ti is the older transaction and has held some resource, and Tj is waiting for it, then Tj is killed and restarted later with a random delay but with the same timestamp.

4.6.3.2 Wound-wait scheme

In the wound-wait scheme, if an older transaction requests a resource which is held by a younger transaction, then the older transaction forces the younger one to be killed and to release the resource. After a minute delay, the younger transaction is restarted, but with the same timestamp. If the older transaction has held a resource which is requested by the younger transaction, then the younger transaction is asked to wait until the older one releases it.
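The decision logic of the two schemes can be written out as a pair of small functions. This is a sketch under the assumption that a lower timestamp means an older transaction (function names are our own):

```python
# Wait-die vs wound-wait decisions; lower timestamp = older transaction.
def wait_die(ts_requester, ts_holder):
    # older requester waits; younger requester dies (restarts, same timestamp)
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # older requester wounds (kills) the younger holder; younger requester waits
    return "wound holder" if ts_requester < ts_holder else "wait"

print(wait_die(7, 9))     # older asks younger  -> 'wait'
print(wait_die(9, 7))     # younger asks older  -> 'die'
print(wound_wait(7, 9))   # older asks younger  -> 'wound holder'
print(wound_wait(9, 7))   # younger asks older  -> 'wait'
```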
4.7 Failure Classification

To find where a problem has occurred, we generalize failures into the following categories:

1. Transaction failure
2. System crash
3. Disk failure

4.7.1 Transaction failure

A transaction failure occurs when a transaction fails to execute, or when it reaches a point from which it can't go any further. If a transaction or process fails partway through, this is called a transaction failure.
Reasons for a transaction failure could be:

1. Logical errors: If a transaction cannot complete due to some code error or an internal error condition, then a logical error occurs.
2. Syntax error: It occurs when the DBMS itself terminates an active transaction because the database system is not able to execute it. For example, the system aborts an active transaction in case of deadlock or resource unavailability.

4.7.2 System Crash

 A system failure can occur due to a power failure or another hardware or software failure. Example: an operating system error.
 Fail-stop assumption: In a system crash, non-volatile storage is assumed not to be corrupted.

4.7.3 Disk Failure

 It occurs where hard-disk drives or storage drives used to fail frequently. It was a common problem in the early days of technology evolution.
 Disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability of the disk, or any other failure which destroys all or part of disk storage.

4.8 Log-Based Recovery

The log is a sequence of records. The log of each transaction is maintained in some stable storage so that if any failure occurs, the database can be recovered from it. If any operation is performed on the database, it will be recorded in the log. The process of storing the logs should be done before the actual transaction is applied to the database.

Let's assume there is a transaction to modify the City of a student. The following logs are written for this transaction:

 When the transaction is initiated, it writes a 'start' log.

<Tn, Start>

 When the transaction modifies the City from 'Noida' to 'Bangalore', another log is written to the file.

<Tn, City, 'Noida', 'Bangalore'>

 When the transaction is finished, it writes another log to indicate the end of the transaction.

<Tn, Commit>
There are two approaches to modifying the database:

1. Deferred database modification:

 The deferred modification technique applies if the transaction does not modify the database until it has committed.
 In this method, all the logs are created and stored in stable storage, and the database is updated when a transaction commits.

2. Immediate database modification:

 The immediate modification technique applies if database modifications occur while the transaction is still active.
 In this technique, the database is modified immediately after every operation. It follows an actual database modification.

4.8.1 Recovery using Log records

When the system crashes, it consults the log to find which transactions need to be undone and which need to be redone:

1. If the log contains the records <Ti, Start> and <Ti, Commit>, then the transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>, then the transaction Ti needs to be undone.
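These two rules can be sketched as a simple log scan. The log records below are illustrative tuples, not a real log format:

```python
# Sort transactions into redo and undo lists from a log, per the rules above.
log = [("T1", "Start"), ("T1", "Commit"),
       ("T2", "Start"), ("T2", "Abort"),
       ("T3", "Start")]                      # T3 has neither commit nor abort

started  = {t for t, rec in log if rec == "Start"}
finished = {t: rec for t, rec in log if rec in ("Commit", "Abort")}

redo_list = [t for t in started if finished.get(t) == "Commit"]
undo_list = [t for t in started if t not in finished]
print(redo_list, undo_list)                  # ['T1'] ['T3']
```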
4.8.2 Checkpoint

The checkpoint is a mechanism where all the previous logs are removed from the system and permanently stored on the storage disk. The checkpoint is like a bookmark. During the execution of transactions, such checkpoints are marked, and as the transaction executes, log files are created from the steps of the transaction. When it reaches the checkpoint, the transaction's updates are written into the database, and up to that point the entire log file is removed. The log file is then updated with the new steps of the transaction until the next checkpoint, and so on.

The checkpoint is used to declare a point before which the DBMS was in the consistent state and all transactions were committed.

4.8.2.1 Recovery using Checkpoint

A recovery system recovers the database from a failure in the following manner:

Figure 28 Recovery system
The recovery system reads the log files from the end to the start: from T4 to T1.

The recovery system maintains two lists: a redo-list and an undo-list.

A transaction is put into the redo list if the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. For example: in the log file, transactions T2 and T3 will have both <Tn, Start> and <Tn, Commit>. The T1 transaction will have only <Tn, Commit> in the log file, because the transaction committed after the checkpoint was crossed. Hence T1, T2 and T3 are put into the redo list.

A transaction is put into the undo list if the recovery system sees a log with <Tn, Start> but no commit or abort log is found. All the transactions in the undo-list are undone, and their logs are removed. For example: transaction T4 will have <Tn, Start>, so T4 is put into the undo list, since this transaction is not yet complete and failed midway.

CHAPTER FIVE

Concurrency Control Protocols

Concurrency control protocols ensure the atomicity, consistency, isolation, durability and serializability of the concurrent execution of database transactions. These protocols are categorized as:

 Lock Based Concurrency Control Protocol
 Time Stamp Concurrency Control Protocol
 Validation Based Concurrency Control Protocol

We will understand and discuss each protocol one by one in our next sections.

5.1 Lock-Based Protocol

In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of lock:

1. Shared lock:

 It is also known as a read-only lock. With a shared lock, the data item can only be read by the transaction.
 It can be shared between transactions, because a transaction holding a shared lock can't update the data item.
2. Exclusive lock:

 With an exclusive lock, the data item can be both read and written by the transaction.
 This lock is exclusive: while it is held, multiple transactions cannot modify the same data simultaneously.

There are four types of lock protocols available:

1. Simplistic lock protocol

It is the simplest way of locking data during a transaction. Simplistic lock-based protocols allow all transactions to get a lock on the data before an insert, delete or update on it. The data item is unlocked after the transaction completes.

2. Pre-claiming Lock Protocol

 Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.
 Before initiating execution of the transaction, it requests the DBMS for locks on all those data items.
 If all the locks are granted, this protocol allows the transaction to begin. When the transaction is completed, it releases all the locks.
 If all the locks are not granted, this protocol makes the transaction roll back and wait until all the locks are granted.

Figure 29 Two way locking system

3. Two-phase locking (2PL)

The two-phase locking protocol divides the execution phase of the transaction into three parts. In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires. In the second part, the transaction acquires all the locks. The third phase starts as soon as the transaction releases its first lock. In the third phase, the transaction cannot demand any new locks; it only releases the acquired locks.
Figure 30

There are two phases of 2PL:

Growing phase: In the growing phase, a new lock on a data item may be acquired by the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.

In the example below, if lock conversion is allowed, then the following conversions can happen:

1. Upgrading of a lock (from S(a) to X(a)) is allowed in the growing phase.
2. Downgrading of a lock (from X(a) to S(a)) must be done in the shrinking phase.

Example:

Figure 31 Upgrading and downgrading lock
The following shows how unlocking and locking work with 2PL.

Transaction T1:

 Growing phase: from step 1-3
 Shrinking phase: from step 5-7
 Lock point: at 3

Transaction T2:

 Growing phase: from step 2-6
 Shrinking phase: from step 8-9
 Lock point: at 6

4. Strict Two-phase locking (Strict-2PL)

 The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the locks, the transaction continues to execute normally.
 The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock after using it.
 Strict-2PL waits until the whole transaction commits, and then it releases all the locks at once.
 The Strict-2PL protocol does not have a shrinking phase of lock release.

Figure 32

Unlike basic 2PL, it does not suffer from cascading aborts.

5.2 Timestamp Ordering Protocol

The timestamp ordering protocol is used to order the transactions based on their timestamps. The order of the transactions is nothing but the ascending order of transaction creation. The priority of the older transaction is higher; that's why it executes first. To determine the timestamp of a transaction, this protocol uses system time or a logical counter.

The lock-based protocol is used to manage the order between conflicting pairs among transactions at execution time. But timestamp-based protocols start working as soon as a transaction is created.
Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the system at time 007 and transaction T2 entered at time 009. T1 has the higher priority, so it executes first, as it entered the system first. The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations on each data item.

The basic timestamp ordering protocol works as follows:

1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:

 If W_TS(X) > TS(Ti), then the operation is rejected.
 If W_TS(X) <= TS(Ti), then the operation is executed.
 Timestamps of all the data items are updated.

2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:

 If TS(Ti) < R_TS(X), then the operation is rejected.
 If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise, the operation is executed.

Where:

TS(Ti) denotes the timestamp of transaction Ti.
R_TS(X) denotes the read time-stamp of data item X.
W_TS(X) denotes the write time-stamp of data item X.
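The two tests can be written out directly. The following sketch is illustrative (the R_TS/W_TS tables and the timestamps 007 and 009 follow the running example above):

```python
# Basic timestamp-ordering checks; R_TS/W_TS hold the largest timestamps
# that have read/written each item so far.
R_TS, W_TS = {"X": 0}, {"X": 0}

def read_item(ts, x):
    if W_TS[x] > ts:                      # a younger transaction already wrote X
        return "rejected"
    R_TS[x] = max(R_TS[x], ts)
    return "executed"

def write_item(ts, x):
    if ts < R_TS[x] or ts < W_TS[x]:      # a younger transaction already used X
        return "rejected, rollback"
    W_TS[x] = ts
    return "executed"

print(read_item(7, "X"))      # T1 (ts 007) reads X        -> executed
print(write_item(9, "X"))     # T2 (ts 009) writes X       -> executed
print(write_item(7, "X"))     # T1 writes after T2's write -> rejected, rollback
```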
5.2.1 Advantages and Disadvantages of TO protocol

 The TO protocol ensures serializability, since the precedence graph is as follows:

Figure 33 Precedence graph for TS ordering

 The TO protocol ensures freedom from deadlock, which means no transaction ever waits.
 But the schedule may not be recoverable and may not even be cascade-free.

5.3 Validation Based Protocol

The validation phase is also known as the optimistic concurrency control technique. In the validation based protocol, the transaction is executed in the following three phases:
I. Read phase: In this phase, transaction T is read and executed. It is used to read the values of the various data items and store them in temporary local variables. It can perform all the write operations on temporary variables without updating the actual database.
II. Validation phase: In this phase, the temporary variable values are validated against the actual data to see whether they violate serializability.
III. Write phase: If the validation of the transaction succeeds, the temporary results are written to the database or system; otherwise, the transaction is rolled back.

Here, each phase has the following different timestamps:

Start(Ti): It contains the time when Ti started its execution.

Validation(Ti): It contains the time when Ti finishes its read phase and starts its validation phase.

Finish(Ti): It contains the time when Ti finishes its write phase.

I. This protocol determines the timestamp of the transaction for serialization using the timestamp of the validation phase, as that is the actual phase which determines whether the transaction will commit or roll back.
II. Hence, TS(T) = Validation(T).
III. Serializability is determined during the validation process. It can't be decided in advance.
IV. While executing the transaction, this ensures a greater degree of concurrency and also a smaller number of conflicts.
V. Thus it leads to transactions with a smaller number of rollbacks.

5.4 Multiple Granularity

Granularity: It is the size of the data item allowed to be locked.

Multiple granularity can be defined as hierarchically breaking up the database into blocks which can be locked. The multiple granularity protocol enhances concurrency and reduces lock overhead. It keeps track of what to lock and how to lock, and makes it easy to decide whether to lock or unlock a data item. This type of hierarchy can be graphically represented as a tree.

For example, consider a tree which has four levels of nodes:

 The first (highest) level shows the entire database.
 The second level represents nodes of type area. The higher-level database consists of exactly these areas.
 The areas consist of child nodes which are known as files. No file can be present in more than one area.
 Finally, each file contains child nodes known as records. The file has exactly those records that are its child nodes. No record is present in more than one file.
 Hence, the levels of the tree, starting from the top, are as follows:

1. Database, 2. Area, 3. File, 4. Record

Figure 34 Multigranularity tree hierarchy

In this example, the highest level shows the entire database. The levels below are area, file and record.

There are three additional lock modes with multiple granularity:

5.5 Intention Mode Lock

Intention-shared (IS): It allows explicit locking at a lower level of the tree, but only with shared locks.

Intention-exclusive (IX): It allows explicit locking at a lower level with exclusive or shared locks.

Shared & intention-exclusive (SIX): With this lock, the node is locked in shared mode, and some lower node is locked in exclusive mode by the same transaction.

Compatibility matrix with intention lock modes: The table below describes the compatibility matrix for these lock modes:

Figure 35 Compatibility Matrix with Intention Lock Modes
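Since the matrix itself appears only as a figure, here is the standard compatibility matrix for the five modes written out as a lookup table. This is a sketch (the table follows the usual textbook matrix; the can_grant helper is our own name):

```python
# Standard lock-compatibility matrix for IS, IX, S, SIX and X modes.
# True means a request in that mode is compatible with a held lock.
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested, held_modes):
    """Grant a lock only if it is compatible with every lock already held."""
    return all(COMPATIBLE[requested][h] for h in held_modes)

print(can_grant("IX", ["IS"]))   # True  - intention locks coexist
print(can_grant("X",  ["IS"]))   # False - exclusive conflicts with everything
```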
It uses the intention lock modes to ensure serializability. It requires that a transaction attempting to lock a node must follow these protocols:

 Transaction T1 should follow the lock-compatibility matrix.
 Transaction T1 firstly locks the root of the tree. It can lock it in any mode.
 If T1 currently has the parent of a node locked in either IX or IS mode, then T1 may lock that node in S or IS mode only.
 If T1 currently has the parent of a node locked in either IX or SIX mode, then T1 may lock that node in X, SIX or IX mode only.
 T1 can lock a node only if it has not previously unlocked any node.
 T1 can unlock a node only if it currently has none of the children of that node locked.

Observe that in multiple granularity, locks are acquired in top-down order, and locks must be released in bottom-up order.

 If transaction T1 reads record Ra9 in file Fa, then transaction T1 needs to lock the database, area A1 and file Fa in IX mode. Finally, it needs to lock Ra9 in S mode.
 If transaction T2 modifies record Ra9 in file Fa, then it can do so after locking the database, area A1 and file Fa in IX mode. Finally, it needs to lock Ra9 in X mode.
 If transaction T3 reads all the records in file Fa, then transaction T3 needs to lock the database and area A1 in IS mode. At last, it needs to lock Fa in S mode.
 If transaction T4 reads the entire database, then T4 needs to lock the database in S mode.

Recovery with Concurrent Transactions

 Whenever more than one transaction is being executed, the logs are interleaved. During recovery, it would become difficult for the recovery system to backtrack all the logs and then start recovering.
 To ease this situation, the 'checkpoint' concept is used by most DBMSs.

5.6 Database Buffer

In our previous section, we learned about various types of data storage. The goal of a database system is that a minimum number of transfers should take place between the disk and memory. It can reduce the number of disk accesses by keeping as many blocks as possible in main memory. Then, when the user wants the data, it can be found directly in main memory, with no need to access the disk. However, it is difficult to keep so many blocks in main memory, so we need to manage the allocation of the space available in main memory for the storage of blocks.

A database buffer is a temporary storage area in main memory. It allows data to be stored temporarily when moving from one place to another. A database buffer stores a copy of disk blocks, but the version of a block on the disk may be older than the version in the buffer.
5.6.1 Buffer Manager

A buffer manager is responsible for allocating space in the buffer in order to store data in it. If a user requests a particular block and the block is available in the buffer, the buffer manager provides the block's address in main memory. If the block is not available in the buffer, the buffer manager allocates the block in the buffer. If free space is not available, it throws out some existing blocks from the buffer to allocate the required space for the new block.

The blocks which are thrown out are written back to the disk only if they have been modified since they were most recently written to the disk.

If the user requests such a thrown-out block, the buffer manager reads the requested block from the disk into the buffer and then passes the address of the requested block to the user in main memory.

However, the internal actions of the buffer manager are not visible to the programs that make disk-block requests. The buffer manager is just like a virtual machine.

For serving the database system in the best possible way, the buffer manager uses the following methods:

I. Buffer Replacement Strategy: If no space is left in the buffer, it is required to remove an existing block from the buffer before allocating the new one. Various operating systems use the LRU (least recently used) scheme. In LRU, the block that was least recently used is removed from the buffer and written back to the disk. This type of replacement scheme is known as a buffer replacement strategy.
II. Pinned Blocks: If the user wants to be able to recover the database system from crashes, it is essential to restrict the times when a block may be written back to the disk. In fact, most recovery systems do not allow blocks to be written to the disk while an update on the block is in progress. Blocks that are not allowed to be written to the disk are known as pinned blocks. Luckily, many operating systems do not support pinned blocks.
III. Forced Output of Blocks: In some cases, it becomes necessary to write a block back to the disk even though the space occupied by the block in the buffer is not required. When such a write is required, it is known as the forced output of a block. This is because the data stored in the buffer may be lost in a system crash, whereas data stored on the disk usually is not affected by a system crash.
5.6.3 Buffer Replacement Strategies

Buffer replacement is a method of replacing an existing block in the buffer with a new block. Doing so helps to reduce the number of disk accesses. The buffer manager uses the following strategies for buffer replacement:

5.6.3.1 Least Recently Used (LRU) Strategy

According to LRU, the data or process which has the lowest score of recent use is replaced with the new process or data. In the block-replacement strategy, if there is a need to replace an existing block, the least recently referenced block gets replaced. However, the replacement scheme of a database system is different from that of an operating system. An operating system depends on the past for predicting the future, but a database system is capable of predicting the future plan accurately. Generally, a user follows various steps when making a block request to the database system, and these steps enable the database system to determine the requested block in advance.
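A minimal LRU buffer can be sketched with an ordered dictionary. This is illustrative only (a real buffer manager tracks dirty bits and writes evicted blocks back to disk; here eviction is just printed):

```python
# Minimal LRU buffer: every access moves a block to the "most recent" end,
# and the least recently used block is evicted when the buffer is full.
from collections import OrderedDict

class LRUBuffer:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.blocks = OrderedDict()            # block_id -> data

    def access(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
        else:
            if len(self.blocks) >= self.capacity:
                victim, _ = self.blocks.popitem(last=False)  # evict LRU block
                print("evict", victim)         # a real manager writes it back if dirty
            self.blocks[block_id] = f"data({block_id})"
        return self.blocks[block_id]

buf = LRUBuffer()
for b in ["B1", "B2", "B3", "B1", "B4"]:       # B2 becomes least recently used
    buf.access(b)                              # prints: evict B2
```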
5.6.3.2 Toss-immediate Strategy requests, so to preserve the data consistency, it may need to
delay some requests. During this, if the buffer manager gets
According to this strategy, when the space occupied by a to know the information of the requested blocks which are
specific block is freed, the buffer manager should get delayed, then it may use that information for altering its
instructed before the processing of the final result. It will strategies. However, a buffer manager can retain the request
help the database system to know that space is going to be of the non-delayed blocks with the delayed blocks.
vacant, and a new block can be allocated to that place. This
type of block-replacement strategy is known as the Toss-
immediate strategy.

For example, consider two relations R1 and R2 stored in


separate files. If R1 once processed is not needed again. So,

116 117
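These replacement policies are simple to express in code. The sketch below is a minimal, illustrative buffer pool; it is not taken from any particular DBMS, and the class and method names are invented for this example. It evicts the least recently used unpinned block, writes the victim back to disk on eviction, and includes a toss-immediate style release; changing the scan direction, as noted in the comment, turns LRU into MRU.

    from collections import OrderedDict

    class BufferPool:
        """Minimal LRU buffer pool sketch (illustrative only)."""

        def __init__(self, capacity, disk):
            self.capacity = capacity     # number of in-memory frames
            self.disk = disk             # stand-in for disk: block_id -> data
            self.frames = OrderedDict()  # cached blocks, least recently used first
            self.pinned = set()          # blocks that must not be evicted

        def fetch(self, block_id):
            if block_id in self.frames:
                self.frames.move_to_end(block_id)   # mark as most recently used
                return self.frames[block_id]
            if len(self.frames) >= self.capacity:
                self.evict()
            self.frames[block_id] = self.disk[block_id]   # read block from disk
            return self.frames[block_id]

        def evict(self):
            # LRU: pick the least recently used unpinned block.
            # MRU would scan reversed(self.frames) instead.
            victim = next((b for b in self.frames if b not in self.pinned), None)
            if victim is None:
                raise RuntimeError("all buffer frames are pinned")
            self.disk[victim] = self.frames.pop(victim)   # write back, free frame

        def toss(self, block_id):
            # Toss-immediate: free a block as soon as its processing finishes,
            # without waiting for the replacement policy to choose it.
            if block_id in self.frames and block_id not in self.pinned:
                self.disk[block_id] = self.frames.pop(block_id)

For example, with a capacity of two frames, fetching blocks 0, 1 and then 2 forces block 0 (the least recently used) back to disk; under MRU, block 1 would be evicted instead.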
CHAPTER SIX
Data Analytics and Decision Support System

6.1 Data Analytics

Data analytics (DA) is the process of examining data sets to find trends and draw conclusions about the information they contain. Increasingly, data analytics is done with the aid of specialized systems and software. Data analytics technologies and techniques are widely used in commercial industries to enable organizations to make more-informed business decisions. Scientists and researchers also use analytics tools to verify or disprove scientific models, theories and hypotheses.

As a term, data analytics predominantly refers to an assortment of applications, from basic business intelligence (BI), reporting and online analytical processing (OLAP) to various forms of advanced analytics. In that sense, it's similar in nature to business analytics, another umbrella term for approaches to analyzing data. The difference is that the latter is oriented to business uses, while data analytics has a broader focus. The expansive view of the term isn't universal, though: in some cases, people use data analytics specifically to mean advanced analytics, treating BI as a separate category.

Data analytics initiatives can help businesses increase revenue, improve operational efficiency, optimize marketing campaigns and bolster customer service efforts. Analytics also enable organizations to respond quickly to emerging market trends and gain a competitive edge over business rivals. Depending on the application, the data that's analyzed can consist of either historical records or new information that has been processed for real-time analytics. In addition, it can come from a mix of internal systems and external data sources.

6.1.1 Data Analytics Process

Data analytics applications involve more than just analyzing data, particularly on advanced analytics projects. Much of the required work takes place upfront, in collecting, integrating and preparing data and then developing, testing and revising analytical models to ensure that they produce accurate results. In addition to data scientists and other data analysts, analytics teams often include data engineers, who create data pipelines and help prepare data sets for analysis.

The analytics process starts with data collection. Data scientists identify the information they need for a particular
analytics application, and then work on their own or with data engineers and the IT staff to assemble it for use. Data from different source systems may need to be combined via data integration routines, transformed into a common format and loaded into an analytics system, such as a Hadoop cluster, NoSQL database or data warehouse (Craig, 2023).
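As a small illustration of this upfront integration work, the following sketch combines two source extracts, transforms them into a common format and loads the prepared set into one analytics file using the pandas library. The file names and column names are hypothetical.

    import pandas as pd

    # Extract: two source systems with different conventions (hypothetical files).
    sales = pd.read_csv("sales_system.csv")   # columns: cust_id, amount, date
    crm = pd.read_csv("crm_export.csv")       # columns: CustomerID, Region

    # Transform: rename and retype into a common format.
    crm = crm.rename(columns={"CustomerID": "cust_id", "Region": "region"})
    sales["date"] = pd.to_datetime(sales["date"])

    # Integrate: join the sources on the shared customer key.
    combined = sales.merge(crm, on="cust_id", how="left")

    # Load: write the prepared data set to the analytics store
    # (a local file here; a warehouse, Hadoop or NoSQL target in practice).
    combined.to_csv("sales_prepared.csv", index=False)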
6.1.2 Types of Data Analytics

Data analytics is a broad field. There are four primary types of data analytics: descriptive, diagnostic, predictive and prescriptive analytics. Each type has a different goal and a different place in the data analysis process. These are also the primary data analytics applications in business.

1. Descriptive analytics helps answer questions about what happened. These techniques summarize large datasets to describe outcomes to stakeholders. By developing key performance indicators (KPIs), these strategies can help track successes or failures. Metrics such as return on investment (ROI) are used in many industries, and specialized metrics are developed to track performance in specific industries. This process requires the collection of relevant data, processing of the data, data analysis and data visualization, and it provides essential insight into past performance.

2. Diagnostic analytics helps answer questions about why things happened. These techniques supplement more basic descriptive analytics: they take the findings from descriptive analytics and dig deeper to find the cause. The performance indicators are further investigated to discover why they got better or worse. This generally occurs in three steps (a short sketch of these steps follows this list). First, identify anomalies in the data; these may be unexpected changes in a metric or in a particular market. Second, collect the data related to these anomalies. Third, use statistical techniques to find relationships and trends that explain these anomalies.

3. Predictive analytics helps answer questions about what will happen in the future. These techniques use historical data to identify trends and determine whether they are likely to recur. Predictive analytical tools provide valuable insight into what may happen, and their techniques include a variety of statistical and machine learning methods, such as neural networks, decision trees, and regression.

4. Prescriptive analytics helps answer questions about what should be done. By using insights from predictive analytics, data-driven decisions can be made, allowing businesses to make informed decisions in the face of uncertainty. Prescriptive analytics techniques rely on machine learning strategies that can find patterns in large datasets. By analyzing past decisions and events, the likelihood of different outcomes can be estimated.
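The three diagnostic steps above can be sketched in a few lines of Python. The data set below is made up; anomalies are flagged with a simple z-score test, and a candidate explanation is checked with a correlation.

    import pandas as pd

    # Made-up daily metric with one obvious anomaly on day 5.
    df = pd.DataFrame({
        "sales":    [100, 102, 98, 101, 55, 99, 103],
        "ad_spend": [10, 10, 10, 10, 2, 10, 10],
    })

    # Step 1: identify anomalies (values more than 2 standard deviations out).
    z = (df["sales"] - df["sales"].mean()) / df["sales"].std()
    anomalies = df[z.abs() > 2]

    # Step 2: collect the data related to the anomalous days.
    related = df.loc[anomalies.index]

    # Step 3: apply a statistical technique to look for an explanation.
    print(related)
    print("sales/ad_spend correlation:", round(df["sales"].corr(df["ad_spend"]), 2))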
6.1.3 Data Analytics Techniques

There are several different analytical methods and techniques data analysts can use to process data and extract information. Some of the most popular methods are listed below.

• Regression analysis entails analyzing the relationship between variables to determine how a change in one may affect a change in another.
• Factor analysis entails taking a large data set and shrinking it to a smaller data set. The goal of this maneuver is to attempt to discover hidden trends that would otherwise have been more difficult to see.
• Cohort analysis is the process of breaking a data set into groups of similar data, often broken into a customer demographic. This allows data analysts and other users of data analytics to dive further into the numbers relating to a specific subset of data.
• Monte Carlo simulations model the probability of different outcomes happening. Often used for risk mitigation and loss prevention, these simulations incorporate multiple values and variables and often have greater forecasting capabilities than other data analytics approaches (a small simulation sketch follows this list).
• Time series analysis tracks data over time and solidifies the relationship between the value of a data point and the occurrence of the data point. This data analysis technique is usually used to spot cyclical trends or to project financial forecasts.
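As promised above, here is a minimal Monte Carlo sketch. The distributions and the break-even figure are invented purely for illustration: the simulation draws many scenarios and estimates the probability that revenue falls short.

    import random

    TRIALS = 100_000
    BREAK_EVEN = 90_000   # assumed break-even revenue

    shortfalls = 0
    for _ in range(TRIALS):
        units = random.gauss(1_000, 150)   # demand: assumed normal distribution
        price = random.uniform(85, 110)    # unit price: assumed uniform
        if units * price < BREAK_EVEN:
            shortfalls += 1

    print(f"Estimated P(revenue < break-even) = {shortfalls / TRIALS:.2%}")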
6.1.4 Data Analytics Tools

In addition to a broad range of mathematical and statistical approaches to crunching numbers, data analytics has rapidly evolved in technological capabilities. Today, data analysts have a broad range of software tools to help acquire data, store information, process data, and report findings.

Data analytics has always had loose ties to spreadsheets and Microsoft Excel. Now, data analysts also often interact with programming languages to transform and manipulate databases. Open-source languages such as Python are often utilized, and more specific tools for data analytics, like R, can be used for statistical analysis or graphical modeling.

Data analysts also have help when reporting or communicating findings. Both Tableau and Power BI are data visualization and analysis tools that compile information, perform data analytics, and distribute results via dashboards and reports (Jake, 2023).

6.2 Data Mining

Data Mining is the process of extracting information from huge sets of data to identify patterns, trends, and useful data that allow a business to make data-driven decisions. In other words, Data Mining is the process of investigating hidden patterns in information from various perspectives and categorizing them into useful data, which is collected and assembled in particular areas such as data warehouses so that it can be analyzed efficiently with data mining algorithms, support decision making and other data requirements, and eventually cut costs and generate revenue.

Data mining is the act of automatically searching large stores of information for trends and patterns that go beyond simple analysis procedures. Data mining utilizes complex mathematical algorithms to segment data and evaluate the probability of future events. Data Mining is also called Knowledge Discovery in Databases (KDD).
Data Mining is a process used by organizations to extract specific data from huge databases to solve business problems. It primarily turns raw data into useful information.

Data mining can be performed on the following types of data:

6.2.1 Relational Database

A relational database is a collection of multiple data sets formally organized by tables, records, and columns, from which data can be accessed in various ways without having to reorganize the database tables. Tables convey and share information, which facilitates data searchability, reporting, and organization.

6.2.2 Data Warehouses

A data warehouse is the technology that collects data from various sources within the organization to provide meaningful business insights. The huge amount of data comes from multiple places, such as Marketing and Finance. The extracted data is utilized for analytical purposes and helps in decision-making for a business organization. The data warehouse is designed for the analysis of data rather than for transaction processing.

6.2.3 Data Repositories

The term data repository generally refers to a destination for data storage. However, many IT professionals use the term more specifically to refer to a particular kind of setup within an IT structure, for example, a group of databases where an organization has kept various kinds of information.

6.2.4 Object-Relational Database

A combination of an object-oriented database model and a relational database model is called an object-relational model. It supports classes, objects, inheritance, etc. One of the primary objectives of the object-relational data model is to close the gap between the relational database and the object-oriented model practices frequently utilized in many programming languages, for example, C++, Java, and C#.

6.2.5 Transactional Database

A transactional database refers to a database management system (DBMS) that has the ability to undo a database transaction if it is not performed appropriately. Even though this was a unique capability long ago, today most relational database systems support transactional database activities.
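The undo capability described above can be demonstrated with Python's built-in sqlite3 module; the accounts table and the simulated failure are hypothetical.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO accounts VALUES (1, 500.0), (2, 200.0)")
    conn.commit()

    try:
        # A transfer is two updates that must succeed or fail together.
        conn.execute("UPDATE accounts SET balance = balance - 300 WHERE id = 1")
        raise RuntimeError("simulated failure before the matching credit")
    except RuntimeError:
        conn.rollback()   # undo the half-finished transfer

    print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
    # [(500.0,), (200.0,)] -- the debit was undone, so the data stays consistent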
6.2.6 Advantages of Data Mining

• The data mining technique enables organizations to obtain knowledge-based data.
• Data mining enables organizations to make lucrative modifications in operation and production.
• Compared with other statistical data applications, data mining is cost-efficient.
• Data mining helps the decision-making process of an organization.
• It facilitates the automated discovery of hidden patterns as well as the prediction of trends and behaviors.
• It can be introduced in new systems as well as existing platforms.
• It is a quick process that makes it easy for new users to analyze enormous amounts of data in a short time.

6.2.7 Disadvantages of Data Mining

• There is a probability that organizations may sell useful customer data to other organizations for money. It has been reported, for example, that American Express sold credit card purchase records of its customers to other organizations.
• Much data mining analytics software is difficult to operate and requires advance training to work with.
• Different data mining instruments operate in distinct ways due to the different algorithms used in their design. Therefore, the selection of the right data mining tool is a very challenging task.
• Data mining techniques are not precise, so they may lead to severe consequences in certain conditions.

6.2.8 Data Mining Applications

Data mining is primarily used by organizations with intense consumer demands, such as retail, communication, financial and marketing companies, to determine price, consumer preferences, product positioning, and impact on sales, customer satisfaction, and corporate profits. Data mining enables a retailer to use point-of-sale records of customer purchases to develop products and promotions that help the organization attract customers.

Figure 36 Data mining application areas

Data Mining in Healthcare:
Data mining in healthcare has excellent potential to improve the health system. It uses data and analytics for better insights and to identify best practices that will enhance health care services and reduce costs. Analysts use data mining approaches such as machine learning, multi-dimensional databases, data visualization, soft computing, and statistics. Data mining can be used to forecast patients in each category, and the resulting procedures help ensure that patients get intensive care at the right place and at the right time. Data mining also enables healthcare insurers to recognize fraud and abuse.

Data Mining in Market Basket Analysis:

Market basket analysis is a modeling method based on a hypothesis: if you buy a specific group of products, then you are more likely to buy another group of products. This technique may enable the retailer to understand the purchase behavior of a buyer. The data may assist the retailer in understanding the requirements of the buyer and altering the store's layout accordingly. Using different analytical comparisons, results can be compared between various stores and between customers in different demographic groups (a small sketch of the underlying support and confidence calculation follows).
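The sketch below quantifies the market basket hypothesis with the usual support and confidence measures, computed over a handful of made-up baskets.

    from collections import Counter
    from itertools import combinations

    baskets = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"milk", "eggs"},
        {"bread", "butter", "eggs"},
    ]

    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))

    # Rule "bread -> butter": support = P(bread and butter together),
    # confidence = P(butter | bread).
    together = pair_counts[("bread", "butter")]
    print("support:", together / len(baskets))             # 0.75
    print("confidence:", together / item_counts["bread"])  # 1.0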
Data Mining in Education:

Educational data mining is a newly emerging field concerned with developing techniques that extract knowledge from data generated in educational environments. Its objectives include predicting students' future learning behavior, studying the impact of educational support, and promoting learning science. An institution can use data mining to make precise decisions and also to predict students' results; with those results, the institution can concentrate on what to teach and how to teach it.

Data Mining in Manufacturing Engineering:

Knowledge is the best asset possessed by a manufacturing company. Data mining tools can be beneficial for finding patterns in a complex manufacturing process. Data mining can be used in system-level design to obtain the relationships between product architecture, product portfolio, and the data needs of customers. It can also be used to forecast the product development period, cost, and expectations, among other tasks.

Data Mining in CRM (Customer Relationship Management):

Customer Relationship Management (CRM) is all about acquiring and retaining customers, enhancing customer loyalty, and implementing customer-oriented strategies. To maintain a good relationship with its customers, a business organization needs to collect and analyze data. With data mining technologies, the collected data can be used for analytics.

Data Mining in Fraud Detection:

Billions of dollars are lost to fraud. Traditional methods of fraud detection are somewhat time-consuming and complex. Data mining provides meaningful patterns, turning data into information. An ideal fraud detection system should protect the data of all users. Supervised methods start from a collection of sample records, and these records are classified as fraudulent or non-
fraudulent. A model is constructed from this data, and the technique is then used to identify whether a document is fraudulent or not.
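A toy version of such a supervised model can be built with scikit-learn (assuming the library is installed); the features and labels below are made up for illustration.

    from sklearn.linear_model import LogisticRegression

    # Labelled sample records: [amount, transactions_in_last_hour], 1 = fraud.
    X = [[20, 1], [35, 2], [500, 9], [15, 1], [620, 12], [40, 1], [480, 8]]
    y = [0, 0, 1, 0, 1, 0, 1]

    model = LogisticRegression().fit(X, y)      # construct the model from samples
    print(model.predict([[550, 10], [25, 1]]))  # classify new records: expect [1 0]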
Data Mining in Lie Detection:

Apprehending a criminal is not a big deal, but bringing out the truth from him is a very challenging task. Law enforcement may use data mining techniques to investigate offenses, monitor suspected terrorist communications, and so on. This application also includes text mining, which seeks meaningful patterns in data that is usually unstructured text. The information collected from previous investigations is compared, and a model for lie detection is constructed.

Data Mining in Financial Banking:

The digitalization of the banking system generates an enormous amount of data with every new transaction. Data mining techniques can help bankers solve business-related problems in banking and finance by identifying trends, causalities, and correlations in business information and market costs that are not immediately evident to managers or executives, because the data volume is too large or is produced too rapidly for experts to screen. Managers may use these findings for better targeting, acquiring, retaining and segmenting of profitable customers.

6.2.9 Challenges of Implementation in Data Mining

Although data mining is very powerful, it faces many challenges during its execution. The challenges may relate to performance, data, methods, techniques, and so on. The process of data mining becomes effective when the challenges or problems are correctly recognized and adequately resolved.

Figure 37 Challenges in data mining

6.2.9.1 Incomplete and Noisy Data

Data mining is the process of extracting useful data from large volumes of data. Real-world data is heterogeneous, incomplete, and noisy, and data in huge quantities is often inaccurate or unreliable. These problems may occur due to errors in measuring instruments or due to human error. Suppose a retail chain collects the phone numbers of customers who spend more than $500, and the accounting employees put the information into their
system. The person entering the data may make a mistake in a digit of the phone number, which results in incorrect data. Some customers may not be willing to disclose their phone numbers at all, which results in incomplete data. The data could even get changed due to human or system error. All of these consequences (noisy and incomplete data) make data mining challenging.

6.2.9.2 Data Distribution

Real-world data is usually stored on various platforms in distributed computing environments: in databases, on individual systems, or even on the internet. Practically, it is quite a tough task to bring all the data into a centralized data repository, mainly due to organizational and technical concerns. For example, various regional offices may have their own servers to store their data, and it may not be feasible to store all the data from all the offices on a central server. Therefore, data mining requires the development of tools and algorithms that allow the mining of distributed data.

6.2.9.3 Complex Data

Real-world data is heterogeneous, and it could be multimedia data (including audio, video and images), complex data, spatial data, time series, and so on. Managing these various types of data and extracting useful information from them is a tough task. Most of the time, new technologies, new tools and new methodologies have to be developed to obtain specific information.

6.2.9.4 Performance

The performance of a data mining system relies primarily on the efficiency of the algorithms and techniques used. If the algorithms and techniques are not well designed, the efficiency of the entire data mining process is adversely affected.

6.2.9.5 Data Privacy and Security

Data mining can lead to serious issues in terms of data security, governance, and privacy. For example, if a retailer analyzes the details of purchased items, it reveals data about the buying habits and preferences of customers without their permission.

6.2.9.6 Data Visualization

In data mining, data visualization is a very important process because it is the primary method by which the output is shown to the user in a presentable way. The extracted data should convey the exact meaning of what it intends to express, but many times it is difficult to represent the information to the end user in a precise and simple way. Because both the input data and the output information can be complicated, very efficient and successful data visualization processes need to be implemented.

6.3 Big Data

Big data refers to the large, diverse sets of information that grow at ever-increasing rates. It encompasses the volume of information, the velocity or speed at which it is created and collected, and the variety or scope of the data points being covered (known as the "three v's" of big data). Big data often comes from data mining and arrives in multiple formats.

Big data can be categorized as unstructured or structured. Structured data consists of information already managed by
the organization in databases and spreadsheets; it is frequently numeric in nature. Unstructured data is information that is unorganized and does not fall into a predetermined model or format. It includes data gathered from social media sources, which help institutions gather information on customer needs.

Big data can be collected from publicly shared comments on social networks and websites, voluntarily gathered from personal electronics and apps, and through questionnaires, product purchases, and electronic check-ins. The presence of sensors and other inputs in smart devices allows data to be gathered across a broad spectrum of situations and circumstances. Big data is most often stored in computer databases and is analyzed using software specifically designed to handle large, complex data sets. Many software-as-a-service (SaaS) companies specialize in managing this type of complex data.

6.3.1 The Uses of Big Data

Data analysts look at the relationship between different types of data, such as demographic data and purchase history, to determine whether a correlation exists. Such assessments may be done in-house or externally by a third party that focuses on processing big data into digestible formats. Businesses often use the assessments of such experts to turn big data into actionable information.

Nearly every department in a company can utilize findings from data analysis, from human resources and technology to marketing and sales. The goal of big data is to increase the speed at which products get to market, to reduce the amount of time and resources required to gain market adoption and target audiences, and to ensure customers remain satisfied.

6.3.2 Advantages and Disadvantages of Big Data

The increase in the amount of data available presents both opportunities and problems. In general, having more data on customers (and potential customers) should allow companies to better tailor products and marketing efforts in order to create the highest level of satisfaction and repeat business. Companies that collect a large amount of data are provided with the opportunity to conduct deeper and richer analysis for the benefit of all stakeholders.

With the amount of personal data available on individuals today, it is crucial that companies take steps to protect this data, a topic which has become a hot debate in today's online world, particularly after the many data breaches companies have experienced in the last few years. While better analysis is a positive, big data can also create overload and noise, reducing its usefulness. Companies must handle larger volumes of data and determine which data represents signals compared to noise. Deciding what makes the data relevant becomes a key factor.

Furthermore, the nature and format of the data can require special handling before it is acted upon. Structured data, consisting of numeric values, can be easily stored and sorted. Unstructured data, such as emails, videos, and text documents, may require more sophisticated techniques to be applied before it becomes useful.

6.4 Decision Support System (DSS)

A decision support system (DSS) is a computerized program used to support determinations, judgments, and courses of action in an organization or a business. A DSS sifts through and analyzes massive amounts of data,
compiling comprehensive information that can be used to solve problems and support decision-making. Typical information used by a DSS includes target or projected revenue, sales figures from different time periods, and other inventory- or operations-related data.

A decision support system gathers and analyzes data, synthesizing it to produce comprehensive information reports. In this way, as an informational application, a DSS differs from an ordinary operations application, whose function is just to collect data.

A DSS can be completely computerized or powered by humans, and in some cases it may combine both. The ideal systems analyze information and actually make decisions for the user. At the very least, they allow human users to make more informed decisions at a quicker pace (Troy, 2022).

Attributes of a DSS

• Adaptability and flexibility
• High level of interactivity
• Ease of use
• Efficiency and effectiveness
• Complete control by decision-makers
• Ease of development
• Extendibility
• Support for modeling and analysis
• Support for data access
• Standalone, integrated, and Web-based

Characteristics of a DSS

• Support for decision-makers in semi-structured and unstructured problems.
• Support for managers at various managerial levels, ranging from top executives to line managers.
• Support for individuals and groups. Less structured problems often require the involvement of several individuals from different departments and organization levels.
• Support for interdependent or sequential decisions.
• Support for intelligence, design, choice, and implementation.
• Support for a variety of decision processes and styles.
• DSSs are adaptive over time.

Benefits of DSS

• Improves the efficiency and speed of decision-making activities.
• Increases the organization's control, competitiveness and capability for futuristic decision-making.
• Facilitates interpersonal communication.
• Encourages learning and training.
• Since it is mostly used for non-programmed decisions, it reveals new approaches and sets up new evidence for an unusual decision.
• Helps automate managerial processes.

Components of a DSS

Following are the components of a Decision Support System:
• Database Management System (DBMS) − To solve a problem, the necessary data may come from internal or external databases. In an organization, internal data are generated by systems such as TPS and MIS. External data come from a variety of sources such as newspapers, online data services, and databases (financial, marketing, human resources).
• Model Management System − It stores and accesses the models that managers use to make decisions. Such models are used for designing a manufacturing facility, analyzing the financial health of an organization, forecasting demand for a product or service, etc.
• Support Tools − Support tools such as online help, pull-down menus, user interfaces, graphical analysis, and error-correction mechanisms facilitate the user's interaction with the system.

Classification of DSS

There are several ways to classify DSS. Holsapple and Whinston classify DSS as follows:

• Text Oriented DSS − It contains textually represented information that could have a bearing on decisions. It allows documents to be electronically created, revised and viewed as needed.
• Database Oriented DSS − The database plays a major role here; it contains organized and highly structured data.
• Spreadsheet Oriented DSS − It contains information in spreadsheets that allows the user to create, view and modify procedural knowledge, and it also instructs the system to execute self-contained instructions. The most popular tools are Excel and Lotus 1-2-3.
• Solver Oriented DSS − It is based on a solver, which is an algorithm or procedure written for performing certain calculations, and on a particular program type.
• Rules Oriented DSS − It follows certain procedures adopted as rules; an expert system is an example.
• Compound DSS − It is built by using two or more of the five structures explained above.

6.4.1 Types of DSS

Following are some typical DSSs:

• Status Inquiry System − It helps in taking operational, management-level, or middle-level management decisions, for example, daily schedules of jobs to machines or machines to operators.
• Data Analysis System − It needs comparative analysis and makes use of a formula or an algorithm, for example, cash flow analysis, inventory analysis, etc.
• Information Analysis System − In this system, data is analyzed and an information report is generated, for example, sales analysis, accounts receivable systems, market analysis, etc.
• Accounting System − It keeps track of accounting- and finance-related information covering the major aspects of the business, for example, final accounts, accounts receivable, and accounts payable.
• Model Based System − Simulation models or optimization models used for decision-making are used infrequently and create general guidelines for operation or management (MIS Decision Support System, 2022).
6.5 Multimedia Database

A multimedia database is a collection of interrelated multimedia data that includes text, graphics (sketches, drawings), images, animations, video, audio, etc., and can contain vast amounts of multisource multimedia data. The framework that manages different types of multimedia data, which can be stored, delivered and utilized in different ways, is known as a multimedia database management system. There are three classes of multimedia database: static media, dynamic media and dimensional media.

6.5.1 Content of a Multimedia Database

1. Media data – The actual data representing an object.
2. Media format data – Information about the format of the media data after it goes through the acquisition, processing and encoding phases, such as sampling rate, resolution and encoding scheme.
3. Media keyword data – Keyword descriptions relating to the generation of the data, also known as content-descriptive data. Example: the date, time and place of a recording.
4. Media feature data – Content-dependent data such as the distribution of colors, kinds of texture and the different shapes present in the data.

6.5.2 Types of Multimedia Applications

1. Repository applications – A large amount of multimedia data as well as metadata (media format data, media keyword data, media feature data) is stored for retrieval purposes. Example: repositories of satellite images, engineering drawings or radiology scanned pictures.
2. Presentation applications – These involve the delivery of multimedia data subject to temporal constraints. Optimal viewing or listening requires the DBMS to deliver data at a certain rate, offering a quality of service above a certain threshold; here data is processed as it is delivered. Example: annotating video and audio data, real-time editing analysis.
3. Collaborative work using multimedia information – This involves executing a complex task by merging drawings and changing notifications. Example: an intelligent healthcare network.

6.5.3 Challenges of Multimedia Databases

1. Modelling – Work in this area can improve database management systems versus information retrieval techniques; documents constitute a specialized area and deserve special consideration.
2. Design – The conceptual, logical and physical design of multimedia databases has not yet been addressed fully, as performance and tuning issues at each level are far more complex; multimedia data comes in a variety of formats, such as JPEG, GIF, PNG and MPEG, which are not easy to convert from one form to another.
3. Storage – Storage of a multimedia database on any standard disk presents problems of representation, compression, mapping to device hierarchies, archiving and buffering during input-output operations. In a DBMS, a "BLOB" (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved (a small sketch follows this list).
4. Performance – For an application involving video playback or audio-video synchronization, physical limitations dominate. The use of parallel processing may alleviate some problems, but such techniques are not yet fully developed. Apart from this, multimedia databases consume a lot of processing time as well as bandwidth.
5. Queries and retrieval – For multimedia data like images, video and audio, accessing data through queries opens up many issues, such as efficient query formulation, query execution and optimization, which still need to be worked on.
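The BLOB facility mentioned under Storage (item 3 above) can be exercised from Python's built-in sqlite3 module; the byte string below stands in for real image or audio content.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, content BLOB)")

    payload = b"\x89PNG\r\n...raw image bytes..."   # untyped binary content
    conn.execute("INSERT INTO media (content) VALUES (?)", (payload,))

    # The bytes come back unchanged; their interpretation is left to the
    # application, exactly as the "untyped bitmap" description suggests.
    blob = conn.execute("SELECT content FROM media WHERE id = 1").fetchone()[0]
    assert blob == payload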
Figure 38 Contents of multimedia database
Source: Alex Onsman

6.5.4 Areas Where Multimedia Databases Are Used

• Documents and record management: industries and businesses that keep detailed records and a variety of documents. Example: insurance claim records.
• Knowledge dissemination: the multimedia database is a very effective tool for knowledge dissemination in terms of providing several resources. Example: electronic books.
• Education and training: computer-aided learning materials can be designed using multimedia sources, which are nowadays very popular sources of learning. Example: digital libraries.
• Marketing, advertising, retailing, entertainment and travel. Example: a virtual tour of cities.
• Real-time control and monitoring: coupled with active database technology, multimedia presentation of information can be a very effective means for monitoring and controlling complex tasks. Example: manufacturing operation control.

6.6 Mobile Database

A mobile database is a database that is transportable, portable, and physically separate or detached from the corporate database server, but that has the capability to communicate with those servers from remote sites, allowing the sharing of various kinds of data.

With mobile databases, users have access, on their laptop, PDA or other Internet access device, to the corporate data required for applications at remote sites.

6.6.1 Components of a Mobile Database Environment

The components of a mobile database environment include:

• a corporate database server and DBMS that deals with and stores the corporate data and provides corporate applications;
• a remote database and DBMS that usually manages and stores the mobile data and provides mobile applications;
• a mobile database platform that includes a laptop, PDA, or other Internet access devices;
• two-way communication links between the corporate and mobile DBMS.
Based on the particular requirements of mobile applications, in many cases the user of a mobile device may log on to a corporate database server and work with the data there, while in other cases the user may download data and work with it on the mobile device, or upload data captured at the remote site to the corporate database. The communication between the corporate and mobile databases is usually intermittent and is typically established for short periods of time at irregular intervals. Although unusual, some applications require direct communication between mobile databases. The two main issues associated with mobile databases are the management of the mobile database itself and the communication between the mobile and corporate databases. In the following list, we identify the requirements of mobile DBMSs.

The additional functionality required of a mobile DBMS includes the capability to:

• communicate with the centralized or primary database server through modes such as wireless or Internet access;
• replicate data from the centralized database server onto the mobile device;
• synchronize data between the centralized database server and the mobile device (a small synchronization sketch follows this list);
• capture data from a range of sources such as the Internet;
• manage data on the mobile device;
• analyze data on the mobile device;
• create customized and personalized mobile applications.
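The replication and synchronization items in this list can be sketched as a simple last-writer-wins merge between the corporate and mobile copies. The record format (a value plus a timestamp) is invented for the example; real mobile DBMSs use richer conflict-resolution policies.

    def synchronize(corporate, mobile):
        """Last-writer-wins merge: each copy maps record id -> (value, timestamp)."""
        merged = dict(corporate)
        for key, (value, ts) in mobile.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)   # the newer write wins
        corporate.clear()
        corporate.update(merged)
        mobile.clear()
        mobile.update(merged)   # both copies now agree
        return merged

    corporate = {"order-1": ("pending", 100), "order-2": ("shipped", 120)}
    mobile = {"order-1": ("confirmed", 130), "order-3": ("new", 125)}
    synchronize(corporate, mobile)
    print(corporate["order-1"])   # ('confirmed', 130) -- the mobile edit wins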
Reference

Alex Onsman (2020). Multimedia Databases. https://www.tutorialspoint.com/Multimedia Databases

Craig S. (2023). Data analytics (DA). https://www.techtarget.com/searchdatamanagement/definition/data-analytics

Himanshi_S. (2020). Fundamentals of Database Systems. GeeksforGeeks.

Jake F. (2023). Data Analytics: What It Is, How It's Used, and 4 Basic Techniques. https://www.investopedia.com/terms/d/data-analytics.asp

MIS - Decision Support System (2022). https://www.tutorialspoint.com/management_information_system/decision_support_system.htm

Ramez E., Shamkant B. Navathe (2020). Fundamentals of Database Systems. Pearson, India.

Troy Segal (2022). Decision Support System (DSS): What It Is and How Businesses Use Them.

Yu, C., Teri B. (2011). Multimedia database applications: issues and concerns for classroom teaching. 3 (1): arXiv:1102.5769. Bibcode:2011arXiv1102.5769Y