
Data Abstraction in Database Management System (DBMS)

Definition:
Data abstraction in DBMS is the process of hiding the complex, low-level details of how data is
stored and maintained, and presenting users with a simplified, relevant view of the data. This
makes it easier and more secure for users to access and interact with the database, as they only
see the necessary information and not the underlying implementation details [1] [2] [3] .

Importance of Data Abstraction


Simplifies user interaction with complex data structures.
Enhances data security by restricting access to sensitive or irrelevant details.
Supports data independence, allowing changes in storage or schema without affecting user
applications.

Makes database design and maintenance more efficient [1] [4] [5] .

Levels of Data Abstraction

Data abstraction in DBMS is implemented through three main layers:


Level | Description | Who Uses It | Example Scenario
Physical Level | Lowest level. Describes how data is actually stored on storage devices (files, blocks, indexes). | Database programmers, DBAs | How customer data is stored as bytes on disk
Logical Level | Middle level. Describes what data is stored and the relationships among data. | Database administrators | Customer table structure, relationships with orders
View Level | Highest level. Describes only part of the database relevant to a particular user or application. | End users, applications | User sees only their order history

1. Physical (Internal) Level


Focuses on the actual storage of data (files, indexes, blocks, bytes).
Describes data structures and access methods used by the DBMS.
Concerned with efficiency, storage allocation, and retrieval mechanisms.
Example: Storing customer records as blocks of bytes on a hard drive [2] [6] [7] .
2. Logical (Conceptual) Level
Describes what data is stored and the relationships among those data.
Provides a logical view using tables, schemas, and relationships (e.g., ER diagrams).
Hides physical storage details from users.
Used by database administrators to define the structure and constraints of the database.
Example: Defining a "Customer" entity with attributes like customer_id, name, and address,
and its relationship to "Order" [1] [4] [7] .
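To make this concrete, here is a minimal SQL sketch of the logical level (table and column names are illustrative, not taken from the cited sources):

CREATE TABLE Customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100),
    address     VARCHAR(200)
);

CREATE TABLE Orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    -- Relationship: each order belongs to one customer.
    FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
);

Note that neither definition says anything about files, blocks, or indexes; those details belong to the physical level.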

3. View (External) Level


The highest level of abstraction; concerned with how users interact with the database.
Provides different customized views of the database for different users or applications.
Users can only access the data relevant to their needs, not the entire database.
Multiple views can be created for different user roles (e.g., customer, manager).
Example: A customer sees only their order history, while a manager sees sales reports [1] [2]
[6] .
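A view-level restriction can be written directly in SQL. The following sketch is illustrative (the underlying Orders columns are assumed, not taken from the sources):

-- Expose only one customer's order history, hiding all other data.
CREATE VIEW my_order_history AS
SELECT order_id, order_date, total_amount
FROM Orders
WHERE customer_id = 101;  -- in a real system the id would come from the session/application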

Advantages of Data Abstraction

Data Independence: Changes at one level (e.g., physical storage) do not affect other levels (e.g., user views) [4] [5] .
Security: Sensitive or irrelevant data is hidden from unauthorized users [3] .
Simplified Interaction: Users interact with a simplified model rather than complex underlying structures [1] [8] .
Flexibility: The system can evolve (e.g., change database engines) without impacting front-end applications [9] .

Real-World Example
Suppose an e-commerce application uses MongoDB during development and switches to
MySQL in production. The data abstraction layer ensures that the front-end code does not need
to change, as it interacts with the data through a consistent interface, regardless of the
underlying database engine. Users and applications remain unaware of these backend
changes [9] .

Summary Table

Level | What is Hidden? | What is Exposed?
Physical | Storage details, file structures | Data blocks, storage methods
Logical | Physical storage implementation | Tables, schemas, relationships
View | Unnecessary data, other user data | User-specific data, reports

Key Points to Remember


Data abstraction is fundamental to DBMS, ensuring users only see what they need [1] [2] [3] .
There are three levels: physical, logical, and view.
It supports data independence, security, and simplified user experience.
Multiple views can be created for different users or applications.

Tip for Exams:


Draw a diagram showing the three levels of abstraction and their relationships. Use real-world
analogies (like shopping for clothes: you see size and color, not factory details) to explain the
concept clearly [3] .

Data Independence in Database Management System (DBMS)

Definition:
Data independence is the ability to modify the schema at one level of a database system without affecting the schema at the next higher level. This means changes in how data is stored or structured do not force changes in how data is accessed or used by applications and users [10] [11] [12] [13] .

Importance of Data Independence


Simplifies Database Maintenance: Changes can be made to storage or structure without
disrupting user applications or requiring extensive rewrites [10] [12] .
Enhances Flexibility: Databases can evolve to meet new requirements with minimal impact
on existing systems [10] [12] .
Supports Data Abstraction: By separating different levels of data representation, data
independence enables abstraction and security [11] [12] .

Types of Data Independence


There are two main types of data independence in DBMS:

Type | What Can Be Changed? | What Remains Unaffected? | Examples
Physical Data Independence | Physical storage, file organization | Logical schema, user views/applications | Changing storage devices, file structures
Logical Data Independence | Logical schema (tables, relationships) | User views, application programs | Adding/removing attributes, splitting tables

1. Physical Data Independence


Definition: The ability to change the physical storage or organization of data without
affecting the logical schema or application programs [11] [14] [12] [15] [16] .
Scope: Changes at the physical (internal) level do not impact the logical (conceptual) level.
Examples:
Switching from hard disk to SSD storage.
Changing file organization (e.g., from sequential to indexed).
Moving database files to a new location or device [14] [12] [15] .
Significance: Makes it easier to optimize performance or adopt new storage technology
without rewriting database logic or user applications [12] [15] .

2. Logical Data Independence

Definition: The ability to change the logical schema (structure of tables, relationships, constraints) without affecting user views or application programs [11] [14] [12] [13] [17] .
Scope: Changes at the logical (conceptual) level do not impact the external (view) level.
Examples:
Adding or removing attributes (columns) in a table.
Splitting or merging tables/entities.
Modifying relationships between entities [12] [17] .
Significance: Allows the database to evolve (e.g., add new features or data requirements) without requiring changes to existing applications or user interfaces [12] [17] .
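As a concrete sketch of logical data independence (illustrative schema, not from the cited sources), a view can insulate applications from a logical-schema change such as adding a column:

-- Applications query through a view rather than the base table.
CREATE VIEW student_summary AS
SELECT student_id, name FROM Student;

-- Logical schema change: add an attribute to the base table.
ALTER TABLE Student ADD email VARCHAR(100);

-- The view, and every application built on it, still works unchanged.
SELECT * FROM student_summary;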

Why Data Independence Matters


Reduces Complexity: Developers and users do not need to understand or adapt to
changes in storage or schema [10] [12] .
Improves Security: Sensitive or internal details can be hidden from users [11] [12] .
Facilitates Upgrades: Hardware or schema upgrades can be performed with minimal
disruption [14] [12] [17] .
Ensures Data Abstraction: Supports the three-level architecture (physical, logical, view) [11]
[12] .
Summary Table

Type | Level Affected | Level Unaffected | Example Change
Physical Data Independence | Physical/Internal | Logical/Conceptual, External | Changing storage device
Logical Data Independence | Logical/Conceptual | External/View | Adding a new attribute to a table

Key Points for Exams


Data independence is a core feature of modern DBMS, ensuring separation between how
data is stored, structured, and accessed [10] [11] [12] [18] .
There are two types: physical (storage changes do not affect schema) and logical (schema
changes do not affect user views/applications) [12] [13] [17] .
Both types make database systems more flexible, maintainable, and robust in the face of
change [12] [17] .

Tip:
Draw a diagram of the three-level architecture (Physical, Logical, View) and indicate where each type of data independence applies for clear understanding in exams.

Data Definition Language (DDL) in Database Management System

Definition:
Data Definition Language (DDL) is a subset of SQL used to define, modify, and remove the structure of database objects such as tables, schemas, indexes, and views. DDL commands do not manipulate the data itself but rather the schema or structure of the database [19] [20] [21] .

Key Features of DDL


Used to create, alter, and delete database objects (tables, indexes, views, sequences,
schemas, users, etc.) [19] [20] [21] .
Defines and manages the database schema, including specifying data types, field lengths,
and constraints [19] [22] [23] .
DDL statements take effect immediately and typically commit automatically, meaning
changes are permanent unless managed within a transaction (depending on the DBMS) [19]
[21] .

DDL is considered a subset of SQL and is standardized across most relational database
systems [19] [21] [24] .
Common DDL Commands

Command | Purpose | Example Syntax
CREATE | Creates a new database object (table, view, index, etc.) | CREATE TABLE Student (ID INT PRIMARY KEY, Name VARCHAR(50));
ALTER | Modifies an existing database object (add/remove columns, etc.) | ALTER TABLE Student ADD Email VARCHAR(100);
DROP | Deletes an existing database object | DROP TABLE Student;
TRUNCATE | Removes all records from a table, but keeps its structure | TRUNCATE TABLE Student;
RENAME | Changes the name of a database object | RENAME TABLE Student TO Alumni;
COMMENT | Adds comments or descriptions to database objects | COMMENT ON TABLE Student IS 'Stores student information';

DDL Constraints
DDL commands are also used to define constraints on tables to enforce data integrity:
PRIMARY KEY: Uniquely identifies each record in a table.
FOREIGN KEY: Ensures referential integrity between tables.
UNIQUE: Ensures all values in a column are unique.
CHECK: Enforces domain integrity by limiting the values that can be placed in a column.
NOT NULL: Ensures that a column cannot have a NULL value [19] [20] .
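These constraints are usually declared together in a single CREATE TABLE statement. A minimal sketch (the Enrollment table is illustrative; it assumes the Student table from the examples above):

CREATE TABLE Enrollment (
    enroll_id  INT PRIMARY KEY,                          -- PRIMARY KEY: unique, non-null identifier
    student_id INT NOT NULL,                             -- NOT NULL: a value is required
    email      VARCHAR(100) UNIQUE,                      -- UNIQUE: no duplicate values
    grade      CHAR(1) CHECK (grade IN ('A','B','C')),   -- CHECK: domain integrity
    FOREIGN KEY (student_id) REFERENCES Student(ID)      -- FOREIGN KEY: referential integrity
);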

Differences Between DDL and Other SQL Subsets

Language | Functionality | Example Commands
DDL | Defines/modifies structure of database objects | CREATE, ALTER, DROP
DML | Manipulates data within tables | INSERT, UPDATE, DELETE
DCL | Controls access to data | GRANT, REVOKE
TCL | Manages transactions | COMMIT, ROLLBACK, SAVEPOINT
DQL | Retrieves data from the database | SELECT

Examples
1. Creating a Table:

CREATE TABLE Employee (
    ID INT PRIMARY KEY,
    Name VARCHAR(50) NOT NULL,
    Age INT,
    Department VARCHAR(30)
);

2. Altering a Table:

ALTER TABLE Employee ADD Email VARCHAR(100);

3. Dropping a Table:

DROP TABLE Employee;

4. Truncating a Table:

TRUNCATE TABLE Employee;

Use Cases of DDL

Designing and setting up the initial database schema for an application.
Modifying the structure of database objects as requirements change.
Removing obsolete tables or views from the database.
Defining constraints to ensure data integrity and consistency [19] [20] [23] .

Summary

DDL is essential for defining and managing the structure of a database.
It includes commands like CREATE, ALTER, DROP, and TRUNCATE.
DDL commands affect the schema, not the data itself.
Constraints defined via DDL help maintain data integrity.
DDL is a crucial component of SQL and is used during database design, setup, and maintenance [19] [20] [24] .

Data Manipulation Language (DML) in Database Management System


Definition:
Data Manipulation Language (DML) is a subset of SQL (Structured Query Language) used to
manage and manipulate data within database tables. DML provides the commands necessary to
insert, retrieve, update, and delete data, making it essential for day-to-day database
operations [25] [26] [27] [28] .
Core Functions of DML
DML is responsible for the following primary operations, often referred to as CRUD:
Create: Add new records to tables (INSERT)
Read: Retrieve data from tables (SELECT)
Update: Modify existing data (UPDATE)
Delete: Remove data from tables (DELETE) [26] [27] [28]

Key DML Commands in SQL

Command | Purpose | Example Syntax
INSERT | Adds new records to a table | INSERT INTO Students (Name, Age) VALUES ('Alice', 20);
SELECT | Retrieves data from one or more tables | SELECT Name, Age FROM Students WHERE Age > 18;
UPDATE | Modifies existing records | UPDATE Students SET Age = 21 WHERE Name = 'Alice';
DELETE | Removes records from a table | DELETE FROM Students WHERE Name = 'Alice';

SELECT is sometimes categorized as Data Query Language (DQL), but in practice, it is often considered part of DML because it is used for data retrieval [25] [26] [28] .

Characteristics of DML
Not Auto-Committed:
DML operations are generally not auto-committed. Changes can be rolled back if needed
before committing, depending on the DBMS [29] .
Affects Table Data, Not Structure:
DML commands manipulate the data within tables, not the structure of the tables
themselves (which is handled by DDL).
Transactional:
DML commands can be grouped into transactions, allowing multiple operations to be executed as a single unit (see the sketch after this list).
Supports Both Procedural and Non-Procedural Use:
DML can be procedural (specifying how data is accessed) or non-procedural/declarative
(specifying what data is needed) [26] .
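A short sketch of the transactional behavior mentioned above (standard SQL; the exact transaction-start syntax varies slightly across DBMSs):

START TRANSACTION;                                   -- some systems use BEGIN
UPDATE Students SET Age = 21 WHERE Name = 'Alice';
DELETE FROM Students WHERE Name = 'Bob';
-- Nothing is permanent yet; undo both changes as one unit:
ROLLBACK;
-- COMMIT here would instead make both changes permanent together.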
Advanced DML Operations
JOINS:
Combine data from multiple tables based on related columns.
Subqueries:
Nest one query within another for more complex data retrieval.
Aggregate Functions:
Use functions like SUM, AVG, COUNT, etc., with SELECT for data analysis [28] .
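A combined sketch of these advanced operations (the Enrollments table and its columns are assumed for illustration):

-- JOIN + aggregate: count courses taken per student.
SELECT s.Name, COUNT(e.course_id) AS courses_taken
FROM Students s
JOIN Enrollments e ON s.StudentID = e.student_id
GROUP BY s.Name;

-- Subquery: students older than the average age.
SELECT Name FROM Students
WHERE Age > (SELECT AVG(Age) FROM Students);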

DML vs. DDL

DML (Data Manipulation Language) | DDL (Data Definition Language)
Manipulates data within tables | Defines/modifies database structure
INSERT, SELECT, UPDATE, DELETE | CREATE, ALTER, DROP, TRUNCATE
Transactional (can be rolled back) | Usually auto-committed
Does not affect schema | Alters schema (tables, indexes, etc.)

Importance of DML

Central to Application Logic:
Enables applications to interact with and modify database content [27] [28] .
Supports Data Analysis:
SELECT and related commands allow for complex querying and reporting [28] .
Foundation for Business Workflows:
Used in all data-driven applications, from CRMs to analytics dashboards [27] .

Summary
DML is a crucial part of SQL, enabling the manipulation of data in relational databases.
The main DML commands are INSERT, SELECT, UPDATE, and DELETE.
DML operations are transactional, affect only the data (not the structure), and are
foundational for all database applications and data analysis tasks [25] [26] [27] [28] .

Entity-Relationship (ER) Model in Database Management System


The Entity-Relationship (ER) model is a high-level conceptual data model used to visually
represent the structure of a database. It provides a blueprint for designing databases by
illustrating entities, their attributes, and the relationships among them, making complex data
structures easy to understand and implement [30] [31] [32] .
Key Components of the ER Model

1. Entities
Definition: An entity is any real-world object, concept, or thing that can have data stored
about it, such as a person, place, event, or object [33] [34] [31] [32] .
Representation: Entities are depicted as rectangles in ER diagrams.
Example: Student, Employee, Product.

Entity Set
A collection of similar entities (e.g., all students in a college).
Types:
Strong Entity Set: Has a primary key (unique identifier). Represented by a single
rectangle.
Weak Entity Set: Lacks a primary key and depends on a strong entity. Represented by
a double rectangle.

2. Attributes

Definition: Attributes are the properties or characteristics of an entity [33] [34] [31] [32] .
Representation: Attributes are shown as ovals (ellipses) connected to their entity.
Types:
Simple Attribute: Cannot be divided further (e.g., Age).
Composite Attribute: Can be divided into smaller subparts (e.g., Name → First Name, Last Name).
Single-valued Attribute: Holds a single value (e.g., Salary).
Multi-valued Attribute: Can hold multiple values (e.g., Phone Numbers).
Derived Attribute: Value can be derived from other attributes (e.g., Age from Date of Birth).
Key Attribute: Uniquely identifies an entity (e.g., Roll Number).

3. Relationships
Definition: Relationships illustrate how two or more entities are associated with each
other [33] [34] [31] [32] .
Representation: Depicted as diamonds connecting related entities.
Example: "Works_For" between Employee and Department.
Degree of Relationship
Unary: Relationship within the same entity set.
Binary: Relationship between two entities (most common).
Ternary: Relationship among three entities.
n-ary: Relationship among n entities.

Cardinality
Specifies the number of instances of one entity that can be associated with instances of
another entity (e.g., One-to-One, One-to-Many, Many-to-Many) [32] .
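When an ER design is implemented in a relational database, a many-to-many relationship is typically realized as a separate table holding foreign keys to both entities. A hedged sketch (names are illustrative, following the university example later in this section):

CREATE TABLE Student (Roll_No INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE Course  (Course_ID INT PRIMARY KEY, Title VARCHAR(50));

-- The many-to-many "Enrolls" relationship becomes a junction table.
CREATE TABLE Enrolls (
    Roll_No   INT REFERENCES Student(Roll_No),
    Course_ID INT REFERENCES Course(Course_ID),
    PRIMARY KEY (Roll_No, Course_ID)   -- composite key: one row per enrollment
);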

ER Diagram Notation Summary

Component | Symbol | Description
Entity | Rectangle | Object/concept stored in the database
Attribute | Oval | Properties of an entity
Relationship | Diamond | Association between entities
Primary Key | Underlined | Unique identifier for an entity
Multi-valued | Double oval | Attribute with multiple values
Weak Entity | Double rectangle | Entity without a primary key

Advantages of ER Model

Visual Clarity: Makes complex data structures easy to understand and communicate [31] [35] .
Precise Planning: Helps avoid redundancy and ensures efficient resource use [31] [35] .
Adaptability: Easy to modify as requirements change [31] [35] .
Facilitates Data Integrity: Clearly defines relationships and constraints, supporting referential integrity [35] .
Efficient Querying: Well-structured models enable optimized queries and better performance [35] .

Disadvantages of ER Model
Limited to Conceptual Design: Does not specify implementation details or physical storage.
Complexity with Large Systems: Diagrams can become complicated for very large
databases [35] .
Types of ER Models
Conceptual ER Model: High-level overview, focuses on entities and relationships, omits
details [36] .
Logical ER Model: More detailed, includes attributes and relationships, used for database
design [36] .
Physical ER Model: Represents how the model will be implemented in the database
system [36] .

Example
Consider a university database:
Entities: Student, Course, Instructor.
Attributes: Student (Roll_No, Name, DOB), Course (Course_ID, Title), Instructor (Emp_ID,
Name).
Relationships: Student "enrolls in" Course, Instructor "teaches" Course.

Summary Table

Component | Example | Symbol in ERD
Entity | Student | Rectangle
Attribute | Name, DOB | Oval
Relationship | Enrolls | Diamond
Primary Key | Roll_No | Underlined oval
Multi-valued | Phone Numbers | Double oval
Weak Entity | Dependent | Double rectangle

Conclusion
The ER model is a foundational tool in database design, providing a clear, visual representation
of data and its relationships. It helps ensure robust, efficient, and scalable database systems by
simplifying the design process and supporting communication among stakeholders [31] [35] [32] .

Network Model in Database Management System (DBMS)
The network model is a database model designed to represent complex relationships among
data more flexibly than the hierarchical model. It organizes data using records connected by
links (pointers), forming a graph structure that supports multiple parent and child relationships,
making it ideal for modeling many-to-many relationships [37] [38] [39] .

Key Features of the Network Model


Graph Structure: Data is organized as a graph, with records (nodes) connected by links
(edges or pointers), allowing multiple paths between records [37] [38] .
Many-to-Many Relationships: Unlike the hierarchical model, the network model can
represent one-to-one, one-to-many, and many-to-many relationships between entities [37]
[40] .

Owner-Member Relationships: Each record can have multiple owners and members,
supporting flexible data connections [37] [38] .
Pointers: Relationships are implemented using pointers, which directly link records, enabling
fast data access and navigation [38] [41] .
Data Integrity: Members cannot exist without an owner, ensuring all data is properly linked
and maintaining integrity [42] [41] .
Structure and Representation

Records: The basic unit of data, similar to rows in a table.
Sets: Collections of related records, typically representing owner-member relationships.
Pointers: Explicit links that connect records, forming a network (graph) instead of a strict hierarchy [38] [41] .
Example: In a college database, a student record can be linked to both the "CSE Department" and "Library" records, showing that students can belong to multiple departments or sections [37] .

Advantages of the Network Model


Conceptual Simplicity: Easy to understand and design, especially for complex
relationships [39] .
Efficient Data Access: Multiple paths to the same record allow flexible and fast data
retrieval [37] [43] [40] .
Handles Complex Relationships: Supports many-to-many and multiple parent-child
relationships, which are difficult in hierarchical models [37] [39] .
Data Integrity: Ensures members are always associated with an owner, preventing orphan
records [42] [41] .
Data Independence: Changes in data storage do not affect application programs,
promoting separation between data and processing [42] .

Disadvantages of the Network Model


System Complexity: Use of pointers and complex relationships increases the complexity of
database design and management [42] [41] [39] .
Difficult Structural Changes: Modifying the structure (adding/removing relationships or
entities) is challenging due to interconnected data [41] [39] .
Operational Anomalies: Insert, delete, or update operations may require extensive pointer
adjustments [41] .
Lack of Standard Query Language: No universal query language like SQL, making it harder
to work across different network DBMSs [44] .
Scalability Issues: Performance can degrade with very large or highly complex datasets
unless carefully optimized [44] .

Examples of Network Database Management Systems

IDS (Integrated Data Store): Early network DBMS for mainframes [38] .
IDMS (Integrated Database Management System): Used for creating complex data structures on mainframes [38] .
TurboIMAGE: Network DBMS for HP 3000 minicomputers [38] .
Univac DMS-1100: Network DBMS for Univac mainframes [38] .

Comparison to Other Models

Feature | Hierarchical Model | Network Model | Relational Model
Structure | Tree | Graph | Table
Relationships | 1-to-many | 1-to-1, 1-to-many, many-to-many | Any via keys
Flexibility | Rigid | Flexible | Highly flexible
Data Access | Sequential | Fast, via pointers | Declarative (SQL)
Complexity | Simple | Complex | Moderate

Summary
The network model organizes data as a graph, supporting complex, many-to-many
relationships with multiple parent and child records.
It offers efficient data access and integrity but at the cost of increased system complexity
and difficulty with structural changes.
While largely superseded by the relational model, the network model remains important for
understanding the evolution of database systems and for certain high-performance
applications [43] [45] [40] .

Relational Model in Database Management System (DBMS)


The relational model is a foundational approach for organizing and managing data in a
database. Proposed by E. F. Codd in 1970, it represents data using two-dimensional tables
(relations), making data storage, retrieval, and management efficient, flexible, and intuitive [46]
[47] .

Key Concepts of the Relational Model


Relation (Table):
The core structure, a relation, is a table consisting of rows and columns. Each table
represents a real-world entity or relationship [46] [48] .
Tuple (Row):
Each row in a table is called a tuple and corresponds to a single record or instance of the entity [46] [48] .
Attribute (Column/Field):
Columns in a table are attributes, representing properties or characteristics of the entity [46] [48] .
Attribute Domain:
The set of permissible values for a given attribute [46] .
Degree:
The number of attributes (columns) in a relation [46] .
Cardinality:
The number of tuples (rows) in a relation [46] .
Relational Schema:
The logical blueprint of a relation, specifying its name, attributes, and their data types [46] [48] .
Example:
STUDENT(ROLL_NUMBER INTEGER, NAME VARCHAR(20), CGPA FLOAT)
Relational Instance:
The actual content (set of tuples) in a relation at a specific point in time [46] .
Keys in the Relational Model
Keys are crucial for uniquely identifying tuples and establishing relationships between tables [46]
[48] :

Candidate Key: Minimal set of attributes uniquely identifying a tuple.


Super Key: Any set of attributes that uniquely identifies a tuple.
Primary Key: Chosen candidate key to uniquely identify each tuple.
Alternate Key: Candidate keys not chosen as primary key.
Composite Key: Key formed by combining two or more attributes.
Foreign Key: Attribute(s) in one table that refer to the primary key of another table,
establishing relationships.
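A minimal sketch showing several of these keys in one schema (illustrative, based on the STUDENT example below; the MARKS table is invented for this sketch):

CREATE TABLE STUDENT (
    ROLL_NUMBER INT PRIMARY KEY,       -- primary key: the chosen candidate key
    EMAIL       VARCHAR(100) UNIQUE,   -- alternate key: a candidate key not chosen as primary
    NAME        VARCHAR(20),
    CGPA        FLOAT
);

CREATE TABLE MARKS (
    ROLL_NUMBER INT,
    SUBJECT     VARCHAR(20),
    SCORE       INT,
    PRIMARY KEY (ROLL_NUMBER, SUBJECT),                        -- composite key
    FOREIGN KEY (ROLL_NUMBER) REFERENCES STUDENT(ROLL_NUMBER)  -- foreign key to STUDENT
);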

Properties and Advantages

Data Independence:
Logical data organization is independent of physical storage, allowing schema changes without affecting applications [48] [49] .
Simplicity:
Tabular format is easy to understand, design, and use [46] [50] .
Integrity and Consistency:
Constraints and keys ensure data accuracy and consistency [46] [49] .
Flexibility:
Supports one-to-one, one-to-many, and many-to-many relationships using keys [46] [49] .
Standard Query Language:
Most relational databases use SQL (Structured Query Language) for data manipulation and querying [50] [47] .

Constraints in the Relational Model


Entity Integrity:
Primary key cannot be NULL.
Referential Integrity:
Foreign key values must match primary key values in the referenced table or be NULL.
Domain Constraints:
Attribute values must belong to a defined domain.
Popular Relational Database Management Systems (RDBMS)
Oracle Database
MySQL
Microsoft SQL Server
IBM DB2
PostgreSQL [48] [51] [52]

Example
STUDENT Table:

ROLL_NUMBER | NAME | CGPA
101 | Alice | 8.9
102 | Bob | 7.5

Relation: STUDENT
Attributes: ROLL_NUMBER, NAME, CGPA
Tuples: (101, Alice, 8.9), (102, Bob, 7.5)

Comparison with Other Models

Feature | Hierarchical Model | Network Model | Relational Model
Structure | Tree | Graph | Table (Relation)
Relationships | 1-to-many | Many-to-many | 1-to-1, 1-to-many, many-to-many
Data Access | Complex | Complex | Simple (via keys)
Data Independence | Low | Partial | High
Query Language | Proprietary | Proprietary | SQL (Standardized)
Flexibility | Low | Moderate | High [49]

Summary
The relational model organizes data into inter-related tables, each representing an entity
with attributes and tuples [46] [48] .
It provides simplicity, flexibility, data integrity, and data independence.
Keys and constraints play a central role in ensuring data accuracy and establishing
relationships.
SQL is the standard language for interacting with relational databases, making them widely
adopted in industry and academia [50] [47] .
The relational model remains the most popular and influential database model, forming the
backbone of modern database systems [51] [52] [50] .

Object Oriented Data Model in Database Management System (DBMS)


The Object Oriented Data Model (OODM) is a data modeling approach where data is
represented and stored as objects, similar to the way real-world entities are modeled in object-
oriented programming. This model integrates both data and behavior (methods) into a single unit
called an object, allowing for a more natural and flexible representation of complex data and
relationships [53] [54] [55] [56] [57] .

Key Concepts and Components


1. Object
An object is a real-world entity or concept, encapsulating both data (attributes) and
behavior (methods).

Example: A Student object with attributes like name and roll_no, and methods like register() or updateProfile() [54] [55] [56] .
2. Class

A class is a blueprint or template for creating objects.
It defines a set of attributes and methods that its objects (instances) will have.
Example: The Student class defines the structure and behavior for all student objects [54] [56] .

3. Object Attribute
Attributes are properties or characteristics of an object.
Example: For a Book object, attributes could be title, author, and ISBN [54] [55] .
4. Object Method
Methods are functions or procedures associated with an object, defining its behavior.
Example: A withdraw() method in a BankAccount object [54] [56] .
5. Inheritance
Inheritance allows a class to inherit attributes and methods from another class (the parent or
superclass).
Promotes code reusability and hierarchical relationships.
Example: A Bus class and a Ship class can both inherit from a Transport class [54] [55] [56] .
6. Encapsulation
Bundles data (attributes) and methods (behavior) together, restricting direct access to some
of the object's components.
7. Relationships
Objects can be related to each other by references or pointers, allowing for complex
interconnections, such as one-to-many or many-to-many relationships [55] [56] .

Features and Advantages


Real-World Modeling: Represents real-world entities more accurately by combining data
and behavior [54] [55] .
Complex Data Support: Can handle complex data types like images, audio, video, and
user-defined data types, which are difficult to manage in the relational model [54] [55] [56] .
Reusability and Modularity: Inheritance and encapsulation promote code reuse and
modularity.
Database Integrity: Encapsulation and object identity help maintain data integrity [54] .
Structural and Database Independence: Changes in data structure do not affect the overall
system, supporting independence [54] .

Seamless Integration with OOP Languages: Works natively with object-oriented programming languages, making development and maintenance easier [58] [57] .
User-Constructed Types: Users can define new types as needed, supporting extensibility [56] .

Example

Suppose you have a Transport class with attributes and methods common to all vehicles. Bus,
Ship, and Plane are subclasses inheriting from Transport, each with specific attributes
(RoadTransport, WaterTransport, AirTransport respectively) [54] [55] .
Another example:
Student and Department are objects.
Each object has its own attributes and methods.
They are linked by a common attribute, such as Dept_no, establishing a relationship between
the objects [54] [55] .

Object Oriented Database Management System (OODBMS)


An OODBMS is a database system based on the object-oriented data model. It stores data as
objects, supports object identity, encapsulation, inheritance, and allows storage of complex data
types [56] [57] . Examples include MongoDB (with some object-oriented features), ObjectDB, and
db4o [58] [57] [59] .
Comparison with Relational Model

Aspect | Relational Model | Object Oriented Data Model
Data Representation | Tables (rows and columns) | Objects (attributes + methods)
Relationships | Foreign keys, joins | Object references, pointers
Data Types | Simple (numbers, strings) | Complex (images, audio, video, user-defined types)
Behavior | Not stored with data | Methods stored with data
Inheritance | Not supported | Supported
Encapsulation | Not supported | Supported
Suitable For | Structured, tabular data | Complex, multimedia, and hierarchical data

Disadvantages
Complexity: More complex to design and manage compared to relational databases.
Lack of Standardization: No universal standard like SQL for OODBMSs, leading to
compatibility issues [60] .

Limited Adoption: Less widely used and supported than relational databases, especially for traditional business applications [60] [57] .
Performance: May not perform as well as relational databases for simple, tabular data and standard transaction processing.

Summary

The object oriented data model stores both data and relationships in objects, encapsulating
attributes and methods together.
Supports inheritance, encapsulation, and complex data types.
Ideal for applications requiring representation of complex, multimedia, or hierarchical data.
Forms the basis of OODBMS, which integrate database management with object-oriented
programming principles [54] [55] [56] [57] .
This model is especially useful for engineering, multimedia, CAD, and applications where real-
world modeling and complex data are essential.

Integrity Constraints in Database Management System (DBMS)


Integrity constraints are a set of rules enforced on database tables to ensure the accuracy,
validity, and consistency of the data stored in a DBMS. These constraints are checked
automatically by the DBMS every time data is inserted, updated, or deleted, preventing
accidental or intentional corruption of the database [61] .
Types of Integrity Constraints

1. Domain Constraint
Definition: Restricts the permissible values for a given attribute (column) in a table.
Purpose: Ensures that each attribute contains only values from a defined domain (data
type, format, range, or enumeration).
Examples:
Age column can only have positive integers.
Gender column can only have values like 'Male', 'Female', 'Non-Binary'.
Price column must be a decimal greater than or equal to zero.
SQL Example:
CREATE TABLE Person (
ID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT CHECK (Age > 0)
);

Here, the CHECK constraint enforces that Age must be positive [62] [63] .

2. Entity Integrity Constraint

Definition: Ensures that each row in a table can be uniquely identified and that the primary key cannot be NULL.
Purpose: Prevents duplicate or unidentified records in a table.

Examples:
Every student record must have a unique, non-null StudentID.
In a Books table, Book_ID (primary key) must be unique and not null.
SQL Example:
CREATE TABLE Student (
StudentID INT PRIMARY KEY,
Name VARCHAR(100)
);

Here, StudentID is the primary key and cannot be NULL or duplicated [62] [61] [63] .
3. Referential Integrity Constraint
Definition: Ensures that a foreign key value in one table either matches a primary key value
in another table or is NULL.
Purpose: Maintains valid links between related tables and prevents orphaned records.
Examples:
An order in the Orders table must reference a valid CustomerID from the Customers
table.
A Dept_ID in the Employees table must exist in the Department table.
SQL Example:
CREATE TABLE Department (
Dept_ID INT PRIMARY KEY,
Dept_Name VARCHAR(50)
);
CREATE TABLE Employee (
ID INT PRIMARY KEY,
Name VARCHAR(50),
Dept_ID INT,
FOREIGN KEY (Dept_ID) REFERENCES Department(Dept_ID)

);

Here, every Dept_ID in Employee must exist in Department, ensuring referential integrity [61] [63] .

4. Key Constraint

Definition: Ensures that a set of attributes (keys) uniquely identifies a row in a table.
Purpose: Prevents duplicate records and enforces uniqueness.
Types:
Primary Key: Unique and not null.
Unique Key: Unique but can be null.
Examples:
Roll No in a Student table must be unique.
Email addresses in a Users table must be unique.
SQL Example:
CREATE TABLE Users (
UserID INT PRIMARY KEY,
Email VARCHAR(100) UNIQUE
);

Here, both UserID and Email must be unique, and UserID cannot be NULL [61] .
Additional Constraint Types
NOT NULL Constraint: Ensures that a column cannot have NULL values.
CHECK Constraint: Limits the values that can be placed in a column (e.g., Age > 0).
ENUM Constraint: Restricts a column to a set of predefined values [63] .

Why Integrity Constraints Matter


Data Accuracy: Prevents invalid or inconsistent data entry.
Data Consistency: Ensures relationships between tables remain valid.
Data Reliability: Maintains trust in the database over time.
Automatic Enforcement: Reduces manual checks and errors by enforcing rules at the
database level [61] [63] .

Summary Table

Constraint Type | Enforces On | Ensures | Example
Domain | Column | Valid data type/range/format | Age > 0, Gender in {M, F, Non-Binary}
Entity Integrity | Primary Key column | Unique, non-null identifier | StudentID in Student table
Referential Integrity | Foreign Key column | Valid references between tables | Dept_ID in Employee matches Department
Key | Key columns | Uniqueness of records | UserID, Email

In summary:
Integrity constraints are essential rules in DBMS that maintain the quality, consistency, and
reliability of data by enforcing restrictions on data values, uniqueness, and relationships between
tables [61] [62] [63] .

Data Manipulation Operations in Database Management System (DBMS)


Data manipulation operations are fundamental actions performed on the data stored within a
database. These operations enable users and applications to create, retrieve, modify, and
remove data, ensuring the database remains useful and up-to-date for various business and
analytical needs [64] [65] [66] .
Core Data Manipulation Operations (CRUD)
The four primary data manipulation operations are often referred to as CRUD:

Operation | SQL Command | Description
Create | INSERT | Adds new data (records/tuples) to a table
Read | SELECT | Retrieves data from one or more tables
Update | UPDATE | Modifies existing data in a table
Delete | DELETE | Removes data from a table

1. Create (INSERT)
Purpose: Add new records into a table.
Example:
INSERT INTO Students (StudentID, Name, Age) VALUES (101, 'Alice', 20);

Usage: Used when new data needs to be stored, such as registering a new user or adding a
new product [65] [67] [66] .

2. Read (SELECT)

Purpose: Retrieve data from the database based on specific criteria.
Example:
SELECT Name, Age FROM Students WHERE Age > 18;

Usage: Used for querying information, generating reports, or displaying data to users [64] [65] [67] [66] .

3. Update (UPDATE)
Purpose: Change existing data in one or more records.
Example:
UPDATE Students SET Age = 21 WHERE StudentID = 101;

Usage: Used when correcting errors, updating user details, or changing product prices [64]
[65] [67] [66] .
4. Delete (DELETE)
Purpose: Remove one or more records from a table.
Example:
DELETE FROM Students WHERE StudentID = 101;

Usage: Used for removing outdated, incorrect, or unnecessary data [64] [65] [67] [66] .

Additional Data Manipulation Operations


MERGE: Combines INSERT and UPDATE operations, allowing conditional insert or update of
data.
CALL: Invokes stored procedures or functions that may perform complex data
manipulations [68] .
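A hedged sketch of MERGE (SQL:2003-style syntax; support and exact syntax vary by DBMS, and the Staging table is assumed for illustration):

-- Insert new students and update existing ones in a single statement.
MERGE INTO Students t
USING Staging s
ON t.StudentID = s.StudentID
WHEN MATCHED THEN
    UPDATE SET Age = s.Age
WHEN NOT MATCHED THEN
    INSERT (StudentID, Name, Age) VALUES (s.StudentID, s.Name, s.Age);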

Characteristics of Data Manipulation Operations


Transactional: Multiple operations can be grouped into a transaction, ensuring atomicity (all succeed or none do) [65] .
Read Consistency: Databases maintain consistency so users see a stable view of data during manipulation [65] .
Automated or Manual: Operations can be performed manually by users or automated by scripts and applications [69] [64] .

Strategic Steps in Data Manipulation


1. Database Creation: Gather and import data from various sources.
2. Data Cleansing: Restructure and clean data to ensure accuracy and consistency.
3. Data Combination: Merge data, remove redundancies, and organize for analysis.
4. Data Analysis: Extract useful insights by querying and aggregating data [69] [64] .

Importance of Data Manipulation Operations


Data Accessibility: Enables users to access and work with stored information.
Data Maintenance: Keeps data current, accurate, and relevant.
Business Insights: Supports reporting, analytics, and informed decision-making.
Operational Efficiency: Automates repetitive tasks and ensures data integrity [69] [64] [65] .
Summary Table

Operation | SQL Example | Common Use Case
INSERT | INSERT INTO Table VALUES (...); | Add new customer record
SELECT | SELECT * FROM Table WHERE condition; | Find all orders for a customer
UPDATE | UPDATE Table SET col = val WHERE condition; | Change product price
DELETE | DELETE FROM Table WHERE condition; | Remove inactive users

In summary:
Data manipulation operations (CRUD) are essential for interacting with and managing the data in
a DBMS. They allow users to add, retrieve, modify, and delete data, forming the backbone of all
database-driven applications [64] [65] [66] .

1. https://www.scaler.com/topics/data-abstractions-in-dbms/
2. https://www.tutorialspoint.com/what-is-data-abstraction-in-dbms
3. https://www.studocu.com/in/document/i-k-gujral-punjab-technical-university/database-management-system/dbms-notes-me/108887431
4. https://www.techtarget.com/whatis/definition/data-abstraction
5. https://mrcet.com/downloads/digital_notes/CSE/II Year/DBMS.pdf
6. https://mrcet.com/downloads/digital_notes/IT/Database Management Systems.pdf
7. https://mu.ac.in/wp-content/uploads/2021/08/USIT304-Database-Management-Systems.pdf
8. https://herovired.com/learning-hub/blogs/data-abstraction/
9. https://www.purestorage.com/knowledge/what-is-data-abstraction.html
10. https://unstop.com/blog/data-independence-in-dbms
11. https://en.wikipedia.org/wiki/Data_independence
12. https://www.scaler.com/topics/data-independence-in-dbms/
13. https://www.slideshare.net/slideshow/database-management-systems-data-independence/266139115
14. https://www.tutorialspoint.com/dbms/dbms_data_independence.htm
15. https://www.guru99.com/dbms-data-independence.html
16. https://testbook.com/key-differences/difference-between-physical-and-logical-data-independence
17. https://www.tutorialspoint.com/physical-and-logical-data-independence
18. https://herovired.com/learning-hub/topics/data-independence-in-dbms/
19. https://www.techtarget.com/whatis/definition/Data-Definition-Language-DDL
20. https://www.secoda.co/glossary/what-is-a-data-definition-language-ddl
21. https://en.wikipedia.org/wiki/Data_definition_language
22. https://byjus.com/gate/data-definition-language-notes/
23. https://satoricyber.com/glossary/ddl-data-definition-language/
24. https://www.scaler.com/topics/ddl-in-dbms/
25. https://en.wikipedia.org/wiki/Data_manipulation_language
26. https://satoricyber.com/glossary/dml-data-manipulation-language/
27. https://www.datasunrise.com/knowledge-center/dml-data-manipulation-language/
28. https://staragile.com/blog/data-manipulation-language
29. https://byjus.com/gate/data-manipulation-language-dql-notes/
30. https://www.du.ac.in/du/uploads/departments/Operational Research/24042020_E-R Model.pdf
31. https://www.collaboard.app/en/blog/entity-relationship-model/
32. https://www.shiksha.com/online-courses/articles/er-model-in-dbms/
33. https://opendsa.cs.vt.edu/ODSA/Books/Database/html/ERDComponents.html
34. https://mebrahimii.github.io/comp440-fall2020/lecture/week_10/Database Design E-R Model.pdf
35. https://www.essaycorp.com/blog/advantages-and-disadvantages-of-an-er-model
36. https://www.techtarget.com/searchdatamanagement/definition/entity-relationship-diagram-ERD
37. https://www.scaler.com/topics/network-model-in-dbms/
38. https://www.upskillcampus.com/blog/network-database-management-system/
39. https://www.slideshare.net/slideshow/data-base-and-all-its-types/235565003
40. https://www.thoughtspot.com/data-trends/data-modeling/types-of-data-models
41. http://dbmsenotes.blogspot.com/2014/03/comparison-of-data-models-data-models.html

42. https://www.scribd.com/document/358623974/test-docx
43. https://raima.com/network-model-vs-relational-model/
44. https://www.datamation.com/big-data/what-is-a-network-data-model-examples-pros-and-cons/
45. https://mariadb.com/kb/en/understanding-the-relational-database-model/
46. https://www.scaler.com/topics/dbms/relational-model-in-dbms/
47. https://en.wikipedia.org/wiki/Relational_database
48. https://byjus.com/gate/relational-model-in-dbms-notes/
49. https://www.tutorialspoint.com/differentiate-between-the-three-models-on-the-basis-of-features-and-operations-dbms
50. https://azure.microsoft.com/en-au/resources/cloud-computing-dictionary/what-is-a-relational-database
51. https://cloud.google.com/learn/what-is-a-relational-database
52. https://www.oracle.com/in/database/what-is-a-relational-database/
53. https://www.gartner.com/en/information-technology/glossary/object-data-model
54. https://www.scaler.com/topics/object-oriented-model-in-dbms/
55. https://byjus.com/gate/object-oriented-data-model-in-dbms-notes/
56. https://phoenixnap.com/kb/object-oriented-database
57. https://en.wikipedia.org/wiki/Object_database
58. https://www.mongodb.com/en-us/resources/basics/databases/what-is-an-object-oriented-database
59. https://www.youtube.com/watch?v=cw5R-CiEn6g
60. https://librarytechnology.org/document/7203
61. https://www.scaler.com/topics/dbms/integrity-constraints-in-dbms/
62. https://www.boardinfinity.com/blog/integrity-constraints-in-dbms/
63. https://www.almabetter.com/bytes/articles/integrity-constraints-in-dbms
64. https://www.indeed.com/career-advice/career-development/data-manipulation
65. https://www.tutorialspoint.com/sql_certificate/manipulating_data.htm
66. https://en.wikipedia.org/wiki/Create,_read,_update_and_delete
67. https://stackify.com/what-are-crud-operations/
68. https://ftpdocs.broadcom.com/cadocs/0/CA IDMS 18 5 User Bookshelf-ENU/Bookshelf_Files/HTML/IDMS_SQL_Prog_ENU/1283243.html
69. https://www.solvexia.com/blog/5-top-tips-for-data-manipulation

Relational Algebra in Database Management Systems (DBMS)
Relational algebra is a procedural query language that forms the theoretical foundation for
querying and manipulating data in relational databases. It provides a set of well-defined
operations that take one or more relations (tables) as input and produce a new relation as
output, allowing users to formulate and optimize queries efficiently [1] [2] [3] [4] .

Key Points:
Procedural: Specifies a sequence of operations to obtain the desired result.
Operates on relations (tables), producing new relations.
Forms the basis for SQL and query optimization in DBMS [2] [3] .

Importance of Relational Algebra

Foundation for Query Languages: Underpins SQL and other database query languages [2] [4] .
Query Optimization: Enables DBMS to optimize queries by transforming them into efficient execution plans [3] .
Data Manipulation: Allows selection, projection, combination, and transformation of data from multiple tables [1] [3] .

Fundamental Concepts

Term | Definition
Relation | A table with rows (tuples) and columns (attributes) [5] .
Tuple | A single row in a relation [5] .
Attribute | A column in a relation [5] .
Degree | Number of attributes in a relation [5] .
Cardinality | Number of tuples (rows) in a relation [5] .
Relation Schema | Name of the relation with its attributes [5] .
Relation Instance | Actual content (set of tuples) of a relation at a given time [5] .
Core Operations of Relational Algebra

1. Select (σ)
Purpose: Selects rows (tuples) from a relation that satisfy a specified condition.
Notation: $ \sigma_{condition}(Relation) $
Example: $ \sigma_{subject = "database"}(Books) $
Selects all books with the subject "database".
Properties: Unary operation (applies to a single relation) [1] [3] [6] [4] .

2. Project (π)
Purpose: Selects specific columns (attributes) from a relation.
Notation: $ \pi_{attribute1, attribute2, ...}(Relation) $
Example: $ \pi_{student_id, name}(Students) $
Extracts only student_id and name columns.
Properties: Reduces the number of columns, removes duplicates [3] [6] [4] .

3. Union (⋃)
Purpose: Combines tuples from two relations, removing duplicates.
Notation: $ Relation1 \cup Relation2 $
Requirement: Both relations must have the same schema (same attributes and domains).
Example: $ Students_{2023} \cup Students_{2024} $ [3] [6] [4] .

4. Set Difference (−)

Purpose: Returns tuples present in the first relation but not in the second.
Notation: $ Relation1 - Relation2 $
Requirement: Both relations must be union-compatible.
Example: $ Enrolled_Students - Graduated_Students $ [3] [6] [4] .

5. Cartesian Product (×)


Purpose: Combines each tuple of the first relation with every tuple of the second.
Notation: $ Relation1 \times Relation2 $
Result: Relation with all possible combinations of tuples.
Example: $ Students \times Courses $ [3] [6] [4] [7] .
6. Rename (ρ)
Purpose: Renames the output relation or its attributes.
Notation: $ \rho_{NewName}(Relation) $ or $ \rho_{NewName(A1, A2, ...)}(Relation) $
Example: $ \rho_{S}(Students) $ renames Students to S [1] [3] [7] .

Additional Operations

Intersection (∩)
Purpose: Returns tuples present in both relations.
Notation: $ Relation1 \cap Relation2 $ [3] [6] .

Division (÷)
Purpose: Finds tuples in one relation associated with all tuples in another.
Use Case: "Find students enrolled in all courses offered."
Notation: $ Relation1 \div Relation2 $ [3] .
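As a small worked illustration (relations invented for this example): let $ Enrolled(student, course) $ contain {(Amy, DB), (Amy, OS), (Ben, DB)} and $ Offered(course) $ contain {DB, OS}. Then $ Enrolled \div Offered $ returns {Amy}, because Amy is the only student paired with every course in Offered.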

Join Operations

Natural Join (⨝): Combines tuples from two relations based on common attributes.
Theta Join (⨝θ): Combines tuples based on a given condition.
Equi Join: A special case of theta join where the condition is equality [7] .

Examples
SQL vs. Relational Algebra Example:
SQL:

SELECT student_id, name FROM Students WHERE grade = 'A';

Relational Algebra:

$ \pi_{student\_id,\ name}(\sigma_{grade = 'A'}(Students)) $ [3]

Best Practices
Start with simple operations and combine them for complex queries.
Use renaming to avoid confusion, especially after joins or products.
Understand schema compatibility requirements for set operations.
Practice with real database schemas to master query formulation [3] .

Conclusion
Relational algebra is a fundamental tool for formulating, optimizing, and understanding queries
in relational databases. Mastery of its operations enables efficient data manipulation and forms
the basis for advanced database concepts and query languages like SQL [1] [2] [3] [4] .

Tuple Relational Calculus (TRC) in DBMS


Tuple Relational Calculus (TRC) is a non-procedural, declarative query language used in
relational databases. Unlike relational algebra, which specifies how to retrieve data, TRC focuses on what data to retrieve, using logical predicates to describe the desired result set [8] [9] [10] .

Key Characteristics

Declarative Language: Specifies what data to retrieve, not how to retrieve it [8] [9] [10] .
Based on Predicate Logic: Utilizes first-order logic to form queries [9] [11] .
Tuple Variables: Uses variables (e.g., $ t $) that represent tuples (rows) in relations (tables) [8] [9] [12] .

Foundation for SQL: TRC inspired the development of SQL [13] [10] .

Syntax and Structure

A typical TRC query is written as:

$ \{ t \mid P(t) \} $

$ t $: Tuple variable ranging over a relation.
$ P(t) $: Predicate (logical condition) that must be satisfied for $ t $ to be included in the result [8] [9] [14] .
Example:

$ \{ t \mid t \in Student \land t.age < 18 \} $

This returns all tuples from the Student table where the age is less than 18 [8] .
Components of TRC
Tuple Variable: Represents a row in a relation (e.g., $ t $ in Student).
Predicate: Logical expression involving attributes of the tuple variable (e.g., $ t.age < 18 $).
Quantifiers:
Existential ($ \exists $): There exists a tuple satisfying a condition.
Universal ($ \forall $): All tuples satisfy a condition [8] [14] .
Logical Connectives: AND ($ \land $), OR ($ \lor $), NOT ($ \lnot $) [8] [9] .

Types of Variables
Free Variable: Not bound by a quantifier; appears in the result [8] .
Bound Variable: Bound by a quantifier (exists or for all); used only within the predicate [8] .

Examples
1. Select All Customers with Zip Code 12345

$ \{ t \mid t \in Customer \land t.Zipcode = 12345 \} $

Returns all tuples from Customer where Zipcode is 12345 [9] .

2. Select Customer IDs of All Customers

$ \{ t.customer\_id \mid t \in Customer \} $

Returns the customer_id of all customers [9] .

3. Select Names of Authors Who Wrote an Article on 'database'

$ \{ t.name \mid t \in Author \land t.article = 'database' \} $

Returns names from Author where the article is 'database' [14] .

Comparison with Other Query Languages

Feature | Relational Algebra | Tuple Relational Calculus
Type | Procedural | Declarative
Focus | How to retrieve | What to retrieve
Syntax | Operations | Logical predicates
Expressiveness | Equivalent | Equivalent

Both TRC and relational algebra are relationally complete, meaning any query expressible in one can be expressed in the other [13] [14] [11] .
Key Points to Remember
TRC queries describe the set of tuples to be returned using logical formulas [8] [9] .
TRC is more about specifying conditions than specifying steps.
Uses tuple variables, logical connectives, and quantifiers.
Basis for understanding declarative query languages like SQL [13] [10] .

Summary Table: Tuple Relational Calculus

Aspect | Description
Language Type | Declarative (Non-procedural)
Variables | Tuple variables (e.g., $ t $)
Syntax | $ \{ t \mid P(t) \} $
Predicates | Logical conditions on tuple attributes
Quantifiers | Existential ($ \exists $), Universal ($ \forall $)
Output | Set of tuples satisfying the predicate
Example | $ \{ t \mid t \in Student \land t.age < 18 \} $

In summary:
Tuple Relational Calculus is a powerful, declarative query language in DBMS that allows users to specify what data they want by describing conditions on tuples, rather than detailing the steps to retrieve it. It is foundational for understanding modern database query languages and complements relational algebra in expressive power [8] [13] [9] [14] .

Domain Relational Calculus (DRC) in DBMS


Domain Relational Calculus (DRC) is a non-procedural, declarative query language used in
database management systems. Unlike relational algebra, which specifies how to retrieve data,
DRC focuses on what data to retrieve, describing queries using logical formulas based on
attribute domains rather than entire tuples [15] [16] [17] .

Key Characteristics
Declarative: Specifies what data to fetch, not the procedure to do so [15] [17] .
Domain Variables: Uses variables that represent individual attribute values (domains), not
entire rows [17] [18] [19] .
Predicate Logic: Employs first-order logic, including logical connectives (AND, OR, NOT)
and quantifiers (∃, ∀) [20] [21] .
Expressiveness: Equivalent in power to relational algebra and tuple relational calculus [20]
[17] .

Syntax of DRC
The general form of a DRC query is:

$ \{ \langle x_1, x_2, ..., x_n \rangle \mid P(x_1, x_2, ..., x_n) \} $

$ x_1, x_2, ..., x_n $: Domain variables, each ranging over the set of possible values for an attribute [15] [16] [17] .
$ P(x_1, x_2, ..., x_n) $: A predicate (logical condition) that must be true for the variables to be included in the result [15] [17] [21] .

Operators and Quantifiers


Logical Connectives:
AND ($ \land $)
OR ($ \lor $)

NOT ($ \lnot $) [20] [21]
Quantifiers:
Existential ($ \exists $): "There exists"
Universal ($ \forall $): "For all" [20] [21]

Examples
1. Find Names of Students Aged 20


Given table: Students(ID, Name, Age)

ID | Name | Age
1 | John | 20
2 | Sarah | 22
3 | Emily | 19
4 | Michael | 21

DRC Query:

$ \{ \langle Name \rangle \mid \exists ID, Age\ (\langle ID, Name, Age \rangle \in Students \land Age = 20) \} $

Result:
Name
John [15]

2. Find Names of Employees in IT Earning Over 55,000


Given table: Employee(ID, Name, Department, Salary)

| ID | Name | Department | Salary |
|---|---|---|---|
| 1 | Alice | HR | 50000 |
| 2 | Bob | IT | 60000 |
| 3 | Claire | Finance | 55000 |
| 4 | David | IT | 65000 |

DRC Query:

$ \{ \langle Name \rangle \mid \exists ID, Department, Salary (\langle ID, Name, Department, Salary \rangle \in Employee \land Department = 'IT' \land Salary > 55000) \} $

Result [15] :

| Name |
|---|
| Bob |
| David |
3. Find Customer IDs with Zip Code 12345

Given table: Customer(Customer_id, Name, Zip_code)



| Customer_id | Name | Zip_code |
|---|---|---|
| 1 | Rohit | 12345 |
| 2 | Rahul | 13245 |
| 3 | Rohit | 56789 |
| 4 | Amit | 12345 |

DRC Query:

$ \{ \langle Customer\_id \rangle \mid \exists Name, Zip\_code (\langle Customer\_id, Name, Zip\_code \rangle \in Customer \land Zip\_code = 12345) \} $

Result [16] :

| Customer_id |
|---|
| 1 |
| 4 |
Comparison: DRC vs. TRC
| Aspect | Domain Relational Calculus (DRC) | Tuple Relational Calculus (TRC) |
|---|---|---|
| Variables | Individual attribute values (domains) | Whole tuples (rows) |
| Syntax | $ \{ \langle x_1, x_2, ... \rangle \mid P(...) \} $ | $ \{ t \mid P(t) \} $ |
| Output | List of attribute values | Set of tuples |
| Use Case | Column-wise queries | Row-wise queries |
| Basis for | Query-By-Example (QBE) | SQL [17] [18] [19] |

Additional Notes
DRC is foundational for visual query languages like Query-By-Example (QBE) [17] .
DRC queries are more abstract and do not specify the retrieval process, making them
suitable for high-level query formulation [15] [17] .
Both DRC and TRC are relationally complete: any query expressible in one can be written in the other [20] [17] .

Summary Table: Domain Relational Calculus
| Feature | Description |
|---|---|
| Language Type | Declarative, non-procedural |
| Variables | Domain variables (attribute values) |
| Syntax | $ \{ \langle x_1, x_2, ..., x_n \rangle \mid P(x_1, x_2, ..., x_n) \} $ |
| Predicate | Logical condition on domain variables |
| Quantifiers | Existential ($ \exists $) and universal ($ \forall $) |
| Output | Set of attribute values satisfying the predicate |
| Example | $ \{ \langle Name \rangle \mid \exists ID, Age (\langle ID, Name, Age \rangle \in Students \land Age = 20) \} $ |

In summary:
Domain Relational Calculus is a declarative query language in DBMS that uses domain variables
to specify what attribute values to retrieve from a database, using logical predicates and
quantifiers. It is powerful, expressive, and forms the basis for user-friendly query interfaces [15]
[16] [17] .


SQL3 Constructs in Database Management Systems
SQL3, also known as SQL:1999, is a major revision of the SQL standard that introduced a wide
range of advanced features to support complex, modern database applications. It is a superset
of earlier SQL standards and incorporates both object-oriented and procedural programming
concepts, greatly extending the expressive power and flexibility of SQL [22] [23] [24] .

Key Constructs and Features of SQL3:

1. User-Defined Data Types (UDTs)


Purpose: Allows users to define custom data types, enabling the modeling of complex data
directly within the database schema.
Examples: Structured types (similar to classes in OOP), distinct types (new types based on
existing ones), and reference types for object identity.
Use Case: A custom Address type with fields for street, city, and postal code [22] [24] .
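A minimal sketch of such a structured type in SQL:1999 style; the Address type and Customer table here are illustrative, and exact syntax varies by vendor (Oracle, for example, uses CREATE TYPE ... AS OBJECT):

CREATE TYPE Address AS (
    street      VARCHAR(50),
    city        VARCHAR(30),
    postal_code CHAR(6)
);

-- The UDT can then be used as a column type:
CREATE TABLE Customer (
    id   INT PRIMARY KEY,
    home Address
);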

2. Nested Tables and Collection Types


Nested Tables: Tables can have columns that themselves are tables, supporting hierarchical
and complex data modeling.

Collection Types: Includes support for arrays, multisets (bags), and lists, with options for
ordered/unordered and allowing/disallowing duplicates [22] [23] .

3. Large Object Types (BLOB, CLOB)


BLOB (Binary Large Object): For storing large binary data such as images, audio, or video.

CLOB (Character Large Object): For storing large text data.



ARRAY: Enables storing arrays as column values [25] .

4. Object-Oriented Features
Inheritance: Supports subtypes and supertypes, allowing table hierarchies (e.g., a Staff
supertable with Lecturer and Admin as sub-tables) [23] .
Methods and Constructors: Objects can encapsulate both data and behavior, similar to
OOP languages [22] [26] .

5. Stored Procedures and Functions (SQL/PSM)


SQL/PSM (Persistent Stored Modules): Enables procedural programming within SQL,
supporting control-flow statements (IF, CASE, LOOP, etc.).
User-Defined Functions: Both scalar and table-valued functions can be created [22] [23] [24] .
6. Triggers
Definition: Procedures that automatically execute in response to specific events (INSERT,
UPDATE, DELETE) on a table.
Use Case: Enforcing business rules or maintaining audit logs [22] [24] .
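A minimal trigger sketch in MySQL-style syntax (trigger dialects differ across DBMSs); the Employees and SalaryAudit tables are hypothetical:

CREATE TRIGGER log_salary_change
AFTER UPDATE ON Employees
FOR EACH ROW
    -- record every salary change for auditing
    INSERT INTO SalaryAudit (emp_id, old_salary, new_salary)
    VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary);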

7. Recursive Queries
WITH RECURSIVE Clause: Enables writing recursive queries, which are essential for
handling hierarchical data like organizational charts or bill-of-materials structures [22] [24] .
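For example, a recursive query over a hypothetical Employees(EmployeeID, ManagerID, Name) table can walk an organizational chart; this sketch uses the standard SQL:1999 form:

WITH RECURSIVE Subordinates AS (
    -- anchor: start from the root employee
    SELECT EmployeeID, ManagerID, Name
    FROM Employees
    WHERE EmployeeID = 1
    UNION ALL
    -- recursive step: add direct reports of employees found so far
    SELECT e.EmployeeID, e.ManagerID, e.Name
    FROM Employees e
    JOIN Subordinates s ON e.ManagerID = s.EmployeeID
)
SELECT * FROM Subordinates;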

8. Reference Types and Object Identity


Reference Types: Allow columns to store references (pointers) to objects in other tables,
supporting object identity and relationships [23] [26] .

9. Temporary Tables
Temporary Tables: Support for creating tables that exist only for the duration of a session
or transaction, useful for intermediate results and complex computations [24] .

Summary Table: Major SQL3 Constructs
| Feature | Description & Use Case |
|---|---|
| User-Defined Types (UDT) | Custom types for complex data modeling |
| Nested Tables | Hierarchical data representation |
| Large Object Types | BLOB and CLOB for multimedia and large text |
| Object-Oriented Support | Inheritance, methods, object identity |
| Stored Procedures/Funcs | Procedural logic and reusable code blocks |
| Triggers | Automated actions on data changes |
| Recursive Queries | Hierarchical and graph data traversal |
| Reference Types | Object references for relationships |
| Temporary Tables | Session- or transaction-scoped tables |
| Collection Types | Arrays, multisets, and lists for flexible data storage |

Advantages of SQL3
Enhanced Flexibility: Supports complex and hierarchical data structures.
Improved Performance: Stored procedures and triggers allow logic to reside closer to the
data.
Better Data Modeling: Object-oriented features and UDTs enable more natural
representation of real-world entities.
Efficient Hierarchical Data Management: Recursive queries and nested tables simplify
handling of tree and graph structures [22] [24] .

Conclusion:
SQL3 represents a significant leap in SQL’s capabilities, enabling advanced data modeling,
procedural programming, and object-oriented features within relational databases.
Understanding these constructs is essential for leveraging the full power of modern DBMS in
complex applications [22] [23] [24] .

Data Definition Language (DDL) Constructs in DBMS


Data Definition Language (DDL) is a subset of SQL used to define, modify, and remove the
structure of database objects such as tables, schemas, indexes, and views. DDL commands are
fundamental for designing and managing the schema of a database, as they specify how data is
organized and how relationships among data are maintained [27] [28] [29] [30] [31] [32] .

Key Characteristics of DDL
Schema Definition: DDL is used to create and modify the structure of database objects (tables, views, indexes, schemas).
Auto-commit: DDL commands are auto-committed, meaning changes are saved permanently and cannot be rolled back [28] .
No Data Manipulation: DDL deals with the structure, not the actual data (unlike DML, which manipulates data) [29] [30] .
Immediate Effect: Changes take effect immediately upon execution [29] .

Main DDL Commands

1. CREATE
Purpose: Creates new database objects such as tables, indexes, views, or entire
databases.
Usage: Defines columns, data types, and constraints.
Syntax:
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
);
Example:
CREATE TABLE Employees (
EmployeeID INT,
FirstName VARCHAR(255),
LastName VARCHAR(255),
Department VARCHAR(255)
);

Other Uses: CREATE INDEX, CREATE VIEW, CREATE SCHEMA [31] [33] [34] .

2. ALTER
Purpose: Modifies the structure of existing database objects.
Usage: Add, modify, or drop columns; rename objects; add or remove constraints.
Syntax:
ALTER TABLE table_name ADD column_name datatype;

Example:

ALTER TABLE Employees ADD Salary INT;
Other Uses: ALTER INDEX, ALTER VIEW [31] [33] [34] .

3. DROP

Purpose: Deletes database objects completely from the database.



Usage: Removes tables, views, indexes, or databases.


Syntax:
DROP TABLE table_name;

Example:
DROP TABLE Employees;

Effect: Removes both structure and all data within the object [30] [31] .

4. TRUNCATE
Purpose: Removes all records from a table, but retains the table structure for future use.
Usage: Fast way to delete all rows without deleting the table itself.
Syntax:
TRUNCATE TABLE table_name;

Example:
TRUNCATE TABLE Employees;

Effect: Table remains, but is empty; structure and schema are preserved [31] [34] .

5. Additional DDL Features


Constraints: DDL allows defining constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE, and CHECK to enforce data integrity at the schema level [29] ; see the sketch after this list.
Indexes: DDL can create and drop indexes to improve query performance.
Comments/Descriptions: Some DBMSs allow using DDL to add comments or labels to
database objects [29] .
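A short sketch combining constraint and index definition; the Orders and Customers tables here are hypothetical:

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT NOT NULL,
    Amount     DECIMAL(10,2) CHECK (Amount > 0),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- Index to speed up lookups by customer
CREATE INDEX idx_orders_customer ON Orders (CustomerID);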

Summary Table: DDL Commands

| Command | Purpose | Example Syntax | Effect |
|---|---|---|---|
| CREATE | Create new object | CREATE TABLE Students (...); | Adds a new table or object |
| ALTER | Modify existing object | ALTER TABLE Students ADD Age INT; | Changes structure of an object |
| DROP | Delete object | DROP TABLE Students; | Removes object and its data permanently |
| TRUNCATE | Remove all data, keep structure | TRUNCATE TABLE Students; | Deletes all rows, keeps table definition |

Key Points to Remember


DDL is crucial for database schema design and management.
DDL commands change the structure, not the data.
All changes are auto-committed and take effect immediately.
Common DDL commands: CREATE, ALTER, DROP, TRUNCATE [27] [28] [30] [31] [34] .
DDL also supports defining constraints and indexes for data integrity and performance.

In summary:
DDL constructs are the foundation of database structure in DBMS, allowing you to create,
modify, and remove tables and other objects, define relationships, and enforce data integrity
through constraints and indexes. Mastery of DDL is essential for effective database design and
administration.

Data Manipulation Language (DML) Constructs in DBMS
Data Manipulation Language (DML) is a subset of SQL used to manage and manipulate data
stored within database tables. DML enables users to perform operations such as inserting,
updating, deleting, and retrieving data, making it essential for day-to-day database
interactions [35] [36] [37] [38] [39] [40] .

Key Features of DML


Purpose: Manipulates the actual data in database tables, not the structure.
CRUD Operations: Supports Create, Read, Update, and Delete functionalities.
Transactional: DML operations are not auto-committed; changes can be rolled back until
explicitly committed [35] [36] .
Frequency: DML commands are used frequently in routine database operations [38] .

Primary DML Commands

1. SELECT
Purpose: Retrieves data from one or more tables.
Syntax:
SELECT column1, column2 FROM table_name WHERE condition;

Example:
SELECT * FROM Employees WHERE Department = 'Marketing';

Notes: Can be used with clauses like WHERE, ORDER BY, GROUP BY, and JOIN to filter and
organize results [41] [36] [38] [39] [40] .

2. INSERT
Purpose: Adds new records (rows) to a table.
Syntax:
INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);

Example:
INSERT INTO Employees (Name, Age, Department) VALUES ('Ankit Roy', 62, 'SEO');

Notes: Can insert single or multiple rows [35] [36] [39] [40] [42] .
3. UPDATE
Purpose: Modifies existing records in a table.
Syntax:
UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;

Example:
UPDATE Employees SET Department = 'HR' WHERE Name = 'Shriyansh Tiwari';

Notes: Use WHERE clause to specify which rows to update; omitting WHERE updates all
rows [36] [38] [39] [40] .

4. DELETE
Purpose: Removes records from a table.
Syntax:
DELETE FROM table_name WHERE condition;

Example:
DELETE FROM Employees WHERE Age > 60;
bh

Notes: Use WHERE clause to avoid deleting all rows [36] [38] [39] [40] .

Summary Table: DML Commands


| Command | Purpose | Example Syntax |
|---|---|---|
| SELECT | Retrieve data | SELECT * FROM Employees WHERE Department = 'Marketing'; |
| INSERT | Add data | INSERT INTO Employees (Name, Age, Department) VALUES ('Ankit', 62, 'SEO'); |
| UPDATE | Modify data | UPDATE Employees SET Department = 'HR' WHERE Name = 'Shriyansh'; |
| DELETE | Remove data | DELETE FROM Employees WHERE Age > 60; |

Additional Notes
Transactions: DML operations can be grouped into transactions using COMMIT and ROLLBACK for data integrity [35] [36] [38] ; see the sketch after this list.
MERGE Command: Some DBMSs support MERGE for conditional insert/update (upsert)
operations [43] .
DML vs. DDL: DML manipulates data; DDL (Data Definition Language) defines and modifies
table structures [37] [38] .
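A minimal transaction sketch, assuming a hypothetical Accounts table (the statement that opens a transaction varies by DBMS: START TRANSACTION, BEGIN, or implicit):

START TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT;    -- makes both updates permanent; ROLLBACK here would undo both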
In summary:
DML constructs are vital for interacting with and managing the actual data in a database.
Mastery of SELECT, INSERT, UPDATE, and DELETE commands is essential for effective
database usage and application development [35] [36] [38] [39] [40] .

Open Source & Commercial DBMS: Detailed Notes


Understanding the distinction between open source and commercial (proprietary/licensed)
Database Management Systems (DBMS) is essential for choosing the right platform for different
organizational needs. Both types have unique features, benefits, and limitations.

Definitions
Open Source DBMS:
Software whose source code is freely available for anyone to view, modify, and distribute.
Examples: MySQL, PostgreSQL, MongoDB, Cassandra, Redis [44] [45] [46] [47] .
Commercial (Proprietary) DBMS:

Software developed by companies for commercial purposes, requiring a paid license for use. Source code is closed and only accessible to authorized users. Examples: Oracle, Microsoft SQL Server, IBM Db2, Snowflake [44] [45] [46] .

Key Differences

| Aspect | Open Source DBMS | Commercial DBMS |
|---|---|---|
| Cost | Free or minimal cost [44] [46] [47] | Requires paid licenses, often expensive [44] [46] |
| Source Code | Open and modifiable by anyone [44] [46] [47] | Closed, only vendor can modify [44] [46] |
| Support | Community-driven, sometimes limited [44] [46] [47] | Professional, vendor-backed support [44] [46] |
| Features | Rapidly improving, may lack some advanced features [44] [48] [46] | Comprehensive, advanced, often enterprise-grade [48] [46] |
| Customization | Highly customizable [44] [46] [47] | Limited customization, vendor-controlled [44] [46] |
| Security | Transparent, but may have security risks if not managed [46] | Strict controls, robust security features [46] |
| Updates | User/community managed [46] | Vendor managed, regular and planned [46] |
| Technical Skill | Requires in-house expertise for setup/maintenance [46] | Vendor handles much of the technical complexity [46] |
Advantages & Disadvantages

Open Source DBMS


Advantages:
Cost-effective, with little to no licensing fees [44] [46] [47]
Flexibility and customization for unique requirements [44] [46] [47]
Large, active communities for support and rapid bug fixes [44] [47]
Transparent codebase allows for security inspection [46]
Disadvantages:
Support is primarily community-based, which may not meet enterprise needs [44] [46]
May lack some advanced features found in commercial products [48] [46]
Potential compatibility and integration issues [46]
Security and reliability depend on community vigilance [46]

Commercial DBMS
Advantages:
Professional, dedicated support and service-level agreements [44] [46]
Advanced, enterprise-grade features (analytics, security, scalability) [48] [46]
Regular, vendor-managed updates and patches [46]
Better suited for mission-critical and large-scale enterprise applications [44] [46]
Disadvantages:

High licensing and maintenance costs [44] [46]


Less flexibility for customization [44] [46]
Strict licensing and usage restrictions [46]
Vendor lock-in risk [46]

Similarities
Both support SQL and standard database operations [48] .
Both can handle large volumes of data and complex data structures [46] .
Both are capable of supporting mission-critical applications [48] [46] .
Popular Examples (May 2025 Rankings)
| Open Source DBMS | Rank | Commercial DBMS | Rank |
|---|---|---|---|
| MySQL | 2 | Oracle | 1 |
| PostgreSQL | 4 | Microsoft SQL Server | 3 |
| MongoDB | 5 | Snowflake | 6 |
| Redis | 7 | IBM Db2 | 8 |
| Elasticsearch | 9 | Databricks | 12 [45] |

Use Case Scenarios


Startups & Educational Institutions:
Often prefer open source DBMS for cost savings, customization, and community
support [46] .
Large Enterprises:
Typically choose commercial DBMS for advanced features, guaranteed support, and robust
security [46] .
Hybrid Approaches:
Some organizations use open source DBMS for development/testing and commercial DBMS for production.

Trends

Open source DBMS have matured significantly and are now considered viable for many enterprise applications [44] .


The gap in features and support between open source and commercial DBMS is narrowing,
making open source increasingly attractive [44] [45] .
Commercial vendors now sometimes offer support for open source products, reflecting the
growing importance of open source in the DBMS market [44] .

In summary:
Open source DBMS offer cost savings, flexibility, and community support, making them ideal for
smaller organizations and those needing customization. Commercial DBMS provide advanced
features, professional support, and robust security, making them suitable for large enterprises
and mission-critical applications. The choice depends on organizational needs, budget, and
technical expertise [44] [46] [47] .

MySQL: Detailed Notes for DBMS

Introduction
MySQL is a widely used open-source relational database management system (RDBMS)
that implements the Structured Query Language (SQL) for managing and manipulating
data [49] .
It is available in two main editions: the open-source MySQL Community Server and the
proprietary MySQL Enterprise Server, which includes additional advanced features and
support [49] [50] .

Architecture of MySQL
MySQL follows a client-server architecture with three main layers [51] [52] [53] :

| Layer | Description |
|---|---|
| Client End | Users interact with MySQL through command-line, GUI tools, or APIs (e.g., MySQL Workbench). Handles connection requests, authentication, and security [51] [52] [53] . |
| Server Layer | The core of MySQL. Handles query parsing, optimization, caching, thread management, built-in functions, stored procedures, triggers, and views [51] [52] [53] . |
| Storage Engine | Responsible for actual data storage and retrieval. MySQL uses a pluggable storage engine architecture, allowing different engines (e.g., InnoDB, MyISAM) to be used per table [51] [54] . |

Key Points:

The pluggable storage engine architecture lets you select the best storage engine for specific application needs (e.g., transactions, high availability, analytics) [54] .
The server layer provides a consistent API, shielding applications from the complexities of the underlying storage engines [54] [53] .
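For example, the engine can be chosen per table at creation time (the table definitions here are illustrative):

CREATE TABLE orders (
    id    INT PRIMARY KEY,
    total DECIMAL(10,2)
) ENGINE = InnoDB;    -- transactional, ACID-compliant engine

CREATE TABLE session_cache (
    k VARCHAR(64) PRIMARY KEY,
    v VARCHAR(255)
) ENGINE = MEMORY;    -- in-memory engine for transient data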

Major Features of MySQL


Cross-Platform Support: Runs on Windows, Linux, macOS, and others [49] .
SQL Compliance: Implements a broad subset of ANSI SQL 99, with some extensions [49] .
Storage Engines: Multiple engines (InnoDB, MyISAM, Memory, NDB Cluster, etc.) for
different workloads and requirements [49] [54] .
ACID Compliance: Ensured when using InnoDB and NDB Cluster engines [49] .
Transactions & Savepoints: Reliable transaction support with commit, rollback, and
savepoints [49] .
Replication: Supports master-slave, semi-synchronous, and synchronous (multi-master via
Group Replication or Galera Cluster) replication for high availability and scalability [49] .
Stored Procedures & Triggers: Supports procedural SQL and automation of tasks [49] .
Views & Cursors: Allows logical data abstraction and row-by-row data processing [49] .
Partitioning: Supports partitioned tables for better performance on large datasets [49] .
Full-Text Search: Enables efficient searching within text columns [49] .
Security: Advanced features include authentication, authorization, SSL support, and
auditing (especially in Enterprise Edition) [55] [50] .
Performance Schema: Collects and aggregates statistics for monitoring and tuning [49] .
Pluggable Storage Engines: Choose the most suitable engine for each table without
changing application code [54] .
Unicode Support: Handles multiple character sets and collations [49] .
Online DDL: Allows schema changes without downtime when using InnoDB [49] .
JSON Support: Stores and queries JSON documents [56] .
HeatWave ML: In-database machine learning and AI/ML integration (recent versions) [57] .

Editions of MySQL
MySQL Community Edition: Free, open-source, widely used for web and small to medium-scale applications [49] .
MySQL Enterprise Edition: Paid, includes advanced security, backup, monitoring, and support features [50] [49] [56] .

Use Cases

Web applications (e.g., WordPress, Facebook)

Data warehousing and analytics


E-commerce platforms
High-availability and clustered environments

Advantages
Free and open-source (Community Edition)
High performance and scalability
Flexible storage engine options
Large community and ecosystem
Strong security features (especially in Enterprise Edition)
Easy integration with programming languages and tools
Summary Table: MySQL Key Points
| Aspect | Description |
|---|---|
| Type | Relational DBMS, open source (with commercial edition) |
| Architecture | Client-server, pluggable storage engines |
| Key Features | Transactions, replication, security, partitioning, full-text search, JSON |
| Storage Engines | InnoDB, MyISAM, Memory, NDB Cluster, others |
| Editions | Community (free), Enterprise (paid, advanced features) |
| Use Cases | Web apps, analytics, e-commerce, high-availability systems |

In summary:
MySQL is a robust, flexible, and widely adopted RDBMS known for its pluggable architecture,
strong security, and support for both transactional and analytical workloads. Its open-source
nature, combined with enterprise-grade features, makes it suitable for a broad range of
applications, from small websites to large-scale, mission-critical systems [49] [56] [54] .

Oracle Database: Detailed Notes for DBMS

Introduction

Oracle Database is a powerful, multi-model relational database management system (RDBMS) developed by Oracle Corporation.

It is widely used in enterprise environments for its scalability, reliability, advanced features,
and support for both transactional (OLTP) and analytical (OLAP) workloads [58] [59] [60] .

Oracle Database Architecture


Oracle Database architecture is divided into physical structures, logical structures, and
instance components.

1. Physical Structures
These are files stored on the server’s disk:
Control Files:
Store metadata about the database (database name, data file locations, redo log locations)
and are essential for startup and recovery [61] .
Data Files:
Store actual user and system data. Data files are organized into logical units called
tablespaces [61] .
Redo Log Files:
Record all changes made to the database, enabling recovery in case of system failure [61] .

2. Logical Structures
These provide an abstraction over the physical files:
Tablespaces:
Logical storage units made up of one or more data files. Used to organize and manage data
efficiently [61] .
Segments, Extents, and Blocks:
Segments represent database objects (tables, indexes).
Extents are collections of contiguous blocks.
Blocks are the smallest unit of data storage [61] .
Schemas:
Collections of database objects (tables, views, indexes) owned by a user, helping organize
and manage objects systematically [61] .

3. Oracle Instance Components

An Oracle instance consists of memory structures and background processes running on a server.

Memory Structures

System Global Area (SGA):


Shared memory area containing:
Database Buffer Cache: Recently used data blocks.
Shared Pool: Parsed SQL statements, data dictionary cache.
Redo Log Buffer: Stores redo entries for recovery.
Large Pool, Java Pool, In-Memory Area: For specialized tasks and performance [62] [61]
[63] [60] [64] .

Program Global Area (PGA):


Private memory allocated to each server or background process, holding session-specific
data like sort areas and session variables [61] [63] [60] [64] .
Processes
Client Processes:
Run application code, interact with the Oracle instance via server processes [60] .
Server Processes:
Handle client requests, execute SQL, and interact with the database [60] .
Background Processes:
Perform maintenance and support tasks, such as:
DB Writer (DBWn): Writes modified data from SGA to data files.
Log Writer (LGWR): Writes redo log buffer to redo log files.
System Monitor (SMON): Performs crash recovery.
Process Monitor (PMON): Cleans up failed processes.
Checkpoint (CKPT), Archiver (ARCn), and others [62] [63] [60] [64] .

Key Features of Oracle Database


Multitenant Architecture:

Supports pluggable databases (PDBs) within a single container database (CDB), enhancing consolidation and cloud deployment [58] .
In-Memory Column Store:
Allows data to be stored in memory in a columnar format for faster analytics and reporting [58] [62] .
Advanced Security:
Includes robust authentication, authorization, auditing, and encryption features [59] .

si

High Availability:
Supports Real Application Clusters (RAC), Data Guard, and Flashback technologies for fault
tolerance and disaster recovery [64] .
Scalability:
Efficiently handles large-scale enterprise workloads and supports clustering across multiple
servers [64] .
Support for Object-Relational Features:
User-defined types, inheritance, and polymorphism [60] .
Comprehensive Analytics:
Built-in support for data warehousing, analytics, machine learning, and JSON data [58] [59] .
Backup and Recovery:
Advanced tools like Recovery Manager (RMAN) for backup, restore, and recovery [62] [63] .
Partitioning:
Enables management of large tables and indexes for performance optimization.
Advantages of Oracle Database
Enterprise-grade reliability and performance
Advanced security and compliance features
Extensive support for high availability and disaster recovery
Rich feature set for both OLTP and OLAP workloads
Scalable to support large, mission-critical applications

Oracle Database Use Cases


Banking and financial services
Telecommunications
E-commerce and retail
Healthcare and government
Large-scale data warehousing and analytics

Summary Table: Oracle Database Architecture

| Component | Description |
|---|---|
| Control Files | Metadata, essential for startup and recovery |
| Data Files | Store user/system data, organized in tablespaces |
| Redo Log Files | Record all changes for recovery |
| Tablespaces | Logical storage units, manage data organization |
| SGA | Shared memory: buffer cache, shared pool, redo log buffer, etc. |
| PGA | Private memory for server/background processes |
| Background Processes | DB Writer, Log Writer, System Monitor, Process Monitor, etc. |
| Multitenant | Pluggable databases within a container database |
| High Availability | RAC, Data Guard, Flashback, backup and recovery tools |
| Advanced Features | In-memory analytics, object-relational support, security, partitioning, analytics, machine learning |

In summary:
Oracle Database is a robust, enterprise-class RDBMS with a sophisticated architecture, high
availability, advanced security, and comprehensive analytics capabilities. Its modular design,
support for both transactional and analytical workloads, and extensive feature set make it a
leading choice for mission-critical applications in large organizations [58] [59] [61] [60] [64] .

DB2 in Database Management Systems: Detailed Notes

Introduction
IBM Db2 (Database 2) is a family of data management products, including relational
database management systems (RDBMS), developed by IBM.
Db2 is widely used in enterprise environments, especially on mainframes, but is also
available for distributed systems and cloud platforms.
It is designed for high performance, scalability, reliability, and robust data management.

Db2 Architecture Overview


Db2 architecture is component-based and supports both local and distributed environments. Its
core components and processes are designed for efficient data processing, high concurrency,
and reliability.

1. Client-Server Model

Clients: Applications (local or remote) interact with the Db2 server via the Db2 client library.
Local clients use shared memory and semaphores for communication.
Remote clients use protocols like TCP/IP or named pipes [65] [66] .

Server: Manages all database activities and resources.



2. Engine Dispatchable Units (EDUs)


EDUs are threads responsible for processing SQL and XQuery requests, managing I/O, and handling background tasks.


Db2 Agents: The most common EDUs, executing most SQL/XQuery processing for
applications.
Prefetchers: EDUs that read data from disk into buffer pools before it's needed, improving
performance for large data scans.
Page Cleaners: EDUs that write modified pages from buffer pools back to disk, ensuring
space for new data and maintaining data integrity [65] [66] .

3. Buffer Pools
Buffer pools are memory areas where pages of user data, indexes, and catalog data are
temporarily stored and modified.
They are crucial for performance, as accessing data from memory is much faster than from
disk.
Prefetchers and page cleaners optimize buffer pool usage by managing data flow between
disk and memory [65] [66] .
4. Table Spaces and Storage
Table Spaces: Logical storage units that map to physical storage on disk.
Types: Regular table spaces (user data), index table spaces (indexes), LOB table
spaces (large objects) [67] .
Pages: The smallest unit of storage, typically 4 KB or 32 KB in size.
Storage Groups: Collections of disk volumes used to manage where data is physically
stored [67] .

5. Indexes
Improve data retrieval speed by allowing quick lookups.
Stored in separate index table spaces.
Can be created on one or more columns to optimize SELECT, UPDATE, DELETE, and MERGE
operations [67] .

6. Logging and Recovery


System Log: Maintains records of all changes for recovery and auditing.

Active and Archive Logs: Db2 maintains multiple copies for redundancy and disaster recovery [67] .
Supports backup and restore operations, ensuring data durability.

7. Locking and Concurrency



Locking Services (IRLM): Manage concurrent data access and resolve deadlocks, ensuring data consistency and isolation [67] .

Supports various isolation levels for transaction management.

8. Main Components in Mainframe Db2


| Component | Function |
|---|---|
| System Services (SSAS) | Manages connections, startup/shutdown, logging, and interaction with other subsystems |
| Database Services (DBAS) | Handles SQL execution, data manipulation, buffer management, and core DBMS logic |
| Locking Services (IRLM) | Manages locks and resolves deadlocks for concurrent access |
| Buffer Manager | Manages buffer pools and coordinates data movement between disk and memory |
| Data Manager | Analyzes and accesses rows or index data |
| Relational Data System | Checks authorization, parses and optimizes SQL, creates access paths |
Db2 pureScale Feature
Db2 pureScale is designed for high availability and scalability in clustered environments.
Multiple Db2 members (nodes) process requests in parallel, sharing access to the same
database on shared disk.
Supports up to 128 members, each with its own buffer pools and log files, enabling
continuous availability and workload balancing [68] .

Key Features of IBM Db2


High Performance: Efficient memory management, buffer pools, and parallel processing.
Scalability: Supports large databases and high transaction volumes; pureScale for
clustering.
Reliability: Advanced logging, backup, and recovery mechanisms.
Advanced SQL Support: Includes support for SQL, XQuery, and analytics.
Security: Fine-grained access control, auditing, and encryption.
Data Warehousing: Optimized for OLTP and OLAP workloads, with features for data warehousing and analytics [69] .
Cross-Platform: Available on mainframes (z/OS), Linux, UNIX, Windows, and cloud.

Summary Table: Db2 Architecture Components



| Component | Description |
|---|---|
| Client Library | Interface for applications to connect (local/remote) |
| EDUs | Threads handling SQL, I/O, background tasks |
| Buffer Pools | Memory areas for caching data and indexes |
| Prefetchers | Load data from disk to buffer pool in advance |
| Page Cleaners | Write modified data from buffer pool to disk |
| Table Spaces | Logical storage units for organizing data |
| Indexes | Speed up data retrieval |
| Logging | Maintains change history for recovery |
| Locking Services | Manages concurrency and deadlocks |
| pureScale | Clustered, highly available configuration |

In summary:
IBM Db2 is a robust, enterprise-grade RDBMS known for its modular architecture, high
performance, reliability, and advanced features for both OLTP and analytical workloads. Its
architecture, with components like EDUs, buffer pools, and pureScale, ensures efficient data
processing, scalability, and continuous availability, making it a preferred choice for mission-
critical applications [65] [67] [68] [66] .

SQL Server in Database Management System: Detailed Notes

Introduction
Microsoft SQL Server is a widely used relational database management system (RDBMS)
developed by Microsoft.
It supports storing, retrieving, and managing data for enterprise applications, with advanced
features for security, scalability, and high availability.

SQL Server Architecture Overview


SQL Server employs a multi-layered client-server architecture designed for performance,
scalability, and reliability. The primary architectural components are:

| Component | Function |
|---|---|
| Protocol Layer | Manages communication between clients and the SQL Server instance. |
| Relational Engine | Processes queries, optimizes execution plans, and handles transactions and security. |
| Storage Engine | Manages physical storage, retrieval, and manipulation of data on disk. |
| SQLOS | Provides operating system-like services (memory, scheduling, I/O) for SQL Server processes. |

1. Protocol Layer
Role: Handles all client-server communication.
Supported Protocols:
Shared Memory: For local connections.
TCP/IP: For remote and networked connections.
Named Pipes: For LAN environments.
Tabular Data Stream (TDS): The protocol for data transfer between client and
server [70] [71] [72] .
Function: Receives requests from clients, packages them, and passes them to the
Relational Engine.
2. Relational Engine (Query Processor)
Role: Responsible for query processing, optimization, and execution.
Key Components:
Query Parser: Checks syntax and translates T-SQL statements into internal
representations.
Query Optimizer: Generates the most efficient execution plan based on statistics and
indexes.
Query Executor: Executes the plan, interacting with the Storage Engine as needed [73]
[70] [71] [72] [74] .

Functions:
Processes DDL, DML, and other SQL statements.
Manages transactions, security, and user permissions.
Formats results for client applications.

3. Storage Engine

Role: Handles actual storage and retrieval of data from disk.
Responsibilities:
Reads/writes data pages to/from disk.
Manages indexes, locking, and transaction logs.
Ensures data integrity and supports ACID properties [70] [71] [72] .
Components:
Data File Architecture: Organizes data into files and filegroups for efficient access.
Log File Architecture: Maintains transaction logs for recovery and consistency.
Buffer Pool: Caches frequently accessed data for performance.
Lock Manager: Controls concurrent access and resolves deadlocks.

4. SQLOS (SQL Server Operating System)


Role: Acts as an abstraction layer between SQL Server and the Windows OS.
Functions:
Manages memory allocation, thread scheduling, and I/O operations.
Provides services like deadlock detection, buffer management, and exception
handling [73] [72] .
Benefit: Allows SQL Server to optimize resource usage independently of the underlying OS.
Memory Architecture
Buffer Pool: Stores data pages and index pages in memory to reduce disk I/O.
Procedure Cache: Stores execution plans for queries and stored procedures.
Log Cache: Temporarily holds transaction log records before writing to disk.
Other Areas: System-level data, connection context, and stack space for threads [75] [72] .

File Architecture
Data Files: Store actual table data and indexes; grouped into filegroups for manageability.
Log Files: Store transaction logs for recovery and rollback.
Filegroups: Logical groupings of data files to optimize performance and management [75]
[72] .

Types of SQL Server Deployments


Standalone: Single server instance.
Clustered: Multiple servers for high availability.
Mirrored: Redundant copy of the database for disaster recovery.
AlwaysOn Availability Groups: Synchronized databases across servers for high availability and disaster recovery [75] .

Key Features of SQL Server



ACID Compliance: Ensures reliable transactions.


Security: Advanced authentication, encryption, and auditing.
High Availability: Clustering, mirroring, and AlwaysOn features.
Scalability: Handles large databases and high transaction loads.
Integration: Works with .NET, Azure, and other Microsoft technologies.
Tools: Includes SQL Server Management Studio (SSMS) for administration and
development.

Summary Table: SQL Server Architecture


| Layer/Component | Functionality |
|---|---|
| Protocol Layer | Client-server communication (Shared Memory, TCP/IP, Named Pipes, TDS) |
| Relational Engine | Query parsing, optimization, execution, transaction and security management |
| Storage Engine | Data storage/retrieval, index management, logging, locking, ACID compliance |
| SQLOS | Memory, thread, and I/O management; abstraction from Windows OS |
| Buffer Pool | In-memory cache for data and index pages |
| File Architecture | Data files, log files, filegroups |

In summary:
SQL Server is a robust, enterprise-grade RDBMS with a multi-layered architecture that
separates client communication, query processing, and data storage. Its advanced features for
security, scalability, and high availability make it a popular choice for mission-critical applications
in organizations of all sizes [73] [70] [75] [71] [72] [74] .

Domain Dependency in Database Management System (DBMS)

What is a Domain in DBMS?

A domain in DBMS refers to the set of all possible, valid values that an attribute (column) of a table can have [76] [77] .
Each attribute in a database schema is associated with a domain, which defines its data type, possible range, length, and other constraints [76] [78] [77] .

Examples of domains:

For an attribute Age, the domain could be all integers from 0 to 120.
For an attribute Email, the domain could be all valid email addresses.

Domain Constraints
Domain constraints (or domain integrity constraints) are rules that restrict the type of data
that can be stored in a column, ensuring data consistency and correctness [76] [78] .
These constraints specify:
The data type (e.g., integer, character, date)
The format or pattern (e.g., phone number format)
The range or set of allowed values (e.g., gender can be 'M' or 'F')
Whether NULL values are allowed
Types of Domain Constraints:
NOT NULL: Ensures that a column cannot have NULL (missing) values [76] [78] .
CHECK: Restricts the values of a column to satisfy a specific condition (e.g., salary > 0) [76]
[78] .
Custom Domains: Can be created using SQL’s CREATE DOMAIN command to define reusable
data types with specific constraints [76] .

What is Domain Dependency?


Domain dependency in DBMS refers to the relationship between an attribute and its
domain, meaning that the values stored in an attribute must always belong to its defined
domain [76] [77] .
This dependency ensures that only valid, meaningful, and expected data is entered into the
database, maintaining data integrity.
Example:
If the Age attribute is defined to have the domain of positive integers, then entering a
negative number or a string would violate the domain dependency.
Violation of Domain Dependency:
Occurs when a value outside the defined domain is entered, leading to data inconsistency
and potential errors.

Why is Domain Dependency Important?
Data Integrity: Ensures only valid data is stored, preventing errors and inconsistencies [78] .
Data Validation: Automatically checks and rejects invalid data entries.
Consistency: Maintains uniformity of data across the database.
Ease of Maintenance: Simplifies enforcing business rules and data standards.

How to Implement Domain Dependency


By defining appropriate domain constraints at the time of table creation:
CREATE TABLE Student (
RollNo INT NOT NULL,
Name VARCHAR(50) NOT NULL,
Age INT CHECK (Age >= 0 AND Age <= 120)
);

By using the CREATE DOMAIN statement in SQL to define reusable domains:


CREATE DOMAIN AgeDomain AS INT CHECK (VALUE >= 0 AND VALUE <= 120);
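The domain can then be applied wherever an age column is needed. Note that CREATE DOMAIN is part of standard SQL and supported by systems such as PostgreSQL, but not by all DBMSs (MySQL, for instance, lacks it):

CREATE TABLE Student (
    RollNo INT NOT NULL,
    Name   VARCHAR(50) NOT NULL,
    Age    AgeDomain    -- the domain's CHECK constraint applies automatically
);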
Summary Table: Domain Dependency
| Aspect | Description |
|---|---|
| Domain | Set of valid values for an attribute |
| Domain Constraint | Rule that enforces attribute values to belong to their domain |
| Domain Dependency | Attribute's values must always come from its defined domain |
| Implementation | NOT NULL, CHECK, and custom domain definitions in SQL |
| Importance | Ensures data integrity, validation, and consistency |

In summary:
Domain dependency in DBMS enforces that each attribute’s values must conform to its defined
domain, ensuring data integrity and consistency by restricting invalid or unexpected data from
being stored in the database [76] [78] [77] .

Data Dependency in Database Management System (DBMS)

What is Data Dependency?
Data dependency in DBMS refers to a relationship between attributes (columns) in a database table where the value of one or more attributes uniquely determines the value of one or more other attributes [79] [80] [81] .
In simple terms, some data values are dependent on other data values to be recognized or derived [81] .

Data dependencies are foundational to database design, normalization, and ensuring data
integrity.

Why Are Data Dependencies Important?


Normalization: Data dependencies are the core of normalization, which organizes data
efficiently, eliminates redundancy, and ensures logical storage [79] [82] .
Data Integrity: They help maintain consistency and accuracy in the database.
Query Optimization: Knowing dependencies allows for better query planning and
optimization [83] .
Database Design: Dependencies guide how tables and relationships are structured.
Types of Data Dependencies in DBMS
| Type | Description | Example |
|---|---|---|
| Functional Dependency | One attribute (or set) uniquely determines another attribute (or set) | EmpID → EmpName: Knowing EmpID gives you EmpName [79] [82] [80] [84] |
| Full Functional Dependency | An attribute is functionally dependent on the whole composite key, not just part of it | {EmpID, ProjectID} → Days: Days depend on both EmpID and ProjectID [79] [80] |
| Partial Dependency | A non-prime attribute is dependent on part of a candidate key | In a table with {StudentID, CourseID} as key, if Grade depends only on CourseID [79] [80] |
| Transitive Dependency | An attribute depends on another attribute, which in turn depends on a key | A → B and B → C implies A → C (indirect dependency) [79] [80] |
| Trivial Dependency | When an attribute is dependent on a set of attributes that includes itself | EmpID, EmpName → EmpName [79] [80] |
| Multivalued Dependency | One attribute in a table implies multiple independent values of another attribute | If a person has multiple phone numbers and email addresses, both are independent [79] [80] |

Functional Dependency (FD): The Core Concept
Definition: If two tuples (rows) in a relation have the same value for attribute(s) X, they must have the same value for attribute(s) Y. Written as X → Y [82] [80] [84] .
Example: In an Employee table, EmpID → EmpName means each EmpID is associated with only one EmpName.

How to Identify Data Dependencies


Examine Data: Look for unique relationships between columns.
Understand Business Rules: Real-world logic often dictates dependencies (e.g., an
employee’s ID always gives their name).
Check for Uniqueness: If knowing the value of one attribute always gives you another, a
dependency exists.
Use Normalization: Normal forms (1NF, 2NF, 3NF, BCNF) are based on analyzing and
applying data dependencies [82] [79] .

Role in Database Normalization


Data dependencies are used to decompose tables into smaller, well-structured tables,
eliminating redundancy and update anomalies.
Each normal form addresses specific types of dependencies:
1NF: Removes repeating groups.
2NF: Removes partial dependencies.
3NF: Removes transitive dependencies.

Summary Table: Data Dependency Types


| Dependency Type | Symbol/Notation | Key Point |
|---|---|---|
| Functional Dependency | X → Y | X uniquely determines Y |
| Full Functional Dependency | X → Y | Y depends on all of X, not just a part |
| Partial Dependency | X → Y | Y depends on part of a composite key |
| Transitive Dependency | X → Y, Y → Z | X indirectly determines Z |
| Trivial Dependency | X, Y → Y | Y is included in the determinant set |
| Multivalued Dependency | X ↠ Y | X determines multiple independent Y values |

In Summary
Data dependency in DBMS is the relationship where the value of one attribute (or set) determines the value of another. Understanding and applying data dependencies is crucial for designing efficient, normalized, and reliable databases, ensuring data integrity, reducing redundancy, and optimizing queries [79] [82] [80] [81] .



Armstrong's Axioms in Database Management System (DBMS)



Introduction
Armstrong's axioms are a set of inference rules introduced by William W. Armstrong in 1974 for
reasoning about functional dependencies in relational databases [85] [86] . They provide a formal
and systematic way to deduce all possible functional dependencies from a given set, playing a
crucial role in database design, normalization, and ensuring data integrity [85] [87] [88] .

Significance
Soundness: Only valid (true) dependencies are derived.
Completeness: All possible dependencies implied by the given set can be derived using
these rules [85] [87] [88] .
Application: Used in normalization, finding minimal covers, and ensuring efficient and
consistent database schemas [85] [87] .
Primary Armstrong's Axioms (RAT)
There are three fundamental (primary) axioms, often abbreviated as RAT: Reflexivity,
Augmentation, and Transitivity [87] [88] [86] .

1. Reflexivity
Rule: If Y is a subset of X, then X → Y.
Explanation: Any set of attributes functionally determines its own subset.
Example: If X = {A, B}, then {A, B} → {A} holds [88] [86] .

2. Augmentation
Rule: If X → Y, then XZ → YZ for any set of attributes Z.
Explanation: Adding the same attributes to both sides of a dependency does not change
the dependency.
Example: If A → B, then AC → BC [88] [86] .

3. Transitivity
Rule: If X → Y and Y → Z, then X → Z.
Explanation: Dependencies are transitive; if X determines Y and Y determines Z, then X determines Z.

Example: If A → B and B → C, then A → C [88] [86] .



Secondary (Derived) Rules



These rules can be derived from the primary axioms and are often used to simplify the process
of finding all functional dependencies [86] :

| Rule | Statement | Example |
|---|---|---|
| Union | If X → Y and X → Z, then X → YZ | If A → B and A → C, then A → BC |
| Decomposition | If X → YZ, then X → Y and X → Z | If A → BC, then A → B and A → C |
| Pseudo-Transitivity | If X → Y and WY → Z, then WX → Z | If A → B and CB → D, then CA → D |
| Composition | If X → Y and W → Z, then XW → YZ | If A → B and C → D, then AC → BD |
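As an illustration of how the secondary rules follow from the primary axioms alone, the Union rule can be derived in four steps:

1. Given $ X \rightarrow Y $ and $ X \rightarrow Z $.
2. Augment $ X \rightarrow Y $ with $ X $: $ XX \rightarrow XY $, i.e., $ X \rightarrow XY $ (since $ XX = X $).
3. Augment $ X \rightarrow Z $ with $ Y $: $ XY \rightarrow YZ $.
4. Apply transitivity to $ X \rightarrow XY $ and $ XY \rightarrow YZ $: $ X \rightarrow YZ $.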

Why Use Armstrong's Axioms?


Systematic Deduction: Provides a structured method to derive all implied functional
dependencies.
Database Normalization: Helps identify redundant data and decompose tables to achieve
higher normal forms.
Minimal Cover: Assists in finding the minimal set of functional dependencies for efficient
schema design [87] .

Advantages
Simplicity: Clear and easy-to-apply rules [87] .
Efficiency: Streamlines the process of dependency analysis and normalization.
Minimality: Helps in finding minimal covers and reducing redundancy.

Limitations
Complexity for Large Sets: Can be computationally intensive for large schemas [87] .
Scope: Primarily addresses deterministic (not probabilistic or uncertain) dependencies.

Summary Table: Armstrong's Axioms


| Axiom | Rule | Example |
|---|---|---|
| Reflexivity | If Y ⊆ X, then X → Y | {A, B} → {A} |
| Augmentation | If X → Y, then XZ → YZ | A → B ⇒ AC → BC |
| Transitivity | If X → Y and Y → Z, then X → Z | A → B, B → C ⇒ A → C |
| Union | If X → Y and X → Z, then X → YZ | A → B, A → C ⇒ A → BC |
| Decomposition | If X → YZ, then X → Y and X → Z | A → BC ⇒ A → B, A → C |
| Pseudo-Transitivity | If X → Y and WY → Z, then WX → Z | A → B, CB → D ⇒ CA → D |
| Composition | If X → Y and W → Z, then XW → YZ | A → B, C → D ⇒ AC → BD |

In summary:
Armstrong's axioms are foundational inference rules in DBMS for deducing all functional
dependencies in a relational schema. They ensure data integrity, guide normalization, and
provide a systematic approach to database design by being both sound and complete [85] [87]
[88] [86] .

Normal Forms in Database Management Systems (DBMS)


What is Normalization?
Normalization is the process of organizing data in a relational database to reduce redundancy
and improve data integrity. This is achieved by decomposing tables into smaller, well-structured
tables according to a series of rules called normal forms [89] [90] [91] .

Types of Normal Forms


Each normal form builds upon the previous one. A table must satisfy all lower normal forms
before it can be considered in a higher normal form [92] [89] [93] [94] .

1. First Normal Form (1NF)


Definition:
A table is in 1NF if:
Each column contains only atomic (indivisible) values.
Each row is unique (no duplicate rows).
Each column has a unique name.

No repeating groups or arrays are allowed in any row [92] [89] [95] [93] [94] .
Example Violation:
A cell containing "Math, Science" instead of separate rows for each subject.

2. Second Normal Form (2NF)



Definition:

A table is in 2NF if:


It is already in 1NF.
All non-key attributes are fully functionally dependent on the entire primary key (no partial
dependency) [92] [89] [95] [93] [94] .
Key Point:
2NF primarily applies to tables with composite primary keys.
Example Violation:
If a student’s name depends only on StudentID (part of a composite key of StudentID +
CourseID), it violates 2NF.
3. Third Normal Form (3NF)
Definition:
A table is in 3NF if:
It is already in 2NF.
There are no transitive dependencies; that is, all non-key attributes depend only on the
primary key and not on other non-key attributes [89] [95] [93] [94] [91] .
Key Point:
Removes indirect dependencies on the primary key.
Example Violation:
If DepartmentName depends on DepartmentID, and DepartmentID depends on EmployeeID,
then DepartmentName is transitively dependent on EmployeeID.
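The standard fix is to decompose so that the transitive dependency is broken; a sketch using the hypothetical employee/department schema from the violation above:

-- Before (violates 3NF): Employee(EmployeeID, DepartmentID, DepartmentName)
-- After decomposition, DepartmentName depends only on the key of its own table:

CREATE TABLE Department (
    DepartmentID   INT PRIMARY KEY,
    DepartmentName VARCHAR(50)
);

CREATE TABLE Employee (
    EmployeeID   INT PRIMARY KEY,
    DepartmentID INT REFERENCES Department(DepartmentID)
);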

4. Boyce-Codd Normal Form (BCNF)


Definition:
A table is in BCNF if:

It is already in 3NF.
For every functional dependency (A → B), A is a superkey [89] [95] [93] [94] [96] .
Key Point:

BCNF is a stricter version of 3NF, resolving certain anomalies not handled by 3NF.
dd

5. Fourth Normal Form (4NF)


si

Definition:
A table is in 4NF if:
It is in BCNF.
It has no multi-valued dependencies (MVDs) [89] [95] [93] [94] .
Key Point:
Addresses situations where one attribute in a table uniquely determines another set of attributes,
leading to redundancy.

6. Fifth Normal Form (5NF)


Definition:
A table is in 5NF if:
It is in 4NF.
It has no join dependencies that can cause lossless decomposition [89] [95] [93] [94] .
Key Point:
Deals with cases where tables can be split into smaller tables and then rejoined without
introducing redundancy or losing information.

Summary Table: Normal Forms


| Normal Form | Key Rule/Requirement | Main Problem Solved |
|---|---|---|
| 1NF | Atomic values, unique rows, no repeating groups | Eliminates repeating groups |
| 2NF | 1NF + no partial dependency on primary key | Eliminates partial dependencies |
| 3NF | 2NF + no transitive dependency on primary key | Eliminates transitive dependencies |
| BCNF | 3NF + every determinant is a superkey | Resolves certain anomalies in 3NF |
| 4NF | BCNF + no multi-valued dependencies | Eliminates multi-valued dependencies |
| 5NF | 4NF + no join dependencies | Eliminates join dependencies |

Why Normalize?

Reduces data redundancy
Prevents update, insert, and delete anomalies
Improves data integrity and consistency

Facilitates efficient data organization

In summary:

Normal forms are essential rules in DBMS that guide the structuring of tables to minimize redundancy and ensure data integrity. The most common forms used in practice are 1NF, 2NF, and 3NF, with BCNF, 4NF, and 5NF addressing more complex scenarios [92] [89] [95] [94] .

Dependency Preservation in Database Management System (DBMS)

What is Dependency Preservation?


Dependency preservation is a property of database decomposition that ensures all
functional dependencies from the original relation are still enforceable after decomposition,
without requiring a join of the decomposed relations [97] [98] [99] .
In other words, after splitting a relation into two or more tables, you should be able to
enforce all original functional dependencies by checking them locally within the
decomposed tables, rather than reconstructing the original table through joins.
Why is Dependency Preservation Important?
Integrity Enforcement: Ensures all data integrity constraints (functional dependencies) are
maintained after decomposition.
Efficiency: Allows the DBMS to enforce constraints by checking only the decomposed
tables, avoiding costly joins [98] .
Normalization: While decomposing relations to achieve higher normal forms (like 3NF),
dependency preservation is a key criterion, along with lossless join [100] .

Formal Definition
Let:
$ R $ be a relation schema,
$ F $ be the set of functional dependencies on $ R $,
$ R $ is decomposed into $ R_1, R_2, ..., R_n $ with respective sets of functional
dependencies $ F_1, F_2, ..., F_n $ (where each $ F_i $ is the set of dependencies that can
be enforced on $ R_i $ alone).

The decomposition is dependency preserving if:

$ (F_1 \cup F_2 \cup ... \cup F_n)^+ = F^+ $

That is, the closure of the union of the projected dependencies is equivalent to the closure of the original dependencies [97] [99] .



How to Check for Dependency Preservation



1. Project each functional dependency in $ F $ onto the decomposed relations ($ R_1, R_2, ...,
R_n $), forming $ F_1, F_2, ..., F_n $.
2. Take the union $ F' = F_1 \cup F_2 \cup ... \cup F_n $.
3. Compute the closure $ F'^+ $ and compare it to the closure of the original set $ F^+ $.
4. If $ F'^+ = F^+ $, the decomposition is dependency preserving [97] [98] [99] .

Example
Suppose $ R(A, B, C, D) $ with $ F = {A \rightarrow B, A \rightarrow C, C \rightarrow D} $.
Decompose $ R $ into:
$ R_1(A, B, C) $ with $ F_1 = {A \rightarrow B, A \rightarrow C} $
$ R_2(C, D) $ with $ F_2 = {C \rightarrow D} $
Union: $ F' = F_1 \cup F_2 = {A \rightarrow B, A \rightarrow C, C \rightarrow D} $
$ F'^+ = F^+ $ (all original dependencies can be enforced locally), so the decomposition is
dependency preserving [97] .

Dependency Preservation vs. Lossless Join


Dependency Preservation: All original functional dependencies can be checked without
joining tables [101] .
Lossless Join: No information is lost when decomposed tables are joined back.
Ideal Decomposition: Both properties should be satisfied, but sometimes achieving both
simultaneously is not possible for certain schemas.

Dependency Preservation and Normal Forms


3NF is the highest normal form that always allows a dependency-preserving and lossless
decomposition [100] .
BCNF may require sacrificing dependency preservation to achieve a lossless join.
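A classic textbook illustration of this trade-off (not drawn from the cited sources): consider $ R(Street, City, Zip) $ with $ \{Street, City\} \rightarrow Zip $ and $ Zip \rightarrow City $. The only lossless BCNF decomposition is $ R_1(Street, Zip) $ and $ R_2(Zip, City) $, but the dependency $ \{Street, City\} \rightarrow Zip $ can then no longer be checked within a single table, so dependency preservation is sacrificed.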

Summary Table: Dependency Preservation

| Aspect | Description |
|---|---|
| Definition | All original FDs can be enforced on decomposed tables without joining |
| Importance | Maintains data integrity and efficient constraint checking |
| Formal Condition | $ (F_1 \cup F_2 \cup ... \cup F_n)^+ = F^+ $ |
| Relation to Normalization | 3NF always allows dependency preservation with lossless join |
| Example | Decomposition of $ R(A,B,C,D) $ into $ R_1(A,B,C) $ and $ R_2(C,D) $ |

In summary:
Dependency preservation in DBMS ensures that all functional dependencies of the original
relation can be enforced locally in the decomposed tables, avoiding the need to join tables to
check constraints. This property is crucial for efficient and reliable database design, especially
during normalization [97] [98] [99] .

Lossless Design (Lossless Join Decomposition) in DBMS


Introduction
Lossless design, also known as lossless join decomposition, is a fundamental concept in
relational database design. It refers to the process of decomposing a relation (table) into two or
more smaller relations such that no information is lost: the original relation can be perfectly
reconstructed by joining the decomposed tables using a natural join [102] [103] [104] [105] . This
property is crucial for maintaining data integrity, consistency, and completeness after
normalization or schema refinement.

Why is Lossless Design Important?


Data Preservation: Guarantees that all original data can be recovered from the
decomposed tables [102] [103] [104] .
Data Consistency: Prevents contradictions and anomalies that may arise from improper
decomposition [102] [104] .
Redundancy Reduction: Helps minimize data duplication, improving storage efficiency and
reducing update anomalies [103] [104] .
Query Optimization: Smaller, well-structured tables can improve query performance and
simplify data management [102] [104] .

Scalability: Makes it easier to modify or expand the schema without risking data loss [102] .

Understanding Lossless Join Decomposition


Definition: A decomposition of a relation $ R $ into $ R_1, R_2, ..., R_n $ is lossless if, by performing a natural join on all $ R_i $, the original relation $ R $ is obtained exactly (no spurious tuples, no missing data) [102] [103] [104] [105] .

Non-Additive Join: Lossless decomposition is also called non-additive join decomposition


because the join does not add or lose information [104] .

Criteria for Lossless Join Decomposition


To ensure lossless decomposition, the following conditions must be met [104] [106] :
1. Attribute Coverage:
The union of attributes in all decomposed relations must equal the set of attributes in the original relation:
$ R_1 \cup R_2 \cup ... \cup R_n = R $
2. Non-Null Intersection:
The intersection of attributes between any two decomposed relations must not be empty:
$ R_i \cap R_j \neq \emptyset $
3. Key Condition:
The common attribute(s) in the intersection must be a candidate key (or superkey) for at
least one of the decomposed relations [104] [106] .
This ensures that the join does not introduce spurious tuples.

Formal Test for Lossless Join (Binary Decomposition)


For a decomposition of $ R $ into $ R_1 $ and $ R_2 $:
The decomposition is lossless if:
$ (R_1 \cap R_2) \rightarrow R_1 $
or
$ (R_1 \cap R_2) \rightarrow R_2 $
That is, the common attributes functionally determine all attributes in at least one of the decomposed relations [106] .

Example
Suppose $ R(A, B, C, D) $ with functional dependency $ A \rightarrow BC $:
Decompose into $ R_1(A, B, C) $ and $ R_2(A, D) $
Attribute coverage: $ (A, B, C) \cup (A, D) = (A, B, C, D) $

Intersection: $ (A, B, C) \cap (A, D) = A $
$ A $ is a key for $ R_1(A, B, C) $ because $ A \rightarrow BC $
Conclusion: This is a lossless join decomposition [106] .
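The binary test can also be scripted. The sketch below uses the same illustrative FD representation as before (pairs of attribute sets; helper names are my own) and checks whether the common attributes functionally determine either side.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if the common attributes determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    cl = closure(common, fds)
    return set(r1) <= cl or set(r2) <= cl

# The example above: R(A,B,C,D) with A -> BC, decomposed into R1(A,B,C), R2(A,D).
fds = [({"A"}, {"B", "C"})]
print(is_lossless_binary({"A", "B", "C"}, {"A", "D"}, fds))  # True: A+ covers R1
```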

Properties of a Good Decomposition



A well-designed decomposition should satisfy:



1. Lossless Join: No information is lost; original relation can be reconstructed [102] [103] [104]
[107] [105] .

2. Dependency Preservation: All functional dependencies are preserved and can be enforced
without joining tables [103] [107] [106] [108] .
3. Lack of Redundancy: Reduces unnecessary duplication of data [103] .
Note: Lossless join and dependency preservation are independent properties; one does not guarantee the other [108] .

Summary Table: Lossless Design


Property | Description
Lossless Join | Original relation can be exactly reconstructed from decomposed tables
Attribute Coverage | All original attributes must appear in the union of decomposed tables
Non-Null Intersection | Decomposed tables must share at least one common attribute
Key Condition | Common attribute(s) must be a key for at least one decomposed table
Dependency Preservation | Preferable, but not guaranteed by lossless join alone
Redundancy Reduction | Helps minimize data duplication and anomalies

Conclusion
Lossless design is essential for effective relational database normalization. It ensures that
decomposing a table into smaller relations does not lose or distort information, preserving the
ability to reconstruct the original data set exactly. This property, along with dependency
preservation, forms the foundation of robust, efficient, and reliable database schemas [102] [103]
[104] [107] [105] [106] [108] .

Evaluation of Relational Algebra Expressions in DBMS

Overview
The evaluation of relational algebra expressions is a crucial step in query processing within a Database Management System (DBMS). Relational algebra expressions, built from a sequence of operations (such as selection, projection, join, etc.), are used to represent queries. Efficient evaluation of these expressions ensures optimal query performance and resource utilization.

Evaluation Strategies
There are two primary strategies for evaluating relational algebra expressions:

1. Materialized Evaluation
Process:
Each operation in the expression is evaluated one at a time, typically in a bottom-up
manner.
The result of each operation is stored in a temporary relation (often written to disk).
These intermediate results are then used as inputs for subsequent operations.
Example:
Compute $ A \bowtie B $, store result in a temporary file.
Compute $ C \bowtie D $, store result in another temporary file.
Join the temporary results as required.
Advantages:
Simplicity and modularity.
Disadvantages:
High I/O cost due to frequent writing and reading of intermediate results from disk [109]
[110] .

Increased storage requirements for temporary files.

2. Pipelined Evaluation
Process:
Multiple operations are evaluated simultaneously.
The output of one operation is passed directly as input to the next, without storing
intermediate results on disk.
Evaluation still proceeds bottom-up, but intermediate results are kept in memory as
much as possible.
Advantages:
Reduces disk I/O and storage overhead [109] [110] .
Faster overall query execution.
Disadvantages:
More complex implementation.
May be limited by available memory.
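The contrast is easy to see with Python generators. The toy relation and operators below are purely illustrative (not any DBMS's executor API): the materialized version stores each intermediate result in a list, standing in for a temporary file, while the pipelined version passes one tuple at a time to the next operator.

```python
employees = [("Alice", "HR", 50000), ("Bob", "IT", 65000), ("Carol", "IT", 72000)]

# Materialized: each operator fully computes and stores its result before the next runs.
selected = [row for row in employees if row[1] == "IT"]    # intermediate result 1
projected = [(row[0], row[2]) for row in selected]          # intermediate result 2

# Pipelined: generators yield one tuple at a time; no intermediate table is built.
def select(rows, pred):
    for row in rows:
        if pred(row):
            yield row

def project(rows, cols):
    for row in rows:
        yield tuple(row[c] for c in cols)

pipeline = project(select(employees, lambda r: r[1] == "IT"), (0, 2))
print(projected)        # [('Bob', 65000), ('Carol', 72000)]
print(list(pipeline))   # same result, computed tuple-by-tuple
```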
Evaluation Plans
Definition:

An evaluation plan is a detailed, step-by-step blueprint specifying the order and method of
executing each operation in a relational algebra expression.
Types:
Logical Plan: Specifies the sequence of relational algebra operations.
Physical Plan: Specifies the algorithms and access methods (e.g., index scan, hash
join) used for each operation [111] .
Optimization:
The DBMS may generate multiple equivalent evaluation plans for the same query and
choose the most efficient one based on estimated costs (using statistics like relation size,
tuple size, index availability, etc.) [111] .
Steps in Evaluation
1. Decomposition:
SQL queries are decomposed into query blocks, each translated into a relational
algebra expression [109] [112] .
2. Translation:
Each query block is converted into an equivalent relational algebra expression.
3. Optimization:
The DBMS considers equivalent expressions and chooses an evaluation plan with the
lowest estimated cost [111] [113] .
Heuristic rules (e.g., perform selection and projection early, replace Cartesian product
and selection with join) are applied to improve efficiency [113] .
4. Execution:
The chosen plan is executed using either materialized or pipelined evaluation, or a
combination of both.

Heuristic Optimization Techniques


Apply selections as early as possible to reduce the size of intermediate results.
Apply projections early to remove unnecessary columns and reduce tuple size.
Replace Cartesian product followed by selection with a join to minimize unnecessary computations.
Combine consecutive selection or projection operations where possible [113] .

Example
Suppose you have a query to select the names of all female students in the BCA course:
Relational algebra expression:
$ \pi_{Name}(\sigma_{Gender = 'Female' \wedge Course = 'BCA'}(Student)) $
Evaluation:
Apply the selection first to reduce the number of tuples.
Then apply the projection to get only the required column [112] .
Summary Table: Evaluation Strategies
Strategy | Intermediate Results | Disk I/O | Speed | Complexity
Materialized | Stored on disk | High | Slower | Simpler
Pipelined | Passed in memory | Low | Faster | More complex

In summary:
The evaluation of relational algebra expressions involves translating queries into algebraic
operations, optimizing their execution order, and choosing between materialized and pipelined
strategies. Efficient evaluation is key to high-performance query processing in DBMS [109] [110]
[111] [113] [112] .

Query Equivalence in Database Management System (DBMS)

Definition

Query equivalence refers to the situation where two relational algebra expressions (or queries) produce the same result set (i.e., the same set of tuples) for every possible legal database instance, that is, any database state that satisfies all integrity constraints [114] [115] [116] .
The order of tuples may differ, but as long as the content (set of tuples) is identical, the queries are considered equivalent [115] .

Importance of Query Equivalence


Query Optimization: Query equivalence is fundamental in query optimization. The DBMS
query optimizer uses equivalence rules to transform a query into different, but equivalent,
forms to find the most efficient execution plan [114] [116] [117] [118] .
Query Tuning: Developers and DBAs can rewrite queries in equivalent forms to improve
performance without changing the results [114] [116] .
Correctness: Ensures that transformations and optimizations do not alter the intended
results of the query.

Equivalence Rules in Relational Algebra


Several algebraic rules establish when two queries are equivalent. These rules allow the DBMS to
rewrite and optimize queries safely [114] [116] [117] :
1. Selection (σ) Rules
Cascade (Associativity):
$ \sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E)) $
A selection with multiple conditions can be broken into a sequence of selections.
Commutativity:
$ \sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E)) $
Order of selection conditions does not affect the result.

2. Projection (π) Rules

Cascade:
$ \pi_{L_1}(\pi_{L_2}(... \pi_{L_n}(E))) = \pi_{L_1}(E) $, provided $ L_1 \subseteq L_2 \subseteq ... \subseteq L_n $
Only the outermost projection is needed; intermediate projections can be omitted.

3. Join and Selection

Selection and Cartesian Product:
$ \sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2 $
A selection after a Cartesian product is equivalent to a theta join.
Selection on Join:
$ \sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2 $

4. Set Operations
Union and Intersection are Commutative and Associative:
$ E_1 \cup E_2 = E_2 \cup E_1 $ and $ (E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3) $ (likewise for $ \cap $)

5. Distribution
Selection distributes over Union, Intersection, and Set Difference:
$ \sigma_{\theta}(E_1 \cup E_2) = \sigma_{\theta}(E_1) \cup \sigma_{\theta}(E_2) $
Projection distributes over Union:
$ \pi_L(E_1 \cup E_2) = \pi_L(E_1) \cup \pi_L(E_2) $

6. Join Properties
Theta-join is commutative:
$ E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1 $
Natural-join is associative:
$ (E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3) $ [117]
Examples of Query Equivalence
Example 1:
$ \pi_{Name}(\sigma_{Gender = 'Female'}(\sigma_{Course = 'BCA'}(Student))) $
is equivalent to
$ \pi_{Name}(\sigma_{Course = 'BCA'}(\sigma_{Gender = 'Female'}(Student))) $
Both yield the names of female students in the BCA course, regardless of the order of selection conditions [118] .
Example 2:
$ \sigma_{\theta}(E_1 \cup E_2) = \sigma_{\theta}(E_1) \cup \sigma_{\theta}(E_2) $
Applying selection before or after union produces the same result [117] .
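Such equivalences can be verified empirically on a toy relation. The sketch below is illustrative only (the relation contents, attribute names, and helper functions are invented): it shows that swapping the order of two selections leaves the result unchanged.

```python
from collections import namedtuple

Student = namedtuple("Student", ["name", "gender", "course"])
students = {
    Student("Asha", "Female", "BCA"),
    Student("Ravi", "Male", "BCA"),
    Student("Meena", "Female", "BSc"),
}

def sigma(pred, rel):          # selection
    return {t for t in rel if pred(t)}

def pi(attr, rel):             # projection onto a single attribute
    return {getattr(t, attr) for t in rel}

# Selection is commutative: gender-then-course equals course-then-gender.
lhs = pi("name", sigma(lambda t: t.gender == "Female",
                       sigma(lambda t: t.course == "BCA", students)))
rhs = pi("name", sigma(lambda t: t.course == "BCA",
                       sigma(lambda t: t.gender == "Female", students)))
assert lhs == rhs
print(lhs)   # {'Asha'}
```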

How Query Equivalence is Used in DBMS


During Query Optimization:
The optimizer rewrites queries using equivalence rules to find faster or more resource-
efficient execution plans without changing the result [114] [116] [117] [118] .
In Query Tuning:
Developers can manually rewrite queries using these rules to improve performance.

Summary Table: Common Equivalence Rules

Operation | Equivalence Rule | Example
Selection | Cascade | $ \sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E)) $
Selection | Commutative | $ \sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E)) $
Projection | Cascade | $ \pi_{L_1}(\pi_{L_2}(E)) = \pi_{L_1}(E) $
Selection-Join | Theta-join conversion | $ \sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2 $
Set | Commutativity | $ E_1 \cup E_2 = E_2 \cup E_1 $
Distribution | Selection over union | $ \sigma_{\theta}(E_1 \cup E_2) = \sigma_{\theta}(E_1) \cup \sigma_{\theta}(E_2) $

In summary:
Query equivalence in DBMS means two queries produce the same result for all legal database
instances. It is essential for query optimization, enabling the DBMS to transform and tune queries
for better performance using a set of well-defined equivalence rules [114] [115] [116] [117] .

Join Strategies in Database Management Systems (DBMS)

Introduction
Joins are fundamental operations in relational databases, enabling the combination of data from
multiple tables based on related columns. Efficient join strategies are crucial for query
performance, especially with large datasets or complex queries.

Types of Joins
Join Type | Description | Use Case Example
Inner Join | Returns rows with matching values in both tables; unmatched rows are excluded | Customers who have placed orders
Left Outer Join | Returns all rows from the left table, and matching rows from the right table; unmatched right rows are NULL | All customers and their orders, even if no order
Right Outer Join | Returns all rows from the right table, and matching rows from the left table; unmatched left rows are NULL | All products and their sales, even if unsold
Full Outer Join | Returns all rows from both tables; unmatched rows are filled with NULLs | Complete view of employees and projects
Cross Join | Returns the Cartesian product of two tables (all possible row combinations) [119] | Generating all customer-product pairs
Self Join | Joins a table to itself to analyze hierarchical or recursive relationships [120] | Employee-manager relationships [121]

Join Algorithms (Join Execution Strategies)


Different algorithms are used internally by DBMSs to execute join operations efficiently. The
choice depends on data size, indexes, and sorting.

1. Nested Loop Join


How it works: For each row in the outer table, scan all rows in the inner table to find
matches.
Variants: Simple nested loop, indexed nested loop (uses index on inner table).
Best for: Small tables, or when one table is heavily filtered or indexed.
Memory Use: Low (unless using indexes).
Flexibility: Can handle any join condition. [122] [123]
2. Hash Join
How it works: Build a hash table on the join key of the smaller table, then scan the larger
table and probe the hash table for matches.
Best for: Large joins where one table is small enough to fit in memory and join is on equality.
Memory Use: Medium to high.
Flexibility: Only for equality joins.
Performance: Fast for large, unsorted tables. [122] [123]

3. Merge Join (Sort-Merge Join)


How it works: Both tables are sorted on the join key; then they are merged like a merge
sort.
Best for: Both tables are already sorted or can be sorted efficiently; equality joins.
Memory Use: Low to medium.
Flexibility: Only for equality joins.
Performance: Very efficient for sorted data. [122] [123]

4. Index Join (Index Nested Loop Join)


How it works: Uses an index on the join key of the inner table to quickly find matching rows.
Best for: When the inner table has a suitable index and is large.
Performance: Reduces need for full table scans.
Use Case: Joining a large fact table with a dimension table with indexed keys. [123]
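As a rough illustration of the first two algorithms, the sketch below implements a nested loop join and a hash join over Python lists of dicts. Table contents and key names are invented; a real DBMS operates on disk pages and buffers, not in-memory lists.

```python
def nested_loop_join(outer, inner, key_outer, key_inner):
    """Nested loop join: compare every outer row with every inner row."""
    return [(o, i) for o in outer for i in inner
            if o[key_outer] == i[key_inner]]

def hash_join(build, probe, key_build, key_probe):
    """Hash join: build a hash table on the smaller input, then probe it."""
    table = {}
    for b in build:                        # build phase
        table.setdefault(b[key_build], []).append(b)
    return [(b, p) for p in probe          # probe phase
            for b in table.get(p[key_probe], [])]

customers = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}]
orders    = [{"cust_id": 1, "item": "pen"}, {"cust_id": 1, "item": "book"}]
print(nested_loop_join(customers, orders, "id", "cust_id"))
print(hash_join(customers, orders, "id", "cust_id"))   # same pairs, one scan each
```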

Distributed Join Strategies (for distributed/MPP databases)


Broadcast Join: Small table is sent to all nodes processing the large table, minimizing data
shuffling.
Shuffle Join: Both tables are partitioned and shuffled across nodes based on the join key.
Colocate Join: Both tables are pre-partitioned on the join key, allowing local joins without
shuffling. [122] [124]

Choosing the Right Join Strategy


Factor | Impact
Table Sizes | Use nested loop for small tables, hash join for large tables with small build input.
Index Availability | Use index join if the inner table has a suitable index.
Sorting | Use merge join if both tables are sorted on the join key.
Join Condition | Hash and merge joins are for equality; nested loop can handle any condition.
Memory Constraints | Hash joins require more memory; nested loop and merge join use less.
Data Distribution | In distributed systems, consider broadcast or shuffle joins for performance.

Best Practices for Efficient Joins


Create indexes on join columns to speed up lookups.
Use composite indexes for multi-column joins.
Apply filters early (WHERE clause) to reduce the dataset size before joining.
Select only required columns to minimize data transfer and processing.
Choose the join type (inner, left, right, full) that matches your data and reporting needs.
Use EXPLAIN plans to understand and optimize how your DBMS executes joins. [119] [122]
[123]

Summary Table: Join Algorithms

Algorithm | Best For | Memory Use | Join Condition Type | Notes
Nested Loop | Small/filtered tables | Low | Any | Simple, flexible
Hash Join | Large, unsorted tables, equality | Medium-High | Equality | Fast, memory-dependent
Merge Join | Sorted/indexed tables | Low-Medium | Equality | Efficient for sorted data
Index Join | Indexed inner table | Low | Any | Fast with suitable indexes

Fast with suitable


Index Join Indexed inner table Low Any
indexes

In summary:
Join strategies in DBMS include various join types (inner, outer, cross, self) and execution
algorithms (nested loop, hash, merge, index join). The optimal strategy depends on table sizes,
indexes, sorting, memory, and data distribution. Understanding and applying the right join
strategy is key to writing efficient and scalable database queries.

Query Optimization Algorithms in Database Management System (DBMS)


What is Query Optimization?
Query optimization is the process by which a DBMS evaluates multiple strategies for executing a
query and selects the most efficient one. The goal is to minimize resource consumption (CPU,
memory, disk I/O, response time) while ensuring correct results.

Types of Query Optimization Algorithms

1. Rule-Based Optimization (RBO)


How it works: Uses a fixed set of rules or heuristics (e.g., use indexes, push selections
before joins) to choose execution plans.
Pros: Simple, fast decisions.
Cons: Ignores actual data distribution and resource costs, so may not always yield the best
plan [125] .

2. Cost-Based Optimization (CBO)


How it works: Considers multiple possible execution plans, estimates their costs using statistics (table size, index selectivity, etc.), and picks the plan with the lowest estimated cost.
Pros: Produces more efficient plans for complex queries, especially with large or multiple tables.
Cons: Requires up-to-date statistics; optimization can be computationally expensive [125] [126] .

3. Heuristic-Based Optimization
How it works: Applies practical guidelines (heuristics) such as pushing selections and
projections as close to the data source as possible, or avoiding cross joins.
Pros: Quick, effective for routine queries.
Cons: May not find the globally optimal plan for complex queries [125] [127] .

4. Adaptive Query Optimization


How it works: The optimizer can defer final plan decisions until execution time, adapting the
plan based on real-time statistics collected during execution.
Features:
Contains predefined sub-plans and a statistics collector.
Can switch join algorithms (e.g., nested loop to hash join) mid-execution if initial
estimates were wrong.
Uses the final plan for future executions, preventing repeated poor choices.
Purpose: Addresses cardinality misestimates and changing data patterns for better
performance [125] .

Query Optimization Process


1. Query Parsing: SQL queries are parsed and translated into relational algebra
expressions [127] [128] .
2. Logical Plan Generation: The optimizer generates logical plans using relational algebra
operators (selection, projection, join, etc.).
3. Plan Transformation: Multiple equivalent expressions are generated using equivalence
rules (e.g., join commutativity, selection pushdown) [127] [129] .
4. Cost Estimation: For each plan, the optimizer estimates resource costs using statistics [126] .
5. Physical Plan Generation: Chooses physical operators (e.g., hash join, index scan) for each
logical operator.
6. Plan Selection: The plan with the lowest estimated cost is selected for execution [126] .
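A toy illustration of cost-based plan selection: enumerate two equivalent plans for a filter-plus-join query, estimate each with a crude tuple-count cost model, and keep the cheaper one. The table sizes, selectivity, and cost formulas are invented for illustration; real optimizers use far richer statistics.

```python
# Two equivalent plans for sigma_pred(R) joined with S, costed by tuple counts.
TUPLES_R, TUPLES_S = 10_000, 500
SELECTIVITY = 0.01      # assumed fraction of R that survives the filter

plans = {
    "join-then-filter": TUPLES_R * TUPLES_S,                            # filter last
    "filter-then-join": TUPLES_R + (TUPLES_R * SELECTIVITY) * TUPLES_S,  # pushdown
}
best = min(plans, key=plans.get)
print(best, plans)   # filter-then-join wins: selection pushdown shrinks the join input
```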

Key Query Optimization Techniques

Selection Pushdown: Move selection operations as close to the data source as possible to reduce intermediate result size [127] .
Projection Pushdown: Eliminate unnecessary columns early to minimize data transfer.
Join Order Optimization: Reorder joins to minimize the size of intermediate results and exploit indexes [127] [129] .
Join Algorithm Selection: Choose the best join method (nested loop, hash join, merge join) based on data size, indexes, and sorting [127] .


Query Rewriting: Transform queries into equivalent but more efficient forms using algebraic
rules [125] [127] .
Index Utilization: Use indexes to speed up data retrieval.
Subquery Flattening: Convert nested subqueries into joins or simpler forms for better
optimization [128] .

Advanced and Adaptive Techniques


Dynamic Query Re-optimization: The database can adjust execution plans on the fly as
data and resources change [125] .
Adaptive Query Execution: Plans are refined during runtime, switching strategies if initial
assumptions prove inaccurate [125] .
Machine Learning-Based Optimization: Uses historical data and patterns to predict
optimal execution paths [125] .
Multi-objective Optimization: Considers multiple cost metrics (e.g., time, memory, I/O) and
user preferences when choosing plans [130] .

Enumeration and Transformation Algorithms


Transformation-Based Optimization: Systematically generates all equivalent expressions
by applying equivalence rules to subexpressions, but limits space and time by sharing
common subexpressions and using pointers [129] .
Dynamic Programming: Especially for join order optimization, dynamic programming is
used to avoid redundant calculations and efficiently search the space of possible join
orders [129] .
Subplan Optimization: For complex queries, optimizers break the query into blocks,
optimize each block from the bottom up, and then combine them into a global plan [126] [128] .

Summary Table: Query Optimization Algorithms


Algorithm Type | Approach | Best For | Example Techniques
Rule-Based | Fixed rules/heuristics | Simple, routine queries | Index usage, selection pushdown
Cost-Based | Cost estimation/statistics | Complex, multi-table queries | Join order, access path selection
Heuristic-Based | Practical guidelines | Fast, general optimization | Avoid cross joins, early filtering
Adaptive | Runtime plan adjustment | Dynamic, unpredictable workloads | Adaptive join switching
Dynamic Programming | Systematic plan search | Join order optimization | Left-deep/right-deep trees

Trends and Future Directions


AI-powered optimizers and automated query tuning are emerging, enabling self-tuning
databases with minimal manual intervention.
Big data integration and distributed query optimization are increasingly important as data
volumes grow and architectures become more complex [125] .

In summary:
Query optimization algorithms in DBMS include rule-based, cost-based, heuristic, and adaptive
approaches. They transform and evaluate multiple execution plans using relational algebra,
statistics, and equivalence rules to select the most efficient plan. Advanced techniques like
adaptive optimization and AI-driven tuning are shaping the future of high-performance, self-
managing databases [125] [127] [126] [129] .

1. https://www.tutorialspoint.com/dbms/relational_algebra.htm
2. https://en.wikipedia.org/wiki/Relational_algebra
3. https://bito.ai/resources/relational-algebra-in-dbms/
4. https://byjus.com/gate/relational-algebra-in-dbms-notes/
5. https://mrcet.com/downloads/digital_notes/IT/Database Management Systems.pdf
6. https://www.boardinfinity.com/blog/relational-algebra-in-dbms/
7. https://www.cbcb.umd.edu/confcour/Spring2014/CMSC424/Relational_algebra.pdf
8. https://studyglance.in/dbms/display.php?tno=18&topic=Tuple-Relational-Calculus-in-DBMS
9. https://www.scaler.com/topics/dbms/relational-calculus-in-dbms/
10. https://www.reddit.com/r/askscience/comments/84d9an/what_is_the_difference_between_relational_alg
ebra/
11. https://www.csbio.unc.edu/mcmillan/Media/Comp521F14Lecture04.pdf
12. https://herovired.com/learning-hub/topics/relational-calculus-in-dbms/
13. https://en.wikipedia.org/wiki/Tuple_relational_calculus
14. https://lkouniv.ac.in/site/writereaddata/siteContent/202004021910159071chandrabhan_DBMS_Relational
_model_and_Relational_Algebra.pdf
15. https://www.tutorialspoint.com/domain-relational-calculus-in-dbms

8
16. https://www.scaler.com/topics/dbms/relational-calculus-in-dbms/
t1
17. https://binaryterms.com/domain-relational-calculus.html
at
18. https://www.studocu.com/in/document/university-of-delhi/database-management-system/dbms-dbms/
bh

117875241
19. https://herovired.com/learning-hub/topics/relational-calculus-in-dbms/
dd

20. https://en.wikipedia.org/wiki/Domain_relational_calculus
21. https://www.w3schools.blog/relational-calculus-dbms
si

22. https://datascientest.com/en/all-about-sql3
23. https://www.youtube.com/watch?v=EJ6IpG0fZlk
24. https://celerdata.com/glossary/ansi-sql
25. https://www.iitk.ac.in/esc101/05Aug/tutorial/jdbc/jdbc2dot0/sql3.html
26. http://infolab.stanford.edu/~ullman/fcdb/spr99/lec12.pdf
27. https://byjus.com/gate/data-definition-language-notes/
28. https://www.scaler.com/topics/ddl-in-dbms/
29. https://www.techtarget.com/whatis/definition/Data-Definition-Language-DDL
30. https://www.almabetter.com/bytes/tutorials/sql/dml-ddl-commands-in-sql
31. https://www.dbvis.com/thetable/sql-ddl-the-definitive-guide-on-data-definition-language/
32. https://celerdata.com/glossary/data-definition-language-ddl
33. https://onecompiler.com/tutorials/mysql/commands/ddl-commands
34. https://www.datacamp.com/tutorial/sql-ddl-commands
35. https://byjus.com/gate/data-manipulation-language-dql-notes/
36. https://www.scaler.com/topics/dml-in-dbms/
37. https://www.almabetter.com/bytes/tutorials/sql/dml-ddl-commands-in-sql
38. https://www.theiotacademy.co/blog/dml-and-ddl-in-sql/
39. https://www.tutorialspoint.com/what-are-the-dml-commands-in-dbms
40. https://www.datacamp.com/tutorial/sql-dml-commands-mastering-data-manipulation-in-sql
41. https://opentextbc.ca/dbdesign01/chapter/chapter-sql-dml/
42. https://trainings.internshala.com/blog/dml-commands-in-sql-with-examples/
43. https://cloud.google.com/bigquery/docs/data-manipulation-language
44. https://www.navisite.com/blog/open-source-vs-commercial-database-systems/
45. https://db-engines.com/en/ranking_osvsc
46. https://simplelogic-it.com/difference-between-open-source-database-and-licensed-database/
47. https://www.ask.com/news/comparing-open-source-vs-proprietary-dbms-platforms-best
48. https://sis.binus.ac.id/2024/10/15/commercial-database-and-open-source-database-differences-and-s
imilarities/
49. https://en.wikipedia.org/wiki/MySQL
50. https://www.oracle.com/in/mysql/what-is-mysql/
51. https://www.cogentinfo.com/resources/architecture-of-mysql
52. https://cloudinfrastructureservices.co.uk/mysql-architecture-components-how-mysql-works-internally/
53. https://www.youtube.com/watch?v=jt3C9Ngbqfc
8
t1
54. https://dev.mysql.com/doc/en/pluggable-storage-overview.html
at
55. https://www.w3webschool.com/blog/features-of-mysql/
bh

56. https://www.mysql.com/products/enterprise/techspec.html
57. https://www.bytebase.com/blog/mysql-vs-sqlserver/
dd

58. https://docs.oracle.com/en/database/oracle/oracle-database/18/cncpt/introduction-to-oracle-database.
html
si

59. https://www.oracle.com/in/database/features/
60. https://docs.oracle.com/database/122/CNCPT/introduction-to-oracle-database.htm
61. https://www.tricentis.com/learn/a-guide-to-oracle-database-architecture
62. https://docs.oracle.com/cd/F19136_01/nonpub_db_techarch/pdf/db-19c-architecture.pdf
63. https://mindmajix.com/oracle-dba/oracle-11g-database-architecture-overview
64. https://indico.cern.ch/event/36804/attachments/731758/1003980/oracleArchitecture.pdf
65. https://www.ibm.com/docs/en/db2/11.1?topic=architecture-db2-process-overview
66. https://www.ibm.com/docs/en/db2/11.5?topic=architecture-db2-process-overview
67. https://www.youtube.com/watch?v=J8GvsoFLYEY
68. https://www.ibm.com/docs/en/db2/11.5?topic=environment-components-db2-purescale-feature
69. https://www.ibm.com/docs/SSEPGG_11.1.0/com.ibm.dwe.welcome.doc/dwev9welcome.html
70. https://www.simplilearn.com/what-is-microsoft-sql-server-architecture-article
71. https://www.guru99.com/sql-server-architecture.html
72. https://www.tutorialspoint.com/ms_sql_server/ms_sql_server_architecture.htm
73. https://learnomate.org/components-of-the-sql-server-architecture/
74. https://www.interviewbit.com/blog/sql-server-architecture/
75. https://www.milesweb.in/hosting-faqs/ms-sql-server-architecture/
76. https://www.scaler.com/topics/domain-in-dbms/
77. https://www.youtube.com/watch?v=HCLPUTFPcnk
78. https://www.boardinfinity.com/blog/domain-constraints-in-dbms/
79. https://www.arkware.com/what-are-database-dependencies/
80. https://www.tutorialspoint.com/Types-of-dependencies-in-DBMS
81. https://www.youtube.com/watch?v=HCLPUTFPcnk
82. https://talent500.com/blog/types-of-functional-dependencies-dbms/
83. https://www.lri.fr/~pierres/données/save/these/articles/lpr-queue/database-dependency-discovery.pdf
84. https://www.wrike.com/blog/functional-dependencies-database-systems/
85. https://www.prepbytes.com/blog/dbms/what-are-armstrongs-axioms-in-dbms/
86. https://www.nielit.gov.in/gorakhpur/sites/default/files/Gorakhpur/Alevel_1_DBMS_22Apr2020_AV.pdf
87. https://www.scaler.com/topics/armstrong-axioms-in-dbms/
88. https://digiimento.com/axioms-of-functional-dependencies-in-dbms-explained-armstrongs-axioms-wit
h-examples/

8
89. https://talent500.com/blog/normalization-dbms-types-normal-forms/
t1
90. https://en.wikipedia.org/wiki/Database_normalization
at
91. https://learn.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description
bh

92. https://www.freecodecamp.org/news/database-normalization-1nf-2nf-3nf-table-examples/
93. https://www.youtube.com/watch?v=GFQaEYEc8_8
dd

94. https://www.datacamp.com/tutorial/normalization-in-sql
95. https://www.studytonight.com/dbms/database-normalization.php
si

96. https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/
97. https://prepinsta.com/dbms/dependency-preserving-decomposition/
98. https://www.slideshare.net/slideshow/dependency-preservation-138672914/138672914
99. https://www.nielit.gov.in/gorakhpur/sites/default/files/Gorakhpur/Alevel_1_DBMS_02Jun2020_AV.pdf
100. https://homepages.inf.ed.ac.uk/libkin/papers/pods06b.pdf
101. https://solutionsadda.in/2024/08/29/database-management-system-376/
102. https://www.slideshare.net/slideshow/lossless-decomposition/138673228
103. https://byjus.com/gate/decomposition-in-dbms/
104. https://www.scaler.com/topics/lossless-join-decomposition-in-dbms/
105. https://testbook.com/gate/lossless-decomposition-in-dbms
106. https://prepinsta.com/dbms/lossless-join-and-dependency-preserving-decomposition/
107. https://www.db-book.com/Previous-editions/db4/slide-dir/ch7.pdf
108. https://stackoverflow.com/questions/39464758/lossless-decomposition-vs-dependency-preservation
109. https://www.tutorialspoint.com/explain-the-evaluation-of-relational-algebra-expression-dbms
110. https://www.youtube.com/watch?v=hJoK_wvTZ-M
111. https://www.cs.purdue.edu/homes/clifton/cs44800/QP1.pdf
112. https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_relational_algebra_query_optimizati
on.htm
113. http://www.scienceandnature.org/IJEMS/IJEMS-Vol4(3)-July2013/IJEMS_V4(3)2013-8.pdf
114. https://piazza.com/class_profile/get_resource/jpuyegn6c76ih/jwu8g9g69b32o8
115. https://groups.google.com/g/stilgeichondman/c/CTY9ECa13G8
116. https://piazza.com/class_profile/get_resource/jyqypau0nkk4w0/k0q7f36yuag2ja
117. https://repository.dinus.ac.id/docs/ajar/formal_relational_query_language_part_3.pdf
118. https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_relational_algebra_query_optimizati
on.htm
119. https://celerdata.com/glossary/sql-join-types-made-simple
120. https://www.coursera.org/articles/sql-join-types
121. https://www.w3schools.com/sql/sql_join.asp
122. https://celerdata.com/glossary/sql-joins
123. https://www.pingcap.com/article/sql-join-types-choosing-between-right-and-left-join/
124. https://www.linkedin.com/pulse/spark-join-strategies-mastering-joins-apache-venkatesh-nandikolla-m
k4qc

8
125. https://www.acceldata.io/blog/the-complete-guide-to-query-optimizers-and-performance-tuning
t1
126. https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/query-optimizer-concepts.html
at
127. https://dev.to/ibrahimhyazouri/query-optimization-how-the-query-optimizer-works-using-relational-alg
ebra-1ho1
bh

128. https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_relational_algebra_query_optimizati
on.htm
dd

129. https://www.db-book.com/slides-dir/PPTX-dir/ch16.pptx
130. https://en.wikipedia.org/wiki/Query_optimization
Indices in Database Management Systems (DBMS)
Definition:
An index in a DBMS is an additional data structure created on top of a database table to
improve the speed of data retrieval operations. It stores pointers to the actual data rows,
allowing the DBMS to locate records quickly without scanning the entire table [1] [2] [3] .

Why Use Indexes?


Faster Query Performance: Indexes reduce the number of disk accesses needed for data
retrieval.
Efficient Searching and Sorting: Indexes help in quick searching, sorting, and filtering of
records.
Optimized Data Access: Especially useful for large tables with frequent read operations [2] [3] .
Structure of an Index

Search Key: The column(s) on which the index is built. It can be a primary key, candidate
key, or any attribute.

Data Reference (Pointer): Points to the location (disk block address) of the actual data in
the table [4] [3] .

Types of Indexes

1. Based on Key Attributes


Type | Description | Characteristics
Primary Index | Created on the primary key of a table. | Unique, not null, 1:1 mapping, sorted order, fast searching [2] [4] [3] .
Clustered Index | Determines physical order of data in the table. | Only one per table, sorts data rows, efficient for range queries [5] .
Secondary Index | Created on non-primary (candidate) keys. | Data may not be sorted, can have multiple per table, slower than primary index [2] [5] .
2. Based on Data Coverage
Type | Description | Use Case
Dense | Entry for every search key value in the data file. | Fast access, more storage required [1] [4] .
Sparse | Entries only for some search key values (usually one per block). | Less storage, slightly slower access, suitable for large data [1] [4] .

3. Specialized Index Types


Type | Description | Use Case / Notes
Bitmap Index | Uses bit arrays for columns with few distinct values (low cardinality). | Suitable for fields like gender, status, etc. [1] [5] .
Hash Index | Uses hash functions to map search keys to locations. | Best for exact match queries, not for range queries [5] [4] .
Filtered Index | Indexes a subset of rows based on a filter condition. | Useful for frequently queried subsets [5] .
Function-based | Indexes values computed from a function or expression on one or more columns. | Optimizes queries with functions in WHERE clauses [5] .
Covering Index | Includes all columns required by a query in the index itself. | Eliminates need to access the base table for those queries [5] .
Full-Text Index | Designed for efficient text searching (e.g., searching for words/phrases in text columns). | Used in document or comment searches [5] .
Spatial Index | Optimized for geographical data types (e.g., coordinates, maps). | Used in GIS and location-based queries [5] .
Reverse Index | Stores the reversed value of the key for specific optimizations. | Used in some telecom and specialized applications [5] .

Indexing Methods
Ordered Indexing: Index entries are sorted, making search operations faster. Example: B-
tree, B+ tree indices [4] [3] .
Hashing: Uses hash functions to directly map keys to data locations, efficient for equality
searches [4] .

Single-Level vs. Multi-Level Indexing


Single-Level Indexing: Index table directly maps search keys to data blocks. Suitable for
small tables [2] .
Multi-Level Indexing: Uses multiple layers of indexes (e.g., B+ Trees) to handle large tables
efficiently [2] .
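The core idea of a single-level index can be sketched in a few lines. The toy table and the dict-of-positions below are illustrative only; positions stand in for the disk block pointers a real index would store.

```python
# A toy secondary index: maps a column value to the positions ("pointers")
# of matching rows, so lookups avoid scanning the whole table.
table = [
    ("E1", "Asha", "HR"),
    ("E2", "Ravi", "IT"),
    ("E3", "Meena", "IT"),
]

index = {}
for pos, row in enumerate(table):
    index.setdefault(row[2], []).append(pos)   # index on the department column

# Indexed lookup: jump straight to the matching rows.
print([table[pos] for pos in index.get("IT", [])])
# Full-scan equivalent (the work the index avoids):
print([row for row in table if row[2] == "IT"])
```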
Advantages of Indexing
Speeds up data retrieval and query performance.
Reduces the workload on the DBMS for search operations.
Enables efficient sorting and filtering of data.

Disadvantages of Indexing
Increases storage requirements due to additional index files.
Slows down write operations (INSERT, UPDATE, DELETE) as indexes must be updated [1] .
Complexity in managing multiple indexes.

Summary Table: Common Index Types


Index Type | Unique? | Data Order | Multiple per Table? | Use Case
Primary | Yes | Sorted | No | Fast primary key lookups
Clustered | No | Sorted | No | Range queries, physical data order
Secondary | Yes/No | Unsorted | Yes | Non-primary key searches
Dense | N/A | N/A | N/A | Entry for every record
Sparse | N/A | N/A | N/A | Entry per block
Bitmap | N/A | N/A | N/A | Low-cardinality columns
Hash | N/A | N/A | N/A | Exact match queries
Filtered | N/A | N/A | N/A | Subset of data
Function-based | N/A | N/A | N/A | Function/expression-based searches
Full-Text | N/A | N/A | N/A | Text search
Spatial | N/A | N/A | N/A | Geographical data

Key Points to Remember


Indexes are critical for improving query performance in DBMS.
Choose the index type based on data characteristics and query patterns.
Over-indexing can hurt performance due to maintenance overhead.
Most common index structures: B-tree/B+ tree, hash, bitmap [1] [5] [4] .

Tip:
Always analyze your queries and data distribution before deciding which columns to index and
which index type to use. Proper indexing is essential for efficient database design and
operation [2] [5] .

B-Trees in Database Management Systems (DBMS)


Definition:
A B-tree is a self-balancing, multi-way tree data structure that maintains sorted data and allows
efficient searches, sequential access, insertions, and deletions, all in logarithmic time. B-trees are
widely used for indexing in databases and file systems, especially to manage large datasets
efficiently [6] [7] [8] .

Key Properties of B-Trees


Balanced Structure: All leaf nodes are at the same depth, ensuring the tree remains
balanced and operations are efficient [8] .
Multi-way Nodes: Each node can have more than two children (unlike binary trees), with
the number of children depending on the order (m) of the B-tree [8] .
Sorted Keys: Keys within each node are stored in ascending order, facilitating fast search
and retrieval [9] [8] .

Node Capacity:
Each non-root node contains between $ \lceil m/2 \rceil - 1 $ and $ m - 1 $ keys.
Root node can have as few as one key.
Each node (except root) has between $ \lceil m/2 \rceil $ and $ m $ children [8] .
Self-Balancing: Insertion and deletion operations automatically rebalance the tree to maintain its properties [6] [8] .

Structure of a B-Tree Node


Internal Nodes: Contain keys and pointers to child nodes. Keys act as separators to guide
searches [7] .
Leaf Nodes: Contain keys and references (pointers) to actual data records or further
details [9] [8] .
Payload: In database indexing, each key is associated with a value (pointer/reference to the
data), collectively called the payload [9] [8] .

Operations on B-Trees
Search:
Follows the keys in internal nodes to the appropriate child, recursively, until the key is found
or determined absent. Search time is $ O(\log n) $ [9] [6] [8] .

Insertion:
Adds a key to the appropriate node. If the node overflows, it splits, and the middle key
moves up. This may propagate up to the root, maintaining balance [6] [8] .
Deletion:
Removes a key and may require merging or redistributing keys among nodes to maintain the
minimum key constraint [6] [8] .

B-Tree Indexing in Databases


Index Structure:
B-trees store keys (often primary or candidate keys) and references to actual data records.
In practice, B+ trees-a variant where all data is stored in leaf nodes-are often used for
database indexing [9] [8] .
Efficiency:
B-trees minimize disk I/O by reducing the tree height and maximizing the number of keys
per node (nodes typically correspond to disk pages), which is crucial for large datasets
stored on secondary storage [7] [8] .
Multi-level Indexing:
B-trees are an example of multi-level indexing, enabling fast access even in massive
tables [8] .

Advantages of B-Trees in DBMS
Efficient Data Retrieval: Logarithmic search, insertion, and deletion times, even as data grows [6] [7] [8] .
Minimized Disk Access: Fewer tree levels mean fewer disk reads, which is critical for performance [7] [8] .
Scalable: Handles large datasets effectively due to its balanced, wide structure [7] [8] .

Sorted Data: Facilitates range queries and ordered traversals [9] [7] .

Example: How B-Tree Search Works


Suppose you want to search for a value (e.g., 48):
1. Start at the root node. Compare 48 with the keys in the root.
2. If 48 is less than a key, follow the left child; if greater, follow the right child.
3. Continue this process recursively until the key is found or you reach a leaf node without the
key [8] .
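The descent described above translates directly into code. The sketch below is a minimal illustration (node class and tree contents invented; real B-tree nodes live on disk pages and also carry record pointers):

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted keys in this node
        self.children = children or []     # len(children) == len(keys) + 1 if internal

def btree_search(node, key):
    """Descend from the root, choosing the child whose key range covers `key`."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True                         # found in this node
    if not node.children:
        return False                        # leaf reached without the key
    return btree_search(node.children[i], key)

# Small tree: root [40] with children [20, 30] and [48, 60].
root = BTreeNode([40], [BTreeNode([20, 30]), BTreeNode([48, 60])])
print(btree_search(root, 48))  # True
```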
Summary Table: B-Tree Characteristics
Feature | Description
Node Capacity | Up to $ m - 1 $ keys and $ m $ children per node
Balanced | All leaf nodes at the same depth
Search Time | $ O(\log n) $
Used For | Database and file system indexing
Data Storage | Keys and references (payload); B+ trees store data in leaves
Disk Access | Optimized by matching node size to disk page size

Key Points to Remember


B-trees are foundational for efficient database indexing due to their balanced, scalable
structure and fast operations [6] [7] [8] [10] .
They minimize disk I/O, making them ideal for large, disk-based databases [7] [8] .
B-trees support dynamic growth and shrinkage, maintaining balance automatically [6] [8] .

In summary:
B-trees are essential for high-performance database systems, enabling fast, scalable, and reliable data access through balanced multi-level indexing [6] [7] [8] [10] .



Hashing in Database Management Systems (DBMS)



Definition:
Hashing is a technique in DBMS that allows direct access to data records on disk by calculating
their storage address using a hash function, instead of traversing index structures. This method
greatly accelerates data retrieval, especially for large datasets [11] [12] [13] .

Key Concepts in Hashing


Hash Function:
A mathematical function that takes a search key (often the primary key) as input and computes
a fixed-size address (hash code) where the record should be stored or retrieved from. Common
examples include modulo operations (e.g., h(K) = K mod n) [11] [12] [13] [14] .
Buckets:
Storage units (often disk blocks) where records are placed. Each bucket can store one or more
records. The hash function maps keys to specific buckets [12] [13] [14] .
Direct Address Calculation:
Hashing computes the storage location of a record directly, bypassing the need for multi-level
index traversal [11] [12] [13] .

Types of Hashing
1. Static Hashing
The number of buckets is fixed at the time of creation and does not change [12] [13] [14] .
The same search key always maps to the same bucket address.
Example: If the hash function is h(K) = K mod 5, keys are distributed among 5 buckets.
Limitation: Cannot adapt to changes in data volume; leads to overflow if buckets become
full [12] [13] [14] .
2. Dynamic Hashing (Extendible Hashing)
The number of buckets can grow or shrink dynamically as data is inserted or deleted [13] [14] .
Only a prefix of the hash value may be used to determine the bucket address, allowing for
flexible expansion [13] .
Helps prevent overflow and efficiently utilizes storage.
Commonly used in modern DBMS to handle unpredictable data growth [13] [14] .

Hashing Operations
Insertion:
The hash function computes the bucket address where the new record will be stored [12] [13] .
Search:
The hash function computes the address, and the record is retrieved directly from the corresponding bucket [12] [13] .


Deletion:
The record is located using the hash function and removed from its bucket [12] [13] .

Collision and Resolution Techniques


Collision:
Occurs when two or more keys hash to the same bucket address [13] . This is a common issue in
hashing.
Resolution Methods:
Overflow Chaining (Closed Hashing):
If a bucket is full, additional records are stored in a linked list or chain attached to that
bucket [13] .
Linear Probing (Open Hashing):
If a bucket is full, the next available bucket is used to store the record [13] .
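A minimal sketch of static hashing with overflow chaining, using the document's h(K) = K mod 5 example (bucket count, record values, and helper names are illustrative):

```python
NUM_BUCKETS = 5

def h(key):
    return key % NUM_BUCKETS                 # static hash function: h(K) = K mod 5

buckets = [[] for _ in range(NUM_BUCKETS)]   # chaining: each bucket is a list

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    return [rec for k, rec in buckets[h(key)] if k == key]

insert(12, "rec-12")
insert(7, "rec-7")
insert(22, "rec-22")       # 22 mod 5 == 12 mod 5 == 2: collision, chained in bucket 2
print(buckets[2])          # [(12, 'rec-12'), (22, 'rec-22')]
print(search(22))          # ['rec-22']
```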
Characteristics and Use Cases
Fast Equality Search:
Hashing is highly efficient for exact match queries (e.g., searching for a record with a
specific key) [15] [16] .
Not Suitable for Range Queries:
Hashing does not support range-based searches (e.g., finding all records between two
values), as the hash function scatters related keys across buckets [15] [16] .
Best for Random, Discrete Data:
Hashing performs best when data is randomly distributed and queries are for specific
values [13] .

Advantages of Hashing
Constant Time Operations:
Most hash operations (insert, search, delete) can be performed in constant time, regardless
of data size [13] .
Efficient Storage and Retrieval:
Direct calculation of addresses reduces the need for index traversal, minimizing disk I/O [11] [12] [13] .
Dynamic Adaptation (with dynamic hashing):
Can handle growing or shrinking datasets efficiently [13] [14] .

Disadvantages of Hashing
Collisions:
Require additional mechanisms for resolution, which can add complexity [13] .
No Ordering:
Data is not stored in any sorted order, making range queries inefficient [13] [15] .
Not Ideal for All Workloads:
Less effective for queries requiring data ordering or range searches [13] [15] .

Summary Table: Static vs. Dynamic Hashing


Feature | Static Hashing | Dynamic Hashing
Bucket Count | Fixed | Grows/Shrinks as needed
Collision Handling | Chaining/Probing | Flexible bucket allocation
Scalability | Limited | High
Use Case | Small, stable datasets | Large, evolving datasets


Key Points to Remember
Hashing is a direct access technique using hash functions and buckets for fast data
retrieval [11] [12] [13] .
Static hashing uses a fixed number of buckets; dynamic hashing adapts to data growth [12]
[13] [14] .

Collisions are inevitable and must be resolved using chaining or probing [13] .
Hashing is ideal for equality searches but not for range queries [15] [16] .

In summary:
Hashing in DBMS is a powerful technique for efficient, direct data access, particularly suited for
large datasets and workloads dominated by equality searches. Understanding its mechanisms,
strengths, and limitations is essential for effective database design and optimization [11] [12] [13] [15]
[16] .

1. https://en.wikipedia.org/wiki/Database_index
2. https://www.scaler.com/topics/dbms/indexing-in-dbms/

8
3. https://www.studocu.com/in/document/budge-budge-institute-of-technology/multimedia-systems/unit-
3-storage-strategies-indices-b-trees-hashing/108104634
4. https://www.scribd.com/document/669769472/Storage-indices-b-tree-hashing-in-dbms
at
5. https://blog.algomaster.io/p/a-detailed-guide-on-database-indexes
bh

6. https://en.wikipedia.org/wiki/B-tree
7. https://www.pingcap.com/article/understanding-basics-b-tree-data-structures/
dd

8. https://www.scaler.com/topics/b-tree-in-dbms/
si

9. https://builtin.com/data-science/b-tree-index
10. https://www.wscubetech.com/resources/dsa/b-tree
11. https://byjus.com/gate/hashing-in-dbms-notes/
12. https://www.tutorialspoint.com/dbms/dbms_hashing.htm
13. https://prepinsta.com/dbms/hashing/
14. https://www.codecademy.com/resources/blog/what-is-hashing/
15. https://codefinity.com/courses/v2/d90d9403-ce34-4555-b549-6bb5773a48a2/1128b6ab-f333-4267-
893e-98c6efa140e3/da696862-fa0a-405f-9c04-e5420f3488fb
16. https://www.pingcap.com/article/understanding-b-tree-and-hash-indexing-in-databases/
Concurrency Control in Database Management Systems (DBMS)
Definition and Importance
Concurrency control in DBMS is the set of techniques and protocols used to manage
simultaneous operations (transactions) on a database, ensuring that the integrity, consistency,
and isolation of data are maintained even when multiple users or applications access or modify
data at the same time [1] [2] [3] . Without proper concurrency control, simultaneous transactions
can interfere with each other, leading to problems such as data inconsistency, lost updates, and
dirty reads [4] [5] .

Why Concurrency Control is Needed


When transactions run concurrently without control, the following problems can occur:
Lost Update Problem: Two transactions update the same data item, but one update is lost
because the other transaction overwrites it [3] .
Dirty Read: A transaction reads data written by another uncommitted transaction, which may later be rolled back [5] .
Non-repeatable Read: A transaction reads the same data item twice and gets different values because another transaction modified the data in between [5] .
Phantom Read: A transaction re-executes a query and sees a different set of rows due to another transaction's insert or delete [5] .

Goals of Concurrency Control


Atomicity: Each transaction is all-or-nothing.
Consistency: Transactions take the database from one valid state to another.
Isolation: Each transaction executes as if it is the only one in the system.
Serializability: The outcome of executing transactions concurrently is the same as if they
were executed serially [1] [2] .

Concurrency Control Techniques


1. Lock-based Protocols
Pessimistic Locking: Locks data items before accessing them to prevent conflicts. Types:
Read Lock (Shared Lock): Multiple transactions can read but not modify.
Write Lock (Exclusive Lock): Only one transaction can read or modify [4] [6] .
Two-Phase Locking (2PL):
Growing Phase: Transaction acquires all the locks it needs but cannot release any.
Shrinking Phase: Transaction releases locks and cannot acquire any new ones.
Strict 2PL (Strong Strict 2PL or SS2PL): All locks are released only after the transaction
ends, ensuring serializability and recoverability [1] [7] .
Deadlock Handling:
Prevention: System ensures deadlocks never occur.
Detection and Recovery: System detects deadlocks and aborts transactions to
recover.
Avoidance: System uses additional information to avoid deadlocks [7] .
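A lock-based scheduler's core bookkeeping can be sketched as a lock table. The code below is a minimal illustration of shared/exclusive compatibility (data structures and names are invented; a real lock manager also queues waiters and detects deadlocks):

```python
locks = {}   # data item -> (mode, set of holding transaction ids)

def acquire(txn, item, mode):
    """Grant a shared ('S') or exclusive ('X') lock, or refuse on conflict."""
    held = locks.get(item)
    if held is None:
        locks[item] = (mode, {txn})
        return True
    held_mode, holders = held
    if mode == "S" and held_mode == "S":
        holders.add(txn)                # multiple readers may share the lock
        return True
    if holders == {txn}:                # sole holder may upgrade S -> X
        locks[item] = (mode if mode == "X" else held_mode, holders)
        return True
    return False                        # conflict: caller must wait or abort

print(acquire("T1", "A", "S"))   # True
print(acquire("T2", "A", "S"))   # True  (shared locks coexist)
print(acquire("T3", "A", "X"))   # False (exclusive conflicts with readers)
```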

2. Timestamp-based Protocols
Each transaction is assigned a unique timestamp.
Transactions are ordered based on their timestamps; older transactions get priority.
Ensures serializability by allowing transactions to proceed only if they do not violate the timestamp order [6] .

3. Multiversion Concurrency Control (MVCC)
Maintains multiple versions of data items.


Readers access the most appropriate version, while writers create new versions.
Increases concurrency and performance by allowing reads and writes to occur without
blocking each other [1] [6] .

4. Validation (Optimistic) Concurrency Control


Transactions execute without restrictions and validate at commit time.
If a conflict is detected during validation, the transaction is rolled back or delayed.
Best suited for environments where conflicts are rare [7] .

5. Private Workspace Model (Deferred Update)


Each transaction works on a private copy of data.
Changes are applied to the database only at commit time, reducing contention [1] .
6. Index Concurrency Control
Synchronizes access to indexes rather than user data.
Specialized methods can provide significant performance improvements [1] .

Serializability
Conflict Serializability: Ensures that the schedule of transactions is equivalent to some
serial schedule by checking for conflicting operations.
View Serializability: Ensures that the schedule produces the same final state as a serial
schedule, even if the order of operations differs [5] .

Challenges in Concurrency Control


Isolation: Ensuring transactions do not interfere with each other.
Performance Overhead: Locking and validation can increase system overhead.
Deadlocks: Managing and resolving deadlocks efficiently.
Scalability: Ensuring the system scales with increasing numbers of concurrent
transactions [7] .
Summary Table: Main Concurrency Control Techniques

Technique | Key Idea | Advantages | Disadvantages
Two-Phase Locking (2PL) | Acquire all locks, then release | Ensures serializability | Can cause deadlocks
Timestamp Ordering | Use transaction timestamps | No deadlocks | Can lead to transaction aborts
MVCC | Multiple versions of data | High concurrency, fewer blocks | More storage, complex management
Validation (Optimistic) | Validate at commit | No locks, high concurrency | Rollbacks possible
Private Workspace (Deferred) | Private copies, commit changes | Reduces contention | May delay conflict detection

Conclusion
Concurrency control is essential in DBMS to ensure data integrity and consistency in a multi-user
environment. Various protocols-such as locking, timestamp ordering, MVCC, and validation-are
used to manage concurrent transactions, each with its own strengths and trade-offs.
Understanding these techniques and their challenges is crucial for designing robust database
systems [1] [7] [2] [6] [5] .

ACID Properties in Database Management Systems (DBMS)


ACID stands for Atomicity, Consistency, Isolation, and Durability. These four properties
ensure that database transactions are processed reliably and maintain data integrity, even in the
presence of errors, power failures, or concurrent access [8] [9] [10] [11] .

Atomicity
Atomicity means that a transaction is an indivisible unit: it either completes fully or not at all.
If any part of a transaction fails, the entire transaction is rolled back, leaving the database
unchanged.
This prevents partial updates that could leave the database in an inconsistent state.
Example: In a bank transfer, if money is debited from one account but not credited to
another due to a failure, atomicity ensures the debit is also undone [8] [10] [11] [12] .
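Atomicity is easy to observe with Python's built-in sqlite3 module: a connection used as a context manager commits on success and rolls back on an exception. The table and amounts below are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    with conn:  # one transaction: commit on success, rollback on exception
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 'A'")
        raise RuntimeError("simulated crash before the matching credit runs")
except RuntimeError:
    pass

# Atomicity: the debit was rolled back, so both balances are unchanged.
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [('A', 100), ('B', 50)]
```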

Consistency
Consistency ensures that a transaction takes the database from one valid state to another, preserving all predefined rules, constraints, and data integrity.
Any data written to the database must be valid according to all rules (such as foreign keys, triggers, and constraints).
If a transaction violates a rule, it is aborted and the database remains unchanged.
Example: If a rule states that every invoice must be linked to a customer, a transaction violating this rule will not be allowed [8] [10] [11] [12] .

Isolation
Isolation ensures that concurrent transactions do not interfere with each other.
The intermediate state of a transaction is invisible to other transactions; each transaction
executes as if it is the only one running.
This prevents problems like dirty reads, non-repeatable reads, and phantom reads.
Example: If two users try to update the same account balance simultaneously, isolation
ensures that each transaction sees a consistent view of the data and the final result reflects
both updates correctly [8] [10] [11] [13] [12] .

Durability
Durability guarantees that once a transaction is committed, its effects are permanent, even
in the event of a system crash or power failure.
Committed changes are saved to non-volatile storage and cannot be lost.
Example: After a successful transfer, the new account balances remain intact even if the
system fails immediately after the transaction [8] [10] [11] [14] [12] .
Summary Table
Property | Description | Example Scenario
Atomicity | All or nothing: complete success or complete failure | Bank transfer: both debit and credit must happen
Consistency | Database moves from one valid state to another, preserving rules | Invoice must always have a valid customer
Isolation | Transactions do not affect each other's execution | Two users booking the same seat
Durability | Committed transactions survive system failures | Balance remains after power outage

Key Points
ACID properties are fundamental for reliable transaction processing in DBMS.
Most relational databases (e.g., MySQL, PostgreSQL, Oracle) are ACID compliant, but the
exact implementation may vary [15] [11] .
Understanding ACID is essential for designing robust, reliable, and consistent database
applications [8] [10] [11] [14] [12] .

Serializability of Scheduling in Database Management Systems (DBMS)

Definition

Serializability is a fundamental concept in DBMS that ensures the correctness of concurrent transaction execution. A schedule (the sequence of operations from multiple transactions) is serializable if its outcome is equivalent to some serial execution of those transactions, meaning the transactions could have been executed one after another without overlapping, producing the same final database state [16] [17] [18] .

Importance of Serializability
Maintains data consistency and integrity during concurrent transaction execution [16] [19]
[17] .

Prevents anomalies such as lost updates, dirty reads, and inconsistent data.
Ensures that the database remains in a valid state, adhering to all business rules and
constraints, even with multiple users or processes accessing it at the same time [19] [17] [20] .

Types of Serializability
1. Conflict Serializability
Definition: A schedule is conflict serializable if it can be transformed into a serial schedule
by swapping non-conflicting operations [21] [17] [18] .
Conflicting Operations: Two operations conflict if they:
Belong to different transactions,
Operate on the same data item, and
At least one is a write operation.
Examples of Conflicts:
Read-Write (RW) conflict: One transaction reads, another writes the same data.
Write-Read (WR) conflict: One writes, another reads.
Write-Write (WW) conflict: Both write the same data [18] .
Testing Conflict Serializability: Use a precedence (serialization) graph:
Nodes represent transactions.
Edges represent conflicts (if one must precede another due to conflicts).
If the graph has no cycles, the schedule is conflict serializable [21] .
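
This test is easy to mechanize. The sketch below (in Python; the schedule and transaction names are illustrative, not from any particular DBMS) builds a precedence graph from (transaction, action, item) triples and reports whether it is acyclic:

```python
from collections import defaultdict

# Schedule as (transaction, action, item) triples; a hypothetical interleaving.
schedule = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T2", "R", "B")]

def conflict_serializable(schedule):
    """Build the precedence graph and return True iff it is acyclic."""
    edges = defaultdict(set)
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and xi == xj and ("W" in (ai, aj)):
                edges[ti].add(tj)
    # Cycle detection by depth-first search with three colors.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def dfs(u):
        color[u] = GRAY
        for v in edges[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True          # found a cycle
        color[u] = BLACK
        return False
    nodes = {t for t, _, _ in schedule}
    return not any(color[u] == WHITE and dfs(u) for u in nodes)

print(conflict_serializable(schedule))  # False
```

Here the R(A)/W(A) interleaving creates edges T1 → T2 and T2 → T1, so the graph has a cycle and the schedule is not conflict serializable.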

2. View Serializability
Definition: A schedule is view serializable if it is view equivalent to a serial schedule [21] [17] [18] .
View Equivalence: Two schedules are view equivalent if:
Each transaction reads the same initial value in both schedules.
Each read operation reads the value written by the same transaction in both schedules.
The final write on each data item is performed by the same transaction in both schedules [18] .
Note: All conflict serializable schedules are view serializable, but not all view serializable
schedules are conflict serializable.

Serial vs. Serializable Schedules

| Serial Schedule | Serializable Schedule |
|---|---|
| Transactions execute one after another, no overlap | Transactions execute concurrently but the result is equivalent to a serial schedule [18] |
| Simple, but inefficient in multi-user systems | Efficient, allows concurrency while ensuring correctness [17] |
Achieving Serializability
Concurrency Control Mechanisms: DBMS uses methods like locking (two-phase locking),
timestamp ordering, and multi-version concurrency control (MVCC) to ensure
serializability [17] .
Commit and Rollback: Transactions are finalized (commit) or undone (rollback) to maintain
serializability and recoverability [22] .

Example
Suppose T1 and T2 both access account A:
Non-serializable schedule: T1 updates A, T2 reads old value of A before T1’s update is
visible, leading to inconsistency.
Serializable schedule: T2 either executes entirely before or after T1, ensuring a consistent
final state [21] [17] .

Key Points
Serializability is essential for correct concurrent transaction execution in DBMS.
Conflict serializability is easier to test (using precedence graphs), while view serializability is more general but harder to check.
Concurrency control is vital to enforce serializability, enabling safe parallelism and maintaining the integrity of the database system [17] .

Summary:
Serializability ensures that even when transactions are executed concurrently, the result is as if they were executed one after another, preserving data consistency and integrity in a multi-user database environment [16] [17] [18] .

Locking-Based Schedulers in Database Management Systems (DBMS)


Locking-based schedulers are crucial mechanisms in DBMS for managing concurrent
transactions and ensuring data consistency and serializability. They use locks to control access
to data items, preventing conflicts and anomalies that can arise when multiple transactions
execute simultaneously.

Purpose of Locking-Based Schedulers


Synchronize access to database items by concurrent transactions.
Prevent issues like lost updates, dirty reads, and inconsistent data.
Ensure only one transaction can modify a data item at a time, or multiple can read but not
write concurrently [23] [24] [25] .
Types of Locks
Shared Lock (S) / Read Lock:
Allows a transaction to read a data item.
Multiple transactions can hold shared locks on the same item simultaneously.
No transaction can write while shared locks are held [24] [26] [27] .
Exclusive Lock (X) / Write Lock:
Allows a transaction to both read and write a data item.
Only one transaction can hold an exclusive lock on a data item at a time; no other
transaction can read or write that item [24] [26] [27] .

Lock Compatibility Table

| | Shared Lock (S) | Exclusive Lock (X) |
|---|---|---|
| S | Yes | No |
| X | No | No |
Lock-Based Protocols

1. Simplistic Lock Protocol
The simplest method: a transaction locks a data item before any operation (read/write), and unlocks it after completion.
Ensures data is protected during a transaction but can reduce concurrency [23] .
2. Pre-Claiming Lock Protocol
Before starting, a transaction requests all required locks.
If all locks are granted, it proceeds; otherwise, it waits.
Prevents deadlocks but may lead to reduced concurrency [23] .
3. Two-Phase Locking Protocol (2PL)
Divides transaction execution into two phases:
Growing Phase: Transaction acquires all required locks; no locks are released.
Shrinking Phase: Transaction releases locks; no new locks can be acquired.
Guarantees serializability but may cause deadlocks [23] [24] .
4. Strict Two-Phase Locking Protocol
A stricter version of 2PL: all exclusive locks are held until the transaction commits or aborts.
Prevents cascading rollbacks and is widely used in practice [23] .
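
A toy lock manager makes the two-phase discipline concrete. This is only an illustrative Python sketch (class and variable names are hypothetical): once a transaction releases any lock, further lock requests are rejected, and commit releases everything at once in the strict-2PL style.

```python
class TwoPhaseLockingTxn:
    """Toy transaction obeying the two-phase rule: after the first unlock
    (shrinking phase begins), no further locks may be acquired."""

    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table   # shared dict: item -> holding transaction
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violation, growing phase is over")
        holder = self.lock_table.get(item)
        if holder not in (None, self.name):
            raise RuntimeError(f"{self.name}: must wait, {item} held by {holder}")
        self.lock_table[item] = self.name
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True          # entering the shrinking phase
        self.lock_table.pop(item, None)
        self.held.discard(item)

    def commit(self):
        # Strict 2PL style: hold everything until commit, then release all.
        for item in list(self.held):
            self.lock_table.pop(item, None)
        self.held.clear()

locks = {}
t1 = TwoPhaseLockingTxn("T1", locks)
t1.lock("A"); t1.lock("B")   # growing phase
t1.unlock("A")               # shrinking phase begins
# t1.lock("C") would now raise: forbidden under 2PL
t1.commit()
```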
Deadlock and Starvation
Deadlock: Occurs when two or more transactions wait indefinitely for locks held by each
other, forming a cycle [23] .
Starvation: A transaction waits indefinitely because other transactions keep acquiring the
required locks first. Can be prevented by using priority schemes such as aging [23] .

How Locking-Based Schedulers Work


Transactions must request and be granted appropriate locks before accessing data items.
The concurrency-control manager handles lock requests and ensures compatibility.
If a lock cannot be granted, the transaction waits.
Locks are released according to the protocol in use (e.g., after operation, at commit, or
following 2PL rules) [25] [27] .

Advantages and Disadvantages

Advantages:
Ensures data consistency and serializability.
Prevents common concurrency anomalies [23] [24] [25] .
Disadvantages:
Can cause deadlocks and starvation.
May reduce concurrency and throughput due to waiting for locks [23] [24] .

Summary Table: Lock-Based Protocols

| Protocol | Key Feature | Pros | Cons |
|---|---|---|---|
| Simplistic Lock Protocol | Lock before operation, unlock after | Simple | Low concurrency |
| Pre-Claiming Lock Protocol | Request all locks before starting | Prevents deadlocks | May block transactions |
| Two-Phase Locking (2PL) | Growing and shrinking phases | Ensures serializability | Can cause deadlocks |
| Strict Two-Phase Locking | Hold all exclusive locks until commit | Prevents cascading rollbacks | May increase waiting |

In summary:
Locking-based schedulers are essential for managing concurrent transactions in DBMS. By using
shared and exclusive locks and protocols like 2PL, they ensure serializability and data integrity,
though they must also address challenges like deadlocks and starvation [23] [24] [25] [27] .

Timestamp-Based Schedulers in DBMS


Timestamp-based schedulers are concurrency control mechanisms that use timestamps to
order and manage transactions in a database, ensuring serializability and consistency without
using locks [28] [29] [30] .

Key Concepts
Timestamp:
A unique identifier assigned to each transaction when it enters the system. This can be
generated using the system clock or a logical counter [28] [31] [32] .
Older Transactions:
Transactions with smaller (earlier) timestamps are considered older and given higher priority
over newer transactions [28] [33] [31] .
Serializability:
The protocol ensures that the schedule of transactions is equivalent to some serial (one-after-another) execution based on their timestamps [28] [30] .

How Timestamp-Based Schedulers Work

1. Assignment:
When a transaction $ T $ enters the system, it receives a timestamp $ TS(T) $ [28] [29] [30] .
2. Ordering:
All operations (read/write) by $ T $ are tagged with $ TS(T) $. The protocol ensures conflicting operations are executed in timestamp order [28] [33] [31] .


3. Data Item Timestamps:
For each data item $ X $, the DBMS maintains:
Read Timestamp ($ R_TS(X) $): Largest timestamp of any transaction that successfully
read $ X $.
Write Timestamp ($ W_TS(X) $): Largest timestamp of any transaction that
successfully wrote $ X $ [28] [33] .

Basic Timestamp Ordering Protocol


Read Operation ($ T $ wants to read $ X $):
If $ W_TS(X) > TS(T) $:
Abort and rollback $ T $ (as a newer transaction has already written $ X $).
Else:
Allow the read, and set $ R_TS(X) = \max(R_TS(X), TS(T)) $ [28] .
Write Operation ($ T $ wants to write $ X $):
If $ R_TS(X) > TS(T) $ or $ W_TS(X) > TS(T) $:
Abort and rollback $ T $ (as a newer transaction has read or written $ X $).
Else:
Allow the write, and set $ W_TS(X) = TS(T) $ [28] .
Result:
Any operation that violates the timestamp order is rejected, and the transaction is aborted
and rolled back [28] [30] .
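
These rules translate almost line for line into code. A minimal Python sketch of basic timestamp ordering (item names and timestamp values are illustrative):

```python
# Per-item bookkeeping: read_ts[X] and write_ts[X]; ts is the transaction's TS(T).
read_ts, write_ts = {}, {}

def read(ts, x):
    """Basic timestamp ordering rule for a read of x."""
    if write_ts.get(x, 0) > ts:
        return "abort"                      # a newer transaction already wrote x
    read_ts[x] = max(read_ts.get(x, 0), ts)
    return "ok"

def write(ts, x):
    """Basic timestamp ordering rule for a write of x."""
    if read_ts.get(x, 0) > ts or write_ts.get(x, 0) > ts:
        return "abort"                      # a newer transaction already read or wrote x
    write_ts[x] = ts
    return "ok"

print(read(10, "A"))   # ok: R_TS(A) becomes 10
print(write(5, "A"))   # abort: a transaction with TS 10 has already read A
```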

Strict Timestamp Ordering


In strict timestamp ordering, a transaction's read or write is delayed until any transaction
that previously wrote the data item has committed or aborted.
This ensures both strictness and conflict serializability, reducing the risk of cascading
rollbacks [28] .

Advantages
Deadlock-Free:
No transaction waits for locks, so deadlocks do not occur [28] [30] .
Serializability:
Always produces conflict-serializable schedules [28] [30] .
High Concurrency:
Multiple transactions can proceed simultaneously, improving throughput [30] .

Disadvantages
Cascading Rollbacks:
If an older transaction aborts, all newer transactions that read its data must also abort
(unless using strict timestamp ordering) [28] [32] .
Starvation:
Long-running or older transactions may be repeatedly aborted if they conflict with many
newer transactions [32] .
Overhead:
Maintaining and updating timestamps for all data items can increase system overhead [32] .
Comparison: Lock-Based vs. Timestamp-Based Schedulers

| Feature | Lock-Based Schedulers | Timestamp-Based Schedulers |
|---|---|---|
| Deadlocks | Possible | Not possible |
| Waiting | Transactions may wait | No waiting (abort instead) |
| Starvation | Less common | Possible for older transactions |
| Overhead | Lock management | Timestamp management |
| Serializability | Ensured via protocols | Ensured by timestamp order |

Summary Table: Key Points

| Aspect | Description |
|---|---|
| Timestamp Assignment | Each transaction gets a unique, increasing timestamp |
| Data Item Timestamps | Each data item tracks last read and write timestamps |
| Conflict Resolution | Operations violating timestamp order cause transaction abort |
| Deadlock Handling | No deadlocks (no waiting), but possible starvation |
| Serializability | Always ensured |
| Drawbacks | Cascading rollbacks, starvation, timestamp management overhead |

In summary:
Timestamp-based schedulers in DBMS use transaction timestamps to order and control concurrent execution, guaranteeing serializability and deadlock-free operation but may suffer from cascading rollbacks and starvation for long-running transactions [28] [30] [32] .

Multiversion Concurrency Control (MVCC) Schemes in DBMS


Multiversion Concurrency Control (MVCC) is an advanced concurrency control technique used
by modern database management systems to handle simultaneous transactions efficiently,
ensuring data consistency, integrity, and high performance without heavy reliance on locking
mechanisms [34] [35] [36] .

Key Concepts of MVCC


Multiple Versions:
MVCC maintains multiple versions of each data record. Every update creates a new version,
while older versions are retained for ongoing transactions [34] [36] .
Version Numbers/Timestamps:
Each version of a record is tagged with a unique version number or timestamp, indicating
when it was created or last modified [34] [36] .
Snapshot Isolation:
Each transaction operates on a consistent snapshot of the database as of its start time. This
means transactions see a stable view of the data, unaffected by concurrent updates [36] [37] .

How MVCC Works


1. Read Operations:
When a transaction reads a record, it sees the latest version that was committed before
the transaction began.
Reads do not block writes; multiple transactions can read the same or different versions
simultaneously [34] [36] [37] .
2. Write Operations:
When a transaction updates a record, the DBMS creates a new version with an
incremented version number or timestamp.
Other transactions continue to see the old version until the new version is committed [34] [36] .
Once committed, future reads access the new version; further updates create additional versions [34] .
3. Version Cleanup:
Old versions are periodically removed (garbage collected) when they are no longer needed by any active transaction.
For example, PostgreSQL uses a process called "VACUUM" to clean up obsolete versions and reclaim space [34] .
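
A version chain and a snapshot read can be sketched in a few lines. The Python fragment below is illustrative only (it is not how any particular DBMS lays out versions): each item maps to an append-only list of (commit_timestamp, value) pairs, and a reader sees the newest version committed at or before its start time.

```python
# Version chain per item: append-only list of (commit_ts, value); names illustrative.
versions = {"X": [(1, "v1"), (5, "v2"), (9, "v3")]}

def snapshot_read(item, txn_start_ts):
    """Return the latest version committed at or before the transaction's start."""
    visible = [(ts, v) for ts, v in versions[item] if ts <= txn_start_ts]
    return max(visible)[1] if visible else None

def mvcc_write(item, commit_ts, value):
    """A write never overwrites in place: it appends a new version to the chain."""
    versions.setdefault(item, []).append((commit_ts, value))

print(snapshot_read("X", 7))   # 'v2': the version committed at ts 5
mvcc_write("X", 12, "v4")      # concurrent write creates a new version
print(snapshot_read("X", 7))   # still 'v2': the read is not blocked or changed
```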

Types of MVCC Schemes


Snapshot Isolation:
Each transaction gets a snapshot of the database at its start time, preventing read-write
conflicts and ensuring consistent reads [36] .
Serializable Snapshot Isolation:
Enhances snapshot isolation by ensuring the outcome is equivalent to some serial execution,
providing stronger consistency guarantees [36] .

Advantages of MVCC
High Concurrency:
Multiple transactions can read and write simultaneously without blocking each other,
improving throughput and user experience [34] [38] [36] .
Reduced Lock Contention:
MVCC minimizes the need for locks, reducing contention and the likelihood of deadlocks [34] [38] [36] [37] .

Improved Read Performance:


Reads are fast and never blocked by writes, as they access the appropriate version of the
data [34] [38] .
Isolation:
Each transaction sees a consistent snapshot, ensuring transactional isolation and preventing
phenomena like dirty reads [36] [37] .

Drawbacks of MVCC
Storage Overhead:
Maintaining multiple versions increases database size and can lead to "version bloat" [34] [38] [39] .

Garbage Collection:
The system must periodically clean up obsolete versions, which can be resource-
intensive [34] [39] .
Complex Implementation:
MVCC mechanisms are complex to implement, though this complexity is hidden from end users and developers [34] [38] [39] .
Cascading Aborts:
If a transaction that created a version aborts, other transactions that read this version may also need to abort [39] .

Comparison: MVCC vs. Locking-Based Concurrency Control

| Aspect | MVCC | Locking-Based Control |
|---|---|---|
| Concurrency | High (reads and writes don’t block) | Lower (locks can block access) |
| Deadlocks | Rare/none | Possible |
| Read Performance | Fast, non-blocking | Can be blocked by writes |
| Storage | Higher (multiple versions) | Lower (single version) |
| Complexity | Higher (version management) | Simpler |

Real-World Example: PostgreSQL MVCC


Implementation:
PostgreSQL uses MVCC to provide transaction isolation. Each transaction sees a snapshot
of the database, and the "VACUUM" process removes obsolete versions to control
database size [34] [37] .
User Transparency:
The details of MVCC are handled by the DBMS, so developers and users interact with the
database as usual, unaware of the underlying version management [34] [38] .

Summary Table: MVCC Key Points


| Feature | Description |
|---|---|
| Multiple Versions | Maintains several versions of each record |
| Snapshot Isolation | Transactions see a consistent snapshot of the database |
| No Read-Write Blocking | Reads and writes occur simultaneously without blocking |
| Storage Overhead | Requires periodic cleanup of obsolete versions |
| High Concurrency | Supports many simultaneous transactions efficiently |

In summary:
Multiversion concurrency control (MVCC) is a powerful and widely adopted technique in DBMS
that enables high concurrency and performance by allowing simultaneous reads and writes
through versioning, reducing lock contention and deadlocks, but at the cost of increased
storage and system complexity [34] [38] [36] [37] [39] .

⁂
Optimistic Concurrency Control (OCC) Schemes in DBMS

Optimistic Concurrency Control (OCC) is a non-locking concurrency control technique used in database management systems to manage simultaneous transactions. OCC assumes that conflicts between transactions are rare and allows transactions to execute without locking data resources, only checking for conflicts at the end of the transaction [40] [41] [42] .

Key Principles
No Locks During Execution: Transactions proceed without acquiring locks on data items,
allowing maximum concurrency and resource utilization [40] [41] [43] .
Conflict Detection at Commit: Before a transaction commits, the system checks whether
any other transaction has modified the data it accessed. If a conflict is detected, the
transaction is rolled back and may be retried [40] [44] [41] .
Best for Low Contention: OCC is ideal for environments where data conflicts are infrequent, such as read-heavy workloads or systems with many users but few overlapping updates [42] [41] [43] .
Phases of Optimistic Concurrency Control
OCC typically divides each transaction into three main phases [40] [44] :

1. Read Phase
The transaction reads data from the database and stores it in local variables (workspace).
All operations are performed on these local copies.
No changes are made to the actual database during this phase [44] .

2. Validation Phase
Before committing, the transaction checks whether any other concurrent transaction has
modified the data items it read or intends to write.
The system compares the transaction’s read and write sets with those of other transactions
to detect conflicts.
If no conflicts are found, the transaction proceeds to commit; otherwise, it is rolled back [40] [44] .

3. Write Phase
If validation succeeds, the transaction writes its changes to the database.
If validation fails, the transaction is aborted and may be retried [40] [44] .

Validation Rules
Backward Validation: Checks if any transaction that committed after the current transaction started has written to a data item read by the current transaction [44] [45] .
Forward Validation: Checks if the current transaction’s write set conflicts with the read sets
of active transactions [45] .
Serializability: The validation ensures that the resulting schedule is serializable, maintaining
database consistency [40] [44] .
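
Backward validation, for example, reduces to a set-intersection check. In this illustrative Python sketch, committed_log and the read set are hypothetical bookkeeping structures the scheduler would maintain:

```python
def validate(txn_start_ts, txn_read_set, committed_log):
    """Backward validation sketch: the transaction fails if any transaction
    that committed after it started wrote an item the transaction read."""
    for commit_ts, write_set in committed_log:
        if commit_ts > txn_start_ts and write_set & txn_read_set:
            return False    # conflict detected: roll back and retry
    return True

# (commit_ts, write_set) pairs for already-committed transactions.
committed_log = [(4, {"A"}), (8, {"B"})]

print(validate(txn_start_ts=5, txn_read_set={"B"}, committed_log=committed_log))  # False
print(validate(txn_start_ts=5, txn_read_set={"C"}, committed_log=committed_log))  # True
```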

Advantages of OCC
High Concurrency: Multiple transactions can proceed in parallel without waiting for locks,
leading to higher throughput in low-contention environments [42] [41] [43] [45] .
No Deadlocks: Since no locks are held, deadlocks are impossible [40] [41] .
Reduced Lock Management Overhead: Eliminates the need for lock acquisition and release,
reducing system overhead [45] .
Drawbacks of OCC
Transaction Rollbacks: If conflicts are frequent, many transactions may be rolled back,
reducing overall performance [42] [40] [41] .
Starvation: Long-running transactions may be repeatedly aborted if they often conflict with
shorter transactions [40] .
Not Suitable for High Contention: In write-heavy or high-contention environments,
pessimistic approaches may be more efficient [42] [41] [43] .

Example Workflow
Suppose two transactions, T1 and T2, both read and attempt to update the same data item:
Both read the original value and make local changes.
At commit time, both validate whether the data has changed since they read it.
Only one transaction will succeed; the other will be rolled back and may retry [41] [46] .

Comparison: Optimistic vs. Pessimistic Concurrency Control

| Feature | Optimistic Concurrency Control | Pessimistic Concurrency Control |
|---|---|---|
| Locking | No locks during transaction | Locks acquired before data access |
| Deadlocks | Not possible | Possible |
| Performance (Low Contention) | High | Moderate |
| Performance (High Contention) | Low (due to rollbacks) | High (due to blocking) |
| Use Case | Read-heavy, low-conflict workloads | Write-heavy, high-conflict workloads |

Summary Table: OCC Key Points

| Aspect | Description |
|---|---|
| Approach | No locks, conflict detection at commit |
| Phases | Read, Validation, Write |
| Best For | Low-contention, read-heavy environments |
| Pros | High concurrency, no deadlocks, lower locking overhead |
| Cons | Frequent rollbacks in high contention, possible starvation of transactions |

In summary:
Optimistic concurrency control schemes in DBMS maximize concurrency by allowing
transactions to execute without locks and validating for conflicts only at commit time. This
approach is highly efficient in environments with low data contention but can suffer from high
rollback rates when conflicts are frequent [40] [42] [41] .

Database Recovery in DBMS


Database recovery in DBMS refers to the processes and techniques used to restore a database to a correct, consistent state after a failure. Recovery ensures that the database maintains the ACID properties (especially atomicity and durability) by either completing all the effects of committed transactions or rolling back incomplete ones [47] [48] [49] .

Types of Failures Requiring Recovery


Transaction Failure: Individual transactions fail due to logical errors or system issues.
System/Crash Failure: The entire database system crashes (e.g., power failure, OS crash),
possibly leaving the database in an inconsistent state [48] [50] .
Media/Storage Failure: Physical damage to storage devices, leading to loss or corruption of
database files [49] .

Catastrophic Failure: Natural disasters or events that destroy the database and backups (e.g., fire, earthquake) [49] .

Goals of Database Recovery
Restore Consistency: Ensure the database is in a consistent state after failure.
Preserve Atomicity: Incomplete transactions are fully undone.
Ensure Durability: Effects of committed transactions are retained [47] .

Key Recovery Techniques

1. Log-Based Recovery
Transaction Logs: Every operation (start, read, write, commit, abort) is recorded in a log file
stored on stable storage [51] [47] .
Example log entries:
<Tn, Start>: Transaction Tn started.
<Tn, X, V1, V2>: Tn changed X from V1 to V2.
<Tn, Commit>: Tn committed.
Write-Ahead Logging (WAL): Logs are written before any changes are applied to the
database, ensuring recoverability [52] .
Undo/Redo Operations:
Undo: Reverses changes of uncommitted transactions.
Redo: Reapplies changes of committed transactions that may not have been saved to
disk before the crash [51] [47] [50] .
Checkpointing: Periodically, the system records a checkpoint, a point where the database
is known to be consistent. During recovery, the system only needs to process logs from the
last checkpoint, improving efficiency [51] [47] [50] .
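
The undo/redo idea can be seen end to end in a small sketch. The Python fragment below replays a toy log written in the <Tn, Start> / <Tn, X, V1, V2> / <Tn, Commit> style shown above (the log contents and post-crash state are invented for illustration):

```python
# A tiny write-ahead log; T2 never committed because of a crash.
log = [
    ("T1", "START"), ("T1", "X", 100, 150), ("T1", "COMMIT"),
    ("T2", "START"), ("T2", "Y", 20, 70),
]
db = {"X": 100, "Y": 70}   # on-disk state after the crash (T2's write reached disk)

def recover(log, db):
    committed = {t for t, *rest in log if rest and rest[0] == "COMMIT"}
    for t, *entry in log:                      # redo phase: forward pass
        if len(entry) == 3 and t in committed:
            item, _old, new = entry
            db[item] = new                     # redo committed writes
    for t, *entry in reversed(log):            # undo phase: backward pass
        if len(entry) == 3 and t not in committed:
            item, old, _new = entry
            db[item] = old                     # undo uncommitted writes
    return db

print(recover(log, db))   # {'X': 150, 'Y': 20}
```

T1's committed update to X is reapplied, while T2's uncommitted write to Y is reversed, leaving the database consistent.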

2. Deferred and Immediate Update


Deferred Update: Database is only updated after a transaction commits. If a failure occurs
before commit, no changes need to be undone [53] .
Immediate Update: Changes are applied to the database as transactions execute, but logs
ensure that uncommitted changes can be undone if necessary [53] .

3. Backup and Restore


Backup: Regular copies of the database are stored on different media or locations
(immediate or archival backups) [54] [52] [49] .
Restore: In case of severe failure, the database is restored from the most recent backup, and logs are used to roll forward or roll back transactions as needed [54] [49] .

4. Shadow Paging
Maintains two versions (pages) of the database: the current and a shadow copy. Updates are made to the current page, and the shadow page is only updated after a successful commit, ensuring atomicity and easy rollback [52] .

Recovery Process Phases


1. Analysis Phase: Identifies the point of failure and active transactions at that time [52] .
2. Redo Phase: Reapplies changes of committed transactions to ensure all their effects are
reflected in the database [52] .
3. Undo Phase: Reverses changes made by transactions that were active but not committed
at the time of failure [52] .

Recovery with Concurrent Transactions


The recovery system maintains two lists:
Undo-list: Transactions started but not committed (to be undone).
Redo-list: Transactions that committed (to be redone if necessary) [51] .
The system reads logs backward from the end to the last checkpoint, processes undo and
redo lists accordingly, and brings the database to a consistent state [51] .
Types of Recovery

| Type | Description |
|---|---|
| Crash Recovery | Restores the database after a system crash, rolling back incomplete transactions [48] [49] . |
| Disaster Recovery | Restores the database after catastrophic events using backups [49] . |
| Version Recovery | Restores the database to a previous backup version [49] . |
| Rollforward Recovery | Applies committed changes from logs after restoring a backup [49] . |

Summary Table: Recovery Techniques

| Technique | How it Works | Use Case |
|---|---|---|
| Log-Based Recovery | Uses logs for undo/redo, supports checkpoints | Most common, all failures |
| Deferred Update | Updates DB only at commit, easy undo | Simple, low-concurrency |
| Immediate Update | Updates as transaction proceeds, logs required for undo | High-concurrency |
| Backup & Restore | Restores from periodic backups | Media/catastrophic failure |
| Shadow Paging | Maintains shadow copy for atomicity | Simple systems, low overhead |

In summary:
Database recovery in DBMS is essential for maintaining consistency, atomicity, and durability after failures. Techniques like log-based recovery, deferred/immediate update, checkpointing, and backup/restore are used to ensure that committed transactions are preserved and incomplete ones are rolled back, bringing the database back to a consistent state [51] [47] [48] [49] [50] [53] .

1. https://www.scaler.com/topics/dbms/concurrency-control-in-dbms/
2. https://www.dremio.com/wiki/concurrency-control/
3. https://www.studocu.com/in/document/dr-apj-abdul-kalam-technical-university/database-management-system/dbms-unit-5-notes/48082167
4. https://www.scribd.com/presentation/320810503/Concurrency-Control-in-DataBase
5. https://www.slideshare.net/slideshow/concurrency-control-in-advanced-database/266475028
6. https://www.solarwinds.com/resources/it-glossary/database-concurrency
7. https://www.shiksha.com/online-courses/articles/concurrency-control-techniques-in-dbms/
8. https://byjus.com/gate/acid-properties-in-dbms-notes/
9. https://www.databricks.com/glossary/acid-transactions
10. https://www.scaler.com/topics/dbms/acid-properties-in-dbms/
11. https://en.wikipedia.org/wiki/ACID
12. https://mariadb.com/kb/en/acid-concurrency-control-with-transactions/
13. https://www.freecodecamp.org/news/how-databases-guarantee-isolation/
14. https://www.simplilearn.com/acid-properties-in-dbms-article
15. https://www.mongodb.com/resources/basics/databases/acid-transactions
16. https://www.scaler.com/topics/dbms/serializability-in-dbms/
17. https://www.fynd.academy/blog/serializability-in-dbms
18. https://www.tutorialspoint.com/what-is-the-term-serializability-in-dbms
19. https://www.theknowledgeacademy.com/blog/serializability-in-dbms/
20. https://www.prepbytes.com/blog/dbms/serializability-in-dbms/
21. https://www.upgrad.com/blog/serializability-in-dbms/
22. https://blog.purestorage.com/purely-educational/what-does-serializability-mean-in-a-dbms/
23. https://www.scaler.com/topics/lock-based-protocol-in-dbms/
24. https://sitams.org/wp-content/uploads/2023/COURSE/MCA/DBMS_UNIT_V.pdf
25. https://www.guru99.com/dbms-concurrency-control.html
26. https://www.db-book.com/slides-dir/PDF-dir/ch18.pdf
27. https://www.tmu.ac.in/assets/pdf/coe/e-content/ECS_306/DBMS_Unit-5.pdf
28. https://www.scaler.com/topics/timestamp-based-protocols-in-dbms/

29. https://www.guru99.com/dbms-concurrency-control.html
30. https://www.tutorialspoint.com/concurrency-control-based-on-timestamp-ordering
31. https://beginnersbook.com/2022/07/timestamp-based-ordering-protocol/
32. https://15445.courses.cs.cmu.edu/fall2021/notes/17-timestampordering.pdf
33. https://cse.poriyaan.in/topic/timestamp-based-protocol-50884/
34. https://www.theserverside.com/blog/Coffee-Talk-Java-News-Stories-and-Opinions/What-is-MVCC-How-does-Multiversion-Concurrencty-Control-work
35. https://en.wikipedia.org/wiki/Multiversion_concurrency_control
36. https://celerdata.com/glossary/multi-version-concurrency-control
37. https://www.postgresql.org/docs/7.1/mvcc.html
38. https://www.tutorialspoint.com/multiversion-concurrency-control-techniques
39. https://gpttutorpro.com/multiversion-concurrency-control-in-databases/
40. https://en.wikipedia.org/wiki/Optimistic_concurrency_control
41. https://www.freecodecamp.org/news/how-databases-guarantee-isolation/
42. https://www.linkedin.com/advice/0/what-benefits-drawbacks-using-optimistic-concurrency-control
43. https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/optimistic-concurrency
44. https://www.tutorialspoint.com/what-is-an-optimistic-concurrency-control-in-dbms
45. https://codemia.io/knowledge-hub/path/backwardforward_validation_in_optimistic_concurrency_control
46. https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/database-transactions-optimistic-concurrency
47. https://www.scaler.com/topics/recovery-techniques-in-dbms/
48. https://www.ibm.com/docs/en/db2/11.1?topic=recover-crash-recovery
49. https://www.ibm.com/docs/en/db2/12.1.0?topic=administration-data-recovery
50. https://www.slideshare.net/slideshow/crash-recovery-in-database/233050920
51. https://www.tutorialspoint.com/dbms/dbms_data_recovery.htm
52. https://www.youtube.com/watch?v=1pSxXwy0qiE
53. https://dspmuranchi.ac.in/pdf/Blog/Database Recovery techniques.pdf
54. https://takeuforward.org/dbms/database-recovery-management

Authentication in Database Management Systems (DBMS)
Definition
Authentication in a DBMS is the process of verifying the identity of a user, device, or system
attempting to access a database. Its primary goal is to ensure that only authorized users can
interact with the database and its contents, thereby protecting sensitive information from
unauthorized access [1] [2] [3] .

Purpose of Authentication
Prevents unauthorized access to database resources.
Maintains data integrity and confidentiality.
Forms the first line of defense in database security [4] [2] [5] .

How Authentication Works

Users must provide credentials (such as username and password) when attempting to access the database.
The DBMS compares these credentials against stored data (usually in an encrypted format).
If the credentials match, access is granted; otherwise, it is denied [2] [3] .
Authentication is always performed before authorization (which determines what actions the user can perform) [3] .

Common Authentication Mechanisms


Username and Password: The most widely used method. Users enter a unique username
and a secret password. The DBMS verifies these against its records [4] [5] [2] .
Integrated Security: Utilizes the operating system or network authentication (e.g., Windows
Authentication in SQL Server) to validate users [4] .
Certificates and Keys: Employs digital certificates or cryptographic keys for enhanced
security, often used in enterprise environments [4] .
Single Sign-On (SSO): Allows users to access multiple applications and databases with a
single set of credentials, streamlining the login process. Common protocols include SAML
and OpenID Connect [1] .
Multi-Factor Authentication (MFA): Requires users to provide multiple forms of verification
(e.g., password plus a one-time code or biometric data) for stronger security [2] .

Types of Authentication in DBMS


| Authentication Type | Description |
|---|---|
| DBMS Authentication | Users are authenticated directly by the DBMS using credentials stored within the database. No need for corresponding OS accounts [6] [7] . |
| Operating System Authentication | The DBMS relies on the OS to authenticate users. If the user is logged into the OS, they are granted access to the database. |
| Third-Party Authentication | Uses external services or protocols (e.g., Kerberos, LDAP, SSO) to authenticate users. |

DBMS Authentication Example


In systems like Actian Vector or Ingres, DBMS authentication can be enabled at the server
level.
Users can be configured as:
WITH DBMS_AUTHENTICATION='REQUIRED': Must use DBMS credentials.
WITH DBMS_AUTHENTICATION='OPTIONAL': Can use other authentication methods if
available [6] [7] .

Authentication vs. Authorization

| Aspect | Authentication | Authorization |
|---|---|---|
| Purpose | Verifies identity of the user or system | Determines what actions the authenticated user can perform |
| Sequence | Always comes first | Follows authentication |
| Example | Logging in with username and password | Granting read/write privileges to a table |

Importance of Authentication in DBMS


Ensures only legitimate users can access and manipulate data.
Protects against data breaches and insider threats.
Supports compliance with data protection regulations [2] [5] [3] .

Summary of Key Points


Authentication is essential for database security.
Multiple authentication methods are available, each with varying levels of security and
convenience.
Proper authentication is the foundation for further security mechanisms like authorization
and encryption [5] [3] .

References for Further Study


DBMS documentation for specific authentication configuration.
Security standards like SAML, OpenID Connect, and Kerberos.
Database security best practices and compliance requirements.

Exam Tip: Always distinguish between authentication (identity verification) and authorization
(permission granting) in your answers, and be able to describe at least two authentication
mechanisms with examples.

Authorization in Database Management Systems (DBMS)


Definition
Authorization in a DBMS is the process of granting or denying specific permissions to users or
roles, determining what actions they are allowed to perform on database resources after their
identity has been authenticated [8] [9] [10] .

Purpose of Authorization
Controls access to database objects (tables, views, schemas, etc.).
Ensures users can only perform actions necessary for their roles (principle of least privilege) [9] .
Protects sensitive information and maintains data integrity by restricting unauthorized operations.

How Authorization Works


1. Authentication First: The system verifies the user's identity through authentication.
2. Access Evaluation: The DBMS checks what permissions (privileges) the authenticated user has.

3. Grant or Deny: Based on these permissions, the system allows or blocks access to
requested resources or operations [11] [10] .
4. Logging and Revocation: Activities may be logged, and permissions can be revoked if user
roles or policies change [11] .

Key Concepts in Authorization


Privileges
Specific rights to perform actions, such as SELECT, INSERT, UPDATE, DELETE on database
objects [12] [8] .
Privileges are stored in the database catalogs and can be granted or revoked.
Types of Privileges
System Privileges: Allow users to perform administrative tasks (e.g., creating tables, users,
or databases) [12] .
Object Privileges: Allow users to access or manipulate specific database objects (e.g.,
select data from a table, update a column) [12] .
Column Privileges: Restrict access to specific columns within a table [12] .
Roles
A role is a collection of privileges that can be assigned to users or groups, simplifying
management [12] [8] .
Roles can be created, granted, revoked, or dropped.
Granting and Revoking Privileges
GRANT: Used to assign privileges or roles to users or other roles [12] [8] .
REVOKE: Used to remove previously granted privileges or roles.
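
In SQL these take the form of GRANT and REVOKE statements (for example, granting SELECT on a table to a user). The sketch below models only the catalog bookkeeping behind them, in Python; all user, object, and privilege names are hypothetical.

```python
# In-memory privilege catalog: (user, object) -> set of privileges.
catalog = {}

def grant(user, obj, privilege):
    """Model of GRANT: record that user holds privilege on obj."""
    catalog.setdefault((user, obj), set()).add(privilege)

def revoke(user, obj, privilege):
    """Model of REVOKE: remove a previously granted privilege."""
    catalog.get((user, obj), set()).discard(privilege)

def authorized(user, obj, privilege):
    """The check the DBMS performs before executing a statement."""
    return privilege in catalog.get((user, obj), set())

grant("alice", "sales", "SELECT")
print(authorized("alice", "sales", "SELECT"))  # True
print(authorized("alice", "sales", "UPDATE"))  # False: never granted
revoke("alice", "sales", "SELECT")
print(authorized("alice", "sales", "SELECT"))  # False after revocation
```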
Access Control Models
Discretionary Access Control (DAC): Access is based on user identity or group
membership; object owners manage permissions [13] .
Mandatory Access Control (MAC): Access is based on fixed policies, often used in secure
environments [13] .

Role-Based Access Control (RBAC): Permissions are grouped by roles, and users are assigned roles according to their job functions [13] .
Attribute-Based Access Control (ABAC): Permissions are granted based on attributes (e.g.,
department, location, time) [13] .

Authorization Flow in DBMS

1. User logs in (authentication).
2. User requests an action (e.g., SELECT on a table).


3. DBMS checks user's roles and privileges.
4. If authorized, action is allowed; otherwise, it is denied.
5. Actions and access may be logged for auditing.

Examples
A user with only SELECT privilege on a table cannot modify its data.
A database administrator (DBA) may have all system privileges, allowing them to manage
users, roles, and database resources.
A sales role may have privileges to view and update sales data but not to access payroll
information.

Difference Between Authentication and Authorization

| Authentication | Authorization |
|---|---|
| Verifies the identity of the user | Determines what actions the user can perform |
| Always comes first | Follows authentication |
| Example: Logging in with a password | Example: Granting SELECT privilege on a table |

Summary of Key Points


Authorization determines what authenticated users can do in the database [8] [9] [10] .
It is managed through privileges, roles, and access control models.
Proper authorization is essential for database security, enforcing least privilege, and
regulatory compliance.

Exam Tip: Be able to define authorization, describe types of privileges and roles, explain
GRANT/REVOKE, and distinguish between DAC, MAC, and RBAC models with examples.

Access Control in Database Management Systems (DBMS)


Definition

Access control in a DBMS refers to the set of policies, models, and mechanisms that determine which users or processes are permitted to access, modify, or manage database resources. Its primary goal is to protect data from unauthorized access and ensure that users can only perform actions for which they have explicit permission [14] [15] [16] .

Core Components of Access Control

Authentication: Verifies the identity of users attempting to access the database.
Authorization: Determines the specific actions authenticated users are allowed to perform
on database objects (e.g., tables, views) [14] [15] .
Access Enforcement: The system enforces access decisions by allowing or denying
requested operations.

Major Access Control Models in DBMS

| Model Name | Description | Typical Use Cases |
|---|---|---|
| Discretionary Access Control (DAC) | Resource owners grant or revoke access to other users. Highly flexible but can become complex. | Most relational DBMS, collaborative environments [14] [16] [17] |
| Mandatory Access Control (MAC) | Central authority defines access based on security labels/clearances. Users cannot override. | Government, military, high-security organizations [18] [14] [16] |
| Role-Based Access Control (RBAC) | Permissions are assigned to roles, and users are assigned roles. Simplifies management. | Enterprises, organizations with defined job functions [19] [14] [16] [17] |
| Attribute-Based Access Control (ABAC) | Access decisions are based on user, resource, and environmental attributes. Highly dynamic. | Modern, complex, or cloud environments [14] [15] |
| Rule-Based Access Control (RBAC or RB-RBAC) | Access is controlled by pre-defined rules or policies, often time or context-based. | Systems needing flexible, policy-driven controls [19] [20] |

Descriptions of Key Models


Discretionary Access Control (DAC)
Owners of database objects (e.g., tables) have the authority to grant or revoke privileges to
other users.
Example: User A creates a table and gives User B SELECT access [14] [16] [17] .
Advantage: Flexible and user-driven.
Drawback: Can become difficult to manage in large systems.
Mandatory Access Control (MAC)

Access is governed by a central policy, often using security labels (e.g., confidential, secret).
Only administrators can change access policies; users cannot delegate permissions [18] [14] [16] .
Advantage: High security, suitable for sensitive environments.
Drawback: Less flexible, can be restrictive.
Role-Based Access Control (RBAC)
Permissions are grouped into roles (e.g., "manager," "analyst"), and users are assigned to
these roles [19] [14] [16] [17] .
Simplifies administration, especially in large organizations.
Advantage: Scalable and manageable.
Drawback: Requires careful role design to avoid complexity.
Attribute-Based Access Control (ABAC)
Access is determined by evaluating attributes (user role, time, location, etc.) against
policies [14] [15] .
Highly dynamic and granular.
Advantage: Fine-grained, adaptable to complex requirements.
Drawback: Policy management can be complex.
Rule-Based Access Control
Access is controlled by a set of predefined rules or policies (e.g., "no access after 6 PM")
[19] [20] .

Can be combined with other models for greater flexibility.

Principles in Access Control


Least Privilege: Users should have the minimum access necessary to perform their
tasks [17] .
Separation of Duties: No single user should have enough privileges to misuse the system.
Open vs. Closed Systems: In open systems, access is allowed unless forbidden; in closed
systems, access is forbidden unless explicitly allowed [17] .
Centralized vs. Decentralized Administration: Centralized means a single authority
manages access; decentralized allows distributed management [17] .

How Access Control Works in Practice


1. User attempts access: Provides credentials (authentication).
2. System checks permissions: Consults access control lists, roles, or policies (authorization).
3. Decision enforced: Access is granted or denied based on the model in use.

8
4. Audit/logging: Actions may be recorded for security and compliance [14] [15] .
t1
at
Summary Table: Access Control Models

| Model | Who Grants Access? | Flexibility | Security Level | Typical Use |
|---|---|---|---|---|
| DAC | Object owner | High | Moderate | General DBMS |
| MAC | Central authority | Low | High | Government, Military |
| RBAC | Admins via roles | Moderate | High | Enterprises |
| ABAC | Policies/attributes | Very High | High | Complex environments |
| Rule-Based | Admin-defined policies | High | Moderate-High | Policy-driven systems |

Key Takeaways
Access control is essential for database security, ensuring only authorized users can access
or modify data.
The choice of access control model depends on organizational needs, security
requirements, and system complexity [17] [21] .
Proper implementation of access control helps prevent data breaches, supports compliance,
and enforces organizational policies.

Exam Tip: Be able to define each access control model, explain their advantages and
disadvantages, and provide examples of where each is most appropriate.

Discretionary Access Control (DAC) Model in DBMS
Definition
Discretionary Access Control (DAC) is an access control model in which the owner of a resource
(such as a database table, file, or record) has the authority to determine who can access that
resource and what operations they are permitted to perform (e.g., read, write, modify) [22] [23]
[24] [25] [26] [27] [28] . The term "discretionary" reflects the fact that access decisions are left to
the discretion of the resource owner.

Key Concepts
Subjects and Objects:
Subjects are users or user groups seeking access to resources.
Objects are resources such as tables, files, or data entries [23] .
Ownership:
The creator of an object is usually its owner and can grant or revoke access rights to
other users [22] [24] [27] [28] .
Access Control Lists (ACLs):

Permissions are often managed through ACLs, which specify which users or groups can access a resource and what actions they can perform (e.g., SELECT, INSERT, UPDATE, DELETE) [25] [28] .
Grant and Revoke:
Owners can grant privileges to other users and can revoke them at any time [29] [27] .
Propagation of Rights:
In some systems, users who are granted access (with a "GRANT OPTION") can further delegate those rights to others, unless restricted [29] [27] .

How DAC Works in DBMS


1. Authentication:
The system verifies the identity of the user requesting access.
2. Authorization by Owner:
The owner of the object decides which users or groups have what level of access.
3. Access Enforcement:
The DBMS checks the ACL or privilege table to determine if the requested operation is
allowed.
4. Privilege Propagation:
If permitted, users with granted rights (and the GRANT OPTION) can further delegate
access to others, unless the system restricts this [29] [27] .
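
Privilege propagation and its revocation can be modeled compactly. The following Python sketch is illustrative only (real DBMSs differ in their revoke semantics): each grant records who made it and whether it carries the GRANT OPTION, so revoking a user also strips everyone whose access traces back to that user.

```python
# grants: (grantee, object) -> (grantor, with_grant_option); names illustrative.
grants = {("owner", "T"): (None, True)}   # the owner implicitly holds full rights

def grant(grantor, grantee, obj, with_option=False):
    entry = grants.get((grantor, obj))
    if entry is None or not entry[1]:
        raise PermissionError(f"{grantor} cannot delegate access to {obj}")
    grants[(grantee, obj)] = (grantor, with_option)

def revoke(grantor, grantee, obj):
    """Cascading revoke: also strip anyone whose access came via grantee."""
    if grants.get((grantee, obj), (None,))[0] != grantor:
        return
    del grants[(grantee, obj)]
    for (user, o), (giver, _) in list(grants.items()):
        if o == obj and giver == grantee:
            revoke(grantee, user, obj)

grant("owner", "alice", "T", with_option=True)
grant("alice", "bob", "T")            # alice delegates her access
revoke("owner", "alice", "T")         # bob loses access too (granted via alice)
print(("bob", "T") in grants)         # False
```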

Examples of DAC
File Systems:
In Unix and Windows, file owners can set read, write, and execute permissions for
themselves, groups, and others [22] [26] .
Database Tables:
In DBMSs, table owners can grant SELECT, INSERT, UPDATE, or DELETE privileges to other
users or roles [29] [26] .
Cloud Storage:
Users share files or folders with specific people and assign view or edit rights (e.g., Google
Drive, OneDrive) [26] .

Advantages of DAC
Flexibility:
Owners have fine-grained control over their resources and can easily share or restrict
access as needed [23] [25] [26] .
Ease of Implementation:
DAC is straightforward to implement and widely supported in commercial DBMSs [25] .
Supports Collaboration:
Facilitates sharing of information in business and collaborative environments [23] [28] .

Disadvantages of DAC
Security Risks:
Since users can delegate access, there is a risk of excessive privilege propagation, potentially leading to unauthorized access if not properly managed [22] [27] .
Less Centralized Control:
Security policies are less consistent because owners make individual decisions [23] [25] .
Vulnerability to Insider Threats:
Users may unintentionally or maliciously grant access to unauthorized individuals [26] .

Comparison: DAC vs. MAC

| Feature | DAC | MAC |
|---|---|---|
| Who controls access? | Resource owner (user) | Central authority (admin/policy) |
| Flexibility | High | Low |
| Security | Moderate (less secure than MAC) | High (used for sensitive data) |
| Implementation | Easy | Complex |
| Usage | Commercial/business DBMSs, collaboration | Military, government, high-security |

Best Practices and Limitations


Limit Privilege Propagation:
Systems should support mechanisms to restrict how far privileges can be delegated (e.g.,
not granting the GRANT OPTION by default) [29] [27] .
Regular Review of Permissions:
Owners and administrators should periodically audit access rights to prevent privilege
creep [26] .
Use DAC with Caution:
In environments requiring strict security, DAC should be combined with other models (e.g.,
MAC or RBAC) for additional control [28] .

Summary Table: DAC Model

| Aspect | Description |
|---|---|
| Control | Decentralized; owner manages access |
| Mechanism | ACLs, privilege tables, GRANT/REVOKE commands |
| Flexibility | High |
| Security | Moderate; risk of privilege propagation |
| Common Use Cases | File systems, databases, cloud storage, collaborative apps |
| Limitation | Potential for unauthorized access if privileges are not carefully managed |

Exam Tip:
Be able to define DAC, explain how it works in a DBMS, discuss its advantages and disadvantages, and compare it with other models like MAC. Use real-world examples (e.g., file permissions, database privileges) to illustrate your answer.

Mandatory Access Control (MAC) in Database Management Systems



Definition
Mandatory Access Control (MAC) is a highly secure access control model in which a central
authority (usually system administrators or the operating system) strictly regulates access to
database resources based on predefined security policies. In MAC, users and data objects are
assigned security labels (such as clearance levels and categories), and access decisions are
made according to these labels-not at the discretion of individual users or data owners [30] [31]
[32] .

Key Features of MAC


Centralized Control: Only administrators can define, assign, or modify access policies.
Users cannot change permissions, even for data they create [30] [31] [33] .
Security Labels: Both users (subjects) and data objects (tables, views, etc.) are assigned
security labels, which include a clearance level (e.g., Confidential, Secret) and sometimes
categories (e.g., Department, Project) [30] [34] [32] .
Strict Enforcement: The system enforces access rules automatically and consistently,
ensuring that users can only access data for which they have the appropriate clearance [30] [31] [32] .

Non-Discretionary: Users have no discretion to share or delegate access rights. All access
is determined by central policy [31] [32] [35] .
High Security: MAC is considered one of the most secure access control models, suitable
for environments where confidentiality and regulatory compliance are critical, such as
military, government, finance, and healthcare [30] [31] [36] .

How MAC Works in DBMS


1. Security Label Assignment:
Administrators assign clearance levels to users and sensitivity labels to database
objects [30] [31] .
2. Access Request:
When a user attempts to access a database object, the system checks the user’s
clearance against the object’s label.
3. Policy Evaluation:
Access is granted only if the user’s clearance meets or exceeds the object’s sensitivity
level and matches any required categories [30] [34] [32] .
4. No User Delegation:
Users cannot grant or modify access rights, even for objects they create [31] [32] .

Example Scenario
A database contains tables labeled as “Confidential,” “Secret,” and “Top Secret.”
Users are assigned clearance levels by the administrator.
A user with “Secret” clearance can access “Secret” and “Confidential” tables, but not “Top Secret” tables.
Only administrators can change these assignments or permissions.
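
The clearance check in this scenario is a simple dominance comparison. A minimal Python sketch (levels, users, and tables are assumed example values) of the "read only at or below your clearance" rule:

```python
# Ordered clearance levels; both users and tables carry a label.
LEVELS = {"Confidential": 1, "Secret": 2, "Top Secret": 3}

user_clearance = {"carol": "Secret"}
table_label = {"invoices": "Confidential", "ops_plan": "Top Secret"}

def can_read(user, table):
    """Central, non-discretionary rule: read allowed only if the user's
    clearance dominates (>=) the object's sensitivity label."""
    return LEVELS[user_clearance[user]] >= LEVELS[table_label[table]]

print(can_read("carol", "invoices"))  # True: Secret >= Confidential
print(can_read("carol", "ops_plan"))  # False: Secret < Top Secret
```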

Advantages of MAC
High Security: Reduces risk of unauthorized access and enforces strict confidentiality [30] [31] [33] .

Centralized Policy Management: Ensures consistent application of security policies across the system [30] [31] [36] .
Regulatory Compliance: Helps meet strict regulatory requirements in sensitive
industries [30] [36] .

Limitations of MAC
Low Flexibility: Users cannot share data or adjust permissions, making MAC unsuitable for
collaborative or dynamic environments [30] [33] .
Administrative Overhead: Requires significant effort from administrators to manage and
update policies, especially in large organizations [33] .
Complexity: Implementing and maintaining MAC can be complex and time-consuming [33]
[35] .

Comparison: MAC vs. DAC

| Aspect | MAC (Mandatory Access Control) | DAC (Discretionary Access Control) |
|---|---|---|
| Control | Centralized (admin/system) | Decentralized (resource owner/user) |
| Flexibility | Low | High |
| Security | Very High | Moderate/Low |
| Use Cases | Military, government, high-security systems | Business, collaborative environments |
| User Delegation | Not allowed | Allowed |
| Management | Admins set and enforce policies | Users manage their own permissions |

Use Cases for MAC


Military and government databases
Financial institutions with strict data compartmentalization
Healthcare systems with sensitive patient data
Any environment requiring “need-to-know” access and zero-trust principles [30] [31] [36]
bh

Summary Table: MAC Model


| Feature | Description |
|---|---|
| Control | Centralized, admin-managed |
| User Role | Cannot alter or delegate permissions |
| Security Mechanism | Security labels (clearance, category) |
| Enforcement | System-enforced, rigid |
| Best For | High-security, regulated, or sensitive data |
| Drawback | Inflexible, high administrative overhead |

Exam Tip:
Be able to define MAC, explain its core principles (centralized control, security labels, strict enforcement), describe its advantages and disadvantages, and contrast it with DAC using real-world examples.

Role-Based Access Control (RBAC) Model in Database Management Systems
Definition
Role-Based Access Control (RBAC) is a security model for managing user access in a database
system by assigning permissions and privileges to roles rather than to individual users. Users are
then assigned to these roles based on their job responsibilities, ensuring they only have access
to the resources necessary for their role [37] [38] [39] .

Core Principles of RBAC


Role Assignment: A user can only access resources if they are assigned a role.
Role Authorization: The user's active role must be authorized for them.
Permission Authorization: Users can only use permissions that are assigned to their active
role [40] [41] .

How RBAC Works


1. Define Roles: Identify roles based on organizational structure and job functions (e.g.,
admin, manager, analyst).
2. Assign Permissions to Roles: Specify what actions each role can perform (e.g., read, write,
delete on certain tables).
3. Assign Users to Roles: Users are added to roles according to their responsibilities.
4. Access Enforcement: When a user attempts to access a resource, the system checks the permissions of their assigned roles and grants or denies access accordingly [37] [38] [39] [41] .
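
The users → roles → permissions indirection is straightforward to sketch. In this illustrative Python fragment (role and permission names are hypothetical), an access check unions the permissions of the user's roles, and changing a user's access is a single role reassignment:

```python
# Users -> roles -> permissions; users never hold permissions directly.
role_permissions = {
    "admin":   {("*", "ALL")},
    "manager": {("records", "SELECT"), ("records", "UPDATE")},
    "analyst": {("records", "SELECT")},
}
user_roles = {"dave": {"analyst"}}

def allowed(user, obj, action):
    """Access check: union the permissions of every role the user holds."""
    perms = set().union(*(role_permissions[r] for r in user_roles.get(user, set())))
    return (obj, action) in perms or ("*", "ALL") in perms

print(allowed("dave", "records", "SELECT"))  # True
print(allowed("dave", "records", "DELETE"))  # False
user_roles["dave"] = {"manager"}             # promotion = one role change
print(allowed("dave", "records", "UPDATE"))  # True now
```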
dd

Types of RBAC Models



| Model | Description |
|---|---|
| Core RBAC | Basic model with users, roles, and permissions. |
| Hierarchical RBAC | Supports role hierarchies, allowing roles to inherit permissions from other roles. |
| Constrained RBAC | Adds constraints such as separation of duties (e.g., a user cannot both approve and request funds) [40] . |

Benefits of RBAC
Improved Security: Limits access to sensitive data by ensuring users only have the
permissions needed for their role, supporting the principle of least privilege [42] [43] .
Operational Efficiency: Streamlines user management, especially in large organizations, by
allowing administrators to manage permissions at the role level rather than individually [44] [41] .

Reduced Errors: Minimizes mistakes compared to manual, user-by-user permission assignments [44] .
Easier Compliance: Simplifies audits and helps organizations meet regulatory requirements
(e.g., HIPAA, GDPR) [42] .
Flexibility: Changing a user's access is as simple as changing their role assignment [44] .

Drawbacks of RBAC
Initial Setup Complexity: Requires careful planning and analysis to define appropriate roles
and permissions.
Role Explosion: Too many roles can make the system complex to manage if not designed
carefully.
Maintenance: Keeping roles and permissions up to date as organizational needs change
requires ongoing attention [44] .

Best Practices for Implementing RBAC


Principle of Least Privilege: Assign only the minimum permissions needed for each role [45] .
Regular Audits: Review roles and permissions periodically to ensure they are still
appropriate [42] .
Role Templates: Use standardized templates for common roles to simplify assignment and reduce errors [42] [45] .
Track Changes: Monitor modifications to roles and permissions to detect and investigate suspicious activity [42] [45] .

Example Scenario
Database Roles:
Admin: Full access to all tables and operations.
Manager: Can view and update records but not delete them.
Analyst: Can only read data.
When an employee is promoted from Analyst to Manager, their user account is simply
moved to the Manager role, and their permissions are updated automatically.

Summary Table: RBAC Model

| Feature | Description |
|---|---|
| Access Control Basis | Roles (not individual users) |
| Assignment | Users → Roles → Permissions |
| Security | High (supports least privilege) |
| Administration | Centralized, efficient, and scalable |
| Suitable For | Large organizations, compliance-focused environments |
| Drawbacks | Potential complexity, requires regular maintenance |


Exam Tip:
Be able to define RBAC, explain its principles, describe its benefits and challenges, and give
real-world examples of how roles and permissions are structured in a database system.

Intrusion Detection in Database Management Systems (DBMS)


Definition
Intrusion detection in a DBMS refers to the processes and tools used to monitor database
activity, identify unauthorized or suspicious access, and alert administrators to potential security
breaches. It helps safeguard sensitive data from both internal and external threats by providing
real-time monitoring and response capabilities [46] [47] .

Objectives of Intrusion Detection in DBMS


Detect unauthorized or abnormal database access and usage.
Identify insider threats, compromised accounts, or malicious queries.
Support compliance with regulations (e.g., PCI DSS, SOX, GDPR).

Provide forensic evidence and auditing for security incidents.
at
Key Components of Intrusion Detection Systems (IDS) for Databases

Sensors: Collect data on database activity, such as queries, logins, and configuration changes. Sensors can be placed at the network, host, or database level [48] [49] [50] .
Analysis Engine: Processes collected data to identify suspicious activity using techniques like signature-based detection (matching known attack patterns) and anomaly detection (identifying deviations from normal behavior) [48] [50] .


Alert Generation: When a potential intrusion is detected, the system generates alerts with
details such as the type of activity, timestamp, and user involved. Alerts are sent to security
personnel or integrated with Security Information and Event Management (SIEM) systems
for further action [50] [46] .
Management Interface: Provides a user interface for administrators to review alerts,
configure detection rules, and manage responses [50] [51] .

Types of Intrusion Detection Techniques


Signature-Based Detection: Compares database activity against a database of known
attack signatures or malicious patterns. Effective for detecting known threats but cannot
identify new or unknown attacks [50] .
Anomaly-Based Detection: Establishes a baseline of normal database behavior and flags significant deviations as potential intrusions. Useful for detecting novel or insider threats (see the sketch after this list) [50] .
Hybrid Detection: Combines both signature-based and anomaly-based approaches for
more comprehensive coverage [50] .
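
As a minimal sketch of the anomaly-based idea, the query below applies a simple baseline rule (privileged logins are expected during business hours); the login_audit and privileged_users tables are hypothetical, and real analysis engines build far richer statistical baselines:

-- Flag privileged-account logins outside the 08:00-18:00 baseline
SELECT user_name, login_time, client_ip
FROM login_audit
WHERE user_name IN (SELECT user_name FROM privileged_users)
  AND (EXTRACT(HOUR FROM login_time) < 8
       OR EXTRACT(HOUR FROM login_time) >= 18);
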
Database Activity Monitoring (DAM) and Intrusion Detection
DAM tools are specialized for monitoring and analyzing all database activities, including
privileged user actions and SQL queries, in real time [46] [47] .
They can:
Audit and monitor all database activity without impacting performance.
Store activity logs securely outside the database.
Generate alerts for policy violations or abnormal activities.
Correlate activities across multiple database platforms.
Support session blocking and integration with SIEM tools for incident response [46] [47] .

Common Architectures for Database Intrusion Detection


Interception-Based: Monitors communications between database clients and servers, often
using network proxies or packet sniffers [46] .
Memory-Based: Monitors database activity directly in memory, such as the System Global
Area (SGA) in Oracle databases [46] .
Log-Based: Analyzes database logs for suspicious activities [46] .

Features to Look for in Database Intrusion Detection Tools
Real-time monitoring and alerting.
Minimal performance impact on the database.
Ability to monitor privileged and regular user activities.
Secure storage of audit logs.
Support for compliance auditing and reporting.
Integration with broader security infrastructure (e.g., SIEM) [46] [47] .

Use Cases and Importance


Detecting and responding to unauthorized access or data exfiltration attempts.
Monitoring privileged user activity to prevent abuse.
Ensuring compliance with legal and industry regulations.
Providing forensic evidence for investigations.

Summary Table: Intrusion Detection in DBMS

Component Function

Sensors Collect database activity data

Analysis Engine Detects suspicious activity using signatures/anomalies

Alert Generation Notifies security teams of potential intrusions



Management UI Allows configuration, review, and response

Exam Tip:
Be able to define intrusion detection in the context of DBMS, describe the main components and
techniques (signature-based, anomaly-based, hybrid), explain the role of DAM, and discuss its
importance for security and compliance. Use real-world examples and mention integration with
SIEM systems for a comprehensive answer.

SQL Injection in Database Management Systems (DBMS)


Definition
SQL Injection (SQLi) is a critical security vulnerability where an attacker inserts or "injects"
malicious SQL code into input fields or parameters of an application, causing the database to
execute unintended commands. This can lead to unauthorized access, data leakage, data
manipulation, or even full control over the database server [52] [53] [54] .

How SQL Injection Works


Applications often construct SQL queries using user-supplied input.
If this input is not properly validated or sanitized, attackers can craft inputs that modify the intended SQL query.
For example, an attacker might enter 1 OR 1=1 in a login form, altering the query logic to always return true, thus bypassing authentication [53] [54] .

SQL Injection can be used to:


Read sensitive data (e.g., user credentials, personal data)
Modify or delete database records
Execute administrative operations (e.g., shutting down the DBMS)
In some cases, execute commands on the underlying operating system [52] [54] .
Example

-- Intended query:
SELECT id FROM users WHERE username='user_input' AND password='user_input';

-- Malicious input:
username: ' OR 1=1 --
password: (left blank)

-- Resulting query:
SELECT id FROM users WHERE username='' OR 1=1 -- ' AND password='';
-- The condition '1=1' always evaluates to true, so the attacker gains access.
Consequences of SQL Injection
Data theft (customer information, intellectual property, etc.)
Data loss or corruption (deletion or modification of records)
Bypassing authentication and authorization controls
Gaining administrative privileges
Potential full system compromise [52] [53] [54]

Prevention Techniques
1. Input Validation and Sanitization
Validate all user inputs against expected formats, lengths, and types.
Sanitize inputs by removing or encoding potentially harmful characters [55] [56] [57] .
2. Parameterized Queries (Prepared Statements)
Use parameterized queries to separate SQL logic from user input, ensuring user data is treated as data, not executable code (see the sketch after this list) [58] [59] .
Supported in all major programming languages and database drivers.

3. Stored Procedures
Use properly constructed stored procedures that utilize parameters, not dynamic SQL, to handle user inputs [55] [56] [59] .
4. Allow-List (Whitelist) Input Validation
Accept only known, safe values for inputs such as table or column names [56] [59] .
5. Least Privilege Principle
Restrict database user privileges to only those necessary for their function, minimizing the impact of a successful attack [55] [56] .


6. Error Handling
Avoid displaying detailed database error messages to users, as these can reveal
database structure or vulnerabilities [58] .
7. Web Application Firewalls (WAF)
Deploy WAFs to filter and block malicious SQL injection attempts before they reach the
application [55] [58] .
8. Regular Security Audits and Patching
Conduct regular code reviews, vulnerability scans, and keep all database and
application software updated [58] [57] .
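
A minimal sketch of technique 2 at the SQL level, using PostgreSQL-style PREPARE/EXECUTE (application code would use the equivalent placeholder API of its database driver):

-- The statement is parsed once; $1 and $2 are pure data placeholders
PREPARE login_check (text, text) AS
    SELECT id FROM users WHERE username = $1 AND password = $2;

-- Malicious input is bound as a literal string, never parsed as SQL
EXECUTE login_check(''' OR 1=1 --', '');

Because the query structure is fixed before any input arrives, the ' OR 1=1 -- payload from the earlier example is compared literally against the username column and simply matches no rows.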

Summary Table: Prevention Methods

Prevention Technique Description

Input validation & sanitization Ensure only expected data is accepted



Parameterized queries Use placeholders to separate code from data

Stored procedures Use parameterized stored procedures

Allow-list input validation Accept only predefined, safe values

Least privilege Limit database user permissions

Error handling Hide detailed error messages from end users

Web Application Firewall (WAF) Block malicious traffic at the network/application layer

Regular audits & patching Identify and fix vulnerabilities proactively

Key Points for Exams


SQL Injection is one of the most dangerous and common web application vulnerabilities.
It exploits improper handling of user inputs in SQL queries.
Prevention relies on secure coding practices, input validation, use of parameterized queries,
least privilege, and regular security maintenance [55] [56] [58] [59] [57] .
Real-world attacks can lead to data breaches, loss of integrity, and severe business impact.

Exam Tip:
Define SQL Injection, explain how it works with examples, discuss its consequences, and list multiple prevention techniques with brief explanations for each.



1. https://www.strongdm.com/authentication

2. https://www.devx.com/terms/database-authentication/
3. https://www.techtarget.com/searchsecurity/definition/authentication
4. https://compositecode.blog/2023/12/07/security-and-authentication-in-relational-databases/
5. https://www.linkedin.com/pulse/discuss-authentication-authorization-encryption-udjgc
6. https://docs.actian.com/vector/5.1/Security/DBMS_Authentication.htm
7. https://docs.actian.com/ingres/10.2/Security/DBMS_Authentication.htm
8. https://www.ibm.com/docs/en/db2-big-sql/7.1?topic=sql-database-authorization
9. https://topperworld.in/database-security-and-authorization/
10. https://www.ibm.com/think/topics/authentication-vs-authorization
11. https://www.sailpoint.com/identity-library/difference-between-authentication-and-authorization
12. https://oercommons.org/authoring/21950-database-security/4/view
13. https://www.fortinet.com/resources/cyberglossary/authentication-vs-authorization
14. https://satoricyber.com/access-control/access-control-101-a-comprehensive-guide-to-database-access-control/
15. https://www.optiq.ai/blog-post/what-is-access-control-4-types-of-access-control-models
16. https://portswigger.net/web-security/access-control/security-models
17. https://www.ijert.org/research/database-security-access-control-models-a-brief-overview-IJERTV2IS50406.pdf
18. https://butterflymx.com/blog/access-control-models/
19. https://delinea.com/blog/access-control-models-methods
20. https://www.goodaccess.com/blog/access-control-models-explained
21. https://www.twingate.com/blog/other/access-control-models
22. https://en.wikipedia.org/wiki/Discretionary_access_control
23. https://nordlayer.com/learn/access-control/discretionary-access-control/
24. https://www.sciencedirect.com/topics/computer-science/discretionary-access-control
25. https://www.tutorialspoint.com/difference-between-mac-and-dac
26. https://builtin.com/articles/discretionary-access-control
27. https://www.stigviewer.com/stig/oracle_database_12c/2021-04-06/finding/V-237715
28. https://www.syteca.com/en/blog/mac-vs-dac
29. https://www.slideshare.net/slideshow/discretionary-access-controldatabasepptx/262768086
30. https://nordlayer.com/learn/access-control/mandatory-access-control/
31. https://www.syteca.com/en/blog/mac-vs-dac
32. https://en.wikipedia.org/wiki/Mandatory_access_control

33. https://www.permit.io/blog/mac-vs-dac-comparing-access-control-fundamentals
34. https://www.tutorialspoint.com/what-is-mandatory-access-control-in-information-security
35. https://www.tutorialspoint.com/difference-between-mac-and-dac

36. https://www.pingidentity.com/en/resources/blog/post/access-control.html
37. https://www.techtarget.com/searchsecurity/definition/role-based-access-control-RBAC

38. https://frontegg.com/guides/rbac

39. https://www.imperva.com/learn/data-security/role-based-access-control-rbac/
40. https://www.strongdm.com/rbac
41. https://auth0.com/docs/manage-users/access-control/rbac
42. https://www.solarwinds.com/resources/it-glossary/role-based-access-control
43. https://www.fortinet.com/resources/cyberglossary/role-based-access-control
44. https://www.caldersecurity.co.uk/role-based-access-control-rbac/
45. https://nordlayer.com/learn/access-control/role-based-access-control-implementation/
46. https://satoricyber.com/database-security/database-activity-monitoring-uses-features-and-how-to-choose/
47. https://datacipher.com/top-database-activity-monitoring-solutions/
48. https://intellipaat.com/blog/intrusion-detection-system/
49. https://www.tookitaki.com/glossary/intrusion-detection-system-ids
50. https://www.stamus-networks.com/intrusion-detection-system-in-cyber-security
51. https://friendlycaptcha.com/wiki/what-is-network-intrusion-detection-system-nids/
52. https://owasp.org/www-community/attacks/SQL_Injection
53. https://www.techtarget.com/searchsoftwarequality/definition/SQL-injection
54. https://www.acunetix.com/websitesecurity/sql-injection/
55. https://www.indusface.com/blog/how-to-stop-sql-injection/
56. https://www.cloudflare.com/learning/security/threats/how-to-prevent-sql-injection/
57. https://www.esecurityplanet.com/threats/how-to-prevent-sql-injection-attacks/
58. https://www.strongdm.com/blog/how-to-prevent-sql-injection-attacks
59. https://www.legitsecurity.com/aspm-knowledge-base/how-to-prevent-sql-injection

Object Oriented Databases (OODB) in Database Management Systems
Definition
An object-oriented database (OODB) is a database that stores data in the form of objects,
similar to how data is represented in object-oriented programming (OOP) languages like
Java, C++, or Python [1] [2] [3] .
OODBs combine object-oriented programming concepts (objects, classes, inheritance,
encapsulation, polymorphism) with database management features such as persistence,
concurrency, and transactions [1] [2] [4] .

Key Concepts
Objects

The fundamental unit in OODBs, representing real-world entities.
Each object contains both data (attributes/properties) and behavior (methods/functions) [1] [2] [3] .
Classes
Blueprints or templates for creating objects.
Define the structure (attributes) and behavior (methods) shared by all instances (objects) of the class [1] [2] .

Object Identity
Each object has a unique identifier (OID) that distinguishes it from other objects, regardless
of its attribute values [5] [6] .
Encapsulation
Data and methods are bundled together, and internal details are hidden from the outside
world [1] [6] .
Inheritance
Objects can inherit properties and methods from other objects (parent classes), promoting
code reuse and hierarchy [1] [6] .
Polymorphism
The ability to use a unified interface for different underlying data types [1] .
Persistence
Objects can outlive the application process, being stored and retrieved from the database
as needed [2] [6] .

Features of Object Oriented Databases


Complex Data Modeling: Can handle complex data types and relationships directly as
objects [2] [6] [7] .
Query Language: Specialized query languages to retrieve and manipulate objects [2] [6] .
Transparent Persistence: Objects can be stored/retrieved without special conversion,
maintaining their structure and behavior [2] [6] .
ACID Transactions: Support atomicity, consistency, isolation, and durability for reliable
transactions [2] [6] .
Database Caching: Frequently accessed objects can be cached in memory for faster
access [2] [6] .
Recovery: Mechanisms for data recovery after failures [2] [6] .
Extensibility: Easily supports new data types and operations [2] [6] .
Schema Evolution: Supports changes to the schema (structure of objects/classes) without major disruptions [6] .
Advantages

Natural Mapping: Closer alignment between database objects and real-world entities or application objects, reducing the "impedance mismatch" seen in relational databases [1] [2] [6] [7] .
Efficient Handling of Complex Data: Well-suited for multimedia, CAD/CAM, GIS, and other applications with complex, interconnected data [2] [6] [3] .


Reusability and Maintenance: Inheritance and encapsulation promote code and data reuse,
making maintenance easier [6] .
Performance: Can offer performance improvements for certain complex data operations
compared to relational databases [6] .
Integration with OOP Languages: Seamless integration with object-oriented programming
languages, reducing the need for conversion between application objects and database
records [2] [6] [8] .
Expressive Modeling: Supports rich data models and relationships [6] .

Disadvantages
Complexity: More complex to design, implement, and manage, especially for simple data
needs [2] [6] [7] [8] .
Lack of Standards: No universal data model or standard query language, leading to
compatibility and portability issues [2] [6] [9] .
Limited Adoption: Not as widely used as relational databases; less community and
commercial support [2] [6] .
Security: Often lacks robust, standardized security mechanisms and fine-grained access
controls [2] [6] .
No Support for Views: Typically does not support database views like relational systems [2]
[6] .

Learning Curve: Steeper learning curve for users familiar with traditional relational
databases [7] .

Use Cases
Multimedia Applications: Efficiently stores and retrieves images, audio, and video as
objects [2] .
CAD/CAM Systems: Manages complex engineering data and relationships [2] .
Geographic Information Systems (GIS): Handles spatial and topographical data [2] .
Telecommunications: Manages hierarchical and interconnected network data [2] .
Real-Time Systems: Used in robotics, automation, and embedded systems for fast, complex data access [2] .
Examples of Object Oriented Databases

ObjectDB
Db4o
ObjectStore
Versant
GemStone/S
WakandaDB
MongoDB (offers some object-oriented features) [6]

Comparison: Object Oriented vs. Relational Databases


Feature Object Oriented Database Relational Database

Data Model Objects, classes, inheritance Tables, rows, columns

Schema Flexibility High (supports schema evolution) Rigid (altering schema is complex)

Complex Data Handling Excellent Limited

Integration with OOP Seamless Requires mapping/conversion

Query Language No universal standard SQL (standardized)

Performance (complex ops) Often better May be slower



Adoption Limited Widespread

Summary
Object-oriented databases provide a powerful way to model and manage complex data,
especially when working with object-oriented programming languages. They offer advantages in
terms of expressiveness, integration, and handling of complex relationships, but come with
increased complexity, lack of standards, and limited adoption compared to relational
databases [1] [2] [6] .

Object-Relational Databases (ORD) in Database Management Systems


Definition
An object-relational database (ORD), or object-relational database management system (ORDBMS), is a DBMS that integrates features from both relational databases and object-oriented databases. It supports traditional relational features (tables, rows, columns) while also allowing objects, classes, inheritance, and complex data types to be directly represented in the schema and query language [10] [11] [12] .

Key Features

1. User-Defined Types (UDTs)



Allows creation of custom data types that can encapsulate both data and associated behaviors (methods), similar to classes in object-oriented programming (see the sketch after this list) [13] [14] .
2. Type System and Table Inheritance
Supports inheritance in database schemas, allowing tables or types to inherit properties and
methods from parent tables or types [11] [14] .
3. Complex Data Types
Facilitates storage of arrays, structs, and other complex or nested data types directly in
tables [11] [13] .
4. Object Identity
Uses unique object identifiers (OIDs) to distinguish and reference objects, supporting object
identity beyond simple primary keys [13] .
5. Encapsulation
Operations (methods) can be encapsulated within UDTs, allowing data and behavior to be
bundled together [13] .
6. Enhanced SQL
SQL is extended to support object-oriented features, including querying and manipulating
complex objects and types [13] [14] .
7. ACID Transactions
Maintains full support for atomicity, consistency, isolation, and durability, ensuring data
integrity and reliability [11] .
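
A minimal sketch of features 1-3 in PostgreSQL, one of the ORDBMSs listed later in this section; the type and table names are illustrative:

-- Feature 1: a user-defined composite type
CREATE TYPE address AS (
    street  TEXT,
    city    TEXT,
    pincode TEXT
);

-- Feature 3: columns holding complex data (a UDT and an array)
CREATE TABLE customers (
    id        SERIAL PRIMARY KEY,
    name      TEXT NOT NULL,
    home_addr address,
    phones    TEXT[]
);

-- Feature 2: premium_customers inherits every column of customers
CREATE TABLE premium_customers (
    discount_rate NUMERIC
) INHERITS (customers);

-- Extended SQL can reach inside the composite type
SELECT name, (home_addr).city FROM customers;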

Architecture
ORD architecture builds upon traditional relational database architecture with the following
enhancements [11] [14] :
Type System: Enables user-defined types and inheritance.
Table Inheritance: Tables can inherit structure and behavior from other tables.
Methods: Functions or procedures can be defined on data types and stored in the
database.
Complex Data Handling: Supports direct storage and querying of complex objects.

Advantages
Enhanced Modeling Capabilities: Closer alignment with application object models, allowing more natural representation of real-world entities [11] [12] .
Improved Data Integrity: Encapsulation of data and behavior ensures more robust and consistent data management [11] .
Flexibility and Scalability: Supports complex applications and data types without sacrificing the performance and robustness of relational databases [11] [12] .
Backward Compatibility: Maintains compatibility with existing relational database features
and SQL [14] .
Extensibility: Easily accommodates new data types and operations as application
requirements evolve [11] [12] .

Disadvantages
Complexity: Schema design and management can become complicated, especially for
simple applications [12] .
Learning Curve: Requires understanding of both relational and object-oriented
concepts [12] .
Lack of Universal Standards: Not all commercial DBMSs implement object-relational
features consistently, leading to portability issues [13] [12] .
Performance Considerations: Some object-oriented features may introduce overhead,
potentially affecting performance in certain scenarios [15] .

Use Cases
Applications with Complex Data: Multimedia, scientific, engineering, and GIS applications
benefit from ORD’s ability to handle complex and hierarchical data structures [11] [16] .
Enterprise Systems: Where integration between object-oriented application code and
relational data storage is needed.
Legacy System Modernization: When transitioning from pure relational to more object-
oriented paradigms without sacrificing existing investments.

Examples of Object-Relational Databases


Oracle Database (with object-relational extensions)
PostgreSQL (supports advanced object-relational features)
IBM Db2

Comparison: Object-Relational vs. Relational vs. Object-Oriented Databases

Feature Relational DBMS Object-Relational DBMS Object-Oriented DBMS

Data Model Tables Tables + Objects/UDTs Objects/Classes

Inheritance No Yes Yes

Complex Data Types Limited Supported Fully Supported

Query Language Standard SQL Extended SQL Proprietary/Object Query

Integration with OOP Requires Mapping Easier Seamless

Adoption Widespread Growing Limited

Summary
Object-relational databases bridge the gap between the relational and object-oriented models,
offering the robustness and familiarity of relational databases along with the modeling power and
flexibility of object-oriented systems. They are particularly useful for applications needing
complex data representation and close integration with object-oriented programming languages,
but can introduce additional complexity and require careful schema design [11] [12] .

Logical Databases in Database Management Systems
Definition
A logical database is an abstract representation of how data is organized and related within a
database system, focusing on the structure, relationships, and business rules rather than the
physical storage details [17] [18] [19] . Logical databases are especially prominent in data modeling
and in specific platforms like SAP ABAP, where they provide a read-only, hierarchical view of
data for application programs [20] [21] .

Purpose and Importance


Blueprint for Data Usage: Logical databases serve as a blueprint for how data entities
interact, guiding the design of databases to align with business requirements [17] [18] .
Abstraction: They provide independence from physical implementation, allowing
organizations to adapt databases to changing needs without disrupting existing systems [17]
[19] .

Clarity and Consistency: By defining entities, attributes, and relationships, logical databases ensure consistency and coherence in data storage and retrieval [17] [19] .
Business Alignment: They help translate business processes and requirements into implementable database designs [17] [18] .

Key Characteristics

Entities: Represent real-world objects or concepts (e.g., Customer, Order, Product) [17] [22] .

Attributes: Define properties or characteristics of entities (e.g., Customer Name, Order Date) [17] [22] .

Relationships: Establish connections between entities (e.g., Customer places Order) [17] [19] .
Business Rules: Govern how data is managed and ensure data integrity (e.g., constraints,
validation rules) [17] [19] .
Normalization: Logical models focus on data normalization to reduce redundancy and
improve integrity [19] .
Technology-Agnostic: Logical databases are independent of specific DBMS technologies,
making them transferable across platforms [19] [23] .

Components of Logical Database Models


Component Description Example

Entity A distinct object or concept in the business domain Customer, Product, Order

Attribute A property or detail about an entity Customer Name, Product Price

Relationship Association between entities Customer places Order

Primary Key Unique identifier for an entity Customer ID, Order ID

Foreign Key Attribute that links entities and enforces referential integrity Customer ID in Order entity

Business Rule Constraint or logic that governs data integrity and relationships "Order Date cannot be in future"

Logical Database Structure (SAP ABAP Example)


Hierarchical Organization: Data is structured in a tree-like hierarchy, where records are
connected through links [20] [21] .
Read-Only View: Logical databases provide a read-only view of data, often used to retrieve
and pass data to application programs [20] [21] .
Centralized Authorization: Authorization checks and data access logic are centralized,
enhancing security and consistency [20] [21] .
Reuse and Performance: Logical databases allow multiple programs to reuse the same
data retrieval logic, improving performance and maintainability [20] [21] .

Logical vs. Physical Data Models

Aspect Logical Data Model Physical Data Model

Focus Entities, attributes, relationships, business rules Tables, columns, data types, indexes

Audience Data architects, analysts Database developers, administrators

Technology Technology-agnostic Technology-specific

Normalization Emphasized May be denormalized for performance

Implementation Not implemented directly Directly implemented in DBMS

Example "Customer has Orders" [23] Table: CUSTOMER, Table: ORDER, Foreign Key [22] [19]
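
As a minimal sketch of the Example row above, the logical statement "Customer has Orders" might translate into the following physical DDL (standard SQL; the names, types, and business-rule check are illustrative):

CREATE TABLE customer (
    customer_id   INT PRIMARY KEY,        -- primary key from the logical model
    customer_name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date  DATE NOT NULL,
    -- foreign key realizes the "Customer places Order" relationship
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id),
    -- business rule: "Order Date cannot be in future"
    CHECK (order_date <= CURRENT_DATE)
);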

Advantages
Improved Data Understanding: Helps stakeholders and developers understand data
requirements and business processes [17] [18] [19] .
Flexibility: Supports changes and evolution in business requirements without major
redesign [17] [19] .
Data Integrity: Promotes normalization and clear business rules, reducing redundancy and
errors [19] .
Platform Independence: Can be adapted to different DBMS technologies [19] [23] .
Disadvantages
No Direct Implementation: Logical databases are not directly implemented; they require
translation into physical models for deployment [19] [23] .
Complexity for Simple Applications: May be excessive for small or simple databases [19] .

Summary
Logical databases provide a structured, abstract view of data, focusing on entities, attributes,
relationships, and business rules. They are crucial in the early stages of database design,
ensuring that the database aligns with business needs and remains adaptable, consistent, and
technology-agnostic. In platforms like SAP, logical databases also offer reusable, hierarchical
data access for application programs [17] [20] [21] [19] .

Web Databases in Database Management Systems


Definition

A web database is a system for storing, managing, and displaying information that is accessible via the Internet or web. It serves as the backbone for many web applications, enabling users to interact with dynamic data from anywhere using a web browser [24] [25] [26] .

Key Features

Accessibility: Data can be accessed and managed remotely through web browsers,
supporting multi-location and multi-device use [24] [26] .

Dynamic Content: Enables websites to display up-to-date, interactive content by retrieving


and updating data in real time [24] [26] .
Multi-user Support: Allows simultaneous access and manipulation of data by multiple
users.
Integration: Web databases are often integrated with other web services and applications,
supporting e-commerce, blogs, email, and more [27] .
Scalability: Designed to handle varying amounts of data and user requests, making them
suitable for both small and large-scale applications.

Components of a Web Database System


Web database systems are typically structured in a multi-tier architecture, including:
Client (Web Browser): The user interface where users interact with the application. Handles
presentation logic and sometimes basic input validation [28] [29] .
Web/Application Server: Processes client requests, executes business logic, and
communicates with the database server. Technologies include PHP, Java, Python, Ruby,
Node.js, etc. [28] [30] [29] [31]
Database Server: Stores and manages the application’s data. Can be a relational (e.g.,
MySQL, PostgreSQL) or NoSQL (e.g., MongoDB) database [28] [30] [31] .
Typical Workflow:
1. User submits a request via a web browser.
2. The web server receives the request and passes it to the application server.
3. The application server processes the request, interacts with the database server to retrieve
or update data, and sends the response back to the client [28] [30] [29] [31] .
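
A minimal sketch of the database tier in this workflow; the products table and query are hypothetical stand-ins for whatever the application server runs on a user's behalf:

-- Schema held by the database server of a small catalog application
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    price      DECIMAL(10, 2) NOT NULL,
    in_stock   BOOLEAN NOT NULL DEFAULT TRUE
);

-- Query the application server issues when a user browses the catalog;
-- the result set is rendered into HTML or JSON for the browser
SELECT product_id, name, price
FROM products
WHERE in_stock
ORDER BY name;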

Common Uses
E-commerce: Product catalogs, order management, customer accounts [27] .
Membership/Client Databases: Storing user profiles, login credentials, and activity logs [24]
[26] .

Inventory Management: Tracking stock levels, suppliers, and transactions [24] [26] .

Content Management Systems (CMS): Blogs, news sites, and forums where content is
frequently updated [27] .
Data Analytics: Collecting and analyzing user or business data for reporting and decision-making [26] .

Advantages

Remote Access: Users can access and manage data from any location with internet
connectivity [24] [26] .
Real-Time Updates: Data changes are immediately reflected across all users and devices.
Collaboration: Supports multiple users working together on shared data.
Centralized Management: Easier to maintain, backup, and secure data from a central
location.

Disadvantages
Security Risks: Exposed to internet threats such as hacking, data breaches, and
unauthorized access. Requires robust security measures like authentication, encryption, and
firewalls [30] [31] .
Performance Issues: Dependent on network speed and server capacity; may experience
latency with high traffic or large datasets.
Complexity: Requires knowledge of web development, database management, and
networking for setup and maintenance.
Best Practices
Use Secure Authentication and Authorization: Ensure only authorized users can access or
modify data [30] [31] .
Encrypt Sensitive Data: Protect data in transit and at rest.
Regular Backups: Prevent data loss due to failures or attacks.
Optimize Queries and Indexes: Improve performance and reduce server load.
Scalability Planning: Design for growth in data volume and user base.
Monitoring and Logging: Track usage, errors, and security incidents for ongoing
improvement [30] [31] .

Examples of Web Databases


MySQL
PostgreSQL
MongoDB

Microsoft SQL Server
Oracle Database
These databases are commonly used as the backend for web applications and can be managed
through web-based interfaces or APIs [31] .

Summary

Web databases are essential for modern web applications, providing a centralized, accessible,
and dynamic way to store and manage data online. Their architecture typically involves clients,
application servers, and database servers working together to deliver interactive, data-driven
experiences to users across the globe [24] [26] [28] [30] [29] [31] .

Distributed Databases in Database Management Systems


Definition
A distributed database is a database system in which data is stored across multiple physical
locations; these may be different computers, servers, or data centers connected via a network.
Despite the physical distribution, the system appears as a single logical database to users and
applications [32] [33] [34] [35] [36] .
Key Characteristics
Data Distribution: Data is stored at multiple sites, which can be geographically
dispersed [32] [33] [37] [36] .
Transparency: Users interact with the database as if it is a single, unified system,
regardless of where data is physically stored [37] [36] .
Concurrent Accessibility: Multiple users can access and modify data simultaneously, with
mechanisms in place to ensure consistency and prevent conflicts [37] .
Synchronization: Updates at one site are reflected across all other sites to maintain data consistency, often using protocols like two-phase commit (see the sketch after this list) [37] .
Scalability: The system can be easily expanded by adding more nodes, supporting
increased data and user loads [33] [37] .
Fault Tolerance and Reliability: If one node fails, the system can continue to operate,
ensuring high availability and reliability [33] [38] .
Network Communication: All nodes communicate over a network, requiring efficient
protocols to minimize latency and maximize throughput [33] [39] .
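
As a minimal sketch of the two-phase commit building block, PostgreSQL (assumed here) exposes the prepare phase directly in SQL; a coordinator would run phase 1 on every participating node and issue the final commit only after all nodes vote yes. The accounts table is hypothetical:

-- Phase 1 (prepare): do the work, then persist a promise to commit
-- (requires max_prepared_transactions > 0 in postgresql.conf)
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
PREPARE TRANSACTION 'transfer_42';

-- Phase 2 (commit): only after every node has prepared successfully
COMMIT PREPARED 'transfer_42';

-- If any node fails to prepare, the coordinator aborts everywhere:
-- ROLLBACK PREPARED 'transfer_42';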

Architecture Types
Client-Server Architecture: Clients interact with a central server that manages data storage and access [39] .
Peer-to-Peer Architecture: All nodes act as both clients and servers, managing their own data and collaborating for queries and transactions [39] .
Federated Architecture: Multiple independent databases are integrated under a unified interface, but each maintains autonomy [39] .
Shared-Nothing Architecture: Each node is independent and self-sufficient, with no shared memory or disk; data is partitioned among nodes [39] .

Design Considerations
1. Data Partitioning
Horizontal Partitioning: Divides tables into rows, distributing subsets to different nodes (see the sketch after these design considerations).
Vertical Partitioning: Divides tables into columns, distributing attributes to different nodes [39] .
2. Replication
Full Replication: Every node stores a complete copy of the database, increasing availability
but also storage and update costs.
Partial Replication: Only selected data is replicated based on access patterns.
Multi-master Replication: Multiple nodes can accept updates, improving performance and
fault tolerance [39] .
3. Consistency and Concurrency Control
Mechanisms like locking, timestamp ordering, or optimistic concurrency control ensure data
consistency during simultaneous transactions [39] .
4. Network Communication & Latency
Efficient protocols and low-latency networks are vital to minimize delays and maximize
performance [39] .
5. Security and Privacy
Security challenges increase with distribution; strong authentication, encryption, and access
controls are essential [39] .
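
A minimal sketch of horizontal partitioning using PostgreSQL's declarative syntax (assumed here); in a distributed deployment, each partition would be placed on a different node:

-- Rows are split into subsets by region (horizontal partitioning)
CREATE TABLE orders (
    order_id   INT NOT NULL,
    region     TEXT NOT NULL,
    order_date DATE NOT NULL,
    amount     DECIMAL(10, 2)
) PARTITION BY LIST (region);

-- Each partition holds only its region's rows and, in a distributed
-- database, would live on a geographically appropriate node
CREATE TABLE orders_apac     PARTITION OF orders FOR VALUES IN ('APAC');
CREATE TABLE orders_europe   PARTITION OF orders FOR VALUES IN ('EU');
CREATE TABLE orders_americas PARTITION OF orders FOR VALUES IN ('NA', 'SA');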

Advantages
Modular Development: Easy to expand by adding new nodes without disrupting the
system [33] [38] .
Reliability: System continues to function even if some nodes fail, reducing risk of total
failure [33] [38] .

Lower Communication Costs: Local data storage reduces the need for long-distance data transfer [33] .
Better Response Time: Localized data access can lead to faster query responses [33] .
Scalability: Supports growth in data volume and user base by adding more sites [33] [37] [38] .
Disaster Recovery: Data stored in multiple locations increases resilience to natural disasters [38] .

Disadvantages
Complexity: Design, implementation, and maintenance are more complex than centralized
systems [38] .
Cost: Requires expensive software and skilled personnel for synchronization, data
consistency, and management [33] [38] .
Data Integrity: Ensuring data consistency and integrity across multiple sites is challenging,
especially with replication [33] [38] .
Security: More vulnerable due to multiple access points and the need to secure all nodes
and communication channels [38] [39] .
Overhead: Synchronization, coordination, and replication introduce significant processing
and network overhead [33] [38] .
Improper Data Distribution: Poorly planned data placement can reduce system
responsiveness and efficiency [33] .
Applications
Global Enterprises: Organizations with offices in multiple locations needing shared,
consistent data.
Cloud Services: Modern cloud databases are inherently distributed to provide scalability
and high availability.
E-commerce: Handling large-scale, geographically distributed transactions and user data.
Telecommunications: Managing customer and network data across regions.

Summary Table
Feature Distributed Database

Storage Multiple physical locations

Transparency Appears as a single logical database

Scalability High; add nodes as needed

Fault Tolerance High; continues if some nodes fail

Complexity High; requires advanced management

Data Consistency Challenging; needs robust control mechanisms
Security Complex; must secure all nodes and network

Conclusion

Distributed databases provide a robust solution for organizations requiring high availability,

scalability, and reliability by distributing data across multiple sites. However, they introduce
complexity in design, management, and security, requiring careful planning and advanced
technologies to ensure data consistency, integrity, and performance [33] [38] [39] .

Data Warehousing in Database Management Systems


Definition
A data warehouse is a specialized data management system designed to support business
intelligence (BI) activities, particularly analytics and reporting. It acts as a central repository that
collects, integrates, and stores large volumes of data from multiple, often disparate, sources.
The primary goal is to enable organizations to derive valuable insights and make informed
decisions by analyzing both current and historical data [40] [41] [42] [43] .
Key Characteristics
Subject-Oriented: Organized around key subjects (e.g., sales, finance, customer) rather
than specific applications or processes [41] .
Integrated: Consolidates data from various sources, transforming it into a consistent, unified
format [41] [43] .
Time-Variant: Maintains historical data, enabling analysis of trends and changes over
time [41] [43] .
Non-Volatile: Data is stable; once entered, it is not updated or deleted, ensuring
consistency for analysis [41] [43] .

Components of a Data Warehouse


Data Sources: Operational databases, CRM systems, external APIs, web logs, and more
provide raw data to the warehouse [43] [44] .
ETL (Extract, Transform, Load) Tools: Extract data from sources, transform it into a
common format, and load it into the warehouse. Modern systems may use ELT (Extract,
Load, Transform) for scalability [43] [45] [44] .

Central Data Warehouse Database: The core storage area, typically a relational or cloud-based database optimized for analytical queries [43] [45] .
Metadata: Data about the data; describes source, structure, and usage, aiding management and discovery [43] [45] .
Access Tools: BI tools, dashboards, reporting, OLAP, and data mining tools that allow users to analyze and visualize data [43] [45] .
Data Governance & Security: Policies and tools for data quality, access control, lineage, and compliance [41] [43] .

Architecture Types
Architecture Type Description

Single-Tier Minimizes data storage by deduplication; rarely used due to scalability limits [45] .

Two-Tier Separates data sources from the warehouse; limited scalability [45] .

Three-Tier Most common; includes data warehouse database (bottom), OLAP server (middle), and client tools (top) [45] [44] .
Data Warehousing Process
1. Data Extraction: Collect data from various internal and external sources.
2. Data Transformation: Cleanse, format, and integrate data to ensure consistency and
quality.
3. Data Loading: Store the processed data in the data warehouse (see the sketch after these steps).
4. Data Access: Users query, analyze, and visualize data using BI and analytics tools [41] [43]
[44] .
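
A minimal sketch of steps 2 and 3 in SQL; the staging_sales and sales_fact tables are hypothetical, and real pipelines typically run inside dedicated ETL/ELT tools:

-- Transform and load in one statement: cleanse staged rows,
-- normalize the timestamp, and append to a warehouse fact table
INSERT INTO sales_fact (sale_date, store_id, product_id, amount)
SELECT CAST(s.sold_at AS DATE),   -- transform: timestamp to date
       s.store_id,
       s.product_id,
       s.amount
FROM staging_sales s
WHERE s.amount IS NOT NULL        -- cleanse: drop incomplete rows
  AND s.amount >= 0;              -- cleanse: reject invalid values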

Use Cases
Business Intelligence & Reporting: Enables organizations to generate reports and
dashboards for strategic decision-making [40] [42] [43] .
Trend Analysis: Provides historical data for identifying patterns and forecasting.
Regulatory Compliance: Centralizes and preserves data for audits and legal
requirements [43] .
Data Mining: Supports advanced analytics and machine learning by providing clean,
integrated datasets.

Advantages

Centralized Data Repository: Provides a single source of truth for the organization [40] [43] .
Improved Decision Making: Facilitates data-driven strategies and faster, more accurate insights [40] [41] .
Historical Analysis: Supports long-term trend analysis and performance tracking [41] [43] .
Data Quality and Consistency: ETL processes ensure high data integrity [41] [43] .
Scalability: Modern cloud-based warehouses can scale storage and compute independently [45] [44] .

Disadvantages
Complexity and Cost: Implementation and maintenance require significant investment and
expertise.
Latency: Data is not always real-time; there can be delays between data generation and
availability for analysis.
ETL Overhead: Extracting, transforming, and loading large volumes of data can be
resource-intensive [45] .
Modern Trends
Cloud Data Warehousing: Shift from on-premises to cloud platforms for better scalability,
flexibility, and lower upfront costs [43] [45] [44] .
Support for Unstructured Data: Modern warehouses handle not only structured but also
semi-structured and unstructured data (e.g., logs, images) [43] .
Integrated Analytics: In-memory processing and real-time analytics capabilities are
increasingly common [43] [45] .

Summary Table
Feature Description

Purpose Centralized analytics and reporting

Data Sources Multiple, heterogeneous (internal & external)

Storage Centralized, optimized for query & analysis

Key Processes ETL/ELT, data integration, metadata management

Users Business analysts, data scientists, management

Architecture Typically three-tier (database, OLAP, client tools)
Modern Trends Cloud-based, support for unstructured data, real-time analytics

Conclusion

Data warehousing is foundational for organizations seeking to harness their data for business intelligence and analytics. It centralizes, integrates, and preserves data from diverse sources, empowering users to make informed, data-driven decisions and uncover valuable insights [40] [41] [43] .

Data Mining in Database Management Systems


Definition
Data mining is the process of searching, analyzing, and extracting useful information and
patterns from large volumes of raw data. It involves using advanced analytical techniques to
uncover trends, relationships, or anomalies that can support decision-making and solve business
problems [46] [47] [48] .
Key Objectives
Pattern Discovery: Identify hidden patterns, correlations, or trends within large datasets.
Prediction: Forecast future trends or behaviors based on historical data.
Classification and Segmentation: Group data into categories or clusters for deeper
analysis.
Anomaly Detection: Spot unusual data points that may indicate fraud or errors.

Data Mining Process


The data mining process typically involves several key steps [46] [48] :
1. Data Collection: Gather data from various sources, often centralized in a data warehouse.
2. Data Preparation: Cleanse and transform data to ensure quality and consistency.
3. Data Mining: Apply algorithms and analytical methods to extract patterns and relationships.
4. Interpretation and Evaluation: Analyze results and present them in user-friendly formats
(e.g., graphs, tables).
5. Deployment: Use the insights to inform business strategies or automate decision-making.

Techniques and Methods

Classification: Assign data to predefined categories (e.g., spam detection, credit risk).
Clustering: Group similar data points together without predefined categories (e.g., customer segmentation).
Association Rule Mining: Discover relationships between variables (e.g., market basket analysis; see the sketch after this list).
Regression: Predict numeric values based on other variables (e.g., sales forecasting).
Anomaly Detection: Identify outliers or unusual data patterns (e.g., fraud detection).
Sequence Analysis: Find patterns in sequential data (e.g., web clickstreams).
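
A minimal sketch of the co-occurrence counting behind market basket analysis; the order_items table is hypothetical, and dedicated mining tools go on to compute support and confidence for candidate rules:

-- Count how often two products appear in the same order: the raw
-- statistic from which association rules are derived
SELECT a.product_id AS product_a,
       b.product_id AS product_b,
       COUNT(*)     AS times_bought_together
FROM order_items a
JOIN order_items b
  ON  a.order_id   = b.order_id
  AND a.product_id < b.product_id   -- count each unordered pair once
GROUP BY a.product_id, b.product_id
ORDER BY times_bought_together DESC;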

Relationship with Data Warehousing


Data mining often relies on data warehousing, as warehouses consolidate and organize large
volumes of data from multiple sources, making it easier to mine for patterns and insights [49]
[50] [48] .

Data is usually extracted from the warehouse, transformed as needed, and then analyzed
using data mining tools [49] .
Applications
Business Intelligence: Marketing analysis, customer segmentation, sales forecasting.
Fraud Detection: Identifying suspicious transactions in banking and finance.
Healthcare: Disease prediction, patient segmentation, and treatment effectiveness analysis.
Manufacturing: Quality control, predictive maintenance, and supply chain optimization.
Social Media & Web: Sentiment analysis, recommendation systems, and user behavior
analysis [46] [48] .

Advantages
Improved Decision Making: Provides actionable insights for strategic planning.
Competitive Advantage: Helps organizations identify new opportunities and optimize
operations.
Automation: Enables automated detection of patterns and anomalies, saving time and
resources.

Challenges and Limitations
Data Quality: Results depend on the accuracy and completeness of the input data.
Privacy Concerns: Mining personal data can raise ethical and legal issues [46] .
Complexity: Requires specialized knowledge in statistics, programming, and domain expertise.
Scalability: Handling very large datasets can be computationally intensive.

Summary Table
Aspect Description

Goal Discover patterns, trends, and useful information

Data Source Large datasets, often from data warehouses

Techniques Classification, clustering, association, regression

Applications Business, healthcare, finance, web, manufacturing

Benefits Informed decisions, automation, competitive advantage

Challenges Data quality, privacy, complexity, scalability


Conclusion
Data mining is a critical component of modern database management, enabling organizations to
extract valuable knowledge from vast datasets. By leveraging data mining techniques,
businesses can enhance decision-making, optimize operations, and gain a deeper
understanding of their data-driven environments [46] [47] [48] .

1. https://study.com/academy/lesson/what-is-an-object-oriented-database.html
2. https://phoenixnap.com/kb/object-oriented-database
3. https://www.scribd.com/document/578693921/OBJECT-oriented-databases
4. https://en.wikipedia.org/wiki/Object_database
5. https://celerdata.com/glossary/object-oriented-dbms
6. https://hackernoon.com/object-oriented-databases-and-their-advantages
7. https://daily.dev/blog/object-oriented-vs-nosql-databases-key-differences
8. https://www.ionos.com/digitalguide/hosting/technical-matters/object-oriented-databases/
9. https://ecomputernotes.com/database-system/adv-database/object-oriented-database-oodb
10. https://en.wikipedia.org/wiki/Object–relational_database

11. https://www.ituonline.com/tech-definitions/what-is-an-object-relational-database-ord/
12. https://byjus.com/gate/object-relational-data-model-in-dbms-notes/
13. https://www.tutorialspoint.com/object-relational-features-object-database-extensions-to-sql

14. https://docs.oracle.com/en/database/oracle/oracle-database/19/adobj/key-features-object-relational-model.html

15. https://www.theserverside.com/definition/object-relational-mapping-ORM
16. https://www.sciencedirect.com/topics/computer-science/object-relational-database

17. https://risingwave.com/blog/mastering-logical-database-models-a-comprehensive-guide/
18. https://www.tibco.com/glossary/what-is-a-logical-data-model
19. https://www.datamation.com/big-data/logical-vs-physical-data-model/
20. https://www.studocu.com/in/document/dr-apj-abdul-kalam-technical-university/btech/logical-database-notes-for-dbms/51617884
21. https://help.sap.com/doc/saphelp_nw73ehp1/7.31.19/en-US/9f/db9b5e35c111d1829f0000e829fbfe/content.htm
22. https://www.gooddata.com/blog/physical-vs-logical-data-model/
23. https://hevodata.com/learn/conceptual-vs-logical-vs-physical-data-model/
24. https://mrwebsites.ca/solutions/web_databases.html
25. https://nexalab.io/blog/what-is-web-database/
26. https://theintactone.com/2022/02/27/web-databases/
27. https://www.w3schools.in/dbms/web-based-database-management-system
28. https://www.ibm.com/docs/sl/SSEPEK_12.0.0/intro/src/tpc/db2z_componentsofwebapplications.html
29. https://www.spaceotechnologies.com/blog/web-application-architecture/
30. https://enterprisemonkey.com.au/web-application-architecture/
31. https://www.clarity-ventures.com/how-to-guides/web-application-architecture
32. https://www.mongodb.com/en-us/resources/basics/databases/distributed-database
33. https://phoenixnap.com/kb/distributed-database
34. https://www.cockroachlabs.com/blog/what-is-a-distributed-database/
35. https://www.scylladb.com/glossary/distributed-database/
36. https://www.techtarget.com/searchoracle/definition/distributed-database
37. https://www.tutorchase.com/answers/a-level/computer-science/what-are-the-essential-characteristics-of-a-distributed-database
38. https://www.tutorialspoint.com/DDBMS-Advantages-and-Disadvantages
39. https://www.tutorialspoint.com/distributed-database-architecture
40. https://www.oracle.com/in/database/what-is-a-data-warehouse/
41. https://www.trantorinc.com/blog/understanding-data-warehousing
42. https://aws.amazon.com/what-is/data-warehouse/
43. https://www.sap.com/products/data-cloud/datasphere/what-is-a-data-warehouse.html
44. https://www.datacamp.com/blog/data-warehouse-architecture
45. https://www.snowflake.com/guides/data-warehouse-architecture/
46. https://www.investopedia.com/terms/d/datamining.asp
47. https://www.spiceworks.com/tech/big-data/articles/what-is-data-mining/
48. https://www.techtarget.com/searchbusinessanalytics/definition/data-mining

49. https://ebooks.inflibnet.ac.in/csp4/chapter/data-mining-introduction/
50. https://www.sap.com/hk/products/technology-platform/hana/what-is-data-mining.html
