Data Abstraction in Database Management System (DBMS)
Definition:
Data abstraction in DBMS is the process of hiding the complex, low-level details of how data is
stored and maintained, and presenting users with a simplified, relevant view of the data. This
makes it easier and more secure for users to access and interact with the database, as they only
see the necessary information and not the underlying implementation details [1] [2] [3] .
It also makes database design and maintenance more efficient [1] [4] [5] .
Levels of Data Abstraction
Physical Level: Lowest level. Describes how the data is actually stored (files, indexes, storage structures). Typical users: system developers.
Logical Level: Middle level. Describes what data is stored and the relationships among the data. Typical users: database administrators. Example: the Customer table structure and its relationships with orders.
View Level: Highest level. Describes only the part of the database relevant to a particular group of users.
Advantages of Data Abstraction
Data Independence: Changes at one level (e.g., physical storage) do not affect other levels (e.g., user views) [4] [5] .
Simplified Interaction: Users interact with a simplified model rather than complex underlying structures [1] [8] .
Flexibility: The system can evolve (e.g., change database engines) without impacting front-end applications [9] .
Real-World Example
Suppose an e-commerce application uses MongoDB during development and switches to
MySQL in production. The data abstraction layer ensures that the front-end code does not need
to change, as it interacts with the data through a consistent interface, regardless of the
underlying database engine. Users and applications remain unaware of these backend
changes [9] .
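The switch described above is usually achieved with a thin data-access (repository) layer. A minimal sketch, assuming hypothetical class and method names (none of these come from the text):

```python
from abc import ABC, abstractmethod


class UserRepository(ABC):
    """The consistent interface the front end codes against."""

    @abstractmethod
    def find_by_id(self, user_id):
        ...


class InMemoryUserRepository(UserRepository):
    """Stand-in backend: a MongoDB- or MySQL-backed version would
    implement the same interface with driver-specific code."""

    def __init__(self, rows):
        self._rows = rows

    def find_by_id(self, user_id):
        return self._rows.get(user_id)


def render_profile(repo, user_id):
    # Front-end code: unaware of which engine backs the repository.
    user = repo.find_by_id(user_id)
    return user["name"] if user else "unknown"


repo = InMemoryUserRepository({1: {"name": "Alice"}})
print(render_profile(repo, 1))
```

Because `render_profile` depends only on the `UserRepository` interface, swapping the backend means swapping the implementation class, not the calling code.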
Summary Table
Physical Level: hides low-level storage details; exposes how data is stored.
Logical Level: hides physical storage details; exposes what data is stored and its relationships.
View Level: hides logical and physical details; exposes only the data relevant to a specific user group.
Data Independence in Database Management System (DBMS)
Definition:
Data independence is the ability to modify the schema at one level of a database system without affecting the schema at the next higher level. This means changes in how data is stored or structured do not force changes in how data is accessed or used by applications and users [10] [11] [12] [13] .
Physical Data Independence: can change physical storage and file organization; leaves the logical schema and user views/applications unaffected. Examples: changing storage devices or file structures.
Logical Data Independence: can change the logical (conceptual) schema; leaves external views and application programs unaffected. Examples: adding a new attribute or table without rewriting existing views.
Scope: Changes at the logical (conceptual) level do not impact the external (view) level.
Examples: adding a new attribute to a table, or splitting a table, without changing existing user views or application programs.
Tip:
Draw a diagram of the three-level architecture (Physical, Logical, View) and indicate where each type of data independence applies for clear understanding in exams.
⁂
Data Definition Language (DDL)
Definition:
Data Definition Language (DDL) is a subset of SQL used to define, modify, and remove the structure of database objects such as tables, schemas, indexes, and views. DDL commands do not manipulate the data itself but rather the schema or structure of the database [19] [20] [21] .
DDL is considered a subset of SQL and is standardized across most relational database
systems [19] [21] [24] .
Common DDL Commands
CREATE: creates a new database object (table, view, index, etc.). Example: CREATE TABLE Student (ID INT PRIMARY KEY, Name VARCHAR(50));
ALTER: modifies the structure of an existing database object. Example: ALTER TABLE Student ADD Email VARCHAR(100);
DROP: removes a database object entirely. Example: DROP TABLE Student;
TRUNCATE: removes all rows from a table while keeping its structure. Example: TRUNCATE TABLE Student;
RENAME: changes the name of a database object. Example: RENAME TABLE Student TO Alumni;
DDL Constraints
DDL commands are also used to define constraints on tables to enforce data integrity:
PRIMARY KEY: Uniquely identifies each record in a table.
FOREIGN KEY: Ensures referential integrity between tables.
UNIQUE: Ensures all values in a column are unique.
CHECK: Enforces domain integrity by limiting the values that can be placed in a column.
NOT NULL: Ensures that a column cannot have a NULL value [19] [20] .
Examples
1. Creating a Table:
2. Altering a Table:
3. Dropping a Table:
4. Truncating a Table:
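The four operations above can be exercised end to end with Python's built-in sqlite3 module. A sketch (table and column names invented; note SQLite has no TRUNCATE keyword, so an unqualified DELETE plays the same role):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# 1. Creating a table
cur.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name VARCHAR(50))")

# 2. Altering a table: add a column to the existing structure
cur.execute("ALTER TABLE Student ADD COLUMN Email VARCHAR(100)")

# 3. Truncating: all rows go, the table structure stays
cur.execute("INSERT INTO Student VALUES (1, 'Alice', 'a@example.com')")
cur.execute("DELETE FROM Student")
rows_left = cur.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

# 4. Dropping the table removes the structure itself
cur.execute("DROP TABLE Student")
table_exists = cur.execute(
    "SELECT name FROM sqlite_master WHERE name = 'Student'").fetchone()

print(rows_left, table_exists)  # 0 None
```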
Common Use Cases of DDL
Designing and setting up the initial database schema for an application.
Modifying the structure of database objects as requirements change.
Removing obsolete tables or views from the database.
Defining constraints to ensure data integrity and consistency [19] [20] [23] .
Summary
DDL defines and changes the structure of database objects; its commands operate on the schema, never on the stored data itself.

Data Manipulation Language (DML)
Common DML Commands
SELECT: retrieves data from one or more tables. Example: SELECT Name, Age FROM Students WHERE Age > 18;
INSERT: adds new rows to a table.
UPDATE: modifies existing rows.
DELETE: removes rows from a table.
Characteristics of DML
Not Auto-Committed:
DML operations are generally not auto-committed; changes can be rolled back before committing, depending on the DBMS [29] .
Affects Table Data, Not Structure:
DML commands manipulate the data within tables, not the structure of the tables
themselves (which is handled by DDL).
Transactional:
DML commands can be grouped into transactions, allowing multiple operations to be
executed as a single unit.
Supports Both Procedural and Non-Procedural Use:
DML can be procedural (specifying how data is accessed) or non-procedural/declarative
(specifying what data is needed) [26] .
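The transactional behavior described above can be observed directly with sqlite3, which opens an implicit transaction around DML statements (table and data here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("INSERT INTO Students VALUES (101, 'Alice')")
con.commit()                      # committed: this row is now permanent

# A DML change we abandon: rollback undoes everything since the
# last commit, so the committed row survives the DELETE.
cur.execute("DELETE FROM Students WHERE StudentID = 101")
con.rollback()

survivors = cur.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(survivors)  # 1
```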
Advanced DML Operations
JOINS:
Combine data from multiple tables based on related columns.
Subqueries:
Nest one query within another for more complex data retrieval.
Aggregate Functions:
Use functions like SUM, AVG, COUNT, etc., with SELECT for data analysis [28] .
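A short sqlite3 sketch combining a JOIN with an aggregate (schema and data invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("CREATE TABLE Marks (StudentID INTEGER, Score INTEGER)")
cur.executemany("INSERT INTO Students VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])
cur.executemany("INSERT INTO Marks VALUES (?, ?)",
                [(1, 80), (1, 90), (2, 70)])

# JOIN the two tables on the related column, then aggregate per student.
rows = cur.execute("""
    SELECT s.Name, AVG(m.Score)
    FROM Students s JOIN Marks m ON s.ID = m.StudentID
    GROUP BY s.Name
    ORDER BY s.Name
""").fetchall()
print(rows)  # [('Alice', 85.0), ('Bob', 70.0)]
```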
Importance of DML
Central to Application Logic:
Enables applications to interact with and modify database content [27] [28] .
Summary
DML is a crucial part of SQL, enabling the manipulation of data in relational databases.
The main DML commands are INSERT, SELECT, UPDATE, and DELETE.
DML operations are transactional, affect only the data (not the structure), and are
foundational for all database applications and data analysis tasks [25] [26] [27] [28] .
⁂
Entity-Relationship (ER) Model in DBMS
1. Entities
Definition: An entity is any real-world object, concept, or thing that can have data stored
about it, such as a person, place, event, or object [33] [34] [31] [32] .
Representation: Entities are depicted as rectangles in ER diagrams.
Example: Student, Employee, Product.
Entity Set
A collection of similar entities (e.g., all students in a college).
Types:
Strong Entity Set: Has a primary key (unique identifier). Represented by a single
rectangle.
Weak Entity Set: Lacks a primary key and depends on a strong entity. Represented by
a double rectangle.
2. Attributes
Definition: Attributes are the properties or characteristics of an entity [33] [34] [31] [32] .
Representation: Attributes are shown as ovals (ellipses) connected to their entity.
Types:
Simple Attribute: Cannot be divided further (e.g., Age).
Composite Attribute: Can be divided into smaller subparts (e.g., Name → First Name, Last Name).
3. Relationships
Definition: Relationships illustrate how two or more entities are associated with each
other [33] [34] [31] [32] .
Representation: Depicted as diamonds connecting related entities.
Example: "Works_For" between Employee and Department.
Degree of Relationship
Unary: Relationship within the same entity set.
Binary: Relationship between two entities (most common).
Ternary: Relationship among three entities.
n-ary: Relationship among n entities.
Cardinality
Specifies the number of instances of one entity that can be associated with instances of
another entity (e.g., One-to-One, One-to-Many, Many-to-Many) [32] .
ER Diagram Notation
Primary Key: shown underlined; the unique identifier for an entity.
Multi-valued Attribute: shown as a double oval; an attribute that can hold multiple values.
Advantages of ER Model
Visual Clarity: Makes complex data structures easy to understand and communicate [31]
[35] .
Precise Planning: Helps avoid redundancy and ensures efficient resource use [31] [35] .
Adaptability: Easy to modify as requirements change [31] [35] .
Facilitates Data Integrity: Clearly defines relationships and constraints, supporting
referential integrity [35] .
Efficient Querying: Well-structured models enable optimized queries and better
performance [35] .
Disadvantages of ER Model
Limited to Conceptual Design: Does not specify implementation details or physical storage.
Complexity with Large Systems: Diagrams can become complicated for very large
databases [35] .
Types of ER Models
Conceptual ER Model: High-level overview, focuses on entities and relationships, omits
details [36] .
Logical ER Model: More detailed, includes attributes and relationships, used for database
design [36] .
Physical ER Model: Represents how the model will be implemented in the database
system [36] .
Example
Consider a university database:
Entities: Student, Course, Instructor.
Attributes: Student (Roll_No, Name, DOB), Course (Course_ID, Title), Instructor (Emp_ID,
Name).
Relationships: Student "enrolls in" Course, Instructor "teaches" Course.
Summary Table
Entity: example Student; symbol in ERD: rectangle.
Attribute: example Name; symbol in ERD: oval.
Relationship: example Enrolls_In; symbol in ERD: diamond.
Conclusion
The ER model is a foundational tool in database design, providing a clear, visual representation
of data and its relationships. It helps ensure robust, efficient, and scalable database systems by
simplifying the design process and supporting communication among stakeholders [31] [35] [32] .
⁂
Network Model in Database Management System (DBMS)
The network model is a database model designed to represent complex relationships among
data more flexibly than the hierarchical model. It organizes data using records connected by
links (pointers), forming a graph structure that supports multiple parent and child relationships,
making it ideal for modeling many-to-many relationships [37] [38] [39] .
Owner-Member Relationships: Each record can have multiple owners and members,
supporting flexible data connections [37] [38] .
Pointers: Relationships are implemented using pointers, which directly link records, enabling
fast data access and navigation [38] [41] .
Data Integrity: Members cannot exist without an owner, ensuring all data is properly linked
and maintaining integrity [42] [41] .
Pointers: Explicit links that connect records, forming a network (graph) instead of a strict hierarchy [38] [41] .
Example: In a college database, a student record can be linked to both the "CSE
Department" and "Library" records, showing that students can belong to multiple
departments or sections [37] .
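The owner-member linkage in this example can be sketched with plain object references standing in for pointers (all names illustrative):

```python
# Records linked by references, mimicking the network model's pointers.
# A member record may hang off several owner records, which is exactly
# what the hierarchical model's single-parent rule forbids.

class Record:
    def __init__(self, name):
        self.name = name
        self.members = []          # owner -> member links

    def add_member(self, member):
        self.members.append(member)


cse = Record("CSE Department")
library = Record("Library")
student = Record("Student: Alice")

# The same student record is a member of two owners.
cse.add_member(student)
library.add_member(student)

owners_of_alice = [r.name for r in (cse, library) if student in r.members]
print(owners_of_alice)  # ['CSE Department', 'Library']
```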
Examples of Network Model DBMSs
IDS (Integrated Data Store): Early network DBMS for mainframes [38] .
IDMS (Integrated Database Management System): Used for creating complex data structures on mainframes [38] .
Summary
The network model organizes data as a graph, supporting complex, many-to-many
relationships with multiple parent and child records.
It offers efficient data access and integrity but at the cost of increased system complexity
and difficulty with structural changes.
While largely superseded by the relational model, the network model remains important for
understanding the evolution of database systems and for certain high-performance
applications [43] [45] [40] .
⁂
Relational Model in DBMS
Tuple (Row/Record):
Each row in a table is called a tuple and corresponds to a single record or instance of the entity [46] [48] .
Attribute (Column/Field):
Columns in a table are attributes, representing properties or characteristics of the entity [46] [48] .
Attribute Domain:
The set of permissible values for a given attribute.
Degree:
The number of attributes (columns) in a relation [46] .
Cardinality:
The number of tuples (rows) in a relation [46] .
Relational Schema:
The logical blueprint of a relation, specifying its name, attributes, and their data types [46]
[48] .
Example:
STUDENT(ROLL_NUMBER INTEGER, NAME VARCHAR(20), CGPA FLOAT)
Relational Instance:
The actual content (set of tuples) in a relation at a specific point in time [46] .
Keys in the Relational Model
Keys are crucial for uniquely identifying tuples and establishing relationships between tables [46] [48] .

Advantages of the Relational Model
Simplicity:
Tabular format is easy to understand, design, and use [46] [50] .
Integrity and Consistency:
Constraints and keys ensure data accuracy and consistency [46] [49] .
Flexibility:
Supports one-to-one, one-to-many, and many-to-many relationships using keys [46] [49] .
Standard Query Language:
Most relational databases use SQL (Structured Query Language) for data manipulation and querying [50] [47] .
Example
STUDENT Table:
Relation: STUDENT
Attributes: ROLL_NUMBER, NAME, CGPA
Tuples: (101, Alice, 8.9), (102, Bob, 7.5)
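The degree and cardinality defined earlier fall out of the STUDENT example as simple lengths. A minimal sketch, modelling a relation as a schema plus a set of tuples:

```python
# A relation as a schema (attribute names) plus an instance (set of tuples).
schema = ("ROLL_NUMBER", "NAME", "CGPA")
instance = {(101, "Alice", 8.9), (102, "Bob", 7.5)}

degree = len(schema)         # number of attributes (columns)
cardinality = len(instance)  # number of tuples (rows)
print(degree, cardinality)   # 3 2
```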
Summary
The relational model organizes data into inter-related tables, each representing an entity
with attributes and tuples [46] [48] .
It provides simplicity, flexibility, data integrity, and data independence.
Keys and constraints play a central role in ensuring data accuracy and establishing
relationships.
SQL is the standard language for interacting with relational databases, making them widely
adopted in industry and academia [50] [47] .
The relational model remains the most popular and influential database model, forming the
backbone of modern database systems [51] [52] [50] .
⁂
Object-Oriented Data Model in DBMS
1. Object
An object represents a real-world entity, storing both its data (attributes) and behavior (methods).
Example: A Student object with attributes like name and roll_no, and methods like register() or updateProfile() [54] [55] [56] .
2. Class
A class is a blueprint defining the attributes and methods common to all objects of that type.
Example: The Student class defines the structure and behavior for all student objects [54] [56] .
3. Object Attribute
Attributes are properties or characteristics of an object.
Example: For a Book object, attributes could be title, author, and ISBN [54] [55] .
4. Object Method
Methods are functions or procedures associated with an object, defining its behavior.
Example: A withdraw() method in a BankAccount object [54] [56] .
5. Inheritance
Inheritance allows a class to inherit attributes and methods from another class (the parent or
superclass).
Promotes code reusability and hierarchical relationships.
Example: A Bus class and a Ship class can both inherit from a Transport class [54] [55] [56] .
6. Encapsulation
Bundles data (attributes) and methods (behavior) together, restricting direct access to some
of the object's components.
7. Relationships
Objects can be related to each other by references or pointers, allowing for complex
interconnections, such as one-to-many or many-to-many relationships [55] [56] .
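Concepts 1 to 7 map directly onto an object-oriented language. A compact Python sketch using the document's Transport and Student/Department examples (method bodies and attribute names invented):

```python
class Transport:
    """Superclass: shared attributes and behavior (inheritance base)."""

    def __init__(self, capacity):
        self._capacity = capacity      # encapsulated attribute

    def describe(self):                # method = object behavior
        return f"{type(self).__name__} carrying up to {self._capacity}"


class Bus(Transport):                  # inherits attributes and methods
    pass


class Ship(Transport):
    pass


class Department:
    def __init__(self, dept_no):
        self.dept_no = dept_no
        self.students = []             # one-to-many relationship by reference


class Student:
    def __init__(self, name, dept_no):
        self.name = name
        self.dept_no = dept_no         # common attribute linking the objects


d = Department(42)
s = Student("Alice", d.dept_no)        # related via the shared Dept_no
d.students.append(s)
print(Bus(50).describe())  # Bus carrying up to 50
```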
Advantages of the Object-Oriented Model
Seamless Integration with OOP Languages: Works natively with object-oriented programming languages, making development and maintenance easier [58] [57] .
User-Constructed Types: Users can define new types as needed, supporting extensibility [56] .
Example
Suppose you have a Transport class with attributes and methods common to all vehicles. Bus,
Ship, and Plane are subclasses inheriting from Transport, each with specific attributes
(RoadTransport, WaterTransport, AirTransport respectively) [54] [55] .
Another example:
Student and Department are objects.
Each object has its own attributes and methods.
They are linked by a common attribute, such as Dept_no, establishing a relationship between
the objects [54] [55] .
Comparison with the Relational Model
Data Types: the relational model handles simple types (numbers, strings); the object-oriented model handles complex types (images, audio, video, user-defined types).
Suitable For: the relational model suits structured, tabular data; the object-oriented model suits complex, multimedia, and hierarchical data.
Disadvantages
Complexity: More complex to design and manage compared to relational databases.
Lack of Standardization: No universal standard like SQL for OODBMSs, leading to compatibility issues [60] .
Limited Adoption: Less widely used and supported than relational databases, especially for traditional business applications [60] [57] .
Performance: May not perform as well as relational databases for simple, tabular data.
Summary
The object oriented data model stores both data and relationships in objects, encapsulating
attributes and methods together.
Supports inheritance, encapsulation, and complex data types.
Ideal for applications requiring representation of complex, multimedia, or hierarchical data.
Forms the basis of OODBMS, which integrate database management with object-oriented
programming principles [54] [55] [56] [57] .
This model is especially useful for engineering, multimedia, CAD, and applications where real-
world modeling and complex data are essential.
⁂
Integrity Constraints in DBMS
1. Domain Constraint
Definition: Restricts the permissible values for a given attribute (column) in a table.
Purpose: Ensures that each attribute contains only values from a defined domain (data
type, format, range, or enumeration).
Examples:
Age column can only have positive integers.
Gender column can only have values like 'Male', 'Female', 'Non-Binary'.
Price column must be a decimal greater than or equal to zero.
SQL Example:
CREATE TABLE Person (
ID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT CHECK (Age > 0)
);
Here, the CHECK constraint enforces that Age must be positive [62] [63] .
2. Entity Integrity Constraint
Definition: Ensures that each row in a table can be uniquely identified and that the primary key is unique and never NULL.
Examples:
Every student record must have a unique, non-null StudentID.
In a Books table, Book_ID (primary key) must be unique and not null.
SQL Example:
CREATE TABLE Student (
StudentID INT PRIMARY KEY,
Name VARCHAR(100)
);
Here, StudentID is the primary key and cannot be NULL or duplicated [62] [61] [63] .
3. Referential Integrity Constraint
Definition: Ensures that a foreign key value in one table either matches a primary key value
in another table or is NULL.
Purpose: Maintains valid links between related tables and prevents orphaned records.
Examples:
An order in the Orders table must reference a valid CustomerID from the Customers
table.
A Dept_ID in the Employees table must exist in the Department table.
SQL Example:
CREATE TABLE Department (
Dept_ID INT PRIMARY KEY,
Dept_Name VARCHAR(50)
);
CREATE TABLE Employee (
ID INT PRIMARY KEY,
Name VARCHAR(50),
Dept_ID INT,
FOREIGN KEY (Dept_ID) REFERENCES Department(Dept_ID)
);
Here, every Dept_ID in Employee must exist in Department, ensuring referential integrity [61] [63] .
4. Key Constraint
Definition: Ensures that a set of attributes (keys) uniquely identifies a row in a table.
SQL Example:
CREATE TABLE Users (
UserID INT PRIMARY KEY,
Email VARCHAR(100) UNIQUE
);
Here, both UserID and Email must be unique, and UserID cannot be NULL [61] .
Additional Constraint Types
NOT NULL Constraint: Ensures that a column cannot have NULL values.
CHECK Constraint: Limits the values that can be placed in a column (e.g., Age > 0).
ENUM Constraint: Restricts a column to a set of predefined values [63] .
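These constraints can be watched failing in practice. A sqlite3 sketch (tables invented; note SQLite checks foreign keys only when the pragma is switched on per connection):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # enable FK enforcement in SQLite
con.execute("CREATE TABLE Department (Dept_ID INTEGER PRIMARY KEY)")
con.execute("""CREATE TABLE Employee (
    ID INTEGER PRIMARY KEY,
    Age INTEGER CHECK (Age > 0),
    Dept_ID INTEGER REFERENCES Department(Dept_ID))""")
con.execute("INSERT INTO Department VALUES (10)")
con.execute("INSERT INTO Employee VALUES (1, 30, 10)")   # satisfies all rules

errors = []
for bad_row in [(2, -5, 10),    # domain (CHECK) violation: negative age
                (3, 25, 99)]:   # referential violation: no department 99
    try:
        con.execute("INSERT INTO Employee VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

print(errors)
```

Both bad rows are rejected with an IntegrityError, one per constraint type.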
Summary Table
Domain constraint: applies to a column; ensures a valid data type, range, or format. Example: Age > 0, Gender in {M, F, Non-Binary}.
Entity integrity constraint: applies to the primary key column; ensures a unique, non-null identifier. Example: StudentID in the Student table.
Referential integrity constraint: applies to foreign key columns; ensures each foreign key matches a primary key or is NULL. Example: Dept_ID in Employee referencing Department.
Key constraint: applies to key attributes; ensures each row is uniquely identifiable. Example: UserID and Email in the Users table.
In summary:
Integrity constraints are essential rules in DBMS that maintain the quality, consistency, and
reliability of data by enforcing restrictions on data values, uniqueness, and relationships between
tables [61] [62] [63] .
⁂
Data Manipulation Operations (CRUD) in DBMS
1. Create (INSERT)
Purpose: Add new records into a table.
Example:
INSERT INTO Students (StudentID, Name, Age) VALUES (101, 'Alice', 20);
Usage: Used when new data needs to be stored, such as registering a new user or adding a
new product [65] [67] [66] .
2. Read (SELECT)
Purpose: Retrieve existing data from one or more tables.
Example:
SELECT * FROM Students WHERE StudentID = 101;
Usage: Used for querying information, generating reports, or displaying data to users [64] [65] [67] [66] .
3. Update (UPDATE)
Purpose: Change existing data in one or more records.
Example:
UPDATE Students SET Age = 21 WHERE StudentID = 101;
Usage: Used when correcting errors, updating user details, or changing product prices [64]
[65] [67] [66] .
4. Delete (DELETE)
Purpose: Remove one or more records from a table.
Example:
DELETE FROM Students WHERE StudentID = 101;
Usage: Used for removing outdated, incorrect, or unnecessary data [64] [65] [67] [66] .
INSERT: INSERT INTO Table VALUES (...); e.g., add a new customer record.
SELECT: SELECT * FROM Table WHERE condition; e.g., find all orders for a customer.
UPDATE: UPDATE Table SET col = val WHERE condition; e.g., change a product price.
DELETE: DELETE FROM Table WHERE condition; e.g., remove an obsolete record.
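The full CRUD cycle can be run end to end with sqlite3, using the Students table from the examples above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")

# Create
cur.execute("INSERT INTO Students (StudentID, Name, Age) VALUES (101, 'Alice', 20)")
# Read
read_row = cur.execute(
    "SELECT Name, Age FROM Students WHERE StudentID = 101").fetchone()
# Update
cur.execute("UPDATE Students SET Age = 21 WHERE StudentID = 101")
updated_age = cur.execute(
    "SELECT Age FROM Students WHERE StudentID = 101").fetchone()[0]
# Delete
cur.execute("DELETE FROM Students WHERE StudentID = 101")
remaining = cur.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

print(read_row, updated_age, remaining)  # ('Alice', 20) 21 0
```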
In summary:
Data manipulation operations (CRUD) are essential for interacting with and managing the data in
a DBMS. They allow users to add, retrieve, modify, and delete data, forming the backbone of all
database-driven applications [64] [65] [66] .
⁂
1. https://www.scaler.com/topics/data-abstractions-in-dbms/
2. https://www.tutorialspoint.com/what-is-data-abstraction-in-dbms
3. https://www.studocu.com/in/document/i-k-gujral-punjab-technical-university/database-management-system/dbms-notes-me/108887431
4. https://www.techtarget.com/whatis/definition/data-abstraction
5. https://mrcet.com/downloads/digital_notes/CSE/II Year/DBMS.pdf
6. https://mrcet.com/downloads/digital_notes/IT/Database Management Systems.pdf
7. https://mu.ac.in/wp-content/uploads/2021/08/USIT304-Database-Management-Systems.pdf
8. https://herovired.com/learning-hub/blogs/data-abstraction/
9. https://www.purestorage.com/knowledge/what-is-data-abstraction.html
10. https://unstop.com/blog/data-independence-in-dbms
11. https://en.wikipedia.org/wiki/Data_independence
12. https://www.scaler.com/topics/data-independence-in-dbms/
13. https://www.slideshare.net/slideshow/database-management-systems-data-independence/266139115
14. https://www.tutorialspoint.com/dbms/dbms_data_independence.htm
15. https://www.guru99.com/dbms-data-independence.html
16. https://testbook.com/key-differences/difference-between-physical-and-logical-data-independence
17. https://www.tutorialspoint.com/physical-and-logical-data-independence
18. https://herovired.com/learning-hub/topics/data-independence-in-dbms/
19. https://www.techtarget.com/whatis/definition/Data-Definition-Language-DDL
20. https://www.secoda.co/glossary/what-is-a-data-definition-language-ddl
21. https://en.wikipedia.org/wiki/Data_definition_language
22. https://byjus.com/gate/data-definition-language-notes/
23. https://satoricyber.com/glossary/ddl-data-definition-language/
24. https://www.scaler.com/topics/ddl-in-dbms/
25. https://en.wikipedia.org/wiki/Data_manipulation_language
26. https://satoricyber.com/glossary/dml-data-manipulation-language/
27. https://www.datasunrise.com/knowledge-center/dml-data-manipulation-language/
28. https://staragile.com/blog/data-manipulation-language
29. https://byjus.com/gate/data-manipulation-language-dql-notes/
30. https://www.du.ac.in/du/uploads/departments/Operational Research/24042020_E-R Model.pdf
31. https://www.collaboard.app/en/blog/entity-relationship-model/
32. https://www.shiksha.com/online-courses/articles/er-model-in-dbms/
33. https://opendsa.cs.vt.edu/ODSA/Books/Database/html/ERDComponents.html
34. https://mebrahimii.github.io/comp440-fall2020/lecture/week_10/Database Design E-R Model.pdf
35. https://www.essaycorp.com/blog/advantages-and-disadvantages-of-an-er-model
36. https://www.techtarget.com/searchdatamanagement/definition/entity-relationship-diagram-ERD
37. https://www.scaler.com/topics/network-model-in-dbms/
38. https://www.upskillcampus.com/blog/network-database-management-system/
39. https://www.slideshare.net/slideshow/data-base-and-all-its-types/235565003
40. https://www.thoughtspot.com/data-trends/data-modeling/types-of-data-models
41. http://dbmsenotes.blogspot.com/2014/03/comparison-of-data-models-data-models.html
42. https://www.scribd.com/document/358623974/test-docx
43. https://raima.com/network-model-vs-relational-model/
44. https://www.datamation.com/big-data/what-is-a-network-data-model-examples-pros-and-cons/
45. https://mariadb.com/kb/en/understanding-the-relational-database-model/
46. https://www.scaler.com/topics/dbms/relational-model-in-dbms/
47. https://en.wikipedia.org/wiki/Relational_database
48. https://byjus.com/gate/relational-model-in-dbms-notes/
49. https://www.tutorialspoint.com/differentiate-between-the-three-models-on-the-basis-of-features-and-operations-dbms
50. https://azure.microsoft.com/en-au/resources/cloud-computing-dictionary/what-is-a-relational-database
51. https://cloud.google.com/learn/what-is-a-relational-database
52. https://www.oracle.com/in/database/what-is-a-relational-database/
53. https://www.gartner.com/en/information-technology/glossary/object-data-model
54. https://www.scaler.com/topics/object-oriented-model-in-dbms/
55. https://byjus.com/gate/object-oriented-data-model-in-dbms-notes/
56. https://phoenixnap.com/kb/object-oriented-database
57. https://en.wikipedia.org/wiki/Object_database
58. https://www.mongodb.com/en-us/resources/basics/databases/what-is-an-object-oriented-database
59. https://www.youtube.com/watch?v=cw5R-CiEn6g
60. https://librarytechnology.org/document/7203
61. https://www.scaler.com/topics/dbms/integrity-constraints-in-dbms/
62. https://www.boardinfinity.com/blog/integrity-constraints-in-dbms/
63. https://www.almabetter.com/bytes/articles/integrity-constraints-in-dbms
64. https://www.indeed.com/career-advice/career-development/data-manipulation
65. https://www.tutorialspoint.com/sql_certificate/manipulating_data.htm
66. https://en.wikipedia.org/wiki/Create,_read,_update_and_delete
67. https://stackify.com/what-are-crud-operations/
68. https://ftpdocs.broadcom.com/cadocs/0/CA IDMS 18 5 User Bookshelf-ENU/Bookshelf_Files/HTML/IDMS_SQL_Prog_ENU/1283243.html
69. https://www.solvexia.com/blog/5-top-tips-for-data-manipulation
Relational Algebra in Database Management Systems (DBMS)
Relational algebra is a procedural query language that forms the theoretical foundation for
querying and manipulating data in relational databases. It provides a set of well-defined
operations that take one or more relations (tables) as input and produce a new relation as
output, allowing users to formulate and optimize queries efficiently [1] [2] [3] [4] .
Key Points:
Procedural: Specifies a sequence of operations to obtain the desired result.
Operates on relations (tables), producing new relations.
Forms the basis for SQL and query optimization in DBMS [2] [3] .
Importance of Relational Algebra
Foundation for Query Languages: Underpins SQL and other database query languages [2] [4] .
Query Optimization: Enables DBMS to optimize queries by transforming them into efficient execution plans [3] .
Fundamental Concepts
Relation Instance: the actual content (set of tuples) of a relation at a given time [5] .
Core Operations of Relational Algebra
1. Select (σ)
Purpose: Selects rows (tuples) from a relation that satisfy a specified condition.
Notation: $ \sigma_{condition}(Relation) $
Example: $ \sigma_{subject = "database"}(Books) $
Selects all books with the subject "database".
Properties: Unary operation (applies to a single relation) [1] [3] [6] [4] .
2. Project (π)
Purpose: Selects specific columns (attributes) from a relation.
Notation: $ \pi_{attribute1, attribute2, ...}(Relation) $
Example: $ \pi_{student_id, name}(Students) $
Extracts only student_id and name columns.
Properties: Reduces the number of columns, removes duplicates [3] [6] [4] .
3. Union (⋃)
Purpose: Combines tuples from two relations, removing duplicates.
Notation: $ Relation1 \cup Relation2 $
Requirement: Both relations must have the same schema (same attributes and domains).
Example: $ Students_2023 \cup Students_2024 $ [3] [6] [4] .
4. Set Difference (−)
Purpose: Returns tuples present in the first relation but not in the second.
Notation: $ Relation1 - Relation2 $
Requirement: Both relations must be union-compatible.
Example: $ Enrolled_Students - Graduated_Students $ [3] [6] [4] .
Additional Operations
Intersection (∩)
Purpose: Returns tuples present in both relations.
Notation: $ Relation1 \cap Relation2 $ [3] [6] .
Division (÷)
Purpose: Finds tuples in one relation associated with all tuples in another.
Use Case: "Find students enrolled in all courses offered."
Notation: $ Relation1 \div Relation2 $ [3] .
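All of these operators have direct set-theoretic readings, so they can be prototyped over Python sets of tuples. A sketch, modelling a relation as an attribute-name tuple plus a set of data tuples (all names and data invented):

```python
# Tiny relational-algebra prototype over sets of tuples.

def select(rel, pred):                       # sigma: filter rows
    attrs, tuples = rel
    return attrs, {t for t in tuples if pred(dict(zip(attrs, t)))}

def project(rel, keep):                      # pi: keep columns, dedupe
    attrs, tuples = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(t[i] for i in idx) for t in tuples}

def union(r1, r2):                           # set union: schemas must match
    assert r1[0] == r2[0]
    return r1[0], r1[1] | r2[1]

def difference(r1, r2):                      # set difference
    assert r1[0] == r2[0]
    return r1[0], r1[1] - r2[1]

def divide(r1, r2):                          # division: related to ALL of r2
    (a1, t1), (a2, t2) = r1, r2
    quot_attrs = tuple(a for a in a1 if a not in a2)
    idx_q = [a1.index(a) for a in quot_attrs]
    idx_d = [a1.index(a) for a in a2]
    cands = {tuple(t[i] for i in idx_q) for t in t1}
    pairs = {(tuple(t[i] for i in idx_q), tuple(t[i] for i in idx_d)) for t in t1}
    return quot_attrs, {c for c in cands if all((c, d) in pairs for d in t2)}


enroll = (("student", "course"),
          {("Alice", "DB"), ("Alice", "OS"), ("Bob", "DB")})
courses = (("course",), {("DB",), ("OS",)})

# "students enrolled in ALL offered courses" -- only Alice qualifies
print(divide(enroll, courses)[1])  # {('Alice',)}
```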
Join Operations
Natural Join (⨝): Combines tuples from two relations based on common attributes.
Examples
SQL vs. Relational Algebra Example:
SQL: SELECT Name FROM Students WHERE Age > 18;
Relational Algebra: $ \pi_{Name}(\sigma_{Age > 18}(Students)) $ [3]
Best Practices
Start with simple operations and combine them for complex queries.
Use renaming to avoid confusion, especially after joins or products.
Understand schema compatibility requirements for set operations.
Practice with real database schemas to master query formulation [3] .
Conclusion
Relational algebra is a fundamental tool for formulating, optimizing, and understanding queries
in relational databases. Mastery of its operations enables efficient data manipulation and forms
the basis for advanced database concepts and query languages like SQL [1] [2] [3] [4] .
⁂
Tuple Relational Calculus (TRC) in DBMS
Tuple relational calculus is a non-procedural, declarative query language in which queries specify conditions on what data to retrieve, using logical predicates to describe the desired result set [8] [9] [10] .
Key Characteristics
Declarative Language: Specifies what data to retrieve, not how to retrieve it [8] [9] [10] .
Based on Predicate Logic: Utilizes first-order logic to form queries [9] [11] .
Tuple Variables: Uses variables (e.g., $ t $) that represent tuples (rows) in relations (tables).
Foundation for SQL: TRC inspired the development of SQL [13] [10] .
Example: $ \{ t \mid t \in Student \land t.age < 18 \} $
This returns all tuples from the Student table where the age is less than 18 [8] .
Components of TRC
Tuple Variable: Represents a row in a relation (e.g., $ t $ in Student).
Predicate: Logical expression involving attributes of the tuple variable (e.g., $ t.age < 18 $).
Quantifiers:
Existential ($ \exists $): There exists a tuple satisfying a condition.
Universal ($ \forall $): All tuples satisfy a condition [8] [14] .
Logical Connectives: AND ($ \land $), OR ($ \lor $), NOT ($ \lnot $) [8] [9] .
Types of Variables
Free Variable: Not bound by a quantifier; appears in the result [8] .
Bound Variable: Bound by a quantifier (exists or for all); used only within the predicate [8] .
Examples
1. Select All Customers with Zip Code 12345
$ \{ t \mid t \in Customer \land t.Zipcode = 12345 \} $
Returns all tuples from Customer where Zipcode is 12345 [9] .
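TRC's set-builder form maps almost verbatim onto a Python set comprehension, which makes the declarative reading concrete (relation and data invented):

```python
# A relation as a set of tuples; the TRC query
# { t | t in Customer and t.Zipcode = 12345 }
# becomes a comprehension over named tuples.
from collections import namedtuple

Customer = namedtuple("Customer", ["name", "zipcode"])
customers = {Customer("Rohit", 12345), Customer("Rahul", 13245),
             Customer("Amit", 12345)}

result = {t for t in customers if t.zipcode == 12345}
print(sorted(t.name for t in result))  # ['Amit', 'Rohit']
```

Note that the comprehension says nothing about how to scan the set: like TRC, it only states the condition the returned tuples must satisfy.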
Both TRC and relational algebra are relationally complete, meaning any query expressible in one
can be expressed in the other [13] [14] [11] .
Key Points to Remember
TRC queries describe the set of tuples to be returned using logical formulas [8] [9] .
TRC is more about specifying conditions than specifying steps.
Uses tuple variables, logical connectives, and quantifiers.
Basis for understanding declarative query languages like SQL [13] [10] .
Output: the set of tuples satisfying the predicate.
Example: $ \{ t \mid t \in Student \land t.age < 18 \} $
In summary:
Tuple Relational Calculus is a powerful, declarative query language in DBMS that allows users to specify what data they want by describing conditions on tuples, rather than detailing the steps to retrieve it. It is foundational for understanding modern database query languages and complements relational algebra in expressive power [8] [13] [9] [14] .

Domain Relational Calculus (DRC) in DBMS
Key Characteristics
Declarative: Specifies what data to fetch, not the procedure to do so [15] [17] .
Domain Variables: Uses variables that represent individual attribute values (domains), not
entire rows [17] [18] [19] .
Predicate Logic: Employs first-order logic, including logical connectives (AND, OR, NOT)
and quantifiers (∃, ∀) [20] [21] .
Expressiveness: Equivalent in power to relational algebra and tuple relational calculus [20]
[17] .
Syntax of DRC
The general form of a DRC query is:
$ \{ \langle x_1, x_2, ..., x_n \rangle \mid P(x_1, x_2, ..., x_n) \} $
$ x_1, x_2, ..., x_n $: Domain variables, each ranging over the set of possible values for an
attribute [15] [16] [17] .
$ P(x_1, x_2, ..., x_n) $: A predicate (logical condition) that must be true for the variables to
be included in the result [15] [17] [21] .
Logical Connectives: AND ($ \land $), OR ($ \lor $), NOT ($ \lnot $) [20] [21]
Quantifiers:
Existential ($ \exists $): "There exists"
Universal ($ \forall $): "For all"
Examples
1. Find the Name of the Student Aged 20
Students table:
ID Name Age
1 John 20
2 Sarah 22
3 Emily 19
4 Michael 21
DRC Query: $ \{ \langle Name \rangle \mid \exists ID, Age (\langle ID, Name, Age \rangle \in Students \land Age = 20) \} $
Result: Name
John [15]
2. Find the Names of Employees in the IT Department
Employees table:
ID Name Dept Salary
1 Alice HR 50000
2 Bob IT 60000
4 David IT 65000
DRC Query: $ \{ \langle Name \rangle \mid \exists ID, Dept, Salary (\langle ID, Name, Dept, Salary \rangle \in Employees \land Dept = 'IT') \} $
Result: Name
Bob
David [15]
3. Find Customer IDs with Zip Code 12345
Customer table:
Customer_id Name Zipcode
1 Rohit 12345
2 Rahul 13245
3 Rohit 56789
4 Amit 12345
DRC Query: $ \{ \langle Customer\_id \rangle \mid \exists Name, Zipcode (\langle Customer\_id, Name, Zipcode \rangle \in Customer \land Zipcode = 12345) \} $
Result: Customer_id
1
4 [16]
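Where TRC ranges over whole tuples, DRC ranges over individual attribute values. Example 3 above, sketched as a comprehension that unpacks each tuple into domain variables:

```python
# DRC: variables range over attribute values, not whole rows.
# { <cid> | exists name, zip (<cid, name, zip> in Customer and zip = 12345) }
customer = {(1, "Rohit", 12345), (2, "Rahul", 13245),
            (3, "Rohit", 56789), (4, "Amit", 12345)}

result = {cid for (cid, name, zipcode) in customer if zipcode == 12345}
print(sorted(result))  # [1, 4]
```

The tuple unpacking `(cid, name, zipcode)` plays the role of the domain variables; only `cid` appears in the result, mirroring the query's output list.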
Comparison: DRC vs. TRC
Syntax: DRC uses $ \{ \langle x_1, x_2, ... \rangle \mid P(...) \} $, while TRC uses $ \{ t \mid P(t) \} $.
Additional Notes
DRC is foundational for visual query languages like Query-By-Example (QBE) [17] .
DRC queries are more abstract and do not specify the retrieval process, making them
suitable for high-level query formulation [15] [17] .
Both DRC and TRC are relationally complete; any query expressible in one can be written in the other [20] [17] .
Summary Table: Domain Relational Calculus
Language Type: declarative, non-procedural.
Syntax: $ \{ \langle x_1, x_2, ..., x_n \rangle \mid P(x_1, x_2, ..., x_n) \} $
Example: $ \{ \langle Name \rangle \mid \exists ID, Age (\langle ID, Name, Age \rangle \in Students \land Age = 20) \} $
In summary:
Domain Relational Calculus is a declarative query language in DBMS that uses domain variables
to specify what attribute values to retrieve from a database, using logical predicates and
quantifiers. It is powerful, expressive, and forms the basis for user-friendly query interfaces [15]
[16] [17] .
⁂
SQL3 Constructs in Database Management Systems
SQL3, also known as SQL:1999, is a major revision of the SQL standard that introduced a wide
range of advanced features to support complex, modern database applications. It is a superset
of earlier SQL standards and incorporates both object-oriented and procedural programming
concepts, greatly extending the expressive power and flexibility of SQL [22] [23] [24] .
Collection Types: Includes support for arrays, multisets (bags), and lists, with options for
ordered/unordered and allowing/disallowing duplicates [22] [23] .
4. Object-Oriented Features
Inheritance: Supports subtypes and supertypes, allowing table hierarchies (e.g., a Staff
supertable with Lecturer and Admin as sub-tables) [23] .
Methods and Constructors: Objects can encapsulate both data and behavior, similar to
OOP languages [22] [26] .
7. Recursive Queries
WITH RECURSIVE Clause: Enables writing recursive queries, which are essential for
handling hierarchical data like organizational charts or bill-of-materials structures [22] [24] .
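A minimal sketch of the WITH RECURSIVE clause, run here through SQLite via Python's sqlite3 module; the Staff table and the names in it are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (id INTEGER, name TEXT, manager_id INTEGER)")
# A tiny management chain: Asha manages Ben, who manages Cara.
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)",
                 [(1, "Asha", None), (2, "Ben", 1), (3, "Cara", 2)])

# Walk the hierarchy top-down, tracking each person's depth.
rows = conn.execute("""
    WITH RECURSIVE chain(id, name, level) AS (
        SELECT id, name, 0 FROM Staff WHERE manager_id IS NULL
        UNION ALL
        SELECT s.id, s.name, c.level + 1
        FROM Staff AS s JOIN chain AS c ON s.manager_id = c.id
    )
    SELECT name, level FROM chain ORDER BY level
""").fetchall()
print(rows)  # [('Asha', 0), ('Ben', 1), ('Cara', 2)]
```

The recursive part re-joins the table against the rows produced so far, which is what makes organizational charts and bill-of-materials traversals expressible in pure SQL.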
9. Temporary Tables
Temporary Tables: Support for creating tables that exist only for the duration of a session
or transaction, useful for intermediate results and complex computations [24] .
Summary Table: Major SQL3 Constructs
Feature Description & Use Case
Large Object Types BLOB and CLOB for multimedia and large text
Collection Types Arrays, multisets, and lists for flexible data storage
Advantages of SQL3
Enhanced Flexibility: Supports complex and hierarchical data structures.
Improved Performance: Stored procedures and triggers allow logic to reside closer to the
data.
Better Data Modeling: Object-oriented features and UDTs enable more natural
representation of real-world entities.
Efficient Hierarchical Data Management: Recursive queries and nested tables simplify
handling of tree and graph structures [22] [24] .
Conclusion:
SQL3 represents a significant leap in SQL’s capabilities, enabling advanced data modeling,
procedural programming, and object-oriented features within relational databases.
Understanding these constructs is essential for leveraging the full power of modern DBMS in
complex applications [22] [23] [24] .
⁂
Key Characteristics of DDL
Schema Definition: DDL is used to create and modify the structure of database objects
1. CREATE
Purpose: Creates new database objects such as tables, indexes, views, or entire
databases.
Usage: Defines columns, data types, and constraints.
Syntax:
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
);
Example:
CREATE TABLE Employees (
EmployeeID INT,
FirstName VARCHAR(255),
LastName VARCHAR(255),
Department VARCHAR(255)
);
Other Uses: CREATE INDEX, CREATE VIEW, CREATE SCHEMA [31] [33] [34] .
2. ALTER
Purpose: Modifies the structure of existing database objects.
Usage: Add, modify, or drop columns; rename objects; add or remove constraints.
Syntax:
ALTER TABLE table_name ADD column_name datatype;
Example:
ALTER TABLE Employees ADD Salary INT;
Other Uses: ALTER INDEX, ALTER VIEW [31] [33] [34] .
3. DROP
Purpose: Permanently deletes existing database objects (tables, views, indexes).
Usage: Removes the object definition together with its data.
Syntax:
DROP TABLE table_name;
Example:
DROP TABLE Employees;
Effect: Removes both structure and all data within the object [30] [31] .
4. TRUNCATE
Purpose: Removes all records from a table, but retains the table structure for future use.
Usage: Fast way to delete all rows without deleting the table itself.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE Employees;
Effect: Table remains, but is empty; structure and schema are preserved [31] [34] .
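The four commands can be exercised end to end; a sketch using SQLite via Python's sqlite3 module (SQLite has no TRUNCATE, so an unqualified DELETE stands in for it; the table and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE: define the table structure
conn.execute("CREATE TABLE Employees (EmployeeID INT, FirstName VARCHAR(255))")
# ALTER: add a column to the existing structure
conn.execute("ALTER TABLE Employees ADD Salary INT")
conn.execute("INSERT INTO Employees VALUES (1, 'Ada', 90000)")
# TRUNCATE-equivalent: remove every row but keep the structure
conn.execute("DELETE FROM Employees")
remaining = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(remaining)  # 0 -- the table still exists, but is empty
# DROP: remove the table structure (and any data) entirely
conn.execute("DROP TABLE Employees")
```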
Summary Table: DDL Commands
Command Purpose Example Syntax Effect
CREATE Create new object CREATE TABLE Students (...); Adds a new table or object
ALTER Modify existing object ALTER TABLE Students ADD Age INT; Changes structure of an object
DROP Delete object and data DROP TABLE Students; Removes structure and all data
TRUNCATE Remove all rows TRUNCATE TABLE Students; Empties table, keeps structure
In summary:
DDL constructs are the foundation of database structure in DBMS, allowing you to create,
modify, and remove tables and other objects, define relationships, and enforce data integrity
through constraints and indexes. Mastery of DDL is essential for effective database design and
administration.
⁂
Data Manipulation Language (DML) Constructs in DBMS
Data Manipulation Language (DML) is a subset of SQL used to manage and manipulate data
stored within database tables. DML enables users to perform operations such as inserting,
updating, deleting, and retrieving data, making it essential for day-to-day database
interactions [35] [36] [37] [38] [39] [40] .
1. SELECT
Purpose: Retrieves data from one or more tables.
Syntax:
SELECT column1, column2, ... FROM table_name;
Example:
SELECT Name, Department FROM Employees;
Notes: Can be used with clauses like WHERE, ORDER BY, GROUP BY, and JOIN to filter and
organize results [41] [36] [38] [39] [40] .
2. INSERT
Purpose: Adds new records (rows) to a table.
Syntax:
INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
Example:
INSERT INTO Employees (Name, Age, Department) VALUES ('Ankit Roy', 62, 'SEO');
Notes: Can insert single or multiple rows [35] [36] [39] [40] [42] .
3. UPDATE
Purpose: Modifies existing records in a table.
Syntax:
UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;
Example:
UPDATE Employees SET Department = 'HR' WHERE Name = 'Shriyansh Tiwari';
Notes: Use WHERE clause to specify which rows to update; omitting WHERE updates all
rows [36] [38] [39] [40] .
4. DELETE
Purpose: Removes records from a table.
Syntax:
DELETE FROM table_name WHERE condition;
Example:
DELETE FROM Employees WHERE Age > 60;
Notes: Use WHERE clause to avoid deleting all rows [36] [38] [39] [40] .
Summary Table: DML Commands
Command Purpose Example
SELECT Retrieve data SELECT * FROM Employees;
INSERT Add data INSERT INTO Employees (Name, Age, Department) VALUES ('Ankit', 62, 'SEO');
UPDATE Modify data UPDATE Employees SET Department = 'HR' WHERE Name = 'Shriyansh';
DELETE Remove data DELETE FROM Employees WHERE Age > 60;
Additional Notes
Transactions: DML operations can be grouped into transactions using COMMIT and
ROLLBACK for data integrity [35] [36] [38] .
MERGE Command: Some DBMSs support MERGE for conditional insert/update (upsert)
operations [43] .
DML vs. DDL: DML manipulates data; DDL (Data Definition Language) defines and modifies
table structures [37] [38] .
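The four DML commands can be exercised in sequence; a sketch using SQLite via Python's sqlite3 module, reusing the Employees examples from above (the second employee's age is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Age INT, Department TEXT)")
# INSERT: add rows
conn.executemany("INSERT INTO Employees (Name, Age, Department) VALUES (?, ?, ?)",
                 [("Ankit Roy", 62, "SEO"), ("Shriyansh Tiwari", 30, "IT")])
# UPDATE: only rows matching the WHERE clause change
conn.execute("UPDATE Employees SET Department = 'HR' WHERE Name = 'Shriyansh Tiwari'")
# DELETE: only rows matching the WHERE clause are removed
conn.execute("DELETE FROM Employees WHERE Age > 60")
# SELECT: read back what remains
rows = conn.execute("SELECT Name, Department FROM Employees").fetchall()
print(rows)  # [('Shriyansh Tiwari', 'HR')]
```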
In summary:
DML constructs are vital for interacting with and managing the actual data in a database.
Mastery of SELECT, INSERT, UPDATE, and DELETE commands is essential for effective
database usage and application development [35] [36] [38] [39] [40] .
⁂
Open Source vs. Commercial DBMS
Definitions
Open Source DBMS:
Software whose source code is freely available for anyone to view, modify, and distribute.
Examples: MySQL, PostgreSQL, MongoDB, Cassandra, Redis [44] [45] [46] [47] .
Commercial (Proprietary) DBMS:
Software developed by companies for commercial purposes, requiring a paid license for
use. Source code is closed and only accessible to authorized users. Examples: Oracle,
Microsoft SQL Server, IBM Db2, Snowflake [44] [45] [46] .
Key Differences
Aspect Open Source DBMS Commercial DBMS
Source Code Open and modifiable by anyone [44] [46] [47] Closed, only vendor can modify [44] [46]
Support Community-driven, sometimes limited [44] [46] Professional, vendor-backed support [44] [47] [46]
Security Transparent, but may have security risks if not managed [46] Strict controls, robust security features [46]
Commercial DBMS
Advantages:
Professional, dedicated support and service-level agreements [44] [46]
Advanced, enterprise-grade features (analytics, security, scalability) [48] [46]
Better suited for mission-critical and large-scale enterprise applications [44] [46]
Disadvantages:
Higher licensing and maintenance costs
Similarities
Both support SQL and standard database operations [48] .
Both can handle large volumes of data and complex data structures [46] .
Both are capable of supporting mission-critical applications [48] [46] .
Popular Examples (May 2025 Rankings)
Open Source DBMS Rank Commercial DBMS Rank
MySQL 2 Oracle 1
MongoDB 5 Snowflake 6
Trends
Open source DBMS have matured significantly and are now considered viable for many
mission-critical and enterprise applications.
In summary:
Open source DBMS offer cost savings, flexibility, and community support, making them ideal for
smaller organizations and those needing customization. Commercial DBMS provide advanced
features, professional support, and robust security, making them suitable for large enterprises
and mission-critical applications. The choice depends on organizational needs, budget, and
technical expertise [44] [46] [47] .
⁂
MySQL: Detailed Notes for DBMS
Introduction
MySQL is a widely used open-source relational database management system (RDBMS)
that implements the Structured Query Language (SQL) for managing and manipulating
data [49] .
It is available in two main editions: the open-source MySQL Community Server and the
proprietary MySQL Enterprise Server, which includes additional advanced features and
support [49] [50] .
Architecture of MySQL
MySQL follows a client-server architecture with three main layers [51] [52] [53] :
Layer Description
Client End Users interact with MySQL through command-line, GUI tools, or APIs (e.g., MySQL Workbench). Handles connection requests, authentication, and security [51] [52] [53] .
Server Layer The core of MySQL. Handles query parsing, optimization, caching, thread management, built-in functions, stored procedures, triggers, and views [51] [52] [53] .
Storage Engine Responsible for actual data storage and retrieval. MySQL uses a pluggable storage engine architecture, allowing different engines (e.g., InnoDB, MyISAM) to be used per table [51] [54] .
Key Points:
The pluggable storage engine architecture lets you select the best storage engine for
specific application needs (e.g., transactions, high availability, analytics) [54] .
The server layer provides a consistent API, shielding applications from the complexities of
the underlying storage engines [54] [53] .
Editions of MySQL
MySQL Community Edition: Free, open-source, widely used for web and small to medium-scale applications [49] .
MySQL Enterprise Edition: Paid, includes advanced security, backup, monitoring, and
support features [50] [49] [56] .
Use Cases
Advantages
Free and open-source (Community Edition)
High performance and scalability
Flexible storage engine options
Large community and ecosystem
Strong security features (especially in Enterprise Edition)
Easy integration with programming languages and tools
Summary Table: MySQL Key Points
Aspect Description
In summary:
MySQL is a robust, flexible, and widely adopted RDBMS known for its pluggable architecture,
strong security, and support for both transactional and analytical workloads. Its open-source
nature, combined with enterprise-grade features, makes it suitable for a broad range of
applications, from small websites to large-scale, mission-critical systems [49] [56] [54] .
⁂
Oracle Database: Detailed Notes for DBMS
Introduction
Oracle Database is a relational database management system (RDBMS) developed by
Oracle Corporation.
It is widely used in enterprise environments for its scalability, reliability, advanced features,
and support for both transactional (OLTP) and analytical (OLAP) workloads [58] [59] [60] .
1. Physical Structures
These are files stored on the server’s disk:
Control Files:
Store metadata about the database (database name, data file locations, redo log locations)
and are essential for startup and recovery [61] .
Data Files:
Store actual user and system data. Data files are organized into logical units called
tablespaces [61] .
Redo Log Files:
Record all changes made to the database, enabling recovery in case of system failure [61] .
2. Logical Structures
These provide an abstraction over the physical files:
Tablespaces:
Logical storage units made up of one or more data files. Used to organize and manage data
efficiently [61] .
Segments, Extents, and Blocks:
Segments represent database objects (tables, indexes).
Extents are collections of contiguous blocks.
Blocks are the smallest unit of data storage [61] .
Schemas:
Collections of database objects (tables, views, indexes) owned by a user, helping organize
and manage objects systematically [61] .
3. Oracle Instance Components
Memory Structures
Multitenant Architecture:
Supports pluggable databases (PDBs) within a single container database (CDB), enhancing
consolidation and cloud deployment [58] .
In-Memory Column Store:
Allows data to be stored in memory in a columnar format for faster analytics and
reporting.
High Availability:
Supports Real Application Clusters (RAC), Data Guard, and Flashback technologies for fault
tolerance and disaster recovery [64] .
Scalability:
Efficiently handles large-scale enterprise workloads and supports clustering across multiple
servers [64] .
Support for Object-Relational Features:
User-defined types, inheritance, and polymorphism [60] .
Comprehensive Analytics:
Built-in support for data warehousing, analytics, machine learning, and JSON data [58] [59] .
Backup and Recovery:
Advanced tools like Recovery Manager (RMAN) for backup, restore, and recovery [62] [63] .
Partitioning:
Enables management of large tables and indexes for performance optimization.
Advantages of Oracle Database
Enterprise-grade reliability and performance
Advanced security and compliance features
Extensive support for high availability and disaster recovery
Rich feature set for both OLTP and OLAP workloads
Scalable to support large, mission-critical applications
Summary Table: Oracle Database Architecture
Component Description
Control Files Metadata, essential for startup and recovery
SGA Shared memory: buffer cache, shared pool, redo log buffer, etc.
Background Processes DB Writer, Log Writer, System Monitor, Process Monitor, etc.
High Availability RAC, Data Guard, Flashback, backup and recovery tools
In summary:
Oracle Database is a robust, enterprise-class RDBMS with a sophisticated architecture, high
availability, advanced security, and comprehensive analytics capabilities. Its modular design,
support for both transactional and analytical workloads, and extensive feature set make it a
leading choice for mission-critical applications in large organizations [58] [59] [61] [60] [64] .
⁂
DB2 in Database Management Systems: Detailed Notes
Introduction
IBM Db2 (Database 2) is a family of data management products, including relational
database management systems (RDBMS), developed by IBM.
Db2 is widely used in enterprise environments, especially on mainframes, but is also
available for distributed systems and cloud platforms.
It is designed for high performance, scalability, reliability, and robust data management.
1. Client-Server Model
Clients: Applications (local or remote) interact with the Db2 server via the Db2 client library.
Local clients use shared memory and semaphores for communication.
Remote clients use protocols like TCP/IP or named pipes [65] [66] .
3. Buffer Pools
Buffer pools are memory areas where pages of user data, indexes, and catalog data are
temporarily stored and modified.
They are crucial for performance, as accessing data from memory is much faster than from
disk.
Prefetchers and page cleaners optimize buffer pool usage by managing data flow between
disk and memory [65] [66] .
4. Table Spaces and Storage
Table Spaces: Logical storage units that map to physical storage on disk.
Types: Regular table spaces (user data), index table spaces (indexes), LOB table
spaces (large objects) [67] .
Pages: The smallest unit of storage, typically 4 KB or 32 KB in size.
Storage Groups: Collections of disk volumes used to manage where data is physically
stored [67] .
5. Indexes
Improve data retrieval speed by allowing quick lookups.
Stored in separate index table spaces.
Can be created on one or more columns to optimize SELECT, UPDATE, DELETE, and MERGE
operations [67] .
Active and Archive Logs: Db2 maintains multiple copies for redundancy and disaster
recovery [67] .
Supports backup and restore operations, ensuring data durability.
Locking Services (IRLM): Manage concurrent data access and resolve deadlocks, ensuring
data consistency and isolation [67] .
Component Description
Database Services (DBAS) Handles SQL execution, data manipulation, buffer management, and core DBMS logic
Locking Services (IRLM) Manages locks and resolves deadlocks for concurrent access
Buffer Manager Manages buffer pools and coordinates data movement between disk and memory
Relational Data System Checks authorization, parses and optimizes SQL, creates access paths
Db2 pureScale Feature
Db2 pureScale is designed for high availability and scalability in clustered environments.
Multiple Db2 members (nodes) process requests in parallel, sharing access to the same
database on shared disk.
Supports up to 128 members, each with its own buffer pools and log files, enabling
continuous availability and workload balancing [68] .
Analytics: Supports data warehousing and analytics workloads [69] .
Cross-Platform: Available on mainframes (z/OS), Linux, UNIX, Windows, and cloud.
In summary:
IBM Db2 is a robust, enterprise-grade RDBMS known for its modular architecture, high
performance, reliability, and advanced features for both OLTP and analytical workloads. Its
architecture, with components like EDUs, buffer pools, and pureScale, ensures efficient data
processing, scalability, and continuous availability, making it a preferred choice for mission-
critical applications [65] [67] [68] [66] .
⁂
Microsoft SQL Server: Detailed Notes for DBMS
Introduction
Microsoft SQL Server is a widely used relational database management system (RDBMS)
developed by Microsoft.
It supports storing, retrieving, and managing data for enterprise applications, with advanced
features for security, scalability, and high availability.
Component Function
Protocol Layer Manages communication between clients and the SQL Server instance.
Relational Engine Processes queries, optimizes execution plans, and handles transactions and security.
Storage Engine Manages physical storage, retrieval, and manipulation of data on disk.
SQLOS Provides operating system-like services (memory, scheduling, I/O) for SQL Server processes.
1. Protocol Layer
Role: Handles all client-server communication.
Supported Protocols:
Shared Memory: For local connections.
TCP/IP: For remote and networked connections.
Named Pipes: For LAN environments.
Tabular Data Stream (TDS): The protocol for data transfer between client and
server [70] [71] [72] .
Function: Receives requests from clients, packages them, and passes them to the
Relational Engine.
2. Relational Engine (Query Processor)
Role: Responsible for query processing, optimization, and execution.
Key Components:
Query Parser: Checks syntax and translates T-SQL statements into internal
representations.
Query Optimizer: Generates the most efficient execution plan based on statistics and
indexes.
Query Executor: Executes the plan, interacting with the Storage Engine as needed [73]
[70] [71] [72] [74] .
Functions:
Processes DDL, DML, and other SQL statements.
Manages transactions, security, and user permissions.
Formats results for client applications.
3. Storage Engine
Role: Handles actual storage and retrieval of data from disk.
Responsibilities:
Reads/writes data pages to/from disk.
Components:
Data File Architecture: Organizes data into files and filegroups for efficient access.
Log File Architecture: Maintains transaction logs for recovery and consistency.
Buffer Pool: Caches frequently accessed data for performance.
Lock Manager: Controls concurrent access and resolves deadlocks.
File Architecture
Data Files: Store actual table data and indexes; grouped into filegroups for manageability.
Log Files: Store transaction logs for recovery and rollback.
Filegroups: Logical groupings of data files to optimize performance and management [75]
[72] .
Clustered: Multiple servers for high availability.
Mirrored: Redundant copy of the database for disaster recovery.
AlwaysOn Availability Groups: Synchronized databases across servers for high availability
and disaster recovery [75] .
Layer/Component Functionality
Protocol Layer Client-server communication (Shared Memory, TCP/IP, Named Pipes, TDS)
Relational Engine Query parsing, optimization, execution, transaction and security management
Storage Engine Data storage/retrieval, index management, logging, locking, ACID compliance
In summary:
SQL Server is a robust, enterprise-grade RDBMS with a multi-layered architecture that
separates client communication, query processing, and data storage. Its advanced features for
security, scalability, and high availability make it a popular choice for mission-critical applications
in organizations of all sizes [73] [70] [75] [71] [72] [74] .
⁂
What is a Domain?
A domain in DBMS refers to the set of all possible, valid values that an attribute (column) of
a table can have [76] [77] .
Each attribute in a database schema is associated with a domain, which defines its data
type, possible range, length, and other constraints [76] [78] [77] .
Examples of domains:
For an attribute Age, the domain could be all integers from 0 to 120.
For an attribute Email, the domain could be all valid email addresses.
Domain Constraints
Domain constraints (or domain integrity constraints) are rules that restrict the type of data
that can be stored in a column, ensuring data consistency and correctness [76] [78] .
These constraints specify:
The data type (e.g., integer, character, date)
The format or pattern (e.g., phone number format)
The range or set of allowed values (e.g., gender can be 'M' or 'F')
Whether NULL values are allowed
Types of Domain Constraints:
NOT NULL: Ensures that a column cannot have NULL (missing) values [76] [78] .
CHECK: Restricts the values of a column to satisfy a specific condition (e.g., salary > 0) [76]
[78] .
Custom Domains: Can be created using SQL’s CREATE DOMAIN command to define reusable
data types with specific constraints [76] .
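Domain constraints can be seen rejecting out-of-domain values in any engine that enforces CHECK; a sketch using SQLite via Python's sqlite3 module (the Person table and its columns are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL and CHECK together enforce each column's domain.
conn.execute("""
    CREATE TABLE Person (
        Age INT NOT NULL CHECK (Age BETWEEN 0 AND 120),
        Gender TEXT CHECK (Gender IN ('M', 'F'))
    )
""")
conn.execute("INSERT INTO Person VALUES (30, 'F')")       # within both domains
try:
    conn.execute("INSERT INTO Person VALUES (150, 'M')")  # Age outside 0-120
    rejected = False
except sqlite3.IntegrityError:
    rejected = True                                       # the DBMS refuses the row
print(rejected)  # True
```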
Why is Domain Dependency Important?
Data Integrity: Ensures only valid data is stored, preventing errors and inconsistencies [78] .
Data Validation: Automatically checks and rejects invalid data entries.
Concept Description
Domain Constraint Rule that enforces attribute values to belong to their domain
Domain Dependency Attribute’s values must always come from its defined domain
In summary:
Domain dependency in DBMS enforces that each attribute’s values must conform to its defined
domain, ensuring data integrity and consistency by restricting invalid or unexpected data from
being stored in the database [76] [78] [77] .
⁂
What is Data Dependency?
Data dependency in DBMS refers to a relationship between attributes (columns) in a
database table where the value of one or more attributes uniquely determines the value of
one or more other attributes [79] [80] [81] .
In simple terms, some data values are dependent on other data values to be recognized or
derived [81] .
Data dependencies are foundational to database design, normalization, and ensuring data
integrity.
Type Description Example
Functional Dependency One attribute (or set) uniquely determines another attribute (or set). EmpID → EmpName: Knowing EmpID gives you EmpName [79] [82] [80] [84]
Multivalued Dependency One attribute in a table implies multiple independent values of another attribute. If a person has multiple phone numbers and email addresses, both are independent [79] [80]
Functional Dependency (FD): The Core Concept
Definition: If two tuples (rows) in a relation have the same value for attribute(s) X, they
must have the same value for attribute(s) Y. Written as X → Y [82] [80] [84] .
Example: In an Employee table, EmpID → EmpName means each EmpID is associated with only
one EmpName.
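Whether a given relation instance satisfies an FD can be checked mechanically: group rows by their X-values and verify each group agrees on the Y-values. A sketch (the sample rows and names are made up):

```python
def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows agreeing on lhs also agrees on rhs (X → Y)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same X-values, different Y-values: FD violated
    return True

emp = [{"EmpID": 1, "EmpName": "Asha"},
       {"EmpID": 2, "EmpName": "Ben"},
       {"EmpID": 1, "EmpName": "Asha"}]           # duplicate row is fine
assert satisfies_fd(emp, ["EmpID"], ["EmpName"])  # EmpID → EmpName holds

bad = emp + [{"EmpID": 2, "EmpName": "Bella"}]    # same EmpID, different name
assert not satisfies_fd(bad, ["EmpID"], ["EmpName"])
```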
In Summary
Data dependency in DBMS is the relationship where the value of one attribute (or set)
determines the value of another. Understanding and applying data dependencies is crucial for
designing efficient, normalized, and reliable databases, ensuring data integrity, reducing
redundancy, and optimizing queries [79] [82] [80] [81] .
⁂
Armstrong's Axioms in DBMS
Introduction
Armstrong's axioms are a set of inference rules introduced by William W. Armstrong in 1974 for
reasoning about functional dependencies in relational databases [85] [86] . They provide a formal
and systematic way to deduce all possible functional dependencies from a given set, playing a
crucial role in database design, normalization, and ensuring data integrity [85] [87] [88] .
Significance
Soundness: Only valid (true) dependencies are derived.
Completeness: All possible dependencies implied by the given set can be derived using
these rules [85] [87] [88] .
Application: Used in normalization, finding minimal covers, and ensuring efficient and
consistent database schemas [85] [87] .
Primary Armstrong's Axioms (RAT)
There are three fundamental (primary) axioms, often abbreviated as RAT: Reflexivity,
Augmentation, and Transitivity [87] [88] [86] .
1. Reflexivity
Rule: If Y is a subset of X, then X → Y.
Explanation: Any set of attributes functionally determines its own subset.
Example: If X = {A, B}, then {A, B} → {A} holds [88] [86] .
2. Augmentation
Rule: If X → Y, then XZ → YZ for any set of attributes Z.
Explanation: Adding the same attributes to both sides of a dependency does not change
the dependency.
Example: If A → B, then AC → BC [88] [86] .
3. Transitivity
Rule: If X → Y and Y → Z, then X → Z.
Explanation: Dependencies are transitive; if X determines Y and Y determines Z, then X
determines Z.
Example: If A → B and B → C, then A → C [88] [86] .
Secondary (Derived) Rules
These rules can be derived from the primary axioms and are often used to simplify the process
of finding all functional dependencies [86] :
Union: If X → Y and X → Z, then X → YZ.
Decomposition: If X → YZ, then X → Y and X → Z.
Pseudotransitivity: If X → Y and WY → Z, then WX → Z.
Advantages
Simplicity: Clear and easy-to-apply rules [87] .
Efficiency: Streamlines the process of dependency analysis and normalization.
Minimality: Helps in finding minimal covers and reducing redundancy.
Limitations
Complexity for Large Sets: Can be computationally intensive for large schemas [87] .
Scope: Primarily addresses deterministic (not probabilistic or uncertain) dependencies.
Summary Table: Armstrong's Axioms
Axiom Rule Example
Reflexivity If Y ⊆ X, then X → Y {A, B} → {A}
Augmentation If X → Y, then XZ → YZ A → B ⇒ AC → BC
Transitivity If X → Y and Y → Z, then X → Z A → B, B → C ⇒ A → C
In summary:
Armstrong's axioms are foundational inference rules in DBMS for deducing all functional
dependencies in a relational schema. They ensure data integrity, guide normalization, and
provide a systematic approach to database design by being both sound and complete [85] [87]
[88] [86] .
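Repeatedly applying the axioms yields the attribute closure X⁺, the standard way to compute everything a set of attributes determines. A sketch of that fixed-point loop:

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Whenever lhs is already inside the closure, the axioms
            # (reflexivity, augmentation, transitivity) justify adding rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
# Transitivity: A → B and B → C imply A → C, so C lands in {A}+.
assert closure({"A"}, fds) == {"A", "B", "C"}
```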
Normal Forms in DBMS
First Normal Form (1NF)
Definition:
A table is in 1NF if all attribute values are atomic (indivisible) and each row is unique.
No repeating groups or arrays are allowed in any row [92] [89] [95] [93] [94] .
Example Violation:
A cell containing “Math, Science” instead of separate rows for each subject.
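The 1NF fix is to split the repeating group into one atomic value per row; a sketch (the student name is made up):

```python
# One unnormalized row holds a repeating group in a single cell.
unnormalized = [("Riya", "Math, Science")]

# 1NF: one atomic subject value per row.
first_nf = [(name, subject.strip())
            for name, subjects in unnormalized
            for subject in subjects.split(",")]
print(first_nf)  # [('Riya', 'Math'), ('Riya', 'Science')]
```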
Boyce-Codd Normal Form (BCNF)
Definition:
A table is in BCNF if:
It is already in 3NF.
For every functional dependency (A → B), A is a superkey [89] [95] [93] [94] [96] .
Key Point:
BCNF is a stricter version of 3NF, resolving certain anomalies not handled by 3NF.
Fourth Normal Form (4NF)
Definition:
A table is in 4NF if:
It is in BCNF.
It has no multi-valued dependencies (MVDs) [89] [95] [93] [94] .
Key Point:
Addresses situations where one attribute is associated with multiple independent values of
other attributes, leading to redundancy.
Summary Table: Normal Forms
Form Requirements Purpose
1NF Atomic values, unique rows, no repeating groups Eliminates repeating groups
Why Normalize?
Reduces data redundancy
Prevents update, insert, and delete anomalies
Improves data integrity and consistency
In summary:
Normal forms are essential rules in DBMS that guide the structuring of tables to minimize
redundancy and ensure data integrity. The most common forms used in practice are 1NF, 2NF,
and 3NF, with BCNF, 4NF, and 5NF addressing more complex scenarios [92] [89] [95] [94] .
⁂
Formal Definition
Let:
$ R $ be a relation schema,
$ F $ be the set of functional dependencies on $ R $,
$ R $ is decomposed into $ R_1, R_2, ..., R_n $ with respective sets of functional
dependencies $ F_1, F_2, ..., F_n $ (where each $ F_i $ is the set of dependencies that can
be enforced on $ R_i $ alone).
The decomposition is dependency preserving if:
$ (F_1 \cup F_2 \cup ... \cup F_n)^+ = F^+ $
That is, the closure of the union of the projected dependencies is equivalent to the closure of the
original set of dependencies [97] [98] [99] .
How to Check Dependency Preservation
1. Project each functional dependency in $ F $ onto the decomposed relations ($ R_1, R_2, ...,
R_n $), forming $ F_1, F_2, ..., F_n $.
2. Take the union $ F' = F_1 \cup F_2 \cup ... \cup F_n $.
3. Compute the closure $ F'^+ $ and compare it to the closure of the original set $ F^+ $.
4. If $ F'^+ = F^+ $, the decomposition is dependency preserving [97] [98] [99] .
Example
Suppose $ R(A, B, C, D) $ with $ F = {A \rightarrow B, A \rightarrow C, C \rightarrow D} $.
Decompose $ R $ into:
$ R_1(A, B, C) $ with $ F_1 = {A \rightarrow B, A \rightarrow C} $
$ R_2(C, D) $ with $ F_2 = {C \rightarrow D} $
Union: $ F' = F_1 \cup F_2 = {A \rightarrow B, A \rightarrow C, C \rightarrow D} $
$ F'^+ = F^+ $ (all original dependencies can be enforced locally), so the decomposition is
dependency preserving [97] .
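The check in steps 1-4 can be automated with an attribute-closure routine: two FD sets have equal closures iff each FD in one is implied by the other. A sketch using the example above, with FDs written as (lhs, rhs) strings of attribute letters:

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implied(fds, lhs, rhs):
    # X → Y is implied by a set of FDs iff Y lies inside the closure of X.
    return set(rhs) <= closure(lhs, fds)

def preserves(f_original, f_union):
    # F'+ = F+ iff every FD of each set is implied by the other set.
    return (all(implied(f_union, l, r) for l, r in f_original) and
            all(implied(f_original, l, r) for l, r in f_union))

F  = [("A", "B"), ("A", "C"), ("C", "D")]   # F on R(A, B, C, D)
F1 = [("A", "B"), ("A", "C")]               # projected onto R1(A, B, C)
F2 = [("C", "D")]                           # projected onto R2(C, D)
assert preserves(F, F1 + F2)                # decomposition is dependency preserving
```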
Summary Table: Dependency Preservation
Aspect Description
Definition All original FDs can be enforced on decomposed tables without joining
Formal Condition $ (F_1 \cup F_2 \cup ... \cup F_n)^+ = F^+ $
Relation to Normalization 3NF always allows dependency preservation with lossless join
In summary:
Dependency preservation in DBMS ensures that all functional dependencies of the original
relation can be enforced locally in the decomposed tables, avoiding the need to join tables to
check constraints. This property is crucial for efficient and reliable database design, especially
during normalization [97] [98] [99] .
⁂
Scalability: Makes it easier to modify or expand the schema without risking data loss [102] .
What is Lossless Join Decomposition?
Definition: A decomposition of a relation $ R $ into $ R_1, R_2, ..., R_n $ is lossless if, by
performing a natural join on all $ R_i $, the original relation $ R $ is obtained exactly (no
spurious tuples, no missing data) [102] [103] [104] [105] .
Conditions for a Lossless Join Decomposition
1. Attribute Coverage:
All attributes of the original relation must appear in the union of the decomposed relations.
2. Non-Null Intersection:
The intersection of attributes between any two decomposed relations must not be empty.
3. Key Condition:
The common attribute(s) in the intersection must be a candidate key (or superkey) for at
least one of the decomposed relations [104] [106] .
This ensures that the join does not introduce spurious tuples.
Example
Suppose $ R(A, B, C, D) $ with functional dependency $ A \rightarrow BC $:
Decompose into $ R_1(A, B, C) $ and $ R_2(A, D) $
Attribute coverage: $ (A, B, C) \cup (A, D) = (A, B, C, D) $
Intersection: $ (A, B, C) \cap (A, D) = A $
$ A $ is a key for $ R_1(A, B, C) $ because $ A \rightarrow BC $
at
Conclusion: This is a lossless join decomposition [106] .
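For a binary decomposition, the key condition can be tested with an attribute closure: the split is lossless iff the common attributes determine all of R1 or all of R2. A sketch using the example above, with relations and FDs written as strings of attribute letters:

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless(r1, r2, fds):
    common = set(r1) & set(r2)
    c = closure(common, fds)
    # Lossless iff the shared attributes determine all of R1 or all of R2.
    return set(r1) <= c or set(r2) <= c

fds = [("A", "BC")]                   # A → BC
assert lossless("ABC", "AD", fds)     # intersection {A} is a key of R1(A, B, C)
assert not lossless("AB", "CD", fds)  # no common attribute: the join is a product
```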
Properties of a Good Decomposition
1. Lossless Join: No information is lost; original relation can be reconstructed [102] [103] [104]
[107] [105] .
2. Dependency Preservation: All functional dependencies are preserved and can be enforced
without joining tables [103] [107] [106] [108] .
3. Lack of Redundancy: Reduces unnecessary duplication of data [103] .
Note: Lossless join and dependency preservation are independent properties; one does not
guarantee the other [108] .
Summary Table: Lossless Design
Property Description
Lossless Join Original relation can be exactly reconstructed from decomposed tables
Attribute Coverage All original attributes must appear in the union of decomposed tables
Non-Null Intersection Decomposed tables must share at least one common attribute
Key Condition Common attribute(s) must be a key for at least one decomposed table
Conclusion
Lossless design is essential for effective relational database normalization. It ensures that
decomposing a table into smaller relations does not lose or distort information, preserving the
ability to reconstruct the original data set exactly. This property, along with dependency
preservation, forms the foundation of robust, efficient, and reliable database schemas [102] [103]
[104] [107] [105] [106] [108] .
Overview
The evaluation of relational algebra expressions is a crucial step in query processing within a
Database Management System (DBMS). Relational algebra expressions, built from a sequence
of operations (such as selection, projection, join, etc.), are used to represent queries. Efficient
evaluation of these expressions ensures optimal query performance and resource utilization.
Evaluation Strategies
There are two primary strategies for evaluating relational algebra expressions:
1. Materialized Evaluation
Process:
Each operation in the expression is evaluated one at a time, typically in a bottom-up
manner.
The result of each operation is stored in a temporary relation (often written to disk).
These intermediate results are then used as inputs for subsequent operations.
Example:
Compute $ A \bowtie B $, store result in a temporary file.
Compute $ C \bowtie D $, store result in another temporary file.
Join the temporary results as required.
Advantages:
Simplicity and modularity.
Disadvantages:
High I/O cost due to frequent writing and reading of intermediate results from disk [109]
[110] .
2. Pipelined Evaluation
Process:
Multiple operations are evaluated simultaneously.
The output of one operation is passed directly as input to the next, without storing
intermediate results on disk.
Evaluation still proceeds bottom-up, but intermediate results are kept in memory as
much as possible.
Advantages:
Reduces disk I/O and storage overhead [109] [110] .
Faster overall query execution.
Disadvantages:
More complex implementation.
May be limited by available memory.
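The two strategies can be contrasted in a small Python sketch, where lists play the role of materialized temporary relations and generators play the role of an in-memory pipeline; the relation and operator names are illustrative assumptions:

```python
# Materialized vs. pipelined evaluation over a toy relation (list of dicts).
# Lists model on-disk temporaries; generators model an in-memory pipeline.

students = [
    {"name": "Asha", "gender": "F", "course": "BCA"},
    {"name": "Ravi", "gender": "M", "course": "BCA"},
    {"name": "Meena", "gender": "F", "course": "MCA"},
]

# Materialized: each operator builds its full result before the next one runs.
def select_materialized(rel, pred):
    return [t for t in rel if pred(t)]                 # whole temp relation

def project_materialized(rel, cols):
    return [{c: t[c] for c in cols} for t in rel]      # another temp relation

tmp = select_materialized(students, lambda t: t["gender"] == "F")
print(project_materialized(tmp, ["name"]))             # [{'name': 'Asha'}, {'name': 'Meena'}]

# Pipelined: generators hand one tuple at a time to the next operator.
def select_pipelined(rel, pred):
    return (t for t in rel if pred(t))

def project_pipelined(rel, cols):
    return ({c: t[c] for c in cols} for t in rel)

pipeline = project_pipelined(select_pipelined(students, lambda t: t["gender"] == "F"), ["name"])
print(list(pipeline))                                  # same result, no temp relation
```

Both plans return the same tuples; the difference is only in whether intermediate results are stored whole or streamed tuple by tuple.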
Evaluation Plans
Definition:
An evaluation plan is a detailed, step-by-step blueprint specifying the order and method of
executing each operation in a relational algebra expression.
Types:
Logical Plan: Specifies the sequence of relational algebra operations.
Physical Plan: Specifies the algorithms and access methods (e.g., index scan, hash
join) used for each operation [111] .
Optimization:
The DBMS may generate multiple equivalent evaluation plans for the same query and
choose the most efficient one based on estimated costs (using statistics like relation size,
tuple size, index availability, etc.) [111] .
Steps in Evaluation
1. Decomposition:
SQL queries are decomposed into query blocks, each translated into a relational
algebra expression [109] [112] .
2. Translation:
Each query block is converted into an equivalent relational algebra expression.
3. Optimization:
The DBMS considers equivalent expressions and chooses an evaluation plan with the
lowest estimated cost [111] [113] .
Heuristic rules (e.g., perform selection and projection early, replace Cartesian product
and selection with join) are applied to improve efficiency [113] .
4. Execution:
The chosen plan is executed using either materialized or pipelined evaluation, or a
combination of both.
Example
Suppose you have a query to select the names of all female students in the BCA course:
Relational algebra expression: $ \pi_{name}(\sigma_{gender='F' \wedge course='BCA'}(Student)) $
Evaluation:
Apply the selection first to reduce the number of tuples.
Then apply the projection to get only the required column [112] .
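The "selection early" heuristic can be illustrated with a toy Python sketch (all names and sizes are made up for the illustration): pushing a selection below a join shrinks the intermediate result while leaving the answer unchanged.

```python
# Why "apply selection early" helps: filtering before a join reduces the
# number of tuples that flow through the rest of the plan. Toy data only.

students = [{"sid": i, "gender": "F" if i % 2 else "M", "course_id": i % 10}
            for i in range(100)]
courses = [{"course_id": c, "cname": "C%d" % c} for c in range(10)]

# Plan A: join first, filter after -> 100 joined tuples reach the filter.
joined = [{**s, **c} for s in students for c in courses
          if s["course_id"] == c["course_id"]]
plan_a = sorted(r["sid"] for r in joined if r["gender"] == "F")

# Plan B: filter first, join after -> only 50 students reach the join.
females = [s for s in students if s["gender"] == "F"]
plan_b = sorted(s["sid"] for s in females for c in courses
                if s["course_id"] == c["course_id"])

assert plan_a == plan_b                  # identical result
print(len(joined), len(females))         # 100 50: Plan B's intermediate is half the size
```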
Summary Table: Evaluation Strategies
Strategy Intermediate Results Disk I/O Speed Complexity
Materialized Written to temporary relations on disk High Slower Simple
Pipelined Passed directly between operations in memory Low Faster More complex
In summary:
The evaluation of relational algebra expressions involves translating queries into algebraic
operations, optimizing their execution order, and choosing between materialized and pipelined
strategies. Efficient evaluation is key to high-performance query processing in DBMS [109] [110]
[111] [113] [112] .
Definition
Query equivalence refers to the situation where two relational algebra expressions (or
queries) produce the same result set (i.e., the same set of tuples) for every possible legal
database instance; that is, any database state that satisfies all integrity constraints [114] [115]
[116] .
The order of tuples may differ, but as long as the content (set of tuples) is identical, the
queries are considered equivalent [115] .
Equivalence Rules
1. Cascade of Selection:
$ \sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E)) $
2. Commutativity of Selection:
$ \sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E)) $
3. Cascade of Projection:
$ \pi_{L_1}(\pi_{L_2}(E)) = \pi_{L_1}(E) $ where $ L_1 \subseteq L_2 $
4. Set Operations
Union and intersection are commutative: $ E_1 \cup E_2 = E_2 \cup E_1 $, $ E_1 \cap E_2 = E_2 \cap E_1 $
5. Distribution
Selection distributes over Union, Intersection, and Set Difference:
$ \sigma_{\theta}(E_1 \cup E_2) = \sigma_{\theta}(E_1) \cup \sigma_{\theta}(E_2) $
6. Join Properties
Theta-join is commutative: $ E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1 $
Natural-join is associative: $ (E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3) $
[117]
Examples of Query Equivalence
Example 1:
$ \sigma_{gender='F'}(\sigma_{course='BCA'}(Student)) $ is equivalent to $ \sigma_{course='BCA' \wedge gender='F'}(Student) $
Both yield the names of female students in the BCA course, regardless of the order of
selection conditions [118] .
Example 2:
$ \sigma_{\theta}(E_1 \cup E_2) = \sigma_{\theta}(E_1) \cup \sigma_{\theta}(E_2) $
Applying selection before or after union produces the same result [117] .
Summary Table: Common Equivalence Rules
Operation Equivalence Rule
Selection Cascade $ \sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E)) $
Selection Commutative $ \sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E)) $
Projection Cascade $ \pi_{L_1}(\pi_{L_2}(E)) = \pi_{L_1}(E) $ where $ L_1 \subseteq L_2 $
Selection-Join $ \sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2 $
Set Commutativity $ E_1 \cup E_2 = E_2 \cup E_1 $
Distribution $ \sigma_{\theta}(E_1 \cup E_2) = \sigma_{\theta}(E_1) \cup \sigma_{\theta}(E_2) $
In summary:
Query equivalence in DBMS means two queries produce the same result for all legal database
instances. It is essential for query optimization, enabling the DBMS to transform and tune queries
for better performance using a set of well-defined equivalence rules [114] [115] [116] [117] .
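Two of these rules can be checked empirically in a Python sketch, treating relations as sets of tuples; the Student data and predicate names are illustrative:

```python
# Empirically checking two equivalence rules on a toy relation. Relational
# results are sets of tuples, so set comparison is the right equality test.

Student = {("Asha", "F", "BCA"), ("Ravi", "M", "BCA"), ("Meena", "F", "MCA")}

def select(rel, pred):
    return {t for t in rel if pred(t)}

p1 = lambda t: t[1] == "F"      # gender = 'F'
p2 = lambda t: t[2] == "BCA"    # course = 'BCA'

# Cascade of selection: sigma_{p1 AND p2}(R) == sigma_p1(sigma_p2(R))
lhs = select(Student, lambda t: p1(t) and p2(t))
rhs = select(select(Student, p2), p1)
assert lhs == rhs == {("Asha", "F", "BCA")}

# Commutativity of selection: the order of the two selections is irrelevant.
assert select(select(Student, p1), p2) == select(select(Student, p2), p1)
print("both equivalences hold on this instance")
```

Passing on one instance is evidence, not proof; the rules themselves are proven for all legal database instances.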
⁂
Join Strategies in Database Management Systems (DBMS)
Introduction
Joins are fundamental operations in relational databases, enabling the combination of data from
multiple tables based on related columns. Efficient join strategies are crucial for query
performance, especially with large datasets or complex queries.
Types of Joins
Join Type Description Use Case Example
Inner Join Returns rows with matching values in both tables; unmatched rows are excluded Customers who have placed orders
Right Outer Join Returns all rows from the right table, and matching rows from the left table; unmatched left rows are NULL All products and their sales, even if unsold
Full Outer Join Returns all rows from both tables; unmatched rows are filled with NULLs Complete view of employees and projects
Cross Join Returns the Cartesian product of two tables (all row combinations) Generating all customer-product pairs
Use Case: Joining a large fact table with a dimension table with indexed keys. [123]
Factor Impact
Table Sizes Use nested loop for small tables, hash join for large tables with small build input.
Index Availability Use index join if the inner table has a suitable index.
Sorting Use merge join if both tables are sorted on the join key.
Join Condition Hash and merge joins are for equality; nested loop can handle any condition.
Memory Constraints Hash joins require more memory; nested loop and merge join use less.
Data Distribution In distributed systems, consider broadcast or shuffle joins for performance.
Algorithm Best For Memory Use Join Condition Type Notes
Nested Loop Small/filtered tables Low Any Simple, flexible
In summary:
Join strategies in DBMS include various join types (inner, outer, cross, self) and execution
algorithms (nested loop, hash, merge, index join). The optimal strategy depends on table sizes,
indexes, sorting, memory, and data distribution. Understanding and applying the right join
strategy is key to writing efficient and scalable database queries.
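As a rough sketch of how two of these algorithms differ, assuming toy relations as Python lists of tuples (all names are illustrative):

```python
# Nested loop vs. hash join on toy relations. The hash join requires an
# equality condition; the nested loop accepts any predicate.

customers = [(1, "Asha"), (2, "Ravi"), (3, "Meena")]
orders = [(101, 1), (102, 3), (103, 1)]            # (order_id, customer_id)

def nested_loop_join(outer, inner, cond):
    # O(len(outer) * len(inner)) comparisons; works for any join condition.
    return [(o, i) for o in outer for i in inner if cond(o, i)]

def hash_join(build, probe, build_key, probe_key):
    # Build a hash table on one input (ideally the smaller), then probe it.
    table = {}
    for row in build:
        table.setdefault(build_key(row), []).append(row)
    return [(b, p) for p in probe for b in table.get(probe_key(p), [])]

nl = nested_loop_join(customers, orders, lambda c, o: c[0] == o[1])
hj = hash_join(customers, orders, lambda c: c[0], lambda o: o[1])
assert sorted(nl) == sorted(hj)        # same result, different cost profile
print(len(hj))                         # 3 matching customer-order pairs
```

The build-side hash table is why hash joins need more memory, and why the smaller input is usually chosen as the build side.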
⁂
2. Cost-Based Optimization
How it works: Generates multiple candidate execution plans, estimates each plan's cost using
statistics (table size, index selectivity, etc.), and picks the plan with the lowest estimated
cost.
Pros: Produces more efficient plans for complex queries, especially with large or multiple
tables.
Cons: Optimization itself is slower, since many plans must be enumerated and costed [126] .
3. Heuristic-Based Optimization
How it works: Applies practical guidelines (heuristics) such as pushing selections and
projections as close to the data source as possible, or avoiding cross joins.
Pros: Quick, effective for routine queries.
Cons: May not find the globally optimal plan for complex queries [125] [127] .
Selection Pushdown: Move selection operations as close to the data source as possible to
reduce intermediate result size [127] .
Projection Pushdown: Eliminate unnecessary columns early to minimize data transfer.
Join Order Optimization: Reorder joins to minimize the size of intermediate results and
exploit indexes [127] [129] .
Join Algorithm Selection: Choose the best join method (nested loop, hash join, merge join) for each join.
Approach How It Works Best For Notes
Rule-Based Fixed rules/heuristics Simple, routine queries Selection/projection pushdown
Dynamic Programming Systematic plan search Join order optimization Left-deep/right-deep trees
In summary:
Query optimization algorithms in DBMS include rule-based, cost-based, heuristic, and adaptive
approaches. They transform and evaluate multiple execution plans using relational algebra,
statistics, and equivalence rules to select the most efficient plan. Advanced techniques like
adaptive optimization and AI-driven tuning are shaping the future of high-performance, self-
managing databases [125] [127] [126] [129] .
⁂
1. https://www.tutorialspoint.com/dbms/relational_algebra.htm
2. https://en.wikipedia.org/wiki/Relational_algebra
3. https://bito.ai/resources/relational-algebra-in-dbms/
4. https://byjus.com/gate/relational-algebra-in-dbms-notes/
5. https://mrcet.com/downloads/digital_notes/IT/Database Management Systems.pdf
6. https://www.boardinfinity.com/blog/relational-algebra-in-dbms/
7. https://www.cbcb.umd.edu/confcour/Spring2014/CMSC424/Relational_algebra.pdf
8. https://studyglance.in/dbms/display.php?tno=18&topic=Tuple-Relational-Calculus-in-DBMS
9. https://www.scaler.com/topics/dbms/relational-calculus-in-dbms/
10. https://www.reddit.com/r/askscience/comments/84d9an/what_is_the_difference_between_relational_algebra/
11. https://www.csbio.unc.edu/mcmillan/Media/Comp521F14Lecture04.pdf
12. https://herovired.com/learning-hub/topics/relational-calculus-in-dbms/
13. https://en.wikipedia.org/wiki/Tuple_relational_calculus
14. https://lkouniv.ac.in/site/writereaddata/siteContent/202004021910159071chandrabhan_DBMS_Relational_model_and_Relational_Algebra.pdf
15. https://www.tutorialspoint.com/domain-relational-calculus-in-dbms
16. https://www.scaler.com/topics/dbms/relational-calculus-in-dbms/
17. https://binaryterms.com/domain-relational-calculus.html
18. https://www.studocu.com/in/document/university-of-delhi/database-management-system/dbms-dbms/117875241
19. https://herovired.com/learning-hub/topics/relational-calculus-in-dbms/
20. https://en.wikipedia.org/wiki/Domain_relational_calculus
21. https://www.w3schools.blog/relational-calculus-dbms
22. https://datascientest.com/en/all-about-sql3
23. https://www.youtube.com/watch?v=EJ6IpG0fZlk
24. https://celerdata.com/glossary/ansi-sql
25. https://www.iitk.ac.in/esc101/05Aug/tutorial/jdbc/jdbc2dot0/sql3.html
26. http://infolab.stanford.edu/~ullman/fcdb/spr99/lec12.pdf
27. https://byjus.com/gate/data-definition-language-notes/
28. https://www.scaler.com/topics/ddl-in-dbms/
29. https://www.techtarget.com/whatis/definition/Data-Definition-Language-DDL
30. https://www.almabetter.com/bytes/tutorials/sql/dml-ddl-commands-in-sql
31. https://www.dbvis.com/thetable/sql-ddl-the-definitive-guide-on-data-definition-language/
32. https://celerdata.com/glossary/data-definition-language-ddl
33. https://onecompiler.com/tutorials/mysql/commands/ddl-commands
34. https://www.datacamp.com/tutorial/sql-ddl-commands
35. https://byjus.com/gate/data-manipulation-language-dql-notes/
36. https://www.scaler.com/topics/dml-in-dbms/
37. https://www.almabetter.com/bytes/tutorials/sql/dml-ddl-commands-in-sql
38. https://www.theiotacademy.co/blog/dml-and-ddl-in-sql/
39. https://www.tutorialspoint.com/what-are-the-dml-commands-in-dbms
40. https://www.datacamp.com/tutorial/sql-dml-commands-mastering-data-manipulation-in-sql
41. https://opentextbc.ca/dbdesign01/chapter/chapter-sql-dml/
42. https://trainings.internshala.com/blog/dml-commands-in-sql-with-examples/
43. https://cloud.google.com/bigquery/docs/data-manipulation-language
44. https://www.navisite.com/blog/open-source-vs-commercial-database-systems/
45. https://db-engines.com/en/ranking_osvsc
46. https://simplelogic-it.com/difference-between-open-source-database-and-licensed-database/
47. https://www.ask.com/news/comparing-open-source-vs-proprietary-dbms-platforms-best
48. https://sis.binus.ac.id/2024/10/15/commercial-database-and-open-source-database-differences-and-similarities/
49. https://en.wikipedia.org/wiki/MySQL
50. https://www.oracle.com/in/mysql/what-is-mysql/
51. https://www.cogentinfo.com/resources/architecture-of-mysql
52. https://cloudinfrastructureservices.co.uk/mysql-architecture-components-how-mysql-works-internally/
53. https://www.youtube.com/watch?v=jt3C9Ngbqfc
54. https://dev.mysql.com/doc/en/pluggable-storage-overview.html
55. https://www.w3webschool.com/blog/features-of-mysql/
56. https://www.mysql.com/products/enterprise/techspec.html
57. https://www.bytebase.com/blog/mysql-vs-sqlserver/
58. https://docs.oracle.com/en/database/oracle/oracle-database/18/cncpt/introduction-to-oracle-database.html
59. https://www.oracle.com/in/database/features/
60. https://docs.oracle.com/database/122/CNCPT/introduction-to-oracle-database.htm
61. https://www.tricentis.com/learn/a-guide-to-oracle-database-architecture
62. https://docs.oracle.com/cd/F19136_01/nonpub_db_techarch/pdf/db-19c-architecture.pdf
63. https://mindmajix.com/oracle-dba/oracle-11g-database-architecture-overview
64. https://indico.cern.ch/event/36804/attachments/731758/1003980/oracleArchitecture.pdf
65. https://www.ibm.com/docs/en/db2/11.1?topic=architecture-db2-process-overview
66. https://www.ibm.com/docs/en/db2/11.5?topic=architecture-db2-process-overview
67. https://www.youtube.com/watch?v=J8GvsoFLYEY
68. https://www.ibm.com/docs/en/db2/11.5?topic=environment-components-db2-purescale-feature
69. https://www.ibm.com/docs/SSEPGG_11.1.0/com.ibm.dwe.welcome.doc/dwev9welcome.html
70. https://www.simplilearn.com/what-is-microsoft-sql-server-architecture-article
71. https://www.guru99.com/sql-server-architecture.html
72. https://www.tutorialspoint.com/ms_sql_server/ms_sql_server_architecture.htm
73. https://learnomate.org/components-of-the-sql-server-architecture/
74. https://www.interviewbit.com/blog/sql-server-architecture/
75. https://www.milesweb.in/hosting-faqs/ms-sql-server-architecture/
76. https://www.scaler.com/topics/domain-in-dbms/
77. https://www.youtube.com/watch?v=HCLPUTFPcnk
78. https://www.boardinfinity.com/blog/domain-constraints-in-dbms/
79. https://www.arkware.com/what-are-database-dependencies/
80. https://www.tutorialspoint.com/Types-of-dependencies-in-DBMS
81. https://www.youtube.com/watch?v=HCLPUTFPcnk
82. https://talent500.com/blog/types-of-functional-dependencies-dbms/
83. https://www.lri.fr/~pierres/données/save/these/articles/lpr-queue/database-dependency-discovery.pdf
84. https://www.wrike.com/blog/functional-dependencies-database-systems/
85. https://www.prepbytes.com/blog/dbms/what-are-armstrongs-axioms-in-dbms/
86. https://www.nielit.gov.in/gorakhpur/sites/default/files/Gorakhpur/Alevel_1_DBMS_22Apr2020_AV.pdf
87. https://www.scaler.com/topics/armstrong-axioms-in-dbms/
88. https://digiimento.com/axioms-of-functional-dependencies-in-dbms-explained-armstrongs-axioms-with-examples/
89. https://talent500.com/blog/normalization-dbms-types-normal-forms/
90. https://en.wikipedia.org/wiki/Database_normalization
91. https://learn.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description
92. https://www.freecodecamp.org/news/database-normalization-1nf-2nf-3nf-table-examples/
93. https://www.youtube.com/watch?v=GFQaEYEc8_8
94. https://www.datacamp.com/tutorial/normalization-in-sql
95. https://www.studytonight.com/dbms/database-normalization.php
96. https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization/
97. https://prepinsta.com/dbms/dependency-preserving-decomposition/
98. https://www.slideshare.net/slideshow/dependency-preservation-138672914/138672914
99. https://www.nielit.gov.in/gorakhpur/sites/default/files/Gorakhpur/Alevel_1_DBMS_02Jun2020_AV.pdf
100. https://homepages.inf.ed.ac.uk/libkin/papers/pods06b.pdf
101. https://solutionsadda.in/2024/08/29/database-management-system-376/
102. https://www.slideshare.net/slideshow/lossless-decomposition/138673228
103. https://byjus.com/gate/decomposition-in-dbms/
104. https://www.scaler.com/topics/lossless-join-decomposition-in-dbms/
105. https://testbook.com/gate/lossless-decomposition-in-dbms
106. https://prepinsta.com/dbms/lossless-join-and-dependency-preserving-decomposition/
107. https://www.db-book.com/Previous-editions/db4/slide-dir/ch7.pdf
108. https://stackoverflow.com/questions/39464758/lossless-decomposition-vs-dependency-preservation
109. https://www.tutorialspoint.com/explain-the-evaluation-of-relational-algebra-expression-dbms
110. https://www.youtube.com/watch?v=hJoK_wvTZ-M
111. https://www.cs.purdue.edu/homes/clifton/cs44800/QP1.pdf
112. https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_relational_algebra_query_optimization.htm
113. http://www.scienceandnature.org/IJEMS/IJEMS-Vol4(3)-July2013/IJEMS_V4(3)2013-8.pdf
114. https://piazza.com/class_profile/get_resource/jpuyegn6c76ih/jwu8g9g69b32o8
115. https://groups.google.com/g/stilgeichondman/c/CTY9ECa13G8
116. https://piazza.com/class_profile/get_resource/jyqypau0nkk4w0/k0q7f36yuag2ja
117. https://repository.dinus.ac.id/docs/ajar/formal_relational_query_language_part_3.pdf
118. https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_relational_algebra_query_optimization.htm
119. https://celerdata.com/glossary/sql-join-types-made-simple
120. https://www.coursera.org/articles/sql-join-types
121. https://www.w3schools.com/sql/sql_join.asp
122. https://celerdata.com/glossary/sql-joins
123. https://www.pingcap.com/article/sql-join-types-choosing-between-right-and-left-join/
124. https://www.linkedin.com/pulse/spark-join-strategies-mastering-joins-apache-venkatesh-nandikolla-mk4qc
125. https://www.acceldata.io/blog/the-complete-guide-to-query-optimizers-and-performance-tuning
126. https://docs.oracle.com/en/database/oracle/oracle-database/19/tgsql/query-optimizer-concepts.html
127. https://dev.to/ibrahimhyazouri/query-optimization-how-the-query-optimizer-works-using-relational-algebra-1ho1
128. https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_relational_algebra_query_optimization.htm
129. https://www.db-book.com/slides-dir/PPTX-dir/ch16.pptx
130. https://en.wikipedia.org/wiki/Query_optimization
Indices in Database Management Systems (DBMS)
Definition:
An index in a DBMS is an additional data structure created on top of a database table to
improve the speed of data retrieval operations. It stores pointers to the actual data rows,
allowing the DBMS to locate records quickly without scanning the entire table [1] [2] [3] .
Structure of an Index
Search Key: The column(s) on which the index is built. It can be a primary key, candidate
key, or any attribute.
Data Reference (Pointer): Points to the location (disk block address) of the actual data in
the table [4] [3] .
Types of Indexes
1. Based on Key Attributes
Index Type Description Characteristics
Primary Index Created on the primary key of a table. Unique, not null, 1:1 mapping, sorted order, fast searching [2] [4] [3] .
Clustered Index Determines physical order of data in the table. Only one per table, sorts data rows, efficient for range queries [5] .
Secondary Index Created on non-primary (candidate) keys. Data may not be sorted, can have multiple per table, slower than primary index [2] [5] .
2. Based on Data Coverage
Type Description Use Case
Dense Entry for every search key value in the data file. Fast access, more storage required [1] [4] .
Sparse Entries only for some search key values (usually one per block). Less storage, slightly slower access, suitable for large data [1] [4] .
Bitmap Index Uses bit arrays for columns with few distinct values (low cardinality). Suitable for fields like gender, status, etc. [1] [5] .
Hash Index Uses hash functions to map search keys to locations. Best for exact match queries, not for range queries [5] [4] .
Function-based Indexes values computed from a function or expression on one or more columns. Optimizes queries with functions in WHERE clauses [5] .
Covering Index Includes all columns required by a query in the index itself. Eliminates need to access the base table for those queries [5] .
Full-Text Designed for efficient text searching (e.g., searching words or phrases). Used in document or comment search.
Spatial Optimized for geographical data types (e.g., points, regions). Used in GIS and location-based applications.
Reverse Stores the reversed value of the key for specific lookup patterns. Used in some telecom and specialized workloads.
Indexing Methods
Ordered Indexing: Index entries are sorted, making search operations faster. Example: B-
tree, B+ tree indices [4] [3] .
Hashing: Uses hash functions to directly map keys to data locations, efficient for equality
searches [4] .
Disadvantages of Indexing
Increases storage requirements due to additional index files.
Slows down write operations (INSERT, UPDATE, DELETE) as indexes must be updated [1] .
Complexity in managing multiple indexes.
Tip:
Always analyze your queries and data distribution before deciding which columns to index and
which index type to use. Proper indexing is essential for efficient database design and
operation [2] [5] .
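One way to observe an index changing a query plan is with Python's built-in sqlite3 module; the table and index names below are illustrative, and the exact EXPLAIN QUERY PLAN wording varies across SQLite versions:

```python
# Watching an index change a query plan with sqlite3 (Python standard library).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
con.executemany("INSERT INTO student (name, course) VALUES (?, ?)",
                [("s%d" % i, "BCA" if i % 2 else "MCA") for i in range(1000)])

def plan(sql):
    # The last column of each row is a human-readable step description.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

q = "SELECT name FROM student WHERE course = 'BCA'"
before = plan(q)     # e.g. ['SCAN student']  -- full table scan
con.execute("CREATE INDEX idx_course ON student (course)")
after = plan(q)      # e.g. ['SEARCH student USING INDEX idx_course (course=?)']
print(before, after)
```

The same experiment also shows the write-side cost: every INSERT after the CREATE INDEX must update the index as well as the table.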
⁂
Node Capacity:
For a B-tree of order $ m $, each non-root node contains between $ \lceil m/2 \rceil - 1 $ and $ m - 1 $ keys.
Root node can have as few as one key.
Operations on B-Trees
Search:
Follows the keys in internal nodes to the appropriate child, recursively, until the key is found
or determined absent. Search time is $ O(\log n) $ [9] [6] [8] .
Insertion:
Adds a key to the appropriate node. If the node overflows, it splits, and the middle key
moves up. This may propagate up to the root, maintaining balance [6] [8] .
Deletion:
Removes a key and may require merging or redistributing keys among nodes to maintain the
minimum key constraint [6] [8] .
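A search-only sketch of this descent, assuming a node holds sorted keys plus child pointers (illustrative structure, not a full B-tree with insertion and splitting):

```python
# Minimal B-tree search: binary-search within a node, then descend into the
# child between the bracketing keys. Internal nodes have len(keys)+1 children.
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                      # sorted list of keys
        self.children = children or []        # empty list for leaf nodes

def search(node, key):
    i = bisect_left(node.keys, key)           # binary search within the node
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                     # leaf reached: key is absent
        return False
    return search(node.children[i], key)      # descend into the i-th child

# A small tree:            [10, 20]
#                        /     |     \
#                    [3, 7] [12, 15] [25, 30]
root = Node([10, 20], [Node([3, 7]), Node([12, 15]), Node([25, 30])])
print(search(root, 15), search(root, 8))      # True False
```

Each step eliminates all but one subtree, which is where the logarithmic search time comes from.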
Advantages of B-Trees in DBMS
Efficient Data Retrieval: Logarithmic search, insertion, and deletion times, even as data volume grows.
Sorted Data: Facilitates range queries and ordered traversals [9] [7] .
B-Tree vs. B+ Tree
Aspect B-Tree B+ Tree
Search Time $ O(\log n) $ $ O(\log n) $
Data Storage Keys and data references (payload) in all nodes Data stored only in leaf nodes
In summary:
B-trees are essential for high-performance database systems, enabling fast, scalable, and
reliable data access through balanced multi-level indexing [6] [7] [8] [10] .
⁂
Definition:
Hashing is a technique in DBMS that allows direct access to data records on disk by calculating
their storage address using a hash function, instead of traversing index structures. This method
greatly accelerates data retrieval, especially for large datasets [11] [12] [13] .
Types of Hashing
1. Static Hashing
The number of buckets is fixed at the time of creation and does not change [12] [13] [14] .
The same search key always maps to the same bucket address.
Example: If the hash function is h(K) = K mod 5, keys are distributed among 5 buckets.
Limitation: Cannot adapt to changes in data volume; leads to overflow if buckets become
full [12] [13] [14] .
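The fixed-bucket scheme can be sketched directly, using the h(K) = K mod 5 function from the example; chaining inside each bucket stands in for overflow handling:

```python
# Static hashing with h(K) = K mod 5: the bucket count is fixed at creation.
NUM_BUCKETS = 5
buckets = [[] for _ in range(NUM_BUCKETS)]   # chaining resolves collisions

def h(key):
    return key % NUM_BUCKETS                 # same key -> same bucket, always

def insert(key, record):
    buckets[h(key)].append((key, record))

def lookup(key):
    return [rec for k, rec in buckets[h(key)] if k == key]

for k in (12, 22, 9):                        # 12 and 22 collide: both hash to 2
    insert(k, "rec%d" % k)
print(h(12), h(22), h(9))                    # 2 2 4
print(lookup(22))                            # ['rec22']
```

Because NUM_BUCKETS never changes, heavy insertion into one bucket simply grows its chain, which is exactly the overflow limitation the text describes.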
2. Dynamic Hashing (Extendible Hashing)
The number of buckets can grow or shrink dynamically as data is inserted or deleted [13] [14] .
Only a prefix of the hash value may be used to determine the bucket address, allowing for
flexible expansion [13] .
Helps prevent overflow and efficiently utilizes storage.
Commonly used in modern DBMS to handle unpredictable data growth [13] [14] .
Hashing Operations
Insertion:
The hash function computes the bucket address where the new record will be stored [12] [13] .
Search:
The hash function computes the address, and the record is retrieved directly from the
computed bucket.
Advantages of Hashing
Constant Time Operations:
Most hash operations (insert, search, delete) can be performed in constant time, regardless
of data size [13] .
Efficient Storage and Retrieval:
Direct calculation of addresses reduces the need for index traversal, minimizing disk I/O [11]
[12] [13] .
Dynamic Adaptation (with dynamic hashing):
Can handle growing or shrinking datasets efficiently [13] [14] .
Disadvantages of Hashing
Collisions:
Occur when different keys hash to the same address; they require additional mechanisms for resolution, which can add complexity [13] .
No Ordering:
Data is not stored in any sorted order, making range queries inefficient [13] [15] .
Not Ideal for All Workloads:
Less effective for queries requiring data ordering or range searches [13] [15] .
Collisions are inevitable and must be resolved using chaining or probing [13] .
Hashing is ideal for equality searches but not for range queries [15] [16] .
In summary:
Hashing in DBMS is a powerful technique for efficient, direct data access, particularly suited for
large datasets and workloads dominated by equality searches. Understanding its mechanisms,
strengths, and limitations is essential for effective database design and optimization [11] [12] [13] [15]
[16] .
1. https://en.wikipedia.org/wiki/Database_index
2. https://www.scaler.com/topics/dbms/indexing-in-dbms/
3. https://www.studocu.com/in/document/budge-budge-institute-of-technology/multimedia-systems/unit-3-storage-strategies-indices-b-trees-hashing/108104634
4. https://www.scribd.com/document/669769472/Storage-indices-b-tree-hashing-in-dbms
5. https://blog.algomaster.io/p/a-detailed-guide-on-database-indexes
6. https://en.wikipedia.org/wiki/B-tree
7. https://www.pingcap.com/article/understanding-basics-b-tree-data-structures/
8. https://www.scaler.com/topics/b-tree-in-dbms/
9. https://builtin.com/data-science/b-tree-index
10. https://www.wscubetech.com/resources/dsa/b-tree
11. https://byjus.com/gate/hashing-in-dbms-notes/
12. https://www.tutorialspoint.com/dbms/dbms_hashing.htm
13. https://prepinsta.com/dbms/hashing/
14. https://www.codecademy.com/resources/blog/what-is-hashing/
15. https://codefinity.com/courses/v2/d90d9403-ce34-4555-b549-6bb5773a48a2/1128b6ab-f333-4267-893e-98c6efa140e3/da696862-fa0a-405f-9c04-e5420f3488fb
16. https://www.pingcap.com/article/understanding-b-tree-and-hash-indexing-in-databases/
Concurrency Control in Database Management Systems (DBMS)
Definition and Importance
Concurrency control in DBMS is the set of techniques and protocols used to manage
simultaneous operations (transactions) on a database, ensuring that the integrity, consistency,
and isolation of data are maintained even when multiple users or applications access or modify
data at the same time [1] [2] [3] . Without proper concurrency control, simultaneous transactions
can interfere with each other, leading to problems such as data inconsistency, lost updates, and
dirty reads [4] [5] .
Non-repeatable Read: A transaction reads the same data item twice and gets different
values because another transaction modified the data in between [5] .
Phantom Read: A transaction re-executes a query and sees a different set of rows due to
another transaction's insert or delete [5] .
2. Timestamp-based Protocols
Each transaction is assigned a unique timestamp.
Transactions are ordered based on their timestamps; older transactions get priority.
Ensures serializability by allowing transactions to proceed only if they do not violate the
timestamp ordering.
Serializability
Conflict Serializability: Ensures that the schedule of transactions is equivalent to some
serial schedule by checking for conflicting operations.
View Serializability: Ensures that the schedule produces the same final state as a serial
schedule, even if the order of operations differs [5] .
Protocol Mechanism Strengths Weaknesses
Validation (Optimistic) Validate at commit No locks, high concurrency Rollbacks possible
Conclusion
Concurrency control is essential in DBMS to ensure data integrity and consistency in a multi-user
environment. Various protocols-such as locking, timestamp ordering, MVCC, and validation-are
used to manage concurrent transactions, each with its own strengths and trade-offs.
Understanding these techniques and their challenges is crucial for designing robust database
systems [1] [7] [2] [6] [5] .
⁂
Atomicity
Atomicity means that a transaction is an indivisible unit: it either completes fully or not at all.
If any part of a transaction fails, the entire transaction is rolled back, leaving the database
unchanged.
This prevents partial updates that could leave the database in an inconsistent state.
Example: In a bank transfer, if money is debited from one account but not credited to
another due to a failure, atomicity ensures the debit is also undone [8] [10] [11] [12] .
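The bank-transfer example can be demonstrated with Python's sqlite3 module; the CHECK constraint stands in for "insufficient funds", and the account names are illustrative:

```python
# Atomicity demo: a failed transfer rolls back the debit automatically.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
            "balance INTEGER CHECK (balance >= 0))")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
con.commit()

try:
    with con:  # one transaction: commit on success, rollback on any error
        con.execute("UPDATE account SET balance = balance - 200 WHERE id = 'A'")  # debit
        con.execute("UPDATE account SET balance = balance + 200 WHERE id = 'B'")  # credit
except sqlite3.IntegrityError:
    pass  # the CHECK constraint fired: A's balance would go negative

# Both balances are unchanged: the partial debit was undone with the rollback.
print(con.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [('A', 100), ('B', 50)]
```

The `with con:` block makes the two UPDATEs one indivisible unit, which is precisely the atomicity guarantee described above.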
Consistency
Consistency ensures that a transaction takes the database from one valid state to another,
preserving all predefined rules, constraints, and data integrity.
Any data written to the database must be valid according to all rules (such as foreign keys,
triggers, and constraints).
Example: if a rule requires every order to reference an existing customer, a transaction
violating this rule will not be allowed [8] [10] [11] [12] .
Isolation
Isolation ensures that concurrent transactions do not interfere with each other.
The intermediate state of a transaction is invisible to other transactions; each transaction
executes as if it is the only one running.
This prevents problems like dirty reads, non-repeatable reads, and phantom reads.
Example: If two users try to update the same account balance simultaneously, isolation
ensures that each transaction sees a consistent view of the data and the final result reflects
both updates correctly [8] [10] [11] [13] [12] .
Durability
Durability guarantees that once a transaction is committed, its effects are permanent, even
in the event of a system crash or power failure.
Committed changes are saved to non-volatile storage and cannot be lost.
Example: After a successful transfer, the new account balances remain intact even if the
system fails immediately after the transaction [8] [10] [11] [14] [12] .
Summary Table
Property Description Example Scenario
Atomicity Transaction completes fully or not at all Failed transfer leaves both balances unchanged
Consistency Database moves from one valid state to another, preserving rules Invoice must always have a valid customer
Isolation Transactions do not affect each other’s execution Two users booking the same seat
Durability Committed transactions survive system failures Balance remains after power outage
Key Points
ACID properties are fundamental for reliable transaction processing in DBMS.
Most relational databases (e.g., MySQL, PostgreSQL, Oracle) are ACID compliant, but the
exact implementation may vary [15] [11] .
Understanding ACID is essential for designing robust, reliable, and consistent database
applications [8] [10] [11] [14] [12] .
⁂
Serializability of Scheduling in Database Management Systems (DBMS)
Definition
A schedule of concurrent transactions is
serializable if its outcome is equivalent to some serial execution of those transactions, meaning
the transactions could have been executed one after another without overlapping, producing
the same final database state [16] [17] [18] .
Importance of Serializability
Maintains data consistency and integrity during concurrent transaction execution [16] [19]
[17] .
Prevents anomalies such as lost updates, dirty reads, and inconsistent data.
Ensures that the database remains in a valid state, adhering to all business rules and
constraints, even with multiple users or processes accessing it at the same time [19] [17] [20] .
Types of Serializability
1. Conflict Serializability
Definition: A schedule is conflict serializable if it can be transformed into a serial schedule
by swapping non-conflicting operations [21] [17] [18] .
Conflicting Operations: Two operations conflict if they:
Belong to different transactions,
Operate on the same data item, and
At least one is a write operation.
Examples of Conflicts:
Read-Write (RW) conflict: One transaction reads, another writes the same data.
Write-Read (WR) conflict: One writes, another reads.
Write-Write (WW) conflict: Both write the same data [18] .
Testing Conflict Serializability: Use a precedence (serialization) graph:
Nodes represent transactions.
Edges represent conflicts (if one must precede another due to conflicts).
If the graph has no cycles, the schedule is conflict serializable [21] .
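The precedence-graph test can be sketched in Python; a schedule is modeled as a list of (transaction, action, item) steps, and the schedule data is illustrative:

```python
# Precedence graph for conflict-serializability testing. Edge Ti -> Tj means
# a conflicting operation of Ti precedes one of Tj in the schedule.

def precedence_edges(schedule):
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(start, target, seen=()):
        return any(n == target or (n not in seen and reachable(n, target, seen + (n,)))
                   for n in graph.get(start, ()))
    return any(reachable(t, t) for t in graph)    # cycle = node reaching itself

# T1 and T2 conflict on A and then on B in opposite orders -> cycle.
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
edges = precedence_edges(s)
print(sorted(edges))        # [('T1', 'T2'), ('T2', 'T1')]
print(has_cycle(edges))     # True: this schedule is not conflict serializable
```

An acyclic graph means any topological order of the transactions is an equivalent serial schedule.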
2. View Serializability
Definition: A schedule is view serializable if it is view equivalent to a serial schedule [21] [17]
[18] .
View equivalence requires that both schedules read the same initial values, that each read
operation reads the value produced by the same write, and that
the final write on each data item is performed by the same transaction in both
schedules [18] .
Note: All conflict serializable schedules are view serializable, but not all view serializable
schedules are conflict serializable.
Serial Schedule Serializable Schedule
Transactions execute one after another, no overlap Transactions execute concurrently but the result is equivalent to a serial schedule [18]
Example
Suppose T1 and T2 both access account A:
Non-serializable schedule: T1 updates A, T2 reads old value of A before T1’s update is
visible, leading to inconsistency.
Serializable schedule: T2 either executes entirely before or after T1, ensuring a consistent
final state [21] [17] .
Key Points
Serializability is essential for correct concurrent transaction execution in DBMS.
Conflict serializability is easier to test (using precedence graphs), while view serializability is more general but harder to check.
Concurrency control is vital to enforce serializability, enabling safe parallelism while preserving consistency.
Summary:
Serializability ensures that even when transactions are executed concurrently, the result is as if they were executed one after another, preserving data consistency and integrity in a multi-user database environment [16] [17] [18] .
⁂
Lock Compatibility Matrix
Held \ Requested | Shared (S) | Exclusive (X)
Shared (S) | Yes | No
Exclusive (X) | No | No
Lock-Based Protocols
1. Simplistic Lock Protocol
Transactions lock a data item before performing a write and release the lock immediately afterward.
Ensures data is protected during a transaction but can reduce concurrency [23] .
2. Pre-Claiming Lock Protocol
Before starting, a transaction requests all required locks.
If all locks are granted, it proceeds; otherwise, it waits.
Prevents deadlocks but may lead to reduced concurrency [23] .
3. Two-Phase Locking Protocol (2PL)
Divides transaction execution into two phases:
Growing Phase: Transaction acquires all required locks; no locks are released.
Shrinking Phase: Transaction releases locks; no new locks can be acquired.
Guarantees serializability but may cause deadlocks [23] [24] .
4. Strict Two-Phase Locking Protocol
A stricter version of 2PL: all exclusive locks are held until the transaction commits or aborts.
Prevents cascading rollbacks and is widely used in practice [23] .
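The two-phase rule can be illustrated with a toy transaction object (a simplified sketch; class and method names are my own, and real lock managers also handle lock modes and waiting): once any lock is released, no new lock may be acquired.

```python
class TwoPhaseLockingError(Exception):
    pass

class Transaction2PL:
    """Toy transaction enforcing the two-phase rule: all acquires
    (growing phase) must precede all releases (shrinking phase)."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False  # flips to True on the first release

    def acquire(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.name}: cannot acquire {item!r} in shrinking phase")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True
        self.locks.discard(item)

t = Transaction2PL("T1")
t.acquire("A")
t.acquire("B")
t.release("A")          # shrinking phase begins here
try:
    t.acquire("C")      # violates 2PL: acquire after a release
    violation_caught = False
except TwoPhaseLockingError:
    violation_caught = True
```

Strict 2PL would additionally delay every `release` of an exclusive lock until commit or abort.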
Deadlock and Starvation
Deadlock: Occurs when two or more transactions wait indefinitely for locks held by each
other, forming a cycle [23] .
Starvation: A transaction waits indefinitely because other transactions keep acquiring the
required locks first. Can be prevented by using priority schemes such as aging [23] .
Drawback: May reduce concurrency and throughput due to waiting for locks [23] [24] .
Protocol | Key Rule | Advantage | Drawback
Strict Two-Phase Locking | Hold all exclusive locks until commit | Prevents cascading rollbacks | May increase waiting
In summary:
Locking-based schedulers are essential for managing concurrent transactions in DBMS. By using
shared and exclusive locks and protocols like 2PL, they ensure serializability and data integrity,
though they must also address challenges like deadlocks and starvation [23] [24] [25] [27] .
⁂
Timestamp-Based Schedulers in DBMS
Key Concepts
Timestamp:
A unique identifier assigned to each transaction when it enters the system. This can be
generated using the system clock or a logical counter [28] [31] [32] .
Older Transactions:
Transactions with smaller (earlier) timestamps are considered older and given higher priority
over newer transactions [28] [33] [31] .
Serializability:
The protocol ensures that the schedule of transactions is equivalent to some serial (one-after-another) execution based on their timestamps [28] [30] .
How Timestamp-Based Schedulers Work
1. Assignment:
When a transaction T enters the system, it receives a timestamp TS(T) [28] [29] [30] .
2. Ordering:
All operations (read/write) by T are tagged with TS(T). The protocol ensures that conflicting operations execute in timestamp order: if TS(Ti) < TS(Tj), the effect is as if Ti ran entirely before Tj.
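The ordering rule can be sketched with the classic read/write checks (a simplified basic timestamp-ordering model; names are illustrative): a read is rejected if a younger transaction already wrote the item, and a write is rejected if a younger transaction already read or wrote it.

```python
class TOAbort(Exception):
    pass

class TimestampScheduler:
    """Basic timestamp ordering: each data item tracks the largest
    timestamps that have read it (read_ts) and written it (write_ts)."""

    def __init__(self):
        self.read_ts = {}   # item -> largest TS that read it
        self.write_ts = {}  # item -> largest TS that wrote it

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            # A younger transaction already overwrote the item.
            raise TOAbort(f"T{ts} reads {item} too late")
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)

    def write(self, ts, item):
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            # A younger transaction already read or wrote the item.
            raise TOAbort(f"T{ts} writes {item} too late")
        self.write_ts[item] = ts

sched = TimestampScheduler()
sched.read(1, "A")        # T1 (older) reads A
sched.write(2, "A")       # T2 (younger) writes A: allowed
try:
    sched.write(1, "A")   # T1 writes A after T2's write: rejected
    aborted = False
except TOAbort:
    aborted = True
```

The aborted transaction would be restarted with a fresh, larger timestamp; no transaction ever waits, which is why deadlocks cannot occur.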
Advantages
Deadlock-Free:
No transaction waits for locks, so deadlocks do not occur [28] [30] .
Serializability:
Conflicting operations execute in timestamp order, so every schedule produced is conflict serializable [28] [30] .
Disadvantages
Cascading Rollbacks:
If an older transaction aborts, all newer transactions that read its data must also abort
(unless using strict timestamp ordering) [28] [32] .
Starvation:
Long-running or older transactions may be repeatedly aborted if they conflict with many
newer transactions [32] .
Overhead:
Maintaining and updating timestamps for all data items can increase system overhead [32] .
Comparison: Lock-Based vs. Timestamp-Based Schedulers
Feature | Lock-Based Schedulers | Timestamp-Based Schedulers
Data Item Timestamps | Not used | Each data item tracks last read and write timestamps
Deadlock Handling | Deadlocks possible (transactions wait for locks) | No deadlocks (no waiting), but possible starvation
Serializability | Ensured by protocols such as 2PL | Always ensured
Drawbacks | Deadlocks, blocking, reduced concurrency | Cascading rollbacks, starvation, timestamp management overhead
In summary:
Timestamp-based schedulers in DBMS use transaction timestamps to order and control concurrent execution, guaranteeing serializability and deadlock-free operation, but may suffer from cascading rollbacks and starvation for long-running transactions [28] [30] [32] .
Multiversion Concurrency Control (MVCC) in DBMS
How MVCC Works
1. Version Creation:
When a transaction updates a data item, the DBMS creates a new version rather than overwriting the existing value.
2. Version Visibility:
Once committed, future reads access the new version; further updates create additional versions [34] .
3. Version Cleanup:
Old versions are periodically removed (garbage collected) when they are no longer needed by any active transaction.
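The three steps above can be sketched as a tiny versioned key-value store (a sketch, not any specific engine's implementation; names are illustrative): writers append versions, readers see the newest version committed at or before their snapshot, and cleanup drops versions no snapshot can still see.

```python
class MVCCStore:
    """Toy MVCC store: each key holds [(commit_ts, value), ...],
    sorted by commit timestamp."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value)
        self.clock = 0

    def write(self, key, value):
        """Version creation: commit a new version, never overwrite."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        """Version visibility: newest version committed <= snapshot_ts."""
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def vacuum(self, oldest_active_ts):
        """Version cleanup: keep the newest version still visible to the
        oldest active snapshot, plus everything newer."""
        for key, chain in self.versions.items():
            old = [v for v in chain if v[0] <= oldest_active_ts]
            new = [v for v in chain if v[0] > oldest_active_ts]
            self.versions[key] = (old[-1:] if old else []) + new

store = MVCCStore()
store.write("A", 100)   # commit_ts 1
snap = store.clock      # a reader takes its snapshot here
store.write("A", 200)   # commit_ts 2: the reader at `snap` is unaffected
```

The reader at `snap` still sees 100 while a later reader sees 200, which is how reads avoid blocking writes.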
Advantages of MVCC
High Concurrency:
Multiple transactions can read and write simultaneously without blocking each other,
improving throughput and user experience [34] [38] [36] .
Reduced Lock Contention:
MVCC minimizes the need for locks, reducing contention and the likelihood of deadlocks [34]
[38] [36] [37] .
Drawbacks of MVCC
Storage Overhead:
Maintaining multiple versions increases database size and can lead to "version bloat" [34]
[38] [39] .
Garbage Collection:
The system must periodically clean up obsolete versions, which can be resource-
intensive [34] [39] .
Complex Implementation:
MVCC mechanisms are complex to implement, though this complexity is hidden from end users and developers [34] [38] [39] .
Cascading Aborts:
If a transaction that created a version aborts, other transactions that read this version may also need to abort [39] .
Feature | MVCC | Lock-Based Control
Concurrency | High (reads and writes don’t block) | Lower (locks can block access)
In summary:
Multiversion concurrency control (MVCC) is a powerful and widely adopted technique in DBMS
that enables high concurrency and performance by allowing simultaneous reads and writes
through versioning, reducing lock contention and deadlocks, but at the cost of increased
storage and system complexity [34] [38] [36] [37] [39] .
⁂
Optimistic Concurrency Control (OCC) Schemes in DBMS
Definition:
Optimistic concurrency control is a concurrency control method that assumes conflicts between transactions are rare and allows transactions to execute without locking data resources, only checking for conflicts at the end of the transaction [40] [41] [42] .
Key Principles
No Locks During Execution: Transactions proceed without acquiring locks on data items,
allowing maximum concurrency and resource utilization [40] [41] [43] .
Conflict Detection at Commit: Before a transaction commits, the system checks whether
any other transaction has modified the data it accessed. If a conflict is detected, the
transaction is rolled back and may be retried [40] [44] [41] .
Best for Low Contention: OCC is ideal for environments where data conflicts are infrequent,
such as read-heavy workloads or systems with many users but few overlapping updates [42]
[41] [43] .
Phases of Optimistic Concurrency Control
OCC typically divides each transaction into three main phases [40] [44] :
1. Read Phase
The transaction reads data from the database and stores it in local variables (workspace).
All operations are performed on these local copies.
No changes are made to the actual database during this phase [44] .
2. Validation Phase
Before committing, the transaction checks whether any other concurrent transaction has
modified the data items it read or intends to write.
The system compares the transaction’s read and write sets with those of other transactions
to detect conflicts.
If no conflicts are found, the transaction proceeds to commit; otherwise, it is rolled back [40]
[44] .
3. Write Phase
If validation succeeds, the transaction writes its changes to the database.
If validation fails, the transaction is aborted and may be retried [40] [44] .
Validation Rules
Backward Validation: Checks if any transaction that committed after the current transaction started has written to a data item read by the current transaction [44] [45] .
Forward Validation: Checks if the current transaction’s write set conflicts with the read sets
of active transactions [45] .
Serializability: The validation ensures that the resulting schedule is serializable, maintaining
database consistency [40] [44] .
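The three phases with backward validation can be sketched using read and write sets (a simplified model; class and method names are my own, not from the cited sources):

```python
class OptimisticTxn:
    """Toy OCC transaction: work on local copies, validate at commit."""

    def __init__(self, db, start_ts):
        self.db = db
        self.start_ts = start_ts
        self.read_set, self.write_set = set(), {}

    def read(self, key):                      # read phase
        self.read_set.add(key)
        return self.write_set.get(key, self.db.data.get(key))

    def write(self, key, value):              # local workspace only
        self.write_set[key] = value

class OptimisticDB:
    def __init__(self):
        self.data = {}
        self.commit_log = []  # (commit_ts, set of written keys)
        self.clock = 0

    def begin(self):
        return OptimisticTxn(self, self.clock)

    def commit(self, txn):
        """Validation phase (backward): fail if any transaction that
        committed after txn started wrote something txn read."""
        for commit_ts, written in self.commit_log:
            if commit_ts > txn.start_ts and written & txn.read_set:
                return False  # conflict: roll back; caller may retry
        self.clock += 1       # write phase: install the local workspace
        self.data.update(txn.write_set)
        self.commit_log.append((self.clock, set(txn.write_set)))
        return True

db = OptimisticDB()
db.data["A"] = 10
t1, t2 = db.begin(), db.begin()
t1.write("A", t1.read("A") + 1)   # both read A = 10 concurrently
t2.write("A", t2.read("A") + 5)
first = db.commit(t1)             # validates cleanly and commits
second = db.commit(t2)            # sees t1's write to A: rolled back
```

This reproduces the example workflow later in this section: both transactions read the same item, and only the first committer survives validation.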
Advantages of OCC
High Concurrency: Multiple transactions can proceed in parallel without waiting for locks,
leading to higher throughput in low-contention environments [42] [41] [43] [45] .
No Deadlocks: Since no locks are held, deadlocks are impossible [40] [41] .
Reduced Lock Management Overhead: Eliminates the need for lock acquisition and release,
reducing system overhead [45] .
Drawbacks of OCC
Transaction Rollbacks: If conflicts are frequent, many transactions may be rolled back,
reducing overall performance [42] [40] [41] .
Starvation: Long-running transactions may be repeatedly aborted if they often conflict with
shorter transactions [40] .
Not Suitable for High Contention: In write-heavy or high-contention environments,
pessimistic approaches may be more efficient [42] [41] [43] .
Example Workflow
Suppose two transactions, T1 and T2, both read and attempt to update the same data item:
Both read the original value and make local changes.
At commit time, both validate whether the data has changed since they read it.
Only one transaction will succeed; the other will be rolled back and may retry [41] [46] .
Feature | Optimistic Concurrency Control | Pessimistic Concurrency Control
Locking | No locks during transaction | Locks acquired before data access
Performance (High Contention) | Low (due to rollbacks) | High (due to blocking)
In summary:
Optimistic concurrency control schemes in DBMS maximize concurrency by allowing
transactions to execute without locks and validating for conflicts only at commit time. This
approach is highly efficient in environments with low data contention but can suffer from high
rollback rates when conflicts are frequent [40] [42] [41] .
⁂
Database Recovery in DBMS
Recovery techniques protect against failures such as:
Catastrophic Failure: Natural disasters or events that destroy the database and backups (e.g., fire, earthquake) [49] .
1. Log-Based Recovery
Transaction Logs: Every operation (start, read, write, commit, abort) is recorded in a log file
stored on stable storage [51] [47] .
Example log entries:
<Tn, Start>: Transaction Tn started.
<Tn, X, V1, V2>: Tn changed X from V1 to V2.
<Tn, Commit>: Tn committed.
Write-Ahead Logging (WAL): Logs are written before any changes are applied to the
database, ensuring recoverability [52] .
Undo/Redo Operations:
Undo: Reverses changes of uncommitted transactions.
Redo: Reapplies changes of committed transactions that may not have been saved to
disk before the crash [51] [47] [50] .
Checkpointing: Periodically, the system records a checkpoint, a point where the database
is known to be consistent. During recovery, the system only needs to process logs from the
last checkpoint, improving efficiency [51] [47] [50] .
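The redo/undo pass over such a log can be sketched directly from the entry formats above (function name and tuple encoding are illustrative; real recovery managers also start from the last checkpoint):

```python
def recover(log, db):
    """Replay a write-ahead log after a crash: redo the writes of
    committed transactions, undo the writes of transactions that never
    logged a commit. Entries mirror the example log records above:
      ("start", tn) / ("write", tn, x, old, new) / ("commit", tn)."""
    committed = {e[1] for e in log if e[0] == "commit"}

    # Redo phase: reapply committed changes in log order.
    for e in log:
        if e[0] == "write" and e[1] in committed:
            _, _, x, old, new = e
            db[x] = new

    # Undo phase: reverse uncommitted changes, newest first.
    for e in reversed(log):
        if e[0] == "write" and e[1] not in committed:
            _, _, x, old, new = e
            db[x] = old
    return db

# T1 committed before the crash, T2 did not.
log = [
    ("start", "T1"), ("write", "T1", "X", 5, 10), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "Y", 1, 2),  # crash: no commit
]
state = recover(log, db={"X": 5, "Y": 2})  # T2's write had reached disk
```

After recovery, T1's committed update to X is reapplied and T2's uncommitted update to Y is rolled back to its old value.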
2. Backup and Restore
The database is restored from a recent backup, and logs are used to roll forward or roll back transactions as needed [54] [49] .
4. Shadow Paging
Maintains two versions (pages) of the database: the current and a shadow copy. Updates are made to the current page, and the shadow page is only updated after a successful commit, ensuring atomicity and easy rollback [52] .
Recovery Type | Description
Crash Recovery | Restores the database after a system crash, rolling back incomplete transactions [48] [49] .
Disaster Recovery | Restores the database after catastrophic events using backups [49] .
Rollforward Recovery | Applies committed changes from logs after restoring a backup [49] .

Technique | Description | Typical Use
Log-Based Recovery | Uses logs for undo/redo, supports checkpoints | Most common, all failures
Backup & Restore | Restores from periodic backups | Media/catastrophic failure
In summary:
Database recovery in DBMS is essential for maintaining consistency, atomicity, and durability. Using logs, checkpoints, and backups, committed transactions are preserved and incomplete ones are rolled back, bringing the database back to a consistent state [51] [47] [48] [49] [50] [53] .
1. https://www.scaler.com/topics/dbms/concurrency-control-in-dbms/
2. https://www.dremio.com/wiki/concurrency-control/
3. https://www.studocu.com/in/document/dr-apj-abdul-kalam-technical-university/database-management-system/dbms-unit-5-notes/48082167
4. https://www.scribd.com/presentation/320810503/Concurrency-Control-in-DataBase
5. https://www.slideshare.net/slideshow/concurrency-control-in-advanced-database/266475028
6. https://www.solarwinds.com/resources/it-glossary/database-concurrency
7. https://www.shiksha.com/online-courses/articles/concurrency-control-techniques-in-dbms/
8. https://byjus.com/gate/acid-properties-in-dbms-notes/
9. https://www.databricks.com/glossary/acid-transactions
10. https://www.scaler.com/topics/dbms/acid-properties-in-dbms/
11. https://en.wikipedia.org/wiki/ACID
12. https://mariadb.com/kb/en/acid-concurrency-control-with-transactions/
13. https://www.freecodecamp.org/news/how-databases-guarantee-isolation/
14. https://www.simplilearn.com/acid-properties-in-dbms-article
15. https://www.mongodb.com/resources/basics/databases/acid-transactions
16. https://www.scaler.com/topics/dbms/serializability-in-dbms/
17. https://www.fynd.academy/blog/serializability-in-dbms
18. https://www.tutorialspoint.com/what-is-the-term-serializability-in-dbms
19. https://www.theknowledgeacademy.com/blog/serializability-in-dbms/
20. https://www.prepbytes.com/blog/dbms/serializability-in-dbms/
21. https://www.upgrad.com/blog/serializability-in-dbms/
22. https://blog.purestorage.com/purely-educational/what-does-serializability-mean-in-a-dbms/
23. https://www.scaler.com/topics/lock-based-protocol-in-dbms/
24. https://sitams.org/wp-content/uploads/2023/COURSE/MCA/DBMS_UNIT_V.pdf
25. https://www.guru99.com/dbms-concurrency-control.html
26. https://www.db-book.com/slides-dir/PDF-dir/ch18.pdf
27. https://www.tmu.ac.in/assets/pdf/coe/e-content/ECS_306/DBMS_Unit-5.pdf
28. https://www.scaler.com/topics/timestamp-based-protocols-in-dbms/
29. https://www.guru99.com/dbms-concurrency-control.html
30. https://www.tutorialspoint.com/concurrency-control-based-on-timestamp-ordering
31. https://beginnersbook.com/2022/07/timestamp-based-ordering-protocol/
32. https://15445.courses.cs.cmu.edu/fall2021/notes/17-timestampordering.pdf
33. https://cse.poriyaan.in/topic/timestamp-based-protocol-50884/
34. https://www.theserverside.com/blog/Coffee-Talk-Java-News-Stories-and-Opinions/What-is-MVCC-How-does-Multiversion-Concurrencty-Control-work
35. https://en.wikipedia.org/wiki/Multiversion_concurrency_control
36. https://celerdata.com/glossary/multi-version-concurrency-control
37. https://www.postgresql.org/docs/7.1/mvcc.html
38. https://www.tutorialspoint.com/multiversion-concurrency-control-techniques
39. https://gpttutorpro.com/multiversion-concurrency-control-in-databases/
40. https://en.wikipedia.org/wiki/Optimistic_concurrency_control
41. https://www.freecodecamp.org/news/how-databases-guarantee-isolation/
42. https://www.linkedin.com/advice/0/what-benefits-drawbacks-using-optimistic-concurrency-control
43. https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/optimistic-concurrency
44. https://www.tutorialspoint.com/what-is-an-optimistic-concurrency-control-in-dbms
45. https://codemia.io/knowledge-hub/path/backwardforward_validation_in_optimistic_concurrency_control
46. https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/database-transactions-optimistic-concurrency
47. https://www.scaler.com/topics/recovery-techniques-in-dbms/
48. https://www.ibm.com/docs/en/db2/11.1?topic=recover-crash-recovery
49. https://www.ibm.com/docs/en/db2/12.1.0?topic=administration-data-recovery
50. https://www.slideshare.net/slideshow/crash-recovery-in-database/233050920
51. https://www.tutorialspoint.com/dbms/dbms_data_recovery.htm
52. https://www.youtube.com/watch?v=1pSxXwy0qiE
53. https://dspmuranchi.ac.in/pdf/Blog/Database Recovery techniques.pdf
54. https://takeuforward.org/dbms/database-recovery-management
Authentication in Database Management Systems (DBMS)
Definition
Authentication in a DBMS is the process of verifying the identity of a user, device, or system
attempting to access a database. Its primary goal is to ensure that only authorized users can
interact with the database and its contents, thereby protecting sensitive information from
unauthorized access [1] [2] [3] .
Purpose of Authentication
Prevents unauthorized access to database resources.
Maintains data integrity and confidentiality.
Forms the first line of defense in database security [4] [2] [5] .
How Authentication Works
Users must provide credentials (such as username and password) when attempting to
access the database.
The DBMS compares these credentials against stored data (usually in an encrypted format).
Authentication is always performed before authorization (which determines what actions the
user can perform) [3] .
Types of Authentication
Type | Description
DBMS Authentication | Users are authenticated directly by the DBMS using credentials stored within the database. No need for corresponding OS accounts [6] [7] .
Operating System Authentication | The DBMS relies on the OS to authenticate users. If the user is logged into the OS, they are granted access to the database.
Third-Party Authentication | Uses external services or protocols (e.g., Kerberos, LDAP, SSO) to authenticate users.
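Comparing credentials against stored data is typically done with a salted password hash, never the plaintext. A minimal sketch using only the standard library (the in-memory user table and function names are illustrative, not a real DBMS's mechanism):

```python
import hashlib, hmac, os

_users = {}  # username -> (salt, password_hash); stands in for a credential table

def register(username, password):
    """Store a random salt and a PBKDF2 hash, never the plaintext password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    _users[username] = (salt, digest)

def authenticate(username, password):
    """Verify identity: recompute the salted hash and compare it in
    constant time to resist timing attacks."""
    record = _users.get(username)
    if record is None:
        return False
    salt, stored = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored)

register("alice", "s3cret")
```

Only after `authenticate` succeeds would the DBMS move on to authorization checks.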
Aspect | Authentication | Authorization
Purpose | Verifies identity of the user or system | Determines what actions the authenticated user can perform
Exam Tip: Always distinguish between authentication (identity verification) and authorization
(permission granting) in your answers, and be able to describe at least two authentication
mechanisms with examples.
⁂
Authorization in DBMS
Purpose of Authorization
Controls access to database objects (tables, views, schemas, etc.).
Ensures users can only perform actions necessary for their roles (principle of least privilege)
[9] .
Protects sensitive information and maintains data integrity by restricting unauthorized
operations.
How Authorization Works
1. Authentication First: The system verifies the user's identity through authentication.
2. Access Evaluation: The DBMS checks what permissions (privileges) the authenticated user
has.
3. Grant or Deny: Based on these permissions, the system allows or blocks access to
requested resources or operations [11] [10] .
4. Logging and Revocation: Activities may be logged, and permissions can be revoked if user
roles or policies change [11] .
Role-Based Access Control (RBAC): Permissions are grouped by roles, and users are assigned roles according to their job functions [13] .
Attribute-Based Access Control (ABAC): Permissions are granted based on attributes (e.g., department, location, time) [13] .
Examples
A user with only SELECT privilege on a table cannot modify its data.
A database administrator (DBA) may have all system privileges, allowing them to manage
users, roles, and database resources.
A sales role may have privileges to view and update sales data but not to access payroll
information.
Authentication | Authorization
Verifies the identity of the user | Determines what actions the user can perform
Exam Tip: Be able to define authorization, describe types of privileges and roles, explain
GRANT/REVOKE, and distinguish between DAC, MAC, and RBAC models with examples.
⁂
Access Control in DBMS
Access control in a DBMS refers to the set of policies, models, and mechanisms that determine which users or processes are permitted to access, modify, or manage database resources. Its primary goal is to protect data from unauthorized access and ensure that users can only perform actions for which they have explicit permission [14] [15] [16] .
Authorization: Determines the specific actions authenticated users are allowed to perform
on database objects (e.g., tables, views) [14] [15] .
Access Enforcement: The system enforces access decisions by allowing or denying
requested operations.
Model Name | Description | Typical Use Cases
Role-Based Access Control (RBAC) | Permissions are assigned to roles, and users are assigned roles. Simplifies management. | Enterprises, organizations with defined job functions [19] [14] [16] [17]
Rule-Based Access Control (RB-RBAC) | Access is controlled by pre-defined rules or policies, often time or context-based. | Systems needing flexible, policy-driven controls [19] [20]
Mandatory Access Control (MAC)
Access is governed by a central policy, often using security labels (e.g., confidential, secret).
Only administrators can change access policies; users cannot delegate permissions [18] [14] [16] .
Role-Based Access Control (RBAC)
Permissions are grouped into roles (e.g., "manager," "analyst"), and users are assigned to these roles [19] [14] [16] [17] .
Simplifies administration, especially in large organizations.
Advantage: Scalable and manageable.
Drawback: Requires careful role design to avoid complexity.
Attribute-Based Access Control (ABAC)
Access is determined by evaluating attributes (user role, time, location, etc.) against
policies [14] [15] .
Highly dynamic and granular.
Advantage: Fine-grained, adaptable to complex requirements.
Drawback: Policy management can be complex.
Rule-Based Access Control
Access is controlled by a set of predefined rules or policies (e.g., "no access after 6 PM")
[19] [20] .
4. Audit/logging: Actions may be recorded for security and compliance [14] [15] .
Summary Table: Access Control Models
Model | Who Decides Access? | Key Trait
DAC | Resource owner | Flexible, owner-managed permissions
MAC | Central authority (administrators) | Security labels, strict policy
RBAC | Administrators via roles | Role-based, scalable
ABAC | Policy engine via attributes | Fine-grained, dynamic
Rule-Based | Predefined rules | Context- or time-based policies
Key Takeaways
Access control is essential for database security, ensuring only authorized users can access
or modify data.
The choice of access control model depends on organizational needs, security
requirements, and system complexity [17] [21] .
Proper implementation of access control helps prevent data breaches, supports compliance,
and enforces organizational policies.
Exam Tip: Be able to define each access control model, explain their advantages and
disadvantages, and provide examples of where each is most appropriate.
⁂
Discretionary Access Control (DAC) Model in DBMS
Definition
Discretionary Access Control (DAC) is an access control model in which the owner of a resource
(such as a database table, file, or record) has the authority to determine who can access that
resource and what operations they are permitted to perform (e.g., read, write, modify) [22] [23]
[24] [25] [26] [27] [28] . The term "discretionary" reflects the fact that access decisions are left to
the discretion of the resource owner.
Key Concepts
Subjects and Objects:
Subjects are users or user groups seeking access to resources.
Objects are resources such as tables, files, or data entries [23] .
Ownership:
The creator of an object is usually its owner and can grant or revoke access rights to
other users [22] [24] [27] [28] .
Access Control Lists (ACLs):
Permissions are often managed through ACLs, which specify which users or groups can access a resource and what actions they can perform (e.g., SELECT, INSERT, UPDATE, DELETE) [25] [28] .
Grant and Revoke:
Owners can grant privileges to other users and can revoke them at any time [29] [27] .
Propagation of Rights:
In some systems, users who are granted access (with a "GRANT OPTION") can further grant those privileges to other users.
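Grant, revoke, and GRANT OPTION propagation can be sketched as an ACL per object (a simplified model with illustrative names; real DBMSs additionally cascade revokes through grant chains):

```python
class DACTable:
    """Toy DAC object: the owner controls an ACL mapping users to
    privileges; a privilege held WITH GRANT OPTION may be passed on."""

    def __init__(self, owner):
        self.owner = owner
        self.acl = {owner: {("ALL", True)}}  # user -> {(privilege, grantable)}

    def can(self, user, privilege, for_grant=False):
        for priv, grantable in self.acl.get(user, ()):
            if priv in (privilege, "ALL") and (grantable or not for_grant):
                return True
        return False

    def grant(self, grantor, grantee, privilege, with_grant_option=False):
        if not self.can(grantor, privilege, for_grant=True):
            raise PermissionError(f"{grantor} may not grant {privilege}")
        self.acl.setdefault(grantee, set()).add((privilege, with_grant_option))

    def revoke(self, grantor, grantee, privilege):
        if grantor != self.owner and not self.can(grantor, privilege, for_grant=True):
            raise PermissionError(f"{grantor} may not revoke {privilege}")
        self.acl[grantee] = {p for p in self.acl.get(grantee, set())
                             if p[0] != privilege}

orders = DACTable(owner="alice")
orders.grant("alice", "bob", "SELECT", with_grant_option=True)
orders.grant("bob", "carol", "SELECT")   # propagation via GRANT OPTION
orders.revoke("alice", "bob", "SELECT")  # note: carol's grant survives here
```

That carol keeps her privilege after bob's is revoked illustrates the propagation risk discussed below; real systems offer cascading revokes to close it.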
Examples of DAC
File Systems:
In Unix and Windows, file owners can set read, write, and execute permissions for
themselves, groups, and others [22] [26] .
Database Tables:
In DBMSs, table owners can grant SELECT, INSERT, UPDATE, or DELETE privileges to other
users or roles [29] [26] .
Cloud Storage:
Users share files or folders with specific people and assign view or edit rights (e.g., Google
Drive, OneDrive) [26] .
Advantages of DAC
Flexibility:
Owners have fine-grained control over their resources and can easily share or restrict
access as needed [23] [25] [26] .
Ease of Implementation:
DAC is straightforward to implement and widely supported in commercial DBMSs [25] .
Supports Collaboration:
Facilitates sharing of information in business and collaborative environments [23] [28] .
Disadvantages of DAC
Security Risks:
Since users can delegate access, there is a risk of excessive privilege propagation, which can lead to unauthorized access.
Feature | DAC | MAC
Security | Moderate (less secure than MAC) | High (used for sensitive data)

Aspect | Description
Flexibility | High
Common Use Cases | File systems, databases, cloud storage, collaborative apps
Limitation | Potential for unauthorized access if privileges are not carefully managed
Exam Tip:
Be able to define DAC, explain how it works in a DBMS, discuss its advantages and disadvantages, and compare it with other models like MAC. Use real-world examples (e.g., file permissions, database privileges) to illustrate your answer.
⁂
Mandatory Access Control (MAC) Model in DBMS
Definition
Mandatory Access Control (MAC) is a highly secure access control model in which a central
authority (usually system administrators or the operating system) strictly regulates access to
database resources based on predefined security policies. In MAC, users and data objects are
assigned security labels (such as clearance levels and categories), and access decisions are
made according to these labels-not at the discretion of individual users or data owners [30] [31]
[32] .
Non-Discretionary: Users have no discretion to share or delegate access rights. All access
is determined by central policy [31] [32] [35] .
High Security: MAC is considered one of the most secure access control models, suitable
for environments where confidentiality and regulatory compliance are critical, such as
military, government, finance, and healthcare [30] [31] [36] .
Example Scenario
A user with “Secret” clearance can access “Secret” and “Confidential” tables, but not “Top
Secret” tables.
Only administrators can change these assignments or permissions.
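The clearance check in this scenario can be expressed as label dominance (a Bell–LaPadula-style "no read up" sketch; the level ordering is standard, and the function name is illustrative):

```python
# Security levels in increasing order of sensitivity.
LEVELS = {"Unclassified": 0, "Confidential": 1, "Secret": 2, "Top Secret": 3}

def can_read(subject_clearance, object_label):
    """MAC 'no read up' rule: a subject may read an object only if the
    subject's clearance dominates (>=) the object's security label."""
    return LEVELS[subject_clearance] >= LEVELS[object_label]
```

A "Secret" user thus reads "Secret" and "Confidential" tables but is denied "Top Secret", exactly as in the scenario above; no user code can change `LEVELS`, mirroring central administration.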
Advantages of MAC
High Security: Reduces risk of unauthorized access and enforces strict confidentiality [30]
[31] [33] .
Limitations of MAC
Low Flexibility: Users cannot share data or adjust permissions, making MAC unsuitable for
collaborative or dynamic environments [30] [33] .
Administrative Overhead: Requires significant effort from administrators to manage and
update policies, especially in large organizations [33] .
Complexity: Implementing and maintaining MAC can be complex and time-consuming [33]
[35] .
Feature | MAC | DAC
Management | Admins set and enforce policies | Users manage their own permissions
Use Cases
Financial institutions with strict data compartmentalization
Healthcare systems with sensitive patient data
Any environment requiring “need-to-know” access and zero-trust principles [30] [31] [36]
Exam Tip:
Be able to define MAC, explain its core principles (centralized control, security labels, strict
enforcement), describe its advantages and disadvantages, and contrast it with DAC using real-
world examples.
⁂
Role-Based Access Control (RBAC) Model in Database Management Systems
Definition
Role-Based Access Control (RBAC) is a security model for managing user access in a database
system by assigning permissions and privileges to roles rather than to individual users. Users are
then assigned to these roles based on their job responsibilities, ensuring they only have access
to the resources necessary for their role [37] [38] [39] .
How RBAC Works
When a user requests an operation, the DBMS checks the permissions of their assigned roles and grants or denies access accordingly [37] [38] [39] [41] .
Model | Description
Hierarchical RBAC | Supports role hierarchies, allowing roles to inherit permissions from other roles.
Constrained RBAC | Adds constraints such as separation of duties (e.g., a user cannot both approve and request funds) [40] .
Benefits of RBAC
Improved Security: Limits access to sensitive data by ensuring users only have the
permissions needed for their role, supporting the principle of least privilege [42] [43] .
Operational Efficiency: Streamlines user management, especially in large organizations, by
allowing administrators to manage permissions at the role level rather than individually [44]
[41] .
Drawbacks of RBAC
Initial Setup Complexity: Requires careful planning and analysis to define appropriate roles
and permissions.
Role Explosion: Too many roles can make the system complex to manage if not designed
carefully.
Maintenance: Keeping roles and permissions up to date as organizational needs change
requires ongoing attention [44] .
Best Practices
Automate role assignment and reviews where possible to streamline management and reduce errors [42] [45] .
Track Changes: Monitor modifications to roles and permissions to detect and investigate suspicious activity [42] [45] .
Example Scenario
Database Roles:
Manager: Can view and update records but not delete them.
Analyst: Can only read data.
When an employee is promoted from Analyst to Manager, their user account is simply
moved to the Manager role, and their permissions are updated automatically.
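The promotion example maps directly onto role and permission tables (a minimal sketch; the user name, action names, and function are illustrative):

```python
# Role -> set of permitted actions, following the scenario above.
ROLE_PERMISSIONS = {
    "Analyst": {"read"},
    "Manager": {"read", "update"},   # still no "delete"
}

user_roles = {"dana": "Analyst"}

def is_allowed(user, action):
    """RBAC check: permissions attach to the user's role, not the user."""
    role = user_roles.get(user)
    return role is not None and action in ROLE_PERMISSIONS.get(role, set())

allowed_before = is_allowed("dana", "update")   # Analyst: denied
user_roles["dana"] = "Manager"                  # promotion = one role change
allowed_after = is_allowed("dana", "update")    # Manager: granted
```

No individual permissions were edited for the promotion, which is exactly the administrative saving RBAC provides.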
Intrusion Detection in Database Management Systems (DBMS)
Purpose
Provide forensic evidence and auditing for security incidents.
Key Components of Intrusion Detection Systems (IDS) for Databases
Sensors: Collect data on database activity, such as queries, logins, and configuration changes. Sensors can be placed at the network, host, or database level [48] [49] [50] .
Analysis Engine: Processes collected data to identify suspicious activity using techniques like signature-based detection (matching known attack patterns) and anomaly detection (flagging deviations from normal behavior).
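A toy analysis engine combining both techniques can be sketched as follows (the signatures and threshold are illustrative, not from any real product):

```python
import re

# Signature-based: patterns of known attacks (e.g., SQL injection probes).
SIGNATURES = [
    re.compile(r"\bOR\s+1\s*=\s*1\b", re.IGNORECASE),
    re.compile(r";\s*DROP\s+TABLE\b", re.IGNORECASE),
]

def analyze(event, failed_logins_last_minute, threshold=5):
    """Return alert reasons for one activity event.
    Signature matching flags known attack patterns; the login-rate check
    is a crude anomaly heuristic (deviation from normal behavior)."""
    alerts = []
    for sig in SIGNATURES:
        if sig.search(event):
            alerts.append("signature match: " + sig.pattern)
    if failed_logins_last_minute > threshold:
        alerts.append("anomaly: failed-login rate above baseline")
    return alerts

benign = analyze("SELECT name FROM users WHERE id = 7",
                 failed_logins_last_minute=1)
attack = analyze("SELECT id FROM users WHERE name='' OR 1=1 --",
                 failed_logins_last_minute=9)
```

Real database activity monitoring tools use far richer baselines and signature sets, but the split between the two detection techniques is the same.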
Features to Look for in Database Intrusion Detection Tools
Real-time monitoring and alerting.
Exam Tip:
Be able to define intrusion detection in the context of DBMS, describe the main components and
techniques (signature-based, anomaly-based, hybrid), explain the role of DAM, and discuss its
importance for security and compliance. Use real-world examples and mention integration with
SIEM systems for a comprehensive answer.
⁂
SQL Injection in DBMS
Definition
SQL injection is an attack in which malicious SQL is inserted into application input fields and executed by the database, altering the intended query logic [52] [53] .
For example, an attacker might enter 1 OR 1=1 in a login form, altering the query logic to always return true, thus bypassing authentication [53] [54] .
-- Intended query:
SELECT id FROM users WHERE username='user_input' AND password='user_input';
-- Malicious input:
username: ' OR 1=1 --
password: (left blank)
-- Resulting query:
SELECT id FROM users WHERE username='' OR 1=1 -- ' AND password='';
-- The condition '1=1' always evaluates to true, so the attacker gains access.
Consequences of SQL Injection
Data theft (customer information, intellectual property, etc.)
Data loss or corruption (deletion or modification of records)
Bypassing authentication and authorization controls
Gaining administrative privileges
Potential full system compromise [52] [53] [54]
Prevention Techniques
1. Input Validation and Sanitization
Validate all user inputs against expected formats, lengths, and types.
Sanitize inputs by removing or encoding potentially harmful characters [55] [56] [57] .
2. Parameterized Queries (Prepared Statements)
Use parameterized queries to separate SQL logic from user input, ensuring user data is
treated as data, not executable code [58] [59] .
Supported in all major programming languages and database drivers.
3. Stored Procedures
Use properly constructed stored procedures that utilize parameters, not dynamic SQL,
to handle user inputs [55] [56] [59] .
4. Allow-List (Whitelist) Input Validation
Accept only known, safe values for inputs such as table or column names [56] [59] .
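The parameterized-query defense from point 2 can be demonstrated with Python's built-in sqlite3 driver (the schema and data are illustrative): the malicious input from the example above is bound as data, never spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 's3cret')")

def login(username, password):
    # Placeholders (?) keep user input out of the SQL text entirely,
    # so injection payloads are matched literally, not executed.
    row = conn.execute(
        "SELECT id FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row is not None

legit = login("alice", "s3cret")       # matches the stored row
injected = login("' OR 1=1 --", "")    # payload is just an odd username
```

With string concatenation the second call would have returned true, as in the attack example above; with placeholders it simply finds no matching user.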
Defense | Purpose
Web Application Firewall (WAF) | Block malicious traffic at the network/application layer
Exam Tip:
Define SQL Injection, explain how it works with examples, discuss its consequences, and list multiple prevention techniques with brief explanations for each.
⁂
1. https://www.strongdm.com/authentication
2. https://www.devx.com/terms/database-authentication/
3. https://www.techtarget.com/searchsecurity/definition/authentication
4. https://compositecode.blog/2023/12/07/security-and-authentication-in-relational-databases/
5. https://www.linkedin.com/pulse/discuss-authentication-authorization-encryption-udjgc
6. https://docs.actian.com/vector/5.1/Security/DBMS_Authentication.htm
7. https://docs.actian.com/ingres/10.2/Security/DBMS_Authentication.htm
8. https://www.ibm.com/docs/en/db2-big-sql/7.1?topic=sql-database-authorization
9. https://topperworld.in/database-security-and-authorization/
10. https://www.ibm.com/think/topics/authentication-vs-authorization
11. https://www.sailpoint.com/identity-library/difference-between-authentication-and-authorization
12. https://oercommons.org/authoring/21950-database-security/4/view
13. https://www.fortinet.com/resources/cyberglossary/authentication-vs-authorization
14. https://satoricyber.com/access-control/access-control-101-a-comprehensive-guide-to-database-access-control/
15. https://www.optiq.ai/blog-post/what-is-access-control-4-types-of-access-control-models
16. https://portswigger.net/web-security/access-control/security-models
17. https://www.ijert.org/research/database-security-access-control-models-a-brief-overview-IJERTV2IS50406.pdf
18. https://butterflymx.com/blog/access-control-models/
19. https://delinea.com/blog/access-control-models-methods
20. https://www.goodaccess.com/blog/access-control-models-explained
21. https://www.twingate.com/blog/other/access-control-models
22. https://en.wikipedia.org/wiki/Discretionary_access_control
23. https://nordlayer.com/learn/access-control/discretionary-access-control/
24. https://www.sciencedirect.com/topics/computer-science/discretionary-access-control
25. https://www.tutorialspoint.com/difference-between-mac-and-dac
26. https://builtin.com/articles/discretionary-access-control
27. https://www.stigviewer.com/stig/oracle_database_12c/2021-04-06/finding/V-237715
28. https://www.syteca.com/en/blog/mac-vs-dac
29. https://www.slideshare.net/slideshow/discretionary-access-controldatabasepptx/262768086
30. https://nordlayer.com/learn/access-control/mandatory-access-control/
31. https://www.syteca.com/en/blog/mac-vs-dac
32. https://en.wikipedia.org/wiki/Mandatory_access_control
33. https://www.permit.io/blog/mac-vs-dac-comparing-access-control-fundamentals
34. https://www.tutorialspoint.com/what-is-mandatory-access-control-in-information-security
35. https://www.tutorialspoint.com/difference-between-mac-and-dac
36. https://www.pingidentity.com/en/resources/blog/post/access-control.html
37. https://www.techtarget.com/searchsecurity/definition/role-based-access-control-RBAC
38. https://frontegg.com/guides/rbac
39. https://www.imperva.com/learn/data-security/role-based-access-control-rbac/
40. https://www.strongdm.com/rbac
41. https://auth0.com/docs/manage-users/access-control/rbac
42. https://www.solarwinds.com/resources/it-glossary/role-based-access-control
43. https://www.fortinet.com/resources/cyberglossary/role-based-access-control
44. https://www.caldersecurity.co.uk/role-based-access-control-rbac/
45. https://nordlayer.com/learn/access-control/role-based-access-control-implementation/
46. https://satoricyber.com/database-security/database-activity-monitoring-uses-features-and-how-to-choose/
47. https://datacipher.com/top-database-activity-monitoring-solutions/
48. https://intellipaat.com/blog/intrusion-detection-system/
49. https://www.tookitaki.com/glossary/intrusion-detection-system-ids
50. https://www.stamus-networks.com/intrusion-detection-system-in-cyber-security
51. https://friendlycaptcha.com/wiki/what-is-network-intrusion-detection-system-nids/
52. https://owasp.org/www-community/attacks/SQL_Injection
53. https://www.techtarget.com/searchsoftwarequality/definition/SQL-injection
54. https://www.acunetix.com/websitesecurity/sql-injection/
55. https://www.indusface.com/blog/how-to-stop-sql-injection/
56. https://www.cloudflare.com/learning/security/threats/how-to-prevent-sql-injection/
57. https://www.esecurityplanet.com/threats/how-to-prevent-sql-injection-attacks/
58. https://www.strongdm.com/blog/how-to-prevent-sql-injection-attacks
59. https://www.legitsecurity.com/aspm-knowledge-base/how-to-prevent-sql-injection
Object Oriented Databases (OODB) in Database Management Systems
Definition
An object-oriented database (OODB) is a database that stores data in the form of objects,
similar to how data is represented in object-oriented programming (OOP) languages like
Java, C++, or Python [1] [2] [3] .
OODBs combine object-oriented programming concepts (objects, classes, inheritance,
encapsulation, polymorphism) with database management features such as persistence,
concurrency, and transactions [1] [2] [4] .
Key Concepts
Objects
The fundamental unit in OODBs, representing real-world entities.
Each object contains both data (attributes/properties) and behavior (methods/functions) [1] [2] [3].
Classes
Define the structure (attributes) and behavior (methods) shared by all instances (objects) of
the class [1] [2] .
Object Identity
Each object has a unique identifier (OID) that distinguishes it from other objects, regardless
of its attribute values [5] [6] .
Encapsulation
Data and methods are bundled together, and internal details are hidden from the outside
world [1] [6] .
Inheritance
Objects can inherit properties and methods from other objects (parent classes), promoting
code reuse and hierarchy [1] [6] .
Polymorphism
The ability to use a unified interface for different underlying data types [1] .
Persistence
Objects can outlive the application process, being stored and retrieved from the database
as needed [2] [6] .
Schema Evolution
Object schemas can be modified over time without major disruptions [6].
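The concepts above map directly onto object-oriented language constructs. A minimal Python sketch (the class names are illustrative, not a real OODB API):

```python
import itertools

_next_oid = itertools.count(1)  # simulates system-assigned object identifiers

class PersistentObject:
    """Every object gets an OID independent of its attribute values."""
    def __init__(self):
        self.oid = next(_next_oid)

class Person(PersistentObject):
    def __init__(self, name):
        super().__init__()
        self._name = name           # encapsulated state

    def greeting(self):            # behavior bundled with the data
        return f"Hello, {self._name}"

class Customer(Person):            # inheritance: Customer reuses Person
    def __init__(self, name, credit):
        super().__init__(name)
        self.credit = credit

a, b = Customer("Ann", 100), Customer("Ann", 100)
# Equal attribute values, yet distinct object identities:
print(a.oid != b.oid)
```

In a real OODB the persistence layer would assign the OID and store such objects transparently; here the counter merely illustrates identity being independent of attribute values.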
Advantages
Natural Mapping: Closer alignment between database objects and real-world entities or application objects, reducing the "impedance mismatch" seen in relational databases [1] [2] [6] [7].
Efficient Handling of Complex Data: Well-suited for multimedia, CAD/CAM, GIS, and other applications involving complex, interrelated data.
Disadvantages
Complexity: More complex to design, implement, and manage, especially for simple data
needs [2] [6] [7] [8] .
Lack of Standards: No universal data model or standard query language, leading to
compatibility and portability issues [2] [6] [9] .
Limited Adoption: Not as widely used as relational databases; less community and
commercial support [2] [6] .
Security: Often lacks robust, standardized security mechanisms and fine-grained access
controls [2] [6] .
No Support for Views: Typically does not support database views like relational systems [2]
[6] .
Learning Curve: Steeper learning curve for users familiar with traditional relational
databases [7] .
Use Cases
Multimedia Applications: Efficiently stores and retrieves images, audio, and video as
objects [2] .
CAD/CAM Systems: Manages complex engineering data and relationships [2] .
Geographic Information Systems (GIS): Handles spatial and topographical data [2] .
Telecommunications: Manages hierarchical and interconnected network data [2] .
Real-Time Systems: Used in robotics, automation, and embedded systems for fast, complex data access [2].
Examples of Object Oriented Databases
ObjectDB
Db4o
ObjectStore
Versant
GemStone/S
WakandaDB
MongoDB (offers some object-oriented features) [6]
Schema Flexibility: OODBs offer high flexibility (supporting schema evolution), whereas relational schemas are rigid (altering them is complex).
Summary
Object-oriented databases provide a powerful way to model and manage complex data,
especially when working with object-oriented programming languages. They offer advantages in
terms of expressiveness, integration, and handling of complex relationships, but come with
increased complexity, lack of standards, and limited adoption compared to relational
databases [1] [2] [6] .
⁂
Object Relational Databases (ORD) in Database Management Systems
Definition
An object-relational database (ORD) combines relational and object-oriented databases. It supports traditional relational features (tables, rows, columns) while also allowing objects, classes, inheritance, and complex data types to be directly represented in the schema and query language [10] [11] [12].
Key Features
1. User-Defined Types (UDTs)
Allows creation of custom data types that can encapsulate both data and associated behaviors (methods), similar to classes in object-oriented programming [13] [14].
2. Type System and Table Inheritance
Supports inheritance in database schemas, allowing tables or types to inherit properties and
methods from parent tables or types [11] [14] .
3. Complex Data Types
Facilitates storage of arrays, structs, and other complex or nested data types directly in
tables [11] [13] .
4. Object Identity
Uses unique object identifiers (OIDs) to distinguish and reference objects, supporting object
identity beyond simple primary keys [13] .
5. Encapsulation
Operations (methods) can be encapsulated within UDTs, allowing data and behavior to be
bundled together [13] .
6. Enhanced SQL
SQL is extended to support object-oriented features, including querying and manipulating
complex objects and types [13] [14] .
7. ACID Transactions
Maintains full support for atomicity, consistency, isolation, and durability, ensuring data
integrity and reliability [11] .
Architecture
ORD architecture builds upon traditional relational database architecture with the following
enhancements [11] [14] :
Type System: Enables user-defined types and inheritance.
Table Inheritance: Tables can inherit structure and behavior from other tables.
Methods: Functions or procedures can be defined on data types and stored in the
database.
Complex Data Handling: Supports direct storage and querying of complex objects.
Advantages
Enhanced Modeling Capabilities: Closer alignment with application object models, allowing complex real-world entities to be represented directly.
Flexibility and Scalability: Supports complex applications and data types without
sacrificing the performance and robustness of relational databases [11] [12] .
Backward Compatibility: Maintains compatibility with existing relational database features
and SQL [14] .
Extensibility: Easily accommodates new data types and operations as application
requirements evolve [11] [12] .
Disadvantages
Complexity: Schema design and management can become complicated, especially for
simple applications [12] .
Learning Curve: Requires understanding of both relational and object-oriented
concepts [12] .
Lack of Universal Standards: Not all commercial DBMSs implement object-relational
features consistently, leading to portability issues [13] [12] .
Performance Considerations: Some object-oriented features may introduce overhead,
potentially affecting performance in certain scenarios [15] .
Use Cases
Applications with Complex Data: Multimedia, scientific, engineering, and GIS applications
benefit from ORD’s ability to handle complex and hierarchical data structures [11] [16] .
Enterprise Systems: Where integration between object-oriented application code and
relational data storage is needed.
Legacy System Modernization: When transitioning from pure relational to more object-
oriented paradigms without sacrificing existing investments.
Comparison: Object-Relational vs. Relational vs. Object-Oriented Databases
Feature              Relational DBMS          Object-Relational DBMS         Object-Oriented DBMS
Data model           Tables (rows, columns)   Tables plus objects and UDTs   Objects and classes
Query language       Standard SQL             SQL with object extensions     No universal standard
Schema flexibility   Rigid                    Moderate                       High (schema evolution)
Summary
Object-relational databases bridge the gap between the relational and object-oriented models,
offering the robustness and familiarity of relational databases along with the modeling power and
flexibility of object-oriented systems. They are particularly useful for applications needing
complex data representation and close integration with object-oriented programming languages,
but can introduce additional complexity and require careful schema design [11] [12] .
⁂
Logical Databases in Database Management Systems
Definition
A logical database is an abstract representation of how data is organized and related within a
database system, focusing on the structure, relationships, and business rules rather than the
physical storage details [17] [18] [19] . Logical databases are especially prominent in data modeling
and in specific platforms like SAP ABAP, where they provide a read-only, hierarchical view of
data for application programs [20] [21] .
Business Alignment: They help translate business processes and requirements into implementable database designs [17] [18].
Key Characteristics
Entities: Represent real-world objects or concepts (e.g., Customer, Order, Product) [17] [22] .
Relationships: Establish connections between entities (e.g., Customer places Order) [17] [19] .
Business Rules: Govern how data is managed and ensure data integrity (e.g., constraints,
validation rules) [17] [19] .
Normalization: Logical models focus on data normalization to reduce redundancy and
improve integrity [19] .
Technology-Agnostic: Logical databases are independent of specific DBMS technologies,
making them transferable across platforms [19] [23] .
Entity: A distinct object or concept in the business domain (e.g., Customer, Product, Order)
Foreign Key: Attribute that links entities and enforces referential integrity (e.g., Customer ID in Order entity)
Business Rule: Constraint or logic that governs data integrity and relationships (e.g., "Order Date cannot be in the future")
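Normalization can be illustrated with a small sketch: repeated customer details are factored out of order records into a separate entity referenced by a foreign key (all names and values are hypothetical):

```python
# Denormalized rows repeat customer details on every order (redundancy).
orders_flat = [
    {"order_id": 1, "customer": "Ann", "city": "Pune", "total": 250},
    {"order_id": 2, "customer": "Ann", "city": "Pune", "total": 90},
]

# Normalized: customer data is stored once; orders reference it by key.
customers = {}
orders = []
for row in orders_flat:
    cid = row["customer"]                   # natural key, for the sketch only
    customers[cid] = {"city": row["city"]}  # one entry per customer
    orders.append({"order_id": row["order_id"],
                   "customer_id": cid,      # foreign key into customers
                   "total": row["total"]})

print(len(customers))  # the customer's city is no longer repeated per order
```

Updating the customer's city now touches one record instead of every order row, which is exactly the redundancy reduction the logical model aims for.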
Logical vs. Physical Data Models
Aspect           Logical Data Model                        Physical Data Model
Focus            Entities, relationships, business rules   Storage structures and implementation details
Technology       DBMS-independent                          Tied to a specific DBMS
Implementation   Not directly implemented                  Deployed in the database
Advantages
Improved Data Understanding: Helps stakeholders and developers understand data
requirements and business processes [17] [18] [19] .
Flexibility: Supports changes and evolution in business requirements without major
redesign [17] [19] .
Data Integrity: Promotes normalization and clear business rules, reducing redundancy and
errors [19] .
Platform Independence: Can be adapted to different DBMS technologies [19] [23] .
Disadvantages
No Direct Implementation: Logical databases are not directly implemented; they require
translation into physical models for deployment [19] [23] .
Complexity for Simple Applications: May be excessive for small or simple databases [19] .
Summary
Logical databases provide a structured, abstract view of data, focusing on entities, attributes,
relationships, and business rules. They are crucial in the early stages of database design,
ensuring that the database aligns with business needs and remains adaptable, consistent, and
technology-agnostic. In platforms like SAP, logical databases also offer reusable, hierarchical
data access for application programs [17] [20] [21] [19] .
⁂
Web Databases in Database Management Systems
Definition
A web database is a system for storing, managing, and displaying information that is accessible via the Internet or web. It serves as the backbone for many web applications, enabling users to interact with dynamic data from anywhere using a web browser [24] [25] [26].
Key Features
Accessibility: Data can be accessed and managed remotely through web browsers,
supporting multi-location and multi-device use [24] [26] .
Common Uses
E-commerce: Product catalogs, order management, customer accounts [27] .
Membership/Client Databases: Storing user profiles, login credentials, and activity logs [24]
[26] .
Inventory Management: Tracking stock levels, suppliers, and transactions [24] [26] .
Content Management Systems (CMS): Blogs, news sites, and forums where content is frequently updated [27].
Data Analytics: Collecting and analyzing user or business data for reporting and decision-making [26].
Advantages
Remote Access: Users can access and manage data from any location with internet
connectivity [24] [26] .
Real-Time Updates: Data changes are immediately reflected across all users and devices.
Collaboration: Supports multiple users working together on shared data.
Centralized Management: Easier to maintain, backup, and secure data from a central
location.
Disadvantages
Security Risks: Exposed to internet threats such as hacking, data breaches, and
unauthorized access. Requires robust security measures like authentication, encryption, and
firewalls [30] [31] .
Performance Issues: Dependent on network speed and server capacity; may experience
latency with high traffic or large datasets.
Complexity: Requires knowledge of web development, database management, and
networking for setup and maintenance.
Best Practices
Use Secure Authentication and Authorization: Ensure only authorized users can access or
modify data [30] [31] .
Encrypt Sensitive Data: Protect data in transit and at rest.
Regular Backups: Prevent data loss due to failures or attacks.
Optimize Queries and Indexes: Improve performance and reduce server load.
Scalability Planning: Design for growth in data volume and user base.
Monitoring and Logging: Track usage, errors, and security incidents for ongoing
improvement [30] [31] .
Examples of Web Database Systems
Microsoft SQL Server
Oracle Database
These databases are commonly used as the backend for web applications and can be managed
through web-based interfaces or APIs [31] .
Summary
Web databases are essential for modern web applications, providing a centralized, accessible,
and dynamic way to store and manage data online. Their architecture typically involves clients,
application servers, and database servers working together to deliver interactive, data-driven
experiences to users across the globe [24] [26] [28] [30] [29] [31] .
⁂
Distributed Databases in Database Management Systems
Architecture Types
Client-Server Architecture: Clients interact with a central server that manages data storage and access [39].
Peer-to-Peer Architecture: All nodes act as both clients and servers, managing their own data [39].
Design Considerations
1. Data Partitioning
Horizontal Partitioning: Divides tables into rows, distributing subsets to different nodes.
Vertical Partitioning: Divides tables into columns, distributing attributes to different
nodes [39] .
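The two partitioning strategies can be sketched in Python, with the table and the nodes simulated as plain lists (all names and values are hypothetical):

```python
# A hypothetical customer table, simulated as a list of row dicts.
customer = [
    {"id": 1, "region": "EU", "name": "Ann",  "email": "ann@example.com"},
    {"id": 2, "region": "US", "name": "Bob",  "email": "bob@example.com"},
    {"id": 3, "region": "EU", "name": "Cara", "email": "cara@example.com"},
]

# Horizontal partitioning: whole rows are routed to a node by a partition key.
by_region = {}
for row in customer:
    by_region.setdefault(row["region"], []).append(row)

# Vertical partitioning: each node holds a subset of columns, joined on "id".
identity_node = [{"id": r["id"], "name": r["name"]} for r in customer]
contact_node = [{"id": r["id"], "email": r["email"]} for r in customer]

print(sorted(by_region))   # one row subset per region
print(identity_node[0])    # only the id and name columns on this node
```

A real distributed DBMS does this routing transparently; the sketch only shows how rows versus columns end up on different nodes.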
2. Replication
Full Replication: Every node stores a complete copy of the database, increasing availability
but also storage and update costs.
Partial Replication: Only selected data is replicated based on access patterns.
Multi-master Replication: Multiple nodes can accept updates, improving performance and
fault tolerance [39] .
3. Consistency and Concurrency Control
Mechanisms like locking, timestamp ordering, or optimistic concurrency control ensure data
consistency during simultaneous transactions [39] .
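Optimistic concurrency control, for example, can be sketched with a per-record version number (a simplified, single-process illustration, not a real DBMS protocol):

```python
class VersionConflict(Exception):
    pass

# Each record carries a version number; a write commits only if the version
# is unchanged since the transaction read it.
record = {"balance": 100, "version": 0}

def commit(rec, read_version, new_balance):
    if rec["version"] != read_version:   # another transaction wrote first
        raise VersionConflict("retry the transaction")
    rec["balance"] = new_balance
    rec["version"] += 1

v = record["version"]        # transaction T1 reads version 0
commit(record, v, 150)       # T1 commits; version becomes 1
try:
    commit(record, v, 120)   # T2 commits with the stale version and fails
except VersionConflict:
    print("conflict detected; T2 must re-read and retry")
```

Locking takes the opposite approach (block conflicting writers up front); optimistic control lets them proceed and detects conflicts at commit time.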
4. Network Communication & Latency
Efficient protocols and low-latency networks are vital to minimize delays and maximize
performance [39] .
5. Security and Privacy
Security challenges increase with distribution; strong authentication, encryption, and access
controls are essential [39] .
Advantages
Modular Development: Easy to expand by adding new nodes without disrupting the
system [33] [38] .
Reliability: System continues to function even if some nodes fail, reducing risk of total
failure [33] [38] .
Lower Communication Costs: Local data storage reduces the need for long-distance data transfer [33].
Better Response Time: Localized data access can lead to faster query responses [33] .
Scalability: Supports growth in data volume and user base by adding more sites [33] [37] [38] .
Disaster Resilience: Replicating data across multiple sites helps protect against site-level disasters [38].
Disadvantages
Complexity: Design, implementation, and maintenance are more complex than centralized
systems [38] .
Cost: Requires expensive software and skilled personnel for synchronization, data
consistency, and management [33] [38] .
Data Integrity: Ensuring data consistency and integrity across multiple sites is challenging,
especially with replication [33] [38] .
Security: More vulnerable due to multiple access points and the need to secure all nodes
and communication channels [38] [39] .
Overhead: Synchronization, coordination, and replication introduce significant processing
and network overhead [33] [38] .
Improper Data Distribution: Poorly planned data placement can reduce system
responsiveness and efficiency [33] .
Applications
Global Enterprises: Organizations with offices in multiple locations needing shared,
consistent data.
Cloud Services: Modern cloud databases are inherently distributed to provide scalability
and high availability.
E-commerce: Handling large-scale, geographically distributed transactions and user data.
Telecommunications: Managing customer and network data across regions.
Summary Table
Data Consistency: Challenging; needs robust control mechanisms
Security: Complex; must secure all nodes and network
Conclusion
Distributed databases provide a robust solution for organizations requiring high availability,
scalability, and reliability by distributing data across multiple sites. However, they introduce
complexity in design, management, and security, requiring careful planning and advanced
technologies to ensure data consistency, integrity, and performance [33] [38] [39] .
⁂
Data Warehousing in Database Management Systems
Key Components
Central Data Warehouse Database: The core storage area, typically a relational or cloud-based database optimized for analytical queries [43] [45].
Metadata: Data about the data; describes source, structure, and usage, aiding management and discovery [43] [45].
Access Tools: BI tools, dashboards, reporting, OLAP, and data mining tools that allow users
to analyze and visualize data [43] [45] .
Data Governance & Security: Policies and tools for data quality, access control, lineage, and compliance.
Architecture Types
Single-Tier: Minimizes data storage by deduplication; rarely used due to scalability limits [45].
Two-Tier: Separates data sources from the warehouse; limited scalability [45].
Three-Tier: Most common; includes data warehouse database (bottom), OLAP server (middle), and client tools (top) [45] [44].
Data Warehousing Process
1. Data Extraction: Collect data from various internal and external sources.
2. Data Transformation: Cleanse, format, and integrate data to ensure consistency and
quality.
3. Data Loading: Store the processed data in the data warehouse.
4. Data Access: Users query, analyze, and visualize data using BI and analytics tools [41] [43] [44].
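The four steps can be sketched end to end in Python (the source systems and field names are hypothetical):

```python
# Hypothetical source systems with inconsistent field formats.
crm_rows = [{"name": " Ann ", "spend": "250"}]
shop_rows = [{"name": "BOB", "spend": "90.5"}]

def transform(row, source):
    # Cleanse and standardize each record before loading.
    return {"name": row["name"].strip().title(),
            "spend": float(row["spend"]),
            "source": source}

warehouse = []                              # 3. Load target
for r in crm_rows:                          # 1. Extract from source A
    warehouse.append(transform(r, "crm"))   # 2. Transform
for r in shop_rows:                         # 1. Extract from source B
    warehouse.append(transform(r, "shop"))

# 4. Access: an analytical query over the integrated data.
total_spend = sum(r["spend"] for r in warehouse)
print(total_spend)
```

Production ETL pipelines add scheduling, error handling, and incremental loads, but the extract-transform-load-access flow is the same.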
Use Cases
Business Intelligence & Reporting: Enables organizations to generate reports and
dashboards for strategic decision-making [40] [42] [43] .
Trend Analysis: Provides historical data for identifying patterns and forecasting.
Regulatory Compliance: Centralizes and preserves data for audits and legal
requirements [43] .
Data Mining: Supports advanced analytics and machine learning by providing clean,
integrated datasets.
Advantages
Centralized Data Repository: Provides a single source of truth for the organization [40] [43] .
Improved Decision Making: Facilitates data-driven strategies and faster, more accurate decisions.
Data Quality and Consistency: ETL processes ensure high data integrity [41] [43] .
Scalability: Modern cloud-based warehouses can scale storage and compute
independently [45] [44] .
Disadvantages
Complexity and Cost: Implementation and maintenance require significant investment and
expertise.
Latency: Data is not always real-time; there can be delays between data generation and
availability for analysis.
ETL Overhead: Extracting, transforming, and loading large volumes of data can be
resource-intensive [45] .
Modern Trends
Cloud Data Warehousing: Shift from on-premises to cloud platforms for better scalability,
flexibility, and lower upfront costs [43] [45] [44] .
Support for Unstructured Data: Modern warehouses handle not only structured but also
semi-structured and unstructured data (e.g., logs, images) [43] .
Integrated Analytics: In-memory processing and real-time analytics capabilities are
increasingly common [43] [45] .
Summary Table
Architecture: Typically three-tier (database, OLAP, client tools)
Modern Trends: Cloud-based, support for unstructured data, real-time analytics
Conclusion
Data warehousing is foundational for organizations seeking to harness their data for business
intelligence and analytics. It centralizes, integrates, and preserves data from diverse sources,
empowering users to make informed, data-driven decisions and uncover valuable insights [40]
[41] [43] .
Data Mining in Database Management Systems
Techniques and Methods
Classification: Assign data to predefined categories (e.g., spam detection, credit risk).
Clustering: Group similar data points together without predefined categories (e.g., customer segmentation).
Association Rule Mining: Discover relationships between variables (e.g., market basket analysis).
Regression: Predict numeric values based on other variables (e.g., sales forecasting).
Anomaly Detection: Identify outliers or unusual data patterns (e.g., fraud detection).
Sequence Analysis: Find patterns in sequential data (e.g., web clickstreams).
Data is usually extracted from the warehouse, transformed as needed, and then analyzed
using data mining tools [49] .
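Anomaly detection, for instance, can be sketched with a simple z-score test over transaction amounts (the data is illustrative; real systems use far richer models):

```python
import statistics

# Flag transactions whose amount deviates sharply from the mean (z-score > 2).
amounts = [100, 102, 98, 101, 99, 100, 950]  # the last value is suspicious

mean = statistics.mean(amounts)
stdev = statistics.pstdev(amounts)
anomalies = [x for x in amounts if abs(x - mean) / stdev > 2]

print(anomalies)
```

The same pattern of "summarize the population, then flag outliers" underlies many fraud-detection pipelines, with the z-score replaced by more robust statistics or learned models.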
Applications
Business Intelligence: Marketing analysis, customer segmentation, sales forecasting.
Fraud Detection: Identifying suspicious transactions in banking and finance.
Healthcare: Disease prediction, patient segmentation, and treatment effectiveness analysis.
Manufacturing: Quality control, predictive maintenance, and supply chain optimization.
Social Media & Web: Sentiment analysis, recommendation systems, and user behavior
analysis [46] [48] .
Advantages
Improved Decision Making: Provides actionable insights for strategic planning.
Competitive Advantage: Helps organizations identify new opportunities and optimize
operations.
Automation: Enables automated detection of patterns and anomalies, saving time and
resources.
Challenges and Limitations
Data Quality: Results depend on the accuracy and completeness of the input data.
Privacy Concerns: Mining personal data can raise ethical and legal issues [46] .
1. https://study.com/academy/lesson/what-is-an-object-oriented-database.html
2. https://phoenixnap.com/kb/object-oriented-database
3. https://www.scribd.com/document/578693921/OBJECT-oriented-databases
4. https://en.wikipedia.org/wiki/Object_database
5. https://celerdata.com/glossary/object-oriented-dbms
6. https://hackernoon.com/object-oriented-databases-and-their-advantages
7. https://daily.dev/blog/object-oriented-vs-nosql-databases-key-differences
8. https://www.ionos.com/digitalguide/hosting/technical-matters/object-oriented-databases/
9. https://ecomputernotes.com/database-system/adv-database/object-oriented-database-oodb
10. https://en.wikipedia.org/wiki/Object–relational_database
11. https://www.ituonline.com/tech-definitions/what-is-an-object-relational-database-ord/
12. https://byjus.com/gate/object-relational-data-model-in-dbms-notes/
13. https://www.tutorialspoint.com/object-relational-features-object-database-extensions-to-sql
14. https://docs.oracle.com/en/database/oracle/oracle-database/19/adobj/key-features-object-relational-model.html
15. https://www.theserverside.com/definition/object-relational-mapping-ORM
16. https://www.sciencedirect.com/topics/computer-science/object-relational-database
17. https://risingwave.com/blog/mastering-logical-database-models-a-comprehensive-guide/
18. https://www.tibco.com/glossary/what-is-a-logical-data-model
19. https://www.datamation.com/big-data/logical-vs-physical-data-model/
20. https://www.studocu.com/in/document/dr-apj-abdul-kalam-technical-university/btech/logical-database-notes-for-dbms/51617884
21. https://help.sap.com/doc/saphelp_nw73ehp1/7.31.19/en-US/9f/db9b5e35c111d1829f0000e829fbfe/content.htm
22. https://www.gooddata.com/blog/physical-vs-logical-data-model/
23. https://hevodata.com/learn/conceptual-vs-logical-vs-physical-data-model/
24. https://mrwebsites.ca/solutions/web_databases.html
25. https://nexalab.io/blog/what-is-web-database/
26. https://theintactone.com/2022/02/27/web-databases/
27. https://www.w3schools.in/dbms/web-based-database-management-system
28. https://www.ibm.com/docs/sl/SSEPEK_12.0.0/intro/src/tpc/db2z_componentsofwebapplications.html
29. https://www.spaceotechnologies.com/blog/web-application-architecture/
30. https://enterprisemonkey.com.au/web-application-architecture/
31. https://www.clarity-ventures.com/how-to-guides/web-application-architecture
32. https://www.mongodb.com/en-us/resources/basics/databases/distributed-database
33. https://phoenixnap.com/kb/distributed-database
34. https://www.cockroachlabs.com/blog/what-is-a-distributed-database/
35. https://www.scylladb.com/glossary/distributed-database/
36. https://www.techtarget.com/searchoracle/definition/distributed-database
37. https://www.tutorchase.com/answers/a-level/computer-science/what-are-the-essential-characteristics-of-a-distributed-database
38. https://www.tutorialspoint.com/DDBMS-Advantages-and-Disadvantages
39. https://www.tutorialspoint.com/distributed-database-architecture
40. https://www.oracle.com/in/database/what-is-a-data-warehouse/
41. https://www.trantorinc.com/blog/understanding-data-warehousing
42. https://aws.amazon.com/what-is/data-warehouse/
43. https://www.sap.com/products/data-cloud/datasphere/what-is-a-data-warehouse.html
44. https://www.datacamp.com/blog/data-warehouse-architecture
45. https://www.snowflake.com/guides/data-warehouse-architecture/
46. https://www.investopedia.com/terms/d/datamining.asp
47. https://www.spiceworks.com/tech/big-data/articles/what-is-data-mining/
48. https://www.techtarget.com/searchbusinessanalytics/definition/data-mining
49. https://ebooks.inflibnet.ac.in/csp4/chapter/data-mining-introduction/
50. https://www.sap.com/hk/products/technology-platform/hana/what-is-data-mining.html