ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties ensure reliable processing of database transactions.
1. Atomicity:- Ensures that a transaction is treated as a single, indivisible unit. Either all operations within the transaction are completed, or none are. If a failure occurs, changes are rolled back to maintain database integrity.
2. Consistency:- Ensures that a transaction transforms the database from one valid state to another valid state. All integrity constraints (like primary key, foreign key) must be satisfied after a transaction.
3. Isolation:- Ensures that the concurrent execution of transactions leaves the database in the same state as if the transactions were executed one after another (serially). Prevents conflicts like dirty reads, non-repeatable reads, and phantom reads.
4. Durability:- Once a transaction is committed, its changes are permanently saved in the database. Even if a system crash occurs, the changes must persist, using logs, backups, etc.
Importance:- ACID properties are essential for ensuring data integrity, accuracy, and reliability in multi-user and crash-prone environments.

Serializability is the highest level of isolation in a database system. It ensures that the concurrent execution of transactions is equivalent to some serial (one-by-one) execution of those same transactions.
Types of Serializability:
1. Conflict Serializability:- Based on conflicting operations: Read-Write, Write-Read, Write-Write on the same data item by different transactions. A schedule is conflict-serializable if it can be transformed into a serial schedule by swapping adjacent non-conflicting operations.
2. View Serializability:- Based on views of data (what each transaction sees). More general than conflict serializability. Harder to test algorithmically.

ARIES is a recovery algorithm used in many modern DBMS (e.g., IBM DB2). It is used to recover from system crashes and ensure transaction durability.
ARIES Recovery Phases:
1. Analysis Phase:- Scans the log to identify active transactions and dirty pages at the time of the crash.
2. Redo Phase:- Reapplies logged actions, starting from the earliest log record that could affect a page that was dirty at the crash (found during analysis, so this point may lie before the last checkpoint). Ensures all logged changes are reflected.
3. Undo Phase:- Rolls back uncommitted transactions using log records in reverse order.
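The three ARIES phases can be sketched on a toy log. This is a heavily simplified illustration, not the real algorithm: the record layout, the `LOG` contents, and all function names are assumptions made up for this example, and real ARIES also tracks LSNs per page, checkpoints, and compensation log records.

```python
# Toy log records: (lsn, txn, kind, page, before, after).
# The layout is hypothetical; real ARIES records carry more fields.
LOG = [
    (1, "T1", "update", "A", 0, 10),
    (2, "T2", "update", "B", 0, 20),
    (3, "T1", "commit", None, None, None),
    (4, "T2", "update", "A", 10, 30),
    # The crash happens here: T2 never committed.
]


def analysis(log):
    """Return the transactions still active at the crash (the losers)."""
    active = set()
    for _lsn, txn, kind, *_ in log:
        if kind == "update":
            active.add(txn)
        elif kind == "commit":
            active.discard(txn)
    return active


def redo(log, pages):
    """Repeat history: reapply every logged update in log order."""
    for _lsn, _txn, kind, page, _before, after in log:
        if kind == "update":
            pages[page] = after


def undo(log, pages, losers):
    """Roll back loser transactions by scanning the log in reverse."""
    for _lsn, txn, kind, page, before, _after in reversed(log):
        if kind == "update" and txn in losers:
            pages[page] = before


def recover(log):
    pages = {}  # simplification: start from an empty disk image
    losers = analysis(log)
    redo(log, pages)
    undo(log, pages, losers)
    return pages, losers


pages, losers = recover(LOG)
print(losers, pages)
```

After recovery, the committed update of T1 survives, while both updates of the uncommitted T2 are rolled back to their before-values.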
What is Concurrency Control?
Concurrency Control in a database management system (DBMS) ensures that multiple transactions can execute simultaneously without conflicting with each other or violating data integrity.
Concurrency Control Problems:
1. Lost Update Problem:- Occurs when two transactions read the same data and then update it, but the first update is overwritten by the second.
Example:
T1: Read(A); A = A + 500; Write(A);
T2: Read(A); A = A + 1000; Write(A);
Both read the same value, and T1’s update is lost when T2 writes.
2. Dirty Read (Uncommitted Dependency Problem):- A transaction reads data from another transaction that has not yet committed. If the other transaction is rolled back, the first one has read invalid data.
Example:
T1: Update(A); [Not yet committed]
T2: Read(A); [Reads dirty value]
T1: Rollback;
3. Non-Repeatable Read:- A transaction reads the same data item twice, but gets different values because another transaction modified the data in between.
Example:
T1: Read(A);
T2: Update(A); Commit;
T1: Read(A); [Different value]
4. Phantom Read:- A transaction re-executes a query and gets a different set of rows, because another transaction inserted or deleted rows.
Example:
T1: SELECT * FROM Orders WHERE amount > 1000;
T2: INSERT INTO Orders VALUES (1050);
T1: SELECT * FROM Orders WHERE amount > 1000; [New row appears]

1. Two-Phase Locking Protocol (2PL):- Divides transaction execution into two phases:
Growing phase: Locks are acquired, none are released.
Shrinking phase: Locks are released, none are acquired.
Ensures conflict serializability.
2. Timestamp Ordering Protocol (not a locking protocol, but sometimes compared):- Uses timestamps to order transactions. Avoids locks, but can cause more transaction rollbacks.
3. Multiple Granularity Locking Protocol:- Locks can be applied at various levels: database, table, page, row. Uses intention locks (IS, IX, SIX) to manage the hierarchy. Allows efficient locking and unlocking at appropriate levels.
4. Deadlock Handling in Locking Protocols:- Deadlock is a common issue in locking. Handled using:
Wait-Die or Wound-Wait schemes.
Deadlock detection and recovery.
Timeouts.
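The Wait-Die and Wound-Wait schemes named under deadlock handling differ only in what happens when a transaction requests a lock that another one holds. A minimal sketch, assuming each transaction carries a start timestamp where a smaller value means an older transaction (the function names and return strings are made up for this illustration):

```python
def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Wait-Die: an older requester waits; a younger requester dies (aborts)."""
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Wound-Wait: an older requester wounds (aborts) the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"


# T1 started at time 1 (older), T2 at time 2 (younger).
print(wait_die(1, 2))    # older T1 requests a lock held by T2
print(wait_die(2, 1))    # younger T2 requests a lock held by T1
print(wound_wait(1, 2))  # older T1 requests a lock held by T2
print(wound_wait(2, 1))  # younger T2 requests a lock held by T1
```

In both schemes it is always the younger transaction that gets aborted, so waits only ever go in one direction of the age ordering and no cycle of waiting transactions can form.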
What is Normalization?
Normalization is the process of organizing data in a database to:
Minimize data redundancy.
Eliminate insertion, deletion, and update anomalies.
Improve data integrity.
Normalization is done using normal forms, which are sets of rules or standards.
1. First Normal Form (1NF)
Rule:- All attributes must be atomic (indivisible).
No repeating groups or arrays.
Example (Before 1NF):
Student(Name, Roll, Courses)
Courses = {DBMS, OS}
🛠 After 1NF:
Student(Name, Roll, Course)
Each course becomes a separate row.
2. Second Normal Form (2NF)
Rule:- Must be in 1NF.
No partial dependency (non-prime attributes should depend on the whole primary key).
Example (Before 2NF):
Enroll(StudentID, CourseID, StudentName)
→ Partial dependency: StudentName depends only on StudentID
🛠 After 2NF:
Student(StudentID, StudentName)
Enroll(StudentID, CourseID)
3. Third Normal Form (3NF)
Rule:- Must be in 2NF.
No transitive dependency (a non-prime attribute should not depend on another non-prime attribute).
Example (Before 3NF):
Employee(EmpID, DeptID, DeptName)
→ DeptName depends on DeptID, not directly on EmpID
🛠 After 3NF:
Employee(EmpID, DeptID)
Department(DeptID, DeptName)
4. Boyce-Codd Normal Form (BCNF)
Rule:- Stronger version of 3NF.
For every non-trivial functional dependency X → Y, X must be a super key.
Used when:- A relation is in 3NF but still has anomalies due to overlapping candidate keys.
5. Fourth Normal Form (4NF)
Rule:- Must be in BCNF.
No multi-valued dependencies.
Example:
Teacher →→ Subject
Teacher →→ Class
One teacher teaches multiple subjects and multiple classes independently, so storing both facts in one relation violates 4NF.

Schema refinement is the process of improving the structure of database schemas to remove problems such as:
• Redundancy
• Insertion anomalies
• Deletion anomalies
• Update anomalies
It is typically done after an initial design, using functional dependencies (FDs) and normalization techniques.
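Checking whether a left-hand side is a super key comes down to computing an attribute closure under the functional dependencies. The sketch below applies that check to the Employee example from the 3NF discussion; the function names and the encoding of FDs as pairs of frozensets are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the non-trivial FDs whose left side is not a super key of `relation`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not set(relation) <= closure(lhs, fds)
    ]


# Employee(EmpID, DeptID, DeptName) with EmpID -> DeptID and DeptID -> DeptName.
employee = {"EmpID", "DeptID", "DeptName"}
fds = [
    (frozenset({"EmpID"}), frozenset({"DeptID"})),
    (frozenset({"DeptID"}), frozenset({"DeptName"})),
]
violations = bcnf_violations(employee, fds)

# After decomposition, each relation is checked against its own FD.
employee_ok = bcnf_violations({"EmpID", "DeptID"}, fds[:1])
department_ok = bcnf_violations({"DeptID", "DeptName"}, fds[1:])
print(violations, employee_ok, department_ok)
```

The only violation reported is DeptID → DeptName, which is the same dependency that the decomposition into Employee(EmpID, DeptID) and Department(DeptID, DeptName) removes; both decomposed relations pass the check.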
Data independence refers to the ability to change the schema at one level of a database system without affecting the schema at the next higher level.
It is a major advantage of DBMS over file systems and is made possible by the three-level architecture of a database.
Types of Data Independence:
1. Logical Data Independence:- Ability to change the logical schema (like tables, attributes, relationships) without changing the external schema or application programs.
Examples of changes allowed:
• Adding new fields (columns)
• Merging tables
• Creating new relationships
2. Physical Data Independence:- Ability to change the physical storage structure (like file organization, indexing) without affecting the logical schema.
Examples of changes allowed:
• Changing from heap file to indexed file.
• Moving data to a new storage device.
Data independence is crucial for building flexible, robust, and scalable database systems. It allows modifying the database structure without affecting users or applications, making it a core principle in DBMS architecture.

Data redundancy refers to the unnecessary repetition or duplication of the same data in multiple places within a database or information system.
Consider a table:
EmpID  Name  Dept  DeptLocation
101    Ravi  IT    Mumbai
102    Neha  IT    Mumbai
103    Aman  IT    Mumbai
Here, the Dept and DeptLocation fields are repeated for each employee in the IT department. This is data redundancy.
Problems Caused by Redundancy:
1. Wasted Storage Space
   o Same data is stored multiple times.
2. Update Anomalies
   o If you change "Mumbai" to "Delhi" for one IT record and forget others, data becomes inconsistent.
3. Insertion Anomalies
   o You cannot add a new department without adding at least one employee.
4. Deletion Anomalies
   o Deleting the last employee in IT might delete department info too.
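The update anomaly can be reproduced with Python's built-in sqlite3 module. This is a small sketch using the three IT employees from the table; the `Department` and `EmployeeNorm` table names in the second half are assumptions introduced for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmpID INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, DeptLocation TEXT);
    INSERT INTO Employee VALUES (101, 'Ravi', 'IT', 'Mumbai'),
                                (102, 'Neha', 'IT', 'Mumbai'),
                                (103, 'Aman', 'IT', 'Mumbai');
""")

# Update anomaly: the location is changed on one row and forgotten on the others.
conn.execute("UPDATE Employee SET DeptLocation = 'Delhi' WHERE EmpID = 101")
locations = {row[0] for row in conn.execute(
    "SELECT DISTINCT DeptLocation FROM Employee WHERE Dept = 'IT'")}
# `locations` now holds two conflicting values for the same department.

# Storing the location once per department removes the duplication,
# so a single UPDATE changes it for every employee.
conn.executescript("""
    CREATE TABLE Department (Dept TEXT PRIMARY KEY, DeptLocation TEXT);
    INSERT INTO Department VALUES ('IT', 'Mumbai');
    CREATE TABLE EmployeeNorm (
        EmpID INTEGER PRIMARY KEY, Name TEXT,
        Dept TEXT REFERENCES Department(Dept));
    INSERT INTO EmployeeNorm VALUES (101, 'Ravi', 'IT'),
                                    (102, 'Neha', 'IT'),
                                    (103, 'Aman', 'IT');
    UPDATE Department SET DeptLocation = 'Delhi' WHERE Dept = 'IT';
""")
normalized_locations = {row[0] for row in conn.execute(
    "SELECT DISTINCT d.DeptLocation "
    "FROM EmployeeNorm e JOIN Department d ON e.Dept = d.Dept")}
print(locations, normalized_locations)
```

The separate Department table also addresses the insertion and deletion anomalies, since a department row can exist without any employee rows.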
Relational Algebra is a procedural query language that uses a set of operations to retrieve data from relations (tables).
It tells the system how to obtain the result.
Basic Operations of Relational Algebra:
Operation          Symbol  Description
Selection          σ       Selects rows satisfying a condition
Projection         π       Selects specific columns (attributes)
Union              ∪       Combines tuples from two relations
Set Difference     −       Returns tuples in one relation but not the other
Cartesian Product  ×       Combines all pairs of tuples
Rename             ρ       Renames a relation or its attributes
Example Queries:
Let Student(SID, Name, Dept) and Enrolled(SID, CourseID).
Find all students in 'CS' department:
σ Dept = 'CS' (Student)
Get names of all students:
π Name (Student)
List of students enrolled in some course:
π SID (Enrolled)

Relational Calculus is a non-procedural query language that specifies what to retrieve, not how to retrieve it.
There are two types:
(A) Tuple Relational Calculus (TRC):
• Uses tuple variables to represent rows.
• Format:
{ t | P(t) }
where t is a tuple and P(t) is a predicate (condition).
Example:
Get names of students from CS department:
{ t.Name | Student(t) ∧ t.Dept = 'CS' }
(B) Domain Relational Calculus (DRC):
• Uses domain variables (values of attributes).
• Format:
{ <x1, x2, ..., xn> | P(x1, x2, ..., xn) }
Example:
Get names of CS students:
{ <Name> | ∃SID ∃Dept (Student(SID, Name, Dept) ∧ Dept = 'CS') }
Relational Calculus expresses logic-based conditions to filter data.
Both relational algebra and relational calculus are foundation models for SQL and play a critical role in query processing and database theory.
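The difference between the procedural and the declarative style can be illustrated in Python. In this sketch the sample rows and the helper names `select` and `project` are made up for the example; the algebra version applies one operation after another, while the calculus version states a single condition in a set comprehension.

```python
# Hypothetical sample rows for Student(SID, Name, Dept).
students = [
    {"SID": 1, "Name": "Ravi", "Dept": "CS"},
    {"SID": 2, "Name": "Neha", "Dept": "IT"},
    {"SID": 3, "Name": "Aman", "Dept": "CS"},
]


def select(relation, predicate):
    """Selection (σ): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Projection (π): keep only the named attributes, dropping duplicate rows."""
    return {tuple(row[a] for a in attrs) for row in relation}


# Algebra style: π Name (σ Dept = 'CS' (Student)), applied step by step.
cs_names_algebra = project(select(students, lambda r: r["Dept"] == "CS"), ["Name"])

# Calculus style: { t.Name | Student(t) ∧ t.Dept = 'CS' }, stated as one condition.
cs_names_calculus = {(t["Name"],) for t in students if t["Dept"] == "CS"}

print(cs_names_algebra == cs_names_calculus)
```

Both forms produce the same set of names, which mirrors the fact that the two languages can express the same queries in different styles.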
Aggregation is a concept in the ER model where a relationship itself is treated as a higher-level entity.
It is used when we need to express a relationship between a relationship and an entity.
Example:
Consider:
• Employee works on a Project
• A Manager monitors the Works_On relationship
Here, we need to represent a relationship between:
• Manager and the relationship Works_On
We use aggregation to group Works_On as an abstract entity:
Employee ---- Works_On ---- Project
                  |
            [Aggregated]
                  |
            monitored_by
                  |
               Manager
The Works_On relationship becomes a composite entity monitored by Manager.

Specialization is the process of creating subclasses (sub-entities) from a higher-level entity based on some distinguishing characteristics.
It follows a top-down approach.
Example:
     Employee
     /      \
Engineer   Clerk
• Engineer and Clerk are specialized forms of Employee.
• Each subclass can have additional attributes (e.g., Engineer → SkillSet, Clerk → TypingSpeed).
Types of Specialization:
Type         Description
Disjoint     An entity instance can belong to only one subclass
Overlapping  An entity instance can belong to multiple subclasses
Total        Every entity must belong to a subclass
Partial      Some entities may not belong to any subclass

Generalization is the reverse of specialization: a bottom-up approach where common features of multiple entities are combined into a single generalized entity.
Example:
Engineer   Clerk
     \      /
     Employee
• Here, both Engineer and Clerk share common attributes like Name, EmpID, Salary.
• These common attributes are abstracted into a generalized entity: Employee.
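The Employee, Engineer, and Clerk hierarchy maps naturally onto class inheritance. The following is only an analogy in Python, not a database schema: the attribute names follow the ones mentioned in the text, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    # Common attributes shared by every subclass (the generalized entity).
    emp_id: int
    name: str
    salary: float


@dataclass
class Engineer(Employee):
    skill_set: str  # attribute specific to Engineer


@dataclass
class Clerk(Employee):
    typing_speed: int  # attribute specific to Clerk


e = Engineer(emp_id=1, name="Ravi", salary=50000.0, skill_set="Databases")
c = Clerk(emp_id=2, name="Neha", salary=30000.0, typing_speed=60)
print(isinstance(e, Employee), isinstance(c, Employee))
```

Because each object here has exactly one class, this corresponds to a disjoint specialization; an overlapping specialization would need a different representation, such as a set of roles per employee.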
The levels of abstraction in DBMS refer to how data is viewed and managed at different layers.
They hide the complexity of data storage from users and provide data independence.
The three levels are part of the Three-Schema Architecture defined by the ANSI/SPARC DBMS model.
1. Physical Level (Internal Schema)
• Lowest level of abstraction
• Describes how data is physically stored (files, indexes, blocks, etc.)
• Concerned with:
   o File formats
   o Data compression
   o Storage paths
Example:
Storing a table as a B+ tree or heap file on a disk.
2. Logical Level (Conceptual Schema)
• Middle level of abstraction
• Describes what data is stored and the relationships among data
• Independent of how data is stored physically
• Defines:
   o Tables, attributes
   o Primary & foreign keys
   o Constraints
Example:
A table: Student(SID, Name, Dept)
3. View Level (External Schema)
• Highest level of abstraction
• Describes how users interact with data
• Each user may have a different view depending on their role or need
• Provides:
   o Security (by restricting access)
   o Simplification of complex data
Example:
A teacher’s view may only show Name and Dept of students, not Fees.
Diagram: Three-Level Architecture
External Level (User Views)
            ↑
Logical Level (Conceptual Schema)
            ↑
Physical Level (Storage)

Importance of Abstraction Levels:
Benefit            Description
Data Independence  Changes at one level don’t affect others
Security           Controls what data users can see
Simplicity         Hides complex implementation details
Efficiency         Optimizes storage and query performance
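The teacher's view from the view-level example can be sketched as an SQL view, here using Python's built-in sqlite3 module. The `Fees` column, the view name `TeacherView`, and the sample rows are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: the full Student table, including Fees.
    CREATE TABLE Student (SID INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, Fees REAL);
    INSERT INTO Student VALUES (1, 'Ravi', 'CS', 50000), (2, 'Neha', 'IT', 45000);
    -- View level: the teacher sees only Name and Dept.
    CREATE VIEW TeacherView AS SELECT Name, Dept FROM Student;
""")

cursor = conn.execute("SELECT * FROM TeacherView ORDER BY Name")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

SQLite has no per-user permissions, so this only shows the restricted projection; in a multi-user DBMS the security benefit comes from granting users access to the view rather than to the underlying table.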