Database Topics Explained With Examples
Database Management Systems
This first part establishes the fundamental principles of database systems. It begins by
exploring the crucial transition from simple file-based storage to sophisticated Database
Management Systems (DBMS), highlighting the problems that drove this evolution. It then
dissects the internal architecture of a DBMS, explaining its core components, the levels of
abstraction that make it powerful, and the essential distinction between a database's
structure (schema) and its content (instance). The section concludes by examining the
languages used to interact with databases and the vital human roles responsible for their
management and use.
To understand the value of a modern database system, one must first appreciate the
limitations of its predecessor: the traditional file system. For decades, applications stored
their data in individual files, managed directly by the operating system. While simple, this
approach created significant challenges as applications grew in complexity and data became
more valuable. The emergence of the Database Management System (DBMS) was not merely
an improvement but a paradigm shift in how we manage and interact with information.
A file system is a method an operating system uses to control how data is stored and
retrieved. It organizes data in a hierarchy of files and directories (folders) on a storage device
like a hard disk.1 Think of it as a traditional office filing cabinet. Each drawer is a directory, and
each manila folder is a file. To find a specific piece of information—say, a customer's phone
number—you need to know which drawer and which folder it's in. You pull out the folder,
search through the papers manually, and hope the information is correct.
The transition from file systems to DBMS was driven by the need to solve a series of critical
problems that became bottlenecks for developing robust, multi-user applications. The DBMS
was engineered specifically to address these shortcomings.
The inherent limitations of file systems are best understood when contrasted with the features
provided by a DBMS.
The necessity for this evolution is rooted in the increasing complexity of software. Early,
single-user applications could tolerate the simplicity of file systems. However, the rise of
enterprise-level, multi-user systems where data is a shared, critical asset made the file
system's limitations untenable. A DBMS is not just a better storage system; it is an enabling
technology that made modern, data-driven applications possible by treating data as a
managed, valuable asset rather than a passive byproduct of a program.
However, it is important to recognize that the landscape is not a simple dichotomy. The
boundaries have become blurred with the advent of embedded databases like SQLite, which
is fundamentally a file-based database library used extensively in applications like web
browsers and mobile apps.3 This illustrates a critical engineering principle: the choice of data
storage is a trade-off. For a simple, single-user application, the overhead of a full-scale DBMS
is unnecessary; an embedded database provides the benefits of SQL and transactions
without the complexity of a client-server architecture. For a large-scale e-commerce
platform, anything less than a full-fledged DBMS would be inadequate. The modern data
environment is a spectrum of tools, and the architect's job is to select the right one for the
task at hand.
A DBMS is a complex piece of software with several interacting components that work
together to store, retrieve, and manage data. Understanding its internal architecture reveals
how it can provide powerful features like data independence, query optimization, and
concurrency control.
At the heart of any DBMS are two primary components that handle the core tasks of data
management and query execution.
1. Storage Manager: This is the module that interfaces with the operating system's file
system and is responsible for all low-level data operations. It translates commands from
the query processor into physical actions on the disk. Its key functions include managing
the physical storage of data, allocating and deallocating disk space, organizing data into
files and pages, managing the buffer (the area of main memory used to cache disk pages
for faster access), and maintaining data structures like indexes that speed up data
retrieval.6 The storage manager is the component that worries about the physical reality
of bits and bytes on a storage device.
2. Query Processor: This component acts as the "brain" of the DBMS. It is responsible for
understanding and executing user queries. When a user submits a query (e.g., in SQL),
the query processor first parses it to check for correct syntax, then optimizes it by
determining the most efficient way to execute it, and finally generates a sequence of
low-level instructions that it passes to the storage manager for execution.6 The
optimization step is crucial; for a complex query involving multiple tables, there can be
thousands of ways to execute it, and the optimizer's job is to find a plan that minimizes
resource usage (like disk I/O and CPU time).
Data Abstraction: Hiding the Complexity
One of the most important functions of a DBMS is to provide users with an abstract view of
the data, hiding the intricate details of how it is physically stored and maintained. This is
achieved through levels of abstraction, much like an onion with layers that shield the user
from the complex core.9
● Visual Representation: The Three Levels of Abstraction
Imagine three concentric circles. The innermost circle is the Physical Level, the middle is
the Logical Level, and the outermost is the View Level. The user interacts with the outer
level, completely shielded from the inner workings.
1. Physical Level (Internal Level): This is the lowest level of abstraction and describes
how the data is actually stored on the physical storage devices. It deals with complex,
low-level data structures like B+ trees and hashing methods, file organization, and
memory management details.9 This level is the concern of database system developers
and, to some extent, database administrators (DBAs) who tune performance.
2. Logical Level (Conceptual Level): This is the next level up, describing what data is
stored in the database and what relationships exist among that data. It presents the
entire database in terms of a small number of relatively simple structures, such as tables
(relations), columns (attributes), and constraints.9 Database administrators and
application developers work at this level. For example, a developer defines a
Students table with columns like StudentID, Name, and Major, without needing to know
how these records are physically arranged on a disk.
3. View Level (External Level): This is the highest level of abstraction and describes only a
part of the entire database, tailored to the needs of a particular user group. A view can
hide certain data for security purposes or present a simplified structure to make
interaction easier.9 For instance, a university registrar might see all student information,
while a faculty member's view might be restricted to only the students enrolled in their
courses, and it might hide sensitive information like financial aid status.
The purpose of these layers of abstraction is to achieve data independence. The logical level
hides the physical storage details, providing physical data independence. This means the
DBA can change the physical storage structures or devices (e.g., move the database to a
faster SSD, create a new index) to improve performance without requiring any changes to the
application programs that access the data.5 Similarly, the view level hides changes in the
logical structure, providing
logical data independence. This allows the DBA to change the conceptual schema (e.g.,
split a table into two, add a new column) without affecting applications that do not depend on
those changes.5 This separation is a cornerstone of modern software engineering, as it
dramatically reduces the long-term cost of system maintenance and evolution.
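As a concrete illustration, the minimal sketch below (object names are hypothetical; the Students and Enrollments tables mirror those used later in this guide) shows both ideas: a view gives faculty a restricted window onto the data, and an index changes the physical level without touching any query that uses the view.
SQL
-- View level: a restricted view for faculty, hiding sensitive student data.
CREATE VIEW CourseRoster AS
SELECT e.CourseID, s.StudentID, s.Name
FROM Students s
JOIN Enrollments e ON e.StudentID = s.StudentID;

-- Physical level: the DBA adds an index to speed up roster lookups.
-- Applications that query CourseRoster need no changes (physical data independence).
CREATE INDEX idx_enrollments_course ON Enrollments (CourseID);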
Two fundamental terms that describe the state and structure of a database are schema and
instance.
● Schema: The schema is the logical blueprint of the database. It is the overall design,
defining the tables, the columns within each table, the data types for each column, the
relationships between tables, and integrity constraints.4 The schema is designed during
the database design phase and is relatively static; it does not change frequently.12 To use
an analogy, the schema is the architect's empty blueprint for a building, detailing the
structure of rooms, doors, and windows, but containing no furniture or people.4
● Instance: An instance of a database is the actual data contained within it at a specific
point in time.4 It is a "snapshot" of the database's content. While the schema is stable,
the instance is highly dynamic, changing with every
INSERT, UPDATE, or DELETE operation.12 In our analogy, an instance is the building at a
particular moment, filled with furniture and occupied by people.4
The way applications connect to and interact with a database is defined by its architecture.
The two most common models are two-tier and three-tier.
1. Two-Tier Architecture: This is a simple client-server model where the application logic
resides on the client machine and communicates directly with the database server.15 For
example, a desktop application installed on your computer might connect directly to a
central database. While easy to develop for simple scenarios, this model is less scalable
and poses security risks, as the client has direct access to the database.
2. Three-Tier Architecture: This is the dominant architecture for modern web and
enterprise applications. It introduces a middle layer, the Application Tier, between the
client (Presentation Tier) and the database (Data Tier).15
○ Presentation Tier: This is the user interface that the end-user interacts with, such
as a web browser or a mobile app. Its job is to display information and collect input
from the user.18
○ Application Tier (Middle Tier): This layer contains the business logic of the
application. It receives requests from the presentation tier, processes them (e.g.,
validating data, performing calculations), and interacts with the data tier to store or
retrieve information. This is where the bulk of the application's work happens.16
○ Data Tier: This tier consists of the DBMS and the database itself. It is responsible for
storing and managing the data. Crucially, it is only accessible through the application
tier, never directly by the client.16
● Visual Representation: Three-Tier Architecture
A clear diagram shows the logical separation and communication flow: The User interacts
with the Presentation Tier, which communicates with the Application Tier. The Application
Tier then communicates with the Data Tier to fulfill the user's request.
This three-tier model offers significant advantages, including enhanced scalability (each tier
can be scaled independently), flexibility (a tier can be updated or replaced without affecting
the others), and improved security (the data tier is shielded from direct external access).17
To interact with a database—to define its structure, manipulate its data, and control
access—users employ a specialized set of commands. In relational databases, these
commands are part of the Structured Query Language (SQL). SQL is not a single monolithic
language but is composed of several sub-languages, each with a distinct function.
● Data Definition Language (DDL): DDL commands are used to define and manage the
database structure, or schema. They create, modify, and delete database objects like
tables, indexes, and views.19 Because these commands alter the fundamental structure of
the database, they are typically used by database administrators and designers.
○ CREATE: To build new database objects (e.g., CREATE TABLE Students (...)).
○ ALTER: To modify the structure of existing objects (e.g., ALTER TABLE Students ADD
COLUMN GPA DECIMAL(3,2);).
○ DROP: To permanently delete objects (e.g., DROP TABLE Students;).
○ TRUNCATE: To remove all data from a table quickly, without deleting the table
structure itself.20
● Data Manipulation Language (DML): DML commands are used to manage the data
within the schema objects—that is, to interact with the database instance.21 These are
the everyday commands used by applications and users to perform operations on the
data.
○ SELECT: To retrieve data from the database. This is the most frequently used SQL
command.
○ INSERT: To add new rows of data into a table (e.g., INSERT INTO Students VALUES
(...);).
○ UPDATE: To modify existing data in a table (e.g., UPDATE Students SET Major =
'Computer Science' WHERE StudentID = 123;).
○ DELETE: To remove rows of data from a table (e.g., DELETE FROM Students WHERE
StudentID = 123;).
● Data Control Language (DCL): DCL commands are concerned with rights, permissions,
and other controls of the database system. They are used by DBAs to manage user
access to data.20
○ GRANT: To give a specific user permission to perform certain tasks (e.g., GRANT
SELECT, INSERT ON Students TO 'professor_smith';).
○ REVOKE: To take away permissions from a user (e.g., REVOKE DELETE ON Students
FROM 'intern_user';).
● Transaction Control Language (TCL): TCL commands are used to manage transactions
in the database, ensuring that work is done reliably and data integrity is maintained.19
○ COMMIT: To save all the work done in the current transaction, making the changes
permanent.
○ ROLLBACK: To undo all the work done in the current transaction, restoring the
database to its state before the transaction began.
○ SAVEPOINT: To set a point within a transaction to which you can later roll back,
without rolling back the entire transaction.
Sub-language | Purpose | Example Commands
DDL (Data Definition Language) | Defines and manages the database structure (schema) | CREATE, ALTER, DROP, TRUNCATE
DML (Data Manipulation Language) | Manages the data within schema objects (the instance) | SELECT, INSERT, UPDATE, DELETE
DCL (Data Control Language) | Controls rights and permissions for accessing data | GRANT, REVOKE
TCL (Transaction Control Language) | Manages transactions to keep work reliable and consistent | COMMIT, ROLLBACK, SAVEPOINT
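The four sub-languages can also be seen working together in one short script. The following is a minimal sketch: table, column, and user names are illustrative, and the exact syntax for ALTER TABLE, GRANT, and transaction control varies slightly between database products.
SQL
-- DDL: define and adjust the schema.
CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    Name      VARCHAR(100) NOT NULL,
    Major     VARCHAR(50)
);
ALTER TABLE Students ADD COLUMN GPA DECIMAL(3,2);

-- DML: work with the instance.
INSERT INTO Students (StudentID, Name, Major, GPA)
VALUES (123, 'Ada Lovelace', 'Mathematics', 3.90);
UPDATE Students SET Major = 'Computer Science' WHERE StudentID = 123;
SELECT Name, Major FROM Students WHERE GPA >= 3.5;

-- DCL: control who may do what.
GRANT SELECT, INSERT ON Students TO 'professor_smith';

-- TCL: group changes into one reliable unit of work.
BEGIN;
UPDATE Students SET GPA = 4.00 WHERE StudentID = 123;
SAVEPOINT after_gpa;
DELETE FROM Students WHERE StudentID = 999;
ROLLBACK TO SAVEPOINT after_gpa;  -- undo only the DELETE
COMMIT;                           -- make the GPA change permanent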
A profound aspect of SQL, particularly its DML component, is its declarative nature.22 When
a user writes a query like
SELECT Name FROM Students WHERE Major = 'Physics', they are declaring what data they
want, not prescribing the step-by-step procedure for how to get it. The procedural
work—deciding which index to use, the order in which to access tables, and the algorithm for
filtering the data—is handled by the DBMS's query optimizer. This separation of user intent
from execution strategy is a key reason for SQL's enduring power and longevity. It allows the
underlying database technology to evolve and improve its performance without requiring any
changes to the millions of applications that rely on it.
A database system is not just technology; it is a resource used and managed by people in
various roles. Understanding these roles is essential to appreciating how a database functions
within an organization.
Database users can be categorized based on their level of technical expertise and how they
interact with the system.23
1. Naive Users: This is the largest group of users. They are typically not aware that they are
using a database. They interact with the system through pre-written application
programs with simple graphical user interfaces.24 Examples include a bank teller using a
terminal to process a withdrawal, a customer using an ATM, or someone booking a flight
on a website.
2. Application Programmers: These are the software developers who write the application
programs that naive users interact with. They are skilled professionals who use
programming languages (like Java, Python, or C#) in conjunction with database
commands (like SQL) to create the user interfaces and business logic of the system.24
3. Sophisticated Users: These users interact with the database directly, without writing
application programs. They are typically engineers, scientists, or business analysts who
are proficient in writing complex SQL queries to perform ad-hoc data analysis.23 They use
tools like SQL clients or data analysis software to explore the data and generate reports.
4. Specialized Users: These users write specialized database applications that do not fit
into the traditional data-processing framework. Their applications often involve complex
data types, such as those for computer-aided design (CAD), geographic information
systems (GIS), or multimedia databases.23
The Database Administrator (DBA) is the person or team responsible for the overall
management of the database system. The DBA is the guardian of the data, ensuring its
integrity, security, and availability.24 The role is multifaceted and requires a blend of technical
expertise and strategic thinking.
The DBA role is not just a technical one; it is a strategic function that acts as a bridge between
the technology, the business, and the users. A successful DBA must understand the technical
intricacies of the DBMS, the data needs of the application developers, and the overarching
business requirements for data security, availability, and performance. They ensure that the
database, one of the most critical assets of any modern organization, is not only functioning
correctly but is also aligned with and actively supporting the organization's strategic goals.
For students seeking to build a strong theoretical foundation in the concepts covered in this
module, the following texts are highly recommended:
● For Comprehensive Theory: Database System Concepts by Abraham Silberschatz,
Henry F. Korth, and S. Sudarshan. Often referred to as the "Sailboat Book," this is a
cornerstone academic text that provides rigorous and in-depth coverage of fundamental
database concepts. It is ideal for those who want a deep theoretical understanding.30
● For a Balanced Approach: Fundamentals of Database Systems by Ramez Elmasri and
Shamkant B. Navathe. This is another classic textbook widely used in university courses. It
is known for its broad coverage, clear explanations of both theory and design, and its
inclusion of real-world examples, making it highly accessible for students.30
Before a single line of code is written or a table is created, a database must be designed. Data
modeling is the process of creating a conceptual representation of the data that an
organization needs to store and the relationships between different data elements. It is
arguably the most critical step in building a successful database application. A well-designed
model leads to a database that is efficient, easy to maintain, and scalable, while a poor model
can lead to a system plagued by performance issues and data anomalies. This part focuses on
the Entity-Relationship (E-R) model, the industry standard for conceptual data modeling, and
also introduces the alternative data models offered by NoSQL databases.
The Entity-Relationship (E-R) model is a high-level, conceptual data modeling approach that
provides a graphical way to view data, making it an effective tool for database design.34 It
allows designers to create a blueprint of the database based on real-world objects and their
associations, which can then be translated into a relational database schema. The E-R model
is built upon three fundamental concepts: entities, attributes, and relationships.
● Entity: An entity is a real-world object or concept that is distinguishable from other
objects and about which we want to store data. Entities are the "nouns" of our database
model.35 In a university database, examples of entities would be
Student, Course, and Instructor. An entity set is a collection of similar entities (e.g., all
the students in the university).34 In an E-R diagram, an entity is represented by a
rectangle.
○ Strong Entity: An entity that has a primary key attribute that uniquely identifies each
instance. Most entities are strong entities.35
○ Weak Entity: An entity that cannot be uniquely identified by its own attributes alone
and relies on a relationship with another (owner) entity for its identity. For example, a
Dependent entity might only be identifiable in the context of the Employee it belongs
to. A weak entity is represented by a double-lined rectangle.34
● Attribute: An attribute is a property or characteristic that describes an entity. Attributes
are the "adjectives" or "properties" of the entities.35 For the
Student entity, attributes might include StudentID, Name, and Major. In an E-R diagram,
attributes are represented by ovals connected to their entity. There are several types of
attributes:
○ Key Attribute: An attribute whose value is unique for each entity instance. This is
used as the primary key and is typically underlined in the diagram.35
○ Composite Attribute: An attribute that can be subdivided into smaller sub-parts. For
example, an Address attribute could be composed of Street, City, and ZipCode.35
○ Multivalued Attribute: An attribute that can hold multiple values for a single entity
instance. For example, a PhoneNumber attribute for a student could hold both a
home and a mobile number. This is represented by a double-lined oval.35
○ Derived Attribute: An attribute whose value can be calculated or derived from
another attribute. For example, a student's Age can be derived from their
DateOfBirth. This is represented by a dashed oval.39
● Relationship: A relationship represents an association between two or more entities.
Relationships are the "verbs" that connect the entities.35 For example, a
Student entity is associated with a Course entity through an "enrolls in" relationship. In an
E-R diagram, a relationship is represented by a diamond.
○ Cardinality Constraints: These constraints define the numerical nature of the
relationship, specifying how many instances of one entity can be related to instances
of another entity. The most common cardinalities are 34:
■ One-to-One (1:1): One instance of entity A can be associated with at most one
instance of entity B, and vice versa. (e.g., One Department has one Head).
■ One-to-Many (1:M): One instance of entity A can be associated with many
instances of entity B, but each instance of B is associated with only one instance
of A. (e.g., One Instructor teaches many Courses).
■ Many-to-Many (M:N): Many instances of entity A can be associated with many
instances of entity B. (e.g., Many Students enroll in many Courses).
The graphical representation of the E-R model is the Entity-Relationship Diagram (E-R
Diagram). Its primary purpose is to serve as a visual blueprint that helps database designers,
developers, and business stakeholders communicate and refine the database structure before
it is implemented.34 A clear E-R diagram provides a preview of how tables will connect and
what fields they will contain, allowing for the identification of potential design flaws early in
the development process.34
Let's construct a basic E-R diagram for the context provided in the syllabus, Vivekananda
Global University.
1. Identify Entities: The core objects in a university setting are Student, Course, Instructor,
and Department.41 These will be our primary entities, represented by rectangles.
2. Define Attributes: Next, we define the properties for each entity and identify their
primary keys (underlined).
○ Student: StudentID, Name, Address (composite: Street, City, State), DateOfBirth
○ Course: CourseID, Title, Credits
○ Instructor: InstructorID, Name, OfficeNumber
○ Department: DeptID, DeptName
3. Establish Relationships: We now define the associations between these entities using
diamonds.
○ A Student enrolls in a Course.
○ An Instructor teaches a Course.
○ A Course is offered by a Department.
○ An Instructor belongs to a Department.
4. Determine Cardinality: We analyze the numerical constraints of each relationship.
○ Student enrolls in Course: A student can enroll in multiple courses, and a course
can have multiple students enrolled. This is a Many-to-Many (M:N) relationship.
○ Instructor teaches Course: Let's assume for simplicity that one instructor can
teach multiple courses, but each course is taught by only one instructor. This is a
One-to-Many (1:M) relationship from Instructor to Course.
○ Department offers Course: A department can offer many courses, but each course
is offered by only one department. This is a One-to-Many (1:M) relationship.
○ Instructor belongs to Department: An instructor belongs to one department, and a
department has many instructors. This is also a One-to-Many (1:M) relationship.
5. Draw the Final Diagram: Combining these elements results in the following E-R
diagram:
This diagram serves as a clear, high-level blueprint. From this visual model, a database
designer can proceed to the next stage: translating these conceptual entities and
relationships into a logical schema of relational tables, primary keys, and foreign keys.
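A sketch of that translation is shown below, using the entities and attributes listed above. The constraint syntax is standard SQL (some systems require a separate FOREIGN KEY clause instead of the inline REFERENCES shorthand), and it illustrates the usual choices: each one-to-many relationship becomes a foreign key, the many-to-many "enrolls in" relationship becomes a junction table, the composite Address attribute is flattened into columns, a multivalued attribute such as PhoneNumber gets its own table, and the derived Age attribute is simply not stored.
SQL
CREATE TABLE Departments (
    DeptID   INT PRIMARY KEY,
    DeptName VARCHAR(100) NOT NULL
);

CREATE TABLE Instructors (
    InstructorID INT PRIMARY KEY,
    Name         VARCHAR(100) NOT NULL,
    OfficeNumber VARCHAR(20),
    DeptID       INT REFERENCES Departments(DeptID)        -- "belongs to" (1:M)
);

CREATE TABLE Courses (
    CourseID     INT PRIMARY KEY,
    Title        VARCHAR(100) NOT NULL,
    Credits      INT,
    DeptID       INT REFERENCES Departments(DeptID),       -- "offered by" (1:M)
    InstructorID INT REFERENCES Instructors(InstructorID)  -- "teaches" (1:M)
);

CREATE TABLE Students (
    StudentID   INT PRIMARY KEY,
    Name        VARCHAR(100) NOT NULL,
    Street      VARCHAR(100),  -- composite Address flattened into columns
    City        VARCHAR(50),
    State       VARCHAR(50),
    DateOfBirth DATE           -- Age is derived, so it is not stored
);

-- The M:N "enrolls in" relationship becomes a junction table.
CREATE TABLE Enrollments (
    StudentID INT REFERENCES Students(StudentID),
    CourseID  INT REFERENCES Courses(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);

-- A multivalued attribute such as PhoneNumber gets its own table.
CREATE TABLE StudentPhones (
    StudentID   INT REFERENCES Students(StudentID),
    PhoneNumber VARCHAR(20),
    PRIMARY KEY (StudentID, PhoneNumber)
);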
While the E-R model and the resulting relational database have been the dominant paradigm
for decades, the rise of the internet, big data, and applications requiring massive scalability
led to the development of a new class of databases known as NoSQL ("Not Only SQL"). These
databases are designed to handle use cases where the rigid schema and strict consistency of
relational databases can be a limitation.43 They excel at managing large volumes of
unstructured or semi-structured data and are typically designed to scale out horizontally (by
adding more servers) rather than scaling up (by using a more powerful single server).
There are several major categories of NoSQL data models, each suited to different types of
problems:
1. Document Databases: These databases store data in flexible, semi-structured
documents, most commonly in a format like JSON (JavaScript Object Notation).43 Each
document can have its own unique structure. This model is very intuitive for developers
as it maps directly to objects in application code.
○ Example Use Cases: Content management systems, blogging platforms,
e-commerce product catalogs, user profiles.
○ Prominent System: MongoDB.
2. Key-Value Stores: This is the simplest NoSQL data model. Data is stored as a collection
of key-value pairs, where each key is unique and is used to retrieve its corresponding
value.43 The value can be anything from a simple string to a complex object.
○ Example Use Cases: Caching web pages or query results, storing user session
information for web applications, real-time bidding systems.
○ Prominent Systems: Redis, Amazon DynamoDB.
3. Column-Family (or Wide-Column) Stores: These databases store data in tables with
rows and columns, but unlike relational databases, the names and format of the columns
can vary from row to row in the same table.43 They are optimized for queries over large
datasets and are highly scalable for write-heavy workloads.
○ Example Use Cases: Big data analytics, recommendation engines, event logging,
systems that require heavy write throughput.
○ Prominent Systems: Apache Cassandra, Apache HBase.
4. Graph Databases: These databases are purpose-built to store and navigate
relationships. Data is modeled as nodes (entities), edges (relationships), and properties
(attributes).43 They are designed to efficiently handle complex queries that explore the
connections between data points.
○ Example Use Cases: Social networks (e.g., finding friends of friends), fraud
detection (e.g., identifying complex rings of fraudulent activity), logistics and network
management, recommendation engines (e.g., "customers who bought this also
bought...").
○ Prominent Systems: Neo4j, Amazon Neptune.
The introduction of NoSQL in a database curriculum immediately following the E-R model is
significant. It highlights a fundamental shift in the world of data management. The choice of a
data model is no longer automatically "relational." Instead, it is a critical architectural decision
that must be driven by the specific requirements of the application. A modern data architect
needs to be proficient in multiple data models—a concept sometimes called "polyglot
persistence"—to select the right tool for the right job. For a system requiring complex
transactions and strong data integrity, like a banking application, the relational model remains
the superior choice. For a social media application that needs to manage a vast,
interconnected network of users and scale to millions of requests per second, a graph
database is a more natural and efficient fit.
Once a database is designed and populated with data, its primary purpose is to provide
information through queries. A query language is the interface used to communicate with the
database, allowing users to retrieve, insert, update, and delete data. For relational databases,
the universal standard is SQL (Structured Query Language). This part delves into the structure
of SQL queries, from basic data retrieval to complex operations that combine data from
multiple tables, and concludes with an introduction to the concept of query optimization.
SQL is a powerful, declarative language used to manage and query data in a relational
database. Its syntax is designed to be readable and expressive, resembling natural English.
The cornerstone of data retrieval in SQL is the SELECT statement. A basic query has a
well-defined structure composed of several clauses that are processed in a logical order.
● Core Clauses:
○ SELECT: Specifies the columns (attributes) you want to retrieve. Using an asterisk (*)
selects all columns.49
○ FROM: Specifies the table (relation) from which to retrieve the data.49
○ WHERE: Filters the rows based on a specified condition. Only rows that satisfy the
condition are included in the result.49
● Optional Clauses for Grouping and Sorting:
○ GROUP BY: Groups rows that have the same values in specified columns into
summary rows. It is often used with aggregate functions.49
○ HAVING: Filters the results of a GROUP BY clause. While WHERE filters rows before
aggregation, HAVING filters groups after aggregation.49
○ ORDER BY: Sorts the final result set in ascending (ASC) or descending (DESC) order
based on one or more columns.49
To find the names of all students majoring in 'Computer Science' and order them
alphabetically by last name, the query would be:
SQL
SELECT FirstName, LastName
FROM Students
WHERE Major = 'Computer Science'
ORDER BY LastName ASC;
Aggregate Functions
Aggregate functions perform a calculation on a set of values and return a single value. They
are frequently used with the GROUP BY clause to generate summary reports.20
● COUNT(): Returns the number of rows. COUNT(*) counts all rows, while
COUNT(column_name) counts non-NULL values in that column.
● SUM(): Returns the total sum of a numeric column.
● AVG(): Returns the average value of a numeric column.
● MAX(): Returns the largest value in a column.
● MIN(): Returns the smallest value in a column.
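For instance, assuming the Students table with the Major and GPA columns from the DDL examples earlier, the following query (a sketch) combines aggregates with GROUP BY and HAVING to report only the larger majors:
SQL
SELECT Major, COUNT(*) AS NumStudents, AVG(GPA) AS AvgGPA
FROM Students
GROUP BY Major
HAVING COUNT(*) > 10
ORDER BY NumStudents DESC;
Note that the size filter belongs in HAVING rather than WHERE, because it applies to each group after aggregation.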
In SQL, NULL is a special marker used to indicate that a data value does not exist in the
database. It is not the same as zero or an empty string. Because NULL represents an unknown
value, it cannot be compared using standard comparison operators like = or !=. Instead, the IS
NULL and IS NOT NULL operators must be used to test for NULL values.51
Example: To find all students who have not yet declared a major:
SQL
SELECT Name
FROM Students
WHERE Major IS NULL;
The true power of a relational database lies in its ability to store related data in separate
tables and then combine that data on the fly to answer complex questions. This is
accomplished using joins. A JOIN clause combines rows from two or more tables based on a
related column between them, typically a foreign key in one table that references a primary
key in another.53
Venn diagrams are an excellent way to visualize how different join types work, showing which records are included from two tables, Table A (left) and Table B (right).56
● INNER JOIN: Returns only the rows where the join condition finds a match in both tables (the overlap of A and B).
● LEFT (OUTER) JOIN: Returns all rows from Table A together with the matching rows from Table B; where no match exists, the columns from B are filled with NULL.
● RIGHT (OUTER) JOIN: The mirror image of a LEFT JOIN, returning all rows from Table B and the matching rows from Table A.
● FULL (OUTER) JOIN: Returns all rows from both tables, with NULLs wherever a row in one table has no match in the other.
● CROSS JOIN: Returns the Cartesian product of the two tables—every row from the first
table is paired with every row from the second table. This is rarely used in practice with a
WHERE clause, but can be useful for generating combinatorial data.59
● SELF JOIN: This is not a distinct join type but a technique where a table is joined with
itself. It is useful for querying hierarchical data or comparing rows within the same table.
For example, you could use a self-join on an Employees table to find all employees who
have the same manager.59
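A minimal sketch of that self-join, assuming a hypothetical Employees table with EmployeeID, Name, and ManagerID columns:
SQL
SELECT e1.Name AS Employee, e2.Name AS Colleague
FROM Employees e1
JOIN Employees e2
  ON  e1.ManagerID  = e2.ManagerID    -- same manager
  AND e1.EmployeeID < e2.EmployeeID;  -- avoid self-pairs and duplicate pairs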
Example: Using our university database, let's say we have a Students table and an
Enrollments table. To get a list of all students and the courses they are enrolled in, we would
use an INNER JOIN:
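A minimal sketch, assuming the Students table has StudentID and Name columns and the Enrollments table has StudentID and CourseID:
SQL
SELECT s.StudentID, s.Name, e.CourseID
FROM Students s
INNER JOIN Enrollments e ON e.StudentID = s.StudentID;
Only students with at least one matching Enrollments row appear in this result.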
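If every student should appear, including those with no enrollments at all, a LEFT JOIN keeps the unmatched rows from the Students table (same assumed columns):
SQL
SELECT s.StudentID, s.Name, e.CourseID
FROM Students s
LEFT JOIN Enrollments e ON e.StudentID = s.StudentID;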
For students who are not enrolled, the CourseID column in the result would be NULL.
Beyond basic SELECT statements and joins, SQL offers capabilities for constructing highly
complex queries.
● Nested Queries and Subqueries: A subquery is a SELECT statement nested inside
another SQL statement (e.g., inside a WHERE or FROM clause).52 They allow for
sophisticated, multi-step filtering. For example, to find all students enrolled in the
'Advanced Databases' course, one could first use a subquery to find the CourseID for that course title, as sketched after this list.
● Integrity Constraints: While not query commands, integrity constraints like PRIMARY
KEY, FOREIGN KEY, and NOT NULL are defined using DDL and are crucial for ensuring the
logical consistency of the data that queries operate on.20 A
FOREIGN KEY constraint, for instance, ensures that a value in one table refers to a valid,
existing primary key in another table, making joins meaningful and reliable.
● Query Optimization: As mentioned earlier, the DBMS does not execute a query exactly
as it is written. The query optimizer, a critical component of the query processor,
analyzes the query and determines the most efficient execution plan.6 It considers
factors like available indexes, table sizes, and data distribution statistics to decide on the
best join order, access methods, and algorithms. This process is what allows a high-level,
declarative language like SQL to perform efficiently on complex databases.
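The subquery mentioned above, and the kind of FOREIGN KEY constraint that keeps such lookups reliable, might look like the following sketch (table and constraint names follow the running university example and are illustrative):
SQL
-- Subquery: find the students enrolled in 'Advanced Databases'.
SELECT s.Name
FROM Students s
JOIN Enrollments e ON e.StudentID = s.StudentID
WHERE e.CourseID IN (
    SELECT c.CourseID
    FROM Courses c
    WHERE c.Title = 'Advanced Databases'
);

-- Integrity constraint: every enrollment must reference an existing course.
ALTER TABLE Enrollments
    ADD CONSTRAINT fk_enrollments_course
    FOREIGN KEY (CourseID) REFERENCES Courses (CourseID);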
It is here that the connection between data modeling and data retrieval becomes clear. The
abstract relationships defined in the E-R model during the design phase are made concrete
and actionable through the JOIN clauses in SQL queries. A line connecting Student and
Course in an E-R diagram becomes an ON Students.StudentID = Enrollments.StudentID clause
in a query. A well-structured E-R model, which accurately captures the real-world
relationships, translates directly into a database schema where logical and efficient joins are
possible. Conversely, a poorly designed model leads to a schema that requires convoluted,
inefficient, or sometimes impossible queries to get the needed information, reinforcing the
paramount importance of thoughtful data modeling.
To master the art of writing effective SQL queries, from the basics to advanced techniques,
the following books are highly recommended:
● For a Quick Start: SQL in 10 Minutes, Sams Teach Yourself by Ben Forta. This book is
renowned for its concise, lesson-based approach that helps beginners quickly grasp the
essential syntax and commands of SQL.30
● For Deeper Understanding: SQL Queries for Mere Mortals: A Guide to Data
Manipulation in SQL by John L. Viescas. This classic text goes beyond simple syntax to
teach the logic and thought process behind constructing powerful and accurate queries.
It is an excellent resource for moving from a basic user to a proficient query writer.61
A robust and efficient database is not an accident; it is the result of a deliberate and
principled design process. The central goal of this process is to create a structure that
minimizes data redundancy and protects data integrity. The primary technique used to
achieve this is normalization, a systematic process for organizing the columns and tables in a
relational database to reduce data duplication and eliminate undesirable characteristics. This
part explores the problems that arise from poor design, explains the theory of functional
dependencies that underpins normalization, and provides a step-by-step guide through the
most common normal forms.
When a database schema is not properly designed, it often contains redundant data—the
same piece of information is stored in multiple places. This redundancy is not just inefficient; it
is dangerous because it leads to data anomalies, which are inconsistencies or errors that
occur when a user attempts to insert, update, or delete data.63 Normalization is the formal
process of decomposing tables to eliminate these anomalies.66
These anomalies are all symptoms of a single underlying problem: a poorly structured schema
where a single table is used to store facts about multiple different entities (students, courses,
instructors). The solution is normalization.
We will now walk through the process of normalizing a single, unnormalized table for student course registrations to achieve a well-structured design. The starting table holds a StudentID and StudentName for each student, plus a Courses column that lists every course the student has registered for, along with each course's CourseName and Instructor.
This table violates the basic principles of a relational model because the Courses column contains a repeating group of multiple values. Bringing it into First Normal Form (1NF) means splitting that repeating group so that each student-course combination occupies its own row and every column holds a single, atomic value.
Anomalies still exist: We have solved the atomicity problem, but now we have significant
redundancy. StudentName is repeated for John Doe, and CourseName and Instructor are
repeated for CS101. This table still suffers from update, insertion, and deletion anomalies.
To achieve Second Normal Form (2NF), which requires that every non-key attribute depend on the whole primary key rather than on just part of it, we decompose the table into smaller tables, separating the partially dependent attributes.
● Students Table:
StudentID | StudentName
● Courses Table:
CourseID | CourseName | Instructor
● Enrollments Table:
StudentID | CourseID
101 | CS101
101 | MATH203
102 | CS101
Anomalies are reduced: We can now add a new student without them being enrolled in a course, and add a new course before any students enroll. However, the Courses table still has a problem: it stores facts about instructors alongside facts about courses, so attributes that describe only the instructor depend on the key only indirectly (a transitive dependency). Reaching Third Normal Form (3NF) means moving instructor details into their own table.
● Instructors Table:
(The Students and Enrollments tables remain the same). Now, each table contains facts about
only one entity, and all non-key attributes depend only on the primary key.
The process of normalization involves breaking down (decomposing) a single large table into
multiple smaller tables. For this process to be correct, the decomposition must have two
crucial properties.
Properties of Decomposition
1. Lossless Join Decomposition: The decomposition must be "lossless," meaning that if
we join the decomposed tables back together, we must be able to perfectly reconstruct
the original table without creating any extra (spurious) rows or losing any original rows.76
A decomposition of a relation R into R1 and R2 is lossless if the intersection of their attributes is a key for at least one of them.76 A worked illustration follows this list.
2. Dependency Preserving Decomposition: The decomposition must preserve all the
original functional dependencies. This means that every FD from the original table must
be logically implied by the FDs in the individual decomposed tables.76 This is important
because it allows the database to enforce the original business rules by checking
constraints on the smaller, more efficient tables.
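To see the lossless-join property from point 1 in practice, consider the decomposition produced in the normalization walkthrough above. Because StudentID is the key of the Students table and CourseID is the key of the Courses table, joining the pieces back together reproduces the original wide table exactly, with no spurious rows (a sketch, assuming those table definitions):
SQL
SELECT s.StudentID, s.StudentName, c.CourseID, c.CourseName, c.Instructor
FROM Enrollments e
JOIN Students s ON s.StudentID = e.StudentID   -- {StudentID} is a key of Students
JOIN Courses  c ON c.CourseID  = e.CourseID;   -- {CourseID} is a key of Courses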
While it is always possible to achieve a lossless join decomposition that is in BCNF, it is not
always possible for that decomposition to also be dependency-preserving. In such cases,
designers might choose to stick with a 3NF design that preserves dependencies over a BCNF
design that does not.
Beyond BCNF, there are higher normal forms that address more complex data dependencies.
● Fourth Normal Form (4NF): Deals with multi-valued dependencies. A table is in 4NF if
it is in BCNF and has no multi-valued dependencies. This arises when a table has multiple
independent one-to-many relationships, which should be separated into their own
tables.81
● Fifth Normal Form (5NF): Deals with join dependencies. It is designed to reduce
redundancy in databases that contain multi-valued facts by isolating them semantically.
A table is in 5NF if it cannot be decomposed into any number of smaller tables without
loss of information.81
In practice, achieving 3NF or BCNF is sufficient for the vast majority of database designs.
It is critical to understand that normalization is not a goal in and of itself, but a tool to achieve
a better design. In the real world, there is a fundamental trade-off between update efficiency
and query performance. A highly normalized database (e.g., in 3NF/BCNF) minimizes
redundancy, which makes updates, insertions, and deletions very efficient and safe from
anomalies. However, it also means that retrieving data often requires joining many small tables
together, which can be computationally expensive.
For this reason, while Online Transaction Processing (OLTP) systems (like e-commerce
checkout systems or banking applications) are almost always highly normalized to ensure
data integrity, Online Analytical Processing (OLAP) systems (like data warehouses used for
business intelligence) are often intentionally denormalized. They use designs like star
schemas, which introduce controlled redundancy to reduce the number of joins required for
complex analytical queries, thereby dramatically improving query performance. The "best"
level of normalization is not always the highest possible level, but rather the level that best
serves the specific workload of the application.
To fully grasp the theory and practice of normalization and relational database design, this
book is an essential resource:
● For Theory and Practice: Database Systems: The Complete Book by Hector
Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. This text offers a rigorous and
comprehensive treatment of relational design theory, including detailed explanations of
functional dependencies, the process of normalization through all major forms, and the
algorithms for ensuring lossless and dependency-preserving decompositions.46
In a multi-user database system, many users and applications may be trying to read and write
data at the same time. The part of the DBMS that ensures these concurrent operations do not
corrupt the data and that the database remains in a consistent state, even in the face of
system failures, is the transaction manager. This final part explores the concept of a
transaction, the ACID properties that guarantee its reliability, the problems that arise from
concurrent access, and the mechanisms used to control concurrency and recover from
failures.
To ensure that transactions are processed reliably, a DBMS must guarantee four key
properties, known by the acronym ACID.85 The classic example used to illustrate these
properties is a bank transfer of $100 from Account A to Account B.
1. Atomicity: This property ensures that a transaction is an "atomic" or indivisible unit.
Either all of its operations are executed, or none are.
○ Bank Transfer Example: The transaction consists of two operations: debiting $100
from Account A and crediting $100 to Account B. Atomicity guarantees that if the
system crashes after the debit but before the credit, the entire transaction will be
rolled back, and the $100 will be returned to Account A. The database is never left in
a state where the money has vanished.85
2. Consistency: This property ensures that a transaction brings the database from one
valid state to another, preserving all predefined rules and constraints.84
○ Bank Transfer Example: A business rule might be that the total sum of money
across all accounts must remain constant. The transfer operation moves money but
does not change the total. The consistency property ensures that the database is in
a consistent state both before the transaction begins and after it commits.87
3. Isolation: This property ensures that concurrently executing transactions cannot
interfere with each other. From the perspective of any single transaction, it should
appear as if it is the only transaction running on the system.84
○ Bank Transfer Example: If another transaction is calculating the total assets of the
bank while our transfer is in progress, isolation ensures it will not read the accounts
in an intermediate state (e.g., after the debit from A but before the credit to B). It will
either see the state before the transfer or the state after it, but never an inconsistent
state in between.85
4. Durability: This property guarantees that once a transaction has been successfully
completed (committed), its changes are permanent and will survive any subsequent
system failure, such as a power outage or server crash.84
○ Bank Transfer Example: Once the transfer is complete and the user receives a
confirmation, durability ensures that the new account balances are permanently
recorded. Even if the bank's servers crash moments later, the record of the
transaction will not be lost.85
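A sketch of the transfer as a SQL transaction, assuming a hypothetical Accounts table with AccountID and Balance columns (transaction syntax varies slightly between systems):
SQL
BEGIN;  -- start the atomic unit of work

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 'A';  -- debit
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 'B';  -- credit

COMMIT;  -- both changes become permanent together (Atomicity, Durability)
-- On any error before COMMIT, issue ROLLBACK instead to undo the debit.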
Transaction States
During its execution, a transaction passes through several states, which describe its lifecycle
from start to finish.
● Visual Representation: Transaction State Diagram
A state diagram illustrates the lifecycle of a transaction, showing how it moves from one
state to another.
● The States:
1. Active: The initial state where the transaction is executing its operations (e.g., READ,
WRITE).89
2. Partially Committed: After the final statement of the transaction has been
executed. At this point, the changes are still in a temporary buffer in main memory
and have not yet been permanently written to the database.89
3. Committed: After the changes have been successfully and permanently stored in the
database. The transaction has completed successfully.89
4. Failed: The state entered if the transaction cannot proceed normally due to an error
(e.g., hardware failure, violation of a constraint).89
5. Aborted: The state after the transaction has failed and all its changes have been
rolled back, restoring the database to its state prior to the transaction's start.89
6. Terminated: The final state of the transaction, reached after it has been either
committed or aborted.92
Section 5.2: Managing Concurrent Access
To prevent concurrency problems such as lost updates and dirty reads, and to enforce the ACID properties, particularly Isolation, DBMSs use concurrency control techniques. To enforce Atomicity and Durability, they use recovery techniques.
● Locking Mechanisms: This is the most common approach. Before a transaction can
access a data item, it must acquire a lock on it. Locks can be shared (read-only) or
exclusive (read-write). If a transaction holds an exclusive lock on an item, no other
transaction can access it until the lock is released. This prevents dirty reads and lost
updates (see the sketch after this list).
● Timestamp Ordering: Each transaction is assigned a unique timestamp when it starts.
The DBMS then ensures that any conflicting operations are executed in timestamp order,
which prevents inconsistencies without the overhead of locking.
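The locking approach can be made explicit in SQL with SELECT ... FOR UPDATE inside a transaction. The sketch below reuses the hypothetical Accounts table; exact locking syntax and default behavior vary by DBMS.
SQL
BEGIN;

-- Acquire an exclusive lock on account A's row before reading and changing it.
SELECT Balance
FROM Accounts
WHERE AccountID = 'A'
FOR UPDATE;

-- Any other transaction that tries to update or lock this row now waits.
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 'A';

COMMIT;  -- the lock is released when the transaction ends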
Deadlock
A serious problem that can arise with locking is deadlock. This occurs when two or more
transactions are in a circular waiting pattern, each waiting for a resource that is locked by
another transaction in the cycle.98
● Scenario: Transaction T1 locks Record A and requests a lock on Record B.
Simultaneously, Transaction T2 locks Record B and requests a lock on Record A. T1
cannot proceed until T2 releases B, and T2 cannot proceed until T1 releases A. They are
stuck in a permanent standoff (see the interleaved sketch after this list).
● Visual Representation: Wait-For Graph
A deadlock can be visualized using a "wait-for" graph, where nodes represent
transactions and a directed edge from T1 to T2 means T1 is waiting for a resource held by
T2. A cycle in this graph indicates a deadlock.98
● Deadlock Handling: DBMSs handle deadlocks through:
○ Prevention: Designing protocols that ensure a deadlock can never occur (e.g.,
requiring transactions to acquire all locks at once).
○ Avoidance: Checking resource requests in real-time to see if granting a lock could
lead to a deadlock.
○ Detection and Recovery: Periodically checking for cycles in the wait-for graph and,
if one is found, breaking it by aborting one of the transactions (the "victim") and
rolling back its changes.101
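The circular wait from the scenario above can be reproduced by interleaving two sessions. The sketch below shows that interleaving as one annotated script, with session 2's statements written as comments and a hypothetical Records table standing in for the locked data items:
SQL
-- Step 1, session 1: T1 locks Record A.
BEGIN;
UPDATE Records SET Val = 1 WHERE ID = 'A';

-- Step 2, session 2: T2 locks Record B.
--   BEGIN;
--   UPDATE Records SET Val = 2 WHERE ID = 'B';

-- Step 3, session 1: T1 requests Record B and blocks, waiting for T2.
UPDATE Records SET Val = 1 WHERE ID = 'B';

-- Step 4, session 2: T2 requests Record A and blocks, waiting for T1.
--   UPDATE Records SET Val = 2 WHERE ID = 'A';
-- The wait-for graph now contains the cycle T1 -> T2 -> T1; most systems
-- detect this and abort one transaction as the victim.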
Recovery Techniques
Recovery mechanisms are essential for ensuring the Atomicity and Durability properties of
transactions in the face of failures.
● Log-Based Recovery: The system maintains a log on stable storage (like a hard disk)
that records all database modifications.103 Each log record contains information such as
the transaction ID, the data item modified, its old value, and its new value. The
Write-Ahead Logging (WAL) protocol ensures that the log record for an operation is
written to stable storage before the actual data is modified on disk.103 After a crash, the
recovery manager uses the log to:
○ Undo the operations of transactions that had not committed.
○ Redo the operations of transactions that had committed, to ensure their changes are
on disk.
● Shadow Paging: This is an alternative recovery technique that avoids the need for a log.
It works by maintaining two page tables: a current page table and a shadow page table.
When a transaction starts, both point to the same database pages on disk. When a write
operation occurs, the modified page is written to a new location on disk, and the current
page table is updated to point to this new page, while the shadow page table remains
unchanged. If the transaction commits, the shadow page table is updated to match the
current one. If it aborts or the system crashes, the current page table is simply discarded,
and the shadow page table, which still points to the original, unmodified data, is used to
restore the database state.106
The principles of transaction management are deeply intertwined with the challenges of
distributed systems. The strong consistency guarantees of ACID, which are relatively
straightforward to implement on a single machine, become a significant performance
bottleneck in a distributed environment. This tension is famously described by the CAP
Theorem, which states that a distributed data store can only provide two of the following
three guarantees: Consistency, Availability, and Partition Tolerance (the ability to function
despite network failures). Because partition tolerance is a necessity in any real-world network,
designers are often forced to choose between strong consistency (like that provided by ACID)
and high availability. This trade-off is a primary reason why many modern NoSQL distributed
databases have chosen to relax their consistency guarantees (opting for a model known as
"eventual consistency") to achieve higher availability and better performance at a massive
scale.
For those interested in the advanced topics of transaction processing, concurrency, and
distributed systems, these books provide deep insights:
● For Advanced Theory: Readings in Database Systems (often called the "Red Book"),
edited by Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker. This is a curated
collection of the most influential research papers in the database field, offering a direct
look at the foundational ideas behind transaction management, concurrency control, and
distributed databases.45
● For Distributed Systems Internals: Database Internals: A Deep Dive into How
Distributed Data Systems Work by Alex Petrov. This book provides an excellent, modern
exploration of the internal workings of databases, with a strong focus on the challenges
and solutions related to building distributed data systems, including storage engines,
distributed algorithms, and concurrency control.45
Conclusion
This comprehensive guide has journeyed through the five core modules of a foundational
database management systems course. The journey began with the fundamental
"why"—understanding that a DBMS is not merely a data container but a sophisticated system
engineered to solve the critical problems of data redundancy, inconsistency, and insecure
access that plagued older file systems. It established the core architectural components and
the powerful concept of abstraction that provides data independence, a key economic benefit
in software engineering.
The guide then moved to the "what" and "how" of database design and interaction. The art of
data modeling was explored through the Entity-Relationship model, providing a blueprint for
structuring data logically before implementation. The power of SQL was detailed, showing
how its declarative nature allows users to retrieve complex information efficiently, with a
particular focus on the JOIN operation as the practical realization of conceptual data
relationships. The principles of effective design were covered through the process of
normalization, a crucial technique for eliminating data anomalies and ensuring the integrity of
the database schema.
Finally, the guide addressed the critical issues of reliability and performance through the
study of transaction management and concurrency control. The ACID properties were
presented as the bedrock guarantee of reliable data processing, while mechanisms for
managing concurrent access and recovering from system failures were explained as the
means to uphold these guarantees. The introduction to distributed databases highlighted the
modern challenges of scaling and availability, revealing the fundamental trade-offs that
engineers must navigate between consistency and performance in today's data-intensive
world.
Ultimately, the study of database systems is the study of organized information. It is a field
that blends rigorous theory with practical engineering trade-offs. A successful database
professional must not only understand the syntax of SQL or the rules of normalization but
must also grasp the underlying principles that govern the design of reliable, scalable, and
maintainable systems. The true goal is not just to store data, but to transform it into a
consistent, secure, and accessible asset that can power applications and drive informed
decisions.
Works cited