DBMS Unit-1 Notes
DBMS Unit-1 Notes
DBMS: A database management system (DBMS) is a software system for creating and
managing databases. A DBMS enables end users to create, protect, read, update, and delete
data in a database. It also manages security, data integrity, and concurrency for databases.
Some popular databases are MySQL, Oracle, MongoDB, etc. DBMS provides many operations,
e.g., creating a database, storing the database, updating an existing database, and deleting it
from the database. DBMS is a system that enables you to store, modify, and retrieve data in
an organized way. It also provides security to the database.
Data is a collection of raw, unorganized facts and details like text, observations, figures,
symbols, audio, videos, descriptions of things, etc. In other words, data does not carry any
specific purpose and has no significance. Moreover, data is measured in bits and bytes – basic
information units in computer storage and processing.
Information is processed, organized, and structured data. It provides context for data and
enables decision-making. For example, a single customer’s sale at a restaurant is data – this
becomes information when the business can identify the most or least popular dish.
Advantages of DBMS
• Data independence: the capacity to change the schema (structure) of the database
without affecting the application programs or user views that access the data. It is a
fundamental concept that simplifies database maintenance and enhances flexibility.
• Efficient data access: providing indexing and query optimization techniques that speed
up data retrieval. This reduces the time required to process large volumes of data and
increases the system's overall performance.
• Data integrity and security: Data integrity is defined as the database's correct and
consistent data. For this purpose, the data stored in the database must satisfy certain
procedures (rules). The data in a database must be correct and consistent. DBMS
provides different ways to implement such types of constraints (rules). It can be
implemented by rules, i.e., primary, secondary, and foreign keys. This improves data
integrity in a database. Data security refers to the prevention of data from
unauthorized users. Only authorized users are allowed access to the data. In the
database, the DBA or head of the department can access all the data. Some users are
only allowed to retrieve data, whereas others are allowed to retrieve and modify the
data.
• Data administration: a Database Administrator (DBA) is an individual or person
responsible for controlling, maintaining, coordinating, and operating a database
management system. Managing, securing, and caring for the database systems is a
prime responsibility. They are responsible and in charge of authorizing access to the
database, coordinating, capacity, planning, installation, and monitoring uses. Their
roles vary from configuration, database design, migration, security, troubleshooting,
backup, and data recovery. Database administration is a major and key function in any
firm or organization that relies on one or more databases.
• Concurrent access and crash recovery: a DBMS schedules concurrent accesses to the
data so that users can think of the data as being accessed by only one user at a time.
Further, the DBMS protects users from the effects of system failures.
• Reduced application development time: The DBMS supports many important
functions common to many applications that access data stored in the DBMS. This, in
conjunction with the high-level interface to the data, facilitates the quick development
of applications. Such applications are also likely to be more robust than applications
developed from scratch because many important tasks are handled by the DBMS
instead of being implemented by the application.
Storage Data
The records in databases are stored in file formats. Physically, the data is stored in
electromagnetic format on a device. The electromagnetic devices used in database systems
for data storage. The Redundant Array of Independent Disks (RAID) as two or more secondary
storage devices are connected to operate as one storage medium. A RAID array consists of
several disks linked together for various purposes, such as Stripping, Mirroring, error
correction, etc.
A data model is a collection of high-level data constructs that hide many low-level storage
details. A DBMS allows a user to define the data to be stored in terms of a data model. Most
database management systems today are based on the relational data model. While the data
model of the DBMS hides many details, it is nonetheless closer to how the DBMS stores data
than to how a user thinks about the underlying enterprise. A semantic data model is a more
abstract, high-level data model that makes it easier for a user to develop a good initial
description of the data in an enterprise. These models contain various constructs that help
describe a real application scenario. A DBMS is not intended to support all these constructs
directly; it is typically built around a data model with just a few basic constructs, such as the
relational model. A database design in terms of a semantic model serves as a useful starting
point. It is subsequently translated into a database design regarding the data model the DBMS
supports.
Queries
In Database Management Systems (DBMS), a query is a fundamental tool for retrieving,
manipulating, and managing data stored within a database. Queries are the bridge between
users and databases, enabling them to communicate with the system to extract specific
information or perform various operations on the data. Understanding the nuances of queries
and their associated terminologies is crucial for anyone working with databases, whether as
a developer, analyst, or database administrator.
A query in a DBMS is a request made by a user or application to retrieve or manipulate data
stored in a database. This request is typically formulated using a structured query language
(SQL) or a query interface provided by the DBMS. The primary purpose of a query is to specify
precisely what data is needed and how it should be retrieved or modified. There are some
terminologies related to queries.
• Database: A structured data collection organized for efficient retrieval and storage. It
serves as the repository for data accessed and manipulated through queries.
• SQL (Structured Query Language): A standardized programming language used to
interact with relational databases. SQL provides a set of commands for querying,
updating, and managing databases.
• Table: A fundamental relational database component representing a related data
collection organized into rows and columns. Each table in a database typically
corresponds to a specific entity or concept.
• Field/Column: A single piece of data stored within a table, representing a specific
attribute or characteristic of the entities described by the table.
• Record/Row: A complete data set representing an individual instance or entity stored
within a table. Each row contains values for each field/column defined in the table
schema.
• Primary Key: A unique identifier for each record in a table, ensuring that each row can
be uniquely identified and accessed. Primary keys are used to establish relationships
between tables and enforce data integrity.
• Query Language: The language used to communicate with a database management
system. This language allows users to perform data retrieval, manipulation, and
schema definition operations.
DBMS Structure
The top part of the architecture shows application interfaces used by naive users, application
programs created by application programmers, query tools used by sophisticated users, and
administration tools used by database administrators.
The lowest part of the architecture is for disk storage.
The Middle two parts (Query processor and storage manager) are important components of
database architecture.
Query processor: The interactive query processor helps the database system simplify and
facilitate data access. It consists of a DDL (Data Definition Language) interpreter, a DML (Data
Manipulation Language) compiler, and a query evaluation engine. The following are various
functionalities and components of query processor.
• DDL interpreter: This translator interprets the DDL statements in data dictionaries.
• DML compiler: It translates DML statements query language into an evaluation plan.
This plan consists of the instructions the query evaluation engine understands.
• Query evaluation engine: It executes the low-level instructions the DML compiler
generates.
When a user issues a query, the parsed query is presented to a query optimizer, which uses
information about how the data is stored to produce an efficient execution plan for evaluating
the query. An execution plan is a blueprint for evaluating a query. It is evaluated by a query
evaluation engine.
Storage manager: The storage manager is the component of the database system that
provides an interface between the low-level data stored in the database and the application
programs and queries submitted to the system. The storage manager is responsible for
storing, retrieving, and updating data in the database. The storage manager components
include
• Authorization and integrity manager: Validates the users who want to access the data
and tests for integrity constraints.
• Transaction manager: Ensures the database remains consistent despite system
failures and that concurrent transaction execution proceeds without conflict.
• File manager: Manages space allocation on disk storage and representation of the
information on disk.
• Buffer manager: Manages data fetching from disk storage into main memory. The
buffer manager also decides what data to cache in the main memory. The buffer
manager is a crucial part of the database system.
Storage manager implements several data structures, such as
Types of Databases
Designing different types of databases lies at the core of the functionality that they provide to
the users. Since data is a dynamic entity, the way it is stored varies a lot. It is also why
companies design databases that comply with their needs. Several types of databases are
briefly explained below.
Hierarchical database: A Hierarchical Database is a data storage system where the data is
organized into records. Each record comprises fields that contain values. The fields may be
arranged in a hierarchy, meaning that some fields are subordinate to others. This allows for
efficient storage and retrieval of data.
This property of having multiple links applies in two ways: the schema and the database itself
can be seen as a generalized graph of record types that are connected by relationship types.
The main advantage of a network database is that it allows for a more natural modeling of
relationships between records or entities, as opposed to the hierarchical model. However, the
relational database model has started to win over both the network and the hierarchical
models because its added flexibility and productivity have become more evident as hardware
technology has become faster.
Relational database: Considered the most mature of all databases, these databases lead in
the production line along with their management systems. In this database, every piece of
information has a relationship with every other piece of information. This is because every
data value in the database has a unique identity as a record.
Note that all data is tabulated in this model. Therefore, every row of data in the database is
linked with another row using a primary key. Similarly, every table is linked with another table
using a foreign key.
Refer to the diagram below and notice how the ‘Keys’ concept links two tables.
Due to this introduction of tables to organize data, it has become exceedingly popular.
Consequently, they are widely integrated into Web-Ap interfaces to serve as an ideal
repository for user data. The ease of mastering it makes it further interesting since the
language used to interact with the database is simple (SQL in this case) and easy to
comprehend.
It is also worth being aware that scaling and traversing through data in relational databases is
quite a lightweight task compared to hierarchical databases.
Object-oriented database: Those familiar with the Object-Oriented Programming Paradigm
would be able to relate to this model of databases easily. Information stored in a database can
be represented as an object that responds as an instance of the database model. Therefore,
the object can be referenced and called without any difficulty. As a result, the database
workload is substantially reduced.
In the chart above, we have different objects linked to one another using methods; one can
get the Person's address (represented by the Person Object) using the livesAt() method.
Furthermore, these objects have attributes, which are, in fact, the data elements that need to
be defined in the database.
An example of such a model is the Berkeley DB software library, which uses the same
conceptual background to deliver quick and highly efficient responses to database queries
from the embedded database.
Key-value database: A key-value database is a type of non-relational database, also known as
a NoSQL database, that uses a simple key-value method to store data. It stores data as a
collection of key-value pairs in which a key serves as a unique identifier. Both keys and values
can be anything, ranging from simple to complex compound objects. Key-value databases (or
key-value stores) are highly partitionable and allow horizontal scaling at a level that other
types of databases cannot achieve.
Traditional relational databases (SQL databases) store data in tables containing rows and
columns. They enforce a rigid structure on data and are not optimal for every use case. On the
other hand, key-value databases are NoSQL databases. They allow flexible database schemas
and improved performance at scale for certain use cases.
Key-value databases work by organizing all data as a set of key-value pairs. In the example, the
primary key is a composite of two keys: Product ID and Type. The Product ID is the partition
key which describes the partition in which the item will be stored. The Type is the sort key,
which determines the order in which items will be stored on disk. Combining the Partition Key
and the Sort Key forms a unique primary key, which maps to a single value in the database.
In this example, the data object book has attributes like title, author, and publishing date.
Every book data object has a key called BookID. You can directly link the BookID and associated
book object in the key-value store. In addition, you can retrieve data by looking up the BookID
in the table. Also, each item has its own schema, making key-value stores highly flexible for
storing data of varying structures.
XML DB: An XML database is a database that can be used to store large amounts of data or
information in XML format. They can handle data of any size or format. XML is a markup
language that uses “tags” or specially formatted text labels to identify the function of varied
data elements within a document. There are two major types of XML databases:
• XML-enabled – An XML-enabled database is the extension provided for converting an
XML document. This database is a relational database in which data is stored in tables
consisting of rows and columns.
• Native XML (NXD) – A Native XML database is based on the container rather than table
format. This type of database can store large amounts of XML documents and data.
Native XML databases have an advantage over the XML-enabled database, as it is
easier to store, query, and maintain the XML document in a native database than in an
XML-enabled database.
XML databases are an easy choice when the data being stored is XML. However, any data can
be stored and can be the preferred option when dealing with complex data.
The preceding schema says that each record in the Student's relation has five fields, with field
names and types as indicated. An example of the student's relationship appears in the figure
below.
Each row in the Students relation is a record that describes a student. We can make the
description of a collection of students more precise by specifying integrity constraints, which
are conditions that the records in a relation must satisfy. For example, we could specify that
every student has a unique sid value. We cannot capture this information by simply adding
another field to the Students schema. Thus, the ability to specify the uniqueness of the values
in a field increases the accuracy with which we can describe our data. An important aspect of
a data model is the expressiveness of the constructs available for specifying integrity
constraints.
Data Abstraction
Data abstraction hides the complex data structure to simplify the system's user interface. It is
done because many users interacting with the database system are not computer-trained to
understand its complex data structures. To achieve data abstraction, we will discuss a Three-
Schema architecture that abstracts the database at three levels discussed below.
This architecture's main objective is effectively separating the user interface and physical
database. So, the user never has to be concerned regarding the internal storage of the
database, and it has a simplified interaction with the database system.
The three-schema architecture defines the view of data at three levels:
• Physical level (internal level): The physical or internal level schema describes how the
data is stored in the hardware. It also describes how the data can be accessed. The
physical level shows the data abstraction at the lowest level and has complex data
structures. Only the database administrator operates at this level.
• Logical level (conceptual level): It is above the physical level. Here, the data is stored
in the entity set, entities, their data types, the relationship among the entity sets, user
operations performed to retrieve or modify the data, and certain constraints on the
data. Well, adding constraints to the view of data adds security. Users are restricted to
accessing some particular parts of the database.
• View level (external level): It is the highest level of data abstraction and exhibits only
a part of the whole database. It exhibits the data in which the user is interested. The
view level can describe many views of the same data. Here, the user retrieves the
information using different applications from the database.
The figure below describes the three-schema architecture of the database:
In the figure above, you can clearly distinguish between the three levels of abstraction. To
understand it more clearly, let us take an example: We have to create a college database. Now,
what entity sets would be involved? Student, Lecturer, Department, Course and so on…
Now, the entity sets Student, Lecturer, Department, and Course, which will be stored in the
storage as the consecutive blocks of the memory location. This is on the physical or internal
level and is hidden from the programmers, but the database administrator is aware of it.
At the logical level, the programmers define the entity sets, and the relationship among these
entity sets using a programming language like SQL. So, the programmers work at the logical
level, and even the database administrator operates at this level.
At the view level, the users have a set of applications that they use to retrieve the data they
are interested in.
• Organizes Data: The Data model structures data logically and organized, making it
easier to understand and manage.
• Improves Data Quality: The data model helps identify and rectify inconsistencies and
errors in data, leading to better data quality.
• Ensures Data Integrity: The data model enforces constraints and relationships,
ensuring data integrity and preventing data anomalies.
• Supports Decision Making: Well-designed data models provide valuable insights and
support informed decision-making processes.
• Facilitates Database Design: The data model is a crucial step in database design,
helping create efficient and optimized database structures.
• Reduces Redundancy: The data model minimizes data redundancy by eliminating
unnecessary duplication of information.
• Simplifies Data Retrieval: A well-designed data model enables efficient and quick data
retrieval, improving system performance.
• Enhances Application Development: Data models serve as a blueprint for application
development, making integrating data into software solutions easier.
• Enables Scalability: A robust data model supports future growth and scalability,
accommodating additional data without major disruptions.
• Promotes Standardization: The data model promotes standardization and consistency
in data representation across the organization.
• Aids Data Governance: The Data model facilitates data governance initiatives,
ensuring compliance with regulations and data management policies.
• Supports Data Analysis: Data models provide a structured data analysis and reporting
framework, enabling meaningful insights.
• Encourages Collaboration: The data model encourages collaboration among business
analysts, developers, and stakeholders in the data model process.
• Minimizes Development Errors: By defining data requirements upfront, the data
model reduces errors during the development phase.
• Long-term Investment: A well-maintained data model is a long-term investment that
provides value throughout the lifecycle of the data and applications.
• ER diagrams represent the E-R model in a database, making them easy to convert into
relations (tables).
• ER diagrams provide the purpose of real-world modeling of objects, which makes them
useful.
• ER diagrams require no technical knowledge and no hardware support.
• These diagrams are easy to understand and create, even for a naive user.
• It gives a standard solution for visualizing the data logically.
ER Model is used to model the logical view of the system from a data perspective, which
consists of these symbols:
Entities: An Entity may be an object with a physical existence – a particular person, car,
house, or employee – or an object with a conceptual existence – a company, a job, or a
university course. An Entity is defined by attributes.
Entity Set: An Entity is an object of Entity Type, and a set of all entities is called an entity set.
Another way is that Entities have the same set of attributes. They belong to the same entity
set. For example, E1 is an entity with an Entity Type Student, and the set of all students is
called Entity Set. In the ER diagram, Entity Type is represented as:
We can represent the entity set in the ER Diagram but can’t represent the entity in the ER
Diagram because the entity is a row and column in the relation, and the ER Diagram is a
graphical representation of data.
1. Strong Entity: it is a type of entity that has a key Attribute. A strong entity does not
depend on another entity in the schema. It has a primary key that helps identify it
uniquely, and it is represented by a rectangle. These are called Strong Entity Types.
2. Weak Entity: An Entity type has a key attribute uniquely identifying each entity in the
set. However, some entity types exist for which key attributes can’t be defined. These
are called Weak Entity types.
For Example, A company may store the information of dependents (Parents, Children, Spouse)
of an Employee. But the dependents can’t exist without the employee. So, the dependent will
be a weak entity type, and the employee will identify the entity type for the dependent, which
means it is a strong entity type.
A weak entity type is represented by a Double Rectangle. The participation of weak entity
types is always total. The relationship between the weak entity type and its identifying strong
entity type is called the identifying relationship, and it is represented by a double diamond.
Attributes: Attributes are the properties that define the entity type. For example, Roll_No,
Name, DOB, Age, Address, and Mobile_No are the attributes that define entity type Student.
In ER diagram, the attribute is represented by an oval.
The Complete Entity Type Student with its Attributes can be represented as:
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.
Degree of a Relationship Set
The number of different entity sets participating in a relationship set is called the degree of a
relationship set.
Unary Relationship: When there is only ONE entity set participating in a relation, the
relationship is called a unary relationship. For example, one person is married to only one
person.
Binary Relationship: When TWO entities are set to participate in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Ternary Relationship: When n entities are set to participate in a relationship, the relationship
is called an n-ary relationship.
Mapping/cardinalities: The number of times an entity of an entity set participates in a
relationship set is known as cardinalities. Cardinality can be of different types:
One-to-One: When each entity in each entity set can take part only once in the relationship,
the cardinality is one-to-one. Let us assume that a male can marry one female and a female
can marry one male. So, the relationship will be one-to-one.
One-to-Many: In one-to-many mapping, each entity can be related to more than one entity,
and the total number of tables that can be used in this is 2. Let us assume that one surgeon
department can accommodate many doctors. So the Cardinality will be 1 to M. It means one
department has many Doctors.
In this case, each student is taking only one course, but one course has been taken by many
students.
Many-to-Many: When entities in all entity sets can take part more than once in the
relationship, cardinality is many to many. Let us assume that a student can take more than
one course, and one course can be taken by many students. So, the relationship will be many
to many.
Using Sets, it can be represented as:
In this example, student S1 is enrolled in C1 and C3, and Course C3 is enrolled in S1, S3, and
S4. So, it is many-to-many relationships.
Participation Constraints
Participation Constraints in database management refer to rules determining the minimum
and maximum participation of entities or relationships in a given relationship set. While partial
participation permits discretionary involvement, total participation requires every entity in
one set to participate in a relationship in another. By maintaining consistency and enforcing
business standards, these restrictions guarantee data integrity.
For example, in a College Database, partial participation would permit courses with no
enrolled students, while entire participation might require all students to be enrolled in at
least one course. For database schemas to effectively replicate real-world circumstances and
enable efficient data management, it is imperative to comprehend and include participation
limitations.
There are two types of participation constraints in database management systems, such as:
Total Participation: Entire participation, sometimes known as required participation, denotes
the requirement that each group member participate in an activity about another group. It’s
similar to saying that to belong to one group, you must somehow be associated with another.
In a university database, for instance, total participation between courses and students
indicates that each student must be registered in at least one course. It follows that no student
can be excluded from a course. It guarantees that every member of one group is connected
to something within another, ensuring that nothing is overlooked or left disconnected.
In the diagram below, The Participation of an entity set E in a relationship set R is said to be
total if every entity in E participates in at least one relationship in R.
The participation of entity set A in the relationship set is total because every entity and entity
set B in the relationship set is total because every entity of B participates in the relationship
set.
Partial Participation: It is also known as optional participation, which allows certain
relationship aspects to be optional. This implies that the way the database is configured does
not require every entity to be linked to every other thing. Consider a database at a university,
for instance. Partial participation can mean that some students are enrolled in classes, but not
all students are registered in them. This flexibility is crucial because it recognizes that not
everything in real life is always connected to everything else. While some objects are
connected to one another, others may stand alone. It permits scenarios where certain
database entities may not be connected to any other entity.
In the diagram below, the participation of an entity set E in a relationship set R is said to be
partial if only some entities in E participate in the relationship set R.
The participation of entity set A in the relationship set is partial because only some entities of
A participate in the relationship set. while the participation of entity set B in the relationship
set is total because every entity of B participates in the relationship set.
Keys: In databases, keys are crucial for data organization and integrity. The primary key
uniquely identifies each record, while foreign keys establish table relationships. Candidate
keys meet uniqueness and minimality criteria, composite keys combine columns for a unique
identifier, and super keys can uniquely identify records. These keys form the foundation of
effective data management.
The keys are maintaining data integrity and organization. They serve as unique identifiers and
establish relationships between tables, enabling efficient data retrieval and manipulation. This
introduction will explore the various types of keys used in databases, including primary,
foreign, candidate, composite, super, etc.
• Super Key: A super key is a set of one or more attributes that can uniquely identify
records within a table. It’s like a "candidate key in progress," as it can contain extra
attributes beyond what’s needed for uniqueness.
• Alternate Key: An alternate key is a candidate key that isn’t chosen as the primary key.
It’s like a backup key that could have been the primary means of identification.
• Primary Key: The primary key is the chosen key that uniquely identifies records in a
table. The field of the primary key is unique and not null. It’s the one that’s used as the
main identifier for each record.
• Foreign Key: A foreign key is a column in one table that links to the primary key in
another table. It establishes relationships between tables and ensures data
consistency.
• Unique Key: A unique key ensures that values in a column or set of columns are unique
within a table. Unlike a primary key, it may allow for NULL values.
• Composite Key: A composite key is formed by combining two or more attributes to
identify records uniquely. It’s used when a single attribute can’t guarantee uniqueness.
• Candidate Key: A candidate key is a set of attributes that can uniquely identify records
in a table. It’s like a contestant for the role of the primary key, and one of them is
chosen as the primary key.