KEMBAR78
DBMS | PDF | Databases | Conceptual Model
0% found this document useful (0 votes)
30 views63 pages

DBMS

This is detail notes of dbms

Uploaded by

sharmanand2213
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views63 pages

DBMS

This is detail notes of dbms

Uploaded by

sharmanand2213
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 63

DBMS

Database Management System

Simran Rawat
28-Sep-23
1

Content

Unit I: - Introduction
 Overview of DBMS
 Database system VS File system
 (three schema architecture)
 Architecture of DBMS
 (data abstraction and data instance)
 Data models
 Entity relationship (ER) diagram
 Types of keys
 Integrity rule
 Data dictionary
 Normalization (1-NF, 2-NF, 3-NF, BC-NF, 4-NF, 5-NF): -
 Inclusion dependencies
 Loss less join decompositions
 Codd’s rules
Unit II: - Transaction Management
 Concept
 ACID properties
 States of transaction
 Serializability:-
 Conflict & view Serializable Schedule
 Check points
 Deadlock Handling
Unit III: - Database querying & concurrency control
 Relational algebra
 Set operations
 Relational calculus
 Step In Query Processing
 Algorithms For Selection, Sorting and Join operations
 Understanding Cost Issues In Queries
 Query optimization
 Transformation of relational expressions
 Query Evacuation plans
 Concurrency Control:-
 Locks based protocols
 Time stamp based Protocols
2

 Validation based protocol


 Multiple Granularity
 Multi-version Schemes
Unit IV: - Recovery system & security
Unit V: - SQL and PL/SQL
Unit VI: -
3

Unit-I introduction

OVERVIEW OF DBMS: -
Data simply refers to the raw facts and figures, data has implicit meaning. For example: - alphabets,
sounds, images, number etc.

Information is processed data, which has some meaning and which is used to take further decisions.
Example: - if we have data about marks of all the students, then we can take decision about the topper
of the class.

After processed information we get knowledge.

Database is a collection of related data or we can say it is a collection of relate data with an implicit
meaning. A database has the following implicit properties: -

 A database represents some aspect of the real world, sometimes called the mini-world or the
universe of discourse (UoD). Changes to the mini-world are reflected in the database.
 A database is a logically coherent collection of data with some inherent meaning. A random
assortment of data cannot correctly be referred to as a database.
 A database is designed, built, and populated with data for a specific purpose. It has an intended
group of users and some preconceived applications in which these users are interested.

Database System is a collection of inter-related data and a set of programs to access those data or we
can say that the database and DBMS system together is called Database system.

Database Management System (DBMS ) is general-purpose software system which facilitates an


interface to perform various operations, like defining, constructing, manipulating and sharing database
among various users and applications. It provides protection and security to database. In case of
multiple users it maintains data consistency. For example SQL Server, Ms access, MySQL, oracle, DB2
etc.

The database definition or descriptive information is also stored in database in the form of a database
cat-log or data dictionary; it is called Meta-data.

Functions of DBMS: -
 Defining a database involves specifying the data types, structures, and constraints of the data to
be stored in the database.
 Constructing the database is the process of storing the data on some storage medium that is
controlled by the DBMS.
 Manipulating the database includes functions such as querying the database to retrieve specific
data, updating the database to reflect changes in the mini-world, and generating reports from
the data.
4

 Sharing a database allows multiple users and programs to access the database simultaneously.
 Protection includes system protection against hardware or software malfunction (or cashes) and
security protection against unauthorized or malicious access.
 A typical large database may have a life cycle of many years, so the DBMS must be able to
maintain the database system by allowing the system to evolve as requirements change over
time.
An application program accesses the database by sending queries or requests for data to the DBMS. A
query typically causes some data to be retrieved a transaction may cause some data to be read and
some data to be written into the database.

Structured database are those database which we manage in the form of tables.

Unstructured database are those database which are unmanaged.

Interfaces are points of communication between different components of an application or system.

Advantages of DBMS/ solution of the problems of file system: -


1) Data redundancy and inconsistency: - Redundancy is the concept of repetition of data i.e. each
data may have more than a single copy. DBMS controls redundancy by maintaining a single
repository of data that is defined once and is accessed by many users. As there is no or less
redundancy, data remains consistent.
2) Data sharing: - DBMS, data can be shared easily due to a centralized system.
5

3) Data concurrency: - concurrent access to data means more than one user is accessing the same
data at the same time. Anomalies occur when changes made by one user get lost because of
changes made by another user. DBMS provides a locking system to stop anomalies to occur.
4) Data searching: - DBMS provides inbuilt searching operations. The user only has to write a small
query to retrieve data from the database.
5) Data integrity: - There may be cases when some constraints need to be applied to the data
before inserting it into the database. DBMS maintains data integrity by enforcing user-defined
constraints on data itself.
6) System crashing: - In some cases, systems might have crashed due to various reasons. DBMS
will have the recovery manager which retrieves the data making it another advantage over file
systems.
7) Data security: - DBMS has specialized algorithms and features that help provide shielding to its
data.
8) Backup: - it creates a backup subsystem to restore the data if required.
9) Interfaces: - It provides different multiple user interfaces like graphical user interface and
application program interface.
10) Easy maintenance: - It is easily maintainable due to its centralized nature.

Disadvantages of DBMS: -
1) Cost of hardware and software: - For DBMS it is mandatory to have a high speed processor and
also a large memory size. The requirement of this large amount of space and a high speed
processor needs expensive hardware and software.
2) Complexity: - variety of functionalities has made the database management system extremely
complex software. For DBMS it is very important to have a good knowledge of it by the
developers, DBA, designers and also the end-users. This is because if any one of them does not
acquire proper and complete skills then this may lead to data loss or database failure.
3) Currency maintenance: - this is very necessary to keep your system current because efficiency
which is one of the biggest factors and needs to be overlooked must be maximized. For this,
frequent updating must be performed on all the components as new threats come daily. DBMS
should be updated accordingly to the current scenario. Also, security measures must be
implemented. Due to advancement in database technology, training cost tends to be significant.
4) Performance: - the traditional file system is written for small organizations and for some specific
applications due to which performance is generally very good. But for the small scale firms
DBMS does not give a good performance as its speed is very slow. As result, some applications
will not run as fast as they could. Hence it is not good to use DBMS for small firms. Because
performance is a factor that everyone overlooks, the system’s speed depends on the
performance, so performance needs to be good.
5) Frequency upgrade/replacement cycles: - nowadays in this world, we need to stay up-to-date
about the latest technologies, developments arriving in the market. Frequent upgrade of the
products is done by the DBMS vendors to add new functionality to the systems. New upgrade
versions of software often come bundled. Sometimes these update also need hardware
upgrades. Sometimes these changes and updates are so fast that the users find it difficult to
6

work with that system because it is not easy to learn new commands and understand them
again when new upgrades are done. All this upgrades costs money to train users, design, etc. to
use new features.
6) Damaged part: - if one part of database is corrupted or damaged, then entire database may get
affected.

Types of databases: -
1) Centralized database: - It is the type of database that stores data at a centralized database
system. It comforts the users to access the stored data from different locations through several
applications. These applications contain the authentication process to let users access data
securely. An example of a Centralized database can be Central Library that carries a central
database of each library in a college/university.
2) Distributed database: - Unlike a centralized database system, in distributed systems, data is
distributed among different database systems of an organization. These database systems are
connected via communication links. Such links help the end-users to access the data
easily. Examples of the Distributed database are Apache Cassandra, HBase, Ignite, etc. we can
further divide a distributed database system into:
Homogeneous DDB: - Those database systems which execute on the same operating system
and use the same application process and carry the same hardware devices.
Heterogeneous DDB: - Those database systems which execute on different operating systems
under different application procedures, and carries different hardware devices.
3) Relational database: - This database is based on the relational data model, which stores data in
the form of rows(tuple) and columns(attributes), and together forms a table(relation). A
relational database uses SQL for storing, manipulating, as well as maintaining the data. E.F. Codd
invented the database in 1970. Each table in the database carries a key that makes the data
unique from others. Examples of Relational databases are MySQL, Microsoft SQL Server, Oracle,
etc. Relational database has ACID properties.
4) NoSQL database: - Non-SQL/Not Only SQL is a type of database that is used for storing a wide
range of data sets. It is not a relational database as it stores data not only in tabular form but in
several different ways. It came into existence when the demand for building modern
applications increased. Thus, NoSQL presented a wide variety of database technologies in
response to the demands. We can further divide a NoSQL database into the following four
types: key-value storage, document-oriented database, graph databases and wide-column
stores.
5) Cloud database: - A type of database where data is stored in a virtual environment and executes
over the cloud computing platform. It provides users with various cloud computing services
(SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous cloud platforms, but the
best options are: amazon web services (AWS), Microsoft Azure, kamatera, phonixNAP, science
Soft, Google Cloud SQL, etc.
6) Object-oriented databases: - The type of database that uses the object-based data model
approach for storing data in the database system. The data is represented and stored as objects
which are similar to the objects used in the object-oriented programming language.
7

7) Hierarchical databases: - It is the type of database that stores data in the form of parent-
children relationship nodes. Here, it organizes data in a tree-like structure. Data get stored in the
form of records that are connected via links. Each child record in the tree will contain only one
parent. On the other hand, each parent record can have multiple child records.

8) Network databases: - It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them. Unlike the
hierarchical database, it allows each record to have multiple children and parent nodes to form
a generalized graph structure.
9) Personal databases: - Collecting and storing data on the user's system defines a Personal
Database. This database is basically designed for a single user.
10) Operational databases: - The type of database which creates and updates the database in real-
time. It is basically designed for executing and handling the daily data operations in several
businesses. For example, An organization uses operational databases for managing per day
transactions.
11) Enterprise database: - Large organizations or enterprises use this database for managing a
massive amount of data. It helps organizations to increase and improve their efficiency. Such a
database allows simultaneous access to users.

Applications of database: -
 Banking: For customer information, accounts, and loans, and banking transactions.
 Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner—terminals situated around the world
accessed the central database system through phone lines and other data networks.
 Universities: For student information, course registrations, and grades.
 Credit card transactions: For purchases on credit cards and generation of monthly statements.
 Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
 Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds.
 Sales: For customer, product, and purchase information.
 Manufacturing: For management of supply chain and for tracking production of items in
factories, inventories of items in warehouses/stores, and orders for items.
 Human resources: For information about employees, salaries, payroll taxes and benefits, and for
generation of paychecks.
8

DBMS VS. FILE SYSTEM: -

File system: -
 The File system is basically a way of arranging the files, a storage medium like a hard disk.
 File system based systems were an attempt to computerize the manual system. It is also called a
traditional based approach in which a decentralized approach was taken where each
department stored and controlled its own data processing specialist.
 The file system organizes the files and helps in the retrieval of files when they are required.
 File systems consist of different files which are grouped into directories. The directories further
contain other folders and files.
 The file system performs basic operations like management, file naming, giving access rules, etc.
Example: - NTFS (New Technology File System), EXT (Extended File System).

Disadvantages/ Problems of File System: -


1) Difficulty to accessing data: - Conventional file-processing environments so not allow needed
data to be retrieved in a convenient and efficient manner. More responsive data-retrieval
systems are required for general use.
For example suppose that one of the bank officers needs to find out the names of all customers
who live within a particular postal-code area the officer has to ask data-processing department
to generate such a list because the designers of the original system did not anticipate this
request, there is no application program on hand to meet it.
2) Data redundancy: - Since different programmers create the files and application programs over
a long period, the various files are likely to have different. Moreover, the same information may
be duplicated in several places (files). For example, the address and telephone number of a
particular customer may appear in a file that consists of savings-account records and in a file
that consists of checking-account records. This redundancy leads to higher storage and access
cost.
3) Data inconsistency: - that is, the various copies of the same data may no longer agree. There
may be a possibility that two users are maintaining the data of the same file for different
applications. Hence changes made by one user do not reflect in files used by second user’s .For
example, a changed customer address may be reflected in savings-account records but not
elsewhere in the system.
4) Data isolation: - Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve the appropriate data
is difficult.
9

5) Data integrity problems: - There may be cases when some constraints need to be applied to the
data before inserting it into the database. The file system does not provide any procedure to
check these constraints automatically.
For example the balance of a bank account may never fall below a prescribed amount (say
100rs). Developers enforce these constraints in the system by adding appropriate code in
various application programs. However, when new constraints are added, it is difficult to change
the programs enforce them.
6) Atomicity problems: - Operations performed in the database must be atomic i.e. either the
operation takes place as a whole or does not take place at all.
7) For example consider a program to transfer 50rs from account A to account B. If the system
failure occurs during the execution of the program, it is possible that the 50rs was removed
from account A but was not credited to account B, resulting in an inconsistent database state.
Clearly, it is essential to database consistency that either both the credit and debit occur, or that
neither occur.
8) Concurrent-access anomalies: - concurrent access to data means more than one user is
accessing the same data at the same time. For the sake of overall performance of the system
and faster response, many systems allow multiple users to update the data simultaneously. In
such an environment, interaction of concurrent updates may result in inconsistent data. The file
system does not provide any procedure to stop anomalies.
9) Security problems: - not every user of the database system should be able to access all the data.
For example in a banking system, payroll personnel need to see only that part of the database
that has information about customer accounts. But, since application programs are added to the
system in an ad hoc manner, enforcing such security constraints is difficult.

Difference between File System and DBMS: -

Basis DBMS File System


Meaning DBMS is a collection of data. In The file system is a collection of
DBMS the user is not required to data. In this system, the user has
write the procedures. to write the procedures for
managing the database.
Sharing of Data Due to the centralized approach, Data is distributes in many files,
data sharing is easy. and many files, and it may be of
different formats, so it isn’t easy to
share data.
Security and DBMS provides a good protection It isn’t easy to protect a file under
protection mechanism the file system.
Recovery mechanism DBMS provides a crash recovery The file system doesn’t have a
mechanism, i.e., DBMS protects crash mechanism, i.e., if the
the user from system failure. system crashes while entering
some data, then the content of the
file will be lost.
10

Manipulation DBMS contains a wide variety of The file system can’t efficiently
techniques sophisticated techniques to store store and retrieve the data.
and retrieve the data.
Concurrency problems DBMS takes care of concurrent In the file system, concurrent
access of data using some form of access has many problems like
locking. redirecting the file while deleting
some information or updating
some information.
where to use Database approach used in large File system approach used in large
systems which interrelate many systems which interrelate many
files. files.
Cost The database system is expensive The file system approach is
to design. cheaper to design.
Data redundancy and Due to the centralization of the In this, the files and application
inconsistency database, the problems of data programs are created by different
redundancy and inconsistency are programmers so that there exists a
controlled. lot of duplication of data which
may lead to inconsistency.
Structure The database structure is complex The file system approach has a
to design. simple structure.
Data independence In this system, data independence In the file system approach, there
exists, and it can be of two types. exists no data independence.
Logical data independence and
physical data independence.
Integrity constraints Integrity constraints are easy to Integrity constraints are difficult to
apply. implement in file system.
Data models In the database approach, 3 types In the file system approach, there
of data models exist: Hierarchal is no concept of data models
data models, Network data exists.
models and Relational data
models.
Flexibility Changes are often a necessity to The flexibility of the system is less
the content of the data stored in as compares to the DBMS
any system, and these changes are approach.
more easily with a database
approach.
Example Oracle, SQL Server, Sybase etc. Cobol, C++ etc.

Database administrators (DBA): -


 In any organization where many people use the same resources, there is a need for a chief
administrator to oversee and manage these resources. In a database environment, the primary
11

resource is the database itself, and the secondary resource is the DBMS and related software.
Administering these resources is the responsibility of the database administrator (DBA).
 Definition: -The DBA is responsible for authorizing access to the database, coordinating and
monitoring its use and acquiring software and hardware resources as needed.
 The DBA is accountable for the problems such as security breaches and poor system response
time. In large organizations, the DBA is assisted by a staff that carries out these functions.

Database Designers: -
 Database designers are responsible for identifying the data t be stored in the database and for
choosing appropriate structures to represent and store this data. These tasks are mostly
undertaken before the database is actually implemented and populated with data.
 It is responsibility of database designers to communicate with all prospective database users in
order to understand their requirements and to create a design that meets these requirements.

End users: -
End users are the people whose jobs require access to the database for querying, updating and
generating reports; the database primarily exists for their use. There are several categories of end users:

 Casual end users: - Occasionally access the database, but they may need different information
each time. They use a sophisticated database query interface to specify their requests and
typically middle or high level managers or other occasional browsers.
 Naive or parametric end users: - Naive end users make up a sizable portion of database end
users. Their main job function revolves around constantly querying and updating the database,
using standard types of queries and updates called canned transactions that have been carefully
programmed and tested. A few examples are: Bank customers, post withdrawals and deposits,
reservation agents or customers for airlines, hotels and car rental, social media users post and
read items on social media websites.
 Sophisticated end users: - Sophisticated users include engineers, scientists, business analysts
and others who thoroughly familiarize themselves with the facilities of the DBMS on order to
implement their own applications to meet their complex requirements.
 Standalone users: - these users maintain personal database by using ready-made program
packages that provide easy-to-use menu-based or graphics-based interfaces. An example is the
user of a financial software package that stores a variety of personal financial data.

Application programmers: -
Application programmers implement these specifications as programs; then they test, debug, document
and maintain these canned transactions. Such analysts and programmers-commonly referred to as
software developers or software engineers.
12

ARCHITECTURE OF DBMS: -
 The DBMS design depends upon its architecture. The basic client/server architecture is used to
deal with a large number of PCs, web servers, database servers and other components that are
connected with networks.
 The client/server architecture consists of many PCs and a workstation which are connected via
the network.
 DBMS architecture depends upon how users are connected to the database to get their request
done

1) 1 tier architecture: -
In this architecture, the database is directly available to the user. It means the user can directly
sit on the DBMS and uses it.
Any changes done here will directly be done on the database itself. It doesn't provide a handy
tool for end users.
The 1-Tier architecture is used for development of the local application, where programmers
can directly communicate with the database for the quick response.

Advantages of 1 tier architecture: -


o 1-tier architecture is the simplest architecture to set up, as only a single machine is
required to maintain it.
o No additional hardware is requires for implementing 1-tier architecture, which makes it
cost-effective.
o Easy to implement: 1-tier architecture can be easily deployed and hence it is mostly
used in small projects.

2) 2-tier/server architecture for DBMS: -


The 2-Tier architecture is same as basic client-server. In the two-tier architecture, applications
on the client end can directly communicate with the database at the server side. For this
interaction, API's like: ODBC, JDBC are used.
The user interfaces and application programs are run on the client-side. The server is
responsible to provide the functionalities like: query processing and transaction management.
To communicate with the DBMS, client-side application establishes a connection the server side.
Advantages of 2-tier architecture: -
o 2-tier architecture makes easy access to the database, which makes fast retrieval.
13

o We can scale the database easily, by adding clients or upgrading hardware.


o 2-tier architecture is cheaper than 3-tier architecture and multi architecture.
o 2-tier architecture is easier to deploy than 3-tier architecture.
o It is easily understandable as well as simple because of only two components.

3) 3 tier/ n-tier architecture for web applications: -


The 3-tier architecture also known as the three layer architecture contains a layer between the
client and server. In this architecture, client can’t commutate with the server.
The application on the client-end interacts with an application server which further
communicates with the database system. The end-user has no idea about the existence of the
database beyond the application server.
The 3-tier architecture is used in case of large web application.
The three layers are:
o Presentation layer: this layer represents the user interface of the application and is
responsible for presenting data to the user and receiving user input. Includes the
graphical user interface (GUI) components.
o Application layer: this layer contains the business logic of the application and performs
the core processing of the data. It is responsible for the processing user requests,
retrieving and manipulating data and performing complex operations. It includes
modules such as controllers, services and API’s.
o Data storage layer: this layer is responsible for managing the data storage and retrieval
in the system. It includes the database or file system where data is stored, as well as the
data access layer (DAL) that interacts with the database to read and write data.

The 3 layers are connected through well-defined interfaces, allowing each layer to function
independently and enabling easy modification and maintenance of the application.

Advantages of 3-tier architecture: -


o Scalability is enhanced due to the distributed deployment of application servers.
Now, individual connections need not be made between the client and server.
o 3-tier architecture maintains Data integrity. Since there is a middle layer between
the client and server, data corruption can be avoided/ removed.
14

o 3-tier architecture Improves security. This type of model prevents direct interaction
of the client with the server thereby reducing access to unauthorized data.

Three schema architecture: -


The three schema architecture is also called ANSI/SPARC architecture or three-level architecture. This
framework is used to describe the structure of a specific database system. The three schema
architecture is also used to separate the user applications and physical database. The three schema
architecture contains three-levels. It breaks the database down into three different categories.

The main objective of three level architecture is to enable multiple users to access the same data with a
personalized view while storing the underlying data only once. Thus it separates the user's view from
the physical structure of the database.

The three-schema architecture is as follows:

1) External level: - At the external level, a database contains several schemas that sometimes
called as subschema. The subschema is used to describe the different view of the database. An
15

external schema is also known as view schema. Each view schema describes the database part
that a particular user group is interested and hides the remaining database from that user
group. The view schema describes the end user interaction with database systems.

2) Conceptual level: -The conceptual schema describes the design of a database at the conceptual
level. Conceptual level is also known as logical level. The conceptual schema describes the
structure of the whole database. The conceptual level describes what data are to be stored in
the database and also describes what relationship exists among those data. In the conceptual
level, internal details such as an implementation of the data structure are hidden. Programmers
and database administrators work at this level.

3) Internal level: - The internal level has an internal schema which describes the physical storage
structure of the database. The internal schema is also known as a physical schema. It uses the
physical data model. It is used to define that how the data will be stored in a block. The physical
level is used to describe complex low-level data structures in detail.

Data abstraction generally refers to the suppression of details of data organization and storage, and
the highlighting of the essential features for an improved understanding of data. One of the main
characteristics of the database approach is support data abstraction so that different users can perceive
data at their preferred level of details.

DATA MODELS: -
Data model is a collection of concepts that can be used to describing data, data relationships, data
semantics and consistency constraints. Here are two types of data models: -
16

 ER model is a high level data model. It is based on a perception of a real world that consists of a
collection of basic objects, called entities and of relationships among these objects.
 The Relational model is a lower-level model. It uses a collection of tables to represent both data
and the relationships among those data.
 Object-based Data Model: An extension of the ER model with notions of functions,
encapsulation, and object identity, as well. This model supports a rich type system that includes
structured and collection types. Thus, in 1980s, various database systems following the object-
oriented approach were developed. Here, the objects are nothing but the data carrying its
properties.
 Semi structured Data Model: This type of data model is different from the other three data
models (explained above). The semi structured data model allows the data specifications at
places where the individual data items of the same type may have different attributes sets. The
Extensible Mark-up Language, also known as XML, is widely used for representing the semi
structured data. Although XML was initially designed for including the mark-up information to
the text document, it gains importance because of its application in the exchange of data.

Designers often formulate database schema design by first modelling data at a high level, using the ER
model and then translating it into the relational model.

Entity-Relationship Model:

The Entity-relationship (E-R) model is a high level data model. It is based on a perception of a real world
that consists of a collection of basic objects, called entities and of relationships among these objects.
The E-R model employs three basic notions: entity sets, relationship sets and attributers.

It was developed to facilitate database design by allowing specification of an enterprise schema, which
represents the overall logical structure of a database.

The E-R model is very useful in mapping the meanings and interactions of real-world enterprises onto a
conceptual schema. Because of this usefulness, many database-design tools draw on concepts from the
E-R model.

Why use ER diagrams in DBMS? / advantages of ER model: -


 ER diagrams are used to represent the ER model in a database, which makes the easy to be
converted into relations (tables).
 ER diagrams provide the purpose of real-world modelling of objects which makes them intently
useful.
 ER diagrams require no technical knowledge and no hardware support.
 These diagrams are very easy to understand and easy to create even for a naïve user.
 It gives a standard solution for visualizing the data logically.

Symbol used in ER model: ER model is used to model the logical view of the system from a data
perspective which consists of these symbols:
17

The E-R model employs three basic notions / components of ER diagram:

Entity: -
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For
example each person in an enterprise is an entity. An entity has a set of properties and the values for
some set of properties may uniquely identify an entity. For instance, a person may have a person-id
property whose value uniquely identifies that person in the enterprise.

An entity set is a set of entities of the same type that share the same properties, or attributes. For
example, the set of all persons who are customers at a given bank can be defined as the entity set
customer. Similarly, the entity set loan might represent the set of all loans awarded by a particular bank.
The individual entities that constitute a set are said to be the extension of the entity set. Thus, all the
individual bank customers are the extension of the entity set customer.
18

o Strong entity is a type of entity that has a key attribute. Strong entity does not depend on other
entity in the schema. It has a primary key, that helps in identifying it uniquely and it is
represented by a rectangle. These are called strong entity types.
o Weak entity has a key attribute that uniquely identifies each entity in the entity set. But some
entity type exists for which key attributes can’t be defines. These are called weak entity types. A
weak entity is represented by a double rectangle.
For Example, A company may store the information of dependents (Parents, Children, Spouse)
of an Employee. But the dependents don’t have existed without the employee. So Dependent
will be a Weak Entity Type and Employee will be Identifying Entity type for Dependent, which
means it is Strong Entity Type.

Attributes: Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB,
Age, Address, and Mobile_No are the attributes that define entity type Student. In ER diagram, the
attribute is represented by an oval.

o Key attribute: the attribute which uniquely identifies each entity in the entity set is called key
attribute. For example, Roll_No will be unique for each student. In ER diagram, the key
attribute is represented by an oval with underlying lines.
o Composite attribute: an attribute composed if many other attributes is called a composite
attribute. For example, the address attribute of the student entity type consists of street, city,
state and country. In ER diagram, the composite attribute is represented by an oval comprising
of ovals.
19

o Multivalued attribute: An attribute consisting of more than one value for a given entity. For
example, phone_no (can be more than one for a given student). In ER diagram, a multivalued
attribute is represented by a double oval.

o Derived attribute: An attribute that can be derived from other attributes of the entity type is
known as a derived attribute, e.g, age (can be derived from DOB). In ER diagram, the derived
attribute id represented by a dashed oval.

The complete entity type student with its attributes can be represented as:
20

Relationship sets and relationship type: A relationship describes the relation between entities. For
example, ‘Enrolled in’ is a relationship type that exists between entity type student and course. In ER
diagram, the relationship type is represented by a diamond and connecting the entities with lines.

A set of relationships of the same type is known as a relationship set. The following relationship set
depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.

Types of relationships: -
21

o One-to-one relationship: When only one instance of an entity is associated with the
relationship, then it is known as one to one relationship. For example a female can marry to one
male and a male can marry to one female.

o One-to-many relationship: When only one instance of the entity on the left, and more than one
instance of an entity on the right associates with the relationship then this is known as a one-to-
many relationship. For example Scientist can invent many inventions, but the invention is done
by the only specific scientist.

o Many–to-one relationship: When more than one instance of the entity on the left, and only one
instance of an entity on the right associates with the relationship then it is known as a many-to-
one relationship. For example Student enrols for only one course, but a course can have many
students.

o Many-to-many relationship: When more than one instance of the entity on the left, and more
than one instance of an entity on the right associates with the relationship then it is known as a
many-to-many relationship. For example, Employee can assign by many projects and project can
have many employees.

Mapping constraints: -

Cardinality: -
Cardinality means how the entities are arranged to each other or what is the relationship structure
between entities in a relationship set. In a Database Management System, Cardinality represents a
number that denotes how many times an entity is participating with another entity in a relationship set.
The Cardinality of DBMS is a very important attribute in representing the structure of a Database. In a
table, the number of rows or tuples represents the Cardinality. Cardinality can be of different types:

o (1:1) one to one cardinality: One to one cardinality is represented by a 1:1 symbol. In this, there
is at most one relationship from one entity to another entity. There are a lot of examples of one-
to-one cardinality in real life databases. For example, one student can have only one student id,
22

and one student id can belong to only one student. So, the relationship mapping between
student and student id will be one to one cardinality mapping.

o (M:1) Many to one cardinality: In many to one cardinality mapping, from set 1, there can be
multiple sets that can make relationships with a single entity of set 2. Or we can also describe it
as from set 2, and one entity can make a relationship with more than one entity of set 1. One to
one Cardinality is the subset of Many to one Cardinality. For example, there are multiple
patients in a hospital who are served by a single doctor, so the relationship between patients
and doctors can be represented by many to one Cardinality.

o (1:M) One to many cardinality: In One-to-many cardinality mapping, from set 1, there can be a
maximum single set that can make relationships with a single or more than one entity of set 2.
Or we can also describe it as from set 2, more than one entity can make a relationship with only
one entity of set 1. One to one cardinality is the subset of One-to-many Cardinality. For
Example, in a hospital, there can be various compounders, so the relationship between the
hospital and compounders can be mapped through One-to-many Cardinality.
23

o (M:M) Many to many cardinality: In many, many cardinalities mapping, there can be one or
more than one entity that can associate with one or more than one entity of set 2. In the same
way from the end of set 2, one or more than one entity can make a relation with one or more
than one entity of set 1.
It is represented by M: N or N: M. One to one cardinality, One to many cardinalities, and Many
to one cardinality is the subset of the many to many cardinalities.
For Example, in a college, multiple students can work on a single project, and a single student
can also work on multiple projects. So, the relationship between the project and the student can
be represented by many to many cardinalities.

Participation constraint is applied to the entity participating in the relationship set. Participation
constraints tell us that the participation in a relationship can either be total or partial.

o Total participation: Each entity in the entity set must participate in the relationship. If each
student must enroll in a course, the participation of students will be total. Total participation
is shown by a double line in the ER diagram.
o Partial participation: The entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled by any of the students, the participation in the
course will be partial.
o The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
24

Keys in DBMS: -
Keys play an important role in the relational database. It is used to uniquely identify any record or row of
data from the table. It is also used to establish and identify relationships between tables. For
example, ID is used as a key in the Student table because it is unique for each student. In the PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.

TYPES OF KEYS: -

1) Primary key: - It is the first key used to identify one and only one instance of an entity uniquely.
An entity can contain multiple keys, as we saw in the PERSON table. The key which is most
suitable from those lists becomes a primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys
since they are also unique.
For each entity, the primary key selection is based on requirements and developers.
25

2) Candidate key: - A candidate key is an attribute or set of attributes that can uniquely identify a
tuple.
Except for the primary key, the remaining attributes are considered a candidate key. The
candidate keys are as strong as the primary key. For example: In the EMPLOYEE table, id is best
suited for the primary key. The rest of the attributes, like SSN, Passport_Number,
License_Number, etc., are considered a candidate key.

3) Super Key: - Super key is an attribute set that can uniquely identify a tuple. A super key is a
superset of a candidate key. For example: In the above EMPLOYEE table, for(EMPLOEE_ID,
EMPLOYEE_NAME), the name of two employees can be the same, but their EMPLYEE_ID can't
be the same. Hence, this combination can also be a key. The super key would be EMPLOYEE-ID
(EMPLOYEE_ID, EMPLOYEE-NAME), etc.

4) Foreign Key: - Foreign keys are the column of the table used to point to the primary key of
another table.
Every employee works in a specific department in a company, and employee and department
are two different entities. So we can't store the department's information in the employee
table. That's why we link these two tables through the primary key of one table.
We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table. In the EMPLOYEE table, Department_Id is the foreign key, and both the tables
are related.
26

5) Alternate Key: - There may be one or more attributes or a combination of attributes that
uniquely identify each tuple in a relation. These attributes or combinations of the attributes are
called the candidate keys. One key is chosen as the primary key from these candidate keys, and
the remaining candidate key, if it exists, is termed the alternate key. In other words, the total
number of the alternate keys is the total number of candidate keys minus the primary key. The
alternate key may or may not exist. If there is only one candidate key in a relation, it does not
have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as
candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other
candidate key, PAN_No, acts as the Alternate key.

6) Composite Key: - Whenever a primary key consists of more than one attribute, it is known as a
composite key. This key is also known as Concatenated Key. For example, in employee relations,
we assume that an employee may be assigned multiple roles, and an employee may work on
multiple projects simultaneously. So the primary key will be composed of all three attributes,
namely Emp_ID, Emp_role, and Proj_ID in combination. So these attributes act as a composite
key since the primary key comprises more than one attribute.

7) Artificial Key: - The key created using arbitrarily assigned data are known as artificial keys. These
keys are created when a primary key is large and complex and has no relationship with many
other relations. The data values of the artificial keys are usually numbered in a serial order. For
example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in
27

employee relations. So it would be better to add a new virtual attribute to identify each tuple in
the relation uniquely.

Generalization (Bottom to up)


Generalization is like a bottom-up approach in which two or more entities of lower level combine to
form a higher level entity if they have some attributes in common. In generalization, an entity of a
higher level can also combine with the entities of the lower level to form a further higher level entity.
Generalization is more like subclass and superclass system, but the only difference is the approach.
Generalization uses the bottom-up approach.
In generalization, entities are combined to form a more generalized entity, i.e., subclasses are combined
to make a superclass.
For example, Faculty and Student entities can be generalized and create a higher level entity Person.

Specialization (Top to bottom)


Specialization is a top-down approach, and it is opposite to Generalization. In specialization, one higher
level entity can be broken down into two lower level entities.
Specialization is used to identify the subset of an entity set that shares some distinguishing
characteristics.
Normally, the superclass is defined first, the subclass and its related attributes are defined next, and
relationship set are then added.
For example: In an Employee management system, EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on what role they play in the company.
28

Aggregation
In aggregation, the relation between two entities is treated as a single entity. In aggregation,
relationship with its corresponding entities is aggregated into a higher level entity.

For example: Center entity offers the Course entity act as a single entity in the relationship which is in a
relationship with another entity visitor. In the real world, if a visitor visits a coaching center then he will
never enquiry about the Course only or just about the Center instead he will ask the enquiry about both.

Relational Data-model

Relational model can represent as a table with columns and rows. Each row is known as a tuple. Each
table of the column has a name or attribute.

Domain: It contains a set of atomic values that an attribute can take.


Attribute: It contains the name of a column in a particular table. Each attribute Ai must have a domain,
dom(Ai)
Relational instance: In the relational database system, the relational instance is represented by a finite
set of tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all columns or
attributes.
29

Relational key: In the relational key, each row has one or more attributes. It can identify the row in the
relation uniquely.

RDBMS (Relational DBMS)

All modern database management systems like SQL, MS SQL Server, IBM DB2, ORACLE, My-SQL, and
Microsoft Access are based on RDBMS. Data is represented in terms of tuples (rows) in RDBMS.
It is called Relational Database Management System (RDBMS) because it is based on the relational
model introduced by E.F. Codd.

What is relation/table in RDMS?


Everything in a relational database is stored in the form of relations. The RDBMS database uses tables to
store data. A table is a collection of related data entries and contains rows and columns to store data.
Each table represents some real-world objects such as person, place, or event about which information
is collected. The organized collection of data into a relational table is known as the logical view of the
database.
Properties of Relation: -
 Each relation has a unique name by which it is identified in the database.
 Relation does not contain duplicate tuples.
30

 The tuples of a relation have no specific order.


 All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one value.

What is a row / record?


A row of a table is also called a record or tuple. It contains the specific information of each entry in the
table. It is a horizontal entity in the table. For example, the above table contains 5 records.
Properties of a row:
 No two tuples are identical to each other in all their entries.
 All tuples of the relation have the same format and the same number of entries.
 The order of the tuple is irrelevant. They are identified by their content, not by their position.

What is a column / attribute?


A column is a vertical entity in the table which contains all information associated with a specific field in
a table. For example, "name" is a column in the above table which contains all information about a
student's name.
Properties of an Attribute:
 Every attribute of a relation must have a name.
 Null values are permitted for the attributes.
 Default values can be specified for an attribute automatically inserted if no other value is
specified for an attribute.
 Attributes that uniquely identify each tuple of a relation are the primary key.
31

What is data item / cells?


The smallest unit of data in the table is the individual data item. It is stored at the intersection of tuples
and attributes.
Properties of data items:
 Data items are atomic.
 The data items for an attribute should be drawn from the same domain.
In the below example, the data item in the student table consists of Ajeet, 24 and Btech, etc.

Degree: The total number of attributes that comprise a relation is known as the degree of the table.

Cardinality: The total number of tuples at any one time in a relation is known as the table's cardinality.
The relation whose cardinality is 0 is called an empty table.

Domain: The domain refers to the possible values each attribute can contain. It can be specified using
standard data types such as integers, floating numbers, etc. For example, An attribute entitled
Marital_Status may be limited to married or unmarried values.

NULL values: The NULL value of the table specifies that the field has been left blank during record
creation. It is different from the value filled with zero or a field that contains space.

Data integrity: There are the following categories of data integrity exist with each RDBMS:
 Entity integrity: It specifies that there should be no duplicate rows in a table.
 Domain integrity: It enforces valid entries for a given column by restricting the type, the format,
or the range of values.
 Referential integrity specifies that rows cannot be deleted, which are used by other records.
 User-defined integrity: It enforces some specific business rules defined by users. These rules are
different from the entity, domain, or referential integrity.
32

Difference between DBMS and RDMS:

DBMS RDBMS
DBMS applications store data as files RDMS applications store data in a tabular form
In DBMS, data is generally stored in either a In RDMS, the tables have an identifier called
hierarchical form or a hierarchical form or a primary key and the data values are stored in the
navigational form form of tables.
Normalization is not present in DBMS Normalization is present in RDBMS
DBMS does not apply any security with regard RDBMS defines the integrity constraint for the
to data manipulation. purpose of ACID property.
DBMS uses file system to store data, so there In RDBMS data values are stored in the form of
will be no relation between the tables. tables so relationship between these data values
will be stored in the form of a table as well.
DBMS has to provide some uniform methods to RDBMS system supports a tabular structure of
access the stored information the data and a relationship between them to
access the stored information.
DBMS is meant to be for small organization and RDBMS is designed to handle large amount of
deal with small data. It supports single user. data. It supports multiple users.
DBMS does not support distributed database. RDBMS supports distributed database.
Examples: file system, xml. Examples: MySQL, postgre, sql server, oracle.

INTEGRITY CONSTRAINTS / RULES: -

Integrity constraints are a set of rules. It is used to maintain the quality of information.
Integrity constraints ensure that the data insertion, updating, and other processes have to be performed
in such a way that data integrity is not affected.
Thus, integrity constraint is used to guard against accidental damage to the database.

1) Domain constraints: Domain constraints can be defined as the definition of a valid set of values
for an attribute. The data type of domain includes string, character, integer, time, date,
currency, etc. The value of the attribute must be available in the corresponding domain.
33

2) Entity integrity constraint: The entity integrity constraint states that primary key value can't be
null. This is because the primary key value is used to identify individual rows in relation and if
the primary key has a null value, then we can't identify those rows. A table can contain a null
value other than the primary key field.

3) Referential integrity constraint: A referential integrity constraint is specified between two


tables. In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary
Key of Table 2, then every value of the Foreign Key in Table 1 must be null or be available in
Table 2.

4) Key constraints: Keys are the entity set that is used to identify an entity within its entity set
uniquely. An entity set can have multiple keys, but out of which one key will be the primary key.
A primary key can contain a unique and null value in the relational table.
34

DATA DICTIONARY: -

The database definition or descriptive information is also stored in database in the form of a database
cat-log or data dictionary; it is called Meta-data.

The types of information a system must store are:


 Name of the relations
 Name of the attributes of each relation
 Lengths and domains of attributes
 Name and definitions of the views defined on the database
 Various integrity constraints

With this, the system also keeps the following data based on users of the system:
 Name of authorized users
 Accounting and authorization information about users.
 The authentication information for users, such as passwords or other related information.

In addition to this, the system may also store some statistical and descriptive data about the relations,
such as:
 Number of tuples in each relation
 Method of storage for each relation, such as clustered or non-clustered.

A system may also store the storage organization, whether sequential, hash, or heap. It also notes the
location where each relation is stored:
 If relations are stored in the files of the operating system, the data dictionary note, and stores
the names of the file.
 If the database stores all the relations in a single file, the data dictionary notes and store the
blocks containing records of each relation in a data structure similar to a linked list.

At last, it also stores the information regarding each index of all the relations:
 Name of the index.
 Name of the relation being indexed.
 Attributes on which the index is defined.
 The type of index formed.
35

NORMALIZATION: -

Normalization is the process of organizing the data in the database.


Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to
eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalization divides the larger table into smaller and links them using relationships.
The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?


The main reason for normalizing the relations is removing these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the database
grows. Normalization consists of a series of guidelines that helps to guide you in creating a good
database structure.
Normalization works through a series of stages called Normal forms. The normal forms apply to
individual relations. The relation is said to be in particular normal form if it satisfies constraints.
(Note: ------
Prime attributes OR Key attributes: Attributes of the relation which exist in at least one of the possible
candidate keys, are called prime or key attributes.
Non-prime attributes OR Non-key attributes: Attributes of the relation who does not exist in any of the
possible candidate keys of the relation, such attributes are called non-prime or non-key attributes.

In order to distinguish whether an attribute of the relation is prime or non-prime, we should always find
all the possible candidate keys of the give relation. -----)

1) 1 NF: - A relation will be 1NF if it contains an atomic value. It states that an attribute of a table
cannot hold multiple values. It must hold only single-valued attribute. First normal form
36

disallows the multi-valued attribute, composite attribute, and their combinations.


Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

The decomposition of the EMPLOYEE table into 1 NF has shown below:

2) 2 NF: - In the 2NF, relational must be in 1NF. In the second normal form, all non-key attributes
are fully functional dependent on the primary key. Example: Let's assume, a school can store the
data of teachers and the subjects they teach. In a school, a teacher can teach more than one
subject.

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a


proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
37

3) 3 NF: - A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
3NF is used to reduce the data duplication. It is also used to achieve the data integrity. If there is
no transitive dependency for non-prime attributes, then the relation must be in third normal
form.
A relation is in third normal form if it holds at least one of the following conditions for every
non-trivial function dependency X → Y.
 X is a super key.
 Y is a prime attribute, i.e., each element of Y is part of some candidate key.

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime. Here,
EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-
prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It
violates the rule of third normal form. That's why we need to move the EMP_CITY and
EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
38

4) BCNF: - BCNF is the advance version of 3NF. It is stricter than 3NF. A table is in BCNF if every
functional dependency X → Y, X is the super key of the table. For BCNF, the table should be in
3NF, and for every FD, LHS is super key. Example: Let's assume there is a company where
employees work in more than one department.

In the above table functional dependencies are as follows:


EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP-ID, EMP-DEPT}


The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
39

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is a key.

5) 4 NF: - A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency. For a dependency A → B, if for a single value of A, multiple values of B exists, then
the relation will be a multi-valued dependency.

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity.
Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two courses,
Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued
dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
40

6) 5 NF: - A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should
be lossless. 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy. 5NF is also known as Project-join normal form (PJ/NF).

In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take
Math class for Semester 2. In this case, combination of all these fields required to identify a valid
data.
Suppose we add a new Semester as Semester 3 but do not know about the subject and who will
be taking that subject so we leave Lecturer and Subject as NULL. But all three columns together
acts as a primary key, so we can't leave other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
41

INCLUSION DEPENDENCY: -

 Multivalued dependency and join dependency can be used to guide database design although
they both are less common than functional dependencies.
 Inclusion dependencies are quite common. They typically show little influence on designing of
the database.
 The inclusion dependency is a statement in which some columns of a relation are contained in
other columns.
 The example of inclusion dependency is a foreign key. In one relation, the referring relation is
contained in the primary key column(s) of the referenced relation.
 Suppose we have two relations R and S which was obtained by translating two entity sets such
that every R entity is also an S entity.
 Inclusion dependency would be happen if projecting R on its key attributes yields a relation that
is contained in the relation obtained by projecting S on its key attributes.
 In inclusion dependency, we should not split groups of attributes that participate in an inclusion
dependency.
 In practice, most inclusion dependencies are key-based that is involved only keys.

DECOMPOSITION: -
 When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
 The process of breaking up of a relation into smaller sub relations is called
Decomposition. Decomposition is required in DBMS to convert a relation into a specific
normal form which further reduces redundancy, anomalies, and inconsistency in the relation.
 In a database, it breaks the table into multiple tables.
 If the relation has no proper decomposition, then it may lead to problems like loss of
information.
 Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.

Types of decompositions:
42

 Lossless decompositions
 Dependency preserving

Lossless Decomposition: -
 If the information is not lost from the relation that is decomposed, then the decomposition will
be lossless.
 Lossless join decomposition is a decomposition of a relation R into relations R1, and R2 such
that if we perform a natural join of relation R1 and R2, it will return the original relation R.
This is effective in removing redundancy from databases while preserving the original data.
 In other words by lossless decomposition, it becomes feasible to reconstruct the relation R
from decomposed tables R1 and R2 by using Joins.
 Only 1NF, 2NF, 3NF, and BCNF are valid for lossless join decomposition.
 The lossless decomposition guarantees that the join of relations will result in the same relation
as it was decomposed.
 The relation is said to be lossless decomposition if natural joins of all the decomposition give the
original relation.

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation
will look like:
43

Lossy decomposition: -
 Just like the name indicates, lossy decomposition is when a relation gets decomposed into
multiple relational schemas, in such a way that retrieving the original relation leads to a loss of
information. Thus, a lossy decomposition is bound to lose information.
 Careless decomposition is another name for lossy join decomposition.
 Let us consider a relation X. Let us now consider that it gets decomposed into n number of sub
relations, X1, X2, X3, …, Xn. If we naturally join these sub relations, then we will either obtain the
exact previous relation X or we will lose information in this process. In case we do not get the
same relation X (that was decomposed) after joining X1 and X2, it is known as a lossy
decomposition in DBMS.
 The natural joining of these sub relations always has some extraneous tuples. In the case of a

X1 ⋈ X2 ⋈ X3 ……. ⋈ Xn ⊃ X
lossless decomposition, we can see that:

Here, the operator ⋈ acts as a natural join operator.




44

Now, if we try to join both the tables mentioned above, we won’t be able to do it- since the relation
Cand_ID isn’t part of the relation Sec_Details.
Thus, this relation mentioned here is a lossy decomposition.

CODD’S RULES: -

Every database has tables, and constraints cannot be referred to as a rational database system. And if
any database has only relational data model, it cannot be a Relational Database System (RDBMS). So,
some rules define a database to be the correct RDBMS. These rules were developed by Dr. Edgar F.
Codd (E.F. Codd) in 1985, which has vast research knowledge on the Relational Model of database
Systems.
Codd presents his 13 rules for a database to test the concept of DBMS against his relational model, and
if a database follows the rule, it is called a true relational database (RDBMS). These 13 rules are popular
in RDBMS, known as Codd's 12 rules.

1) Rule 0: The Foundation Rule: The database must be in relational form. So that the system can
handle the database through its relational capabilities.
2) Rule 1: Information Rule: A database contains various information, and this information must
be stored in each cell of a table in the form of rows and columns.
3) Rule 2: Guaranteed Access Rule: Every single or precise data (atomic value) may be accessed
logically from a relational database using the combination of primary key value, table name, and
column name.
4) Rule 3: Systematic Treatment of Null Values : This rule defines the systematic treatment of Null
values in database records. The null value has various meanings in the database, like missing the
data, no value in a cell, inappropriate information, unknown data and the primary key should
not be null.
5) Rule 4: Active/Dynamic Online Catalog based on the relational model : It represents the entire
logical structure of the descriptive database that must be stored online and is known as a
database dictionary. It authorizes users to access the database and implement a similar query
language to access the database.
6) Rule 5: Comprehensive Data Sublanguage Rule : The relational database supports various
languages, and if we want to access the database, the language must be the explicit, linear or
well-defined syntax, character strings and supports the comprehensive: data definition, view
definition, data manipulation, integrity constraints, and limit transaction management
operations. If the database allows access to the data without any language, it is considered a
violation of the database.
7) Rule 6: View Updating Rule: All views table can be theoretically updated and must be practically
updated by the database systems.
8) Rule 7: Relational Level Operation (High-Level Insert, Update and delete) Rule: A database
system should follow high-level relational operations such as insert, update, and delete in each
45

level or a single row. It also supports union, intersection and minus operation in the database
system.
9) Rule 8: Physical Data Independence Rule: All stored data in a database or an application must
be physically independent to access the database. Each data should not depend on other data
or an application. If data is updated or the physical structure of the database is changed, it will
not show any effect on external applications that are accessing the data from the database.
10) Rule 9: Logical Data Independence Rule: It is similar to physical data independence. It means, if
any changes occurred to the logical level (table structures), it should not affect the user's view
(application). For example, suppose a table either split into two tables, or two table joins to
create a single table, these changes should not be impacted on the user view application.
11) Rule 10: Integrity Independence Rule : A database must maintain integrity independence when
inserting data into table's cells using the SQL query language. All entered values should not be
changed or rely on any external factor or application to maintain integrity. It is also helpful in
making the database-independent for each front-end application.
12) Rule 11: Distribution Independence Rule: The distribution independence rule represents a
database that must work properly, even if it is stored in different locations and used by different
end-users. Suppose a user accesses the database through an application; in that case, they
should not be aware that another user uses particular data, and the data they always get is only
located on one site. The end users can access the database, and these access data should be
independent for every user to perform the SQL queries.
13) Rule 12: Non Subversion Rule: The non-submersion rule defines RDBMS as a SQL language to
store and manipulate the data in the database. If a system has a low-level or separate language
other than SQL to access the database system, it should not subvert or bypass integrity to
transform data.
46

Unit-II Transaction Management


TRANSACTIONS: -
A transaction can be defined as a group of tasks. A single task is the minimum processing unit which
cannot be divided further. A transaction is an action or series of actions. It is performed by a single user
to perform operations for accessing the contents of the database.

Let’s take an example of a simple transaction. Suppose an employee of bank transfers Rs 800 from X's
account to Y's account. This small transaction contains several low-level tasks:

Following are the main operations of transaction: -


 read_item(X): Read operation is used to read the value of X from the database and stores it in a
buffer in main memory.
 write_item(X): Write operation is used to write the value back to the database from the buffer.

But it may be possible that because of the failure of hardware, software or power, etc. that transaction
may fail before finished all the operations in the set.
For example: If in the above transaction, the debit transaction fails after executing operation 2 then X's
value will remain 4000 in the database which is not acceptable by the bank.
To solve this problem, we have two important operations:
 Commit: It is used to save the work done permanently.
 Rollback: It is used to undo the work done.

ACID PROPERTIES: -

DBMS is the management of data that should remain integrated when any changes are done in it. It is
because if the integrity of the data is affected, whole data will get disturbed and corrupted. Therefore,
to maintain the integrity of the data, there are four properties described in the database management
system, which are known as the ACID properties. The ACID properties are meant for the transaction.
47

1) Atomicity: - The term atomicity means if any operation is performed on the data, either it
should be performed or executed completely or should not be executed at all. It further means
that the operation should not break in between or execute partially. In the case of executing
operations on the transaction, the operation should be completely executed and not partially.
Atomicity involves the following two operations:
 Abort: If a transaction aborts then all the changes made are not visible.
 Commit: If a transaction commits then all the changes made are visible.

2) Consistency: - The word consistency means that the value should remain preserved always.
In DBMS, the integrity of the data should be maintained, which means if a change in the
database is made, it should remain preserved always. In the case of transactions, the integrity of
the data is very essential so that the database remains consistent before and after the
transaction. The data should always be correct.

3) Isolation: - The term 'isolation' means separation. In DBMS, Isolation is the property of a
database where no data should affect the other one and may occur concurrently. In short, the
operation on one database should begin when the operation on the first database gets
complete or if two operations are being performed on two different databases, they may not
affect the value of one another.
48

In the case of transactions, when two or more transactions occur simultaneously, the
consistency should remain maintained. Any changes that occur in any particular transaction will
not be seen by other transactions until the change is not committed in the memory.

4) Durability: - Durability ensures the permanency of something. In DBMS, the term durability
ensures that the data after the successful execution of the operation becomes permanent in the
database. The durability of the data should be so perfect that even if the system fails or leads to
a crash, the database still survives. However, if gets lost, it becomes the responsibility of the
recovery manager for ensuring the durability of the database. For committing the values, the
COMMIT command must be used every time we make changes.

STATE OF TRANSACTIONS: -

1) Active state: - When the instructions of the transaction are running then the transaction is in
active state. If all the ‘read and write’ operations are performed without any error then it goes
to the “partially committed state”; if any instruction fails, it goes to the “failed state”. For
example: Insertion or deletion or updating a record is done here. But all the records are still not
saved to the database.

2) Partially committed: - After completion of all the read and write operation the changes are
made in main memory or local buffer. If the changes are made permanent on the Database
then the state will change to “committed state” and in case of failure it will go to the “failed
state”.
49

In the partially committed state, a transaction executes its final operation, but the data is still
not saved to the database. In the total mark calculation example, a final display of the total
marks step is executed in this state.

3) Committed: - A transaction is said to be in a committed state if it executes all its operations


successfully. In this state, all the effects are now permanently saved on the database system and
therefore transaction terminated in the “terminated state”.

4) Failed state: - When any instruction of the transaction fails or if any of the checks made by the
database recovery system fails, it goes to the “failed state ”. In the example of total mark
calculation, if the database is not able to fire a query to fetch the marks, then the transaction
will fail to execute.

5) Terminated: - If there isn’t any roll-back or the transaction comes from the “committed
state”, then the system is consistent and ready for brand new transaction and the old
transaction is terminated.

SCHEDULE: -
A series of operation from one transaction to another transaction is known as schedule. It is used to
preserve the order of the operation in each of the individual transaction.

 Serial schedule: - The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction. In the serial schedule, when the first transaction
completes its cycle, then the next transaction is executed.

 Non-serial schedule: - This is a type of Scheduling where the operations of multiple


transactions are interleaved. This might lead to a rise in the concurrency problem.
50

In the non-serial schedule, the other transaction proceeds without waiting for the previous
transaction to complete. This sort of schedule does not provide any benefit of the concurrent
transaction. It can be of two types namely, Serializable and Non-Serializable Schedule.

 Serializable schedule: - The serializability of schedules is used to find non-serial schedules that
allow the transaction to execute concurrently without interfering with one another.
It identifies which schedules are correct when executions of the transaction have interleaving of
their operations.
A non-serial schedule will be serializable if its result is equal to the result of its transactions
executed serially.

 Conflict Schedule: - A schedule is called conflict serializable if it can be transformed into a serial
schedule by swapping non-conflicting operations. Two operations are said to be conflicting if
all conditions satisfy:
o They belong to different transactions
o They operate on the same data item
o At Least one of them is a write operation
Swapping is possible only if S1 and S2 are logically equal.

Here, S1 = S2. That means it is non-conflict.

Here, S1 ≠ S2. That means it is conflict.


51

In the conflict equivalent, one can be transformed to another by swapping non-conflicting


operations.
In the given example, S2 is conflict equivalent to S1 (S1 can be converted to S2 by swapping non-
conflicting operations).
Two schedules are said to be conflict equivalent if and only if:
o They contain the same set of the transaction.
o If each pair of conflict operations are ordered in the same way.

 View Schedule: - A Schedule is called view serializable if it is view equal to a serial schedule (no
overlapping transactions). A conflict schedule is a view serializable but if the serializability
contains blind writes, then the view serializable does not conflict serializable.
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions:

1) Initial Read: - An initial read of both schedules must be the same. Suppose two schedule
S1 and S2. In schedule S1, if a transaction T1 is reading the data item A, then in S2,
transaction T1 should also read A.

Above two schedules are view equivalent because Initial read operation in S1 is done by
T1 and in S2 it is also done by T1.

2) Updated Read: - In schedule S1, if Ti is reading A which is updated by Tj then in S2 also,


Ti should read A which is updated by Tj.
52

Above two schedules are not view equal because, in S1, T3 is reading A updated by T2
and in S2, T3 is reading A updated by T1.

3) Final Write: - A final write must be the same between both the schedules. In schedule
S1, if a transaction T1 updates A at last then in S2, final writes operations should also be
done by T1.

Above two schedules is view equal because Final write operation in S1 is done by T3 and
in S2, the final write operation is also done by T3.

CHECKPOINTS: -

 The checkpoint is a type of mechanism where all the previous logs are removed from the system
and permanently stored in the storage disk. The checkpoint is like a bookmark. While the
execution of the transaction, such checkpoints are marked, and the transaction is executed then
using the steps of the transaction, the log files will be created.
 When it reaches to the checkpoint, then the transaction will be updated into the database, and
till that point, the entire log file will be removed from the file. Then the log file is updated with
the new step of transaction till next checkpoint and so on. The checkpoint is used to declare a
point before which the DBMS was in the consistent state, and all transactions were committed.
53

 The recovery system reads log files from the end to start. It reads log files from T4 to T1.
Recovery system maintains two lists, a redo-list, and an undo-list.
 The transaction is put into redo state if the recovery system sees a log with <Tn, Start> and <Tn,
Commit> or just <Tn, Commit>. In the redo-list and their previous list, all the transactions are
removed and then redone before saving their logs.

DEADLOCK: -

In a database management system (DBMS), a deadlock occurs when two or more transactions are
waiting for each other to release resources, such as locks on database objects, that they need to
complete their operations. As a result, none of the transactions can proceed, leading to a situation
where they are stuck or “deadlocked.” Deadlock is said to be one of the most feared complications in
DBMS as it brings the whole system to a Halt.
Deadlocks can happen in multi-user environments when two or more transactions are running
concurrently and try to access the same data in a different order. When this happens, one transaction
may hold a lock on a resource that another transaction needs, while the second transaction may hold
a lock on a resource that the first transaction needs. Both transactions are then blocked, waiting for
the other to release the resource they need.

Features of deadlock in a DBMS:


 Mutual Exclusion: Each resource can be held by only one transaction at a time, and other
transactions must wait for it to be released.
 Hold and Wait: Transactions can request resources while holding on to resources already
allocated to them.
 No Preemption: Resources cannot be taken away from a transaction forcibly, and the
transaction must release them voluntarily.
 Circular Wait: Transactions are waiting for resources in a circular chain, where each
transaction is waiting for a resource held by the next transaction in the chain.
 Indefinite Blocking: Transactions are blocked indefinitely, waiting for resources to become
available, and no transaction can proceed.
 System Stagnation: Deadlock leads to system stagnation, where no transaction can proceed,
and the system is unable to make any progress.
54

 Inconsistent Data: Deadlock can lead to inconsistent data if transactions are unable to
complete and leave the database in an intermediate state.
 Difficult to Detect and Resolve: Deadlock can be difficult to detect and resolve, as it may
involve multiple transactions, resources, and dependencies.

Deadlock avoidance: -

 When a database is stuck in a deadlock state, then it is better to avoid the database rather than
aborting or restating the database. This is a waste of time and resource.
 Deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method
like "wait for graph" is used for detecting the deadlock situation but this method is suitable only
for the smaller database. For the larger database, deadlock prevention method can be used .

Deadlock detection: -
In a database, when a transaction waits indefinitely to obtain a lock, then the DBMS should detect
whether the transaction is involved in a deadlock or not. The lock manager maintains a Wait for the
graph to detect the deadlock cycle in the database.

Wait for graph: -


 This is the suitable method for deadlock detection. In this method, a graph is created based on
the transaction and their lock. If the created graph has a cycle or closed loop, then there is a
deadlock.
 The wait for the graph is maintained by the system for every transaction which is waiting for
some data held by the others. The system keeps checking the graph if there is any cycle in the
graph.

Deadlock prevention: -
 Deadlock prevention method is suitable for a large database. If the resources are allocated in
such a way that deadlock never occurs, then the deadlock can be prevented.
 The Database management system analyzes the operations of the transaction whether they can
create a deadlock situation or not. If they do, then the DBMS never allowed that transaction to
be executed.

(Note:----
55

Timestamp is a unique identifier created by the DBMS to identify the relative starting time of a
transaction. Typically, timestamp values are assigned in the order in which the transactions are
submitted to the system. So, a timestamp can be thought of as the transaction start time.
----)

Deadlock prevention mechanism proposes two schemes :

 Wait-die scheme: - In this scheme, if a transaction requests for a resource which is already held
with a conflicting lock by another transaction then the DBMS simply checks the timestamp of
both transactions. It allows the older transaction to wait until the resource is available for
execution.
Let's assume there are two transactions Ti and Tj and let TS(T) is a timestamp of any transaction
T. If T2 holds a lock by some other transaction and T1 is requesting for resources held by T2 then
the following actions are performed by DBMS:
o Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and Tj has held some resource, then
Ti is allowed to wait until the data-item is available for execution. That means if the
older transaction is waiting for a resource which is locked by the younger transaction,
then the older transaction is allowed to wait for resource until it is available.
o Check if TS(Ti) < TS(Tj) - If Ti is older transaction and has held some resource and if Tj is
waiting for it, then Tj is killed and restarted later with the random delay but with the
same timestamp.

 Would wait scheme: - In wound wait scheme, if the older transaction requests for a resource
which is held by the younger transaction, then older transaction forces younger one to kill the
transaction and release the resource. After the minute delay, the younger transaction is
restarted but with the same timestamp.
If the older transaction has held a resource which is requested by the Younger transaction, then
the younger transaction is asked to wait until older releases it.
56

Unit-III Database Querying &


Concurrency Control
RELATIONAL ALGEBRA: -
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of
the query. It uses operators to perform queries.
Relational algebra mainly provides a theoretical foundation for relational databases and SQL. The
main purpose of using Relational Algebra is to define operators that transform one or more input
relations into an output relation.

Fundamental operators of Relational algebra are: -


 Selection (σ): Selection is used to select required tuples of the relations. Notation: σp(r).
Where σ is used for selection prediction, r is used for relation and p is used as a propositional
logic formula which may use connectors like: AND OR AND NOT. These relational can use as
relational use as relational operators like =, ≠, >, ≥, ≤, <.

Input: σ BRANCH_NAME=”perryride” (LOAN)

 Projection (π): This operation shows the list of those attributes that we wish to appear in the
result. Rests of the attributes are eliminated from the table. Notation: πA1,A2,An(r).
Where A1, A2, A3 is used an attribute name of relation r.
57

Input: πNAME, CITY (CUSTOMER)

 Union (U): Suppose there are two tuples R and S. The union operation contains all the tuples
that are either in R or S or both in R & S. It eliminates the duplicate tuples. Notation: R U S
A union operation must hold the following condition:
o R and S must have the attribute of the same number.
o Duplicate tuples are eliminated automatically.

Input: π CUSTOMER_NAME (BORROW) U π CUSTOMER_NAME (DEPOSITOR)


58

 Set intersection (∩): Suppose there are two tuples R and S. The set intersection operation
contains all tuples that are in both R & S. Notation: R∩S.
Using the above DEPOSITOR table and BORROW table.
Input: π CUSTOMER_NAME (BORROW) ∩ π CUSTOMER_NAME (DEPOSITOR)

 Set difference (-): Suppose there are two tuples R and S. The set intersection operation
contains all tuples that are in R but not in S. Notation: R – S.
Using the above DEPOSITOR table and BORROW table.
Input: π CUSTOMER_NAME (BORROW) - π CUSTOMER_NAME (DEPOSITOR)
59

 Rename (ρ): Rename is a unary operation used for renaming attributes of a relation. It is
denoted by rho (ρ).
Example, we can use the operator to rename STUDENT relation to STUDENT_1.
ρ(STUDENT_1, STUDENT)

 Cartesian product ( X): The Cartesian product is used to combine each row in one table with
each row in the other table. It is also known as a cross product. Notation: E X D.

Input: EMPLOYEE X DEPARTMENT


60

Important Questions: -
 Very Short questions:
1) Define file system and DBMS.
2) What is an application programmer?
Application programmers implement these specifications as programs; then they test,
debug, document and maintain these canned transactions. Such analysts and
programmers-commonly referred to as software developers or software engineers.

3) What is query analyser?


A query analyser is a tool used to monitor SQL servers in SQL server and can help users
analyse database objects for improving database performance.

4) What is DBA?
In any organization where many people use the same resources, there is a need for a
chief administrator to oversee and manage these resources. In a database environment,
the primary resource is the database itself, and the secondary resource is the DBMS and
related software. Administering these resources is the responsibility of the database
administrator (DBA).
The DBA is responsible for authorizing access to the database, coordinating and
monitoring its use and acquiring software and hardware resources as needed.

5) What is naive user?


Naive end users make up a sizable portion of database end users. Their main job
function revolves around constantly querying and updating the database, using standard
types of queries and updates called canned transactions that have been carefully
programmed and tested. A few examples are: Bank customers, post withdrawals and
61

deposits, reservation agents or customers for airlines, hotels and car rental, social media
users post and read items on social media websites.

6) What is end user?


End users are the people whose jobs require access to the database for querying,
updating and generating reports; the database primarily exists for their use. There are
several categories of end users: casual end users, naïve or parametric end users,
sophisticated end users and standalone users.
7) Define specialization OR generalization OR aggregation
8) What is view or conflict?
9) Afsdas
10) Afa

 Short or large questions:


1) ER diagram of: Reservations (flight, bus, railway), Management (school, college,
library, hospital, banking) and online shopping.
2) Difference between specialization, generalization and aggregation.
3) Write a brief note on specialization Or generalization Or aggregation.
4) Difference between view and conflict schedule.
5) Properties of transaction? OR what are ACID properties?
6)
62

You might also like