Database Systems
Prepared by:
Mr. SOMA VIVEKANANDA
Asst. Professor
VISION
❖ To establish a pedestal for the integral innovation, team spirit, originality and
competence in the students, expose them to face the global challenges and become
technology leaders of Indian vision of modern society.
MISSION
❖ To become a model institution in the fields of Engineering, Technology and
Management.
❖ To impart holistic education to the students to render them as industry ready
engineers.
❖ To ensure synchronization of MRCET ideologies with challenging demands of
International Pioneering Organizations.
QUALITY POLICY
❖ To implement best practices in Teaching and Learning process for both UG and PG
courses meticulously.
❖ To channelize the activities and tune them to heights of commitment and sincerity, the
requisites to climb the never-ending ladder of SUCCESS year after year.
PSO3: Ability to apply the learned Mechanical Engineering knowledge for the
development of society and self.
PEO1: PREPARATION
To enable students to design, experiment, analyze, and interpret in the core field, with the help of
other interdisciplinary concepts wherever applicable.
To inculcate the habit of lifelong learning for career development through the successful completion
of advanced degrees, professional development courses, industrial training, etc.
PEO5: PROFESSIONALISM
To impart technical knowledge and ethical values for the professional development of students,
enabling them to solve complex problems and to work in a multi-disciplinary ambience where
solutions lead to significant societal benefits.
Blooms Taxonomy
Bloom’s Taxonomy is a classification of the different objectives and skills that educators set for
their students (learning objectives). The terminology has been updated to include the following
six levels of learning. These 6 levels can be used to structure the learning objectives, lessons,
and assessments of a course.
1. Remembering: Retrieving, recognizing, and recalling relevant knowledge from long-term
memory.
2. Understanding: Constructing meaning from oral, written, and graphic messages through
interpreting, exemplifying, classifying, summarizing, inferring, comparing, and explaining.
3. Applying: Carrying out or using a procedure for executing or implementing.
4. Analyzing: Breaking material into constituent parts, determining how the parts relate to
one another and to an overall structure or purpose through differentiating, organizing, and
attributing.
5. Evaluating: Making judgments based on criteria and standards through checking and
critiquing.
6. Creating: Putting elements together to form a coherent or functional whole; reorganizing
elements into a new pattern or structure through generating, planning, or producing.
COURSE STRUCTURE R22
UNIT I
INTRODUCTION
Database: Purpose of Database Systems, File Processing System vs DBMS, History,
Characteristics, Three-schema Architecture of a database, Functional components of a
DBMS, DBMS Languages, Database users and DBA.
UNIT II
DATABASE DESIGN
ER Model: Objects, Attributes and their Types. Entity set and Relationship set, Design Issues
of ER model, Constraints, Keys: primary key, super key, candidate keys. Introduction to
relational model, Tabular Representation of Various ER Schemas, ER Diagram Notations,
Goals of ER Diagram, Weak Entity Set, Views.
UNIT III
STRUCTURED QUERY LANGUAGE
SQL: Overview, The Form of a Basic SQL Query, UNION, INTERSECT, and EXCEPT, Join
operations: equi join and non equi join, Nested queries: correlated and uncorrelated,
Aggregate Functions, Null values, Views, Triggers.
UNIT IV
DEPENDENCIES AND NORMAL FORMS
Importance of a good schema design, Problems encountered with bad schema designs,
Motivation for normal forms, Functional dependencies, Armstrong's axioms for FDs, Closure
of a set of FDs, Minimal covers, Definitions of 1NF, 2NF, 3NF and BCNF, Decompositions
and desirable properties.
UNIT V
Transactions: Transaction concept, transaction state, System log, Commit point, Desirable
Properties of a Transaction, concurrent executions, serializability, recoverability,
implementation of isolation, transaction definition in SQL, Testing for serializability,
Serializability by Locks- Locking Systems with Several Lock Modes- Concurrency Control by
Timestamps, validation.
TEXT BOOKS:
1. Abraham Silberschatz, Henry F. Korth, S. Sudarshan, "Database System Concepts",
McGraw-Hill, 6th Edition, 2010.
2. Elmasri, Navathe, Somayajulu, and Gupta, "Fundamentals of Database Systems", Pearson
Education.
REFERENCE BOOKS:
1. Raghu Ramakrishnan, Johannes Gehrke, "Database Management Systems", McGraw-Hill,
3rd Edition, 2007.
2. Elmasri & Navathe, "Fundamentals of Database Systems", Addison-Wesley, 5th Edition,
2008.
3. C.J. Date, "An Introduction to Database Systems", Addison-Wesley, 8th Edition, 2006.
4. Peter Rob, Carlos Coronel, "Database Systems: Design, Implementation, and
Management", Thomson Learning, 9th Edition, 2009.
COURSE OUTCOMES:
At the end of this course, students will be able to:
1. Understand the basic concepts and the applications of database systems
2. Master the basics of SQL and construct queries using SQL.
3. Understand the relational database design Principles.
4. Familiarize with the basic issues of transaction processing and concurrency
control.
5. Familiarize with database storage structures and access techniques.
Data: Facts that can be recorded and that have implicit meaning are known as 'data'.
Record: A collection of related data items. Individually, data items may convey no meaning;
organized together, they collectively represent meaningful information.
➢ The columns of this relation are called Fields, Attributes or Domains. The rows are
called Tuples or Records.
Database: It is the collection of related relations. Consider the following collection of tables:
➢ We now have a collection of 4 tables. They can be called a "related collection" because
we can clearly find that some common attributes exist in a selected pair of tables.
Because of these common attributes, we may combine the data of two or more tables
to determine a student's complete details. Questions like "Which hostel does the
youngest student live in?" can be answered now, although Age and hostel attributes are
in different tables. Many people with different responsibilities can view a database through
a DBMS.
➢ For example, a company has different departments and customers, each needing to see
different kinds of data. Each employee in the company will have different levels of
access to the database with their customized front-end application. Data is organized
strictly in a row and column format in a database. The rows are called Tuples or records.
The data items within one row may belong to different data types. On the other hand,
the columns are often called Domain or Attribute. All the data items within a single
attribute are of the same data type.
Database system" refers collectively to the database model and database management system
(DBMS). Physically, database servers are dedicated computers that hold the actual databases
and run only the DBMS and related software.
A database is a collection of related relations or tables. Data in a database may be structured or
unstructured: a structured database follows a particular format, such as a table, while an
unstructured database does not have any particular format. A database may be generated and
manipulated manually, or it may be computerized.
The primary purpose of database systems is to store, manage and retrieve data in an organized
and efficient manner. They provide a centralized repository for data, making it easier for users
to access and manage the information.
In a DBMS, data is transformed into information, information into knowledge, and knowledge
into action.
5. DBMS APPLICATIONS:
Database Management System (DBMS) applications exist in various industries and fields.
Some common examples include:
1. Financial Systems: Banks, stock market companies, and other financial organizations use
DBMS to manage their transactions and financial data.
2. Healthcare Systems: Hospitals, clinics, and other healthcare organizations use DBMS to
store patient information, manage appointments, and keep track of medical records.
3. Retail Systems: Retail companies use DBMS to manage their inventory, sales, and
customer data.
4. Educational Systems: Schools and universities use DBMS to manage student
information, grades, and other academic records.
5. Human Resources Systems: Companies use DBMS to store and manage employee
information, payroll data, and performance records.
6. Transportation Systems: Airlines, shipping companies, and other transportation
organizations use DBMS to manage their scheduling, routing, and billing information.
7. Government Systems: Governments use DBMS to store and manage information such as
census data, tax records, and voting records.
8. Manufacturing Systems: Manufacturing companies use DBMS to manage their supply
chain, production, and inventory information
6. FILE PROCESSING SYSTEM Vs DBMS:
File Processing Systems (FPS) and Database Management Systems (DBMS) are two different
methods of storing and managing data.
FILE SYSTEM:
The file system is basically a way of arranging the files in a storage medium like a hard disk.
The file system organizes the files and helps in the retrieval of files when they are required.
File systems consist of different files which are grouped into directories. The directories further
contain other folders and files. The file system performs basic operations like management, file
naming, giving access rules, etc.
Example: NTFS (New Technology File System), EXT (Extended File System).
Database Management System is software that manages the collection of related data. It
is used for storing data and retrieving it effectively when needed. It also provides proper
security measures for protecting the data from unauthorized access. In Database
Management System, the data can be fetched by SQL queries and relational algebra. It
also provides mechanisms for data recovery and data backup.
Example: Oracle, MySQL.
Figure: DBMS
Here are some key differences between the two:
• Structure: A file system is software that manages and organizes files in a storage medium
within a computer, whereas a DBMS is software for managing the database itself.
• Data Redundancy: Redundant data can be present in a file system; in a DBMS, redundancy
is controlled and minimized.
• Consistency: There is less data consistency in a file system; a DBMS offers more data
consistency because of the process of normalization.
• Backup and Recovery: A file system does not provide backup and recovery of data if it is
lost; a DBMS provides backup and recovery mechanisms.
• Complexity: A file system is less complex; a DBMS is more complex to handle.
• Query Processing: There is no efficient query processing in a file system; a DBMS provides
efficient query processing.
• Cost: A file system is less expensive; a DBMS has a comparatively higher cost.
• Data Independence: In the file system approach there is no data independence; in a DBMS,
data independence exists and can be of two types: logical data independence and physical
data independence.
• User Access / Concurrent Access: In a file system, only one user can access data at a time;
in a DBMS, multiple users can access data at a time.
• Data Sharing: In a file system, data is distributed across many files, so sharing is not easy;
the centralized nature of a DBMS makes sharing easy.
• Data Abstraction: A file system exposes the details of data storage and representation; a
DBMS hides the internal details of the database.
• Security Constraints: File systems provide less security; a DBMS offers more security
mechanisms.
• Integrity Constraints: Integrity constraints are difficult to implement in a file system; they
are easy to implement in a DBMS.
• Examples: File-based processing: COBOL, C++ file handling; DBMS: Oracle, SQL Server.
In summary, while FPS systems are simpler to implement and suitable for small-scale data
management, DBMSs are more suitable for larger-scale data management, as they provide
more robust data organization, integrity, security, and sharing capabilities.
7. History
• The history of Database Management Systems (DBMS) can be traced back to the
1960s, when the first commercially available DBMSs were introduced. The early
DBMSs were designed to address the needs of businesses to manage and store large
amounts of data.
• Data is a collection of facts and figures. The volume of collected data was increasing daily,
and it needed to be stored in a safe device or software. Charles Bachman was the first to
develop the Integrated Data Store (IDS), based on the network data model, for which he
received the Turing Award (the most prestigious award in computer science, often regarded
as the field's equivalent of the Nobel Prize).
1. The 1960s: The first commercially available DBMSs, such as IBM's Information
Management System (IMS), were introduced in the 1960s. These systems were designed
to manage large amounts of data in mainframe computer systems.
2. The 1970s: The relational database model was introduced in the 1970s, with the release of
the first relational DBMS, IBM's System R. This model organized data into tables with
rows and columns, making it easier to query and manipulate data.
3. The 1980s: The 1980s saw the rise of personal computers and the introduction of the first
DBMSs for the PC, such as dBase and FoxPro. These systems made it easier for small
businesses to manage their data.
4. The 1990s: The 1990s saw the introduction of client-server DBMSs, which allowed
multiple users to access and modify data stored in a centralized database server. This era
also saw the rise of open-source DBMSs, such as MySQL and PostgreSQL.
5. The 2000s: The 2000s saw the rise of cloud computing and the introduction of cloud-based
DBMSs, such as Amazon Relational Database Service (RDS) and Microsoft Azure
Database. These systems made it easier for businesses to store and manage data in the
cloud.
In conclusion, the history of DBMS has been marked by the introduction of new
technologies and innovations, which have made it easier for businesses to store, manage,
and analyze large amounts of data. Today, DBMSs are widely used by organizations of all
sizes to support a wide range of applications, from data management to business
intelligence and data analytics.
A database system is a collection of interrelated data and software programs allowing users to
access and modify these data. A major purpose of a database system is to provide users with
an abstract view of the data. The system hides certain details of how the data are stored and
maintained.
8. Data Abstraction:
The three-schema architecture of a database refers to a design pattern that separates the
conceptual view, the external view, and the internal view of a database into three separate
schemas. These schemas provide a way to represent and understand the data structure in a
database in a clear and organized manner.
The three-schema architecture can be visualized as three layers: the external schema at the top,
the conceptual schema in the middle, and the internal schema at the bottom. The layers can be
considered a series of abstractions, with the external schema representing the highest level of
abstraction and the internal schema representing the lowest level of abstraction.
1. External schema/View Level: The external schema defines how different user groups see
the database. It describes the view of the data presented to end-users and application
programs and defines the level of abstraction and detail they see.
The highest level of abstraction, which provides a user-defined view of the data, hiding
any irrelevant or complex details and allowing the user to interact with the data in a
simplified way.
2. Conceptual schema/logical level:
• The next level of abstraction, which provides a view of the data as it is organized
into tables, columns, and relationships, without any concern for how it is physically
stored.
• This schema represents the overall logical view of the database and defines the
relationships between the different data entities.
• It describes the data structure in a high-level, abstract manner, independent of any
specific data storage technology or implementation details.
• The conceptual schema represents what data is stored in a database.
3. Internal schema/Physical level:
• The internal schema defines the physical storage structure of the database and
provides implementation details such as data storage format, access methods, and
file organization.
• It describes how the data is actually stored on disk, and it provides the information
required to access and manipulate the data.
The three-schema architecture is useful because it allows the database designer to clearly
separate the database's different aspects and consider each aspect independently. This leads to
a more organized and maintainable design and makes it easier to change the database design
as needed over time.
Each schema builds upon the one below it: the external schema is defined in terms of the
conceptual schema, and the conceptual schema is in turn mapped onto the internal schema.
The different schemas are connected by mappings, which define the relationships between the
different views of the data.
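The separation can be illustrated in SQL terms; the sketch below is illustrative only (the table, index, and view names are assumed). Base table definitions correspond to the conceptual schema, an index is an internal-schema detail, and a view presents one external schema.

-- Conceptual schema: the logical table definition
CREATE TABLE Student (
    Roll_Number INT PRIMARY KEY,
    Name VARCHAR(50),
    CGPA DECIMAL(4,2)
);

-- Internal schema: a physical access structure, invisible at the logical level
CREATE INDEX idx_student_name ON Student (Name);

-- External schema: a simplified, user-specific view of the data
CREATE VIEW TopStudents AS
SELECT Roll_Number, Name FROM Student WHERE CGPA >= 9.0;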
What is Data Independence in DBMS?
The ability to modify the schema definition of a DBMS at one level, without affecting the
schema definition of the next higher level is called data independence.
There are two levels of data independence based on three levels of abstraction. These are as
follows −
• Physical Data Independence
• Logical Data Independence.
Physical Data Independence: Physical data independence refers to the ability to modify the
physical storage structures or devices without affecting the logical view or schema of the
database. It allows changes in the storage structures, such as reorganizing files, adding indexes,
or changing storage systems, without impacting the application programs that use the data. This
type of independence is primarily the concern of database administrators and system developers.
Logical Data Independence: Logical data independence refers to the ability to modify the
logical schema of the database without impacting the external schema or application programs.
It allows changes to the organization or structure of the data, such as adding or modifying
tables, without affecting the way data is accessed or manipulated by applications. Logical data
independence is essential for accommodating changes in business requirements or evolving
data models without disrupting existing applications.
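As a rough SQL illustration (a sketch; the Student table and column names are assumed): a logical-schema change such as adding a column leaves existing application queries working, while a physical-schema change such as adding an index is completely invisible at the logical level.

-- Logical data independence: existing queries on Student keep working
ALTER TABLE Student ADD Email VARCHAR(100);

-- Physical data independence: no query or application needs to change
CREATE INDEX idx_student_cgpa ON Student (CGPA);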
9. Database Languages
✓ Database languages, also known as query languages or data query languages, are a
classification of programming languages that developers use to define and access
databases, which are collections of organized data that users can access electronically.
✓ These languages allow users to complete tasks such as controlling access to data,
defining and updating data and searching for information within the database
management system (DBMS).
✓ A DBMS is a piece of technology that interacts with users, applications and the database
to record and analyze data while also manipulating the database to offer a way to store,
access and retrieve data.
✓ Different examples of database languages are available, including SQL, the standard
programming language for many databases. Database languages comprise four
sublanguages that serve different functions to execute tasks.
1. Data Definition Language (DDL)
➢ Data Definition Language (DDL) is a set of special commands that allows us to
define and modify the database's structure and metadata. These commands can be
used to create, modify, and delete the database structures such as schema, tables,
indexes, etc.
➢ Since DDL commands can alter the structure of the whole database and every
change implemented by a DDL command is auto-committed (the change is saved
permanently in the database), these commands are normally not used by an end-
user (someone who is accessing the database via an application).
➢ These commands are used to update the database schema, which is why they come
under the Data Definition Language; a brief sketch follows.
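A minimal sketch of typical DDL commands (the Department table and its columns are illustrative, not taken from the source):

CREATE TABLE Department (
    DNO INT PRIMARY KEY,
    DName VARCHAR(40)
);
ALTER TABLE Department ADD Location VARCHAR(40);   -- modify the table structure
TRUNCATE TABLE Department;                         -- remove all rows, keep the structure
DROP TABLE Department;                             -- remove the table definition itself

Each of these statements is auto-committed, which is why DDL is normally reserved for database designers rather than end-users.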
2. Data Manipulation Language (DML)
➢ Data Manipulation Language (DML) is a set of special commands allowing us
to access and manipulate data stored in schema objects. These commands are
used to perform certain operations such as insertion, deletion, updation, and
retrieval of the data from the database.
➢ These commands deal with user requests as they are responsible for all types of
data modification. The DML commands that deal with data retrieval are known
as Data Query language.
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.
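A minimal sketch of DML operations on a hypothetical Student table (names and values are illustrative):

INSERT INTO Student (Roll_Number, Name, CGPA) VALUES (1, 'Asha', 8.7);   -- insertion
UPDATE Student SET CGPA = 9.1 WHERE Roll_Number = 1;                     -- updation
DELETE FROM Student WHERE Roll_Number = 1;                               -- deletion
SELECT Name, CGPA FROM Student WHERE CGPA > 8.0;                         -- retrieval (data query)

Note that SELECT is declarative: the query states what rows are wanted, and the DBMS decides how to fetch them.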
3. Data Control Language (DCL)
➢ Data Control Language (DCL) commands, such as GRANT and REVOKE, are used to give
and take back access rights on database objects, controlling which users may perform
which operations.
4. Transaction Control Language (TCL)
➢ Transaction Control Language (TCL) commands, such as COMMIT and ROLLBACK,
manage the changes made by DML statements as transactions.
➢ In practice, a combination of DDL, DML, TCL and DCL is used as a single database
language.
10. Database Users
➢ Any person who uses a database and avails benefits from it is known as a database
user in DBMS. Database users can access the database and retrieve data using the
applications and interfaces provided by the Database Management System (DBMS).
➢ Database users in DBMS can be categorized based on their interaction with the
databases. According to the tasks performed by the database users on the databases, we
can categorize them into actors on the scene and workers behind the scene.
Functional Components of a DBMS
Query Processor:
The query processor receives and interprets user queries written in a query language (such as
SQL) and converts them into an efficient execution plan. It analyzes the query, optimizes it for
efficient execution, and interacts with other components to retrieve and manipulate the
requested data
• DDL interpreter: which interprets DDL statements and records the definitions in
the data dictionary.
• DML compiler: which translates DML statements in a query language into an
evaluation plan consisting of low-level instructions that the query evaluation engine
understands.
• A query can usually be translated into several alternative evaluation plans with the
same result. The DML compiler also performs query optimization, picking the
lowest cost evaluation plan from among the alternatives.
• Query evaluation engine: which executes low-level instructions generated by the
DML compiler.
Storage Manager:
• A storage manager is a program module that provides the interface between the
low-level data stored in the database and the application programs and queries
submitted to the system.
• The storage manager is responsible for the interaction with the file manager.
The raw data are stored on the disk using the file system, which is usually
provided by a conventional operating system.
• The storage manager translates the various DML statements into low-level file-
system commands. Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database.
• The disc or data storage component is responsible for physically storing the
database on storage devices such as hard disks or solid-state drives.
• It manages the allocation and organization of data on the storage media, including
data files, indexes, and other structures. The data storage component ensures
efficient data retrieval and storage space management.
Data Dictionary :
We can define a data dictionary as a DBMS component that stores the definition of data
characteristics and relationships. You may recall that such “data about data” were labeled
metadata.
The DBMS data dictionary provides the DBMS with its self-describing characteristic. In
effect, the data dictionary resembles an X-ray of the company's entire data set and is a
crucial element in the data administration function.
Important questions
1. a) Differentiate between a file processing system and a DBMS.
b) Explain the architecture of a DBMS.
Chapter 2: Database Design (ER Model)
Syllabus :
1. ER Model: Objects, Attributes and their Types
2. Entity set and Relationship set
3. Design Issues of ER model: Constraints
4. Keys: primary key, super key, candidate keys
5. Introduction to relational model
6. Tabular Representation of Various ER Schemas
7. ER Diagram Notations
8. Goals of ER Diagram
9. Weak Entity Set
10. Views
1. Data Models
➢ Data models in DBMS help us understand the design at the conceptual, physical, and logical
levels. They provide a clear picture of the data, making it easier for developers to create a
physical database.
➢ Data models describe how the data is stored, accessed, and updated in a DBMS. A set of
symbols and text is used to represent them so that all the members of an organization can
understand how the data is organized.
2. Relational Model
This is the most widely accepted data model. In this model, the database is represented as
a collection of relations in the form of rows and columns of a two-dimensional table. Each row
is known as a tuple (a tuple contains all the data for an individual record) while each column
represents an attribute.
The above table shows a relation "STUDENT" with attributes such as Roll Number, Name, and
CGPA which consists of 4 records or tuples.
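Such a relation could be declared in SQL as follows (a sketch; the attribute data types are assumed):

CREATE TABLE STUDENT (
    Roll_Number INT PRIMARY KEY,
    Name VARCHAR(50),
    CGPA DECIMAL(4,2)
);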
3. Object-Oriented Model
As suggested by its name, the object-oriented data model is a combination of object-oriented
programming and relational data model. In this data model, the data and their relationship are
represented in a single structure which is known as an object.
Since data is stored as objects, we can easily store audio, video, images, etc in the database
which was very difficult and inconvenient to do in the relational model. As shown in the image
below two objects are connected with each other through links.
In the above image, we have two objects, Employee and Department, in which all the data is
contained in a single unit (object). They are linked with each other as they share a common
attribute, Department_Id.
4. Semi structured model
The semi-structured model is a database model in which there is no strict separation between the data
and the schema, and the amount of structure used depends on the purpose. It is a generalized form of the
relational model that allows data to be represented in a flexible way. We cannot fully differentiate
between data and schema in this model because some entities may have missing attributes while others
may have extra attributes, which in turn makes it easy to update the schema of the database.
For example, we can say a data model is semi-structured if some attributes may store both atomic values
(values that can't be divided further, for example, Roll_No) as well as collections of values.
5. Hierarchical Model:
The hierarchical data model is one of the oldest data models, developed by IBM in the 1960s. In this
data model, the data is organized in a hierarchical, tree-like structure. The model can be easily visualized
because each record has one parent and possibly many children (including zero), as shown in the image
below.
The image represents the data model of a Vehicle database: vehicles are classified into two types, viz.
two-wheelers and four-wheelers, and then further classified.
The main drawback is that only one-to-many relationships are possible under this model; hence, the
hierarchical data model is very rarely used nowadays.
6. Network Model
In this model, data is organized in a graph-like structure where records can have multiple parent
and child records. The network model is a generalization of the hierarchical data model, as it
allows many-to-many relationships; therefore, in this model, a record can also have more than
one parent.
The network model in DBMS can be represented as a graph, and hence it replaces the
hierarchical tree with a graph in which object types are the nodes and relationships are the
edges.
Here you can see all three departments are linked with the director, which was impossible in the
hierarchical data model.
In the network model, there can be many possible paths to reach a node from the root node (College is
the root node in the above case). Therefore, the data can be accessed efficiently compared to the
hierarchical data model. But, on the other hand, the process of insertion and deletion of data is quite
complex.
2. ER Model
➢ A database is used to store our data; however, before building the actual database, we should
specify all its requirements in the form of a diagram known as an entity-relationship
diagram, which makes it easier to create the database.
➢ For making an ER diagram, we need to first identify the different field/entities that should be
present in a database, the attributes that describe these entities, and the type of relationship that
exist between them.
➢ All this information is collected and formulated in the form of a diagram known as an entity-
relationship diagram (ERD). Therefore, the ER diagram is a logical view of the database and
describes the structure of the database.
➢ An Entity Relationship Diagram in DBMS is a blueprint of the database that can be later
implemented as an actual database in the form of tables. It is a "diagrammatic representation
of the database".
➢ An ER diagram is used for creating a systematic study of the data requirements and then
making a proper database. It is considered to be one of the best practices before implementing
an actual database.
➢ An ER diagram is made up of three components: entities; the attributes that describe them
(for example, color and price are attributes of a Car entity); and the relationships that exist
between these entities. ER diagrams are used to show the relationships among the various
entities in a database.
I. Entity
➢ An entity is an object or thing that exists physically or conceptually in the real world,
for example, a customer, car, pen, or bank account.
➢ Entities are stored in the database, and they should be distinguishable, i.e., they should be easily
identifiable from the group.
➢ In ER modeling, rectangle is used to represent the entities in an entity-relationship diagram.
Kinds of Entity:
a. Tangible Entity:
It is an entity in DBMS that is a physical object we can touch or see. In simple words, an entity
that has a physical existence in the real world is called a tangible entity.
For example, the objects modeled by rows of a table can be tangible entities that we can see and
touch in the real world, such as colleges, bank lockers, mobiles, cars, watches, pens,
paintings, etc.
b. Intangible Entity:
It is an entity in DBMS, which is a non-physical object that we cannot see or touch. In simple words, an
entity that does not have any physical existence in the real world is known as an intangible entity.
For example, a bank account logically exists, but we cannot see or touch it.
1. Entity Type:
A collection of entities with general characteristics is known as an entity type. It refers to the category
that a particular entity belongs to.
➢ The category of a particular entity in the relation in RDBMS is called the entity type.
➢ It is represented by the name of the table and its schema
For example, a database of a corporate company has entity types such as employees, departments, etc.
In DBMS, every entity type contains a set of attributes that explain the entity.
The Employee entity type can have attributes such as name, age, address, phone number, and salary.
The Department entity type can have attributes such as name, number, and location in the department.
Strong Entity Type:
An entity type that has a key attribute uniquely identifying each entity is a strong entity type. In the
example above, "Customer" is the entity type, with attributes such as ID, Name, Gender, and Phone
Number. Customer is a strong entity type, as it has a unique ID for each customer.
Weak Entity Type:
➢ It is an entity that does not have its own existence and relies on a strong entity for its existence.
Unlike a strong entity, a weak entity does not have a primary key; instead, it has a partial key
(discriminator).
➢ The Entity Relationship Diagram represents the weak entity type using double rectangles.
Below is the ERD of the weak entity type:
In the above example, "Address" is a weak entity type with attributes such as House No., City, Location,
and State.The relationship between a strong and a weak entity type is known as an identifying
relationship.
Using a double diamond, the Entity-Relationship Diagram represents a relationship between the strong
and the weak entity type. Let us see an example of the relationship between the Strong entity type and
weak entity type with the help of ER Diagram:
2. Entity Set
An entity set is a group of entities of the same entity type. For example: an entity set of students, an
entity set of motorbikes, an entity set of smartphones, an entity set of customers, etc.
II. Attribute
➢ An attribute is used to describe a property or characteristic of an entity. An ellipse is used to
represent an attribute in an ER diagram.
Suppose we have several students in a class; we can uniquely identify a student by the properties
associated with a student, like a student roll number, name, branch, age, etc. These properties that
describe an entity are attributes for students.
1. Single-valued Attribute
Single valued attributes are properties that have a single particular value for a particular entity. This
means that single-valued attributes don't have more than one value for an item.
For example, attributes like age, roll number, and gender associated with the student can only have a
single value for an individual student.
2. Multi-valued Attribute
Attributes that can have multiple sets of values for a single entity are known as Multi-valued attributes.
They are enclosed inside double bounded ovals in the ER diagram (ER diagram stands for entity
relation diagram that displays the relationship of entity sets stored in a database.)
For example, attributes like phone_number, email_id, and hobbies are categorized as multi-valued
attributes because a single student in the class can have more than one phone number, email_id, and
hobbies
3. Simple/Atomic Attribute
Such attributes that cannot be further sub-divided into several attributes are called simple attributes. For
example, attributes like the current class and branch of a student have an absolute value, and they cannot
be further sub-divided into other parts like we divided full_name into first_name, middle_name,
and last_name.
4. Composite Attribute
As the name suggests, those attributes composed of many other simple attributes are called composite
attributes in DBMS.
Properties like student_name can be further divided into sub-parts like first_name, middle_name,
and last_name where each sub-part can form an independent attribute. Similarly, address property can
be composed of state, city, country etc.
5. Stored attribute:
Stored attributes are those attributes whose values are stored directly in the database and do not
require any further computation or update.
6. Derived attribute
Attributes whose values can be derived from other attributes of the entity are called derived
attributes in DBMS. Values of derived attributes can be obtained by processing other attributes.
For example, the age of a student can easily be obtained by subtracting the date of birth from the
current date. So, age is a derived attribute in the student table; see the sketch below.
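For instance, a derived age can be computed at query time rather than stored. A sketch using MySQL's TIMESTAMPDIFF function (the Student table and its date_of_birth column are assumed):

SELECT Name, TIMESTAMPDIFF(YEAR, date_of_birth, CURDATE()) AS Age
FROM Student;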
7. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.
8. Complex Attribute
Nesting multi-valued and composite attributes together form a complex attribute. Complex attributes are
formed by arbitrarily nesting the composite and multi-valued attributes.
For example, a person owns more than one house, and each house has more than one phone number.
Then, the attribute phone is represented as a complex attribute.
III. Relationships
➢ Any association or connection between entities is called a relationship. In ER
modeling, a relationship is represented by a diamond shape.
In DBMS, the degree of relationship is the number of entities that have taken part in the relationship.
We can analyze the E-R diagram and tell the degree of relationship between various entities. The degree
of relationship in DBMS can be categorized as Unary, Binary, Ternary or N-ary relationship.
➢ Unary relationship
➢ Binary relationship
➢ Ternary relationship
➢ N-ary relationship
1. Unary relationship
As the name suggests, in a unary relationship, only one entity is involved. There is only a single entity
set, i.e., the association is between a single entity. Here, the degree of relationship is 1. The unary
relationship is also known as a recursive relationship.
Example: In the college presidential elections, a president is chosen from among the students. He/she
leads the entire community and looks after student-centric activities. Even though he/she is the
president, he/she is still a student, so only one entity, the student, is involved.
2. Binary relationship
You must have rightly guessed that in a binary relationship, there are two entities involved.
There are two entities that are participating in the relationship. The degree of relationship is 2.
This is the most common type of relationship.
Example- Each Indian citizen has their own Aadhar Card so we can say that citizen and Aadhar
Card are two entities involved.
3. Ternary relationship
In a ternary relationship, there are three entities involved. There are three entities that are
participating in the relationship created. The degree of the relationship is 3. As in this
relationship, the number of entities increases, it is a bit difficult to convert it to a relational table.
Example: a Teacher teaches a Course to a Student; here Teacher, Course, and Student are the
three participating entities.
4. N-ary Relationship
This is a generalized form of relationship in which n entities participate. When there are n
entities involved, we say they have an N-ary relationship. As many entities are involved, it is
hard to convert them to relational tables. It is an uncommon relationship, not as widely used
as the binary relationship. The degree of relationship is n.
Example- Let us consider the example of a university. It has many entities like students,
teachers, affiliated colleges, courses, etc. Here, there are many entities associated with the
university
Constraints are used for modeling limitations on the relations between entities.
There are two types of constraints on the Entity Relationship (ER) model −
I. Mapping cardinality or cardinality ratio.
II. Participation constraints.
i. Cardinality ratio
It is expressed as the number of entities to which another entity can be associated via a
relationship set.
For a binary relationship set between entity sets A and B, the mapping cardinality can be one of
the following:
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many
1. One-to-one
An entity in A is associated with at most one entity in B, and an entity in B is associated with at
most one entity in A.
2. One-to-many
In One-to-Many cardinality, an entity in set A relates to many occurrences of another entity in set B. In
other words, an entity in set A can be linked with any number of entities in set B while an entity in set
B can be linked with only one entity in set A.
Example: one Department has M Employees (1:M).
3. Many -to-one
In Many-to-One cardinality, multiple occurrences of an entity in set A can relate to one
occurrence of an entity in set B. In other words, an entity in set A can be linked with at
most one entity in set B, while an entity in set B can be linked with any number of entities
in set A.
4. Many-to-Many (M:N)
In Many-to-Many cardinality, multiple occurrences in an entity set can relate to multiple
occurrences of another entity set. In other words, an entity in set A can link to any number of
entities in set B while an entity in set B can link to any number of entities in set A.
Example: M Students study N Subjects (M:N).
ii. Participation constraints
Participation constraints specify whether the existence of an entity depends on its being related
to another entity through the relationship set.
a. Total participation
The participation of an entity set E in a relationship set R is said to be total if every entity in E
Participates in at least one relationship in R.
For Example − Participation of loan in the relationship borrower is total participation.
b. Partial Participation
If only some of the entities in E participate in relationship R, then the participation of E in R is said to be
partial participation.
For example − Participation of customers in the relationship borrower is partial participation.
In the ER model, total participation is represented by a double line. In the example below,
department has total participation and employee has partial participation.
[Figure: employee (partial participation) connected through the Manages relationship to
department (total participation); employees E1-E4 map to departments D1-D3, and every
department participates while some employees do not.]
Keys in DBMS
1. Super Key: A set of one or more attributes that, taken together, uniquely identifies a tuple in a
relation. For the employee table, the super keys include:
{ID}, {SSN}, {ID, Name}, {ID, SSN}, {ID, Phone}, {ID, Email}, {Name, Phone}, {SSN, Name},
{SSN, Email}, {SSN, Phone}, {SSN, Salary}, {Name, Email, Phone}, {Name, Salary, Phone},
{Name, Salary, Phone, Email}, {ID, SSN, Name, Phone, Salary, Email}, etc.
2. Candidate Key: The minimal set of attributes that can uniquely identify a tuple is known as a
candidate key.
➢ It is a super key from which no attribute can be removed without losing the uniqueness property.
➢ It must contain unique values.
➢ It can contain NULL values.
➢ Every table must have at least one candidate key.
➢ A table can have multiple candidate keys but only one primary key.
3. Primary key
There can be more than one candidate key in a relation, out of which one is chosen as the
primary key.
➢ It is a unique key.
➢ It can identify only one tuple (a record) at a time.
➢ It has no duplicate values; it contains only unique values.
➢ It cannot be NULL.
➢ A primary key need not be a single column; a combination of columns can also serve
as the primary key of a table.
From the candidate keys of the employee table, {ID} is chosen as the primary key, as sketched below.
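A sketch of how this choice looks in SQL (column types are assumed): {ID} becomes the primary key, and the remaining candidate key {SSN} stays unique as an alternate key.

CREATE TABLE Employee (
    ID INT PRIMARY KEY,        -- candidate key chosen as the primary key
    SSN CHAR(11) UNIQUE,       -- remaining candidate key (alternate key)
    Name VARCHAR(50),
    Phone VARCHAR(15),
    Email VARCHAR(100),
    Salary DECIMAL(10,2)
);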
4. Alternate Key
A candidate key other than the primary key is called an alternate key.
➢ All candidate keys that are not chosen as the primary key are alternate keys.
➢ An alternate key is also called a secondary key.
5. Unique key
A unique key looks very similar to a primary key, but the two are not the same and have
different usages. Once we set an attribute as a unique key, its column values must be unique,
although one value could be NULL.
6. Composite Key
Sometimes a table might not have a single column/attribute that uniquely identifies all the
records of the table. To uniquely identify rows, a combination of two or more columns/attributes
can be used. Such a combination can still give duplicate values in rare cases, so we need to find
the minimal set of attributes that can uniquely identify rows in the table.
7. Foreign Key
A foreign key is a column, or set of columns, in one table that refers to unique values, often the
primary key values, in another table. Foreign keys link together two or more tables in a
relational database.
In the tables above, the ID column of the student marks table is a foreign key because it
references the student details table, as sketched below.
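A sketch of the two tables in SQL (table and column names are assumed for illustration):

CREATE TABLE Student_Details (
    ID INT PRIMARY KEY,
    Name VARCHAR(50)
);

CREATE TABLE Student_Marks (
    ID INT,
    Marks INT,
    FOREIGN KEY (ID) REFERENCES Student_Details(ID)   -- ID refers to Student_Details
);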
ER-Design Issues :
Users often misunderstand the elements and the design process of the ER diagram. This leads to
complex ER diagrams and to designs that do not meet the characteristics of the real-world
enterprise being modeled.
Here, we will discuss the basic design issues of an ER database schema in the following points:
Use of entity sets versus attributes: in real-world situations, it is sometimes difficult to decide
whether a property should be modeled as an attribute or as an entity set.
[Figure: two alternative ER designs for students submitting assignments. A student entity
(Student_Id, name) and an assignment entity (Sub_ID, name, due date, URL) are first connected
by a 'submit' relationship carrying a date attribute, and then remodeled with 'submission' as an
entity set of its own.]
We can replace the relationship set R by an entity set E and create three relationship sets:
RA, relating E and A; RB, relating E and B; and RC, relating E and C. If the relationship set R
had any attributes, these are assigned to entity set E, and a special identifying attribute is
created for E.
ER Diagram Notations
As we know, an entity-relationship diagram has entities, attributes, and relationships between
the data. All of these components are represented with the help of certain symbols. Three main
symbols are used to make an ER diagram: rectangle, oval, and diamond. All other symbols used
in an ER diagram are sub-elements based on these main elements.
Rectangle: This symbol is used to represent entity types.
Oval: This symbol is used to represent the attributes of an entity.
Diamond: This symbol is used to represent the type of relationship that exists between the entities,
such as one-to-one, many-to-one, and many-to-many.
Lines: They link entity types to relationship types, and attributes to entity types.
Introduction to the Relational Model
The relational model for database management is an approach to logically represent and manage the
data stored in a database. In this model, the data is organized into a collection of inter-related
two-dimensional tables, also known as relations. Each relation is a collection of columns and rows,
where the columns represent the attributes of an entity and the rows (or tuples) represent the records.
The use of tables to store the data provided a straightforward, efficient, and flexible way to store and
access structured information. Because of this simplicity, this data model provides easy data sorting and
data access. Hence, it is used widely around the world for data storage and processing.
Consider a case where you wish to store the name, the CGPA attained, and the roll number of all the
students of a particular class. This structured data can be easily stored in a table as described below:
✓ Any given row of the relation indicates a student i.e., the row of the table describes a real-world
entity.
✓ The columns of the table indicate the attributes related to the entity. In this case, the roll
number, CGPA, and the name of the student.
➢ A database implemented and organized in terms of the relational model is known as a relational
database management system (RDBMS). Hence, the relational model describes how data is
stored in relational databases.
A relational database is based on the relational model and consists of various components.
These include:
Relation : Two-dimensional table used to store a collection of data elements
Attribute/Field : Column of the relation, depicting properties that define the relation.
Attribute Domain : Set of pre-defined atomic values that an attribute can take i.e., it describes the legal
values that an attribute can take.
Cardinality : It specifies the number of entities involved in the relation i.e., it is the total number of rows
present in the relation.
Relational Schema : It is the logical blueprint of the relation i.e., it describes the design and the
structure of the relation. It contains the table name, its attributes, and their types:
For our Student relation example, the relational schema will be: STUDENT (Roll_Number, Name, CGPA).
Relational Instance : It is the collection of records present in the relation at a given time
Relation Key : It is an attribute or a group of attributes that can be used to uniquely identify an entity in
a table or to determine the relationship between two tables. Relation keys can be different types.
1. Super key
2. Candidate key
3. Primary key
4. Alternate key
5. Unique key
6. Composite key
7. Foreign key
Relational Integrity Constraints
1. Domain constraints
2. Key constraints or Uniqueness Constraints
3. Entity Integrity constraints
4. Referential integrity constraints
1. Domain Constraints
➢ Every domain must contain atomic values(smallest indivisible units) which means
composite and multi-valued attributes are not allowed.
➢ We perform a datatype check here, which means when we assign a data type to a column
we limit the values that it can contain. Eg. If we assign the datatype of attribute age as int,
we can’t give it values other than int datatype.
Explanation: In the above relation, Name is a composite attribute and Phone is a multi-values attribute,
so it is violating domain constraint.
2. Key Constraints or Uniqueness Constraints
➢ These are called uniqueness constraints since they ensure that every tuple in the relation is
unique.
➢ A relation can have multiple candidate keys (minimal super keys), out of which we choose one
as the primary key. There is no restriction on which candidate key to choose, but it is suggested
to go with the candidate key having the fewest attributes.
➢ Null values are not allowed in the primary key, hence Not Null constraint is also part of the key
constraint.
Explanation: In the above table, EID is the primary key, and the first and the last tuples have the
same value in EID, i.e., 01, so the key constraint is violated.
3. Entity Integrity Constraints:
➢ Entity Integrity constraints say that no primary key can take a NULL value, since using
the primary key we identify each tuple uniquely in a relation.
Explanation: In the above relation, EID is made the primary key, and the primary key can’t take
NULL values but in the third tuple, the primary key is null, so it is violating Entity Integrity
constraints.
4. Referential Integrity Constraints
➢ The Referential integrity constraint is specified between two relations or tables and used to
maintain the consistency among the tuples in two relations.
➢ This constraint is enforced through a foreign key, when an attribute in the foreign key of
relation R1 has the same domain(s) as the primary key of relation R2, then the foreign key of
R1 is said to reference or refer to the primary key of relation R2.
➢ The values of the foreign key in a tuple of relation R1 can either take the values of the primary
key for some tuple in relation R2, or can take NULL values, but can’t be empty.
Explanation: In the above tables, the DNO of Table 1 is the foreign key, and DNO in Table 2 is
the primary key. DNO = 22 in the foreign key of Table 1 is not allowed because DNO = 22 is not
defined in the primary key of table 2. Therefore, Referential integrity constraints are violated here.
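Continuing the DNO example as a hedged SQL sketch (the table names Department and Employee are assumed), the DBMS rejects any tuple whose foreign key value has no matching primary key value:

CREATE TABLE Department (DNO INT PRIMARY KEY, DName VARCHAR(40));
CREATE TABLE Employee (
    EID INT PRIMARY KEY,
    DNO INT REFERENCES Department(DNO)
);
INSERT INTO Department VALUES (11, 'Sales');
INSERT INTO Employee VALUES (1, 11);   -- accepted: DNO 11 exists in Department
INSERT INTO Employee VALUES (2, 22);   -- rejected: DNO 22 violates referential integrity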
Database Schema
When we talk about a database, we must differentiate between the database schema, which is the
logical design of the database, and the database instance, which is a snapshot of the data in the
database at a given instant in time. The concept of a relation corresponds to the programming-language
notion of a variable, while the concept of a relation schema corresponds to the programming-language
notion of a type definition.
In general, a relation schema consists of a list of attributes and their corresponding domains. The
concept of a relation instance corresponds to the programming-language notion of the value of a
variable. The value of a given variable may change with time.
Let us continue with our university database example. Each course in a university may be
offered multiple times, across different semesters, or even within a semester. We need a
relation to describe each individual offering, or section, of the class.
The schema is section (course id, sec id, semester, year, building, room number, time slot id).
Figure 2 shows a sample instance of the section relation. We need a relation to describe the association
between instructors and the class sections that they teach. The relation schema to describe this
association is teaches (ID, course id, sec id, semester, year).
Figure 3 shows a sample instance of the teaches relation. As you can imagine, there are many more
relations maintained in a real university database. In addition to the relations listed already (instructor,
department, course, section, prereq, and teaches), the textbook's running example also uses relations
such as student, takes, advisor, time slot, and classroom.
➢ A database schema, along with primary key and foreign key dependencies, can be
depicted by schema diagrams. The figure below shows the schema diagram for our
university organization. Each relation appears as a box, with the relation name at
the top in blue and the attributes listed inside the box.
➢ Primary key attributes are shown underlined.
➢ Foreign key dependencies appear as arrows from the foreign key attributes of the
referencing relation to the primary key of the referenced relation.
Referential integrity constraints other than foreign key constraints are not shown explicitly in schema
diagrams. Entity-relationship diagrams let us represent several kinds of constraints, including general
referential integrity constraints. Many database systems provide design tools with a graphical user
interface for creating schema diagrams.
Weak Entity Sets
➢ An entity type should have a key attribute which uniquely identifies each entity in the
entity set; however, there exist some entity types for which a key attribute cannot be
defined. These are called weak entity types.
➢ The entity sets which do not have sufficient attributes to form a primary key are
known as weak entity sets and the entity sets which have a primary key are known as
strong entity sets.
➢ As the weak entities do not have any primary key, they cannot be identified on their
own, so they depend on some other entity (known as owner entity). The weak entities
have total participation constraint (existence dependency) in its identifying
relationship with owner identity. Weak entity types have partial keys. Partial Keys are
set of attributes with the help of which the tuples of the weak entities can be
distinguished and identified. Note – Weak entity always has total participation but
Strong entity may not have total participation.
➢ A weak entity is represented by a double rectangle. The relationship between one strong
and one weak entity is represented by a double diamond. Partial key attributes are
represented with dotted lines.
Example-1: In the below ER Diagram, ‘Payment’ is the weak entity. ‘Loan Payment’ is
the identifying relationship and ‘Payment Number’ is the partial key. Primary Key of the
Loan along with the partial key would be used to identify the records.
Views in SQL
➢ Views in SQL are considered as a virtual table. A view also contains rows and
columns.
➢ To create the view, we can select the fields from one or more tables present in the
database.
➢ A view can either have specific rows based on certain condition or all the rows of
a table.
1. Creating view
A view can be created using the CREATE VIEW statement. We can create a view
from a single table or multiple tables.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
Sample tables: Student_Details and Student_Marks.
In this example, we create a view named DetailsView from the table Student_Details (the column
names Name and Address are assumed for illustration).
Query:
CREATE VIEW DetailsView AS
SELECT Name, Address
FROM Student_Details;
Just like a table, the view can be queried to see its data:
SELECT * FROM DetailsView;
A view from multiple tables can be created by simply including multiple tables in the SELECT
statement.
In the example below, a view named MarksView is created from the two tables Student_Details
and Student_Marks (assuming both tables share a Name column and Student_Marks has a Marks
column).
Query:
CREATE VIEW MarksView AS
SELECT Student_Details.Name, Student_Details.Address, Student_Marks.Marks
FROM Student_Details, Student_Marks
WHERE Student_Details.Name = Student_Marks.Name;
2. Deleting View
A view can be deleted using the Drop View statement.
Syntax
DROP VIEW view_name;
Example: If we want to delete the View MarksView, we can do this as:
DROP VIEW MarksView;
3. Updating View
We can use the CREATE OR REPLACE VIEW statement to add or remove fields from
a view
Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
For example, if we want to update the view MarksView and add the field Age to this view
from the Student_Marks table, we can do this as shown below:
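A hedged sketch of that statement (assuming Student_Marks carries an Age column and shares the Name column with Student_Details):

CREATE OR REPLACE VIEW MarksView AS
SELECT Student_Details.Name, Student_Details.Address, Student_Marks.Marks, Student_Marks.Age
FROM Student_Details, Student_Marks
WHERE Student_Details.Name = Student_Marks.Name;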
Important questions
b. What are the key constraints? Explain the design issues of the ER model.
4. a. Draw an ER diagram for a weak entity set and a strong entity set, with examples.
b. Explain the ER diagram components and notations with their extended features.
5. a. Explain the ER diagram components and notations with their extended features.
b. Explain the following with examples: a) Primary key b) Foreign key c) Candidate key
b. Explain the features of the ER model and its design issue constraints.
b. What are the key constraints? Explain the design issues of the ER model.
8. a. Draw an ER diagram for a weak entity set and a strong entity set, with examples.
b. Explain the ER diagram components and notations with their extended features.
Chapter 3: SQL (structured query language)
Syllabus :
1. Overview
2. The form of the basic SQL query: UNION, INTERSECT & EXCEPT
3. Join operations: equi join and non-equi join
4. Nested queries: correlated and uncorrelated
5. Aggregate functions
6. Null values
7. Views
8. Triggers
1. Overview :
What is SQL?
SQL (Structured Query Language) is the standard language for defining, manipulating and retrieving data in relational database management systems.
Why SQL?
SQL lets users state what data they want, while the RDBMS determines how to retrieve it efficiently.
When executing an SQL command for any RDBMS, the system determines the best way to
carry out your request, and the SQL engine figures out how to interpret the task.
There are various components included in this process. These components are −
• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.
A classic query engine handles all the non-SQL queries, but a SQL query engine does not handle logical files. Together, these components make up the SQL architecture that sits between a user's query and the physical database.
In SQL, set operators are used to combine the results of two or more SELECT statements into
a single result set. The commonly used set operators are UNION, INTERSECT, and EXCEPT
(or MINUS in some databases). These operators allow you to perform operations on sets of
rows rather than individual rows. Here's an overview of SQL set operators.
➢ The result sets (tables) of all the queries must have the same number of columns.
➢ In all result sets, the data type of each column must be compatible with the data type of its corresponding column in the other result sets.
➢ The ORDER BY clause can be applied only to the last query, to sort the combined result.
Example :
The following examples use two sample tables, Speakers and Authors, each with a name column; a sketch of the sample data is given below.
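A minimal sketch of the two sample tables (the inserted names are illustrative, not from the original notes):
CREATE TABLE Speakers (name VARCHAR(50));
CREATE TABLE Authors (name VARCHAR(50));
INSERT INTO Speakers (name) VALUES ('Alice'), ('Bob'), ('Carol');
INSERT INTO Authors (name) VALUES ('Bob'), ('Dave');
With this data, UNION returns Alice, Bob, Carol and Dave; INTERSECT returns only Bob; Speakers EXCEPT Authors returns Alice and Carol.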
UNION
➢ Union combines the results of two queries into a single result set of all matching
rows.
➢ Both queries must have the same number of columns and compatible data types to
unite.
➢ All duplicate records are removed automatically unless UNION ALL is used.
➢ Generally, it can be useful in applications where tables are not perfectly normalized,
for example, a data warehouse application.
Syntax :
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
Example-1:
You want to invite all the Speakers and Authors for the annual conference. Hence, how will
you prepare the invitation list?
Query :
select name from Speakers
union
select name from Authors
order by name;
Output :
As you can see, the default order is ascending, and the ORDER BY clause must be placed after the last query only, not after both queries.
UNION ALL
You want to give a prize to all the Speakers and Authors at the annual conference. Hence,
how will you prepare the prize list?
Query:
select name from Speakers
union all
select name from Authors
order by name;
output
INTERSECT
It is used to take the result of two queries and returns the only those rows which are
common in both result sets. It removes duplicate records from the final result set.
Syntax
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;
Example-3
You want the list of people who are Speakers and they are also Authors. Hence, how will you
prepare such a list?
Query:
select name from Speakers
intersect
select name from Authors
order by name;
Output:
EXCEPT :
It takes the distinct rows of the first query and returns only those rows that do not appear in the second query's result set.
Syntax :
SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;
Example-4
You want the list of people who are only Speakers and they are not Authors. Hence, how will
you prepare such a list?
Query :
select name from Speakers
except
select name from Authors
order by name;
Output :
Example-5
You want the list of people who are only Authors and they are not Speakers. Hence, how will
you prepare such a list?
Query :
select name from Authors
except
select name from Speakers
order by name;
output :
A SQL Join statement combines data or rows from two or more tables based on a common
field between them.
➢ SQL joins are mostly used when a user is trying to extract data from multiple tables (which have one-to-many or many-to-many relationships with each other) at one time.
➢ Large databases are often prone to data redundancy, i.e., repetitive data and the insertion, deletion and updation anomalies it causes. Joins let us query normalized tables together, so the data can be kept decomposed (reducing redundancy) without losing the ability to combine it.
➢ In SQL, joins are mainly of two types: INNER JOINS and OUTER JOINS. The join families can be summarized as below.
JOINS
• Inner joins: theta join (non-equi join) and natural join (equi-join)
• Outer joins: left join, right join and full outer join
Every join is driven by a join condition.
Equi-Join: An equi-join is a type of join where the matching condition is based on equality
between columns from different tables. It matches rows where the specified columns have the
same values.
➢ The equi join in SQL returns only the rows where the compared common column values match in all the tables; it does not display NULL or unmatched rows.
➢ The equality operator in the Equi Join operation is used to refer to the equality in
the WHERE clause. However, it returns the same result when we use
the JOIN keyword with the ON clause along with column names and their respective
tables
Example :
SELECT *
FROM TableName1, TableName2
WHERE TableName1.ColumnName = TableName2.ColumnName;
OR
SELECT *
FROM TableName1
JOIN TableName2
ON TableName1.ColumnName = TableName2.ColumnName;
➢ An equi join is any JOIN operation that uses only the equals sign. If a query has more than one join condition, of which one uses an equals sign and another does not, the query is considered a non-equi join in SQL.
Non-equi join :
➢ A non-equi join, also known as a range join or a theta join, is a type of join operation
where the joining condition involves operators other than equality, such as greater
than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), or
not equal to (!= or <>).
➢ Non-Equi Join is also a type of INNER Join in which we need to retrieve data from
multiple tables. In a non-equi join, rows are matched based on a range of values
rather than a direct equality. Non-equi joins are less common and are often used to
solve specific data analysis problems.
SELECT *
FROM TableName1, TableName2
WHERE TableName1.columnName [> | < | >= | <= | != | BETWEEN ] table_name2.column;
EQUI-Join Example:
Suppose we have two tables, namely state and city, which contain the name of the states and
the name of the cities, respectively. In this example, we will map the cities with the states in
which they are present.
Tables state and city (sample rows sketched below):
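A hedged sketch of the two tables, reconstructed from the surviving output rows (the first state row is illustrative):
CREATE TABLE state (State_Id INT, State_Name VARCHAR(30));
CREATE TABLE city (City_Id INT, City_Name VARCHAR(30));
INSERT INTO state VALUES (1, 'Goa'), (2, 'Uttarakhand'), (3, 'Madhyapradesh');
INSERT INTO city VALUES (2, 'Rishikesh'), (3, 'Gwalior');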
Now, if we execute a query of Equi-join using the equality operation and the WHERE clause,
then
SELECT *
FROM state, city
WHERE state.State_Id = city.City_Id;
Output
State_Id  State_Name     City_Id  City_Name
2         Uttarakhand    2        Rishikesh
3         Madhyapradesh  3        Gwalior
Now, if we execute a non-equi-join query using any operator other than the equality operator, such as > (greater than), with the WHERE clause (here on two sample tables test1 and test2, each having an SNo column):
SELECT *
FROM test1,test2
WHERE test1.SNo > test2.SNo;
Output: the rows of test1 paired with the rows of test2 wherever test1.SNo > test2.SNo.
The general inner join syntax is:
SELECT *
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
NATURAL JOIN
➢ SQL Natural Join is a type of Inner join based on the condition that columns having
the same name and datatype are present in both the tables to be joined.
SELECT *
FROM table1
NATURAL JOIN table2;
SQL outer joins return both matched and unmatched rows of data, depending on the type of outer join. Outer joins are sub-divided into the following types: left join, right join and full outer join.
➢ A left join returns all the rows from the left table and the matching rows from the
right table. If there are no matching rows in the right table, NULL values are
included for the columns of the right table. The syntax for a left join is as follows:
SELECT *
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
A right join returns all the rows from the right table and the matching rows from the left table.
If there are no matching rows in the left table, NULL values are included for the columns of
the left table. The syntax for a right join is as follows:
SELECT *
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
A full join returns all the rows from both the left and right tables. If there are no matching rows in either table, NULL values are included for the columns of the non-matching table. The syntax for a full join is as follows:
SELECT *
FROM table1
FULL OUTER JOIN table2
ON table1.column = table2.column;
Theta join :
In SQL, a theta join, also known as a non-equi join or a range join, is a type of join operation
where the joining condition involves comparison operators other than equality (=).
The syntax for a theta join typically involves using the JOIN keyword followed by the joined
tables and the join condition with the desired comparison operator(s). Here's an example:
SELECT *
FROM table1
JOIN table2
ON table1.column > table2.column; -- any comparison operator other than = may be used here
Examples :
Consider two sample tables, Customer and Orders. The notes illustrate 1. a LEFT JOIN and 2. a RIGHT JOIN between them; both are sketched below.
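A minimal sketch of the two join examples, assuming hypothetical tables Customer(cust_id, name) and Orders(order_id, cust_id, amount):
-- 1. LEFT JOIN: every customer, with order columns NULL where there is no match
SELECT c.name, o.order_id, o.amount
FROM Customer c
LEFT JOIN Orders o ON c.cust_id = o.cust_id;
-- 2. RIGHT JOIN: every order, with customer columns NULL where there is no match
SELECT c.name, o.order_id, o.amount
FROM Customer c
RIGHT JOIN Orders o ON c.cust_id = o.cust_id;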
4. Nested Queries:
➢ A nested query in SQL contains a query inside another query. The outer query will
use the result of the inner query. For instance, a nested query can have
two SELECT statements, one on the inner query and the other on the outer query.
In independent nested queries, the execution order is from the innermost query to the outer
query. An outer query won't be executed until its inner query completes its execution. The outer
query uses the result of the inner query. Operators such as IN, NOT IN, ALL, and ANY are
used to write independent nested queries.
➢ If a subquery does not use any references from the outer query, it is called an independent subquery; if it does, it is a correlated subquery.
➢ The IN operator checks if a column value in the outer query's result is present in
the inner query's result. The final result will have rows that satisfy the IN condition.
➢ The NOT IN operator checks if a column value in the outer query's result is not
present in the inner query's result. The final result will have rows that satisfy
the NOT IN condition.
➢ The ALL operator compares a value from the outer query's result with all the values of the inner query's result, and returns the row only if the comparison holds for every value.
➢ The ANY operator compares a value from the outer query's result with all the inner query's result values, and returns the row if the comparison holds for at least one value.
In co-related nested queries, the inner query uses the values from the outer query to execute
the inner query for every row processed by the outer query. The co-related nested queries run
slowly because the inner query is executed for every row of the outer query's result.
Syntax:
SELECT column1, column2, ....
FROM table1 outer
WHERE column1 operator
(SELECT column1, column2
FROM table2
WHERE expr1 = outer.expr2);
Examples :
We will use the Employees and Awards tables below to understand independent and co-related nested queries; assume the schemas Employees(emp_id, name, role, salary) and Awards(emp_id, award_date), since the CREATE TABLE statements are not reproduced here. We will be using Oracle SQL syntax in our queries.
Example 1: IN
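A hedged sketch using the assumed schema above (e.g., select the employees who have received an award):
SELECT name
FROM Employees
WHERE emp_id IN (SELECT emp_id FROM Awards);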
Example 2: NOT IN
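A hedged sketch (the employees who have never received an award):
SELECT name
FROM Employees
WHERE emp_id NOT IN (SELECT emp_id FROM Awards);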
Output :
Example 3: ALL
➢ Select all Developers who earn more than all the Managers
SELECT name
FROM Employees
WHERE role = 'Developer'
AND salary > ALL (SELECT salary
                  FROM Employees
                  WHERE role = 'Manager');
Output :
Example 4: ANY
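A hedged sketch (Developers who earn more than at least one Manager):
SELECT name
FROM Employees
WHERE role = 'Developer'
AND salary > ANY (SELECT salary
                  FROM Employees
                  WHERE role = 'Manager');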
Output :
➢ Select all employees whose salary is above the average salary of employees in their
role.
SELECT name
FROM Employees e
WHERE salary > (SELECT AVG(salary)
                FROM Employees
                WHERE role = e.role);
Output :
An aggregate function in SQL performs a calculation on multiple values and returns a single value. SQL provides many aggregate functions, including AVG, COUNT, SUM, MIN and MAX. An aggregate function ignores NULL values when it performs the calculation, except for COUNT(*).
1. COUNT() Function
The COUNT() aggregate function returns the total number of rows from a database table that
matches the defined criteria in the SQL query.
Syntax
COUNT(*) OR COUNT(COLUMN_NAME)
Example :
The given table named EMP_DATA consists of data concerning 10 employees working in the
same organization in different departments.
1. Suppose you want to know the total number of employees working in the organization. You can do so with the below-given query:
SELECT COUNT(*) FROM EMP_DATA;
As COUNT(*) returns the total number of rows and the table named EMP_DATA provided
above consists of 10 rows, so the COUNT(*) function returns 10. The output is printed as
shown below.
Output: 10
Note: Except for COUNT(*), all other SQL aggregate functions ignore NULL values.
2. Suppose you need to count the number of people who are getting a salary. The query given below can help you achieve this:
SELECT COUNT(Salary) FROM EMP_DATA;
Output : 9
Here, the Salary column is passed as a parameter to the COUNT() function, and hence, this
query returns the number of non NULL values from the column Salary, i.e. 9.
3. Suppose you need to count the number of distinct departments present in the organization. The following query can help you achieve this:
SELECT COUNT(DISTINCT Department) FROM EMP_DATA;
Output: 3
The above query returns the total number of distinct non NULL values over the column
Department i.e. 3 (Marketing, Production, R&D). The DISTINCT keyword makes sure that
only non-repetitive values are counted.
4. What if you want to calculate the number of people whose salary is at least a given amount (say 70,000)? Check out the example below:
SELECT COUNT(*) FROM EMP_DATA WHERE Salary >= 70000;
Output : 5
The query returns the number of rows where the salary of the employee is greater than or equal
to 70,000 i.e 5.
2. SUM() Function
The SUM() function takes the name of the column as an argument and returns the sum of all
the non NULL values in that column. It works only on numeric fields(i.e the columns contain
only numeric values). When applied to columns containing both non-numeric(ex - strings) and
numeric values, only numeric values are considered. If no numeric values are present, the
function returns 0.
Syntax:
The function name is SUM() and the name of the column to be considered is passed as an
argument to the function.
SUM(COLUMN_NAME)
Example:
1. Suppose you need to build a budget for the organization and you need to know the total amount needed to provide salaries to all the employees, i.e., the sum of all the values present in the column Salary:
SELECT SUM(Salary) FROM EMP_DATA;
Output :646000
The above-mentioned query returns the sum of all non-NULL values over the column Salary
i.e 80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 = 646000
2. What if you need to consider only distinct salaries? The following query will help you achieve that:
SELECT SUM(DISTINCT Salary) FROM EMP_DATA;
Output : 430000
The DISTINCT keyword makes sure that only non-repetitive values are considered. The query
returns the sum of all distinct non NULL values over the column Salary i.e. 80000 + 76000 +
84000 + 64000 + 60000 + 66000 = 430000.
3. Suppose you need to know the collective salary for each department (say Marketing). The query given below can help you achieve this:
SELECT SUM(Salary) FROM EMP_DATA WHERE Department = 'Marketing';
Output :160000
The query returns the sum of salaries of employees who are working in the Marketing
Department i.e 80000 + 80000 = 160000.
Note: There are 3 rows consisting of Marketing as Department value but the third value is a
NULL value. Thus, the sum is returned considering only the first two entries having Marketing
as Department.
3.AVG() Function
The AVG() aggregate function uses the name of the column as an argument and returns the
average of all the non NULL values in that column. It works only on numeric fields(i.e the
columns contain only numeric values).
Note: When applied to columns containing both non-numeric (ex - strings) and numeric values,
only numeric values are considered. If no numeric values are present, the function returns 0.
Syntax:
The function name is AVG() and the name of the column to be considered is passed as an
argument to the function.
AVG(COLUMN_NAME)
Example:
1. Suppose you want to know the average salary provided by the organization:
SELECT AVG(Salary) FROM EMP_DATA;
Output : 71777.77777
Here, the column name Salary is passed as an argument and thus the values present in column
Salary are considered. The above query returns the average of all non NULL values present in
the Salary column of the table.
Average = (80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 ) / 9
= 646000 / 9 = 71777.77777
2. If you need to consider only distinct salaries, the following query will help you out:
SELECT AVG(DISTINCT Salary) FROM EMP_DATA;
Output : 71666.66666
The query returns the average of all non NULL distinct values present in the Salary column of
the table.
4. MIN() Function
The MIN() function takes the name of the column as an argument and returns the minimum
value present in the column. MIN() returns NULL when no row is selected.
Syntax:
The function name is MIN() and the name of the column to be considered is passed as an
argument to the function.
MIN(COLUMN_NAME)
Example:
1. Suppose you want to find out what is the minimum salary that is provided by the
organization. The MIN() function can be used here with the column name as an
argument.
SELECT MIN(Salary) FROM EMP_DATA;
Output :60000
The query returns the minimum value of all the values present in the mentioned column i.e
60000.
2. Suppose you need to know the minimum salary of an employee belonging to the Production department. The following query will help you achieve that:
SELECT MIN(Salary) FROM EMP_DATA WHERE Department = 'Production';
Output:60000
The query returns the minimum value of all the values present in the mentioned column and
has Production as Department value i.e 60000.
5. MAX() Function
The MAX() function takes the name of the column as an argument and returns the maximum
value present in the column. MAX() returns NULL when no row is selected.
Syntax:
The function name is MAX() and the name of the column to be considered is passed as an
argument to the function.
MAX(COLUMN_NAME)
Example :
1. Suppose you want to find out the maximum salary that is provided by the organization. The MAX() function can be used here with the column name as an argument:
SELECT MAX(Salary) FROM EMP_DATA;
Output : 84000
The query returns the maximum value of all the values present in the mentioned column i.e
84000.
2. Suppose you need to know the maximum salary of an employee belonging to the R&D department. The following query will help you achieve that:
SELECT MAX(Salary) FROM EMP_DATA WHERE Department = 'R&D';
Output : 84000
The query returns the maximum value of all the values present in the mentioned column and
has R&D as Department value i.e 84000.
6. Null values
➢ In SQL, NULL represents the absence of a value. It is used to indicate that a data
point does not have a value or that the value is unknown or undefined. Here are
some important points to understand about handling NULL values in SQL:
➢ NULL is not the same as an empty string or zero. It is a distinct value that signifies
the absence of a value.
➢ NULL values can be used in columns of any data type, including numeric, string,
date, and other data types.
➢ When performing comparisons involving NULL values, the result is always
unknown (neither true nor false). Therefore, you cannot use standard equality
operators like = or <> to compare NULL values.
➢ To check for NULL values in SQL, you use the IS NULL or IS NOT NULL
operators. For example:
SELECT * FROM table_name WHERE column_name IS NULL;
➢ When performing calculations involving NULL values, the result is usually NULL.
However, some database systems have specific behaviors when NULL values are
involved in calculations, so it's important to consult the documentation for your
specific database management system (DBMS) to understand its behavior.
➢ When inserting or updating data in a table, you can explicitly set a column to NULL if you want to represent the absence of a value. For example:
UPDATE table_name SET column_name = NULL WHERE id = 1;
➢ NULL values can also be used in joins and filtering conditions. For example, you
can include NULL values in a result set using a LEFT JOIN.
➢ It's important to handle NULL values appropriately in your SQL queries to ensure
accurate and reliable data processing. Be aware of any specific behavior and
handling of NULL values in your chosen database system, as it can vary between
different database management systems.
7. Triggers
➢ In SQL, a trigger is a database object that is associated with a table and
automatically executes a set of actions in response to certain database events, such
as INSERT, UPDATE, or DELETE operations on the table. Triggers are useful for
enforcing data integrity rules, auditing changes, maintaining derived data, or
implementing complex business logic within the database. Here's an overview of SQL triggers.
➢ Syntax: The basic syntax to create a trigger in SQL is as follows:
CREATE TRIGGER trigger_name
{BEFORE | AFTER}
{INSERT | UPDATE | DELETE}
ON table_name
[FOR EACH ROW]
[WHEN (condition)]
BEGIN
-- Trigger actions here (trigger body)
END;
Types of Triggers
DML Triggers
After Triggers
➢ These triggers execute after the database has processed a specified event (such as
an INSERT, UPDATE, or DELETE statement). AFTER triggers are commonly used
to perform additional processing or auditing tasks after a data modification has
occurred.
INSTEAD OF Triggers
➢ These triggers are used for views and fire instead of the DML statement (INSERT,
UPDATE, DELETE) on the view.
DDL Triggers
➢ These triggers fire in response to data definition language (DDL) statements like
CREATE, ALTER, or DROP.
LOGON Triggers – fire when a user session is established.
LOGOFF Triggers – fire when a user session ends.
SERVERERROR Triggers – fire when a server error occurs.
EXAMPLE :
Let's take an example. Assume a student table with columns id, first_name, last_name, and full_name. The example runs four queries – create the table, define a trigger that fills in full_name, insert a row, and display all the records from the table – sketched below.
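A minimal sketch of the four queries, assuming MySQL-style trigger syntax (the exact statements are not reproduced in these notes):
-- Query 1: create the table
CREATE TABLE student (
    id INT PRIMARY KEY,
    first_name VARCHAR(40),
    last_name VARCHAR(40),
    full_name VARCHAR(80)
);
-- Query 2: trigger that derives full_name before every insert
CREATE TRIGGER student_full_name
BEFORE INSERT ON student
FOR EACH ROW
SET NEW.full_name = CONCAT(NEW.first_name, ' ', NEW.last_name);
-- Query 3: insert a row; the trigger fills full_name automatically
INSERT INTO student (id, first_name, last_name) VALUES (1, 'Ada', 'Lovelace');
-- Query 4: display all the records from the table
SELECT * FROM student;
Output: the inserted row is displayed with full_name filled in automatically by the trigger.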
8. Views in SQL
Refer to Chapter 2.
Previous Years' Asked Questions and Answers (SQL Queries)
1. Write SQL Queries for the given Schemas employees (emp-id, first-name, last-name,
hire-date, dept-id, salary, job) departments (dept-id, dept-name, manager-id, location-
id)
i) Display the last names and hire dates of all latest hires in their respective
Departments in the location ID 1700.
ii) Write an SQL query to display all employee names and salaries where the salary is greater than the minimum salary of the company and the job title starts with 'M'.
Answer :
i) Display the last names and hire dates of all latest hires in their respective departments in the location ID 1700.
ii) Display all employee names and salaries where the salary is greater than the minimum salary of the company and the job title starts with 'M'.
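Hedged sketches for both parts, assuming underscore column names (dept_id, hire_date, location_id) for the hyphenated attributes in the given schemas:
-- i) latest hire in each department, at location 1700
SELECT e.last_name, e.hire_date
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE d.location_id = 1700
  AND e.hire_date = (SELECT MAX(e2.hire_date)
                     FROM employees e2
                     WHERE e2.dept_id = e.dept_id);
-- ii) salary above the company minimum, job title starting with 'M'
SELECT e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.salary > (SELECT MIN(salary) FROM employees)
  AND e.job LIKE 'M%';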
b. Find the IDs and names of all students who have not taken any course offering before
Spring 2009.
c. For each department, find the maximum salary of instructors in that department. You may assume that every department has at least one instructor.
d. Find the lowest, across all departments, of the per-department maximum salary computed
by the preceding query.
Answer :
a) Find the names of all students who have taken at least one Comp. Sci. course; make sure
there are no duplicate names in the result.
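Hedged sketches for parts (a), (c) and (d), assuming the standard university schema student(ID, name), takes(ID, course_id, semester, year), course(course_id, title, dept_name) and instructor(ID, name, dept_name, salary):
-- a) students who have taken at least one Comp. Sci. course, without duplicate names
SELECT DISTINCT s.name
FROM student s
JOIN takes t ON s.ID = t.ID
JOIN course c ON t.course_id = c.course_id
WHERE c.dept_name = 'Comp. Sci.';
-- c) the maximum instructor salary in each department
SELECT dept_name, MAX(salary) AS max_salary
FROM instructor
GROUP BY dept_name;
-- d) the lowest, across all departments, of the per-department maximum salary
SELECT MIN(max_salary)
FROM (SELECT MAX(salary) AS max_salary
      FROM instructor
      GROUP BY dept_name) dept_max;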
a. Display the grade for each student, based on the marks relation.
This query uses a CASE expression to evaluate the score of each student and assign the corresponding grade based on the given conditions. The result will include the student ID, score, and the calculated grade.
b. Find the number of students with each grade.
This query first calculates the grade for each student using the same CASE expression as in the previous query. It then groups the result by the grade and counts the number of students for each grade using the COUNT(*) function. The result will include the grade and the corresponding number of students. Both queries are sketched below.
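Hedged sketches of both queries, assuming a marks(ID, score) relation and the usual grade cutoffs (the exact cutoffs are an assumption): score < 40 gives F, 40–59 gives C, 60–79 gives B, and 80 or above gives A.
-- a) grade for each student
SELECT ID, score,
       CASE WHEN score < 40 THEN 'F'
            WHEN score < 60 THEN 'C'
            WHEN score < 80 THEN 'B'
            ELSE 'A'
       END AS grade
FROM marks;
-- b) number of students with each grade, reusing the same CASE expression
SELECT grade, COUNT(*) AS num_students
FROM (SELECT CASE WHEN score < 40 THEN 'F'
                  WHEN score < 60 THEN 'C'
                  WHEN score < 80 THEN 'B'
                  ELSE 'A'
             END AS grade
      FROM marks) graded
GROUP BY grade;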
a) Find the names of the students who have not taken any course:
SELECT name
FROM student
WHERE ID NOT IN (
SELECT ID
FROM takes
);
b) Find the names of the instructors not teaching any course
SELECT name
FROM instructor
WHERE ID NOT IN (
SELECT ID
FROM teaches
);
c) Find the total number of courses taught department wise
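A hedged sketch, counting through the instructor relation and an assumed teaches(ID, course_id) relation:
SELECT i.dept_name, COUNT(DISTINCT t.course_id) AS courses_taught
FROM instructor i
JOIN teaches t ON i.ID = t.ID
GROUP BY i.dept_name;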
Answer (the query below lists the names of all employees who work for the company 'XYZ'):
SELECT Employee.Name
FROM Employee
JOIN Company ON Employee.Emp_no = Company.Emp_no
WHERE Company.Company_name = 'XYZ';
Answer :
i. Select the department name of the instructor whose name is neither 'A' nor 'B'.
SELECT dept_name
FROM Instructor
WHERE name NOT IN ('A', 'B');
ii. Find the number of instructors of each department who teach course "DBMS"
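A hedged sketch, assuming the course title identifies 'DBMS' and joining through teaches:
SELECT i.dept_name, COUNT(DISTINCT i.ID) AS num_instructors
FROM Instructor i
JOIN teaches t ON i.ID = t.ID
JOIN course c ON t.course_id = c.course_id
WHERE c.title = 'DBMS'
GROUP BY i.dept_name;
iii. (The next query finds the department(s) whose average instructor salary is the maximum across all departments.)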
SELECT dept_name
FROM Instructor
GROUP BY dept_name
HAVING AVG(salary) = (
SELECT MAX(avg_salary)
FROM (
SELECT AVG(salary) AS avg_salary
FROM Instructor
GROUP BY dept_name
) subquery
);
iv. Give a 5% salary raise to instructors whose salary is less than the average.
UPDATE Instructor
SET salary = salary * 1.05
WHERE salary < (
SELECT AVG(salary)
FROM Instructor
);
(ii) (The following query lists the names of the students of the 'computer' course.)
SELECT name
FROM Student
WHERE courseId = 'computer';
(iii) List the names of all courses with their duration (a sketch follows):
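A hedged sketch, assuming a Course(name, duration) table:
SELECT name, duration
FROM Course;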
(iv) List the names of all students whose name starts with 'a':
SELECT name
FROM Student
WHERE name LIKE 'a%';
(v) List email Id and cell no of all mechanical engineering students
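A hedged sketch, assuming Student has email_id, cell_no and branch columns:
SELECT email_id, cell_no
FROM Student
WHERE branch = 'Mechanical Engineering';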
ii) Find the names of the sailors who have reserved at least two boats
SELECT s.sname
FROM Sailor s
JOIN Reserves r ON s.sid = r.sid
GROUP BY s.sid, s.sname
HAVING COUNT(DISTINCT r.bid) >= 2;
Schema: A database schema is a blueprint that represents the tables and relations of a data set.
Good database schema design is essential to making your data tractable so that you can make
sense of it and build the dashboards, reports, and data models that you need.
A good schema design is crucial for the efficient and effective management of data in a database
system. It plays a fundamental role in determining how data is organized, stored, and retrieved,
and impacts the overall performance, scalability, and maintainability of the system. Here are
some key reasons why a good schema design is important
Data organization: A well-designed schema helps in structuring data in a logical and organized
manner. It defines the tables, relationships, and constraints that govern the data model, ensuring
data integrity and consistency. This organization facilitates easy navigation and understanding
of the data, making it more manageable and accessible.
Query performance: The schema design significantly impacts the performance of database
queries. By properly structuring tables, defining appropriate indexes, and optimizing data
types, a good schema design can enhance query execution speed and minimize resource
consumption. Efficient query performance leads to faster response times and improved overall
system performance.
Data integrity and consistency: A good schema design enforces data integrity and ensures
consistency. By defining appropriate constraints, such as primary keys, foreign keys, unique
constraints, and check constraints, it prevents the insertion of invalid or inconsistent data. This
helps maintain data quality and reliability throughout the system.
Scalability: A well-designed schema allows for easy scalability as the volume and complexity
of data grow. By considering future requirements and potential expansion, a good schema
design can accommodate evolving needs without significant rework or performance
degradation. This scalability is crucial for applications and systems that need to handle
increasing data loads over time.
Maintainability and extensibility: A good schema design simplifies the maintenance and
evolution of the database system. It provides a solid foundation for making changes and
additions to the schema without causing disruptions or data inconsistencies. A well-designed
schema also allows for seamless integration with new features or modules, making the system
more extensible and adaptable to future enhancements.
Data analysis and reporting: A well-designed schema facilitates effective data analysis and
reporting. By structuring data in a way that aligns with the analytical needs of the system, a
good schema design enables efficient querying, aggregation, and summarization of data. This,
in turn, supports decision-making processes and enables the extraction of meaningful insights from the data.
In summary, a good schema design is essential for data organization, query performance, data
integrity, scalability, maintainability, and data analysis. It is a foundational element in the
design and implementation of a robust and efficient database system.
Bad schema designs can lead to several problems that can hinder the efficient management and
utilization of data in a database system. Here are some common problems encountered with
bad schema designs:
➢ Poor query performance: A bad schema design can result in slow and inefficient query
performance. This can be due to a lack of proper indexing, inappropriate data types, or
inefficient table relationships. Slow queries can negatively impact the overall system
performance and user experience.
➢ Data redundancy and inconsistency: Inadequate schema designs can lead to data
redundancy and inconsistency. Redundant data takes up unnecessary storage space and
can cause data integrity issues when updates or modifications are made. Inconsistent
data, such as conflicting values or duplicate records, can lead to inaccurate results and
unreliable information.
➢ Difficulty in data maintenance: Bad schema designs can make data maintenance
challenging and error-prone. Without proper constraints and relationships, it becomes
harder to enforce data integrity and ensure consistent updates. This can lead to data
corruption, data loss, or difficulties in updating and modifying data in a controlled and
reliable manner.
➢ Lack of scalability: A poorly designed schema may lack scalability, making it difficult
to accommodate future growth and evolving data requirements. This can result in
performance degradation and the need for extensive schema modifications when the
system needs to handle increased data volumes or changes in data structure.
➢ Limited flexibility and extensibility: Bad schema designs can restrict the flexibility
and extensibility of the database system. It may be challenging to add new features or
modify existing ones without significant schema changes. This can lead to increased
development time, complexity, and potential disruptions to the system.
➢ Data analysis and reporting challenges: Inefficient schema designs can make data
analysis and reporting difficult. Poorly organized data, lack of appropriate relationships,
or inconsistent naming conventions can hinder the extraction of meaningful insights
from the data. This can limit the effectiveness of decision-making processes and hinder
the overall value derived from the data.
➢ Increased development and maintenance costs: Bad schema designs can result in
higher development and maintenance costs. Fixing or modifying a poorly designed
schema requires significant effort and resources. It may involve rewriting queries,
restructuring tables, or migrating data, which can be time-consuming and error-prone.
In summary, bad schema designs can lead to poor query performance, data redundancy and
inconsistency, difficulties in data maintenance, limited scalability and flexibility, challenges in
data analysis and reporting, and increased development and maintenance costs. It is crucial to
invest time and effort in designing a well-thought-out schema to avoid these problems and
ensure the efficient management of data in a database system
3. FUNCTIONAL DEPENDENCY
For any relation R, attribute Y is functionally dependent on attribute X (usually the primary key) if each value of X is associated with exactly one value of Y. It is denoted by X → Y, where X is called the determinant and Y is called the dependent.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if we know the Emp_Id, we can tell the employee name associated with it:
Emp_Id → Emp_Name
William Armstrong in 1974 suggested a few rules related to functional dependency. They are called the RAT rules (Armstrong's axioms):
1. Reflexivity – If Y is a subset of X, then X → Y.
2. Augmentation – If X → Y, then XZ → YZ for any set of attributes Z.
3. Transitivity – If X → Y and Y → Z, then X → Z.
Secondary Rules
1. Union – If X determines Y and X determines Z, then X also determines Y and Z together: if X → Y and X → Z then X → YZ.
2. Decomposition – If X determines Y and Z together, then X determines Y and X determines Z separately: if X → YZ then X → Y and X → Z.
➢ The Closure of Functional Dependency means the complete set of all possible
attributes that can be functionally derived from given functional dependency using the
inference rules known as Armstrong’s Rules.
➢ If “F” is a functional dependency then closure of functional dependency can be
denoted using “{F}+ ”.
There are three steps to calculate closure of functional dependency. These are:
➢ Step-1 : Add the attributes which are present on Left Hand Side in the original
functional dependency.
➢ Step-2 : Now, add the attributes present on the Right Hand Side of the functional
dependency.
➢ Step-3 : With the help of attributes present on Right Hand Side, check the other
attributes that can be derived from the other given functional dependencies. Repeat this
process until all the possible attributes which can be derived are added in the closure.
Example: consider a relation Student(Roll_No, Name, Marks, Location) with the functional dependencies FD1: Roll_No → Name and FD2: Name → Marks, Location (the relation and FD set here are inferred from the worked steps).
➢ Step-1 : Add the attribute on the LHS of the original functional dependency to the closure.
{Roll_No}+ = {Roll_No}
➢ Step-2 : Add the attributes present on the RHS of the original functional dependency to the closure.
{Roll_No}+ = {Roll_No, Name}
➢ Step-3 : Add the other possible attributes which can be derived using the attributes already present in the closure. Roll_No itself cannot functionally determine any further attribute, but the Name attribute can determine other attributes such as Marks and Location using the 2nd functional dependency. Hence the complete closure is
{Roll_No}+ = {Roll_No, Name, Marks, Location}
Similarly, we can calculate the closure for the other attributes, e.g. "Name".
Step-1 : Add the attribute on the LHS of the functional dependency to the closure.
{Name}+ = {Name}
Step-2 : Add the attributes present on the RHS of the functional dependency to the closure.
{Name}+ = {Name, Marks, Location}
➢ Step-3 : Since we don't have any functional dependency where the "Marks" or "Location" attribute functionally determines any other attribute, we cannot add more attributes to the closure. Hence the complete closure of Name is
{Name}+ = {Name, Marks, Location}
➢ NOTE : We don't have any functional dependency where Marks or Location can functionally determine any attribute. Hence, for those attributes we can only add the attributes themselves to their closures. Therefore,
➢ {Marks}+ = {Marks}
➢ {Location}+ = {Location}
Example :
In a relation R(ABCD) ,given functional dependencies {A->B , B->C , C->D} find closure
of each attribute.
{A}+ = {ABCD}
{B}+ = {BCD}
{C}+ = {CD}
{D}+ = {D}
Here attribute A has all the attributes of the relation in its closure, so A is a candidate key of the relation.
Example :
In a relation R(ABCD) ,given functional dependencies {A->B , B->C , C->D, D->A} find
closure of each attribute.
{A}+ = {ABCD}
{B}+ = {BCDA}
{C}+ = {CDAB}
{D}+ = {DABC}
Here every attribute's closure contains all the attributes of R, so A, B, C and D are each candidate keys of the relation.
6. MINIMAL COVERS:
➢ A minimal cover is a simplified and reduced version of the given set of functional
dependencies.
Since it is a reduced version, it is also called as Irreducible set.
It is also called as Canonical Cover.
Characteristics :
A canonical cover (minimal cover) is the minimum equivalent set of functional dependencies. A set of FDs is called a canonical cover of F if each FD in it is a:
➢ Simple FD (exactly one attribute on the right-hand side),
➢ Left-reduced FD (no extraneous attributes on the left-hand side), and
➢ Non-redundant FD (no FD in the set can be derived from the others).
Need :
• Working with the set containing extraneous functional dependencies increases the
computation time.
• Therefore, the given set is reduced by eliminating the useless functional
dependencies.
• This reduces the computation time and working with the irreducible set becomes
easier.
Steps to find the canonical cover:
Step-01: Write the functional dependencies so that each has exactly one attribute on its right-hand side.
Step-02:
➢ Consider each functional dependency one by one from the set obtained in Step-01.
➢ Determine whether it is essential or non-essential.
To determine whether a functional dependency is essential or not, compute the closure of its left side –
• once by considering that the particular functional dependency is present in the set, and
• once by considering that the particular functional dependency is not present in the set.
If the two closures are the same, the dependency is non-essential and can be eliminated.
Step-03:
• Consider the newly obtained set of functional dependencies after performing Step-02.
• Check if there is any functional dependency that contains more than one attribute on its left side.
Then following two cases are possible-
Case-01: No-
• There exists no functional dependency containing more than one attribute on its
left side.
• In this case, the set obtained in Step-02 is the canonical cover.
Case-02: Yes-
• There exists at least one functional dependency containing more than one attribute on its left side.
• In this case, consider all such functional dependencies one by one.
• Check if their left side can be reduced; wherever possible, replace the dependency with its left-reduced form. The resulting set is the canonical cover.
PRACTICE PROBLEM BASED ON FINDING CANONICAL COVER-
Problem-
The following functional dependencies hold true for the relational scheme R ( W , X , Y , Z )
–
X→W
WZ → XY
Y → WXZ
Write the irreducible equivalent for this set of functional dependencies.
Solution-
Step-01:
Write all the functional dependencies such that each contains exactly one attribute on its right
side-
X→W
WZ → X
WZ → Y
Y→W
Y→X
Y→Z
Step-02:
Check the essentiality of each functional dependency one by one.
For X → W:
• Considering X → W, (X)+ = { X , W }
• Ignoring X → W, (X)+ = { X }
Now,
• Clearly, the two results are different.
• Thus, we conclude that X → W is essential and can not be eliminated.
For WZ → X:
• Considering WZ → X, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → X, (WZ)+ = { W , X , Y , Z } (WZ → Y still gives Y, and Y → X gives X)
Now,
• Clearly, the two results are the same.
• Thus, we conclude that WZ → X is non-essential and can be eliminated.
For WZ → Y:
• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → Y, (WZ)+ = { W , Z }
Now,
• Clearly, the two results are different.
• Thus, we conclude that WZ → Y is essential and can not be eliminated.
For Y → W:
• Considering Y → W, (Y)+ = { W , X , Y , Z }
• Ignoring Y → W, (Y)+ = { W , X , Y , Z }
Now,
• Clearly, the two results are the same.
• Thus, we conclude that Y → W is non-essential and can be eliminated.
Eliminating Y → W, our set of functional dependencies reduces to-
X→W
WZ → Y
Y→X
Y→Z
For Y → X:
• Considering Y → X, (Y)+ = { W , X , Y , Z }
• Ignoring Y → X, (Y)+ = { Y , Z }
Now,
• Clearly, the two results are different.
• Thus, we conclude that Y → X is essential and can not be eliminated.
For Y → Z:
• Considering Y → Z, (Y)+ = { W , X , Y , Z }
• Ignoring Y → Z, (Y)+ = { W , X , Y }
Now,
• Clearly, the two results are different.
• Thus, we conclude that Y → Z is essential and can not be eliminated.
From here, our essential functional dependencies are-
X→W
WZ → Y
Y→X
Y→Z
Step-03:
• Consider the functional dependencies having more than one attribute on their left side.
• Check if their left side can be reduced.
In our set,
• Only WZ → Y contains more than one attribute on its left side.
• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
Now,
• Consider all the possible subsets of WZ.
• Check if the closure result of any subset matches to the closure result of WZ.
(W)+ = { W }
(Z)+ = { Z }
Clearly,
• Neither subset has a closure equal to that of the entire left side.
• Thus, we conclude that we can not rewrite WZ → Y as W → Y or Z → Y.
• Thus, set of functional dependencies obtained in step-02 is the canonical cover.
Example 2:
Minimize {A->C, AC->D, E->H, E->AD}
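A worked sketch of Example 2, following the same three steps (this solution is not spelled out in the notes):
Step-01 (one attribute on each right side): A → C, AC → D, E → H, E → A, E → D.
Step-02 (left reduction): in AC → D the attribute C is extraneous, because (A)+ = {A, C, D} already contains D; replace AC → D with A → D.
Step-02 (redundancy check): E → D is redundant, since even without it (E)+ = {E, H, A, C, D} still contains D (via E → A and A → D). Each remaining FD is essential.
Minimal cover: {A → C, A → D, E → A, E → H}.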
7. NORMALIZATION
Normalization is the process of organizing the data and the attributes of a database. It is
performed to reduce the data redundancy in a database and to ensure that data is stored
logically.
➢ It helps to divide large database tables into smaller tables and make a relationship
between them. It can remove the redundant data and ease to add, manipulate or delete
table fields.
➢ Data redundancy in DBMS means having the same data but at multiple places.
➢ It is necessary to remove data redundancy because it causes anomalies in a database
which makes it very hard for a database administrator to maintain it.
➢ A normal form is a set of criteria against which each relation is evaluated; normalization removes the multi-valued, join and undesirable functional (partial and transitive) dependencies from a relation.
The motivation for normal forms in database design is to eliminate data redundancy and
anomalies, ensure data integrity, and promote efficient data management.
Normal forms provide guidelines and principles for structuring the database schema to achieve
these objectives. Here are some key motivations for normal forms
➢ Eliminate data redundancy: Redundant data occurs when the same information is
repeated across multiple records or tables. This redundancy wastes storage space and
can lead to inconsistencies when updating or modifying data. Normal forms help
identify and eliminate redundant data by organizing data into separate tables based on
their functional dependencies.
➢ Prevent update anomalies: Update anomalies occur when modifying data results in
inconsistencies or unintended changes. For example, if the same data is stored in
multiple places and not all instances are updated correctly, inconsistencies can arise.
Normal forms help prevent these anomalies by ensuring that data is stored in a way that
allows for easy and controlled updates without introducing inconsistencies.
➢ Maintain data integrity: Data integrity refers to the accuracy, validity, and consistency
of data. Normal forms help enforce data integrity by defining appropriate constraints,
such as primary keys, foreign keys, and entity relationships. These constraints ensure
that data is correctly and consistently represented, preventing the insertion of invalid or
inconsistent data.
➢ Simplify data management and maintenance: Normal forms provide guidelines for
organizing data in a logical and structured manner. By following these guidelines,
database management and maintenance tasks become more manageable and less prone
to errors. Normalized schemas are typically easier to understand, navigate, and modify,
reducing the complexity and effort required for data management activities
➢ Support efficient query processing: Normal forms can contribute to improved query
performance and efficiency. By reducing data redundancy and organizing data based on
functional dependencies, normalized schemas allow for more efficient retrieval and
manipulation of data. Well-designed indexes and relationships based on normal forms
can speed up query execution and improve overall system performance.
➢ Facilitate data integration and interoperability: Normalized schemas provide a
standardized and consistent way of representing data. This promotes data integration
and interoperability across different systems and applications. By adhering to normal
forms, databases can easily exchange and share data without conflicts or
inconsistencies, enabling seamless integration and collaboration.
➢ Adapt to evolving data requirements: Normal forms provide a foundation for a flexible
and extensible database design. By organizing data based on functional dependencies and
avoiding data anomalies, normalized schemas can adapt to changing data requirements
without significant schema modifications. This scalability and flexibility are crucial for
accommodating future data growth and evolving business needs
➢ In summary, the motivations for normal forms in database design are to eliminate data
redundancy and anomalies, ensure data integrity, simplify data management, support
efficient query processing, facilitate data integration, and adapt to evolving data
requirements. By following normal forms, database designers can create well-structured
and efficient database schemas that promote reliable and effective data management
Insertion anomalies: This occurs when we are not able to insert data into a database because
some attributes may be missing at the time of insertion.
Updation anomalies: These occur when the same data item is repeated in several places and an update changes some copies but not others, leaving the data inconsistent.
Deletion anomalies: This occurs when deleting one part of the data deletes the other
necessary information from the database.
8.NORMAL FORMS :
The process of normalization helps us divide a larger table in the database into various smaller
tables and then link their using relationships. Normal forms are basically useful for reducing
the overall redundancy (repeating data) from the tables present in a database, so as to ensure
logical storage.
There are four normal forms that are usually used in relational databases: first normal form (1NF), second normal form (2NF), third normal form (3NF) and Boyce-Codd normal form (BCNF).
First Normal Form (1NF): a relation is in 1NF if every attribute contains only atomic (indivisible) values.
➢ Here, the Course content is a multi-valued attribute, so this relation is not in 1NF.
➢ To convert this table to First Normal Form, we re-arrange the relation (table) so that each value of Course Content gets its own row, as shown below.
For a relational table to be in second normal form, it must satisfy the following rules:
1. It must be in first normal form (1NF).
2. It must not have any partial dependency, i.e., no non-prime attribute may depend on a proper subset of any candidate key.
If a partial dependency exists, we can divide the table to remove the partially dependent attributes and move them to some other table where they fit in well.
Example : Student_Project relation
We see here in the Student_Project relation that the primary key is composite: {Stu_ID, Proj_ID}. According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be dependent upon both and not on either prime key attribute individually.
But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in Second
Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
A transitive dependency exists when we have the following relationships between attributes:
• X -> Y
• Y does not -> X
• Y -> Z
Here Z is transitively dependent on X via Y. For a relational table to be in third normal form, it must satisfy the following rules:
1. It must be in second normal form (2NF).
2. No non-prime attribute may be transitively dependent on a candidate key.
If a transitive dependency exists, we can divide the table to remove the transitively dependent attributes and place them in a new table along with a copy of the determinant.
Example :
We find that in the above Student_detail relation, Stu_ID is the key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is
a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as
follows –
The 2NF and 3NF impose some extra conditions on dependencies on candidate keys and
remove redundancy caused by that. However, there may still exist some dependencies that
cause redundancy in the database. These redundancies are removed by a more strict normal
form known as BCNF.
For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:
1. It must be in third normal form (3NF).
2. For every non-trivial functional dependency X → Y, X must be a superkey.
The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF because
the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial
functional dependency, Project Leader -> Project ID, Project ID is a prime attribute but Project
Leader is a non-prime attribute. This is not allowed in BCNF.
To convert the given table into BCNF, we decompose it into two tables:
Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into
<EmployeeProject> and <ProjectLead> tables.
9.DECOMPOSITIONS AND DESIRABLE PROPERTIES
➢ A relation in BCNF is free of redundancy and a relation schema in 3NF comes close. If
a relation schema is not in one of these normal forms, the FDs that cause a violation
can give us insight into the potential problems.
➢ A decomposition of a relation schema R consists of replacing the relation schema by
two (or more) relation schemas that each contain a subset of the attributes of R and
together include all attributes in R.
➢ When a relation in the relational model is not appropriate normal form then the
decomposition of a relation is required. In a database, breaking down the table into
multiple tables termed as decomposition.
Attribute Preservation:
➢ Using functional dependencies the algorithms decompose the universal relation schema
R in a set of relation schemas D = { R1, R2, ….. Rn } relational database schema, where
‘D’ is called the Decomposition of R.
➢ The attributes in R will appear in at least one relation schema Ri in the decomposition,
i.e., no attribute is lost. This is called the Attribute Preservation condition of
decomposition.
Dependency Preservation:
➢ Each functional dependency specified in F should either appear directly in one of the relation schemas Ri of the decomposition D, or be inferable from the dependencies that appear in some Ri.
For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A->BC is a part of relation R1(ABC).
Lossless Join:
➢ Lossless join property is a feature of decomposition supported by normalization. It is
the ability to ensure that any instance of the original relation can be identified from
corresponding instances in the smaller relations.
➢ The relation is said to be lossless decomposition if natural joins of all the decomposition
give the original relation.
➢ If a relation R is decomposed into schemas X and Y, the decomposition is lossless when:
• X ∩ Y → X, that is, all attributes common to both X and Y functionally determine ALL the attributes in X, or
• X ∩ Y → Y, that is, all attributes common to both X and Y functionally determine ALL the attributes in Y.
➢ In other words, if X ∩ Y forms a super key of either X or Y, the decomposition of R is a lossless decomposition.
1. Define Functional dependencies. What are the Armstrong's axioms for FD's?
2. What is Normalization? Discuss what are the types? Discuss the 1NF, 2NF, 3NF with
example?
3. Define BCNF? How does BCNF differ from 3NF? Explain with an example.
4. Explain the importance of good schema design and the problems encountered with
bad schema designs.
5. What is Normalization? Discuss what are the types? Discuss the 1NF, 2NF, 3NF with
example?
6. Define Functional dependencies? How are primary keys related to functional
dependencies?
7. Explain decompositions and desirable properties.
Chapter 5
TRANSACTION MANAGEMENT
Syllabus:
1. Transaction concept
2. Transaction state
3. System log, commit point
4. Desirable properties of transaction
5. Concurrent executions
6. Serializability
7. Recoverability
8. Implementation of isolation
9. Transaction definition in SQL
10. Testing for serializability
11. Serializability by locks-locking systems with several lock modes
12. Concurrency control by timestamps and validation.
1. TRANSACTION CONCEPT
➢ Transactions are a set of operations that are used to perform some logical set of work.
A transaction is made to change data in a database which can be done by inserting new
data, updating the existing data, or by deleting the data that is no longer required.
➢ There are certain types of transaction states which tell the user about the current
condition of that database transaction and what further steps to be followed for the
processing.
Operations in Transaction
A certain set of operations takes place when a transaction is done that is used to perform
some logical set of operations. For example: When we go to withdraw money from ATM, we
encounter the following set of operations:
1. Transaction Initiated
2. You have to insert an ATM card
3. Select your choice of language
4. Select whether savings or current account
5. Enter the amount to withdraw
6. Entering your ATM pin
7. Transaction processes
8. You collect the cash
9. You press finish to end transaction
The above mentioned are the set of operations done by you. But in the case of a transaction in a DBMS, there are three major operations that are used for a transaction to get executed in an efficient manner: Read(X), Write(X) and Commit, as the example below shows.
Let's understand the above three sets of operations in a transaction with a real-life example of
transferring money from Account1 to Account2.
Initial balances in both accounts before the start of the transaction: Account1 = ₹5000 and Account2 = ₹2000 (these values follow from the Read operations below).
➢ This data is stored in the secondary memory (hard disk) before the start of the transaction and, once the transaction is initiated, is brought into the primary memory (RAM) of the system for faster and better access.
➢ Now, for a transfer of ₹500 from Account1 to Account2, the following set of operations takes place:
Read(Account1) --> 5000
Account1 = Account1 - 500
Write(Account1) --> 4500
Read(Account2) --> 2000
Account2 = Account2 + 500
Write(Account2) --> 2500
Commit
➢ The COMMIT statement permanently saves the changes made by the current
transaction. When a transaction is successful, COMMIT is applied. If the system fails
before a COMMIT is applied, the transaction reaches its previous state after
ROLLBACK.
➢ After commit operation the transaction ends and updated values of Account1 = ₹ 4500
and Account2 = ₹ 2500. Every single operation that occurs before the commit is said to
be in a partially committed state and is stored in the primary memory (RAM). After the
transaction is committed, the updated data is accepted and updated in the secondary memory
(Hard Disk).
➢ If the transaction fails anywhere before committing, it gets aborted and has to start from the beginning, as it cannot be continued from the previous state of failure. This is known as Roll Back.
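A minimal SQL sketch of the same transfer, assuming a hypothetical accounts(acc_name, balance) table:
START TRANSACTION;
-- debit ₹500 from Account1
UPDATE accounts SET balance = balance - 500 WHERE acc_name = 'Account1';
-- credit ₹500 to Account2
UPDATE accounts SET balance = balance + 500 WHERE acc_name = 'Account2';
-- make the changes permanent; a failure before this point is undone with ROLLBACK
COMMIT;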
During the lifetime of a transaction, there are a lot of states to go through. These states update
the operating system about the current state of the transaction and also tell the user about how
to plan further processing of the transaction. These states decide the regulations which decide
the fate of a transaction whether it will commit or abort.
➢ The ROLLBACK statement undoes the changes made by the current transaction. A transaction cannot undo changes after COMMIT has been executed.
➢ Active State: When the operations of a transaction are running then the transaction is
said to be active state. If all the read and write operations are performed without any error then
it progresses to the partially committed state, if somehow any operation fails, then it goes to a
state known as failed state.
➢ Partially Committed: After all the read and write operations are completed, the
changes which were previously made in the main memory are now made permanent in the
database, after which the state will progress to committed state but in case of a failure it will
go to the failed state.
➢ Failed State: If any operation during the transaction fails due to some software or hardware issue, the transaction goes to the failed state. A failure during a transaction must not leave any permanent change in the database: the changes made to the data in local memory are rolled back to the previous consistent state.
➢ Aborted State: If the transaction fails during its execution, it goes from failed
state to aborted state and because in the previous states all the changes were only made in the
main memory, these uncommitted changes are either deleted or rolled back. The transaction at
this point can restart and start afresh from the active state.
➢ Committed State: If the transaction completes all sets of operations successfully, all
the changes made during the partially committed state are permanently stored and the
transaction is stated to be completed, thus the transaction can progress to finally get terminated
in the terminated state.
➢ Terminated State: If the transaction gets aborted after roll-back or the transaction
comes from the committed state, then the database comes to a consistent state and is ready for
further new transactions since the previous transaction is now terminated.
There are four major properties that are vital for a transaction to be successful. These are used
to maintain state consistency in the database, both before and after the transaction. These are
called ACID properties.
1. Atomicity: This property means that either the transaction takes place completely at
once or doesn’t happen at all. There is no middle option, i.e., transactions do not occur
partially. Each transaction is considered as one single step which either runs
completely or is not executed at all.
2. Consistency: This property means that the integrity constraints of a database are
maintained so that the database is consistent before and after the transaction. It refers
to the correctness of a database.
Example: Let us consider an example where one person is trying to book a ticket. They
are able to reserve their seat but their payment hasn’t gone through due to bank issues.
In this case, their transaction is rolled back. But just doing that isn’t sufficient. The
number of available seats must also be updated. Otherwise, if it isn’t updated, there will
be an inconsistency where the seat given up by the person is not accounted for. Hence,
the total sum of seats left in the train + the sum of seats booked by users would not be
equal to the total number of seats present in the train if not for consistency.
3. Isolation: This property means that multiple transactions can occur concurrently
without causing any inconsistency to the database state. These transactions occur
independently without any external interference. Changes that occur in a particular
transaction are not visible/ accessible to any other transaction until that particular
change in that transaction has been committed.
Example : Let us use our example of two people trying to book the same seat to understand this ACID property. Two transactions are happening at the same time on the same database, but in isolation. To ensure that one transaction doesn't affect the other, they are serialized by the system. This is done so as to maintain the data in a consistent
they are serialized by the system. This is done so as to maintain the data in a consistent
state. Let us consider that the two people that click ‘Book Now’, do so with a gap of a
few seconds. In that case, the first person’s transaction goes through and he/she receives
their ticket as well. The second person will not know of the same until the first person’s
transaction is committed to memory. When the second person clicks on ‘Book Now’
and is redirected to the payment gateway, since the first person’s seat has already been
booked, it will show an error notifying the user that no seats are left on the train.
4. Durability: This property ensures that once the transaction has completed execution,
the updates and modifications to the database are stored in and written to disk and
they remain intact even if a system failure occurs. These updates become permanent
and are stored in the non-volatile memory.
Imagine a system failure or crash occurs in the railway management system and all the trains
that everyone had booked have been removed from the system. That means millions of users
would have paid the money for their seats but will not be able to board the train as all the details
have been destroyed. This could lead to huge losses for the company as users would lose trust
in them. In addition, it would create a lot of panics as these trains would be needed for important
reasons as well.
5. SCHEDULING
Types of Schedules
1. Serial Schedule
2. Concurrent (Non-serial Schedule)
Further, they are divided into their subcategories, as shown below.
Serial Schedule:
As the name says, all the transactions are executed serially one after the other. In serial
Schedule, a transaction does not start execution until the currently running transaction finishes
execution. This type of execution of the transaction is also known as non-interleaved execution.
Serial schedules are always recoverable, cascadeless, strict, and consistent. A serial schedule
always gives a correct result.
Consider two transactions T1 and T2, each performing some operations. If the schedule has
no interleaving of operations, there are two possible outcomes: either all of T1's operations
execute, followed by all of T2's operations, or all of T2's operations execute, followed by
all of T1's operations. For example, T1 -> T2 denotes the serial schedule in which T1 is
followed by T2. Here R(A) denotes reading some data item 'A', and W(B) denotes
writing/updating some data item 'B'. If n = number of transactions, then the number of
serial schedules possible = n!
Therefore, for the above 2 transactions, the total number of serial schedules possible = 2! = 2.
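As a quick illustration of the n! count, the following sketch (illustrative, not from the
notes) enumerates all serial execution orders of a set of transactions:

from itertools import permutations

transactions = ["T1", "T2"]                    # n = 2
serial_schedules = list(permutations(transactions))

for order in serial_schedules:
    print(" -> ".join(order))                  # T1 -> T2, then T2 -> T1
print("number of serial schedules:", len(serial_schedules))   # 2! = 2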
In a concurrent (non-serial) schedule, two transactions T1 and T2 execute concurrently
and their operations are interleaved. Such a schedule is an example of a Non-Serial
Schedule.
Total number of non-serial schedules = Total number of schedules – Total number of serial
schedules
Non-serial schedules are further categorized into serializable and non-serializable schedules.
Let us now discuss serializability further.
6. SERIALIZABILITY IN DBMS
Serializability in DBMS is a concept that helps to identify which non-serial schedules are
correct and will maintain the consistency of the database. A serializable schedule always
leaves the database in a consistent state.
Types of Serializability:
1. Conflict Serializability
2. View Serializability
Conflict Serializability
Two operations conflict if they belong to different transactions, operate on the same data
item, and at least one of them is a write. For two operations on a data item "a", the
conflicting pairs are:
1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)
Note: There can never be a read-read conflict as there is no change in the data
Example:
We have a schedule "S" with three transactions t1, t2, and t3 working simultaneously.
If the precedence graph of the schedule contains a cycle, the schedule is not
conflict-serializable. When blind writes are present [t2 -> w(x) and t3 -> w(x)], we
additionally check for view serializability.
Non-Serializability in DBMS
1. Recoverable Schedule
2. Non-recoverable Schedule
Recoverable Schedule
A schedule is recoverable if each transaction commits only after all the transactions from
which it has read have committed. In other words, if some transaction Ty reads a value that
has been updated/written by some other transaction Tx, then the commit of Ty must occur
after the commit of Tx.
For example, if T1 commits before T2 and T2 reads a value written by T1, the value read
by T2 is correct, so the schedule is recoverable.
Cascading Schedule
If in a schedule, the failure or rollback of one transaction forces several other dependent
transactions to roll back as well, then such a schedule is called a Cascading Schedule.
Cascading rollback wastes the work of every transaction that is forced to abort.
Cascadeless Schedule
If in a schedule, a transaction is not allowed to read a data item until the last transaction
that wrote it has committed or aborted, then such a schedule is called a Cascadeless
Schedule. It avoids cascading rollback and thus saves CPU time. To prevent cascading
rollbacks, it disallows a transaction from reading uncommitted changes of another
transaction in the same schedule. In other words, if some transaction Ty wants to read a
value that has been updated or written by some other transaction Tx, then Ty may read it
only after the commit of Tx. For example, if the updated value of X is read by transaction
T2 only after the commit of transaction T1, the schedule is a cascadeless schedule.
Strict Schedule
If in a schedule, a transaction is neither allowed to read nor write a data item until the
last transaction that wrote it has committed or aborted, then such a schedule is called a
Strict Schedule. Say we have two transactions Ta and Tb: if a write operation of
transaction Ta precedes a read or write operation of transaction Tb on the same item, then
the commit or abort of Ta must also precede that read or write of Tb. A strict schedule
allows only committed read and write operations, so it imposes more restrictions than a
cascadeless schedule. For example, if transaction Tb reads or writes the value written by
transaction Ta only after Ta commits, the schedule is a strict schedule.
Non-Recoverable Schedule
If a transaction reads a value written by an uncommitted transaction and commits before
the transaction from which it read, then such a schedule is called a Non-Recoverable
Schedule. With a non-recoverable schedule, we may not be able to recover to a consistent
database state after a system failure: if Tj reads from Ti and the commit of Ti does not
occur before the commit of Tj, the schedule is non-recoverable.
Consider a schedule involving two transactions T1 and T2, in which T2 reads the value of A
written by T1 and then commits. If T1 later aborts, the value read by T2 becomes invalid,
but since T2 has already committed, its effects cannot be undone; hence the schedule is
non-recoverable.
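The recoverability condition above is mechanical enough to check in code. Here is a minimal
sketch (the schedule encoding is an illustrative assumption, not from the notes) that tests
whether every transaction commits only after the transactions it read from:

# A schedule is a list of (transaction, action, item) tuples, where action
# is "R" (read), "W" (write), or "C" (commit); item is None for commits.
def is_recoverable(schedule):
    last_writer = {}     # item -> transaction that last wrote it
    reads_from = {}      # transaction -> set of transactions it read from
    committed = set()
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "C":
            # every transaction this one read from must already be committed
            if any(w not in committed for w in reads_from.get(txn, set())):
                return False
            committed.add(txn)
    return True

# T2 reads A written by T1 and commits before T1 -> non-recoverable
s = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
print(is_recoverable(s))   # False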
7. TRANSACTION SUPPORT IN SQL
Transactions are defined using SQL statements known as transaction control statements. The
most common transaction control statements are:
1. BEGIN TRANSACTION: Starts a new transaction.
2. COMMIT: Saves the changes made within a transaction to the database permanently.
Once committed, the changes become visible to other transactions.
3. ROLLBACK: Undoes all the changes made within a transaction, reverting the
database to its state before the transaction began.
4. SAVEPOINT: Defines a point within a transaction to which you can roll back later if
needed.
For example, let's consider a scenario where you want to transfer funds from one bank account
to another. You would want to ensure that the withdrawal from the source account and the
deposit into the destination account happen as a single atomic operation. In this case, you would
wrap the withdrawal and deposit operations within a transaction using BEGIN
TRANSACTION, COMMIT, and ROLLBACK statements. If any part of the transaction fails
(e.g., insufficient funds), you can roll back the transaction and ensure that neither the
withdrawal nor the deposit is applied to the database.
By using transactions, you can maintain data consistency, handle concurrent access, and
provide reliability and durability in database operations.
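The funds-transfer scenario above can be sketched with Python's built-in sqlite3 module. The
accounts table, the account names, and the insufficient-funds check are illustrative
assumptions; the point is that COMMIT makes both updates permanent while ROLLBACK applies
neither:

import sqlite3

# autocommit mode, so BEGIN/COMMIT/ROLLBACK are issued explicitly below
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('source', 100), ('dest', 0)")

def transfer(src, dst, amount):
    try:
        conn.execute("BEGIN")              # BEGIN TRANSACTION
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (bal,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                              (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.execute("COMMIT")             # both updates become permanent
    except Exception:
        conn.execute("ROLLBACK")           # neither update is applied
        raise

transfer("source", "dest", 40)
print(conn.execute("SELECT name, balance FROM accounts").fetchall())
# [('source', 60), ('dest', 40)]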
8. IMPLEMENTATION OF ISOLATION
Isolation in transactions refers to the degree to which the changes made by a transaction are
visible to other concurrent transactions. In SQL, isolation is implemented through isolation
levels, which define the behavior of transactions with respect to concurrent access and data
consistency. There are different isolation levels available, and the choice of isolation level
depends on the specific requirements of the application.
1. Read Uncommitted: This is the lowest isolation level where transactions can see
uncommitted changes made by other transactions. It allows dirty reads, meaning a
transaction can read data that has been modified by another uncommitted transaction.
This level provides the highest concurrency but sacrifices consistency.
2. Read Committed: In this isolation level, a transaction can only see committed changes
made by other transactions. It prevents dirty reads by waiting for a transaction to
commit before accessing its changes. However, it still allows non-repeatable reads,
meaning the same query executed multiple times within a transaction may return
different results if other transactions modify the data.
3. Repeatable Read: This isolation level ensures that within a transaction, the same query
executed multiple times will always return the same result. It prevents dirty reads and
non-repeatable reads by acquiring shared locks on the accessed data. Other transactions
can still read the data but cannot modify it until the first transaction completes.
4. Serializable: This is the highest isolation level that provides strict consistency. It
guarantees that transactions are executed as if they were serialized one after another,
even though they may run concurrently. Serializable isolation level prevents dirty reads,
non-repeatable reads, and phantom reads. It achieves this by acquiring range locks or
table locks to restrict access to the data being read or modified.
The isolation level for a transaction can be set using SQL statements specific to the database
system. For example, in many database systems, you can use the SET TRANSACTION
ISOLATION LEVEL statement to set the isolation level for a transaction.
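For instance, here is a hedged sketch of setting the isolation level from Python. It assumes
a PostgreSQL server and the psycopg2 driver, neither of which is part of these notes, and
the connection string is a placeholder:

import psycopg2

conn = psycopg2.connect("dbname=test")   # connection parameters are assumptions
cur = conn.cursor()
# psycopg2 opens a transaction implicitly on the first statement, and
# SET TRANSACTION must be the first statement inside that transaction.
cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
cur.execute("SELECT balance FROM accounts WHERE name = 'source'")
# ... further reads/writes here run under SERIALIZABLE isolation ...
conn.commit()                            # ends the transaction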
It's important to note that higher isolation levels generally come with increased overhead and
potential for blocking or deadlocks, as they impose stricter restrictions on concurrent access.
Therefore, it's crucial to choose an appropriate isolation level based on the requirements of the
application and the potential trade-offs in terms of performance and concurrency.
9. TESTING FOR SERIALIZABILITY
Two operations inside a schedule are called conflicting if they meet these three conditions:
they belong to different transactions, they operate on the same data item, and at least one
of them is a write operation. For two operations on a data item "a", the conflicting pairs
are:
1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)
Let's take an example of schedule "S" having three transactions t1, t2, and t3 working
simultaneously, to get a better understanding.
This schedule is not conflict-serializable. R(x) of T1 conflicts with W(x) of T3, so there
is a directed edge from T1 to T3. R(y) of T1 conflicts with W(y) of T2, so there is a
directed edge from T1 to T2. W(x) of T3 conflicts with W(x) of T1, so there is a directed
edge from T3 to T1. Similarly, we make an edge for every conflicting pair. Since a cycle is
formed in this precedence graph, the transactions cannot be serialized.
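This precedence-graph test is easy to automate. Below is a minimal sketch (the schedule
encoding is an illustrative assumption) that builds the graph from the conflicting pairs
defined above and reports whether it contains a cycle:

def precedence_graph(schedule):
    # schedule: list of (transaction, action, item); action is "R" or "W"
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # conflicting pair: different txns, same item, at least one write
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visiting, done = set(), set()
    def dfs(u):
        visiting.add(u)
        for v in graph.get(u, ()):
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False
    return any(dfs(u) for u in list(graph) if u not in done)

# the example above: T1 -> T3, T1 -> T2 and T3 -> T1 form a cycle
s = [("T1", "R", "x"), ("T1", "R", "y"), ("T2", "W", "y"),
     ("T3", "W", "x"), ("T1", "W", "x")]
edges = precedence_graph(s)
print(edges)              # directed edges between conflicting transactions
print(has_cycle(edges))   # True -> not conflict-serializable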
It's important to note that serializability testing is typically performed by database management
systems internally, and users rarely need to perform these tests manually. The purpose of these
tests is to ensure that the concurrent execution of transactions does not introduce any data
integrity issues or inconsistencies.
10. LOCK BASED PROTOCOLS
There are two commonly used types of locks in lock-based concurrency control:
Shared Lock (S-lock): Data can only be read when a shared lock is applied. Data cannot be
written. It is denoted as lock-S.
Exclusive Lock (X-lock): Data can be read as well as written when an exclusive lock is
applied. It is denoted as lock-X.
Compatibility table (whether a requested lock can be granted while a lock of the given
mode is already held on the same item):

Held \ Requested    S      X
S                   YES    NO
X                   NO     NO
Problems in S/X locking protocol: simple S/X locking by itself does not guarantee
serializability, which motivates the two-phase locking (2PL) protocol. In 2PL, every
transaction passes through two phases:
Growing Phase: In the growing phase, the transaction only acquires locks; it cannot release
any lock. The growing phase ends when the transaction releases its first lock.
Shrinking Phase: In the shrinking phase, the transaction only releases locks; it cannot
acquire any new lock.
In the growing phase, locks are acquired until the point at which all the locks needed by
the transaction have been obtained. This point is called the Lock Point. After the lock
point, the transaction enters the shrinking phase.
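To make the two-phase discipline concrete, here is a minimal sketch (illustrative, not a
full lock manager) combining the compatibility table above with the rule that no lock may
be acquired after the first release:

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.held = {}            # item -> "S" or "X"
        self.shrinking = False    # set once the first lock is released

    def lock(self, item, mode, held_by_others=()):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        # a request is granted only if it is compatible with all held locks
        if any(not COMPATIBLE[(other, mode)] for other in held_by_others):
            return False          # conflicting lock held elsewhere: must wait
        self.held[item] = mode
        return True

    def unlock(self, item):
        self.shrinking = True     # growing phase ends at the first release
        del self.held[item]

t = TwoPhaseTxn("T1")
t.lock("A", "X")                  # growing phase
t.lock("B", "S")                  # lock point reached here
t.unlock("A")                     # shrinking phase begins
# t.lock("C", "S") would now raise: forbidden by two-phase locking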
Disadvantages: basic 2PL does not prevent deadlocks, and it permits cascading rollbacks.
The following variants of 2PL address some of these problems.
1. Strict two-phase locking protocol: it satisfies basic 2PL, and all exclusive
locks are held until commit/abort.
• The transaction can release a shared lock after the lock point.
• The transaction cannot release any exclusive lock until it commits or aborts.
• Because no transaction can read another transaction's uncommitted writes,
strict 2PL avoids cascading rollbacks.
2. Rigorous two-phase locking protocol: it satisfies basic 2PL, and all locks, both
shared and exclusive, are held until commit/abort.
• The transaction cannot release either kind of lock, neither shared nor
exclusive, before it commits or aborts.
• Serializability is guaranteed in the rigorous two-phase locking protocol.
• Freedom from deadlock is not guaranteed: deadlocks can still occur.
3. Conservative two-phase locking protocol
• The transaction must lock all the data items it requires before it begins
execution.
• If any of the required data items cannot be locked, then the transaction
locks none of them and waits.
• The set of data items to be read and written must be known before the
transaction begins, which is normally not possible.
• The conservative two-phase locking protocol is deadlock-free.
• The conservative two-phase locking protocol does not ensure a strict schedule.
11. CONCURRENCY CONTROL BY TIMESTAMPS AND VALIDATION:
1. Timestamp Ordering:
Each transaction is assigned a unique timestamp when it enters the system, and each data
item X carries two timestamps:
Read timestamp, RTS(X) = timestamp of the last (latest) transaction that read X successfully.
Write timestamp, WTS(X) = timestamp of the last (latest) transaction that wrote X successfully.
E.g., WTS(X) = 20 means the last successful write of X was made by the transaction with
timestamp 20.
➢ The timestamps are used to determine the relative order of transactions. A transaction
with a lower timestamp is considered to have occurred earlier than a transaction with
a higher timestamp.
➢ When a transaction wants to read a data item, it records the timestamp of the
transaction from which it reads the item.
➢ When a transaction wants to write a data item, it checks if any other transaction with a
higher timestamp has already read or written that item. If so, it aborts to maintain
consistency.
Rules:
➢ When a transaction Ti with timestamp TS(Ti) issues read(X): if TS(Ti) < WTS(X), a
younger transaction has already written X, so Ti is rolled back; otherwise the read
executes and RTS(X) is set to max(RTS(X), TS(Ti)).
➢ When Ti issues write(X): if TS(Ti) < RTS(X) or TS(Ti) < WTS(X), a younger transaction
has already read or written X, so Ti is rolled back; otherwise the write executes and
WTS(X) is set to TS(Ti).
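These rules translate directly into code. Here is a minimal sketch of basic timestamp
ordering (data items and timestamps are plain Python values; the encoding is an
illustrative assumption):

class TOError(Exception):
    pass   # raised when a transaction must be rolled back and restarted

class Item:
    def __init__(self):
        self.rts = 0   # timestamp of the latest successful read
        self.wts = 0   # timestamp of the latest successful write

def read(ts, item):
    if ts < item.wts:                    # a younger txn already wrote the item
        raise TOError("rollback: read too late")
    item.rts = max(item.rts, ts)

def write(ts, item):
    if ts < item.rts or ts < item.wts:   # a younger txn already read/wrote it
        raise TOError("rollback: write too late")
    item.wts = ts

x = Item()
read(10, x)        # ok: RTS(x) becomes 10
write(20, x)       # ok: WTS(x) becomes 20
try:
    write(15, x)   # too late: the transaction with timestamp 20 already wrote x
except TOError as e:
    print(e)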
Validation: In validation-based (optimistic) concurrency control, each transaction passes
through three phases: a read phase, in which it reads from the database but writes only to
local copies; a validation phase, in which the system checks whether applying the
transaction's updates would violate serializability; and a write phase, in which, if
validation succeeded, the local updates are applied to the database, and otherwise the
transaction is rolled back and restarted.
The key idea behind concurrency control by timestamps and validation is to allow
transactions to proceed concurrently, but ensure that their execution order is consistent with
a serial execution order. By using timestamps, the system can identify conflicting
transactions and resolve conflicts by aborting and restarting some of them.
➢ This approach provides good concurrency as transactions can execute in parallel as long
as there are no conflicts. However, it may lead to transaction restarts and aborts, which
can impact performance in cases of high contention or frequent conflicts.
➢ It's worth noting that concurrency control mechanisms based on timestamps and
validation are just one of the many techniques used in database systems. Other
approaches, such as locking protocols, snapshot isolation, and multiversion
concurrency control, offer different trade-offs in terms of concurrency, isolation, and
performance. The choice of concurrency control mechanism depends on the specific
requirements and characteristics of the application.
IMPORTANT QUESTIONS