
COURSE MATERIAL

III Year B. Tech II-Semester


MECHANICAL ENGINEERING
AY: 2024-25

DATABASE SYSTEMS


R22A0553

Prepared by:
Mr. SOMA VIVEKANANDA
Asst. Professor

MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY


DEPARTMENT OF MECHANICAL ENGINEERING
(Autonomous Institution-UGC, Govt. of India)
Secunderabad-500100, Telangana State, India.
www.mrcet.ac.in
MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution – UGC, Govt. of India)

VISION
❖ To establish a pedestal for integral innovation, team spirit, originality and
competence in the students, expose them to global challenges, and enable them to
become technology leaders of the Indian vision of a modern society.

MISSION
❖ To become a model institution in the fields of Engineering, Technology and
Management.
❖ To impart holistic education to the students to render them industry-ready
engineers.
❖ To ensure synchronization of MRCET ideologies with challenging demands of
International Pioneering Organizations.

QUALITY POLICY

❖ To implement best practices in Teaching and Learning process for both UG and PG
courses meticulously.

❖ To provide state-of-the-art infrastructure and expertise to impart quality education.

❖ To groom the students to become intellectually creative and professionally
competitive.

❖ To channelize the activities and tune them to heights of commitment and sincerity,
the requisites to climb the never-ending ladder of SUCCESS year after year.

For more information: www.mrcet.ac.in


Department of Mechanical Engineering

VISION

To become an innovative knowledge center in mechanical engineering through
state-of-the-art teaching-learning and research practices, promoting creative-thinking
professionals.

MISSION

The Department of Mechanical Engineering is dedicated to transforming the students
into highly competent mechanical engineers who meet the needs of the industry, in a
changing and challenging technical environment, by strongly focusing on the
fundamentals of engineering sciences for achieving excellent results in their professional
pursuits.

Quality Policy

 To pursue global standards of excellence in all our endeavors, namely teaching,
research and continuing education, and to remain accountable in our core and
support functions through processes of self-evaluation and continuous
improvement.

 To create a center of excellence for imparting state-of-the-art education and
industry-oriented training and research in the field of technical education.
PROGRAM OUTCOMES
Engineering Graduates will be able to:
1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems
and design system components or processes that meet the specified needs with
appropriate consideration for the public health and safety, and the cultural, societal, and
environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data,
and synthesis of the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and
need for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and teamwork: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and
write effective reports and design documentation, make effective presentations, and give
and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member
and leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.

PROGRAM SPECIFIC OUTCOMES (PSOs)

PSO1: Ability to analyze, design and develop mechanical systems to solve
engineering problems by integrating thermal, design and manufacturing domains.

PSO2: Ability to succeed in competitive examinations or to pursue higher studies or
research.

PSO3: Ability to apply the learned mechanical engineering knowledge for the
development of society and self.

Program Educational Objectives (PEOs)


The Program Educational Objectives of the program offered by the department are broadly
listed below:

PEO1: PREPARATION

To provide a sound foundation in mathematical, scientific and engineering fundamentals
necessary to analyze, formulate and solve engineering problems.

PEO2: CORE COMPETENCE

To provide thorough knowledge in Mechanical Engineering subjects, including theoretical
knowledge and practical training in preparing physical models pertaining to Thermodynamics,
Hydraulics, Heat and Mass Transfer, Dynamics of Machinery, Jet Propulsion, Automobile
Engineering, Finite Element Analysis, Production Technology, Mechatronics, etc.

PEO3: INVENTION, INNOVATION AND CREATIVITY

To enable students to design, experiment, analyze and interpret in the core field with the help of
other interdisciplinary concepts wherever applicable.

PEO4: CAREER DEVELOPMENT

To inculcate the habit of lifelong learning for career development through successful completion
of advanced degrees, professional development courses, industrial training etc.

PEO5: PROFESSIONALISM

To impart technical knowledge, ethical values for professional development of the student to
solve complex problems and to work in multi-disciplinary ambience, whose solutions lead to
significant societal benefits.

Blooms Taxonomy
Bloom’s Taxonomy is a classification of the different objectives and skills that educators set for
their students (learning objectives). The terminology has been updated to include the following
six levels of learning. These 6 levels can be used to structure the learning objectives, lessons,
and assessments of a course.

1. Remembering: Retrieving, recognizing, and recalling relevant knowledge from long-term
memory.
2. Understanding: Constructing meaning from oral, written, and graphic messages through
interpreting, exemplifying, classifying, summarizing, inferring, comparing, and explaining.
3. Applying: Carrying out or using a procedure for executing or implementing.
4. Analyzing: Breaking material into constituent parts, determining how the parts relate to
one another and to an overall structure or purpose through differentiating, organizing, and
attributing.
5. Evaluating: Making judgments based on criteria and standards through checking and
critiquing.
6. Creating: Putting elements together to form a coherent or functional whole; reorganizing
elements into a new pattern or structure through generating, planning, or producing.
COURSE STRUCTURE R22

III Year B. Tech II-Sem                                L/T/P/C: 3/-/-/3
(R22A0553) Database Systems
(OPEN ELECTIVE-II)
COURSE OBJECTIVES:

1. To understand the basic concepts and the applications of database systems
2. To master the basics of SQL and construct queries using SQL
3. To understand relational database design principles
4. To become familiar with the basic issues of transaction processing and concurrency
control
5. To become familiar with database storage structures and access techniques.

UNIT I
INTRODUCTION
Database: Purpose of Database Systems, File Processing System vs DBMS, History,
Characteristics, Three-schema Architecture of a Database, Functional Components of a
DBMS, DBMS Languages, Database Users and DBA.

UNIT II
DATABASE DESIGN
ER Model: Objects, Attributes and their Types. Entity Set and Relationship Set, Design
Issues of the ER Model, Constraints, Keys: primary key, super key, candidate keys.
Introduction to the Relational Model, Tabular Representation of Various ER Schemas,
ER Diagram Notations, Goals of ER Diagrams, Weak Entity Sets, Views.

UNIT III
STRUCTURED QUERY LANGUAGE
SQL: Overview, The Form of a Basic SQL Query, UNION, INTERSECT, and EXCEPT,
Join Operations: equi join and non-equi join, Nested Queries: correlated and uncorrelated,
Aggregate Functions, Null Values, Views, Triggers.

UNIT IV
DEPENDENCIES AND NORMAL FORMS
Importance of a good schema design, Problems encountered with bad schema designs,
Motivation for normal forms, Functional dependencies, Armstrong's axioms for FDs,
Closure of a set of FDs, Minimal covers, Definitions of 1NF, 2NF, 3NF and BCNF,
Decompositions and desirable properties.

UNIT V
Transactions: Transaction concept, transaction state, System log, Commit point, Desirable
Properties of a Transaction, concurrent executions, serializability, recoverability,
implementation of isolation, transaction definition in SQL, Testing for serializability,
Serializability by Locks- Locking Systems with Several Lock Modes- Concurrency Control by
Timestamps, validation.


TEXT BOOKS:
1. Abraham Silberschatz, Henry F. Korth, S. Sudarshan, "Database System Concepts",
McGraw-Hill, 6th Edition, 2010.
2. Elmasri, Navathe, Somayajulu, and Gupta, "Fundamentals of Database Systems",
Pearson Education.

REFERENCE BOOKS:
1. Raghu Ramakrishnan, Johannes Gehrke, "Database Management Systems", McGraw-Hill,
3rd Edition, 2007.
2. Elmasri & Navathe, "Fundamentals of Database Systems", Addison-Wesley Publishing,
5th Edition, 2008.
3. Date, C.J., "An Introduction to Database Systems", Addison-Wesley Pub Co, 8th Edition, 2006.
4. Peter Rob, Carlos Coronel, "Database Systems – Design, Implementation, and
Management", 9th Edition, Thomson Learning, 2009.

COURSE OUTCOMES:
At the end of this course, students will be able to:
1. Understand the basic concepts and the applications of database systems.
2. Master the basics of SQL and construct queries using SQL.
3. Understand relational database design principles.
4. Become familiar with the basic issues of transaction processing and concurrency
control.
5. Become familiar with database storage structures and access techniques.



1. INTRODUCTION TO DBMS

Data: The facts that can be recorded and have implicit meaning are known as 'data'.

Example: Student data items, e.g., ELE201, RAM, 19, etc.

Information: meaningful/processed data

Record: Collection of related data items, e.g., in the above example, the three data items had
no meaning. However, if we organize them in the following way, they collectively represent
meaningful information.

ROLL    NAME  AGE
ELE201  RAM   19

Table or Relation: Collection of related records.

ROLL    NAME     AGE
ELE201  RAM      19
ELE202  JOSEPH   18
ELE203  HUSSAIN  20

➢ The columns of this relation are called Fields, Attributes or Domains. The rows are
called Tuples or Records.

Database: It is the collection of related relations. Consider the following collection of tables:

Table 1.
ROLL    NAME     AGE
ELE201  RAM      19
ELE202  JOSEPH   18
ELE203  HUSSAIN  20

Table 2.
ROLL    Address
ELE201  Delhi
ELE202  Hyd
ELE203  Karnataka

Table 3.
ROLL    DOB
ELE201  1-1-2004
ELE202  1-1-2005
ELE203  1-1-2003

Table 4.
ROLL    Hostel
ELE201  H1
ELE202  H2
ELE203  H3

➢ We now have a collection of 4 tables. They can be called a "related collection" because
some common attributes exist between pairs of tables. Because of these common
attributes, we may combine the data of two or more tables to determine a student's
complete details. Questions like "Which hostel does the youngest student live in?" can
now be answered (see the SQL sketch after this list), even though the Age and Hostel
attributes are in different tables. Many people with different responsibilities can view a
database through a DBMS.
➢ For example, a company has different departments and customers, each needing to see
different kinds of data. Each employee in the company will have different levels of
access to the database with their customized front-end application. Data is organized
strictly in a row and column format in a database. The rows are called Tuples or records.
The data items within one row may belong to different data types. On the other hand,
the columns are often called Domain or Attribute. All the data items within a single
attribute are of the same data type.
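To make this concrete, here is a minimal SQL sketch, assuming Tables 1 and 4 above are stored as relations named STUDENT(ROLL, NAME, AGE) and HOSTEL(ROLL, HOSTEL); the names come from the example, not from any particular system, and the LIMIT syntax varies by DBMS (standard SQL uses FETCH FIRST 1 ROW ONLY).

    -- Which hostel does the youngest student live in?
    SELECT h.HOSTEL
    FROM STUDENT s
    JOIN HOSTEL h ON s.ROLL = h.ROLL
    ORDER BY s.AGE
    LIMIT 1;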

Figure 1. Employees accessing data through a DBMS


1. Database Systems:

Database system" refers collectively to the database model and database management system
(DBMS). Physically, database servers are dedicated computers that hold the actual databases
and run only the DBMS and related software.

Figure 2. Database system architecture

A database is a collection of related relations or tables. A database may be structured or
unstructured: a structured database has a particular format, such as a table format, while an
unstructured database does not have any particular format. A database may be created and
manipulated manually, or it may be computerized.

Database Management System (DBMS):

➢ A database-management system (DBMS) is a collection of interrelated data and a set
of software programs to access those data. The collection of data, usually referred to as the
database, contains information relevant to an enterprise.
➢ A DBMS provides operations on the data, e.g., insertion, deletion, and update.
➢ The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
2. PURPOSE OF DATABASE SYSTEMS / CHARACTERISTICS OF DBMS

The primary purpose of database systems is to store, manage and retrieve data in an organized
and efficient manner. They provide a centralized repository for data, making it easier for users
to access and manage the information.

1. The goal of a database management system (DBMS) is to transform:

a. Data into information.

b. Information into knowledge.

c. Knowledge into action.

The diagram below shows how this transformation from data to information to knowledge to
action happens in a DBMS.

2. Storage of large database and retrieval of data

Database systems arose in response to early methods of computerized commercial data
management. As an example of such methods, typical of the 1960s, consider part of a university
organization that, among other data, keeps the information about all instructors, students,
departments, and course offerings. One way to keep the information on a computer is to store
it in operating system files. To allow users to manipulate the information, the system has
several application programs that manipulate the files, including programs to:

• Add new students, instructors, and courses
• Register students for courses and generate class rosters
• Assign grades to students, compute grade point averages (GPA), and generate
transcripts
A conventional operating system supports this typical file-processing system. The system
stores permanent records in various files and needs different application programs to extract
and add records to the appropriate files. Before database management systems (DBMSs) were
introduced, organizations usually stored information in such systems. Keeping organizational
information in a file-processing system has several significant disadvantages:

1. Data organization: In an FPS, data is organized into separate files, whereas in a
DBMS, data is organized into a centralized database that can be easily accessed and
managed.
2. Data redundancy: FPS can lead to data redundancy, as the same data may be stored
in multiple files, whereas DBMS avoids data redundancy by centralizing the data in
one database.
3. Data inconsistency: FPS may lead to data inconsistency; the various copies of the
same data may no longer agree. For example, a changed student address may be
reflected in the Music department records but not elsewhere in the system.
4. Data integrity: FPS systems do not provide mechanisms to enforce data integrity,
while DBMSs provide constraints and rules to ensure that data entered into the
database is consistent and accurate.
5. Data isolation: Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
6. Concurrent access: An FPS allows only a limited number of users at a time, whereas a
DBMS allows multiple users to access data simultaneously.
7. Data security: FPS systems provide limited security, while DBMSs provide multiple
levels of security, such as user authentication and authorization, to ensure that sensitive
data is protected.
8. Data sharing: FPS systems do not provide mechanisms for sharing data among
multiple users, while DBMSs allow multiple users to access and modify data
simultaneously.
9. Data scalability: FPS systems can become unwieldy as the amount of data grows,
while DBMSs are designed to scale as the amount of data grows.
10. Performance: FPS systems can become slow as data grows, while DBMSs are
designed to be highly performant even with large amounts of data.
11. Maintenance: FPS systems require manual intervention to update and maintain data,
while DBMSs provide automatic mechanisms for updating and maintaining data.


3. Advantages of database systems


There are several advantages of using database systems, including:
1. Data Organization: Databases provide a centralized location for storing and
organizing data, making it easier for users to access and manage information.
2. Avoiding data redundancy: An FPS can lead to data redundancy, as the same data
may be stored in multiple files, whereas a DBMS avoids redundancy by
centralizing the data in one database.
3. Data Consistency: Databases ensure that data is consistent and accurate, reducing
the risk of data corruption and inconsistencies.
4. Data Integrity: Databases provide mechanisms to maintain the integrity of data,
such as constraints and transactions, which ensure that data remains accurate and
consistent.
5. Data Security: Databases provide mechanisms to secure the data, such as
encryption and access control, to prevent unauthorized access and manipulation of
data.
6. Data Scalability: Databases can handle large amounts of data and can be easily
scaled up to accommodate growth.
7. Data Sharing: Databases provide a centralized repository of data, making it easy
for multiple users to access and share information.
8. Improved Data Accessibility: Databases make it easier for users to access and
retrieve data, reducing the time and effort required to manage the information.
9. Improved Data Backup and Recovery: Databases provide mechanisms for
backing up and recovering data, ensuring that data is not lost in the event of a system
failure.
10. Improved Data Analysis: Databases provide tools for analyzing data, making it
easier for organizations to make informed decisions based on their data.
Overall, database systems provide a powerful and efficient solution for managing data,
improving data consistency, security, and accessibility, and enabling organizations to make
better decisions based on their data.

4. Disadvantages of database systems

There are several disadvantages to using a database system:

1. Cost: Implementing and maintaining a database system can be expensive, especially
for organizations with large amounts of data and complex data relationships.
2. Complexity: Database systems can be complex to set up and use, requiring
specialized knowledge and expertise.
3. Performance: As the size of a database grows, the performance of database
operations can degrade, leading to slow queries and application response times.
4. Limited flexibility: DBMSs can be rigid and inflexible, making it difficult to
accommodate changes in data structures or requirements.
5. Technology dependency: DBMSs can create a dependency on technology, making
it difficult to continue operations in the event of a system failure or outage.
6. Vendor lock-in: Choosing a proprietary database system can result in vendor
lock-in, limiting the ability to switch to another database system if needed.

5. DBMS APPLICATIONS:
Database Management Systems (DBMS) applications exist in various industries and fields.
Some common examples include:
1. Financial Systems: Banks, stock market companies, and other financial organizations use
DBMS to manage their transactions and financial data.
2. Healthcare Systems: Hospitals, clinics, and other healthcare organizations use DBMS to
store patient information, manage appointments, and keep track of medical records.
3. Retail Systems: Retail companies use DBMS to manage their inventory, sales, and
customer data.
4. Educational Systems: Schools and universities use DBMS to manage student
information, grades, and other academic records.
5. Human Resources Systems: Companies use DBMS to store and manage employee
information, payroll data, and performance records.
6. Transportation Systems: Airlines, shipping companies, and other transportation
organizations use DBMS to manage their scheduling, routing, and billing information.
7. Government Systems: Governments use DBMS to store and manage information such as
census data, tax records, and voting records.
8. Manufacturing Systems: Manufacturing companies use DBMS to manage their supply
chain, production, and inventory information

6. File Processing System Vs DBMS:

File Processing systems (FPS) and Database Management Systems (DBMS) are two different
methods of storing and managing data.

FILE SYSTEM:

The file system is basically a way of arranging the files in a storage medium like a hard disk.
The file system organizes the files and helps in the retrieval of files when they are required.
File systems consist of different files which are grouped into directories. The directories further
contain other folders and files. The file system performs basic operations like management, file
naming, giving access rules, etc.

Example: NTFS (New Technology File System), EXT (Extended File System).

Figure: File system


DBMS (Database Management System):

Database Management System is software that manages the collection of related data. It
is used for storing data and retrieving it effectively when needed. It also provides proper
security measures for protecting the data from unauthorized access. In Database
Management System, the data can be fetched by SQL queries and relational algebra. It
also provides mechanisms for data recovery and data backup.
Example: Oracle, MySQL.

Figure: DBMS
Here are some key differences between the two:

Basis                 | File Processing System (FPS)                                           | DBMS
----------------------|------------------------------------------------------------------------|---------------------------------------------------------
Structure             | Software that manages and organizes the files in a storage medium     | Software for managing the database.
                      | within a computer.                                                     |
Data redundancy       | Redundant data can be present in a file system.                        | In a DBMS, there is no redundant data.
Consistency           | There is less data consistency in the file system.                     | There is more data consistency because of the process of normalization.
Backup and recovery   | Does not provide backup and recovery of data if it is lost.            | Provides backup and recovery of data even if it is lost.
Complexity            | Less complex as compared to a DBMS.                                    | More complex to handle as compared to the file system.
Query processing      | No efficient query processing.                                         | Efficient query processing.
Cost                  | Less expensive than a DBMS.                                            | Comparatively higher cost than a file system.
Data independence     | No data independence.                                                  | Data independence exists, and it can be of two types: logical and physical.
Concurrent access     | Only one user can access data at a time.                               | Multiple users can access data at a time.
Data sharing          | Data is distributed in many files, so sharing data is not easy.        | Due to its centralized nature, sharing is easy.
Data abstraction      | Gives details of storage and representation of data.                   | Hides the internal details of the database.
Security constraints  | Provides less security in comparison to a DBMS.                        | Has more security mechanisms as compared to file systems.
Integrity constraints | Difficult to implement.                                                | Easy to implement.
Example               | COBOL, C++                                                             | Oracle, SQL Server

In summary, while FPS systems are simpler to implement and suitable for small-scale data
management, DBMSs are more suitable for larger-scale data management, as they provide
more robust data organization, integrity, security, and sharing capabilities.

7. History
• The history of Database Management Systems (DBMS) can be traced back to the
1960s, when the first commercially available DBMSs were introduced. The early
DBMSs were designed to address the needs of businesses to manage and store large
amounts of data.
• Data is a collection of facts and figures. The volume of collected data was increasing
daily, and it needed to be stored in a safe device or software. Charles Bachman was the
first to develop the Integrated Data Store (IDS), based on the network data model, for
which he received the Turing Award (the most prestigious award in computer science,
often considered equivalent to the Nobel Prize).
1. The 1960s: The first commercially available DBMSs, such as IBM's Information
Management System (IMS), were introduced in the 1960s. These systems were designed
to manage large amounts of data in mainframe computer systems.
2. The 1970s: The relational database model was introduced in the 1970s, with the release of
the first relational DBMS, IBM's System R. This model organized data into tables with
rows and columns, making it easier to query and manipulate data.
3. The 1980s: The 1980s saw the rise of personal computers and the introduction of the first
DBMSs for the PC, such as dBase and FoxPro. These systems made it easier for small
businesses to manage their data.
4. The 1990s: The 1990s saw the introduction of client-server DBMSs, which allowed
multiple users to access and modify data stored in a centralized database server. This era
also saw the rise of open-source DBMSs, such as MySQL and PostgreSQL.
5. The 2000s: The 2000s saw the rise of cloud computing and the introduction of cloud-based
DBMSs, such as Amazon Relational Database Service (RDS) and Microsoft Azure
Database. These systems made it easier for businesses to store and manage data in the
cloud.
In conclusion, the history of DBMS has been marked by the introduction of new
technologies and innovations, which have made it easier for businesses to store, manage,
and analyze large amounts of data. Today, DBMSs are widely used by organizations of all
sizes to support a wide range of applications, from data management to business
intelligence and data analytics.

8. Three schema architecture/ View of Data/ Data Abstraction

A database system is a collection of interrelated data and software programs allowing users to
access and modify these data. A major purpose of a database system is to provide users with
an abstract view of the data. The system hides certain details of how the data are stored and
maintained.
Data Abstraction:

➢ Data abstraction is a fundamental concept in computer science and database
management. It refers to hiding the implementation details of a system or data structure
and presenting only the necessary information to the user.
➢ In database management, data abstraction is implemented through database schemas,
which define the structure of the data stored in the database. The schema defines the
tables, columns, and relationships between the tables, but it does not reveal how the
data is actually stored or processed. This abstraction layer provides users with a
simplified view of the data, making it easier to interact with the database and make
changes to the data without affecting the underlying implementation.

Three schema architecture

The three-schema architecture of a database refers to a design pattern that separates the
conceptual view, the external view, and the internal view of a database into three separate
schemas. These schemas provide a way to represent and understand the data structure in a
database in a clear and organized manner.

The three-schema architecture can be visualized as three layers: the external schema at the top,
the conceptual schema in the middle, and the internal schema at the bottom. The layers can be
considered a series of abstractions, with the external schema representing the highest level of
abstraction and the internal schema representing the lowest level of abstraction.

1. External schema/View Level: The external schema defines how different user groups see
the database. It describes the view of the data presented to end-users and application
programs and defines the level of abstraction and detail they see.
The highest level of abstraction, which provides a user-defined view of the data, hiding
any irrelevant or complex details and allowing the user to interact with the data in a
simplified way.
2. Conceptual schema/logical level:
• The next level of abstraction, which provides a view of the data as it is organized
into tables, columns, and relationships, without any concern for how it is physically
stored.
• This schema represents the overall logical view of the database and defines the
relationships between the different data entities.
• It describes the data structure in a high-level, abstract manner, independent of any
specific data storage technology or implementation details.
• The conceptual schema represents what data is stored in a database.
3. Internal schema/Physical level:
• The internal schema defines the physical storage structure of the database and
provides implementation details such as data storage format, access methods, and
file organization.
• It describes how the data is actually stored on disk, and it provides the information
required to access and manipulate the data.

Figure 3. Levels of Abstraction in a DBMS

The three-schema architecture is useful because it allows the database designer to clearly
separate the database's different aspects and consider each aspect independently. This leads to
a more organized and maintainable design and makes it easier to change the database design
as needed over time.

Each schema builds upon the one below it: the external schemas are derived from the
definitions and relationships in the conceptual schema, and the conceptual schema is in turn
implemented by the internal schema. The different schemas are connected by mappings,
which define the relationships between the different views of the data.
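These levels can be loosely illustrated in SQL. In the hedged sketch below, the hypothetical STUDENT table and STUDENT_NAMES view are illustrative names only: the CREATE TABLE statement corresponds to the conceptual schema and the CREATE VIEW statement to one external schema, while the internal schema (file layout, record formats) is largely managed by the DBMS itself.

    -- Conceptual (logical) level: the relation and its attributes.
    CREATE TABLE STUDENT (
        ROLL VARCHAR(10) PRIMARY KEY,
        NAME VARCHAR(50),
        AGE  INT
    );

    -- External (view) level: one user group sees only roll numbers and names.
    CREATE VIEW STUDENT_NAMES AS
        SELECT ROLL, NAME FROM STUDENT;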
What is Data Independence in DBMS?

The ability to modify the schema definition of a DBMS at one level, without affecting the
schema definition of the next higher level is called data independence.

There are two levels of data independence based on three levels of abstraction. These are as
follows −
• Physical Data Independence
• Logical Data Independence.

Physical Data Independence: Physical data independence refers to the ability to modify the
physical storage structures or devices without affecting the logical view or schema of the
database. It allows changes in the storage structures, such as reorganizing files, adding indexes,
or changing storage systems, without impacting the application programs that use the data. This
type of independence is primarily the concern of database administrators and system developers.

Logical Data Independence: Logical data independence refers to the ability to modify the
logical schema of the database without impacting the external schema or application programs.
It allows changes to the organization or structure of the data, such as adding or modifying
tables, without affecting the way data is accessed or manipulated by applications. Logical data
independence is essential for accommodating changes in business requirements or evolving
data models without disrupting existing applications.
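As a small sketch reusing the hypothetical STUDENT table from above: creating an index is a physical-level change that application queries survive unchanged, while adding a column is a logical-level change that existing queries can also survive (some systems require the keyword COLUMN in ALTER TABLE ... ADD).

    -- An application query written against the logical schema:
    SELECT NAME FROM STUDENT WHERE AGE > 18;

    -- Physical data independence: the DBA adds an index to speed this up;
    -- the query text above does not change.
    CREATE INDEX IDX_STUDENT_AGE ON STUDENT (AGE);

    -- Logical data independence: the logical schema is extended;
    -- the existing query above still runs unchanged.
    ALTER TABLE STUDENT ADD EMAIL VARCHAR(100);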

9. Database Languages
✓ Database languages, also known as query languages or data query languages, are a
classification of programming languages that developers use to define and access
databases, which are collections of organized data that users can access electronically.
✓ These languages allow users to complete tasks such as controlling access to data,
defining and updating data and searching for information within the database
management system (DBMS).
✓ A DBMS is a piece of technology that interacts with users, applications and the database
to record and analyze data while also manipulating the database to offer a way to store,
access and retrieve data.
✓ Different examples of database languages are available, including SQL, the standard
programming language for many databases. Database languages comprise four
sublanguages that serve different functions to execute tasks.
1. Data Definition Language (DDL)
➢ Data Definition Language (DDL) is a set of special commands that allows us to
define and modify the database's structure and metadata. These commands can be
used to create, modify, and delete the database structures such as schema, tables,
indexes, etc.
➢ Since DDL commands can alter the structure of the whole database and every
change implemented by a DDL command is auto-committed (the change is saved
permanently in the database), these commands are normally not used by an end-
user (someone who is accessing the database via an application).

Here are some tasks that come under DDL:

Create: It is used to create objects in the database.
Alter: It is used to alter the structure of the database.
Drop: It is used to delete objects from the database.
Truncate: It is used to remove all records from a table.
Rename: It is used to rename an object.
Comment: It is used to add comments to the data dictionary.

➢ These commands are used to update the database schema; that is why they come
under Data Definition Language.
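A minimal illustrative sketch of the common DDL commands, using a hypothetical STUDENT table (exact data types and options vary slightly between DBMSs):

    -- Create a schema object:
    CREATE TABLE STUDENT (
        ROLL VARCHAR(10) PRIMARY KEY,
        NAME VARCHAR(50) NOT NULL,
        AGE  INT
    );

    -- Alter its structure:
    ALTER TABLE STUDENT ADD HOSTEL VARCHAR(5);

    -- Remove all rows but keep the table definition:
    TRUNCATE TABLE STUDENT;

    -- Delete the object itself:
    DROP TABLE STUDENT;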
2. Data Manipulation Language (DML)
➢ Data Manipulation Language (DML) is a set of special commands allowing us
to access and manipulate data stored in schema objects. These commands are
used to perform operations such as insertion, deletion, update, and
retrieval of the data from the database.
➢ These commands deal with user requests, as they are responsible for all types of
data modification. The DML commands that deal with data retrieval are known
as Data Query Language (DQL).

There are basically two types:

• Procedural DMLs require a user to specify what data are needed and how to get those data.

• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.

Here are some tasks that come under DML:

Select: It is used to retrieve data from a database.
Insert: It is used to insert data into a table.
Update: It is used to update existing data within a table.
Delete: It is used to delete records from a table.
Merge: It performs an UPSERT operation, i.e., insert or update.
Call: It is used to call a PL/SQL or Java subprogram.
Explain Plan: It describes the execution plan chosen for a SQL statement.
Lock Table: It controls concurrency.
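A brief sketch of the core DML commands against the same hypothetical STUDENT table used earlier:

    INSERT INTO STUDENT (ROLL, NAME, AGE) VALUES ('ELE201', 'RAM', 19);

    UPDATE STUDENT SET AGE = 20 WHERE ROLL = 'ELE201';

    SELECT ROLL, NAME FROM STUDENT WHERE AGE >= 19;

    DELETE FROM STUDENT WHERE ROLL = 'ELE201';   -- omit WHERE to delete all rows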

3. Data Control Language (DCL)


➢ Data Control Language (DCL) is a set of special commands used to control the
user privileges in the database system. The user privileges include ALL,
CREATE, SELECT, INSERT, UPDATE, DELETE, EXECUTE, etc.
➢ We require data access permissions to execute any command or query in the
database system. This user access is controlled using the DCL statements. These
statements are used to grant and revoke user access to data or the database.

Here are some tasks that come under DCL:

Grant: It is used to give user access privileges to a database.
Revoke: It is used to take back permissions from the user.

The privileges that can be granted and revoked include:

CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.
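A minimal sketch, assuming a hypothetical user account named clerk1 and the STUDENT table from earlier:

    -- Give the user read and insert rights on one table:
    GRANT SELECT, INSERT ON STUDENT TO clerk1;

    -- Take the insert right back:
    REVOKE INSERT ON STUDENT FROM clerk1;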

4. Transaction Control Language (TCL)


➢ Transaction Control Language (TCL) is a set of special commands that deal with
the transactions within the database. A transaction is a collection of related tasks
that are treated as a single execution unit by the DBMS software. Hence,
transactions are responsible for the execution of different tasks within a database.
➢ The modifications performed using the DML commands are made permanent or rolled
back with the help of TCL commands. These commands are used to keep a check on
other commands and their effects on the database.
Here are some tasks that come under TCL:

Commit: It is used to permanently save all the modifications made by the DML
commands (all the transactions) in the database. Once issued, it cannot
be undone.
Rollback: It is used to undo the transactions that have not already been permanently saved
(or committed) to the database.
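A short sketch of a transaction, assuming a hypothetical ACCOUNT(ACC_NO, BALANCE) table; the statement that starts a transaction (e.g., BEGIN or START TRANSACTION) varies by DBMS:

    -- Transfer 500 from account A1 to account A2 as one unit of work:
    UPDATE ACCOUNT SET BALANCE = BALANCE - 500 WHERE ACC_NO = 'A1';
    UPDATE ACCOUNT SET BALANCE = BALANCE + 500 WHERE ACC_NO = 'A2';
    COMMIT;      -- both updates become permanent together

    -- If an error were detected before committing, undo everything instead:
    -- ROLLBACK;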

➢ In practice, a combination of DDL, DML, TCL and DCL is used as a single database
language.

10. Different types of Database Users

➢ Any person who uses a database and benefits from it is known as a database
user in DBMS. Database users can access the database and retrieve data from
it using the applications and interfaces provided by the Database Management
System (DBMS).
➢ Database users in DBMS can be categorized based on their interaction with the
databases. According to the tasks performed by the database users on the databases, we
can categorize them into actors on the scene and workers behind the scene.

I. Actors on the scene


1. Database Administrator (DBA):

✓ A Database Administrator (DBA) is a person or team who manages all resources in
the database system. In a database system, the primary resource is the database and
the secondary resources are the DBMS and related software. The DBA defines the
schema and also controls all three levels of the database.
✓ Database Administrators (DBAs) have full control of the database and are
sometimes known as the super-users of the database. They work alongside
developers in order to discuss and design the overall structure of the database,
including layouts, functionalities, workflow, etc.
The main responsibilities of a DBA are:
➢ Administering primary and secondary resources
➢ Authorizing access to the database
➢ Coordinating and monitoring the use of the database
➢ Acquiring hardware and software resources as needed.
a. Design of the conceptual and physical schemas: The DBA is responsible for
interacting with the system's users to understand what data will be stored in the DBMS
and how it will likely be used. The DBA creates the original schema by writing a set of
definitions, which is permanently stored in the 'Data Dictionary'.
b. Security and Authorization: The DBA is responsible for not permitting unauthorized
data access. The granting of different types of authorization allows the DBA to regulate
which parts of the database various users can access.
c. Storage structure and Access method definition: The DBA creates appropriate
storage structures and access methods by writing a set of definitions translated by the
DDL compiler.
d. Data Availability and Recovery from Failures: The DBA must ensure that users can
continue accessing as much uncorrupted data as possible if the system fails. The DBA
also works to restore the data to a consistent state.
e. Database Tuning: The DBA is responsible for modifying the database to ensure
adequate Performance as requirements change.
f. Integrity Constraint Specification: The integrity constraints are kept in a special
system structure that is consulted by the database system whenever an update takes place
in the system.
2. Database Designers :
✓ Database Designers are the users who design the database structure, including
tables, indexes, views, triggers, stored procedures, and constraints that are
usually enforced before the database is created or populated with data.
✓ He/she controls what data must be stored and how the data items are related.
✓ Database designers are responsible for understanding the requirements of
different user groups and then creating a design that satisfies the needs of all the
user groups.
✓ Database designers create views for different groups of users.
3. End users: People whose jobs require access to the database for querying,
updating, and generating reports. There are different types of end users, described below:
3.1 Casual Users / Temporary Users: Casual Users are the users who occasionally
use/access the database, but each time when they access the database, they require
new information, for example, a Middle or higher-level manager.
3.2 Naive / Parametric End Users: Parametric end users are unsophisticated users who
don't have any DBMS knowledge but frequently use database applications daily
to get the desired results. For example, Railway ticket-booking users are naive
users. Clerks in any bank are naive users because they don't have any DBMS
knowledge but still use the database and perform their given tasks.
3.3 Sophisticated Users: Sophisticated users can be engineers, scientists, or business
analysts familiar with the database. They can develop their own database
applications according to their requirements. They don't write the program code,
but they interact with the database by writing SQL queries directly through the
query processor.
3.4 Standalone users: People who maintain a personal database using ready-made
packages.
4. System Analyst: A system Analyst is a user who analyzes the requirements of
parametric end users. They check whether all the requirements of end users are
satisfied.

5. Application Programmers: Application programmers, sometimes simply called
software engineers, are the back-end programmers who write the code for the
application programs. They are computer professionals. These programs could be
written in programming languages such as Visual Basic, Developer, C, FORTRAN,
COBOL, etc. Application programmers design, debug, test, and maintain a set of
programs called "canned transactions" for the naive (parametric) users to interact with
the database.

II. Workers behind the scene :


1. System designers and implementers: design and implement DBMS modules and
interfaces as packages.
2. Tool developers: people who design and implement tools (software packages).
3. Operators and maintenance personnel: responsible for running and maintaining
the hardware and software.
11. Database Architecture/ DBMS Components

➢ The architecture of a database system is greatly influenced by the underlying
computer system on which the database system runs.
➢ Database systems can be centralized or client-server, where one server machine
executes work on behalf of multiple client machines.

Figure 11.1 Database System Architecture


➢ A database system is partitioned into modules that deal with each of the
responsibilities of the overall system.
➢ The functional components of a database system can be broadly divided into the
storage manager and the query processor components. The storage manager is
important because databases typically require much storage space. The query
processor is important because it helps the database system simplify and facilitate
access to data. It is the job of the database system to translate updates and queries
written in a nonprocedural language, at the logical level, into an efficient sequence
of operations at the physical level.
➢ Database applications are usually partitioned into two or three parts, as in Figure
11.2. In a two-tier architecture, the application resides at the client machine, where
it invokes database system functionality at the server machine through query
language statements. Application program interface standards like ODBC and
JDBC are used for interaction between the client and the server.
➢ In contrast, in a three-tier architecture, the client machine is merely a front end and
does not contain direct database calls. Instead, the client end communicates with
an application server through a forms interface.
The application server communicates with a database system to access data. The
application's business logic, which says what actions to carry out under what conditions,
is embedded in the application server instead of distributed across multiple clients. Three-
tier applications are more appropriate for large applications and applications that run on
the World Wide Web.

Figure 11.2 Two-tier and three-tier architectures

Functional components of the DBMS architecture :

Query Processor:

The query processor receives and interprets user queries written in a query language (such as
SQL) and converts them into an efficient execution plan. It analyzes the query, optimizes it for
efficient execution, and interacts with other components to retrieve and manipulate the
requested data.

The query processor components include:

• DDL interpreter: which interprets DDL statements and records the definitions in
the data dictionary.
• DML compiler: which translates DML statements in a query language into an
evaluation plan consisting of low-level instructions that the query evaluation engine
understands.
• A query can usually be translated into several alternative evaluation plans with the
same result. The DML compiler also performs query optimization, picking the
lowest cost evaluation plan from among the alternatives.
• Query evaluation engine: which executes low-level instructions generated by the
DML compiler.

Storage Manager:

• A storage manager is a program module that provides the interface between the
low-level data stored in the database and the application programs and queries
submitted to the system.
• The storage manager is responsible for the interaction with the file manager.
The raw data are stored on disk using the file system, which is usually
provided by a conventional operating system.
• The storage manager translates the various DML statements into low-level file-
system commands. Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database.

The storage manager components include:

• Authorization and integrity manager: it tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
• Transaction manager: it ensures that the database remains in a consistent
(correct) state despite system failures, and that concurrent transaction executions
proceed without conflicting.
• File manager: which manages the allocation of space on disk storage and the
data structures used to represent information stored on disk.
• Buffer manager: it is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory. The buffer manager
is a critical part of the database system, since it enables the database to handle
data sizes that are much larger than the size of main memory.
Disc Storage Component:

• The disc or data storage component is responsible for physically storing the
database on storage devices such as hard disks or solid-state drives.
• It manages the allocation and organization of data on the storage media, including
data files, indexes, and other structures. The data storage component ensures
efficient data retrieval and storage space management.

Data Dictionary :

We can define a data dictionary as a DBMS component that stores the definitions of data
characteristics and relationships. You may recall that such "data about data" are labeled
metadata.

The DBMS data dictionary provides the DBMS with its self-describing characteristic. In
effect, the data dictionary resembles an X-ray of the company's entire data set and is a
crucial element in the data administration function.
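Many relational systems expose the data dictionary through queryable catalog views. As an illustrative sketch, assuming a DBMS that supports the standard INFORMATION_SCHEMA (e.g., MySQL, PostgreSQL, SQL Server):

    -- Read the metadata recorded for a table in the catalog:
    SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'STUDENT';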
Important questions
1. a) Differentiate between a file processing system and a DBMS.
   b) Explain the architecture of a DBMS.

2. a) Explain the various database users and administrators in a DBMS.
   b) Explain the types of DBMS languages.

3. a) Explain the three-schema architecture of a DBMS.
   b) What are the functional components of a DBMS?

4. a) Explain the advantages of a DBMS.
   b) Explain the role of a database administrator.

5. a) List the differences between a file processing system and a DBMS.
   b) Define DBMS. Explain the various database languages with examples.

6. a) Explain the three levels of abstraction.
   b) Explain applications of DBMS and different data models.

7. a) Explain the DBMS system architecture.
   b) What are views and schemas? Explain in detail.

8. a) Explain DBMS applications and database design.
   b) Define DBMS. Explain the various database languages with examples.


Chapter 2: DATABASE DESIGN

Syllabus:
1. ER Model - Objects, Attributes and their Types
2. Entity Set and Relationship Set
3. Design Issues of the ER Model - Constraints
4. Keys - primary key, super key, candidate keys
5. Introduction to the Relational Model
6. Tabular Representation of Various ER Schemas
7. ER Diagram Notations
8. Goals of ER Diagrams
9. Weak Entity Sets
10. Views
1. Data Models
➢ Data models in DBMS help us understand the design at the conceptual, physical, and logical
levels. They provide a clear picture of the data, making it easier for developers to create a
physical database.
➢ Data models describe how the data is stored, accessed, and updated in a DBMS. A set of
symbols and text is used to represent them so that all the members of an organization can
understand how the data is organized.

Many types of data models are used in the industry.

Types of Data Models in DBMS:

1. Entity-Relationship Model (ER Model)


• An Entity-Relationship model is a high-level data model that describes the structure of the
database in a pictorial form known as an ER diagram. In simple words, an ER
diagram is used to easily represent the logical structure of the database.
• ER model develops a conceptual view of the data hence it can be used as a blueprint to
implement the database in the future.
• Developers can easily understand the system just by looking at the ER diagram. Let's first have a
look at the components of an ER diagram.
• Entity - Anything that has an independent existence about which we collect the data. They are
represented as rectangles in the ER diagram
• Entity Set - A set of the same type of entities is known as an entity set. For example - Set of
students studying in a college.
• Attributes - Properties that define entities are called attributes. They are represented by an
ellipse shape.
• Relationships - A relationship in DBMS is used to describe the association between
entities. They are represented as diamond or rhombus shapes in the ER diagram.
In the ER diagram represented above, we have three entities, professor, section, and course, and
the relationships among them.

2. Relational Model
This is the most widely accepted data model. In this model, the database is represented as
a collection of relations in the form of rows and columns of a two-dimensional table. Each row
is known as a tuple (a tuple contains all the data for an individual record), while each column
represents an attribute.

The table above shows a relation "STUDENT" with attributes such as Roll Number, Name, and
CGPA, and consists of 4 records or tuples.
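In SQL, such a relation can be declared directly. A small sketch, with attribute names and types approximated from the description above:

    CREATE TABLE STUDENT (
        ROLL_NUMBER INT PRIMARY KEY,
        NAME        VARCHAR(50),
        CGPA        DECIMAL(4, 2)
    );

    INSERT INTO STUDENT VALUES (1, 'RAM', 8.50);   -- one tuple (row) of the relation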
3. Object-Oriented Model
As suggested by its name, the object-oriented data model is a combination of object-oriented
programming and relational data model. In this data model, the data and their relationship are
represented in a single structure which is known as an object.
Since data is stored as objects, we can easily store audio, video, images, etc in the database
which was very difficult and inconvenient to do in the relational model. As shown in the image
below two objects are connected with each other through links.

In the above image, we have two objects, Employee and Department, in which all the data is
contained in a single unit (object). They are linked with each other as they share a common
attribute, Department_Id.
4. Semi-structured Model

The semi-structured model is a database model in which there is no separation between the data and
the schema, and the amount of structure used depends on the purpose. A semi-structured data model is a
generalized form of the relational model that allows data to be represented in a flexible way. We
cannot clearly differentiate between data and schema in this model because some entities may have
missing attributes while others may have extra attributes, which in turn makes it easy to update the
schema of the database.

For example, we can say a data model is semi-structured if some attributes store both atomic values
(values that can't be divided further, for example, Roll_No) and collections of values.

5. Hierarchical Model:

The hierarchical data model is one of the oldest, developed in the 1960s by IBM. In this data model,
the data is organized in a hierarchical tree-like structure. This data model can be easily visualized
because each record has one parent and many children (possibly 0). For example, a Vehicle database
could classify vehicles into two types, viz. two-wheelers and four-wheelers, each of which is then
further classified.

The main drawback is that only one-to-many relationships are possible under this model,
hence the hierarchical data model is very rarely used nowadays.

6. Network Model
In this model, data is organized in a graph-like structure where records can have multiple parent
and child records. The network model is a generalization of the hierarchical data
model: it allows many-to-many relationships, so a record
can also have more than one parent.
The network model in DBMS can be represented as a graph, and hence it replaces the
hierarchical tree with a graph in which object types are the nodes and relationships are the
edges.

In a college database, for example, all three departments can be linked with the director, which
was impossible in the hierarchical data model.
In the network model, there can be many possible paths to reach a node from the root node (College is
the root node in the above case). Therefore, the data can be accessed efficiently compared to the
hierarchical data model. But, on the other hand, the process of insertion and deletion of data is quite
complex.

2. ER -Model
➢ A Database is used to store our data, however, before making the actual database we should
specify all the requirements of the database in the form of a diagram known as an entity-
relationship diagram which makes it easier to create a database.
➢ To make an ER diagram, we first identify the different fields/entities that should be
present in the database, the attributes that describe these entities, and the types of
relationships that exist between them.
➢ All this information is collected and formulated in the form of a diagram known as an entity-
relationship diagram (ERD). The ER diagram is therefore a logical view of the database and
describes its structure.

What is ER Diagram in DBMS ?

➢ An Entity Relationship Diagram in DBMS is a blueprint of the database that can be later
implemented as an actual database in the form of tables. It is a "diagrammatic representation
of the database".
➢ An ER diagram is used for creating a systematic study of the data requirements and then
making a proper database. It is considered to be one of the best practices before implementing
an actual database.
➢ An ER diagram is made up of three components: an entity, its attributes (for example, color
and price are attributes of a Car entity), and the relationships that exist between these
entities. ER diagrams are used to show the relationships among the various entities in a
database.
I. Entity
➢ An entity is an object or thing that physically or conceptually exists in the real world.
For example, customer, car, pen, bank account etc.
➢ Entities are stored in the database, and they should be distinguishable, i.e., they should be easily
identifiable from the group.
➢ In ER modeling, a rectangle is used to represent an entity in an entity-relationship diagram.

Kinds of Entity:

There are two kinds of entities, which are as follows:

a. Tangible Entity:

It is an entity in DBMS, which is a physical object that we can touch or see. In simple words, an entity
that has a physical existence in the real world is called a tangible entity.

For example, colleges, bank lockers, mobiles, cars, watches, pens, paintings, etc., are tangible
entities: they correspond to physical objects that we can see and touch in the real world.

b. Intangible Entity:

It is an entity in DBMS, which is a non-physical object that we cannot see or touch. In simple words, an
entity that does not have any physical existence in the real world is known as an intangible entity.

For example, a bank account logically exists, but we cannot see or touch it.

1. Entity Type:

A collection of entities with general characteristics is known as an entity type. It refers to the category
that a particular entity belongs to.

➢ The category of a particular entity in the relation in RDBMS is called the entity type.
➢ It is represented by the name of the table and its schema

For example, a database of a corporate company has entity types such as employees, departments, etc.
In DBMS, every entity type contains a set of attributes that explain the entity.

The Employee entity type can have attributes such as name, age, address, phone number, and salary.

The Department entity type can have attributes such as name, number, and location in the department.

Kinds of Entity Type

There are two kinds of entity type, which are as follows:

a. Strong Entity Type:


➢ It is an entity that has its own existence and is independent. A strong entity will always
have a primary key.
➢ The entity relationship diagram represents a strong entity type with the help of a single
rectangle.

For example, "Customer" is an entity type with attributes such as ID, Name, Gender, and
Phone Number. Customer is a strong entity type because each customer has a unique ID.

b. Weak Entity Type:

➢ It is an entity that does not have an independent existence and relies on a strong entity
for its existence. Unlike a strong entity, a weak entity does not have a primary key of its
own; it instead has a partial key (discriminator).
➢ The entity relationship diagram represents the weak entity type using a double rectangle.
In the above example, "Address" is a weak entity type with attributes such as House No., City, Location,
and State.The relationship between a strong and a weak entity type is known as an identifying
relationship.

The entity-relationship diagram represents the relationship between the strong and the weak
entity type using a double diamond.

2. Entity Set

An entity set is a group of entities of the same entity type. For example: an entity set of
students, an entity set of motorbikes, an entity set of smartphones, an entity set of customers, etc.
II. Attribute
➢ An attribute is used to describe a property/characteristic of an entity. An ellipse is used
to represent an attribute in the ER diagram.

Suppose we have several students in a class; we can uniquely identify a student by the properties
associated with a student, like a student roll number, name, branch, age, etc. These properties that
describe an entity are attributes for students.

Types of attributes in DBMS

1. Single-valued Attribute

Single valued attributes are properties that have a single particular value for a particular entity. This
means that single-valued attributes don't have more than one value for an item.

For example, attributes like age, roll number, and gender associated with the student can only have a
single value for an individual student.
2. Multi-valued Attribute

Attributes that can have multiple values for a single entity are known as multi-valued attributes.
They are enclosed inside double ovals in the ER diagram.

For example, attributes like phone_number, email_id, and hobbies are categorized as multi-valued
attributes because a single student in the class can have more than one phone number, email_id, and
hobbies

3. Simple/Atomic Attribute

Such attributes that cannot be further sub-divided into several attributes are called simple attributes. For
example, attributes like the current class and branch of a student have an absolute value, and they cannot
be further sub-divided into other parts like we divided full_name into first_name, middle_name,
and last_name.

4. Composite Attribute

As the name suggests, those attributes composed of many other simple attributes are called composite
attributes in DBMS.

Properties like student_name can be further divided into sub-parts like first_name, middle_name,
and last_name where each sub-part can form an independent attribute. Similarly, address property can
be composed of state, city, country etc.
5. Stored attribute:

Stored attributes are those attributes that do not require any further computation, since their
values are stored directly in the database.

Example: DOB(Date of birth) is the stored attribute.

6. Derived attribute

Attributes whose value can be derived from other attributes of the entity are called derived
attributes in DBMS. Values of derived attributes can be obtained by processing other
attributes.

For example, the age of a student can easily be obtained by subtracting the date of birth
attribute from the current date. So, age is a derived attribute in the student table.
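As a quick illustration (a hedged sketch in MySQL syntax; the table and column names here are hypothetical), a derived attribute such as age is normally computed at query time from the stored dob value rather than stored itself:

-- Hypothetical student table that stores only the date of birth.
CREATE TABLE student_info (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(50),
    dob     DATE                 -- stored attribute
);

-- age is derived: computed from dob whenever it is needed.
SELECT roll_no,
       name,
       TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age
FROM student_info;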

7. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.
8. Complex Attribute

Nesting multi-valued and composite attributes together forms a complex attribute. Complex
attributes are formed by arbitrarily nesting composite and multi-valued attributes.

For example, a person owns more than one house, and each house has more than one phone number.
Then, the attribute phone is represented as a complex attribute.

III. Relationships
➢ Any association/connection between entities is called a relationship. In ER
modeling, a relationship is represented by a diamond shape.

➢ Example : An employee works for an organization. Here "works for" is a relation


between the entities employee and organization.
➢ However, in ER modeling, to connect a weak entity with others, you should use the weak
(identifying) relationship notation, drawn as a double diamond.
1. Degree of a Relationship

In DBMS, the degree of relationship is the number of entities that have taken part in the relationship.
We can analyze the E-R diagram and tell the degree of relationship between various entities. The degree
of relationship in DBMS can be categorized as Unary, Binary, Ternary or N-ary relationship.

there are 4 types of degrees of relationship based on the involved entities-

➢ Unary relationship
➢ Binary relationship
➢ Ternary relationship
➢ N-ary relationship
1. Unary relationship

As the name suggests, in a unary relationship, only one entity is involved. There is only a single entity
set, i.e., the association is between a single entity. Here, the degree of relationship is 1. The unary
relationship is also known as a recursive relationship.

Example: In the college presidential elections, a president is chosen among the students. He/she
leads the entire community and looks after the student-centric activities. Even though he/she is
the president, he/she is still a student. So only one entity set, the student, is involved.

2. Binary relationship
You must have rightly guessed that in a binary relationship, there are two entities involved.
There are two entities that are participating in the relationship. The degree of relationship is 2.
This is the most common type of relationship.
Example - Each Indian citizen has their own Aadhaar card, so we can say that Citizen and Aadhaar
Card are the two entities involved.

Citizen — has — Aadhaar Card

3. Ternary relationship
In a ternary relationship, there are three entities involved. There are three entities that are
participating in the relationship created. The degree of the relationship is 3. As in this
relationship, the number of entities increases, it is a bit difficult to convert it to a relational table.

Example: a ternary "teaches" relationship connecting the three entities Teacher, Subject, and Student.

4. N-ary Relationship
This is a generalized form of a relationship when the n number of total entities are participating.
When there is an n number of involved entities, we can say that they have an N-ary relationship.
As there are many entities involved, it is hard to convert them to a relational table. It is an
uncommon relationship that is not as widely used as a binary relationship. The degree of
relationship is 'n'
Example- Let us consider the example of a university. It has many entities like students,
teachers, affiliated colleges, courses, etc. Here, there are many entities associated with the
university

(Diagram: a single N-ary relationship connecting the entities students, teachers, courses, colleges, and location.)


Relationship constraints

Constraints are used for modeling limitations on the relations between entities.
There are two types of constraints on the Entity Relationship (ER) model −
I. Mapping cardinality or cardinality ratio.
II. Participation constraints.

i. Cardinality ratio

It is expressed as the number of entities to which another entity can be associated via a
relationship set.
For a binary relationship set between entity sets A and B, the mapping cardinality can be one of
the following −
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many

1. One-to-one relationship (1:1)

An entity in set A is associated with at most one entity in B, and an entity in B is associated
with at most one entity in A.

Employee 1 manages 1 Department

Here one employee manages one department only.

2. One-to-many

In one-to-many cardinality, an entity in set A can relate to many occurrences of an entity in set B.
In other words, an entity in set A can be linked with any number of entities in set B, while an
entity in set B can be linked with only one entity in set A.

Department 1 — has — M Employees

3. Many-to-one
In many-to-one cardinality, multiple occurrences of an entity in set A can relate to one
occurrence of an entity in set B. In other words, an entity in set A can be linked with at
most one entity in set B, while an entity in set B can be linked with any number of entities
in set A.

M Employees — work for — 1 Department

4. Many-to-Many (M:N)
In Many-to-Many cardinality, multiple occurrences in an entity set can relate to multiple
occurrences of another entity set. In other words, an entity in set A can link to any number of
entities in set B while an entity in set B can link to any number of entities in set A.
M Students — study — N Subjects
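As a hedged sketch (the table and column names are illustrative, not from a specific example above), an M:N cardinality such as "student studies subject" is usually realised in SQL with a third, junction table whose composite primary key references both entity tables:

CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50)
);

CREATE TABLE subject (
    subject_id INT PRIMARY KEY,
    title      VARCHAR(50)
);

-- Junction table: one row per (student, subject) pair gives M:N.
CREATE TABLE studies (
    student_id INT REFERENCES student(student_id),
    subject_id INT REFERENCES subject(subject_id),
    PRIMARY KEY (student_id, subject_id)   -- composite key
);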

ii. Participation Constraints

It specifies whether the existence of an entity depends on its being related to another entity.

Participation constraints are of two types, as mentioned below −


• Total participation
• Partial Participation

The participation constraints are explained in the diagram below -

a. Total participation
The participation of an entity set E in a relationship set R is said to be total if every entity
in E participates in at least one relationship in R.
For Example − Participation of loan in the relationship borrower is total participation.

b. Partial Participation
If only some of the entities in E participate in relationship R, then the participation of E in R is said to be
partial participation.
For example − Participation of customers in the relationship borrower is partial participation.
In the ER model, total participation is represented by a double line. In the example below,
department has total participation and employee has partial participation.

Partial participation                          Total participation
Employee — Manages — Department

(Mapping: employees E1–E4 to departments D1–D3. Every department is managed by some employee, so
department participates totally; employee E4 manages no department, so employee participates partially.)

Keys in DBMS

➢ Keys play an important role in the relational database.


➢ It is used to uniquely identify any record or row of data from the table. It is also used to
establish and identify relationships between tables.
➢ Keys in DBMS are introduced to avoid redundancy in data. A table represents a big box
with unique records, and keys help to identify those records efficiently.
➢ Keys in DBMS can be single or a group of attributes that uniquely identify the records.
Using all attributes as a key is less efficient than choosing the minimum combination of
attributes that provide the same result.

Different Types of Keys in the Relational Model


1. Super key
2. Candidate key
3. Primary key
4. Alternate key
5. Unique key
6. Foreign key
7. Composite key
1. Super key
➢ A super key is any set of attributes (single or multiple) that can
uniquely identify the records of the table.
➢ It supports NULL values

Example : Employee table

ID    Name    SSN/Adhar    Salary    Phone    Email
101   A       S1           10k       P1       E1@gmail.com
102   B       S2           5k        P2       E2@gmail.com
103   C       S3           -         P3       -
104   A       S4           10k       P4       E3@gmail.com
105   D       S5           6k        -        E4@gmail.com

From Employee { ID, Name, SSN,Salary,phone,Email} we can give superkeys like :

Superkeys : {ID}, {SSN}, {ID, Name}, {ID, SSN}, {ID, Phone}, {ID, Email}, {Name, Phone},
{SSN, Name}, {SSN, Email}, {SSN, Phone}, {SSN, Salary}, {Name, Email, Phone},
{Name, Salary, Phone}, {Name, Salary, Phone, Email},
{ID, SSN, Name, Phone, Salary, Email}, … etc.

2. Candidate Key: The minimal set of attributes that can uniquely identify a tuple is known as a
candidate key.
➢ It is a minimal super key: removing any attribute from it would destroy uniqueness.
➢ It is the minimal set of attributes that can uniquely identify a record.
➢ It must contain unique values.
➢ In SQL, a candidate key that is not chosen as the primary key may contain NULL values.
➢ Every table must have at least a single candidate key.
➢ A table can have multiple candidate keys but only one primary key

Candidate keys : {ID},{SSN},{Name,Phone},{Email}.

3. Primary key
There can be more than one candidate key in a relation, out of which one is chosen as the
primary key.

Primary key = (unique+No Null)

➢ it is a unique key.
➢ It can identify only one tuple (a record) at a time.
➢ It has no duplicate values, it has unique values.
➢ It cannot be NULL.
➢ A primary key is not necessarily a single column; more than one column together can
form the primary key of a table.

From the candidate keys of the employee table, {ID} has been chosen as the primary key.

Primary key : {ID} .

4. Alternate key :

The candidate key other than the primary key is called an alternate key

➢ All candidate keys which are not chosen as the primary key are called alternate keys.
➢ It is a secondary key.
➢ It may consist of a single field or of several fields together.

Alternate key : {SSN},{Name,Phone},{Email}.

5. Unique key

A unique key looks very similar to a primary key, but the two are not the same and have
different usages. Once we set an attribute as a unique key, its column values must be
unique, although one value may be NULL.

Unique key = (Unique+ Null values)


Unique key = {Name,Phone} ,{Email}
6. Composite key

Sometimes a table does not have any single column/attribute that uniquely identifies all of its
records. To uniquely identify rows of such a table, a combination of two or more
columns/attributes can be used. A poorly chosen combination can still give duplicate values in
rare cases, so we need to find the optimal set of attributes that uniquely identifies rows in
the table.

➢ It acts as a primary key if there is no primary key in a table


➢ Two or more attributes are used together to make a composite key.
➢ Different combinations of attributes may give different accuracy in terms of identifying
the rows uniquely.

Composite key = {Name, Phone}

7. Foreign Key
A foreign key is a column or set of columns in one table that refers to unique data values,
often the primary key, in another table. Foreign keys link together two or more tables in
a relational database.

➢ It is a key that acts as a primary key in one table and as a secondary key in another table.
➢ It combines two or more relations (tables) at a time.
➢ Foreign keys act as a cross-reference between the tables.

For example, in a student marks table, ID is a foreign key if it takes its reference from the
ID column of the student details table.
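To tie these key types together, here is a minimal sketch (a simplified version of the Employee example above; the column sizes are assumptions for illustration) of how the different keys are declared in SQL:

CREATE TABLE employee (
    id     INT PRIMARY KEY,         -- primary key: unique + NOT NULL
    ssn    VARCHAR(12) UNIQUE,      -- candidate key kept as an alternate/unique key
    name   VARCHAR(50),
    phone  VARCHAR(15),
    email  VARCHAR(50) UNIQUE,      -- another unique key (NULL allowed)
    salary INT,
    UNIQUE (name, phone)            -- composite key: two attributes together
);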

ER-Design Issues :

Users often misunderstand the elements and the design process of the ER diagram. This leads to
complex ER diagrams and to designs that do not meet the characteristics of the real-world
enterprise model.

Here, we will discuss the basic design issues of an ER database schema in the following points:

1. Use of Entity Set vs Attributes

In real-world situations, it is sometimes difficult to decide whether a property should be
modeled as an attribute or as an entity set.

For example, is Telephone_no an attribute of Employee or an entity set of its own?


2. Use of Entity sets vs. Relationship sets
Sometimes, an object modeled as an entity set is better expressed as a relationship set. Thus,
it is not always clear whether an object is best expressed by an entity set or a relationship set.

3. Binary vs. n-ary relationship sets


Relationships in databases are often binary. Some relationships that appear to be non-
binary could actually be better represented by several binary relationships
Example: consider students submitting assignments. A single ternary relationship "submit" can
connect student (Student_Id, Name), assignment (name, Due date) and submission (Sub_ID, date,
url, data). The same situation can instead be modeled with two binary relationships: student
—submit— submission, and submission —complete— assignment.

It is always possible to replace a non-binary relationship set by a number of distinct binary
relationship sets. For example, consider a ternary relationship R associated with three entity
sets A, B and C. We can replace the relationship set R by an entity set E and create three
relationship sets:
• RA, relating E and A
• RB, relating E and B
• RC, relating E and C
If the relationship set R had any attributes, these are assigned to entity set E. A special
identifying attribute is created for E.

4. Placement of Relationship Attributes


The cardinality ratio of a relationship can affect the placement of relationship
attributes:
• One-to-Many: Attributes of 1:M relationship set can be repositioned to only the
entity set on the many side of the relationship
• One-to-One: The relationship attribute can be associated with either one of the
participating entities
• Many-to-Many: Here, the relationship attributes cannot be moved to either entity set;
they remain with the relationship set (or with the entity set created to represent it)
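For instance (a hedged sketch with made-up names), the attribute since of a 1:M works_for relationship between a department and its employees can be repositioned to the many side, the employee table:

CREATE TABLE dept (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50)
);

-- The 1:M relationship and its attribute live on the many side.
CREATE TABLE emp (
    emp_id  INT PRIMARY KEY,
    name    VARCHAR(50),
    dept_id INT REFERENCES dept(dept_id),   -- the works_for relationship
    since   DATE                            -- relationship attribute, moved here
);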
ER diagram symbols & Notations

As we know, an entity-relationship diagram contains entities, attributes, and the relationships
between them. All these components are represented with certain symbols. Three main symbols are
used to make an ER diagram: the rectangle, the oval, and the diamond.

All other symbols used in an ER diagram are sub-elements based on these main elements.

Let us discuss each symbol and its usage in an ER diagram

Rectangle : It is used to represent the entities in an entity-relationship diagram

Ellipses/Oval : This symbol is used to represent the attributes in an entity-relationship diagram

Diamond: This symbol is used to represent the type of relationship that exists between the entities such
as one-to-one, many-to-one, and many-to-many

Lines : It links the entities to the relationship types whereas the attributes to the entity types.

Double Ellipses: It is used to represent a multivalued attribute

Dotted ellipse : It is used to represent a derived attribute

Double rectangle : It is used to represent a weak entity.

Double diamond : It is used to represent a weak relationship between entities

✓ To represent a primary key in an entity-relationship diagram, the attributes are underlined


Relational Model

The relational model for database management is an approach to logically represent and manage the
data stored in a database. In this model, the data is organized into a collection of two-dimensional
inter-related tables, also known as relations. Each relation is a collection of columns and rows,
where the columns represent the attributes of an entity and the rows (or tuples) represent the records.

The use of tables to store the data provided a straightforward, efficient, and flexible way to store and
access structured information. Because of this simplicity, this data model provides easy data sorting and
data access. Hence, it is used widely around the world for data storage and processing.

Consider a case where you wish to store the name, the CGPA attained, and the roll number of all the
students of a particular class. This structured data can easily be stored in a table, and we can
notice the following about such a relation:

✓ Any given row of the relation indicates a student, i.e., the row of the table describes a
real-world entity.
✓ The columns of the table indicate the attributes related to the entity; in this case, the roll
number, CGPA, and name of the student.
➢ A database implemented and organized in terms of the relational model is known as a relational
database management system (RDBMS). Hence, the relational model describes how data is
stored in relational databases.

Relational model components:

A relational database is based on the relational model and consists of various components,
including:
Relation : Two-dimensional table used to store a collection of data elements

Tuple : Row of the relation, depicting a real-world entity

Attribute/Field : Column of the relation, depicting properties that define the relation.

Attribute Domain : Set of pre-defined atomic values that an attribute can take i.e., it describes the legal
values that an attribute can take.

Degree : It is the total number of attributes present in the relation

Cardinality : It specifies the number of entities involved in the relation, i.e., the total
number of rows present in the relation.

Relational Schema : It is the logical blueprint of the relation i.e., it describes the design and the
structure of the relation. It contains the table name, its attributes, and their types:

TABLE_NAME(ATTRIBUTE_1 TYPE_1, ATTRIBUTE_2 TYPE_2, ...)

For our Student relation example, the relational schema will be:

STUDENT(ROLL_NUMBER INTEGER, NAME VARCHAR(20), CGPA FLOAT)
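As a hedged sketch, the same relational schema can be written directly as a SQL table definition:

-- Direct SQL translation of the STUDENT relational schema above.
CREATE TABLE STUDENT (
    ROLL_NUMBER INTEGER,
    NAME        VARCHAR(20),
    CGPA        FLOAT
);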

Relational Instance : It is the collection of records present in the relation at a given time

Relation Key : It is an attribute or a group of attributes that can be used to uniquely identify
an entity in a table or to determine the relationship between two tables. Relation keys can be of
different types:

1. Super key
2. Candidate key
3. Primary key
4. Alternate key
5. Unique key
6. Composite key
7. Foreign key

Constraints on Relational database model


These are the restrictions or sets of rules imposed on the database contents. They validate the
quality of the database and ensure that operations such as data insertion and updation are
performed without affecting the integrity of the data. They protect the database against threats
and damage. Constraints on the relational database are mainly of 4 types:

1. Domain constraints
2. Key constraints or Uniqueness Constraints
3. Entity Integrity constraints
4. Referential integrity constraints

1. Domain Constraints
➢ Every domain must contain atomic values(smallest indivisible units) which means
composite and multi-valued attributes are not allowed.
➢ We perform a datatype check here, which means when we assign a data type to a column
we limit the values that it can contain. Eg. If we assign the datatype of attribute age as int,
we can’t give it values other than int datatype.

Explanation: If a relation stores Name as a composite attribute and Phone as a multi-valued
attribute, it violates the domain constraint.

2. Key Constraints or Uniqueness Constraints

➢ These are called uniqueness constraints since it ensures that every tuple in the relation should be
unique.
➢ A relation can have multiple candidate keys (minimal super keys), out of which we choose one
as the primary key. There is no restriction on which candidate key is chosen, but it is
suggested to pick the candidate key with the fewest attributes.
➢ Null values are not allowed in the primary key, hence Not Null constraint is also part of the key
constraint.

Explanation: If EID is the primary key of a table and two tuples have the same EID value
(say 01), the key constraint is violated.
3. Entity Integrity Constraints:

➢ Entity Integrity constraints say that no primary key can take a NULL value, since using
the primary key we identify each tuple uniquely in a relation.

Explanation: If EID is made the primary key of a relation, it cannot take NULL values; a tuple
whose EID is NULL violates the entity integrity constraint.
4. Referential Integrity Constraints

➢ The Referential integrity constraint is specified between two relations or tables and used to
maintain the consistency among the tuples in two relations.
➢ This constraint is enforced through a foreign key, when an attribute in the foreign key of
relation R1 has the same domain(s) as the primary key of relation R2, then the foreign key of
R1 is said to reference or refer to the primary key of relation R2.
➢ The values of the foreign key in a tuple of relation R1 can either take the values of the primary
key for some tuple in relation R2, or can take NULL values, but can’t be empty.
Explanation: Suppose DNO in Table 1 is a foreign key referencing DNO, the primary key of Table 2.
A tuple with DNO = 22 in Table 1 is not allowed if DNO = 22 is not defined in the primary key
column of Table 2; such a tuple violates referential integrity.
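A minimal sketch (hypothetical table and column names, modelled on the DNO example above) of how these constraints are declared in SQL:

-- Referenced relation: DNO is the primary key here.
CREATE TABLE department (
    dno   INT PRIMARY KEY,
    dname VARCHAR(50)
);

-- Referencing relation: DNO is a foreign key into department.
CREATE TABLE employee (
    eid  INT PRIMARY KEY,                 -- entity integrity: no NULL primary key
    name VARCHAR(50) NOT NULL,
    age  INT CHECK (age >= 18),           -- a domain-style restriction on legal values
    dno  INT REFERENCES department(dno)   -- referential integrity
);

-- Rejected by the DBMS: no department with dno = 22 exists.
-- INSERT INTO employee VALUES (1, 'A', 25, 22);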

Database Schema

When we talk about a database, we must differentiate between the database schema, which is the
logical design of the database, and the database instance, which is a snapshot of the data in the
database at a given instant in time. The concept of a relation corresponds to the
programming-language notion of a variable, while the concept of a relation schema corresponds
to the programming-language notion of a type definition.
In general, a relation schema consists of a list of attributes and their corresponding domains.
The concept of a relation instance corresponds to the programming-language notion of the value
of a variable. The value of a given variable may change with time;

Figure 1 : The department relation.


similarly the contents of a relation instance may change with time as the relation is updated. In
contrast, the schema of a relation does not generally change. Although it is important to know the
difference between a relation schema and a relation instance, we often use the same name, such as
instructor, to refer to both the schema and the instance. Where required, we explicitly refer to the
schema or to the instance, for example “the instructor schema,” or “an instance of the instructor
relation.” However, where it is clear whether we mean the schema or the instance, we simply use
the relation name. Consider the department relation of Figure 1.
The schema for that relation is department (dept name, building, budget)

Let us continue with our university database example. Each course in a university may be
offered multiple times, across different semesters, or even within a semester. We need a
relation to describe each individual offering, or section, of the class.
The schema is section (course id, sec id, semester, year, building, room number, time
slot id)

Figure 2 shows a sample instance of the section relation. We need a relation to describe the association
between instructors and the class sections that they teach. The relation schema to describe this
association is

teaches (ID, course id, sec id, semester, year)

Figure 2. The section relation

Figure 3 shows a sample instance of the teaches relation. As you can imagine, there are many more
relations maintained in a real university database. In addition to those relations we have listed
already, instructor, department, course, section, prereq, and teaches, we use the following
relations in this text:

Figure 3. The teaches relation


• student (ID, name, dept name, tot cred)
• advisor (s id, i id)
• takes (ID, course id, sec id, semester, year, grade)
• classroom (building, room number, capacity)
• time slot (time slot id, day, start time, end time)
Schema Diagrams

➢ A database schema, along with primary key and foreign key dependencies, can be
depicted by schema diagrams. The figure below shows the schema diagram for our
university organization. Each relation appears as a box, with the relation name at
the top in blue, and the attributes listed inside the box.
➢ Primary key attributes are shown underlined.
➢ Foreign key dependencies appear as arrows from the foreign key attributes of the
referencing relation to the primary key of the referenced relation.

Figure : schema diagram of university database.

Referential integrity constraints other than foreign key constraints are not shown explicitly in
schema diagrams. Entity-relationship diagrams let us represent several kinds of constraints,
including general referential integrity constraints. Many database systems provide design tools
with a graphical user interface for creating schema diagrams.

Weak entity set in ER Diagram:

➢ An entity type should have a key attribute which uniquely identifies each entity in the
entity set, but there exist some entity types for which a key attribute can't be defined.
These are called weak entity types.
➢ The entity sets which do not have sufficient attributes to form a primary key are
known as weak entity sets, and the entity sets which have a primary key are known as
strong entity sets.
➢ As weak entities do not have a primary key of their own, they cannot be identified on
their own, so they depend on some other entity (known as the owner entity). Weak entities
have a total participation constraint (existence dependency) in their identifying
relationship with the owner entity. Weak entity types have partial keys: sets of attributes
with the help of which the tuples of the weak entities can be distinguished and identified.
Note – a weak entity always has total participation, but a strong entity may not have
total participation.
➢ A weak entity is represented by a double rectangle. The relationship between a strong and
a weak entity is represented by a double diamond. Partial key attributes are represented
with dotted lines.

Example-1: In the below ER Diagram, ‘Payment’ is the weak entity. ‘Loan Payment’ is
the identifying relationship and ‘Payment Number’ is the partial key. Primary Key of the
Loan along with the partial key would be used to identify the records.
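A hedged SQL sketch of this Loan–Payment example (column names are illustrative): the weak entity's table uses the owner's primary key together with the partial key as its composite primary key:

CREATE TABLE loan (
    loan_id INT PRIMARY KEY,
    amount  INT
);

-- Weak entity: a payment is identified by (loan_id, payment_number).
CREATE TABLE payment (
    loan_id        INT REFERENCES loan(loan_id),   -- owner's key
    payment_number INT,                            -- partial key (discriminator)
    pay_date       DATE,
    pay_amount     INT,
    PRIMARY KEY (loan_id, payment_number)          -- owner key + partial key
);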

Views in SQL

➢ Views in SQL are considered as a virtual table. A view also contains rows and
columns.
➢ To create the view, we can select the fields from one or more tables present in the
database.
➢ A view can either have specific rows based on certain conditions or all the rows of
a table.
1. Creating view
A view can be created using the CREATE VIEW statement. We can create a view
from a single table or multiple tables.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;

Sample tables.

Creating View from a single table

In this example, we create a view named DetailsView from the table Student_Details.

Query:

CREATE VIEW DetailsView AS

SELECT NAME, ADDRESS

FROM Student_Details

WHERE STU_ID < 4;

Just like table query, we can query the view to view the data.

SELECT * FROM DetailsView;


Creating View from multiple tables

A view over multiple tables can be created by simply including multiple tables in the SELECT
statement.

In the given example, a view named MarksView is created from the two tables Student_Details
and Student_Marks.

Query:

CREATE VIEW MarksView AS


SELECT Student_Details.NAME, Student_Details.ADDRESS, Student_Marks.MARKS
FROM Student_Details, Student_Marks
WHERE Student_Details.NAME = Student_Marks.NAME;
To display the data of view MarksView:
SELECT * FROM MarksView;

2. Deleting View
A view can be deleted using the Drop View statement.
Syntax
DROP VIEW view_name;
Example: If we want to delete the View MarksView, we can do this as:
DROP VIEW MarksView;
3. Updating View
We can use the CREATE OR REPLACE VIEW statement to add or remove fields from
a view
Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

For example, if we want to update the view MarksView and add the field AGE to this view
from the Student_Marks table, we can do this as:

CREATE OR REPLACE VIEW MarksView AS

SELECT Student_Details.NAME, Student_Details.ADDRESS,
Student_Marks.MARKS, Student_Marks.AGE
FROM Student_Details, Student_Marks
WHERE Student_Details.NAME = Student_Marks.NAME;

If we fetch all the data from MarksView now as

SELECT * FROM MarksView;


Example ER-Diagrams
1. University database

2. Library management database


Previously asked questions
1. a. Explain the basic concepts of E-R model and their symbols.

b. Differentiate between strong entity and weak entity with examples

2. a. Mention the features of relational model.

b. Explain the different types of keys in DBMS.

3. a. Explain the basic concepts of relational model.

b. What are the key constraints? Explain the design issues of ER model

4. a. Draw an ER-diagram for a weak entity set and a strong entity set with an example.

b. Explain the E-R diagram components and notations with their extended features?

5. a. Explain the E-R diagram components and notations with their extended features?

b. Explain the following with examples. a) Primary Key. b) foreign key c) Candidate
key

6. a. Mention the different types of attributes with examples.

b. Explain the features of E-R model and its design issue constraints.

7. a. Explain the basic concepts of relational model.

b. What are the key constraints? Explain the design issues of ER model.

8. a. Draw an ER-diagram for a weak entity set and a strong entity set with an example.

b. Explain the E-R diagram components and notations with their extended features?
Chapter 3: SQL (structured query language)

Syllabus :

1. Overview
2. The form of a basic SQL query: UNION, INTERSECT & EXCEPT
3. Join operations :Equi join and Non Equi join
4. Nested Queries : correlated and uncorrelated
5. Aggregate functions
6. Null values
7. Views
8. Triggers
1. Overview :

What is SQL?

➢ SQL is a language to operate databases; it includes Database Creation, Database


Deletion, Fetching Data Rows, Modifying & Deleting Data rows, etc.
➢ SQL stands for Structured Query Language, a computer language for storing,
manipulating and retrieving data in a relational database. SQL was developed in the
1970s by IBM Computer Scientists and became a standard of the American National
Standards Institute (ANSI) in 1986, and the International Organization for
Standardization (ISO) in 1987.
➢ SQL is the standard language to communicate with Relational Database Systems. All
the Relational Database Management Systems (RDMS) like MySQL, MS Access,
Oracle, Sybase, Informix, Postgres and SQL Server use SQL as their Standard Database
Language.

Why SQL?

SQL is widely popular because it offers the following advantages −

➢ Allows users to access data in the relational database management systems.


➢ Allows users to describe the data.
➢ Allows users to define the data in a database and manipulate that data.
➢ Allows to embed within other languages using SQL modules, libraries &
pre-compilers.
➢ Allows users to create and drop databases and tables.
➢ Allows users to create view, stored procedure, functions in a database.
➢ Allows users to set permissions on tables, procedures and views.
A Brief History of SQL
➢ 1970 − Dr. Edgar F. "Ted" Codd of IBM is known as the father of relational
databases. He described a relational model for databases.
➢ 1974 − Structured Query Language (SQL) appeared.
➢ 1978 − IBM worked to develop Codd's ideas and released a product named
System/R.
➢ 1986 − SQL was standardized by ANSI. Earlier, the first commercial relational
database had been released by Relational Software, which later came to be
known as Oracle.
➢ 1987 − SQL became the part of the International Organization for
Standardization (ISO).
How SQL Works?

When executing an SQL command for any RDBMS, the system determines the best way to
carry out your request, and the SQL engine figures out how to interpret the task.

There are various components included in this process. These components are −

• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.

A classic query engine handles all the non-SQL queries, but an SQL query engine will not handle
logical files. The following is a simple diagram of the SQL architecture.

SQL basic commands: refer to Chapter 1, Database Languages.


2. The form of a basic SQL query: UNION, INTERSECT & EXCEPT

SQL Set Operators :

In SQL, set operators are used to combine the results of two or more SELECT statements into
a single result set. The commonly used set operators are UNION, INTERSECT, and EXCEPT
(or MINUS in some databases). These operators allow you to perform operations on sets of
rows rather than individual rows. Here's an overview of SQL set operators.

The set operators are

1. UNION, UNION ALL


2. INTERSECT
3. EXCEPT

Basic Rules on Set Operations:

➢ The result sets (tables) of all the queries must be the same number of columns.
➢ In all result sets, the data type of each column must be well-matched and compatible
with the data type of its corresponding columns in another result set.
➢ The ORDER BY clause can be applied to the last query to sort the result.

Example :

Create two tables speakers and authors :

(Two sample tables, Speakers and Authors, each containing a name column, are assumed in the examples below.)

UNION

➢ Union combines the results of two queries into a single result set of all matching
rows.
➢ Both queries must have the same number of columns and compatible data types to
unite.
➢ All duplicate records are removed automatically unless UNION ALL is used.
➢ Generally, it can be useful in applications where tables are not perfectly normalized,
for example, a data warehouse application.

Syntax:

SELECT column_name(s) FROM table1


UNION
SELECT column_name(s) FROM table2;

Example-1:

You want to invite all the Speakers and Authors for the annual conference. Hence, how will
you prepare the invitation list?

Syntax:

select name from Speakers

union

select name from Authors

order by name

output :

As you can see here, the default sort order is ascending, and ORDER BY has to be used in the
last query only, rather than in both queries.

UNION ALL

It will not remove duplicate records. It can be faster than UNION.


Example-2

You want to give a prize to all the Speakers and Authors at the annual conference. Hence,
how will you prepare the prize list?

Syntax:

select name, 'Speaker' as 'Role' from Speakers

union all

select name, 'Author' as 'Role' from Authors

order by name

output

INTERSECT
It takes the results of two queries and returns only those rows which are common to both
result sets. It removes duplicate records from the final result set.
Syntax
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;

Example-3

You want the list of people who are Speakers and they are also Authors. Hence, how will you
prepare such a list?

Syntax:

select name from Speakers

intersect
select name from Authors

order by name

Output:

EXCEPT :
It takes the distinct rows of the first query and returns only those rows which do not appear
in the second result set.
Syntax :
SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;

Example-4
You want the list of people who are only Speakers and they are not Authors. Hence, how will
you prepare such a list?

Syntax :
select name from Speakers

except

select name from Authors

order by name

Output :

Example-5
You want the list of people who are only Authors and they are not Speakers. Hence, how will
you prepare such a list?
Syntax :
select name from Authors

except

select name from Speakers

order by name

output :

➢ UNION combines results from both tables.


➢ UNION ALL combines two or more result sets into a single set, including all
duplicate rows.
➢ INTERSECT takes the rows from both the result sets which are common in both.
➢ EXCEPT returns the rows that appear in the first result set but not in the second
result set.
3. Join operations

A SQL Join statement combines data or rows from two or more tables based on a common
field between them.

➢ SQL joins are mostly used when a user is trying to extract data from multiple
tables (which have one-to-many or many-to-many relationships with each other) at
one time.
➢ Large databases are often prone to data redundancy, i.e., the creation of repetitive
data anomalies by insertion, deletion, and updation. But by using SQL Joins, we
promote database normalization, which reduces data redundancy and eliminates
redundant data.

➢ In SQL, joins are mainly of two types: INNER JOINS and OUTER JOINS. The
classification is shown below.

In SQL, joins can be classified as either equi-joins or non-equi-joins based on the


conditions used to match rows between tables.

JOINS
• INNER JOIN
  – Theta join (non-equi join): the join condition uses operators other than =
  – Natural join (equi-join): the join condition uses =
• OUTER JOIN
  – LEFT JOIN
  – RIGHT JOIN
  – FULL OUTER JOIN

Equi-Join: An equi-join is a type of join where the matching condition is based on equality
between columns from different tables. It matches rows where the specified columns have the
same values.

➢ the Equi Join in SQL returns only the data in all the tables we are comparing based on
the common column field. It does not display null or unmatchable data.
➢ The equality operator in the Equi Join operation is used to refer to the equality in
the WHERE clause. However, it returns the same result when we use
the JOIN keyword with the ON clause along with column names and their respective
tables
Syntax:

SELECT *

FROM TableName1, TableName2

WHERE TableName1.ColumnName = TableName2.ColumnName;

OR

SELECT *

FROM TableName1

JOIN TableName2

ON TableName1.ColumnName = TableName2.ColumnName;

➢ An equi join is any JOIN operation that uses only the equals sign. If a query
has more than one join condition, out of which one condition has an equals sign
and another doesn't, then that query is considered a non-equi join in SQL.

Non-equi join :

➢ A non-equi join, also known as a range join or a theta join, is a type of join operation
where the joining condition involves operators other than equality, such as greater
than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), or
not equal to (!= or <>).
➢ Non-Equi Join is also a type of INNER Join in which we need to retrieve data from
multiple tables. In a non-equi join, rows are matched based on a range of values
rather than a direct equality. Non-equi joins are less common and are often used to
solve specific data analysis problems.

However, we use the Non-Equi joins for the below-mentioned reasons-


➢ Retrieving data matching in a range of values.
➢ Checking for duplicate data between tables.
➢ For calculating totals.
Here's an example

SELECT *
FROM TableName1, TableName2
WHERE TableName1.columnName [> | < | >= | <= | != | BETWEEN ] table_name2.column;
EQUI-Join Example:

Suppose we have two tables, namely state and city, which contain the name of the states and
the name of the cities, respectively. In this example, we will map the cities with the states in
which they are present.

Table state:

State_ID    State_Name
1           Uttar Pradesh
2           Uttarakhand
3           Madhya Pradesh

Table city:

City_ID    City_Name
1          Lucknow
1          Gorakhpur
1          Noida
2          Dehradun
2          Rishikesh
3          Gwalior

Now, if we execute a query of Equi-join using the equality operation and the WHERE clause,
then

SELECT *
FROM state, city
WHERE state.State_Id = city.City_Id;
Output

State_ID    State_Name        City_ID    City_Name
1           Uttar Pradesh     1          Lucknow
1           Uttar Pradesh     1          Gorakhpur
1           Uttar Pradesh     1          Noida
2           Uttarakhand       2          Dehradun
2           Uttarakhand       2          Rishikesh
3           Madhya Pradesh    3          Gwalior

Non Equi join Example:

➢ We take two tables, test1 and test2.

Table test1:

S_NO    Name
20      Amith
30      Ankush
10      Akash
50      Jatin

Table test2:

S_NO    Name
80      Tarun
60      Mitali
10      Akash
50      Jatin
5       Aman

Now, if we execute a query of Non-Equi-join using any operator other than the equality
operator, such as >(greater than) with the WHERE clause –

SELECT *
FROM test1, test2
WHERE test1.S_NO > test2.S_NO;
Output

S_NO    Name      S_NO    Name
20      Amith     10      Akash
30      Ankush    10      Akash
50      Jatin     10      Akash
10      Akash     5       Aman
20      Amith     5       Aman
30      Ankush    5       Aman
50      Jatin     5       Aman

INNER JOIN in SQL


An inner join returns only the rows that have matching values in both tables being joined. It
combines rows from two tables where the join condition is satisfied. The syntax for an inner
join is as follows:
SELECT *

FROM table1

INNER JOIN table2

ON table1.column = table2.column;

NATURAL JOIN

➢ SQL Natural Join is a type of Inner join based on the condition that columns having
the same name and datatype are present in both the tables to be joined.

SELECT *

FROM table-1

NATURAL JOIN table-2;
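For illustration (hypothetical tables, not from the examples below): if dept(dept_id, dept_name) and emp(emp_id, name, dept_id) share only the column name dept_id, then

SELECT *
FROM emp
NATURAL JOIN dept;

behaves like an inner join on emp.dept_id = dept.dept_id, and the shared column appears only once in the result.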

OUTER JOINS in SQL

SQL outer joins return both matched and unmatched rows of data, depending on the type of outer
join. Outer joins are sub-divided into the following types:

➢ Left Outer Join


➢ Right Outer Join
➢ Full Outer Join

Left Join (or Left Outer Join):

➢ A left join returns all the rows from the left table and the matching rows from the
right table. If there are no matching rows in the right table, NULL values are
included for the columns of the right table. The syntax for a left join is as follows:
SELECT *
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;

Right Join (or Right Outer Join):

A right join returns all the rows from the right table and the matching rows from the left table.
If there are no matching rows in the left table, NULL values are included for the columns of
the left table. The syntax for a right join is as follows:

SELECT *

FROM table1

RIGHT JOIN table2

ON table1.column = table2.column;

Full Join (or Full Outer Join):

A full join returns all the rows from both the left and right tables. If there are no matching rows
in either table, NULL values are included for the columns of the non-matching table.

SELECT *

FROM table1

FULL JOIN table2


ON table1.column = table2.column;

Theta join :

In SQL, a theta join, also known as a non-equi join or a range join, is a type of join operation
where the joining condition involves comparison operators other than equality (=).

The syntax for a theta join typically involves using the JOIN keyword followed by the joined
tables and the join condition with the desired comparison operator(s). Here's an example:

SELECT *

FROM table1

JOIN table2

ON table1.column1 < table2.column2;

Examples :

Customers:

Id    Cust_Name    Address
1     Ram          A
2     Raj          B
3     Rani         C

Orders:

Or_Id    Amount    Cust_id
601      1000      1
602      2000      4
603      3000      2
Inner join :

SELECT Orders.Or_Id ,Customers.Cust_Name


FROM Orders
INNER JOIN Customers
ON Orders.Cust_Id=Customers.Id;
Output:
Or_Id Cust_Name
601 Ram
603 Raj

Left join

SELECT Orders.Or_Id ,Customers.Cust_Name


FROM Orders
LEFT JOIN Customers
ON Orders.Cust_Id=Customers.Id;
Output:
Or_Id Cust_Name
601 Ram
602 NULL
603 Raj

Right Join

SELECT Orders.Or_Id ,Customers.Cust_Name


FROM Orders
RIGHT JOIN Customers
ON Orders.Cust_Id=Customers.Id;
Output:
Or_Id Cust_Name
601 Ram
603 Raj
NULL Rani
Full join

SELECT Orders.Or_Id ,Customers.Cust_Name


FROM Orders
FULL JOIN Customers
ON Orders.Cust_Id=Customers.Id;
Output:
Or_Id Cust_Name
601 Ram
602 NULL
603 Raj
NULL Rani

4. Nested Queries:
➢ A nested query in SQL contains a query inside another query. The outer query will
use the result of the inner query. For instance, a nested query can have
two SELECT statements, one on the inner query and the other on the outer query.

Types of Nested Queries in SQL :


Nested queries in SQL can be classified into two different types:

➢ Un-correlated Nested Queries (Independent Nested Queries)


➢ Co-related Nested Queries

Independent Nested Queries:

In independent nested queries, the execution order is from the innermost query to the outer
query. An outer query won't be executed until its inner query completes its execution. The outer
query uses the result of the inner query. Operators such as IN, NOT IN, ALL, and ANY are
used to write independent nested queries.
➢ If a subquery does not use any references from the outer query, it is called an
independent subquery.
➢ The IN operator checks if a column value in the outer query's result is present in
the inner query's result. The final result will have rows that satisfy the IN condition.
➢ The NOT IN operator checks if a column value in the outer query's result is not
present in the inner query's result. The final result will have rows that satisfy
the NOT IN condition.
➢ The ALL operator compares a value of the outer query's result with all the
values of the inner query's result and returns the row if it matches all the values.
➢ The ANY operator compares a value of the outer query's result with all the inner
query's result values and returns the row if there is a match with any value.

Co-related Nested Queries

In co-related nested queries, the inner query uses the values from the outer query to execute
the inner query for every row processed by the outer query. The co-related nested queries run
slowly because the inner query is executed for every row of the outer query's result.

➢ Outer query needs to be executed before the inner query.


➢ The inner query is executed separately for each row of the outer query.
➢ There are different types of set comparison operators like EXISTS, IN and UNIQUE.
SQL also supports op ANY and op ALL, where op is an arithmetic comparison
operator such as <, <=, =, <>, >=, >. SOME is also a set comparison operator,
and it behaves the same as ANY.
➢ A correlated subquery is evaluated once for each row processed by the parent
statement. The parent statement can be a SELECT, UPDATE,
or DELETE statement.

Syntax:
SELECT column1, column2, ....
FROM table1 outer
WHERE column1 operator
(SELECT column1, column2
FROM table2
WHERE expr1 = outer.expr2);

Examples :

We will use the Employees and Awards table below to understand independent and co-related
nested queries. We will be using Oracle SQL syntax in our queries.

➢ Let's create the Employees and Awards tables:


Creating Employees Table:

CREATE TABLE employees (

id NUMBER PRIMARY KEY,

name VARCHAR2(100) NOT NULL,

salary NUMBER NOT NULL,

role VARCHAR2(100) NOT NULL

);

Creating Awards Table:

CREATE TABLE awards(

id NUMBER PRIMARY KEY,

employee_id NUMBER NOT NULL,

award_date DATE NOT NULL

);

Let's add data to the tables created above:

INSERT INTO employees VALUES (1, 'Augustine Hammond', 10000, 'Developer');

INSERT INTO employees VALUES (2, 'Perice Mundford', 10000, 'Manager');

INSERT INTO employees VALUES (3, 'Cassy Delafoy', 30000, 'Developer');

INSERT INTO employees VALUES (4, 'Garwood Saffen', 40000, 'Manager');


INSERT INTO employees VALUES (5, 'Faydra Beaves', 50000, 'Developer');

INSERT INTO awards VALUES(1, 1, TO_DATE('2022-04-01', 'YYYY-MM-DD'));

INSERT INTO awards VALUES(2, 3, TO_DATE('2022-05-01', 'YYYY-MM-DD'));

Example 1: IN

➢ Select all employees who won an award


Syntax:

SELECT id, name FROM employees

WHERE id IN (SELECT employee_id FROM awards);


Output:

Example 2: NOT IN

Select all employees who never won an award.

SELECT id, name FROM employees

WHERE id NOT IN (SELECT employee_id FROM awards);

Output :

Example 3: ALL

➢ Select all Developers who earn more than all the Managers

SELECT * FROM employees

WHERE role = 'Developer'

AND salary > ALL (

SELECT salary FROM employees WHERE role = 'Manager'

);
Output :

Example 4: ANY

➢ Select all Developers who earn more than any Manager

SELECT * FROM employees


WHERE role = 'Developer'
AND salary > ANY (
SELECT salary FROM employees WHERE role = 'Manager'
);

Output :

Co-related Nested Queries

➢ Select all employees whose salary is above the average salary of employees in their
role.

SELECT * FROM employees emp1

WHERE salary > (

SELECT AVG(salary)

FROM employees emp2

WHERE emp1.role = emp2.role

);
Output :

5. Aggregate Functions in SQL

An aggregate function in SQL performs a calculation on multiple values and returns a single
value. SQL provides many aggregate functions that include avg, count, sum, min, max, etc. An
aggregate function ignores NULL values when it performs the calculation, except for the count
function.

1. COUNT() Function

The COUNT() aggregate function returns the total number of rows from a database table that
matches the defined criteria in the SQL query.

Syntax

COUNT(*) OR COUNT(COLUMN_NAME)

Example :

Suppose a table named EMP_DATA contains data concerning 10 employees working in the
same organization in different departments.
1. Suppose you want to know the total number of employees working in the organization.
You can do so by the below-given query.

SELECT COUNT(*) FROM EMP_DATA;

As COUNT(*) returns the total number of rows and the table named EMP_DATA provided
above consists of 10 rows, so the COUNT(*) function returns 10. The output is printed as
shown below.

Output: 10

Note: Except for COUNT(*), all other SQL aggregate functions ignore NULL values.

2. Suppose you need to count the number of people who are getting a salary. The query
given below can help you achieve this.

SELECT COUNT(Salary) FROM EMP_DATA;

Output : 9
Here, the Salary column is passed as a parameter to the COUNT() function, and hence, this
query returns the number of non NULL values from the column Salary, i.e. 9.

3. Suppose you need to count the number of distinct departments present in the
organization. The following query can help you achieve this.

SELECT COUNT(DISTINCT Department) FROM EMP_DATA;

Output: 3

The above query returns the total number of distinct non NULL values over the column
Department i.e. 3 (Marketing, Production, R&D). The DISTINCT keyword makes sure that
only non-repetitive values are counted.

4. What if you want to calculate the number of people whose salaries are more than a
given amount(say 70,000)? Check out the example below.

SELECT COUNT(Salary) FROM EMP_DATA WHERE Salary >= 70000;

Output : 5

The query returns the number of rows where the salary of the employee is greater than or equal
to 70,000 i.e 5.

2. SUM() Function

The SUM() function takes the name of the column as an argument and returns the sum of all
the non-NULL values in that column. It works only on numeric fields (i.e. columns that contain
numeric values). In loosely typed systems, when it is applied to columns containing both
non-numeric (e.g. string) and numeric values, only the numeric values are considered. In
standard SQL, SUM returns NULL when there are no non-NULL values to add.

Syntax:

The function name is SUM() and the name of the column to be considered is passed as an
argument to the function.

SUM(COLUMN_NAME)
Example:

1. Suppose you need to build a budget for the organization and you need to know the total
amount needed to provide salaries to all the employees. To do so, calculate the sum of
all the values present in the column Salary, as shown in the example below.

SELECT SUM(Salary) FROM EMP_DATA;

Output :646000

The above-mentioned query returns the sum of all non-NULL values over the column Salary
i.e 80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 = 646000

2. What if you need to consider only distinct salaries? The following query will help you
achieve that.

SELECT SUM(DISTINCT Salary) FROM EMP_DATA;

Output : 430000

The DISTINCT keyword makes sure that only non-repetitive values are considered. The query
returns the sum of all distinct non NULL values over the column Salary i.e. 80000 + 76000 +
84000 + 64000 + 60000 + 66000 = 430000.

3. Suppose you need to know the collective salaries for each department(say Marketing).
The query given below can help you achieve this.

SELECT SUM(Salary) FROM EMP_DATA WHERE Department = 'Marketing';

Output :160000

The query returns the sum of salaries of employees who are working in the Marketing
Department i.e 80000 + 80000 = 160000.

Note: There are 3 rows consisting of Marketing as Department value but the third value is a
NULL value. Thus, the sum is returned considering only the first two entries having Marketing
as Department.
3. AVG() Function

The AVG() aggregate function takes the name of the column as an argument and returns the
average of all the non-NULL values in that column. It works only on numeric fields (i.e.
columns that contain numeric values).

Note: In loosely typed systems, when it is applied to columns containing both non-numeric
(e.g. string) and numeric values, only the numeric values are considered. In standard SQL,
AVG returns NULL when there are no non-NULL values to average.

Syntax:

The function name is AVG() and the name of the column to be considered is passed as an
argument to the function.

AVG(COLUMN_NAME)

Example:

1. To obtain the average salary of an employee of the organization, the following query
can be used.

SELECT AVG(Salary) FROM EMP_DATA;

Output : 71777.77777

Here, the column name Salary is passed as an argument and thus the values present in column
Salary are considered. The above query returns the average of all non NULL values present in
the Salary column of the table.

Average = (80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 ) / 9
= 646000 / 9 = 71777.77777

2. If you need to consider only distinct salaries, the following query will help you out.

SELECT AVG(DISTINCT Salary) FROM EMP_DATA;

Output : 71666.66666

The query returns the average of all non NULL distinct values present in the Salary column of
the table.

Average = (80000 + 76000 + 84000 + 64000 + 60000 + 66000) / 6 = 430000/ 6 = 71666.66666.


4. MIN() Function

The MIN() function takes the name of the column as an argument and returns the minimum
value present in the column. MIN() returns NULL when no row is selected.

Syntax:

The function name is MIN() and the name of the column to be considered is passed as an
argument to the function.

MIN(COLUMN_NAME)

Example:

1. Suppose you want to find out what is the minimum salary that is provided by the
organization. The MIN() function can be used here with the column name as an
argument.
SELECT MIN(Salary) FROM EMP_DATA;

Output :60000

The query returns the minimum value of all the values present in the mentioned column i.e
60000.

2. Suppose you need to know the minimum salary of an employee belonging to the
Production department. The following query will help you achieve that.

SELECT MIN(Salary) FROM EMP_DATA WHERE Department = 'Production';

Output:60000

The query returns the minimum value of all the values present in the mentioned column and
has Production as Department value i.e 60000.

5. MAX() Function

The MAX() function takes the name of the column as an argument and returns the maximum
value present in the column. MAX() returns NULL when no row is selected.

Syntax:

The function name is MAX() and the name of the column to be considered is passed as an
argument to the function.
MAX(COLUMN_NAME)

Example :

1. Suppose you want to find out what is the maximum salary that is provided by the
organization. The MAX() function can be used here with the column name as an
argument.

SELECT MAX(Salary) FROM EMP_DATA;

Output : 84000

The query returns the maximum value of all the values present in the mentioned column i.e
84000.

2. Suppose you need to know the maximum salary of an employee belonging to the R&D
department. The following query will help you achieve that.

SELECT MAX(Salary) FROM EMP_DATA WHERE Department = 'R&D';

Output : 84000

The query returns the maximum value of all the values present in the mentioned column and
has R&D as Department value i.e 84000.
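
Aggregate functions are frequently combined with the GROUP BY (and HAVING) clauses to
compute one value per group instead of one value for the whole table. As a sketch against the
same EMP_DATA table (using the column names from the examples above):

SELECT Department,
       COUNT(*)    AS num_employees,
       AVG(Salary) AS avg_salary
FROM EMP_DATA
GROUP BY Department;

This returns one row per department (Marketing, Production and R&D), each with the number
of employees and the average salary in that department.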

6. Null values
➢ In SQL, NULL represents the absence of a value. It is used to indicate that a data
point does not have a value or that the value is unknown or undefined. Here are
some important points to understand about handling NULL values in SQL:
➢ NULL is not the same as an empty string or zero. It is a distinct value that signifies
the absence of a value.
➢ NULL values can be used in columns of any data type, including numeric, string,
date, and other data types.
➢ When performing comparisons involving NULL values, the result is always
unknown (neither true nor false). Therefore, you cannot use standard equality
operators like = or <> to compare NULL values.
➢ To check for NULL values in SQL, you use the IS NULL or IS NOT NULL
operators. For example:
SELECT * FROM table_name WHERE column_name IS NULL;

SELECT * FROM table_name WHERE column_name IS NOT NULL;

➢ When performing calculations involving NULL values, the result is usually NULL.
However, some database systems have specific behaviors when NULL values are
involved in calculations, so it's important to consult the documentation for your
specific database management system (DBMS) to understand its behavior (a
COALESCE-based workaround is sketched at the end of this section).
➢ When inserting or updating data in a table, you can explicitly set a column to NULL
if you want to represent the absence of a value. For example:

INSERT INTO table_name (column1, column2) VALUES (value1, NULL);


UPDATE table_name SET column1 = NULL WHERE condition;

➢ NULL values can also be used in joins and filtering conditions. For example, you
can include NULL values in a result set using a LEFT JOIN.
➢ It's important to handle NULL values appropriately in your SQL queries to ensure
accurate and reliable data processing. Be aware of any specific behavior and
handling of NULL values in your chosen database system, as it can vary between
different database management systems.
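
As a small illustration of the points above, the standard SQL COALESCE function substitutes
a default for NULL so that a calculation does not silently return NULL. The table and column
names below are hypothetical:

SELECT id,
       salary + COALESCE(bonus, 0) AS total_pay -- treat a NULL bonus as 0
FROM staff; -- hypothetical table with a nullable bonus column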
7. Triggers
➢ In SQL, a trigger is a database object that is associated with a table and
automatically executes a set of actions in response to certain database events, such
as INSERT, UPDATE, or DELETE operations on the table. Triggers are useful for
enforcing data integrity rules, auditing changes, maintaining derived data, or
implementing complex business logic within the database. Here's an overview of
SQL trigger
➢ Syntax: The basic syntax to create a trigger in SQL is as follows:
CREATE TRIGGER trigger_name
{BEFORE | AFTER}
{INSERT | UPDATE | DELETE}
ON table_name
[FOR EACH ROW]
[WHEN (condition)]
BEGIN
-- Trigger actions here (trigger body)
END;

Types of Triggers

The following are the different types of triggers present in SQL.

DML Triggers

➢ These triggers fire in response to data manipulation language (DML) statements
like INSERT, UPDATE, or DELETE.

After Triggers

➢ These triggers execute after the database has processed a specified event (such as
an INSERT, UPDATE, or DELETE statement). AFTER triggers are commonly used
to perform additional processing or auditing tasks after a data modification has
occurred.

INSTEAD OF Triggers

➢ These triggers are defined on views and fire instead of the DML statement (INSERT,
UPDATE, DELETE) issued against the view.

DDL Triggers

➢ These triggers fire in response to data definition language (DDL) statements like
CREATE, ALTER, or DROP.

LOGON Triggers

➢ These triggers fire when a user logs into the database.

LOGOFF Triggers

➢ These triggers fire when a user logs out of the database.

SERVERERROR Triggers

➢ These triggers fire when a server error occurs.

In summary, a trigger can fire:

✓ When any DDL operation is done, e.g., CREATE, ALTER, DROP.
✓ For a DML operation, e.g., INSERT, UPDATE, DELETE.
✓ For a database operation like LOGON, LOGOFF, STARTUP, SHUTDOWN or
SERVERERROR.

EXAMPLE :

Let’s take an example. Let’s assume a student table with column id, first_name, last_name,
and full_name.

Query 1:

CREATE TABLE student (
    id         INTEGER PRIMARY KEY,
    first_name VARCHAR(50),
    last_name  VARCHAR(50),
    full_name  VARCHAR(50)
);

First, let's create a SQL trigger. (In Oracle, a row-level AFTER trigger cannot update the table
it is defined on, so the idiomatic way to fill full_name is a BEFORE INSERT trigger that
assigns the value on the incoming row.)

Query 2:

CREATE OR REPLACE TRIGGER student_name
BEFORE INSERT ON student
FOR EACH ROW
BEGIN
    :NEW.full_name := :NEW.first_name || ' ' || :NEW.last_name;
END;
Let’s insert the students.

Query 3:

/* Create a few records in this table */

INSERT INTO student(id, first_name, last_name) VALUES(1,'Alvaro', 'Morte');

INSERT INTO student(id, first_name, last_name) VALUES(2,'Ursula', 'Corbero');

INSERT INTO student(id, first_name, last_name) VALUES(3,'Itziar', 'Ituno');

INSERT INTO student(id, first_name, last_name) VALUES(4,'Pedro', 'Alonso');

INSERT INTO student(id, first_name, last_name) VALUES(5,'Alba', 'Flores');

Query 4:
/* Display all the records from the table */

SELECT * FROM student;

OUTPUT:

id  first_name  last_name  full_name
1   Alvaro      Morte      Alvaro Morte
2   Ursula      Corbero    Ursula Corbero
3   Itziar      Ituno      Itziar Ituno
4   Pedro       Alonso     Pedro Alonso
5   Alba        Flores     Alba Flores

8. Views in SQL
Refer in Chapter 2.
Previous Years' Questions and Answers (SQL Queries)

1. Write SQL queries for the given schemas: employees (emp-id, first-name, last-name,
hire-date, dept-id, salary, job), departments (dept-id, dept-name, manager-id, location-id)
i) Display the last names and hire dates of all latest hires in their respective
departments in the location ID 1700.
ii) Write an SQL query to display all employee names and salaries where the salary is
greater than the minimum salary of the company and the job title starts with 'M'.

Answer :

i) Display the last names and hire dates of all latest hires in their respective Departments

SELECT e.last_name, e.hire_date
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE d.location_id = 1700
AND e.hire_date = (
    SELECT MAX(hire_date)
    FROM employees
    WHERE dept_id = e.dept_id
);

ii) Write an SQL query to display all employee names and salaries where the salary is greater
than the minimum salary of the company and the job title starts with 'M'.

SELECT first_name, last_name, salary
FROM employees
WHERE salary > (
    SELECT MIN(salary)
    FROM employees
)
AND job LIKE 'M%';
2. Write the following queries in SQL, using the university schema.
a. Find the names of all students who have taken at least one Comp. Sci. course; make sure
there are no duplicate names in the result.

b. Find the IDs and names of all students who have not taken any course offering before
Spring 2009.
c. For each department, find the maximum salary of instructors in that department. You may
assume that every department has at least one instructor.
d. Find the lowest, across all departments, of the per-department maximum salary computed
by the preceding query.

Answer :

a) Find the names of all students who have taken at least one Comp. Sci. course; make sure
there are no duplicate names in the result.

SELECT DISTINCT s.name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id
WHERE c.department = 'Comp. Sci.';
b. Find the IDs and names of all students who have not taken any course offering before
Spring 2009.

SELECT s.student_id, s.name
FROM students s
WHERE s.student_id NOT IN (
    SELECT e.student_id
    FROM enrollments e
    JOIN course_offerings co ON e.course_offering_id = co.course_offering_id
    WHERE co.semester < 'Spring 2009'
);
c. For each department, find the maximum salary of instructors in that department. You may
assume that every department has at least one instructor.
SELECT d.department_id, d.department_name, MAX(i.salary) AS max_salary

FROM departments d

JOIN instructors i ON d.department_id = i.department_id

GROUP BY d.department_id, d.department_name;

d. Find the lowest, across all departments, of the per-department maximum salary computed
by the preceding query.

SELECT MIN(max_salary) AS lowest_max_salary
FROM (
    SELECT MAX(i.salary) AS max_salary
    FROM departments d
    JOIN instructors i ON d.department_id = i.department_id
    GROUP BY d.department_id, d.department_name
) subquery;
3. Suppose that we have a relation marks(ID, score) and we wish to assign grades to students
based on the score as follows: grade F if score < 40, grade C if 40 ≤ score < 60, grade B if
60 ≤ score < 80, and grade A if 80 ≤ score. Write SQL queries to do the following:
a. Display the grade for each student, based on the marks relation.
b. Find the number of students with each grade.

This query uses the CASE statement to evaluate the score of each student and assign the
corresponding grade based on the given conditions. The result will include the student ID,
score, and the calculated grade.
a. Display the grade for each student, based on the marks relation.

SELECT ID, score,
    CASE
        WHEN score < 40 THEN 'F'
        WHEN score >= 40 AND score < 60 THEN 'C'
        WHEN score >= 60 AND score < 80 THEN 'B'
        WHEN score >= 80 THEN 'A'
    END AS grade
FROM marks;
b. Find the number of students with each grade.

This query first calculates the grade for each student using the same CASE statement
as in the previous query. It then groups the result by the grade and counts the
number of students for each grade using the COUNT(*) function. The result will
include the grade and the corresponding number of students.

SELECT grade, COUNT(*) AS num_students
FROM (
    SELECT ID,
        CASE
            WHEN score < 40 THEN 'F'
            WHEN score >= 40 AND score < 60 THEN 'C'
            WHEN score >= 60 AND score < 80 THEN 'B'
            WHEN score >= 80 THEN 'A'
        END AS grade
    FROM marks
) subquery
GROUP BY grade;
4. Consider the following schema: instructor (ID, name, dept_name), teaches (ID, course_id,
sec_id, semester, year), section (course_id, sec_id, semester, year), student (ID, name,
dept_name), takes (ID, course_id, sec_id, semester, year, grade). Write the following queries
in SQL:
a) Find the names of the students not registered in any section
b) Find the names of the instructors not teaching any course
c) Find the total number of courses taught department wise
d) Find the total number of courses registered department wise

a) Find the names of the students not registered in any section

SELECT name
FROM student
WHERE ID NOT IN (
SELECT ID
FROM takes
);
b) Find the names of the instructors not teaching any course
SELECT name
FROM instructor
WHERE ID NOT IN (
SELECT ID
FROM teaches
);
c) Find the total number of courses taught department wise

Since teaches does not contain dept_name, we join it with instructor:

SELECT i.dept_name, COUNT(DISTINCT t.course_id) AS total_courses_taught
FROM teaches t
JOIN instructor i ON t.ID = i.ID
GROUP BY i.dept_name;
d) Find the total number of courses registered department wise

Similarly, takes is joined with student to obtain the department:

SELECT s.dept_name, COUNT(DISTINCT t.course_id) AS total_courses_registered
FROM takes t
JOIN student s ON t.ID = s.ID
GROUP BY s.dept_name;

5. Consider the following tables:

Employee (Emp_no, Name, Emp_city)
Company (Emp_no, Company_name, Salary)

i. Write a SQL query to display employee name and company name.
ii. Write a SQL query to display employee name, employee city, company name and salary of
all the employees whose salary > 10000.
iii. Write a query to display all the employees working in the "XYZ" company.

Answer :

i. Write a SQL query to display Employee name and company name.

SELECT Employee.Name, Company.Company_name
FROM Employee
JOIN Company ON Employee.Emp_no = Company.Emp_no;
ii. Write a SQL query to display employee name, employee city, company name and salary of
all the employees whose salary > 10000

SELECT Employee.Name, Employee.Emp_city, Company.Company_name, Company.Salary
FROM Employee
JOIN Company ON Employee.Emp_no = Company.Emp_no
WHERE Company.Salary > 10000;
iii. Write a query to display all the employees working in " XYZ " company

SELECT Employee.Name
FROM Employee
JOIN Company ON Employee.Emp_no = Company.Emp_no
WHERE Company.Company_name = 'XYZ';

6. Consider the following relations:

Department (dept_name, building, budget)
Course (course_id, title, dept_name)
Instructor (id, name, dept_name, salary)
Teaches (id, course_id, section_id, semester, year)

The key fields are underlined as primary keys. Give the SQL expressions for the following:
i. Select the department name of instructors whose name is neither 'A' nor 'B'.
ii. Find the number of instructors in each department who teach the course "DBMS".
iii. Find the department that has the highest average salary.
iv. Give a 5% salary raise to instructors whose salary is less than the average.

Answer :

i. Select the department name of instructors whose name is neither 'A' nor 'B'.

SELECT dept_name
FROM Instructor
WHERE name NOT IN ('A', 'B');
ii. Find the number of instructors in each department who teach the course "DBMS"

Since Teaches has no dept_name and "DBMS" is a course title, we join Teaches with
Instructor and Course:

SELECT i.dept_name, COUNT(DISTINCT i.id) AS num_instructors
FROM Instructor i
JOIN Teaches t ON i.id = t.id
JOIN Course c ON t.course_id = c.course_id
WHERE c.title = 'DBMS'
GROUP BY i.dept_name;
iii. Find the department that have the highest average salary.

SELECT dept_name
FROM Instructor
GROUP BY dept_name
HAVING AVG(salary) = (
SELECT MAX(avg_salary)
FROM (
SELECT AVG(salary) AS avg_salary
FROM Instructor
GROUP BY dept_name
) subquery
);
iv. Give a 5% salary raise to instructors whose salary is less than the average.

UPDATE Instructor
SET salary = salary * 1.05
WHERE salary < (
SELECT AVG(salary)
FROM Instructor
);

7. Write SQL statements for the following:

Student (Enrno, name, courseId, emailId, cellno)
Course (courseId, course_nm, duration)

(i) Add a column city in the Student table.
(ii) Find the list of students who have enrolled in the "computer" course.
(iii) List the names of all courses with their duration.
(iv) List the names of all students whose name starts with "a".
(v) List the email ID and cell number of all mechanical engineering students.
Answer

(i) Add a column city in the Student table.

ALTER TABLE Student
ADD COLUMN city VARCHAR(255);
(ii) Find the list of students who have enrolled in the "computer" course.

Since the course name is stored in the Course table, we look up the matching courseId:

SELECT name
FROM Student
WHERE courseId IN (
    SELECT courseId FROM Course WHERE course_nm = 'computer'
);
(iii) List the names of all courses with their duration.

SELECT course_nm, duration
FROM Course;
(iv) List the names of all students whose name starts with "a".

SELECT name
FROM Student
WHERE name LIKE 'a%';
(v) List the email ID and cell number of all mechanical engineering students.

SELECT emailId, cellno
FROM Student
WHERE courseId IN (
    SELECT courseId
    FROM Course
    WHERE course_nm = 'mechanical engineering'
);

8. Consider the following database schema to write queries in SQL:

Sailor (sid, sname, age, rating)
Boats (bid, bname, bcolor)
Reserves (sid, bid, day)

i) Find the sailors who have reserved a red boat.
ii) Find the names of the sailors who have reserved at least two boats.
iii) Find the colors of the boats reserved by 'Mohan'.
i) Find the sailors who have reserved a red boat

SELECT DISTINCT s.sid, s.sname
FROM Sailor s
JOIN Reserves r ON s.sid = r.sid
JOIN Boats b ON r.bid = b.bid
WHERE b.bcolor = 'red';

ii) Find the names of the sailors who have reserved at least two boats

SELECT s.sname
FROM Sailor s
JOIN Reserves r ON s.sid = r.sid
GROUP BY s.sname
HAVING COUNT(r.bid) >= 2;

iii) Find the colors of the boats reserved by 'Mohan'.

SELECT DISTINCT b.bcolor
FROM Sailor s
JOIN Reserves r ON s.sid = r.sid
JOIN Boats b ON r.bid = b.bid
WHERE s.sname = 'Mohan';
Important questions

1. Explain different types of join operations with relevant examples.
2. Explain aggregate functions and null values.
3. Explain various DDL commands in SQL with examples.
4. Explain the following in SQL with examples: a) Set operators b) Nested queries
5. Explain the following in SQL with examples: a) Nested queries b) Correlated queries,
and explain aggregate functions and null values.
6. Explain various DML commands in SQL with examples, and explain the following in
SQL with examples: a) GROUP BY and ORDER BY clauses b) Triggers
7. Explain the following: a) Types of join operations b) Set operations
8. Explain aggregate functions and null values.
9. Explain the SQL set operations UNION, INTERSECT and EXCEPT with examples.
10. Explain nested queries and triggers.
Chapter 4 : DEPENDENCIES AND NORMAL FORMS
Syllabus:

1. Importance of a good schema design


2. Problems encountered with bad schema design
3. Motivation for normal forms
4. Functional dependencies - Armstrong's axioms for FDs - closure of a set of FDs
5. Minimal covers
6. Definitions of 1NF,2NF,3NF and BCNF
7. Decompositions and desirable properties

1. IMPORTANCE OF A GOOD SCHEMA DESIGN:

Schema: A database schema is a blueprint that represents the tables and relations of a data set.
Good database schema design is essential to making your data tractable so that you can make
sense of it and build the dashboards, reports, and data models that you need.

A good schema design is crucial for the efficient and effective management of data in a database
system. It plays a fundamental role in determining how data is organized, stored, and retrieved,
and impacts the overall performance, scalability, and maintainability of the system. Here are
some key reasons why a good schema design is important:

Data organization: A well-designed schema helps in structuring data in a logical and organized
manner. It defines the tables, relationships, and constraints that govern the data model, ensuring
data integrity and consistency. This organization facilitates easy navigation and understanding
of the data, making it more manageable and accessible.

Query performance: The schema design significantly impacts the performance of database
queries. By properly structuring tables, defining appropriate indexes, and optimizing data
types, a good schema design can enhance query execution speed and minimize resource
consumption. Efficient query performance leads to faster response times and improved overall
system performance.

Data integrity and consistency: A good schema design enforces data integrity and ensures
consistency. By defining appropriate constraints, such as primary keys, foreign keys, unique
constraints, and check constraints, it prevents the insertion of invalid or inconsistent data. This
helps maintain data quality and reliability throughout the system.

Scalability: A well-designed schema allows for easy scalability as the volume and complexity
of data grow. By considering future requirements and potential expansion, a good schema
design can accommodate evolving needs without significant rework or performance
degradation. This scalability is crucial for applications and systems that need to handle
increasing data loads over time.

Maintainability and extensibility: A good schema design simplifies the maintenance and
evolution of the database system. It provides a solid foundation for making changes and
additions to the schema without causing disruptions or data inconsistencies. A well-designed
schema also allows for seamless integration with new features or modules, making the system
more extensible and adaptable to future enhancements.

Data analysis and reporting: A well-designed schema facilitates effective data analysis and
reporting. By structuring data in a way that aligns with the analytical needs of the system, a
good schema design enables efficient querying, aggregation, and summarization of data. This,
in turn, supports decision-making processes and enables the extraction of meaningful insights
from the data

In summary, a good schema design is essential for data organization, query performance, data
integrity, scalability, maintainability, and data analysis. It is a foundational element in the
design and implementation of a robust and efficient database system.

2. PROBLEMS ENCOUNTERED WITH BAD SCHEMA DESIGNS

Bad schema designs can lead to several problems that can hinder the efficient management and
utilization of data in a database system. Here are some common problems encountered with
bad schema designs:

➢ Poor query performance: A bad schema design can result in slow and inefficient query
performance. This can be due to a lack of proper indexing, inappropriate data types, or
inefficient table relationships. Slow queries can negatively impact the overall system
performance and user experience.
➢ Data redundancy and inconsistency: Inadequate schema designs can lead to data
redundancy and inconsistency. Redundant data takes up unnecessary storage space and
can cause data integrity issues when updates or modifications are made. Inconsistent
data, such as conflicting values or duplicate records, can lead to inaccurate results and
unreliable information.

➢ Difficulty in data maintenance: Bad schema designs can make data maintenance
challenging and error-prone. Without proper constraints and relationships, it becomes
harder to enforce data integrity and ensure consistent updates. This can lead to data
corruption, data loss, or difficulties in updating and modifying data in a controlled and
reliable manner.

➢ Lack of scalability: A poorly designed schema may lack scalability, making it difficult
to accommodate future growth and evolving data requirements. This can result in
performance degradation and the need for extensive schema modifications when the
system needs to handle increased data volumes or changes in data structure.

➢ Limited flexibility and extensibility: Bad schema designs can restrict the flexibility
and extensibility of the database system. It may be challenging to add new features or
modify existing ones without significant schema changes. This can lead to increased
development time, complexity, and potential disruptions to the system.

➢ Data analysis and reporting challenges: Inefficient schema designs can make data
analysis and reporting difficult. Poorly organized data, lack of appropriate relationships,
or inconsistent naming conventions can hinder the extraction of meaningful insights
from the data. This can limit the effectiveness of decision-making processes and hinder
the overall value derived from the data.

➢ Increased development and maintenance costs: Bad schema designs can result in
higher development and maintenance costs. Fixing or modifying a poorly designed
schema requires significant effort and resources. It may involve rewriting queries,
restructuring tables, or migrating data, which can be time-consuming and error-prone.
In summary, bad schema designs can lead to poor query performance, data redundancy and
inconsistency, difficulties in data maintenance, limited scalability and flexibility, challenges in
data analysis and reporting, and increased development and maintenance costs. It is crucial to
invest time and effort in designing a well-thought-out schema to avoid these problems and
ensure the efficient management of data in a database system

3. FUNCTIONAL DEPENDENCY

➢ Functional dependency is a relationship that exists between two sets of attributes of a


relational table where one set of attributes can determine the value of the other set of
attributes.
➢ It typically exists between the primary key and non-key attribute within a table.

For any relation R, attribute Y is functionally dependent on attribute X (usually the primary
key) if each value of X determines exactly one value of Y. It is denoted by X -> Y, where X is
called the determinant and Y is called the dependent.

Example: Assume we have an employee table with attributes Emp_Id, Emp_Name and
Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee
table, because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional Dependencies in DBMS:


1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency in DBMS
➢ In Trivial functional dependency, a dependent is always a subset of the determinant.
In other words, a functional dependency is called trivial if the attributes on the right
side are the subset of the attributes on the left side of the functional dependency.
➢ X → Y is called a trivial functional dependency if Y is the subset of X.
➢ For example, consider the Employee table below.
➢ Here, { Employee_Id, Name } → { Name } is a Trivial functional dependency, since
the dependent Name is the subset of determinant { Employee_Id, Name }.
➢ { Employee_Id } → { Employee_Id }, { Name } → { Name } and { Age } → { Age
} are also Trivial.
2. Non-Trivial Functional Dependency in DBMS
➢ It is the opposite of trivial functional dependency. Formally speaking, in a non-trivial
functional dependency, the dependent is not a subset of the determinant.
➢ X → Y is called a Non-trivial functional dependency if Y is not a subset of X. So, a
functional dependency X → Y where X is a set of attributes and Y is also a set of the
attribute but not a subset of X, then it is called Non-trivial functional dependency.
➢ X→ Y is called completely non-trivial when X intersect Y is NULL.
➢ For example, consider the Employee table above
➢ Here, { Employee_Id } → { Name } is a non-trivial functional dependency
because Name(dependent) is not a subset of Employee_Id(determinant).
➢ Similarly, { Employee_Id, Name } → { Age } is also a non-trivial functional
dependency.
3. Multivalued Functional Dependency in DBMS
➢ In Multivalued functional dependency, attributes in the dependent set are not
dependent on each other.
➢ For example, given X → { Y, Z }, if there exists no functional dependency between Y
and Z, then it is called a multivalued functional dependency.
➢ For example, consider the Employee table above.
➢ Here, { Employee_Id } → { Name, Age } is a Multivalued functional dependency,
since the dependent attributes Name, Age are not functionally dependent(i.e. Name →
Age or Age → Name doesn’t exist !).
4. Transitive Functional Dependency in DBMS
➢ Consider two functional dependencies A → B and B → C then according to
the transitivity axiom A → C must also exist. This is called a transitive functional
dependency.
➢ In other words, dependent is indirectly dependent on determinant in Transitive
functional dependency.
➢ For example, consider the Employee table below.

Here, { Employee_Id → Department } and { Department → Street Number } holds


true. Hence, according to the axiom of transitivity, { Employee_Id → Street Number } is a
valid functional dependency.

4.ARMSTRONG’S AXIOMS IN FUNCTIONAL DEPENDENCY / PROPERTIES OF


FUNCTIONAL DEPENDENCY IN DBMS:

William Armstrong in 1974 suggested a few rules related to functional dependency. They are
called RAT rules.

➢ Reflexivity: If A is a set of attributes and B is a subset of A, then the functional


dependency A → B holds true.
For example, { Employee_Id, Name } → Name is valid.
➢ Augmentation: If a functional dependency A → B holds true, then appending any
set of attributes to both sides of the dependency does not affect it; it remains true.
➢ That is, if X → Y holds true, then ZX → ZY also holds true.
For example, if { Employee_Id, Name } → { Name } holds true, then { Employee_Id,
Name, Age } → { Name, Age } also holds true.
➢ Transitivity: If two functional dependencies X → Y and Y → Z hold true, then X →
Z also holds true by the rule of Transitivity.
For example, if { Employee_Id } → { Name } holds true and { Name } → {
Department } holds true, then { Employee_Id } → { Department } also holds true.

Secondary Rules

– These rules can be derived from the above axioms.

1. Union– Union rule says, if X determines Y and X determines Z, then X must also
determine Y and Z. If X → Y and X → Z then X → YZ

2. Decomposition – The decomposition rule is also known as the project rule. It is the reverse
of the union rule.

This rule says, if X determines Y and Z, then X determines Y and X determines Z separately:
if X → YZ then X → Y and X → Z.

3. Pseudo Transitivity – In Pseudo transitive Rule, if X determines Y and YZ determines W,


then XZ determines W. If X → Y and YZ → W then XZ → W

ADVANTAGES OF FUNCTIONAL DEPENDENCY IN DBMS

Let's discuss some of the advantages of Functional dependency,

➢ It is used to maintain the quality of data in the database.


➢ It expresses the facts about the database design.
➢ It helps in clearly defining the meanings and constraints of databases.
➢ It helps to identify bad designs.
➢ Functional Dependency removes data redundancy where the same values should not
be repeated at multiple locations in the same database table.
➢ The process of Normalization starts with identifying the candidate keys in the
relation. Without functional dependency, it's impossible to find candidate keys
and normalize the database.
5. CLOSURE OF FUNCTIONAL DEPENDENCY

➢ The Closure of Functional Dependency means the complete set of all possible
attributes that can be functionally derived from given functional dependency using the
inference rules known as Armstrong’s Rules.
➢ If “F” is a functional dependency then closure of functional dependency can be
denoted using “{F}+ ”.

There are three steps to calculate closure of functional dependency. These are:

➢ Step-1 : Add the attributes which are present on Left Hand Side in the original
functional dependency.
➢ Step-2 : Now, add the attributes present on the Right Hand Side of the functional
dependency.
➢ Step-3 : With the help of attributes present on Right Hand Side, check the other
attributes that can be derived from the other given functional dependencies. Repeat this
process until all the possible attributes which can be derived are added in the closure.

Example-1 : Consider the table student_details having (Roll_No, Name, Marks, Location)
as the attributes and having two functional dependencies.

➢ FD1 : Roll_No -> Name


➢ FD2 : Name -> Marks, Location
➢ Now, We will calculate the closure of all the attributes present in the relation using the
three steps mentioned below.
➢ Step-1 : Add attributes present on the LHS of the first functional dependency to the
closure.

{Roll_no}+ = {Roll_No}

➢ Step-2 : Add attributes present on the RHS of the original functional dependency to
the closure.

{Roll_no}+ = {Roll_No, Name}

Step-3 : Add the other possible attributes which can be derived using the attributes already
present in the closure. The Roll_No attribute itself determines only Name, but the Name
attribute can determine other attributes such as Marks and Location using the 2nd functional
dependency.

➢ Therefore, the complete closure of Roll_No will be:

➢ {Roll_no}+ = {Roll_No, Name, Marks, Location}

Similarly, we can calculate closure for other attributes too i.e “Name”.

Step-1 : Add attributes present on the LHS of the functional dependency to the closure.

{Name}+ = {Name}

Step-2 : Add the attributes present on the RHS of the functional dependency to the closure.

{Name}+ = {Name, Marks, Location}

➢ Step-3 : Since, we don’t have any functional dependency where “Marks or Location”
attribute is functionally determining any other attribute , we cannot add more attributes
to the closure. Hence complete closure of Name would be :

{Name}+ = {Name, Marks, Location}

➢ NOTE : We don’t have any Functional dependency where marks and location can
functionally determine any attribute. Hence, for those attributes we can only add the
attributes themselves in their closures. Therefore,

➢ {Marks}+ = {Marks}
➢ {Location}+ = { Location}

Example :

In a relation R(ABCD) ,given functional dependencies {A->B , B->C , C->D} find closure
of each attribute.

{A}+ = {ABCD}

{B}+ = {BCD}

{C}+ = {CD}

{D}+ = {D}
Here attribute A has all the attributes of the relation in its closure, so it is a candidate key of the relation.

Example :

In a relation R(ABCD) ,given functional dependencies {A->B , B->C , C->D, D->A} find
closure of each attribute.

{A}+ = {ABCD}

{B}+ = {BCDA}

{C}+ = {CDAB}

{D}+ = {DABC}

6. MINIMAL COVERS:
➢ A minimal cover is a simplified and reduced version of the given set of functional
dependencies.
➢ Since it is a reduced version, it is also called an irreducible set.
➢ It is also called a canonical cover.

Characteristics :

➢ Canonical cover is free from all the extraneous functional dependencies.


➢ The closure of canonical cover is same as that of the given set of functional
dependencies.
➢ Canonical cover is not unique and may be more than one for a given set of
functional dependencies
➢ We cannot replace any dependency X->A in F with a dependency Y->A, where Y is a
proper subset of X, and still have a set of dependencies that is equivalent to F.
➢ We cannot remove any dependency from F and still have a set of dependencies that
are equivalent to F

A canonical cover is also called a minimal cover, i.e. the minimum equivalent set of FDs. A set
of FDs is called a canonical cover of F if each FD in it is a:

➢ Simple FD.
➢ Left reduced FD.
➢ Non-redundant FD.

Simple FD − X->Y is a simple FD if Y is a single attribute.


Left reduced FD − X->Y is a left-reduced FD if there are no extraneous attributes in X
(extraneous attribute: given XA->Y, A is an extraneous attribute if X->Y already holds).

Non-redundant FD − X->Y is a non-redundant FD if it cannot be derived from F − {X->Y}.

Need :

• Working with the set containing extraneous functional dependencies increases the
computation time.
• Therefore, the given set is reduced by eliminating the useless functional
dependencies.
• This reduces the computation time and working with the irreducible set becomes
easier.

Steps To Find Canonical Cover-


Step-01:
Write the given set of functional dependencies in such a way that each functional dependency
contains exactly one attribute on its right side.
Example-

The functional dependency X → YZ will be written as-


X→Y
X→Z
Step-02:

➢ Consider each functional dependency one by one from the set obtained in Step-01.
➢ Determine whether it is essential or non-essential.

To determine whether a functional dependency is essential or not, compute the closure of its
left side-
• Once by considering that the particular functional dependency is present in the set
• Once by considering that the particular functional dependency is not present in the set

Then following two cases are possible-


Case-01: Results Come Out to be Same-
• If results come out to be the same, it means that the presence or absence of that functional
dependency does not create any difference.
• Thus, it is non-essential.
• Eliminate that functional dependency from the set.
NOTE-
• Eliminate the non-essential functional dependency from the set as soon as it is
discovered.
• Do not consider it while checking the essentiality of other functional dependencies.
Case-02: Results Come Out to be Different-
If results come out to be different,
• It means that the presence or absence of that functional dependency creates a
difference.
• Thus, it is essential.
• Do not eliminate that functional dependency from the set.
• Mark that functional dependency as essential.
Step-03:

• Consider the newly obtained set of functional dependencies after performing Step-02.
• Check if there is any functional dependency that contains more than one attribute on
its left side.
Then following two cases are possible-
Case-01: No-
• There exists no functional dependency containing more than one attribute on its
left side.
• In this case, the set obtained in Step-02 is the canonical cover.

Case-02: Yes-
• There exists at least one functional dependency containing more than one attribute on
its left side.
• In this case, consider all such functional dependencies one by one.
• Check if their left side can be reduced.

PRACTICE PROBLEM BASED ON FINDING CANONICAL COVER-
Problem-

The following functional dependencies hold true for the relational scheme R ( W , X , Y , Z )

X→W
WZ → XY
Y → WXZ
Write the irreducible equivalent for this set of functional dependencies.
Solution-
Step-01:

Write all the functional dependencies such that each contains exactly one attribute on its right
side-
X→W
WZ → X
WZ → Y
Y→W
Y→X
Y→Z
Step-02:
Check the essentiality of each functional dependency one by one.
For X → W:

• Considering X → W, (X)+ = { X , W }
• Ignoring X → W, (X)+ = { X }
• Clearly, the two results are different.
• Thus, we conclude that X → W is essential and cannot be eliminated.

For WZ → X:

• Considering WZ → X, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → X, (WZ)+ = { W , X , Y , Z }
• Clearly, the two results are the same.
• Thus, we conclude that WZ → X is non-essential and can be eliminated.

Eliminating WZ → X, our set of functional dependencies reduces to-


X→W
WZ → Y
Y→W
Y→X
Y→Z
Now, we will consider this reduced set in further checks.
For WZ → Y:

• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → Y, (WZ)+ = { W , Z }
• Clearly, the two results are different.
• Thus, we conclude that WZ → Y is essential and cannot be eliminated.

For Y → W:

• Considering Y → W, (Y)+ = { W , X , Y , Z }
• Ignoring Y → W, (Y)+ = { W , X , Y , Z }
• Clearly, the two results are the same.
• Thus, we conclude that Y → W is non-essential and can be eliminated.
Eliminating Y → W, our set of functional dependencies reduces to-
X→W
WZ → Y
Y→X
Y→Z
For Y → X:

• Considering Y → X, (Y)+ = { W , X , Y , Z }
• Ignoring Y → X, (Y)+ = { Y , Z }
• Clearly, the two results are different.
• Thus, we conclude that Y → X is essential and cannot be eliminated.

For Y → Z:

• Considering Y → Z, (Y)+ = { W , X , Y , Z }
• Ignoring Y → Z, (Y)+ = { W , X , Y }
• Clearly, the two results are different.
• Thus, we conclude that Y → Z is essential and cannot be eliminated.
From here, our essential functional dependencies are-
X→W
WZ → Y
Y→X
Y→Z

Step-03:

• Consider the functional dependencies having more than one attribute on their left side.
• Check if their left side can be reduced.
In our set,
• Only WZ → Y contains more than one attribute on its left side.
• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
Now,
• Consider all the possible subsets of WZ.
• Check if the closure result of any subset matches the closure result of WZ.
(W)+ = { W }
(Z)+ = { Z }
• Clearly, none of the subsets has a closure result equal to that of the entire left side.
• Thus, we conclude that we cannot write WZ → Y as W → Y or Z → Y.
• Thus, the set of functional dependencies obtained in Step-02 is the canonical cover.

Finally, the canonical cover is-


X→W
WZ → Y
Y→X
Y→Z
Example:1

Consider an example to find canonical cover of F.

The given functional dependencies are as follows –

A → BC, B → C, A → B, AB → C
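
A worked solution, following the three steps above:

Step-01: Split the right-hand sides: A → B, A → C, B → C, AB → C.
Step-02: A → C is non-essential, since ignoring it still gives (A)+ = { A, B, C } via A → B
and B → C; drop it. AB → C is likewise non-essential, since (AB)+ = { A, B, C } without it;
drop it.
Step-03: No left side with more than one attribute remains.

The canonical cover is therefore { A → B, B → C }.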

Example 2:
Minimize {A->C, AC->D, E->H, E->AD}
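
A worked solution, following the same procedure:

Step-01: Split the right-hand sides: A → C, AC → D, E → H, E → A, E → D.
Step-02: E → D is non-essential: ignoring it, (E)+ = { E, A, C, D, H } still contains D
(via E → A, A → C, AC → D); drop it. The remaining dependencies are essential.
Step-03: In AC → D the left side can be reduced, because (A)+ = { A, C, D } is the same as
(AC)+; so AC → D becomes A → D.

The minimal cover is therefore { A → C, A → D, E → A, E → H }.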
7. NORMALIZATION

Normalization is the process of organizing the data and the attributes of a database. It is
performed to reduce the data redundancy in a database and to ensure that data is stored
logically.
➢ It helps to divide large database tables into smaller tables and create relationships
between them. This removes redundant data and makes it easier to add, update or delete
table fields.

➢ Data redundancy in DBMS means having the same data but at multiple places.
➢ It is necessary to remove data redundancy because it causes anomalies in a database
which makes it very hard for a database administrator to maintain it.
➢ A normal form is a process that evaluates each relation against defined criteria and
removes the multi valued, joins, functional and trivial dependency from a relation.

THE MOTIVATION FOR NORMAL FORMS:

The motivation for normal forms in database design is to eliminate data redundancy and
anomalies, ensure data integrity, and promote efficient data management.

Normal forms provide guidelines and principles for structuring the database schema to achieve
these objectives. Here are some key motivations for normal forms

➢ Eliminate data redundancy: Redundant data occurs when the same information is
repeated across multiple records or tables. This redundancy wastes storage space and
can lead to inconsistencies when updating or modifying data. Normal forms help
identify and eliminate redundant data by organizing data into separate tables based on
their functional dependencies.
➢ Prevent update anomalies: Update anomalies occur when modifying data results in
inconsistencies or unintended changes. For example, if the same data is stored in
multiple places and not all instances are updated correctly, inconsistencies can arise.
Normal forms help prevent these anomalies by ensuring that data is stored in a way that
allows for easy and controlled updates without introducing inconsistencies.
➢ Maintain data integrity: Data integrity refers to the accuracy, validity, and consistency
of data. Normal forms help enforce data integrity by defining appropriate constraints,
such as primary keys, foreign keys, and entity relationships. These constraints ensure
that data is correctly and consistently represented, preventing the insertion of invalid or
inconsistent data.
➢ Simplify data management and maintenance: Normal forms provide guidelines for
organizing data in a logical and structured manner. By following these guidelines,
database management and maintenance tasks become more manageable and less prone
to errors. Normalized schemas are typically easier to understand, navigate, and modify,
reducing the complexity and effort required for data management activities
➢ Support efficient query processing: Normal forms can contribute to improved query
performance and efficiency. By reducing data redundancy and organizing data based on
functional dependencies, normalized schemas allow for more efficient retrieval and
manipulation of data. Well-designed indexes and relationships based on normal forms
can speed up query execution and improve overall system performance.
➢ Facilitate data integration and interoperability: Normalized schemas provide a
standardized and consistent way of representing data. This promotes data integration
and interoperability across different systems and applications. By adhering to normal
forms, databases can easily exchange and share data without conflicts or
inconsistencies, enabling seamless integration and collaboration.
➢ Adapt to evolving data requirements: Normal forms provide a foundation for a flexible
and extensible database design. By organizing data based on functional dependencies and
avoiding data anomalies, normalized schemas can adapt to changing data requirements
without significant schema modifications. This scalability and flexibility are crucial for
accommodating future data growth and evolving business needs
➢ In summary, the motivations for normal forms in database design are to eliminate data
redundancy and anomalies, ensure data integrity, simplify data management, support
efficient query processing, facilitate data integration, and adapt to evolving data
requirements. By following normal forms, database designers can create well-structured
and efficient database schemas that promote reliable and effective data management

Why Do We Need Normalization?

As we have discussed above, normalization is used to reduce data redundancy. It provides a


method to remove the following anomalies from the database and bring it to a more
consistent state:
A database anomaly is a flaw in the database that occurs because of poor planning and
redundancy.

Insertion anomalies: This occurs when we are not able to insert data into a database because
some attributes may be missing at the time of insertion.

Updation anomalies: This occurs when the same data items are repeated with the same
values and are not linked to each other.

Deletion anomalies: This occurs when deleting one part of the data deletes the other
necessary information from the database.

8. NORMAL FORMS:

The process of normalization helps us divide a larger table in the database into various smaller
tables and then link their using relationships. Normal forms are basically useful for reducing
the overall redundancy (repeating data) from the tables present in a database, so as to ensure
logical storage.

There are four normal forms that are usually used in relational databases: first normal form
(1NF), second normal form (2NF), third normal form (3NF) and Boyce-Codd normal form
(BCNF).

FIRST NORMAL FORM (1NF)

➢ A relation is in 1NF if every attribute is a single-valued attribute or it does not contain


any multi-valued or composite attribute, i.e., every attribute is an atomic attribute.
➢ If there is a composite or multi-valued attribute, it violates the 1NF. To solve this, we
can create a new row for each of the values of the multi-valued attribute to convert the
table into the 1NF.
Let’s take an example of a relational table <CourseDetail> that contains the details of the
course.

➢ Here, the Course Content is a multi-valued attribute. So, this relation is not in 1NF.
➢ We re-arrange the relation (table) as below, to convert it to First Normal Form.

To convert this table into 1NF, we make new rows with each Course Content as a new row as
shown below
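
As a sketch in SQL, the 1NF version of the table stores one atomic content value per row; the
column names below are assumed for illustration, since the original table layout is not
reproduced here:

CREATE TABLE course_detail (
    course_id      VARCHAR(10)  NOT NULL,
    course_content VARCHAR(100) NOT NULL, -- one atomic value per row
    PRIMARY KEY (course_id, course_content)
);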

SECOND NORMAL FORM (2NF)

➢ The normalization of 1NF relations to 2NF involves the elimination of partial


dependencies.
➢ A partial dependency in DBMS exists when any non-prime attributes, i.e., an
attribute not a part of the candidate key, is not fully functionally dependent on one of
the candidate keys.

For a relational table to be in second normal form, it must satisfy the following rules:

➢ The table must be in first normal form.


➢ It must not contain any partial dependency, i.e., all non-prime attributes are fully
functionally dependent on the primary key.

If a partial dependency exists, we can divide the table to remove the partially dependent
attributes and move them to some other table where they fit in well.
Example : Student_Project relation

We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually.

But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in Second
Normal Form.

We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
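
One possible SQL form of this decomposition (a sketch; here a linking table is kept so that the
student-project association is preserved):

CREATE TABLE student (
    Stu_ID   INT PRIMARY KEY,
    Stu_Name VARCHAR(50) -- depends only on Stu_ID
);

CREATE TABLE project (
    Proj_ID   INT PRIMARY KEY,
    Proj_Name VARCHAR(50) -- depends only on Proj_ID
);

CREATE TABLE student_project (
    Stu_ID  INT REFERENCES student(Stu_ID),
    Proj_ID INT REFERENCES project(Proj_ID),
    PRIMARY KEY (Stu_ID, Proj_ID) -- the original composite key
);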

THIRD NORMAL FORM (3NF)

The normalization of 2NF relations to 3NF involves the elimination of transitive


dependencies in DBMS.

A functional dependency X -> Z is said to be transitive if the following three functional


dependencies hold:

• X -> Y
• Y -> X does not hold
• Y -> Z

For a relational table to be in third normal form, it must satisfy the following rules:

1. The table must be in the second normal form.


2. No non-prime attribute is transitively dependent on the primary key.
3. For each functional dependency X -> Z at least one of the following conditions hold:
• X is a super key of the table.
• Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table to remove the transitively dependent
attributes and place them to a new table along with a copy of the determinant.

Example :

We find that in the above Student_detail relation, Stu_ID is the key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is
a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists transitive dependency.

To bring this relation into third normal form, we break the relation into two relations as
follows –
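
A sketch of the resulting relations in SQL, using the attributes named above:

CREATE TABLE student_detail (
    Stu_ID INT PRIMARY KEY,
    Zip    VARCHAR(10) -- Stu_ID -> Zip
);

CREATE TABLE zip_city (
    Zip  VARCHAR(10) PRIMARY KEY,
    City VARCHAR(50) -- Zip -> City, stored with its determinant
);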

The 2NF and 3NF impose some extra conditions on dependencies on candidate keys and
remove redundancy caused by that. However, there may still exist some dependencies that
cause redundancy in the database. These redundancies are removed by a more strict normal
form known as BCNF.

BOYCE-CODD NORMAL FORM (BCNF)

➢ Boyce-Codd Normal Form(BCNF) is an advanced version of 3NF as it contains


additional constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:

1. The table must be in the third normal form.


2. For every non-trivial functional dependency X -> Y, X is the superkey of the table.
That means X cannot be a non-prime attribute if Y is a prime attribute.

Let us take an example of the following <EmployeeProjectLead> table to understand how to


normalize the table to the BCNF:

The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF because
the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial
functional dependency, Project Leader -> Project ID, Project ID is a prime attribute but Project
Leader is a non-prime attribute. This is not allowed in BCNF.

To convert the given table into BCNF, we decompose it into two tables:
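
A sketch of the decomposition in SQL, with attribute names taken from the discussion above:

CREATE TABLE employee_project (
    employee_code  INT,
    project_leader VARCHAR(50),
    PRIMARY KEY (employee_code, project_leader)
);

CREATE TABLE project_lead (
    project_leader VARCHAR(50) PRIMARY KEY, -- now a key for Project Leader -> Project ID
    project_id     INT
);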

Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into
<EmployeeProject> and <ProjectLead> tables.
9. DECOMPOSITIONS AND DESIRABLE PROPERTIES

➢ A relation in BCNF is free of redundancy and a relation schema in 3NF comes close. If
a relation schema is not in one of these normal forms, the FDs that cause a violation
can give us insight into the potential problems.
➢ A decomposition of a relation schema R consists of replacing the relation schema by
two (or more) relation schemas that each contain a subset of the attributes of R and
together include all attributes in R.
➢ When a relation in the relational model is not appropriate normal form then the
decomposition of a relation is required. In a database, breaking down the table into
multiple tables termed as decomposition.

The properties of a relational decomposition are listed below :

Attribute Preservation:

➢ Using functional dependencies the algorithms decompose the universal relation schema
R in a set of relation schemas D = { R1, R2, ….. Rn } relational database schema, where
‘D’ is called the Decomposition of R.
➢ The attributes in R will appear in at least one relation schema Ri in the decomposition,
i.e., no attribute is lost. This is called the Attribute Preservation condition of
decomposition.

Dependency Preservation:

➢ If each functional dependency X->Y specified in F appears directly in one of the


relation schemas Ri in the decomposition D or could be inferred from the dependencies
that appear in some Ri.
➢ This is the Dependency Preservation. If a relation R is decomposed into relation R1 and
R2, then the dependencies of R either must be a part of R1 or R2 or must be derivable
from the combination of functional dependencies of R1 and R2.

For example, suppose there is a relation R (A, B, C, D) with functional dependency set
(A->BC). The relational R is decomposed into R1(ABC) and R2(AD) which is
dependency preserving because FD A->BC is a part of relation R1(ABC).

Lossless Join:
➢ Lossless join property is a feature of decomposition supported by normalization. It is
the ability to ensure that any instance of the original relation can be identified from
corresponding instances in the smaller relations.

Let R be a relation, F a set of functional dependencies on R, and {X, Y} a decomposition of R.
A decomposition {R1, R2, ..., Rn} of a relation R is called a lossless decomposition for R if
the natural join of R1, R2, ..., Rn produces exactly the relation R.

➢ In other words, a decomposition is lossless if the natural join of all the decomposed
relations gives back exactly the original relation, with no tuples lost and no spurious tuples added.

A binary decomposition of R into X and Y is lossless if at least one of the following holds
(a small test sketch follows below):

➢ X ∩ Y -> X, that is, the attributes common to both X and Y functionally determine ALL
the attributes in X; or
➢ X ∩ Y -> Y, that is, the attributes common to both X and Y functionally determine ALL
the attributes in Y.
➢ Equivalently, if X ∩ Y forms a superkey of either X or Y, the decomposition of R is a
lossless decomposition.
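
As a small illustration of this test for a binary decomposition, here is a hedged Python sketch (the relation and FD come from the R(A, B, C, D) example above; the closure helper is as in the BCNF sketch earlier):

```python
def closure(attrs, fds):
    # Closure of `attrs` under the given (lhs, rhs) functional dependencies.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(x, y, fds):
    # Lossless iff the common attributes determine all of X or all of Y.
    common = closure(x & y, fds)
    return x <= common or y <= common

fds = [(frozenset({"A"}), frozenset({"B", "C"}))]      # A -> BC
print(is_lossless({"A", "B", "C"}, {"A", "D"}, fds))   # True: A+ = {A,B,C} covers R1
```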

Lack of Data Redundancy

➢ Lack of Data Redundancy is also known as a Repetition of Information.


➢ A proper decomposition should not suffer from any data redundancy.
➢ A careless decomposition, however, may introduce anomalies into the data.
➢ The lack-of-data-redundancy property is achieved through the normalization process.
Important questions

1. Define functional dependencies. What are Armstrong's axioms for FDs?
2. What is normalization? What are its types? Discuss 1NF, 2NF, and 3NF with examples.
3. Define BCNF. How does BCNF differ from 3NF? Explain with an example.
4. Explain the importance of good schema design and the problems encountered with
bad schema designs.
5. Define functional dependencies. How are primary keys related to functional
dependencies?
6. Explain decompositions and their desirable properties.
Chapter 5

TRANSACTION MANAGEMENT

Syllabus:

1. Transaction concept
2. Transaction state
3. System log, commit point
4. Desirable properties of transaction
5. Concurrent executions
6. Serializability
7. Recoverability
8. Implementation of isolation
9. Transaction definition in SQL
10. Testing for serializability
11. Serializability by locks: locking systems with several lock modes
12. Concurrency control by timestamps and validation.

1. TRANSACTION CONCEPT

What is Transaction in DBMS?

➢ Transactions are sets of operations that perform some logical unit of work. A transaction
changes data in a database by inserting new data, updating existing data, or deleting data
that is no longer required.
➢ A transaction passes through certain states, which tell the user about the current
condition of the database transaction and what further steps are to be followed for its
processing.

Operations in Transaction

A transaction consists of a certain set of operations that together perform one logical unit of
work. For example, when we withdraw money from an ATM, we encounter the following
sequence of steps:

1. Transaction Initiated
2. You have to insert an ATM card
3. Select your choice of language
4. Select whether savings or current account
5. Enter the amount to withdraw
6. Entering your ATM pin
7. Transaction processes
8. You collect the cash
9. You press finish to end transaction

The above are the steps performed by you. In the case of a transaction in a DBMS, however,
there are three major operations through which a transaction gets executed in an efficient
manner:

1. Read / Access Data
2. Write / Change Data
3. Commit

Let's understand the above three sets of operations in a transaction with a real-life example of
transferring money from Account1 to Account2.

Initial balances in both accounts before the start of the transaction:

➢ Account1 = ₹ 5000, Account2 = ₹ 2000

➢ Before the start of the transaction, this data is stored in secondary memory (hard disk);
once the transaction is initiated, it is brought into the primary memory (RAM) of the system for
faster and better access.

➢ Now for a transfer of ₹ 500 from Account1 to Account2 to occur, the following set of
operations will take place.

➢ Read(Account1) --> 5000
   Account1 = Account1 - 500
   Write(Account1) --> 4500
   Read(Account2) --> 2000
   Account2 = Account2 + 500
   Write(Account2) --> 2500
   Commit

➢ The COMMIT statement permanently saves the changes made by the current
transaction. When a transaction is successful, COMMIT is applied. If the system fails
before a COMMIT is applied, the transaction reaches its previous state after
ROLLBACK.

➢ After commit operation the transaction ends and updated values of Account1 = ₹ 4500
and Account2 = ₹ 2500. Every single operation that occurs before the commit is said to
be in a partially committed state and is stored in the primary memory (RAM). After the
transaction is committed, the updated data is accepted and updated in the secondary memory
(Hard Disk).

➢ If the transaction fails anywhere before committing, it is aborted and has to start from
the beginning, as it cannot be continued from the state at which it failed. This is known as
rollback.
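
As a minimal sketch (not from the source; names mirror the example above), the RAM/disk staging described here can be modeled in Python with two dictionaries, where commit() flushes the buffer to "disk" and rollback() discards it:

```python
# "Secondary memory" (hard disk): the durable, committed state.
disk = {"Account1": 5000, "Account2": 2000}

# "Primary memory" (RAM): the transaction's working copy.
buffer = dict(disk)   # Read: bring the data items into RAM.

def write(item, value):
    buffer[item] = value   # Changes live only in RAM (partially committed).

def commit():
    disk.update(buffer)    # Make the updates permanent on disk.

def rollback():
    buffer.clear()
    buffer.update(disk)    # Discard uncommitted updates; restore the old state.

# Transfer of 500 from Account1 to Account2:
write("Account1", buffer["Account1"] - 500)   # Write(Account1) --> 4500
write("Account2", buffer["Account2"] + 500)   # Write(Account2) --> 2500
commit()
print(disk)   # {'Account1': 4500, 'Account2': 2500}
```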

2. TRANSACTION STATES IN DBMS

During its lifetime, a transaction goes through a number of states. These states inform the
system about the current condition of the transaction and tell the user how to plan its further
processing; they determine whether the transaction will ultimately commit or abort.

➢ The ROLLBACK statement undoes the changes made by the current transaction. A
transaction cannot undo its changes after COMMIT has been executed.

Following are the different types of transaction States:

➢ Active State: When the operations of a transaction are running, the transaction is
said to be in the active state. If all the read and write operations are performed without any
error, it progresses to the partially committed state; if any operation fails, it goes to the
failed state.

➢ Partially Committed: After all the read and write operations are completed, the
changes still reside only in main memory. If these changes are then successfully made
permanent in the database, the transaction progresses to the committed state; in case of a
failure, it goes to the failed state.

➢ Failed State: If any operation during the transaction fails due to some software or
hardware issue, the transaction goes to the failed state. A failure must not leave any
permanent change in the database, so the changes made to the data in local memory are
rolled back to the previous consistent state.

➢ Aborted State: If the transaction fails during its execution, it goes from failed
state to aborted state and because in the previous states all the changes were only made in the
main memory, these uncommitted changes are either deleted or rolled back. The transaction at
this point can restart and start afresh from the active state.

➢ Committed State: If the transaction completes all its operations successfully, all
the changes made during the partially committed state are permanently stored, and the
transaction is said to be completed; it can then progress to its final end
in the terminated state.

➢ Terminated State: If the transaction gets aborted after roll-back or the transaction
comes from the committed state, then the database comes to a consistent state and is ready for
further new transactions since the previous transaction is now terminated.

3. PROPERTIES OF TRANSACTIONS IN DBMS / ACID PROPERTIES

There are four major properties that are vital for a transaction to be successful. These are used
to maintain state consistency in the database, both before and after the transaction. These are
called ACID properties.

1. Atomicity: This property means that either the transaction takes place completely at
once or doesn’t happen at all. There is no middle option, i.e., transactions do not occur
partially. Each transaction is considered as one single step which either runs
completely or is not executed at all.

Example : Transactions taking place in a bank (Credit or Debit of Money)

➢ Suppose Marino has an account A with $50 in it and wants to send $20 to Amanda,
who has an account B in which $200 already exists. When the $20 is deposited into
account B, its total should become $220.
➢ Two operations are now scheduled: the $20 that Marino wishes to send is debited from
his account A and then credited to account B, i.e., Amanda's account. Suppose the
initial debit operation succeeds but the crediting operation fails.
➢ As a result, the value in Marino's account A becomes $30, while the value in Amanda's
account B remains $200 as before. Atomicity forbids such a partial outcome: either
both operations complete, or the debit must also be undone.

2. Consistency: This property means that the integrity constraints of a database are
maintained so that the database is consistent before and after the transaction. It refers
to the correctness of a database.
Example: Let us consider an example where one person is trying to book a ticket. They
are able to reserve their seat but their payment hasn’t gone through due to bank issues.
In this case, their transaction is rolled back. But just doing that isn't sufficient: the
number of available seats must also be updated, otherwise the seat given up by the person
is not accounted for. Without consistency, the sum of the seats left in the train and the
seats booked by users would not equal the total number of seats present in the train.
3. Isolation: This property means that multiple transactions can occur concurrently
without causing any inconsistency to the database state. These transactions occur
independently without any external interference. Changes that occur in a particular
transaction are not visible/ accessible to any other transaction until that particular
change in that transaction has been committed.
Example: Let us use our example of two people trying to book the same seat to
understand this ACID property. Two transactions are happening at the same time on the
same database, but in isolation. To ensure that one transaction doesn't affect the other,
they are serialized by the system so as to keep the data in a consistent state. Suppose
the two people click 'Book Now' a few seconds apart. The first person's transaction goes
through, and he/she receives the ticket. The second person will not know of this until the
first person's transaction is committed. When the second person clicks 'Book Now'
and is redirected to the payment gateway, since the first person's seat has already been
booked, an error is shown notifying the user that no seats are left on the train.
4. Durability: This property ensures that once the transaction has completed execution,
the updates and modifications to the database are stored in and written to disk and
they remain intact even if a system failure occurs. These updates become permanent
and are stored in the non-volatile memory.

Imagine a system failure or crash occurs in the railway management system and all the trains
that everyone had booked have been removed from the system. That means millions of users
would have paid the money for their seats but will not be able to board the train as all the details
have been destroyed. This could lead to huge losses for the company, as users would lose trust
in it. In addition, it would create a lot of panic, as these trains would be needed for important
reasons as well.

5. SCHEDULING

What is Scheduling, and why is it required?

➢ Transactions are sets of instructions that perform operations on a database. When
multiple transactions run concurrently, a sequence is needed in which their operations
are to be performed, because at any instant only one operation can be performed on the
database. This sequence of operations is known as a schedule, and the process of
deciding it is known as scheduling.
➢ When multiple transactions execute simultaneously in an unmanageable manner, then
it might lead to several problems, which are known as concurrency problems. In order
to overcome these problems, scheduling is required.

Types of Schedules

There are mainly two types of schedules:

1. Serial Schedule
2. Concurrent (Non-serial Schedule)
Further, they are divided into their subcategories, as shown below.
Serial Schedule:

As the name says, all the transactions are executed serially, one after the other. In a serial
schedule, a transaction does not start execution until the currently running transaction finishes.
This type of execution is also known as non-interleaved execution. Serial schedules are always
recoverable, cascadeless, strict, and consistent, and a serial schedule always gives the correct
result.
Consider two transactions T1 and T2 that perform some operations. With no interleaving of
operations, there are only two possible outcomes: either all T1 operations execute first,
followed by all T2 operations, or all T2 operations execute first, followed by all T1 operations.
In the figure, the schedule shows the serial schedule where T1 is followed by T2, i.e.,
T1 -> T2, where R(A) means reading data item 'A' and W(B) means writing/updating data
item 'B'. If n is the number of transactions, then the number of possible serial schedules is n!.

Therefore, for the above 2 transactions, the total number of possible serial schedules = 2! = 2.

Concurrent (Non-serial Schedule)

In a non-serial schedule, multiple transactions execute concurrently/simultaneously, unlike the
serial schedule, where one transaction must wait for another to complete all its operations. In
a non-serial schedule, a transaction proceeds without waiting for the previous transaction to
complete; the operations of the transactions are interleaved (mixed) with each other.
Non-serial schedules are NOT always recoverable, cascadeless, strict, or consistent.

In this Schedule, there are two transactions, T1 and T2, executing concurrently. The
operations of T1 and T2 are interleaved. So, this Schedule is an example of a Non-Serial
Schedule.

Total number of non-serial schedules = Total number of schedules – Total number of serial
schedules

Non-serial schedules are further categorized into serializable and non-serializable schedules.
Let's now discuss further Serializability.
6. SERIALIZABILITY IN DBMS

Serializability in DBMS is a concept that helps to identify which non-serial schedules are
correct and will maintain the consistency of the database. A serializable schedule always
leaves the database in a consistent state.

A serial schedule is always a serializable schedule because, in a serial schedule, a
transaction only starts when the previous transaction has finished execution. A non-serial
schedule of n transactions is said to be serializable if it is equivalent to some serial schedule
of those n transactions. A serial schedule does not allow concurrency: only one transaction
executes at a time, and the next starts only after the currently running transaction has
finished.

Difference between Serial Schedule and Serializable Schedule

Types of Serializability:

1. Conflict Serializability
2. View Serializability
Conflict Serializability

A schedule is called conflict serializable if it can be transformed into a serial schedule
by swapping non-conflicting operations. A pair of operations is conflicting if all of the
following conditions are satisfied:

1. Both belong to separate transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.

For example, for two operations on a data item "a", the conflicting pairs are:
1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)

Note: There can never be a read-read conflict as there is no change in the data

In this schedule, the Write(A)/Read(A) and Write(B)/Read(B) pairs are conflicting operations,
because all the above conditions hold true for them. By swapping the (non-conflicting) second
pair of read/write operations on data item 'A' with the first pair of read/write operations on
data item 'B', this non-serial schedule can be converted into a serial schedule.
Therefore, it is conflict serializable.
View Serializability

A schedule is view serializable if it is view equivalent to some serial schedule. Every
conflict-serializable schedule is also view serializable. A schedule that is view serializable
but not conflict serializable contains blind writes.

Actual process for checking view serializability

1. First, check for conflict serializability. If the schedule is conflict serializable, it is
also view serializable.
2. Otherwise, check for a blind write. If there is a blind write, the schedule may still be
view serializable, so check it against a view-equivalent serial schedule. If there is no
blind write, the schedule can never be view serializable.

A blind write is writing a value or data item without first reading it.

Example:

We have a schedule "S" having three transactions t1, t2, and t3 working simultaneously.

Its precedence graph:

Since a cycle is present, the schedule is not conflict serializable. However, blind writes
[t2 -> w(x) and t3 -> w(x)] are present, so we check for view serializability.

One of its view equivalent schedules can be:


Hence, the schedule is view serializable.

Non-Serializability in DBMS

A non-serial schedule that is not serializable is called a non-serializable schedule.
Non-serializable schedules may or may not be consistent or recoverable. On the basis of
recoverability, schedules are divided into two types:

1. Recoverable Schedule
2. Non-recoverable Schedule

Recoverable Schedule

A schedule is recoverable if each transaction commits only after all the transactions from
which it has read have committed. In other words, if some transaction Ty reads a value that
has been updated/written by some other transaction Tx, then the commit of Ty must occur
after the commit of Tx.

The schedule shown above is Recoverable since T1 commits before T2, which makes the
value read by T2 correct.

Recoverable schedules are further categorized into 3 types:

1. Cascading Schedule
2. Cascadeless Schedule
3. Strict Schedule

Cascading Schedule

If, in a schedule, several other dependent transactions are forced to rollback/abort because of
the failure of one transaction, then such a schedule is called a Cascading Schedule (also
called Cascading Rollback or Cascading Abort). It leads to a wastage of CPU time.

Here, transaction T2 depends on transaction T1, and transaction T3 depends on
transaction T2. Thus, in this schedule, the failure of transaction T1 will cause transaction T2
to roll back, and similarly for transaction T3. Therefore, it is a cascading schedule. If
transactions T2 and T3 had committed before the failure of transaction T1, the schedule
would have been irrecoverable.

Cascadeless Schedule

If, in a schedule, a transaction is not allowed to read a data item until the last transaction
that wrote it has committed or aborted, then such a schedule is called a Cascadeless
Schedule. It avoids cascading rollbacks and thus saves CPU time. To prevent cascading
rollbacks, it disallows a transaction from reading uncommitted changes of another
transaction in the same schedule. In other words, if some transaction Ty wants to read a
value that has been updated or written by some other transaction Tx, then Ty may read it
only after Tx commits. Look at the example shown below.
Here, the updated value of X is read by transaction T2 only after the commit of transaction T1.
Hence, the Schedule is Cascadeless schedule.

Strict Schedule

If, in a schedule, a transaction is neither allowed to read nor to write a data item until the
last transaction that wrote it has committed or aborted, then such a schedule is called a
Strict Schedule. Say we have two transactions Ta and Tb: if a write operation of
transaction Ta precedes a read or write operation of transaction Tb on the same item, then
the commit or abort of Ta must also precede that read or write of Tb. A strict schedule
allows only committed reads and writes, and it imposes more restrictions than a cascadeless
schedule. Consider the example shown below.

Here, transaction Tb reads/writes the written value of transaction Ta only after the transaction
Ta commits. Hence, the Schedule is a strict Schedule.
Non-Recoverable Schedule

If a transaction reads a value written by an uncommitted transaction and commits before the
transaction from which it read, then such a schedule is called a Non-Recoverable Schedule.
With a non-recoverable schedule, after a system failure we may not be able to recover the
database to a consistent state. If transaction Tj reads from transaction Ti and the commit of
Ti does not occur before the commit of Tj, the schedule is non-recoverable.

Consider the following schedule involving two transactions T1 and T2. T2 reads the value of A
written by T1 and then commits. T1 might later abort, in which case the value read by T2
would be wrong; but since T2 has already committed, this schedule is non-recoverable.

7. TRANSACTION DEFINITION IN SQL

In SQL (Structured Query Language), a transaction refers to a sequence of one or more
database operations that are executed as a single unit of work. A transaction ensures that
either all of the operations within it are completed successfully, or none of them are applied
to the database. The concept of a transaction is essential to maintain data consistency and
integrity.

A transaction typically consists of one or more of the following operations:

1. Insert: Adds new data records to a table.
2. Update: Modifies existing data records in a table.
3. Delete: Removes data records from a table.
4. Select: Retrieves data records from a table.

Transactions are defined using SQL statements known as transaction control statements. The
most common transaction control statements are:
1. BEGIN TRANSACTION: Starts a new transaction.
2. COMMIT: Saves the changes made within a transaction to the database permanently.
Once committed, the changes become visible to other transactions.
3. ROLLBACK: Undoes all the changes made within a transaction, reverting the
database to its state before the transaction began.
4. SAVEPOINT: Defines a point within a transaction to which you can roll back later if
needed.

For example, let's consider a scenario where you want to transfer funds from one bank account
to another. You would want to ensure that the withdrawal from the source account and the
deposit into the destination account happen as a single atomic operation. In this case, you would
wrap the withdrawal and deposit operations within a transaction using BEGIN
TRANSACTION, COMMIT, and ROLLBACK statements. If any part of the transaction fails
(e.g., insufficient funds), you can roll back the transaction and ensure that neither the
withdrawal nor the deposit is applied to the database.
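
A minimal sketch of this funds-transfer pattern in Python, using the standard-library sqlite3 module (the table, account names, and amounts are illustrative, not from the source):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("Account1", 5000), ("Account2", 2000)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit `src` and credit `dst` as one atomic transaction."""
    try:
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                    (amount, src))
        (balance,) = cur.execute("SELECT balance FROM accounts WHERE id = ?",
                                 (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")   # abort the transfer
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    (amount, dst))
        conn.commit()      # COMMIT: both updates become permanent together
    except Exception:
        conn.rollback()    # ROLLBACK: undo the partial debit
        raise

transfer(conn, "Account1", "Account2", 500)
print(conn.execute("SELECT id, balance FROM accounts").fetchall())
# [('Account1', 4500), ('Account2', 2500)]
```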

By using transactions, you can maintain data consistency, handle concurrent access, and
provide reliability and durability in database operations.

8. IMPLEMENTATION OF ISOLATION

Isolation in transactions refers to the degree to which the changes made by a transaction are
visible to other concurrent transactions. In SQL, isolation is implemented through isolation
levels, which define the behavior of transactions with respect to concurrent access and data
consistency. There are different isolation levels, and the choice of isolation level depends
on the specific requirements of the application.

The commonly supported isolation levels in SQL are:

1. Read Uncommitted: This is the lowest isolation level where transactions can see
uncommitted changes made by other transactions. It allows dirty reads, meaning a
transaction can read data that has been modified by another uncommitted transaction.
This level provides the highest concurrency but sacrifices consistency.
2. Read Committed: In this isolation level, a transaction can only see committed changes
made by other transactions. It prevents dirty reads by waiting for a transaction to
commit before accessing its changes. However, it still allows non-repeatable reads,
meaning the same query executed multiple times within a transaction may return
different results if other transactions modify the data.
3. Repeatable Read: This isolation level ensures that within a transaction, the same query
executed multiple times will always return the same result. It prevents dirty reads and
non-repeatable reads by acquiring shared locks on the accessed data. Other transactions
can still read the data but cannot modify it until the first transaction completes.
4. Serializable: This is the highest isolation level that provides strict consistency. It
guarantees that transactions are executed as if they were serialized one after another,
even though they may run concurrently. Serializable isolation level prevents dirty reads,
non-repeatable reads, and phantom reads. It achieves this by acquiring range locks or
table locks to restrict access to the data being read or modified.

Isolation level    | Dirty read | Non-repeatable read | Phantom read
Read Uncommitted   | ✓          | ✓                   | ✓
Read Committed     | ✗          | ✓                   | ✓
Repeatable Read    | ✗          | ✗                   | ✓
Serializable       | ✗          | ✗                   | ✗

(✓ = anomaly possible at this level, ✗ = anomaly prevented)

The isolation level for a transaction can be set using SQL statements specific to the database
system. For example, in many database systems, you can use the SET TRANSACTION
ISOLATION LEVEL statement to set the isolation level for a transaction.

It's important to note that higher isolation levels generally come with increased overhead and
potential for blocking or deadlocks, as they impose stricter restrictions on concurrent access.
Therefore, it's crucial to choose an appropriate isolation level based on the requirements of the
application and the potential trade-offs in terms of performance and concurrency.
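
As a hedged illustration, the snippet below shows how the isolation level might be set from Python against a PostgreSQL server using the psycopg2 driver; the connection parameters are placeholders, and other database systems expose analogous but differently spelled statements:

```python
import psycopg2

# Placeholder connection parameters for a hypothetical database.
conn = psycopg2.connect(dbname="bank", user="app", password="secret")

with conn:                      # commits on success, rolls back on exception
    with conn.cursor() as cur:
        # Must be issued before the first query of the transaction.
        cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
        cur.execute("SELECT balance FROM accounts WHERE id = %s", ("Account1",))
        print(cur.fetchone())
```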

9. TESTING FOR SERIALIZABILITY

Precedence Graph Method:

To test the serializability of a schedule, we can use a Serialization Graph or Precedence
Graph. A serialization graph is a directed graph over all the transactions of a schedule.
It can be defined as a graph G(V, E) consisting of a set of vertices V = {V1, V2, V3, ..., Vn},
one per transaction, and a set of directed edges E = {E1, E2, E3, ..., En}. An edge is drawn
for each pair of conflicting READ/WRITE operations performed by different transactions:
Ti -> Tj means transaction Ti performs a conflicting read or write before transaction Tj does.
If there is a cycle in the serialization graph, then the schedule is non-serializable, because
the cycle indicates that the transactions are mutually dependent on each other; it also means
that there are one or more conflicting pairs of operations in the transactions. On the other
hand, if there is no cycle, the non-serial schedule is serializable.

What is a conflicting pair in transactions?

Two operations inside a schedule are called conflicting if they meet these three conditions:

1. They belong to two different transactions.
2. They operate on the same data item.
3. At least one of them performs a WRITE operation.

For example, for two operations on a data item "a", the conflicting pairs are:

1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)

Let's take an example of a schedule "S" having three transactions t1, t2, and t3 working
simultaneously, to get a better understanding.
This schedule is non-serializable: R(x) of T1 conflicts with W(x) of T3, so there is a directed
edge from T1 to T3. R(y) of T1 conflicts with W(y) of T2, so there is a directed edge from
T1 to T2. W(x) of T3 conflicts with W(x) of T1, so there is a directed edge from T3 to T1.
Similarly, we make edges for every conflicting pair. Since a cycle is formed, the
transactions cannot be serializable.
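
A small Python sketch of this test (the schedule encoding below is illustrative, not the exact schedule from the figure): build the precedence graph from the conflicting pairs, then detect a cycle with depth-first search:

```python
# Each step of the schedule: (transaction, operation, data item).
schedule = [("T1", "R", "x"), ("T2", "W", "y"), ("T3", "W", "x"),
            ("T1", "R", "y"), ("T3", "R", "y"), ("T1", "W", "y")]

def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflicting pair: different transactions, same item, >= one write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))   # Ti's operation precedes Tj's
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visited, on_stack = set(), set()
    def dfs(u):
        visited.add(u)
        on_stack.add(u)
        for v in graph.get(u, ()):
            if v in on_stack or (v not in visited and dfs(v)):
                return True
        on_stack.discard(u)
        return False
    return any(n not in visited and dfs(n) for n in graph)

edges = precedence_graph(schedule)
print(edges)                                   # includes ('T1','T3') and ('T3','T1')
print("serializable:", not has_cycle(edges))   # a cycle => not conflict serializable
```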

It's important to note that serializability testing is typically performed by database management
systems internally, and users rarely need to perform these tests manually. The purpose of these
tests is to ensure that the concurrent execution of transactions does not introduce any data
integrity issues or inconsistencies.

10. SERIALIZABILITY BY LOCKS

Serializability by locks is a technique used to ensure the serializability of concurrent
transactions by employing lock-based concurrency control mechanisms. Locks are used to
control access to data items, preventing conflicts between transactions and maintaining the
desired serializability properties.

There are two commonly used types of locks in lock-based concurrency control:

Shared Lock (S-lock): Data can only be read when a shared lock is applied. Data cannot be
written. It is denoted as lock-S.

Exclusive Lock (X-lock): Data can be read as well as written when an exclusive lock is
applied. It is denoted as lock-X.

Compatibility table (row = lock already granted, column = lock requested):

Granted \ Requested |  S  |  X
S                   | YES | NO
X                   | NO  | NO
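
A tiny Python sketch of this compatibility check (the lock-table layout is illustrative):

```python
# compatible[(held, requested)] -> may the request be granted alongside `held`?
compatible = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(held_locks, requested):
    # Grant only if the request is compatible with every lock already held.
    return all(compatible[(held, requested)] for held in held_locks)

print(can_grant(["S", "S"], "S"))   # True: shared locks coexist
print(can_grant(["S"], "X"))        # False: exclusive conflicts with shared
```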
Problems in S/X locking protocol:

1. May not be sufficient to produce only serializable schedules
2. May not be free from irrecoverability
3. May not be free from deadlock (transactions waiting on each other indefinitely)
4. May not be free from starvation (a transaction being kept waiting while others proceed)

Two-phase locking (2PL) protocol in transaction concurrency control:

The two phases of Locking are :

Growing Phase: In the growing phase, the transaction only acquires locks; it cannot release
any lock. Once the transaction releases its first lock, the growing phase ends and the
shrinking phase begins.

Shrinking Phase: In the shrinking phase, the transaction only releases locks; it cannot
acquire any new lock.

In the above figure, we can see that in the growing phase, locks are acquired until the point
at which all the locks needed by the transaction have been obtained. This point is called the
lock point. After the lock point, the transaction enters the shrinking phase.

Advantage: always ensures serializability

Disadvantage :

1. May not be free from irrecoverability
2. May not be free from deadlock
3. May not be free from starvation
4. May not be free from cascading rollback

Two phase locking types :

1. Strict two-phase locking protocol: it must satisfy basic 2PL, and in addition all
exclusive locks must be held until commit/abort.
• The transaction may release a shared lock after the lock point.
• The transaction cannot release any exclusive lock until it commits or aborts.
• Because no transaction can read data that is still exclusively locked by an
uncommitted transaction, strict 2PL prevents one transaction's rollback from
forcing dependent transactions to roll back, i.e., it avoids cascading schedules.

Advantages:

1. Free from cascading rollbacks.
2. Produces strict schedules, which are recoverable.

2. Rigorous two-phase locking protocol: it must satisfy basic 2PL, and all locks, shared
as well as exclusive, must be held until commit/abort.

• The transaction cannot release either kind of lock, neither shared nor exclusive,
before it commits or aborts.
• Serializability is guaranteed in the rigorous two-phase locking protocol.
• Freedom from deadlock is not guaranteed in the rigorous two-phase locking protocol.
3. Conservative two-phase locking protocol:
• The transaction must lock all the data items it requires before the transaction
begins execution.
• If any of the required data items is not available for locking, the transaction
locks none of them and waits.
• The data items to be read and written must be known before the transaction begins,
which is normally not possible.
• The conservative two-phase locking protocol is deadlock-free.
• The conservative two-phase locking protocol does not ensure strict schedules.
11. CONCURRENCY CONTROL BY TIMESTAMPS AND VALIDATION:

Concurrency control by timestamps and validation is a technique used to ensure serializability
and maintain data consistency in concurrent database transactions. It combines the use of
timestamps for ordering transactions with validation checks to detect and resolve conflicts
between transactions.

Here's how concurrency control by timestamps and validation typically works:

1. Timestamp Ordering:

➢ Each transaction is assigned a unique timestamp when it begins execution. The
timestamp can be based on the system clock or another mechanism that ensures
uniqueness and ordering.
➢ It is represented as TS(Ti).

Read timestamp: RTS(A) = the timestamp of the last (latest) transaction that read A
successfully. E.g., RTS(A) = 30.

Write timestamp: WTS(A) = the timestamp of the last (latest) transaction that wrote A
successfully. E.g., WTS(A) = 20.

➢ The timestamps are used to determine the relative order of transactions. A transaction
with a lower timestamp is considered to have occurred earlier than a transaction with
a higher timestamp.

2. Read and Write Operations:

➢ When a transaction reads a data item, the item's read timestamp is updated to record
the latest reader.
➢ When a transaction wants to write a data item, it checks whether some transaction with a
higher timestamp has already read or written that item. If so, the writing transaction is
rolled back to maintain consistency.

Rules:

1. Transaction Ti issues a Read(A) operation:

   (a) If WTS(A) > TS(Ti), roll back / reject Ti.
   (b) Otherwise, execute the read operation and set RTS(A) = max{RTS(A), TS(Ti)}.

2. Transaction Ti issues a Write(A) operation:

   (a) If RTS(A) > TS(Ti), roll back / reject Ti.
   (b) If WTS(A) > TS(Ti), roll back / reject Ti.
   (c) Otherwise, execute the Write(A) operation and set WTS(A) = TS(Ti).
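
A compact Python sketch of these two rules (data structures and timestamps are illustrative; TS(Ti) is passed in as `ts`):

```python
rts, wts = {}, {}   # per-item read/write timestamps, defaulting to 0

def read(item, ts):
    # Rule 1: reject if a younger transaction has already written the item.
    if wts.get(item, 0) > ts:
        return "rollback"
    rts[item] = max(rts.get(item, 0), ts)   # RTS(A) = max{RTS(A), TS(Ti)}
    return "read executed"

def write(item, ts):
    # Rule 2: reject if a younger transaction has already read or written it.
    if rts.get(item, 0) > ts or wts.get(item, 0) > ts:
        return "rollback"
    wts[item] = ts                           # WTS(A) = TS(Ti)
    return "write executed"

print(write("A", 20))   # write executed -> WTS(A) = 20
print(read("A", 30))    # read executed  -> RTS(A) = 30
print(write("A", 25))   # rollback: RTS(A) = 30 > 25
```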

Validation:

➢ After a transaction completes its execution, a validation phase is performed.
➢ During validation, the system checks whether any conflicts occurred between transactions
based on their timestamps. Conflicts can arise due to concurrent operations on the
same data items.
➢ If conflicts are detected, the system may choose to abort one or more of the
conflicting transactions and restart them with a new timestamp. This process ensures
that the final execution order maintains serializability.

The key idea behind concurrency control by timestamps and validation is to allow
transactions to proceed concurrently, but ensure that their execution order is consistent with
a serial execution order. By using timestamps, the system can identify conflicting
transactions and resolve conflicts by aborting and restarting some of them.

➢ This approach provides good concurrency as transactions can execute in parallel as long
as there are no conflicts. However, it may lead to transaction restarts and aborts, which
can impact performance in cases of high contention or frequent conflicts.
➢ It's worth noting that concurrency control mechanisms based on timestamps and
validation are just one of the many techniques used in database systems. Other
approaches, such as locking protocols, snapshot isolation, and multiversion
concurrency control, offer different trade-offs in terms of concurrency, isolation, and
performance. The choice of concurrency control mechanism depends on the specific
requirements and characteristics of the application.
IMPORTANT QUESTIONS

1. Explain the ACID properties of transactions.
2. Explain the transaction states concept and the desirable properties of a transaction.
3. Explain how data is recovered under concurrent transactions.
4. Explain transaction definition in SQL with an example.
5. What is serializability? Explain concurrency control by timestamps.
6. What are the transaction isolation levels in SQL?
7. Explain serializability and lock modes.
