KEMBAR78
DBMS Notes | PDF | Relational Database | Databases
0% found this document useful (0 votes)
33 views47 pages

DBMS Notes

The document provides an overview of database management systems, highlighting the advantages of relational databases over traditional file processing systems, including issues like data redundancy, integrity, and security. It discusses various data models, including the relational model, and the three-schema architecture that separates user applications from physical databases. Additionally, it outlines the roles of different database users and the components of a relational database management system (RDBMS).
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views47 pages

DBMS Notes

The document provides an overview of database management systems, highlighting the advantages of relational databases over traditional file processing systems, including issues like data redundancy, integrity, and security. It discusses various data models, including the relational model, and the three-schema architecture that separates user applications from physical databases. Additionally, it outlines the roles of different database users and the components of a relational database management system (RDBMS).
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 47

CS8492 fDatabase ManagementSystems Department of CSE N IT 2018 -2019 CS8492 fDatabase ManagementSystems Department of CSE N IT 2018 -2019

UNITI RELATIONALDATABASES 2. Difficulty in accessingdata


Purpose of Database System — Views of data — Data Models — Database System Architecture — Introduction to relational databases — File processing environments do not allow needed data to be retrieved in a convenient and efficient manner.
Relational Model — Keys — Relational Algebra — SQL fundamentals — Advanced SQL features — Embedded SQL— Dynamic SQL 3. Dataisolation
INTRODUCTION DATABASE Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve
Database is collection of data which is related by some aspect. Data is collection of facts and figures which can be the appropriate data is difficult.
processedtoproduceinformation.Mostlydatarepresentsrecordablefacts.Dataaidsinproducinginformationwhich is based on facts. A 4. Integrityproblems
database management system stores data, in such a way which is easier to retrieve, manipulate and helps to produceinformation. The data values stored in the database must satisfy certain types of consistency constraints. Example:
So a database is a collection of related data that we can use for The balance of certain types of bank accounts may never fall below a prescribed amount . Developers enforce these constraints
• Defining - specifying types ofdata in the system by addition appropriate code in the various application programs
5. Atomicityproblems
• Constructing - storing &populating
Atomic means the transaction must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file
o Manipulating - querying, updating,reporting
processing system.
Example:
DISADVANTAGES OF FILE SYSTEM OVER DB
Consider a program to transfer $50 from account A to account B. If a system failure occurs during the execution of the program, it
In the early days, File-Processing system is used to store records. It uses various files for storing the records. Drawbacks of
is possible that the $50 was removed from account A but was not credited to account B, resulting in an inconsistent database state.
using file systems to store data:
6. Concurrent accessanomalies
• Data redundancy andinconsistency
For the sake of overall performance of the system and faster response, many systems allow multiple users to update the data
-Multiple file formats, duplication of information in different files
simultaneously. In such an environment, interaction of concurrent updates is possible and may result in inconsistent data. To guard
• Difficulty in accessingdata against this possibility, the system must maintain some form of supervision. But supervision is difficult to
Need to write a new program to carry out each new task provide because data may be accessed by many different application programs that have not been coordinated previously.
• Data isolation — multiple files andformats Example: When several reservation clerks try to assign a seat on an airline flight, the system should ensure that each seat can
• Integrityproblems be accessed by only one clerk at a time for assignment toa passenger.
- Hard to add new constraints or changeexisting ones
• Atomicityproblem 7. Securityproblems
-Failures may leave database in an inconsistent state with partial updates carried Enforcing security constraints to the file processing system is difficult.
Out. E,g. transfer of funds from one account to another should either complete or not happen at all APPLICATION OF DATABASE
• Concurrent accessanomalies Database Applications
- Concurrent accessed needed forperformance • Banking: alltransactions
• Securityproblems • Airlines: reservations,schedules
Database systems offer solutions to all the above problems • Universities: registration,grades
PURPOSE OF DATABASE SYSTEM o Sales: customers, products,purchases
The typical file processing system is supported by a conventional operating system. The system stores permanent records in various • Manufacturing: production, inventory, orders, supplychain
files, and it needs different application programs to extract records from, and add records to, the appropriate files. A file processing » Human resources: employee records, salaries, taxdeductions
system has a number of major disadvantages.
• Telecommunication: Call History,Billing
• Credit card transactions: Purchase details,Statements
• Data redundancy andinconsistency
VIEWS OF DATA
• Difficulty in accessingdata
lt refers that how database is actually stored in database, what data and structure of data used by database for data. So describe all
o Data isolation — multiple files andformats this database provides user with views and theseare
» Integrityproblems • Dataabstraction
» Atornicity ofupdates » Instances andschemas
» Concurrent access by multipleusers
• Securityproblems Asadataindatabasearestoredwithverycomplexdatastructuresowhen usercomeandwanttoaccessanydata, he will not be able to access
data if he has go through this data structure. So to simplify the interaction of user and database, DBMS hides some information
1. Data redundancy andinconsistency: which is not of user interest, a this is called data abstraction:- So developer hides complexity from user and store abstract
ln file processing, every user group maintains its own files for handling its data processingapplications. viewof data.
Example: Data abstraction has three level of abstractions
Consider the UNIVERSITY database. Here, two groups of users might be the course registration personnel and the accounting office. The • level / internallevel
accounting office also keeps data on registration and • Logical level / conceptuallevel
related billing information, whereas the registration office keeps track of student courses and grades.Storing the same data • view level / externallevel
multiple times is called data redundancy.This redundancy leads to several problems. Physical level:- this is the lowest level of data abstraction which describe How data is actual stored in database. This level basically
•Need to perform a single logical update multiple times. describe the data structure and access path /indexing use for accessing file.
•Storage space is wasted. Logical level:- The next level of abstraction describe what data are stored in the database and what are the relationship existed
•Files that represent the same data may become inconsistent. among those of data.
Data inconsistency is the various copies of the same data may no larger Agree. Example: View level:- In this level user only interact with database and the complexity remain unview . user see data and there may be many
One user group may enter a student's birth date erroneously as JAN-19-1984, whereas the other user views of one data like chart and graph.
groups may enter the correct value ofJAN-29-1984.
DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019

3. Network Model – Network Model is same as hierarchical model except that it has graph-like structure rather than a tree-
based structure. Unlike hierarchical model, this model allows each record to have more than one parentrecord.
DATA MODELS IN DBMS Physical Data Models – These models describe data at the lowest level of abstraction.
A Data Model is a logical structure of Database. It describes the design of database to reflect entities, attributes, relationship among data,
Three Schema Architecture
constrains etc.

P
Types of Data Models:
The goal of the three schema architecture is to separate the user applications and the physical database. The schemas can be defined
Object based logical Models – Describe data at the conceptual and view levels.

P
at the following levels:

AP
1. E-RModel
1. The internal level – has an internal schema which describes the physical storage structure of the database. Uses a physical
An entity–relationship model (ER model) is a systematic way of describing and defining a business process. An ER model is typically
data model and describes the complete details of data storage and access paths for the database.

AP
implemented as a database. The main components of E-R model are: entity set and relationshipset.
2. The conceptual level – has a conceptual schema which describes the structure of the database for users. It hides the details
of the physical storage structures, and concentrates on describing entities, data types, relationships, user operations and
constraints. Usually a representational data model is used to describe the conceptual schema.

R
3. The External or View level – includes external schemas or user vies. Each external schema describes the part of the
database that a particular user group is interested in and hides the rest of the database from that user group. Represented
R
using the representational datamodel.

O
O
The three schema architecture is used to visualize the schema levels in a database. The three schemas are only descriptions of data,

UC
the data only actually exists is at the physical level.
2. Object orientedModel
UC

An object data model is a data model based on object-oriented programming, associating methods (procedures) with objects that
can benefit from class hierarchies. Thus, "objects" are levels of abstraction that include attributes and behavior
Record based logical Models – Like Object based model, they also describe data at the conceptual and view levels. These models
specify logical structure of database with records, fields and attributes.

ST
1. Relational Model
In relational model, the data and relationships are represented by collection of inter-related tables. Each table is a group of column
ST

and rows, where column represents attribute of an entity and rows represents records.
Sample relationship Model: Student table with 3 columns and three records.
Stu_Id Stu_Name Stu_Age
111 Ashish 23
123 Saurav 22
169 Lester 24
2. HierarchicalModel
In hierarchical model, data is organized into a tree like structure with each record is having one parent record and many children.
The main drawback of this model is that, it can have only one to many relationships between nodes. Sample Hierarchical
ModelDiagram:

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019

COMPONENTS OF DBMS
Database Users
Users are differentiated by the way they expect to interact with the system EVOLUTION OF RDBMS
• Applicationprogrammers Before the acceptance of Codd's Relational Model, database management systems was just an ad hoc collection of data designed to
• Sophisticatedusers solve a particular type of problem, later extended to solve more basic purposes. This led to complex systems, which were difficult to

P
• Naiveusers understand, install, maintain and use. These database systems were plagued with the following problems:
• DatabaseAdministrator • They required large budgets and staffs of people with special skills that were in shortsupply.

P
• Specialized usersetc,. • Database administrators' staff and application developers required prior preparation to access these database systems.

AP
• End-user access to the data was rarelyprovided.
• These database systems did not support the implementation of business logic as a DBMSresponsibility.

AP
Application programmers:
Professionals who write application programs and using these application programs they interact with the
databasesystem Hence, the objective of developing a relational model was to address each and every one of the shortcomings that plagued those
Sophisticated users : systems that existed at the end of the 1960s decade, and make DBMS products more widely appealing to all kinds of users.
These user interact with the database system without writing programs, But they submit queries to retrieve the

R
information The existing relational database management systems offer powerful, yet simple solutions for a wide variety of commercial and
Specialized users: scientific application problems. Almost every industry uses relational systems to store, update and retrieve data for operational,
R
Who write specialized database applications to interact with the database system. transaction, as well as decision support systems.
Naive users: RELATIONAL DATABASE

O
Interacts with the database system by invoking some application programs that have been written A relational database is a database system in which the database is organized and accessed according to the relationships between
previously by application programmers data items without the need for any consideration of physical orientation and relationship. Relationships between data items are
O
Eg : people accessing database over the web expressed by means of tables.

UC
Database Administrator: It is a tool, which can help you store, manage and disseminate information of various kinds. It is a collection of objects, tables, queries,
Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise's forms, reports, and macros, all stored in a computer program all of which are inter-related.
UC

information resources and needs. It is a method of structuring data in the form of records, so that relations between different entities and attributes can be used for
 Schemadefinition data access and transformation.
 Access methoddefinition RELATIONAL DATABASE MANAGEMENT SYSTEM
 Schema and physical organization modification A Relational Database Management System (RDBMS) is a system, which allows us to perceive data as tables (and nothing but tables),

ST
 Granting user authority to access thedatabase and operators necessary to manipulate that data are at the user's disposal.
 Monitoringperformance Features of an RDBMS
ST

Storage Manager The features of a relational database are as follows:


The Storage Manager include these following components/modules  The ability to create multiple relations (tables) and enter data intothem
 AuthorizationManager  An interactive querylanguage
 TransactionManager  Retrieval of information stored in more than onetable
 FileManager  Provides a Catalog or Dictionary, which itself consists of tables ( called system tables)
 BufferManager
 Storage manager is a program module that provides the interface between the low-level data stored in the database and the Basic Relational Database Terminology Catalog:
application programs and queries submitted to thesystem. A catalog consists of all the information of the various schemas (external, conceptual and internal) and also all of the corresponding
 The storage manager is responsible to the followingtasks: mappings (external/conceptual, conceptual/internal).
 interaction with the filemanager It contains detailed information regarding the various objects that are of interest to the system itself; e.g., tables, views, indexes,
 efficient storing, retrieving and updating ofdata users, integrity rules, security rules, etc.
Authorization Manager In a relational database, the entities of the ERD are represented as tables and their attributes as the columns of their respective tables in a
 Checks whether the user is an authorized person ornot database schema.
 Test the satisfaction of integrityconstraints It includes some important terms, such as:
Transaction Manager • Table:Tablesarethebasicstoragestructuresofadatabasewheredataaboutsomethingintherealworldis
Responsible for concurrent transaction execution It ensures that the database remains in a consistent state despite of the system
failure

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019
stored. It is also called a relation or an entity.  in a relation with a key attribute, no two tuples can have identical values for keyattributes.
• Row: Rows represent collection of data required for a particular entity. In order to identify each row as unique there should  a key attribute can not have NULLvalues.
be a unique identifier called the primary key, which allows no duplicate rows. For example in a library every member is Key constraints are also referred to as Entity Constraints.
unique and hence is given a membership number, which uniquely identifies each member. A row is also called a record or Domain Constraints
atuple. Attributes have specific values in real-world scenario. For example, age can only be a positive integer. The same constraints have
• Column: Columns represent characteristics or attributes of an entity. Each attribute maps onto a column of a table. Hence, a been tried to employ on the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age
column is also known as anattribute. cannot be less than zero and telephone numbers cannot contain a digit outside 0- Referential integrityConstraints
• Relationship:Relationshipsrepresentalogicallinkbetweentwotables.Arelationshipisdepictedbya Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key attribute of a relation that can be
foreign key column. referred in other relation.
• Degree: number ofattributes Referential integrity constraint states that if a relation refers to a key attribute of a different or same relation, then that key element
• Cardinality: number oftuples must exist.
• An attribute of an entity has a particular value. The set of possible values That a given attribute can have is called Relational database systems are expected to be equipped with a query language that can assist its users to query the database instances.
itsdomain. There are two kinds of query languages − relational algebra and relational calculus.
KEYS AND THEIR USE Relational Algebra
Key: An attribute or set of attributes whose values uniquely identify each entity in an entity set is called a key for that entityset. Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of relations as
Super Key: If we add additional attributes to a key, the resulting combination would still uniquely identify an instance of the entity set. output. It uses operators to perform queries. An operator can be either unary or binary. They accept relations as their input and
Such augmented keys are called super keys. yield relations as their output. Relational algebra is performed recursively on a relation and intermediate results are also

P
P
Primary Key: It is a minimum super key. consideredrelations.
It is a unique identifier for the table (a column or a column combination with the property that at any given time no two rows of the The fundamental operations of relational algebra are as follows −
table contain the same value in that column or column combination).  Select

AP
AP
Foreign Key: A foreign keyis a field (or collection of fields) in one table that uniquely identifies a row of another table. In simpler  Project
words, the foreign key is defined in a second table, but it refers to the primary key in the first table. Candidate Key: There may be  Union
two or more attributes or combinations of attributes that uniquely identify an instance of an entity set. These attributes or combinations  Set different
of attributes are called candidatekeys.  Cartesianproduct
Secondary Key: A secondary key is an attribute or combination of attributes that may not be a candidate key, but that classifies the  Rename
entity set on a particular characteristic. Any key consisting of a single attribute is called a simple key, while that consisting of a We will discuss all these operations in the following sections.
combination of attributes is called a composite key. Select Operation (σ)

R
R
Referential Integrity It selects tuples that satisfy the given predicate from a relation.
Referential Integrity can be defined as an integrity constraint that specifies that the value (or existence) of an attribute in one relation Notation − σp(r)
depend on the value (or existence) of an attribute in the same or another relation. Referential integrity in a relational database is Where σ stands for selection predicate and r stands for relation. p is prepositional logic formula which may use connectors like and, or,

O
O
consistency between coupled tables. It is usually enforced by the combination of a primary key and a foreign key. For referential and not. These terms may use relational operators like − =, ≠, ≥, < , >, ≤.
integrity to hold, any field in a table that is declared a foreign key can contain only values from a parent table's primary key field. For example −
For instance, deleting a record that contains a value referred to by a foreign key in another table would break referentialintegrity.
σsubject = "database"(Books)

UC
UC

Relational Model
Relational data model is the primary data model, which is used widely around the world for data storage and processing. This model Output − Selects tuples from books where subject is 'database'. σsubject =
is simple and it has all the properties and capabilities required to process data with storage efficiency.
"database" and price = "450"(Books)
Concepts
Tables − In relational data model, relations are saved in the format of Tables. This format stores the relation among entities. A table has Output − Selects tuples from books where subject is 'database' and 'price' is 450.
rows and columns, where rows represents records and columns represent the attributes. σsubject = "database" and price = "450" or year > "2010"(Books)
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.

ST
ST

Output − Selects tuples from books where subject is 'database' and 'price' is 450 or those books published after 2010.
Relation instance − A finite set of tuples in the relational database system represents relation instance. Relation instances do not have
Project Operation (∏)
duplicate tuples.
It projects column(s) that satisfy a given predicate.
Relation schema − A relation schema describes the relation name (table name), attributes, and their names. Relation key − Each
Notation − ∏A1,A2,An(r)
row has one or more attributes, known as relation key, which can identify the row in the relation (table) uniquely.
Where A1, A2, Anare attribute names of relation r.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.
Duplicate rows are automatically eliminated, as relation is a set.
Constraints
For example −
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational
Integrity Constraints. There are three main integrity constraints− ∏subject, author (Books)
 Keyconstraints Selects and projects columns named as subject and author from the relation Books.
 Domainconstraints Union Operation (∪)
 Referential integrity constraints It performs binary union between two given relations and is defined as − r ∪ s =
{ t | t ∈ r or t ∈ s}
KeyConstraints
There must be at least one minimal subset of attributes in the relation, which can identify a tuple uniquely. This minimal subset of
attributes is called keyfor that relation. If there are more than one such minimal subsets, these are called candidate keys. Notation − r U s
Where r and s are either database relations or relation result set (temporary relation). For a
Key constraints force that −
union operation to be valid, the following conditions must hold −
 r, and s must have the same number ofattributes.
 Attribute domains must becompatible.

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019
 Duplicate tuples are automaticallyeliminated. SQL Queries
∏ author(Books) ∪ ∏ author(Articles) With SQL, we can query a database and have a result set returned. A query
Output − Projects the names of the authors who have either written a book or an article or both. like this:
Set Difference (−) SELECT LastName FROM Persons
The result of set difference operation is tuples, which are present in one relation but are not in the second relation. Gives a result set like this:
Notation − r − s LastName
Finds all the tuples that are present in r but not in s. Hansen
∏ author(Books) − ∏ author(Articles) Svendson
Output − Provides the name of authors who have written books but not articles. Pettersen
Cartesian Product (Χ) Note: Some database systems require a semicolon at the end of the SQL statement. We don't use the semicolon in ourtutorials.
Combines information of two different relations into one.
Notation − r Χ s SQL Data Manipulation Language (DML)
Where r and s are relations and their output will be defined as − r Χ s = SQL (Structured Query Language) is a syntax for executing queries. But the SQL language also includes a syntax to update, insert, and
{ q t | q ∈ r and t ∈ s} delete records.
σauthor = 'tutorialspoint'(Books Χ Articles) These query and update commands together form the Data Manipulation Language (DML) part of SQL:
 SELECT - extracts data from a databasetable

P
Output − Yields a relation, which shows all the books and articles written by tutorialspoint.  UPDATE - updates data in a databasetable

P
Rename Operation (ρ)  DELETE - deletes data from a databasetable
The results of relational algebra are also relations but without any name. The rename operation allows us to rename the output  INSERT INTO - inserts new data into a databasetable

AP
relation. 'rename' operation is denoted with small Greek letter rho ρ.

AP
Notation − ρ x(E) SQL Data Definition Language (DDL)
Where the result of expression E is saved with name of x. The Data Definition Language (DDL) part of SQL permits database tables to be created or deleted. We can also define indexes
Additional operations are − (keys), specify links between tables, and impose constraints between database tables.
 Set intersection The most important DDL statements in SQL are:
 Assignment  CREATE TABLE - creates a new databasetable
 Naturaljoin  ALTER TABLE - alters (changes) a databasetable

R
SQL FUNDAMENTALS:  DROP TABLE - deletes a databasetable
R
SQL is a standard computer language for accessing and manipulating databases.  CREATE INDEX - creates an index (searchkey)
 DROP INDEX - deletes anindex

O
What is SQL? The SQL SELECT Statement
O
 SQL stands for Structured QueryLanguage The SELECT statement is used to select data from a table. The tabular result is stored in a result table (called the result-set).
 SQL allows you to access adatabase Syntax

UC
 SQL is an ANSI standard computerlanguage
UC

SELECT column_name(s)
 SQL can execute queries against adatabase
FROM table_name
 SQL can retrieve data from adatabase
 SQL can insert new records in adatabase Note: SQL statements are not case sensitive. SELECT is the same as select.
 SQL can delete records from adatabase
 SQL can update records in adatabase SQL SELECT Example

ST
 SQL is easy tolearn To select the content of columns named "LastName" and "FirstName", from the database table called "Persons", use a SELECT
ST

statement like this:


SQL is a Standard - BUT.... SELECT LastName,FirstName FROM Persons
SQL is an ANSI (American National Standards Institute) standard computer language for accessing and manipulating database systems. The database table "Persons":
SQL statements are used to retrieve and update data in a database. SQL works with database programs like MS Access, DB2, Informix, LastName FirstName Address City
MS SQL Server, Oracle, Sybase, etc.Unfortunately, there are many different versions of the SQL language, but to be in compliance Hansen Ola Timoteivn 10 Sandnes
with the ANSI standard, they must support the same major keywords in a similar manner (such as SELECT, UPDATE, DELETE, Svendson Tove Borgvn 23 Sandnes
INSERT, WHERE, and others). Note: Most of the SQL database programs also have their own proprietary extensions in addition to the
Pettersen Kari Storgt 20 Stavanger
SQL standard!
SQL Database Tables The result
A database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or "Orders"). Tables LastName FirstName
contain records (rows) with data.Below is an example of a table called"Persons": Hansen Ola
LastName FirstName Address City Svendson Tove
Hansen Ola Timoteivn 10 Sandnes Pettersen Kari
Svendson Tove Borgvn 23 Sandnes
Pettersen Kari Storgt 20 Stavanger Select All Columns
To select all columns from the "Persons" table, use a * symbol instead of column names, like this:
The table above contains three records (one for each person) and four columns (LastName, FirstName, Address, and City).

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019
SELECT * FROM Persons The WHERE clause is used to specify a selection criterion.
Result
The WHERE Clause
LastName FirstName Address City
To conditionally select data from a table, a WHERE clause can be added to the SELECT statement.
Hansen Ola Timoteivn 10 Sandnes Syntax
Svendson Tove Borgvn 23 Sandnes
SELECT column FROM table
Pettersen Kari Storgt 20 Stavanger WHERE column operator value
With the WHERE clause, the following operators can be used:
The Result Set Operator Description
The result from a SQL query is stored in a result-set. Most database software systems allow navigation of the result set with = Equal
programming functions, like: Move-To-First-Record, Get-Record-Content, Move-To-Next-Record, etc. <> Not equal
Programming functions like these are not a part of this tutorial. To learn about accessing data with function calls,
> Greater than
< Less than
Semicolon after SQL Statements?
Semicolon is the standard way to separate each SQL statement in database systems that allow more than one SQL statement to be >= Greater than or equal
executed in the same call to the server. <= Less than or equal
Some SQL tutorials end each SQL statement with a semicolon. Is this necessary? We are using MS Access and SQL Server 2000 and BETWEEN Between an inclusive range

P
we do not have to put a semicolon after each SQL statement, but some database programs force you to use it. LIKE Search for a pattern

P
Note: In some versions of SQL the <> operator may be written as !=

AP
The SELECT DISTINCT Statement
The DISTINCT keyword is used to return only distinct (different) values. Using the WHERE Clause

AP
The SELECT statement returns information from table columns. But what if we only want to select distinct elements? To select only the persons living in the city "Sandnes", we add a WHERE clause to the SELECT statement:
With SQL, all we need to do is to add a DISTINCT keyword to the SELECT statement: SELECT * FROM Persons
Syntax WHERE City='Sandnes'
SELECT DISTINCT column_name(s) FROM "Persons" table
table_name LastName FirstName Address City Year

R
Hansen Ola Timoteivn 10 Sandnes 1951
R
Using the DISTINCT keyword Svendson Tove Borgvn 23 Sandnes 1978
To select ALL values from the column named "Company" we use a SELECT statement like this:
Svendson Stale Kaivn 18 Sandnes 1980

O
SELECT Company FROM Orders
Pettersen Kari Storgt 20 Stavanger 1960
O
Result
"Orders" table

UC
LastName FirstName Address City Year
Hansen Ola Timoteivn 10 Sandnes 1951
UC

Svendson Tove Borgvn 23 Sandnes 1978


Company OrderNumber
Svendson Stale Kaivn 18 Sandnes 1980
Sega 3412
W3Schools 2312 Using Quotes

ST
Trio 4678 Note that we have used single quotes around the conditional values in the examples.
Result
W3Schools 6798
ST

SQL uses single quotes around text values (most database systems will also accept double quotes). Numeric values should not be enclosed
in quotes.
For text values:
Company
This is correct:
Sega
SELECT * FROM Persons WHERE FirstName='Tove' This is
W3Schools wrong:
Trio SELECT * FROM Persons WHERE FirstName=Tove
Note that "W3Schools" is listed twice in the result-set.
W3Schools For numeric values:
To select only DIFFERENT values from the column named "Company" we use a SELECT DISTINCT statement like this:
SELECT DISTINCT Company FROM Orders This is correct:
SELECT * FROM Persons WHERE Year>1965 This is
Result:
wrong:
SELECT * FROM Persons WHERE Year>'1965'

The LIKE Condition


Company The LIKE condition is used to specify a search for a pattern in a column.
Sega Syntax
Now "W3Schools" is listed only once in the result-set.
W3Schools
Trio

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019
SELECT column FROM table UPDATE table_name
WHERE column LIKE pattern SET column_name = new_value WHERE
A "%" sign can be used to define wildcards (missing letters in the pattern) both before and after the pattern. column_name = some_value
Person:
Using LIKE LastName FirstName Address City
The following SQL statement will return persons with first names that start with an 'O': SELECT * Nilsen Fred Kirkegt 56 Stavanger
FROM Persons Rasmussen Storgt 67
WHERE FirstName LIKE 'O%'
The following SQL statement will return persons with first names that end with an 'a': SELECT *
Update one Column in a Row
FROM Persons We want to add a first name to the person with a last name of "Rasmussen":
WHERE FirstName LIKE '%a'
UPDATE Person SET FirstName = 'Nina' WHERE
The following SQL statement will return persons with first names that contain the pattern 'la': SELECT * LastName = 'Rasmussen'
FROM Persons Result:
WHERE FirstName LIKE '%la%'
LastName FirstName Address City
The INSERT INTO Statement
The INSERT INTO statement is used to insert new rows into a table. Nilsen Fred Kirkegt 56 Stavanger
Syntax Rasmussen Nina Storgt 67

P
INSERT INTO table_name
VALUES(value1, value2, ) Update several Columns in a Row
We want to change the address and add the name of the city:

AP

AP
You can also specify the columns for which you want to insert data: INSERT
INTO table_name (column1, column2,.) UPDATE Person
VALUES(value1, value2, ) SET Address = 'Stien 12', City = 'Stavanger' WHERE
LastName = 'Rasmussen'
Insert a New Row Result:
This "Persons" table: LastName FirstName Address City
LastName FirstName Address City Nilsen Fred Kirkegt 56 Stavanger
R

R
Pettersen Kari Storgt 20 Stavanger Rasmussen Nina Stien 12 Stavanger
And this SQL statement: The DELETE Statement
The DELETE statement is used to delete rows in a table.
O

O
INSERT INTO Persons
VALUES ('Hetland', 'Camilla', 'Hagabakka 24', 'Sandnes') Will give this Syntax
result: DELETE FROM table_name WHERE
column_name =some_value
UC

UC
LastName FirstName Address City
Pettersen Kari Storgt 20 Stavanger Person:
Hetland Camilla Hagabakka 24 Sandnes
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Insert Data in Specified Columns
This "Persons" table: Rasmussen Nina Stien 12 Stavanger
ST

ST
Delete
LastName FirstName Address City
Drop
Pettersen Kari Storgt 20 Stavanger
Hetland Camilla Hagabakka 24 Sandnes Delete a Row
And This SQL statement: "Nina Rasmussen" is going to be deleted:
INSERT INTO Persons (LastName, Address) DELETE FROM Person WHERE LastName = 'Rasmussen'
VALUES ('Rasmussen', 'Storgt 67')
Result
Will give this result:
LastName FirstName Address City
LastName FirstName Address City Nilsen Fred Kirkegt 56 Stavanger
Pettersen Kari Storgt 20 Stavanger
Hetland Camilla Hagabakka 24 Sandnes Delete All Rows
Rasmussen Storgt 67 It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and indexes will be
Null (no value …not space not empty) intact:
The UPDATE statement is used to modify the data in a table. DELETE FROM table_name or
Syntax DELETE * FROM table_name
The ORDER BY keyword is used to sort the result.

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019
Sort the Rows VARP(column)
Aggregate functions in SQL Server
The ORDER BY clause is used to sort the rows.
Function Description
Orders:
AVG(column) Returns the average value of a column
Company OrderNumber
BINARY_CHECKSUM
Sega 3412
CHECKSUM
ABC Shop 5678
CHECKSUM_AGG
W3Schools 2312
COUNT(column) Returns the number of rows (without a NULL value) of a column
W3Schools 6798
COUNT(*) Returns the number of selected rows
Example COUNT(DISTINCT column) Returns the number of distinct results
To display the company names in alphabetical order: FIRST(column) Returns the value of the first record in a specified field (not supported in
SELECT Company, OrderNumber FROM Orders ORDER SQLServer2K)
BY Company ASC (asending) LAST(column) Returns the value of the last record in a specified field (not supported in
Result: SQLServer2K)
Company OrderNumber MAX(column) Returns the highest value of a column
ABC Shop 5678 MIN(column) Returns the lowest value of a column

P
Sega 3412 STDEV(column)
W3Schools 6798 STDEVP(column)
W3Schools 2312 SUM(column) Returns the total sum of a column

AP

AP
VAR(column)
Example VARP(column)
To display the company names in alphabetical order AND the OrderNumber in numerical order: SELECT
Company, OrderNumber FROM Orders
ORDER BY Company, OrderNumber
Result: Scalar functions
Company OrderNumber Scalar functions operate against a single value, and return a single value based on the input value.
R

R
ABC Shop 5678 Useful Scalar Functions in MS Access
Sega 3412 Function Description
O

O
W3Schools 2312 UCASE(c) Converts a field to upper case
W3Schools 6798 LCASE(c) Converts a field to lower case
Aggregate functions MID(c,start[,end]) Extract characters from a text field
UC

UC
Aggregate functions operate against a collection of values, but return a single value. LEN(c) Returns the length of a text field
Note: If used among many other expressions in the item list of a SELECT statement, the SELECT must have a GROUP BY clause!! INSTR(c,char) Returns the numeric position of a named character within a text field
"Persons" table (used in most examples)
LEFT(c,number_of_char) Return the left part of a text field requested
Name Age RIGHT(c,number_of_char) Return the right part of a text field requested
ROUND(c,decimals) Rounds a numeric field to the number of decimals specified
ST

ST
Hansen, Ola 34
Svendson, Tove 45 MOD(x,y) Returns the remainder of a division operation
Pettersen, Kari 19
Aggregate functions in MS Access
Aggregate functions (like SUM) often need an added GROUP BY functionality.
Function Description GROUP BY
GROUP BY... was added to SQL because aggregate functions (like SUM) return the aggregate of all column values every time they are
AVG(column) Returns the average value of a column
called, and without the GROUP BY function it was impossible to find the sum for each individual group of column values.
COUNT(column) Returns the number of rows (without a NULL value) of a column The syntax for the GROUP BY function is:
COUNT(*) Returns the number of selected rows SELECT column,SUM(column) FROM table GROUP BY column
FIRST(column) Returns the value of the first record in a specified field
LAST(column) Returns the value of the last record in a specified field GROUP BY Example
MAX(column) Returns the highest value of a column This "Sales" Table:
MIN(column) Returns the lowest value of a column
STDEV(column) Company Amount
STDEVP(column) W3Schools 5500
SUM(column) Returns the total sum of a column IBM 4500
VAR(column) W3Schools 7100
And This SQL:

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019
SELECT Company, SUM(Amount) FROM Sales indicate that it is a SQL statement. EXEC
Returns this result: SQL CONNECT db_name;
EXEC SQL CONNECT HR_USER; //connects to DB HR_USER
Company SUM(Amount)
Once connection is established with DB, we can perform DB transactions. Since these DB transactions are dependent on the values
W3Schools 17100
and variables of the host language. Depending on their values, query will be written and executed. Similarly, results of DB query will be
IBM 17100 returned to the host language which will be captured by the variables of host language. Hence we need to declare the variables to pass
W3Schools 17100 the value to the query and get the values from query. There are two types of variables used in the hostlanguage.
The above code is invalid because the column returned is not part of an aggregate. A GROUP BY clause will solve this problem:  Host variable : These are the variables of host language used to pass the value to the query as well as to capture the values
SELECT Company,SUM(Amount) FROM Sales GROUP returned by the query. Since SQL is dependent on host language we have to use variables of host language and such
BY Company variables are known as host variable. But these host variables should be declared within the SQL area or within SQL code.
That means compiler should be able to differentiate it from normal C variables. Hence we have to declare host variables
within BEGIN DECLARE and END DECLARE section. Again, these declare block should be enclosed within EXEC SQL
Returns this result:
and';'.
EXEC SQL BEGIN DECLARE SECTION;
Company SUM(Amount)
int STD_ID;

P
W3Schools 12600
char STD_NAME [15];

P
IBM 4500
char ADDRESS[20];

AP
HAVING… EXEC SQL END DECLARE SECTION;

AP
HAVING... was added to SQL because the WHERE keyword could not be used against aggregate functions (like SUM), and without We can note here that variables are written inside begin and end block of the SQL, but they are declared using C code. It does not use
HAVING... it would be impossible to test for result conditions. SQL code to declare the variables. Why? This is because they are host variables – variables of C language. Hence we cannot use SQL
The syntax for the HAVING function is: syntax to declare them. Host language supports almost all the datatypes from int, char, long, float, double, pointer, array, string,
SELECT column,SUM(column) FROM table GROUP structuresetc.
BY column When host variables are used in a SQL query, it should be preceded by colon – ':' to indicate that it is a host variable. Hence when pre

R
HAVING SUM(column) condition value This -compiler compiles SQL code, it substitutes the value of host variable andcompiles.
R
"Sales" Table: EXEC SQL SELECT * FROM STUDENT WHERE STUDENT_ID =:STD_ID;
The following code is a simple embedded SQL program, written in C. The program illustrates many, but not all, of the

O
embedded SQL techniques. The program prompts the user for an order number, retrieves the customer number,
O
Company Amount
W3Schools 5500 salesperson, and status of the order, and displays the retrieved information on the screen.

UC
IBM 4500 int main() {
UC

W3Schools 7100 EXEC SQL INCLUDE SQLCA;


This SQL: EXEC SQL BEGIN DECLARE SECTION;
SELECT Company,SUM(Amount) FROM Sales GROUP intOrderID; /* Employee ID(fromuser) */
BY Company intCustID; /* RetrievedcustomerID */
HAVING SUM(Amount)>10000

ST
char SalesPerson[10] /* Retrievedsalespersonname */
ST

Returns this result charStatus[6] /* Retrievedorderstatus */


EXEC SQL END DECLARE SECTION;
Company SUM(Amount)
W3Schools 12600 /* Set up error processing */
EXEC SQL WHENEVER SQLERROR GOTO query_error; EXEC SQL
EMBEDDED SQL WHENEVER NOT FOUND GOTO bad_number;
Embedded SQL is a method of inserting inline SQL statements or queries into the code of a programming language, which is known as a
host language. Because the host language cannot parse SQL, the inserted SQL is parsed by an embedded SQL preprocessor. /* Prompt the user for order number */
Embedded SQL is a robust and convenient method of combining the computing power of a programming language with SQL's specialized printf ("Enter order number: ");
data management and manipulation capabilities. scanf_s("%d", &OrderID);
Structure of embedded SQL
Structure of embedded SQL defines step by step process of establishing a connection with DB and executing the code in the DB within /* Execute the SQL query */
the high level language. EXEC SQL SELECT CustID, SalesPerson, Status FROM
Connectionto DB Orders
Thisisthefirststepwhilewritingaqueryinhighlevellanguages.FirstconnectiontotheDBthatweareaccessing WHERE OrderID = :OrderID
needs to be established. This can be done using the keyword CONNECT. But it has to precede with 'EXEC SQL'to INTO :CustID, :SalesPerson, :Status;

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019 CS8492 /Database Management Systems Department of CSE & IT 2018 - 2019
EXEC SQL EXECUTE IMMEDIATE 'GRANT SELECT ON STUDENT TO Faculty';
/* Display the results */ EXEC SQL EXECUTE IMMEDIATE 'DELETE FROM STUDENT WHERE STD_ID = 100';
printf ("Customer number: %d\n", CustID); printf EXEC SQL EXECUTE IMMEDIATE 'UPDATE STUDENT SET ADDRESS = 'Troy' WHERE STD_ID =100';
("Salesperson: %s\n", SalesPerson); printf ("Status:
%s\n", Status);
exit();

query_error:
printf ("SQL error: %ld\n", sqlca->sqlcode); exit();

bad_number:
printf ("Invalid order number.\n");
exit();
}
DYNAMIC SQL

P
The main disadvantage of embedded SQL is that it supports only static SQLs. If we need to build up queries at run time, then we

P
can use dynamic sql. That means if query changes according to user input, then it always better to use dynamic SQL. Like we said

AP
above, the query when user enters student name alone and when user enters both student name and address, is different. If we use

AP
embedded SQL, one cannot implement this requirement in the code. In such case dynamic SQL helps the user to develop query
depending on the values entered by him, without making him know which query is being executed. It can also be used when we do not
know which SQL statements like Insert, Delete update or select needs to be used, when number of host variables is unknown, or when
datatypes of host variables are unknown or when there is direct reference to DB objects like tables, views, indexes are required.
However this will make user requirement simple and easy but it may make query lengthier and complex. That means depending upon
R
user inputs, the query may grow or shrink making the code flexible enough to handle all the possibilities. In embedded SQL, compiler

R
knows the query in advance and pre-compiler compiles the SQL code much before C compiles the code for execution. Hence
embedded SQLs will be faster in execution. But in the case of dynamic SQL, queries are created, compiled and executed only at the run
O
time. This makes the dynamic SQL little complex, and timeconsuming.

O
Since query needs to be prepared at run time, in addition to the structures discussed in embedded SQL, we have three more clauses in
dynamic SQL. These are mainly used to build the query and execute them at runtime.
UC

PREPARE

UC
Since dynamic SQL builds a query at run time, as a first step we need to capture all the inputs from the user. It will be stored in a
string variable. Depending on the inputs received from the user, string variable is appended with inputs and SQL keywords. These
SQL like string statements are then converted into SQL query. This is done by using PREPAREstatement.
For example, below is the small snippet from dynamic SQL. Here sql_stmt is a character variable, which holds inputs from the users
ST

along with SQL commands. But is cannot be considered as SQL query as it is still a sting value. It needs to be converted into a proper

ST
SQL query which is done at the last line using PREPARE statement. Here sql_query is also a string variable, but it holds the string as a
SQLquery.
EXECUTE
This statement is used to compile and execute the SQL statements prepared in DB. EXEC SQL
EXECUTE sql_query;
EXECUTEIMMEDIATE
This statement is used to prepare SQL statement as well as execute the SQL statements in DB. It performs the task of PREPARE and
EXECUTE in a singleline.
EXEC SQL EXECUTE IMMEDIATE :sql_stmt;
Dynamic SQL will not have any SELECT queries and host variables. But it can be any other SQL statements like insert, delete, update,
grant etc. But when we use insert/ delete/ updates in this type, we cannot use host variables. All the input values will be hardcoded.
Hence the SQL statements can be directly executed using EXECUTE IMMEDIATE rather than using PREPARE and thenEXECUTE.

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /DatabaseManagementSystems Department of CSE&IT 2019 - 2020 CS8492 /Database Management Systems
Department o

UNITII DATABASE DESIGN Working with ER Diagrams


Entity-Relationship model – E-R Diagrams – Enhanced-ER Model – ER-to-Relational Mapping – Functional Dependencies ER Diagram is a visual representation of data that describes how data is related to each other. In ER Model, we disintegrate data into
– Non-loss Decomposition – First, Second, Third Normal Forms, Dependency Preservation – Boyce/Codd Normal Form entities, attributes and setup relationships between entities, all this can be represented visually using the ER diagram.
– Multi-valued Dependencies and Fourth Normal Form – Join Dependencies and Fifth Normal Form
Components of ER Diagram
DATABASE DESIGN Entitiy, Attributes, Relationships etc form the components of ER Diagram and there are defined symbols and shapes to
A well-designed database shall: represent each one ofthem.
Let's see how we can represent these in our ER Diagram.
 Eliminate Data Redundancy: the same piece of data shall not be stored in more than one place. This is because duplicate data not only
Entity
waste storage spaces but also easily lead toinconsistencies.
Simple rectangular box represents an Entity.
 Ensure Data Integrity andAccuracy

P
Entity-Relationship Data Model
 Classical, popular conceptual datamodel

P
 Firstintroduced(mid70's)asa(relativelyminor)improvementtotherelationalmodel:pictorialdiagramsare easier to read than

AP
Relationships between Entities - Weak and Strong
relational databaseschemas
Rhombus is used to setup relationships between two or more entities.

AP
 Then evolved as a popular model for the first conceptual representation of data structures in the processof
databasedesign
ER Model: Entity and Entity Set
Considering the above example, Student is an entity, Teacher is an entity, similarly, Class, Subjectetc are also entities.
An Entity is generally a real-world object which has characteristics and holds relationships in a DBMS. If a Student is
R
an Entity, then the complete dataset of all the students will be the Entity Set

R
ER Model: Attributes
O
Attributes for any Entity
If a Student is an Entity, then student's roll no., student's name, student's age, student's gender etc will be its attributes.
Ellipse is used to represent attributes of any entity. It is connected to the entity.

O
An attribute can be of many types, here are different types of attributes defined in ER database model:
UC

1. Simple attribute: The attributes with values that are atomic and cannot be broken down further aresimple attributes. For
example, student's age.

UC
2. Composite attribute: A composite attribute is made up of more than one simple attribute. Forexample, student's address
will contain, house no., street name, pincodeetc.
3. Derivedattribute:Thesearetheattributeswhicharenotpresentinthewholedatabasemanagementsystem, but are derived using other
attributes. For example, average age of students in aclass.
ST

4. Single-valued attribute: As the name suggests, they have a singlevalue.


Weak Entity

ST
5. Multi-valued attribute: And, they can have multiplevalues.
A weak Entity is represented using double rectangular boxes. It is generally connected to another entity.
ER Model: Relationships

When an Entity is related to another Entity, they are said to have a relationship. For example, A ClassEntity is related to Student
entity, because students study in classes, hence this is arelationship.
Key Attribute for any Entity
Depending upon the number of entities involved, a degree is assigned torelationships. To represent a Key attribute, the attribute name inside the Ellipse is underlined.
For example, if 2 entities are involved, it is said to be Binary relationship, if 3 entities are involved, it is said to be Ternary
relationship, and so on.

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 1 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 2

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department o Department o

Derived Attribute for any Entity ER Diagram: Binary Relationship


Derived attributes are those which are derived based on other attributes, for example, age can be derived from date of birth. Binary Relationship means relation between two Entities. This is further divided into three types.
To represent a derived attribute, another dotted ellipse is created inside the main ellipse. One to One Relationship
This type of relationship is rarely seen in real world.

Multivalued Attribute for any Entity


Double Ellipse, one inside another, represents the attribute which can have multiple values.

The above example describes that one student can enroll only for one course and a course will also have only one Student. This is not

P
what you will usually see in real-world relationships.

P
One to Many Relationship

AP
Composite Attribute for any Entity The below example showcases this relationship, which means that 1 student can opt for many courses, but a course can only have 1

AP
A composite attribute is the attribute, which also has attributes. student. Sounds weird! This is how it is.
R

R
ER Diagram: Entity
An Entity can be any object, place, person or class. In ER Diagram, an entity is represented using rectangles. Consider an example
O
of an Organisation- Employee, Manager, Department, Product and many more can be taken as entities in anOrganisation. Many to One Relationship

O
It reflects business rule that many entities can be associated with just one entity. For example, Student enrolls for only one Course
but a Course can have many Students.
UC

UC
ST

ST
The yellow rhombus in between represents a relationship.
ER Diagram: Key Attribute
Key attribute represents the main characteristic of an Entity. It is used to represent a Primary key. Ellipse with the text underlined, Many to Many Relationship
represents Key Attribute.

The above diagram represents that one student can enroll for more than one courses. And a course can have more than 1 student
enrolled in it.

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 3 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 4

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department o Department o

ER Diagram: Recursive Relationship class.


When an Entity is related with itself it is known as Recursive Relationship.

ER Diagram: Ternary Relationship


Relationship of degree three is called Ternary relationship.
For example, Saving and Current account types entities can be generalised and an entity with name Account can be created, which
A Ternary relationship involves three entities. In such relationships we always consider two entites together and then look upon the

P
P
covers both.
third.
Specialization

AP
AP
Specialization is opposite to Generalization. It is a top-down approach in which one higher level entity can be broken down into two
lower level entity. In specialization, a higher level entity may not have any lower-level entity sets, it's possible.

R
R
For example, in the diagram above, we have three related entities, Company, Product and Sector. To understand the relationship better or

O
O
to define rules around the model, we should relate two entities and then derive the third one.
A Company produces many Products/ each product is produced by exactly one company. A Company
operates in only one Sector / each sector has many companies operating in it.

UC
UC

Considering the above two rules or relationships, we see that although the complete relationship involves three entities, but we are Aggregation
looking at two entities at a time. Aggregation is a process when relation between two entities is treated as a single entity.
The Enhanced ER Model
As the complexity of data increased in the late 1980s, it became more and more difficult to use the traditional ER Model for database
modelling. Hence some improvements or enhancements were made to the existing ER Model to make it able to handle the complex

ST
ST

applications better.
Hence, as part of the Enhanced ER Model, along with other improvements, three new concepts were added to the existing ER Model,
they were:
1. Generalization
2. Specialization
3. Aggregration
Generalization In the diagram above, the relationship between Center and Course together, is acting as an Entity, which is in relationship with another
entity Visitor. Now in real world, if a Visitor or a Student visits a Coaching Center, he/she will never enquire about the center only or
Generalization is a bottom-up approach in which two lower level entities combine to form a higher level entity. In generalization, the just about the course, rather he/she will ask enquire about both.
higher level entity can also combine with other lower level entities to make further higher level entity.
It's more like Superclass and Subclass system, but the only difference is the approach, which is bottom-up. Hence, entities are ER Model to Relational Model
combined to form a more generalised entity, in other words, sub-classes are combined to form a super- ER Model can be represented using ER Diagrams which is a great way of designing and representing the database design in more
of a flow chart form.

St. Joseph's College of Engineering/St. Joseph's InstituteofTechnology Page6


St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 5

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /DatabaseManagementSystems Department of CSE&IT 2019 - 2020 CS8492 /Database Management Systems
Department o

It is very convenient to design the database using the ER Model by creating an ER diagram and later on converting it into relational name.
model to design your tables. Also proper foreign key constraints must be set for all the tables.
Not all the ER Model constraints and components can be directly transformed into relational model, but an approximate schema can be
derived.
Functional Dependency
Few examples of ER diagrams and convert it into relational model schema, hence creating tables in RDBMS. The functional dependency is a relationship that exists between two attributes. It typically exists between the primary key and
non-key attribute within a table.
Entity becomes Table X→Y
Entity in ER Model is changed into tables, or we can say for every Entity in ER model, a table is created in Relational Model. The left side of FD is known as a determinant, the right side of the production is known as a dependent.
And the attributes of the Entity gets converted to columns of the table. For example:
And the primary key specified for the entity in the ER model, will become the primary key for the table in relationalmodel. Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
For example, for the below ER Diagram in ER Model, Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because if we know the Emp_Id, we can tell that

P
employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name

P
AP
Types of Functional dependency

AP
A table with nameStudentwill be created in relational model, which will have 4
R
columns, id, name, age, address and id will be the primary key for thistable.
Table:Student

R
O
id name age address

O
Relationship becomes a Relationship Table
UC

In ER diagram, we use diamond/rhombus to represent a relationship between two entities. In Relational model we create a
relationship table for ER Model relationships too.

UC
Trivial functional dependency
A → B has trivial functional dependency if B is a subset ofA.
In the ER diagram below, we have two entities Teacher and Student with a relationship between them.
o
o The following dependencies are also trivial like: A → A, B →B
ST

Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id,Employee_Name} →

ST
Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id,Employee_Name}.
3. Also, Employee_Id → Employee_Id andEmployee_Name → Employee_Name are trivial dependenciestoo.

Non-trivial functional dependency


A → B has a non-trivial functional dependency if B is not a subset of A.
As discussd above, entity gets mapped to table, hence we will create table for Teacher and a table for Student with all the attributes When A intersection B is NULL, then A → B is called as complete non-trivial.
converted into columns.
Example:
Now, an additional table will be created for the relationship, for example StudentTeacher or give it any name you like. This table will ID → Name,
hold the primary key for both Student and Teacher, in a tuple to describe the relationship, which teacher teaches whichstudent. Name → DOB
If there are additional attributes related to this relationship, then they become the columns for this table, like subject
St. Joseph's College of Engineering/St. Joseph's Institute of Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is a systematic

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 8

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department o Department o

approach of decomposing tables to eliminate data redundancy(repetition) and undesirable characteristics like Insertion, Normalization Rule
Update and Deletion anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the Normalization rules are divided into the following normal forms:
relation tables. 1. First NormalForm
Normalization is used for mainly two purposes, 2. Second NormalForm
3. Third NormalForm
 Eliminating reduntant(useless)data. 4. BCNF
 Ensuring data dependencies make sense i.e data is logicallystored. 5. Fourth NormalForm
6. Fifth NormalForm
Problems Without Normalization First Normal Form (1NF)
If a table is not properly normalized and have data redundancy then it will not only eat up extra memory space but will also make it For a table to be in the First Normal Form, it should follow the following 4 rules:
difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anomalies are very frequent if 1. It should only have single(atomic) valuedattributes/columns.
database is not normalized. To understand these anomalies let us take an example of a Student table. 2. Values stored in a column should be of the samedomain
3. All the columns in a table should have uniquenames.

P
4. And the order in which data is stored, does notmatter.
rollno name branch hod office_tel
Rules for First Normal Form

AP

AP
The first normal form expects you to follow a few simple rules while designing your database, and they are:
401 Akon CSE Mr. X 53337 Rule 1: Single Valued Attributes
Each column of your table should be single valued which means they should not contain multiple values. We will explain this
with help of an example later, let's see the other rules for now.
402 Bkon CSE Mr. X 53337
Rule 2: Attribute Domain should not change
R

R
This is more of a "Common Sense" rule. In each column the values stored must be of the same kind or type. For example: If
you have a column dob to save date of births of a set of people, then you cannot or you must not save 'names' of some of
403 Ckon CSE Mr. X 53337 them in that column along with 'date of birth' of others in that column. It should hold only 'date of birth' for all the
O

O
records/rows.

404 Dkon CSE Mr. X 53337 Rule 3: Unique name for Attributes/Columns
This rule expects that each column in a table should have a unique name. This is to avoid confusion at the time of retrieving data
UC

UC
or performing any other operation on the stored data.
In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields branch, hod(Head of Department) If one or more columns have same name, then the DBMS system will be left confused.
and office_tel is repeated for the students who are in the same branch in the college, this is Data Redundancy. Rule 4: Order doesn't matters
Insertion Anomaly This rule says that the order in which you store the data in your table doesn't matter.
Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be inserted, or else we will
ST

ST
have to set the branch information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information will be repeated for all those 100 students.
EXAMPLE
These scenarios are nothing but Insertion anomalies. Create a table to store student data which will have student's roll no., their name and the name of subjects they have opted for.
Here is the table, with some sample data added to it.
Updation Anomaly
What if Mr. X leaves the college? or is no longer the HOD of computer science department? In that case all the student records will
have to be updated, and if by mistake we miss any record, it will lead to data inconsistency. This is Updation anomaly.
roll_no name subject
Deletion Anomaly
101 Akon OS, CN
In our Student table, two different informations are kept together, Student information and Branch information. Hence, at the end
of the academic year, if student records are deleted, we will also lose the branch information. This is Deletion anomaly. 103 Ckon Java
102 Bkon C, C++

The table already satisfies 3 rules out of the 4 rules, as all our column names are unique, we have stored data in the order we
wanted to and we have not inter-mixed different type of data incolumns.

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 9 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 10

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department o Department o

But out of the 3 different students in our table, 2 have opted for more than 1 subject. And we have stored the subject names in a as a primary key.
single column. But as per the 1st Normal form each column must contain atomic value. Let's create another table for Subject, which will have subject_id and subject_name fields and subject_id will be the primary
It's very simple, because all we have to do is break the values into atomic values. Here is our key.
updated table and it now satisfies the First Normal Form. subject_i subject_nam
roll_no name subject d e
101 Akon OS 1 Java
101 Akon CN 2 C++
103 Ckon Java 3 Php
102 Bkon C Now we have a Student table with student information and another table Subject for storing subject information.
102 Bkon C++
Let's create another table Score, to store the marks obtained by students in the respective subjects. We will also be saving name
of the teacher who teaches that subject along with marks.
By doing so, although a few values are getting repeated but values for the subject column are now atomic for each

P
record/row. Using the First Normal Form, data redundancy increases, as there will be many columns with same data in multiple subject_i d
score_id student_id marks teacher
rows but each row as a whole will be unique.
1 10 1 70 Java Teacher
Second Normal Form (2NF)

AP

AP
For a table to be in the Second Normal Form, 2 10 2 75 C++ Teacher
1. It should be in the First Normalform. 3 11 1 80 Java Teacher
2. And, it should not have PartialDependency. In the score table we are saving the student_id to know which student's marks are these and subject_id to know for which
Dependency subject the marks are for.
Let's take an example of a Student table with columns student_id, name, reg_no(registration number), Together, student_id + subject_id forms a Candidate Key which can be the Primary key.
R

R
branch and address(student's home address).
To get me marks of student with student_id 10, can you get it from this table? No, because you don't know for which subject. And
student_ id reg_n branc h addre if I give you subject_id, you would not know for which student. Hence we need student_id
name
o ss
O

O
+ subject_id to uniquely identify anyrow.
But where is PartialDependency?
UC

UC
In this table, student_id is the primary key and will be unique for every row, hence we can use student_id to fetch any row of Now if you look at the Score table, we have a column names teacher which is only dependent on the subject, for Java it's Java
data from this table Teacher and for C++ it's C++ Teacher & so on.
Even for a case, where student names are same, if we know the student_id we can easily fetch the correct record. Now as we just discussed that the primary key for this table is a composition of two columns which is student_id &
student_id name reg_no branch address subject_id but the teacher's name only depends on subject, hence the subject_id, and has nothing to do withstudent_id.
10 Akon 07-WY CSE Kerala
ST

ST
This is Partial Dependency, where an attribute in a table depends on only a part of the primary key and not on the whole key.
11 Akon 08-WY IT Gujarat
How to remove Partial Dependency?
Hence we can say a Primary Key for a table is the column or a group of columns(composite key) which can uniquely identify There can be many different solutions for this, but out objective is to remove teacher's name from Score table. The simplest
each record in the table. solution is to remove columns teacher from Score table and add it to the Subject table. Hence, the Subject table will become:
I can ask from branch name of student with student_id 10, and I can get it. Similarly, if I ask for name of student with
student_id 10 or 11, I will get it. So all I need is student_id and every other column depends on it, or can be fetched using it.This
is Dependency and we also call it Functional Dependency.
Partial Dependency subject_id subject_name teacher
Now that we know what dependency is, we are in a better state to understand what partial dependency is. 1 Java Java Teacher
For a simple table like Student, a single column like student_id can uniquely identfy all the records in a table. But this is not true 2 C++ C++ Teacher
all the time. So now let's extend our example to see if more than 1 column together can act 3 Php Php Teacher

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 11 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 12

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department o Department o

Well, the column total_marks depends on exam_name as with exam type the total score changes. For example, practicals are of
And our Score table is now in the second normal form, with no partial dependency. less marks while theory exams are of more marks.
score_ student_ id subject_ id mar But, exam_name is just another column in the score table. It is not a primary key or even a part of the primary key, and
id ks total_marks depends on it.
1 10 1 70 This is Transitive Dependency. When a non-prime attribute depends on other non-prime attributes rather than depending upon the
2 10 2 75 prime attributes or primary key.
3 11 How to remove Transitive Dependency
Again the solution is very simple. Take out the columns exam_name and total_marks from Score table and put them in an Exam
Third Normal Form (3NF) table and use the exam_id wherever required.
A table is said to be in the Third Normal Form when, Score Table: In 3rd Normal Form
1. It is in the Second Normalform.
student_i subject_
2. And, it doesn't have TransitiveDependency. score_id marks exam_id
d id

P
So let's use the same example, where we have 3 tables, Student, Subject and Score.
StudentTable The new Exam table

AP

AP
student_i total_mark
name reg_no branch address exam_id exam_name
d s
10 Akon 07-WY CSE Kerala 1 Workshop 200
11 Akon 08-WY IT Gujarat 2 Mains 70
12 Bkon 09-WY IT Rajasthan 3 Practicals 30
Advantage of removing Transitive Dependency
SubjectTable
R

R
The advantage of removing transitive dependency is,
subject_id subject_name teacher  Amount of data duplication isreduced.
1 Java Java Teacher  Data integrityachieved.
O

O
2 C++ C++ Teacher Boyce and Codd Normal Form (BCNF)
3 Php Php Teacher Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anomaly that
UC

UC
is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. For a table
Score Table to be in BCNF, following conditions must be satisfied:
In the Score table, we need to store some more information, which is the exam name and total marks, so let's add 2 more  R must be in 3rd NormalForm
columns to the Score table.  and, for each functional dependency ( X → Y ), X should be a super Key.In simple words, it means, that for a dependency
student_i subject_i A → B, A cannot be a non-prime attribute, if B is a primeattribute.
score_id marks
d d Example
ST

ST
1 10 1 70 College enrolment table with columns student_id, subject and professor.
2 10 2 75 student_id subject professor
3 11 1 80 101 Java P.Java
101 C++ P.Cpp
102 Java P.Java2
Transitive Dependency 103 C# P.Chash
With exam_name and total_marks added to our Score table, it saves more data now. Primary key for the Score table is a 104 Java P.Java
composite key, which means it's made up of two attributes or columns → student_id + subject_id. In the table above:
The new column exam_name depends on both student and subject. For example, a mechanical engineering student will have One student can enroll for multiple subjects. For example, student with student_id 101, has opted for subjects - Java & C++
Workshop exam but a computer science student won't. And for some subjects you have Practical exams and for some  For each subject, a professor is assigned to thestudent.
you don't. So we can say that exam_name is dependent on both student_id and subject_id.  And, there can be multiple professors teaching one subject like Java. What do
And what about our second new column total_marks? Does it depend on our Score table's primary key? you think should be the PrimaryKey?

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 13 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 14

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department o Department o

Well, in the table above student_id, subject together form the primary key, because using Example
student_id and subject, we can find all the columns of the table. Below we have a college enrolment table with columns s_id, course and hobby.
One more important point to note here is, one professor teaches only one subject, but one subject may have two different cours
s_id hobby
professors. e
Hence, there is a dependency between subject and professor here, where subject depends on the professor name. Scienc e
1 Cricket
This table satisfies the 1st Normal form because all the values are atomic, column names are unique and all the values stored in
1 Maths Hockey
a particular column are of same domain.
2 C# Cricket
This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
2 Php Hockey
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form. But this table
From the table above, student with s_id 1 has opted for two courses, Science and Maths, and has two hobbies,
is not in Boyce-Codd Normal Form.
Cricket andHockey.

P
Why this table is not in BCNF?
You must be thinking what problem this can lead to, right?

P
In the table above, student_id, subject form primary key, which means subject column is a prime attribute. But, there is
one more dependency, professor → subject.
Well the two records for student with s_id 1, will give rise to two more records, as shown below, because for one student, two

AP
hobbies exists, hence along with both the courses, these hobbies should be specified.

AP
And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.
s_id course hobby
How to satisfy BCNF?
1 Science Cricket
To make this relation(table) satisfy BCNF, we will decompose this table into two tables, student table and professor
1 Maths Hockey
table.
1 Science Hockey
Below we have the structure for both the tables.
R
1 Maths Cricket
Student Table

R
And, in the table above, there is no relationship between the columns course and hobby. They are independent of each other.
student_id p_id
So there is multi-value dependency, which leads to un-necessary repetition of data and other anomalies as well.
O
101 1
How to satisfy 4th Normal Form?

O
101 2
To make the above relation satify the 4th normal form, we can decompose the table into 2 tables.
Professor Table
UC

p_id professor subject


CourseOpted Table

UC
s_id course
1 P.Java Java
1 Science
2 P.Cpp C++
1 Maths
And now, this relation satisfy Boyce-Codd Normal Form.
2 C#
ST

Fourth Normal Form (4NF) 2 Php

ST
A table is said to be in the Fourth Normal Form when, Hobbies Table,
1. It is in the Boyce-Codd NormalForm. s_id hobby
2. And, it doesn't have Multi-ValuedDependency. 1 Cricket
Multi-valued Dependency 1 Hockey
A table is said to have multi-valued dependency, if the following conditions are true, 2 Cricket
1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the table mayhave multi- 2 Hockey
valueddependency. Now this relation satisfies the fourth normal form.
2. Also, a table should have at-least 3 columns for it to have a multi-valueddependency. A table can also have functional dependency along with multi-valued dependency. In that case, the functionally dependent
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C should be columns are moved in a separate table and the multi-valued dependent columns are moved to separatetables.
independent of eachother.
If all these conditions are true for any relation(table), it is said to have multi-valued dependency.

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 15 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 16

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department o Department o

Fifth Normal Form (5NF) The above relations have join dependency, so they are not in 5NF. That would mean that a join relation of the above three
A database is said to be in 5NF, if and only if, relations is equal to our original relation <Employee>.
1. It's in4NF
2. If we can decompose table further to eliminate redundancy and anomaly, and when we re-join the decomposed FIFTH NORMAL FORM EXAMPLE
tables by means of candidate keys, we should not be losing the original data or anynew record set should not arise. Consider an example of different Subjects taught by different lecturers and the lecturers taking classes for different semesters.
In simple words, joining two or more decomposed table should not lose records nor create newrecords. Note: Please consider that Semester 1 has Mathematics, Physics and Chemistry and Semester 2 has only Mathematics in
What is Join Dependency its academic year!!
If a table can be recreated by joining multiple tables and each of this table have a subset of the attributes of the table, then the
table is in Join Dependency. It is a generalization of Multivalued DependencyJoin Dependency can be related to 5NF, wherein a
relation is in 5NF, only if it is already in 4NF and it cannot be decomposed further.
Example
<Employee>

P
P

AP
EmpName EmpSkills EmpJob (Assigned Work)

AP
Tom Networking EJ001 In above table, Rose takes both Mathematics and Physics class for Semester 1, but she does not take Physics class for
Harry Web Development EJ002 Semester 2. In this case, combination of all these 3 fields is required to identify a valid data. Imagine we want to add a new class
- Semester3 but do not know which Subject and who will be taking that subject. We would be simply inserting a new entry with
Katietable can be decomposed into the following
The above Programming EJ002
three tables; therefore it is not in 5NF: Class as Semester3 and leaving Lecturer and subject as NULL. As we discussed above, it's not a good to have such entries.

R
Moreover, all the three columns together act as a primary key, we cannot leave other two columns blank!
Hence we have to decompose the table in such a way that it satisfies all the rules till 4NF and when join them by using keys, it
R
<EmployeeSkills>
should yield correct record. Here, we can represent each lecturer's Subject area and their classes in a better way. We can

O
divide above table into three - (SUBJECT, LECTURER), (LECTURER, CLASS), (SUBJECT, CLASS)
O
EmpName EmpSkills

UC
Tom Networking
UC

Harry Web Development


<EmployeeJob>
Katie Programming

ST
EmpName EmpJob
ST

Tom EJ001
Harry EJ002
<JobSkills>
Katie EJ002 Now, each of combinations is in three different tables. If we need to identify who is teaching which subject to which semester,
we need join the keys of each table and get the result.
EmpSkills EmpJob For example, who teaches Physics to Semester 1, we would be selecting Physics and Semester1 from table 3 above, join with
table1 using Subject to filter out the lecturer names. Then join with table2 using Lecturer to get correct lecturer name. That is
Networking EJ001 we joined key columns of each table to get the correct data. Hence there is no lose or new data - satisfying 5NF condition.
Web Development EJ002
Our Join Dependency: Programming EJ002

{(EmpName, EmpSkills ), ( EmpName, EmpJob), (EmpSkills, EmpJob)}

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 17 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 18

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

UNIT III - TRANSACTIONS


State Transaction types
Transaction Concepts – ACID Properties – Schedules – Serializability – Concurrency Control – Need for Concurrency
– Locking Protocols – Two Phase Locking – Deadlock – Transaction Recovery - Save Points – Isolation Levels – SQL ActiveState A transaction enters into an active state when the execution process begins. Duringthis state read or
Facilities for Concurrency and Recovery write operations can beperformed.

1. TRANSACTION CONCEPTS What Partially A transaction goes into the partially committed state after the end of a transaction.
Committed
isTransaction?
A set of logically related operations is known as transaction. The main operations of a transaction are:
CommittedStat When the transaction is committed to state, it has already completed its execution successfully.
Read(A): Read operations Read(A) or R(A) reads the value of A from the database and stores it in a buffer in main e Moreover, all of its changes are recorded to the database permanently.
memory.
Write (A): Write operation Write(A) or W(A) writes the value back to the database from buffer. Failed State A transaction considers failed when any one of the checks fails or if the transaction is aborted

P
while it is in the active state.

P
Let us take a debit transaction from an account which consists of following operations:
1.R(A); Terminated State of transaction reaches terminated state when certain transactions which are leaving the

AP

AP
2.A=A-1000; State system can't be restarted.
3.W(A);
Assume A's value before starting of transaction is 5000.

The first operation reads the value of A from database and stores it in abuffer.

Second operation will decrease its value by 1000. So buffer will contain4000.
R

R
 Third operation will write the value from buffer to database. So A's final value will be4000.
But it may also be possible that transaction may fail after executing some of its operations. The failure can be because of
O
hardware, software or power etc. For example, if debit transaction discussed above fails after executing operation 2, the

O
value of A will remain 5000 in the database which is not acceptable by the bank.
UC

UC
To avoid this, Database has two important operations: Transition Diagram for a Database Transaction
Commit: After all instructions of a transaction are successfully executed, the changes made by transaction are made permanent in Let's study a state transition diagram that highlights how a transaction moves between these various states.
the database. 1. Once a transaction states execution, it becomes active. It can issue READ or WRITEoperation.
Rollback:Ifatransactionisnotabletoexecutealloperationssuccessfully,allthechangesmadebytransaction areundone. 2. Once the READ and WRITE operations complete, the transactions becomes partially committedstate.
ST

Why do you need concurrency in Transactions? 3. Next,somerecoveryprotocolsneedtoensurethatasystemfailurewillnotresultinaninabilitytorecord changes in the

ST
A database is a shared resource accessed. It is used by many users and processes concurrently. Forexample, the banking transaction permanently. If this check is a success, the transaction commits and enters into the committedstate.
system, railway, and air reservations systems, stock market monitoring, supermarket inventory, and checkouts,etc. 4. If the check is a fail, the transaction goes to the Failedstate.
Not managing concurrent access may create issues like: 5. Ifthetransactionisabortedwhileit'sintheactivestate,itgoestothefailedstate.Thetransactionshould be rolled back to undo the
 Hardware failure and systemcrashes effect of its write operations on thedatabase.
 Concurrent execution of the same transaction, deadlock, or slowperformance 6. The terminated state refers to the transaction leaving thesystem.

States of Transactions 2. ACIDPROPERTIES


A transaction is a single logical unit of work which accesses and possibly modifies the contents of a database.
The various states of a Database Transaction are listed below
Transactions access data using read and write operations.
In order to maintain consistency in a database, before and after transaction, certain properties are followed.
These are called ACID properties.

1 Prepared By: Mrs. E. Ajitha (AP/CSE) 2 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

Atomicity
By this, we mean that either the entire transaction takes place at once or doesn't happen at all. There is no midway i.e.
transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not
executed at all. It involves following two operations.
—Abort: If a transaction aborts, changes made to database are not visible.
—Commit: If a transaction commits, changes made are visible. Atomicity is also
known as the 'All or nothing rule'.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to account Y. Suppose T has been executed till Read (Y) and then T'' starts. As a result , interleaving of operations takes place due
to which T'' reads correct value of X but incorrect value of Y and sum computed by
T'': (X+Y = 50, 000+500=50, 500)

P
is thus not consistent with the sum at end of transaction:

P
T: (X+Y = 50, 000 + 450 = 50, 450).

AP
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take place in isolation and
changes should be visible only after a they have been made to the main memory.

AP
Durability:
This property ensures that once the transaction has completed execution, the updates andmodifications
If the transaction fails after completion of T1 but before completion of T2.( say, after write(X) but before tothedatabasearestoredinandwrittentodiskandtheypersistevenifsystemfailureoccurs.Theseupdates now become permanent
write(Y)),thenamounthasbeendeductedfromXbutnotaddedtoY.Thisresultsinaninconsistent database state. Therefore, the and are stored in a non-volatile memory. The effects of the transaction, thus, are neverlost.
R
transaction must be executed in entirety in order to ensure correctness of databasestate.
Consistency

R
3. SCHEDULES
This means that integrity constraints must be maintained so that the database is consistent before and after the 1. SerialSchedules
O
transaction. It refers to correctness of a database. Schedules in which the transactions are executed non-interleaved, i.e., a serial schedule is one in which no transaction

O
Referring to the example above, starts until a running transaction has ended are called serial schedules.
UC

The total amount before and after the transaction must be maintained. Total Example: Consider the following schedule involving two transactions T1 and T2.

UC
before T occurs = 500 + 200 = 700. T1 T2
Total after T occurs = 400 + 300 = 700.
Therefore, database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a result T is incomplete. R(A)
Isolation
W(A)
ST

This property ensures that multiple transactions can occur concurrently without leading to inconsistency of database

ST
state. Transactions occur independently without interference. Changes occurring in a particular transaction will not be R(B)
visible to any other transaction until that particular change in that transaction is written to memory or has been committed.
W(B)
This property ensures that the execution of transactions concurrently will result in a state that is equivalent to a state
achieved these were executed serially in some order. R(A)
Let X= 500, Y = 500.
Consider two transactions T and T". R(B)

where R(A) denotes that a read operation is performed on some data item 'A'
This is a serial schedule since the transactions perform serially in the order T1 —> T2
2. CompleteSchedules
Schedules in which the last operation of each transaction is either abort (or) commit are called complete
schedules.

3 Prepared By: Mrs. E. Ajitha (AP/CSE) 4 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

Example: Consider the following schedule involving three transactions T1, T2 and T3.
T1 T2
T1 T2 T3
R(A)
R(A)
W(A)
W(A)
W(A)
R(B)
commit
W(B)
R(A)
commit
commit

P
commit

P
This schedule is cascadeless. Since the updated value of A is read by T2 only after the updating transaction
i.e. T1 commits.

AP
abort
5. StrictSchedules

AP
This is a complete schedule since the last operation performed under every transaction is either "commit" or A schedule is strict if for any two transactions Ti, Tj, if a write operation of Ti precedes a conflicting operation of
"abort". Tj (either read or write), then the commit or abort event of Ti also precedes that conflicting operation of Tj.
3. RecoverableSchedules In other words, Tj can read or write updated or written value of Ti only after Ti commits/aborts.
Schedules in which transactions commit only after all transactions whose changes they read commit are called Example: Consider the following schedule involving two transactions T1 and T2.
recoverable schedules. In other words, if some transaction Tj is reading value updated or written by some other T1 T2

R
transaction Ti, then the commit of Tj must occur after the commit of Ti.
R
Example – Consider the following schedule involving two transactions T1 and T2. R(A)

O
T1 T2 R(A)
O
R(A) W(A)

UC
W(A)
UC

commit
W(A) W(A)
R(A) R(A)

ST
commit
ST

commit
commit This is a strict schedule since T2 reads and writes A which is written by T1 only after the commit of T1.
Note – It can be seen that:
1. Cascadeless schedules are stricter than recoverable schedules or are a subset of recoverableschedules.
This is a recoverable schedule since T1 commits before T2, that makes the value read by T2 correct.
2. Strict schedules are stricter than cascadeless schedules or are a subset of cascadelessschedules.
4. Cascadeless Schedules –
3. Serial schedules satisfy constraints of all recoverable, cascadeless and strict schedules and hence is a subset of
Also called Avoids cascading aborts/rollbacks (ACA). Schedules in which transactions read values only after all
strictschedules.
transactions whose changes they are going to read commit are called cascadeless schedules. Avoids that a single
transaction abort leads to a series of transaction rollbacks. A strategy to prevent cascading aborts is to disallow a
4. SERIALIZABILITY
transaction from reading uncommitted changes from another transaction in the same schedule.
In other words, if some transaction Tj wants to read value updated or written by some other transaction Ti, then the commit Whenmultipletransactionsarerunningconcurrentlythenthereisapossibilitythatthedatabasemaybe left in an inconsistent
of Tj must read it after the commit of Ti. state. Serializability is a concept that helps us to check which schedules are serializable. A serializable schedule is the one
Example: Consider the following schedule involving two transactions T1 and T2. that always leaves the database in consistentstate.

5 Prepared By: Mrs. E. Ajitha (AP/CSE) 6 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

What is a serializable schedule? R(A)


R(B)
A serializable schedule always leaves the database in consistent state. A serial schedule is always a
W(B)
serializableschedulebecauseinserialschedule,atransactiononlystartswhentheothertransactionfinished execution. However, a
W(A)
non-serial schedule needs to be checked forSerializability.
To convert this schedule into a serial schedule we must have to swap the R(A) operation of transaction
Types of Serializability
T2withtheW(A)operationoftransactionT1.However,wecannotswapthesetwooperationsbecausethey are conflicting operations,
There are two types of Serializability.
thus we can say that this given schedule is not ConflictSerializable.
I. ConflictSerializability
Let's take another example: T1T2
II. ViewSerializability
----- ------
1. Conflict Serializability R(A)
ConflictSerializabilityisoneofthetypeofSerializability,whichcanbeusedtocheckwhetheranon- serial schedule is conflict R(A)

P
serializable ornot. R(B)

P
What is Conflict Serializability? W(B)
R(B)

AP
A schedule is called conflict serializable if we can convert it into a serial schedule after swapping its non-conflicting
W(A)

AP
operations.
Let's swap non-conflicting operations:
Conflicting operations
After swapping R(A) of T1 and R(A) of T2 we get:
Two operations are said to be in conflict, if they satisfy all the following three conditions: T1 T2
1. Both the operations should belong to differenttransactions. ----- ------
2. Both the operations are working on same dataitem. R(A)

R
3. At least one of the operations is a write operation. Let's R(A)
R
see some examples to understandthis: R(B)
W(B)

O
Example 1: Operation W(X) of transaction T1 and operation R(X) of transaction T2 are conflicting operations,
R(B)
O
because they satisfy all the three conditions mentioned above. They belong to different transactions, they are
W(A)
working on same data item X, one of the operations in write operation. After swapping R(A) of T1 and R(B) of T2 we get:

UC
Example 2: Similarly, Operations W(X) of T1 and W(X) of T2 are conflicting operations. T1 T2
UC

Example 3: Operations W(X) of T1 and W(Y) of T2 are non-conflicting operations because both the write operations are not ----- ------
working on same data item so these operations don't satisfy the second condition. R(A)
Example 4: Similarly, R(X) of T1 and R(X) of T2 are non-conflicting operations because none of them is write operation. R(B)
Example 5: Similarly, W(X) of T1 and R(X) of T1 are non-conflicting operations because both the operations belong R(A)

ST
to same transaction T1. W(B)
ST

R(B)
Conflict Equivalent Schedules
W(A)
Two schedules are said to be conflict Equivalent if one schedule can be converted into other schedule after swapping After swapping R(A) of T1 and W(B) of T2 we get:
non-conflicting operations. T1 T2
Conflict Serializable check ----- ------
Let's check whether a schedule is conflict serializable or not. If a schedule is conflict Equivalent to its serial schedule then R(A)
it is called Conflict Serializable schedule. Let's take few examples of schedules. R(B)
Example of Conflict Serializability W(B)
Let's consider thisschedule: T1 R(A)
R(B)
T2 W(A)
We finally got a serial schedule after swapping all the non-conflicting operations so we can say that the given schedule
R(A)
R(B) is Conflict Serializable.

7 Prepared By: Mrs. E. Ajitha (AP/CSE) 8 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

2. View Serializability
View Serializability is a process to find out that a given schedule is view serializable or not.
To check whether a given schedule is view serializable, we need to check whether the given schedule is View
Equivalent to its serial schedule. Lets take an example to understand what I mean bythat.
Given Schedule:
T1 T2
----- ------
R(X)
W(X)
R(X)
W(X)
R(Y)

P
W(Y)

P
R(Y)
W(Y)

AP

AP
Serial Schedule of the above given schedule:
As we know that in Serial schedule a transaction only starts when the current running transaction is finished. So the
serial schedule of the above given schedule would look like this:
T1 T2
----- ------
R(X)
R

R
W(X)
R(Y)
W(Y)
O

O
R(X)
W(X)
UC

R(Y)

UC
W(Y)
If we can prove that the given schedule is View Equivalent to its serial schedule then the given schedule is called view
Serializable.
Testing for Serializability
ST

ST
5. CONCURRENCY CONTROL
In the concurrency control, the multiple transactions can be executed simultaneously.
It may affect the transaction result. It is highly important to maintain the order of execution of those transactions.
6. NEED FOR CONCURRENCY
Problems of concurrencycontrol
Several problems can occur when concurrent transactions are executed in an uncontrolled manner.
Following are the three problems in concurrency control.

9 Prepared By: Mrs. E. Ajitha (AP/CSE) 10 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems
 Lostupdates
o At time t4, Transactions-Y rollbacks. So, it changes A's value back to that of prior tot1.
 Dirtyread
o So, Transaction-X now contains a value which has never become part of the stabledatabase.
 Unrepeatableread
o Such type of problem is known as Dirty Read Problem, as one transaction reads a dirtyvalue which has
1. Lost updateproblem not beencommitted.
When two transactions that access the same database items contain their operations in a way that makes the value 3. Inconsistent RetrievalsProblem
of some database item incorrect, then the lost update problem occurs.
o Inconsistent Retrievals Problem is also known as unrepeatable read. When a transaction calculates some
If two transactions T1 and T2 read a record and then update it, then the effect of updating of the first record will be
summary function over a set of data while the other transactions are updating thedata, then the Inconsistent Retrievals
overwritten by the second update.
Problem occurs.
Example:
o A transaction T1 reads a record and then does some other processing during which the transaction T2
updates the record. Now when the transaction T1 reads the record, then the newvalue will be inconsistent with the
previousvalue.

P
Example:
Suppose two transactions operate on three accounts.

AP

AP
Here,
o At time t2, transaction-X reads A's value.
R
o At time t3, Transaction-Y reads A'svalue.

R
o At time t4, Transactions-X writes A's value on the basis of the value seen at timet2.
o At time t5, Transactions-Y writes A's value on the basis of the value seen at timet3.
O

O
o So at time T5, the update of Transaction-X is lost because Transaction y overwrites itwithout looking at its
currentvalue.
UC

UC
o Such type of problem is known as Lost Update Problem as update made by one transaction is losthere.
2. DirtyRead
o The dirty read occurs in the case when one transaction updates an item of the database,and then the
transaction fails for some reason. The updated database item is accessed by another transaction before it is
changed back to the originalvalue.
ST

ST
o A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now hasvalues which have
never formed part of the stabledatabase.
Example:

o Transaction-X is doing the sum of all balance while transaction-Y is transferring an amount 50 from
Account-1 toAccount-3.
o At time t2, transaction-Y writes A'svalue. o Here, transaction-X produces the result of 550 which is incorrect. If we write this produced result in the
o At time t3, Transaction-X reads A'svalue. database, the database will become an inconsistent state because the actual sum is600.
o Here, transaction-X has seen an inconsistent state of thedatabase.

11 Prepared By: Mrs. E. Ajitha (AP/CSE) 12 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

Concurrency Control Protocol There are two phases of 2PL:


Concurrency control protocols ensure atomicity, isolation, and serializability of concurrent transactions. The  Growing phase: In the growing phase, a new lock on the data item may be acquired by the transaction,
concurrency control protocol can be divided into three categories: but none can be released.
1. Lock basedprotocol  Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released, but
2. Time-stampprotocol no new locks can beacquired.
In the below example, if lock conversion is allowed then the following phase can happen:
7. LOCKING PROTOCOLS Lock- 1. Upgrading of lock (from S(a) to X (a)) is allowed in growingphase.
Based Protocol 2. Downgrading of lock (from X(a) to S(a)) must be done in shrinkingphase.
In this type of protocol, any transaction cannot read or write data until it acquires an appropriate lock on it. There are two Example:
types of lock:

P
1. Shared lock:

P
o It is also known as a Read-only lock. In a shared lock, the data item can only read bythe transaction.

AP
o It can be shared between the transactions because when the transaction holds a lock, thenit can't update

AP
the data on the dataitem.
2. Exclusive lock:
o In the exclusive lock, the data item can be both reads as well as written by thetransaction.
o This lock is exclusive, and in this lock, multiple transactions do not modify the samedata simultaneously.
R

R
8. TWO PHASE LOCKING (2PL)
The two-phase locking protocol divides the execution phase of the transaction into three parts.
O
o In the first part, when the execution of the transaction starts, it seeks permission for the lockit requires.

O
o In the second part, the transaction acquires all the locks. The third phase is started as soonas the
UC

transaction releases its firstlock.

UC
o In the third phase, the transaction cannot demand any new locks. It only releases theacquired locks. The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
o Growing phase: from step1-3
o Shrinking phase: from step5-7
ST

o Lock point: at3

ST
Transaction T2:
o Growing phase: from step2-6
o Shrinking phase: from step8-9
o Lock point: at6
Types of Two Phase Locking (2PL)
1. Strict Two-phase locking(Strict-2PL)
o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the locks, the
transaction continues to executenormally.
o The only difference between 2PL and strict 2PL is that Strict-2PL does not release a lock after using it.

13 Prepared By: Mrs. E. Ajitha (AP/CSE) 14 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

o Strict-2PL waits until the whole transaction to commit, and then it releases all the locks ata time. Where,
o Strict-2PL protocol does not have shrinking phase of lockrelease. TS(TI) denotes the timestamp of the transaction Ti. R_TS(X)
denotes the Read time-stamp of data-item X. W_TS(X) denotes
the Write time-stamp of data-item X.
Thomas write Rule
Thomas Write Rule provides the guarantee of serializability order for the protocol. It improves the Basic
Timestamp Ordering Algorithm.

P
The basic Thomas write rules are as follows:
o If TS(T) < R_TS(X) then transaction T is aborted and rolled back, and operation isrejected.

P
It does not have cascading abort as 2PL does.

AP
2. Rigorous Two-PhaseLocking o If TS(T) < W_TS(X) then don't execute the W_item(X) operation of the transaction and
 Rigorous Two – Phase Locking Protocol avoids cascadingrollbacks.

AP
continueprocessing.
 This protocol requires that all the share and exclusive locks to be held until the transaction commits.
Timestamp Ordering Protocol o If neither condition 1 nor condition 2 occurs, then allowed to execute the WRITE operationby transaction Ti
o The Timestamp Ordering Protocol is used to order the transactions based on their Timestamps. The order and set W_TS(X) to TS(T).
R
of transaction is nothing but the ascending order of the transactioncreation.
o The priority of the older transaction is higher that's why it executes first. To determinethe timestamp of Problems

R
O
the transaction, this protocol uses system time or logicalcounter.
o The lock-based protocol is used to manage the order between conflicting pairsamong transactions

O
at the execution time. But Timestamp based protocols start working as soon as a transaction iscreated.
UC

o Let's assume there are two transactions T1 and T2. Suppose the transaction T1 has entered the system at

UC
007 times and transaction T2 has entered the system at 009 times. T1 has the higher priority, so it executes first as it
is entered the systemfirst.
o The timestamp ordering protocol also maintains the timestamp of last 'read' and'write' operation on
ST

adata.
Basic Timestamp ordering protocol works as follows:

ST
1. Check the following condition whenever a transaction Ti issues a Read (X)operation:
o If W_TS(X) >TS(Ti) then the operation isrejected.
o If W_TS(X) <= TS(Ti) then the operation isexecuted.
o Timestamps of all the data items areupdated.
2. Check the following condition whenever a transaction Ti issues a Write(X)operation:
o If TS(Ti) < R_TS(X) then the operation isrejected.
o If TS(Ti) < W_TS(X) then the operation is rejected and Ti is rolled back otherwisethe operation
isexecuted.

15 Prepared By: Mrs. E. Ajitha (AP/CSE) 16 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

Deadlock Avoidance

P
P
o When a database is stuck in a deadlock state, then it is better to avoid the database rather than aborting or restating the

database. This is a waste of time andresource.

AP
AP
o Deadlockavoidancemechanismisusedtodetectanydeadlocksituationinadvance.Amethodlike"wait for graph" is used for
detecting the deadlock situation but this method is suitable only for the smaller database. For the larger database,
deadlock prevention method can beused.
Deadlock Detection

R
R
9. DEADLOCK In a database, when a transaction waits indefinitely to obtain a lock, then the DBMS should detect whether the
A deadlock is a condition wherein two or more tasks are waiting for each other in order to be finished but none of transaction is involved in a deadlock or not. The lock manager maintains a Wait for the graph to detect the deadlock cycle in the

O
O
the task is willing to give up the resources that other task needs. In this situation no task ever gets finished and is in waiting database.
state forever.
Coffman conditions Wait for Graph

UC
UC

Coffman stated four conditions for a deadlock occurrence. A deadlock may occur if all the following conditions
holds true. o This is the suitable method for deadlock detection. In this method, a graph is created based on the transaction and their
 Mutual exclusion condition: There must be at least one resource that cannot be used by more than one process at atime. lock. If the created graph has a cycle or closed loop, then there is adeadlock.
 Hold and wait condition: A process that is holding a resource can request for additional resources that are being held
by other processes in thesystem. o Thewaitforthegraphis maintainedbythesystemforeverytransactionwhichiswaitingforsomedata held by the others. The
 No preemption condition: A resource cannot be forcibly taken from a process. Only the processcan release a

ST
ST

system keeps checking the graph if there is any cycle in thegraph.


resource that is being held byit.
 Circular wait condition: A condition where one process is waiting for a resource that is being heldby second process
and second process is waiting for third process ….so on and the last process is waiting for the first process. Thus, The wait for a graph for the above scenario is shown below:
making a circular chain ofwaiting.

For example: In the student table, transaction T1 holds a lock on some rows and needs to update some rows in the
grade table. Simultaneously, transaction T2 holds locks on some rows in the grade table and needs to update the rows in the
Student table held by Transaction T1.

Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock and similarly, transaction T2 is
waiting for T1 to release its lock. All activities come to a halt state and remain at a standstill. It will remain in a standstill until the
DBMS detects the deadlock and aborts one of the transactions.

17 Prepared By: Mrs. E. Ajitha (AP/CSE) 18 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

Deadlock Prevention 10. TRANSACTIONRECOVERY


o Deadlockpreventionmethodissuitableforalargedatabase.Iftheresourcesareallocatedinsuchaway that deadlock never

occurs, then the deadlock can beprevented.


o The Database management system analyzes the operations of the transaction whether they can create a deadlock
situation or not. If they do, then the DBMS never allowed that transaction to beexecuted.

I. Wait-Diescheme
In this scheme, if a transaction requests for a resource which is already held with a conflicting lock by another
transaction then the DBMS simply checks the timestamp of both transactions. It allows the older transaction to wait until the

P
resource is available for execution.
Let's assume there are two transactions Ti and Tj and let TS(T) is a timestamp of any transaction T. If T2 holds a

AP

AP
lock by some other transaction and T1 is requesting for resources held by T2 then the following actions are performed by
DBMS:
1. Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and Tj has held some resource, then Ti is allowed to wait
until the data-item is available for execution. That means if the older transaction is waiting for a resource which is
R

R
locked by the younger transaction, then the older transaction is allowed to wait for resource until it isavailable.
2. Check if TS(Ti) < TS(Tj) - If Ti is older transaction and has held some resource and if Tj iswaiting for it, then
O

O
Tj is killed and restarted later with the random delay but with the sametimestamp.
II. Wound waitscheme
UC

UC
o In wound wait scheme, if the older transaction requests for a resource which is held by the younger transaction,
then older transaction forces younger one to kill the transaction and release theresource. After the minute delay,
the younger transaction is restarted but with the sametimestamp.
ST

o If the older transaction has held a resource which is requested by the Younger transaction, thenthe younger

ST
transaction is asked to wait until older releasesit.

Here is the table representation of resource allocation for each algorithm. Both of these algorithms take
process age into consideration while determining the best possible way of resource allocation for deadlock avoidance.
One of the famous deadlock avoidance algorithm is Banker's algorithm

19 Prepared By: Mrs. E. Ajitha (AP/CSE) 20 Prepared By: Mrs. E. Ajitha (AP/CSE)

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

P
AP

AP
R

R
O

O
UC

UC
ST

21

DOWNLOADED FROM STUCOR APP


Prepared By: Mrs. E. Ajitha (AP/CSE) 22
ST
DOWNLOADED FROM STUCOR APP
Prepared By: Mrs. E. Ajitha (AP/CSE)
DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

P
AP

AP
R

R
O

O
UC

UC
ST

23

DOWNLOADED FROM STUCOR APP


Prepared By: Mrs. E. Ajitha (AP/CSE) 24
ST
DOWNLOADED FROM STUCOR APP
Prepared By: Mrs. E. Ajitha (AP/CSE)
DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

P
AP

AP
R

R
O

O
UC

UC
ST

25

DOWNLOADED FROM STUCOR APP


Prepared By: Mrs. E. Ajitha (AP/CSE) 26
ST
DOWNLOADED FROM STUCOR APP
Prepared By: Mrs. E. Ajitha (AP/CSE)
DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

P
AP

AP
R

R
O

O
UC

UC
ST

27

DOWNLOADED FROM STUCOR APP


Prepared By: Mrs. E. Ajitha (AP/CSE) 28
ST
DOWNLOADED FROM STUCOR APP
Prepared By: Mrs. E. Ajitha (AP/CSE)
DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

P
AP

AP
R

R
O

O
UC

UC
ST

29

DOWNLOADED FROM STUCOR APP


Prepared By: Mrs. E. Ajitha (AP/CSE) 30
ST
DOWNLOADED FROM STUCOR APP
Prepared By: Mrs. E. Ajitha (AP/CSE)
DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492-Database Management Systems CS8492-Database Management Systems

P
AP

AP
R

R
O

O
UC

UC
ST

31

DOWNLOADED FROM STUCOR APP


Prepared By: Mrs. E. Ajitha (AP/CSE) 32
ST
DOWNLOADED FROM STUCOR APP
Prepared By: Mrs. E. Ajitha (AP/CSE)
DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

UNITIV IMPLEMENTATION TECHNIQUES data. There is no striping.


RAID – File Organization – Organization of Records in Files – Indexing and Hashing –Ordered Indices – B+ tree Index Files – B tree Index  Read performance is improved since either disk can be read at the same time. Write performance is the same as for single
Files – Static Hashing – Dynamic Hashing – Query Processing Overview – Algorithms for SELECT and JOIN operations – Query optimization diskstorage.
using Heuristics and Cost Estimation  Every write is carried out on both disks. If one disk in a pair fails, data still available in theother.
RAID  Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired Probability of
RAID (redundant array of independent disks) originally redundant array of inexpensive disks) is a way of storing the same data in combined event is verysmall.
different places on multiple hard disks to protect data in the case of a drive failure.
RAID: Redundant Arrays of Independent Disks
Disk organization techniques that manage a large numbers of disks, providing a view of a single disk of high capacity and high speed
by using multiple disks in parallel, and high reliability by storing data redundantly, so that data can be recovered even if a diskfails
Motivation for RAID
 Just as additional memory in form of cache, can improve the system performance, in the same way additional disks can also
improve systemperformance.
 In RAID we can use an array of disks which operates independently since there are many disks, multiple I/O requests can

P
be handled in parallel if the data required is on separatedisks
 A single I/O operation can be handled in parallel if the data required is distributed across multipledisks.
Benefits of RAID

AP

AP
 Data loss can be very dangerous for anorganization RAID Level 2:
 RAID technology prevents data loss due to diskfailure This configuration uses striping across disks, with some disks storing error checking and correcting (ECC) information. It has no
 RAID technology can be implemented in hardware orsoftware advantage over RAID 3 and is no longer used.
 Servers make use of RAIDTechnology
RAID Level 0- Stripping and non-redundant
 RAID level 0 divides data into block units and writes them across a number of disks. As data is placed across multiple disks it
is also called "dataStriping".
R

R
 The advantage of distributing data over disks is that if different I/O requests are pending for two different blocks of data,
then there is a possibility that the requested blocks are on differentdisks
O

O
UC

UC
RAID Level 3: Bit-Interleaved Parity
 A single parity bit is enough for error correction, not just detection, since we know which disk hasfailed
– When writing data, corresponding parity bits must also be computed and written to a parity bitdisk
– To recover data in a damaged disk, compute XOR of bits from other disks (including parity bitdisk)
ST

ST
 I/O operation addresses all the drives at the same time, RAID 3 cannot overlap I/O. For this reason, RAID 3 is best for
There is no parity checking of data. So if data in one drive gets corrupted then all the data would be lost. Thus RAID 0 does not single-user systems with long recordapplications.
support data recovery Spanning is another term that is used with RAID level 0 because the logical disk will span all the physical drives.
RAID 0 implementation requires minimum 2 disks.
Advantages
 I/O performance is greatly improved by spreading the I/O load across many channels &drives.
 Best performance is achieved when data is striped across multiple controllers with only one driver per controller

Disadvantages
 It is not fault-tolerant, failure of one drive will result in all data in an array beinglost
RAID Level 1: Mirroring (or shadowing)
 Also known asdisk mirroring, this configuration consists of at least two drives that duplicate the storage of

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 1 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 2

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

RAID Level 4: Block-Interleaved Parity


 When writing data block, corresponding block of parity bits must also be computed and written to parity disk File Organization
 To find value of a damaged block, compute XOR of bits from corresponding blocks (including parity block) from  The database is stored as a collection offiles.
otherdisks.  Each file is a sequence ofrecords.
 A record is a sequence offields.
 Classifications ofrecords
– Fixed lengthrecord
– Variable lengthrecord
 Fixed length recordapproach:
Assume record size is fixed each file has records of one particular type only different files are used for
different relations
Simple approach

P
- Record access is simple
Example pseudocode

P
AP
type account = record
RAID Level 5: account_number char(10);

AP
 RAID 5 uses striping as well as parity for redundancy. It is well suited for heavy read and low write operations. branch_name char(22); balance
 Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks, rather than storing data in N disks and numeric(8);
parity in 1disk. end Total
bytes 40 for arecord
R

R
O

O
UC

UC
Two problems
- Difficult to delete record from thisstructure.
RAID Level 6:
- Some record will cross block boundaries, that is part of the record will be stored in one blockand
 This technique is similar to RAID 5, but includes a second parity scheme that is distributed across the drives in the array. The part in another. It would require two block accesses to read or write
ST

use of additional parity allows the array to continue to function even if two disks fail simultaneously. However, this extra Reuse the free space alternatives:
protection comes at acost. – move records i + 1, . . ., n to n i, . . . , n –1
 P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard against multiple diskfailures.

ST
– do not move records, but link all free records on a
- Better reliability than Level 5 at a higher cost; not used as widely. freelist
– Move the final record to deleted recordplace.
Free Lists
 Store the address of the first deleted record in the fileheader.
 Use this first record to store the address of the second deleted record, and soon

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 3 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 4

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

Variable-Length Records Organization of Records in Files


Byte string representation • Sequential – store records in sequential order, based on the value of the search key of eachrecord
 Attach an end-of-record () control character to the end of eachrecord • Heap – a record can be placed anywhere in the file where there isspace
 Difficulty withdeletion • Hashing – a hash function computed on some attribute of each record; the result specifies in which block of the file the
record should beplaced
0 perryridge A-102 400 A-201 900  Sequential File Organization
• Suitable for applications that require sequential processing of the entirefile
• The records in the file are ordered by asearch-key
1 roundhill A-305 350 

2 mianus A-215 700 


Disadvantage
 It is not easy to reuse space occupied formerly by deletedrecord.

P
 There is no space in general for records growslonger

P
Slotted Page Structure

AP
AP Deletion – use pointer chains
Insertion – locate the position where the record is to be inserted
– if there is free space insertthere
– if no free space, insert the record in an overflowblock

R
R
– In either case, pointer chain must beupdated
 Slotted page header contains:
– number of record entries Indexing and Hashing

O
– end of free space in theblock
O
Basic Concepts
– location and size of eachrecord  Indexing mechanisms used to speed up access to desireddata.
– E.g., author catalog inlibrary

UC
UC

PointerMethod  Search Key - attribute to set of attributes used to look up records in afile. An index
file consists of records (called index entries) of theform

Search-key pointer
 Index files are typically much smaller than the originalfile

ST
ST

 Two basic kinds ofindices:


– Ordered indices: search keys are stored in sortedorder
– Hashindices: search keys are distributed uniformly across "buckets" and by using a "hash function" the
 Avariable-lengthrecordisrepresentedbyalistoffixed-lengthrecords,chainedtogetherviapointers. values aredetermined.
 Can be used even if the maximum record length is notknown. Ordered Indices
Disadvantage to pointer structure; space is wasted in all records except the first in a a chain. Solution is to  In an ordered index, index entries are stored sorted on the search keyvalue.
allow two kinds of block in file:  Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of thefile.
 Anchor block – contains the first records ofchain  Secondary index: an index whose search key specifies an order different from the sequential order of the file.
 Overflow block – contains records other than those that are the first records ofchains. Types of Ordered Indices
 Denseindex
 Sparseindex
Dense Index Files

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 5 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 6

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

 Dense index — Index record appears for every search-key value in thefile. Index Update:Deletion
If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the indexalso.
 Single-level indexdeletion:
– Dense indices – deletion of search-key is similar to file recorddeletion.
– Sparse indices – if an entry for the search key exists in the index, it is deleted by replacing the entry in the index
with the next search-key value in the file (in search-key order). If the next search-key value already has an index
entry, the entry is deleted instead of beingreplaced.
Index Update:Insertion
 Single-level indexinsertion:
– Perform a lookup using the search-key value appearing in the record to beinserted.
– Dense indices – if the search-key value does not appear in the index, insertit.
Sparse Index Files – Sparse indices – if index stores an entry for each block of the file, no change needs to be made to the index
 SparseIndex unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into

P
theindex.

P
contains index records for only some search-key values.
 To locate a record with search-key value Kwe: Secondary Index on balance field of account

AP
– Find index record with largest search-key value that is less than or equal to Search

AP
file sequentially starting at the record to which the index recordpoints

R
R

O
O
Multilevel Index
Primary and Secondary Indices
 If primary index does not fit in memory, access becomesexpensive.
 Secondary indices have to bedense.

UC
 To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a  Indices offer substantial benefits when searching forrecords.
UC

sparse index onit.


 When a file is modified, every index on the file must be updated, Updating indices imposes overhead on databasemodification.
– outer index – a sparse index of primaryindex
– inner index – the primary indexfile  Sequential scan using primary index is efficient, but a sequential scan using a secondary index isexpensive
– each record access may fetch a new block fromdisk
 If even outer index is too large to fit in main memory, yet another level of index can be created, and soon.
`
B+-Tree Index Files

ST
ST

Example of a B+-tree : B+-tree for account file (n = 3)

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 7 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 8

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019
+
 Disadvantage of indexed-sequential files: performance degrades as file grows, since many overflow blocks get created. B -tree for account file (n = 5)
Periodic reorganization of entire file isrequired.
 Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in the face of insertions and
deletions. Reorganization of entire file is not required to maintainperformance.
 Disadvantage of B+-trees: extra insertion and deletion overhead, spaceoverhead.
+
A B -tree is a rooted tree satisfying the following properties:
 All paths from root to leaf are of the samelength
 Each node that is not a root or a leaf has between [n/2] and nchildren.
 Specialcases:  Non-leaf nodes other than root must have between 3 and 5 children ((n/2and n with n=5).
– If the root is not a leaf, it has at least 2children.  Root must have at least 2 children.
– If the root is a leaf, it can have between 0 and (n–1)values. +
Observations about B -trees
+

P
B -Tree Node Structure

P
 Since the inter-node connections are done by pointers, "logically" close blocks need not be "physically" close.
 Typical node
 The B+-tree contains a relatively small number of levels thus searches can be conductedefficiently.
 Insertions and deletions to the main file can be handledefficiently.

AP

AP
Updates on B+-Trees: Insertion
 Find the leaf node in which the search-key value wouldappear
– Kiare the search-keyvalues
 If the search-key value is already there in the leaf node, record is added to file and if necessary a pointer is inserted into
– Piare pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).
thebucket.
 The search-keys in a node areordered
 If the search-key value is not there, then add the record to the main file and create a bucket if necessary.Then:
K1< K2< K3< . . . < Kn–1
– If there is room in the leaf node, insert (key-value, pointer) pair in the leaf node otherwise, split the node.
Properties of Leaf Nodes
Example: B+-Tree before and after insertion of "Clearview"
R

R
 For i = 1, 2, . . ., n–1, pointer Pieither points to a file record with search-key value Ki, or to a bucket of pointers to file
records, each record having search-key valueKi.
 Pnpoints to next leaf node in search-keyorder
O

O
UC

UC
Non-Leaf Nodes in B+-Trees
ST

ST
Non leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:
– All the search-keys in the subtree to which P1points are less thanK1.

Example of a B -tree: B+-tree for account file (n = 3)


+
Updates on B+-Trees: Deletion
• Find the record to be deleted, and remove it from the main file and from the bucket (ifpresent)
• Remove (search-key value, pointer) from the leaf node if there is no bucket or if the bucket has become empty
• If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then
– Insertallthesearch-keyvaluesinthetwonodesintoasinglenode(theoneontheleft),anddelete

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 9 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 10

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

the other node. B-Tree Index Files


– Delete the pair (Ki–1, Pi), where Piis the pointer to the deleted node, from its parent, recursively using the • Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates redundant storage of searchkeys.
aboveprocedure. • Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for each search key in a
nonleaf node must beincluded.

Before and after deleting "Downtown" Generalized B-tree leaf node

P
P

AP
AP
Nonleaf node – pointers Biare the bucket or file record pointers.
B+-tree on same data

• The removal of the leaf node containing "Downtown" did not result in its parent having too little pointers. So the cascaded

R
deletions stopped with the deleted leaf node'sparent.
R
Deletion of "Perryridge" from result of previous example

O
O
Advantages of B-Tree indices:

UC
– May use less tree nodes than a correspondingB -Tree.
+
UC

– Sometimes possible to find search-key value before reaching leafnode.


Disadvantages of B-Tree indices:
– Only small fraction of all search-key values are foundearly
• Node with "Perryridge" becomes empty and merged with itssibling. – Non-leaf nodes are larger, so fan-out is reduced (no. of pointers). Thus, B-Trees typically have greater depth
• Root node then had only one child, and was deleted and its child became the new rootnode +
than correspondingB -Tree

ST
– Insertion and deletion more complicated than inB -Trees
+ +
B -Tree File Organization
ST

• The leaf nodes in a B+-tree file organization store records, instead ofpointers. – Implementation is harder thanB+-Trees.
• Since records are larger than pointers, the maximum number of records that can be stored in a leaf node is less than the
number of pointers in a nonleafnode. HASHING
• Leaf nodes are still required to be halffull. • Hashing is an effective technique to calculate the direct location of a data record on the disk without using indexstructure.
• Insertion and deletion are handled in the same way as insertion and deletion of entries in a B -treeindex. • Hashing uses hash functions with search keys as parameters to generate the address of a datarecord.
+

Hash Organization
Bucket
A hash file stores data in bucket format. Bucket is considered a unit of storage. A bucket typically stores one
complete disk block, which in turn can store one or more records.
Hash Function
A hash function, h, is a mapping function that maps all the set of search-keys K to the address where actual
records are placed. It is a function from search keys to bucketaddresses.
 Worst hash function maps all search-key values to the samebucket.

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 11 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 12

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

 An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set of all
possiblevalues.
 Ideal hash function is random, so each bucket will have the same number of records. Types
• StaticHashing
• DynamicHashing
Static Hashing
 In static hashing, when a search-key value is provided, the hash function always computes the sameaddress.
 For example, if mod-4 hash function is used, then it shall generate only 5 values. The output address shall always be same
for thatfunction.
 The number of buckets provided remains unchanged at alltimes.
Example of Hash File Organization
 There are 10buckets,
 The hash function returns the sum of the binary representations of the characters modulo10

P
– E.g. h(Perryridge)=5

P
h(Round Hill) = 3 h(Brighton) =3
Hash Indices
• Hashing can be used not only for file organization, but also for index-structurecreation.

AP
• A hash index organizes the search keys, with their associated record pointers, into a hash filestructure.

AP
• Hash indices are always secondaryindices
R

R
O

O
UC

UC
Operation Deficiencies of Static Hashing
 Insertion − When a record is required to be entered using static hash, the hash function h computes the bucket address for  In static hashing, function h maps search-key values to a fixed set of B of bucketaddresses.
search key K, where the record will bestored. – Databases grow with time. If initial number of buckets is too small, performance will degrade due to too
Bucket address = h(K) muchoverflows.
ST

 Search − When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket – If file size at some point in the future is anticipated and number of buckets allocated accordingly, significant amount

ST
where the data isstored. of space will be wastedinitially.
 Delete − This is simply a search followed by a deletionoperation. – If database shrinks, again space will bewasted.
Handling of Bucket Overflows  These problems can be avoided by using techniques that allow the number of buckets to be modified dynamically.
 Bucket overflow can occur becauseof Dynamic Hashing
– Insufficientbuckets • Good for database that grows and shrinks insize
– Skew in distribution of records. This can occur due to: • Allows the hash function to be modifieddynamically
• multiple records have same search-keyvalue • Extendable hashing – one form of dynamichashing
 Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflowbuckets. – Hash function generates values over a large range — typically b-bit integers, with b =32.
 Overflow chaining – the overflow buckets of a given bucket are chained together in a linkedlist. – At any time use only a prefix of the hash function to index into a table of bucketaddresses.
 Above scheme is called closedhashing. – Let the length of the prefix be i bits, 0 i 32.
– Bucket address table size = 2 Initially i =0
i.
– An alternative, called open hashing, which does not use overflow buckets, is not suitable for databaseapplications.
– Value of i grows and shrinks as the size of the database grows andshrinks.
– Multiple entries in the bucket address table may point to abucket.
– Thus, actual number of buckets is <2i

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 13 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 14

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

– The number of buckets also changes dynamically due to coalescing and splitting ofbuckets.
General Extendable Hash
In this structure, i2= i3= i, whereas i1= i – 1

P
Insertion in Extendable Hash Structure

AP
To split a bucket j when inserting record with search-key value Kj:
 If i > ij(more than one pointer to bucketj)
– allocate a new bucket z, and set ij= iz= (ij+1)
– Update the second half of the bucket address table entries originally pointing to j, to point toz
– remove each record in bucket j and reinsert (in j orz)
– recompute new bucket for Kjand insert record in the bucket (further splitting is required if the bucket is stillfull)
 If i = ij(only one pointer to bucketj)

R
– If i reaches some limit b, or too many splits have happened in this insertion, create an overflow bucket
 Else
– increment i and double the size of the bucket addresstable.

O
– replace each entry in the table by two entries that point to the samebucket.
– recompute new bucket address table entry forKj
Now i > ijso use the first case above.

UC
Deletion in Extendable Hash Structure
 To delete a keyvalue,
– locate it in its bucket and removeit.
– The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket addresstable).
– Coalescing of buckets can be done (can coalesce only with a "buddy" bucket having same value of ijand same ij–1
prefix, if it ispresent)

ST
– Decreasing bucket address table size is alsopossible
• Note: decreasing bucket address table size is an expensive operation and should be done only if number
of buckets becomes much smaller than the size of thetable
Example

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 15 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 16

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

appropriate bucket
Updates in Extendable Hash Structure
 To insert a record with search-key valueKj
– follow same procedure as look-up and locate the bucket, sayj.
– If there is room in the bucket j insert record in thebucket.
Initial Hash structure, bucket size = 2 – Overflow buckets used instead in somecases.
Hash structure after insertion of one Brighton and two Downtown records  To delete a keyvalue,
– locate it in its bucket and removeit.
– The bucket itself can be removed if it becomesempty
– Coalescing of buckets can bedone
– Decreasing bucket address table size is alsopossible
 Benefits of extendablehashing:
– Hash performance does not degrade with growth offile
– Minimal space overhead

P
P
 Disadvantages of extendable hashing
– Extra level of indirection to find desired record

AP
AP
Bucket address table may itself become verybig.
Hash structure after insertion of Mianus record
QUERY PROCESSING OVERVIEW
1. The scanning, parsing, and validating module produces an internal representation of thequery.
2. The query optimizer module devises an execution plan which is the execution strategy to retrieve the result of the query from
the database files. A query typically has many possible execution strategies differing in performance, and the process of
choosing a reasonably efficient one is known as query optimization. Query optimization is beyond this course. The code

R
generator generates the code to execute the plan.The runtime database processor runs the generated code to produce the
R
queryresult.

O
O
Hash structure after insertion of three Perryridge records

UC
UC

ST
ST

Evaluation of SQL Statement


The query is evaluated in a different order.
Hash structure after insertion of Redwood and Round Hill records  The tables in the from clause are combined using Cartesian products.The where predicate is thenapplied.
 The resulting tuples are grouped according to the group by clause.The having predicate is applied to each group, possibly
eliminating some groups.The aggregates are applied to each remaining group.The select clause is performedlast.
Translating SQL Queries into Relational Algebra
 SQL query is first translated into an equivalent extended relational algebraexpression.
 SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic
operators andoptimized.
 Query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses.
Use of Extendable Hash Structure  Nested queries within a query are identified as separate queryblocks.
 To locate the bucket containing search-keyKj: Example:
1. Compute h(Kj) =X
2. UsethefirstihighorderbitsofXasadisplacementintobucketaddresstable,andfollowthepointerto

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 17 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 18

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

SELECT LNAME, FNAME – (OP3):σDNO=5(EMPLOYEE)


FROM EMPLOYEE – (OP4):σ DNO=5 AND SALARY>30000 AND SEX = 'F'(EMPLOYEE)
WHERE SALARY > (SELECT MAX(SALARY) – (OP5):σESSN='123456789' AND PNO=10(WORKS_ON)
FROM EMPLOYEE  Many search methods can be used for simple selection: S1 throughS6
WHERE DNO=5);  S1: Linear Search (brute force) –full scan in Oracle'sterminology-
 The innerblock – Retrieves every record in the file, and test whether its attribute values satisfy the selection condition: an
• (SELECT MAX (SALARY) FROM EMPLOYEE WHEREDNO=5) expensiveapproach.
–Translatedin: – Cost: b/2 if key and b if nokey
• ℑ MAX SALARY(σDNO=5(EMPLOYEE))  S2: BinarySearch
 The Outerblock – If the selection condition involves an equality comparison on a key attribute on which the file is ordered.
• SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY >C – σSSN='1234567' (EMPLOYEE), SSN is the orderingattribute.
–Translatedin: – Cost: log2b ifkey.
• ∏ LNAZME, FNAME ( σSALARY>C(EMPLOYEE))  S3: Using a Primary Index (hashkey)

P
(C represents the result returned from the inner block.) – An equality comparison on a key attribute with a primary index (or hashkey).

P
 The query optimizer would then choose an execution plan for eachblock. – This condition retrieves a single record (atmost).
 The inner block needs to be evaluated only once. (Uncorrelated nestedquery). – Cost :primary index : bind/2 + 1(hash key: 1bucket if nocollision).

AP
 It is much harder to optimize the more complex correlated nestedqueries. 

AP
S4: Using a primary index to retrieve multiplerecords
External Sorting – Comparison condition is >, >=, <, or <= on a key field with a primaryindex
It refers to sorting algorithms that are suitable for large files of records on disk that do not fit entirely in main memory, such – σDNUMBER>5(DEPARTMENT)
as most database files.. – Use the index to find the record satisfying the corresponding equality condition (DNUMBER=5), then retrieve all
 ORDERBY. subsequent records in the (ordered)file.
 Sort-merge algorithms for JOIN and other operations (UNION, INTERSECTION). Duplicate elimination algorithms for the – For the condition (DNUMBER <5), retrieve all the precedingrecords.
– Method used for range queries too(i.e. queries to retrieve records in certainrange)

R
PROJECT operation (DISTINCT). Typical external sorting algorithm uses a sort-merge strategy: Sort phase: Create sort
– Cost: bind/2 + ?. '?' could be known if the number of duplicatesknown.
R
small sub-files (sorted sub-files are calledruns).
 Merge phase: Then merges the sorted runs. N-way merge uses N memory buffers to buffer input runs, and 1 block to  S5: Using a clustering index to retrieve multiplerecords
– If the selection condition involves an equality comparison on a non-key attribute with a

O
buffer output. Select the 1st record (in the sort order) among input buffers, write it to the output buffer and delete it from the
O
input buffer. If output buffer full, write it to disk. If input buffer empty, read next block from the correspondingrun.. clusteringindex.
E.g. 2-way Sort-Merge – σDNO=5(EMPLOYEE)
– Use the index to retrieve all the records satisfying thecondition.

UC
– Cost: log2bind + ?. '?' could be known if the number of duplicatesknown.
UC

 S6: Using a secondary (B+-tree) index on an equalitycomparison


– The method can be used to retrieve a single record if the indexing field is a key or to retrieve multiple records if the
indexing field is not akey.
– This can also be used for comparisons involving >, >=, <, or<=.
– Method used for range queriestoo.

ST
– Cost to retrieve: a key= height + 1; a non key= height+1(extra-level)+?,comparison=(height-1)+?+?
ST

 Many search methods can be used for complex selection which involve a Conjunctive Condition: S7 through as S9.
Basic Algorithms for Executing Relational Query Operations – Conjunctive condition: several simple conditions connected with the AND logicalconnective.
 An RDBMS must include one or more alternative algorithms that implement each relational algebra operation (SELECT, – (OP4): s DNO=5 AND SALARY>30000 AND SEX = 'F'(EMPLOYEE).
JOIN,…) and, in many cases, that implement each combination of theseoperations.  S7:Conjunctive selection using an individualindex.
 Each algorithm may apply only to particular storage structures and access paths (suchindex,…). – If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the
 Only execution strategies that can be implemented by the RDBMS algorithms and that apply to the particular query and use of one of the Methods S2 to S6, use that condition to retrieve therecords.
particular database design can be considered by the query optimizationmodule. – Then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition
Algorithms for implementing SELECT operation  S8:Conjunctive selection using a compositeindex:
 These algorithms depend on the file having specific access paths and may apply only to certain types of selectionconditions. – If two or more attributes are involved in equality conditions in the conjunctive condition and a composite index (or
 We will use the following examples of SELECToperations: hash structure) exists on the combinedfields.
– (OP1):σSSN='123456789'(EMPLOYEE) – Example:Ifanindexhasbeencreatedonthecompositekey(ESSN,PNO)oftheWORKS_ONfile,
– (OP2):σ DNUMBER > 5(DEPARTMENT)

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 19 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 20

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019 Department of CSE & IT 2018 - 2019

we can use the index directly. 3. Using commutativity and associativity of binary operations, rearrange the leaf nodes of thetree
– (OP5):σESSN='123456789' AND PNO=10(WORKS_ON). 4. Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a JOIN operation, if the
 S9: Conjunctive selection by intersection of recordpointers condition represents a joincondition
– If the secondary indexes are available on more than one of the fields involved in simple conditions in the conjunctive 5. Using the cascading of PROJECT and the commuting of PROJECT with other operations, break down and move lists of
condition, and if the indexes include record pointers (rather than block pointers), then each index can be used to projection attributes down the tree as far as possible by creating new PROJECT operations as needed
retrieve the set of record pointers that satisfy the individualcondition. 6. Identify sub-trees that represent groups of operations that can be executed by a singlealgorithm
– The intersection of these sets of record pointers gives the record pointers that satisfy the conjunctive condition. Example
– If only some of the conditions have secondary indexes, each retrieval record is further tested to determine whether it  Query
satisfies the remainingconditions. "Find the last names of employees born after 1957 who work on a project named 'Aquarius'."
 SQL
Algorithms for implementing JOIN Operation SELECT LNAME
 Join: time-consuming operation. We will consider only natural joinoperation FROM EMPLOYEE, WORKS_ON, PROJECT
– Two-way join: join on twofiles. WHERE PNAME='Aquarius' AND PNUMBER=PNO AND ESSN=SSN AND BDATE.'1957-12-31';

P
P
– Multiway join: involving more than twofiles.
 The following examples of two-way JOIN operation (RΘ A=BS) will beused:
– OP6: EMPLOYEE Θ DNO=DNUMBERDEPARTMENT

AP
AP
– OP7: DEPARTMENT Θ MGRSSN=SSNEMPLOYEE
 J1: Nested-loop join (bruteforce)
– For each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records
satisfy the join condition t[A] =s[B].
 J2: Single-loop join (using an access structure to retrieve the matchingrecords)
– If an index (or hash key) exists for one of the two join attributes (e.g B of S), retrieve each record t in R, one at a

R
R
time (single loop), and then use the access structure to retrieve directly all matching records s from S that satisfy s[B]
=t[A]
 J3: Sort-mergejoin:
– If the records of R and S are physically sorted (ordered) by value of the join attributes A and B, respectively, we

O
O
can implement the join in the most efficientway.
– Both files are scanned concurrently in order of the join attributes, matching the records that have the same values
for A and B.

UC
UC

– If the files are not sorted, they may be sorted first by using externalsorting.
– Pairs of file blocks are copied into memory buffers in order and records of each file are scanned only once each
for matching with the other file if A & B are keyattributes.
– The method is slightly modified in case where A and B are not keyattributes. Cost Estimation in Query Optimization
 J4:Hash-join The main aim of query optimization is to choose the most efficient way of implementing the relational algebra operations at the
– The records of files R and S are both hashed to the same hash file using the same hashing function on the join lowest possible cost.

ST
ST

attributes A of R and B of S as hashkeys.  The query optimizer should not depend solely on heuristic rules, but, it should also estimate the cost of executing the different
 PartitioningPhase strategies and find out the strategy with the minimum costestimate.
– First, a single pass through the file with fewer records (say, R) hashes its records to the hash file buckets. The cost functions used in query optimization are estimates and not exact cost functions.
– Assumption: The smaller file fits entirely into memory buckets after the firstphase.  The cost of an operation is heavily dependent on its selectivity, that is, the proportion of select operation(s) that forms the
– (If the above assumption is not satisfied, the method is a more complex one and number of variations have been output.
proposed to improve efficiency: partition has join and hybrid hashjoin.)  In general the different algorithms are suitable for low or high selectivityqueries.
 ProbingPhase  In order for query optimizer to choose suitable algorithm for an operation an estimate of the cost of executing that algorithm
– A single pass through the other file (S) then hashes each of its records to probe appropriate bucket, and that record must beprovided
is combined with all matching records from R in thatbucket. The cost of an algorithm depends on cardinality of its input.
Heuristic-Based Query Optimization  To estimate the cost of different query execution strategies, the query tree is viewed as containing a series of basic
1. Break up SELECT operations with conjunctive conditions into a cascade of SELECToperations operations which are linked in order to perform thequery.
2. Using the commutativity of SELECT with other operations, move each SELECT operation as far down the query tree as is  It is also important to know the expected cardinality of an operation's output because this forms the input to the next operation.
permitted by the attributes involved in the selectcondition

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 21 St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 22

DOWNLOADED FROM STUCOR APP DOWNLOADED FROM STUCOR APP


DOWNLOADED FROM STUCOR APP
CS8492 /Database Management Systems
Department of CSE & IT 2018 - 2019

P
Cost Components of Query Execution
The cost of executing the query includes the following components:

AP
– Access cost to secondarystorage.
– Storagecost.
– Computationcost.
– Memory usescost.
– Communicationcost.
Importance of Access cost
Out of the above five cost components, the most important is the secondary storage access cost.
R
 The emphasis of the cost minimization depends on the size and type of databaseapplications.
 For example in smaller database the emphasis is on the minimizing computing cost as because most of the data in the files
involve in the query can be completely store in the mainmemory.
O
 For large database, the main emphasis is on minimizing the access cost to secondarydevice.
 For distributed database, the communication cost is minimized as because many sites areinvolved
 for the datatransfer.
UC

Cost functions for SELECT Operation Linear


Search:
– [nBlocks(R)/2], if the record isfound.
– [nBlocks(R)], if no record satisfied thecondition.
Binary Search :
o [log2(nBlocks(R))], if equality condition is on key attribute, because SCA(R) = 1 in this case.
ST

o[log2(nBlocks(R))] + [SCA(R)/bFactor(R)] – 1, otherwise.


Equity condition on Primary key
– [nLevelA(I) +1]
Equity condition on Non-Primary key
– [nLevelA(I) + 1] +[nBlocks(R)/2]
Cost functions for JOIN Operation
Join operation is the most time consuming operation to process.
 An estimate for the size (number of tuples) of the file that results after the JOIN operation is required to develop
reasonably accurate cost functions for JOINoperations.
 The JOIN operations define the relation containing tuples that satisfy a specific predicate F from the Cartesian product of
two relations R andS.

St. Joseph's College of Engineering/St. Joseph's Institute of Technology Page 23

DOWNLOADED FROM STUCOR APP

You might also like