
UNIT – II

SQL Standards - Data types - Basic Structure of SQL Queries - DDL-DML-DCL-TCL - Views-
Advanced SQL -Embedded SQL - Static Vs Dynamic SQL - Query Processing - Query Optimization-
Heuristic and Cost based Query Optimization.

STRUCTURED QUERY LANGUAGE (SQL) FUNDAMENTALS:

Structured Query Language (SQL) is the database language used to create a database and to perform operations on an existing database. SQL uses commands such as CREATE, DROP and INSERT to carry out the required tasks. These SQL commands are mainly categorized into four categories:
 DDL – Data Definition Language

 DQL – Data Query Language

 DML – Data Manipulation Language

 DCL – Data Control Language

Many resources also treat TCL (Transaction Control Language), covered later in this unit, as a fifth category.

DDL (Data Definition Language) : DDL consists of the SQL commands used to define the database schema. It deals with descriptions of the database schema and is used to create and modify the structure of database objects in the database.

Examples of DDL commands:


 CREATE – is used to create the database or its objects (such as tables, indexes, functions, views, stored procedures and triggers).

Ex:
CREATE TABLE students (id INT, name VARCHAR(50));

 DROP – is used to delete objects from the database.

Ex:
DROP TABLE students;

 ALTER-is used to alter the structure of the database.

Ex:
ALTER TABLE students ADD age INT;

 TRUNCATE – is used to remove all records from a table, including all space allocated for the records.

Ex:
TRUNCATE TABLE students;

 COMMENT – is used to add comments to the data dictionary.
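
A possible example (COMMENT syntax varies by system; the form below follows Oracle/PostgreSQL):

Ex:
COMMENT ON TABLE students IS 'Stores basic student records';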

 RENAME – is used to rename an object existing in the database.
Syntax:
RENAME TABLE old_table_name TO new_table_name;
RENAME TABLE students TO learners;

DQL (Data Query Language) : DQL statements are used for performing queries on the data within schema objects. The purpose of a DQL command is to get some schema relation based on the query passed to it.

Example of DQL:

SELECT – is used to retrieve data from a database.


Ex:
SELECT * FROM students;
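
A slightly fuller example of the basic SELECT–FROM–WHERE structure (the age column is the one added in the ALTER example above):

Ex:
SELECT id, name
FROM students
WHERE age > 18
ORDER BY name;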

DML (Data Manipulation Language): The SQL commands that deal with the manipulation of data present in the database belong to DML or Data Manipulation Language, and this includes most of the SQL statements.

Examples of DML:

 INSERT – is used to insert data into a table.

Ex:
INSERT INTO students (id, name) VALUES (1, 'Alice');

 UPDATE – is used to update existing data within a table.

Ex:
UPDATE students SET name = 'Bob' WHERE id = 1;

 DELETE – is used to delete records from a database table.

Ex:
DELETE FROM students WHERE id = 1;

DCL (Data Control Language): DCL includes commands such as GRANT and REVOKE, which mainly deal with the rights, permissions and other controls of the database system.

Examples of DCL commands:

 GRANT – gives users access privileges to the database.

Ex:
GRANT SELECT ON students TO user1;

 REVOKE – withdraws users’ access privileges given by using the GRANT command.

Ex:
REVOKE SELECT ON students FROM user1;
TCL (Transaction Control Language) : TCL commands deal with transactions within the database.

Examples of TCL commands:

 COMMIT – commits a transaction.

 ROLLBACK – rolls back a transaction in case any error occurs.

 SAVEPOINT – sets a savepoint within a transaction.

 SET TRANSACTION – specifies characteristics for the transaction.
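
As an illustration, a short transaction on the students table used earlier (a sketch using MySQL-style syntax; SAVEPOINT and transaction-start keywords vary slightly across systems):

Ex:
START TRANSACTION;
INSERT INTO students (id, name) VALUES (2, 'Carol');
SAVEPOINT after_insert;
UPDATE students SET name = 'Caroline' WHERE id = 2;
ROLLBACK TO SAVEPOINT after_insert;   -- undoes only the UPDATE
COMMIT;                               -- makes the INSERT permanent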

1.11.1 SQL DATA TYPES:


An SQL data type is an attribute that specifies the type of data of any object. Each column, variable and expression has a related data type in SQL. You can use these data types while creating your tables, choosing a data type for each column based on your requirement. SQL Server offers the following categories of data types −

Exact Numeric Data Types

DATA TYPE   FROM                           TO
bigint      -9,223,372,036,854,775,808     9,223,372,036,854,775,807
int         -2,147,483,648                 2,147,483,647
smallint    -32,768                        32,767
tinyint     0                              255
bit         0                              1
decimal     -10^38 + 1                     10^38 - 1
numeric     -10^38 + 1                     10^38 - 1

Approximate Numeric Data Types

DATA TYPE   FROM            TO
float       -1.79E + 308    1.79E + 308
real        -3.40E + 38     3.40E + 38

Date and Time Data Types

DATA TYPE        FROM             TO
datetime         Jan 1, 1753      Dec 31, 9999
smalldatetime    Jan 1, 1900      Jun 6, 2079
date             Stores a date like June 30, 1991
time             Stores a time of day like 12:30 P.M.

Note − Here, datetime has 3.33 milliseconds accuracy whereas smalldatetime has 1 minute accuracy.
Character Strings Data Types

DATA TYPE & Description

 char
Fixed-length non-Unicode characters; maximum length of 8,000 characters.

 varchar
Variable-length non-Unicode data; maximum of 8,000 characters.

 varchar(max)
Variable-length non-Unicode data; maximum length of 2^31 - 1 characters (SQL Server 2005 and later).

 text
Variable-length non-Unicode data with a maximum length of 2,147,483,647 characters.

Unicode Character Strings Data Types

S.No.  DATA TYPE & Description

1      nchar
       Fixed-length Unicode data; maximum length of 4,000 characters.

2      nvarchar
       Variable-length Unicode data; maximum length of 4,000 characters.

3      nvarchar(max)
       Variable-length Unicode data; maximum length of 2^31 - 1 characters (SQL Server 2005 and later).

4      ntext
       Variable-length Unicode data; maximum length of 1,073,741,823 characters.
Binary Data Types
S.No.  DATA TYPE & Description

1      binary
       Fixed-length binary data; maximum length of 8,000 bytes.

2      varbinary
       Variable-length binary data; maximum length of 8,000 bytes.

3      varbinary(max)
       Variable-length binary data; maximum length of 2^31 - 1 bytes (SQL Server 2005 and later).

4      image
       Variable-length binary data; maximum length of 2,147,483,647 bytes.

Misc Data Types

S.No.  DATA TYPE & Description

1      sql_variant
       Stores values of various SQL Server-supported data types, except text, ntext, and timestamp.

2      timestamp
       Stores a database-wide unique number that gets updated every time a row gets updated.

3      uniqueidentifier
       Stores a globally unique identifier (GUID).

4      xml
       Stores XML data. You can store xml instances in a column or a variable (SQL Server 2005 and later).

5      cursor
       Reference to a cursor object.

6      table
       Stores a result set for later processing.
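
For illustration, a single table definition combining several of the data types above (SQL Server syntax; the table and column names here are made up for the example):

Ex:
CREATE TABLE employee_records (
    emp_id        INT,
    emp_name      NVARCHAR(50),
    salary        DECIMAL(10, 2),
    is_active     BIT,
    joined_on     DATETIME,
    photo         VARBINARY(MAX),
    profile_xml   XML
);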

DATABASE OBJECT:
A database object is any defined object in a database that is used to store or reference data. Anything created with the CREATE command is a database object; it can be used to hold and manipulate data. Examples of database objects are views, sequences, indexes, etc.

 Table – Basic unit of storage; composed of rows and columns

 View – Logically represents subsets of data from one or more tables

 Sequence – Generates primary key values

 Index – Improves the performance of some queries

 Synonym – Alternative name for an object
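
Possible creation statements for these objects (a sketch in Oracle-style syntax, reusing the students table from earlier; the object names are illustrative):

Ex:
CREATE VIEW student_names AS SELECT id, name FROM students;
CREATE SEQUENCE student_id_seq START WITH 1 INCREMENT BY 1;
CREATE INDEX idx_student_name ON students (name);
CREATE SYNONYM pupils FOR students;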

1.11.2 ADVANCED SQL FEATURES:

There are various advanced SQL query features.

 Union
 Union All
 Intersect
 Minus

UNION Operation

UNION is used to combine the results of two or more SELECT statements, eliminating duplicate rows from the result set. For a UNION, the number of columns and their data types must be the same in both tables (or queries) on which the operation is applied.

Example of UNION
The First table,
ID Name

1 Abhi

2 Adam

The Second table,

ID Name

2 Adam

3 Chester

Union SQL query will be,

SELECT * FROM First UNION SELECT * FROM Second;


The result set table will look like,
ID NAME

1 Abhi

2 Adam

3 Chester

UNION ALL

This operation is similar to Union. But it also shows the duplicate rows.
Example of Union All
The First table,

ID NAME

1 Abhi

2 Adam

The Second table,


ID NAME

2 Adam

3 Chester

Union All query will be like,

SELECT * FROM First UNION ALL SELECT * FROM Second;

The resultset table will look like,


ID NAME

1 Abhi

2 Adam

2 Adam

3 Chester

INTERSECT

The INTERSECT operation combines two SELECT statements, but it only returns the records which are common to both SELECT statements. As with UNION, the number of columns and their data types must be the same.
NOTE: MySQL (before version 8.0.31) does not support the INTERSECT operator.
Example of Intersect
The First table,
ID NAME
1 Abhi
2 Adam

The Second table,


ID NAME

2 Adam

3 Chester

Intersect query will be,

SELECT * FROM First INTERSECT SELECT * FROM Second;

The result set table will look like


ID NAME

2 Adam
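
Where INTERSECT is unavailable, a similar result can be obtained with a subquery (a MySQL-style sketch; it assumes the two tables have identical columns, as in the example above):

SELECT DISTINCT * FROM First
WHERE (ID, Name) IN (SELECT ID, Name FROM Second);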

MINUS

The MINUS operation combines the results of two SELECT statements and returns only those rows that appear in the first result set but not in the second. (MINUS is Oracle syntax; the SQL standard and most other database systems use EXCEPT for the same operation.)
Example of Minus
The First table,

ID NAME

1 Abhi

2 Adam

The Second table,


ID NAME
2 Adam
3 Chester

Minus query will be,

SELECT * FROM First MINUS SELECT * FROM Second;

The result set table will look like,

ID NAME
1 Abhi
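
In database systems that use the SQL-standard keyword EXCEPT instead of MINUS (for example SQL Server and PostgreSQL), the equivalent query would be:

SELECT * FROM First EXCEPT SELECT * FROM Second;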

1.12 EMBEDDED & DYNAMIC SQL

Static or embedded SQL consists of SQL statements in an application that do not change at runtime and can therefore be hard-coded into the application. Dynamic SQL consists of SQL statements that are constructed at runtime; for example, the application may allow users to enter their own queries.

Dynamic SQL is a programming technique that enables you to build SQL statements dynamically at runtime. You can create more general-purpose, flexible applications by using dynamic SQL because the full text of a SQL statement may be unknown at compilation time.
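
As an illustration of building a statement at runtime (a sketch using MySQL's prepared-statement syntax; the table and column names reuse the students example from earlier):

SET @min_age = 18;
SET @sql = 'SELECT id, name FROM students WHERE age > ?';
PREPARE stmt FROM @sql;
EXECUTE stmt USING @min_age;
DEALLOCATE PREPARE stmt;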

Static SQL has the benefit that the statement is optimized at precompile time, which gives the application high performance. Access plans for dynamic statements are generated at run time, so they must be prepared in the application; this is never needed with static SQL. Dynamic SQL, however, has one clear advantage over static statements, which becomes apparent when the application is edited or upgraded: with dynamic statements there is no need for pre-compilation or rebuilding, since the access plans are generated at run time, whereas static statements require regeneration of their access plans whenever they are modified. On the other hand, dynamic SQL requires more permissions and can be a way to execute unauthorized code; since we do not know what kind of users the application will have, it can be dangerous from a security point of view if the programmer does not handle it carefully.

Limitations of Dynamic SQL:

Some SQL statements cannot be executed dynamically, and the performance of dynamic statements is poorer than that of static SQL.
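
To illustrate the security concern mentioned above (a hypothetical sketch in MySQL syntax; @user_input stands for a value received from the application):

-- Risky: concatenating user input directly into the statement text
SET @sql = CONCAT('SELECT * FROM students WHERE name = ''', @user_input, '''');
-- Safer: pass the value as a parameter so it cannot alter the statement structure
SET @sql = 'SELECT * FROM students WHERE name = ?';
PREPARE stmt FROM @sql;
EXECUTE stmt USING @user_input;
DEALLOCATE PREPARE stmt;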
11.1 INTRODUCTION

Query processing requires that the DBMS identify and execute a strategy for
retrieving the results of the query. The query determines what data is to be
found, but does not define the method by which the data manager searches the
database. Therefore Query optimization is necessary to determine the optimal
alternative to process a query. There are two main techniques for query
optimization. The first approach is to use a rule based or heuristic method for
ordering the operations in a query execution strategy. The second approach
estimates the cost of different execution strategies and chooses the best
solution. In general most commercial database systems use a combination of
both techniques.

11.2 BASICS OF QUERY PROCESSING

Query Processing : Query processing is the procedure of converting a query written in a high-level language (e.g. SQL, QBE) into a correct and efficient execution plan expressed in a low-level language, which is used for data manipulation.
Query Processor : Query processor is responsible for generating execution
plan.
Execution Plan : Query processing is a stepwise process. Before retrieving
or updating data in database, a query goes through a series of query compilation
steps. These steps are known as execution plan.
The success of a query language also depends upon its query processor, i.e., how efficient an execution plan it can create; a better execution plan leads to lower time and cost.
In query processing, the first phase is transformation, in which the parser first checks the syntax of the query and also checks that the relations and attributes used in the query are defined in the database. After checking the syntax and verifying the relations, the query is transformed into equivalent expressions that are more efficient to execute. Transformation depends upon various factors such as the existence of certain database structures, the presence of different indexes, whether the file is sorted, the cost of transformation, the physical characteristics of the data, etc. After transformation, the transformed query is evaluated using a number of strategies known as access plans. While generating access plans, factors like the physical properties of data and storage are taken into account, and the optimal access plan is executed. The next step is to validate the user privileges and ensure that the query does not disobey the relevant integrity constraints. Finally, the execution plan is executed to generate the result.

11.2.1 General Strategy for Query Processing


The general strategy for query processing is as follows:
(i) Representation of query : A query written by the user cannot be processed directly by the system. The query processor first checks the syntax and the existence of the relations and their attributes in the database. After validation, the query processor transforms it into an equivalent and more efficient expression; for example, the query is converted into a standard internal format that the parser can manipulate. The parser also adds some additional predicates to the query to enforce security. The internal form may be relational algebra, relational calculus, any low-level language, operator graphs, etc.
(ii) Operator graphs : Operator graphs are used to represent a query. They give the sequence of operations that can be performed. A query represented by an operator graph is easy to understand, and operator graphs are useful for determining redundancy in query expressions, the result of transformations, simplifying views, etc.
(iii) Response time and data characteristics consideration : Data characteristics like the length of records, the expected sizes of both intermediate and final results, the sizes of relations, etc., are also considered when optimizing the query. In addition, the overall response time is determined.

11.2.2 Steps in Query Processing


The various steps in query processing are shown in Figure 11.1. Suppose that a user inputs a query in a general query language, say QBE; it is first converted into a high-level query language, say SQL. The other steps in query processing are discussed below in detail:
(i) Syntax Analysis : The query in the high-level language is parsed into tokens, and the tokens are analyzed for any syntax error. The order of tokens is also checked to make sure that all the rules of the language grammar are followed. In case of any error, the query is rejected and an error code with an explanation for the rejection is returned to the user. (Only syntax is checked in this step.)
(ii) Query Decomposition : In this step, the query is decomposed into query blocks, which are low-level operations. It starts with the high-level query, which is transformed into low-level operations, and checks whether the query is syntactically and semantically correct. For example, a SQL query is decomposed into blocks like the SELECT block, FROM block, WHERE block, etc. The various stages in query decomposition are shown in Figure 11.2.
FIGURE 11.1. Steps in query processing: the user's query, entered in a general query language (e.g. QBE), is transformed into a high-level query language such as SQL; the parser performs scanning, parsing, syntax checking and validation; the query decomposer, using the database catalog, checks the existence of relations and attributes, performs semantic analysis and decomposes complex queries into smaller ones; the query optimizer substitutes equivalent expressions to reduce execution time and cost and performs query modification to produce an execution plan; the query code generator generates code for the queries; and the runtime database processor carries out the necessary operations on the main database to produce the query result.

FIGURE 11.2. Steps in query decomposition: the SQL query passes through query analysis, query normalization, semantic analysis, query simplification and query restructuring, producing algebraic expressions.


(a) Query Analysis : In the query analysis stage, programming language
compiler checks that the query is lexically and syntactically correct. A
syntactically correct query is analyzed using system catalogues to verify the
existence of relations and attributes used in query. After analysis a correct
query is converted into some internal representation, which is more
efficient for processing.
The type specification of the query qualifier and result is also checked at this stage. The internal representation may be a query tree or a query graph.
Query tree notation : A typical internal representation of a query is a query tree, also known as a relational algebra tree. A query tree is constructed using a tree data structure that corresponds to the relational algebra expression. The main components of a query tree are:
 Root of tree–represents result of query.
 Leaf nodes–represent input relations of the query.
 Internal nodes–represent intermediate relation that is the output of
applying an operation in the algebra.
 The sequence of operations is directed from leaves to the root node.
For example, consider a query that joins Employee (E) and Department (D) on E.Department = D.Department and then projects Employee_Name, Address, Job, Department and Department-location. In its query tree, the relations E and D are the leaf nodes, the join E.Department = D.Department is an internal node, and the projection is the root.

FIGURE 11.3. Query tree notation.

Query graph notation : Graph data structure is also used for internal
representation of query. In graphs:
 Relation nodes–represent relations by single circle.
 Constant nodes–represent constant values by double circle.
 Edges–represent relation and join conditions.
 Square brackets–represent attributes retrieved from each relation.
For example, in a query graph the relations D and E are relation nodes, the constant 'Delhi' is a constant node connected to D by an edge labelled Department-Location = 'Delhi', an edge between D and E carries the join condition D.Department = E.Department, and the retrieved attributes are shown in square brackets: [E.Employee_Name, E.Address, E.Job, E.Department] and [D.Manager-ID].

FIGURE 11.4. Query graph notation.


(b) Query normalization : After query analysis, the query is normalized to remove any redundancy. In this phase the query is converted into a normalized form that can be easily manipulated. A set of equivalence rules is applied to the query to simplify the projection and selection operations and avoid redundancy. The query can be converted into one of the following two normal forms :
 Conjunctive normal form : It is a sequence of conjuncts that are connected with the ‘AND’ operator. A conjunct consists of one or more terms connected with the ‘OR’ operator. A conjunctive selection contains only those tuples that satisfy all conjuncts.
Example. (Emp_Job = ‘Analyst’ ∨ salary < 50000) ∧ (Hire_Date > 1–1–2000)
 Disjunctive normal form : It is a sequence of disjuncts that are connected with the ‘OR’ operator. A disjunct consists of one or more terms connected with the ‘AND’ operator. A disjunctive selection contains those tuples that satisfy any one of the disjuncts.
Example. (Emp_Job = ‘Analyst’ ∧ salary < 5000) ∨ (Hire_Date > 1–1–2000)
Disjunctive normal form is more useful as it allows the query to be broken into a series of independent sub-queries linked by union.
(c) Semantic analyzer : The semantic analyzer performs the following tasks
:
 It helps in reducing the number of predicates.
 It rejects contradictory normalized forms.
 In case of missing join specification, components of query do not
contribute to generation of results. It identifies these queries and
rejects them.
 It makes sure that each object in query is referenced correctly
according to its data type.
(d) Query simplifier : The major tasks of the query simplifier are as follows :
 It eliminates common sub-expressions.
 It eliminates redundant qualification.
 It introduces integrity constraints and view definitions into the query graph representation.
 It eliminates any query that violates an integrity constraint, without accessing the database.
 It transforms sub-graphs into semantically equivalent and more efficient forms.
 It deals with user access rights.
Idempotence rules of Boolean algebra are applied to get the final simplified form.
(e) Query Restructuring : At the final stage of query decomposition,
transformation rules are applied to restructure the query to give a more
efficient implementation.
(iii) Query Optimization : The aim of the query optimization step is to choose the best possible query execution plan with the minimum resources required to execute that plan. Query optimization is discussed in detail in section 11.3.
(iv) Execution Plan : An execution plan is the basic algorithm used for each operation in the query. Execution plans are classified into the following four types: (a) left-deep tree query execution plan, (b) right-deep tree query execution plan, (c) linear tree execution plan, (d) bushy execution plan.
(a) Left-deep tree query execution plan : In a left-deep tree query execution plan, development of the plan starts with a single relation, successively adding an operation involving a single relation until the query is completed. Only the left-hand side of a join is allowed to participate in the result from a previous join, hence the name left-deep tree. It is shown in Figure 11.5.

FIGURE 11.5. Left-deep execution plan: ((R1 ⋈ R2) ⋈ R3) ⋈ R4, with the result at the root.

Advantages : The main advantages of a left-deep tree query execution plan are:
 It reduces the search space.
 The query optimizer can be based on dynamic programming techniques.
 It is convenient for pipelined evaluation, as only one input to each join is pipelined.
Disadvantage : The disadvantage of a left-deep tree query execution plan is:
 The reduction in search space may cause some lower-cost execution strategies to be missed.
(b) Right-deep tree query execution plan : It is almost the same as the left-deep query execution plan, with the only difference that only the right-hand side of a join is allowed to participate in the result from a previous join, hence the name right-deep tree. It is suited to applications having a large main memory. It is shown in Figure 11.6.
FIGURE 11.6. Right-deep execution plan: R4 ⋈ (R3 ⋈ (R2 ⋈ R1)), with the result at the root.

(c) Linear tree execution plan : The combination of left-deep and right-deep execution plans, with the restriction that the relation on one side of each operator is always a base relation, is known as a linear tree. It is shown in Figure 11.7.
FIGURE 11.7. Linear tree execution plan: one side of every join is a base relation.

(d) Bushy execution plan : Bushy execution plan is the most general type
of execution plan. More than one relation can participate in intermediate
results. It is shown in Figure 11.8.

FIGURE 11.8. Bushy execution plan: intermediate join results over relations R1–R5 may themselves be joined, so more than one relation can appear in an intermediate result.

The main advantage of the bushy execution plan is the flexibility it provides in choosing the best execution plan by enlarging the set of candidate plans, but this flexibility may considerably increase the search space that must be explored.
(v) Query Code Generator : After selecting the best possible execution plan, the query is converted into a low-level language so that it can be taken as input by the runtime database processor.
(vi) Runtime Database Processor : It deals directly with the main database, performs the necessary operations mentioned in the query, and returns the result to the user.

11.3 QUERY OPTIMIZATION

The query performance of a database system depends not only on the database structure, but also on the way in which the query is optimized. Query optimization means converting a query into an equivalent form which is more efficient to execute. It is necessary for high-level relational queries, and it provides an opportunity for the DBMS to systematically evaluate alternative query execution strategies and choose an optimal strategy. A typical query optimization process is shown in Figure 11.9.
FIGURE 11.9. Query optimization process: a simplified relational algebra query is passed to the query optimizer together with statistical data and estimation formulas (used to determine the cardinality of intermediate result tables); the optimizer applies a cost model and an execution plan generator to produce an execution plan in the form of an optimized relational algebra query.

The main issues that need to be considered in query optimization are:


1. Reduction of data transfer with database.
2. Use of available indexes for fast searching.
3. Reduction of number of times the database is manipulated.
4. The order in which joins should be performed.
5. How to store intermediate results?
Following are the three relations used in the examples below:
Employee (Emp-ID, Emp-Name, Age, Salary, Dept-ID)
Department (Dept-ID, Proj-ID, Dept-Name)
Project (Proj-ID, Name, Location, Duration)
There are two main techniques used to implement query optimization. These
are heuristic query optimization and cost based query optimization.

11.3.1 Transformation Rules for Relational Algebra


The transformation rules are used to express a relational algebra expression in different equivalent ways, and the query optimizer chooses the most efficient equivalent expression to execute. Two expressions are considered equivalent if they have the same set of attributes (possibly in a different order) and represent the same information.
Let us consider relations R, S, and T, where X, Y, and Z represent selection predicates and L, L1, L2, M, and N denote sets of attributes.
Rule 1. Cascading of selection (σ)
σX ∧ Y ∧ Z(R) ≡ σX(σY(σZ(R)))
It means that conjunctive selection operations can be transformed into individual selection operations and vice versa.
Ex. σAge=35 ∧ Salary>50000(Employee) ≡ σAge=35(σSalary>50000(Employee))
Rule 2. Commutativity of selection (σ)
σX(σY(R)) ≡ σY(σX(R))
Ex. σAge=35(σSalary>50000(Employee)) ≡ σSalary>50000(σAge=35(Employee))
Rule 3. Cascading of projection (π)
πL(πM( ... πN(R))) ≡ πL(R)
Ex. πEmp-name(πEmp-name, Age(Employee)) ≡ πEmp-name(Employee)
Rule 4. Commutativity of selection (σ) and projection (π)
πX1, X2, ..., Xn(σA(R)) ≡ σA(πX1, X2, ..., Xn(R))
Ex. πEmp-name, Age(σSalary>50000(Employee)) ≡ σSalary>50000(πEmp-name, Age(Employee))
Rule 5. Commutativity of join (⋈) and Cartesian product (×)
R ⋈Y S ≡ S ⋈Y R
R × S ≡ S × R
Ex. Employee ⋈Employee.Dept-ID=Department.Dept-ID Department ≡ Department ⋈Employee.Dept-ID=Department.Dept-ID Employee
Rule 6. Commutativity of selection (σ) and join (⋈) or Cartesian product (×)
σX(R ⋈Y S) ≡ (σX(R)) ⋈Y S
σX(R × S) ≡ (σX(R)) × S
Ex. σEmployee.Age>30 ∧ Dept-Name=‘MARKETING’(Employee ⋈Employee.Dept-ID=Department.Dept-ID Department) ≡ σEmployee.Age>30(Employee) ⋈Employee.Dept-ID=Department.Dept-ID (σDept-Name=‘MARKETING’(Department))
Rule 7. Commutativity of projection (π) and join (⋈) or Cartesian product (×)
πL1 ∪ L2(R ⋈Z S) ≡ (πL1(R)) ⋈Z (πL2(S))
Ex. πEmp-Name, Dept-Name, Dept-ID(Employee ⋈E.Dept-ID=D.Dept-ID Department) ≡ (πEmp-Name, Dept-ID(Employee)) ⋈E.Dept-ID=D.Dept-ID (πDept-Name, Dept-ID(Department))
Rule 8. Commutativity of Union (∪) and Intersection (∩)
R ∪ S = S ∪ R
R ∩ S = S ∩ R
Rule 9. Commutativity of selection (σ) with Union (∪), Intersection (∩) or Difference (−)
σX(R ∪ S) = σX(R) ∪ σX(S)
σX(R ∩ S) = σX(R) ∩ σX(S)
σX(R − S) = σX(R) − σX(S)
Rule 10. Commutativity of projection (π) and Union (∪)
πL(R ∪ S) = πL(R) ∪ πL(S)
Rule 11. Associativity of Join (⋈) and Cartesian product (×)
(R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
(R × S) × T ≡ R × (S × T)
Rule 12. Associativity of Union (∪) and Intersection (∩)
(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
(R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
Rule 13. Converting a selection (σ) and Cartesian product (×) sequence into a Join (⋈)
σX(R × S) ≡ R ⋈X S

11.3.2 Heuristic Query Optimization


The heuristic query optimization technique modifies the internal representation of a query by using heuristic rules and transformation rules. The heuristic rules are applied to a query tree or query graph structure. The optimizer starts with the initial query tree and transforms it into an equivalent and more efficient query tree using the transformation rules.
Heuristic Optimization Algorithm : A DBMS uses a heuristic optimization algorithm to improve the efficiency of a query by converting the initial query tree into an equivalent, optimized query tree. The optimizer applies the transformation rules to optimize the structure of the query tree. The steps of the heuristic optimization algorithm are as follows.
Step 1. Perform selection operations as early as possible : By applying selection operations at early stages, you reduce the number of unwanted records transferred from the database to primary memory. The optimizer uses transformation rule 1 to divide selection operations with conjunctive conditions into a cascade of selection operations.
Step 2. Apply commutativity of selection with other operations as early as possible : The optimizer uses transformation rules 2, 4, 6, and 9 to move selection operations as far down the tree as possible and to keep selection predicates on the same relation together. Moving selections down the tree reduces unwanted data transfer, and keeping selection predicates on the same relation together reduces the number of times the same database table has to be accessed to retrieve its records.
Step 3. Combine a Cartesian product with a subsequent selection operation whose predicate represents a join condition into a JOIN operation : The optimizer uses transformation rule 13 to convert a selection and Cartesian product sequence into a join (a SQL illustration of this step is given after the list of steps). This reduces data transfer; it is always better to transfer only the required data from the database instead of transferring all the data and then refining it. (A Cartesian product combines all data of all the tables mentioned in the query, while a join operation retrieves only those records from the database that satisfy the join condition.)
Step 4. Use commutativity and associativity of binary operations : The optimizer uses transformation rules 5, 11, and 12 to execute the most restrictive selection operations first. It rearranges the leaf nodes of the query tree. By executing the most restrictive selection operations first, the number of records fetched from the database is reduced, and subsequent operations are performed on fewer records.
Step 5. Perform projection operations as early as possible : After performing selection operations, the optimizer uses transformation rules 3, 4, 7 and 10 to reduce the number of columns of a relation by moving projection operations as far down the tree as possible and keeping projection predicates on the same relation together.
Step 6. Compute common expressions only once : This step identifies sub-trees that represent groups of operations that can be executed by a single algorithm.
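
As a SQL illustration of Step 3 (a sketch using the Employee and Department relations listed above, with underscores in the attribute names as in the worked example below), the following two queries are logically equivalent, but the second form lets the optimizer apply the join condition while matching records instead of first materializing the full Cartesian product:

SELECT *
FROM Employee, Department
WHERE Employee.Dept_ID = Department.Dept_ID;

SELECT *
FROM Employee JOIN Department
ON Employee.Dept_ID = Department.Dept_ID;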
Consider the query below in SQL and the transformation of its initial query tree into an optimal query tree.
SELECT Emp_Name
FROM Employee e, Department d, Project p
WHERE p.Name = ‘LUXMI PUB.’
AND d.Proj_ID = p.Proj_ID
AND e.Dept_ID = d.Dept_ID
AND e.Age > 35;
This query displays the names of all employees working for the project “LUXMI PUB.” and having age greater than 35 years.
Figure 11.10 shows the initial query tree for the given SQL query. If the tree is executed directly, it results in the Cartesian product of the entire Employee, Department, and Project tables; but in reality, the query needs only one record from the relation Project and only the Employee records of those whose age is greater than 35 years.
FIGURE 11.10. Initial query tree: πEmp_Name applied to σp.Name=‘LUXMI PUB.’ ∧ d.Proj_ID=p.Proj_ID ∧ e.Dept_ID=d.Dept_ID ∧ e.Age>35 over the Cartesian product of Employee, Department and Project.

— We can improve the performance by first applying selection operations to reduce the number of records that appear in the Cartesian product. Figure 11.11 shows the improved query tree.
FIGURE 11.11. Improved query tree by first applying selection operations: σe.Age>35(Employee) and σp.Name=‘LUXMI PUB.’(Project) are pushed below the Cartesian products, with σe.Dept_ID=d.Dept_ID and σd.Proj_ID=p.Proj_ID applied above them and πEmp_Name at the root.

— The query tree can be further improved by applying the more restrictive selection operations first. So, switch the positions of the relations Project and Employee, since a single project may involve more than one employee. Figure 11.12 shows the improved query tree.
FIGURE 11.12. Improved query tree by applying more restrictive selection operations: σp.Name=‘LUXMI PUB.’(Project) is combined with Department first (via the Cartesian product and σd.Proj_ID=p.Proj_ID), the result is combined with σe.Age>35(Employee) via σe.Dept_ID=d.Dept_ID, and πEmp_Name is at the root.

— A further improvement can be made by replacing the Cartesian product operations with join operations using the join conditions, as shown in Figure 11.13.
FIGURE 11.13. Improved query tree by replacing Cartesian product and selection operations by join operations: σp.Name=‘LUXMI PUB.’(Project) ⋈d.Proj_ID=p.Proj_ID Department, the result joined ⋈e.Dept_ID=d.Dept_ID with σe.Age>35(Employee), and πEmp_Name at the root.

— A further improvement can be made to the query tree by keeping only the required attributes (columns) of the relations, applying projection operations as early as possible in the query tree. The optimizer keeps only the attributes required in the output, plus the attributes needed by subsequent operations, in the intermediate relations. The modified query tree is shown in Figure 11.14.
FIGURE 11.14. Improved query tree by applying and moving projection operations down the query tree: only the attributes needed later are kept in the intermediate relations, e.g. πp.Proj_ID over the selected Project tuples, πd.Dept_ID, d.Proj_ID over Department, and πe.Emp_ID, e.Emp_Name, e.Dept_ID over σe.Age>35(Employee).

The SELECT clause in SQL is equivalent to projection operation and WHERE clause
in SQL is equivalent to selection operation in relational algebra.
11.3.3 Cost Based Query Optimization
In cost-based query optimization, the optimizer estimates the cost of running all alternatives of a query and chooses the optimal one; the alternative which uses the minimum resources has the minimum cost. The cost of a query operation mainly depends on its selectivity, i.e., the proportion of the input relations that forms the output. The following are the main components used to determine the cost of execution of a query:
(a) Access cost of secondary storage : The access cost to secondary storage consists of the cost of database manipulation operations, which include searching, writing and reading of data blocks stored in secondary memory. The cost of searching depends upon the type of indexes (primary, secondary, hashed), the type of file structure and the ordering of the relation, in addition to the physical storage location, e.g., whether the file blocks are allocated contiguously on the same disk or scattered across the disk.
(b) Storage cost : Storage cost consists of cost of storing intermediate
results (tables or files) that are generated by the execution strategy for
the query.
(c) Computation cost : Computation cost consists of performing in-memory
operations during query execution such as sorting of records in a file,
merging of records, performing computations on field values, searching of
records. These are mainly performed on data buffers.
(d) Memory usage cost : It consists of the cost pertaining to the number of memory buffers needed during query execution.
(e) Communication cost : It consists of the cost of transferring the query and its result from the database location to the location of the terminal where the query originated.
Of all the above components, the most important is the access cost to secondary storage, because secondary storage is comparatively slower than other devices. The optimizer tries to minimize computation cost for small databases, as most of the data files can be held in main memory. For a large database, it tries to minimize the access cost to secondary storage, and for distributed databases, it tries to minimize the communication cost because various sites are involved in data transfer.
To estimate the cost of execution strategies, the optimizer accesses statistical data stored in the DBMS catalog. The information stored in the DBMS catalog includes:
(i) Number of records in relation X, given as R.
(ii) Number of blocks required to store relation X, given as B.
(iii) Blocking factor of relation X, given as BFR.
(iv) Primary access method for each file and the attributes of each file.
(v) Number of levels of each multi-level index on an attribute A, given as IA.
(vi) Number of first-level index blocks for an attribute A, given as BAI1.
(vii) Selection cardinality of an attribute A in relation R, given as SA, where SA = R × SLA and SLA is the selectivity of the attribute.
Cost Function for Selection Operation : The selection operation works on a single relation in relational algebra and retrieves the records that satisfy the given condition. Depending upon the structure of the file, the available indexes and the searching method, the estimated costs of the strategies for the selection operation are as given below.
1. Linear search : B/2 if the record is found; B if the record is not found.
2. Binary search : log2 B, if the equality condition is on a unique key attribute; log2 B + (SA/BFR) - 1 otherwise.
3. Using a primary index to retrieve a single record : 1, assuming no overflow.
4. Equality condition on a primary key : IA + 1.
5. Equality condition on a hash key : 1, assuming no overflow.
6. Inequality condition on a primary index : IA + B/2.
7. Inequality condition on any ordered index : IA + B/2.
8. Equality condition on a clustering index (secondary index) : (IA + 1) + (SA/BFR).
9. Equality condition on a non-clustering index (secondary index) : (IA + 1) + SA.
10. Inequality condition on a B+-tree index (secondary index) : IA + (BAI1/2) + (R/2).
Now consider the example of the Employee table. There are 10,000 records stored in 2,000 blocks, and the following indexes are available:
 A secondary index on Emp-ID with 4 levels.
 A clustering index on Salary with 3 levels and an average selection cardinality of 30.
 A secondary index on the non-key attribute Age with 2 levels and 4 first-level index blocks. There are 200 distinct values for Age.
Consider the queries:
(i) σEmp-ID = 1A(Employee)
(ii) σAge > 20(Employee)
(iii) σAge = 20 AND Salary > 9000(Employee)
Consider the following cost components:
 Number of records (R) = 10,000
 Number of blocks (B) = 2,000
 Blocking factor (BFR) = 10,000/2,000 = 5
 IEmp-ID = 4, SEmp-ID = 10,000/10,000 = 1
 IAge = 2, SAge = 10,000/200 = 50
 ISalary = 2, SSalary = 30
Now the cost of the above queries (assuming the records are in the table):
(i) If linear search is used then the cost will be B/2 = 2000/2 = 1000.
(ii) If linear search is used then the cost will be B = 2000.
(iii) This query has a conjunctive selection condition. To estimate its cost, we can use either one of the two components of the selection condition to retrieve the records, plus the linear search. The linear search costs 2000; the condition Salary > 9000 gives a cost estimate of ISalary + (B/2) = 2 + 1000 = 1002, and the cost of the condition Age = 20 is 30 + 2 = 32. So the total cost is 1034.
Cost Function for Join Operation : Join operation is the most time
consuming operation in databases and an accurate cost function for join
operations depends upon the estimate of size of file (number of records) after
the join operation. The estimated cost of strategies for join operation is given
below:
1. Nested-loop join (block) :
(a) B(R) + [B(R) * B(S)], if the buffer has only one block per relation
(b) B(R) + [B(S) * (B(R)/(Buffer - 2))], if (Buffer - 2) blocks are available for relation R
(c) B(R) + B(S), if all blocks of R can be read into the database buffer
2. Indexed nested-loop join :
(a) B(R) + R * (IA + 1), if the join attribute A in relation S is a primary key
(b) B(R) + R * [IA + (SA(R)/BFR(R))], for a clustering index I on attribute A
3. Sort-merge join :
(a) B(R) * [log2 B(R)] + B(S) * [log2 B(S)], for the sorts
(b) B(R) + B(S), for the merge
4. Hash join :
(a) (3 * B(R)) + B(S), if the hash index is held in memory
(b) 2 * [B(R) + B(S)] * [log (B(S) - 1)] + B(R) + B(S), otherwise
Here B(R) and B(S) are the numbers of blocks of relations R and S respectively.
Example. Consider that we have 600 records in the Department table. The BFR for the Department table is 60, so the number of blocks is 600/60 = 10.
For the join operation Employee ⋈Dept-ID Department:
Type of Join : Cost
Block nested-loop join : 22,000 (if the buffer has only one block); 2,010 (if all blocks of Employee can be read into the database buffer)
Hash join : 6,010 (if the hash index is held in memory)
Example. Given two unary relations (each containing only one attribute), R1 and R2:
R1 : 7, 2, 9, 8, 3, 9, 1, 3, 6
R2 : 8, 4, 2, 1, 3, 2, 7, 3
Show the result of joining R1 and R2 using each of the following algorithms. List the results in the order that would be output by the join algorithm.
(a) Nested loop join. Use R1 for the outer loop and R2 for the inner loop.
(b) Sort-merge join.
Solution. The result relation contains only one attribute, which is the common attribute between R1 and R2.
(a) Nested loop join. For every tuple in R1, find matches in R2: 7, 2, 2, 8, 3, 3, 1, 3, 3
(b) Sort-merge join. The result appears in sorted order on the join attribute: 1, 2, 2, 3, 3, 3, 3, 7, 8

1. Define heuristic query optimization and mention one advantage.

Heuristic query optimization modifies the internal representation of a query (its query tree or query graph) by applying heuristic and transformation rules, such as performing selection and projection operations as early as possible, to obtain an equivalent but more efficient query tree.

Advantage: Faster optimization time. Heuristic optimization is less computationally expensive than cost-based optimization because it does not explore all possible query plans; it uses simple rules to quickly generate a good (though not always the best) execution plan.

2. Compare heuristic and cost-based query optimization techniques and explain query processing in detail.

Comparison: Heuristic vs Cost-Based Query Optimization

Definition: Heuristic optimization uses predefined rules (heuristics) to optimize queries; cost-based optimization evaluates and compares the cost of different query plans.
Approach: Heuristic is rule-driven; cost-based is cost-driven.
Complexity: Heuristic is low (simple and fast); cost-based is high (can be time-consuming).
Accuracy: Heuristic may not find the optimal plan; cost-based is more likely to find the optimal or near-optimal plan.
Examples of rules: Heuristic pushes selections/projections early and joins smaller tables first; cost-based estimates the I/O, CPU and memory cost of each possible plan.
Statistics used: Heuristic requires none; cost-based relies on statistics like table size, indexes and data distribution.
Performance: Heuristic gives faster optimization but possibly less efficient execution; cost-based gives slower optimization but usually more efficient execution.
When used: Heuristic is used in simpler systems or when planning time is critical; cost-based is used in modern DBMSs where query execution efficiency is crucial.

Query processing itself is discussed in detail in section 11.2 (syntax analysis, query decomposition, query optimization, execution plan selection, code generation and runtime processing).
