
UNIT – II

SQL Standards - Data types - Basic Structure of SQL Queries - DDL-DML-DCL-TCL - Views-
Advanced SQL -Embedded SQL - Static Vs Dynamic SQL - Query Processing - Query Optimization-
Heuristic and Cost based Query Optimization.

STRUCTURED QUERY LANGUAGE (SQL) FUNDAMENTALS:

Structured Query Language (SQL) is the database language used to create a database and to perform operations on an existing database. SQL uses commands such as CREATE, DROP and INSERT to carry out the required tasks. These SQL commands are mainly categorized into four categories:
 DDL – Data Definition Language

 DQL – Data Query Language

 DML – Data Manipulation Language

 DCL – Data Control Language

Many resources also treat TCL (Transaction Control Language), covered later in this unit, as a fifth category.

DDL (Data Definition Language) : DDL consists of the SQL commands used to define the database schema. It deals with descriptions of the database schema and is used to create and modify the structure of database objects in the database.

Examples of DDL commands:


 CREATE – is used to create the database or its objects (such as tables, indexes, functions, views, stored procedures and triggers).

Ex:
CREATE TABLE students (id INT, name VARCHAR(50));

 DROP – is used to delete objects from the database.

Ex:
DROP TABLE students;

 ALTER-is used to alter the structure of the database.

Ex:
ALTER TABLE students ADD age INT;

 TRUNCATE – is used to remove all records from a table, including all space allocated for the records.

Ex:
TRUNCATE TABLE students;

 COMMENT – is used to add comments to the data dictionary.
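
A possible example (COMMENT syntax varies by system; the form below follows Oracle/PostgreSQL):

Ex:
COMMENT ON TABLE students IS 'Stores basic student records';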

 RENAME – is used to rename an object existing in the database.
Syntax:
RENAME TABLE old_table_name TO new_table_name;
RENAME TABLE students TO learners;

DQL (Data Query Language) : DQL statements are used for performing queries on the data within schema objects. The purpose of a DQL command is to get some schema relation based on the query passed to it.

Example of DQL:

SELECT – is used to retrieve data from a database.


Ex:
SELECT * FROM students;
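
A slightly fuller example of the basic SELECT–FROM–WHERE structure (the age column is the one added in the ALTER example above):

Ex:
SELECT id, name
FROM students
WHERE age > 18
ORDER BY name;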

DML (Data Manipulation Language): The SQL commands that deal with the manipulation of data present in the database belong to DML or Data Manipulation Language, and this includes most of the SQL statements.

Examples of DML:

 INSERT – is used to insert data into a table.

Ex:
INSERT INTO students (id, name) VALUES (1, 'Alice');

 UPDATE – is used to update existing data within a table.

Ex:
UPDATE students SET name = 'Bob' WHERE id = 1;

 DELETE – is used to delete records from a database table.

Ex:
DELETE FROM students WHERE id = 1;

DCL (Data Control Language): DCL includes commands such as GRANT and REVOKE, which mainly deal with the rights, permissions and other controls of the database system.

Examples of DCL commands:

 GRANT – gives users access privileges to the database.

Ex:
GRANT SELECT ON students TO user1;

 REVOKE – withdraws users’ access privileges given by using the GRANT command.

Ex:
REVOKE SELECT ON students FROM user1;
TCL (Transaction Control Language) : TCL commands deal with transactions within the database.

Examples of TCL commands:

 COMMIT – commits a transaction.

 ROLLBACK – rolls back a transaction in case any error occurs.

 SAVEPOINT – sets a savepoint within a transaction.

 SET TRANSACTION – specifies characteristics for the transaction.
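
As an illustration, a short transaction on the students table used earlier (a sketch using MySQL-style syntax; SAVEPOINT and transaction-start keywords vary slightly across systems):

Ex:
START TRANSACTION;
INSERT INTO students (id, name) VALUES (2, 'Carol');
SAVEPOINT after_insert;
UPDATE students SET name = 'Caroline' WHERE id = 2;
ROLLBACK TO SAVEPOINT after_insert;   -- undoes only the UPDATE
COMMIT;                               -- makes the INSERT permanent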

1.11.1 SQL DATA TYPES:


An SQL data type is an attribute that specifies the type of data of any object. Each column, variable and expression has a related data type in SQL. You can use these data types while creating your tables, choosing a data type for each column based on your requirement. SQL Server offers the following categories of data types −

Exact Numeric Data Types

DATA TYPE   FROM                           TO
bigint      -9,223,372,036,854,775,808     9,223,372,036,854,775,807
int         -2,147,483,648                 2,147,483,647
smallint    -32,768                        32,767
tinyint     0                              255
bit         0                              1
decimal     -10^38 + 1                     10^38 - 1
numeric     -10^38 + 1                     10^38 - 1

Approximate Numeric Data Types

DATA TYPE   FROM            TO
float       -1.79E + 308    1.79E + 308
real        -3.40E + 38     3.40E + 38

Date and Time Data Types

DATA TYPE        FROM             TO
datetime         Jan 1, 1753      Dec 31, 9999
smalldatetime    Jan 1, 1900      Jun 6, 2079
date             Stores a date like June 30, 1991
time             Stores a time of day like 12:30 P.M.

Note − Here, datetime has 3.33 milliseconds accuracy whereas smalldatetime has 1 minute accuracy.
Character Strings Data Types

DATA TYPE & Description

 char
Fixed-length non-Unicode characters; maximum length of 8,000 characters.

 varchar
Variable-length non-Unicode data; maximum of 8,000 characters.

 varchar(max)
Variable-length non-Unicode data; maximum length of 2^31 - 1 characters (SQL Server 2005 and later).

 text
Variable-length non-Unicode data with a maximum length of 2,147,483,647 characters.

Unicode Character Strings Data Types

S.No.  DATA TYPE & Description

1      nchar
       Fixed-length Unicode data; maximum length of 4,000 characters.

2      nvarchar
       Variable-length Unicode data; maximum length of 4,000 characters.

3      nvarchar(max)
       Variable-length Unicode data; maximum length of 2^31 - 1 characters (SQL Server 2005 and later).

4      ntext
       Variable-length Unicode data; maximum length of 1,073,741,823 characters.
Binary Data Types
S.No.  DATA TYPE & Description

1      binary
       Fixed-length binary data; maximum length of 8,000 bytes.

2      varbinary
       Variable-length binary data; maximum length of 8,000 bytes.

3      varbinary(max)
       Variable-length binary data; maximum length of 2^31 - 1 bytes (SQL Server 2005 and later).

4      image
       Variable-length binary data; maximum length of 2,147,483,647 bytes.

Misc Data Types

S.No.  DATA TYPE & Description

1      sql_variant
       Stores values of various SQL Server-supported data types, except text, ntext, and timestamp.

2      timestamp
       Stores a database-wide unique number that gets updated every time a row gets updated.

3      uniqueidentifier
       Stores a globally unique identifier (GUID).

4      xml
       Stores XML data. You can store xml instances in a column or a variable (SQL Server 2005 and later).

5      cursor
       Reference to a cursor object.

6      table
       Stores a result set for later processing.
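
For illustration, a single table definition combining several of the data types above (SQL Server syntax; the table and column names here are made up for the example):

Ex:
CREATE TABLE employee_records (
    emp_id        INT,
    emp_name      NVARCHAR(50),
    salary        DECIMAL(10, 2),
    is_active     BIT,
    joined_on     DATETIME,
    photo         VARBINARY(MAX),
    profile_xml   XML
);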

DATABASE OBJECT:
A database object is any defined object in a database that is used to store or reference data. Anything created with the CREATE command is a database object; it can be used to hold and manipulate data. Examples of database objects are views, sequences, indexes, etc.

 Table – Basic unit of storage; composed of rows and columns

 View – Logically represents subsets of data from one or more tables

 Sequence – Generates primary key values

 Index – Improves the performance of some queries

 Synonym – Alternative name for an object
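
Possible creation statements for these objects (a sketch in Oracle-style syntax, reusing the students table from earlier; the object names are illustrative):

Ex:
CREATE VIEW student_names AS SELECT id, name FROM students;
CREATE SEQUENCE student_id_seq START WITH 1 INCREMENT BY 1;
CREATE INDEX idx_student_name ON students (name);
CREATE SYNONYM pupils FOR students;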

1.11.2 ADVANCED SQL FEATURES:

There are various advanced SQL query features.

 Union
 Union All
 Intersect
 Minus

UNION Operation

UNION is used to combine the results of two or more SELECT statements, eliminating duplicate rows from the result set. For a UNION, the number of columns and their data types must be the same in both tables (or queries) on which the operation is applied.

Example of UNION
The First table,
ID Name

1 Abhi

2 Adam

The Second table,

ID Name

2 Adam

3 Chester

Union SQL query will be,

SELECT * FROM First UNION SELECT * FROM Second;


The result set table will look like,
ID NAME

1 Abhi

2 Adam

3 Chester

UNION ALL

This operation is similar to Union. But it also shows the duplicate rows.
Example of Union All
The First table,

ID NAME

1 Abhi

2 Adam

The Second table,


ID NAME

2 Adam

3 Chester

Union All query will be like,

SELECT * FROM First UNION ALL SELECT * FROM Second;

The resultset table will look like,


ID NAME

1 Abhi

2 Adam

2 Adam

3 Chester

INTERSECT

The INTERSECT operation combines two SELECT statements, but it only returns the records which are common to both SELECT statements. As with UNION, the number of columns and their data types must be the same.
NOTE: MySQL (before version 8.0.31) does not support the INTERSECT operator.
Example of Intersect
The First table,
ID NAME
1 Abhi
2 Adam

The Second table,


ID NAME

2 Adam

3 Chester

Intersect query will be,

SELECT * FROM First INTERSECT SELECT * FROM Second;

The result set table will look like


ID NAME

2 Adam
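
Where INTERSECT is unavailable, a similar result can be obtained with a subquery (a MySQL-style sketch; it assumes the two tables have identical columns, as in the example above):

SELECT DISTINCT * FROM First
WHERE (ID, Name) IN (SELECT ID, Name FROM Second);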

MINUS

The MINUS operation combines the results of two SELECT statements and returns only those rows that appear in the first result set but not in the second. (MINUS is Oracle syntax; the SQL standard and most other database systems use EXCEPT for the same operation.)
Example of Minus
The First table,

ID NAME

1 Abhi

2 Adam

The Second table,


ID NAME
2 Adam
3 Chester

Minus query will be,

SELECT * FROM First MINUS SELECT * FROM Second;

The result set table will look like,

ID NAME
1 Abhi
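
In database systems that use the SQL-standard keyword EXCEPT instead of MINUS (for example SQL Server and PostgreSQL), the equivalent query would be:

SELECT * FROM First EXCEPT SELECT * FROM Second;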

1.12 EMBEDDED & DYNAMIC SQL

Static or embedded SQL consists of SQL statements in an application that do not change at runtime and can therefore be hard-coded into the application. Dynamic SQL consists of SQL statements that are constructed at runtime; for example, the application may allow users to enter their own queries.

Dynamic SQL is a programming technique that enables you to build SQL statements dynamically at runtime. You can create more general-purpose, flexible applications by using dynamic SQL because the full text of a SQL statement may be unknown at compilation time.
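
As an illustration of building a statement at runtime (a sketch using MySQL's prepared-statement syntax; the table and column names reuse the students example from earlier):

SET @min_age = 18;
SET @sql = 'SELECT id, name FROM students WHERE age > ?';
PREPARE stmt FROM @sql;
EXECUTE stmt USING @min_age;
DEALLOCATE PREPARE stmt;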

Static SQL has the benefit that the statement is optimized at precompile time, which gives the application high performance. Access plans for dynamic statements are generated at run time, so they must be prepared in the application; this is never needed with static SQL. Dynamic SQL, however, has one clear advantage over static statements, which becomes apparent when the application is edited or upgraded: with dynamic statements there is no need for pre-compilation or rebuilding, since the access plans are generated at run time, whereas static statements require regeneration of their access plans whenever they are modified. On the other hand, dynamic SQL requires more permissions and can be a way to execute unauthorized code; since we do not know what kind of users the application will have, it can be dangerous from a security point of view if the programmer does not handle it carefully.

Limitations of Dynamic SQL:

Some SQL statements cannot be executed dynamically, and the performance of dynamic statements is poorer than that of static SQL.
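
To illustrate the security concern mentioned above (a hypothetical sketch in MySQL syntax; @user_input stands for a value received from the application):

-- Risky: concatenating user input directly into the statement text
SET @sql = CONCAT('SELECT * FROM students WHERE name = ''', @user_input, '''');
-- Safer: pass the value as a parameter so it cannot alter the statement structure
SET @sql = 'SELECT * FROM students WHERE name = ?';
PREPARE stmt FROM @sql;
EXECUTE stmt USING @user_input;
DEALLOCATE PREPARE stmt;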
11.1 INTRODUCTION

Query processing requires that the DBMS identify and execute a strategy for
retrieving the results of the query. The query determines what data is to be
found, but does not define the method by which the data manager searches the
database. Therefore Query optimization is necessary to determine the optimal
alternative to process a query. There are two main techniques for query
optimization. The first approach is to use a rule based or heuristic method for
ordering the operations in a query execution strategy. The second approach
estimates the cost of different execution strategies and chooses the best
solution. In general most commercial database systems use a combination of
both techniques.

11.2 BASICS OF QUERY PROCESSING

Query Processing : Query processing is the procedure of converting a query written in a high-level language (e.g. SQL, QBE) into a correct and efficient execution plan expressed in a low-level language, which is used for data manipulation.
Query Processor : Query processor is responsible for generating execution
plan.
Execution Plan : Query processing is a stepwise process. Before retrieving
or updating data in database, a query goes through a series of query compilation
steps. These steps are known as execution plan.
The success of a query language also depends upon its query processor, i.e., how efficient an execution plan it can create; a better execution plan leads to lower time and cost.
In query processing, the first phase is transformation, in which the parser first checks the syntax of the query and also checks that the relations and attributes used in the query are defined in the database. After checking the syntax and verifying the relations, the query is transformed into equivalent expressions that are more efficient to execute. Transformation depends upon various factors such as the existence of certain database structures, the presence of different indexes, whether the file is sorted, the cost of transformation, the physical characteristics of the data, etc. After transformation, the transformed query is evaluated using a number of strategies known as access plans. While generating access plans, factors like the physical properties of data and storage are taken into account, and the optimal access plan is executed. The next step is to validate the user privileges and ensure that the query does not disobey the relevant integrity constraints. Finally, the execution plan is executed to generate the result.

11.2.1 General Strategy for Query Processing


The general strategy for query processing is as follows:
(i) Representation of query : A query written by the user cannot be processed directly by the system. The query processor first checks the syntax and the existence of the relations and their attributes in the database. After validation, the query processor transforms it into an equivalent and more efficient expression; for example, the query is converted into a standard internal format that the parser can manipulate. The parser also adds some additional predicates to the query to enforce security. The internal form may be relational algebra, relational calculus, any low-level language, operator graphs, etc.
(ii) Operator graphs : Operator graphs are used to represent a query. They give the sequence of operations that can be performed. A query represented by an operator graph is easy to understand, and operator graphs are useful for determining redundancy in query expressions, the result of transformations, simplifying views, etc.
(iii) Response time and data characteristics consideration : Data characteristics like the length of records, the expected sizes of both intermediate and final results, the sizes of relations, etc., are also considered when optimizing the query. In addition, the overall response time is determined.

11.2.2 Steps in Query Processing


The various steps in query processing are shown in Figure 11.1. Suppose that a user inputs a query in a general query language, say QBE; it is first converted into a high-level query language, say SQL. The other steps in query processing are discussed below in detail:
(i) Syntax Analysis : The query in the high-level language is parsed into tokens, and the tokens are analyzed for any syntax error. The order of tokens is also checked to make sure that all the rules of the language grammar are followed. In case of any error, the query is rejected and an error code with an explanation for the rejection is returned to the user. (Only syntax is checked in this step.)
(ii) Query Decomposition : In this step, the query is decomposed into query blocks, which are low-level operations. It starts with the high-level query, which is transformed into low-level operations, and checks whether the query is syntactically and semantically correct. For example, a SQL query is decomposed into blocks like the SELECT block, FROM block, WHERE block, etc. The various stages in query decomposition are shown in Figure 11.2.
FIGURE 11.1. Steps in query processing: the user's query, entered in a general query language (e.g. QBE), is transformed into a high-level query language such as SQL; the parser performs scanning, parsing, syntax checking and validation; the query decomposer, using the database catalog, checks the existence of relations and attributes, performs semantic analysis and decomposes complex queries into smaller ones; the query optimizer substitutes equivalent expressions to reduce execution time and cost and performs query modification to produce an execution plan; the query code generator generates code for the queries; and the runtime database processor carries out the necessary operations on the main database to produce the query result.

FIGURE 11.2. Steps in query decomposition: the SQL query passes through query analysis, query normalization, semantic analysis, query simplification and query restructuring, producing algebraic expressions.


(a) Query Analysis : In the query analysis stage, programming language
compiler checks that the query is lexically and syntactically correct. A
syntactically correct query is analyzed using system catalogues to verify the
existence of relations and attributes used in query. After analysis a correct
query is converted into some internal representation, which is more
efficient for processing.
The type specification of the query qualifier and result is also checked at this stage. The internal representation may be a query tree or a query graph.
Query tree notation : A typical internal representation of a query is a query tree, also known as a relational algebra tree. A query tree is constructed using a tree data structure that corresponds to the relational algebra expression. The main components of a query tree are:
 Root of tree–represents result of query.
 Leaf nodes–represent input relations of the query.
 Internal nodes–represent intermediate relation that is the output of
applying an operation in the algebra.
 The sequence of operations is directed from leaves to the root node.
For example, consider a query that joins Employee (E) and Department (D) on E.Department = D.Department and then projects Employee_Name, Address, Job, Department and Department-location. In its query tree, the relations E and D are the leaf nodes, the join E.Department = D.Department is an internal node, and the projection is the root.

FIGURE 11.3. Query tree notation.

Query graph notation : Graph data structure is also used for internal
representation of query. In graphs:
 Relation nodes–represent relations by single circle.
 Constant nodes–represent constant values by double circle.
 Edges–represent relation and join conditions.
 Square brackets–represent attributes retrieved from each relation.
For example, in a query graph the relations D and E are relation nodes, the constant 'Delhi' is a constant node connected to D by an edge labelled Department-Location = 'Delhi', an edge between D and E carries the join condition D.Department = E.Department, and the retrieved attributes are shown in square brackets: [E.Employee_Name, E.Address, E.Job, E.Department] and [D.Manager-ID].

FIGURE 11.4. Query graph notation.


(b) Query normalization : After query analysis, the query is normalized to remove any redundancy. In this phase the query is converted into a normalized form that can be easily manipulated. A set of equivalence rules is applied to the query to simplify the projection and selection operations and avoid redundancy. The query can be converted into one of the following two normal forms :
 Conjunctive normal form : It is a sequence of conjuncts that are connected with the ‘AND’ operator. A conjunct consists of one or more terms connected with the ‘OR’ operator. A conjunctive selection contains only those tuples that satisfy all conjuncts.
Example. (Emp_Job = ‘Analyst’ ∨ salary < 50000) ∧ (Hire_Date > 1–1–2000)
 Disjunctive normal form : It is a sequence of disjuncts that are connected with the ‘OR’ operator. A disjunct consists of one or more terms connected with the ‘AND’ operator. A disjunctive selection contains those tuples that satisfy any one of the disjuncts.
Example. (Emp_Job = ‘Analyst’ ∧ salary < 5000) ∨ (Hire_Date > 1–1–2000)
Disjunctive normal form is more useful as it allows the query to be broken into a series of independent sub-queries linked by union.
(c) Semantic analyzer : The semantic analyzer performs the following tasks
:
 It helps in reducing the number of predicates.
 It rejects contradictory normalized forms.
 In case of missing join specification, components of query do not
contribute to generation of results. It identifies these queries and
rejects them.
 It makes sure that each object in query is referenced correctly
according to its data type.
(d) Query simplifier : The major tasks of the query simplifier are as follows :
 It eliminates common sub-expressions.
 It eliminates redundant qualification.
 It introduces integrity constraints and view definitions into the query graph representation.
 It eliminates any query that violates an integrity constraint, without accessing the database.
 It transforms sub-graphs into semantically equivalent and more efficient forms.
 It deals with user access rights.
Idempotence rules of Boolean algebra are applied to get the final simplified form.
(e) Query Restructuring : At the final stage of query decomposition,
transformation rules are applied to restructure the query to give a more
efficient implementation.
(iii) Query Optimization : The aim of the query optimization step is to choose the best possible query execution plan with the minimum resources required to execute that plan. Query optimization is discussed in detail in section 11.3.
(iv) Execution Plan : An execution plan is the basic algorithm used for each operation in the query. Execution plans are classified into the following four types: (a) left-deep tree query execution plan, (b) right-deep tree query execution plan, (c) linear tree execution plan, (d) bushy execution plan.
(a) Left-deep tree query execution plan : In a left-deep tree query execution plan, development of the plan starts with a single relation, successively adding an operation involving a single relation until the query is completed. Only the left-hand side of a join is allowed to participate in the result from a previous join, hence the name left-deep tree. It is shown in Figure 11.5.

FIGURE 11.5. Left-deep execution plan: ((R1 ⋈ R2) ⋈ R3) ⋈ R4, with the result at the root.

Advantages : The main advantages of a left-deep tree query execution plan are:
 It reduces the search space.
 The query optimizer can be based on dynamic programming techniques.
 It is convenient for pipelined evaluation, as only one input to each join is pipelined.
Disadvantage : The disadvantage of a left-deep tree query execution plan is:
 The reduction in search space may cause some lower-cost execution strategies to be missed.
(b) Right-deep tree query execution plan : It is almost the same as the left-deep query execution plan, with the only difference that only the right-hand side of a join is allowed to participate in the result from a previous join, hence the name right-deep tree. It is suited to applications having a large main memory. It is shown in Figure 11.6.
FIGURE 11.6. Right-deep execution plan: R4 ⋈ (R3 ⋈ (R2 ⋈ R1)), with the result at the root.

(c) Linear tree execution plan : The combination of left-deep and right-deep execution plans, with the restriction that the relation on one side of each operator is always a base relation, is known as a linear tree. It is shown in Figure 11.7.
FIGURE 11.7. Linear tree execution plan: one side of every join is a base relation.

(d) Bushy execution plan : Bushy execution plan is the most general type
of execution plan. More than one relation can participate in intermediate
results. It is shown in Figure 11.8.

FIGURE 11.8. Bushy execution plan: intermediate join results over relations R1–R5 may themselves be joined, so more than one relation can appear in an intermediate result.

The main advantage of the bushy execution plan is the flexibility it provides in choosing the best execution plan by enlarging the set of candidate plans, but this flexibility may considerably increase the search space that must be explored.
(v) Query Code Generator : After selecting the best possible execution plan, the query is converted into a low-level language so that it can be taken as input by the runtime database processor.
(vi) Runtime Database Processor : It deals directly with the main database, performs the necessary operations mentioned in the query, and returns the result to the user.

11.3 QUERY OPTIMIZATION

The query performance of a database system depends not only on the database structure, but also on the way in which the query is optimized. Query optimization means converting a query into an equivalent form which is more efficient to execute. It is necessary for high-level relational queries, and it provides an opportunity for the DBMS to systematically evaluate alternative query execution strategies and choose an optimal strategy. A typical query optimization process is shown in Figure 11.9.
FIGURE 11.9. Query optimization process: a simplified relational algebra query is passed to the query optimizer together with statistical data and estimation formulas (used to determine the cardinality of intermediate result tables); the optimizer applies a cost model and an execution plan generator to produce an execution plan in the form of an optimized relational algebra query.

The main issues that need to be considered in query optimization are:


1. Reduction of data transfer with database.
2. Use of available indexes for fast searching.
3. Reduction of number of times the database is manipulated.
4. The order in which joins should be performed.
5. How to store intermediate results?
Following are the three relations used in the examples below:
Employee (Emp-ID, Emp-Name, Age, Salary, Dept-ID)
Department (Dept-ID, Proj-ID, Dept-Name)
Project (Proj-ID, Name, Location, Duration)
There are two main techniques used to implement query optimization. These
are heuristic query optimization and cost based query optimization.

11.3.1 Transformation Rules for Relational Algebra


The transformation rules are used to express a relational algebra expression in different equivalent ways, and the query optimizer chooses the most efficient equivalent expression to execute. Two expressions are considered equivalent if they have the same set of attributes (possibly in a different order) and represent the same information.
Let us consider relations R, S, and T, where X, Y, and Z represent selection predicates and L, L1, L2, M, and N denote sets of attributes.
Rule 1. Cascading of selection (σ)
σX ∧ Y ∧ Z(R) ≡ σX(σY(σZ(R)))
It means that conjunctive selection operations can be transformed into individual selection operations and vice versa.
Ex. σAge=35 ∧ Salary>50000(Employee) ≡ σAge=35(σSalary>50000(Employee))
Rule 2. Commutativity of selection (σ)
σX(σY(R)) ≡ σY(σX(R))
Ex. σAge=35(σSalary>50000(Employee)) ≡ σSalary>50000(σAge=35(Employee))
Rule 3. Cascading of projection (π)
πL(πM( ... πN(R))) ≡ πL(R)
Ex. πEmp-name(πEmp-name, Age(Employee)) ≡ πEmp-name(Employee)
Rule 4. Commutativity of selection (σ) and projection (π)
πX1, X2, ..., Xn(σA(R)) ≡ σA(πX1, X2, ..., Xn(R))
Ex. πEmp-name, Age(σSalary>50000(Employee)) ≡ σSalary>50000(πEmp-name, Age(Employee))
Rule 5. Commutativity of join (⋈) and Cartesian product (×)
R ⋈Y S ≡ S ⋈Y R
R × S ≡ S × R
Ex. Employee ⋈Employee.Dept-ID=Department.Dept-ID Department ≡ Department ⋈Employee.Dept-ID=Department.Dept-ID Employee
Rule 6. Commutativity of selection (σ) and join (⋈) or Cartesian product (×)
σX(R ⋈Y S) ≡ (σX(R)) ⋈Y S
σX(R × S) ≡ (σX(R)) × S
Ex. σEmployee.Age>30 ∧ Dept-Name=‘MARKETING’(Employee ⋈Employee.Dept-ID=Department.Dept-ID Department) ≡ σEmployee.Age>30(Employee) ⋈Employee.Dept-ID=Department.Dept-ID (σDept-Name=‘MARKETING’(Department))
Rule 7. Commutativity of projection (π) and join (⋈) or Cartesian product (×)
πL1 ∪ L2(R ⋈Z S) ≡ (πL1(R)) ⋈Z (πL2(S))
Ex. πEmp-Name, Dept-Name, Dept-ID(Employee ⋈E.Dept-ID=D.Dept-ID Department) ≡ (πEmp-Name, Dept-ID(Employee)) ⋈E.Dept-ID=D.Dept-ID (πDept-Name, Dept-ID(Department))
Rule 8. Commutativity of Union (∪) and Intersection (∩)
R ∪ S = S ∪ R
R ∩ S = S ∩ R
Rule 9. Commutativity of selection (σ) with Union (∪), Intersection (∩) or Difference (−)
σX(R ∪ S) = σX(R) ∪ σX(S)
σX(R ∩ S) = σX(R) ∩ σX(S)
σX(R − S) = σX(R) − σX(S)
Rule 10. Commutativity of projection (π) and Union (∪)
πL(R ∪ S) = πL(R) ∪ πL(S)
Rule 11. Associativity of Join (⋈) and Cartesian product (×)
(R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
(R × S) × T ≡ R × (S × T)
Rule 12. Associativity of Union (∪) and Intersection (∩)
(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
(R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
Rule 13. Converting a selection (σ) and Cartesian product (×) sequence into a Join (⋈)
σX(R × S) ≡ R ⋈X S

11.3.2 Heuristic Query Optimization


The heuristic query optimization technique modifies the internal representation of a query by using heuristic rules and transformation rules. The heuristic rules are applied to a query tree or query graph structure. The optimizer starts with the initial query tree and transforms it into an equivalent and more efficient query tree using the transformation rules.
Heuristic Optimization Algorithm : A DBMS uses a heuristic optimization algorithm to improve the efficiency of a query by converting the initial query tree into an equivalent, optimized query tree. The optimizer applies the transformation rules to optimize the structure of the query tree. The steps of the heuristic optimization algorithm are as follows.
Step 1. Perform selection operations as early as possible : By applying selection operations at early stages, you reduce the number of unwanted records transferred from the database to primary memory. The optimizer uses transformation rule 1 to divide selection operations with conjunctive conditions into a cascade of selection operations.
Step 2. Apply commutativity of selection with other operations as early as possible : The optimizer uses transformation rules 2, 4, 6, and 9 to move selection operations as far down the tree as possible and to keep selection predicates on the same relation together. Moving selections down the tree reduces unwanted data transfer, and keeping selection predicates on the same relation together reduces the number of times the same database table has to be accessed to retrieve its records.
Step 3. Combine a Cartesian product with a subsequent selection operation whose predicate represents a join condition into a JOIN operation : The optimizer uses transformation rule 13 to convert a selection and Cartesian product sequence into a join (a SQL illustration of this step is given after the list of steps). This reduces data transfer; it is always better to transfer only the required data from the database instead of transferring all the data and then refining it. (A Cartesian product combines all data of all the tables mentioned in the query, while a join operation retrieves only those records from the database that satisfy the join condition.)
Step 4. Use commutativity and associativity of binary operations : The optimizer uses transformation rules 5, 11, and 12 to execute the most restrictive selection operations first. It rearranges the leaf nodes of the query tree. By executing the most restrictive selection operations first, the number of records fetched from the database is reduced, and subsequent operations are performed on fewer records.
Step 5. Perform projection operations as early as possible : After performing selection operations, the optimizer uses transformation rules 3, 4, 7 and 10 to reduce the number of columns of a relation by moving projection operations as far down the tree as possible and keeping projection predicates on the same relation together.
Step 6. Compute common expressions only once : This step identifies sub-trees that represent groups of operations that can be executed by a single algorithm.
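
As a SQL illustration of Step 3 (a sketch using the Employee and Department relations listed above, with underscores in the attribute names as in the worked example below), the following two queries are logically equivalent, but the second form lets the optimizer apply the join condition while matching records instead of first materializing the full Cartesian product:

SELECT *
FROM Employee, Department
WHERE Employee.Dept_ID = Department.Dept_ID;

SELECT *
FROM Employee JOIN Department
ON Employee.Dept_ID = Department.Dept_ID;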
Consider the query below in SQL and the transformation of its initial query tree into an optimal query tree.
SELECT Emp_Name
FROM Employee e, Department d, Project p
WHERE p.Name = ‘LUXMI PUB.’
AND d.Proj_ID = p.Proj_ID
AND e.Dept_ID = d.Dept_ID
AND e.Age > 35;
This query displays the names of all employees working for the project “LUXMI PUB.” and having age greater than 35 years.
Figure 11.10 shows the initial query tree for the given SQL query. If the tree is executed directly, it results in the Cartesian product of the entire Employee, Department, and Project tables; but in reality, the query needs only one record from the relation Project and only the Employee records of those whose age is greater than 35 years.
FIGURE 11.10. Initial query tree: πEmp_Name applied to σp.Name=‘LUXMI PUB.’ ∧ d.Proj_ID=p.Proj_ID ∧ e.Dept_ID=d.Dept_ID ∧ e.Age>35 over the Cartesian product of Employee, Department and Project.

— We can improve the performance by first applying selection operations to reduce the number of records that appear in the Cartesian product. Figure 11.11 shows the improved query tree.
FIGURE 11.11. Improved query tree by first applying selection operations: σe.Age>35(Employee) and σp.Name=‘LUXMI PUB.’(Project) are pushed below the Cartesian products, with σe.Dept_ID=d.Dept_ID and σd.Proj_ID=p.Proj_ID applied above them and πEmp_Name at the root.

— The query tree can be further improved by applying the more restrictive selection operations first. So, switch the positions of the relations Project and Employee, since a single project may involve more than one employee. Figure 11.12 shows the improved query tree.
FIGURE 11.12. Improved query tree by applying more restrictive selection operations: σp.Name=‘LUXMI PUB.’(Project) is combined with Department first (via the Cartesian product and σd.Proj_ID=p.Proj_ID), the result is combined with σe.Age>35(Employee) via σe.Dept_ID=d.Dept_ID, and πEmp_Name is at the root.

— A further improvement can be made by replacing the Cartesian product operations with join operations using the join conditions, as shown in Figure 11.13.
FIGURE 11.13. Improved query tree by replacing Cartesian product and selection operations by join operations: σp.Name=‘LUXMI PUB.’(Project) ⋈d.Proj_ID=p.Proj_ID Department, the result joined ⋈e.Dept_ID=d.Dept_ID with σe.Age>35(Employee), and πEmp_Name at the root.

— A further improvement can be made to the query tree by keeping only the required attributes (columns) of the relations, applying projection operations as early as possible in the query tree. The optimizer keeps only the attributes required in the output, plus the attributes needed by subsequent operations, in the intermediate relations. The modified query tree is shown in Figure 11.14.
FIGURE 11.14. Improved query tree by applying and moving projection operations down the query tree: only the attributes needed later are kept in the intermediate relations, e.g. πp.Proj_ID over the selected Project tuples, πd.Dept_ID, d.Proj_ID over Department, and πe.Emp_ID, e.Emp_Name, e.Dept_ID over σe.Age>35(Employee).

The SELECT clause in SQL is equivalent to projection operation and WHERE clause
in SQL is equivalent to selection operation in relational algebra.
11.3.3 Cost Based Query Optimization
In cost-based query optimization, the optimizer estimates the cost of running all alternatives of a query and chooses the optimal one; the alternative which uses the minimum resources has the minimum cost. The cost of a query operation mainly depends on its selectivity, i.e., the proportion of the input relations that forms the output. The following are the main components used to determine the cost of execution of a query:
(a) Access cost of secondary storage : The access cost to secondary storage consists of the cost of database manipulation operations, which include searching, writing and reading of data blocks stored in secondary memory. The cost of searching depends upon the type of indexes (primary, secondary, hashed), the type of file structure and the ordering of the relation, in addition to the physical storage location, e.g., whether the file blocks are allocated contiguously on the same disk or scattered across the disk.
(b) Storage cost : Storage cost consists of cost of storing intermediate
results (tables or files) that are generated by the execution strategy for
the query.
(c) Computation cost : Computation cost consists of performing in-memory
operations during query execution such as sorting of records in a file,
merging of records, performing computations on field values, searching of
records. These are mainly performed on data buffers.
(d) Memory usage cost : It consists of the cost pertaining to the number of memory buffers needed during query execution.
(e) Communication cost : It consists of the cost of transferring the query and its result from the database location to the location of the terminal where the query originated.
Of all the above components, the most important is the access cost to secondary storage, because secondary storage is comparatively slower than other devices. The optimizer tries to minimize computation cost for small databases, as most of the data files can be held in main memory. For a large database, it tries to minimize the access cost to secondary storage, and for distributed databases, it tries to minimize the communication cost because various sites are involved in data transfer.
To estimate the cost of execution strategies, the optimizer accesses statistical data stored in the DBMS catalog. The information stored in the DBMS catalog includes:
(i) Number of records in relation X, given as R.
(ii) Number of blocks required to store relation X, given as B.
(iii) Blocking factor of relation X, given as BFR.
(iv) Primary access method for each file and the attributes of each file.
(v) Number of levels of each multi-level index on an attribute A, given as IA.
(vi) Number of first-level index blocks for an attribute A, given as BAI1.
(vii) Selection cardinality of an attribute A in relation R, given as SA, where SA = R × SLA and SLA is the selectivity of the attribute.
Cost Function for Selection Operation : The selection operation works on a single relation in relational algebra and retrieves the records that satisfy the given condition. Depending upon the structure of the file, the available indexes and the searching method, the estimated costs of the strategies for the selection operation are as given below.
1. Linear search : B/2 if the record is found; B if the record is not found.
2. Binary search : log2 B, if the equality condition is on a unique key attribute; log2 B + (SA/BFR) - 1 otherwise.
3. Using a primary index to retrieve a single record : 1, assuming no overflow.
4. Equality condition on a primary key : IA + 1.
5. Equality condition on a hash key : 1, assuming no overflow.
6. Inequality condition on a primary index : IA + B/2.
7. Inequality condition on any ordered index : IA + B/2.
8. Equality condition on a clustering index (secondary index) : (IA + 1) + (SA/BFR).
9. Equality condition on a non-clustering index (secondary index) : (IA + 1) + SA.
10. Inequality condition on a B+-tree index (secondary index) : IA + (BAI1/2) + (R/2).
Now consider the example of the Employee table. There are 10,000 records stored in 2,000 blocks, and the following indexes are available:
 A secondary index on Emp-ID with 4 levels.
 A clustering index on Salary with 3 levels and an average selection cardinality of 30.
 A secondary index on the non-key attribute Age with 2 levels and 4 first-level index blocks. There are 200 distinct values for Age.
Consider the queries:
(i) σEmp-ID = 1A(Employee)
(ii) σAge > 20(Employee)
(iii) σAge = 20 AND Salary > 9000(Employee)
Consider the following cost components:
 Number of records (R) = 10,000
 Number of blocks (B) = 2,000
 Blocking factor (BFR) = 10,000/2,000 = 5
 IEmp-ID = 4, SEmp-ID = 10,000/10,000 = 1
 IAge = 2, SAge = 10,000/200 = 50
 ISalary = 2, SSalary = 30
Now the cost of the above queries (assuming the records are in the table):
(i) If linear search is used then the cost will be B/2 = 2000/2 = 1000.
(ii) If linear search is used then the cost will be B = 2000.
(iii) This query has a conjunctive selection condition. To estimate its cost, we can use either one of the two components of the selection condition to retrieve the records, plus the linear search. The linear search costs 2000; the condition Salary > 9000 gives a cost estimate of ISalary + (B/2) = 2 + 1000 = 1002, and the cost of the condition Age = 20 is 30 + 2 = 32. So the total cost is 1034.
Cost Function for Join Operation : Join operation is the most time
consuming operation in databases and an accurate cost function for join
operations depends upon the estimate of size of file (number of records) after
the join operation. The estimated cost of strategies for join operation is given
below:
1. Nested-loop join (block) :
(a) B(R) + [B(R) * B(S)], if the buffer has only one block per relation
(b) B(R) + [B(S) * (B(R)/(Buffer - 2))], if (Buffer - 2) blocks are available for relation R
(c) B(R) + B(S), if all blocks of R can be read into the database buffer
2. Indexed nested-loop join :
(a) B(R) + R * (IA + 1), if the join attribute A in relation S is a primary key
(b) B(R) + R * [IA + (SA(R)/BFR(R))], for a clustering index I on attribute A
3. Sort-merge join :
(a) B(R) * [log2 B(R)] + B(S) * [log2 B(S)], for the sorts
(b) B(R) + B(S), for the merge
4. Hash join :
(a) (3 * B(R)) + B(S), if the hash index is held in memory
(b) 2 * [B(R) + B(S)] * [log (B(S) - 1)] + B(R) + B(S), otherwise
Here B(R) and B(S) are the numbers of blocks of relations R and S respectively.
Example. Consider that we have 600 records in the Department table. The BFR for the Department table is 60, so the number of blocks is 600/60 = 10.
For the join operation Employee ⋈Dept-ID Department:
Type of Join : Cost
Block nested-loop join : 22,000 (if the buffer has only one block); 2,010 (if all blocks of Employee can be read into the database buffer)
Hash join : 6,010 (if the hash index is held in memory)
Example. Given two unary relations (each containing only one attribute), R1 and R2:
R1 : 7, 2, 9, 8, 3, 9, 1, 3, 6
R2 : 8, 4, 2, 1, 3, 2, 7, 3
Show the result of joining R1 and R2 using each of the following algorithms. List the results in the order that would be output by the join algorithm.
(a) Nested loop join. Use R1 for the outer loop and R2 for the inner loop.
(b) Sort-merge join.
Solution. The result relation contains only one attribute, which is the common attribute between R1 and R2.
(a) Nested loop join. For every tuple in R1, find matches in R2: 7, 2, 2, 8, 3, 3, 1, 3, 3
(b) Sort-merge join. The result appears in sorted order on the join attribute: 1, 2, 2, 3, 3, 3, 3, 7, 8

1. Define heuristic query optimization and mention one advantage.

Heuristic query optimization modifies the internal representation of a query (its query tree or query graph) by applying heuristic and transformation rules, such as performing selection and projection operations as early as possible, to obtain an equivalent but more efficient query tree.

Advantage: Faster optimization time. Heuristic optimization is less computationally expensive than cost-based optimization because it does not explore all possible query plans; it uses simple rules to quickly generate a good (though not always the best) execution plan.

2. Compare heuristic and cost-based query optimization techniques and explain query processing in detail.

Comparison: Heuristic vs Cost-Based Query Optimization

Definition: Heuristic optimization uses predefined rules (heuristics) to optimize queries; cost-based optimization evaluates and compares the cost of different query plans.
Approach: Heuristic is rule-driven; cost-based is cost-driven.
Complexity: Heuristic is low (simple and fast); cost-based is high (can be time-consuming).
Accuracy: Heuristic may not find the optimal plan; cost-based is more likely to find the optimal or near-optimal plan.
Examples of rules: Heuristic pushes selections/projections early and joins smaller tables first; cost-based estimates the I/O, CPU and memory cost of each possible plan.
Statistics used: Heuristic requires none; cost-based relies on statistics like table size, indexes and data distribution.
Performance: Heuristic gives faster optimization but possibly less efficient execution; cost-based gives slower optimization but usually more efficient execution.
When used: Heuristic is used in simpler systems or when planning time is critical; cost-based is used in modern DBMSs where query execution efficiency is crucial.

Query processing itself is discussed in detail in section 11.2 (syntax analysis, query decomposition, query optimization, execution plan selection, code generation and runtime processing).
