KEMBAR78
Chapter - 2-Query Prosessing and Optimization | PDF | Databases | Parsing
0% found this document useful (0 votes)
31 views44 pages

Chapter - 2-Query Prosessing and Optimization

This chapter discusses query processing and optimization in database management systems, detailing the steps involved in transforming high-level SQL queries into efficient execution plans. It covers syntax checking, query decomposition, optimization techniques, and the use of relational algebra for query representation. Additionally, it highlights the importance of selecting efficient execution strategies to minimize resource usage during query execution.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views44 pages

Chapter - 2-Query Prosessing and Optimization

This chapter discusses query processing and optimization in database management systems, detailing the steps involved in transforming high-level SQL queries into efficient execution plans. It covers syntax checking, query decomposition, optimization techniques, and the use of relational algebra for query representation. Additionally, it highlights the importance of selecting efficient execution strategies to minimize resource usage during query execution.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 44

Advanced Database system

Chapter Two:
Query Processing and
Optimization
Prepared by:
Misganaw Abeje
University of Gondar
College Of Informatics
Department of computer science
Misganaw.Abeje13@gmail.com
 Introduction to Query Processing
 Steps of Query Processing
 Translating SQL Queries into Relational Algebra
 Algorithms for SELECT and JOIN Operations
 Using Heuristics in Query Optimization
 Using Selectivity and Cost Estimates in Query
Optimization
 Semantic Query Optimization

BY: MA
Introduction to Query Processing

 In this chapter we shall discuss the techniques used by a


DBMS to process, optimize and execute high-level queries.
 The techniques used to split complex queries into multiple
simple operations and methods of implementing these low-
level operations.
 Query Processing is a procedure of transforming a high-level
query (such as SQL) into a correct and efficient execution plan
expressed in low-level language.
 When a database system receives a query for update or
retrieval of information, it goes through a series of compilation
steps, called execution plan.
BY: MA
Steps of Query Processing

BY: MA
Query processing phases

 The first phase is called syntax checking (Query Analyzer): The


syntax analyzer takes the query from the users, parses it into tokens
and analyses the tokens and their order to make sure they follow the
rules of the language grammar.
– The scanner identifies the query tokens such as SQL keywords,
attribute names, and relation names that appear in the text of the
query.
– The parser checks the query syntax to determine whether it is
formulated according to the syntax rules of the query language.
– The query must also be validated by checking that all attribute
and relation names are valid and semantically meaningful names
in the schema of the particular database being queried.
BY: MA
 In second phase the SQL query is translated in to an algebraic
expression using various rules.
– So that the process of transforming a high-level SQL query
into a relational algebraic form is called Query
Decomposition. The relational algebraic expression now
passes to the query optimizer.
– In query decomposition the query processing aims are to
transfer the high-level query into a relational algebra query
and to check whether that query is syntactically and
semantically correct.

BY: MA
 In third phase optimization is performed by substituting
equivalent expression depends on the factors such that the
existence of certain database structures, whether or not a
given file is stored, the presence of different indexes & so on.
 Query optimization: The process of choosing a suitable
execution strategy for processing a query typically has many
possible execution strategies.
 The query optimization techniques are used to choose an
efficient execution plan that will minimize the runtime as well
as many other types of resources such as number of disk I/O,
CPU time and so on.
BY: MA
Cont…

 In query optimizer stage the cost of model and several


other estimation formulas are used to rewrite the
query.
 The modified query is written to utilize system
resources so as to bring the optimal performance.
 The query optimizer then generates an action plan
also called a execution plan.
 This action plans are converted into a query codes that
are finally executed by a run time database processor.
 The run time database processor estimate the cost of
each action plan and chose the optimal one for the
execution.
BY: MA
 Query Optimization: The activity of choosing a single
“efficient” execution strategy (from hundreds) as determined by
database catalog statistics.
 Which relational algebra expression, equivalent to the given
query, will lead to the most efficient solution plan?
 For each algebraic operator, what algorithm (of several available)
do we use to compute that operator?
 How do operations pass data (main memory buffer, disk buffer,
…)?
 Will this plan minimize resource usage? (CPU/Response
Time/Disk)
9
BY: MA
Example: consider the following query

Identify all managers who work in a London branch


SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND s.position = ‘Manager’
AND
b.city = ‘london’;

Results in these equivalent relational algebra statements


(1) (position=‘Manager’)^(city=‘London’)^(Staff.branchNo=Branch.branchNo) (Staff X Branch)
(2) (position=‘Manager’)^(city=‘London’) (Staff Staff.branchNo = Branch.branchNo Branch)
(3) [(position=‘Manager’) (Staff)] Staff.branchNo = Branch.branchNo [(city=‘London’) (Branch)]
Assume:

– 1000 tuples in Staff.


– 50 Managers
– 50 tuples in Branch.
– 5 London branches
– No indexes or sort keys
10
BY: MA – All temporary results are written back to disk (memory is small)
Query 1 (Bad)

(position=‘Manager’)^(city=‘London’)^(Staff.branchNo=Branch.branchNo) (Staff X Branch)


– Requires (1000+50) disk accesses to read from Staff and Branch
relations

– Creates temporary relation of Cartesian Product (1000*50) tuples


– Requires (1000*50) disk access to read in temporary relation and tes
predicate
Total Work = (1000+50) + 2*(1000*50) =
101,050 I/O operations
Query 2 (Better)
(position=‘Manager’)^(city=‘London’) (Staff Staff.branchNo = Branch.branchNo
– Again requires (1000+50) Branch )
disk accesses to read from Staff and Branch
– Joins Staff and Branch on branchNo with 1000 tuples
(1 employee : 1 branch )
– Requires (1000) disk access to read in joined relation and check
predicate
Total Work = (1000+50) + 2*(1000) =
3050 I/O operations
11
BY: MA 3300% Improvement over Query 1
Query 3 (Best)
[ (position=‘Manager’) (Staff) ] Staff.branchNo = Branch.branchNo [ (city=‘London’) (Bran

– Read Staff relation to determine ‘Managers’ (1000 reads)


 Create 50 tuple relation(50 writes)

– Read Branch relation to determine ‘London’ branches (50 reads)


 Create 5 tuple relation(5 writes)

– Join reduced relations and check predicate (50 + 5 reads)

Total Work = 1000 + 2*(50) + 5 + (50 + 5) =


1160 I/O operations

12
BY: MA 8700% Improvement over Query 1
Translating SQL Queries into Relational Algebra

 An SQL query is first translated into an equivalent extended relational


algebra expression represented as a query tree data structure to be optimized.
 Query block:
– SQL queries are decomposed into query blocks, which form the basic
units that can be translated into the algebraic operators and optimized.
– A query block contains a single SELECT-FROM-WHERE expression,
as well as GROUP BY and HAVING clause if these are part of the
block.
– Nested queries within a query are identified as separate query blocks.
Because SQL included Aggregate operators(MAX, SUM, COUNT…) in
the extended algebra.

BY: MA
Cont…
 Relational algebra defines the basic set of operations of relational
database model.
 A sequence of relational algebra operations forms a relational
algebra expression. The result of this expression represents the
result of a database query.
 The basic operations are –
– σ : selection (sigma) it returns rows
– Л: projection (pi) it returns columns
– ⋈: join operation: combine two table with common attributes
– X : cross product operation, paired two tables
– U: Union: (puQ) is the set of all tuples that is either in P or in Q or in
both without duplicates.
– n: Intersection
BY: MA
Example1: Translating high level to relational
algebra expression

 If we want to display the names and courses of all


students. we will use the following relational algebra Exp.
Л {Name,Course}(STUDENT)
 If we want to display the details of all students who have
registered course ‘database’, we will use the following
relational algebra expression -
σ (Course = " database “) (STUDENT)
 Retrieve the first name, last name, and salary of all
employees who work in department number 5
π Fname, Lname, Salary(σDno=5(EMPLOYEE)) or σ
Dno=5(π Fname, Lname, Salary((EMPLOYEE))

BY: MA
Translating SQL Queries into Relational
Algebra : Example2 nested query

SELECT LNAME, FNAME FROM EMPLOYEE WHERE


SALARY > (SELECT MAX (SALARY) FROM
EMPLOYEE WHERE DNO = 5);

SELECT LNAME, FNAME SELECT MAX (SALARY)


FROM EMPLOYEE FROM EMPLOYEE
WHERE SALARY > C WHERE DNO = 5

πLNAME, FNAME (σSALARY>C(EMPLOYEE)) ℱMAX SALARY (σDNO=5 (EMPLOYEE))


BY: MA
Cont…
 Example3:
– SELECT emp_nm FROM EMPLOYEE WHERE
– emp_desg>100
– This query will be rejected because the
comparison “>100” is incompatible with the
data type of emp_desg which is a variable
character string.
 At the end of query analysis phase, the high-level
query (SQL) is transformed into some internal
representation that is more suitable for
processing.
 This internal representation is typically a kind of
BY: MA
Cont…

• A Query Tree is a tree data structure that


corresponds expression.
• A Query Tree is also called a relational algebra
tree.
– Leaf node of the tree, representing the base
input relations of the query.
– Internal nodes result of applying an operation
in the algebra.
– Root of the tree representing a result of the
BY: MA
query.
example4

– SELECT (P.proj_no, P.dept_no, E.name,


E.add, E.dob) FROM PROJECT P, DEPARTMENT
D, EMPLOYEE E WHERE P.dept_no = D.d_no
AND D.mgr_id = E.emp_id AND P.proj_loc =
‘Mumbai’;
– Translate into relational algebra and draw the
query tree
BY: MA
Cont…
Cont…

 The three relations PROJECT, DEPARTMENT,


EMPLOYEE are represent as a leaf nodes P, D
and E, while the relational algebra operations of
the represented by internal tree nodes.
 Same SQL query can have man different
relational algebra expressions and hence many
different query trees.
 The query parser typically generates a standard
initial (canonical) query tree).

BY: MA
Algorithms for SELECT and JOIN Operations
1. Implementing the SELECT Operation

 There are many algorithms for executing a SELECT operation,


which is basically a search operation to locate the records in a
disk file that satisfy a certain condition. Some of the search
algorithms depend on the file having specific access paths, and
they may apply only to certain types of selection conditions.
 Examples:
– (OP1): s SSN='123456789' (EMPLOYEE)

– (OP2): s DNUMBER>5(DEPARTMENT)
– (OP3): s DNO=5(EMPLOYEE)
– (OP4): s DNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE)
– (OP5): s ESSN=123456789 AND PNO=10(WORKS_ON)
BY: MA
Algorithms for SELECT and JOIN
Operations (2)
 Implementing the SELECT Operation (contd.):
 Search Methods for Simple Selection:
– S1 Linear search (brute force):
 Retrieve every record in the file, and test whether its attribute values
satisfy the selection condition.
– S2 Binary search:
 If the selection condition involves an equality comparison on a key
attribute on which the file is ordered, binary search (which is more
efficient than linear search) can be used. (See OP1).
– S3 Using a primary index or hash key to retrieve a single
record:
 If the selection condition involves an equality comparison on a key
attribute with a primary index (or a hash key), use the primary index
(or the hash key) to retrieve the record.
BY: MA
Algorithms for SELECT and JOIN Operations (3)

 Implementing the SELECT Operation (contd.):


 Search Methods for Simple Selection:
– S4 Using a primary index to retrieve multiple records:
 If the comparison condition is >, ≥, <, or ≤ on a key field with a primary
index, use the index to find the record satisfying the corresponding equality
condition, then retrieve all subsequent records in the (ordered) file, Eg OP2.
– S5 Using a clustering index to retrieve multiple records:
 If the selection condition involves an equality comparison on a non-key
attribute with a clustering index, use the clustering index to retrieve all the
records satisfying the selection condition. OP3
– S6 Using a secondary (B+-tree) index:
 On an equality comparison, this search method can be used to retrieve a
single record if the indexing field has unique values (is a key) or to retrieve
multiple records if the indexing field is not a key.
 In addition, it can be used to retrieve records on conditions involving >,>=, <,
or <=. (FOR RANGE QUERIES)
BY: MA
Query Optimization

 The primary goal of query optimization is of


choosing an efficient execution strategy for
processing a query.
 The query optimizer attempts to minimize the
use of certain resources (mainly the number of
I/O and CPU time) by selecting a best execution
plan (access plan).
 A query optimization start during the validation
phase by the system to validate the user has
appropriate privileges.
 Now an action plan is generate to perform the
BY: MA query.
BY: MA Block Diagram of Query
Cont…
 Relational algebraquery tree generated by the
query simplifier module of query decomposer.
 Estimation formulas used to determine the
cardinality of the intermediate result table.
 A cost Model.
 Statistical data from the database catalogue.
 The output of the query optimizer is the execution
plan in form of optimized relational algebra query.
 A query typically has many possible execution
strategies, and the process of choosing a suitable
one for processing a query is known as Query
Optimization.
BY: MA
The basic issues in Query
Optimization

 How to use available indexes?


 How to use memory to accommodate information
and perform immediate steps such as sorting?
• How to determine the order in which
joins should be performed?

BY: MA
Techniques for Query Optimization

 Among the approaches for query optimization,


– exhaustive search,
– heuristics-based algorithms and
– systematic estimation are mostly used.
– The second technique involves the
systematic estimation of the cost of the
different execution strategies and choosing
the execution plan with the lowest cost.

BY: MA
1. Exhaustive Search Optimization

 In these techniques, for a query, all possible


query plans are initially generated and then the
best plan is selected.
 Though these techniques provide the best
solution, it has an exponential time and space
complexity owing to the large solution space.
 For example, dynamic programming technique
divided and conquer method .

BY: MA
2. Heuristic Based Optimization

 The first technique is based on Heuristic Rules for


ordering the operations in a query execution strategy.
 The heuristic rules are used as an optimization
technique to modify the internal representation of query.
 Usually, heuristic rules are used in the form of query tree
of query graph data structure, to improve its
performance.
 Heuristic based optimization uses rule-based
optimization approaches for query optimization. These
algorithms have polynomial time and space complexity,
which is lower than the exponential complexity of
exhaustive search-based algorithms.
 However, these algorithms do not necessarily produce
BY: MA
Cont…
 Some of the common heuristic rules are −
– Perform select and project operations before join or
other BINARY operations.
– This is because the size of the file resulting from a
binary operation such as JOIN is usually a multi- value
function of the sizes of the input files.
– This is done by moving the select and project
operations down the query tree. This reduces the
number of tuples available for join.
– Perform the most restrictive select/project operations
at first before the other operations.
– Avoid cross-product operation since they result in very
BY: MA large-sized intermediate tables
Cont…

 The SELECT and PROJECT reduced the size of


the file and hence, should be applied before the
JOIN or other binary operation.
 Heuristic query optimizer transforms the initial
(canonical) query tree into final query tree
using equivalence transformation rules.
 This final query tree is efficient to execute.

BY: MA
 Example: Consider relations r(AB) and s(CD). We require r X s.
 Method 1 :
a. Load next record of r in RAM.
b. Load all records of s, one at a time and concatenate with r.

c. All records of r concatenated?


 NO: goto a.
 YES: exit (the result in RAM or on disk).
 Performance: Too many accesses.
 Method 2: Improvement
a. Load as many blocks of r as possible leaving room for one block of s.
b. Run through the s file completely one block at a time.
 Performance: Reduces the number of times s blocks are loaded by a factor of
equal to the number of r records than can fit in main memory.
 Considerations during query Optimization:
– Narrow down intermediate result sets quickly. SELECT and

34
PROJECTION before JOIN
BY: MA
35
BY: MA
36
BY: MA
37
BY: MA
Cost Estimation Approach to Query Optimization

 The main idea is to minimize the cost of processing a query. The cost
function is comprised of:
 I/O cost + CPU processing cost + communication cost + Storage cost
 These components might have different weights in different processing
environments
 The DBMs will use information stored in the system catalogue for the
purpose of estimating cost.
 The main target of query optimization is to minimize the size of the
intermediate relation. The size will have effect in the cost of:
– Disk Access
– Data Transportation
– Storage space in the Primary Memory
– Writing on Disk
BY: MA
Using Selectivity and Cost Estimates in Query
Optimization

 Cost-based query optimization:


– Estimate and compare the costs of executing a query using
different execution strategies and choose the strategy with the
lowest cost estimate. (Compare to heuristic query optimization)
 Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Communication cost

BY: MA
1. Access Cost of Secondary Storage

 Data is going to be accessed from secondary storage, as a query will


be needing some part of the data stored in the database. The disk
access cost can again be analyzed in terms of:
– Searching
– Reading, and
– Writing, data blocks used to store some portion of a relation.
 Remark: The disk access cost will vary depending on
– The file organization used and the access method implemented
for the file organization.
– whether the data is stored contiguously or in scattered manner,
will affect the disk access cost.
BY: MA
2. Storage Cost

 While processing a query, as any query would be composed


of many database operations, there could be one or more
intermediate results before reaching the final output.
 These intermediate results should be stored in primary
memory for further processing.
 The bigger the intermediate relation, the larger the memory
requirement, which will have impact on the limited available
space. This will be considered as a cost of storage.

BY: MA
3. Computation Cost : Query is composed of many operations. The
operations could be database operations like reading and writing to a
disk, or mathematical and other operations like:
– Searching
– Sorting
– Merging
– Computation on field values
4. Communication Cost : In most database systems the database resides in
one station and various queries originate from different terminals. This
will have impact on the performance of the system adding cost for
query processing. Thus, the cost of transporting data between the
database site and the terminal from where the query originate should be
BY: MA analyzed.
3. Semantic query
optimization

 It is used with the combination heuristic query transformation rules.


 It uses constraints specified on the database schema such as unique
attributes and other more complex constraints, in order to modify one
query into another query that is more efficient to execute.
 Consider the following SQL query,
SELECT E.LNAME, M.LNAME FROM EMPLOYEE E M
WHEREE.SUPERSSN=M.SSN AND E.SALARY>M.SALARY
 Explanation:
– Suppose that we had a constraint on the database schema that stated that
no employee can earn more than his or her direct supervisor. If the
semantic query optimizer checks for the existence of this constraint, it
need not execute the query at all because it knows that the result of the
query will be empty. Techniques known as theorem proving can be used
for this purpose.
BY: MA
 THANK YOU !

You might also like