UNIT 6
Query Processing and Optimization
Contents
• Evaluation of relational algebra expressions
• Query equivalence
• Join strategies
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Basic Steps in Query Processing (Cont.)
• Parsing and translation
• translate the query into its internal form. This is then translated into relational
algebra.
• Parser checks syntax, verifies relations
• Evaluation
• The query-execution engine takes a query-evaluation plan, executes that plan,
and returns the answers to the query.
Basic Steps in Query Processing: Optimization
• A relational algebra expression may have many equivalent expressions
• E.g., salary75000(salary(instructor)) is equivalent to
salary(salary75000(instructor))
• Each relational algebra operation can be evaluated using one of several
different algorithms
• Correspondingly, a relational-algebra expression can be evaluated in many
ways.
• Annotated expression specifying detailed evaluation strategy is called
an evaluation-plan. E.g.,:
• Use an index on salary to find instructors with salary < 75000,
• Or perform complete relation scan and discard instructors with salary 75000
Basic Steps: Optimization (Cont.)
• Query Optimization: Amongst all equivalent evaluation plans choose
the one with lowest cost.
• Cost is estimated using statistical information from the
database catalog
• e.g.. number of tuples in each relation, size of tuples, etc.
Alternative ways of evaluating a given query
An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the
operations is coordinated.
• Cost difference between evaluation plans for a query can be enormous
• E.g., seconds vs. days in some cases
• Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
• Estimation of plan cost based on:
• Statistical information about relations. Examples:
• number of tuples, number of distinct values for an attribute
• Statistics estimation for intermediate results
• to compute cost of complex expressions
• Cost formulae for algorithms, computed using statistics
Transformation of Relational Expressions
• Two relational algebra expressions are said to be equivalent if the two
expressions generate the same set of tuples on every legal database instance
• Note: order of tuples is irrelevant
• we don’t care if they generate different results on databases that violate integrity
constraints
• In SQL, inputs and outputs are multisets of tuples
• Two expressions in the multiset version of the relational algebra are said to be
equivalent if the two expressions generate the same multiset of tuples on every legal
database instance.
• An equivalence rule says that expressions of two forms are equivalent
• Can replace expression of first form by second, or vice versa
Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a sequence of
individual selections.
σ1 2 (E) ≡ σ1 (σ2 (E))
2. Selection operations are commutative.
σ1(σ2(E)) ≡ σ2 (σ1(E))
3. Only the last in a sequence of projection operations is needed, the others
can be omitted.
L1( L2(…( Ln(E))…)) ≡ L1(E)
where L1 ⊆ L2 … ⊆ Ln
4. Selections can be combined with Cartesian products and theta joins.
a. σ (E1 x E2) ≡ E1 ⨝ E2
b. σ 1 (E1 ⨝2 E2) ≡ E1 ⨝ 1∧2 E2
Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are commutative.
E1 ⨝ E2 ≡ E2 ⨝ E1
6. (a) Natural join operations are associative:
(E1 ⨝ E2) ⨝ E3 ≡ E1 ⨝ (E2 ⨝ E3)
(b) Theta joins are associative in the following manner:
(E1 ⨝ 1 E2) ⨝2 3 E3 ≡ E1 ⨝1 3 (E2 ⨝2 E3)
where 2 involves attributes from only E2 and E3.
Pictorial Depiction of Equivalence Rules
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join operation
under the following two conditions:
(a) When all the attributes in 0 involve only the attributes of one
of the expressions (E1) being joined.
0 E1 ⨝ E2) ≡ (0(E1)) ⨝ E2
(b) When 1 involves only the attributes of E1 and 2 involves
only the attributes of E2.
1 2 E1 ⨝ E2) ≡ (1(E1)) ⨝ (2(E2))
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join operation
as follows:
(a) if involves only attributes from L1 L2:
L1 L2(E1 ⨝ E2) ≡ L1(E1) ⨝ L2(E2)
(b) In general, consider a join E1 ⨝ E2.
• Let L1 and L2 be sets of attributes from E1 and E2, respectively.
• Let L3 be attributes of E1 that are involved in join condition , but are not in L1
L2, and
• let L4 be attributes of E2 that are involved in join condition , but are not in L1
L2.
L1 L2(E1 ⨝ E2) ≡ L1 L2( L1 L3(E1) ⨝ L2 L4(E2))
Equivalence Rules (Cont.)
•9. The set operations union and intersection are commutative
E1 E2 ≡ E2 E1
E1 E2 ≡ E2 E1
(set difference is not commutative).
10. Set union and intersection are associative.
(E1 E2 ) E3 ≡ E1 (E2 E3)
(E1 E2 ) E3 ≡ E1 (E2 E3)
11. The selection operation distributes over , and –.
a. (E1 E2) ≡ (E1) (E2)
b. (E1 E2) ≡ (E1) (E2)
c. (E1 – E2) ≡ (E1) – (E2)
d. (E1 E2) ≡ (E1) E2
e. (E1 – E2) ≡ (E1) – E2
preceding equivalence does not hold for
12. The projection operation distributes over union
L(E1 E2) ≡ (L(E1)) (L(E2))
Transformation Example: Pushing Selections
• Query: Find the names of all instructors in the Music department,
along with the titles of the courses that they teach
• name, title(dept_name= ‘Music’
(instructor ⨝ (teaches ⨝ course_id, title (course))))
• Transformation using rule 7a.
• name, title((dept_name= ‘Music’(instructor)) ⨝
(teaches ⨝ course_id, title (course)))
• Performing the selection as early as possible reduces the size of the
relation to be joined.
Multiple Transformations (Cont.)
Join Ordering Example
• For all relations r1, r2, and r3,
(r1 ⨝ r2) ⨝ r3 = r1 ⨝ (r2 ⨝ r3 )
(Join Associativity) ⨝
• If r2 ⨝ r3 is quite large and r1 ⨝ r2 is small, we choose
(r1 ⨝ r2) ⨝ r3
so that we compute and store a smaller temporary relation.
Join Ordering Example (Cont.)
• Consider the expression
name, title(dept_name= “Music” (instructor) ⨝ teaches)
⨝ course_id, title (course))))
• Could compute teaches ⨝ course_id, title (course) first, and join result with
dept_name= “Music” (instructor)
but the result of the first join is likely to be a large relation.
• Only a small fraction of the university’s instructors are likely to be from
the Music department
• it is better to compute
dept_name= “Music” (instructor) ⨝ teaches
first.
THANK
YOU