KEMBAR78
Ch-2 Query Processing and Optimization | PDF | Relational Model | Databases
0% found this document useful (0 votes)
68 views21 pages

Ch-2 Query Processing and Optimization

Uploaded by

sefuasfaw021
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
68 views21 pages

Ch-2 Query Processing and Optimization

Uploaded by

sefuasfaw021
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 21

Chapter 2

2 Query Processing and optimization


2.1 query processing
Query Processing would mean the entire process or activity which involves query translation into low level
instructions, query optimization to save resources, cost estimation or evaluation of query, and extraction of
data from the database.

 Aims of QP:
 transform query written in high-level language (e.g. SQL), into correct and
efficient execution strategy expressed in low-level language (implementing RA);
 Execute strategy to retrieve required data.
The major steps involved in query processing are depicted in the figure below.

Figure 1. Steps in database query


Let us discuss the whole process with an example. Let us consider the following two relations as
the example tables for our discussion;
Employee(Eno, Ename, Phone)
Proj_Assigned(Eno, Proj_No, Role, DOP)
where,
Eno is Employee number,
Ename is Employee name,

1 | P a g e Advanced database pre by Jilo D


Proj_No is Project Number in which an employee is assigned,
Role is the role of an employee in a project,
DOP is duration of the project in months.
With this information, let us write a query to find the list of all employees who are working in a
project which is more than 10 months old.
SELECT Ename
FROM Employee, Proj_Assigned
WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
Input:

A query written in SQL is given as input to the query processor. For our case, let us consider the
SQL query written above.
Step 1: Parsing

In this step, the parser of the query processor module checks the syntax of the query, the user’s
privileges to execute the query, the table names and attribute names, etc. The correct table names,
attribute names and the privilege of the users can be taken from the system catalog (data
dictionary).
Step 2: Translation
If we have written a valid query, then it is converted from high level language SQL to low level
instruction in Relational Algebra.
For example, our SQL query can be converted into a Relational Algebra equivalent as follows;
πEname(σDOP>10 Λ Employee.Eno=Proj_Assigned.Eno(Employee X Prof_Assigned))
Step 3: Optimizer
Optimizer uses the statistical data stored as part of data dictionary. The statistical data are
information about the size of the table, the length of records, the indexes created on the table, etc.
Optimizer also checks for the conditions and conditional attributes which are parts of the query.
Step 4: Execution Plan
A query can be expressed in many ways. The query processor module, at this stage, using the
information collected in step 3 to find different relational algebra expressions that are equivalent
and return the result of the one which we have written already.
For our example, the query written in Relational algebra can also be written as the one given below;
πEname(Employee ⋈Eno (σDOP>10 (Prof_Assigned)))

2 | P a g e Advanced database pre by Jilo D


So far, we have got two execution plans. Only condition is that both plans should give the same
result.
Step 5: Evaluation
Though we got many execution plans constructed through statistical data, though they return same
result (obvious), they differ in terms of Time consumption to execute the query, or the Space
required executing the query. Hence, it is mandatory choose one plan which obviously consumes
less cost.
At this stage, we choose one execution plan of the several we have developed. This Execution plan
accesses data from the database to give the final result.
In our example, the second plan may be good. In the first plan, we join two relations (costly
operation) then apply the condition (conditions are considered as filters) on the joined relation.
This consumes more time as well as space.
In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the
Employee table. This join may need to compare less number of records. Hence, the second plan is
the best (with the information known, not always).
Output:
The final result is shown to the user.
2.2 Query Decomposition

The query decomposition is the first phase of query processing whose aims are to transform a
high-level query into a relational algebra query and to check whether that query is syntactically
and semantically correct. Thus, a query decomposition phase starts with a high-level query and
transforms into a query graph of low-level operations (algebraic expressions), which satisfies the
query. In practice, SQL (a relational calculus query) is used as high-level query language, which
is used in most commercial RDBMSs. The SQL is then decomposed into query blocks (low-level
operations), which form the basic units. The query decomposer goes through five stages of
processing for decomposition into low-level operation and to accomplish the translation into
algebraic expressions. Fig. 2.2 shows the five stages of query decomposer. The five stages of
query decomposition are:

3 | P a g e Advanced database pre by Jilo D


Fig. 2.2 Stages of query decomposer

 Query analysis.
 Query normalization.
 Semantic analysis.
 Query simplifier.
 Query restructuring.
2.2.1 Query Analysis

During the query analysis phase, the query is lexically and syntactically analyzed using the
programming language compilers (parsers) in the same way as conventional programming to
find out any syntax errors. A syntactically legal query is then validated, using the system
catalogues, to ensure that all database objects (relations and attributes) referred to by the query
are defined in the database. It is also verified whether relationships of the attributes and relations
mentioned in the query are correct as per the system catalogue. The type specification of the
query qualifiers and result is also checked at this stage.

2.2.1.1 Query tree notation

At the end of query analysis phase, the high-level query (SQL) is transformed into some internal
representation that is more suitable for processing. This internal representation is typically a kind
of query tree. A query tree is a tree data structure that corresponds to a relational algebra
expression. A query tree is also called as relational algebra tree. The query tree has the following
components:
4 | P a g e Advanced database pre by Jilo D
 Leaf nodes of the tree, representing the base input relations of the query.
 Internal (non-leaf) nodes of the tree, representing an intermediate relation which is the
result of applying an operation in the algebra.
 Root of the tree, representing the result of the query.
 The sequence of operations (or data flow) is directed from leaves to the root.

The query tree is executed by executing an internal node operation wherever its operands are
available. The internal node is then replaced by the relation that results from executing the
operation. The execution terminates when the root node is executed and produces the result
relation for the query.

Let us consider a SQL query in which it is required to list the project number (PROJ-NO.), the
controlling department number (DEPT-NO.), and the department manager’s name (MGR-
NAME), address (MGR-ADD) and date of birth (MGR-DOB) for every project located in
‘Mumbai’. The SQL query can be written as follows:

SELECT (P.PROJ-NO, P.DEPT-NO, E.NAME, E. MGR-ADD, E. MGR-DOB)

FROM
PROJECT
AS P, DEPARTMENT AS D, EMPLOYEE AS E

WHERE P.DEPT-NO = D.D-NUM AND D.MGR-ID = E.EMP-ID AND


P.PROJ-LOCATION = ‘Mumbai’

In the above SQL query, the join condition DEPT-NO = D-NUM relates a project to its
controlling department, whereas the join condition MGR-ID = EMP-ID relates the controlling
department to the employee who manages that department. The equivalent relational algebra
expression for the above SQL query can be written as:

Mumbai-PROJECT ← σPROJ-LOCATION=Mumbai (PROJECT)

CONTROL-DEPT ← (Mumbai-PROJECT ⋈DEPT-NO=D-NUM (DEPARTMENT)

PROJ-DEPT-MGR ← (CONTROL-DEPT ⋈MGR-ID=EMP-ID (EMPLOYEE)

FINAL-RESULT ← ∏PROJ-NO, DEPT-NO, NAME, ADD, DOB PROJ-DEPT-MGR)

Or

5 | P a g e Advanced database pre by Jilo D


∏PROJ-NO, DEPT-NO, EMP-NAME, EMP-ADD, DOB ‘(σ PROJ-LOCATION = ‘Mumbai’ (PROJECT))’ ⋈DEPT-NO=D-
NUM ‘(DEPARTMENT)’ ⋈MGR-ID=EMP-ID ‘(EMPLOYEE)’

Fig. 2.3 shows an example of a query tree for the above SQL statement and relational algebra
expression. This type of query tree is also referred as relational algebra tree.

Fig. 2.3. Query tree representation

(a) Query tree corresponding to the relational algebra expressions

(b) Initial query tree for SQL query

6 | P a g e Advanced database pre by Jilo D


2.2.1.2 Query graph notation

Query graph is sometimes also used for representation of a query, as shown in Fig. 2.4. In query
graph representation, the relations (PROJECT, DEPARTMENT and EMPLOYEE in our
example) in the query are represented by relation nodes. These relation nodes are displayed as
single circle. The constant values from the query selection (project location =‘Mumbai’ in our
example) are represented by constant nodes, displayed as double circles. The selection and join
conditions are represented by the graph edges, for example, P.DEPT-NO = D.DEPT-NUM and
D.MGR-ID=E.EMP-ID, as shown in Fig. 2.4. Finally, the attributes to be retrieved from each
relation are displayed in square brackets above each relation, for example [P.PROJ-NUM,
P.DEPT-NO] and [E.EMP-NAME, E.EMP-ADD, E.EMP-DOB], as shown in Fig. 2.4. A query
graph representation corresponds to a relation calculus expression.

Fig. 2.4. Query graph notation

7 | P a g e Advanced database pre by Jilo D


The disadvantages of a query graph are that it does not indicate an order on which operation to
perform first, as is the case with query tree. Therefore, a query tree representation is preferred
over the query graph in practice. There is only one graph corresponding to each query. Query
tree and query graph notations are used as the basis for the data structures that are used for
internal representation of queries.

2.2.2 Query Normalization

The primary goal of normalization phase is to avoid redundancy. The normalization phase
converts the query into a normalized form that can be more easily manipulated. In the
normalization phase, a set of equivalency rules is applied so that the projection and selection
operations included in the query are simplified to avoid redundancy.

 The transformation of the quantifier-free predicate is straightforward using the well-


known equivalence rules for logical operations (^, v, and ┐):

1. p1 ^ p2 ⇔ p2 ^ p1

2. p1 v p2 ⇔ p2 v p1

3. p1 ^(p2 ^ p3) ⇔(p1 ^ p2)^ p3

4. p1 v(p2 v p3) ⇔(p1 v p2)v p3

5. p1 ^(p2 v p3) ⇔(p1 ^ p2)v(p1 ^ p3)

6. p1 v(p2 ^ p3) ⇔(p1 v p2)^(p1 v p3)

7. ┐(p1 ^ p2) ⇔ ┐ p1 v ┐ p2

8. ┐(p1 v p2) ⇔ ┐ p1 ^ ┐ p2

9. ┐(┐ p) ⇔ p

By applying these equivalency rules, the normalization phase rewrites the query into a normal
form which can be readily manipulated in later steps. The predicate is converted into one of the
following two normal forms:

 Conjunctive normal form.


 Disjunctive normal form.

8 | P a g e Advanced database pre by Jilo D


Conjunctive normal form is a sequence of conjuncts that are connected with the ‘AND’ (‘∧’)
operator. Each conjunct contains one or more terms connected by the ‘OR’ (‘∨’) operator. A
conjunctive selection contains only those tuples that satisfy all conjuncts. An example of
conjunctive normal form can be given as:

(EMP-DESIG=‘Programmer’ ∨ EMP-SALARY > 40000) ∧ LOCATION=‘Mumbai’

Disjunctive normal form is a sequence of disjunct that are connected with the ‘OR’ (‘∨’)
operator. Each disjunct contains one or more terms connected by the ‘AND’ (‘∧’) operator. A
disjunctive selection contains those tuples formed by the union of all tuples that satisfy the
disjunct. An example of disjunctive normal form can be given as:

(EMP-DESIG=‘Programmer’ ∧ LOCATION=‘Mumbai’) ∨ (EMP-SALARY > 40000∧


LOCATION=‘Mumbai’)

2.2.3 Semantic Analysis

The objective of semantic analyzer phase of query processing is to reduce the number of
predicates that must be evaluated by refuting incorrect or contradictory queries or qualifications.
The semantic analyzer rejects the normalized queries that are incorrectly formulated or
contradictory. A query is incorrectly formulated if components do not contribute to the
generation of the result. This happens in case of missing join specification. A query is
contradictory if its predicate cannot satisfy by any tuple in the relation. The semantic analyzer
examines the relational calculus query (SQL) to make sure it contains only data objects (that is,
tables, columns, views, indexes) that are defined in the database catalogue. It makes sure that
each object in the query is referenced correctly according to its data type.

In case of missing join specifications the components do not contribute to the generation of the
results, and thus, a query may be incorrectly formulated. A query is contradictory if its predicate
cannot be satisfied by any tuple. For example, let us consider the following query:

(EMP-DESIG=‘Programmer’∧ EMP-DESIG=‘Analyst’)

As an employee cannot be both ‘Programmer’ and ‘Analyst’ simultaneously, the above predicate
on the EMPLOYEE relation is contradictory.

9 | P a g e Advanced database pre by Jilo D


Example of Correctness and Contradiction

Let us consider the following SQL query:

SELECT (P.PROJ-NO, P.PROJ-LOCATION)

FROM PROJECT AS P, VIEWING AS V, DEPARTMENT AS D

WHERE D.DEPT-ID = V.DEPT-ID AND D.MAX-BUDGET > = 85000 AND


D.COMPLETION YEAR = ‘2005’ AND P.PROJ-MGR = ‘Mathew’;

That means, query is not correctly formulated. In this graph, the join condition (V.PROJ-NO =
P.PROJ-NO) has been omitted.

Now let us consider the SQL query given as:

SELECT (P.PROJ-NO, P.PROJ-LOCATION)

FROM PROJECT AS P, COST_OF_PROJECT AS C, DEPARTMENT AS D

WHERE D.MAX-BUDGET > 85000 AND D.DEPT-ID = V.DEPT-ID AND


V.PROJ-NO = P.PROJ-NO AND D.COMPL-YEAR = ‘2005’ AND
D.MAX-BUDGET < 50000;

This graph has a cycle between the nodes D.MAX-BUDGET and 0 with a negative valuation
sum. Thus, it indicates that the query is contradictory. Clearly, we cannot have a department with
a maximum budget that is both greater than INR 85,000 and less than INR 50000.
2.2.4 Query Simplifier

The objectives of a query simplifier are to detect redundant qualification, eliminate common sub-
expressions and transform sub-graphs (query) to semantically equivalent but more easily and
efficiently computed forms. Commonly integrity constraints, view definitions and access
restrictions are introduced into the graph at this stage of analysis so that the query can be
simplified as much as possible. Integrity constraints define constants which must hold for all
states of the database, so any query that contradicts an integrity constraint must be void and can
be rejected without accessing the database. If the user does not have the appropriate access to all
the components of the query, the query must be rejected. Queries expressed in terms of views
can be simplified by substituting the view definition, since this will avoid having to materialize

10 | P a g e Advanced database pre by Jilo D


the view before evaluating the query predicate on it. A query that violates an access restriction
cannot have an answer returned to the user, so can be answered without accessing the database.
The final form of simplification is obtained by applying the idempotence rules of Boolean
algebra, as shown in Table 2.1

Table 2.1. Idempotence rules of Boolean algebra

Rule Description Rule Format

1. PRED AND PRED ↔ PRED (P ∧ (P) ≡ P)

2. PRED AND TRUE ↔ PRED (P ∧ TRUE ≡ P)

3. PRED AND FALSE ↔ FALSE (P ∧ FALSE ≡ FALSE)

4. PRED AND NOT (PRED) ↔ FALSE (P ∧ (~P) ≡ FALSE)

5. PRED1 AND (PRED1 OR PRED2) ↔ (P1 ∧ (P1 ∨ P2) ≡ P1)


PRED1

6. PRED OR PRED ↔ PRED (P ∨ (P) ≡ P)

7. PRED OR TRUE ↔ TRUE (P ∨ TRUE ≡ TRUE)

8. PRED OR FALSE ↔ PRED (P ∨ FALSE ≡ P)

9. PRED OR NOT (PRED) ↔ TRUE (P ∨ (~P) ≡ TRUE)

10. PRED1 OR (PRED1 AND PRED2) ↔ (P1 ∨ (P1 ∧ P2) ≡ P1)


PRED1

11 | P a g e Advanced database pre by Jilo D


Example of using idempotence rules

Let us consider the following query:

SELECT D.DEPT-ID, M.BRANCH-MGR, M.BRANCH-ID,


B.BRANCH-ID, B.BRANCH-LOCATION,
E.EMP-NAME, E.EMP-SALARY

FROM DEPARTMENT AS D, MANAGER AS M, BRANCH AS B

WHERE D.DEPT-ID =M.DEPT-ID

AND M.BRANCH-ID = B.BRANCH-ID

AND M.BRANCH-MGR = E.EMP-ID

AND B.BRANCH-LOCATION = ‘Mumbai’

AND NOT (B.BRANCH-LOCATION = ‘Mumbai’

AND B.BRANCH-LOCATION = ‘Delhi’)

AND B.PROFITS-TO-DATE > 100,00,000.00

AND E.EMP-SALARY > 85,000.00

AND NOT (B.BRANCH-LOCATION = ‘Delhi’)

AND D.DEPT-LOCATION = ‘Bangalore’

12 | P a g e Advanced database pre by Jilo D


Let us examine the following part of the above query statement in greater detail:

AND B.BRANCH-LOCATION = ‘Mumbai’

AND NOT (B.BRANCH-LOCATION = ‘Mumbai’

AND B.BRANCH-LOCATION = ‘Delhi’)

In the above query statement, let us equate as follows:

B.BRANCH-LOCATION = ‘Mumbai’ = PRED1

B.BRANCH-LOCATION = ‘Mumbai’ = PRED2

B.BRANCH-LOCATION = ‘Delhi’ = PRED3

Now, the above part of the query can be represented in the form of idempotence rules of Boolean
algebra as follows:

PRED1 AND NOT (PRED2 AND PRED3) = P1 ∧ ~(P2 ∧ P3)

By applying equivalency rule

(PRED1 AND (NOT (PRED1)) AND NOT (PRED3) = (P1 ∧ (~P1)) AND ~ (P3)

The query normalizer now applies rule 4 of idempotency rules (Table 2.1) of query simplifier
phase and obtains the following form:

FALSE AND NOT (PRED3) = FALSE ∧ ~ (P3)

The above form is equivalent to NOT (PRED3) or ~ (P3).

Thus, in the above example, the original query contained many redundant predicates, which were
eliminated without changing the semantics of the query.

13 | P a g e Advanced database pre by Jilo D


2.2.5 Query Restructuring

In the final stage of query decomposition, the query can be restructured to give a more efficient
implementation. Transformation rules are used to convert one relational algebra expression into
an equivalent form that is more efficient. The query can now be regarded as a relational algebra
program, consisting of a series of operations on relations.

2.3 Query Optimization

The primary goal of query optimizer is of choosing an efficient execution strategy for processing
a query. The query optimizer attempts to minimize the use of certain resources (mainly the
number of I/Os and CPU time) by choosing the best of a set of alternative query access plans.
Query optimization starts during the validation phase by the system to validate whether the user
has appropriate privileges. Existing statistics for the tables and columns are located, such as how
many rows (tuples) exist in the table and relevant indexes are found with their own applicable
statistics. Now an access plan is generated to perform the query. The access plan is then put into
effect with the execution plan of generated during query processing phase, wherein the indexes
and tables are accessed and the answer to the query is derived from the data.

Fig. 2.5 shows a detailed block diagram of query optimizer. Following four main inputs are used
in the query optimizer module:

 Relational algebra query trees generated by the query simplifier module of query
decomposer.
 Estimation formulas used to determine the cardinality of the intermediate result tables.
 A cost model.
 Statistical data from the database catalogue.
Fig. 2.5. Detailed block diagram of query optimizer

14 | P a g e Advanced database pre by Jilo D


The output of the query optimizer is the execution plan in form of optimized relational algebra
query. A query typically has many possible execution strategies, and the process of choosing a
suitable one for processing a query is known as query optimization. The basic issues in query
optimization are:

 How to use available indexes.


 How to use memory to accumulate information and perform intermediate steps such as
sorting.
 How to determine the order in which joins should be performed.

The term query optimization does not mean giving always an optimal (best) strategy as the
execution plan. It is just a reasonably efficient strategy for execution of the query. The
decomposed query blocks of SQL is translated into an equivalent extended relational algebra
expression (or operators) and then optimized. There are two main techniques for implementing
query optimization.

1 heuristic query optimization

2 Transformation rule

2.3.1 Heuristic query optimization


Heuristic rules are used as an optimization technique to modify the internal representation of a
query. Now, the heuristic query optimizer transforms the initial (canonical) query tree into a final
query tree using equivalence transformation rules. These final query trees are efficient to
execute.

15 | P a g e Advanced database pre by Jilo D


Let us consider the following relations of a company database:

EMPLOYEE (EMP-NAME, EMP-ID, BIRTH-DATE,


EMP-ADDRESS, SEX, EMP-SALARY,
EMP-DEPT-NO)

DEPARTMENT (DEPT-NAME, DEPT-NO, DEPT-MGR-ID,


DEPT-MGR-START-DATE)

DEPT_LOCATION (DEPT-NO, DEPT-LOCATION)

PROJECT (PROJ-NAME, PROJ-NO, PROJ-LOCATION, PROJ-DEPT-NO

WORKS_ON (E-ID, P-NO, HOURS)

DEPENDENT (E-ID, DEPENDENT-NAME, SEX, BIRTH-DATE, RELATION)

Now, let us consider a query in the above database to find the names of employees born after
1970 whop work on a project named ‘Growth’. This SQL query can be written as follows:

SELECT EMP-NAME

FROM EMPLOYEE, WORKS_ON, PROJECT

WHERE PROJ-NAME = ‘Growth’ AND PROJ-NO = P-NO AND

E-ID = EMP-ID AND BIRTH-DATE = ‘31-12-1970’;

Fig. 2.6 shows the improved query tree for the above SQL query. It can be observed that by
executing this initial query tree directly creates a very large file containing the CARTESIAN
PRODUCT (×) of the entire EMPLOYEE, WORKS_ON and PROJECT files. But, the query
needed only one tuple (record) from the PROJECT relation for the ‘Growth’ project and only the
EMPLOYEE records for those whose date of birth is after ‘31-12-1970’.

16 | P a g e Advanced database pre by Jilo D


Fig. 2.6. Steps in converting query tree during heuristic optimization

Improved query tree by moving PROJECT operations down the query

The improvement in the query tree can be achieved by keeping only the attributes needed by the
subsequent operations in the intermediate relations, by including PROJECT (∏) operations in the
query tree, as shown in Fig. 2.6. This reduces the attributes (columns or fields) of the
intermediate relations, whereas the SELECT operations reduce the number of tuples (rows or
records).

To summarize, we can conclude from the preceding example that a query tree can
be transformed step by step into another more efficient executable query tree. But,
one must ensure that the transformation steps always lead to an equivalent query
tree and the desired output is achieved.

17 | P a g e Advanced database pre by Jilo D


2.3.2 Transformation Rule

Transformation rules are used by the query optimizer to transform one relational algebra
expression into an equivalent expression that is more efficient to execute. A relation is
considered as equivalent of another relation if two relations have the same set of attributes in a
different order but representing the same information. These transformation rules are used to
restructure the initial (canonical) relational algebra query tree generated during query
decomposition. Let us consider three relations R, S and T, with R defined over the attributes A =
{A1, A2,........, An} and S defined over B = {B1, B2,........, Bn}. c = {c1, c2,........, cn}, denote
predicates and L, L1, L2, M, M1, M1, N denote sets of attributes.

Rule 1: Cascading of Selection (σ)

σc1.AND.c2.AND...AND.Cn (R) ≡ σC1 (σc2 (...(σcn (R))...))

Or

σc1∧c2.∧...∧cn (R) ≡ σC1 (σc2 (...(σcn (R))...))

Example:

σBRANCH-LOCATION=‘Mumbai’∧EMP-SALARY> 85000(EMPLOYEE) ≡ σBRANCH-LOCATION=‘Mumbai’ (σEMP-


SALARY> 85000 (EMPLOYEE))

Rule 2: Commutativity of Selection (σ)

σc1 (σc2 (R)) ≡ σc2 (σc1 (R))

Example:

σBRANCH-LOCATION=‘Mumbai’ (σEMP-SALARY > 85000)(EMPLOYEE) ≡ (σEMP-SALARY > 85000= (σBRANCH-


LOCATION=‘Mumbai’ (EMPLOYEE)

Rule 3: Cascading of Projection (∏)

∏L ∏m..........∏N(R) ≡ ∏L

Example:

∏EMP-NAME ∏BRANCH-LOCATION,EMP-NAME (EMPLOYEE) ≡ ∏EMP-NAME (EMPLOYEE)

Rule 4: Commutativity of Selection (σ) and Projection (∏)

∏A1,A2..........,An (σc(R)) ≡ (σc(∏A1,A2,..........An (R))

Example:

18 | P a g e Advanced database pre by Jilo D


∏EMP-NAME, EMP-DOB(σEMP-NAME=‘Thomas’ (EMPLOYEE) ≡ σEMP-NAME=‘Thomas’ (∏EMP-NAME,EMP-
DOB (EMPLOYEE)

Rule 5: Commutatively of Join (⋈) and Cartesian product (×)

R⋈c S ≡ S⋈c R

R×S≡S×R

Example:

EMPLOYEE ⋈EMPLYEE.BRANCH-NO=BRANCH.BRANCH-NO (BRANCH) ≡ STAFF⋈EMPLOYEE.BRANCH-


NO=BRANCH.BRANCH-NO (EMPLOYEE)

Rule 6: Commutativity of Selection (σ) and Join (⋈) or Cartesian product (×)

σc R⋈S ≡ (σc ⋈ S

σc (R × S) ≡ (σc (R)) × S

Alternatively, if the selection predicate is a conjunctive predicate of the form (c1 AND c2,
or c1 ∧ c2) condition c1 involves only the attributes of R and condition c2 involves only the
attributes of S, the selection and join operations commute as follows:

σc1∧c2 (R ⋈ S) ≡ (σc1 (R)) ⋈ σc2 (S))

σc1∧c2 (R × S) ≡ (σc1 (R)) × σc2 (S))

Example:

σEMP-TITLE = ‘Manager’ ∧ CITY=‘Mumbai’ (EMPLOYEE)⋈EMPLOYEE.BRANCH-NO=BRANCH.BRANCH-


NO (BRANCH) ≡ σEMP-TITLE=‘Manager’ (EMPLOYEE)⋈EMPLOYEE.BRANCH-NO=BRANCH.BRANCH-
NO (σCITY=‘Mumbai’ (BRANCH)

Rule 7: Commutativity of Projection (∏) and Join (⋈) or Cartesian product (×)

∏L1∪L2 (R⋈c S) ≡ ∏L1 (R)) ⋈c ∏L2 (S))

Example:

19 | P a g e Advanced database pre by Jilo D


∏EMP-TITLE, CITY, BRANCH-NO (EMPLOYEE)⋈EMPLOYEE.BRANCH-NO=BRANCH.BRANCH-NO (BRANCH) ≡
(∏EMP-TITLE, BRANCH-NO (EMPLOYEE))⋈EMPLOYEE.BRANCH-NO=BRANCH.BRANCH-NO (∏CITY, BRANCH-
NO (BRANCH))

If the join condition c contains additional attributes not in L (say


attributes M = M1 ∪ M2 where M1 involves only attributes of R, and M2 involves only attributes
of S, then these must be added to the projection list and a final projection (P) operation is needed
as follows:

∏L1∪L2 (R⋈c S) ≡ ∏L1 ∪ L1 (∏L1∪M1 (R))⋈c ∏L2 ∪M2 (S))

Example:

∏EMP-TITLE, CITY (EMPLOYEE)⋈EMPLOYEE.BRANCH-NO=BRANCH.BRANCH-NO (BRANCH) ≡ (∏EMP-


TITLE, CITY (∏EMP-TITLE, BRANCH-NO.(EMPLOYEE) ⋈EMPLOYEE.BRANCH-NO=BRANCH.BRANCH-NO (∏CITY,
BRANCH-NO. (BRANCH))

Rule 8: Commutativity of Union (∪) and Intersection (∩)

R∪S≡S∪R

R∩S≡S∩R

Rule 9: Commutativity of Selection (σ) and set of operations such as Union (∪), Intersection (∩) and set
difference (–)

σc(R ∪ S) ≡ σc (S) ∪ σc (R)

σc(R ∩ S) ≡ σc (S) ∩ σc (R)

σc(R – S) ≡ σc (S) – σc (R)

If θ stands for any of the set of operations such as Union (∪), Intersection (∩) or set difference (–
), then the above expression can be written as:

σc (R θ S) ≡ (σc (R)) θ (σc (S))

Rule 10: Commutativity of Projection (∏) and Union (∪)

∏L (R ∪ S) ≡ (∏L (R)) ∪ (∏L S))

Rule 11: Associativity of Join (⋈) and Cartesian product (×)

(R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)

20 | P a g e Advanced database pre by Jilo D


(R × S) × T ≡ R × (S × T)

If the join condition c involves only attributes from the relation S and T, then join is associative
in the following manner:

(R ⋈c1 S) ⋈ c2/c3 T ≡ R ⋈ c1∧c3 (S ⋈c2 T)

If θ stands for any of the set of operations such as Join (⋈), Union (∪), Intersection (∩) or
Cartesian product (×), then the above expression can be written as:

(R θ S) θ T ≡ R θ (S θ T)

Rule 12: Associativity of Union (∪) and Intersection (∩)

(R ∪ S) ∪ T ≡ S ∪ (R ∪ T)

(R ∩ S) ∩ T ≡ S ∩ (R ∩ T)

Rule 13: Converting a Selection and Cartesian Product (σ, ×) sequence into Join (⋈)

σc (R × S) ≡ (R ⋈c S)

2.4 Pipelining
When a query is composed of several relational algebra operators, the result of one operator is
sometimes pipelined to another operator without creating a temporary relation to hold the
intermediate result. When the input relation to a unary operation (for example, selection or
projection) is pipelined into it, it is sometimes said that the operation is applied on-the-
fly. Pipelining (or on-the-fly processing) is sometimes used to improve the performance of the
queries. As we know that the results of intermediate algebra operations are stored on the
secondary storage or disk, which are temporarily written. If the output of an operator operation is
saved in a temporary relation for processing by the next operator, it is said that the tuples
are materialized. Thus, this process of temporarily writing intermediate algebra operations is
called materialization.

21 | P a g e Advanced database pre by Jilo D

You might also like