Advanced Database Systems Chapter 2

This chapter discusses query processing and optimization concepts. It describes the typical phases of query processing as decomposition, optimization, code generation, and execution. Query decomposition involves parsing, validation, normalization, semantic analysis, simplification, and restructuring. The goal of query optimization is to choose the most efficient execution plan by considering different strategies and estimating their costs. Traditional techniques work well for standard databases but new approaches are needed for complex, distributed data environments.

Uploaded by

Jundu Omer

CHAPTER TWO

2. Query Processing and Optimization

Learning Objectives: This chapter discusses query processing and query optimization concepts and
the algorithms used to implement them. After completing this chapter, the learner should be familiar
with the following concepts:
• Query processing
• Query processing steps
• Query optimization
• Query optimizer approaches
• Transformation rules
• Cost estimation approach for queries
• Pipelining
2.1. Overview of Query Processing and Optimization
Query processing: The activities involved in retrieving data from the database are called query
processing. These activities include parsing, validating, optimizing, and executing a query. The
aim of query processing is to transform a query written in a high-level language (typically SQL)
into a correct and efficient execution strategy expressed in a low-level language (one that
implements the relational algebra), and to execute that strategy to retrieve the required data. An
important aspect of query processing is query optimization.
Query optimization: The activity of choosing an efficient execution strategy for processing a
query is called query optimization. Because many equivalent strategies usually exist for the same
query, the aim is to choose the one that minimizes resource usage. A DBMS uses different
techniques to process, optimize, and execute high-level queries (SQL). A query expressed in a
high-level query language must first be scanned, parsed, and validated.
The scanner identifies the language components (tokens) in the text of the query, while the parser
checks the correctness of the query syntax. The query is also validated, by consulting the system
catalog, to confirm that the attribute names and relation names it uses are valid. An internal
representation of the query (a tree or graph) is then created and passed to the query optimizer,
which is responsible for identifying an efficient plan. The optimizer generates alternative plans
and chooses the plan with the least estimated cost.
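The scanning and validation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token classes, the `scan` and `validate` functions, and the one-relation `CATALOG` are hypothetical names invented for this sketch, and a real DBMS scanner and catalog lookup are far more elaborate.

```python
import re

# Hypothetical token classes for a tiny SQL subset (assumption, not a full SQL lexer).
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:SELECT|FROM|WHERE|AND|OR)\b"),
    ("NUMBER", r"\d+"),
    ("STRING", r"'[^']*'"),
    ("IDENT", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OP", r"<=|>=|<>|[=<>*,;().]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.IGNORECASE
)

# Hypothetical system catalog: relation name -> set of attribute names.
CATALOG = {"Staff": {"staffNo", "fName", "lName", "position", "salary", "branchNo"}}


def scan(query):
    """Scanner: split the query text into (token_class, text) pairs."""
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected character {query[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":  # whitespace is not a token
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def validate(tokens, catalog):
    """Validation: return identifiers that are neither relations nor attributes."""
    idents = [text for kind, text in tokens if kind == "IDENT"]
    relations = [name for name in idents if name in catalog]
    known_attrs = set().union(*(catalog[r] for r in relations))
    return [n for n in idents if n not in catalog and n not in known_attrs]


tokens = scan("SELECT fName, salary FROM Staff WHERE salary > 20000")
unknown = validate(tokens, CATALOG)
```

Here `unknown` is empty because every identifier is found in the sample catalog; a query that names an attribute missing from the catalog would be reported instead of being passed on to the optimizer.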

2.2. Query Processing
The aim of query processing is to find information in one or more databases and deliver it to the
user quickly and efficiently. Traditional techniques work well for databases with standard, single-
site relational structures, but databases containing more complex and diverse types of data demand
new query processing and optimization techniques.

2.2.1. Query Processing Phases


Query processing can be divided into four main phases: decomposition (consisting of parsing and
validation), optimization, code generation, and execution, as illustrated in Figure 2.1.

[Figure 2-1 is a flow diagram; its stages, with their inputs and outputs, are listed in order.]
Query in high-level language (SQL)
-> Query decomposition (uses the database catalog)
-> Relational algebra expression
-> Query optimization (uses database statistics)
-> Execution plan
-> Code generation
-> Generated code
-> Runtime query execution (uses the main database)
-> Query output

Figure 2-1: Typical phases when processing a high-level query.


Basic Steps in Query Processing:

Step 1. Parsing and translation: The system checks the syntax of the query.
• The parser checks the syntax and verifies that the relations named in the query exist.
• It creates a parse-tree representation of the query.
• It translates the query into a relational-algebra expression.
Step 2. Optimization: Finding the cheapest evaluation plan for the query.
• Each relational-algebra operation can be executed by one of several different algorithms.
• The query optimizer must be able to estimate the cost of each operation.
Step 3. Evaluation: The query-execution engine takes a query-evaluation plan, executes that plan,
and returns the answers to the query.

2.2.1.1. Query Decomposition


Query decomposition is the first phase of query processing. Its aims are to transform a high-level
query into a relational algebra query and to check that the query is syntactically and
semantically correct. This phase covers the parsing and validation of the query. The typical
stages of query decomposition are analysis, normalization, semantic analysis, simplification, and
query restructuring:
1. Analysis: The query is lexically and syntactically analyzed to check that it is correctly
formed. At the end of this stage, the high-level query has been transformed into an internal
representation that is more suitable for processing. The internal form that is typically
chosen is some kind of query tree, which is constructed as follows:
• A leaf node is created for each base relation in the query.
• A non-leaf node is created for each intermediate relation produced by a relational algebra
operation.
• The root of the tree represents the result of the query.
• The sequence of operations is directed from the leaves to the root.

Figure 2-2: Example relational algebra tree.


2. Normalization: The normalization stage of query processing converts the query into a
normalized form that can be more easily manipulated. The predicate in the WHERE clause is
converted into conjunctive normal form or disjunctive normal form.
• Conjunctive normal form: A sequence of conjuncts that are connected with the ∧ (AND)
operator. Each conjunct contains one or more terms connected by the ∨ (OR) operator. For
example: (position = ‘Manager’ ∨ salary > 20000) ∧ branchNo = ‘B003’. A conjunctive
selection contains only those tuples that satisfy all conjuncts.
• Disjunctive normal form: A sequence of disjuncts that are connected with the ∨ (OR)
operator. Each disjunct contains one or more terms connected by the ∧ (AND) operator. For
example, we could rewrite the above conjunctive normal form as: (position = ‘Manager’ ∧
branchNo = ‘B003’) ∨ (salary > 20000 ∧ branchNo = ‘B003’). A disjunctive selection contains
those tuples formed by the union of all tuples that satisfy the disjuncts.
3. Semantic Analysis: The objective of semantic analysis is to reject normalized queries that are
incorrectly formulated or contradictory. A query is incorrectly formulated if components do
not contribute to the generation of the result, which may happen if some join specifications are
missing. A query is contradictory if its predicate cannot be satisfied by any tuple. For example,
the predicate (position = ‘Manager’ ∧ position = ‘Assistant’) on the Staff relation is
contradictory, as a member of staff cannot be both a Manager and an Assistant simultaneously.
However, the predicate ((position = ‘Manager’ ∧ position = ‘Assistant’) ∨ salary > 20000)
could be simplified to (salary > 20000) by interpreting the contradictory clause as the boolean
value FALSE. Unfortunately, the handling of contradictory clauses is not consistent between
DBMSs. Two graph-based checks used to detect these problems are:
• Construct a relation connection graph: Create a node for each relation and a node for the
result, an edge between two nodes for each join, and an edge to the result node from each
relation that is the source of a projection operation. If the graph is not connected, the
query is incorrectly formulated.
• Construct a normalized attribute connection graph: Create a node for each attribute reference
and a node for the constant 0, a directed edge between attribute nodes for each join, and a
directed edge between an attribute node and the constant 0 node for each selection operation,
with each edge weighted (valued) according to its predicate. If the graph has a cycle for
which the valuation sum is negative, the query is contradictory.
4. Simplification: The objectives of the simplification stage are to detect redundant
qualifications, eliminate common subexpressions, and transform the query to a semantically
equivalent but more easily and efficiently computed form. Typically, access restrictions, view
definitions, and integrity constraints are considered at this stage. If the user does not have the
appropriate access to all the components of the query, the query must be rejected. For example,
a view definition such as the following is taken into account at this stage when a query refers
to the view:
CREATE VIEW Staff3 AS SELECT staffNo, fName, lName, salary, branchNo
FROM Staff WHERE branchNo = 'B003' AND salary > 20000;
5. Query Restructuring: In the final stage of query decomposition, the query is restructured to
provide a more efficient implementation. More than one translation of a query is usually
possible, so transformation rules are used to convert it into an equivalent but more efficient
form.
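The two predicate rewrites used in the normalization and semantic analysis stages can be checked mechanically. The following Python sketch evaluates the conjunctive and disjunctive forms of the Staff predicate, and the contradictory clause together with its simplified form, on a small set of made-up tuples. The sample values and function names are assumptions chosen for illustration; agreement on sample tuples is a sanity check, not a proof of equivalence.

```python
from itertools import product


def cnf(t):
    # (position = 'Manager' OR salary > 20000) AND branchNo = 'B003'
    return (t["position"] == "Manager" or t["salary"] > 20000) and t["branchNo"] == "B003"


def dnf(t):
    # (position = 'Manager' AND branchNo = 'B003') OR (salary > 20000 AND branchNo = 'B003')
    return (t["position"] == "Manager" and t["branchNo"] == "B003") or (
        t["salary"] > 20000 and t["branchNo"] == "B003"
    )


def with_contradiction(t):
    # (position = 'Manager' AND position = 'Assistant') OR salary > 20000
    return (t["position"] == "Manager" and t["position"] == "Assistant") or t["salary"] > 20000


def simplified(t):
    # The contradictory clause is treated as FALSE, leaving only salary > 20000.
    return t["salary"] > 20000


# Hypothetical sample tuples covering every true/false combination of the three terms.
SAMPLE = [
    {"position": p, "salary": s, "branchNo": b}
    for p, s, b in product(["Manager", "Assistant"], [15000, 20000, 25000], ["B003", "B005"])
]


def equivalent(f, g, tuples):
    """True if both predicates select exactly the same tuples."""
    return all(f(t) == g(t) for t in tuples)


cnf_matches_dnf = equivalent(cnf, dnf, SAMPLE)
simplification_ok = equivalent(with_contradiction, simplified, SAMPLE)
```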
Most real-world data is not well structured. Today's databases typically contain much non-
structured data such as text, images, video, and audio, often distributed across computer networks.
In this complex environment, efficient and accurate query processing becomes quite challenging,
and it calls for specialized techniques not only in storage and query processing but also in
concurrency control, recovery, and other parts of the DBMS.
2.3. Query Optimization
Query optimization is the activity of choosing an efficient execution strategy for processing a
query. Users expect database performance to be as good as possible, and there is often a
requirement for a specific query, or an object built on a query, to run faster. The problem of
query optimization is to find the sequence of steps that produces the answer to a user request in
the most efficient manner, given the database structure. The performance of a query is affected by
the tables or queries that underlie it and by the complexity of the query itself. When data or
workload characteristics change:
• The best navigation strategy changes.
• The best way of organizing the data changes.
Query optimizers are one of the main means by which modern database systems achieve their
performance advantages. Given a request for data manipulation or retrieval, an optimizer chooses
a plan for evaluating the request from among the many alternative strategies. Because the choice
rests on estimated costs, in practice the optimizer aims for a good plan rather than a provably
optimal one. There are usually many ways (access paths) of reaching the desired file or record,
and the optimizer tries to select the most efficient (cheapest) one. The DBMS is responsible for
picking the best execution strategy based on various considerations. Query optimizers are among
the largest and most complex modules of a database system.
Most efficient processing: The least amount of I/O and CPU resources is used.
Selection of the best method: In a non-procedural language, the system does the optimization at
the time of execution. In a procedural language, on the other hand, programmers have some
flexibility in selecting the best method. To optimize the execution of a query, the programmer
must know:
• File organization.
• Record access mechanism and primary or secondary key.
• Data location on disk.
• Data access limitations.
To write correct code, application programmers need to know how the data is organized physically
(e.g., which indexes exist). To write efficient code, they also need to take data and workload
characteristics into account. For example, consider relations r(A, B) and s(C, D), and suppose we
require the Cartesian product r × s.
Method 1
a. Load the next record of r into RAM.
b. Load all records of s, one at a time, and concatenate each with the current record of r.
c. Have all records of r been concatenated?
NO: go to a.
YES: exit (the result is in RAM or on disk).
Performance: Too many accesses, because s is read once for every record of r.
Method 2: Improvement
a. Load as many blocks of r as possible, leaving room for one block of s.
b. Run through the s file completely, one block at a time, concatenating each record of s with
every record of r held in memory.
c. Repeat from a. until all blocks of r have been processed.
Performance: Reduces the number of times the blocks of s are loaded by a factor equal to the
number of r records that can fit in main memory.
Considerations during query optimization:
• Narrow down intermediate result sets quickly: perform SELECT before JOIN.
• Use access structures (indexes).
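The difference between Method 1 and Method 2 can be made concrete by counting how many times the file s is scanned. The Python sketch below is a simplified model, not a DBMS implementation: relations are in-memory lists of tuples, the function names are invented for this illustration, and "memory capacity" is expressed as a number of r records rather than blocks.

```python
def product_record_at_a_time(r, s):
    """Method 1: for each record of r, run through all of s."""
    result, s_scans = [], 0
    for r_rec in r:
        s_scans += 1  # s is read from start to end once per r record
        for s_rec in s:
            result.append(r_rec + s_rec)
    return result, s_scans


def product_chunked(r, s, r_records_in_memory):
    """Method 2: hold a chunk of r in memory and scan s once per chunk."""
    result, s_scans = [], 0
    for start in range(0, len(r), r_records_in_memory):
        chunk = r[start:start + r_records_in_memory]
        s_scans += 1  # one pass over s serves every r record in the chunk
        for s_rec in s:
            for r_rec in chunk:
                result.append(r_rec + s_rec)
    return result, s_scans


# Hypothetical sample data: r(A, B) with 100 records, s(C, D) with 7 records.
r = [(i, i * i) for i in range(100)]
s = [(j, -j) for j in range(7)]

out1, scans1 = product_record_at_a_time(r, s)
out2, scans2 = product_chunked(r, s, r_records_in_memory=25)
```

With these sample sizes, Method 1 scans s 100 times and Method 2 scans it 4 times, a reduction by a factor of 25, which is the number of r records assumed to fit in memory. Both methods produce the same 700 result tuples.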
2.3.1. Approaches to Query Optimization
2.3.1.1. Heuristics Approach
The heuristic approach to query optimization uses transformation rules to convert one
relational algebra expression into an equivalent form that is known to be more efficient. It
uses knowledge of the characteristics of the relational algebra operations and of the
relationships between the operators to optimize the query. Thus the heuristic approach makes
use of:
• Properties of individual operators.
• Associations between operators.
• Query tree: a graphical representation of the operators, relations, attributes, and predicates,
and of the processing sequence during query processing. A query tree is composed of three main
parts:
o The leaves: the base relations used for processing the query and extracting the required
information.
o The root: the final result (relation) produced by the operations on the relations used in
the query.
o The nodes: intermediate results or relations produced before the final result is reached.
Execution of the operations in a query tree starts at the leaves, continues through the
intermediate nodes, and ends at the root. The properties of each operation and the associations
between operators are analyzed using a set of rules called transformation rules. Applying the
transformation rules transforms the query into a relatively good execution strategy.
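The leaves-to-root execution order of a query tree can be illustrated with a small Python sketch. The `Node` class, the three sample Staff rows, and the recorded `trace` are all assumptions made for this example; a real query tree would carry relational algebra operators with many more options.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str
    op: Optional[Callable[..., list]] = None  # operation applied at a non-leaf node
    children: List["Node"] = field(default_factory=list)
    rows: Optional[list] = None  # base relation contents (leaves only)

    def evaluate(self, trace):
        """Evaluate children first, then this node, recording the order visited."""
        if not self.children:
            trace.append(self.label)
            return self.rows
        inputs = [child.evaluate(trace) for child in self.children]
        trace.append(self.label)
        return self.op(*inputs)


# Leaf: a base relation with hypothetical sample rows.
staff = Node("Staff", rows=[
    {"lName": "White", "branchNo": "B005"},
    {"lName": "Beech", "branchNo": "B003"},
    {"lName": "Ford", "branchNo": "B003"},
])
# Intermediate node: a selection over the leaf.
select = Node(
    "select branchNo='B003'",
    op=lambda rows: [r for r in rows if r["branchNo"] == "B003"],
    children=[staff],
)
# Root: a projection that produces the final result.
root = Node(
    "project lName",
    op=lambda rows: [{"lName": r["lName"]} for r in rows],
    children=[select],
)

trace = []
result = root.evaluate(trace)
```

After evaluation, `trace` lists the leaf first, then the intermediate selection, then the root projection, matching the order described above.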
2.3.2. Transformation Rules for the Relational Algebra Operations
By applying transformation rules, the optimizer can transform one relational algebra expression
into an equivalent expression that is known to be more efficient. Use these rules to restructure the
(canonical) relational algebra tree generated during query decomposition. In listing these rules, we
use three relations R, S, and T, with R defined over the attributes A ={A1, A2, . . . , An}, and S
defined over B ={B1, B2, . . . , Bn}; p, q, and r denote predicates, and L, L1, L2, M, M1, M2, and
N denote sets of attributes.
1. Conjunctive selection operations can cascade into individual selection operations (and vice
versa). This transformation is sometimes referred to as cascade of selection.
σ_{p ∧ q ∧ r}(R) = σ_p(σ_q(σ_r(R))) where p, q, and r are predicates
Example: σ_{branchNo=‘B003’ ∧ salary>15000}(Staff) = σ_{branchNo=‘B003’}(σ_{salary>15000}(Staff))
2. Commutativity of Selection operations.
σ_p(σ_q(R)) = σ_q(σ_p(R)) where p and q are predicates
Example: σ_{branchNo=‘B003’}(σ_{salary>15000}(Staff)) = σ_{salary>15000}(σ_{branchNo=‘B003’}(Staff))
3. In a sequence of Projection operations, only the last in the sequence is required. This is
also called cascade of projection:
Π_L(Π_M(...(Π_N(R)))) = Π_L(R)
Example: Π_{lName}(Π_{branchNo, lName}(Staff)) = Π_{lName}(Staff)
4. Commutativity of Selection and Projection. If the predicate p involves only the attributes in
the projection list, then the Selection and Projection operations commute:
Π_{A1, ..., Am}(σ_p(R)) = σ_p(Π_{A1, ..., Am}(R)) where p involves only attributes in {A1, A2, ..., Am}
Example: Π_{fName, lName}(σ_{lName=‘Beech’}(Staff)) = σ_{lName=‘Beech’}(Π_{fName, lName}(Staff))

5. Commutativity of Theta join and Cartesian product.
Theta join: R ⋈_p S = S ⋈_p R
Cartesian product: R × S = S × R
As the Equijoin and Natural join are special cases of the Theta join, this rule also applies
to these Join operations. For example, using the Equijoin of Staff and Branch:
Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch = Branch ⋈_{Staff.branchNo=Branch.branchNo} Staff
6. Commutativity of Selection and Theta join (or Cartesian product). If the selection predicate
involves only attributes of one of the relations being joined, then the Selection and Join (or
Cartesian product) operations commute:
σ_p(R ⋈_r S) = (σ_p(R)) ⋈_r S
σ_p(R × S) = (σ_p(R)) × S where p involves only attributes in {A1, A2, ..., An}
Alternatively, if the selection predicate is a conjunction p ∧ q, where p involves only
attributes of R and q involves only attributes of S, then:
σ_{p ∧ q}(R ⋈_r S) = (σ_p(R)) ⋈_r (σ_q(S))
Example: σ_{position=‘Manager’ ∧ city=‘London’}(Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch) =
(σ_{position=‘Manager’}(Staff)) ⋈_{Staff.branchNo=Branch.branchNo} (σ_{city=‘London’}(Branch))
7. Commutativity of Projection and Theta join (or Cartesian product).
a. If the projection list is of the form L = L1 ∪ L2, where L1 involves only attributes of R,
and L2 involves only attributes of S, then provided the join condition only contains
attributes of L, the Projection and Theta join operations commute as:
Π_{L1 ∪ L2}(R ⋈_r S) = (Π_{L1}(R)) ⋈_r (Π_{L2}(S))
Example: Π_{position, city, branchNo}(Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch) =
(Π_{position, branchNo}(Staff)) ⋈_{Staff.branchNo=Branch.branchNo} (Π_{city, branchNo}(Branch))
b. If the join condition contains additional attributes not in L, say attributes M = M1 ∪ M2
where M1 involves only attributes of R, and M2 involves only attributes of S, then a final
Projection operation is required:
Π_{L1 ∪ L2}(R ⋈_r S) = Π_{L1 ∪ L2}((Π_{L1 ∪ M1}(R)) ⋈_r (Π_{L2 ∪ M2}(S)))
Example: Π_{position, city}(Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch) =
Π_{position, city}((Π_{position, branchNo}(Staff)) ⋈_{Staff.branchNo=Branch.branchNo} (Π_{city, branchNo}(Branch)))


8. Commutativity of Union and Intersection (but not Set difference).
R ∪ S = S ∪ R and R ∩ S = S ∩ R
9. Commutativity of Selection and set operations (Union, Intersection, and Set difference).
σ_p(R ∪ S) = σ_p(R) ∪ σ_p(S)
σ_p(R ∩ S) = σ_p(R) ∩ σ_p(S)
σ_p(R − S) = σ_p(R) − σ_p(S)
Because Set difference is not commutative, the operands in the last equation must keep their
original order.
10. Commutativity of Projection and Union.
Π_L(R ∪ S) = Π_L(R) ∪ Π_L(S)
11. Associativity of Theta join (and Cartesian product). Cartesian product and Natural join are
always associative:
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R × S) × T = R × (S × T)
If the join condition q involves only attributes from the relations S and T, then Theta join is
associative in the following manner:
(R ⋈_p S) ⋈_{q ∧ r} T = R ⋈_{p ∧ r} (S ⋈_q T)
Example: (Staff ⋈_{Staff.staffNo=PropertyForRent.staffNo} PropertyForRent) ⋈_{ownerNo=Owner.ownerNo ∧ Staff.lName=Owner.lName} Owner =
Staff ⋈_{Staff.staffNo=PropertyForRent.staffNo ∧ Staff.lName=lName} (PropertyForRent ⋈_{ownerNo} Owner)
12. Associativity of Union and Intersection (but not Set difference).
(R ∪ S) ∪ T = R ∪ (S ∪ T)
(R ∩ S) ∩ T = R ∩ (S ∩ T)
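A few of the transformation rules can be sanity-checked on toy data. The Python sketch below models relations as lists of dictionaries and checks Rule 1 (cascade of selection), the conjunctive form of Rule 6 (pushing selections below a join), and the Set difference case of Rule 9. The relation contents, the attribute name `bNo`, and the helper functions are assumptions made for this illustration; agreement on one data set illustrates the rules but does not prove them.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def join(pred, r, s):
    # Theta join: concatenate every pair of tuples that satisfies the join predicate.
    return [{**a, **b} for a in r for b in s if pred(a, b)]


def difference(r, s):
    return [t for t in r if t not in s]


def same(r, s):
    """Compare two relations as collections of tuples, ignoring order."""
    canon = lambda rel: sorted(sorted(t.items()) for t in rel)
    return canon(r) == canon(s)


# Hypothetical sample relations.
STAFF = [
    {"staffNo": "SL21", "position": "Manager", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SG37", "position": "Assistant", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SG5", "position": "Manager", "salary": 24000, "branchNo": "B003"},
    {"staffNo": "SA9", "position": "Assistant", "salary": 9000, "branchNo": "B007"},
]
BRANCH = [
    {"bNo": "B005", "city": "London"},
    {"bNo": "B003", "city": "Glasgow"},
    {"bNo": "B007", "city": "Aberdeen"},
]

in_b003 = lambda t: t["branchNo"] == "B003"
well_paid = lambda t: t["salary"] > 15000
is_manager = lambda t: t["position"] == "Manager"
in_london = lambda t: t["city"] == "London"
same_branch = lambda a, b: a["branchNo"] == b["bNo"]

# Rule 1: one conjunctive selection equals a cascade of single selections.
rule1 = same(select(lambda t: in_b003(t) and well_paid(t), STAFF),
             select(in_b003, select(well_paid, STAFF)))

# Rule 6: selections on one relation each can be applied before the join.
rule6 = same(select(lambda t: is_manager(t) and in_london(t), join(same_branch, STAFF, BRANCH)),
             join(same_branch, select(is_manager, STAFF), select(in_london, BRANCH)))

# Rule 9: selection distributes over Set difference, with operand order preserved.
R, S = STAFF, STAFF[2:]
rule9 = same(select(lambda t: t["salary"] > 10000, difference(R, S)),
             difference(select(lambda t: t["salary"] > 10000, R),
                        select(lambda t: t["salary"] > 10000, S)))
```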
Example: For prospective renters who are looking for flats, find the properties that match their
requirements and are owned by owner CO93. We can write this query in SQL as:
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = 'Flat' AND c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND c.prefType = p.type AND p.ownerNo = 'CO93';
Converting the SQL to relational algebra, we have:
Π_{p.propertyNo, p.street}(σ_{c.prefType=‘Flat’ ∧ c.clientNo=v.clientNo ∧ v.propertyNo=p.propertyNo ∧ c.maxRent>=p.rent ∧ c.prefType=p.type ∧ p.ownerNo=‘CO93’}((c × v) × p))
The heuristic approach is implemented by applying the above transformation rules in the
following sequence of steps:
1. Use
Rule 1: Cascade of SELECTION
2. Use
Rule 2: Commutativity of SELECTION
Rule 4: Commuting SELECTION with PROJECTION
Rule 6: Commuting SELECTION with JOIN and CARTESIAN product
Rule 9: Commuting SELECTION with set operations
3. Use
Rules 11 and 12: Associativity of binary operations (JOIN, CARTESIAN product, UNION, and
INTERSECTION). Rearrange the nodes so that the most restrictive operations are
performed first (moving them as far down the tree as possible).
4. Combine each CARTESIAN product with a subsequent SELECTION whose predicate is a join
condition into a single JOIN operation.
5. Use
Rule 3: Cascade of PROJECTION
Rule 4: Commuting PROJECTION with SELECTION
Rule 7: Commuting PROJECTION with JOIN and CARTESIAN product
Rule 10: Commuting PROJECTION with UNION
Main Heuristic
The main heuristic is to first apply operations that reduce the size (the cardinality and/or the
degree) of the intermediate relation. That is:
• Perform SELECTION as early as possible: this reduces the cardinality (number of tuples)
of the relation.
• Perform PROJECTION as early as possible: this reduces the degree (number of attributes)
of the relation. Both of these are accomplished by placing the SELECT and PROJECT
operations as far down the tree as possible.
• Execute the SELECT and JOIN operations with the most restrictive conditions, that is, those
producing the smallest results, before other similar operations. This is achieved by
reordering the JOIN nodes of the tree.
Example: Consider the following schemas and query, where the EMPLOYEE and PROJECT
relations are related by the WORKS_ON relation.
• EMPLOYEE (EEmpID, FName, LName, Salary, Dept, Sex, DoB)
• PROJECT (PProjID, PName, PLocation, PFund, PManagerID)
• WORKS_ON (WEmpID, WProjID)
WEmpID (employee identification) and WProjID (project identification) are foreign keys in the
WORKS_ON relation that reference the EMPLOYEE and PROJECT relations respectively.

Query: The manager of the company working on road construction would like to view the names of
employees born before January 1, 1965 who are working on the project named Ring Road. The
relational algebra representation of the query will be:

Π_{FName, LName}(σ_{DoB<Jan 1 1965 ∧ WEmpID=EEmpID ∧ PProjID=WProjID ∧ PName=‘Ring Road’}(EMPLOYEE × WORKS_ON × PROJECT))

The equivalent SQL for the above query will be:
SELECT FName, LName
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE DoB < '1965-01-01' AND EEmpID = WEmpID AND WProjID = PProjID AND PName = 'Ring Road';

The initial query tree will be (root at the top, children indented below their parent):

Π_{FName, LName}
  σ_{DoB<Jan 1 1965 ∧ WEmpID=EEmpID ∧ PProjID=WProjID ∧ PName=‘Ring Road’}
    ×
      ×
        EMPLOYEE
        WORKS_ON
      PROJECT

By applying the first step (cascading the selection) we will come up with the following structure:

σ_{DoB<Jan 1 1965}(σ_{WEmpID=EEmpID}(σ_{PProjID=WProjID}(σ_{PName=‘Ring Road’}(EMPLOYEE × WORKS_ON × PROJECT))))

By applying the second step, it can be seen that some conditions involve attributes that belong to
a single relation (DoB belongs to EMPLOYEE and PName belongs to PROJECT), so these selection
operations can be commuted with the Cartesian product. Then, since the condition WEmpID=EEmpID
involves only the EMPLOYEE and WORKS_ON relations, the selection with this condition can be
placed directly above the Cartesian product of those two relations:

σ_{PProjID=WProjID}(σ_{PName=‘Ring Road’}(PROJECT) × σ_{WEmpID=EEmpID}(WORKS_ON × σ_{DoB<Jan 1 1965}(EMPLOYEE)))

The query tree after this modification will be:

Π_{FName, LName}
  σ_{PProjID=WProjID}
    ×
      σ_{PName=‘Ring Road’}
        PROJECT
      σ_{WEmpID=EEmpID}
        ×
          WORKS_ON
          σ_{DoB<Jan 1 1965}
            EMPLOYEE

Using the third step, perform the most restrictive operations first. From the query given we can
see that the selection on PROJECT is more restrictive than the selection on EMPLOYEE. Thus, it is
better to perform the selection on PROJECT before the selection on EMPLOYEE. Rearrange the nodes
to achieve this:

Π_{FName, LName}
  σ_{WEmpID=EEmpID}
    ×
      σ_{PProjID=WProjID}
        ×
          σ_{PName=‘Ring Road’}
            PROJECT
          WORKS_ON
      σ_{DoB<Jan 1 1965}
        EMPLOYEE

Using the fourth step, combine each Cartesian product with the subsequent selection operation to
form a join:

Π_{FName, LName}
  ⋈_{WEmpID=EEmpID}
    ⋈_{PProjID=WProjID}
      σ_{PName=‘Ring Road’}
        PROJECT
      WORKS_ON
    σ_{DoB<Jan 1 1965}
      EMPLOYEE

Using the fifth step, perform the projections as early as possible:

Π_{FName, LName}
  ⋈_{WEmpID=EEmpID}
    Π_{WEmpID}
      ⋈_{PProjID=WProjID}
        Π_{PProjID}
          σ_{PName=‘Ring Road’}
            PROJECT
        WORKS_ON
    Π_{FName, LName, EEmpID}
      σ_{DoB<Jan 1 1965}
        EMPLOYEE
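The benefit of the restructured tree can be seen by comparing the size of the largest intermediate result produced by the initial tree with that produced by the final tree. The Python sketch below does this on a handful of made-up rows; the sample data, the date cutoff written as an ISO string, and the function names are assumptions, and the joins are implemented with simple set membership rather than a real join algorithm.

```python
from itertools import product

# Hypothetical sample data (only the attributes the query needs).
EMPLOYEE = [
    {"EEmpID": 1, "FName": "Abebe", "LName": "Kebede", "DoB": "1960-03-12"},
    {"EEmpID": 2, "FName": "Sara", "LName": "Ali", "DoB": "1970-07-01"},
    {"EEmpID": 3, "FName": "Musa", "LName": "Omer", "DoB": "1958-11-30"},
    {"EEmpID": 4, "FName": "Hana", "LName": "Tesfaye", "DoB": "1980-01-15"},
]
PROJECT = [
    {"PProjID": 10, "PName": "Ring Road"},
    {"PProjID": 20, "PName": "Dam"},
    {"PProjID": 30, "PName": "Bridge"},
]
WORKS_ON = [
    {"WEmpID": 1, "WProjID": 10},
    {"WEmpID": 2, "WProjID": 10},
    {"WEmpID": 3, "WProjID": 20},
    {"WEmpID": 3, "WProjID": 10},
    {"WEmpID": 4, "WProjID": 30},
]
CUTOFF = "1965-01-01"  # ISO dates compare correctly as strings


def initial_plan():
    """Initial tree: Cartesian product of all three relations, then one big selection."""
    cart = [{**e, **w, **p} for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
    selected = [
        t for t in cart
        if t["DoB"] < CUTOFF and t["WEmpID"] == t["EEmpID"]
        and t["PProjID"] == t["WProjID"] and t["PName"] == "Ring Road"
    ]
    names = {(t["FName"], t["LName"]) for t in selected}
    return names, max(len(cart), len(selected))


def optimized_plan():
    """Final tree: select and project early, join the small results."""
    proj_ids = {p["PProjID"] for p in PROJECT if p["PName"] == "Ring Road"}
    emp_ids = {w["WEmpID"] for w in WORKS_ON if w["WProjID"] in proj_ids}
    emps = [(e["EEmpID"], e["FName"], e["LName"]) for e in EMPLOYEE if e["DoB"] < CUTOFF]
    names = {(f, l) for (eid, f, l) in emps if eid in emp_ids}
    return names, max(len(proj_ids), len(emp_ids), len(emps), len(names))


result_initial, largest_initial = initial_plan()
result_optimized, largest_optimized = optimized_plan()
```

Both plans return the same two employees, but on this sample the initial plan builds a 60-tuple Cartesian product while the optimized plan never holds more than 3 tuples in an intermediate result.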

2.3.3. Cost Estimation Approach to Query Optimization


The main idea is to minimize the cost of processing a query. The cost function is composed of:
• I/O cost + CPU processing cost + communication cost + storage cost
These components may have different weights in different processing environments. The DBMS
uses information stored in the system catalogue to estimate cost. A main target of query
optimization is to minimize the size of the intermediate relations, because their size affects
the cost of:
• Disk access
• Data transportation
• Storage space in primary memory
• Writing to disk
The statistics in the system catalogue used for cost estimation are:
• Cardinality of a relation: the number of tuples currently contained in the relation (r)
• Degree of a relation: the number of attributes of the relation
• Blocking factor: the number of tuples of the relation that can be stored in one block
• Total number of blocks used by the relation
• Number of distinct values of an attribute (d)
• Selection cardinality of an attribute (S): the average number of records that satisfy
an equality condition on the attribute, S = r/d
Using the above information, one can calculate the cost of executing a query under each strategy
and select the best strategy, which is the one with the minimum processing cost.
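As a worked illustration of these statistics, the Python sketch below computes the selection cardinality S = r/d and the number of blocks implied by a blocking factor. The figures (10,000 tuples, 50 distinct values, 40 tuples per block) are invented for the example, and the function names are not taken from any particular DBMS.

```python
import math


def selection_cardinality(r, d):
    """Average number of tuples matching an equality condition: S = r / d."""
    return r / d


def blocks_needed(num_tuples, blocking_factor):
    """Number of blocks required to hold num_tuples at blocking_factor tuples per block."""
    return math.ceil(num_tuples / blocking_factor)


# Hypothetical catalogue statistics for one relation and one attribute.
r = 10_000            # cardinality of the relation
d = 50                # distinct values of the attribute
blocking_factor = 40  # tuples per block

S = selection_cardinality(r, d)
total_blocks = blocks_needed(r, blocking_factor)
matching_blocks = blocks_needed(S, blocking_factor)
```

With these numbers an equality condition is expected to match 200 tuples. A full scan reads all 250 blocks, whereas the matching tuples would occupy as few as 5 blocks if they were stored together, which is the kind of difference a cost estimate is meant to expose.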

2.3.3.1. Cost Components for Query Optimization


The cost of query execution can be calculated for the following major activities that take place
during processing.
1. Access Cost of Secondary Storage: A query needs some part of the data stored in the
database, and that data must be accessed from secondary storage. The disk access cost can
in turn be analyzed in terms of:
• Searching
• Reading, and
• Writing the data blocks used to store some portion of a relation.
The disk access cost varies depending on the file organization used and the access method
implemented for that file organization. In addition to the file organization, the data allocation
scheme, that is, whether the data is stored contiguously or scattered across the disk, affects
the disk access cost.
2. Storage Cost: Because a query is composed of many database operations, there may be one
or more intermediate results before the final output is reached. These intermediate results
must be held in primary memory for further processing. The bigger the intermediate relation,
the larger the memory requirement, which puts pressure on the limited space available. This
is considered the cost of storage.
3. Computation Cost: A query is composed of many operations. In addition to reading from and
writing to disk, these include in-memory operations such as searching, sorting, merging,
and computation on field values.
4. Communication Cost: In most database systems the database resides at one station and
queries originate from different terminals. This affects the performance of the system by
adding to the cost of query processing. Thus, the cost of transporting data between the
database site and the terminal where the query originates should be analyzed.
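The idea that the cost components carry different weights in different environments can be sketched as a weighted sum. In the Python example below, the two candidate plans, their component costs, and both sets of weights are invented numbers used only to show how the cheapest plan can change; real optimizers derive such estimates from catalogue statistics.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    name: str
    io: float
    cpu: float
    communication: float
    storage: float


def total_cost(plan, weights):
    """Weighted sum of the four cost components."""
    return (weights["io"] * plan.io
            + weights["cpu"] * plan.cpu
            + weights["communication"] * plan.communication
            + weights["storage"] * plan.storage)


def cheapest(plans, weights):
    """Pick the plan with the minimum estimated total cost."""
    return min(plans, key=lambda p: total_cost(p, weights))


# Hypothetical estimates for two alternative plans.
plans = [
    PlanCost("ship whole relation", io=100, cpu=10, communication=500, storage=5),
    PlanCost("filter at remote site", io=150, cpu=20, communication=20, storage=5),
]

# Hypothetical weights: communication is ignored at a single site, but counts when distributed.
centralized = {"io": 1.0, "cpu": 0.1, "communication": 0.0, "storage": 0.01}
distributed = {"io": 1.0, "cpu": 0.1, "communication": 1.0, "storage": 0.01}

best_centralized = cheapest(plans, centralized).name
best_distributed = cheapest(plans, distributed).name
```

Under the centralized weights the first plan is cheaper because it does less I/O; once communication is weighted, the second plan wins because it ships far less data.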
2.4. Pipelining
Pipelining is another technique used to improve the performance of queries. It is sometimes
known as stream-based processing or on-the-fly processing. Whereas query optimization in
general tries to reduce the size of the intermediate results, pipelining avoids materializing
them: the result of one operation is passed directly to the next operation without creating a
temporary relation to hold it, so several conditions can be applied to the same tuples in
succession. The technique is therefore said to reduce the number of intermediate relations in
query execution.
Generally, a pipeline is implemented as a separate process or thread within the DBMS. Each
pipeline takes a stream of tuples from its inputs and creates a stream of tuples as its output. A
buffer is created for each pair of adjacent operations to hold the tuples being passed from the
first operation to the second one. One drawback of pipelining is that the inputs to operations are
not necessarily available all at once for processing. This can restrict the choice of algorithms;
for example, an algorithm that must sort its whole input cannot produce output until all of the
input has arrived.
Example
Suppose we have an employee relation with the following schema: Employee(ID, FName,
LName, DoB, Salary, Position, Dept)
If a query is to extract supervisors with a salary greater than 2000, the relational algebra
representation of the query will be:

σ_{(Salary>2000) ∧ (Position=‘Supervisor’)}(Employee)

After reading the relation, the system could perform the operation by cascading the SELECT
operations.

1. Approach One

σ_{Salary>2000}(σ_{Position=‘Supervisor’}(Employee))

Using this approach, we will have the following relations:
• Employee
• The relation created by the first operation:
R1 = σ_{Position=‘Supervisor’}(Employee)
• The resulting relation created by the second operation:
R2 = σ_{Salary>2000}(R1)

2. Approach Two
One can read a single tuple at a time from the relation Employee, perform both tests on it in a
pipeline, and create the final relation at once, without materializing R1. This is what is called
pipelining.
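The two approaches can be contrasted in Python, where generators give a natural model of a tuple stream. This is a sketch under stated assumptions: the four Employee rows are made up, and the function names `materialized`, `scan`, `select`, and `pipelined` are invented for the illustration rather than taken from any DBMS.

```python
# Hypothetical sample rows (only the attributes the query needs).
EMPLOYEE = [
    {"ID": 1, "Position": "Supervisor", "Salary": 2500},
    {"ID": 2, "Position": "Clerk", "Salary": 3000},
    {"ID": 3, "Position": "Supervisor", "Salary": 1800},
    {"ID": 4, "Position": "Supervisor", "Salary": 4000},
]


def materialized(employees):
    """Approach One: build the intermediate relation R1, then R2 from it."""
    r1 = [e for e in employees if e["Position"] == "Supervisor"]
    r2 = [e for e in r1 if e["Salary"] > 2000]
    return r2, len(r1)


def scan(relation):
    """Produce the tuples of a relation one at a time."""
    for t in relation:
        yield t


def select(pred, stream):
    """Pass on only the tuples of the input stream that satisfy pred."""
    for t in stream:
        if pred(t):
            yield t


def pipelined(employees):
    """Approach Two: each tuple flows through both tests; no R1 is stored."""
    supervisors = select(lambda e: e["Position"] == "Supervisor", scan(employees))
    well_paid = select(lambda e: e["Salary"] > 2000, supervisors)
    return list(well_paid)


r2, r1_size = materialized(EMPLOYEE)
piped = pipelined(EMPLOYEE)
```

Both approaches return employees 1 and 4. Approach One first stores a three-tuple intermediate relation, while the pipelined version holds only one tuple at a time between the two selections.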
