ADB Chapter Two
ADB Chapter Two
SQL Relational algebra query operations are performed recursively on a relation. The output of
these operations is a new relation, which might be formed from one or more input relations. It uses
operators to perform queries. An operator can be unary or binary.
Basic Operations:
Additional Operations:
Intersection, join, division, and renaming are not essential, but (very!) useful.
1. Selection
Selection operator is a unary operator that acts like a filter on a relation by returning
only a certain number of rows that satisfy the selection condition.
The resulting relation will have the same degree (schema) as the original (input)
relation.
The resulting relation may have fewer rows than the original relation.
It is denoted by sigma:
Condition (R) (Returns only those tuples in R that satisfy condition C)
The tuples (rows) to be returned are dependent on a condition that is part of the
selection operator.
A condition C can be made up of any combination of comparison or logical operators
that operate on the attributes of R.
Comparison operators:
Logical operators:
Page | 1
Examples:
Select only those Employees with last name Smith who are assistant professors:
Select * from EMP where Name=Smith and Rank=Assistant.
Name = 'Smith' Rank = 'Assistant' (EMP)
Select only those Employees who are either Assistant Professors or in the Economics
department:
Select * from EMP Rank=Assistant or Dept=Econ
Rank = 'Assistant' Dept = 'Econ' (EMP)
Page | 2
The resulting relation will be:
Select only those Employees who are not in the CS department or Adjuncts:
Select * from EMP where Dept=not CS or Adjunct
(Rank = 'Adjunct' Dept = 'CS') (EMP)
2. Projection
Projection is also a Unary operator that limits the attributes that will be returned
from the original relation.
The Projection operator is pi:
The general syntax is:
attributes R
(where attributes is the list of attributes to be displayed and R is the relation.)
The projection method defines a relation that contains a vertical subset of relation.
The resulting relation will have the same number of tuples as the original relation
(unless there are duplicate tuples produced).
The degree (schema) of the resulting relation may be equal to or less than that of the
original relation because it contains exactly the fields in the projection list.
Projection operator has to eliminate duplicates.
Examples
Assume the same EMP relation above is used.
Page | 3
name, dept (EMP)
Name Dept
Smith CS
Jones Econ
Green Econ
Brown CS
Smith Fin
Sname Rating
Smith 9
lubber 8
Guppy 5
Rusty 10
Page | 4
The selection and projection operators can be combined to perform both
operations.
Name
Smith
Brown
Example: - Show the name and rank of those Employees who are not in the CS department or
Adjuncts:
name, rank ( (Rank = 'Adjunct' Dept = 'CS') (EMP) )
The resulting relation will be:
Name Rank
Green Assistant
Smith Associate
3. Cartesian product
The Cartesian product also referred to as a cross-join, returns all the rows in all the tables listed in
the query. Each row in the first table is paired with all the rows in the second table. This happens
when there is no relationship defined between the two tables.
R X S: The Cartesian product operation defines a relation that is the concatenation of every tuple
of relation R with every tuple of relation S
Page | 5
R S
First Last Age Dinner Dessert
Bill Smith 22 Steak Ice Cream
Mary Keen 23 Lobster Cheesecake
Tony Jones 32
RXS
Union
R U S: The union of two relations R and S defines a relation that contains all the tuples of R or S
or both R and S. Duplicate tuples are eliminated.
R and S must be union-compatible: Same number of fields and corresponding’ fields have the
same type.
R S
Page | 6
First Last Age
First Last Age
Bill Smith 22
Forrest Gump 36
Sally Green 28
Sally Green 28
Mary Keen 23
DonJuan DeMarco 27
Tony Jones 32
Intersection
R n S: The intersection operation defines a relation consisting of the set of all tuples that are in
both R and S. R and S must be union-compatible.
Relation S
Page | 7
First Last Age
Forrest Gump 36
Sally Green 28
DonJuan DeMarco 27
Relation R Relation S
5. Join Operations
Join operation is used to combine rows from two or more tables based on a related column between
them. Typically we want only combinations of the Cartesian product that satisfy certain conditions
and so we would normally use a Join Operation instead of the Cartesian product operation. Join is
a derivative Cartesian product, equivalent to performing a selection operation, using the join
predicate as the selection formula, over the Cartesian product of the operand relations. It is one of
Page | 8
the most difficult operations to implement efficiently in an RDBMS and is one of the reasons why
relational systems have intrinsic performance problems.
1. (INNER) JOIN: It returns records that have matching values in both tables.
2. LEFT (OUTER) JOIN: It returns all records from the left table and the matched records
from the right table. Tuples from R that do not have matching values in the common
attributes of S are also included in the result relation. Missing values in the second relation
are set to null.
3. RIGHT (OUTER) JOIN: It returns all records from the right table and the matched
records from the left table.
4. FULL (OUTER) JOIN: It returns all records when there is a match in either left or right
table.
Page | 9
Condition Join: R CS = C (RXS)
Example: S1 S1.sid<R1.sidR1
S1 R1
58 Solomon 10 35
5. Equi join: Natural join (to remove one of the similar attributes). It is a special case of
condition join where the condition C contains only equalities.
Example:
S1 R1
Page | 10
Sid Sname Rating Age Sid Bid Day
58 Solomon 10 35
6. Division Operation
The division operation is not supported as a primitive operator but useful for
expressing queries like:
Find sailor who have reserved all boats
Precondition: in A/B the attributes in B must be included in the schema for A also
the result has attributes A-B.
SALES(SupId,ProdId);
PRODUCTS(ProdId);
Relations SALES and PRODUCTS must be built using projections.
SALES/PRODUCTS: the ids of the suppliers supplying all products.
Example:
Page | 11
A
Sno Pno
S1 P1
S1 P2
S1 P3
S1 P4
S2 P1
S2 P2
S3 P2
S4 P2
S4 P4
Sno
B1 S1
Pno S2
P2 A/B1= S3
S4
B2
Pno Sno
P2 A/B2= S1
P4 S4
Page | 12
B3
Pno
P1
Sno
P2
S1
P4 A/B3=
Page | 13
OVERVIEW OF QUERY PROCESSING
There are many ways that complex query can be performed, and one of the aims of query
processing is to determine which one is the most cost effective. In declarative languages such as
SQL, the user specifies what data is required rather than how it is retrieved
Query processing
Query processing is the activities involved in retrieving data from the database. Giving the DBMS
the responsibility for selecting the best strategy prevents users from choosing strategies that are
known to be inefficient and gives the DBMS more control system performance
The aim of query processing are to transform a query written in a high level language, typically
SQL, into a correct and efficient execution strategy expressed in low level language (implementing
the relational algebra),and to execute the strategy to retrieve the required data.
2. Optimization
4. Execution
Page | 14
Query in high-level language
(typically SQL)
Relational Algebra
expression
Database statistics
Compile time Query optimization
Execution
plan
Cost generation
Generated code
Query output
Query Decomposition
The aim of query decomposition is to transform a high-level query into a relational algebra query,
and to check that the query is syntactically and semantically correct
Analysis
Normalization
Semantic analysis
Simplification and
Query restructuring.
1. Analysis
In this stage, the query is lexically and syntactically analyzed using the techniques of programming
language compilers
Page | 15
In addition, this stage verifies that the relations and attributes specified in the query are defined in
the system catalog. It also verifies that any operations applied to database objects are appropriate
for the object type.
Example:
SELECT StaffNumber
FROM staff
On compilation of this stage, the high-level query has been transformed into some internal
representation that is more suitable for processing.
The internal form that is typically chosen is some kind of tree, which is constructed as follows:
∞s.branchNo=b.branchNo
root root
Page | 16
2. Normalization
Converts the query into a normalized form that can be more easily manipulated. The predicate
WHERE will be converted to conjunctive (ᴧ) or disjunctive (ᴠ) normal form
Conjunction normal form: A sequence of conjunction that are connected with the ᴧ (And)
operator. A conjunction selection contains only those tuples that satisfy all conjuncts
Example:
Disjunctive normal form: A sequence of disjuncts that are connected with the ᴠ (OR) operator.
A disjunctive selection contains those tuples formed by the union of all tuples that satisfy the
disjuncts.
Example:
3. Semantic Analysis
Objective of semantic analysis is to reject normalized queries that are incorrectly formalized or
contradictory.
Incorrect: If components do not contribute to the generation of the results. It happens if some join
specifications are missing.
Example:
F v salary >20000
4. Simplifications
The objective of the simplification stage is:
Page | 17
Typically:
Access restrictions
View definitions and
Integrity constraints are considered at this stage ,some of which may also introduce
redundancy
If the user the does not have the appropriate access to all the components of the query the query
must be rejected.
View definition
SELECT staffNo,fName,lName,salary,branchNo
FROM Staff
WHERE branchNo=‘B003’;
SELECT *
FROM Staff3
SELECT staffNo,fName,lName,salary
FROM Staff
Assuming that the user has the appropriate access privileges, an initial optimization is to apply the
well-known idempotent rules of Boolean algebra, such as:
p ^ p =p
P ^ False= False
P ^ true = P
P ^ -P=False
p^ (p V q)=p
p v p=p
p v False =p
p v true =true
Page | 18
p v –p = true
p v (p^q)=p
Integrity constraints
5. Query restructuring
The query is restructured to provide a more efficient implementation.
Heuristically Approach-uses transformation rules to convert one relational algebra expression into
an equivalent form that is known to be more efficient.
TRANSFORMATION RULES
Page | 19
If the predicate P involves only the attributes in the projection list, then the selection and projection
operations commute:
QUERY OPTIMIZATION
The activity of choosing an efficient execution strategy for processing a query. The aim of query
optimization is to choose the one that minimizes resource usage.
Generally, we try to reduce the total execution time of the query, which is the sum of execution
time of all individual operations that make up the query
However, resource usage may also be viewed as the response time of the query in which case we
concentrate on maximizing the number of parallel operations. (Pipelining)
Both methods of query optimization depend on database statistics to evaluate properly the
difference options that are available.
Page | 20
The accuracy and currency of these statistics have a significant bearing on the efficiency of the
execution strategy chosen.
SELECT *
The equivalent relational algebra queries corresponding to these SQL statements are:
• Assume there are 1000 tuples in staff, 50 tuples in Branch, 50 Managers (one for each
branch), and 5 Wolkite branches.
Page | 21
• Assume there are no indexes or sort keys on either relations and that the result of any
intermediate operations is stored on disk.
• Main memory is large enough to process entire relations for each relational algebra
operation
The first query calculates the Cartesian product of staff and Branch, which requires (1000+50)
disk accesses to read the relations, and creates a relation with (1000*50)tuples. We then have to
read each of these tuples again to test them against the selection predicate at a cost of another
(1000*50) disk accesses, giving a total cost of :
The second query joins staff and Branch on the branch number branchNo, which again requires
(1000+50) disk accesses to read each of the relations. We know that the join of the two relations
has 1000 tuples, one for each member of staff ( a member of staff can only work at one
branch).Consequently, the selection operation requires 1000 disk accesses to read the result of the
join ,giving a total cost
The final query first reads each staff tuple to determine the manager tuples, which requires 1000
disk access and produces relations with 50 tuples .The second selections operations, reads each
Branch tuple to determine the Wolkite branch, which requires 50 disk access and produces a
relations with 5 tuples. The final operation is the join of the reduced staff and Branch relations,
which requires (50+5) disk accesses, giving a total cost of:
The dominant cost in query processing is usually that of disk access which are slow compared with
memory access. Many of the cost estimates are based on the cardinality of the relation
The success of estimating the size and cost of intermediate relational algebra operations depends
on the amount and currency of the statistical information that the DBMS holds.
Page | 22