Outline

- Introduction
- Background
- Distributed DBMS Architecture
- Distributed Database Design
- Semantic Data Control
- Distributed Query Processing
  - Query Processing Methodology
  - Distributed Query Optimization
- Distributed Transaction Management
- Parallel Database Systems
- Distributed Object DBMS
- Database Interoperability
- Current Issues

Distributed DBMS    ©1998 M. Tamer Özsu & Patrick Valduriez    Page 7-9. 1
Query Processing

  high-level user query
          |
          v
    query processor
          |
          v
  low-level data manipulation commands
Query Processing Components

- Query language that is used
  - SQL: "intergalactic dataspeak"
- Query execution methodology
  - the steps that one goes through in executing high-level (declarative)
    user queries
- Query optimization
  - how do we determine the "best" execution plan?
Selecting Alternatives

SELECT ENAME
FROM   EMP, ASG
WHERE  EMP.ENO = ASG.ENO
AND    DUR > 37

Strategy 1:
  Π_ENAME(σ_DUR>37 ∧ EMP.ENO=ASG.ENO(EMP × ASG))

Strategy 2:
  Π_ENAME(EMP ⋈_ENO σ_DUR>37(ASG))

Strategy 2 avoids the Cartesian product, so it is "better".
What is the Problem?

Fragments and their sites:
  Site 1: ASG1 = σ_ENO≤"E3"(ASG)      Site 2: ASG2 = σ_ENO>"E3"(ASG)
  Site 3: EMP1 = σ_ENO≤"E3"(EMP)      Site 4: EMP2 = σ_ENO>"E3"(EMP)
  Site 5: result site

Strategy A (exploits the fragmentation):
  Site 1: ASG1' = σ_DUR>37(ASG1);  ASG1' → Site 3
  Site 2: ASG2' = σ_DUR>37(ASG2);  ASG2' → Site 4
  Site 3: EMP1' = EMP1 ⋈_ENO ASG1';  EMP1' → Site 5
  Site 4: EMP2' = EMP2 ⋈_ENO ASG2';  EMP2' → Site 5
  Site 5: result = EMP1' ∪ EMP2'

Strategy B (ships everything to the result site):
  ASG1, ASG2, EMP1, EMP2 → Site 5
  Site 5: result2 = (EMP1 ∪ EMP2) ⋈_ENO σ_DUR>37(ASG1 ∪ ASG2)
Cost of Alternatives

Assume:
- size(EMP) = 400, size(ASG) = 1000
- tuple access cost = 1 unit; tuple transfer cost = 10 units

Strategy 1:
  produce ASG':  (10+10) * tuple access cost                =     20
  transfer ASG' to the sites of EMP: (10+10) * tuple transfer cost =    200
  produce EMP':  (10+10) * tuple access cost * 2            =     40
  transfer EMP' to result site: (10+10) * tuple transfer cost =   200
  Total cost                                                =    460

Strategy 2:
  transfer EMP to Site 5:  400 * tuple transfer cost        =  4,000
  transfer ASG to Site 5: 1000 * tuple transfer cost        = 10,000
  produce ASG': 1000 * tuple access cost                    =  1,000
  join EMP and ASG': 400 * 20 * tuple access cost           =  8,000
  Total cost                                                = 23,000
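The cost arithmetic above can be checked with a few lines of Python. This is a sketch, not code from the slides; the per-fragment counts of 10 qualifying ASG tuples and the factor of 2 on the EMP accesses are taken from the slide's own line items.

```python
# Constants from the slide's assumptions.
TUPLE_ACCESS = 1     # cost units per tuple access
TUPLE_TRANSFER = 10  # cost units per tuple transfer

def strategy1_cost():
    produce_asg = (10 + 10) * TUPLE_ACCESS          # 20
    ship_asg = (10 + 10) * TUPLE_TRANSFER           # 200
    produce_emp = (10 + 10) * TUPLE_ACCESS * 2      # 40
    ship_emp = (10 + 10) * TUPLE_TRANSFER           # 200
    return produce_asg + ship_asg + produce_emp + ship_emp

def strategy2_cost():
    ship_emp = 400 * TUPLE_TRANSFER                 # 4,000
    ship_asg = 1000 * TUPLE_TRANSFER                # 10,000
    produce_asg = 1000 * TUPLE_ACCESS               # 1,000
    join = 400 * 20 * TUPLE_ACCESS                  # 8,000
    return ship_emp + ship_asg + produce_asg + join

print(strategy1_cost(), strategy2_cost())  # 460 23000
```

The 50x gap between the two totals is the motivation for the rest of the chapter: the same declarative query admits execution plans of wildly different cost.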
Query Optimization Objectives

- Minimize a cost function:
    I/O cost + CPU cost + communication cost
  These might have different weights in different distributed environments.
- Wide area networks
  - communication cost will dominate
    - low bandwidth
    - low speed
    - high protocol overhead
  - most algorithms ignore all other cost components
- Local area networks
  - communication cost not that dominant
  - total cost function should be considered
- Can also maximize throughput
Complexity of Relational Operations

Assume relations of cardinality n and sequential scan.

  Operation                                    Complexity
  Select                                       O(n)
  Project (without duplicate elimination)      O(n)
  Project (with duplicate elimination)         O(n log n)
  Group                                        O(n log n)
  Join                                         O(n log n)
  Semi-join                                    O(n log n)
  Division                                     O(n log n)
  Set Operators                                O(n log n)
  Cartesian Product                            O(n^2)
Query Optimization Issues – Types of Optimizers

- Exhaustive search
  - cost-based
  - optimal
  - combinatorial complexity in the number of relations
- Heuristics
  - not optimal
  - regroup common sub-expressions
  - perform selection, projection first
  - replace a join by a series of semijoins
  - reorder operations to reduce intermediate relation size
  - optimize individual operations
Query Optimization Issues – Optimization Granularity

- Single query at a time
  - cannot use common intermediate results
- Multiple queries at a time
  - efficient if many similar queries
  - decision space is much larger
Query Optimization Issues – Optimization Timing

- Static
  - compilation ⇒ optimize prior to the execution
  - difficult to estimate the size of the intermediate results ⇒ error
    propagation
  - can amortize over many executions
  - R*
- Dynamic
  - run time optimization
  - exact information on the intermediate relation sizes
  - have to reoptimize for multiple executions
  - Distributed INGRES
- Hybrid
  - compile using a static algorithm
  - if the error in estimated sizes > threshold, reoptimize at run time
  - MERMAID
Query Optimization Issues – Statistics

- Relation
  - cardinality
  - size of a tuple
  - fraction of tuples participating in a join with another relation
- Attribute
  - cardinality of domain
  - actual number of distinct values
- Common assumptions
  - independence between different attribute values
  - uniform distribution of attribute values within their domain
Query Optimization Issues – Decision Sites

- Centralized
  - single site determines the "best" schedule
  - simple
  - need knowledge about the entire distributed database
- Distributed
  - cooperation among sites to determine the schedule
  - need only local information
  - cost of cooperation
- Hybrid
  - one site determines the global schedule
  - each site optimizes the local subqueries
Query Optimization Issues – Network Topology

- Wide area networks (WAN) – point-to-point
  - characteristics
    - low bandwidth
    - low speed
    - high protocol overhead
  - communication cost will dominate; ignore all other cost factors
  - global schedule to minimize communication cost
  - local schedules according to centralized query optimization
- Local area networks (LAN)
  - communication cost not that dominant
  - total cost function should be considered
  - broadcasting can be exploited (joins)
  - special algorithms exist for star networks
Distributed Query Processing Methodology

  Calculus Query on Distributed Relations
            |
  [CONTROL SITE]
  Query Decomposition        <-- GLOBAL SCHEMA
            |
  Algebraic Query on Distributed Relations
            |
  Data Localization          <-- FRAGMENT SCHEMA
            |
  Fragment Query
            |
  Global Optimization        <-- STATS ON FRAGMENTS
            |
  Optimized Fragment Query with Communication Operations
            |
  [LOCAL SITES]
  Local Optimization         <-- LOCAL SCHEMAS
            |
  Optimized Local Queries
Step 1 – Query Decomposition

Input: Calculus query on global relations

- Normalization
  - manipulate query quantifiers and qualification
- Analysis
  - detect and reject "incorrect" queries
  - possible for only a subset of relational calculus
- Simplification
  - eliminate redundant predicates
- Restructuring
  - calculus query → algebraic query
  - more than one translation is possible
  - use transformation rules
Normalization

- Lexical and syntactic analysis
  - check validity (similar to compilers)
  - check for attributes and relations
  - type checking on the qualification
- Put into normal form
  - Conjunctive normal form:
      (p11 ∨ p12 ∨ … ∨ p1n) ∧ … ∧ (pm1 ∨ pm2 ∨ … ∨ pmn)
  - Disjunctive normal form:
      (p11 ∧ p12 ∧ … ∧ p1n) ∨ … ∨ (pm1 ∧ pm2 ∧ … ∧ pmn)
  - ORs mapped into union
  - ANDs mapped into join or selection
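A minimal sketch of the normal-form rewriting step above. The tuple-based predicate representation (`('and', p, q)` etc.) is an illustrative choice, not anything from the slides; it applies De Morgan's laws and then distributes OR over AND until a conjunction of disjunctions remains.

```python
# Predicates are atoms (strings) or tuples: ('not', p), ('and', p, q), ('or', p, q).

def nnf(p):
    """Push negations down to the atoms (negation normal form)."""
    if isinstance(p, str):
        return p
    op = p[0]
    if op == 'not':
        q = p[1]
        if isinstance(q, str):
            return p                      # negated atom: already in NNF
        if q[0] == 'not':
            return nnf(q[1])              # double negation
        if q[0] == 'and':                 # De Morgan: ¬(p ∧ q) = ¬p ∨ ¬q
            return ('or', nnf(('not', q[1])), nnf(('not', q[2])))
        if q[0] == 'or':                  # De Morgan: ¬(p ∨ q) = ¬p ∧ ¬q
            return ('and', nnf(('not', q[1])), nnf(('not', q[2])))
    return (op, nnf(p[1]), nnf(p[2]))

def to_cnf(p):
    """Distribute OR over AND until the formula is a conjunction of disjunctions."""
    p = nnf(p)
    if isinstance(p, str) or p[0] == 'not':
        return p
    a, b = to_cnf(p[1]), to_cnf(p[2])
    if p[0] == 'and':
        return ('and', a, b)
    # p is an OR: distribute it over any AND child
    if isinstance(a, tuple) and a[0] == 'and':
        return ('and', to_cnf(('or', a[1], b)), to_cnf(('or', a[2], b)))
    if isinstance(b, tuple) and b[0] == 'and':
        return ('and', to_cnf(('or', a, b[1])), to_cnf(('or', a, b[2])))
    return ('or', a, b)

# p ∨ (q ∧ r)  becomes  (p ∨ q) ∧ (p ∨ r)
print(to_cnf(('or', 'p', ('and', 'q', 'r'))))
```

Note that CNF can grow exponentially in the worst case, which is one reason real normalizers also work with DNF depending on how the qualification will be mapped to algebra operators.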
Analysis

- Refute incorrect queries
- Type incorrect
  - if any of its attribute or relation names are not defined in the
    global schema
  - if operations are applied to attributes of the wrong type
- Semantically incorrect
  - components do not contribute in any way to the generation of the
    result
  - only a subset of relational calculus queries can be tested for
    correctness: those that do not contain disjunction and negation
  - to detect:
    - connection graph (query graph)
    - join graph
Analysis – Example

SELECT ENAME, RESP
FROM   EMP, ASG, PROJ
WHERE  EMP.ENO = ASG.ENO
AND    ASG.PNO = PROJ.PNO
AND    PNAME = "CAD/CAM"
AND    DUR ≥ 36
AND    TITLE = "Programmer"

Query graph: nodes EMP, ASG, PROJ and RESULT
  join edges:       EMP -[EMP.ENO = ASG.ENO]- ASG -[ASG.PNO = PROJ.PNO]- PROJ
  selection edges:  TITLE = "Programmer" (EMP), DUR ≥ 36 (ASG),
                    PNAME = "CAD/CAM" (PROJ)
  projection edges: ENAME (EMP → RESULT), RESP (ASG → RESULT)

Join graph (join edges only):
  EMP -[EMP.ENO = ASG.ENO]- ASG -[ASG.PNO = PROJ.PNO]- PROJ
Analysis

If the query graph is not connected, the query is wrong.

SELECT ENAME, RESP
FROM   EMP, ASG, PROJ
WHERE  EMP.ENO = ASG.ENO
AND    PNAME = "CAD/CAM"
AND    DUR ≥ 36
AND    TITLE = "Programmer"

Without the ASG.PNO = PROJ.PNO predicate, PROJ (with its selection
PNAME = "CAD/CAM") is disconnected from EMP, ASG and RESULT in the query
graph.
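The connectedness test above is a plain graph traversal. A small sketch (illustrative representation; selection edges and the RESULT node are omitted, since only join edges decide connectivity between relations):

```python
from collections import defaultdict

def is_connected(nodes, edges):
    """True iff every node is reachable from the first one."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [nodes[0]]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == set(nodes)

relations = ['EMP', 'ASG', 'PROJ']
# Earlier example: both join predicates present -> connected, query is fine
assert is_connected(relations, [('EMP', 'ASG'), ('ASG', 'PROJ')])
# This slide's query: ASG.PNO = PROJ.PNO missing -> PROJ isolated, query is wrong
assert not is_connected(relations, [('EMP', 'ASG')])
print("connectivity checks pass")
```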
Simplification

- Why simplify?
  - Remember the example
- How? Use transformation rules
  - elimination of redundancy
    - idempotency rules:
        p1 ∧ ¬p1 ⇔ false
        p1 ∧ (p1 ∨ p2) ⇔ p1
        p1 ∨ false ⇔ p1
  - application of transitivity
  - use of integrity rules
Simplification – Example

SELECT TITLE
FROM   EMP
WHERE  EMP.ENAME = "J. Doe"
OR     (NOT (EMP.TITLE = "Programmer")
AND    (EMP.TITLE = "Programmer"
OR     EMP.TITLE = "Elect. Eng.")
AND    NOT (EMP.TITLE = "Elect. Eng."))

simplifies to

SELECT TITLE
FROM   EMP
WHERE  EMP.ENAME = "J. Doe"
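The simplification above can be verified mechanically: treat the three simple predicates as boolean variables and enumerate all assignments. This brute-force check is just an illustration (the slides simplify with the idempotency rules instead), but it confirms the two qualifications are equivalent.

```python
from itertools import product

def original(p1, p2, p3):
    # p1: ENAME = "J. Doe", p2: TITLE = "Programmer", p3: TITLE = "Elect. Eng."
    return p1 or ((not p2) and (p2 or p3) and (not p3))

def simplified(p1, p2, p3):
    return p1

# The second disjunct is always false: if p2 holds, (not p2) fails;
# if p2 fails, (p2 or p3) forces p3, and then (not p3) fails.
assert all(original(*v) == simplified(*v) for v in product([False, True], repeat=3))
print("equivalent")
```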
Restructuring

- Convert relational calculus to relational algebra
- Make use of query trees
- Example: Find the names of employees other than J. Doe who worked on
  the CAD/CAM project for either 1 or 2 years.

SELECT ENAME
FROM   EMP, ASG, PROJ
WHERE  EMP.ENO = ASG.ENO
AND    ASG.PNO = PROJ.PNO
AND    ENAME ≠ "J. Doe"
AND    PNAME = "CAD/CAM"
AND    (DUR = 12 OR DUR = 24)

The corresponding query tree (leaves to root):

  Π_ENAME                      (project)
    σ_DUR=12 ∨ DUR=24          (select)
    σ_PNAME="CAD/CAM"
    σ_ENAME≠"J. Doe"
      ⋈_PNO                    (join)
        PROJ
        ⋈_ENO
          ASG
          EMP
Restructuring – Transformation Rules

- Commutativity of binary operations:
    R × S ⇔ S × R
    R ⋈ S ⇔ S ⋈ R
    R ∪ S ⇔ S ∪ R
- Associativity of binary operations:
    (R × S) × T ⇔ R × (S × T)
    (R ⋈ S) ⋈ T ⇔ R ⋈ (S ⋈ T)
- Idempotence of unary operations:
    Π_A'(Π_A''(R)) ⇔ Π_A'(R)
    σ_p1(A1)(σ_p2(A2)(R)) = σ_p1(A1) ∧ p2(A2)(R)
    where R[A] and A' ⊆ A, A'' ⊆ A and A' ⊆ A''
- Commuting selection with projection
Restructuring – Transformation Rules

- Commuting selection with binary operations:
    σ_p(A)(R × S) ⇔ σ_p(A)(R) × S
    σ_p(Ai)(R ⋈_(Aj,Bk) S) ⇔ σ_p(Ai)(R) ⋈_(Aj,Bk) S
    σ_p(Ai)(R ∪ T) ⇔ σ_p(Ai)(R) ∪ σ_p(Ai)(T)
    where Ai belongs to both R and T
- Commuting projection with binary operations:
    Π_C(R × S) ⇔ Π_A'(R) × Π_B'(S)
    Π_C(R ⋈_(Aj,Bk) S) ⇔ Π_A'(R) ⋈_(Aj,Bk) Π_B'(S)
    Π_C(R ∪ S) ⇔ Π_C(R) ∪ Π_C(S)
    where R[A] and S[B]; C = A' ∪ B' with A' ⊆ A, B' ⊆ B
Example

Recall the previous example: Find the names of employees other than
J. Doe who worked on the CAD/CAM project for either one or two years.

SELECT ENAME
FROM   PROJ, ASG, EMP
WHERE  ASG.ENO = EMP.ENO
AND    ASG.PNO = PROJ.PNO
AND    ENAME ≠ "J. Doe"
AND    PROJ.PNAME = "CAD/CAM"
AND    (DUR = 12 OR DUR = 24)

Query tree:

  Π_ENAME
    σ_DUR=12 ∨ DUR=24
    σ_PNAME="CAD/CAM"
    σ_ENAME≠"J. Doe"
      ⋈_PNO
        PROJ
        ⋈_ENO
          ASG
          EMP
Equivalent Query

  Π_ENAME(σ_PNAME="CAD/CAM" ∧ (DUR=12 ∨ DUR=24) ∧ ENAME≠"J. Doe"
            (ASG ⋈_PNO,ENO (PROJ × EMP)))
Restructuring

The restructured tree, with selections and projections pushed toward
the leaves:

  Π_ENAME
    ⋈_ENO
      ⋈_PNO
        Π_PNO(σ_PNAME="CAD/CAM"(PROJ))
        Π_PNO,ENO(σ_DUR=12 ∨ DUR=24(ASG))
      Π_ENO,ENAME(σ_ENAME≠"J. Doe"(EMP))
Step 2 – Data Localization

Input: Algebraic query on distributed relations

- Determine which fragments are involved
- Localization program
  - substitute for each global relation its materialization program
  - optimize
Example

Assume EMP is fragmented into EMP1, EMP2, EMP3 as follows:
  EMP1 = σ_ENO≤"E3"(EMP)
  EMP2 = σ_"E3"<ENO≤"E6"(EMP)
  EMP3 = σ_ENO>"E6"(EMP)
and ASG is fragmented into ASG1 and ASG2 as follows:
  ASG1 = σ_ENO≤"E3"(ASG)
  ASG2 = σ_ENO>"E3"(ASG)

Replace EMP by (EMP1 ∪ EMP2 ∪ EMP3) and ASG by (ASG1 ∪ ASG2) in any
query:

  Π_ENAME(σ_DUR=12 ∨ DUR=24(σ_PNAME="CAD/CAM"(σ_ENAME≠"J. Doe"(
      PROJ ⋈_PNO ((EMP1 ∪ EMP2 ∪ EMP3) ⋈_ENO (ASG1 ∪ ASG2))))))
Provides Parallelism

Distributing the join over the unions yields joins that can be evaluated
in parallel:

  (EMP1 ⋈_ENO ASG1) ∪ (EMP2 ⋈_ENO ASG2) ∪ (EMP3 ⋈_ENO ASG1) ∪ (EMP3 ⋈_ENO ASG2)
Eliminates Unnecessary Work

Joins between fragments with contradicting predicates are empty and can
be dropped:

  (EMP1 ⋈_ENO ASG1) ∪ (EMP2 ⋈_ENO ASG2) ∪ (EMP3 ⋈_ENO ASG2)
Reduction for PHF

Reduction with selection
- Relation R and F_R = {R1, R2, …, Rw} where Rj = σ_pj(R):
    σ_pi(Rj) = ∅ if ∀x in R: ¬(pi(x) ∧ pj(x))
- Example:
    SELECT *
    FROM   EMP
    WHERE  ENO = "E5"

  Applying σ_ENO="E5" to (EMP1 ∪ EMP2 ∪ EMP3) and eliminating the
  contradicting fragments reduces the query to σ_ENO="E5"(EMP2).
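The fragment-elimination test above amounts to asking, for each fragment, whether its defining predicate can hold together with the query's predicate. A sketch for this example, assuming (as the examples do) that ENO values compare lexicographically ("E1" < "E2" < …):

```python
# Fragment predicates from the earlier EMP fragmentation.
fragments = {
    'EMP1': lambda eno: eno <= 'E3',
    'EMP2': lambda eno: 'E3' < eno <= 'E6',
    'EMP3': lambda eno: eno > 'E6',
}

def relevant_fragments(eno_value):
    """Keep only the fragments whose predicate can be true for ENO = eno_value."""
    return [name for name, pred in fragments.items() if pred(eno_value)]

print(relevant_fragments('E5'))  # ['EMP2']
```

For general predicates this test is a satisfiability check on pi ∧ pj rather than a point evaluation, but for the equality selections used in the examples the point evaluation is exactly the rule.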
Reduction for PHF

Reduction with join
- Possible if fragmentation is done on the join attribute
- Distribute join over union:
    (R1 ∪ R2) ⋈ S ⇔ (R1 ⋈ S) ∪ (R2 ⋈ S)
- Given Ri = σ_pi(R) and Rj = σ_pj(R):
    Ri ⋈ Rj = ∅ if ∀x in Ri, ∀y in Rj: ¬(pi(x) ∧ pj(y))
Reduction for PHF

Reduction with join – Example
- Assume EMP is fragmented as before and
    ASG1 = σ_ENO≤"E3"(ASG)
    ASG2 = σ_ENO>"E3"(ASG)
- Consider the query
    SELECT *
    FROM   EMP, ASG
    WHERE  EMP.ENO = ASG.ENO

Generic query: (EMP1 ∪ EMP2 ∪ EMP3) ⋈_ENO (ASG1 ∪ ASG2)
Reduction for PHF

Reduction with join – Example
- Distribute join over unions
- Apply the reduction rule

  (EMP1 ⋈_ENO ASG1) ∪ (EMP2 ⋈_ENO ASG2) ∪ (EMP3 ⋈_ENO ASG2)
Reduction for VF

- Find useless (not empty) intermediate relations
- Relation R defined over attributes A = {A1, …, An}, vertically
  fragmented as Ri = Π_A'(R) where A' ⊆ A:
    Π_D,K(Ri) is useless if the set of projection attributes D is not in A'
- Example: EMP1 = Π_ENO,ENAME(EMP); EMP2 = Π_ENO,TITLE(EMP)
    SELECT ENAME
    FROM   EMP

  Π_ENAME(EMP1 ⋈_ENO EMP2) reduces to Π_ENAME(EMP1).
Reduction for DHF

- Rule:
  - Distribute joins over unions
  - Apply the join reduction for horizontal fragmentation
- Example:
    ASG1: ASG ⋉_ENO EMP1
    ASG2: ASG ⋉_ENO EMP2
    EMP1: σ_TITLE="Programmer"(EMP)
    EMP2: σ_TITLE≠"Programmer"(EMP)
- Query:
    SELECT *
    FROM   EMP, ASG
    WHERE  ASG.ENO = EMP.ENO
    AND    EMP.TITLE = "Mech. Eng."
Reduction for DHF

Generic query:
  (ASG1 ∪ ASG2) ⋈_ENO σ_TITLE="Mech. Eng."(EMP1 ∪ EMP2)

Selections first (EMP1 contradicts TITLE = "Mech. Eng."):
  (ASG1 ∪ ASG2) ⋈_ENO σ_TITLE="Mech. Eng."(EMP2)
Reduction for DHF

Joins over unions:
  (ASG1 ⋈_ENO σ_TITLE="Mech. Eng."(EMP2)) ∪ (ASG2 ⋈_ENO σ_TITLE="Mech. Eng."(EMP2))

Elimination of the empty intermediate relations (left sub-tree):
  ASG2 ⋈_ENO σ_TITLE="Mech. Eng."(EMP2)
Reduction for HF

Combine the rules already specified:
- Remove empty relations generated by contradicting selections on
  horizontal fragments;
- Remove useless relations generated by projections on vertical
  fragments;
- Distribute joins over unions in order to isolate and remove useless
  joins.
Reduction for HF

Example
- Consider the following hybrid fragmentation:
    EMP1 = σ_ENO≤"E4"(Π_ENO,ENAME(EMP))
    EMP2 = σ_ENO>"E4"(Π_ENO,ENAME(EMP))
    EMP3 = Π_ENO,TITLE(EMP)
- and the query
    SELECT ENAME
    FROM   EMP
    WHERE  ENO = "E5"

  Π_ENAME(σ_ENO="E5"((EMP1 ∪ EMP2) ⋈_ENO EMP3)) reduces to
  Π_ENAME(σ_ENO="E5"(EMP2)).
Step 3 – Global Query Optimization

Input: Fragment query

- Find the best (not necessarily optimal) global schedule
  - Minimize a cost function
  - Distributed join processing
    - Bushy vs. linear trees
    - Which relation to ship where?
    - Ship-whole vs. ship-as-needed
  - Decide on the use of semijoins
    - Semijoin saves on communication at the expense of more local
      processing.
  - Join methods
    - nested loop vs. ordered joins (merge join or hash join)
Cost-Based Optimization

- Solution space
  - The set of equivalent algebra expressions (query trees).
- Cost function (in terms of time)
  - I/O cost + CPU cost + communication cost
  - These might have different weights in different distributed
    environments (LAN vs. WAN).
  - Can also maximize throughput
- Search algorithm
  - How do we move inside the solution space?
  - Exhaustive search, heuristic algorithms (iterative improvement,
    simulated annealing, genetic, …)
Query Optimization Process

  Input Query
       |
  Search Space Generation   <-- Transformation Rules
       |
  Equivalent QEPs
       |
  Search Strategy           <-- Cost Model
       |
  Best QEP
Search Space

- Search space characterized by alternative execution plans
- Focus on join trees
- For N relations, there are O(N!) equivalent join trees that can be
  obtained by applying commutativity and associativity rules

SELECT ENAME, RESP
FROM   EMP, ASG, PROJ
WHERE  EMP.ENO = ASG.ENO
AND    ASG.PNO = PROJ.PNO

Three equivalent join trees:
  PROJ ⋈_PNO (EMP ⋈_ENO ASG)
  EMP ⋈_ENO (PROJ ⋈_PNO ASG)
  (PROJ × EMP) ⋈_ENO,PNO ASG
Search Space

- Restrict by means of heuristics
  - Perform unary operations before binary operations
  - …
- Restrict the shape of the join tree
  - Consider only linear trees, ignore bushy ones

  Linear join tree:  ((R1 ⋈ R2) ⋈ R3) ⋈ R4
  Bushy join tree:   (R1 ⋈ R2) ⋈ (R3 ⋈ R4)
Search Strategy

- How to move in the search space.
- Deterministic
  - Start from base relations and build plans by adding one relation at
    each step
  - Dynamic programming: breadth-first
  - Greedy: depth-first
- Randomized
  - Search for optimalities around a particular starting point
  - Trade optimization time for execution time
  - Better when > 5–6 relations
  - Simulated annealing
  - Iterative improvement
Search Strategies

Deterministic: build left-deep plans one relation at a time
  R1 ⋈ R2  →  (R1 ⋈ R2) ⋈ R3  →  ((R1 ⋈ R2) ⋈ R3) ⋈ R4

Randomized: move between complete plans, e.g. by swapping operands
  (R1 ⋈ R2) ⋈ R3  ↔  (R2 ⋈ R1) ⋈ R3
Cost Functions

- Total Time (or Total Cost)
  - Reduce each cost component (in terms of time) individually
  - Do as little of each cost component as possible
  - Optimizes the utilization of the resources
  - Increases system throughput
- Response Time
  - Do as many things as possible in parallel
  - May increase total time because of increased total activity
Total Cost

Summation of all cost factors:

  Total cost = CPU cost + I/O cost + communication cost
  CPU cost           = unit instruction cost * no. of instructions
  I/O cost           = unit disk I/O cost * no. of disk I/Os
  communication cost = message initiation cost + transmission cost
Total Cost Factors

- Wide area network
  - message initiation and transmission costs high
  - local processing cost is low (fast mainframes or minicomputers)
  - ratio of communication to I/O costs = 20:1
- Local area networks
  - communication and local processing costs are more or less equal
  - ratio = 1:1.6
Response Time

Elapsed time between the initiation and the completion of a query:

  Response time = CPU time + I/O time + communication time
  CPU time = unit instruction time * no. of sequential instructions
  I/O time = unit I/O time * no. of sequential I/Os
  communication time = unit msg initiation time * no. of sequential msgs
                     + unit transmission time * no. of sequential bytes
Example

Site 1 sends x units of data to Site 3, and Site 2 sends y units to
Site 3.

Assume that only the communication cost is considered.

  Total time = 2 * message initiation time
             + unit transmission time * (x + y)

  Response time = max{time to send x from 1 to 3,
                      time to send y from 2 to 3}
  where
    time to send x from 1 to 3 = message initiation time
                               + unit transmission time * x
    time to send y from 2 to 3 = message initiation time
                               + unit transmission time * y
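A sketch of the two measures with illustrative numbers (message initiation = 10 units, 1 unit per data unit shipped; these constants are assumptions, not from the slides):

```python
MSG_INIT = 10  # message initiation time, in cost units
UNIT_TX = 1    # transmission time per data unit

def total_time(x, y):
    # both transfers are charged, even though they can run in parallel
    return 2 * MSG_INIT + UNIT_TX * (x + y)

def response_time(x, y):
    # the transfers from Sites 1 and 2 overlap; the slower one dominates
    return max(MSG_INIT + UNIT_TX * x, MSG_INIT + UNIT_TX * y)

print(total_time(5, 3), response_time(5, 3))  # 28 15
```

The gap between the two numbers is what a response-time optimizer buys by accepting more total activity.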
Optimization Statistics

- Primary cost factor: size of intermediate relations
- Making them precise is more costly to maintain
- For each relation R[A1, A2, …, An] fragmented as R1, …, Rr:
  - length of each attribute: length(Ai)
  - number of distinct values of each attribute in each fragment:
    card(Π_Ai(Rj))
  - maximum and minimum values in the domain of each attribute:
    min(Ai), max(Ai)
  - the cardinalities of each domain: card(dom[Ai])
  - the cardinalities of each fragment: card(Rj)
- Selectivity factor of each operation for relations
  - For joins:
      SF_J(R, S) = card(R ⋈ S) / (card(R) * card(S))
Intermediate Relation Sizes

Selection
  size(R) = card(R) * length(R)
  card(σ_F(R)) = SF_S(F) * card(R)
where
  SF_S(A = value) = 1 / card(Π_A(R))
  SF_S(A > value) = (max(A) − value) / (max(A) − min(A))
  SF_S(A < value) = (value − min(A)) / (max(A) − min(A))
  SF_S(p(Ai) ∧ p(Aj)) = SF_S(p(Ai)) * SF_S(p(Aj))
  SF_S(p(Ai) ∨ p(Aj)) = SF_S(p(Ai)) + SF_S(p(Aj))
                      − (SF_S(p(Ai)) * SF_S(p(Aj)))
  SF_S(A ∈ {values}) = SF_S(A = value) * card({values})
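The selectivity formulas above translate directly into code. A sketch, valid only under the uniform-distribution and independence assumptions the slides state:

```python
def sf_eq(distinct_values):
    """SF(A = value) = 1 / card(Pi_A(R))."""
    return 1.0 / distinct_values

def sf_gt(value, lo, hi):
    """SF(A > value) = (max - value) / (max - min)."""
    return (hi - value) / (hi - lo)

def sf_lt(value, lo, hi):
    """SF(A < value) = (value - min) / (max - min)."""
    return (value - lo) / (hi - lo)

def sf_and(sf1, sf2):
    return sf1 * sf2

def sf_or(sf1, sf2):
    return sf1 + sf2 - sf1 * sf2

# e.g. card(sigma_{A=value}(R)) for card(R) = 1000 and 40 distinct A-values:
print(sf_eq(40) * 1000)
print(sf_gt(50, 0, 100))   # 0.5
print(sf_or(0.5, 0.5))     # 0.75
```

Note how quickly the assumptions bite: if the attribute is skewed, `sf_eq` can be off by orders of magnitude, which is exactly the error-propagation problem flagged for static optimizers earlier.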
Intermediate Relation Sizes

- Projection
    card(Π_A(R)) = card(R)   (exact when A contains a key of R)
- Cartesian Product
    card(R × S) = card(R) * card(S)
- Union
    upper bound: card(R ∪ S) = card(R) + card(S)
    lower bound: card(R ∪ S) = max{card(R), card(S)}
- Set Difference
    upper bound: card(R − S) = card(R)
    lower bound: 0
Intermediate Relation Sizes

- Join
  - Special case: A is a key of R and B is a foreign key of S:
      card(R ⋈_A=B S) = card(S)
  - More general:
      card(R ⋈ S) = SF_J * card(R) * card(S)
- Semijoin
    card(R ⋉_A S) = SF_SJ(S.A) * card(R)
  where
    SF_SJ(R ⋉_A S) = SF_SJ(S.A) = card(Π_A(S)) / card(dom[A])
Centralized Query Optimization

- INGRES
  - dynamic
  - interpretive
- System R
  - static
  - exhaustive search
INGRES Algorithm

- Decompose each multi-variable query into a sequence of mono-variable
  queries with a common variable
- Process each by a one-variable query processor
  - choose an initial execution plan (heuristics)
  - order the rest by considering intermediate relation sizes
- No statistical information is maintained
INGRES Algorithm – Decomposition

- Replace an n-variable query q by a series of queries
    q1 → q2 → … → qn
  where qi uses the result of qi−1.
- Detachment
  - Query q decomposed into q' → q'' where q' and q'' have a common
    variable which is the result of q'
- Tuple substitution
  - Replace the value of each tuple with actual values and simplify the
    query:
      q(V1, V2, …, Vn) → (q'(t1, V2, …, Vn), t1 ∈ R1)
Detachment

q:   SELECT V2.A2, V3.A3, …, Vn.An
     FROM   R1 V1, …, Rn Vn
     WHERE  P1(V1.A1) AND P2(V1.A1, V2.A2, …, Vn.An)

q':  SELECT V1.A1 INTO R1'
     FROM   R1 V1
     WHERE  P1(V1.A1)

q'': SELECT V2.A2, …, Vn.An
     FROM   R1' V1, R2 V2, …, Rn Vn
     WHERE  P2(V1.A1, V2.A2, …, Vn.An)
Detachment Example

Names of employees working on the CAD/CAM project

q1:  SELECT EMP.ENAME
     FROM   EMP, ASG, PROJ
     WHERE  EMP.ENO = ASG.ENO
     AND    ASG.PNO = PROJ.PNO
     AND    PROJ.PNAME = "CAD/CAM"

q11: SELECT PROJ.PNO INTO JVAR
     FROM   PROJ
     WHERE  PROJ.PNAME = "CAD/CAM"

q':  SELECT EMP.ENAME
     FROM   EMP, ASG, JVAR
     WHERE  EMP.ENO = ASG.ENO
     AND    ASG.PNO = JVAR.PNO
Detachment Example (cont'd)

q':  SELECT EMP.ENAME
     FROM   EMP, ASG, JVAR
     WHERE  EMP.ENO = ASG.ENO
     AND    ASG.PNO = JVAR.PNO

q12: SELECT ASG.ENO INTO GVAR
     FROM   ASG, JVAR
     WHERE  ASG.PNO = JVAR.PNO

q13: SELECT EMP.ENAME
     FROM   EMP, GVAR
     WHERE  EMP.ENO = GVAR.ENO
Tuple Substitution

- q11 is a mono-variable query
- q12 and q13 are subject to tuple substitution
- Assume GVAR has only two tuples: <E1> and <E2>
- Then q13 becomes

q131: SELECT EMP.ENAME
      FROM   EMP
      WHERE  EMP.ENO = "E1"

q132: SELECT EMP.ENAME
      FROM   EMP
      WHERE  EMP.ENO = "E2"
System R Algorithm

1. Simple (i.e., mono-relation) queries are executed according to the
   best access path
2. Execute joins
   2.1 Determine the possible ordering of joins
   2.2 Determine the cost of each ordering
   2.3 Choose the join ordering with minimal cost
System R Algorithm

For joins, two alternative algorithms:

- Nested loops
    for each tuple of external relation (cardinality n1)
        for each tuple of internal relation (cardinality n2)
            join two tuples if the join predicate is true
        end
    end
  Complexity: n1 * n2
- Merge join
    sort relations
    merge relations
  Complexity: n1 + n2 if relations are previously sorted and equijoin
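The two join methods can be sketched over lists of dicts. This is toy code over an equijoin on attribute `a`, not System R's implementation; the merge variant assumes it may sort its inputs first, matching the "previously sorted" caveat above.

```python
R = [{'a': 1, 'x': 'r1'}, {'a': 2, 'x': 'r2'}, {'a': 3, 'x': 'r3'}]
S = [{'a': 2, 'y': 's1'}, {'a': 3, 'y': 's2'}, {'a': 3, 'y': 's3'}]

def nested_loop_join(outer, inner):
    # O(n1 * n2): every pair of tuples is tested
    return [(r, s) for r in outer for s in inner if r['a'] == s['a']]

def merge_join(r, s):
    # O(n1 + n2) once both inputs are sorted on the join attribute
    r, s = sorted(r, key=lambda t: t['a']), sorted(s, key=lambda t: t['a'])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i]['a'] < s[j]['a']:
            i += 1
        elif r[i]['a'] > s[j]['a']:
            j += 1
        else:
            # emit the full group of matching s-tuples for this r-tuple
            k = j
            while k < len(s) and s[k]['a'] == r[i]['a']:
                out.append((r[i], s[k]))
                k += 1
            i += 1
    return out

assert sorted(map(str, nested_loop_join(R, S))) == sorted(map(str, merge_join(R, S)))
print(merge_join(R, S))
```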
System R Algorithm – Example

Names of employees working on the CAD/CAM project

Assume
- EMP has an index on ENO,
- ASG has an index on PNO,
- PROJ has an index on PNO and an index on PNAME

Join graph:  EMP ⋈_ENO ASG ⋈_PNO PROJ
System R Example (cont'd)

- Choose the best access paths to each relation
    EMP:  sequential scan (no selection on EMP)
    ASG:  sequential scan (no selection on ASG)
    PROJ: index on PNAME (there is a selection on PROJ based on PNAME)
- Determine the best join ordering
    EMP ⋈ ASG ⋈ PROJ
    ASG ⋈ PROJ ⋈ EMP
    PROJ ⋈ ASG ⋈ EMP
    ASG ⋈ EMP ⋈ PROJ
    EMP × PROJ ⋈ ASG
    PROJ × EMP ⋈ ASG
- Select the best ordering based on the join costs evaluated according
  to the two methods
System R Algorithm

Alternatives explored by the search (× denotes a Cartesian product):

  from EMP:   EMP ⋈ ASG    pruned
              EMP × PROJ   pruned
  from ASG:   ASG ⋈ EMP    → (ASG ⋈ EMP) ⋈ PROJ
              ASG ⋈ PROJ   pruned
  from PROJ:  PROJ ⋈ ASG   → (PROJ ⋈ ASG) ⋈ EMP
              PROJ × EMP   pruned

Best total join order is one of
  ((ASG ⋈ EMP) ⋈ PROJ)
  ((PROJ ⋈ ASG) ⋈ EMP)
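The pruning step can be sketched by enumerating left-deep orders and discarding any order whose next relation has no join predicate linking it to the prefix (i.e., would force a Cartesian product). This is an illustration of the idea, not System R's dynamic programming with its cost model:

```python
from itertools import permutations

# Join predicates of the example: EMP-ASG on ENO, ASG-PROJ on PNO.
joins = {frozenset({'EMP', 'ASG'}), frozenset({'ASG', 'PROJ'})}

def connected(prefix, nxt):
    """True iff some relation already joined links to the next one."""
    return any(frozenset({p, nxt}) in joins for p in prefix)

def feasible_orders(relations):
    out = []
    for order in permutations(relations):
        if all(connected(order[:i], order[i]) for i in range(1, len(order))):
            out.append(order)
    return out

orders = feasible_orders(['EMP', 'ASG', 'PROJ'])
# EMP,PROJ,... and PROJ,EMP,... are pruned: they start with EMP x PROJ
print(orders)
```

Of the 3! = 6 permutations, the two that would start with EMP × PROJ are eliminated, mirroring the pruned branches in the search tree above.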
System R Algorithm

- ((PROJ ⋈ ASG) ⋈ EMP) has a useful index on the select attribute and
  direct access to the join attributes of ASG and EMP
- Therefore, choose it with the following access methods:
  - select PROJ using the index on PNAME
  - then join with ASG using the index on PNO
  - then join with EMP using the index on ENO
Join Ordering in Fragment Queries

- Ordering joins
  - Distributed INGRES
  - System R*
- Semijoin ordering
  - SDD-1
Join Ordering

- Consider two relations only:
    if size(R) < size(S):  ship R to the site of S
    if size(R) > size(S):  ship S to the site of R
- Multiple relations are more difficult because there are too many
  alternatives.
  - Compute the cost of all alternatives and select the best one.
    - Necessary to compute the size of intermediate relations, which is
      difficult.
  - Use heuristics
Join Ordering Example

Consider
  PROJ ⋈_PNO ASG ⋈_ENO EMP
with
  EMP stored at Site 1, ASG at Site 2, PROJ at Site 3.
Join Ordering Example

Execution alternatives:
1. EMP → Site 2
   Site 2 computes EMP' = EMP ⋈ ASG
   EMP' → Site 3
   Site 3 computes EMP' ⋈ PROJ
2. ASG → Site 1
   Site 1 computes EMP' = EMP ⋈ ASG
   EMP' → Site 3
   Site 3 computes EMP' ⋈ PROJ
3. ASG → Site 3
   Site 3 computes ASG' = ASG ⋈ PROJ
   ASG' → Site 1
   Site 1 computes ASG' ⋈ EMP
4. PROJ → Site 2
   Site 2 computes PROJ' = PROJ ⋈ ASG
   PROJ' → Site 1
   Site 1 computes PROJ' ⋈ EMP
5. EMP → Site 2
   PROJ → Site 2
   Site 2 computes EMP ⋈ PROJ ⋈ ASG
Semijoin Algorithms

- Consider the join of two relations:
  - R[A] (located at Site 1)
  - S[A] (located at Site 2)
- Alternatives:
  1. Do the join R ⋈_A S
  2. Perform one of the semijoin equivalents:
       R ⋈_A S ⇔ (R ⋉_A S) ⋈_A S
               ⇔ R ⋈_A (S ⋉_A R)
               ⇔ (R ⋉_A S) ⋈_A (S ⋉_A R)
Semijoin Algorithms

- Perform the join
  - send R to Site 2
  - Site 2 computes R ⋈_A S
- Consider the semijoin (R ⋉_A S) ⋈_A S
  - S' ← Π_A(S)
  - S' → Site 1
  - Site 1 computes R' = R ⋉_A S'
  - R' → Site 2
  - Site 2 computes R' ⋈_A S
- Semijoin is better if
    size(Π_A(S)) + size(R ⋉_A S) < size(R)
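The size test above compares the data shipped by the two programs. A sketch with illustrative sizes (all figures are assumptions, measured in the same transfer-cost units):

```python
def ship_whole_cost(size_R):
    # send all of R to Site 2
    return size_R

def semijoin_cost(size_proj_A_S, size_R_semijoin_S):
    # send Pi_A(S) to Site 1, then send R' = R semijoin S back to Site 2
    return size_proj_A_S + size_R_semijoin_S

size_R = 1000
size_proj_A_S = 50    # few distinct join values in S
size_R_sj_S = 200     # few R-tuples actually have a match in S

print(ship_whole_cost(size_R), semijoin_cost(size_proj_A_S, size_R_sj_S))
```

With these numbers the semijoin program ships 250 units instead of 1000, which is the situation the rule `size(Π_A(S)) + size(R ⋉_A S) < size(R)` captures; when most of R joins with S, the inequality flips and ship-whole wins.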
Distributed Query Processing Algorithms

              Opt.     Objective      Opt.               Network      Semijoin  Stats    Fragments
              Timing   Function       Factors            Topology
  Dist.       Dynamic  Resp. time or  Msg. size,         General or   No        1        Horizontal
  INGRES               total time     proc. cost         broadcast
  R*          Static   Total time     No. msg., msg.     General or   No        1, 2     No
                                      size, IO, CPU      local
  SDD-1       Static   Total time     Msg. size          General      Yes       1,3,4,5  No

1: relation cardinality; 2: number of unique values per attribute;
3: join selectivity factor; 4: size of projection on each join attribute;
5: attribute size and tuple size
Distributed INGRES Algorithm

Same as the centralized version, except that
- movement of relations (and fragments) needs to be considered
- optimization with respect to communication cost or response time is
  possible
R* Algorithm

- Cost function includes local processing as well as transmission
- Considers only joins
- Exhaustive search
- Compilation
- Published papers provide solutions to handling horizontal and vertical
  fragmentations, but the implemented prototype does not
R* Algorithm

Performing joins
- Ship whole
  - larger data transfer
  - smaller number of messages
  - better if relations are small
- Fetch as needed
  - number of messages = O(cardinality of external relation)
  - data transfer per message is minimal
  - better if relations are large and the selectivity is good
R* Algorithm

Vertical Partitioning & Joins

1. Move outer relation tuples to the site of the inner relation
   (a) Retrieve outer tuples
   (b) Send them to the inner relation site
   (c) Join them as they arrive

   Total Cost = cost(retrieving qualified outer tuples)
              + no. of outer tuples fetched *
                cost(retrieving qualified inner tuples)
              + msg. cost * (no. of outer tuples fetched *
                avg. outer tuple size) / msg. size
R* Algorithm

Vertical Partitioning & Joins

2. Move inner relation to the site of the outer relation
   - cannot join as they arrive; they need to be stored

   Total Cost = cost(retrieving qualified outer tuples)
              + no. of outer tuples fetched *
                cost(retrieving matching inner tuples from
                temporary storage)
              + cost(retrieving qualified inner tuples)
              + cost(storing all qualified inner tuples in
                temporary storage)
              + msg. cost * (no. of inner tuples fetched *
                avg. inner tuple size) / msg. size
R* Algorithm

Vertical Partitioning & Joins

3. Move both inner and outer relations to another site

   Total Cost = cost(retrieving qualified outer tuples)
              + cost(retrieving qualified inner tuples)
              + cost(storing inner tuples in storage)
              + msg. cost * (no. of outer tuples fetched *
                avg. outer tuple size) / msg. size
              + msg. cost * (no. of inner tuples fetched *
                avg. inner tuple size) / msg. size
              + no. of outer tuples fetched *
                cost(retrieving inner tuples from temporary storage)
R* Algorithm

Vertical Partitioning & Joins

4. Fetch inner tuples as needed
   (a) Retrieve qualified tuples at outer relation site
   (b) Send request containing join column value(s) for outer tuples to
       inner relation site
   (c) Retrieve matching inner tuples at inner relation site
   (d) Send the matching inner tuples to outer relation site
   (e) Join as they arrive

   Total Cost = cost(retrieving qualified outer tuples)
              + msg. cost * (no. of outer tuples fetched)
              + no. of outer tuples fetched * (no. of inner tuples
                fetched * avg. inner tuple size * msg. cost / msg. size)
              + no. of outer tuples fetched *
                cost(retrieving matching inner tuples for one outer value)
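Two of these cost expressions can be compared numerically. The sketch below follows the formulas for strategies 1 and 4; every parameter value is an assumption chosen for illustration, not a figure from the slides.

```python
# Illustrative parameters (all invented for the example).
outer_tuples = 100      # qualified outer tuples
inner_per_outer = 2     # matching inner tuples per outer value
outer_size = 100        # avg. outer tuple size (bytes)
inner_size = 100        # avg. inner tuple size (bytes)
msg_size = 1000         # bytes per message
msg_cost = 10           # cost units per message
retrieve_outer = 50     # cost(retrieving qualified outer tuples)
retrieve_inner = 200    # cost(retrieving qualified inner tuples)
retrieve_match = 1      # cost(retrieving matching inner tuples for one outer value)

# Strategy 1: move outer relation tuples to the inner relation's site.
cost1 = (retrieve_outer
         + outer_tuples * retrieve_inner
         + msg_cost * (outer_tuples * outer_size) / msg_size)

# Strategy 4: fetch inner tuples as needed.
cost4 = (retrieve_outer
         + msg_cost * outer_tuples
         + outer_tuples * (inner_per_outer * inner_size * msg_cost / msg_size)
         + outer_tuples * retrieve_match)

print(cost1, cost4)
```

With these numbers strategy 4 wins because each outer value matches only a couple of inner tuples (good selectivity, per the ship-whole vs. fetch-as-needed rule of thumb); raise `retrieve_match` or `inner_per_outer` and the balance shifts back.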
SDD-1 Algorithm

- Based on the Hill Climbing Algorithm
  - Semijoins
  - No replication
  - No fragmentation
  - Cost of transferring the result to the user site from the final
    result site is not considered
  - Can minimize either total time or response time
Hill Climbing Algorithm
Assume join is between three relations.
Step 1: Do initial processing
Step 2: Select initial feasible solution (ES0)
2.1 Determine the candidate result sites - sites
where a relation referenced in the query exist
2.2 Compute the cost of transferring all the other
referenced relations to each candidate site
2.3 ES0 = candidate site with minimum cost
Step 3: Determine candidate splits of ES0 into
{ES1, ES2}
3.1 ES1 consists of sending one of the relations to
the other relation's site
3.2 ES2 consists of sending the join of the
relations to the final result site
Hill Climbing Algorithm
Step 4: Replace ES0 with the split schedule
which gives
cost(ES1) + cost(local join) + cost(ES2) < cost(ES0)
Step 5: Recursively apply Steps 3 and 4 on ES1 and ES2 until no such plans can be found
Step 6: Check for redundant transmissions in the
final plan and eliminate them.
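The split-and-improve loop in Steps 2 through 5 can be sketched as a greedy search. This is a minimal sketch of the control flow only; `candidate_splits` and `cost` are hypothetical, caller-supplied functions, not part of the original algorithm description.

```python
# Minimal sketch of the hill-climbing control flow (Steps 2-5).
# `candidate_splits` and `cost` are assumed, caller-supplied functions.

def hill_climb(initial_schedule, candidate_splits, cost):
    """Greedily replace the current schedule by any cheaper split of it."""
    best = initial_schedule
    improved = True
    while improved:                               # Step 5: repeat until no gain
        improved = False
        for split in candidate_splits(best):      # Step 3: enumerate candidate splits
            if cost(split) < cost(best):          # Step 4: keep only cheaper plans
                best = split
                improved = True
    return best                                   # Step 6 (redundancy check) omitted

# Toy usage: schedules are integers, each split shaves one cost unit.
plan = hill_climb(5, lambda s: [s - 1] if s > 0 else [], lambda s: s)
print(plan)   # 0
```

Because each step only accepts strictly cheaper neighbours, the loop terminates, but, as the next slides show, it can stop at a local minimum.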
Hill Climbing Algorithm
Example
What are the salaries of engineers who work on the CAD/CAM project?

ΠSAL(PAY ⋈TITLE (EMP ⋈ENO (ASG ⋈PNO σPNAME="CAD/CAM"(PROJ))))

Relation   Size   Site
EMP        8      1
PAY        4      2
PROJ       4      3
ASG        10     4

Assume:
Size of relations is defined as their cardinality
Minimize total cost
Transmission cost between two sites is 1
Ignore local processing cost
Hill Climbing Algorithm
Example
Step 1:
Selection on PROJ; result has cardinality 1

Relation   Size   Site
EMP        8      1
PAY        4      2
PROJ       1      3
ASG        10     4
Hill Climbing Algorithm
Example
Step 2: Initial feasible solution

Alternative 1: Result site is Site 1
Total cost = cost(PAY→Site 1) + cost(ASG→Site 1) + cost(PROJ→Site 1)
           = 4 + 10 + 1 = 15
Alternative 2: Result site is Site 2
Total cost = 8 + 10 + 1 = 19
Alternative 3: Result site is Site 3
Total cost = 8 + 4 + 10 = 22
Alternative 4: Result site is Site 4
Total cost = 8 + 4 + 1 = 13

Therefore ES0 = {EMP → Site 4; PAY → Site 4; PROJ → Site 4}
Hill Climbing Algorithm
Example
Step 3: Determine candidate splits

Alternative 1: {ES1, ES2, ES3} where
ES1: EMP → Site 2
ES2: (EMP ⋈ PAY) → Site 4
ES3: PROJ → Site 4

Alternative 2: {ES1, ES2, ES3} where
ES1: PAY → Site 1
ES2: (PAY ⋈ EMP) → Site 4
ES3: PROJ → Site 4
Hill Climbing Algorithm
Example
Step 4: Determine costs of each split alternative

cost(Alternative 1) = cost(EMP→Site 2) + cost((EMP ⋈ PAY)→Site 4) + cost(PROJ→Site 4)
                    = 8 + 8 + 1 = 17
cost(Alternative 2) = cost(PAY→Site 1) + cost((PAY ⋈ EMP)→Site 4) + cost(PROJ→Site 4)
                    = 4 + 8 + 1 = 13

Decision: DO NOT SPLIT (neither alternative costs less than cost(ES0) = 13)

Step 5: ES0 is the best.
Step 6: No redundant transmissions.
Hill Climbing Algorithm
Problems:
Greedy algorithm: determines an initial feasible solution and iteratively tries to improve it
If there are local minima, it may not find the global minimum
If the optimal schedule has a high initial cost, it won't find it, since it won't be chosen as the initial feasible solution

Example: A better schedule is
PROJ → Site 4
ASG' = (PROJ ⋈ ASG) → Site 1
(ASG' ⋈ EMP) → Site 2
Total cost = 1 + 2 + 2 = 5
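Step 2's greedy choice of the initial site can be reproduced numerically. A minimal sketch using the example's post-selection sizes and sites (transmission cost 1 per size unit, local processing free):

```python
# Hedged sketch: reproducing Step 2 of the hill-climbing example.
# Sizes are the post-selection cardinalities from the slides.

sizes = {"EMP": 8, "PAY": 4, "PROJ": 1, "ASG": 10}
sites = {"EMP": 1, "PAY": 2, "PROJ": 3, "ASG": 4}

def initial_cost(result_site):
    """Cost of shipping every relation not already stored at result_site."""
    return sum(size for rel, size in sizes.items() if sites[rel] != result_site)

costs = {site: initial_cost(site) for site in (1, 2, 3, 4)}
es0_site = min(costs, key=costs.get)
print(costs, es0_site)   # {1: 15, 2: 19, 3: 22, 4: 13} 4
```

The greedy start at Site 4 (cost 13) only ever considers whole-relation transfers, so it can never reach the semijoin-style schedule of cost 5, which is exactly the weakness the slide points out.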
SDD-1 Algorithm
Initialization
Step 1: In the execution strategy (call it ES),
include all the local processing
Step 2: Reflect the effects of local processing on
the database profile
Step 3: Construct a set of beneficial semijoin operations (BS) as follows:
BS = ∅
For each semijoin SJi:
BS ← BS ∪ {SJi} if cost(SJi) < benefit(SJi)
SDD-1 Algorithm Example
Consider the following query
SELECT
FROM
WHERE
AND
R3.C
R1, R2, R3
R1.A = R2.A
R2.B = R3.B
which has the following query graph and statistics:
Query graph: R1 (Site 1) -A- R2 (Site 2) -B- R3 (Site 3)

relation   card   tuple size   relation size
R1         30     50           1500
R2         100    30           3000
R3         50     40           2000

attribute   SF    size(attribute)
R1.A        0.3   36
R2.A        0.8   320
R2.B        1.0   400
R3.B        0.4   80
SDD-1 Algorithm Example
Beneficial semijoins:
SJ1 = R2 ⋉ R1, whose benefit is 2100 = (1 - 0.3) × 3000 and cost is 36
SJ2 = R2 ⋉ R3, whose benefit is 1800 = (1 - 0.4) × 3000 and cost is 80

Nonbeneficial semijoins:
SJ3 = R1 ⋉ R2, whose benefit is 300 = (1 - 0.8) × 1500 and cost is 320
SJ4 = R3 ⋉ R2, whose benefit is 0 and cost is 400
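The benefit/cost screening above can be checked with a short sketch. It uses the example statistics and the definitions implied by the slides: benefit(R ⋉ S) = (1 - SF(S.join attribute)) × size(R), and cost = size of the projected join attribute that must be shipped.

```python
# Hedged sketch: screening the four candidate semijoins of the SDD-1
# example. Statistics are copied from the slide's tables.

size = {"R1": 1500, "R2": 3000, "R3": 2000}
sf = {"R1.A": 0.3, "R2.A": 0.8, "R2.B": 1.0, "R3.B": 0.4}
attr_size = {"R1.A": 36, "R2.A": 320, "R2.B": 400, "R3.B": 80}

def benefit(reduced_rel, sender_attr):
    """Bytes saved by reducing reduced_rel with the sender's join attribute."""
    return (1 - sf[sender_attr]) * size[reduced_rel]

semijoins = {
    "SJ1: R2 lsemijoin R1": (benefit("R2", "R1.A"), attr_size["R1.A"]),
    "SJ2: R2 lsemijoin R3": (benefit("R2", "R3.B"), attr_size["R3.B"]),
    "SJ3: R1 lsemijoin R2": (benefit("R1", "R2.A"), attr_size["R2.A"]),
    "SJ4: R3 lsemijoin R2": (benefit("R3", "R2.B"), attr_size["R2.B"]),
}
beneficial = [name for name, (b, c) in semijoins.items() if b > c]
print(beneficial)   # only SJ1 (2100 > 36) and SJ2 (1800 > 80) qualify
```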
SDD-1 Algorithm
Iterative Process
Step 4: Remove the most beneficial SJi from BS
and append it to ES
Step 5: Modify the database profile accordingly
Step 6: Modify BS appropriately
compute new benefit/cost values
check if any new semijoins need to be included in BS
Step 7: If BS ≠ ∅, go back to Step 4.
SDD-1 Algorithm Example
Iteration 1:
Remove SJ1 from BS and add it to ES.
Update statistics:
size(R2) = 900 (= 3000 × 0.3)
SF(R2.A) ≈ 0.8 × 0.3 = 0.24

Iteration 2:
Two beneficial semijoins:
SJ2 = R2 ⋉ R3, whose benefit is 540 = (1 - 0.4) × 900 and cost is 200
SJ3 = R1 ⋉ R2', whose benefit is 1140 = (1 - 0.24) × 1500 and cost is 96
Add SJ3 to ES
Update statistics:
size(R1) = 360 (= 1500 × 0.24)
SF(R1.A) ≈ 0.3 × 0.24 = 0.072
SDD-1 Algorithm Example
Iteration 3:
No new beneficial semijoins.
Remove remaining beneficial semijoin SJ2 from
BS and add it to ES.
Update statistics:
size(R2) = 360 (= 900 × 0.4)
Note: the selectivity of R2 may also change, but it is not important in this example.
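The profile updates across the three iterations can be traced with a short sketch. The scaling rule assumed here, from the slides, is that applying R ⋉ S multiplies size(R), and approximately the selectivity of R's own join attributes, by SF(S.join attribute); the "~" in the slides marks that approximation.

```python
# Hedged sketch of the database-profile updates in the SDD-1 example.
# Starting values come from the slide's statistics tables.

size_r2 = 3000
sf_r2a = 0.8

# Iteration 1: apply SJ1 = R2 lsemijoin R1, with SF(R1.A) = 0.3
size_r2 *= 0.3            # size(R2) becomes 900
sf_r2a *= 0.3             # SF(R2.A) becomes ~0.24 (approximate)

# Iteration 2: apply SJ3 = R1 lsemijoin R2', with SF(R2'.A) = 0.24
size_r1 = 1500 * sf_r2a   # size(R1) becomes ~360

# Iteration 3: apply SJ2 = R2 lsemijoin R3, with SF(R3.B) = 0.4
size_r2 *= 0.4            # size(R2) becomes 360

print(round(size_r2), round(size_r1))   # 360 360
```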
SDD-1 Algorithm
Assembly Site Selection
Step 8: Find the site where the largest amount of data
resides and select it as the assembly site
Example:
Amount of data stored at sites:
Site 1: 360
Site 2: 360
Site 3: 2000
Therefore, Site 3 will be chosen as the assembly site.
SDD-1 Algorithm
Postprocessing
Step 9: For each Ri at the assembly site, find the semijoins of the type Ri ⋉ Rj where the total cost of ES without this semijoin is smaller than the cost with it, and remove the semijoin from ES.
Note: There might be indirect benefits.
Example: No semijoins are removed.

Step 10: Permute the order of semijoins if doing so would improve the total cost of ES.
Example: Final strategy:
Send (R2 ⋉ R1) ⋉ R3 to Site 3
Send R1 ⋉ R2 to Site 3
Step 4 Local Optimization
Input: Best global execution schedule
Select the best access path
Use the centralized optimization techniques
Distributed Query Optimization
Problems
Cost model
multiple query optimization
heuristics to cut down on alternatives
Larger set of queries
optimization only on select-project-join queries
also need to handle complex queries (e.g., unions,
disjunctions, aggregations and sorting)
Optimization cost vs execution cost tradeoff
heuristics to cut down on alternatives
controllable search strategies
Optimization/reoptimization interval
extent of changes in database profile before
reoptimization is necessary