Outline
Introduction
Background
Distributed DBMS Architecture
Distributed Database Design
Distributed Query Processing
Query Processing Methodology
Distributed Query Optimization
Distributed Transaction Management (Extensive)
Building Distributed Database Systems (RAID)
Mobile Database Systems
Privacy, Trust, and Authentication
Peer to Peer Systems
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 1
Useful References
Textbook Principles of Distributed Database Systems,
Chapter 7
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 2
Distributed Query Processing
Methodology
Calculus Query on Distributed
Relations
Query
Query
GLOBAL
GLOBAL
Decomposition
Decomposition SCHEMA
SCHEMA
Algebraic Query on Distributed
Relations
CONTROL
Data FRAGMENT
SITE Data FRAGMENT
Localization
Localization
SCHEMA
SCHEMA
Fragment Query
Global STATS
STATSON
ON
Global
Optimization
Optimization
FRAGMENTS
FRAGMENTS
Optimized Fragment Query
with Communication Operations
LOCAL Local LOCAL
LOCAL
Local
SITES Optimization
Optimization
SCHEMAS
SCHEMAS
Optimized Local
Queries
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 3
Restructuring
Convert relational calculus to relational ENAME Project
algebra
Make use of query trees
Example DUR=12 OR DUR=24
Find the names of employees other than J. Doe
who worked on the CAD/CAM project for
either 1 or 2 years. PNAME=“CAD/CAM” Select
SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO ENAME≠“J. DOE”
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ “J. Doe”
PNO
AND PNAME = “CAD/CAM”
AND (DUR = 12 OR DUR = 24)
ENO Join
PROJ ASG EMP
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 4
Restructuring –Transformation
Rules
Commutativity of binary operations
RSSR
R SS R
RSSR
Associativity of binary operations
( R S ) T R (S T)
( R S ) T R (S T )
Idempotence of unary operations
A’(A’(R)) A’(R)
p1(A1)(p2(A2)(R)) = p1(A1) p2(A2)(R)
where R[A] and A' A, A" A and A' A"
Commuting selection with projection
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 5
Restructuring – Transformation
Rules
Commuting selection with binary operations
p(A)(R S) (p(A) (R)) S
p(Ai)(R (Aj,Bk) S) (p(Ai) (R)) (Aj,Bk) S
p(Ai)(R T) p(Ai) (R) p(Ai) (T)
where Ai belongs to R and T
Commuting projection with binary operations
C(R S) A’(R) B’(S)
C(R (Aj,Bk) S) A’(R) (Aj,Bk) B’(S)
C(R S) C (R) C (S)
where R[A] and S[B]; C = A' B' where A' A, B' B
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 6
Example
Recall the previous example: ENAME Project
Find the names of employees other than J.
Doe who worked on the CAD/CAM
project for either one or two years. DUR=12 OR DUR=24
SELECT ENAME PNAME=“CAD/CAM” Select
FROM PROJ, ASG, EMP
WHERE ASG.ENO=EMP.ENO
AND ASG.PNO=PROJ.PNO
ENAME≠“J. DOE”
AND ENAME≠“J. Doe”
AND PROJ.PNAME=“CAD/CAM” PNO
AND (DUR=12 OR DUR=24)
ENO Join
PROJ ASG EMP
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 7
Equivalent Query
ENAME
PNAME=“CAD/CAM” (DUR=12 DUR=24) ENAME≠“J. DOE”
PNO ENO
ASG PROJ EMP
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 8
Restructuring
ENAME
PNO
PNO,ENAME
ENO
PNO PNO,ENO PNO,ENAME
PNAME = "CAD/CAM" DUR =12 DUR=24 ENAME ≠ "J. Doe"
PROJ ASG EMP
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 9
Cost Functions
Total Time (or Total Cost)
Reduce each cost (in terms of time) component individually
Do as little of each cost component as possible
Optimizes the utilization of the resources
Increases system throughput
Response Time
Do as many things as possible in parallel
May increase total time because of increased total activity
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 10
Total Cost
Summation of all cost factors
Total cost = CPU cost + I/O cost + communication cost
CPU cost = unit instruction cost no.of instructions
I/O cost = unit disk I/O cost no. of disk I/Os
communication cost = message initiation + transmission
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 11
Total Cost Factors
Wide area network
message initiation and transmission costs high
local processing cost is low (fast mainframes or
minicomputers)
ratio of communication to I/O costs = 20:1
Local area networks
communication and local processing costs are more or less
equal
ratio = 1:1.6
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 12
Response Time
Elapsed time between the initiation and the completion of a query
Response time = CPU time + I/O time + communication time
CPU time = unit instruction time no. of sequential instructions
I/O time = unit I/O time no. of sequential I/Os
communication time = unit msg initiation time no.
of sequential msg + unit transmission time no. of
sequential bytes
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 13
Example
Site 1
x units
Site 3
Site 2 y units
Assume that only the communication cost is considered
Total time = 2 message initialization time + unit transmission time
(x+y)
Response time = max {time to send x from 1 to 3, time to send y
from 2 to 3}
time to send x from 1 to 3 = message initialization time + unit
transmission time x
time to send y from 2 to 3 = message initialization time + unit
transmission time y
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 14