Distributed Query Processing


 Distributed database
 Query processing
 Distributed query processing methodology
 Query decomposition
 Data localization
 Global query optimization
 Join ordering
 Semi join
 Local query optimization
Topics Covered


 Multiple, Logically interrelated Databases
distributed over a complete network.
Distributed Database


 Translating a high level query (relational calculus) in
a sequence of database operators (relational algebra
+ communication operators)
 One high level query can have many equivalent
transformations, the main difficulty is to select the
most efficient one
Query Processing


Example – Input
 All players called “Muller", who are playing for a
team
QUERY: SELECT p.Name
FROM Players p, Teams t
WHERE p.TID = t.TID
AND p.Name LIKE " Muller"
Contd..


QUERY: SELECT p.Name FROM Players p, Teams
t WHERE p.TID = t.TID
AND p.Name LIKE " Muller"
Solution 1 Solution 2


 The retrieval of data from different sites in a
network.
DISTRIBUTED QUERY
PROCESSING


 Decomposing a high level query (relational calculus)
into an algebraic query (relational algebra) on global
relations
1. Query Decomposition


1) Normalization: Rewriting the query in a normalized form,
that is useful for further manipulations
2) Analysis: Query is analysed semantically so that incorrect
queries are detected and rejected
3) Elimination of redundancy: Simplifying query to
eliminate redundant predicates
4) Rewriting: The relational calculus query is translated in a
equivalent algebraic query
Steps of query
decomposition


 Conjunctive normal form
(p11∨p12∨…∨p1n) ∧…∧ (pm1∨pm2∨…∨pmn)
 Disjunctive normal form
(p11∧p12 ∧…∧p1n) ∨…∨ (pm1 ∧pm2∧…∧ pmn)
 OR's mapped into union
 AND's mapped into join or selection
Normalization


 Conjunctive Normal Form
(p.TID = t.TID) ∧ (p.Name =“Muller” ∨ p.Name
=“Kim”)
 Disjunctive Normal Form
((p.TID = t.TID) ∧ (p.Name =“Muller”)) ∨
((p.TID = t.TID) ∧ (p.Name =“Kim”))
SELECT p.Name FROM Players p, Teams t
WHERE p.TID = t.TID AND (p.Name LIKE " Muller“
OR p.Name LIKE “Kim”)


 Rejecting normalized queries for which further
processing is impossible or unnecessary
 Two main reasons for rejection:
1)Type incorrectness: Using a relation or an
attribute which is unknown in the global schema or
using wrong datatype for an operation
2)Semantical incorrectness: If the components of
a query do not contribute in any way to generate a
valid result
Analysis


Simplification
(removing redundancy)
SELECT DISTINCT f.Position
FROM Players p, Formation f
WHERE (NOT(f.Position =
striker")
AND (f.Position = "striker" OR
f.Position="keeper")
AND NOT(f.Position =
"keeper"))
OR (f.PID = p.PID
AND p.Name = "Thomas
Muller")
SELECT DISTINCT
f.Position
FROM Players p,
Formation f
WHERE f.PID = p.PID
AND p.Name = "Thomas
Muller"

SELECT p.Name, t.Name
FROM Players p, Team t,
Formation f
WHERE p.TID = t.TID
AND p.PID = f.PID
AND p.Active = 1
AND (t.Name = "FC Bayern
Munchen“ OR t.Name = "Borussia
Dortmund")
AND f.Position = "striker"
Rewriting
 Two Steps:
1. Transforming the
query into an algebraic
relational query tree.
2. Restructuring the
algebraic tree to improve
performance
 Ex:All active strikers of
"FC Bayern Munchen"
and "Borussia Dortmund"


1) Separating of unary operators (selection, projection)
2) Grouping unary operators on the same relation
3) Commuting unary operators with binary operators
4) Changing order of binary operators
How to improve a "bad-
algebraic-tree"?


 Input: Algebraic query on distributed relations
 Purpose:
∗ Apply data distribution information to the
algebra operations and determine which fragments are
involved
∗ Substitute global query with queries on
fragments
∗ Optimize the global query
2. Data Localization


Query:
SELECT * FROM EMP, ASG WHERE
EMP.ENO=ASG.ENO
Horizontal fragmentation:
∗ EMP1 = σ(ENO ≤ ” E3”(EMP))
∗ EMP2 = σ (” E3” ≤ENO>” E6”(EMP))
∗ EMP3 = σENO> ” E6”(EMP)
∗ ASG1 = σENO ≤ ” E3”(ASG)
∗ ASG2 = σENO> ” E3”(ASG)
Horizontal
Fragmentation


Contd..
Generic Query Reduced Query


Query:
SELECT ENAME FROM EMP
Fragmentation:
∗ EMP1 = ΠENO,ENAME (EMP )
∗ EMP2 = ΠENO,TITLE (EMP )
Vertical Fragmentation


SELECT ENAME FROM EMP
Generic Query Reduced Query


 Input: Fragment query
 Find the best (not necessarily optimal) global
schedule
➠ Minimize a cost function
➠ Distributed join processing
Which relation to ship where?
Ship-whole vs. ship-as-needed
➠ Decide on the use of semi joins
3. Global Query
Optimization


Example relations: Employee at site 1 and
Department at Site 2
Employee at site 1. 10,000 rows. Row size = 100 bytes.
Table size = 106 bytes.
Department at Site 2. 100 rows. Row size = 35 bytes.
Table size = 3,500 bytes.
Q: For each employee, retrieve employee name
and department name Where the employee
works.
Q: Fname,Lname,Dname (Employee Dno = Dnumber
Department)
Join Ordering


 Strategies:
1. Transfer Employee and Department to site 3.
 Total transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3.
 Query result size = 40 * 10,000 = 400,000 bytes. Total
transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join
at site 1, and send the result to site 3.
 Total bytes transferred = 400,000 + 3500 = 403,500 bytes.
Contd..


 It is used to reduce the data transmission cost.
Computing steps:
1) Project Ri on attribute A (Ri[A] ) and
ship this projection ( a semijoin
projection) from the site of Ri to the site
of Rj ;
2) Reduce Rj to Rj’ by eliminating tuples
where attribute A are not matching any
value in Ri[A] .
Semijoin Rj⋉ Ri


Contd..
3
4
5
7
8
9
A C
R2
A B
1
2
4
5
3 6
R1
Site 1
Site 2
1
2
3
R1[A]
projection
Ship(3)
qs
Ship(2)
Ship(6)
3 7
R2’
reduc
e


1. Project the join attributes of Department at site 2, and
transfer them to site 1. For Q, 4 * 100 = 400 bytes are
transferred
2. Join the transferred file with the Employee relation at
site 1, and transfer the required attributes from the
resulting file to site 2. For Q, 34 * 10,000 = 340,000
bytes are transferred.
3. Execute the query by joining the transferred file with
Department and present the result to the user at site
2.
 Using this strategy, we transfer 340,400 bytes for Q.
Contd..


 Input: Best global execution schedule
• Use the centralized optimization
techniques
4. Local Optimization


 R. Elmasri, S.B. Navathe, “Fundamentals of Database
Systems”, Fifth Edition,Pearson Education/Addison
Wesley, 2007.
 Henry F Korth, Abraham Silberschatz, S.
Sudharshan, “Database System Concepts”, Fifth
Edition, McGraw Hill, 2006.
References

Distributed Query Processing

More Related Content

What's hot

Similar to Distributed Query Processing

Recently uploaded

Distributed Query Processing