KEMBAR78
Distributed Query Processing | PPTX
MYTHILI.K

 Distributed database
 Query processing
 Distributed query processing methodology
 Query decomposition
 Data localization
 Global query optimization
 Join ordering
 Semi join
 Local query optimization
Topics Covered

 Multiple, Logically interrelated Databases
distributed over a complete network.
Distributed Database

 Translating a high level query (relational calculus) in
a sequence of database operators (relational algebra
+ communication operators)
 One high level query can have many equivalent
transformations, the main difficulty is to select the
most efficient one
Query Processing

Example – Input
 All players called “Muller", who are playing for a
team
QUERY: SELECT p.Name
FROM Players p, Teams t
WHERE p.TID = t.TID
AND p.Name LIKE " Muller"
Contd..

QUERY: SELECT p.Name FROM Players p, Teams
t WHERE p.TID = t.TID
AND p.Name LIKE " Muller"
Solution 1 Solution 2

 The retrieval of data from different sites in a
network.
DISTRIBUTED QUERY
PROCESSING

Methodology

 Decomposing a high level query (relational calculus)
into an algebraic query (relational algebra) on global
relations
1. Query Decomposition

1) Normalization: Rewriting the query in a normalized form,
that is useful for further manipulations
2) Analysis: Query is analysed semantically so that incorrect
queries are detected and rejected
3) Elimination of redundancy: Simplifying query to
eliminate redundant predicates
4) Rewriting: The relational calculus query is translated in a
equivalent algebraic query
Steps of query
decomposition

 Conjunctive normal form
(p11∨p12∨…∨p1n) ∧…∧ (pm1∨pm2∨…∨pmn)
 Disjunctive normal form
(p11∧p12 ∧…∧p1n) ∨…∨ (pm1 ∧pm2∧…∧ pmn)
 OR's mapped into union
 AND's mapped into join or selection
Normalization

 Conjunctive Normal Form
(p.TID = t.TID) ∧ (p.Name =“Muller” ∨ p.Name
=“Kim”)
 Disjunctive Normal Form
((p.TID = t.TID) ∧ (p.Name =“Muller”)) ∨
((p.TID = t.TID) ∧ (p.Name =“Kim”))
SELECT p.Name FROM Players p, Teams t
WHERE p.TID = t.TID AND (p.Name LIKE " Muller“
OR p.Name LIKE “Kim”)

 Rejecting normalized queries for which further
processing is impossible or unnecessary
 Two main reasons for rejection:
1)Type incorrectness: Using a relation or an
attribute which is unknown in the global schema or
using wrong datatype for an operation
2)Semantical incorrectness: If the components of
a query do not contribute in any way to generate a
valid result
Analysis

Simplification
(removing redundancy)
SELECT DISTINCT f.Position
FROM Players p, Formation f
WHERE (NOT(f.Position =
striker")
AND (f.Position = "striker" OR
f.Position="keeper")
AND NOT(f.Position =
"keeper"))
OR (f.PID = p.PID
AND p.Name = "Thomas
Muller")
SELECT DISTINCT
f.Position
FROM Players p,
Formation f
WHERE f.PID = p.PID
AND p.Name = "Thomas
Muller"
SELECT p.Name, t.Name
FROM Players p, Team t,
Formation f
WHERE p.TID = t.TID
AND p.PID = f.PID
AND p.Active = 1
AND (t.Name = "FC Bayern
Munchen“ OR t.Name = "Borussia
Dortmund")
AND f.Position = "striker"
Rewriting
 Two Steps:
1. Transforming the
query into an algebraic
relational query tree.
2. Restructuring the
algebraic tree to improve
performance
 Ex:All active strikers of
"FC Bayern Munchen"
and "Borussia Dortmund"

1) Separating of unary operators (selection, projection)
2) Grouping unary operators on the same relation
3) Commuting unary operators with binary operators
4) Changing order of binary operators
How to improve a "bad-
algebraic-tree"?

Algebraic Tree

 Input: Algebraic query on distributed relations
 Purpose:
∗ Apply data distribution information to the
algebra operations and determine which fragments are
involved
∗ Substitute global query with queries on
fragments
∗ Optimize the global query
2. Data Localization

Query:
SELECT * FROM EMP, ASG WHERE
EMP.ENO=ASG.ENO
Horizontal fragmentation:
∗ EMP1 = σ(ENO ≤ ” E3”(EMP))
∗ EMP2 = σ (” E3” ≤ENO>” E6”(EMP))
∗ EMP3 = σENO> ” E6”(EMP)
∗ ASG1 = σENO ≤ ” E3”(ASG)
∗ ASG2 = σENO> ” E3”(ASG)
Horizontal
Fragmentation

Contd..
Generic Query Reduced Query

Query:
SELECT ENAME FROM EMP
Fragmentation:
∗ EMP1 = ΠENO,ENAME (EMP )
∗ EMP2 = ΠENO,TITLE (EMP )
Vertical Fragmentation

SELECT ENAME FROM EMP
Generic Query Reduced Query

 Input: Fragment query
 Find the best (not necessarily optimal) global
schedule
➠ Minimize a cost function
➠ Distributed join processing
Which relation to ship where?
Ship-whole vs. ship-as-needed
➠ Decide on the use of semi joins
3. Global Query
Optimization

Example relations: Employee at site 1 and
Department at Site 2
Employee at site 1. 10,000 rows. Row size = 100 bytes.
Table size = 106 bytes.
Department at Site 2. 100 rows. Row size = 35 bytes.
Table size = 3,500 bytes.
Q: For each employee, retrieve employee name
and department name Where the employee
works.
Q: Fname,Lname,Dname (Employee Dno = Dnumber
Department)
Join Ordering

 Strategies:
1. Transfer Employee and Department to site 3.
 Total transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3.
 Query result size = 40 * 10,000 = 400,000 bytes. Total
transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join
at site 1, and send the result to site 3.
 Total bytes transferred = 400,000 + 3500 = 403,500 bytes.
Contd..

 It is used to reduce the data transmission cost.
Computing steps:
1) Project Ri on attribute A (Ri[A] ) and
ship this projection ( a semijoin
projection) from the site of Ri to the site
of Rj ;
2) Reduce Rj to Rj’ by eliminating tuples
where attribute A are not matching any
value in Ri[A] .
Semijoin Rj⋉ Ri

Contd..
3
4
5
7
8
9
A C
R2
A B
1
2
4
5
3 6
R1
Site 1
Site 2
1
2
3
R1[A]
projection
Ship(3)
qs
Ship(2)
Ship(6)
3 7
R2’
reduc
e

1. Project the join attributes of Department at site 2, and
transfer them to site 1. For Q, 4 * 100 = 400 bytes are
transferred
2. Join the transferred file with the Employee relation at
site 1, and transfer the required attributes from the
resulting file to site 2. For Q, 34 * 10,000 = 340,000
bytes are transferred.
3. Execute the query by joining the transferred file with
Department and present the result to the user at site
2.
 Using this strategy, we transfer 340,400 bytes for Q.
Contd..

 Input: Best global execution schedule
• Use the centralized optimization
techniques
4. Local Optimization

 R. Elmasri, S.B. Navathe, “Fundamentals of Database
Systems”, Fifth Edition,Pearson Education/Addison
Wesley, 2007.
 Henry F Korth, Abraham Silberschatz, S.
Sudharshan, “Database System Concepts”, Fifth
Edition, McGraw Hill, 2006.
References
THANK YOU
.

Distributed Query Processing

  • 1.
  • 2.
      Distributed database Query processing  Distributed query processing methodology  Query decomposition  Data localization  Global query optimization  Join ordering  Semi join  Local query optimization Topics Covered
  • 3.
      Multiple, Logicallyinterrelated Databases distributed over a complete network. Distributed Database
  • 4.
      Translating ahigh level query (relational calculus) in a sequence of database operators (relational algebra + communication operators)  One high level query can have many equivalent transformations, the main difficulty is to select the most efficient one Query Processing
  • 5.
     Example – Input All players called “Muller", who are playing for a team QUERY: SELECT p.Name FROM Players p, Teams t WHERE p.TID = t.TID AND p.Name LIKE " Muller" Contd..
  • 6.
     QUERY: SELECT p.NameFROM Players p, Teams t WHERE p.TID = t.TID AND p.Name LIKE " Muller" Solution 1 Solution 2
  • 7.
      The retrievalof data from different sites in a network. DISTRIBUTED QUERY PROCESSING
  • 8.
  • 9.
      Decomposing ahigh level query (relational calculus) into an algebraic query (relational algebra) on global relations 1. Query Decomposition
  • 10.
     1) Normalization: Rewritingthe query in a normalized form, that is useful for further manipulations 2) Analysis: Query is analysed semantically so that incorrect queries are detected and rejected 3) Elimination of redundancy: Simplifying query to eliminate redundant predicates 4) Rewriting: The relational calculus query is translated in a equivalent algebraic query Steps of query decomposition
  • 11.
      Conjunctive normalform (p11∨p12∨…∨p1n) ∧…∧ (pm1∨pm2∨…∨pmn)  Disjunctive normal form (p11∧p12 ∧…∧p1n) ∨…∨ (pm1 ∧pm2∧…∧ pmn)  OR's mapped into union  AND's mapped into join or selection Normalization
  • 12.
      Conjunctive NormalForm (p.TID = t.TID) ∧ (p.Name =“Muller” ∨ p.Name =“Kim”)  Disjunctive Normal Form ((p.TID = t.TID) ∧ (p.Name =“Muller”)) ∨ ((p.TID = t.TID) ∧ (p.Name =“Kim”)) SELECT p.Name FROM Players p, Teams t WHERE p.TID = t.TID AND (p.Name LIKE " Muller“ OR p.Name LIKE “Kim”)
  • 13.
      Rejecting normalizedqueries for which further processing is impossible or unnecessary  Two main reasons for rejection: 1)Type incorrectness: Using a relation or an attribute which is unknown in the global schema or using wrong datatype for an operation 2)Semantical incorrectness: If the components of a query do not contribute in any way to generate a valid result Analysis
  • 14.
     Simplification (removing redundancy) SELECT DISTINCTf.Position FROM Players p, Formation f WHERE (NOT(f.Position = striker") AND (f.Position = "striker" OR f.Position="keeper") AND NOT(f.Position = "keeper")) OR (f.PID = p.PID AND p.Name = "Thomas Muller") SELECT DISTINCT f.Position FROM Players p, Formation f WHERE f.PID = p.PID AND p.Name = "Thomas Muller"
  • 15.
    SELECT p.Name, t.Name FROMPlayers p, Team t, Formation f WHERE p.TID = t.TID AND p.PID = f.PID AND p.Active = 1 AND (t.Name = "FC Bayern Munchen“ OR t.Name = "Borussia Dortmund") AND f.Position = "striker" Rewriting  Two Steps: 1. Transforming the query into an algebraic relational query tree. 2. Restructuring the algebraic tree to improve performance  Ex:All active strikers of "FC Bayern Munchen" and "Borussia Dortmund"
  • 16.
     1) Separating ofunary operators (selection, projection) 2) Grouping unary operators on the same relation 3) Commuting unary operators with binary operators 4) Changing order of binary operators How to improve a "bad- algebraic-tree"?
  • 17.
  • 18.
      Input: Algebraicquery on distributed relations  Purpose: ∗ Apply data distribution information to the algebra operations and determine which fragments are involved ∗ Substitute global query with queries on fragments ∗ Optimize the global query 2. Data Localization
  • 19.
     Query: SELECT * FROMEMP, ASG WHERE EMP.ENO=ASG.ENO Horizontal fragmentation: ∗ EMP1 = σ(ENO ≤ ” E3”(EMP)) ∗ EMP2 = σ (” E3” ≤ENO>” E6”(EMP)) ∗ EMP3 = σENO> ” E6”(EMP) ∗ ASG1 = σENO ≤ ” E3”(ASG) ∗ ASG2 = σENO> ” E3”(ASG) Horizontal Fragmentation
  • 20.
  • 21.
     Query: SELECT ENAME FROMEMP Fragmentation: ∗ EMP1 = ΠENO,ENAME (EMP ) ∗ EMP2 = ΠENO,TITLE (EMP ) Vertical Fragmentation
  • 22.
     SELECT ENAME FROMEMP Generic Query Reduced Query
  • 23.
      Input: Fragmentquery  Find the best (not necessarily optimal) global schedule ➠ Minimize a cost function ➠ Distributed join processing Which relation to ship where? Ship-whole vs. ship-as-needed ➠ Decide on the use of semi joins 3. Global Query Optimization
  • 24.
     Example relations: Employeeat site 1 and Department at Site 2 Employee at site 1. 10,000 rows. Row size = 100 bytes. Table size = 106 bytes. Department at Site 2. 100 rows. Row size = 35 bytes. Table size = 3,500 bytes. Q: For each employee, retrieve employee name and department name Where the employee works. Q: Fname,Lname,Dname (Employee Dno = Dnumber Department) Join Ordering
  • 25.
      Strategies: 1. TransferEmployee and Department to site 3.  Total transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute join at site 2 and send the result to site 3.  Query result size = 40 * 10,000 = 400,000 bytes. Total transfer size = 400,000 + 1,000,000 = 1,400,000 bytes. 3. Transfer Department relation to site 1, execute the join at site 1, and send the result to site 3.  Total bytes transferred = 400,000 + 3500 = 403,500 bytes. Contd..
  • 26.
      It isused to reduce the data transmission cost. Computing steps: 1) Project Ri on attribute A (Ri[A] ) and ship this projection ( a semijoin projection) from the site of Ri to the site of Rj ; 2) Reduce Rj to Rj’ by eliminating tuples where attribute A are not matching any value in Ri[A] . Semijoin Rj⋉ Ri
  • 27.
     Contd.. 3 4 5 7 8 9 A C R2 A B 1 2 4 5 36 R1 Site 1 Site 2 1 2 3 R1[A] projection Ship(3) qs Ship(2) Ship(6) 3 7 R2’ reduc e
  • 28.
     1. Project thejoin attributes of Department at site 2, and transfer them to site 1. For Q, 4 * 100 = 400 bytes are transferred 2. Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred. 3. Execute the query by joining the transferred file with Department and present the result to the user at site 2.  Using this strategy, we transfer 340,400 bytes for Q. Contd..
  • 29.
      Input: Bestglobal execution schedule • Use the centralized optimization techniques 4. Local Optimization
  • 30.
      R. Elmasri,S.B. Navathe, “Fundamentals of Database Systems”, Fifth Edition,Pearson Education/Addison Wesley, 2007.  Henry F Korth, Abraham Silberschatz, S. Sudharshan, “Database System Concepts”, Fifth Edition, McGraw Hill, 2006. References
  • 31.