QUERY PROCESSING IN
DISTRIBUTED DATABASE SYSTEMS
Presented by:
Muskaan
MCA/25020/18
OUTLINE
• What is a Query?
• What is a Query Processor?
• Main Problems of Query Processing
• Characteristics of Query Processor
• Main Layers of Query Processing
What is a Query?
• A query is a statement requesting the retrieval of information.
• A database query can be either a select query or an action query.
• A select query is a data retrieval query, while an action query asks for additional operations on the data, such as insertion, updating or deletion.
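As a minimal sketch of the difference between the two kinds of query, the following uses Python's built-in sqlite3 module with an in-memory database. The `employee` table, its columns, and its rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the employee table and its rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# Action queries: they change the stored data (insertion, updating, deletion).
conn.execute("INSERT INTO employee (name, salary) VALUES ('Asha', 50000)")
conn.execute("INSERT INTO employee (name, salary) VALUES ('Ravi', 42000)")
conn.execute("UPDATE employee SET salary = 45000 WHERE name = 'Ravi'")
conn.execute("DELETE FROM employee WHERE name = 'Asha'")

# Select query: it only retrieves data and leaves the table unchanged.
rows = conn.execute("SELECT name, salary FROM employee").fetchall()
print(rows)
```

Only the row for Ravi remains after the action queries, and the select query returns it without modifying anything.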
What is a Query Processor?
• The query processor in a DBMS receives a query as input, parses it, generates an execution plan, and completes the processing by executing the plan and returning the results to the client.
• In a relational database, users perform data processing and data manipulation with the help of a high-level non-procedural language (e.g. SQL).
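The parse, plan, and execute sequence can be observed in a small single-site setting. This sketch uses SQLite through Python's sqlite3 module, which is an assumption made for illustration and not a distributed DBMS. `EXPLAIN QUERY PLAN` is SQLite-specific, and the wording of its output varies between SQLite versions. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("CREATE INDEX idx_dept ON employee (dept)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [("Asha", "HR"), ("Ravi", "IT"), ("Meena", "IT")],
)

query = "SELECT name FROM employee WHERE dept = ?"

# Ask the query processor for its execution plan without running the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("IT",)).fetchall()
for step in plan:
    print(step[-1])  # the last column is a textual description of the plan step

# Execute the query and return the results to the client.
result = sorted(conn.execute(query, ("IT",)).fetchall())
print(result)
```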
What is a Query Processor? (continued)
• The main function of a query processor is to transform a high-level query (also called a calculus query) into an equivalent lower-level query (also called an algebraic query).
• The high-level query hides the low-level details of the physical organization of the data from the user, so that even complex queries can be expressed in an easy, concise and simple fashion.
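To make the calculus-to-algebra transformation concrete, here is a small sketch in which relations are modelled as Python lists of dicts. The `select` and `project` helpers and the `EMP` relation are hypothetical names introduced for illustration, not part of any DBMS API.

```python
# Relations are modelled as lists of dicts; all names and rows are illustrative.
EMP = [
    {"eno": 1, "ename": "Asha", "dept": "HR"},
    {"eno": 2, "ename": "Ravi", "dept": "IT"},
    {"eno": 3, "ename": "Meena", "dept": "IT"},
]

def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Relational projection: keep only the named attributes, dropping duplicates."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attributes, row)))
    return out

# Calculus (SQL) form:  SELECT ename FROM EMP WHERE dept = 'IT'
# Algebraic form:       project_ename( select_{dept = 'IT'}( EMP ) )
result = project(select(EMP, lambda t: t["dept"] == "IT"), ["ename"])
print(result)
```

The SQL form states what is wanted, while the algebraic form fixes an order of operations (selection first, then projection) that an execution engine can follow.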
Main Problems of Query Processing
• The main problem of query processing is query optimization.
• Optimization is a time-consuming task, because many candidate execution strategies must be compared to find one that minimizes computer resource consumption.
• The time and space required to process the query are also important factors in the performance of query processing.
Important Characteristics of Query Processor
• Language
• Types of Optimization
• Optimization Timing
• Statistics
Language:
• The input language of query processing can be based on relational calculus or relational algebra.
Types of Optimization:
• Among all possible strategies for executing a query, the best is the one that requires the least time and space. An exhaustive search compares every candidate strategy, while heuristics restrict the search to limit the cost of optimization itself.
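A minimal sketch of exhaustive search over join orders follows. The relation names, cardinalities, selectivity factors, and the cost model (sum of intermediate result sizes for a left-deep plan) are all assumptions made for illustration; real optimizers use richer cost models.

```python
from itertools import permutations

# Assumed statistics: cardinalities and join selectivity factors (hypothetical values).
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SELECTIVITY = {
    frozenset(["EMP", "ASG"]): 1 / 400,
    frozenset(["ASG", "PROJ"]): 1 / 100,
    frozenset(["EMP", "PROJ"]): 1.0,  # no join predicate: Cartesian product
}

def plan_cost(order):
    """Cost of a left-deep join order = total size of the intermediate results."""
    joined = [order[0]]
    size = CARD[order[0]]
    cost = 0.0
    for rel in order[1:]:
        # Apply every join predicate linking the new relation to those already joined.
        factor = 1.0
        for prev in joined:
            factor *= SELECTIVITY[frozenset([prev, rel])]
        size = size * CARD[rel] * factor
        cost += size
        joined.append(rel)
    return cost

# Exhaustive search: enumerate every order and keep the cheapest one.
costs = {order: plan_cost(order) for order in permutations(CARD)}
best = min(costs, key=costs.get)
print(best, costs[best])
```

Orders that start by joining EMP with PROJ build a Cartesian product first and are far more expensive under this model, which is why the search rejects them. With n relations there are n! orders, so exhaustive search is only practical for small n.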
Optimization Timing:
• The time spent optimizing a query is itself an important factor: the less time optimization adds to query processing, the better. A query may be optimized statically, before execution, or dynamically, while it executes.
Statistics:
• The effectiveness of query optimization relies on statistical information about the database, for example how many fragments the query will need and how large they are, which helps decide which operation should be done first.
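As an illustration of how such statistics are typically used, the size of intermediate results can be estimated from cardinalities and selectivity factors. The symbols below are assumptions introduced here: card(R) is the number of tuples in R, length(R) the tuple length in bytes, SF_S(F) the fraction of tuples satisfying selection predicate F, and SF_J the join selectivity factor.

```latex
\begin{align*}
\mathrm{size}(R) &= \mathrm{card}(R) \times \mathrm{length}(R) \\
\mathrm{card}(\sigma_F(R)) &= SF_S(F) \times \mathrm{card}(R) \\
\mathrm{card}(R \bowtie S) &= SF_J \times \mathrm{card}(R) \times \mathrm{card}(S)
\end{align*}
```

For example, with card(R) = 1000 and SF_S(F) = 0.1, the selection is estimated to return 100 tuples, which suggests applying it before a join on R.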
Main Layers of Query Processing
Query processing involves 4 main layers:
• Query Decomposition
• Data Localization
• Global Query Optimization
• Distributed Execution
Calculus Query on Global Relations
    ↓ Query Decomposition (uses Global Schema) [Control Site]
Algebraic Query on Global Relations
    ↓ Data Localization (uses Fragment Schema) [Control Site]
Algebraic Query on Fragments
    ↓ Global Optimization (uses Allocation Schema) [Control Site]
Distributed Query Execution Plan
    ↓ Distributed Execution [Local Sites]
Fig. Generic Layering Scheme for Distributed Query Processing
Query Decomposition
• The first layer decomposes the calculus query into an algebraic query on global relations.
• Query decomposition can be viewed as four successive steps: 1) Normalization, 2) Analysis, 3) Elimination of redundancy, and 4) Rewriting.
Query Decomposition (continued)
• Normalization
  • First, the calculus query is rewritten in a normalized form that is suitable for manipulation.
  • The query qualification (the WHERE clause in SQL) is transformed into conjunctive or disjunctive normal form by applying the equivalence rules of the logical operators.
• Analysis
  • Second, the normalized query is analysed so that incorrect queries are detected and rejected as early as possible.
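The normalization step can be sketched as a conversion of a qualification into conjunctive normal form. In this sketch a qualification is a nested tuple such as `("or", ("and", "p1", "p2"), "p3")`, where the strings stand for simple predicates. The representation and the function names `push_not`, `distribute`, and `to_cnf` are hypothetical, chosen for illustration only.

```python
def push_not(e, negate=False):
    """Push negations down to the simple predicates using De Morgan's laws."""
    if isinstance(e, str):
        return ("not", e) if negate else e
    if e[0] == "not":
        return push_not(e[1], not negate)
    left, right = push_not(e[1], negate), push_not(e[2], negate)
    op = e[0]
    if negate:  # not (a and b) == (not a) or (not b), and vice versa
        op = "or" if op == "and" else "and"
    return (op, left, right)

def is_and(e):
    return not isinstance(e, str) and e[0] == "and"

def distribute(e):
    """Distribute 'or' over 'and' so that the result is a conjunction of disjunctions."""
    if isinstance(e, str) or e[0] == "not":
        return e
    op, a, b = e[0], distribute(e[1]), distribute(e[2])
    if op == "and":
        return ("and", a, b)
    if is_and(a):  # (a1 and a2) or b == (a1 or b) and (a2 or b)
        return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
    if is_and(b):  # a or (b1 and b2) == (a or b1) and (a or b2)
        return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return ("or", a, b)

def to_cnf(e):
    return distribute(push_not(e))

print(to_cnf(("or", ("and", "p1", "p2"), "p3")))
print(to_cnf(("not", ("and", "p1", "p2"))))
```

The first call turns (p1 AND p2) OR p3 into (p1 OR p3) AND (p2 OR p3); the second applies De Morgan's law to obtain (NOT p1) OR (NOT p2).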
Query Decomposition (continued)
• Elimination of Redundancy
  • Third, the correct query is simplified. One way to simplify a query is to eliminate redundant predicates.
• Rewriting
  • Fourth, the calculus query is restructured as an algebraic query. Several algebraic queries can be derived from the same calculus query, and some algebraic queries are “better” than others in terms of expected performance.
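Redundant predicates are commonly removed with standard logical equivalences. The rules below are ordinary Boolean identities, listed here as an illustration; p, p_1, and p_2 stand for arbitrary simple predicates.

```latex
\begin{align*}
p \land p &\Leftrightarrow p & p \lor p &\Leftrightarrow p \\
p \land \mathit{true} &\Leftrightarrow p & p \lor \mathit{false} &\Leftrightarrow p \\
p \land \mathit{false} &\Leftrightarrow \mathit{false} & p \lor \mathit{true} &\Leftrightarrow \mathit{true} \\
p \land \lnot p &\Leftrightarrow \mathit{false} & p \lor \lnot p &\Leftrightarrow \mathit{true} \\
p_1 \land (p_1 \lor p_2) &\Leftrightarrow p_1 & p_1 \lor (p_1 \land p_2) &\Leftrightarrow p_1
\end{align*}
```

For example, a qualification of the form p_1 AND (p_1 OR p_2) simplifies to p_1 alone, so the predicate p_2 never needs to be evaluated.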
Localization of Distributed Data
• The output of the first layer, an algebraic query on distributed relations, is the input to the second layer.
• The main role of this layer is to localize the query’s data using data distribution information.
• Relations are fragmented and stored in disjoint subsets, called fragments, each of which is stored at a different site. This layer determines which fragments are involved in the query and transforms it into a query on fragments.
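A minimal sketch of localization for a horizontally fragmented relation follows. The fragment names, site names, and `eno` ranges are hypothetical. The idea shown is that the global relation is treated as the union of its fragments, and fragments whose defining predicate contradicts the query's selection are dropped.

```python
# Horizontal fragments of a global relation EMP, each stored at a different site.
# Fragment names, sites, and eno ranges are hypothetical.
FRAGMENTS = {
    "EMP1": {"site": "Site A", "eno_range": (1, 100)},
    "EMP2": {"site": "Site B", "eno_range": (101, 200)},
    "EMP3": {"site": "Site C", "eno_range": (201, 300)},
}

def localize(query_range):
    """Replace EMP by the union of its fragments, then drop fragments whose
    defining predicate contradicts the query's selection on eno."""
    low, high = query_range
    relevant = []
    for name, info in FRAGMENTS.items():
        f_low, f_high = info["eno_range"]
        if f_low <= high and low <= f_high:  # the two ranges overlap
            relevant.append(name)
    return relevant

# Query on the global relation: select the tuples of EMP with 150 <= eno <= 250.
print(localize((150, 250)))
```

Here only EMP2 and EMP3 can contain matching tuples, so Site A does not need to take part in the query.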
Global Query Optimization
• The input to the third layer is an algebraic query on fragments.
• The goal of this layer is to find an execution strategy for the fragment query that is close to optimal.
• The previous layers have already optimized the query, for example by eliminating redundancies.
Global Query Optimization (continued)
• Query optimization consists of:
  i) finding the best ordering of operations in the query, and
  ii) finding the communication operations that minimize a cost function.
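As a sketch of a communication cost function, the following compares two ways of joining relations R and S that are stored at different sites. The coefficients `C_MSG` and `C_TR`, the function names, and the relation sizes are assumptions for illustration; the model counts only communication cost and ignores local CPU and I/O.

```python
# Assumed cost coefficients (hypothetical units): a fixed cost per message and
# a cost per byte transferred between sites.
C_MSG = 10.0
C_TR = 0.001

def transfer_cost(num_bytes, num_messages=1):
    """Communication cost of shipping data from one site to another."""
    return C_MSG * num_messages + C_TR * num_bytes

def choose_join_site(size_r, size_s):
    """For R join S with R and S at different sites, ship the cheaper relation."""
    cost_ship_r = transfer_cost(size_r)  # move R to the site of S
    cost_ship_s = transfer_cost(size_s)  # move S to the site of R
    if cost_ship_r <= cost_ship_s:
        return "ship R to site of S", cost_ship_r
    return "ship S to site of R", cost_ship_s

strategy, cost = choose_join_site(size_r=2_000_000, size_s=50_000)
print(strategy, cost)
```

Because S is much smaller than R in this example, moving S to the site of R gives the lower communication cost.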
Distributed Execution
• The last layer is performed by all the sites having fragments involved in the query.
• Each subquery executing at one site, called a local query, is optimized using the local schema of that site and then executed.
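The following sketch imitates distributed execution inside a single process: each "site" is a Python list holding one fragment, and threads stand in for the sites working in parallel. The site names, rows, and function names are hypothetical, and no real network communication takes place.

```python
from concurrent.futures import ThreadPoolExecutor

# Each site holds one fragment of EMP; sites and rows are illustrative.
SITES = {
    "Site A": [{"eno": 1, "ename": "Asha", "dept": "HR"}],
    "Site B": [{"eno": 101, "ename": "Ravi", "dept": "IT"}],
    "Site C": [{"eno": 201, "ename": "Meena", "dept": "IT"}],
}

def local_query(site, dept):
    """Local query run at one site against its own fragment."""
    return [t["ename"] for t in SITES[site] if t["dept"] == dept]

def distributed_execute(dept):
    """Control site: send the local query to every site, then union the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda site: local_query(site, dept), SITES))
    return sorted(name for part in partials for name in part)

print(distributed_execute("IT"))
```

Each site evaluates the selection on its own fragment, and the control site only combines the partial results.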
THANK YOU
