Query processing in Distributed Database System

Presented by:
Muskaan
MCA/25020/18

OUTLINE
• What is Query?
• What is Query Processor?
• Main Problems of Query Processing
• Characteristics of Query Processor
• Main layers of Query Processing

What is Query?
• A query is a statement requesting the retrieval of information.
• A database query can be either a select query or an action query.
• A select query is a data retrieval query, while an action query asks for additional operations on the data, such as insertion, updating or deletion.
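The difference between the two kinds of query can be sketched with Python's built-in sqlite3 module. The table `emp` and its rows are made up for illustration and are not part of the original slides.

```python
import sqlite3

# In-memory database with a hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (eno INTEGER PRIMARY KEY, ename TEXT, title TEXT)")

# Action queries: they insert, update or delete stored data.
conn.execute("INSERT INTO emp VALUES (1, 'Asha', 'Engineer')")
conn.execute("INSERT INTO emp VALUES (2, 'Ravi', 'Analyst')")
conn.execute("UPDATE emp SET title = 'Manager' WHERE eno = 2")
conn.execute("DELETE FROM emp WHERE eno = 1")

# Select query: it only retrieves data and leaves the table unchanged.
rows = conn.execute("SELECT eno, ename, title FROM emp").fetchall()
print(rows)
```

Only the row touched by the UPDATE remains, so the select query returns a single tuple.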

What is Query Processor?
• The query processor in a DBMS receives a query as input, parses it, generates an execution plan, and completes the processing by executing the plan and returning the results to the client.
• In a relational database, users perform data processing and data manipulation with the help of a high-level non-procedural language (e.g. SQL).
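As a rough, single-site illustration of the parse, plan and execute sequence, the sketch below uses SQLite through Python's sqlite3 module. SQLite is not a distributed DBMS; it is used here only because it exposes its plan through `EXPLAIN QUERY PLAN`. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (eno INTEGER PRIMARY KEY, ename TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Asha", "Engineer"), (2, "Ravi", "Analyst")],
)

query = "SELECT ename FROM emp WHERE eno = 2"

# Ask the query processor which execution plan it generated for the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Execute the plan and return the results to the client.
result = conn.execute(query).fetchall()

# A query that fails parsing is rejected before any plan is built.
try:
    conn.execute("SELEC ename FROM emp")
    rejected = False
except sqlite3.OperationalError:
    rejected = True

print(plan, result, rejected)
```

The exact text of the plan rows depends on the SQLite version, so the sketch prints them rather than assuming a particular wording.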

What is Query Processor?
• The main function of a query processor is to transform a high-level query (also called a calculus query) into an equivalent lower-level query (also called an algebraic query).
• The high-level query hides the low-level details of the physical organization of the data from the user, so that even complex queries can be expressed in an easy, concise and simple fashion.

Main Problems of Query Processing
• The main problem of query processing is query optimization.
• It is a time-consuming task, because many execution strategies must be compared to find one that minimizes computer resource consumption.
• The time and space required to process the query are also important factors for the performance of query processing.

Important Characteristics of Query Processor
• Language
• Types of Optimization
• Optimization Timing
• Statistics

Important Characteristics of Query Processor
Language:
• The input language of the query processor can be based on relational calculus or relational algebra.

Types of Optimization:
• Among all possible strategies for executing a query, the one that requires the least time and space is the best solution for the optimization of the query.

Optimization Timing:
• The time required to optimize the execution of a query is itself an important factor: the less time optimization takes, the better it is for query processing.

Statistics:
• The effectiveness of query optimization relies on statistical information about the database, e.g. how many fragments a query will need and which operation should be done first.
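One way statistics can decide which operation should be done first is sketched below. It uses the common textbook estimate that an equality selection on an attribute returns card(R) / distinct(attribute) tuples, assuming uniformly distributed values. The numbers in `STATS` and the function names are hypothetical.

```python
# Hypothetical statistics for one relation: its cardinality and the
# number of distinct values per attribute.
STATS = {"card": 10000, "distinct": {"title": 20, "city": 500}}

def estimated_rows(attr):
    """Estimated size of an equality selection on attr (uniformity assumed)."""
    return STATS["card"] / STATS["distinct"][attr]

def order_selections(attrs):
    """Apply the most selective selection (fewest estimated rows) first."""
    return sorted(attrs, key=estimated_rows)

print(order_selections(["title", "city"]))
```

With these made-up numbers the selection on `city` is estimated to keep far fewer tuples than the one on `title`, so it is scheduled first.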

Main layers of Query Processing
Query processing involves 4 main layers:
• Query Decomposition
• Data Localization
• Global Query Optimization
• Distributed Execution

Main layers of Query Processing
Fig. Generic Layering Scheme for Distributed Query Processing
[Figure: at the control site, a calculus query on global relations passes through Query Decomposition (using the global schema) to give an algebraic query on global relations; Data Localization (using the fragment schema) gives an algebraic query on fragments; Global Optimization (using the allocation schema) gives a distributed query execution plan. Distributed Execution is carried out at the local sites.]

Query Decomposition
• The first layer decomposes the calculus query into an algebraic query on global relations.
• Query decomposition can be viewed as four successive steps: 1) Normalization, 2) Analysis, 3) Elimination of redundancy, and 4) Rewriting.

• Normalization: First, the calculus query is rewritten in a normalized form that is suitable for manipulation. Typically the query's selection predicate is put into conjunctive or disjunctive normal form. (This is not the same as schema normalization, which isolates data so that changes to a field are made in just one table.)
• Analysis: Second, the normalized query is analysed so that incorrect queries are detected and rejected as early as possible.
• Elimination of Redundancy: Third, the correct query is simplified. One way to simplify a query is to eliminate redundant predicates.
• Rewriting: Fourth, the calculus query is restructured as an algebraic query. Several algebraic queries can be derived from the same calculus query, and some algebraic queries are "better" than others.
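The normalization step can be sketched as a small conversion of a predicate into conjunctive normal form. This is only an illustration under assumed conventions: predicates are nested tuples such as ("and", a, b), ("or", a, b) and ("not", a), atoms are plain strings, and the function names are invented for this example.

```python
def push_not(e):
    """Push negations down to the atoms (De Morgan and double negation)."""
    if isinstance(e, str):
        return e
    op = e[0]
    if op == "not":
        inner = e[1]
        if isinstance(inner, str):
            return e                      # negated atom: nothing to do
        if inner[0] == "not":
            return push_not(inner[1])     # not(not p) = p
        flipped = "or" if inner[0] == "and" else "and"
        return (flipped,
                push_not(("not", inner[1])),
                push_not(("not", inner[2])))
    return (op, push_not(e[1]), push_not(e[2]))

def distribute(e):
    """Distribute OR over AND, giving a conjunction of disjunctions."""
    if isinstance(e, str) or e[0] == "not":
        return e
    left, right = distribute(e[1]), distribute(e[2])
    if e[0] == "and":
        return ("and", left, right)
    # e is an OR: pull any AND found in an operand to the outside.
    if not isinstance(left, str) and left[0] == "and":
        return ("and",
                distribute(("or", left[1], right)),
                distribute(("or", left[2], right)))
    if not isinstance(right, str) and right[0] == "and":
        return ("and",
                distribute(("or", left, right[1])),
                distribute(("or", left, right[2])))
    return ("or", left, right)

def to_cnf(e):
    """Normalize a predicate into conjunctive normal form."""
    return distribute(push_not(e))

# (p1 AND p2) OR p3  becomes  (p1 OR p3) AND (p2 OR p3)
cnf = to_cnf(("or", ("and", "p1", "p2"), "p3"))
print(cnf)
```

In a real query the atoms would be simple predicates such as comparisons on attributes; strings like "p1" stand in for them here.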

Localization of Distributed Data
• The output of the first layer, an algebraic query on global relations, is the input to the second layer.
• The main role of this layer is to localize the query's data using data distribution information.
• Relations are fragmented and stored in disjoint subsets, called fragments, each of which may be stored at a different site. This layer therefore translates the query on global relations into an algebraic query on fragments.
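A minimal sketch of localization, assuming a hypothetical relation EMP that is horizontally fragmented by ranges of an attribute `eno`: the global relation is replaced by the list of its fragments, and fragments whose range cannot contain qualifying tuples are dropped. The fragment names, sites and ranges are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    name: str
    site: str
    low: int    # inclusive lower bound on eno
    high: int   # inclusive upper bound on eno

# Hypothetical fragment schema for the global relation EMP.
FRAGMENTS = {
    "EMP": [
        Fragment("EMP1", "site1", 1, 3),
        Fragment("EMP2", "site2", 4, 6),
        Fragment("EMP3", "site3", 7, 10**9),
    ]
}

def localize(relation, eno_low, eno_high):
    """Return the fragments that can hold tuples with eno in [eno_low, eno_high]."""
    return [
        f for f in FRAGMENTS[relation]
        if f.low <= eno_high and eno_low <= f.high   # ranges overlap
    ]

# A selection eno = 5 on EMP only needs the fragment stored at site2.
needed = localize("EMP", 5, 5)
print([(f.name, f.site) for f in needed])
```

A query with no restriction on `eno` would keep all three fragments, which corresponds to rebuilding EMP as the union of its fragments.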

Global Query Optimization
• The input to the third layer is an algebraic query on fragments.
• The goal of this layer is to find an execution strategy for the fragment query that is close to optimal.
• The previous layers have already optimized the query to some extent, for example by eliminating redundancies.

Global Query Optimization
• Query optimization consists of:
  i) finding the best ordering of operations in the query,
  ii) finding the communication operations which minimize a cost function.
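The search for a good ordering can be sketched as an exhaustive comparison of join orders under a toy cost function. Here the cost is the sum of intermediate result sizes, used as a stand-in for the data that would have to be shipped between sites. The relation names, cardinalities and selectivity factors are hypothetical, and a real optimizer would use a richer cost model.

```python
from itertools import permutations

# Hypothetical cardinalities of three relations.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 50}

# Hypothetical join selectivity factors; pairs not listed share no join
# attribute, so joining them directly is a Cartesian product (factor 1.0).
SEL = {
    frozenset(("EMP", "ASG")): 1 / 400,
    frozenset(("ASG", "PROJ")): 1 / 50,
}

def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    size = CARD[order[0]]
    cost = 0.0
    for rel in order[1:]:
        sel = 1.0
        for prev in joined:
            sel *= SEL.get(frozenset((prev, rel)), 1.0)
        size = size * CARD[rel] * sel   # estimated size of the new result
        cost += size                    # this result has to be produced and moved
        joined.append(rel)
    return cost

best = min(permutations(CARD), key=plan_cost)
print(best, plan_cost(best))
```

Several orders tie under this simple model; the point of the sketch is that any order forcing a Cartesian product between EMP and PROJ is estimated to be far more expensive.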

Distributed Execution
• The last layer is performed by all the sites having fragments involved in the query.
• Each subquery executing at one site, called a local query, is optimized using the local schema of that site and then executed.
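The last layer can be imitated on one machine by treating separate in-memory SQLite databases as sites. In this sketch each site runs the same local query on its own fragment and a control routine merges the partial results. The site names, table and rows are hypothetical, and no real network communication takes place.

```python
import sqlite3

def make_site(rows):
    """Create one 'site': an in-memory database holding a fragment of emp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (eno INTEGER, ename TEXT, title TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    return conn

# Two hypothetical sites, each storing one horizontal fragment.
sites = {
    "site1": make_site([(1, "Asha", "Engineer"), (2, "Ravi", "Analyst")]),
    "site2": make_site([(4, "Meera", "Engineer"), (5, "John", "Programmer")]),
}

LOCAL_QUERY = "SELECT eno, ename FROM emp WHERE title = ?"

def execute_distributed(title):
    """Run the local query at every site and merge the partial results."""
    partial = []
    for conn in sites.values():
        # Each site plans and executes the local query against its own schema.
        partial.extend(conn.execute(LOCAL_QUERY, (title,)).fetchall())
    return sorted(partial)

engineers = execute_distributed("Engineer")
print(engineers)
```

Each SQLite connection chooses its own plan for the local query, which loosely mirrors local optimization at each site.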
