Chapter 6
Introduction to Distributed
Database
Chapter Outline
1. Distributed Database definition
2. Distributed Database Advantages
3. Types of Distributed Database Systems
4. Distributed Query Processing
5. Client-Server Database Architecture
Advanced Database System- Ch6: Introduction to Distributed Database Slide 2
Distributed Database Definition
A distributed database (DDB) is a collection of
multiple logically related database distributed
over a computer network,
A distributed database management system as
a software system that manages a distributed
database while making the distribution
transparent to the user.
Database 1
Database 3
Communication Network
Server 1
Server 3
Database 2
Server 2
Advanced Database System- Ch6: Introduction to Distributed Database Slide 3
Distributed Database Advantages
Management of distributed data with
different levels of transparency:
This refers to the physical placement of
data (files, relations, etc.) which is not
known to the user (distribution
transparency).
Advanced Database System- Ch6: Introduction to Distributed Database Slide 4
Distributed Database Advantages…
Distribution and Network transparency:
Users do not have to worry about operational
details of the network.
There is Location transparency, which refers to
freedom of issuing command from any location
without affecting its working.
Then there is Naming transparency, which
allows access to any names object (files, relations,
etc.) from any location.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 5
Distributed Database Advantages…
Replication transparency:
It allows to store copies of a data at multiple
sites as shown in the above diagram.
This is done to minimize access time to the
required data.
Fragmentation transparency:
Allows to fragment a relation horizontally
(create a subset of tuples of a relation) or
vertically (create a subset of columns of a
relation).
Advanced Database System- Ch6: Introduction to Distributed Database Slide 6
Distributed Database Advantages…
The EMPLOYEE, PROJECT, and WORKS_ON
tables may be fragmented horizontally and
stored with possible replication as shown
below.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 7
Distributed Database Advantages…
Increased reliability and availability:
Reliability refers to system live time, that is,
system is running efficiently most of the time.
Availability is the probability that the system is
continuously available (usable or accessible)
during a time interval.
A distributed database system has multiple
nodes (computers) and if one fails then others
are available to do the job.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 8
Distributed Database Advantages…
Improved performance:
A distributed DBMS fragments the database to
keep data closer to where it is needed most.
This reduces data management (access and
modification) time significantly.
Easier expansion (scalability):
Allows new nodes (computers) to be added
anytime without chaining the entire
configuration.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 9
Types of Distributed Database Systems
Homogeneous
All sites of the
database system
have identical setup,
i.e., same database Window
Site 5 Unix
system software. Oracle Site 1
The underlying Oracle
operating system Window
may be different. Site 4 Communications
For example, all network
sites run Oracle or
DB2, or Sybase or Oracle
some other Site 3 Site 2
database system. Linux Oracle Linux Oracle
The underlying
operating systems
can be a mixture of
Linux, Window, Unix,
etc.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 10
Types of Distributed Database Systems
Heterogeneous
Each site may run different database system with one
or more varieties of operating system
Object Unix Relational
Oriented Site 5 Unix
Site 1
Hierarchical
Window
Site 4 Communications
network
Network
Object DBMS
Oriented Site 3 Site 2 Relational
Linux Linux
Advanced Database System- Ch6: Introduction to Distributed Database Slide 11
Query Processing in Distributed Databases
Issues
Cost of transferring data (files and results) over the
network.
This cost is usually high so some optimization is
necessary.
Example relations: Employee at site 1 and Department at
Site 2
Employee at site 1. 10,000 rows. Row size = 100 bytes.
Table size = 106 bytes.
Fname Minit Lname SSN Bdate Address Sex Salary Superssn Dno
Department at Site 2. 100 rows. Row size = 35 bytes. Table
size = 3,500 bytes. Dname Dnumber Mgrssn Mgrstartdate
Q: For each employee, retrieve employee name and
department name Where the employee works.
Q: Fname,Lname,Dname (Employee Dno = Dnumber Department)
Advanced Database System- Ch6: Introduction to Distributed Database Slide 12
Query Processing in Distributed Databases
Result
The result of this query will have 10,000
tuples, assuming that every employee is
related to a department.
Suppose each result tuple is 40 bytes long.
The query is submitted at site 3 and the result
is sent to this site.
Problem: Employee and Department relations
are not present at site 3.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 13
Query Processing in Distributed Databases
Strategies:
1. Transfer Employee and Department to site 3.
Total transfer bytes = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3.
Query result size = 40 * 10,000 = 400,000 bytes. Total
transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join
at site 1, and send the result to site 3.
Total bytes transferred = 400,000 + 3500 = 403,500
bytes.
Optimization criteria: minimizing data transfer.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 14
Query Processing in Distributed Databases
Strategies:
1. Transfer Employee and Department to site 3.
Total transfer bytes = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3.
Query result size = 40 * 10,000 = 400,000 bytes. Total
transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join
at site 1, and send the result to site 3.
Total bytes transferred = 400,000 + 3500 = 403,500
bytes.
Optimization criteria: minimizing data transfer.
Preferred approach: strategy 3.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 15
Query Processing in Distributed Databases
Consider the query
Q’: For each department, retrieve the
department name and the name of the
department manager
Relational Algebra expression:
Fname,Lname,Dname (Employee Mgrssn = SSN
Department)
Advanced Database System- Ch6: Introduction to Distributed Database Slide 16
Query Processing in Distributed Databases
The result of this query will have 100 tuples,
assuming that every department has a manager, the
execution strategies are:
1. Transfer Employee and Department to the result site
and perform the join at site 3.
Total bytes transferred = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3. Query result size = 40 * 100 =
4000 bytes.
Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes.
3. Transfer Department relation to site 1, execute join at
site 1 and send the result to site 3.
Total transfer size = 4000 + 3500 = 7500 bytes.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 17
Query Processing in Distributed Databases
The result of this query will have 100 tuples,
assuming that every department has a manager, the
execution strategies are:
1. Transfer Employee and Department to the result site
and perform the join at site 3.
Total bytes transferred = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3. Query result size = 40 * 100 =
4000 bytes.
Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes.
3. Transfer Department relation to site 1, execute join at
site 1 and send the result to site 3.
Total transfer size = 4000 + 3500 = 7500 bytes.
Preferred strategy: Choose strategy 3.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 18
Query Processing in Distributed Databases
Now suppose the result site is 2. Possible
strategies :
1. Transfer Employee relation to site 2, execute
the query and present the result to the user at
site 2.
Total transfer size = 1,000,000 bytes for both
queries Q and Q’.
2. Transfer Department relation to site 1, execute
join at site 1 and send the result back to site 2.
Total transfer size for Q = 400,000 + 3500 =
403,500 bytes and for Q’ = 4000 + 3500 = 7500
bytes.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 19
Client-Server Database Architecture
It consists of clients running client software, a
set of servers which provide all database
functionalities and a reliable communication
infrastructure.
Server 1 Client 1
Client 2
Server 2 Client 3
Server n Client n
Advanced Database System- Ch6: Introduction to Distributed Database Slide 20
Client-Server Database Architecture
Clients reach server for desired service, but
server does reach clients.
The server software is responsible for local
data management at a site, much like
centralized DBMS software.
The client software is responsible for most of
the distribution function.
The communication software manages
communication among clients and servers.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 21
Client-Server Database Architecture
The processing of a SQL queries goes as
follows:
Client parses a user query and decomposes it
into a number of independent sub-queries.
Each subquery is sent to appropriate site for
execution.
Each server processes its query and sends the
result to the client.
The client combines the results of subqueries
and produces the final result.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 22