KEMBAR78
Distributed Database Management System | PPT
Distributed Databases and
Client-Server Architectures
Outline
1 Distributed Database Concepts
2 Data Fragmentation, Replication and Allocation
3 Types of Distributed Database Systems
4 Query Processing
5 Concurrency Control and Recovery
6 3-Tier Client-Server Architecture
Distributed Database Concepts
It is a system to process Unit of execution (a transaction) in
a distributed manner. That is, a transaction can be executed
by multiple networked computers in a unified manner.
It can be defined as
A distributed database (DDB) is a collection of multiple
logically related database distributed over a computer
network, and a distributed database management system
as a software system that manages a distributed
database while making the distribution transparent to
the user.
Shared nothing architecture
Centralized database
Distributed database
Distributed Database System
Advantages
1. Management of distributed data with different levels
of transparency: This refers to the physical placement
of data (files, relations, etc.) which is not known to the
user (distribution transparency).
Communications neteork
Site 5
Site 1
Site 2
Site 4
Site 3
Distributed Database System
Advantages
The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented
horizontally and stored with possible replication as shown below.
Communications neteork
Atlanta
San Francisco
EMPLOYEES - All
PROJECTS - All
WORKS_ON - All
Chicago
(headquarters)
New York
EMPLOYEES - New York
PROJECTS - All
WORKS_ON - New York Employees
EMPLOYEES - San Francisco and LA
PROJECTS - San Francisco
WORKS_ON - San Francisco Employees
Los Angeles
EMPLOYEES - LA
PROJECTS - LA and San Francisco
WORKS_ON - LA Employees
EMPLOYEES - Atlanta
PROJECTS - Atlanta
WORKS_ON - Atlanta Employees
Distributed Database System
Advantages
• Distribution and Network transparency: Users do not have to
worry about operational details of the network. There is
Location transparency, which refers to freedom of issuing
command from any location without affecting its working. Then
there is Naming transparency, which allows access to any names
object (files, relations, etc.) from any location.
• Replication transparency: It allows to store copies of a data at
multiple sites as shown in the above diagram. This is done to
minimize access time to the required data.
• Fragmentation transparency: Allows to fragment a relation
horizontally (create a subset of tuples of a relation) or vertically
(create a subset of columns of a relation).
Distributed Database System
Advantages
2. Increased reliability and availability: Reliability refers to system
live time, that is, system is running efficiently most of the time.
Availability is the probability that the system is continuously
available (usable or accessible) during a time interval. A distributed
database system has multiple nodes (computers) and if one fails then
others are available to do the job.
3. Improved performance: A distributed DBMS fragments the
database to keep data closer to where it is needed most. This reduces
data management (access and modification) time significantly.
4. Easier expansion (scalability): Allows new nodes (computers) to
be added anytime without chaining the entire configuration.
Types of Distributed Database Systems
Homogeneous
All sites of the database system have identical setup, i.e., same database
system software. The underlying operating system may be different. For
example, all sites run Oracle or DB2, or Sybase or some other database
system. The underlying operating systems can be a mixture of Linux,
Window, Unix, etc. The clients thus have to use identical client software.
Communications
neteork
Site 5
Site 1
Site 2Site 3
Oracle Oracle
Oracle
Oracle
Site 4
Oracle
LinuxLinux
Window
Window
Unix
Types of Distributed Database Systems
Heterogeneous
Federated: Each site may run different database system but the data access
is managed through a single conceptual schema. This implies that the
degree of local autonomy is minimum. Each site must adhere to a
centralized access policy. There may be a global schema.
Multidatabase: There is no one conceptual global schema. For data access
a schema is constructed dynamically as needed by the application software.
Communications
network
Site 5
Site 1
Site 2Site 3
Network
DBMS
Relational
Site 4
Object
Oriented
LinuxLinux
Unix
Hierarchical
Object
Oriented
RelationalUnix
Window
Types of Distributed Database Systems
• Differences in data models: Relational, Objected
oriented, hierarchical, network, etc.
• Differences in constraints: Each site may have their
own data accessing and processing constraints.
• Differences in query language: Some site may use
SQL, some may use SQL-89, some may use SQL-92,
and so on.
Federated Database Management Systems Issues
Data Fragmentation, Replication and
Allocation
Data Fragmentation
Which site should be used to store which portion of database.
Split a relation into logically related and correct parts.
Simplest logical unit is relation.
A relation can be fragmented in two ways:
Horizontal fragmentation
• Divides a relation “horizontally”
• It is a horizontal subset of a relation which contain those of
tuples which satisfy selection conditions.
• Consider the Employee relation with selection condition
(DNO = 5). All tuples satisfy this condition will create a
subset which will be a horizontal fragment of Employee
relation.
• A selection condition may be composed of several conditions
connected by AND or OR.
• Derived horizontal fragmentation: It is the partitioning of a
primary relation to other secondary relations which are related
with Foreign keys.
Vertical fragmentation
• Divides a relation vertically by columns.
• It is a subset of a relation which is created by a subset of
columns. Thus a vertical fragment of a relation will contain
values of selected columns. There is no selection condition
used in vertical fragmentation.
• Consider the Employee relation. Two vertical fragments
1)Name, Bdate, Sex, and Address.
2) ENO,salary,supereno,dno
• Because there is no condition for creating a vertical fragment,
each fragment must include the primary key attribute of the
parent relation Employee. In this way all vertical fragments
of a relation are connected.
Data Fragmentation, Replication and
Allocation
Representation
Horizontal fragmentation
Each horizontal fragment on a relation can be specified
by a σCi (R) operation in the relational algebra.
Complete horizontal fragmentation
A set of horizontal fragments whose conditions C1, C2,
…, Cn include all the tuples in R- that is, every tuple in
R satisfies (C1 OR C2 OR … OR Cn).
Disjoint complete horizontal fragmentation: No tuple in
R satisfies (Ci AND Cj) where i ≠ j.
To reconstruct R from horizontal fragments a UNION is
applied.
Data Fragmentation, Replication and
Allocation
Representation
Vertical fragmentation
A vertical fragment on a relation can be specified by a
ΠLi(R) operation in the relational algebra.
Complete vertical fragmentation
A set of vertical fragments whose projection lists L1, L2,
…, Ln include all the attributes in R but share only the
primary key of R. In this case the projection lists satisfy
the following two conditions:
L1 ∪ L2 ∪ ... ∪ Ln = ATTRS (R)
Li ∩ Lj = PK(R) for any i j, where ATTRS (R) is the set
of attributes of R and PK(R) is the primary key of R.
To reconstruct R from complete vertical fragments a
OUTER JOIN is applied.
Data Fragmentation, Replication and
Allocation
Mixed (Hybrid) fragmentation
A combination of Vertical fragmentation and Horizontal
fragmentation.
This is achieved by SELECT-PROJECT operations which is
represented by ΠLi(σCi (R)).
If C = True (Select all tuples) and L ≠ ATTRS(R), we get a
vertical fragment, and if C ≠ True and L ≠ ATTRS(R), we get
a mixed fragment.
If C = True and L = ATTRS(R), then R can be considered a
fragment.
 L1={ssn,name,bdate,address,sex}
 L2={ssn,salary,super_ssn,dno}
Salary > 5000 and dno=4
L1={name,address}
 L2={ssn,name,salary}
Data Fragmentation, Replication and
Allocation
Data Fragmentation, Replication and
Allocation
Fragmentation schema
A definition of a set of fragments (horizontal or vertical
or horizontal and vertical) that includes all attributes
and tuples in the database that satisfies the condition
that the whole database can be reconstructed from the
fragments by applying some sequence of OUTER
UNION (or OUTER JOIN) and UNION operations.
Allocation schema
It describes the distribution of fragments to sites of
distributed databases. It can be fully or partially
replicated or can be partitioned.
Data Fragmentation, Replication and
Allocation
Data Replication
Database is replicated to all sites. In full replication the
entire database is replicated and in partial replication some
selected part is replicated to some of the sites. Data
replication is achieved through a replication schema.
Full replication improves availability and hence
performance for retrieval queries, but slows down update
operation.
No replication- nonredundant allocation
Data Distribution (Data Allocation)
Each fragment-or copy of fragment- must be assigned to a
particular site.
This is relevant only in the case of partial replication or
partition. The selected portion of the database is
distributed to the database sites.
The choice of sites and degree of replication depend on
performance and availability goals
Data Fragmentation, Replication and
Allocation
Architectural models of
distributed databases
 The systems sharing multiple DBMS are characterized with respect to
Autonomy, distribution, heterogeneity
 Autonomy:- this refers to the distribution of control and not of data.
The dimensions of autonomy can be
– Design Autonomy
– Communication Autonomy:-
– Execution autonomy
 Distribution:- this refers to the distribution of data
– Client server
– Peer to peer
 Heterogeneity:- this ranges from hardware heterogeneity, networking
protocols, data models, query languages and transaction management
protocols
Distributed database
architecture
Client server
Peer-to-peer
Types of client-server
architecture
There are 2 types of client server architecture
Multi-client and single server
Multi-client and multi-server:
– Client manages its own connections to appropriate
server. This architecture simplifies the server code, but
loads client machines with additional responsibilities.
This is heavy client
– Client knows only of the home servers which then
communicates with other servers as required. Loads the
server more and is called as light client
Peer-to-peer database
architecture
Global conceptual schema:- Gives the enterprise
view of data. Union of all local conceptual
schemas
Fragmentation and Allocation schema:-takes care
of fragmentation and allocation of data
Local conceptual schema:-logical organization of
data at each site
External schemas:- user access and user
applications
Peer-to-peer database
architecture
Client-server architecture
Components of DDBMS
Components of DDBMS
Contd…
User interface handler: interpreting user commands as they come
in, and formatting the result data as it is sent to the user.
Semantic data controller: uses the integrity constraints and
authorizations that are defined as part of the global conceptual
schema to check if the user query can be processed.
Global query optimizer and decomposer determine an execution
strategy to minimize a cost function and translate the global
queries into local ones using global and local conceptual
schemas as well as the global directory.
The global execution monitor coordinates the distributed
execution of the user request. It communicates with other
execution monitors
The data processor consists of the
Local query processor: this acts as the access path
selector which is responsible for choosing the best
access path to access any data item
Local recovery manager :this is responsible for
making sure that the local database remains
consistent even when failures occur
Runtime support processor: interface to the os and
contains the database buffer manager, which is
responsible for maintaining the main memory
buffers and managing the data accesses.
Components of DDBMS
Contd…
Query Processing and
Optimization in
Distributed Databases
• Distributed Query Processing
• Data Transfer Costs of Distributed Query Processing
• Distributed Query Processing Using Semijoin
• Query and Update Decomposition
Distributed Query Processing
1. Query Mapping
2. Localization
3. Global Query Optimization
4. Local Query Optimization
Query Processing in
Distributed Databases
Query Processing in Distributed Databases
Result
Stretagies:
1. Transfer Employee and Department to site 3. Total
transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and send
the result to site 3. Query result size = 40 * 10,000 =
400,000 bytes. Total transfer size = 400,000 + 1,000,000 =
1,400,000 bytes.
The result of this query will have 10,000 tuples, assuming that
every employee is related to a department.
Suppose each result tuple is 40 bytes long. The query is
submitted at site 3 and the result is sent to this site.
Problem: Employee and Department relations are not present
at site 3.
Query Processing in Distributed Databases
Stretagies:
3. Transfer Department relation to site 1, execute the join at
site 1, and send the result to site 3. Total bytes transferred
= 400,000 + 3500 = 403,500 bytes.
Optimization criteria: minimizing data transfer.
Preferred approach: strategy 3.
Consider the query
Q’: For each department, retrieve the department name and
the name of the department manager
Relational Algebra expression:
ΠFname,Lname,Dname (Employee Mgrssn = SSN Department)
Query Processing in Distributed Databases
The result of this query will have 100 tuples, assuming that
every department has a manager, the execution strategies are:
Strategies:
1. Transfer Employee and Department to the result site and
perorm the join at site 3. Total bytes transferred =
1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute join at site 2 and send
the result to site 3. Query result size = 40 * 100 = 4000
bytes. Total transfer size = 4000 + 1,000,000 = 1,004,000
bytes.
3. Transfer Department relation to site 1, execute join at site 1
and send the result to site 3. Total transfer size = 4000 +
3500 = 7500 bytes.
Query Processing in Distributed Databases
Preferred strategy: Chose strategy 3.
Now suppose the result site is 2. Possible strategies:
Possible strategies :
1. Transfer Employee relation to site 2, execute the query and
present the result to the user at site 2. Total transfer size =
1,000,000 bytes for both queries Q and Q’.
2. Transfer Department relation to site 1, execute join at site 1
and send the result back to site 2. Total transfer size for Q
= 400,000 + 3500 = 403,500 bytes and for Q’ = 4000 +
3500 = 7500 bytes.
Query Processing in Distributed Databases
Semijoin: Objective is to reduce the number of tuples in a
relation before transferring it to another site.
Example execution of Q or Q’:
1. Project the join attributes of Department at site 2, and
transfer them to site 1.
For Q, 4 * 100 = 400 bytes are transferred and for
Q’, 9 * 100 = 900 bytes are transferred.
2. Join the transferred file with the Employee relation at site
1, and transfer the required attributes from the resulting file
to site 2.
For Q, 34 * 10,000 = 340,000 bytes are transferred and for
Q’, 39 * 100 = 3900 bytes are transferred.
3. Execute the query by joining the transferred file with
Department and present the result to the user at site 2.
Availability.
Increased parallelism
Increased overhead on update
Replication
Replication
Synchronous
All copies of a modified
relation must be updated
before the modifying
transaction commits.
Before an update transaction
commits, it synchronizes all
copies of modified data.
Asynchronous
Copies of a modified relation
are updated only periodically
and a transaction that reads
different copies of the same
relation may see different
values.
compromises distributed data
independence
can be more efficiently
implemented
Synchronous Replication
There are two basic techniques for ensuring that transactions see
the same value regardless of which copy of an object they access.
1) Voting - a transaction must write a majority of copies in order to
modify an object and read at least enough copies to make sure that
one of the copies is current.
Eg. If there are 10 copies and 7 copies are written by update
transactions, then at least 4 copies must be read.
 Each copy has a version number, and the copy with the highest
version number is current.
This technique is not attractive in most situations because reading
an object requires reading multiple copies; in most applications,
objects are read much more frequently than they are updated, and
efficient performance on reads is very important.
Synchronous Replication
2) Read-any write-all
To read an object, a transaction can read any one copy, but
to write an object, it must write all copies.
Reads are fast, especially if we have a local copy, but
writes are slower, relative to the first technique.
This technique is attractive when reads are much more
frequent than writes, and it is usually adopted for
implementing synchronous replication.
Synchronous Replication
Synchronous replication comes at a significant cost.
1. Before an update transaction can commit, it must obtain
exclusive locks on all copies—assuming that the read-any
write-all technique is used—of modified data.
2. The transaction may have to send lock requests to remote sites,
and wait for the locks to be granted, and during this potentially
long period, it continues to hold all its other locks.
3. If sites or communication links fail, the transaction cannot
commit until all sites at which it has modified data recover and
are reachable.
4. Finally, even if locks are obtained readily and there are no
failures, committing a transaction requires several additional
messages to be sent as part of a commit protocol
Primary Site versus Peer-to-Peer
Replication
In primary site asynchronous replication, one copy of a
relation is designated as the primary or master copy.
Replicas of the entire relation or of fragments of the relation can
be created at other sites; these are secondary copies, and, unlike
the primary copy, they cannot be updated.
A common mechanism for setting up primary and secondary
copies is that users first register or publish the relation at the
primary site and subsequently subscribe to a fragment of a
registered relation from another (secondary) site.
 In peer-to-peer asynchronous replication, more than one copy
can be designated as being updatable, that is, a master copy.
 In addition to propagating changes, a conflict resolution strategy
must be used to deal with conflicting changes made at different
sites.
 For example, Joe’s age may be changed to 35 at one site and to
38 at another. Which value is ‘correct’?
 Many more subtle kinds of conflicts can arise and it leads to ad
hoc conflict resolution.
 In some situations which it does not lead to conflicts peer-to-
peer replication is best utilized.
Primary Site versus Peer-to-Peer
Replication
Naming Transparency
Data items—such as relations, fragments, and replicas—
must have unique names.
This property is easy to ensure in a centralized database.
In a distributed database, however, we must take care to
ensure that two sites do not use the same name for distinct
data items.
Naming Transparency
1) Name server. The name server helps to ensure that the
same name does not get used for different data items.
 We can also use the name server to locate a data item,
given the name of the item.
Disadvantages.
1. The name server may become a performance bottleneck
resulting in poor performance.
2. If the name server crashes, it may not be possible for any
site in the distributed system to continue to run..
2) Site Prefix
Each site prefix its own site identifier to any name that it
generates.
This approach ensures that no two sites generate the same name.
No central control is required.
Fails to achieve location transparency, since site identifiers are
attached to names.
The account relation might be referred to as site17.account, or
account@site17, rather than as simply account.
Many database systems use the internet address of a site to
identify it.
Naming Transparency
Solution: The database system can create a set of alternative
names or aliases for data items.
A user may thus refer to data items by simple names that are
translated by the system to complete names. The mapping of
aliases to the real names can be stored at each site.
With aliases, the user can be unaware of the physical location
of a data item. Furthermore, the user will be unaffected if the
database administrator decides to move a data item from one
site to another.
Users should not have to refer to a specific replica of a data
item. Instead, the system should determine which replica to
reference on a read request, and should update all replicas on
a write request.
This can be ensured by maintaining a catalog table, which the
system uses to determine all replicas for the data item.
Concurrency Control and Recovery
Distributed Databases encounter a number of concurrency
control and recovery problems which are not present in
centralized databases. Some of them are listed below.
 Dealing with multiple copies of data items: The
concurrency control must maintain global
consistency. Likewise the recovery mechanism
must recover all copies and maintain consistency
after recovery.
 Failure of individual sites: Database availability
must not be affected due to the failure of one or
two sites and the recovery scheme must recover
them before they are available for use.
Concurrency Control and Recovery
 Communication link failure: This failure may
create network partition which would affect
database availability even though all database sites
may be running.
 Distributed commit: A transaction may be
fragmented and they may be executed by a number
of sites. This require a two or three-phase commit
approach for transaction commit.
 Distributed deadlock: Since transactions are
processed at multiple sites, two or more sites may
get involved in deadlock. This must be resolved in
a distributed manner.
Concurrency Control and Recovery
Distributed Concurrency control based on a distributed
copy of a data item
Primary site technique:
A single site is designated as a primary site which serves as a
coordinator for transaction management.
Communications neteork
Site 5
Site 1
Site 2
Site 4
Site 3
Primary site
Concurrency Control and Recovery
1. Transaction management: Concurrency control and
commit are managed by this site. In two phase locking,
this site manages locking and releasing data items. If all
transactions follow two-phase policy at all sites, then
serializability is guaranteed.
2. Advantages: An extension to the centralized two phase
locking; so implementation and management is simple.
Data items are locked only at one site but they can be
accessed at any site. Deadlock can be detected easily by
applying a directed graph directly.
3. Disadvantages: All transaction management activities go
to primary site which is likely to overload the site. If the
primary site fails, the entire system is inaccessible.
4. To aid recovery a backup site is designated which behaves
as a shadow of primary site.
Primary site with Backup Site:
In case of primary site failure, backup site can act as
primary site. All locking information is maintained at
both primary site and backup site
1. Advantages: Simplifies process of recovery
2. Disadvantages: Slows down process of acquiring locks.
Original problem not solved!
Concurrency Control and Recovery
Concurrency Control and Recovery
Primary Copy Technique:
In this approach, instead of a site, a data item partition is
designated as primary copy. To lock a data item just the
primary copy of the data item is locked.
1. Advantages: Since primary copies are distributed at
various sites, a single site is not overloaded with locking
and unlocking requests.
2. Disadvantages: Identification of a primary copy is
complex. A distributed directory must be maintained,
possibly at all sites.
Concurrency Control and Recovery
Recovery from a coordinator failure
In both approaches a coordinator site or copy may become
unavailable. This will require the selection of a new
coordinator.
Primary site approach with no backup site: Aborts and
restarts all active transactions at all sites. Elects a new
coordinator and initiates transaction processing.
Primary site approach with backup site: Suspends all active
transactions, designates the backup site as the primary site and
identifies a new back up site. Primary site receives all
transaction management information to resume processing.
Primary and backup sites fail or no backup site: Use
election process to select a new coordinator site.
Concurrency Control and Recovery
Concurrency control based on voting (Majority Protocol
There is no primary copy of coordinator.
 Send lock request to sites that have data item.
 If majority of sites grant lock then the requesting transaction
gets the data item.
 Locking information (grant or denied) is sent to all these sites.
 To avoid unacceptably long wait, a time-out period is defined.
 If the requesting transaction does not get any vote information
then the transaction is aborted.
Biased Protocol
The difference from the majority protocol is that requests
for shared locks are given more favorable treatment than
requests for exclusive locks.
Shared locks. When a transaction needs to lock data item
Q, it simply requests a lock on Q from the lock manager at
one site that contains a replica of Q.
Exclusive locks. When a transaction needs to lock data
item Q, it requests a lock on Q from the lock manager at
all sites that contain a replica of Q.
Advantage- less overhead on read operations
The additional overhead on writes is a disadvantage
complexity in handling deadlock.
Concurrency Control and Recovery
Quorum Consensus Protocol
It is a generalization of the majority protocol.
It assigns each site a nonnegative weight.
It assigns read and write operations on an item x two integers,
called read quorum Qr and write quorum Qw, S is the total
weight of all sites at which x resides:
Qr + Qw > S and 2 Qw > S∗
To execute a read operation, enough replicas must be read that
their total weight is ≥ Qr.
To execute a write operation, enough replicas must be written so
that their total weight is ≥ Qw
It can permit the cost of either reads or writes to be selectively
reduced by appropriately defining the read and write quorums.
Concurrency Control and Recovery
Timestamping
Each transaction is given a unique timestamp that the system
uses in deciding the serialization order.
Two primary methods for generating unique timestamps,
1. Centralized scheme- a single site distributes the timestamps.
The site can use a logical counter or its own local clock for
this purpose.
2. Distributed scheme- each site generates a unique local
timestamp by using either a logical counter or the local clock.
unique global timestamp = unique local timestamp + site
identifier, which also must be unique
Concurrency Control and Recovery
System Structure
Each site has its own local transaction manager, whose function
is to ensure the ACID properties of those transactions that
execute at that site.
The various transaction managers cooperate to execute global
transactions.
Each site contains two subsystems:
1) The transaction manager manages the execution of those
transactions (or subtransactions) that access data stored in a
local site.
Each such transaction may be either a local transaction or part
of a global transaction
2) The transaction coordinator coordinates the execution of the
various transactions (both local and global) initiated at that site.
Each transaction manager is responsible for
Maintaining a log for recovery purposes
Participating in an appropriate concurrency-control scheme to
coordinate the concurrent execution of the transactions
executing at that site
A transaction coordinator, is responsible for coordinating the
execution of all the transactions initiated at that site.
For each such transaction, the coordinator is responsible for
• Starting the execution of the transaction
• Breaking the transaction into a number of subtransactions and
distributing these subtransactions to the appropriate sites for
execution
• Coordinating the termination of the transaction, which may
result in the transaction being committed at all sites or aborted
at all sites.
Commit Protocols
If we are to ensure atomicity, all the sites in which a
transaction T executed must agree on the final outcome of
the execution. T must either commit at all sites, or it must
abort at all sites.
To ensure this property, the transaction coordinator of T
must execute a commit protocol.
Two-Phase Commit
Consider a transaction T initiated at site Si, where the
transaction coordinator is Ci.
When T completes its execution—that is, when all the sites
at which T has executed inform Ci that T has completed—
Ci starts the 2PC protocol.
Phase 1. Ci adds the record <prepare T> to the log, and
forces the log onto stable storage.
It then sends a prepare T message to all sites at which T
executed.
On receiving such a message, the transaction manager at
that site determines whether it is willing to commit its
portion of T.
If the answer is no, it adds a record <no T> to the log, and
then responds by sending an abort T message to Ci.
If the answer is yes, it adds a record <ready T> to the log,
and forces the log onto stable storage.
The transaction manager then replies with a ready T
message to Ci.
2 Phase Commit (2PC)
2 Phase Commit (2PC)
Phase 2. When Ci receives responses to the prepare T message from
all the sites, or when a prespecified interval of time has elapsed
since the prepare T message was sent out, Ci can determine
whether the transaction T can be committed or aborted.
Transaction T can be committed if Ci received a ready T message
from all the participating sites.
Otherwise, transaction T must be aborted.
Depending on the verdict, either a record <commit T> or a record
<abort T> is added to the log and the log is forced onto stable
storage.
At this point, the fate of the transaction has been sealed. Following
this point, the coordinator sends either a commit T or an abort T
message to all participating sites.
When a site receives that message, it records the message in the log.
A site at which T executed can unconditionally abort T at any
time before it sends the message ready T to the coordinator.
Once the message is sent, the transaction is said to be in the
ready state at the site.
The ready T message is, in effect, a promise by a site to follow
the coordinator’s order to commit T or to abort T.
To make such a promise, the needed information must first be
stored in stable storage.
Otherwise, if the site crashes after sending ready T, it may be
unable to make good on its promise.
Further, locks acquired by the transaction must continue to be
held till the transaction completes.
2 Phase Commit (2PC)
The fate of T is sealed as soon as at least one site responds
abort T.
Since the coordinator site Si is one of the sites at which T
executed, the coordinator can decide unilaterally to abort T.
The final verdict regarding T is determined at the time that
the coordinator writes that verdict (commit or abort) to the
log and forces that verdict to stable storage.
In some implementations of the 2PC protocol, a site sends an
acknowledge T message to the coordinator at the end of the
second phase of the protocol.
When the coordinator receives the acknowledge T message
from all the sites, it adds the record <complete T> to the log.
2 Phase Commit (2PC)
Handling of Failures
Failure of a participating site.
If the site fails before responding with a ready T message
to Ci, the coordinator assumes that it responded with an
abort T message.
If the site fails after the coordinator has received the ready
T message from the site, the coordinator executes the rest
of the commit protocol in the normal fashion, ignoring the
failure of the site.
Failure of a participating site.
When a participating site Sk recovers from a failure, it must
examine its log to determine the fate of those transactions that
were in the midst of execution when the failure occurred. Let T
be one such transaction.
The log contains a <commit T> record. In this case, the site
executes redo(T).
The log contains an <abort T> record. In this case, the site
executes undo(T).
The log contains no control records (abort, commit, ready)
concerning T.
Sk failed before responding to the prepare T message from Ci.
Ci must abort T. Hence, Sk must execute undo(T).
Handling of Failures
The log contains a <ready T> record.
The site must consult Ci to determine the fate of T.
If Ci is up, it notifies Sk regarding whether T committed or
aborted. So it executes either redo(T) or undo(T).
If Ci is down, Sk must try to find the fate of T from other sites.
It sends a querystatus T message to all the sites in the system.
On receiving such a message, a site must consult its log to
determine whether T has executed there, and if T has, whether
T committed or aborted.
It then notifies Sk about this outcome.
If no site has the appropriate information then Sk can neither
abort nor commit T. The decision concerning T is postponed
until Sk can obtain the needed information.
Thus, Sk must periodically resend the querystatus message to
the other sites.
Failure of the coordinator
 If the coordinator fails in the midst of the execution of the
commit protocol for transaction T, then the participating
sites must decide the fate of T.
In certain cases, the participating sites cannot decide
whether to commit or abort T, and therefore these sites must
wait for the recovery of the failed coordinator.
Handling of Failures
Failure of the coordinator
If an active site contains a <commit T> record in its log,
then T must be committed.
If an active site contains an <abort T> record in its log,
then T must be aborted.
If some active site does not contain a <ready T> record in
its log, then the failed coordinator Ci cannot have decided
to commit T. However, the coordinator may have decided
to abort T, but not to commit T. Rather than wait for Ci to
recover, it is preferable to abort T.
Handling of Failures
If none of the preceding cases holds, then all active sites must
have a <ready T> record in their logs, but no additional
control records (such as <abort T> or <commit T>).
Since the coordinator has failed, it is impossible to determine
whether a decision has been made, and if one has, what that
decision is, until the coordinator recovers.
Thus, the active sites must wait for Ci to recover.
Since the fate of T remains in doubt, T may continue to hold
system resources- blocking problem
Handling of Failures
Network partition.
1. The coordinator and all its participants remain in one
partition. In this case, the failure has no effect on the
commit protocol.
2. The coordinator and its participants belong to several
partitions.
From the viewpoint of the sites in one of the partitions, it
appears that the sites in other partitions have failed.
Sites that are not in the coordinator partition- failure of the
coordinator.
The coordinator and the sites in the partition - follow the
usual commit protocol, assuming that the sites in the other
partitions have failed.
Handling of Failures
Recovery and
Concurrency Control
The recovery procedure must treat in-doubt transactions
specially.
The recovering site must determine the commit–abort status
of such transactions by contacting other sites.
Normal transaction processing at the site cannot begin until
all in-doubt transactions have been committed or rolled back.
Finding the status of in-doubt transactions can be slow.
Further, recovery potentially could become blocked if 2PC is
used.
As a result, the site performing restart recovery may remain
unusable for a long period.
To circumvent this problem, recovery algorithms typically
provide support for noting lock information in the log.
Instead of writing a <ready T> log record, the algorithm
writes a <ready T, L> log record
At recovery time, after performing local recovery actions,
for every in-doubt transaction T, all the write locks noted in
the <ready T,L> log record are reacquired.
After this transaction processing can start at the site, even
before the commit–abort status of the in-doubt transactions
is determined.
The commit or rollback of in-doubt transactions proceeds
concurrently with the execution of new transactions.
Thus, site recovery is faster, and never gets blocked.
Recovery and
Concurrency Control
Three-Phase Commit
The three-phase commit (3PC) protocol is an extension of
the two-phase commit protocol that avoids the blocking
problem under certain assumptions.
It is assumed that no network partition occurs, and not more
than k sites fail, where k is some predetermined number.
The protocol avoids blocking by introducing an extra third
phase where multiple sites are involved in the decision to
commit.
 Instead of directly noting the commit decision in its
persistent storage, the coordinator first ensures that at least k
other sites know that it intended to commit the transaction.
If the coordinator fails, the remaining sites first select a new
coordinator.
This new coordinator checks the status of the protocol
from the remaining sites; if the coordinator had decided to
commit, at least one of the other k sites that it informed
will be up and will ensure that the commit decision is
respected.
The new coordinator restarts the third phase of the
protocol if some site knew that the old coordinator
intended to commit the transaction.
Otherwise the new coordinator aborts the transaction.
Three-Phase Commit
While the 3PC protocol has the desirable property of not
blocking unless k sites fail, it has the drawback that a
partitioning of the network will appear to be the same as
more than k sites failing, which would lead to blocking.
The protocol also has to be carefully implemented to
ensure that network partitioning does not result in
inconsistencies, where a transaction is committed in one
partition, and aborted in another.
Because of its overhead, 3PC protocol is not widely used.
Three-Phase Commit
Client-Server Database Architecture
It consists of clients running client software, a set of servers
which provide all database functionalities and a reliable
communication infrastructure.
Client 1
Client 3
Client 2
Client n
Server 1
Server 2
Server n
Client-Server Database Architecture
Clients reach server for desired service, but server does reach clients.
 The server software is responsible for local data management at a
site, much like centralized DBMS software.
 The client software is responsible for most of the distribution
function.
 The communication software manages communication among
clients and servers.
Client-Server Database Architecture
The processing of a SQL queries goes as follows:
 Client parses a user query and decomposes it into a number
of independent sub-queries. Each subquery is sent to
appropriate site for execution.
 Each server processes its query and sends the result to the
client.
 The client combines the results of subqueries and produces
the final result.

Distributed Database Management System

  • 1.
  • 2.
    Outline 1 Distributed DatabaseConcepts 2 Data Fragmentation, Replication and Allocation 3 Types of Distributed Database Systems 4 Query Processing 5 Concurrency Control and Recovery 6 3-Tier Client-Server Architecture
  • 3.
    Distributed Database Concepts Itis a system to process Unit of execution (a transaction) in a distributed manner. That is, a transaction can be executed by multiple networked computers in a unified manner. It can be defined as A distributed database (DDB) is a collection of multiple logically related database distributed over a computer network, and a distributed database management system as a software system that manages a distributed database while making the distribution transparent to the user.
  • 4.
  • 5.
  • 6.
  • 7.
    Distributed Database System Advantages 1.Management of distributed data with different levels of transparency: This refers to the physical placement of data (files, relations, etc.) which is not known to the user (distribution transparency). Communications neteork Site 5 Site 1 Site 2 Site 4 Site 3
  • 8.
    Distributed Database System Advantages TheEMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored with possible replication as shown below. Communications neteork Atlanta San Francisco EMPLOYEES - All PROJECTS - All WORKS_ON - All Chicago (headquarters) New York EMPLOYEES - New York PROJECTS - All WORKS_ON - New York Employees EMPLOYEES - San Francisco and LA PROJECTS - San Francisco WORKS_ON - San Francisco Employees Los Angeles EMPLOYEES - LA PROJECTS - LA and San Francisco WORKS_ON - LA Employees EMPLOYEES - Atlanta PROJECTS - Atlanta WORKS_ON - Atlanta Employees
  • 9.
    Distributed Database System Advantages •Distribution and Network transparency: Users do not have to worry about operational details of the network. There is Location transparency, which refers to freedom of issuing command from any location without affecting its working. Then there is Naming transparency, which allows access to any names object (files, relations, etc.) from any location. • Replication transparency: It allows to store copies of a data at multiple sites as shown in the above diagram. This is done to minimize access time to the required data. • Fragmentation transparency: Allows to fragment a relation horizontally (create a subset of tuples of a relation) or vertically (create a subset of columns of a relation).
  • 10.
    Distributed Database System Advantages 2.Increased reliability and availability: Reliability refers to system live time, that is, system is running efficiently most of the time. Availability is the probability that the system is continuously available (usable or accessible) during a time interval. A distributed database system has multiple nodes (computers) and if one fails then others are available to do the job. 3. Improved performance: A distributed DBMS fragments the database to keep data closer to where it is needed most. This reduces data management (access and modification) time significantly. 4. Easier expansion (scalability): Allows new nodes (computers) to be added anytime without chaining the entire configuration.
  • 11.
    Types of DistributedDatabase Systems Homogeneous All sites of the database system have identical setup, i.e., same database system software. The underlying operating system may be different. For example, all sites run Oracle or DB2, or Sybase or some other database system. The underlying operating systems can be a mixture of Linux, Window, Unix, etc. The clients thus have to use identical client software. Communications neteork Site 5 Site 1 Site 2Site 3 Oracle Oracle Oracle Oracle Site 4 Oracle LinuxLinux Window Window Unix
  • 12.
    Types of DistributedDatabase Systems Heterogeneous Federated: Each site may run different database system but the data access is managed through a single conceptual schema. This implies that the degree of local autonomy is minimum. Each site must adhere to a centralized access policy. There may be a global schema. Multidatabase: There is no one conceptual global schema. For data access a schema is constructed dynamically as needed by the application software. Communications network Site 5 Site 1 Site 2Site 3 Network DBMS Relational Site 4 Object Oriented LinuxLinux Unix Hierarchical Object Oriented RelationalUnix Window
  • 13.
    Types of DistributedDatabase Systems • Differences in data models: Relational, Objected oriented, hierarchical, network, etc. • Differences in constraints: Each site may have their own data accessing and processing constraints. • Differences in query language: Some site may use SQL, some may use SQL-89, some may use SQL-92, and so on. Federated Database Management Systems Issues
  • 14.
    Data Fragmentation, Replicationand Allocation Data Fragmentation Which site should be used to store which portion of database. Split a relation into logically related and correct parts. Simplest logical unit is relation. A relation can be fragmented in two ways:
  • 15.
    Horizontal fragmentation • Dividesa relation “horizontally” • It is a horizontal subset of a relation which contain those of tuples which satisfy selection conditions. • Consider the Employee relation with selection condition (DNO = 5). All tuples satisfy this condition will create a subset which will be a horizontal fragment of Employee relation. • A selection condition may be composed of several conditions connected by AND or OR. • Derived horizontal fragmentation: It is the partitioning of a primary relation to other secondary relations which are related with Foreign keys.
  • 16.
    Vertical fragmentation • Dividesa relation vertically by columns. • It is a subset of a relation which is created by a subset of columns. Thus a vertical fragment of a relation will contain values of selected columns. There is no selection condition used in vertical fragmentation. • Consider the Employee relation. Two vertical fragments 1)Name, Bdate, Sex, and Address. 2) ENO,salary,supereno,dno • Because there is no condition for creating a vertical fragment, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
  • 17.
    Data Fragmentation, Replicationand Allocation Representation Horizontal fragmentation Each horizontal fragment on a relation can be specified by a σCi (R) operation in the relational algebra. Complete horizontal fragmentation A set of horizontal fragments whose conditions C1, C2, …, Cn include all the tuples in R- that is, every tuple in R satisfies (C1 OR C2 OR … OR Cn). Disjoint complete horizontal fragmentation: No tuple in R satisfies (Ci AND Cj) where i ≠ j. To reconstruct R from horizontal fragments a UNION is applied.
  • 18.
    Data Fragmentation, Replicationand Allocation Representation Vertical fragmentation A vertical fragment on a relation can be specified by a ΠLi(R) operation in the relational algebra. Complete vertical fragmentation A set of vertical fragments whose projection lists L1, L2, …, Ln include all the attributes in R but share only the primary key of R. In this case the projection lists satisfy the following two conditions: L1 ∪ L2 ∪ ... ∪ Ln = ATTRS (R) Li ∩ Lj = PK(R) for any i j, where ATTRS (R) is the set of attributes of R and PK(R) is the primary key of R. To reconstruct R from complete vertical fragments a OUTER JOIN is applied.
  • 19.
    Data Fragmentation, Replicationand Allocation Mixed (Hybrid) fragmentation A combination of Vertical fragmentation and Horizontal fragmentation. This is achieved by SELECT-PROJECT operations which is represented by ΠLi(σCi (R)). If C = True (Select all tuples) and L ≠ ATTRS(R), we get a vertical fragment, and if C ≠ True and L ≠ ATTRS(R), we get a mixed fragment. If C = True and L = ATTRS(R), then R can be considered a fragment.
  • 20.
     L1={ssn,name,bdate,address,sex}  L2={ssn,salary,super_ssn,dno} Salary> 5000 and dno=4 L1={name,address}  L2={ssn,name,salary} Data Fragmentation, Replication and Allocation
  • 21.
    Data Fragmentation, Replicationand Allocation Fragmentation schema A definition of a set of fragments (horizontal or vertical or horizontal and vertical) that includes all attributes and tuples in the database that satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations. Allocation schema It describes the distribution of fragments to sites of distributed databases. It can be fully or partially replicated or can be partitioned.
  • 22.
    Data Fragmentation, Replicationand Allocation Data Replication Database is replicated to all sites. In full replication the entire database is replicated and in partial replication some selected part is replicated to some of the sites. Data replication is achieved through a replication schema. Full replication improves availability and hence performance for retrieval queries, but slows down update operation. No replication- nonredundant allocation
  • 23.
    Data Distribution (DataAllocation) Each fragment-or copy of fragment- must be assigned to a particular site. This is relevant only in the case of partial replication or partition. The selected portion of the database is distributed to the database sites. The choice of sites and degree of replication depend on performance and availability goals Data Fragmentation, Replication and Allocation
  • 24.
    Architectural models of distributeddatabases  The systems sharing multiple DBMS are characterized with respect to Autonomy, distribution, heterogeneity  Autonomy:- this refers to the distribution of control and not of data. The dimensions of autonomy can be – Design Autonomy – Communication Autonomy:- – Execution autonomy  Distribution:- this refers to the distribution of data – Client server – Peer to peer  Heterogeneity:- this ranges from hardware heterogeneity, networking protocols, data models, query languages and transaction management protocols
  • 26.
  • 27.
    Types of client-server architecture Thereare 2 types of client server architecture Multi-client and single server Multi-client and multi-server: – Client manages its own connections to appropriate server. This architecture simplifies the server code, but loads client machines with additional responsibilities. This is heavy client – Client knows only of the home servers which then communicates with other servers as required. Loads the server more and is called as light client
  • 28.
  • 29.
    Global conceptual schema:-Gives the enterprise view of data. Union of all local conceptual schemas Fragmentation and Allocation schema:-takes care of fragmentation and allocation of data Local conceptual schema:-logical organization of data at each site External schemas:- user access and user applications Peer-to-peer database architecture
  • 30.
  • 31.
  • 32.
    Components of DDBMS Contd… Userinterface handler: interpreting user commands as they come in, and formatting the result data as it is sent to the user. Semantic data controller: uses the integrity constraints and authorizations that are defined as part of the global conceptual schema to check if the user query can be processed. Global query optimizer and decomposer determine an execution strategy to minimize a cost function and translate the global queries into local ones using global and local conceptual schemas as well as the global directory. The global execution monitor coordinates the distributed execution of the user request. It communicates with other execution monitors
  • 33.
    The data processorconsists of the Local query processor: this acts as the access path selector which is responsible for choosing the best access path to access any data item Local recovery manager :this is responsible for making sure that the local database remains consistent even when failures occur Runtime support processor: interface to the os and contains the database buffer manager, which is responsible for maintaining the main memory buffers and managing the data accesses. Components of DDBMS Contd…
  • 34.
    Query Processing and Optimizationin Distributed Databases • Distributed Query Processing • Data Transfer Costs of Distributed Query Processing • Distributed Query Processing Using Semijoin • Query and Update Decomposition
  • 35.
    Distributed Query Processing 1.Query Mapping 2. Localization 3. Global Query Optimization 4. Local Query Optimization
  • 36.
  • 37.
    Query Processing inDistributed Databases Result Stretagies: 1. Transfer Employee and Department to site 3. Total transfer bytes = 1,000,000 + 3500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute join at site 2 and send the result to site 3. Query result size = 40 * 10,000 = 400,000 bytes. Total transfer size = 400,000 + 1,000,000 = 1,400,000 bytes. The result of this query will have 10,000 tuples, assuming that every employee is related to a department. Suppose each result tuple is 40 bytes long. The query is submitted at site 3 and the result is sent to this site. Problem: Employee and Department relations are not present at site 3.
  • 38.
    Query Processing inDistributed Databases Stretagies: 3. Transfer Department relation to site 1, execute the join at site 1, and send the result to site 3. Total bytes transferred = 400,000 + 3500 = 403,500 bytes. Optimization criteria: minimizing data transfer. Preferred approach: strategy 3. Consider the query Q’: For each department, retrieve the department name and the name of the department manager Relational Algebra expression: ΠFname,Lname,Dname (Employee Mgrssn = SSN Department)
  • 39.
    Query Processing inDistributed Databases The result of this query will have 100 tuples, assuming that every department has a manager, the execution strategies are: Strategies: 1. Transfer Employee and Department to the result site and perorm the join at site 3. Total bytes transferred = 1,000,000 + 3500 = 1,003,500 bytes. 2. Transfer Employee to site 2, execute join at site 2 and send the result to site 3. Query result size = 40 * 100 = 4000 bytes. Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes. 3. Transfer Department relation to site 1, execute join at site 1 and send the result to site 3. Total transfer size = 4000 + 3500 = 7500 bytes.
  • 40.
    Query Processing inDistributed Databases Preferred strategy: Chose strategy 3. Now suppose the result site is 2. Possible strategies: Possible strategies : 1. Transfer Employee relation to site 2, execute the query and present the result to the user at site 2. Total transfer size = 1,000,000 bytes for both queries Q and Q’. 2. Transfer Department relation to site 1, execute join at site 1 and send the result back to site 2. Total transfer size for Q = 400,000 + 3500 = 403,500 bytes and for Q’ = 4000 + 3500 = 7500 bytes.
  • 41.
    Query Processing inDistributed Databases Semijoin: Objective is to reduce the number of tuples in a relation before transferring it to another site. Example execution of Q or Q’: 1. Project the join attributes of Department at site 2, and transfer them to site 1. For Q, 4 * 100 = 400 bytes are transferred and for Q’, 9 * 100 = 900 bytes are transferred. 2. Join the transferred file with the Employee relation at site 1, and transfer the required attributes from the resulting file to site 2. For Q, 34 * 10,000 = 340,000 bytes are transferred and for Q’, 39 * 100 = 3900 bytes are transferred. 3. Execute the query by joining the transferred file with Department and present the result to the user at site 2.
  • 42.
  • 43.
    Replication Synchronous All copies ofa modified relation must be updated before the modifying transaction commits. Before an update transaction commits, it synchronizes all copies of modified data. Asynchronous Copies of a modified relation are updated only periodically and a transaction that reads different copies of the same relation may see different values. compromises distributed data independence can be more efficiently implemented
  • 44.
    Synchronous Replication There aretwo basic techniques for ensuring that transactions see the same value regardless of which copy of an object they access. 1) Voting - a transaction must write a majority of copies in order to modify an object and read at least enough copies to make sure that one of the copies is current. Eg. If there are 10 copies and 7 copies are written by update transactions, then at least 4 copies must be read.  Each copy has a version number, and the copy with the highest version number is current. This technique is not attractive in most situations because reading an object requires reading multiple copies; in most applications, objects are read much more frequently than they are updated, and efficient performance on reads is very important.
  • 45.
    Synchronous Replication 2) Read-anywrite-all To read an object, a transaction can read any one copy, but to write an object, it must write all copies. Reads are fast, especially if we have a local copy, but writes are slower, relative to the first technique. This technique is attractive when reads are much more frequent than writes, and it is usually adopted for implementing synchronous replication.
  • 46.
    Synchronous Replication Synchronous replicationcomes at a significant cost. 1. Before an update transaction can commit, it must obtain exclusive locks on all copies—assuming that the read-any write-all technique is used—of modified data. 2. The transaction may have to send lock requests to remote sites, and wait for the locks to be granted, and during this potentially long period, it continues to hold all its other locks. 3. If sites or communication links fail, the transaction cannot commit until all sites at which it has modified data recover and are reachable. 4. Finally, even if locks are obtained readily and there are no failures, committing a transaction requires several additional messages to be sent as part of a commit protocol
  • 47.
    Primary Site versusPeer-to-Peer Replication In primary site asynchronous replication, one copy of a relation is designated as the primary or master copy. Replicas of the entire relation or of fragments of the relation can be created at other sites; these are secondary copies, and, unlike the primary copy, they cannot be updated. A common mechanism for setting up primary and secondary copies is that users first register or publish the relation at the primary site and subsequently subscribe to a fragment of a registered relation from another (secondary) site.
  • 48.
     In peer-to-peerasynchronous replication, more than one copy can be designated as being updatable, that is, a master copy.  In addition to propagating changes, a conflict resolution strategy must be used to deal with conflicting changes made at different sites.  For example, Joe’s age may be changed to 35 at one site and to 38 at another. Which value is ‘correct’?  Many more subtle kinds of conflicts can arise and it leads to ad hoc conflict resolution.  In some situations which it does not lead to conflicts peer-to- peer replication is best utilized. Primary Site versus Peer-to-Peer Replication
  • 49.
    Naming Transparency Data items—suchas relations, fragments, and replicas— must have unique names. This property is easy to ensure in a centralized database. In a distributed database, however, we must take care to ensure that two sites do not use the same name for distinct data items.
  • 50.
    Naming Transparency 1) Nameserver. The name server helps to ensure that the same name does not get used for different data items.  We can also use the name server to locate a data item, given the name of the item. Disadvantages. 1. The name server may become a performance bottleneck resulting in poor performance. 2. If the name server crashes, it may not be possible for any site in the distributed system to continue to run..
  • 51.
    2) Site Prefix Eachsite prefix its own site identifier to any name that it generates. This approach ensures that no two sites generate the same name. No central control is required. Fails to achieve location transparency, since site identifiers are attached to names. The account relation might be referred to as site17.account, or account@site17, rather than as simply account. Many database systems use the internet address of a site to identify it. Naming Transparency
  • 52.
    Solution: The databasesystem can create a set of alternative names or aliases for data items. A user may thus refer to data items by simple names that are translated by the system to complete names. The mapping of aliases to the real names can be stored at each site. With aliases, the user can be unaware of the physical location of a data item. Furthermore, the user will be unaffected if the database administrator decides to move a data item from one site to another. Users should not have to refer to a specific replica of a data item. Instead, the system should determine which replica to reference on a read request, and should update all replicas on a write request. This can be ensured by maintaining a catalog table, which the system uses to determine all replicas for the data item.
  • 53.
    Concurrency Control andRecovery Distributed Databases encounter a number of concurrency control and recovery problems which are not present in centralized databases. Some of them are listed below.  Dealing with multiple copies of data items: The concurrency control must maintain global consistency. Likewise the recovery mechanism must recover all copies and maintain consistency after recovery.  Failure of individual sites: Database availability must not be affected due to the failure of one or two sites and the recovery scheme must recover them before they are available for use.
  • 54.
    Concurrency Control andRecovery  Communication link failure: This failure may create network partition which would affect database availability even though all database sites may be running.  Distributed commit: A transaction may be fragmented and they may be executed by a number of sites. This require a two or three-phase commit approach for transaction commit.  Distributed deadlock: Since transactions are processed at multiple sites, two or more sites may get involved in deadlock. This must be resolved in a distributed manner.
  • 55.
    Concurrency Control andRecovery Distributed Concurrency control based on a distributed copy of a data item Primary site technique: A single site is designated as a primary site which serves as a coordinator for transaction management. Communications neteork Site 5 Site 1 Site 2 Site 4 Site 3 Primary site
  • 56.
    Concurrency Control andRecovery 1. Transaction management: Concurrency control and commit are managed by this site. In two phase locking, this site manages locking and releasing data items. If all transactions follow two-phase policy at all sites, then serializability is guaranteed. 2. Advantages: An extension to the centralized two phase locking; so implementation and management is simple. Data items are locked only at one site but they can be accessed at any site. Deadlock can be detected easily by applying a directed graph directly. 3. Disadvantages: All transaction management activities go to primary site which is likely to overload the site. If the primary site fails, the entire system is inaccessible. 4. To aid recovery a backup site is designated which behaves as a shadow of primary site.
  • 57.
    Primary site withBackup Site: In case of primary site failure, backup site can act as primary site. All locking information is maintained at both primary site and backup site 1. Advantages: Simplifies process of recovery 2. Disadvantages: Slows down process of acquiring locks. Original problem not solved! Concurrency Control and Recovery
  • 58.
    Concurrency Control andRecovery Primary Copy Technique: In this approach, instead of a site, a data item partition is designated as primary copy. To lock a data item just the primary copy of the data item is locked. 1. Advantages: Since primary copies are distributed at various sites, a single site is not overloaded with locking and unlocking requests. 2. Disadvantages: Identification of a primary copy is complex. A distributed directory must be maintained, possibly at all sites.
  • 59.
    Concurrency Control andRecovery Recovery from a coordinator failure In both approaches a coordinator site or copy may become unavailable. This will require the selection of a new coordinator. Primary site approach with no backup site: Aborts and restarts all active transactions at all sites. Elects a new coordinator and initiates transaction processing. Primary site approach with backup site: Suspends all active transactions, designates the backup site as the primary site and identifies a new back up site. Primary site receives all transaction management information to resume processing. Primary and backup sites fail or no backup site: Use election process to select a new coordinator site.
  • 60.
    Concurrency Control andRecovery Concurrency control based on voting (Majority Protocol There is no primary copy of coordinator.  Send lock request to sites that have data item.  If majority of sites grant lock then the requesting transaction gets the data item.  Locking information (grant or denied) is sent to all these sites.  To avoid unacceptably long wait, a time-out period is defined.  If the requesting transaction does not get any vote information then the transaction is aborted.
  • 61.
    Biased Protocol The differencefrom the majority protocol is that requests for shared locks are given more favorable treatment than requests for exclusive locks. Shared locks. When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site that contains a replica of Q. Exclusive locks. When a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites that contain a replica of Q. Advantage- less overhead on read operations The additional overhead on writes is a disadvantage complexity in handling deadlock. Concurrency Control and Recovery
  • 62.
    Quorum Consensus Protocol Itis a generalization of the majority protocol. It assigns each site a nonnegative weight. It assigns read and write operations on an item x two integers, called read quorum Qr and write quorum Qw, S is the total weight of all sites at which x resides: Qr + Qw > S and 2 Qw > S∗ To execute a read operation, enough replicas must be read that their total weight is ≥ Qr. To execute a write operation, enough replicas must be written so that their total weight is ≥ Qw It can permit the cost of either reads or writes to be selectively reduced by appropriately defining the read and write quorums. Concurrency Control and Recovery
  • 63.
    Timestamping Each transaction isgiven a unique timestamp that the system uses in deciding the serialization order. Two primary methods for generating unique timestamps, 1. Centralized scheme- a single site distributes the timestamps. The site can use a logical counter or its own local clock for this purpose. 2. Distributed scheme- each site generates a unique local timestamp by using either a logical counter or the local clock. unique global timestamp = unique local timestamp + site identifier, which also must be unique Concurrency Control and Recovery
  • 65.
    System Structure Each sitehas its own local transaction manager, whose function is to ensure the ACID properties of those transactions that execute at that site. The various transaction managers cooperate to execute global transactions. Each site contains two subsystems: 1) The transaction manager manages the execution of those transactions (or subtransactions) that access data stored in a local site. Each such transaction may be either a local transaction or part of a global transaction 2) The transaction coordinator coordinates the execution of the various transactions (both local and global) initiated at that site.
  • 67.
    Each transaction manageris responsible for Maintaining a log for recovery purposes Participating in an appropriate concurrency-control scheme to coordinate the concurrent execution of the transactions executing at that site A transaction coordinator, is responsible for coordinating the execution of all the transactions initiated at that site. For each such transaction, the coordinator is responsible for • Starting the execution of the transaction • Breaking the transaction into a number of subtransactions and distributing these subtransactions to the appropriate sites for execution • Coordinating the termination of the transaction, which may result in the transaction being committed at all sites or aborted at all sites.
  • 68.
    Commit Protocols If weare to ensure atomicity, all the sites in which a transaction T executed must agree on the final outcome of the execution. T must either commit at all sites, or it must abort at all sites. To ensure this property, the transaction coordinator of T must execute a commit protocol. Two-Phase Commit Consider a transaction T initiated at site Si, where the transaction coordinator is Ci. When T completes its execution—that is, when all the sites at which T has executed inform Ci that T has completed— Ci starts the 2PC protocol.
  • 69.
    Phase 1. Ciadds the record <prepare T> to the log, and forces the log onto stable storage. It then sends a prepare T message to all sites at which T executed. On receiving such a message, the transaction manager at that site determines whether it is willing to commit its portion of T. If the answer is no, it adds a record <no T> to the log, and then responds by sending an abort T message to Ci. If the answer is yes, it adds a record <ready T> to the log, and forces the log onto stable storage. The transaction manager then replies with a ready T message to Ci. 2 Phase Commit (2PC)
  • 70.
    2 Phase Commit(2PC) Phase 2. When Ci receives responses to the prepare T message from all the sites, or when a prespecified interval of time has elapsed since the prepare T message was sent out, Ci can determine whether the transaction T can be committed or aborted. Transaction T can be committed if Ci received a ready T message from all the participating sites. Otherwise, transaction T must be aborted. Depending on the verdict, either a record <commit T> or a record <abort T> is added to the log and the log is forced onto stable storage. At this point, the fate of the transaction has been sealed. Following this point, the coordinator sends either a commit T or an abort T message to all participating sites. When a site receives that message, it records the message in the log.
  • 71.
    A site atwhich T executed can unconditionally abort T at any time before it sends the message ready T to the coordinator. Once the message is sent, the transaction is said to be in the ready state at the site. The ready T message is, in effect, a promise by a site to follow the coordinator’s order to commit T or to abort T. To make such a promise, the needed information must first be stored in stable storage. Otherwise, if the site crashes after sending ready T, it may be unable to make good on its promise. Further, locks acquired by the transaction must continue to be held till the transaction completes. 2 Phase Commit (2PC)
  • 72.
    The fate ofT is sealed as soon as at least one site responds abort T. Since the coordinator site Si is one of the sites at which T executed, the coordinator can decide unilaterally to abort T. The final verdict regarding T is determined at the time that the coordinator writes that verdict (commit or abort) to the log and forces that verdict to stable storage. In some implementations of the 2PC protocol, a site sends an acknowledge T message to the coordinator at the end of the second phase of the protocol. When the coordinator receives the acknowledge T message from all the sites, it adds the record <complete T> to the log. 2 Phase Commit (2PC)
  • 73.
    Handling of Failures Failureof a participating site. If the site fails before responding with a ready T message to Ci, the coordinator assumes that it responded with an abort T message. If the site fails after the coordinator has received the ready T message from the site, the coordinator executes the rest of the commit protocol in the normal fashion, ignoring the failure of the site.
  • 74.
    Failure of aparticipating site. When a participating site Sk recovers from a failure, it must examine its log to determine the fate of those transactions that were in the midst of execution when the failure occurred. Let T be one such transaction. The log contains a <commit T> record. In this case, the site executes redo(T). The log contains an <abort T> record. In this case, the site executes undo(T). The log contains no control records (abort, commit, ready) concerning T. Sk failed before responding to the prepare T message from Ci. Ci must abort T. Hence, Sk must execute undo(T). Handling of Failures
  • 75.
    The log containsa <ready T> record. The site must consult Ci to determine the fate of T. If Ci is up, it notifies Sk regarding whether T committed or aborted. So it executes either redo(T) or undo(T). If Ci is down, Sk must try to find the fate of T from other sites. It sends a querystatus T message to all the sites in the system. On receiving such a message, a site must consult its log to determine whether T has executed there, and if T has, whether T committed or aborted. It then notifies Sk about this outcome. If no site has the appropriate information then Sk can neither abort nor commit T. The decision concerning T is postponed until Sk can obtain the needed information. Thus, Sk must periodically resend the querystatus message to the other sites.
  • 76.
    Failure of thecoordinator  If the coordinator fails in the midst of the execution of the commit protocol for transaction T, then the participating sites must decide the fate of T. In certain cases, the participating sites cannot decide whether to commit or abort T, and therefore these sites must wait for the recovery of the failed coordinator. Handling of Failures
  • 77.
    Failure of thecoordinator If an active site contains a <commit T> record in its log, then T must be committed. If an active site contains an <abort T> record in its log, then T must be aborted. If some active site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T. However, the coordinator may have decided to abort T, but not to commit T. Rather than wait for Ci to recover, it is preferable to abort T. Handling of Failures
  • 78.
    If none ofthe preceding cases holds, then all active sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). Since the coordinator has failed, it is impossible to determine whether a decision has been made, and if one has, what that decision is, until the coordinator recovers. Thus, the active sites must wait for Ci to recover. Since the fate of T remains in doubt, T may continue to hold system resources- blocking problem Handling of Failures
  • 79.
    Network partition. 1. Thecoordinator and all its participants remain in one partition. In this case, the failure has no effect on the commit protocol. 2. The coordinator and its participants belong to several partitions. From the viewpoint of the sites in one of the partitions, it appears that the sites in other partitions have failed. Sites that are not in the coordinator partition- failure of the coordinator. The coordinator and the sites in the partition - follow the usual commit protocol, assuming that the sites in the other partitions have failed. Handling of Failures
  • 80.
    Recovery and Concurrency Control Therecovery procedure must treat in-doubt transactions specially. The recovering site must determine the commit–abort status of such transactions by contacting other sites. Normal transaction processing at the site cannot begin until all in-doubt transactions have been committed or rolled back. Finding the status of in-doubt transactions can be slow. Further, recovery potentially could become blocked if 2PC is used. As a result, the site performing restart recovery may remain unusable for a long period.
  • 81.
    To circumvent thisproblem, recovery algorithms typically provide support for noting lock information in the log. Instead of writing a <ready T> log record, the algorithm writes a <ready T, L> log record At recovery time, after performing local recovery actions, for every in-doubt transaction T, all the write locks noted in the <ready T,L> log record are reacquired. After this transaction processing can start at the site, even before the commit–abort status of the in-doubt transactions is determined. The commit or rollback of in-doubt transactions proceeds concurrently with the execution of new transactions. Thus, site recovery is faster, and never gets blocked. Recovery and Concurrency Control
  • 82.
    Three-Phase Commit The three-phasecommit (3PC) protocol is an extension of the two-phase commit protocol that avoids the blocking problem under certain assumptions. It is assumed that no network partition occurs, and not more than k sites fail, where k is some predetermined number. The protocol avoids blocking by introducing an extra third phase where multiple sites are involved in the decision to commit.  Instead of directly noting the commit decision in its persistent storage, the coordinator first ensures that at least k other sites know that it intended to commit the transaction. If the coordinator fails, the remaining sites first select a new coordinator.
  • 83.
    This new coordinatorchecks the status of the protocol from the remaining sites; if the coordinator had decided to commit, at least one of the other k sites that it informed will be up and will ensure that the commit decision is respected. The new coordinator restarts the third phase of the protocol if some site knew that the old coordinator intended to commit the transaction. Otherwise the new coordinator aborts the transaction. Three-Phase Commit
  • 84.
    While the 3PCprotocol has the desirable property of not blocking unless k sites fail, it has the drawback that a partitioning of the network will appear to be the same as more than k sites failing, which would lead to blocking. The protocol also has to be carefully implemented to ensure that network partitioning does not result in inconsistencies, where a transaction is committed in one partition, and aborted in another. Because of its overhead, 3PC protocol is not widely used. Three-Phase Commit
  • 85.
    Client-Server Database Architecture Itconsists of clients running client software, a set of servers which provide all database functionalities and a reliable communication infrastructure. Client 1 Client 3 Client 2 Client n Server 1 Server 2 Server n
  • 86.
    Client-Server Database Architecture Clientsreach server for desired service, but server does reach clients.  The server software is responsible for local data management at a site, much like centralized DBMS software.  The client software is responsible for most of the distribution function.  The communication software manages communication among clients and servers.
  • 87.
    Client-Server Database Architecture Theprocessing of a SQL queries goes as follows:  Client parses a user query and decomposes it into a number of independent sub-queries. Each subquery is sent to appropriate site for execution.  Each server processes its query and sends the result to the client.  The client combines the results of subqueries and produces the final result.

Editor's Notes

  • #5 Some different database system architectures. (a) Shared nothing architecture. (b) A networked architecture with a centralized database at one of the sites. (c) A truly distributed database architecture. We need to distinguish distributed databases from multiprocessor systems that use shared storage (primary memory or disk). For a database to be called distributed, the following minimum conditions should be satisfied: ■ Connection of database nodes over a computer network. There are multiple computers, called sites or nodes. These sites must be connected by an underlying communication network to transmit data and commands among sites, as shown later in Figure 25.3(c). ■ Logical interrelation of the connected databases. It is essential that the information in the databases be logically related. ■ Absence of homogeneity constraint among connected nodes. It is not necessary that all nodes be identical in terms of data, hardware, and software.
  • #29 Supports all transparency
  • #55 A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does.
  • #62 When a statement modifies data, its transaction holds an exclusive lock on data that prevents other transactions from accessing the data.