Unit 4 -Distributed Database Management System By ABNS
A distributed database management system (DDBMS) is a centralized software system that manages a
distributed database in a manner as if it were all stored in a single location.
It is used to create, retrieve, update and delete distributed databases.
Features
• It is used to create, retrieve, update and delete distributed databases.
• It ensures that the data modified at any site is universally updated.
• It is used in application areas where large volumes of data are processed
• It is designed for heterogeneous database platforms.
Factors Encouraging DDBMS
• Distributed Nature of Organizational Units −
• Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources.
• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) Distributed database systems aid both these processing by
providing synchronized data.
• Database Recovery :-Replication of data automatically helps in data recovery
if database in any site is damaged.
• Support for Multiple Application Software
Applications of Distributed Database:
• It is used in Corporate Management Information System.
• It is used in multimedia applications.
• Used in Military’s control system, Hotel chains etc.
• It is also used in manufacturing control system.
Advantages of Distributed Database System :
1) There is fast data processing as several sites participate in request processing.
2) It possess reduced operating cost.
3) It is easier to expand the system by adding more sites.
Disadvantages of Distributed Database System :
1) The system becomes complex to manage and control.
2) The security issues must be carefully managed.
3) There is need of some standardization for processing of distributed database
system.
1
2
Types of Distributed Databases
Distributed databases can be broadly classified into homogeneous and heterogeneous distributed database
environments
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating
systems. Its properties are −
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• The database is accessed through a single interface as if it is a single database.
There are two types of homogeneous distributed database −
Autonomous
Non-autonomous
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems,
DBMS products and data models. Its properties are −
• Different sites use dissimilar schemas and software.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
Types of Heterogeneous Distributed Databases
Federated
Un-federated
Distributed DBMS Architectures
DDBMS architectures are generally developed depending on three parameters −
• Distribution − It states the physical distribution of data across the different sites.
• Autonomy − It indicates the distribution of control of the database system and the
degree to which each constituent DBMS can operate independently.
• Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.
3
Architectural Models
• Client - Server Architecture for DDBMS
• Peer - to - Peer Architecture for DDBMS
• Multi - DBMS Architecture
Client-server architecture:
This is a two-level architecture where the functionality is divided into servers and clients.
Server does most of the data management work
– query processing
– data management
– Optimization
– Transaction management etc
Client performs
– Application
– User interface
– DBMS Client model
The two different client - server architecture are −
Single Server Multiple Client
Multiple Server Multiple Client
4
Peer- to-Peer Architecture for DDBMS
In these systems, each peer acts both as a client and a server for imparting database services.The peers
share their resource with other peers and co-ordinate their activities
This architecture generally has four levels of schemas: –
Individual internal schema definition at each site, local internal schema
Enterprise view of data is described the global conceptual schema.
Local organization of data at each site is describe in the local conceptual schema.
User applications and user access to the database is supported by external schemas
Major Components of a Peer-to-Peer System
(i)User Processor
(ii) Data processor
User Processor
• User-interface handler
• checks if the user query can be processed.
• Translates global queries into local one.
Data processor
• Local query optimizer
• Responsible for choosing the the best access path
• Local Recovery Manager
5
Multi - DBMS Architectures
This is an integrated database system formed by a collection of two or more autonomous
database systems.
Multi-DBMS can be expressed through six levels of schemas −
• Multi-database View Level − Depicts multiple user views comprising of subsets of
the integrated distributed database.
• Multi-database Conceptual Level − Depicts integrated multi-database that comprises
of global logical multi-database structure definitions.
• Multi-database Internal Level − Depicts the data distribution across different sites
• Local database View Level − Depicts public view of local data.
• Local database Conceptual Level − Depicts local data organization at each site.
• Local database Internal Level − Depicts physical data organization at each site.
There are two design alternatives for multi-DBMS −
Model with multi-database conceptual level.
6
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables.
The subsets of the table are called fragments.
Fragmentation can be of three types:
horizontal, vertical, and hybrid (combination of horizontal and vertical). Horizontal fragmentation can
further beclassified into two techniques: primary horizontal fragmentation and derived horizontal
fragmentation. Fragmentation increases parallelism and provides better disaster recovery. Here, there is
only one copy of each fragment in the system, i.e. no redundant data.
Advantages :
As the data is stored close to the usage site, the efficiency of the database system will increase.
Permits a number of transactions to executed concurrently
3. Increases level of concurrency
Disadvantages :
Access speeds may be very high if data from different fragments are needed
Lack of back-up copies of data in different sites may render the database ineffective in
case of failure of a site.
The three fragmentation techniques are −
• Vertical fragmentation
• Horizontal fragmentation
• Hybrid fragmentation
Horizontal Fragmentation:
In Horizontal Fragmentation, the relational table or schema is broken down into a group of one and more
rows, and each row gets one fragment of the schema. It is also called splitting by rows.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. each fragment
should contain the primary key field(s) of the table. Vertical fragmentation can be used to enforce privacy
of data.
Hybrid fragementation
The combination of vertical fragmentation of a table followed by further horizontal fragmentation of some
fragments is called mixed or hybrid fragmentation.
For defining this type of fragmentation we use the SELECT and the PROJECT operations of relational
algebra.
7
Replication
Data replication is the process of storing separate copies of the database at two or more sites.
It is a popular fault tolerance technique of distributed databases
Advantages of Data Replication
• Reliability − In case of failure of any site, the database system continues to work since a copy is available
at another site(s).
• Quicker Response − Availability of local copies of data ensures quick query processing and consequently
quick response time.
Disadvantages of Data Replication
• Increased Storage Requirements − Maintaining multiple copies of data is associated with increased
storage costs.
• Increased Cost and Complexity of Data Updating − Each time a data item is updated, the update needs to
be reflected in all the copies of the data at the different sites. This requires complex synchronization
techniques and protocols.
Some commonly used replication techniques are
Snapshot replication
Near-real-time replication
Pull replication
Aspect Replication Fragmentation
Creating multiple copies of the same data Dividing a database into smaller,
Definition
across different nodes. manageable pieces (fragments).
Improves performance and scalability by
Purpose Enhances data availability and reliability.
distributing data.
Data Ensures all copies are synchronized (eventual Each fragment can be independently
Consistency consistency). updated; consistency may vary.
Storage May lead to increased storage requirements Can optimize storage by only storing
Efficiency due to multiple copies. necessary fragments.
Performance Can improve read performance but may slow Can enhance performance by reducing
Impact down writes due to synchronization. the amount of data scanned.
More complex due to synchronization Complexity arises from managing
Complexity
mechanisms. fragments and their distribution.
8
Recovery technique for distributed database
Backup and Restoration
The most basic recovery technique is the regular backup of the database. DBAs can schedule backups to
run periodically, storing snapshots of the database. In the event of a failure, the latest backup is restored .
Recovery from Disk Failure
A disk failure or hard crash causes a total database loss. To recover from this hard crash, a new disk is
prepared, then the operating system is restored, and finally the database is recovered using the database
backup and transaction log.
Checkpointing
Checkpoint is a point of time at which a record is written onto the database from the buffers. in case of a
system crash, the recovery manager does not have to redo the transactions that have been committed
before checkpoint.
The two types of checkpointing techniques are −
• Consistent checkpointing
• Fuzzy checkpointing
Database Replication
Database replication involves copying and maintaining the same set of data across multiple databases. This
ensures redundancy, improves availability, and can enhance performance by distributing workload.
Point-in-Time Recovery
Point-in-Time Recovery (PITR) allows restoring a database to a specific moment in time, rather than just the
latest backup. It involves using transaction logs to roll forward or backward to the desired timestamp,
Transaction Recovery Using UNDO / REDO
UNDO all faulty transactions and transactions that may be affected by the faulty transactions.
REDO all transactions that are not faulty but have been undone due to the faulty transactions.
CONCURRENCY CONTROL IN DISTRIBUTED DBMS
Concurrency controlling techniques ensure that multiple transactions are executed simultaneously while
maintaining the ACID properties of the transactions and serializability in the schedules.
Concurrency Control Mechanisms
Distributed Two-phase Locking Algorithm
The basic principle of distributed two-phase locking is same as the basic two-phase locking protocol.
However, in a distributed system there are sites designated as lock managers. A lock manager controls lock
acquisition requests from transaction monitors.
two-phase locking approaches can be of three types : Distributed two-phase locking , Primary copy two-
phase locking, Centralized two-phase locking
9
Distributed Timestamp Concurrency Control
in a distributed system, any site’s local physical/logical clock readings cannot be used as global timestamps,
since they are not globally unique.
So, a timestamp comprises of a combination of site ID and that site’s clock reading.
For implementing timestamp ordering algorithms, each site has a schedule, The scheduler puts the request
to the corresponding queue in increasing timestamp order.
Conflict Graphs
A conflict graph is created for the classes to which active transactions belong. This contains a set of vertical,
horizontal, and diagonal edges.
The conflict graphs are analyzed to ascertain whether two transactions within the same class or across two
different classes can be run in parallel.
Distributed Optimistic Concurrency Control Algorithm
Distributed optimistic concurrency control algorithm extends optimistic concurrency control algorithm. For
this extension, two rules are applied −
Rule 1 − According to this rule, a transaction must be validated locally at all sites when it executes.
Rule 2 − According to this rule, after a transaction passes local validation test, it should be globally
validated. Commit Protocols
in a distributed system, the transaction manager should convey the decision to commit to all the servers in the
various sites where the transaction is being executed.
When processing is complete at each site, it reaches the partially committed transaction state and waits for all
other transactions to reach their partially committed states.
When it receives the message that all the sites are ready to commit, it starts to commit. In a distributed system,
either all sites commit or none of them does.The different distributed commit protocols are −
• One-phase commit
• Two-phase commit
• Three-phase commit
One-Phase Commit
It is the simplest commit protocol. In this commit protocol, there is a controlling site, and there are a variety of
slave sites where the transaction is performed.
10
Two-Phase Commit
It is the second type of commit protocol in DBMS. It was introduced to reduce the vulnerabilities of the one phase
commit protocol. There are two phases in the two-phase commit protocol.
Prepare Phase
Commit/Abort Phase
Three Phase Commit Protocol
It is the second type of commit protocol in DBMS. It was introduced to address the issue of blocking. In this commit
protocol, there are three phases: –
Prepare Phase, prepare to commit phase, commit/abort phase
Advantages of Commit Protocol in DBMS
• It basically also helps to ensure that the integrity of the data is maintained throughout the database.
• It will also helps to maintain the atomicity which means that either all the operations in a transaction are
completed successfully or not done at all.
• The commit protocol provide mechanisms for system recovery in the case of system failures.
11
Two Phase locking protocol
Every transaction will lock and unlock the data item in two different phases.
Growing Phase − All the locks are issued in this phase. No locks are released, after all changes to data-items are
committed and then the second phase (shrinking phase) starts.
Shrinking phase − No locks are issued in this phase, all the changes to data-items are noted (stored) and then locks
are released.
The following way shows how unlocking and locking work
with 2-PL.
Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3
Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
Two phase locking is of two types –
Strict two phase locking protocol
A transaction can release a shared lock after the lock point, but it cannot release any exclusive lock until the
transaction commits.
Rigorous two phase locking protocol
A transaction cannot release any lock either shared or exclusive until it commits.
12