Distributed Systems
CH – 07
Replication
Baikuntha Acharya (baikunth2a@gmail.com)
Lecturer, Sagarmatha Engineering College, Sanepa, Lalitpur
Reasons for Replication
Replication
- Keep multiple copies of data objects, ensuring all copies are identical.
✓ Increased Reliability
• Redundancy is a key technique to increase availability.
• If one replica is unavailable or crashes, use another.
• Protect against corrupted data
✓ Increased Availability
• At least some server somewhere is likely to be operational and able to serve requests.
• Availability = 1 – Probability (all replicas have failed)
• Probability(all replicas have failed) = Probability(replica 1 has failed) × Probability(replica 2 has failed) × … × Probability(replica n has failed)
• If the probability of one copy failing is 0.3 (30%), the probability of two copies failing at the same time is 0.3 × 0.3 = 0.09, the probability of three copies all failing at the same time is 0.027, and the probability of ten copies all failing at the same time is about 0.000005905.
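These figures are easy to verify programmatically. A minimal Python sketch (assuming, as above, that replica failures are independent; the `availability` helper is illustrative, not from the slides):

```python
# Availability of n independent replicas, each failing with probability p:
# Availability = 1 - p**n

def availability(p: float, n: int) -> float:
    """Probability that at least one of n independent replicas is up."""
    return 1 - p ** n

for n in (1, 2, 3, 10):
    print(f"{n} replica(s): availability = {availability(0.3, n):.10f}")
# 1  -> 0.7000000000 (failure probability 0.3)
# 2  -> 0.9100000000 (failure probability 0.09)
# 3  -> 0.9730000000 (failure probability 0.027)
# 10 -> 0.9999940951 (failure probability ~0.000005905)
```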
Reasons for Replication
✓ Performance (Reduce Latencies)
• Scalability: Scale with size of the distributed system (E.g.: replicated web servers)
• Cache: Scale in geographically distributed systems by placing copies of data in proximity to the processes using them. (E.g.: web proxies)
✓ Fault Tolerance
• Guarantees strictly correct behavior even in the presence of faults.
✓ Consistency problems:
• If a copy is modified, this copy becomes different from the rest.
• Consequently, modifications have to be carried out on all copies to ensure
consistency
Challenges of Replication
✓ Need to maintain consistency of replicated data.
• If one copy is modified, others become inconsistent.
✓ Keeping multiple copies consistent may itself be subject to serious
scalability problems!
• Global synchronization takes a lot of time when replicas are spread across a wide area
network.
✓ Need to maintain transparency of replicated data.
• User should not be aware of replication
✓ Balance trade-off between number of replicas and associated cost.
• Need to decide how many replicas to keep
✓ Managing replicas across different machines is itself an engineering challenge.
Replication Schemes & Types
✓ Replication Schemes:
• No Replication:
• Each fragment is stored at exactly one location.
• Partial replication:
• Only some fragments of the database are replicated.
• Full Replication:
• A copy of the entire database is available at almost every location in the communication network.
✓ Based on active and passive processing of request:
• Active replication
• Every replica processes the same request.
• Passive replication
• Every request is processed on a single replica, and the result is transferred to the other replicas.
Types of Replication (Cont..)
✓ Based on synchronization:
• Synchronous Replication
• Replicas are modified immediately after changes are made to the source table.
• Asynchronous replication:
• In asynchronous replication, replicas are modified after the transaction commits on the database.
• A replica manager is responsible for managing the synchronization of replicas.
✓ Based on Data Replication:
• Transactional Replication
• Replicas receive a full initial copy of the database and then receive updates as the data changes.
• Snapshot Replication
• Distributes data exactly as it appears at a specific moment and does not propagate subsequent updates.
• Merge Replication
• Data from two or more databases is combined into a single database
Object Replication
✓ Create and maintain copies of objects across multiple nodes.
✓ There are two approaches for object sharing:
• The object itself can handle concurrent invocation (reducing the burden on the server).
• Object lacks concurrent invocation protection, relying on the server for concurrency control.
• In particular, use an appropriate object adapter.
Figure a): A remote object capable of handling concurrent invocations on its own.
Figure b): A remote object for which an object adapter is required to handle concurrent invocations.
Object Replication (Cont..)
✓ There are two approaches for object replication:
• The application is responsible for replication.
• Application needs to handle consistency issues.
• The system (middleware) handles replication.
• Consistency issues are handled by the middleware.
• It simplifies application development but makes object-specific solutions harder.
Object Replication (Cont..)
✓ Approaches for object replication (Cont..)
a) A distributed system for replication-aware distributed objects.
• Replication managed at the object level using an object-specific protocol.
b) A distributed system (with middleware) responsible for replica management.
• Middleware centrally manages replication with a unified protocol simplifying
application development and maintenance.
Replication as Scaling Technique
✓ Replication and caching can be used as a scaling technique by creating
multiple copies of data or services across different servers or nodes.
✓ Placing data copies close to processes reduces access time and improves
performance.
✓ Replication allows system scaling by providing the following features:
• Load Balancing: Incoming requests are distributed among replicas to optimize
resource utilization and minimize response times.
• High Availability: Multiple replicas provide redundancy, ensuring continuous
service availability even if some nodes fail.
• Geographical Distribution: Replicas can be placed in different geographical
locations to reduce latency and improve access speeds for users in diverse regions.
• Horizontal Scaling: By adding more nodes and replicating data across them, the
system can handle increased load without degrading performance.
Replication as Scaling Technique
✓ Multiple copies improve performance by reducing access latency but incur higher network overhead to maintain consistency.
✓ Ensuring consistency among multiple copies can cause scalability issues.
✓ Consistency maintenance is itself an issue:
• What semantics to provide?
• Tight consistency requires globally synchronized clocks.
✓ The solution is to loosen consistency requirements.
• Variety of consistency semantics possible
✓ Consistency model (consistency semantics):
• All models attempt to return the results of the last write for a read operation.
• But they differ in how the "last write" is determined or defined.
Consistency Models
Data Centric Consistency Model
✓ Two Models
• Data-centric and client-centric consistency models
✓ A data-centric consistency model:
• defines the rules for read and write operations on shared data in a distributed data store.
✓ Data-centric organization: a logical data store, physically distributed and replicated across multiple processes.
✓ As we replicate data objects over different clusters, it is challenging to maintain consistency.
• Guarantee that a read operation returns the most recent value.
✓ So, we need to define different consistency models depending upon use-case/application requirements.
Consistency Models (Cont..)
Strict Consistency
✓ Any read on a data item X returns a value corresponding to the result of the
most recent write on X.
• This definition implicitly assumes the existence of absolute global time.
✓ Naturally available in uni-processor systems, but impossible to implement
in distributed systems.
✓ Behavior of two processes operating on the same data item:
a) A strictly consistent store.
b) A store that is not strictly consistent.
Consistency Models (Cont..)
Sequential Consistency
✓ Ensures that the results of execution are the same as if the operations of all
the processors were executed in some sequential order.
✓ A system is sequentially consistent if:
• There is a total order on all operations.
• Operations of each individual process appear in this total order in the sequence specified by
the program.
✓ Behavior of two processes:
a) A sequentially consistent data store.
b) A data store that is not sequentially consistent.
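As a concrete illustration (not from the slides), the sketch below checks the second condition: whether a proposed total order preserves each process's program order. The operation labels and the helper `respects_program_order` are invented for the example; a full check would also verify that every read returns the most recent preceding write in the total order.

```python
def respects_program_order(total_order, program_orders):
    """total_order: list of ops; program_orders: {process: [ops in issue order]}.
    Checks only the program-order condition of sequential consistency."""
    position = {op: i for i, op in enumerate(total_order)}
    return all(
        position[a] < position[b]
        for ops in program_orders.values()
        for a, b in zip(ops, ops[1:])
    )

# P1 writes x then reads y; P2 writes y then reads x.
program = {"P1": ["W1(x)a", "R1(y)"], "P2": ["W2(y)b", "R2(x)"]}

print(respects_program_order(["W1(x)a", "W2(y)b", "R1(y)", "R2(x)"], program))  # True
print(respects_program_order(["R1(y)", "W1(x)a", "W2(y)b", "R2(x)"], program))  # False: P1 reordered
```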
Consistency Models (Cont..)
Causal Consistency
✓ Represents a weakening of sequential consistency (enforce ordering for causally related events only).
✓ Requires a total order of causally related write operations only.
• Example: P2 uses value a to update b (the two writes are causally related).
✓ Potentially causally related events:
• A read followed by a later write by the same process.
• A write followed by a later read of the same data item/location.
• The transitive closure of the above two types (if write1 → read, and read → write2, then write1 → write2).
✓ Necessary condition for causal consistency:
• Writes that are potentially causally related must be seen by all processes in the same order.
• Concurrent writes may be seen in a different order on different machines.
✓ Behavior of two processes in the diagram:
a) A violation of a causally-consistent store.
b) A correct sequence of events in a causally-consistent store (the two writes are concurrent, with no causality between them, so the sequence is not sequentially consistent but is causally consistent).
Consistency Models (Cont..)
FIFO Consistency
✓ Necessary Condition:
• Writes done by a single process are seen by all other processes in the
order in which they were issued,
• but writes from different processes may be seen in a different order by
different processes.
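A hypothetical checker for this condition (the function name and event encoding are invented): it verifies that every observer sees each single writer's writes in issue order, while allowing the interleaving across different writers to vary per observer.

```python
def fifo_consistent(issued, observed):
    """issued: {writer: [writes in issue order]}
    observed: {reader: [writes in the order that reader saw them]}"""
    for writes in issued.values():
        rank = {w: i for i, w in enumerate(writes)}
        for seen in observed.values():
            filtered = [w for w in seen if w in rank]
            if filtered != sorted(filtered, key=rank.__getitem__):
                return False  # some reader saw this writer's writes out of order
    return True

issued = {"P1": ["w1a", "w1b"], "P2": ["w2a"]}
print(fifo_consistent(issued, {"P3": ["w1a", "w2a", "w1b"],
                               "P4": ["w2a", "w1a", "w1b"]}))  # True: per-writer order kept
print(fifo_consistent(issued, {"P3": ["w1b", "w1a", "w2a"]}))  # False: P1's writes reordered
```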
Weak Consistency
✓ Consistency is enforced only when a synchronization variable is accessed: synchronizing makes all previous writes visible everywhere, while individual reads and writes between synchronizations carry no ordering guarantees.
Consistency Models (Cont..)
Client-Centric Consistency Models
Eventual Consistency
✓ Client-Centric Consistency Models
• Focus on what specific clients want, instead of what should be maintained by servers.
✓ All replicas of data will eventually converge to the same state over time.
✓ Write-write conflicts are resolved by designating one write as the "winner" (e.g., last-writer-wins, sketched below) or by using specific conflict-resolution methods.
✓ Clients may read stale data until updates are fully propagated.
✓ Suitable for large-scale systems where immediate consistency is less critical.
• Allows occasional discrepancies for improved performance and availability.
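One common way to designate a "winner" is last-writer-wins: the write with the later timestamp prevails, with a deterministic tie-breaker so every replica picks the same winner. A minimal sketch (the `Write` record and its field names are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Write:
    value: str
    timestamp: float   # wall-clock or logical time attached by the writer
    node_id: int       # tie-breaker so all replicas pick the same winner

def lww_merge(a: Write, b: Write) -> Write:
    """Resolve a write-write conflict: the later timestamp wins;
    ties are broken deterministically by node id."""
    return max(a, b, key=lambda w: (w.timestamp, w.node_id))

# Two replicas accepted conflicting writes while partitioned; once they
# exchange updates, both apply lww_merge and converge to the same value.
print(lww_merge(Write("red", 10.0, 1), Write("blue", 10.5, 2)).value)  # blue
```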
Fault Tolerant Services
✓ Fault Tolerance via Replication:
• Replicate data and functionality across multiple replica managers (RMs).
• Ensures service correctness despite up to f process failures.
• Assumes reliable communication and no network partitions.
• RMs behave according to specified semantics when not crashed (e.g., funds in
bank accounts are preserved).
✓ Consistency and Correctness Criteria:
• Linearizability:
• Operations appear instantaneously and in real-time order.
• Example: an atomic fetch-and-increment instruction in a CPU (see the sketch after this list).
• Sequential Consistency:
• Operations appear in the order issued by each client.
• Does not consider real-time order, only the relative or program order.
• Example: Database transactions maintaining ACID properties.
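A toy illustration of linearizability (not from the slides): a fetch-and-increment counter whose linearization point is the moment the lock is held, so every operation appears to take effect instantaneously, in real-time order.

```python
import threading

class LinearizableCounter:
    """Each fetch-and-increment appears to take effect instantaneously at a
    single point in real time (while the lock is held): the linearization point."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_increment(self) -> int:
        with self._lock:          # linearization point
            old = self._value
            self._value += 1
            return old

counter = LinearizableCounter()
threads = [threading.Thread(target=counter.fetch_and_increment) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(counter._value)  # 8: every increment was applied atomically
```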
Fault Tolerant Services (Cont..)
Replication Models
General Idea:
✓ Passive (Primary-Backup) Model:
• One primary RM processes requests and updates backups.
• On primary failure, a backup takes over seamlessly, maintaining continuity.
• Front ends communicate with primary; backups receive updates post-execution.
• FE (front end): provides transparency to clients.
✓ Active Replication:
• All RMs process requests identically and independently.
• Front end multicasts requests to all replica managers.
• Ensures fault tolerance, tolerates Byzantine failures by comparing responses.
Fault Tolerant Services (Cont..)
Passive Replication
✓ One RM is distinguished as the primary one.
✓ All front ends communicate with the primary RM:
• which is responsible for executing all requests and then updating the other RMs, known as
backups or slaves.
✓ If the primary RM fails, it is replaced by one of the backups.
Fault Tolerant Services (Cont..)
Passive Replication
✓ The following sequence is followed if the primary is correct and
guarantees linearizability:
1. Request: The front end issues a request to the primary RM. Each request
contains a unique identifier.
2. Coordination: The primary deals with each request in FIFO order; using the
identifier it filters duplicate requests and resends the previous response.
3. Execution: The primary executes the request and stores the response.
4. Agreement: If the request is an update, the primary will propagate it, together
with the response and the request id to all the backup RMs.
• The backups will send an acknowledgement to the primary.
5. Response: The primary responds to the front end; the front end responds to the
client.
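A minimal sketch of this five-step sequence, with networking elided and backups modeled as in-process objects (all class and method names are invented for illustration):

```python
class BackupRM:
    def __init__(self):
        self.state, self.responses = {}, {}
    def apply_update(self, request_id, key, value, result):
        # Step 4 (backup side): apply the update; returning implies an ack.
        self.state[key] = value
        self.responses[request_id] = result

class PrimaryRM:
    """Minimal sketch of the primary's request handling (steps 1-5)."""
    def __init__(self, backups):
        self.state = {}
        self.backups = backups
        self.responses = {}          # request id -> stored response

    def handle(self, request_id, op, key, value=None):
        # Step 2: filter duplicates and resend the previous response.
        if request_id in self.responses:
            return self.responses[request_id]
        # Step 3: execute the request and store the response.
        if op == "write":
            self.state[key] = value
            result = "ok"
            # Step 4: propagate update + response + id to all backup RMs.
            for b in self.backups:
                b.apply_update(request_id, key, value, result)
        else:
            result = self.state.get(key)
        self.responses[request_id] = result
        # Step 5: the response goes back to the front end.
        return result

primary = PrimaryRM([BackupRM(), BackupRM()])
primary.handle("req-1", "write", "x", 42)
print(primary.handle("req-2", "read", "x"))       # 42
print(primary.handle("req-1", "write", "x", 42))  # duplicate: cached "ok"
```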
Fault Tolerant Services (Cont..)
Passive Replication (Cont..)
✓ Failure of the Primary:
• Backup RM takes over if primary fails.
• Maintain linearizability by:
• Replacing primary with a unique backup.
• Ensuring all surviving RMs agree on operations performed at replacement
time.
• RMs are organized in a group and use view-synchronous communication:
• Provide consistent group view, excluding failed primary.
• Select new primary from the group.
• View-synchronous communication semantics:
• either all the backups or none of them will deliver any given update before
delivering the new view.
Fault Tolerant Services (Cont..)
Passive Replication (Cont..)
✓ Passive Replication Issues:
• High overhead from view-synchronous communication.
• Passive replication cannot survive Byzantine failures; requires f + 1 RMs
to survive up to f crashes.
• Front-end needs functionality to look up the new primary if the current
primary doesn't respond.
• Allowing read requests to backup RMs reduces congestion at the
primary but results in sequential consistency instead of linearizability.
• Example:
• Sun Network Information Service (NIS) uses a form of passive replication with
updates propagated from the primary RM to others on a one-to-one basis,
not guaranteeing sequential consistency.
Fault Tolerant Services (Cont..)
Active Replication
✓ RMs are state machines that play equivalent roles and are organized
as a group.
✓ Front ends multicast their requests to the group and each RM
processes the request independently but identically and replies.
✓ Active replication can tolerate Byzantine failures because the front end can collect and compare responses.
Fault Tolerant Services (Cont..)
Active Replication
✓ 1. Request: The front end multicasts the request to all RMs, using totally ordered,
reliable multicast, after attaching a unique identifier to it.
✓ 2. Coordination: The request is delivered to all correct RMs in the same order (by
properties of the group communication used).
✓ 3. Execution: Every correct RM executes the request, producing the same result.
This is guaranteed since the RMs are all state machines and they each handle
requests in the same order. Each response contains the unique identifier.
✓ 4. Agreement: No agreement phase is needed, because of the multicast delivery
semantics.
✓ 5. Response: Each RM sends its response to the front end.
• Various policies can be applied at this stage.
• For instance, the front end might choose to relay the first response to the client, ignoring any subsequent responses containing the same id (both policies are sketched below).
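A sketch of the front-end side of these steps (totally ordered multicast is abstracted as a simple loop over deterministic state machines, and the function names are invented). It shows two response policies: relaying the first reply under crash faults, and majority voting to mask a Byzantine replica.

```python
from collections import Counter

def multicast(rms, request):
    """Stand-in for a totally ordered, reliable multicast: every RM is a
    deterministic state machine invoked in the same order, so correct RMs
    produce the same result."""
    return [rm(request) for rm in rms]

def first_response(responses):
    """Crash-fault policy: relay the first response, ignore the rest."""
    return responses[0]

def majority_response(responses):
    """Byzantine-fault policy: compare responses and return the value
    reported by a majority of RMs."""
    value, count = Counter(responses).most_common(1)[0]
    if count <= len(responses) // 2:
        raise RuntimeError("no majority: too many faulty replicas")
    return value

rms = [lambda req: req * 2, lambda req: req * 2, lambda req: 999]  # one faulty RM
print(first_response(multicast(rms[:2], 21)))     # 42
print(majority_response(multicast(rms, 21)))      # 42: the faulty reply is outvoted
```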
Fault Tolerant Services (Cont..)
Active Replication (Cont..)
✓ Active Replication Issues:
• Depends on totally ordered and reliable multicast to achieve sequential
consistency.
• Ensures all correct RMs process the same requests in the same order.
• Requests are processed in happened-before order if clients don’t communicate
and synchronize while waiting.
• Multi-threaded clients need causally and totally ordered multicast to maintain
happened-before order.
• Does not achieve linearizability:
• total order of request processing may differ from real-time submission order.
Highly Available Services
✓ Ensures clients can access the service as much as possible.
✓ Clients can access data from alternative servers if the default
server fails.
✓ Helps maintain low response times.
✓ Prioritizes availability over consistency; responses may be sent
before reaching collective agreement.
• Availability = 1 – Probability (all replicas have failed)
✓ Updates may propagate lazily rather than eagerly.
✓ Example:
• The Gossip Architecture
Highly Available Services (Cont..)
The Gossip Architecture
✓ Framework for highly available services that
replicate data near client groups.
✓ RMs periodically exchange "gossip" messages to
share client updates.
✓ Offers two operations: queries (read-only) and
updates (modify state).
✓ Front end can send queries or updates to any RM
based on availability and response time.
✓ Guarantees relaxed consistency even if RMs
temporarily can't communicate due to failures.
Highly Available Services (Cont..)
The Gossip Architecture (Cont..)
✓ Gossip Service Front End
• Converts client operations into gossip queries or updates via an API.
• RM updates are exchanged occasionally.
• Maintains a vector timestamp for the latest accessed data.
• Merges vector timestamps when clients communicate directly.
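Merging vector timestamps is just an elementwise maximum. A minimal sketch (the three-entry vectors assume a system with three RMs):

```python
def merge_vector_timestamps(a, b):
    """Elementwise maximum: the merged timestamp reflects every update
    that either party has observed."""
    return [max(x, y) for x, y in zip(a, b)]

# Front end A has seen 3 updates from RM0; front end B has seen 5 from RM2.
print(merge_vector_timestamps([3, 1, 0], [2, 1, 5]))  # [3, 1, 5]
```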
✓ Gossip System Guarantees
• Consistent Service: Each client gets data reflecting all updates they've seen,
even when using different RMs.
• Relaxed Consistency: All RMs eventually get all updates, with ordering
guarantees making them sufficiently similar, though clients might see stale data.
• Causal Ordering: Requests are handled in the order issued, with stronger
ordering possible at higher costs.
Highly Available Services (Cont..)
The Gossip Architecture (Cont..)
✓ Processing Requests
• Request: The front end sends the request to an RM. Queries are synchronous;
updates are asynchronous.
• Update Response: RM acknowledges the update once received.
• Coordination: RM processes requests only when ordering constraints are met,
using gossip messages if needed.
• Execution: RM executes the request.
• Query Response: RM responds if the request is a query.
• Agreement: RMs coordinate only via gossip messages.
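A sketch of the coordination check for queries, following the usual presentation of the gossip protocol (a query carries the vector timestamp `prev` of the latest data its front end has seen, and the RM answers only once its value timestamp dominates it; the names here are illustrative):

```python
def dominates(ts_a, ts_b):
    """True if vector timestamp ts_a >= ts_b in every component."""
    return all(x >= y for x, y in zip(ts_a, ts_b))

def can_answer_query(value_ts, query_prev):
    """Coordination: hold the query (waiting for gossip) until the RM's value
    timestamp reflects every update the front end has already seen."""
    return dominates(value_ts, query_prev)

print(can_answer_query([2, 4, 6], [2, 3, 6]))  # True: safe to answer now
print(can_answer_query([2, 4, 6], [2, 5, 6]))  # False: wait for gossip from RM1
```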
Transactions with Replicated Data
One-copy serializability
✓ Replicated transactional service
• Each RM provides concurrency
control and recovery of its own data
items in the same way as it would for
non-replicated data
✓ Transactions performed by various clients on replicated data appear the same as if they had been performed sequentially on a single copy of the data.
✓ Additional complications: failures, network partitions
• Failures should be serialized w.r.t. transactions, i.e., any failure observed by a transaction must appear to have happened before the transaction started.
Transactions with Replicated Data
Replication Schemes
✓ Read One/Write All:
• Implements write locks across all replicas and read locks at one replica to maintain
consistency.
• It utilizes two-phase commit for atomicity across replicas.
• It cannot handle network partitions.
• Network partitions occur when subsets of replica managers lose connectivity, preventing
communication between these subsets.
✓ Schemes that can handle network partitions:
• Network partitions separate replica managers into two
or more subgroups.
• Members of a subgroup can communicate with each
other but members of different subgroups cannot.
• Optimistic approaches: Available copies with validation
• Pessimistic approaches: Quorum consensus
Transactions with Replicated Data
Replication Schemes
✓ Schemes that can handle network partitions:
1. Available copies with validation:
• Allows reads from any available replica.
• Requires writes on all available replicas for consistency.
• Validation is done with version vectors and precedence graphs (each partition maintains a log of the data items affected by the read and write operations of transactions).
• If the precedence graph contains cycles, validation fails.
2. Quorum consensus:
• Involves a subgroup of replica managers for decision-making.
• Requires a majority vote to perform read and write operations.
• Each replica maintains a version number for detecting up-to-date replicas.
• Suitable for environments where high availability and fault tolerance are critical.
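A hedged sketch of the quorum rules (Gifford-style constraints; the replica records and function names are assumptions for illustration): read and write quorums must intersect, so the highest version number among any read quorum identifies an up-to-date copy.

```python
def quorums_valid(n, r, w):
    """Read and write quorums must overlap (r + w > n), and two write
    quorums must overlap (2w > n), so stale copies can never win."""
    return r + w > n and 2 * w > n

def quorum_read(replicas, r):
    """Read from any r replicas; the highest version number is up to date,
    because every write quorum intersects this read quorum."""
    contacted = replicas[:r]                 # stand-in for any r reachable replicas
    return max(contacted, key=lambda rep: rep["version"])

replicas = [{"version": 3, "value": "new"},
            {"version": 2, "value": "old"},
            {"version": 3, "value": "new"}]
print(quorums_valid(n=3, r=2, w=2))          # True
print(quorum_read(replicas, r=2)["value"])   # "new"
```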
Transactions with Replicated Data
Replication Schemes
✓ Schemes that can handle network partitions (Cont..):
3. Virtual Partition:
• Combines available copies and quorum consensus
for optimizing read performance.
• Virtual partition = set of replica managers that have a
read and write quorum
• Ensures transactional integrity by dynamically
adjusting partitions.
• If a virtual partition can be formed, the available-copies scheme is used.
• Adapts to network changes by reconfiguring virtual
partitions as needed.
• Balances performance and consistency in dynamic
distributed systems.