
Distributed Systems

CH – 07
Replication

Baikuntha Acharya (baikunth2a@gmail.com)


Lecturer, Sagarmatha Engineering College, Sanepa, Lalitpur

© Baikuntha Acharya (baikunth2a@gmail.com)


Reasons for Replication

Replication
- Keep multiple copies of data objects, ensuring all copies remain identical.

✓ Increased Reliability
• Redundancy is a key technique to increase availability.

• If one replica is unavailable or crashes, use another.

• Protect against corrupted data

✓ Increased Availability
• At least one server is likely to be reachable somewhere.

• Availability = 1 − Probability(all replicas have failed)

• Probability(all replicas have failed) = Probability(replica 1 has failed) × Probability(replica 2 has failed) × … × Probability(replica n has failed)

• If the probability of a system failing is 0.3 (30%), the probability of two copies failing at the same time is 0.3 × 0.3 = 0.09, the probability of three copies all failing at the same time is 0.027, and the probability of ten copies all failing at the same time is about 0.0000059.
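This arithmetic is easy to check in code; a minimal Python sketch, assuming replicas fail independently:

def availability(p_fail, n_replicas):
    """Availability of n independent replicas, each failing with probability p_fail."""
    return 1 - p_fail ** n_replicas

# With p_fail = 0.3: two replicas fail together with probability 0.09,
# three with 0.027, and ten with about 0.0000059.
for n in (2, 3, 10):
    print(n, availability(0.3, n))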

Reasons for Replication

✓ Performance (Reduced Latency)

• Scalability: scale with the size of the distributed system (e.g., replicated web servers).

• Caching: scale in geographically distributed systems by placing copies of data in proximity to the processes using them (e.g., web proxies).

✓ Fault Tolerance
• Guarantees strictly correct behavior even in the presence of faults.

✓ Consistency problems:
• If a copy is modified, it becomes different from the rest.

• Consequently, modifications have to be carried out on all copies to ensure consistency.
Challenges of Replication

✓ Need to maintain consistency of replicated data.

• If one copy is modified, the others become inconsistent.

✓ Keeping multiple copies consistent may itself be subject to serious scalability problems!
• Global synchronization takes a lot of time when replicas are spread across a wide-area network.

✓ Need to maintain transparency of replicated data.

• Users should not be aware of replication.

✓ Balance the trade-off between the number of replicas and the associated cost.

• Need to decide how many replicas to keep.

✓ Managing replicas across different machines is itself an engineering challenge.

Replication Schemes & Types
✓ Replication Schemes:
• No Replication:
• Each fragment is stored at exactly one location.

• Partial Replication:
• Only some fragments of the database are replicated.

• Full Replication:
• A copy of the database is available at every location or to every user in the communication network.

✓ Based on active and passive processing of requests:

• Active replication
• The same request is processed at every replica.

• Passive replication
• Every request is processed on a single replica, and the result is transferred to the other replicas.

Types of Replication (Cont..)
✓ Based on synchronization:
• Synchronous Replication
• The replica is modified immediately after changes are made to the source table.

• Asynchronous Replication
• The replica is modified only after the change is committed on the source database.

• A replica manager is responsible for managing the synchronization of replicas.

✓ Based on how data is replicated:

• Transactional Replication
• Keeps a full initial copy of the database and then receives updates as the data changes.

• Snapshot Replication
• Distributes data exactly as it appears at a specific moment; the copy is not updated afterwards.

• Merge Replication
• Data from two or more databases is combined into a single database.
Object Replication

✓ Create and maintain copies of objects across multiple nodes.

✓ There are two approaches to object sharing:

• The object itself can handle concurrent invocations (reducing the burden on the server).

• The object lacks protection against concurrent invocations and relies on the server for concurrency control.
• In particular, an appropriate object adapter is used.

Figure a): A remote object capable of handling concurrent invocations on its own.
Figure b): A remote object for which an object adapter is required to handle concurrent invocations.
Object Replication (Cont..)

✓ There are two approaches to object replication:

• The application is responsible for replication.
• The application needs to handle consistency issues.

• The system (middleware) handles replication.

• Consistency issues are handled by the middleware.

• This simplifies application development but makes object-specific solutions harder.

Object Replication (Cont..)

✓ Approaches to object replication (Cont..)

a) A distributed system for replication-aware distributed objects.
• Replication is managed at the object level using an object-specific protocol.

b) A distributed system (with middleware) responsible for replica management.

• The middleware centrally manages replication with a unified protocol, simplifying application development and maintenance.

Replication as Scaling Technique
✓ Replication and caching can be used as a scaling technique by creating
multiple copies of data or services across different servers or nodes.

✓ Placing data copies close to processes reduces access time and improves
performance.

✓ Replication allows the system to scale by providing the following features:

• Load Balancing: incoming requests are distributed among replicas to optimize resource utilization and minimize response times.

• High Availability: multiple replicas provide redundancy, ensuring continuous service availability even if some nodes fail.

• Geographical Distribution: replicas can be placed in different geographical locations to reduce latency and improve access speeds for users in diverse regions.

• Horizontal Scaling: by adding more nodes and replicating data across them, the system can handle increased load without degrading performance.

Replication as Scaling Technique
✓ Multiple copies improve performance by reducing access latency, but incur higher network overhead to keep the copies consistent.

✓ Ensuring consistency among multiple copies can itself cause scalability problems.

✓ Consistency maintenance is itself an issue:

• What semantics should be provided?

• Tight consistency requires globally synchronized clocks.

✓ The solution is to loosen the consistency requirements.

• A variety of consistency semantics is possible.

✓ Consistency model (consistency semantics):

• All models attempt to return the result of the last write for a read operation.
• But they differ in how the "last" write is determined/defined.

Consistency Models
Data-Centric Consistency Models

✓ Two classes of models:
• Data-centric and client-centric consistency models.

✓ A data-centric consistency model:

• defines the rules for read and write operations on shared data in a distributed data store.

✓ Data-centric organization: a logical data store, physically distributed and replicated across multiple processes.

✓ As we replicate data objects over different clusters, it is challenging to maintain consistency.
• i.e., to guarantee that a read operation returns the most recent value.

✓ So, we need to define different consistency models depending on the use case/application requirements.
Consistency Models (Cont..)
Strict Consistency

✓ Any read on a data item X returns a value corresponding to the result of the
most recent write on X.
• This definition implicitly assumes the existence of absolute global time.

✓ Naturally available in uniprocessor systems, but impossible to implement in distributed systems.

✓ Behavior of two processes operating on the same data item:

a) A strictly consistent store.

b) A store that is not strictly consistent.

Consistency Models (Cont..)
Sequential Consistency

✓ Ensures that the results of execution are the same as if the operations of all
the processors were executed in some sequential order.

✓ A system is sequentially consistent if:


• There is a total order on all operations.

• Operations of each individual process appear in this total order in the sequence specified by
the program.

✓ Behavior of two processes:


a) A sequentially consistent data store.

b) A data store that is not sequentially consistent.


Consistency Models (Cont..)
Causal Consistency
✓ Represents a weakening of sequential consistency (ordering is enforced for causally related events only).

✓ Requires a total order on causally related write operations only.
• Example: P2 reads the value a and then uses it to update b; the two writes are causally related.

✓ Potentially causally related events:
• A read followed by a later write by the same process.

• A write followed by a later read of the same item/location.

• The transitive closure of the above two relations (if write1 → read, and read → write2, then write1 → write2).

✓ Necessary condition for causal consistency:

• Writes that are potentially causally related must be seen by all processes in the same order.

• Concurrent writes (no causality between them) may be seen in a different order on different machines; such a store can be causally consistent without being sequentially consistent.

✓ Behavior of two processes in the diagram:
a) A violation of a causally consistent store.
b) A correct sequence of events in a causally consistent store.
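To make "potentially causally related" concrete, here is a minimal vector-clock check in Python (an illustration of the standard vector-clock rules, not part of the original slides):

def happened_before(v1, v2):
    """True iff the event stamped v1 causally precedes the event stamped v2
    (v1 <= v2 componentwise, and v1 != v2)."""
    return all(a <= b for a, b in zip(v1, v2)) and v1 != v2

def concurrent(v1, v2):
    """Concurrent writes: neither causally precedes the other; a causally
    consistent store may deliver them in different orders at different replicas."""
    return not happened_before(v1, v2) and not happened_before(v2, v1)

# P1 writes a with stamp [1, 0]; P2 reads a, then writes b with stamp [1, 1]:
assert happened_before([1, 0], [1, 1])   # must be seen in this order everywhere
assert concurrent([1, 0], [0, 1])        # may be seen in different orders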
Consistency Models (Cont..)
FIFO Consistency

✓ Necessary condition:
• Writes done by a single process are seen by all other processes in the order in which they were issued,

• but writes from different processes may be seen in a different order by different processes.
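FIFO consistency needs nothing more than a per-writer sequence number; a minimal Python sketch (the class and names are illustrative, not from the slides):

from collections import defaultdict

class FifoReplica:
    """Applies each writer's updates in the order they were issued;
    updates from different writers may interleave arbitrarily."""
    def __init__(self):
        self.next_seq = defaultdict(int)   # writer id -> next expected sequence number (from 0)
        self.pending = defaultdict(dict)   # writer id -> {seq: buffered update}
        self.store = {}

    def receive(self, writer, seq, key, value):
        # Buffer the update, then apply whatever is now deliverable in order.
        self.pending[writer][seq] = (key, value)
        while self.next_seq[writer] in self.pending[writer]:
            k, v = self.pending[writer].pop(self.next_seq[writer])
            self.store[k] = v
            self.next_seq[writer] += 1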
Weak Consistency
✓ Consistency is enforced only at synchronization points: accesses to synchronization variables are sequentially consistent, while ordinary reads and writes between them may be seen in any order.

Consistency Models (Cont..)
Client-Centric Consistency Models
Eventual Consistency

✓ Client-Centric Consistency Models

• Focus on what specific clients need, instead of what must be maintained by the servers.

✓ All replicas of the data will eventually converge to the same state over time.

✓ Write-write conflicts are resolved by designating one write as the "winner",
• or by using application-specific conflict-resolution methods.

✓ Clients may read stale data until updates are fully propagated.

✓ Suitable for large-scale systems where immediate consistency is less critical.
• Allows occasional discrepancies in exchange for improved performance and availability.
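Designating a winner is often done with a last-writer-wins rule; a minimal Python sketch, assuming loosely synchronized clocks (the class and its names are illustrative):

class LwwRegister:
    """Eventually consistent register: on conflict, the write carrying the
    highest (timestamp, replica_id) pair wins, so all replicas converge."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.value = None
        self.stamp = (0.0, "")            # (timestamp, replica id) of the current value

    def write(self, value, timestamp):
        self.merge(value, (timestamp, self.replica_id))

    def merge(self, value, stamp):
        # Called for local writes and when gossiping with other replicas;
        # the merge is commutative, so replicas converge regardless of order.
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp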

Fault Tolerant Services
✓ Fault Tolerance via Replication:
• Replicate data and functionality across multiple replica managers (RMs).

• Ensures service correctness despite up to f process failures.

• Assumes reliable communication and no network partitions.

• RMs behave according to specified semantics when not crashed (e.g., funds in
bank accounts are preserved).

✓ Consistency and Correctness Criteria:


• Linearizability:
• Operations appear to take effect instantaneously, in real-time order.

• Example: an atomic fetch-and-increment instruction in a CPU.

• Sequential Consistency:
• Operations appear in the order issued by each client.

• Does not consider real-time order, only the relative or program order.

• Example: Database transactions maintaining ACID properties.


Fault Tolerant Services (Cont..)
Replication Models
General Idea:
✓ Passive (Primary-Backup) Model:
• One primary RM processes requests and updates backups.

• On primary failure, a backup takes over seamlessly, maintaining continuity.

• Front ends communicate with primary; backups receive updates post-execution.

• FE (front end):
• Provides replication transparency to clients.

✓ Active Replication:
• All RMs process requests identically and independently.

• Front end multicasts requests to all replica managers.

• Ensures fault tolerance, tolerates Byzantine failures by comparing responses.


Fault Tolerant Services (Cont..)
Passive Replication

✓ One RM is distinguished as the primary one.

✓ All front ends communicate with the primary RM:


• which is responsible for executing all requests and then updating the other RMs, known as
backups or slaves.

✓ If the primary RM fails, it is replaced by one of the backups.

Fault Tolerant Services (Cont..)
Passive Replication

✓ The following sequence is followed; if the primary is correct, it guarantees linearizability:
1. Request: The front end issues a request to the primary RM. Each request
contains a unique identifier.

2. Coordination: The primary deals with each request in FIFO order; using the
identifier it filters duplicate requests and resends the previous response.

3. Execution: The primary executes the request and stores the response.

4. Agreement: If the request is an update, the primary will propagate it, together
with the response and the request id to all the backup RMs.
• The backups will send an acknowledgement to the primary.

5. Response: The primary responds to the front end; the front end responds to the
client.
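A minimal sketch of the primary's side of steps 2 to 5 (illustrative Python; view changes, failure handling, and networking are omitted):

class PrimaryRM:
    """Filter duplicates, execute, propagate to the backups, then reply."""
    def __init__(self, backups):
        self.backups = backups     # backup RMs, assumed to acknowledge synchronously
        self.responses = {}        # request id -> cached response
        self.state = {}

    def handle(self, req_id, op, *args):
        if req_id in self.responses:          # coordination: duplicate request,
            return self.responses[req_id]     # so resend the previous response
        result = op(self.state, *args)        # execution
        for backup in self.backups:           # agreement: send update + response;
            backup.apply(req_id, op, args)    # returning here models the backup's ack
        self.responses[req_id] = result
        return result                         # response, relayed by the front end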

Fault Tolerant Services (Cont..)
Passive Replication (Cont..)

✓ Failure of the Primary:


• Backup RM takes over if primary fails.

• Linearizability is maintained by:

• Replacing the primary with a unique backup.

• Ensuring that all surviving RMs agree on which operations had been performed at replacement time.

• RMs are organized as a group and use view-synchronous communication:

• Provides a consistent group view, excluding the failed primary.

• A new primary is selected from the group.

• View-synchronous communication semantics:

• either all the backups or none of them will deliver any given update before delivering the new view.
Fault Tolerant Services (Cont..)
Passive Replication (Cont..)

✓ Passive Replication Issues:


• High overhead from view-synchronous communication.

• Passive replication cannot survive Byzantine failures; it requires f + 1 RMs to survive up to f crashes.

• The front end needs functionality to look up the new primary if the current primary doesn't respond.

• Allowing read requests at the backup RMs reduces congestion at the primary, but results in sequential consistency instead of linearizability.

• Example:
• Sun Network Information Service (NIS) uses a form of passive replication, with updates propagated from the primary RM to the others on a one-to-one basis; it does not guarantee sequential consistency.
Fault Tolerant Services (Cont..)
Active Replication

✓ RMs are state machines that play equivalent roles and are organized as a group.

✓ Front ends multicast their requests to the group, and each RM processes the request independently but identically, and replies.

✓ Active replication can tolerate Byzantine failures because the front end can collect and compare the responses.

Fault Tolerant Services (Cont..)
Active Replication

✓ 1. Request: The front end multicasts the request to all RMs, using totally ordered,
reliable multicast, after attaching a unique identifier to it.

✓ 2. Coordination: The request is delivered to all correct RMs in the same order (by
properties of the group communication used).

✓ 3. Execution: Every correct RM executes the request, producing the same result.
This is guaranteed since the RMs are all state machines and they each handle
requests in the same order. Each response contains the unique identifier.

✓ 4. Agreement: No agreement phase is needed, because of the multicast delivery semantics.

✓ 5. Response: Each RM sends its response to the front end.

• Various policies can be applied at this stage.

• For instance, the front end might choose to relay the first response to the client, ignoring any subsequent responses carrying the same id; a sketch of such policies follows below.
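Two common response policies for step 5, as a small Python sketch (the function names are mine, not from the slides):

from collections import Counter

def first_response(responses):
    """Crash-failure model: all correct RMs compute the same result,
    so the first reply received is as good as any."""
    return next(iter(responses))

def majority_vote(responses, f):
    """Byzantine model: f + 1 matching replies must include at least
    one from a correct replica."""
    value, count = Counter(responses).most_common(1)[0]
    if count >= f + 1:
        return value
    raise RuntimeError("no value backed by f + 1 matching replies")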
Fault Tolerant Services (Cont..)
Active Replication (Cont..)

✓ Active Replication Issues:

• Depends on totally ordered, reliable multicast to achieve sequential consistency.

• Ensures all correct RMs process the same requests in the same order.

• Requests are processed in happened-before order, provided clients don't communicate with one another while waiting for responses.

• Multi-threaded clients need causally and totally ordered multicast to maintain happened-before order.

• Does not achieve linearizability:

• the total order in which requests are processed may differ from their real-time submission order.

Highly Available Services

✓ Ensures that clients can access the service as much as possible.

✓ Clients can access data from alternative servers if the default server fails.

✓ Helps maintain low response times.

✓ Prioritizes availability over consistency; responses may be sent before the replicas reach collective agreement.
• Availability = 1 − Probability(all replicas have failed)

✓ Updates may propagate lazily rather than eagerly.

✓ Example:
• The Gossip Architecture

Highly Available Services (Cont..)
The Gossip Architecture

✓ A framework for highly available services that replicate data near groups of clients.

✓ RMs periodically exchange "gossip" messages to share the updates they have received from clients.

✓ Offers two operations: queries (read-only) and updates (modify state).

✓ The front end can send queries or updates to any RM, based on availability and response time.

✓ Guarantees relaxed consistency even if RMs temporarily can't communicate due to failures.

Highly Available Services (Cont..)
The Gossip Architecture (Cont..)

✓ Gossip Service Front End

• Converts client operations into gossip queries or updates via an API.

• RM updates are exchanged occasionally.

• Maintains a vector timestamp reflecting the latest data it has accessed.

• Merges vector timestamps when clients communicate directly (see the sketch below).
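The timestamp merge mentioned above is an elementwise maximum; a one-function Python sketch:

def merge_timestamps(v1, v2):
    """Elementwise max: the result reflects every update seen through either timestamp."""
    return [max(a, b) for a, b in zip(v1, v2)]

# A front end holding [2, 0, 1] receives data stamped [1, 3, 1]:
print(merge_timestamps([2, 0, 1], [1, 3, 1]))   # [2, 3, 1]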

✓ Gossip System Guarantees


• Consistent Service: Each client gets data reflecting all updates they've seen,
even when using different RMs.

• Relaxed Consistency: All RMs eventually get all updates, with ordering
guarantees making them sufficiently similar, though clients might see stale data.

• Causal Ordering: Requests are handled in the order issued, with stronger
ordering possible at higher costs.
Highly Available Services (Cont..)
The Gossip Architecture (Cont..)

✓ Processing Requests

• Request: The front end sends the request to an RM. Queries are synchronous;
updates are asynchronous.

• Update Response: RM acknowledges the update once received.

• Coordination: The RM processes a request only when its ordering constraints are met, exchanging gossip messages if needed.

• Execution: RM executes the request.

• Query Response: RM responds if the request is a query.

• Agreement: RMs coordinate only via gossip messages.

Transactions with Replicated Data
One copy serializability

✓ Replicated transactional service

• Each RM provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data.

✓ Transactions by various clients on replicated data appear as if they were performed sequentially on a single copy.

✓ Additional complications: failures and network partitions.

• Failures should be serialized with respect to transactions, i.e., any failure observed by a transaction must appear to have happened before the transaction started.

Transactions with Replicated Data
Replication Schemes

✓ Read One/Write All:


• Implements write locks across all replicas and read locks at one replica to maintain
consistency.

• It utilizes two-phase commit for atomicity across replicas.

• It can't handle network partitions.


• Network partitions occur when subsets of replica managers lose connectivity, preventing
communication between these subsets.

✓ Schemes that can handle network partitions:

• Network partitions separate the replica managers into two or more subgroups.

• Members of a subgroup can communicate with each other, but members of different subgroups cannot.

• Optimistic approach: available copies with validation.

• Pessimistic approach: quorum consensus.


Transactions with Replicated Data
Replication Schemes

✓ Schemes that can handle network partitions:


1. Available copies with validation:
• Allows reads from any available replica.

• Requires writes on all available replicas for consistency.

• Validation is done with version vectors and precedence graphs (each partition maintains a log of the data items affected by the read and write operations of its transactions).

• If there are cycles in the graph, validation fails.

2. Quorum consensus:
• Involves a subgroup of replica managers in decision-making.

• Requires a quorum of votes (e.g., a majority) to perform read and write operations.

• Each replica maintains a version number for detecting up-to-date replicas.

• Suitable for environments where high availability and fault tolerance are critical.
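Quorum consensus works because every read quorum must overlap every write quorum (R + W > N); a small Python sketch using version numbers (the data and names are illustrative):

def read_latest(replica_quorum):
    """Given replies from any R replicas, R + W > N guarantees the read quorum
    overlaps every write quorum, so the highest version number is current."""
    return max(replica_quorum, key=lambda rep: rep["version"])

# N = 5, W = 3: the last write reached three replicas (version 5).
replicas = [{"version": 3, "value": "a"},
            {"version": 5, "value": "b"},
            {"version": 5, "value": "b"},
            {"version": 3, "value": "a"},
            {"version": 5, "value": "b"}]
# Any read quorum of R = 3 must include at least one version-5 replica.
print(read_latest(replicas[:3])["value"])   # 'b'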

Transactions with Replicated Data
Replication Schemes
✓ Schemes that can handle network partitions (Cont..):

3. Virtual Partition:
• Combines available copies and quorum consensus to optimize read performance.

• A virtual partition is a set of replica managers that have both a read quorum and a write quorum.

• Ensures transactional integrity by dynamically adjusting partitions.

• If a virtual partition can be formed, the available copies scheme is used within it.

• Adapts to network changes by reconfiguring virtual partitions as needed.

• Balances performance and consistency in dynamic distributed systems.
