The document discusses various strategies for maintaining consistency in distributed systems, focusing on techniques such as 2-phase commit (2PC), transactional outbox patterns, and saga patterns for managing transactions across multiple services. It highlights the challenges associated with these approaches, including potential failures and the need for idempotency in operations. Additionally, it provides practical scenarios and examples related to message queuing, event handling, and state management in the context of microservices architecture.
Discusses maintaining consistency in distributed systems and introduces the CAP theorem, highlighting the tradeoffs between consistency and availability.
Explains the 2-Phase-Commit protocol through its voting and commit phases, including its limitations such as scalability issues and resource requirements.
Describes a scenario with user and message APIs, detailing consistency strategies including idempotency and handling failures in service calls.
Outlines methods for updating databases and sending messages, emphasizing the importance of idempotent listeners to handle message retries effectively.
Introduces the transactional outbox pattern for updating databases and sending messages atomically, including handling delays and potential duplicates.
Describes the Saga pattern for managing transactions across multiple services, detailing local transaction sequences and compensating actions.
Discusses synchronous and asynchronous saga patterns, their capabilities, and the use of routing slips to manage compensation actions on errors.
Explains state machines and their reactions to external inputs using a UML state diagram example for better conceptual understanding.
Presents an example exercise involving a food delivery app, emphasizing the challenges of implementing sagas and consistency in distributed operations.
Concludes the presentation with a thank you note, summarizing the key topics discussed throughout the training.
119 VictorRentea.ro
a trainingby
2-Phase-Commit (2PC)
1) Voting Phase:
- All participants notify the coordinator if their local transaction would commit OK
2) Commit Phase:
- Coordinator decides to commit if all voted "Yes", or to roll back otherwise; notifies all participants
§Downsides
- Can still fail, requiring recovery steps
- Involves locking, doesn't scale well
- Not supported by some resources: requires XA drivers and a JTA coordinator
- Requires direct connection to remote DB
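The two phases can be sketched in plain Java. This is a minimal in-memory illustration, not an XA/JTA implementation: `Participant`, `InMemoryParticipant` and `run` are hypothetical names, and the recovery steps a real coordinator needs after a crash are left out.

```java
import java.util.List;

class TwoPhaseCommitDemo {
    /** Hypothetical participant contract (not the XA API). */
    interface Participant {
        boolean prepare();   // vote: true = "Yes, my local transaction would commit OK"
        void commit();
        void rollback();
    }

    /** Returns true if the global transaction committed, false if it was rolled back. */
    static boolean run(List<? extends Participant> participants) {
        // 1) Voting phase: ask each participant, stop at the first "No"
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allYes = false;
                break;
            }
        }
        // 2) Commit phase: notify all participants of the decision
        for (Participant p : participants) {
            if (allYes) p.commit(); else p.rollback();
        }
        return allYes;
    }

    /** In-memory stand-in for a resource taking part in the transaction. */
    static class InMemoryParticipant implements Participant {
        final boolean canCommit;
        String state = "ACTIVE";

        InMemoryParticipant(boolean canCommit) { this.canCommit = canCommit; }

        public boolean prepare() {
            if (canCommit) state = "PREPARED"; // would hold locks until the decision arrives
            return canCommit;
        }
        public void commit() { state = "COMMITTED"; }
        public void rollback() { state = "ROLLED_BACK"; }
    }
}
```

A single "No" vote rolls back every participant, including those that already voted "Yes" and were holding locks while waiting.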
Scenario #1
@Entity // in user-api
public class User {
    @Id @GeneratedValue
    private long id;
    private String name;
    private LocalDateTime lastMessageTime;
}
@Entity // in message-api
public class Message {
    @Id @GeneratedValue
    private long id;
    private long userId;
    private String contents;
    private LocalDateTime messageTimestamp;
}
Scenario #1 - Consistency Strategies
POST message-api/messages ...
§Sync call to sync state
- PUT user-api/users/{uid}/lastMessageTimestamp
- Fragile: What if user-api is down? Retry? For how long?
§Async: send a message via a durable queue (e.g. Rabbit)
- e.g. MessageSentEvent
- What if the MQ broker is down?
§Avoid synchronization: redesign the service boundaries
- GET message-api/lastMessageTimestamp?user={uid}
Retry
A call failed or timed out.
Let me try again...
Is the operation IDEMPOTENT?
Idempotent Operation
= can be applied many times without changing the result. Examples:
§Get Product by id via GET
- ✅ YES: the call does not change any data on the server
§Cancel Payment by id via DELETE
- ✅ YES: canceling it again has no additional effect
§Update Product price by id via PUT
- ✅ YES: we would just set the same price again
§Place Order { items: [..] } via POST or MQ
- ❌ NO if retry would create a second order
- ✅ YES if we deduplicate via lastPlacedOrders = Map<custId, List<orderJsonHash>> (TTL 1h)
§Place Order { items: [..], clickId/messageID: UUID } via POST or MQ
- ✅ YES if we deduplicate via Set<lastSeenClickIds>
§Place Order { id: UUID, items: [..] } via PUT or MQ = Client-generated ID 🤔
- ✅ YES: a duplicate would cause a PK/UK violation
- In DB: alternate UK, next to numeric PK
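The client-generated ID variant can be sketched as follows. This is only an illustration: `OrderStore` and `placeOrder` are hypothetical names, and a `ConcurrentHashMap` keyed by the UUID stands in for the unique key in the DB.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class OrderStore {
    // Keyed by the client-generated id; plays the role of the unique key (UK) in the DB
    private final Map<UUID, List<String>> ordersById = new ConcurrentHashMap<>();

    /** Returns true if the order was created, false if this id was already stored (i.e. a retry). */
    boolean placeOrder(UUID id, List<String> items) {
        // putIfAbsent is atomic: two concurrent retries cannot both create the order
        return ordersById.putIfAbsent(id, List.copyOf(items)) == null;
    }

    int count() {
        return ordersById.size();
    }
}
```

Because the caller picks the ID before the first attempt, any number of retries leaves exactly one order in the store.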
Update DB and send a Message
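Three ways to order the two operations; 💥 marks where a failure leaves DB and MQ inconsistent.
// 1) Send first, then save: if the update or commit fails, the message was already sent.
void f() {
    mq.send(..);
    repo.save(..);
}
mq.send(message);
db.update(data);💥
db.commit;💥
// 2) Send inside the transaction: if the commit fails, the message was already sent.
@Transactional
void f() {
    repo.saveAndFlush(..);
    mq.send(..);
}
db.update(data);
mq.send(message);
db.commit;💥
// 3) Send after commit: if the send fails, the DB change is committed but the message is lost.
@TransactionalEventListener(phase = AFTER_COMMIT)
void afterCommit(..) {
    mq.send(..);
}
db.update(data);
db.commit;
mq.send(message);💥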
Receive a Message and Update DB
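If ack is not sent, MQ would retry the message
→ Listeners should be idempotent, e.g. by recording processed IDs in a SEEN_MESSAGES_IDS table
// Update, then ack: if the ack fails, the message is redelivered and the update runs again.
db.update(data);
mq.ack(message);💥
// Ack, then update: if the update fails, the message is lost.
mq.ack(message);
db.update(data);💥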
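A small simulation of the "update, then ack fails" case shows why the listener has to tolerate redelivery. Everything here is a hypothetical in-memory stand-in: a `Set` plays the SEEN_MESSAGES_IDS table, an `int` plays the DB row, and a `Deque` plays the broker queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class RedeliveryDemo {
    static class Message {
        final String id;
        final int amount;
        Message(String id, int amount) { this.id = id; this.amount = amount; }
    }

    static class Listener {
        final Set<String> seenMessageIds = new HashSet<>(); // stands in for SEEN_MESSAGES_IDS
        int balance = 0;                                     // stands in for the data updated in the DB

        void onMessage(Message m) {
            // In a real listener the duplicate check and the update would share one DB transaction
            if (!seenMessageIds.add(m.id)) return; // duplicate delivery: ignore
            balance += m.amount;
        }
    }

    /** Delivers the message but loses the first ack, so the broker redelivers it. Returns the delivery count. */
    static int deliverWithLostAck(Listener listener, Message m) {
        Deque<Message> queue = new ArrayDeque<>();
        queue.add(m);
        boolean ackLost = true; // simulates a crash between db.update and mq.ack on the first attempt
        int deliveries = 0;
        while (!queue.isEmpty()) {
            listener.onMessage(queue.peek());
            deliveries++;
            if (ackLost) {
                ackLost = false;
                continue; // no ack reached the broker: the message stays queued
            }
            queue.poll(); // ack received: the broker removes the message
        }
        return deliveries;
    }
}
```

The message is delivered twice, yet the balance changes only once because the second delivery is recognized by its ID.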
Transactional Outbox Table
Problem: update DB and send message atomically.
2PC is not an option.
Solution:
§Instead of sending the message, INSERT it in 'MESSAGES_TO_SEND' table
§A scheduler polls this table, sends messages in order and removes them
§A form of 'persistent retry'
§Can raise alarms if message is delayed too much
§:/ Could send duplicate messages
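The outbox idea can be sketched in memory. This is an assumption-laden illustration: `OutboxDemo`, `saveAndEnqueue` and `pollAndSend` are hypothetical names, two lists stand in for the business table and the MESSAGES_TO_SEND table, and a `Consumer<String>` stands in for the broker client.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class OutboxDemo {
    final List<String> dataTable = new ArrayList<>();      // stands in for the business table
    final List<String> messagesToSend = new ArrayList<>(); // stands in for MESSAGES_TO_SEND

    /** One local "transaction": in a real DB both inserts commit or roll back together. */
    synchronized void saveAndEnqueue(String data, String message) {
        dataTable.add(data);
        messagesToSend.add(message);
    }

    /** Called by a scheduler: sends pending messages in insertion order. Returns how many were sent. */
    synchronized int pollAndSend(Consumer<String> broker) {
        int sent = 0;
        Iterator<String> it = messagesToSend.iterator();
        while (it.hasNext()) {
            String message = it.next();
            try {
                broker.accept(message);
            } catch (RuntimeException brokerDown) {
                break; // keep this and all later messages for the next poll ("persistent retry")
            }
            // A crash between send and remove would resend the message on the next poll: duplicates are possible
            it.remove();
            sent++;
        }
        return sent;
    }
}
```

Stopping at the first failed send preserves ordering, and since a message can be sent twice, the receiving side still needs idempotent listeners.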
Transactional Outbox Table
Change Data Capture (CDC)
http://debezium.io
tails the transaction log and publishes every change to a Kafka topic
Saga Pattern
Problem: Run a business transaction across multiple services (separate DB)
Solution: Saga Pattern
§Implement the business transaction as a sequence of local transactions
§Each local transaction updates the DB and sends a message (command or event) to trigger the next local transaction to take place
§If a local transaction fails, the saga executes compensating transactions to undo the previously committed transactions
§Compensating actions must be retry-able
§Use reservation (timed) for non-reversible steps + confirmation/cancel
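The sequence-plus-compensation idea can be sketched as a tiny in-process runner. This is a simplification: `SagaDemo`, `Step` and `run` are hypothetical names, the steps are plain `Runnable`s rather than local DB transactions connected by messages, and undoing in reverse order is a common choice rather than something the pattern mandates.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class SagaDemo {
    /** One local transaction plus the compensating transaction that undoes it. */
    static class Step {
        final Runnable action;
        final Runnable compensation;

        Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    /** Runs the steps in order. On failure, compensates the completed steps and returns false. */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                // Undo in reverse order; a real saga would retry each compensation until it succeeds
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

The failed step itself is not compensated, since its local transaction never committed.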
Sync Saga
§Choreographed
- Each party commits, then calls the next step.
- On error, each party must call undo on all previously completed steps.
- ++COUPLING
§Orchestrated
- Orchestrator calls all parties synchronously.
- On error: orchestrator calls compensating 'undo' endpoints for previously completed steps.
NOT SCALABLE, FRAGILE
[Diagram legend: Sync RPC, Transaction]
Async Saga
§Orchestrated
- Orchestrator sends messages to parties.
- On error: it sends compensating command messages to previously completed steps
§Choreographed
- Each service commits and sends a message to the next service
- On error, a party:
  a) publishes a failure event, listened to by all previous parties (coupling++)
  b) sends compensating commands to all parties stamped on the message (Routing Slip)
  c) notifies a Saga Execution Coordinator
[Diagram legend: Async Message, Transaction]
Choreographed Compensations
§Routing Slip Pattern = Accumulate all previous "undo" actions
- Each service appends its own "undo" information to the message sent forward
- On any error on received message => call/message all UNDO actions
§Error Event upstream
- All previous steps undo on LegalCheckFailedEvent{orderId}
[Diagram: Stock, Payment, Legal; undo actions: 1) cancel stock reservation, 2) undo payment]
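A routing slip can be sketched as an immutable message that grows as it travels through Stock, Payment and Legal. The class and method names are hypothetical, and the undo actions are plain strings here, where a real system would carry compensating commands or endpoints.

```java
import java.util.ArrayList;
import java.util.List;

class RoutingSlipDemo {
    /** Message passed along the chain; carries the "undo" info of every step completed so far. */
    static class OrderMessage {
        final long orderId;
        final List<String> undoActions;

        OrderMessage(long orderId, List<String> undoActions) {
            this.orderId = orderId;
            this.undoActions = List.copyOf(undoActions);
        }

        /** Returns the message to send forward, with this step's undo action appended. */
        OrderMessage withUndo(String undoAction) {
            List<String> extended = new ArrayList<>(undoActions);
            extended.add(undoAction);
            return new OrderMessage(orderId, extended);
        }
    }

    static OrderMessage stock(OrderMessage in) {
        return in.withUndo("cancel stock reservation");
    }

    static OrderMessage payment(OrderMessage in) {
        return in.withUndo("undo payment");
    }

    /** The legal check fails: returns every undo action this service must now call/message. */
    static List<String> legalCheckFails(OrderMessage in) {
        return in.undoActions;
    }
}
```

The failing service needs no knowledge of which services ran before it: everything required to compensate is stamped on the message it received.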
UML State Diagram:
A state machine reacts to various external signals (inputs) in different ways (outputs), depending on its current state.
[Diagram: an initial state marker and two states, "locked" and "unlocked"; external signals: "insert coin", "push in stack"; side-effects/actions: "/capture coin", "/release coin", "/🤨"]
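A state machine of this kind can be sketched with two enums. The transition table below is an assumption: it reads the diagram as the classic coin-operated turnstile, starting in the locked state, and the names `Turnstile`, `INSERT_COIN` and `PUSH` are hypothetical.

```java
class Turnstile {
    enum State { LOCKED, UNLOCKED }
    enum Signal { INSERT_COIN, PUSH }

    private State state = State.LOCKED; // assumed initial state

    State state() {
        return state;
    }

    /** Reacts to an external signal; returns the side-effect (action) performed, or "" if none. */
    String handle(Signal signal) {
        if (state == State.LOCKED) {
            if (signal == Signal.INSERT_COIN) {
                state = State.UNLOCKED;
                return "capture coin";
            }
            return ""; // push while locked: nothing happens, stays locked
        }
        // state == UNLOCKED
        if (signal == Signal.PUSH) {
            state = State.LOCKED; // one person passes, then the turnstile locks again
            return "";
        }
        return "release coin"; // extra coin while already unlocked: give it back
    }
}
```

The same signal produces a different output depending on the current state, which is the defining property of a state machine.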
Exercise: Food Delivery App
Feed hungry people with food from restaurants delivered by couriers.
Assume customers, restaurants and couriers have an app installed.
High level flow:
1. hungry customer orders Food FF from Restaurant RR
2. accept card payment via external payment gateway
3. tell RR to cook FF
4. find a courier CC
5. CC picks FF from RR
6. CC delivers food to customer
7. charge a fee to RR for the service
Sagas are Hard
§Keep hard consistency constraints within the boundary of one service.
- (that is, don't distribute)
§Manual intervention could be cheaper (e.g. by 2nd level support)
- e.g. log.error("[CALL-SUPPORT] Out of stock for order {}", ...);
- Implement a Saga to recover from frequent or expensive failures
§Use a Saga framework (or learn from it)
- Orchestrated: Camunda, Apache Camel
- Choreographed: Eventuate, Seata, Axon Saga