The Saga Pattern: Managing Distributed Transactions in Microservices
In a microservices architecture, one of the hardest problems is keeping data consistent across services. A single ACID transaction cannot span databases owned by different services, so a business operation that touches several services needs another mechanism. The Saga pattern addresses this by splitting the operation into a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.
What is the Saga Pattern?
A saga is a sequence of local transactions where each transaction updates data within a single service. If one transaction fails, the saga executes compensating transactions to undo the changes made by preceding transactions.
Unlike distributed transactions with two-phase commit (2PC), sagas do not hold locks across services. They accept eventual consistency and rely on compensation to restore a consistent state after a failure, which avoids the availability and performance costs of distributed locking.
Two Implementation Approaches
1. Choreography-Based Saga
Services publish domain events that trigger local transactions in other services. Each service listens for events, performs its local transaction, and publishes new events.
Example: Order processing system
// Go - Order Service publishes event
// Assumes imports of "context", "errors", "time", and a decimal package
// (for example github.com/shopspring/decimal). errors.Join needs Go 1.20+.
type OrderCreatedEvent struct {
    OrderID   string
    UserID    string
    Amount    decimal.Decimal
    Timestamp time.Time
}

func (s *OrderService) CreateOrder(ctx context.Context, req CreateOrderRequest) error {
    // Local transaction
    order, err := s.repo.CreateOrder(ctx, req)
    if err != nil {
        return err
    }

    // Publish event to trigger next step
    event := OrderCreatedEvent{
        OrderID:   order.ID,
        UserID:    req.UserID,
        Amount:    order.Total,
        Timestamp: time.Now(),
    }
    return s.eventBus.Publish(ctx, "order.created", event)
}

// Payment Service listens and processes
func (s *PaymentService) HandleOrderCreated(ctx context.Context, event OrderCreatedEvent) error {
    // Attempt payment
    payment, err := s.processPayment(ctx, event.UserID, event.Amount)
    if err != nil {
        // Publish failure event to trigger compensation
        pubErr := s.eventBus.Publish(ctx, "payment.failed", PaymentFailedEvent{
            OrderID: event.OrderID,
            Reason:  err.Error(),
        })
        // Report a failed publish as well; otherwise the saga would stall silently
        return errors.Join(err, pubErr)
    }

    // Publish success to continue saga
    return s.eventBus.Publish(ctx, "payment.succeeded", PaymentSucceededEvent{
        OrderID:   event.OrderID,
        PaymentID: payment.ID,
    })
}
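The Go handlers above publish payment.failed to trigger compensation, but the compensating side is not shown. The following is a minimal Python sketch of that half of the choreography, under stated assumptions: InMemoryEventBus is a toy synchronous stand-in for a real message broker, and the order statuses, the dict-shaped events, and the spending limit used to force a failure are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[dict], None]


class InMemoryEventBus:
    """Toy synchronous bus (assumption); a real system would use a broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in list(self._handlers[topic]):
            handler(event)


class OrderService:
    def __init__(self, bus: InMemoryEventBus) -> None:
        self.bus = bus
        self.orders: Dict[str, str] = {}  # order_id -> status
        bus.subscribe("payment.succeeded", self.on_payment_succeeded)
        bus.subscribe("payment.failed", self.on_payment_failed)

    def create_order(self, order_id: str, amount: int) -> None:
        # Local transaction, then the event that starts the saga
        self.orders[order_id] = "pending"
        self.bus.publish("order.created", {"order_id": order_id, "amount": amount})

    def on_payment_succeeded(self, event: dict) -> None:
        self.orders[event["order_id"]] = "confirmed"

    def on_payment_failed(self, event: dict) -> None:
        # Compensating transaction: only a pending order is cancelled,
        # so a duplicate failure event has no further effect
        if self.orders.get(event["order_id"]) == "pending":
            self.orders[event["order_id"]] = "cancelled"


class PaymentService:
    def __init__(self, bus: InMemoryEventBus, limit: int) -> None:
        self.bus = bus
        self.limit = limit  # hypothetical rule used to force a failure
        bus.subscribe("order.created", self.on_order_created)

    def on_order_created(self, event: dict) -> None:
        if event["amount"] > self.limit:
            self.bus.publish(
                "payment.failed",
                {"order_id": event["order_id"], "reason": "limit exceeded"},
            )
        else:
            self.bus.publish("payment.succeeded", {"order_id": event["order_id"]})


bus = InMemoryEventBus()
orders = OrderService(bus)
payments = PaymentService(bus, limit=100)
orders.create_order("o-1", 50)   # payment succeeds
orders.create_order("o-2", 500)  # payment fails, order is compensated
print(orders.orders)
```

No component in this sketch knows the whole flow: the order service reacts only to payment events, which is what keeps choreography loosely coupled and also what makes the end-to-end flow harder to trace.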
2. Orchestration-Based Saga
A central orchestrator tells saga participants what local transactions to execute. The orchestrator maintains the saga state and handles compensation logic.
# Python - Saga Orchestrator
from enum import Enum
from typing import List, Callable
import asyncio


class SagaStep:
    def __init__(self,
                 action: Callable,
                 compensation: Callable,
                 name: str):
        self.action = action
        self.compensation = compensation
        self.name = name


class SagaStatus(Enum):
    PENDING = "pending"
    EXECUTING = "executing"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"


class SagaOrchestrator:
    def __init__(self, saga_id: str, steps: List[SagaStep]):
        self.saga_id = saga_id
        self.steps = steps
        self.completed_steps = []
        self.status = SagaStatus.PENDING

    async def execute(self) -> bool:
        """Execute saga steps in sequence"""
        self.status = SagaStatus.EXECUTING
        try:
            for step in self.steps:
                print(f"Executing step: {step.name}")
                await step.action()
                self.completed_steps.append(step)
            self.status = SagaStatus.COMPLETED
            return True
        except Exception as e:
            print(f"Saga failed at {step.name}: {e}")
            await self.compensate()
            return False

    async def compensate(self):
        """Execute compensation in reverse order"""
        self.status = SagaStatus.COMPENSATING
        for step in reversed(self.completed_steps):
            try:
                print(f"Compensating step: {step.name}")
                await step.compensation()
            except Exception as e:
                print(f"Compensation failed for {step.name}: {e}")
                # Log and continue - compensation must be idempotent
        self.status = SagaStatus.FAILED


# Usage: Order Processing Saga
class OrderProcessingSaga:
    def __init__(self, order_service, payment_service, inventory_service):
        self.order_service = order_service
        self.payment_service = payment_service
        self.inventory_service = inventory_service

    async def execute_order(self, order_data):
        steps = [
            SagaStep(
                action=lambda: self.order_service.create_order(order_data),
                compensation=lambda: self.order_service.cancel_order(order_data['id']),
                name="CreateOrder"
            ),
            SagaStep(
                action=lambda: self.inventory_service.reserve_items(order_data['items']),
                compensation=lambda: self.inventory_service.release_items(order_data['items']),
                name="ReserveInventory"
            ),
            SagaStep(
                action=lambda: self.payment_service.charge_payment(order_data['payment']),
                compensation=lambda: self.payment_service.refund_payment(order_data['payment']),
                name="ChargePayment"
            ),
        ]
        orchestrator = SagaOrchestrator(
            saga_id=order_data['id'],
            steps=steps
        )
        return await orchestrator.execute()
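To see the reverse-order compensation in action, the following self-contained sketch runs a condensed version of the same idea against stub services. It is an illustration under assumptions, not the orchestrator above: run_saga is a compact stand-in for it, and StubServices, its card_declined flag, and the log format are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], Awaitable[None]], Callable[[], Awaitable[None]]]


async def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run actions in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            await action()
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            for done_name, _, compensation in reversed(completed):
                await compensation()
                log.append(f"compensated:{done_name}")
            return False
        completed.append(step)
        log.append(f"done:{name}")
    return True


class StubServices:
    """Hypothetical in-memory stand-ins for the order, inventory, and payment services."""

    def __init__(self, card_declined: bool) -> None:
        self.card_declined = card_declined
        self.order_status = "none"
        self.reserved = 0

    async def create_order(self) -> None:
        self.order_status = "created"

    async def cancel_order(self) -> None:
        self.order_status = "cancelled"

    async def reserve_items(self) -> None:
        self.reserved += 2

    async def release_items(self) -> None:
        self.reserved -= 2

    async def charge_payment(self) -> None:
        if self.card_declined:
            raise RuntimeError("card declined")

    async def refund_payment(self) -> None:
        pass  # nothing was charged in this sketch


svc = StubServices(card_declined=True)
steps: List[Step] = [
    ("CreateOrder", svc.create_order, svc.cancel_order),
    ("ReserveInventory", svc.reserve_items, svc.release_items),
    ("ChargePayment", svc.charge_payment, svc.refund_payment),
]
log: List[str] = []
ok = asyncio.run(run_saga(steps, log))
print(ok, log)
```

The failing step itself is not compensated because it never completed; only CreateOrder and ReserveInventory are undone, and in the opposite order from which they ran.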
When to Use the Saga Pattern
Use Sagas When:
- You have microservices that need to maintain consistency across service boundaries
- ACID transactions across services are not feasible
- You can model your business process as a sequence of steps
- You can define compensating actions for each step
- Eventual consistency is acceptable for your use case
Avoid Sagas When:
- You need immediate consistency (consider redrawing service boundaries so the data lives in one service)
- Compensating actions are not possible or too complex
- Your transaction involves read-after-write dependencies that create circular compensation logic
- A single service with local transactions can handle your requirements
Trade-offs and Considerations
Advantages
- No distributed locks: Each service uses only local transactions
- High availability: No participant blocks waiting on locks held elsewhere, and services remain loosely coupled
- Scalability: No coordination overhead of 2PC
- Flexibility: Can integrate with external systems that don’t support 2PC
Challenges
- Complexity: More complex than local ACID transactions
- Eventual consistency: Temporary inconsistency during saga execution
- Compensating transactions: Must be carefully designed and idempotent
- Lack of isolation: Other transactions can see partial saga results
- Debugging: Distributed flow is harder to trace and debug
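Since compensations may be retried or delivered more than once, one common way to make them idempotent is to key each one on a unique identifier and treat repeats as no-ops. The sketch below assumes a hypothetical in-memory PaymentLedger and a key format of the form saga ID plus step name; a real service would enforce the same rule with a unique constraint inside its local transaction.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PaymentLedger:
    """Hypothetical in-memory payment store used only for illustration."""

    balance: Dict[str, int] = field(default_factory=dict)   # user_id -> cents
    refunded: Dict[str, int] = field(default_factory=dict)  # idempotency key -> cents

    def refund(self, idempotency_key: str, user_id: str, amount_cents: int) -> bool:
        """Apply the refund at most once per key; return True if applied now."""
        if idempotency_key in self.refunded:
            return False  # retry or duplicate delivery: do nothing
        self.balance[user_id] = self.balance.get(user_id, 0) + amount_cents
        self.refunded[idempotency_key] = amount_cents
        return True


ledger = PaymentLedger(balance={"u-1": 0})
first = ledger.refund("saga-42:refund", "u-1", 1500)
second = ledger.refund("saga-42:refund", "u-1", 1500)  # retried compensation
print(first, second, ledger.balance["u-1"])
```

The second call changes nothing, so an orchestrator can safely retry a compensation whose outcome it is unsure about.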
Frontend Considerations (ReactJS)
When building UIs for saga-driven backends, manage user expectations around eventual consistency:
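// React - Handling Saga-based Operations
// Spinner, SuccessMessage, and ErrorMessage are assumed to be defined elsewhere
import { useState, useEffect, useRef } from 'react';

const POLL_INTERVAL_MS = 2000;
const MAX_POLLS = 30; // give up after about a minute

function OrderSubmission({ orderData }) {
  const [sagaStatus, setSagaStatus] = useState('pending');
  const [error, setError] = useState(null);
  const cancelled = useRef(false);

  // Stop polling once the component unmounts
  useEffect(() => {
    cancelled.current = false;
    return () => { cancelled.current = true; };
  }, []);

  const submitOrder = async () => {
    try {
      setError(null);
      setSagaStatus('processing');
      // Submit order - saga begins
      const response = await fetch('/api/orders', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(orderData),
      });
      if (!response.ok) throw new Error(`Order request failed: ${response.status}`);
      const { sagaId } = await response.json();
      // Poll for saga completion
      await pollSagaStatus(sagaId);
    } catch (err) {
      setError(err.message);
      setSagaStatus('failed');
    }
  };

  const pollSagaStatus = async (sagaId) => {
    // Poll every 2 seconds until the saga finishes or the attempts run out
    for (let attempt = 0; attempt < MAX_POLLS && !cancelled.current; attempt++) {
      const response = await fetch(`/api/sagas/${sagaId}/status`);
      if (!response.ok) throw new Error(`Status check failed: ${response.status}`);
      const { status } = await response.json();
      if (status === 'completed' || status === 'failed') {
        if (!cancelled.current) setSagaStatus(status);
        return;
      }
      await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
    }
    if (!cancelled.current) throw new Error('Timed out waiting for the order to finish');
  };

  return (
    <div>
      {sagaStatus === 'pending' && (
        <button onClick={submitOrder}>Place order</button>
      )}
      {sagaStatus === 'processing' && (
        <div>
          <Spinner />
          <p>Processing your order...</p>
          <p className="text-sm">This may take a few moments</p>
        </div>
      )}
      {sagaStatus === 'completed' && (
        <SuccessMessage message="Order confirmed!" />
      )}
      {sagaStatus === 'failed' && (
        <ErrorMessage message={error ?? 'Order failed. No charges were made.'} />
      )}
    </div>
  );
}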
Best Practices
- Design idempotent operations: Both actions and compensations must be safely retryable
- Use semantic locks: Prevent concurrent sagas from conflicting (e.g., mark inventory as “pending”)
- Store saga state: Persist saga execution state for recovery after failures
- Implement timeout handling: Sagas should not run indefinitely
- Monitor saga execution: Track success rates, duration, and failure patterns
- Order steps carefully: Put the steps most likely to fail early, so that fewer completed steps need compensation
- Communicate clearly in UI: Users should understand the asynchronous nature
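As an illustration of storing saga state, the sketch below records each completed step in a durable log so that a restarted process can resume where it left off. It is a simplified sketch under assumptions: the saga_steps table, the run_or_resume helper, and the simulated crash are hypothetical, and SQLite (in memory here) stands in for whatever store the orchestrator would actually use.

```python
import sqlite3
from typing import Callable, List, Set, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the saga log; the table layout is an assumption for this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saga_steps ("
        " saga_id TEXT NOT NULL,"
        " step TEXT NOT NULL,"
        " PRIMARY KEY (saga_id, step))"
    )
    return conn


def completed_steps(conn: sqlite3.Connection, saga_id: str) -> Set[str]:
    rows = conn.execute(
        "SELECT step FROM saga_steps WHERE saga_id = ?", (saga_id,)
    )
    return {row[0] for row in rows}


def run_or_resume(
    conn: sqlite3.Connection,
    saga_id: str,
    steps: List[Tuple[str, Callable[[], None]]],
) -> List[str]:
    """Run only the steps not yet recorded; return the names run in this call."""
    done = completed_steps(conn, saga_id)
    executed: List[str] = []
    for name, action in steps:
        if name in done:
            continue  # finished before the restart
        action()
        with conn:  # commits the record, or rolls back on error
            conn.execute(
                "INSERT INTO saga_steps (saga_id, step) VALUES (?, ?)",
                (saga_id, name),
            )
        executed.append(name)
    return executed


calls: List[str] = []
crash = {"payment": True}  # simulates a process crash on the first attempt


def charge() -> None:
    if crash["payment"]:
        raise RuntimeError("process crashed")
    calls.append("ChargePayment")


steps = [
    ("CreateOrder", lambda: calls.append("CreateOrder")),
    ("ReserveInventory", lambda: calls.append("ReserveInventory")),
    ("ChargePayment", charge),
]

conn = open_store()
try:
    run_or_resume(conn, "saga-42", steps)
except RuntimeError:
    pass  # the first run dies partway through

crash["payment"] = False
resumed = run_or_resume(conn, "saga-42", steps)
print(resumed, calls)
```

A step that finishes just before a crash, but before its record is written, would run again on resume. That gap is why the idempotency practice at the top of the list matters even when state is persisted.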
Conclusion
The Saga pattern is a practical way to maintain consistency across services while avoiding the pitfalls of distributed transactions. By breaking complex operations into local transactions with compensating actions, you can build resilient microservices that scale effectively.
Choose choreography for simpler flows with clear event-driven logic. Use orchestration when you need centralized control, complex coordination, or detailed monitoring of saga execution.
The complexity is real, but for distributed systems requiring cross-service consistency, sagas provide a battle-tested solution that many organizations rely on at scale.