Saga Pattern for Distributed Transactions
Saga Pattern for Distributed Transactions
Overview
The Saga pattern is a design pattern for managing distributed transactions across multiple microservices without relying on traditional two-phase commit protocols. Instead of locking resources across services, Sagas break long-lived transactions into a sequence of local transactions, each with a corresponding compensating transaction to handle failures.
The Problem
In microservices architectures, maintaining data consistency across services is challenging. Traditional ACID transactions with distributed locking don’t scale well and create tight coupling. Consider an e-commerce order workflow:
- Reserve inventory
- Process payment
- Create shipment
- Send notification
If step 3 fails after steps 1 and 2 succeed, how do you rollback the inventory reservation and payment? Distributed transactions are complex, slow, and often impossible when integrating with external services.
The Saga Solution
A Saga coordinates a sequence of local transactions through either:
- Choreography: Each service produces and listens to events, no central coordinator
- Orchestration: A central orchestrator tells each service what to do
Each local transaction updates its database and publishes an event or message. If a transaction fails, the Saga executes compensating transactions to undo the changes.
When to Use
Use the Saga pattern when:
- You need consistency across multiple microservices
- Services are loosely coupled and owned by different teams
- Two-phase commit is impractical or impossible
- You can model compensating transactions for failure scenarios
- Eventual consistency is acceptable for your use case
Avoid when:
- Strong consistency is required (consider rethinking service boundaries)
- Compensating transactions are impossible or extremely complex
- The transaction scope is within a single service (use local transactions)
- You have cyclic dependencies between services
Implementation: Orchestration Approach (Go)
Here’s a practical implementation using the orchestration pattern with Go:
package saga
import (
"context"
"fmt"
"log"
)
// Step represents a single step in the saga
type Step struct {
Name string
Action func(ctx context.Context) error
Compensate func(ctx context.Context) error
}
// Orchestrator manages the saga execution
type Orchestrator struct {
steps []Step
}
func NewOrchestrator() *Orchestrator {
return &Orchestrator{
steps: make([]Step, 0),
}
}
func (o *Orchestrator) AddStep(step Step) {
o.steps = append(o.steps, step)
}
// Execute runs the saga with automatic compensation on failure
func (o *Orchestrator) Execute(ctx context.Context) error {
completedSteps := make([]Step, 0)
// Execute each step
for _, step := range o.steps {
log.Printf("Executing step: %s", step.Name)
if err := step.Action(ctx); err != nil {
log.Printf("Step %s failed: %v. Starting compensation...", step.Name, err)
// Compensate in reverse order
if compensateErr := o.compensate(ctx, completedSteps); compensateErr != nil {
return fmt.Errorf("saga failed and compensation failed: original error: %w, compensation error: %v",
err, compensateErr)
}
return fmt.Errorf("saga failed at step %s: %w", step.Name, err)
}
completedSteps = append(completedSteps, step)
}
log.Println("Saga completed successfully")
return nil
}
func (o *Orchestrator) compensate(ctx context.Context, completedSteps []Step) error {
// Compensate in reverse order
for i := len(completedSteps) - 1; i >= 0; i-- {
step := completedSteps[i]
log.Printf("Compensating step: %s", step.Name)
if err := step.Compensate(ctx); err != nil {
// Log but continue compensation
log.Printf("Warning: compensation failed for step %s: %v", step.Name, err)
return err
}
}
return nil
}
Real-World Example: Order Processing
package main
import (
"context"
"errors"
"fmt"
"saga"
)
type OrderService struct {
inventoryService *InventoryService
paymentService *PaymentService
shippingService *ShippingService
}
type OrderRequest struct {
OrderID string
ProductID string
Amount float64
}
func (s *OrderService) ProcessOrder(ctx context.Context, req OrderRequest) error {
orchestrator := saga.NewOrchestrator()
var reservationID, paymentID, shipmentID string
// Step 1: Reserve Inventory
orchestrator.AddStep(saga.Step{
Name: "ReserveInventory",
Action: func(ctx context.Context) error {
id, err := s.inventoryService.Reserve(ctx, req.ProductID, 1)
if err != nil {
return fmt.Errorf("inventory reservation failed: %w", err)
}
reservationID = id
return nil
},
Compensate: func(ctx context.Context) error {
if reservationID == "" {
return nil
}
return s.inventoryService.CancelReservation(ctx, reservationID)
},
})
// Step 2: Process Payment
orchestrator.AddStep(saga.Step{
Name: "ProcessPayment",
Action: func(ctx context.Context) error {
id, err := s.paymentService.Charge(ctx, req.Amount)
if err != nil {
return fmt.Errorf("payment failed: %w", err)
}
paymentID = id
return nil
},
Compensate: func(ctx context.Context) error {
if paymentID == "" {
return nil
}
return s.paymentService.Refund(ctx, paymentID)
},
})
// Step 3: Create Shipment
orchestrator.AddStep(saga.Step{
Name: "CreateShipment",
Action: func(ctx context.Context) error {
id, err := s.shippingService.CreateShipment(ctx, req.OrderID)
if err != nil {
return fmt.Errorf("shipment creation failed: %w", err)
}
shipmentID = id
return nil
},
Compensate: func(ctx context.Context) error {
if shipmentID == "" {
return nil
}
return s.shippingService.CancelShipment(ctx, shipmentID)
},
})
return orchestrator.Execute(ctx)
}
Python Implementation with Async/Await
For Python microservices using async frameworks:
from typing import Callable, List, Optional
import asyncio
import logging
logger = logging.getLogger(__name__)
class SagaStep:
def __init__(
self,
name: str,
action: Callable,
compensate: Callable,
):
self.name = name
self.action = action
self.compensate = compensate
class SagaOrchestrator:
def __init__(self):
self.steps: List[SagaStep] = []
def add_step(self, step: SagaStep):
self.steps.append(step)
async def execute(self) -> bool:
completed_steps = []
try:
for step in self.steps:
logger.info(f"Executing step: {step.name}")
await step.action()
completed_steps.append(step)
except Exception as e:
logger.error(f"Saga failed: {e}. Starting compensation...")
await self._compensate(completed_steps)
raise
logger.info("Saga completed successfully")
return True
async def _compensate(self, completed_steps: List[SagaStep]):
# Compensate in reverse order
for step in reversed(completed_steps):
try:
logger.info(f"Compensating step: {step.name}")
await step.compensate()
except Exception as e:
logger.error(f"Compensation failed for {step.name}: {e}")
# Example usage
async def process_order(order_id: str, amount: float):
saga = SagaOrchestrator()
inventory_id = None
payment_id = None
async def reserve_inventory():
nonlocal inventory_id
inventory_id = await inventory_service.reserve(order_id)
async def compensate_inventory():
if inventory_id:
await inventory_service.release(inventory_id)
async def process_payment():
nonlocal payment_id
payment_id = await payment_service.charge(amount)
async def compensate_payment():
if payment_id:
await payment_service.refund(payment_id)
saga.add_step(SagaStep("reserve_inventory", reserve_inventory, compensate_inventory))
saga.add_step(SagaStep("process_payment", process_payment, compensate_payment))
await saga.execute()
State Persistence and Recovery
For production systems, persist saga state to handle service restarts:
type SagaState struct {
SagaID string
CurrentStep int
CompletedSteps []string
Status string // "running", "completed", "compensating", "failed"
CreatedAt time.Time
UpdatedAt time.Time
}
type PersistentOrchestrator struct {
steps []Step
repository SagaRepository
}
func (o *PersistentOrchestrator) Execute(ctx context.Context, sagaID string) error {
state, err := o.repository.Get(ctx, sagaID)
if err != nil {
state = &SagaState{
SagaID: sagaID,
Status: "running",
CreatedAt: time.Now(),
}
}
for i := state.CurrentStep; i < len(o.steps); i++ {
step := o.steps[i]
if err := step.Action(ctx); err != nil {
state.Status = "compensating"
o.repository.Update(ctx, state)
return o.compensate(ctx, state)
}
state.CurrentStep = i + 1
state.CompletedSteps = append(state.CompletedSteps, step.Name)
o.repository.Update(ctx, state)
}
state.Status = "completed"
o.repository.Update(ctx, state)
return nil
}
Trade-offs
Advantages:
- No distributed locking or two-phase commit
- Services remain loosely coupled
- Scales horizontally
- Works with external systems that don’t support transactions
- Better availability than distributed transactions
Disadvantages:
- Eventual consistency only - no isolation guarantees
- Compensating transactions add complexity
- Possible inconsistent states during execution visible to users
- Requires careful design of idempotent operations
- More complex debugging and monitoring
Best Practices
- Make operations idempotent: Services may receive duplicate messages
- Design compensating transactions carefully: They must reliably undo the forward transaction
- Use correlation IDs: Track saga execution across services
- Implement timeouts: Don’t wait forever for failed services
- Monitor saga state: Track success rates, compensation frequency, and step durations
- Handle partial failures: Compensations themselves can fail
- Persist saga state: Enable recovery after service restarts
- Use semantic locks: Prevent concurrent modifications (e.g., “reservation” instead of pessimistic lock)
Conclusion
The Saga pattern is essential for maintaining consistency in distributed systems without sacrificing availability and scalability. While it introduces complexity through compensating transactions and eventual consistency, it’s often the only practical approach for managing multi-service workflows. The orchestration approach provides better visibility and control, making it preferable for complex business processes where monitoring and debugging are critical.