The Saga Pattern: Managing Distributed Transactions in Microservices

When building microservices architectures, one of the most challenging problems is maintaining data consistency across multiple services. When each service owns its own database, a single ACID transaction cannot span all the data touched by one business operation. The Saga pattern addresses this by splitting the operation into a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.

What is the Saga Pattern?

A saga is a sequence of local transactions where each transaction updates data within a single service. If one transaction fails, the saga executes compensating transactions to undo the changes made by preceding transactions.

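Unlike distributed transactions coordinated with two-phase commit (2PC), sagas do not hold locks across services while waiting for a global commit decision. Instead they accept eventual consistency: data may be temporarily inconsistent while a saga runs, and compensation restores consistency if it fails. This avoids the availability and performance costs of distributed locking.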

Two Implementation Approaches

1. Choreography-Based Saga

Services publish domain events that trigger local transactions in other services. Each service listens for events, performs its local transaction, and publishes new events.

Example: Order processing system

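// Go - Order Service publishes event
import (
    "context"
    "fmt"
    "time"

    "github.com/shopspring/decimal" // assumed decimal library
)

// OrderService, PaymentService, CreateOrderRequest and their repo/eventBus
// fields are defined elsewhere in each service.

type OrderCreatedEvent struct {
    OrderID   string
    UserID    string
    Amount    decimal.Decimal
    Timestamp time.Time
}

func (s *OrderService) CreateOrder(ctx context.Context, req CreateOrderRequest) error {
    // Local transaction in the Order Service's own database
    order, err := s.repo.CreateOrder(ctx, req)
    if err != nil {
        return fmt.Errorf("create order: %w", err)
    }
    
    // Publish event to trigger next step. If this publish fails after the
    // order has committed, the saga never starts; production systems often
    // write the event in the same local transaction (transactional outbox).
    event := OrderCreatedEvent{
        OrderID:   order.ID,
        UserID:    req.UserID,
        Amount:    order.Total,
        Timestamp: time.Now(),
    }
    
    return s.eventBus.Publish(ctx, "order.created", event)
}

type PaymentFailedEvent struct {
    OrderID string
    Reason  string
}

type PaymentSucceededEvent struct {
    OrderID   string
    PaymentID string
}

// Payment Service listens and processes
func (s *PaymentService) HandleOrderCreated(ctx context.Context, event OrderCreatedEvent) error {
    // Attempt payment (local transaction in the Payment Service)
    payment, err := s.processPayment(ctx, event.UserID, event.Amount)
    if err != nil {
        // Publish failure event so the Order Service can compensate
        // by cancelling the order
        if pubErr := s.eventBus.Publish(ctx, "payment.failed", PaymentFailedEvent{
            OrderID: event.OrderID,
            Reason:  err.Error(),
        }); pubErr != nil {
            return fmt.Errorf("publish payment.failed: %w", pubErr)
        }
        // The failure is now a saga outcome carried by the event; returning
        // err here could make a retrying bus re-run the payment
        return nil
    }
    
    // Publish success to continue saga
    return s.eventBus.Publish(ctx, "payment.succeeded", PaymentSucceededEvent{
        OrderID:   event.OrderID,
        PaymentID: payment.ID,
    })
}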
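The Go handlers above cover the forward path; the compensation leg, where the Order Service reacts to `payment.failed`, lives in another service. To see both legs together, here is a minimal in-process simulation in Python. The synchronous `EventBus`, the integer balances, and the service classes are hypothetical stand-ins for a real broker and real databases; the point is that no component coordinates the whole flow, and each service only changes its own data in response to events.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EventBus:
    """Hypothetical in-memory bus: delivers each event synchronously to subscribers."""
    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.handlers[topic]:
            handler(event)

class OrderService:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.orders: Dict[str, str] = {}  # order_id -> status (this service's data only)
        bus.subscribe("payment.succeeded", self.on_payment_succeeded)
        bus.subscribe("payment.failed", self.on_payment_failed)

    def create_order(self, order_id: str, amount: int) -> None:
        self.orders[order_id] = "PENDING"  # local transaction
        self.bus.publish("order.created", {"order_id": order_id, "amount": amount})

    def on_payment_succeeded(self, event: dict) -> None:
        self.orders[event["order_id"]] = "CONFIRMED"

    def on_payment_failed(self, event: dict) -> None:
        # Compensating transaction: undo this service's earlier local change
        self.orders[event["order_id"]] = "CANCELLED"

class PaymentService:
    def __init__(self, bus: EventBus, balance: int) -> None:
        self.bus = bus
        self.balance = balance
        bus.subscribe("order.created", self.on_order_created)

    def on_order_created(self, event: dict) -> None:
        if event["amount"] > self.balance:
            self.bus.publish("payment.failed",
                             {"order_id": event["order_id"], "reason": "insufficient funds"})
            return
        self.balance -= event["amount"]  # local transaction
        self.bus.publish("payment.succeeded", {"order_id": event["order_id"]})

bus = EventBus()
orders = OrderService(bus)
payments = PaymentService(bus, balance=100)
orders.create_order("o-1", amount=60)  # payment succeeds
orders.create_order("o-2", amount=60)  # only 40 left, so payment fails
print(orders.orders)    # {'o-1': 'CONFIRMED', 'o-2': 'CANCELLED'}
print(payments.balance)  # 40
```

A real broker delivers asynchronously and usually at least once, so handlers like these would also need to tolerate duplicate events.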

2. Orchestration-Based Saga

A central orchestrator tells saga participants what local transactions to execute. The orchestrator maintains the saga state and handles compensation logic.

# Python - Saga Orchestrator
from enum import Enum
from typing import Awaitable, Callable, List

class SagaStep:
    def __init__(self, 
                 action: Callable[[], Awaitable[None]],
                 compensation: Callable[[], Awaitable[None]],
                 name: str):
        self.action = action              # forward local transaction
        self.compensation = compensation  # undoes the action if a later step fails
        self.name = name

class SagaStatus(Enum):
    PENDING = "pending"
    EXECUTING = "executing"
    COMPLETED = "completed"
    COMPENSATING = "compensating"
    FAILED = "failed"

class SagaOrchestrator:
    def __init__(self, saga_id: str, steps: List[SagaStep]):
        self.saga_id = saga_id
        self.steps = steps
        self.completed_steps: List[SagaStep] = []
        self.failed_compensations: List[str] = []
        self.status = SagaStatus.PENDING
        
    async def execute(self) -> bool:
        """Execute saga steps in sequence; compensate on the first failure."""
        self.status = SagaStatus.EXECUTING
        
        for step in self.steps:
            try:
                print(f"Executing step: {step.name}")
                await step.action()
            except Exception as e:
                # The failed step is assumed to have rolled back its own local
                # transaction, so only the earlier steps need compensation.
                print(f"Saga {self.saga_id} failed at {step.name}: {e}")
                await self.compensate()
                return False
            self.completed_steps.append(step)
            
        self.status = SagaStatus.COMPLETED
        return True
    
    async def compensate(self):
        """Execute compensation in reverse order"""
        self.status = SagaStatus.COMPENSATING
        
        for step in reversed(self.completed_steps):
            try:
                print(f"Compensating step: {step.name}")
                await step.compensation()
            except Exception as e:
                # Keep undoing the remaining steps, but record the failure so it
                # can be retried later (safe only if compensations are idempotent)
                # or escalated for manual repair.
                print(f"Compensation failed for {step.name}: {e}")
                self.failed_compensations.append(step.name)
                
        self.status = SagaStatus.FAILED

# Usage: Order Processing Saga (the service methods are assumed to be coroutines)
class OrderProcessingSaga:
    def __init__(self, order_service, payment_service, inventory_service):
        self.order_service = order_service
        self.payment_service = payment_service
        self.inventory_service = inventory_service
    
    async def execute_order(self, order_data):
        steps = [
            SagaStep(
                action=lambda: self.order_service.create_order(order_data),
                compensation=lambda: self.order_service.cancel_order(order_data['id']),
                name="CreateOrder"
            ),
            SagaStep(
                action=lambda: self.inventory_service.reserve_items(order_data['items']),
                compensation=lambda: self.inventory_service.release_items(order_data['items']),
                name="ReserveInventory"
            ),
            SagaStep(
                action=lambda: self.payment_service.charge_payment(order_data['payment']),
                compensation=lambda: self.payment_service.refund_payment(order_data['payment']),
                name="ChargePayment"
            ),
        ]
        
        orchestrator = SagaOrchestrator(
            saga_id=order_data['id'],
            steps=steps
        )
        
        return await orchestrator.execute()
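The orchestrator above awaits each action with no deadline, so a participant that never responds stalls the saga indefinitely. One way to bound this, sketched here under the assumption that every action and compensation is an `async` callable, is to wrap each step in `asyncio.wait_for` and treat a timeout like a failure. The `run_saga` function and the step names are hypothetical. Because a timed-out step may still have taken effect on the remote side, this sketch compensates it too, which only works if its compensation is a safe no-op when nothing happened.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], Awaitable[None]], Callable[[], Awaitable[None]]]

async def run_saga(steps: List[Step], step_timeout: float) -> List[str]:
    """Run steps in order; on failure or timeout, compensate in reverse. Returns a log."""
    log: List[str] = []
    to_compensate: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            await asyncio.wait_for(action(), timeout=step_timeout)
            to_compensate.append(step)
            log.append(f"done {name}")
        except asyncio.TimeoutError:
            # Outcome unknown: the participant may have applied the change,
            # so compensate this step as well
            to_compensate.append(step)
            log.append(f"timeout {name}")
            break
        except Exception:
            # Failed step is assumed to have rolled back its own local transaction
            log.append(f"failed {name}")
            break
    else:
        return log  # every step completed
    for name, _, compensation in reversed(to_compensate):
        await compensation()
        log.append(f"compensated {name}")
    return log

async def ok() -> None:
    await asyncio.sleep(0)

async def hang() -> None:
    await asyncio.sleep(10)  # simulates a participant that never answers

async def undo() -> None:
    await asyncio.sleep(0)

steps = [("ReserveInventory", ok, undo), ("ChargePayment", hang, undo)]
log = asyncio.run(run_saga(steps, step_timeout=0.1))
print(log)
# ['done ReserveInventory', 'timeout ChargePayment',
#  'compensated ChargePayment', 'compensated ReserveInventory']
```

`asyncio.wait_for` cancels the hanging coroutine when the timeout expires, so the saga moves on to compensation after about 0.1 seconds instead of 10.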

When to Use the Saga Pattern

Use Sagas When:

Avoid Sagas When:

Trade-offs and Considerations

Advantages

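  1. No distributed locks: Each service commits only local transactions, so no locks are held across services
  2. High availability: No participant blocks waiting for a global commit decision, so a slow or failed service does not freeze the others
  3. Scalability: No 2PC coordination round trips or long-held locks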
  4. Flexibility: Can integrate with external systems that don’t support 2PC

Challenges

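  1. Complexity: Every step needs a compensating action, and every failure path must be designed and tested
  2. Eventual consistency: Data across services is temporarily inconsistent while the saga runs
  3. Compensating transactions: Must be carefully designed and idempotent; some actions (such as sending an email) cannot truly be undone
  4. Lack of isolation: Other transactions can see partial saga results (e.g., inventory reserved for an order whose payment later fails)
  5. Debugging: A distributed flow is harder to trace and debug than a single call stack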
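Lack of isolation is usually mitigated with countermeasures rather than with locks held across services. One common countermeasure is a semantic lock: an early step records a pending state that other transactions must respect, and a later step either confirms or releases it. The sketch below is a single-process illustration using a hypothetical `Inventory` class whose reservations are keyed by saga ID; a real service would store the pending records in its own database.

```python
from typing import Dict

class Inventory:
    """Hypothetical inventory with semantic locks: reserved units are 'pending'."""

    def __init__(self, stock: int) -> None:
        self.on_hand = stock               # units physically in stock
        self.pending: Dict[str, int] = {}  # saga_id -> units reserved, not yet final

    def available(self) -> int:
        # Other sagas see only stock that is not pending, so they cannot
        # sell units that a still-running saga may keep
        return self.on_hand - sum(self.pending.values())

    def reserve(self, saga_id: str, qty: int) -> bool:
        if qty > self.available():
            return False
        self.pending[saga_id] = qty
        return True

    def confirm(self, saga_id: str) -> None:
        # Saga succeeded: the pending units are really sold
        self.on_hand -= self.pending.pop(saga_id)

    def release(self, saga_id: str) -> None:
        # Compensation: drop the lock; pop with a default makes repeats a no-op
        self.pending.pop(saga_id, None)

inv = Inventory(stock=5)
print(inv.reserve("saga-A", 3))      # True
print(inv.reserve("saga-B", 3))      # False: only 2 units are not pending
inv.release("saga-A")                # saga-A's payment failed, so it compensates
print(inv.reserve("saga-B", 3))      # True
inv.confirm("saga-B")
print(inv.on_hand, inv.available())  # 2 2
```

The trade-off is that saga-B is rejected while saga-A is still undecided, even though saga-A later releases its units; the semantic lock trades some throughput for never overselling.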

Frontend Considerations (ReactJS)

When building UIs for saga-driven backends, manage user expectations around eventual consistency:

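// React - Handling Saga-based Operations
import { useState } from 'react';
// Spinner, SuccessMessage and ErrorMessage are app-specific components (not shown).

const POLL_INTERVAL_MS = 2000;
const MAX_POLL_ATTEMPTS = 30; // stop waiting after roughly one minute

function OrderSubmission({ orderData }) {
  const [sagaStatus, setSagaStatus] = useState('idle');
  const [error, setError] = useState(null);
  
  const submitOrder = async () => {
    try {
      setError(null);
      setSagaStatus('processing');
      
      // Submit order - saga begins; the server responds before it finishes
      const response = await fetch('/api/orders', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(orderData),
      });
      if (!response.ok) {
        throw new Error(`Order submission failed (HTTP ${response.status})`);
      }
      
      const { sagaId } = await response.json();
      
      // Poll until the saga reaches a terminal state
      await pollSagaStatus(sagaId);
      
    } catch (err) {
      setError(err.message);
      setSagaStatus('failed');
    }
  };
  
  const pollSagaStatus = async (sagaId) => {
    for (let attempt = 0; attempt < MAX_POLL_ATTEMPTS; attempt++) {
      const response = await fetch(`/api/sagas/${sagaId}/status`);
      if (!response.ok) {
        throw new Error(`Status check failed (HTTP ${response.status})`);
      }
      const { status } = await response.json();
      
      if (status === 'completed' || status === 'failed') {
        setSagaStatus(status);
        return;
      }
      // Still executing or compensating: wait before checking again
      await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
    }
    // Stop polling; the saga itself may still finish on the server
    setSagaStatus('timed_out');
  };
  
  return (
    <div>
      {sagaStatus === 'idle' && (
        <button onClick={submitOrder}>Place order</button>
      )}
      
      {sagaStatus === 'processing' && (
        <div>
          <Spinner />
          <p>Processing your order...</p>
          <p className="text-sm">This may take a few moments</p>
        </div>
      )}
      
      {sagaStatus === 'completed' && (
        <SuccessMessage message="Order confirmed!" />
      )}
      
      {sagaStatus === 'timed_out' && (
        <p>Your order is still processing. Check your order history for the final status.</p>
      )}
      
      {sagaStatus === 'failed' && (
        <ErrorMessage message={error ?? 'Order could not be completed. Any payment taken will be refunded.'} />
      )}
    </div>
  );
}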

Best Practices

  1. Design idempotent operations: Both actions and compensations must be safely retryable
  2. Use semantic locks: Prevent concurrent sagas from conflicting (e.g., mark inventory as “pending”)
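  3. Store saga state: Persist saga execution state so that a crashed orchestrator can resume or compensate after restarting
  4. Implement timeout handling: A step that gets no response should time out and trigger compensation rather than leave the saga running indefinitely
  5. Monitor saga execution: Track success rates, duration, and failure patterns
  6. Order steps carefully: Put steps that are likely to fail early, so fewer completed steps need compensation, and put steps that cannot be undone (such as shipping) last
  7. Communicate clearly in UI: Users should understand that the final result arrives asynchronously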
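The first practice, idempotency, matters because many message brokers deliver at least once and orchestrators retry steps after crashes, so the same action or compensation can arrive twice. A common way to make a step safely retryable is to key each operation by an idempotency key (here the saga ID) and return the recorded outcome on a repeat instead of acting again. The `PaymentLedger` class below is a hypothetical in-memory sketch; in a real service the key check and the state change would need to commit in the same local transaction.

```python
from typing import Dict, Set

class PaymentLedger:
    """Hypothetical payment service whose charge and refund are idempotent per saga."""

    def __init__(self) -> None:
        self.balance = 0
        self.charges: Dict[str, int] = {}  # saga_id -> amount charged
        self.refunded: Set[str] = set()    # saga_ids already refunded

    def charge(self, saga_id: str, amount: int) -> int:
        if saga_id in self.charges:        # retry: return the earlier result
            return self.charges[saga_id]
        self.balance += amount
        self.charges[saga_id] = amount
        return amount

    def refund(self, saga_id: str) -> None:
        # Nothing to undo if the charge never happened (e.g. it timed out
        # before reaching us), and never refund the same saga twice
        if saga_id not in self.charges or saga_id in self.refunded:
            return
        self.balance -= self.charges[saga_id]
        self.refunded.add(saga_id)

ledger = PaymentLedger()
ledger.charge("saga-1", 50)
ledger.charge("saga-1", 50)  # duplicate delivery: no second charge
ledger.refund("saga-1")
ledger.refund("saga-1")      # retried compensation: no second refund
ledger.refund("saga-2")      # compensation for a charge that never happened
print(ledger.balance)        # 0
```

Treating "nothing to undo" as success is what lets an orchestrator compensate a step whose outcome it does not know.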

Conclusion

The Saga pattern is a practical way to maintain consistency across services without distributed transactions. By breaking a business operation into local transactions, each paired with a compensating action, you avoid cross-service locks and the availability cost of 2PC, in exchange for eventual consistency and extra design work on failure paths.

Choose choreography for simpler flows with clear event-driven logic. Use orchestration when you need centralized control, complex coordination, or detailed monitoring of saga execution.

The complexity is real, but for distributed systems requiring cross-service consistency, sagas provide a battle-tested solution that many organizations rely on at scale.