Circuit Breaker Pattern with Fallback Strategies
Introduction
The Circuit Breaker pattern prevents cascading failures in distributed systems by detecting when a dependency is failing and blocking further calls to it until it has had time to recover. When combined with intelligent fallback strategies, it enables graceful degradation rather than total system failure.
This pattern is essential for building resilient microservices architectures, especially when dealing with unreliable external dependencies like third-party APIs, databases, or downstream services.
Core Concept
A circuit breaker acts like an electrical circuit breaker: it monitors calls for failures and “trips” when they exceed a threshold, preventing further calls to the failing service. The threshold can be a failure rate or, as in the implementations below, a count of consecutive failures. The circuit breaker operates in three states:
- Closed: Normal operation, requests pass through
- Open: Failure threshold exceeded, requests fail immediately without calling the service
- Half-Open: Testing if the service has recovered by allowing limited requests
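The three states and the moves between them can be written down as a small transition table. This is only an illustrative sketch: the `Event` names and the `next_state` helper are assumptions introduced here, not part of any library.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class Event(Enum):
    # Hypothetical event names, chosen for this sketch only.
    THRESHOLD_EXCEEDED = "threshold_exceeded"  # too many failures while closed
    TIMEOUT_EXPIRED = "timeout_expired"        # open long enough to try a probe
    PROBES_SUCCEEDED = "probes_succeeded"      # enough trial requests succeeded
    PROBE_FAILED = "probe_failed"              # a trial request failed


# (current state, event) -> next state. Pairs not listed leave the state unchanged.
TRANSITIONS = {
    (State.CLOSED, Event.THRESHOLD_EXCEEDED): State.OPEN,
    (State.OPEN, Event.TIMEOUT_EXPIRED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.PROBES_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.PROBE_FAILED): State.OPEN,
}


def next_state(state: State, event: Event) -> State:
    """Return the state that follows `state` when `event` occurs."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table makes it easy to see that Half-Open is the only state that can lead back to Closed.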
When to Use
Ideal scenarios:
- Calling external APIs with variable reliability
- Database connections that may become overwhelmed
- Microservices communication where downstream failures could cascade
- Integration with legacy systems prone to intermittent failures
- Rate-limited third-party services
Avoid when:
- Calling internal, highly reliable services where failure is unexpected
- Single-user applications without external dependencies
- Batch processing where each failure should surface directly instead of being masked by a fallback
- Services whose only problem is unpredictable response times (a timeout alone is usually enough)
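For the last case above, a plain timeout wrapper may be all that is needed. The following is a minimal sketch using the standard library's `concurrent.futures`; the `with_timeout` helper, the pool size, and `slow_lookup` are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError

# Shared worker pool; the size of 4 is an arbitrary choice for this sketch.
_executor = ThreadPoolExecutor(max_workers=4)


def with_timeout(fn, seconds):
    """Wrap fn so that the caller stops waiting after `seconds`."""
    def wrapped():
        future = _executor.submit(fn)
        try:
            return future.result(timeout=seconds)
        except FutureTimeoutError:
            # cancel() only helps if the task has not started yet; a running
            # task keeps its worker thread busy until it finishes on its own.
            future.cancel()
            raise TimeoutError(f"call exceeded {seconds}s") from None
    return wrapped


def slow_lookup():
    # Hypothetical slow dependency, simulated with a sleep.
    time.sleep(0.2)
    return "late"


guarded_lookup = with_timeout(slow_lookup, seconds=0.02)
```

Calling `guarded_lookup()` raises `TimeoutError` because the simulated lookup takes longer than the limit. Note that the timeout bounds how long the caller waits, not how long the work runs.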
Implementation in Go
package circuitbreaker

import (
	"errors"
	"sync"
	"time"
)

type State int

const (
	StateClosed State = iota
	StateOpen
	StateHalfOpen
)

type CircuitBreaker struct {
	mu              sync.RWMutex
	state           State
	failureCount    int
	successCount    int
	lastFailureTime time.Time

	// Configuration
	maxFailures          int
	timeout              time.Duration
	halfOpenMaxSuccesses int
}

func New(maxFailures int, timeout time.Duration) *CircuitBreaker {
	return &CircuitBreaker{
		state:                StateClosed,
		maxFailures:          maxFailures,
		timeout:              timeout,
		halfOpenMaxSuccesses: 2,
	}
}
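
// ErrCircuitOpen is returned when the circuit is open and no fallback is provided.
var ErrCircuitOpen = errors.New("circuit breaker is open")

// Call runs fn through the breaker. If the circuit is open or fn fails,
// it returns the result of fallback when fallback is non-nil.
func (cb *CircuitBreaker) Call(fn func() (interface{}, error), fallback func() (interface{}, error)) (interface{}, error) {
	cb.mu.Lock()
	// Transition from Open to Half-Open if timeout expired
	if cb.state == StateOpen && time.Since(cb.lastFailureTime) > cb.timeout {
		cb.state = StateHalfOpen
		cb.successCount = 0
	}
	state := cb.state
	cb.mu.Unlock()

	// Fast fail if circuit is open
	if state == StateOpen {
		if fallback != nil {
			return fallback()
		}
		return nil, ErrCircuitOpen
	}

	// Execute the function without holding the lock
	result, err := fn()

	// Record the outcome, then release the lock before running the fallback
	// so that a slow fallback cannot block other callers.
	cb.mu.Lock()
	if err != nil {
		cb.onFailure()
	} else {
		cb.onSuccess()
	}
	cb.mu.Unlock()

	if err != nil {
		if fallback != nil {
			return fallback()
		}
		return nil, err
	}
	return result, nil
}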
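
// onSuccess must be called with cb.mu held.
func (cb *CircuitBreaker) onSuccess() {
	if cb.state == StateHalfOpen {
		cb.successCount++
		if cb.successCount >= cb.halfOpenMaxSuccesses {
			cb.state = StateClosed
			cb.failureCount = 0
		}
	} else {
		cb.failureCount = 0
	}
}

// onFailure must be called with cb.mu held.
func (cb *CircuitBreaker) onFailure() {
	cb.failureCount++
	cb.lastFailureTime = time.Now()
	// A failed trial request in Half-Open re-opens the circuit immediately.
	if cb.state == StateHalfOpen || cb.failureCount >= cb.maxFailures {
		cb.state = StateOpen
	}
}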
Usage example:
cb := circuitbreaker.New(5, 60*time.Second)

// With fallback to cached data
result, err := cb.Call(
	func() (interface{}, error) {
		return fetchFromAPI()
	},
	func() (interface{}, error) {
		return fetchFromCache(), nil
	},
)
Implementation in Python
from enum import Enum
from datetime import datetime, timedelta
from typing import Callable, Optional, TypeVar, Generic
import threading

T = TypeVar('T')


class State(Enum):
    CLOSED = 1
    OPEN = 2
    HALF_OPEN = 3


class CircuitBreakerOpenError(Exception):
    """Raised when the circuit is open and no fallback was provided."""


class CircuitBreaker(Generic[T]):
    def __init__(
        self,
        max_failures: int = 5,
        timeout: timedelta = timedelta(seconds=60),
        half_open_max_successes: int = 2
    ):
        self._state = State.CLOSED
        self._failure_count = 0
        self._success_count = 0
        self._last_failure_time: Optional[datetime] = None
        self._max_failures = max_failures
        self._timeout = timeout
        self._half_open_max_successes = half_open_max_successes
        self._lock = threading.RLock()

    def call(
        self,
        fn: Callable[[], T],
        fallback: Optional[Callable[[], T]] = None
    ) -> T:
        with self._lock:
            # Transition from OPEN to HALF_OPEN
            if (self._state == State.OPEN and
                    self._last_failure_time is not None and
                    datetime.now() - self._last_failure_time > self._timeout):
                self._state = State.HALF_OPEN
                self._success_count = 0
            state = self._state

        # Fast fail if circuit is open (the fallback runs outside the lock)
        if state == State.OPEN:
            if fallback:
                return fallback()
            raise CircuitBreakerOpenError("Circuit breaker is open")

        # Execute the function
        try:
            result = fn()
        except Exception:
            with self._lock:
                self._on_failure()
            if fallback:
                return fallback()
            raise

        with self._lock:
            self._on_success()
        return result

    def _on_success(self):
        # Must be called with self._lock held
        if self._state == State.HALF_OPEN:
            self._success_count += 1
            if self._success_count >= self._half_open_max_successes:
                self._state = State.CLOSED
                self._failure_count = 0
        else:
            self._failure_count = 0

    def _on_failure(self):
        # Must be called with self._lock held. The failure count is not reset
        # on entering HALF_OPEN, so a failed trial request re-opens the circuit.
        self._failure_count += 1
        self._last_failure_time = datetime.now()
        if self._failure_count >= self._max_failures:
            self._state = State.OPEN
Usage with a fallback:
# Create a circuit breaker instance
user_service_cb = CircuitBreaker(max_failures=3, timeout=timedelta(seconds=30))

def get_user_from_api(user_id: int) -> dict:
    # API call implementation
    pass

def get_user_from_cache(user_id: int) -> dict:
    # Cache fallback implementation
    return {"id": user_id, "name": "Cached User", "stale": True}

# Use with fallback
user = user_service_cb.call(
    lambda: get_user_from_api(123),
    fallback=lambda: get_user_from_cache(123)
)
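The same call-with-fallback behavior can also be packaged as a decorator. The sketch below is self-contained, so it uses a compact stand-in breaker (`MiniBreaker`) rather than the full class above. `MiniBreaker`, `circuit_protected`, and `CircuitOpenError` are hypothetical names, the stand-in closes after a single successful trial call, and it is not thread-safe.

```python
import functools
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback is configured."""


class MiniBreaker:
    """Compact stand-in breaker: counts consecutive failures only."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a call may go through (closed, or due for a trial)."""
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_after

    def record(self, ok):
        """Record the outcome of a call that was allowed through."""
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()


def circuit_protected(breaker, fallback=None):
    """Decorator: route calls through `breaker`, using `fallback` on failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not breaker.allow():
                if fallback is not None:
                    return fallback(*args, **kwargs)
                raise CircuitOpenError(fn.__name__)
            try:
                result = fn(*args, **kwargs)
            except Exception:
                breaker.record(ok=False)
                if fallback is not None:
                    return fallback(*args, **kwargs)
                raise
            breaker.record(ok=True)
            return result
        return wrapper
    return decorator


def cached_user(user_id):
    return {"id": user_id, "name": "Cached User", "stale": True}


user_breaker = MiniBreaker(max_failures=2, reset_after=30.0)


@circuit_protected(user_breaker, fallback=cached_user)
def get_user(user_id):
    raise ConnectionError("user service unreachable")  # simulated outage
```

Here the fallback receives the same arguments as the decorated function, so `get_user(123)` returns the cached record for user 123 while the simulated outage lasts.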
Fallback Strategies
1. Cached Response
Return stale data from cache, marked as potentially outdated:
def with_cache_fallback(cache_key: str):
    return lambda: {
        **cache.get(cache_key),
        '_stale': True,
        '_cached_at': cache.get_timestamp(cache_key)
    }
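A cache fallback also has to decide what happens when the key is missing or the entry is too old to be useful. The sketch below shows one way to handle both cases. `TimestampedCache` is a hypothetical in-memory cache written for this example, and the `max_age_seconds` parameter is an assumption, not something the snippet above provides.

```python
import time


class TimestampedCache:
    """Hypothetical in-memory cache that remembers when each entry was stored."""

    def __init__(self):
        self._entries = {}

    def set(self, key, value, now=None):
        stored_at = time.time() if now is None else now
        self._entries[key] = (dict(value), stored_at)

    def lookup(self, key):
        """Return (value, stored_at), or None when the key is absent."""
        return self._entries.get(key)


def with_cache_fallback(cache, cache_key, max_age_seconds=None, clock=time.time):
    """Build a fallback that serves a cached dict, flagged as stale."""
    def fallback():
        entry = cache.lookup(cache_key)
        if entry is None:
            raise LookupError(f"no cached value for {cache_key!r}")
        value, stored_at = entry
        if max_age_seconds is not None and clock() - stored_at > max_age_seconds:
            raise LookupError(f"cached value for {cache_key!r} is too old")
        return {**value, "_stale": True, "_cached_at": stored_at}
    return fallback
```

Raising `LookupError` keeps a cache miss visible to the caller instead of returning a half-empty response.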
2. Default Value
Return a safe default when the service is unavailable:
func defaultUserFallback() (interface{}, error) {
	return &User{
		ID:          0,
		Name:        "Guest",
		Permissions: []string{"read"},
	}, nil
}
3. Degraded Functionality
Reduce functionality but keep core features working:
def degraded_search_fallback(query: str):
    # Use simpler, local search instead of full-featured API
    return simple_local_search(query, max_results=10)
4. Queue for Later
Queue the request for async processing when service recovers:
func queueForLaterFallback(request Request) (interface{}, error) {
	queue.Enqueue(request)
	return &Response{
		Status:  "queued",
		Message: "Your request will be processed when service recovers",
	}, nil
}
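Queuing is only half of this strategy: something has to replay the queued requests once the service is back, and the queue needs a bound so it cannot grow without limit during a long outage. A minimal in-memory sketch follows. `DeferredQueue` and its methods are hypothetical names, and a production system would more likely use a durable queue.

```python
from collections import deque


class DeferredQueue:
    """Bounded in-memory queue for requests to replay after recovery."""

    def __init__(self, max_size=1000):
        self._pending = deque()
        self._max_size = max_size

    def __len__(self):
        return len(self._pending)

    def enqueue(self, request):
        """Fallback body: store the request and tell the caller what happened."""
        if len(self._pending) >= self._max_size:
            return {"status": "rejected", "message": "retry queue is full"}
        self._pending.append(request)
        return {
            "status": "queued",
            "message": "Your request will be processed when service recovers",
        }

    def drain(self, handler):
        """Replay queued requests in order; stop at the first failure.

        A request is removed only after `handler` succeeds, so a failed
        replay stays at the head of the queue for the next attempt.
        """
        processed = 0
        while self._pending:
            request = self._pending[0]
            try:
                handler(request)
            except Exception:
                break
            self._pending.popleft()
            processed += 1
        return processed
```

`drain` could be called when the circuit returns to Closed, for example from a state-change hook or a periodic job.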
5. Alternative Service
Route to a backup service or data source:
def alternative_service_fallback():
    # Try secondary API or data source
    return backup_api_client.get_data()
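When more than one alternative exists, the fallbacks can be chained so that each source is tried in order. The helper below is a sketch of that idea; `first_successful` is a name invented here, and the sources are plain zero-argument callables.

```python
def first_successful(*sources):
    """Build a fallback that returns the first source result that succeeds."""
    def fallback():
        last_error = None
        for source in sources:
            try:
                return source()
            except Exception as exc:
                last_error = exc  # remember the failure and try the next source
        raise RuntimeError(
            f"all {len(sources)} fallback sources failed"
        ) from last_error
    return fallback
```

For example, `first_successful(secondary_api, read_replica, static_default)` would try a secondary API, then a read replica, then a static default; those three names are placeholders.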
React (TypeScript) Implementation for the Frontend
import { useState, useEffect, useCallback } from 'react';

enum CircuitState {
  CLOSED,
  OPEN,
  HALF_OPEN,
}

interface CircuitBreakerConfig {
  maxFailures: number;
  timeout: number;
  halfOpenMaxSuccesses: number;
}

class FrontendCircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount = 0;
  private successCount = 0;
  private lastFailureTime: number | null = null;
  private config: CircuitBreakerConfig;

  constructor(config: CircuitBreakerConfig) {
    this.config = config;
  }

  async call<T>(
    fn: () => Promise<T>,
    fallback?: () => T | Promise<T>
  ): Promise<T> {
    // Check if we should transition from OPEN to HALF_OPEN
    if (
      this.state === CircuitState.OPEN &&
      this.lastFailureTime &&
      Date.now() - this.lastFailureTime > this.config.timeout
    ) {
      this.state = CircuitState.HALF_OPEN;
      this.successCount = 0;
    }

    // Fast fail if circuit is open
    if (this.state === CircuitState.OPEN) {
      if (fallback) {
        return await fallback();
      }
      throw new Error('Circuit breaker is open');
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      if (fallback) {
        return await fallback();
      }
      throw error;
    }
  }

  private onSuccess() {
    if (this.state === CircuitState.HALF_OPEN) {
      this.successCount++;
      if (this.successCount >= this.config.halfOpenMaxSuccesses) {
        this.state = CircuitState.CLOSED;
        this.failureCount = 0;
      }
    } else {
      this.failureCount = 0;
    }
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.config.maxFailures) {
      this.state = CircuitState.OPEN;
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}

// React Hook
// Note: apiFn and fallbackFn should have stable identities (for example,
// wrapped in useCallback by the caller). `execute` is recreated whenever
// either of them changes, which re-triggers any effect that depends on it.
export function useCircuitBreaker<T>(
  apiFn: () => Promise<T>,
  fallbackFn?: () => T,
  config: CircuitBreakerConfig = {
    maxFailures: 3,
    timeout: 30000,
    halfOpenMaxSuccesses: 2,
  }
) {
  // The breaker is created once; later changes to `config` are ignored.
  const [cb] = useState(() => new FrontendCircuitBreaker(config));
  const [data, setData] = useState<T | null>(null);
  const [error, setError] = useState<Error | null>(null);
  const [isLoading, setIsLoading] = useState(false);
  const [circuitState, setCircuitState] = useState(CircuitState.CLOSED);

  const execute = useCallback(async () => {
    setIsLoading(true);
    setError(null);
    try {
      const result = await cb.call(apiFn, fallbackFn);
      setData(result);
      setCircuitState(cb.getState());
    } catch (err) {
      setError(err as Error);
      setCircuitState(cb.getState());
    } finally {
      setIsLoading(false);
    }
  }, [apiFn, fallbackFn, cb]);

  return { data, error, isLoading, execute, circuitState };
}
Usage in a React component:
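function UserProfile({ userId }: { userId: number }) {
  // Memoize both functions so `execute` keeps a stable identity between
  // renders; otherwise the effect below would re-run after every state update.
  const fetchUser = useCallback(async () => {
    const response = await fetch(`/api/users/${userId}`);
    // fetch only rejects on network errors, so surface HTTP errors explicitly
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  }, [userId]);
  const guestUser = useCallback(() => ({ id: userId, name: 'Guest', cached: true }), [userId]);
  const { data, error, isLoading, execute, circuitState } = useCircuitBreaker(
    fetchUser,
    guestUser,
    { maxFailures: 3, timeout: 30000, halfOpenMaxSuccesses: 2 }
  );

  useEffect(() => {
    execute();
  }, [execute]);

  if (isLoading) return <div>Loading...</div>;
  if (error) return <div>Error: {error.message}</div>;
  return (
    <div>
      {circuitState === CircuitState.OPEN && (
        <div className="alert">Service temporarily unavailable. Showing cached data.</div>
      )}
      <div>User: {data?.name}</div>
    </div>
  );
}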
Trade-offs
Advantages
- Prevents cascading failures: Stops failures from propagating across service boundaries
- Fast failure: Reduces latency by failing fast when a service is known to be down
- Automatic recovery: Tests service health and recovers automatically
- Graceful degradation: Combined with fallbacks, provides degraded functionality instead of total failure
- Resource protection: Prevents thread/connection pool exhaustion from hanging on failed services
Disadvantages
- Added complexity: Introduces state management and configuration overhead
- False positives: May trip during legitimate temporary spikes
- Configuration challenges: Requires tuning thresholds for each service
- Stale data: Fallback strategies may serve outdated information
- Monitoring requirement: Needs dashboards to track circuit states across services
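One common way to reduce false positives is to trip on a failure rate over a window of recent calls, with a minimum sample size, instead of on a raw failure count. The class below is a sketch of that idea. `FailureRateWindow` and its default values are illustrative assumptions, not tuned recommendations.

```python
from collections import deque


class FailureRateWindow:
    """Tracks the outcomes of the most recent calls and decides when to trip."""

    def __init__(self, window_size=20, min_calls=10, failure_rate_threshold=0.5):
        # deque(maxlen=...) drops the oldest outcome automatically.
        self._outcomes = deque(maxlen=window_size)
        self._min_calls = min_calls
        self._threshold = failure_rate_threshold

    def record(self, success):
        self._outcomes.append(bool(success))

    def failure_rate(self):
        if not self._outcomes:
            return 0.0
        return self._outcomes.count(False) / len(self._outcomes)

    def should_trip(self):
        # Require a minimum number of samples so that one or two early
        # failures cannot open the circuit on their own.
        enough_calls = len(self._outcomes) >= self._min_calls
        return enough_calls and self.failure_rate() >= self._threshold
```

A breaker could call `record` after every request and consult `should_trip` in place of the consecutive-failure check.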
Best Practices
- Service-specific configuration: Tune thresholds based on each service’s SLA
- Observability: Emit metrics for circuit state changes and fallback invocations
- Graceful fallbacks: Always provide meaningful fallback responses when possible
- Timeout integration: Combine with timeouts to prevent hanging on slow services
- Testing: Test circuit breaker transitions in integration tests
- Documentation: Document fallback behavior for API consumers
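The observability and testing practices above tend to be easier to follow when the breaker accepts an injectable clock and a state-change callback. The sketch below shows both. `ObservableBreaker` is a simplified, single-threaded stand-in written for this example (it closes after one successful trial call), and the callback signature is an assumption.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class ObservableBreaker:
    """Simplified breaker with an injectable clock and a state-change hook."""

    def __init__(self, max_failures, reset_after, clock=time.monotonic,
                 on_state_change=None):
        self._max_failures = max_failures
        self._reset_after = reset_after      # seconds to stay open
        self._clock = clock                  # replaced by a fake clock in tests
        self._on_state_change = on_state_change
        self._failures = 0
        self._opened_at = 0.0
        self.state = State.CLOSED
        self.fallback_calls = 0              # simple counter to export as a metric

    def _set_state(self, new_state):
        if new_state is not self.state:
            old_state, self.state = self.state, new_state
            if self._on_state_change is not None:
                self._on_state_change(old_state, new_state)

    def _fallback_or_raise(self, fallback, exc):
        if fallback is None:
            raise exc
        self.fallback_calls += 1
        return fallback()

    def call(self, fn, fallback=None):
        if (self.state is State.OPEN
                and self._clock() - self._opened_at >= self._reset_after):
            self._set_state(State.HALF_OPEN)
        if self.state is State.OPEN:
            return self._fallback_or_raise(fallback, RuntimeError("circuit open"))
        try:
            result = fn()
        except Exception as exc:
            self._failures += 1
            if (self.state is State.HALF_OPEN
                    or self._failures >= self._max_failures):
                self._opened_at = self._clock()
                self._set_state(State.OPEN)
            return self._fallback_or_raise(fallback, exc)
        self._failures = 0
        self._set_state(State.CLOSED)
        return result


# Deterministic walk through Closed -> Open -> Half-Open -> Closed.
now = [0.0]          # fake clock value, advanced by hand
transitions = []     # stands in for a metrics or logging sink


def failing_call():
    raise ConnectionError("service down")


breaker = ObservableBreaker(
    max_failures=2,
    reset_after=30.0,
    clock=lambda: now[0],
    on_state_change=lambda old, new: transitions.append((old.value, new.value)),
)
breaker.call(failing_call, fallback=lambda: "cached")
breaker.call(failing_call, fallback=lambda: "cached")  # second failure trips it
now[0] = 31.0                                          # jump past reset_after
breaker.call(lambda: "fresh")                          # trial call succeeds
```

Because time is controlled by `now`, the whole sequence runs without sleeping, which keeps transition tests fast and repeatable.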
Conclusion
The Circuit Breaker pattern with fallback strategies is essential for building resilient distributed systems. By preventing cascading failures and providing graceful degradation, it enables systems to maintain partial functionality during outages rather than complete failure.
For principal engineers, implementing circuit breakers across service boundaries is a key architectural decision that significantly improves system reliability and user experience during failure scenarios.