Rate Limiting Demystified - Part 4: Distributed Rate Limiting
Table of Contents
- Why Distributed Rate Limiting Is Hard
- Centralized Rate Limiting
- Decentralized (Local) Rate Limiting
- Race Conditions and Atomicity
- Redis Lua Scripts for Atomicity
- Redis Transactions vs Lua Scripts
- Redis Cluster Considerations
- Sticky Sessions Approach
- Hybrid: Local Approximate + Global Precise
- Rate Limiting in a Service Mesh
- Multi-Region Rate Limiting
- Handling Redis Failures: Fail-Open vs Fail-Closed
1. Why Distributed Rate Limiting Is Hard
The Single-Server Illusion
On a single server, rate limiting is trivial. A global counter in memory, incremented on
every request, checked against a limit. Done.
Single Server:
[User] --> [Server A (counter=47)] --> Allow or Deny
But almost no production system runs on a single server. You have a cluster:
Distributed:
[User] --> [Load Balancer] --> [Server A (counter=15)]
--> [Server B (counter=12)]
--> [Server C (counter=20)]
Each server only sees its fraction of requests. If the limit is 100/minute and you have
3 servers each counting independently, you effectively allow 300 requests/minute. Your
rate limit is broken.
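To make the arithmetic concrete, here is a minimal simulation. It assumes a hypothetical `LocalFixedWindow` counter per instance and a round-robin load balancer; both are illustrations, not code from this series.

```python
from itertools import cycle


class LocalFixedWindow:
    """Hypothetical per-instance counter for a single window (illustration only)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0

    def is_allowed(self) -> bool:
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


def simulate(num_servers: int, limit: int, requests: int) -> int:
    """Send `requests` round-robin across the servers; return how many were allowed."""
    servers = [LocalFixedWindow(limit) for _ in range(num_servers)]
    balancer = cycle(servers)
    return sum(next(balancer).is_allowed() for _ in range(requests))


allowed = simulate(num_servers=3, limit=100, requests=1000)
print(allowed)  # 300: three times the intended limit of 100
```

Each instance enforces the limit correctly from its own point of view, which is exactly why the bug is easy to miss in testing with a single instance.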
The Four Core Problems
Problem 1: No Shared State
Each application instance maintains its own in-memory counter. These counters are not
synchronized. The total rate can be limit * number_of_instances.
Problem 2: Race Conditions
Even with a shared Redis, multiple instances can race to read-then-write:
Time 0: Server A reads count=99, Server B reads count=99 (both see 99 < 100)
Time 1: Server A writes count=100 (increments to 100)
Time 2: Server B writes count=100 (ALSO increments to 100 - both allowed!)
Result: 2 requests allowed when only 1 should be permitted
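The interleaving above can be replayed deterministically without Redis. The sketch below uses a hypothetical in-memory `FakeStore` as a stand-in for the shared counter, and contrasts read-then-write with an atomic increment of the kind Redis INCR provides.

```python
class FakeStore:
    """In-memory stand-in for a shared counter store (illustration only)."""

    def __init__(self, value: int = 0):
        self.value = value

    def get(self) -> int:
        return self.value

    def set(self, value: int) -> None:
        self.value = value

    def incr(self) -> int:
        # Models an atomic read-modify-write such as Redis INCR
        self.value += 1
        return self.value


LIMIT = 100


def racy_interleaving() -> tuple[int, int]:
    """Both servers read before either writes. Returns (allowed, final_count)."""
    store = FakeStore(99)
    a_seen = store.get()  # Server A reads 99
    b_seen = store.get()  # Server B reads 99
    allowed = 0
    for seen in (a_seen, b_seen):
        if seen < LIMIT:
            store.set(seen + 1)  # both write 100
            allowed += 1
    return allowed, store.value


def atomic_interleaving() -> tuple[int, int]:
    """Each server increments atomically and checks the value it got back."""
    store = FakeStore(99)
    results = [store.incr(), store.incr()]  # 100, then 101
    allowed = sum(1 for count in results if count <= LIMIT)
    return allowed, store.value


print(racy_interleaving())    # (2, 100): two requests allowed, one increment lost
print(atomic_interleaving())  # (1, 101): only the request that got 100 is allowed
```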
Problem 3: Clock Skew
Different servers have slightly different system clocks. In fixed-window rate limiting,
the window boundaries are based on the current time. If two servers disagree on the time
by even 1 second, they may count in different windows, splitting the count.
True time: 12:00:59.800 (200 ms before the window boundary)
Server A clock (0.5 s fast): 12:01:00.300
Server B clock (0.2 s slow): 12:00:59.600
With one-minute windows:
Server A counts the request in the 12:01 window
Server B counts the request in the 12:00 window
Requests that arrive at the same instant are split across two windows, so each window
appears to have fewer requests than it really received.
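A short sketch of the same effect in code. The numbers are assumptions chosen for illustration: a true time 200 ms before a minute boundary, one clock running 0.5 s fast and one running 0.2 s slow.

```python
from datetime import datetime, timedelta, timezone

WINDOW_SECONDS = 60


def window_id(local_clock: datetime) -> int:
    """Fixed-window ID as each server would compute it from its own clock."""
    return int(local_clock.timestamp()) // WINDOW_SECONDS


# Assumed "true" time: 200 ms before the 12:01:00 boundary
true_time = datetime(2025, 1, 1, 12, 0, 59, 800_000, tzinfo=timezone.utc)
server_a = true_time + timedelta(milliseconds=500)  # clock runs 0.5 s fast
server_b = true_time - timedelta(milliseconds=200)  # clock runs 0.2 s slow

# The same instant lands in two different windows
print(window_id(server_a) - window_id(server_b))  # 1
```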
Problem 4: Network Latency
Every rate limit check requires a round trip to Redis. At p99, this can add 5-20ms of
latency. For high-frequency endpoints, this overhead is significant.
2. Centralized Rate Limiting
Architecture
All rate limit state lives in a single shared store (Redis). Every application instance
queries this store for every request.
[Server A] --+
[Server B] --+--> [Redis Cluster] (single source of truth)
[Server C] --+
Implementation (as seen in Part 3)
The implementations using Redis INCR, ZADD, or Lua scripts in Part 3 ARE centralized
rate limiting. The key insight: by pushing all state to Redis, all instances share the
same counter.
Redis Key Design for Centralized Rate Limiting
# Good: Namespaced, scannable, evictable
key = f"rl:{algorithm}:{entity_type}:{identifier}:{window}"
# Examples:
"rl:fw:user:user_123:1735689600" # Fixed window, user, window ID
"rl:sw:apikey:abc123" # Sliding window, API key
"rl:tb:ip:203.0.113.42" # Token bucket, IP address
"rl:fw:global:system:1735689600" # Global limit
# Key length matters: binary_remote_addr vs remote_addr in Nginx
# IPv4: 4 bytes vs up to 15 characters
# Use hashing for long identifiers
import hashlib
def make_redis_key(prefix: str, identifier: str) -> str:
    if len(identifier) > 64:
        identifier = hashlib.sha256(identifier.encode()).hexdigest()[:16]
    return f"{prefix}:{identifier}"
Pros and Cons of Centralized
| Pros | Cons |
|---|---|
| Accurate: single source of truth | Extra network hop per request (~1-5ms) |
| Works with any number of instances | Redis is a single point of failure |
| No complex synchronization needed | Redis can become a bottleneck at very high QPS |
| Easy to implement with Redis | Requires Redis infrastructure |
Capacity Planning for Redis Rate Limiting
Calculation:
Users: 1,000,000
Keys per user: 3 (per-second, per-minute, per-day)
Key size: ~50 bytes average
Value size: 8 bytes (int64 counter)
TTL overhead: minimal (Redis handles internally)
Memory estimate:
1,000,000 users x 3 keys x (50 + 8) bytes = ~174 MB
For sliding window log at 1000 limit:
1,000,000 users x 1000 entries x 20 bytes = ~20 GB (!)
-> This is why sliding window log is rarely used at scale
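The estimates above are plain multiplication, so they are easy to re-run with your own numbers. The helper names below are hypothetical. The results are raw payload sizes only; Redis adds per-key and per-entry overhead on top, so treat them as lower bounds.

```python
def fixed_window_bytes(users: int, keys_per_user: int, key_bytes: int, value_bytes: int) -> int:
    """Raw payload for counter-style limiters: one small value per key."""
    return users * keys_per_user * (key_bytes + value_bytes)


def sliding_log_bytes(users: int, limit: int, entry_bytes: int) -> int:
    """Raw payload for a sliding window log: one entry per request, up to the limit."""
    return users * limit * entry_bytes


counters = fixed_window_bytes(users=1_000_000, keys_per_user=3, key_bytes=50, value_bytes=8)
log = sliding_log_bytes(users=1_000_000, limit=1000, entry_bytes=20)

print(f"{counters / 1e6:.0f} MB")  # 174 MB
print(f"{log / 1e9:.0f} GB")       # 20 GB
```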
3. Decentralized (Local) Rate Limiting
Architecture
Each server instance maintains its own in-memory rate limiter. No shared state.
[Server A] --> [Local Limiter A: count=15]
[Server B] --> [Local Limiter B: count=12]
[Server C] --> [Local Limiter C: count=20]
When Local Rate Limiting Is Acceptable
Surprisingly, local rate limiting IS acceptable in specific scenarios:
- Single-instance deployments: Development environments, small services
- Load balanced sticky sessions: User always hits the same server (see section 8)
- Approximate global limits: You want to limit to ~N per server, total is ~N*instances
- Protecting individual server resources: CPU, memory, connections per server
- Rate limiting outbound calls: When YOUR code calls external APIs, local bucket is fine
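For the outbound-call case in the list above, a local token bucket is usually enough. The following is a minimal sketch, not the Bucket4j algorithm: the class name and the injectable `clock` parameter are assumptions made so the behavior can be tested without sleeping, and the class is not thread-safe as written.

```python
import time
from typing import Callable


class LocalTokenBucket:
    """Single-process token bucket (illustration only, not thread-safe)."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock so the example is deterministic
fake_now = [0.0]
bucket = LocalTokenBucket(capacity=5, refill_per_second=1, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(6)]       # five succeed, the sixth is rejected
fake_now[0] += 2.0                                      # two seconds pass: two tokens refilled
after_wait = [bucket.try_acquire() for _ in range(3)]   # two succeed, the third is rejected
```

In a multi-threaded client, wrap `try_acquire` in a lock, as the Python limiter later in this section does.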
Java In-Memory Implementation (Bucket4j)
@Service
public class LocalRateLimitService {
    // ConcurrentHashMap for thread safety
    private final ConcurrentHashMap<String, Bucket> userBuckets = new ConcurrentHashMap<>();
    // Scheduled cleanup to prevent memory leaks
    @Scheduled(fixedDelay = 300_000) // every 5 minutes
    public void cleanupStaleBuckets() {
        // Remove buckets that haven't been used recently
        // In production, use Caffeine cache with expiry instead
        userBuckets.entrySet().removeIf(entry ->
            entry.getValue().getAvailableTokens() == entry.getValue().asVerbose()
                .getConfiguration().getBandwidths()[0].getCapacity()
        );
    }
    public boolean isAllowed(String userId) {
        Bucket bucket = userBuckets.computeIfAbsent(userId, key ->
            Bucket4j.builder()
                .addLimit(Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1))))
                .build()
        );
        return bucket.tryConsume(1);
    }
}
Memory Leak Warning
Local in-memory rate limiters MUST have a cleanup strategy:
from cachetools import TTLCache
import threading
class LocalRateLimiter:
    """
    Local rate limiter with automatic cleanup via TTL cache.
    Each user's counter expires once it has not been written for a full window.
    """
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # TTLCache evicts entries when maxsize is exceeded or the ttl elapses
        # Note: every write restarts the entry's TTL, so the window is measured
        # from the most recent allowed request, not from the first one
        self.cache = TTLCache(maxsize=100_000, ttl=window_seconds)
        self.lock = threading.Lock()
    def is_allowed(self, identifier: str) -> bool:
        with self.lock:
            count = self.cache.get(identifier, 0)
            if count >= self.limit:
                return False
            self.cache[identifier] = count + 1
            return True
4. Race Conditions and Atomicity
The Classic Race Condition
Redis key: "rl:user123" = 99
Limit: 100
Thread A: GET "rl:user123" -> 99 (reads 99)
Thread B: GET "rl:user123" -> 99 (reads 99)
Thread A: 99 < 100 -> allowed! SET "rl:user123" 100
Thread B: 99 < 100 -> allowed! SET "rl:user123" 100
Result: Both allowed. But only one should have been.
Why INCR Solves the Counter Problem
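Redis INCR is atomic: it reads, increments, and returns the new value in a single operation.
A TOCTOU (Time of Check, Time of Use) vulnerability remains only when the limit check
happens before the increment, as in GET followed by INCR. Incrementing first and then
checking the value INCR returned closes that gap.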
# INCORRECT: Non-atomic check-then-increment
count = redis.get(key)
if int(count or 0) < limit:
    redis.incr(key) # Another client may increment between the GET and this INCR
    return True
return False
# BETTER: Atomic increment, then check
count = redis.incr(key) # Atomic! No race condition in the increment itself
if count == 1:
    redis.expire(key, window_seconds)
return count <= limit
# Still a minor issue: INCR and EXPIRE are separate commands, so a crash between
# them can leave the key without a TTL
# But the count check is now safe
The Remaining Problem with INCR
Even with INCR, there is a subtle issue: if 1000 concurrent requests all INCR and get
values 1-1000, the first 100 (those whose INCR returned 1-100) will be allowed. This is
CORRECT behavior. The race condition above where two clients both see 99 is eliminated.
However, INCR followed by EXPIRE is not atomic:
count = redis.incr(key) # Client A: gets 1
# CRASH HERE - Client A's process dies before it can call EXPIRE
if count == 1:
    redis.expire(key, 60) # Never runs, so no TTL is set
# Client B: gets 2, count != 1, doesn't set TTL
# Key lives FOREVER
# Fix (Redis 7.0+): call EXPIRE on every request, but only if the key has no TTL yet
count = redis.incr(key)
redis.expire(key, 60, nx=True) # NX: never extends a TTL that is already set
# Without NX, calling EXPIRE on every request keeps pushing the TTL forward, so a
# key without a window ID in its name would never reset under steady traffic
# Alternative: create the key with SET key 0 NX EX 60 before the INCR
# Or use a Lua script
5. Redis Lua Scripts for Atomicity
Lua scripts in Redis execute atomically. The entire script runs without interruption
from any other Redis command. This is the correct solution for complex rate limiting logic.
Complete Lua Script Reference
-- atomic_rate_limit.lua
-- Implements sliding window counter with atomic check-and-increment
--
-- KEYS[1]: base key for this rate limit
-- ARGV[1]: limit (max requests per window)
-- ARGV[2]: window_size (in seconds)
-- ARGV[3]: current_time (Unix timestamp, integer seconds)
-- Returns: {allowed (0/1), current_count, remaining, reset_at}
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Determine window boundaries
local current_window_id = math.floor(now / window)
local prev_window_id = current_window_id - 1
local curr_key = key .. ':' .. current_window_id
local prev_key = key .. ':' .. prev_window_id
-- Get counts for current and previous windows
local curr_count = tonumber(redis.call('GET', curr_key) or '0')
local prev_count = tonumber(redis.call('GET', prev_key) or '0')
-- How far are we into the current window? (0.0 to 1.0)
local window_elapsed = (now % window) / window
-- Estimate using sliding window: prev * (1 - elapsed) + curr
local estimated = prev_count * (1 - window_elapsed) + curr_count
if estimated < limit then
    -- Allow: increment current window
    local new_count = redis.call('INCR', curr_key)
    if new_count == 1 then
        redis.call('EXPIRE', curr_key, window * 2)
    end
    local reset_at = (current_window_id + 1) * window
    local remaining = limit - math.floor(estimated) - 1
    return {1, math.floor(estimated) + 1, math.max(0, remaining), reset_at}
else
    -- Deny
    local reset_at = (current_window_id + 1) * window
    return {0, math.floor(estimated), 0, reset_at}
end
Using the Script in Python
import redis
import time
class AtomicSlidingWindowLimiter:
    SCRIPT = """
    local key = KEYS[1]
    local limit = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local current_window_id = math.floor(now / window)
    local prev_window_id = current_window_id - 1
    local curr_key = key .. ':' .. current_window_id
    local prev_key = key .. ':' .. prev_window_id
    local curr_count = tonumber(redis.call('GET', curr_key) or '0')
    local prev_count = tonumber(redis.call('GET', prev_key) or '0')
    local window_elapsed = (now % window) / window
    local estimated = prev_count * (1 - window_elapsed) + curr_count
    if estimated < limit then
        local new_count = redis.call('INCR', curr_key)
        if new_count == 1 then redis.call('EXPIRE', curr_key, window * 2) end
        local reset_at = (current_window_id + 1) * window
        return {1, math.floor(estimated) + 1, math.max(0, limit - math.floor(estimated) - 1), reset_at}
    else
        return {0, math.floor(estimated), 0, (current_window_id + 1) * window}
    end
    """
    def __init__(self, r: redis.Redis, limit: int, window_seconds: int):
        self.r = r
        self.limit = limit
        self.window_seconds = window_seconds
        self._script = r.register_script(self.SCRIPT)
    def is_allowed(self, identifier: str) -> dict:
        now = int(time.time())
        key = f"rl:sw:{identifier}"
        result = self._script(keys=[key], args=[self.limit, self.window_seconds, now])
        return {
            "allowed": bool(int(result[0])),
            "current_count": int(result[1]),
            "remaining": int(result[2]),
            "reset_at": int(result[3]),
            "limit": self.limit
        }
Important Lua Script Constraints
1. Lua scripts must be deterministic: same inputs -> same outputs
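2. Do NOT read the clock inside the script if it must run on Redis versions before 5.0:
there, scripts are replicated verbatim, and a script that calls TIME and then writes
is rejected as non-deterministic.
Instead: pass current time as an ARGV argument from the client, and keep client clocks
synchronized (see Problem 3). On Redis 5.0+, where script effects are replicated,
redis.call('TIME') is an option that gives every client the same clock.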
3. All keys a Lua script will access MUST be declared in KEYS[]
(Required for Redis Cluster to route correctly)
4. Keep Lua scripts short: they block other Redis commands while running
5. Use EVALSHA instead of EVAL in production to avoid resending the script
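Because the script takes the current time as an argument, its logic can be checked without a Redis server. The class below is a hypothetical in-memory model that mirrors the sliding window estimate from the Lua script (`prev * (1 - elapsed) + curr`). It is a sketch for reasoning and unit tests, not a replacement for the atomic script.

```python
from collections import defaultdict


class SlidingWindowCounterModel:
    """In-memory model of the sliding window counter estimate (illustration only)."""

    def __init__(self, limit: int, window: int):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # window_id -> request count

    def is_allowed(self, now: int) -> bool:
        current_id = now // self.window
        # How far we are into the current window, from 0.0 up to (not including) 1.0
        elapsed = (now % self.window) / self.window
        estimated = self.counts[current_id - 1] * (1 - elapsed) + self.counts[current_id]
        if estimated < self.limit:
            self.counts[current_id] += 1
            return True
        return False


model = SlidingWindowCounterModel(limit=10, window=60)
first = [model.is_allowed(59) for _ in range(11)]   # ten allowed, the eleventh denied
at_boundary = model.is_allowed(60)                  # previous window still counts in full
second = [model.is_allowed(90) for _ in range(6)]   # previous window now weighs 50%
```

At `now=60` the previous window still contributes all 10 requests, so the request is denied. At `now=90` it contributes 5, which leaves room for 5 more.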
6. Redis Transactions vs Lua Scripts
MULTI/EXEC Transactions
# Redis MULTI/EXEC: optimistic locking with WATCH
import redis
def is_allowed_with_transaction(r, key, limit):
    with r.pipeline() as pipe:
        while True:
            try:
                # Watch the key for changes
                pipe.watch(key)
                count = int(pipe.get(key) or 0)
                if count >= limit:
                    pipe.reset()
                    return False
                # Start transaction
                pipe.multi()
                pipe.incr(key)
                pipe.expire(key, 60)
                pipe.execute() # Fails if key changed since WATCH
                return True
            except redis.WatchError:
                # Another client modified the key - retry
                continue # This loop is unbounded - dangerous under high contention!
Why Lua Scripts Win Over MULTI/EXEC
| Aspect | MULTI/EXEC | Lua Script |
|---|---|---|
| Atomicity | Yes (if no WATCH conflict) | Yes (always) |
| Retry needed | Yes (on WATCH conflict) | No |
| Network round trips | 3+ (WATCH, MULTI, EXEC) | 1 (EVAL) |
| Performance | Degrades under contention | Constant |
| Complexity | Higher (retry loop) | Lower |
| Use case | Complex logic with reads | Rate limiting (ideal) |
Use Lua scripts for rate limiting. Always.
7. Redis Cluster Considerations
The Hash Slot Problem
Redis Cluster distributes data across 16,384 hash slots. Keys are assigned to slots based
on CRC16 hash. A rate limiter that uses multiple keys (e.g., current window + previous window)
can end up on DIFFERENT Redis nodes.
Lua scripts CANNOT access keys on different nodes. This will fail:
-- This will fail in Redis Cluster if key1 and key2 are on different nodes
local curr = redis.call('GET', KEYS[1]) -- node A
local prev = redis.call('GET', KEYS[2]) -- node B (CROSS-SLOT ERROR!)
Solution: Hash Tags
Force related keys to the same slot by using a hash tag {...}:
# Without hash tags: keys on different slots (BROKEN in cluster)
curr_key = f"rl:sw:{user_id}:curr" # slot = CRC16("rl:sw:user123:curr") % 16384
prev_key = f"rl:sw:{user_id}:prev" # slot = CRC16("rl:sw:user123:prev") % 16384
# With hash tags: same user, same slot (CORRECT)
curr_key = f"rl:sw:{{{user_id}}}:curr" # slot = CRC16("user123") % 16384
prev_key = f"rl:sw:{{{user_id}}}:prev" # slot = CRC16("user123") % 16384
# Both keys go to the same slot because {user_id} determines the hash
Lua Script with Hash Tags
SCRIPT = """
-- KEYS[1] = 'rl:sw:{<user_id>}:curr', KEYS[2] = 'rl:sw:{<user_id>}:prev'
-- The client builds both names with the same hash tag and passes them in KEYS[],
-- so the cluster client can route the script to the node that owns the slot
local curr_key = KEYS[1]
local prev_key = KEYS[2]
-- Now both keys are guaranteed to be on the same slot
local curr = tonumber(redis.call('GET', curr_key) or '0')
local prev = tonumber(redis.call('GET', prev_key) or '0')
...
"""
Redis Sentinel vs Cluster vs Standalone
| Mode | Use Case | Rate Limiting Notes |
|---|---|---|
| Standalone | Dev, small scale | Simplest. Works with all patterns. |
| Sentinel | HA, no sharding | Automatic failover. Same as standalone for rate limiting. |
| Cluster | Very high QPS, large data | Use hash tags! All keys per user must be co-located. |
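The hash-tag rule from this section can be checked offline. The sketch below follows the slot calculation as described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key, or of the text between the first `{` and the next `}` when that text is non-empty, modulo 16384. `hash_slot` is a hypothetical helper name; Python's `binascii.crc_hqx` with an initial value of 0 is used here as the CRC16 implementation.

```python
import binascii


def hash_slot(key: str) -> int:
    """Approximate CLUSTER KEYSLOT: hash only the hash tag if one is present."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # An empty tag "{}" is ignored and the whole key is hashed
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


curr_slot = hash_slot("rl:sw:{user123}:curr")
prev_slot = hash_slot("rl:sw:{user123}:prev")
print(curr_slot == prev_slot == hash_slot("user123"))  # True: the tag alone decides the slot
```

When in doubt, compare against `CLUSTER KEYSLOT <key>` on a real cluster before relying on a local calculation.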
8. Sticky Sessions Approach
What It Is
With sticky sessions (session affinity), the load balancer routes ALL requests from a
specific user/IP to the SAME application server. That server can then do local in-memory
rate limiting.
[User 123] --> [Load Balancer] --> [Server A] (always, based on user ID hash)
[User 456] --> [Load Balancer] --> [Server B] (always)
[User 789] --> [Load Balancer] --> [Server C] (always)
Implementation
# Nginx sticky session based on IP
upstream backend {
    ip_hash;
    server server1:8080;
    server server2:8080;
    server server3:8080;
}
# AWS ELB: enable "Stickiness" in target group settings
# Stickiness type: Application-based / LB-based
# Duration: 1 day
When Sticky Sessions Break Down
Sticky sessions seem like an elegant solution but fail in several scenarios:
- Server failure: When Server A dies, all its users are redistributed. Their rate
limit counters reset to zero. A user who was at 99/100 requests can immediately make
another 100 requests.
- Scaling out: Adding a new server changes the hash distribution. Users that were
on Server A may move to Server D. Counters reset.
- Mobile users: Mobile users change IPs frequently (WiFi to cellular). IP-hash
sticky sessions break with IP changes.
- Corporate proxies: Thousands of users behind the same proxy IP all go to the
same server. That server gets overloaded.
Conclusion: Sticky sessions are not a reliable solution for rate limiting. Use
centralized Redis instead.
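The scaling-out failure is easy to quantify. The sketch below assumes naive modulo hashing over a hypothetical `server_for` function; this is a simplification for illustration and not how Nginx `ip_hash` or any particular load balancer is implemented.

```python
import hashlib


def server_for(user_id: str, num_servers: int) -> int:
    """Naive sticky routing: stable hash of the user ID, modulo the server count."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


users = [f"user_{i}" for i in range(10_000)]

# Scale from 3 to 4 servers and count users whose server (and counter) changes
moved = sum(server_for(u, 3) != server_for(u, 4) for u in users)
fraction_moved = moved / len(users)
print(f"{fraction_moved:.0%} of users lose their in-memory counter")
```

With modulo hashing, roughly three quarters of users are remapped when going from 3 to 4 servers. Consistent hashing reduces the fraction but does not remove the counter resets for the users who do move.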
9. Hybrid: Local Approximate + Global Precise
The Problem to Solve
Centralized Redis adds 1-5ms per request. For an instance serving 10,000 RPS, that adds
up to 10-50 seconds of cumulative request time spent waiting on Redis for every second
of traffic, which ties up threads and connections even though little CPU is used.
The Hybrid Approach
Keep a local counter that approximates the rate, and only sync with Redis periodically.
[Server A: local_count=15, synced_5s_ago] -----> [Redis: global_count=47]
[Server B: local_count=12, synced_3s_ago] -/
[Server C: local_count=20, synced_7s_ago] -/
Implementation
import time
import redis
import threading
from dataclasses import dataclass, field
@dataclass
class LocalCounter:
    window_id: int = -1 # window the counts below belong to
    used: int = 0 # requests admitted locally in this window
    pending: int = 0 # admitted locally but not yet pushed to Redis
    last_sync: float = field(default_factory=time.time)
class HybridRateLimiter:
    """
    Hybrid rate limiter: local approximation + global Redis enforcement.
    Strategy:
    1. Each instance admits up to `local_limit` requests per window from memory
    2. Once the local quota is used up, requests are validated globally in Redis
    3. Locally admitted counts are flushed to Redis every `sync_interval` seconds
    While the local quota lasts, Redis sees one flush per interval instead of one
    call per request. The price is accuracy: unflushed counts are invisible to
    other instances, so the global limit can be overshot.
    """
    def __init__(
        self,
        r: redis.Redis,
        limit: int,
        window_seconds: int,
        # Each server reserves a "chunk" of the global limit
        # For 3 servers with limit=100: reserve_percent=0.4 -> each reserves 40
        # (120 in total, so up to 20 requests of overshoot; use 1/instances to avoid it)
        reserve_percent: float = 0.4,
        sync_interval: float = 1.0,
    ):
        self.r = r
        self.limit = limit
        self.window_seconds = window_seconds
        self.sync_interval = sync_interval
        self.local_limit = int(limit * reserve_percent)
        # Grows with the number of identifiers; apply the cleanup advice from section 3
        self.locals: dict[str, LocalCounter] = {}
        self.lock = threading.Lock()
        self._script = r.register_script("""
        local key = KEYS[1]
        local add = tonumber(ARGV[1])
        local limit = tonumber(ARGV[2])
        local window = tonumber(ARGV[3])
        local now = tonumber(ARGV[4])
        local window_id = math.floor(now / window)
        local full_key = key .. ':' .. window_id
        local count = redis.call('INCRBY', full_key, add)
        if count == add then
            redis.call('EXPIRE', full_key, window * 2)
        end
        if count <= limit then
            return {1, count}
        else
            -- Refund: we over-counted, subtract back
            redis.call('DECRBY', full_key, add)
            return {0, count - add}
        end
        """)
    def is_allowed(self, identifier: str, cost: int = 1) -> bool:
        now = time.time()
        window_id = int(now) // self.window_seconds
        with self.lock:
            local = self.locals.setdefault(identifier, LocalCounter())
            if local.window_id != window_id:
                # New window: flush what is left of the old one, then start fresh
                self._sync_to_redis(identifier, local)
                local.window_id, local.used, local.pending = window_id, 0, 0
            # Phase 1: Fast local check
            if local.used + cost <= self.local_limit:
                # Phase 2: Admit locally and remember the count for the next flush
                local.used += cost
                local.pending += cost
                # Phase 3: Periodically sync local counts to Redis
                if now - local.last_sync >= self.sync_interval:
                    self._sync_to_redis(identifier, local)
                return True
            # Local quota exhausted: flush first so Redis sees this instance's full count
            self._sync_to_redis(identifier, local)
        return self._global_check(identifier, cost)
    def _global_check(self, identifier: str, cost: int) -> bool:
        now = int(time.time())
        key = f"rl:hybrid:{identifier}"
        result = self._script(keys=[key], args=[cost, self.limit, self.window_seconds, now])
        return bool(int(result[0]))
    def _sync_to_redis(self, identifier: str, local: LocalCounter) -> None:
        """Flush accumulated local counts to Redis. Caller must hold self.lock."""
        local.last_sync = time.time()
        if local.pending == 0:
            return
        # Same key the Lua script builds: base key plus window ID
        key = f"rl:hybrid:{identifier}:{local.window_id}"
        try:
            # Unconditional add: these requests were already served, so no refund logic
            pipe = self.r.pipeline()
            pipe.incrby(key, local.pending)
            pipe.expire(key, self.window_seconds * 2)
            pipe.execute()
            local.pending = 0
        except redis.RedisError:
            # If Redis is down, keep the pending count and retry on the next sync
            pass
10. Rate Limiting in a Service Mesh
Envoy Proxy Rate Limiting
Envoy is the data plane used by Istio and many other service meshes. It supports both
local rate limiting (per Envoy instance) and global rate limiting (via external gRPC service).
# Envoy local rate limiting filter
- name: envoy.filters.http.local_ratelimit
  typed_config:
    "@type": type.googleapis.com/udpa.type.v1.TypedStruct
    type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
    value:
      stat_prefix: http_local_rate_limiter
      token_bucket:
        max_tokens: 1000
        tokens_per_fill: 1000
        fill_interval: 1s
      filter_enabled:
        runtime_key: local_rate_limit_enabled
        default_value:
          numerator: 100
          denominator: HUNDRED
      response_headers_to_add:
        - append: false
          header:
            key: x-local-rate-limit
            value: "true"
Global Rate Limit Service (gRPC)
For true distributed rate limiting in a service mesh, Envoy calls out to a global rate
limit service over gRPC before forwarding each request.
# Envoy global rate limit config
rate_limits:
  - actions:
      - request_headers:
          header_name: x-user-id
          descriptor_key: user_id
      - request_headers:
          header_name: ":path"
          descriptor_key: path
The external rate limit service (e.g., Lyft's ratelimit service) then decides allow/deny
based on the descriptor and configured rules.
Istio Rate Limiting
# Istio EnvoyFilter for rate limiting
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: filter-ratelimit
spec:
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.ratelimit
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: productpage-ratelimit
            failure_mode_deny: false # fail open
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: rate_limit_cluster
              transport_api_version: V3
11. Multi-Region Rate Limiting
The CAP Theorem Trade-Off
Multi-region rate limiting forces a fundamental choice:
- Consistency: Every region enforces the exact same limit
- Availability: Every region responds even if others are unreachable
- Partition Tolerance: Required (network partitions between regions happen)
You must choose between C and A. Rate limiting typically favors Availability (AP):
it is better to allow a few extra requests than to block all requests because a cross-region
connection failed.
Approaches
Approach 1: Separate limits per region
User limit: 1000/minute globally
Region US-EAST: 500/minute
Region EU-WEST: 300/minute
Region AP-SOUTH: 200/minute
Total enforced: 1000/minute
Users are redirected to specific regions based on geography. Each region enforces its
own limit independently with no cross-region communication.
Limitation: A user who routes through a VPN or CDN can exceed their limit by switching
regions.
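Splitting a global limit into regional shares is simple arithmetic, but rounding can make the shares sum to less than the global limit. The sketch below uses a hypothetical `split_limit` helper with integer weights (for example, traffic share in tenths) and gives any rounding remainder to the highest-weight region.

```python
def split_limit(global_limit: int, weights: dict[str, int]) -> dict[str, int]:
    """Split a global limit across regions in proportion to integer weights."""
    total = sum(weights.values())
    shares = {region: global_limit * w // total for region, w in weights.items()}
    # Integer division rounds down; hand the remainder to the highest-weight region
    remainder = global_limit - sum(shares.values())
    largest = max(weights, key=weights.get)
    shares[largest] += remainder
    return shares


limits = split_limit(1000, {"us-east": 5, "eu-west": 3, "ap-south": 2})
print(limits)  # {'us-east': 500, 'eu-west': 300, 'ap-south': 200}
```

The weights are an assumption of this sketch. In practice they would come from observed traffic per region and would need to be revisited as traffic shifts.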
Approach 2: Eventual consistency with local caches
Each region caches the global count, refreshed every N seconds.
Local check: local_count < local_limit
Background sync: push local increments to global store periodically
Approach 3: Centralized global store with read replication
Write region (primary): All increments go to primary Redis
Read region (replica): Check reads can go to regional replica
Lag: Replica is ~5-50ms behind primary
This introduces a brief window where both primary and replica allow a request that the
global limit would deny.
Approach 4: Accept approximate limiting
For most APIs, a 5-10% overshoot of the limit due to multi-region lag is acceptable.
Enforce exact limits only on the most critical endpoints (payment APIs, etc.) using
synchronous cross-region calls.
12. Handling Redis Failures: Fail-Open vs Fail-Closed
The Dilemma
If your Redis rate limiter goes down, what should happen to incoming requests?
Fail-Open (Allow):
def is_allowed(identifier: str) -> bool:
    try:
        # redis_client.execute_rate_limit_check is a placeholder for your own
        # Redis-backed check, not a redis-py API
        result = redis_client.execute_rate_limit_check(identifier)
        return result.allowed
    except redis.RedisError:
        logger.error("Redis down! Failing open (allowing all requests)")
        return True # Allow when Redis is unavailable
Pros: Service stays available. Users not impacted.
Cons: Opens door to abuse. All rate limits bypassed.
Fail-Closed (Deny):
def is_allowed(identifier: str) -> bool:
    try:
        # redis_client.execute_rate_limit_check is a placeholder for your own
        # Redis-backed check, not a redis-py API
        result = redis_client.execute_rate_limit_check(identifier)
        return result.allowed
    except redis.RedisError:
        logger.error("Redis down! Failing closed (denying all requests)")
        return False # Deny when Redis is unavailable
Pros: Security maintained. No abuse possible during outage.
Cons: Service unavailable. All users impacted, not just bad actors.
Recommended: Fail-Open with Local Circuit Breaker
import time
from enum import Enum
class CircuitState(Enum):
    CLOSED = "closed" # Normal: using Redis
    OPEN = "open" # Failure: using local fallback
    HALF_OPEN = "half_open" # Testing: trying Redis again
class ResilientRateLimiter:
    """
    Rate limiter with circuit breaker pattern.
    Falls back to local in-memory limiting when Redis is unavailable.
    Both limiters must return the same result type from is_allowed().
    Not thread-safe as written; guard the state fields with a lock if shared.
    """
    def __init__(self, redis_limiter, local_limiter, failure_threshold=5, recovery_timeout=60):
        self.redis_limiter = redis_limiter
        self.local_limiter = local_limiter
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = CircuitState.CLOSED
        self.last_failure_time = None
    def is_allowed(self, identifier: str) -> dict:
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                # Circuit open: use local limiter as fallback
                return self.local_limiter.is_allowed(identifier)
        try:
            result = self.redis_limiter.is_allowed(identifier)
            # Success: close the circuit and clear the consecutive-failure count
            self.state = CircuitState.CLOSED
            self.failure_count = 0
            return result
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = CircuitState.OPEN
                # Alert! Rate limiter is down
                # alert_system.send("Redis rate limiter DOWN, using local fallback")
            # Fallback to local limiting
            return self.local_limiter.is_allowed(identifier)
Decision Guide: Fail-Open vs Fail-Closed
| Scenario | Recommendation |
|---|---|
| Public API (general use) | Fail-Open. User experience > perfect enforcement. |
| Payment / Financial API | Fail-Closed or strict local fallback. |
| Authentication endpoints | Fail-Closed. Security critical. |
| Read-only endpoints | Fail-Open. Lower abuse risk, though scraping is still possible. |
| Internal service calls | Fail-Open. All callers are trusted. |
| Free tier API | Fail-Open with monitoring. Alert on anomalies. |
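The table above can be encoded as a per-endpoint policy so the failure mode is a configuration choice rather than something hard-coded in each handler. This is a minimal sketch: the endpoint class names, the `POLICY` mapping, and `check_with_policy` are hypothetical, and the built-in `ConnectionError` stands in for whatever exception your Redis client raises (for example `redis.RedisError` in redis-py).

```python
from enum import Enum
from typing import Callable


class FailureMode(Enum):
    OPEN = "open"      # allow requests when the limiter backend is unreachable
    CLOSED = "closed"  # deny requests when the limiter backend is unreachable


# Hypothetical endpoint classes mapped to failure modes, following the table above
POLICY = {
    "public": FailureMode.OPEN,
    "payment": FailureMode.CLOSED,
    "auth": FailureMode.CLOSED,
    "internal": FailureMode.OPEN,
}


def check_with_policy(endpoint_class: str, check: Callable[[], bool]) -> bool:
    """Run the rate limit check; on backend failure, apply the endpoint's policy."""
    try:
        return check()
    except ConnectionError:
        # Assumption of this sketch: unknown endpoint classes fail closed
        mode = POLICY.get(endpoint_class, FailureMode.CLOSED)
        return mode is FailureMode.OPEN


def broken_backend() -> bool:
    raise ConnectionError("rate limit store unreachable")


print(check_with_policy("public", broken_backend))   # True: fail open
print(check_with_policy("payment", broken_backend))  # False: fail closed
```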
Summary
| Challenge | Solution |
|---|---|
| Multiple instances, no shared state | Centralized Redis rate limiter |
| Race conditions | Redis Lua scripts (atomic execution) |
| Clock skew | Use one time source: synchronized client clocks passed as ARGV, or Redis server time |
| Redis Cluster key routing | Use hash tags {user_id} in key names |
| High Redis latency | Hybrid local + global approach |
| Redis failure | Circuit breaker with local fallback |
| Multi-region accuracy | Accept approximation or use primary-region enforcement |
| Sticky sessions failure | Do not rely on sticky sessions for rate limiting |
Next: Part 5 - Advanced Concepts and Industry Practices
Learn adaptive rate limiting, tiered systems, how Twitter, GitHub, and Stripe do it,
and the tips, pitfalls, and anti-patterns that matter most in production.