Rate Limiting - Supplement 1: Anti-Patterns Extended Deep Dive
This supplement extends Part 5 with 25 additional anti-patterns not covered earlier.
Each includes: the pattern name, how it manifests, why it is dangerous,
a broken code example, a fixed example, and production impact.
Table of Contents
Infrastructure Anti-Patterns
1. The In-Process Island
2. The Connection Pool Killer
3. The Oversized Lua Script
4. The Missing Hash Tag
5. The TTL Roulette
6. The Unbounded Key Space
7. The Shared Counter Spaghetti
Business Logic Anti-Patterns
8. The Zero-Context Limiter
9. The Quota Without a Rate Limit
10. The Grandfathered Exemption
11. The Asymmetric Read-Write Limit
12. The Rolling Reset Surprise
13. The Cost-Blind Limiter
Operational Anti-Patterns
14. The One Environment for All
15. The Unmonitored Limiter
16. The Stale Configuration
17. The Incident Blackout
18. The Cascading Quota Drain
Client-Side Anti-Patterns
19. The Aggressive Poller
20. The Fan-Out Bomb
21. The Missing Cache Layer
22. The Synchronous Bulk Stampede
Security Anti-Patterns
23. The Predictable Window Attack Surface
24. The Shared API Key Bypass
25. The Unvalidated Forwarded IP
Infrastructure Anti-Patterns
AP-1: The In-Process Island
What it looks like:
Each application server maintains its own in-memory rate limit counters. No shared state.
# BROKEN: In-memory limiter on a clustered deployment
from collections import defaultdict
import time

# This lives in process memory. Each of your 10 servers has its OWN copy.
_counters = defaultdict(int)
_windows = defaultdict(float)

def is_allowed(user_id: str, limit: int = 100) -> bool:
    now = time.time()
    if now - _windows[user_id] >= 60:
        _counters[user_id] = 0
        _windows[user_id] = now
    _counters[user_id] += 1
    return _counters[user_id] <= limit

Why it is dangerous:
With 10 application servers and round-robin load balancing, each server sees roughly 1/10
of traffic. A user can make 1,000 requests per minute (10x the limit) by spreading them
across servers and being within limit on each individual server.
User's actual requests: 1,000/min
  Server A sees: 100/min -> ALLOW (within limit)
  Server B sees: 100/min -> ALLOW (within limit)
  ...
  Server J sees: 100/min -> ALLOW (within limit)
Effective limit: 1,000/min, 10x the intended 100/min. The limit grows with the server count.
The Fix:
Use Redis as the shared counter store. This is the most important requirement for any
rate limiter in a horizontally-scaled system.
import redis
import time

r = redis.Redis(host="redis-cluster", decode_responses=True)

def is_allowed(user_id: str, limit: int = 100, window: int = 60) -> bool:
    now = int(time.time())
    window_id = now // window
    key = f"rl:{user_id}:{window_id}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window * 2)
    return count <= limit

# Now all 10 servers increment the SAME Redis key.
# The user sees exactly the intended limit regardless of which server handles them.

Production Impact: High. In-process rate limiters are one of the most common
rate limiting mistakes in production. They are often discovered only during load testing or after an incident.
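The difference between per-process and shared counters can be reproduced without any infrastructure. The following is a minimal sketch, assuming perfect round-robin balancing and a single fixed window; `WindowCounter` and `simulate` are hypothetical names used only for this illustration, and the shared store stands in for Redis.

```python
from collections import defaultdict
from itertools import cycle


class WindowCounter:
    """Counter for one fixed window; stands in for process memory or for Redis."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = defaultdict(int)

    def hit(self, user_id: str, limit: int) -> bool:
        self.counts[user_id] += 1
        return self.counts[user_id] <= limit


def simulate(total_requests: int, servers: int, limit: int, shared: bool) -> int:
    """Return how many requests from one user are allowed within a single window."""
    shared_store = WindowCounter()
    # shared=True: every server uses the same store. shared=False: one store per server.
    stores = [shared_store if shared else WindowCounter() for _ in range(servers)]
    round_robin = cycle(stores)
    return sum(next(round_robin).hit("user123", limit) for _ in range(total_requests))


in_process = simulate(1000, servers=10, limit=100, shared=False)
shared = simulate(1000, servers=10, limit=100, shared=True)
print(in_process, shared)  # 1000 100
```

With ten isolated counters all 1,000 requests pass; with one shared counter only the first 100 do.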
AP-2: The Connection Pool Killer
What it looks like:
Rate limiter creates a new Redis connection per request, or uses a pool so small that
it exhausts under load.
# BROKEN: New connection per request
def is_allowed(user_id: str) -> bool:
    r = redis.Redis(host="redis")  # NEW CONNECTION EVERY REQUEST
    count = r.incr(f"rl:{user_id}")
    r.expire(f"rl:{user_id}", 60)
    r.close()
    return count <= 100

# BROKEN: Lambda-style code with same problem
# AWS Lambda can spawn 1000+ concurrent instances
# Each instance tries to create a new Redis connection
# Redis default max connections = 10,000
# 1000 Lambda instances x 5 connections each = 5,000 connections used
# Plus your other services = connection exhaustion

Why it is dangerous:
- Each TCP connection has overhead: memory, file descriptor, TLS handshake
- Connection establishment takes 1-10ms (defeats the purpose of a fast rate check)
- Redis has a connection limit (default 10,000). Exhausting it makes Redis unavailable
for ALL services, not just the rate limiter
The Fix:
# CORRECT: Connection pool, created once at module load
import redis

# Created ONCE at startup, reused across all requests in this process
_pool = redis.ConnectionPool(
    host="redis",
    port=6379,
    db=1,                        # separate DB for rate limiting
    max_connections=20,          # per-instance pool size
    socket_timeout=0.5,          # 500ms timeout - fail fast
    socket_connect_timeout=0.5,
    decode_responses=True,       # set on the pool: Redis() ignores it when a pool is passed in
)
_client = redis.Redis(connection_pool=_pool)

def is_allowed(user_id: str) -> bool:
    # _client is reused - borrows from pool, returns when done
    count = _client.incr(f"rl:{user_id}")
    if count == 1:
        _client.expire(f"rl:{user_id}", 60)
    return count <= 100

For AWS Lambda / Serverless:
# Lambda: connection pool per Lambda container (not per invocation)
# Lambda containers are reused across invocations - module-level code runs once
import redis
import os

# This runs once per Lambda CONTAINER, not once per invocation
_r = redis.Redis(
    host=os.environ["REDIS_HOST"],
    max_connections=5,   # Keep small: Lambda can have many containers
    socket_timeout=0.3,  # Very aggressive timeout for Lambda
)

def handler(event, context):
    # _r is reused from module-level initialization
    result = _r.incr("rl:key")
    ...

Production Impact: Critical in Lambda/containerized environments. Can silently
exhaust Redis connections for all services.
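The connection arithmetic above (instances x pool size against the Redis connection limit) is worth checking before choosing `max_connections`. This is a rough sizing sketch; `fleet_connections`, `max_pool_size`, and the `reserved_for_others` parameter are hypothetical names for this illustration, and the reserved figure is an assumed value, not a measurement.

```python
def fleet_connections(instances: int, pool_size: int) -> int:
    """Worst-case connections opened by a fleet where every pool is fully used."""
    return instances * pool_size


def max_pool_size(maxclients: int, reserved_for_others: int, instances: int) -> int:
    """Largest per-instance pool size that keeps the whole fleet under maxclients."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    budget = maxclients - reserved_for_others
    return max(0, budget // instances)


# 1,000 Lambda containers with 5 connections each, as in the example above
print(fleet_connections(1000, 5))  # 5000

# Assumed: 4,000 connections are reserved for other services on the same Redis
print(max_pool_size(maxclients=10_000, reserved_for_others=4_000, instances=1000))  # 6
```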
AP-3: The Oversized Lua Script
What it looks like:
A Lua script that does too much work, blocking Redis's single-threaded execution for
an excessive amount of time.
-- BROKEN: Lua script that iterates over ALL user keys to compute totals
-- This is pathological but variants appear in production
local pattern = "rl:user:*"
local cursor = "0"
local total = 0
repeat
    local result = redis.call("SCAN", cursor, "MATCH", pattern, "COUNT", 100)
    cursor = result[1]
    local keys = result[2]
    for _, key in ipairs(keys) do
        local val = tonumber(redis.call("GET", key) or "0")
        total = total + val
    end
until cursor == "0"
return total
-- This script runs for potentially SECONDS. Redis is blocked the entire time.
-- All other clients wait. p99 latency explodes.

Why it is dangerous:
Redis is single-threaded. A Lua script runs atomically - no other command can execute
while the script is running. A script that takes 100ms effectively freezes Redis for
all clients during that time.
Redis has a lua-time-limit (default 5000ms). A script that exceeds it is not stopped
automatically: Redis starts answering other clients with BUSY errors and accepts only
SCRIPT KILL or SHUTDOWN NOSAVE. SCRIPT KILL works only if the script has not yet performed a write.
The Fix:
-- CORRECT: Lua script does only what is necessary for ONE rate limit check
-- Keep scripts under 1ms execution time
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local window_id = math.floor(now / window)
local k = key .. ':' .. window_id
local count = redis.call('INCR', k)
if count == 1 then
    redis.call('EXPIRE', k, window * 2)
end
if count <= limit then
    return {1, count, limit - count}
end
return {0, count, 0}
-- This runs in microseconds. No scanning. No iteration. Just 2-3 commands.

Rules for Lua Scripts in Rate Limiters:
- No SCAN, no KEYS (walking the keyspace is O(n) in the number of keys)
- No loops over variable-length data
- No external calls (no HTTP from Lua)
- Test under load to verify < 1ms execution
- Monitor slow log: CONFIG SET slowlog-log-slower-than 1000 (the value is in microseconds, so this is a 1ms threshold)
Production Impact: Can take down Redis for all services. One slow script = total
outage for all Redis-dependent systems.
AP-4: The Missing Hash Tag
What it looks like:
Multi-key Lua scripts in Redis Cluster where keys land on different shards.
-- BROKEN: Two keys that may be on different Redis Cluster shards
local curr_key = KEYS[1] -- "rl:sw:user123:curr" -> may hash to one shard
local prev_key = KEYS[2] -- "rl:sw:user123:prev" -> may hash to a different shard
-- ERROR: redis.call('GET', prev_key) fails if prev_key is on a different shard
-- Redis Cluster will return: CROSSSLOT Keys in request don't hash to the same slot
local curr = tonumber(redis.call('GET', curr_key) or '0')
local prev = tonumber(redis.call('GET', prev_key) or '0')

Why it fails:
Redis Cluster assigns keys to hash slots: slot = CRC16(key) % 16384. Keys in different
slots can live on different shards. A Lua script cannot access keys in different slots.
Illustrative values (not the real checksums of these keys):
"rl:sw:user123:curr" -> CRC16 = 34891 -> slot 34891 % 16384 = 2123 -> shard 1
"rl:sw:user123:prev" -> CRC16 = 28143 -> slot 28143 % 16384 = 11759 -> shard 2
Different shards! Script fails with CROSSSLOT error.
The Fix: Hash Tags
-- CORRECT: Use {} hash tags to force both keys to the same slot
-- Only the content within {} is used for hash slot calculation
local curr_key = KEYS[1] -- "rl:sw:{user123}:curr" -> slot = CRC16("user123") % 16384
local prev_key = KEYS[2] -- "rl:sw:{user123}:prev" -> slot = CRC16("user123") % 16384
-- Both use "user123" for slotting -> guaranteed same shard -> Lua script works!
local curr = tonumber(redis.call('GET', curr_key) or '0')
local prev = tonumber(redis.call('GET', prev_key) or '0')

# Python: generate keys with hash tags
def make_rate_limit_keys(user_id: str, window_seconds: int, now: int) -> tuple[str, str]:
    window_id = now // window_seconds
    # Curly braces around user_id = hash tag
    curr_key = f"rl:sw:{{{user_id}}}:{window_id}"
    prev_key = f"rl:sw:{{{user_id}}}:{window_id - 1}"
    return curr_key, prev_key

# "rl:sw:{user123}:28956"
# "rl:sw:{user123}:28955"
# Both hash on "user123" -> always same Redis Cluster shard

Production Impact: Latent failure in Redis Cluster. Scripts work in standalone
Redis (no sharding) but fail with CROSSSLOT errors as soon as they run against a Cluster.
Often discovered only when the system is first deployed to a Cluster.
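Slot assignment can be checked offline before deploying to a Cluster. The sketch below assumes the scheme described in the Redis Cluster specification: the XMODEM variant of CRC16, and hashing only the text between the first `{` and the next `}` when that text is non-empty. `crc16_xmodem` and `key_slot` are hypothetical helper names; on a live cluster, `CLUSTER KEYSLOT` gives the authoritative answer.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot for a key, honoring a {hash tag} if one is present."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # An empty tag "{}" is ignored and the whole key is hashed
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Tagged keys share a slot because only "user123" is hashed
same = key_slot("rl:sw:{user123}:curr") == key_slot("rl:sw:{user123}:prev")
print(same)  # True
```

A check like this fits in a unit test for the key-building function, so a missing hash tag is caught before the first Cluster deployment.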
AP-5: The TTL Roulette
What it looks like:
Inconsistent, missing, or incorrectly calculated TTLs on rate limit keys.
# BROKEN: Pattern 1 - TTL set only conditionally
count = redis.incr(key)
if count == 1:
    redis.expire(key, 60)
# PROBLEM: If the process crashes between INCR and EXPIRE,
# the key has no TTL and lives FOREVER. One crashed process = permanent key.

# BROKEN: Pattern 2 - TTL shorter than the window
count = redis.incr(key)
redis.expire(key, 30)  # Window is 60s, TTL is 30s!
# After 30s, key disappears. Counter resets mid-window.
# Users can make 2x the limit: 100 requests in first 30s + 100 more after reset.

# BROKEN: Pattern 3 - TTL too long, wasting memory
count = redis.incr(key)
redis.expire(key, 86400)  # 1 day TTL for a 60-second window
# Key stays in Redis for 24 hours after user stops making requests.
# With 1M users: 1M stale keys consuming memory for 24 hours unnecessarily.

# BROKEN: Pattern 4 - No TTL at all (memory leak)
count = redis.incr(key)
# No expire call at all. Keys accumulate forever. Redis memory grows until OOM.

The Fix:
# CORRECT: Atomic pipeline, TTL = window_size * 2 (buffer for previous window)
import time

def is_allowed(r, key: str, limit: int, window: int) -> bool:
    now = int(time.time())
    window_id = now // window
    full_key = f"{key}:{window_id}"
    ttl = window * 2  # Keep key for 2 windows (needed by sliding window counter)
    pipe = r.pipeline()  # redis-py pipelines are transactional (MULTI/EXEC) by default
    pipe.incr(full_key)
    pipe.expire(full_key, ttl)  # Set EVERY time, not just on count==1
    results = pipe.execute()
    count = results[0]
    return count <= limit

# Alternative: a Lua script that increments and sets the TTL in one atomic step
def is_allowed_atomic(r, key: str, limit: int, window: int) -> bool:
    now = int(time.time())
    window_id = now // window
    full_key = f"{key}:{window_id}"
    ttl = window * 2
    # Lua script: atomic increment-and-expire
    script = """
    local count = redis.call('INCR', KEYS[1])
    redis.call('EXPIRE', KEYS[1], tonumber(ARGV[1]))
    return count
    """
    count = r.eval(script, 1, full_key, ttl)
    return int(count) <= limit

TTL Rules:
- Sliding window counter: TTL = window_size * 2 (need previous window)
- Fixed window: TTL = window_size + 10 (small buffer)
- Token bucket: TTL = capacity / refill_rate * 2 + 60 (time to refill from empty)
- Always set TTL on every request (not just on count==1) to handle process crashes
Production Impact: High. Memory leaks accumulate silently. Incorrect TTLs cause
incorrect rate limiting that is extremely hard to debug.
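The TTL rules above can be kept in one place so that no call site invents its own value. This is a minimal sketch, assuming seconds as the unit and a token bucket refill rate expressed in tokens per second; `ttl_seconds` and the algorithm names are hypothetical and chosen only for this illustration.

```python
import math


def ttl_seconds(algorithm: str, window: int = 0,
                capacity: int = 0, refill_rate: float = 0.0) -> int:
    """Return the key TTL in seconds for a given rate limiting algorithm."""
    if algorithm == "sliding_window_counter":
        return window * 2  # the previous window must still be readable
    if algorithm == "fixed_window":
        return window + 10  # small buffer past the end of the window
    if algorithm == "token_bucket":
        if refill_rate <= 0:
            raise ValueError("refill_rate must be positive")
        # Twice the time to refill from empty, plus a 60s buffer
        return math.ceil(capacity / refill_rate * 2 + 60)
    raise ValueError(f"unknown algorithm: {algorithm}")


print(ttl_seconds("sliding_window_counter", window=60))           # 120
print(ttl_seconds("fixed_window", window=60))                     # 70
print(ttl_seconds("token_bucket", capacity=100, refill_rate=10))  # 80
```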
AP-6: The Unbounded Key Space
What it looks like:
Rate limit keys are created based on arbitrary user input with no bounds on uniqueness.
# BROKEN: Key includes arbitrary user-supplied path
def rate_limit_by_path(ip: str, path: str) -> bool:
    # path comes from the URL - user controlled
    key = f"rl:{ip}:{path}"
    # Attacker sends: GET /api/xxxxxxxxxxxxxxxxxxxxxxxxxxx (random 500-char path)
    # Creates millions of unique keys, one per random path
    # Redis keyspace explodes. Memory fills. OOM kill.
    count = redis.incr(key)
    redis.expire(key, 60)
    return count <= 100

# BROKEN: Key includes User-Agent header
def rate_limit_by_ua(ip: str, user_agent: str) -> bool:
    key = f"rl:{ip}:{user_agent}"
    # User-Agent strings can be thousands of chars and completely arbitrary
    # Attacker rotates User-Agent: "Mozilla/5.0 ... [random 200 chars]"
    # Creates one new key per request. Keyspace explosion.
    ...

# BROKEN: Key includes arbitrary query parameters
key = f"rl:{user_id}:{request.query_string}"
# ?q=<random 1000 char string> -> unique key per request

Why it is dangerous:
- Redis stores all keys in memory. Unbounded unique keys = unbounded memory consumption.
- An attacker can craft requests to create millions of unique keys, exhausting Redis memory.
- This is a form of resource exhaustion / DoS attack against your rate limiter.
The Fix:
import hashlib
import re

# CORRECT: Normalize and hash all variable components
def make_rate_limit_key(
    user_id: str,
    endpoint: str,
    max_key_length: int = 200
) -> str:
    # Normalize endpoint: strip query params, normalize path params
    # /api/users/12345 -> /api/users/{id}
    normalized_endpoint = re.sub(r'/\d+', '/{id}', endpoint)
    normalized_endpoint = normalized_endpoint.split('?')[0]  # Remove query string
    raw_key = f"rl:{user_id}:{normalized_endpoint}"
    # If still too long, hash it.
    # Hashing bounds key LENGTH only: random paths still produce unique keys,
    # so combine this with the fixed endpoint set below to bound the key COUNT.
    if len(raw_key) > max_key_length:
        key_hash = hashlib.sha256(raw_key.encode()).hexdigest()[:16]
        return f"rl:h:{key_hash}"
    return raw_key

# CORRECT: Use enum/fixed set for endpoint rate limiting keys
RATE_LIMITED_ENDPOINTS = {
    "/api/users",
    "/api/orders",
    "/api/search",
    "/api/export",
}

def get_endpoint_key(path: str) -> str:
    # Only rate limit known, fixed endpoints - no arbitrary keys
    for endpoint in RATE_LIMITED_ENDPOINTS:
        if path.startswith(endpoint):
            return endpoint
    return "/api/other"  # catch-all bucket

Additional protection:
# Redis maxmemory and eviction policy
maxmemory 2gb
maxmemory-policy allkeys-lru
# LRU eviction removes the least-recently-used keys first, whatever they hold.
# Idle rate limit keys go before recently used data, but idle application data can be evicted too.
Production Impact: Can cause Redis OOM (Out of Memory) killing the process, or
triggering aggressive eviction that removes valid rate limit counters.
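The effect of a fixed endpoint set on key cardinality can be measured directly. The sketch below is an illustration under assumptions: `endpoint_bucket` and `random_path` are hypothetical helpers, the IP is a documentation address, and the match is made on a full path segment, so `/api/usersettings` does not fall into the `/api/users` bucket.

```python
import random
import string

KNOWN_ENDPOINTS = ("/api/users", "/api/orders", "/api/search", "/api/export")


def endpoint_bucket(path: str) -> str:
    """Map any request path to one of a fixed set of rate limit buckets."""
    path = path.split("?", 1)[0]  # drop the query string
    for endpoint in KNOWN_ENDPOINTS:
        if path == endpoint or path.startswith(endpoint + "/"):
            return endpoint
    return "/api/other"  # catch-all bucket


def random_path(rng: random.Random, length: int = 50) -> str:
    """Simulate an attacker-generated path."""
    return "/api/" + "".join(rng.choices(string.ascii_lowercase, k=length))


rng = random.Random(42)
paths = [random_path(rng) for _ in range(10_000)]

# One key per distinct path versus one key per known bucket
raw_keys = {f"rl:203.0.113.7:{p}" for p in paths}
bounded_keys = {f"rl:203.0.113.7:{endpoint_bucket(p)}" for p in paths}
print(len(raw_keys), len(bounded_keys))
```

The raw key count grows with the number of distinct paths an attacker sends, while the bounded count can never exceed the number of known endpoints plus the catch-all.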
AP-7: The Shared Counter Spaghetti
What it looks like:
Multiple unrelated services or teams share the same Redis rate limit counters without
clear ownership, leading to accidental cross-service interference.
# Service A (user API team):
def check_limit_service_a(user_id: str) -> bool:
    key = f"rl:{user_id}"  # Generic key
    count = redis.incr(key)
    redis.expire(key, 60)
    return count <= 100

# Service B (reporting team, different codebase, different team):
def check_limit_service_b(user_id: str) -> bool:
    key = f"rl:{user_id}"  # SAME generic key! Unintentional sharing!
    count = redis.incr(key)
    redis.expire(key, 60)
    return count <= 50

# Now user's requests to Service A AND Service B both increment the same counter.
# User makes 60 requests to Service A -> Service B starts rejecting them (counter=60 > 50)
# User has not made a single request to Service B but is still rate limited by it.

The Fix: Strict Namespacing
# Each service, team, and resource type gets its own namespace
class RateLimitKey:
    @staticmethod
    def user_api(user_id: str, window_id: int) -> str:
        return f"rl:svc:user-api:user:{user_id}:{window_id}"

    @staticmethod
    def reporting(user_id: str, window_id: int) -> str:
        return f"rl:svc:reporting:user:{user_id}:{window_id}"

    @staticmethod
    def auth_login(identifier: str, window_id: int) -> str:
        return f"rl:svc:auth:login:{identifier}:{window_id}"

# Key format: rl:svc:{service}:{entity_type}:{identifier}:{window_id}
# Service A: rl:svc:user-api:user:user123:28956
# Service B: rl:svc:reporting:user:user123:28956
# Completely independent counters. No cross-contamination.

Production Impact: Extremely difficult to debug. One service's traffic silently
reduces another service's effective rate limit. Users report being rate limited when
they "barely used the API."
Business Logic Anti-Patterns
AP-8: The Zero-Context Limiter
What it looks like:
The same rate limit is applied to all users regardless of their trust level, subscription
tier, account age, or usage history.
# BROKEN: One limit for all users, always
def is_allowed(user_id: str) -> bool:
    return redis.incr(f"rl:{user_id}") <= 100

# Problems:
# 1. A bot with a fresh account gets the same limit as a 3-year paying customer
# 2. An enterprise customer paying $10,000/month gets same limit as free user
# 3. A verified developer gets same limit as an anonymous scraper
# 4. A background batch job gets same limit as a user's interactive session

The Fix: Context-Aware Limits
// Load user context once per request (cached in Redis, not DB)
public record UserContext(
    String userId,
    String tier,           // "free", "pro", "enterprise"
    String trustLevel,     // "anonymous", "new", "verified", "premium"
    boolean isBot,
    boolean isMachineClient
) {}

public int getEffectiveLimit(UserContext ctx) {
    int baseLimit = switch (ctx.tier()) {
        case "enterprise" -> 10_000;
        case "pro" -> 1_000;
        case "free" -> 100;
        default -> 30; // anonymous
    };
    // Trust multiplier
    double trustMultiplier = switch (ctx.trustLevel()) {
        case "premium" -> 2.0;
        case "verified" -> 1.0;
        case "new" -> 0.5; // warm-up period
        case "anonymous" -> 0.3;
        default -> 0.5;
    };
    // Bot/machine clients get different limits
    if (ctx.isBot()) return Math.max(1, (int)(baseLimit * 0.1));
    if (ctx.isMachineClient()) return baseLimit * 2; // service accounts need more
    return Math.max(1, (int)(baseLimit * trustMultiplier));
}

AP-9: The Quota Without a Rate Limit
What it looks like:
A daily/monthly quota is enforced but no per-second rate limit protects against bursts.
Configuration:
Daily quota: 10,000 requests/day
Per-minute limit: NONE
What happens:
Attacker sends 10,000 requests in 5 seconds.
Quota enforcement allows all 10,000 (quota not yet hit).
Your database receives 2,000 RPS for 5 seconds.
Connection pool exhausts. Database falls over.
All users are impacted. Service down for 10 minutes.
The Fix: Always layer rate + quota
# Correct: per-second burst protection + per-minute rate + per-hour budget + per-day quota
class TieredLimitConfig:
    free_tier = {
        "per_second": 2,     # burst protection
        "per_minute": 30,    # sustained rate
        "per_hour": 500,     # hourly budget
        "per_day": 2_000,    # daily quota
    }
    pro_tier = {
        "per_second": 20,
        "per_minute": 300,
        "per_hour": 5_000,
        "per_day": 50_000,
    }

# ALL four must pass for the request to be allowed.
# The per-second limit is the most important for protecting infrastructure.

Production Impact: Quota enforcement without rate limiting is a common
cause of self-inflicted database outages from legitimate (but poorly written) client code.
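The "all layers must pass" rule can be sketched with in-memory fixed windows. This is an illustration only: `LayeredLimiter` and `WINDOWS` are hypothetical names, the counters are never pruned, and a production version would keep them in a shared store such as Redis with TTLs. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Tuple

# Window length in seconds for each layer name
WINDOWS = {"per_second": 1, "per_minute": 60, "per_hour": 3600, "per_day": 86400}


class LayeredLimiter:
    """Fixed-window limiter that allows a request only if every layer has room."""

    def __init__(self, limits: Dict[str, int],
                 clock: Callable[[], float] = time.time) -> None:
        self.limits = limits
        self.clock = clock
        self.counts: Dict[Tuple[str, str, int], int] = {}

    def is_allowed(self, user_id: str) -> bool:
        now = int(self.clock())
        keys = [(user_id, name, now // WINDOWS[name]) for name in self.limits]
        # Check every layer before incrementing, so a rejected request consumes nothing
        if any(self.counts.get(k, 0) >= self.limits[k[1]] for k in keys):
            return False
        for k in keys:
            self.counts[k] = self.counts.get(k, 0) + 1
        return True


free_tier = {"per_second": 2, "per_minute": 30, "per_hour": 500, "per_day": 2_000}
limiter = LayeredLimiter(free_tier, clock=lambda: 1_000_000.0)  # frozen clock
results = [limiter.is_allowed("user123") for _ in range(5)]
print(results)  # [True, True, False, False, False]
```

With the clock frozen, the per-second layer stops the burst after two requests even though the daily quota is almost untouched.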
AP-10: The Grandfathered Exemption
What it looks like:
Legacy clients or "important" customers are permanently exempted from rate limits.
This exemption is never reviewed or revisited.
# BROKEN: Permanent exemption list, grown over years
EXEMPT_API_KEYS = {
    "sk_legacy_abc123",  # "Big Enterprise - added 2019"
    "sk_legacy_def456",  # "CEO's personal project - never limit"
    "sk_legacy_ghi789",  # "Why is this here? Unknown - afraid to remove"
    # ... 47 more entries, nobody knows what they do
}

def is_allowed(api_key: str) -> bool:
    if api_key in EXEMPT_API_KEYS:
        return True  # Skip all rate limiting
    return rate_limiter.check(api_key)

Why it is dangerous:
- Exempt keys become attack targets. Leaked key = unlimited access.
- Exempt customers put the most load on your system during outages.
- Nobody knows why exemptions exist. Removing them breaks unknown things.
- Exemptions accumulate silently over years.
The Fix: Managed High-Limit Tiers
# CORRECT: No exemptions. Legitimate high-volume users get their own tier.
CUSTOMER_TIERS = {
    # "Big Enterprise" now has an enterprise tier with appropriate limits
    "sk_enterprise_abc": {"tier": "enterprise", "rpm": 100_000},
    # "CEO's project" has its own dedicated API key with explicitly high limits
    "sk_ceo_project": {"tier": "vip", "rpm": 10_000},
}

def get_limit(api_key: str) -> int:
    config = CUSTOMER_TIERS.get(api_key)
    if config:
        return config["rpm"]
    # Default tier logic
    return DEFAULT_LIMITS[get_subscription_tier(api_key)]

# Every key has a limit. Limits are documented. High-volume users have high limits.
# No magic exempt list. No unknown exceptions. Full audit trail.

AP-11: The Asymmetric Read-Write Limit
What it looks like:
Read endpoints have very high limits while write endpoints are tightly controlled.
Users discover they can use reads to amplify writes through side effects.
Read limit: 10,000 reads/minute
Write limit: 100 writes/minute
Attack: User creates a public resource and makes 10,000 read requests to it.
Each read increments a view counter (a write!). The view counter gets
10,000 writes/minute but only 100 were rate limited on the write endpoint.
Expensive side effect: Each read triggers a push notification to subscribers.
User reads their own resource 10,000 times. Their 50,000 subscribers each
receive 10,000 push notifications. Your push notification service is overwhelmed.
The Fix: Rate limit by impact, not by HTTP method
# Identify high-impact reads and rate limit them separately
# Values are (limit, window in seconds)
ENDPOINT_LIMITS = {
    "GET /api/products/{id}": (10_000, 60),    # cheap, cached
    "GET /api/analytics/report": (10, 3600),   # expensive query
    "GET /api/feed": (100, 60),                # triggers notifications
    "POST /api/comments": (30, 60),            # write + notifications
    "POST /api/bulk-export": (2, 3600),        # very expensive
}
# Rate limit by "resource impact" not just HTTP verb
# GET /api/feed triggers the same backend work as POST /api/feed
# Both should have similar rate limits

AP-12: The Rolling Reset Surprise
What it looks like:
Rate limit windows that reset at predictable times (e.g., top of every hour) cause
user confusion and poor experience.
Scenario: User is on a 1000 requests/hour plan.
11:59 PM: User sends 1000 requests. All allowed.
11:59:30 PM: User sends 1 more request. Rejected (429). "Retry at 12:00:00"
12:00:00 AM: User rushes to send requests. THUNDERING HERD.
12:00:00 AM: 100,000 users all try to send requests. Server slammed.
Also: Users feel cheated because they "only had 30 seconds to use 1000 requests"
The Fix: Per-user jittered windows + rolling window
import hashlib
import time

def get_user_window_offset(user_id: str, window_seconds: int) -> int:
    """
    Each user has a deterministic but different window offset.
    User alice: window resets at :07 seconds
    User bob: window resets at :41 seconds
    This spreads resets across the full window duration.
    """
    user_hash = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return user_hash % window_seconds

def get_window_id(user_id: str, window_seconds: int) -> int:
    now = int(time.time())
    offset = get_user_window_offset(user_id, window_seconds)
    return (now - offset) // window_seconds

# Alternative: Use a TRUE sliding window (rolling window)
# "No more than 1000 requests in any 60-minute period"
# No reset points. Naturally prevents thundering herd. More fair.

AP-13: The Cost-Blind Limiter
What it looks like:
Every request consumes exactly 1 token, regardless of how expensive it is.
# BROKEN: Same cost for all requests
def is_allowed(user_id: str) -> bool:
    return rate_limiter.consume(user_id, cost=1)

# GET /api/users/{id} -> cost 1 (2ms, hits cache, returns 100 bytes)
# POST /api/ml/predict -> cost 1 (500ms, runs ML model, uses 4 GPUs)
# GET /api/reports/full -> cost 1 (15s query, scans 10M rows, returns 5MB)
# User does 100 requests/minute of ML predictions
# = 100 x 500ms = 50 CPU seconds/minute = 3000 CPU seconds/hour
# Same user doing 100 simple reads = 100 x 2ms = 0.2 CPU seconds/minute
# Both counted as "100 requests". Completely wrong.

The Fix: Cost-proportional token consumption
from typing import Optional

ENDPOINT_COSTS = {
    ("GET", "/api/users/{id}"): 1,
    ("GET", "/api/users"): 5,            # lists are more expensive
    ("POST", "/api/orders"): 3,
    ("GET", "/api/reports/full"): 50,    # expensive query
    ("POST", "/api/ml/predict"): 100,    # GPU compute
    ("POST", "/api/bulk"): None,         # special: cost = len(items)
}

def get_request_cost(method: str, path: str, body: Optional[dict] = None) -> int:
    normalized = normalize_path(path)  # /api/users/123 -> /api/users/{id}
    cost = ENDPOINT_COSTS.get((method, normalized), 1)
    if cost is None:  # bulk operation
        items = body.get("items", []) if body else []
        cost = max(1, len(items))
    return cost

def is_allowed(user_id: str, method: str, path: str, body: Optional[dict] = None) -> bool:
    cost = get_request_cost(method, path, body)
    return token_bucket.consume(user_id, cost=cost)

Operational Anti-Patterns
AP-14: The One Environment for All
What it looks like:
The same rate limit values are used in production, staging, and development.
# BROKEN: Single config used in all environments
rate_limit:
  per_minute: 30  # Production value - very restrictive
  per_hour: 500
# Development consequence:
# Developer runs integration tests: 50 requests in 2 seconds -> rate limited
# Developer can't run performance tests without hitting limits
# CI/CD pipeline tests start failing intermittently from rate limits
# Developer spends hours debugging "why is CI slow?" -> it's rate limited

The Fix:
# CORRECT: Per-environment configuration
environments:
  production:
    rate_limit:
      per_second: 10
      per_minute: 100
      per_hour: 2000
      enforce: true
  staging:
    rate_limit:
      per_second: 100     # 10x higher for test automation
      per_minute: 1000
      per_hour: 20000
      enforce: true       # Still enforce (so staging tests are realistic)
  development:
    rate_limit:
      per_second: 10000   # Effectively unlimited for dev
      per_minute: 100000
      enforce: false      # Can disable entirely in dev
      dry_run: true       # Log what would be limited
  test:
    rate_limit:
      enforce: false      # Never rate limit in unit tests

AP-15: The Unmonitored Limiter
What it looks like:
Rate limiting is deployed with no metrics, no alerting, and no visibility into behavior.
What you cannot see:
- How many requests are being rate limited per second?
- Which users are hitting limits most frequently?
- What percentage of traffic is being rejected?
- Is the rate limiter contributing to latency?
- When Redis goes down (and rate limiting fails open), do you know?
- Are limits too tight (blocking legitimate users)?
- Are limits too loose (not actually protecting the system)?
Answer: If you have no metrics, you have no answer to any of these.
The Fix: Minimum viable rate limit observability
import time

import redis
from prometheus_client import Counter, Histogram

# Metrics
rl_requests_total = Counter(
    "rate_limit_requests_total",
    "Total rate limit checks",
    ["endpoint_group", "result", "tier"]
)
rl_redis_latency = Histogram(
    "rate_limit_redis_duration_seconds",
    "Redis rate limit check latency",
    buckets=[0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25]
)
rl_redis_errors = Counter(
    "rate_limit_redis_errors_total",
    "Redis errors in rate limiter",
    ["error_type"]
)

def instrumented_rate_limit_check(identifier, endpoint, tier, limiter):
    start = time.time()
    try:
        result = limiter.is_allowed(identifier)
        duration = time.time() - start
        rl_redis_latency.observe(duration)
        rl_requests_total.labels(
            endpoint_group=endpoint,
            result="allowed" if result["allowed"] else "denied",
            tier=tier
        ).inc()
        return result
    except redis.RedisError as e:
        rl_redis_errors.labels(error_type=type(e).__name__).inc()
        # Fail open and return a metric-tracked result
        return {"allowed": True, "fallback": True}

Minimum alerting rules:
Alert: rl_denied_rate > 5% of total traffic for 5 minutes
-> Limits may be too tight or there's an attack
Alert: rl_redis_errors_total increases > 0 for 30 seconds
-> Rate limiter falling back to fail-open mode
Alert: rl_redis_latency p99 > 50ms for 2 minutes
-> Redis is slow, affecting API response times
Alert: rl_denied_rate = 0 while traffic is high for 5 minutes
-> Rate limiter may have silently stopped working
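The four alerting rules can be expressed as a small pure function, which makes the thresholds easy to unit test before they are encoded in a real alerting system. This is a sketch under assumptions: `WindowStats`, `evaluate_alerts`, the alert names, and the `high_traffic_threshold` default are all hypothetical, and the "for N minutes" durations are left to the alerting system that would call it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    """Aggregated rate limiter metrics for one evaluation window."""
    allowed: int
    denied: int
    redis_errors: int
    redis_p99_seconds: float


def evaluate_alerts(stats: WindowStats, high_traffic_threshold: int = 1000) -> List[str]:
    """Return the names of the alert conditions that hold for this window."""
    alerts: List[str] = []
    total = stats.allowed + stats.denied
    denied_ratio = stats.denied / total if total else 0.0
    if denied_ratio > 0.05:
        alerts.append("denied_rate_high")           # limits too tight, or an attack
    if stats.redis_errors > 0:
        alerts.append("redis_errors")               # limiter is failing open
    if stats.redis_p99_seconds > 0.050:
        alerts.append("redis_latency_high")         # Redis is slowing the API
    if total >= high_traffic_threshold and stats.denied == 0:
        alerts.append("limiter_possibly_inactive")  # nothing denied under load
    return alerts


print(evaluate_alerts(WindowStats(allowed=900, denied=100, redis_errors=0,
                                  redis_p99_seconds=0.01)))
# ['denied_rate_high']
```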
AP-16: The Stale Configuration
What it looks like:
Rate limits are set once during initial deployment and never revisited as the system grows.
Year 1: Service launched. 1,000 users. Limit set at 100 RPM (server can handle 50K RPM).
Year 2: 10,000 users. 100 RPM is still fine. Nobody thinks about it.
Year 3: 100,000 users. Server upgraded. Limit still 100 RPM.
Year 3.5: Competitor launches. Your users demand features that require more API calls.
Year 3.5: Users complain about rate limits. Churn increases. Developers switch platforms.
Year 4: Emergency: "Raise all limits 10x immediately."
No impact analysis. No testing. Service degrades within hours.
The Fix: Rate Limit Review Process
# Rate limit configuration as code (not hardcoded, reviewed and versioned)
RATE_LIMITS_VERSION = "2026-Q2"
RATE_LIMITS_LAST_REVIEWED = "2026-04-01"
RATE_LIMITS_NEXT_REVIEW = "2026-07-01"

# Annotated with reasoning (reviewable in code review)
TIER_LIMITS = {
    "free": {
        "per_minute": 60,    # Set based on: p95 free user needs 30 RPM (2x buffer)
        "per_day": 1_000,    # Set based on: free tier conversion threshold research
        "rationale": "Lower limits encourage upgrade. 60 RPM supports typical use cases.",
        "last_load_test": "2026-03-15",
        "max_server_capacity_rpm": 500_000,
    }
}

# Quarterly review checklist:
# [ ] Are any users consistently hitting limits? (check: rate_limit_utilization > 90%)
# [ ] Did server capacity change?
# [ ] Did typical usage patterns change?
# [ ] What is the P95 usage per tier?
# [ ] Are competitors offering higher limits?

AP-17: The Incident Blackout
What it looks like:
No mechanism exists to quickly disable, loosen, or modify rate limits during an incident.
Scenario: A critical bug in your mobile app causes ALL users to send 10x normal requests.
Rate limiter is now rejecting 90% of legitimate requests.
Users cannot use the app. Support queue overflowing.
The right response: "Temporarily raise limits 10x while we fix the bug."
The actual response:
"The limit is hardcoded in application.properties"
"We need to redeploy to change it"
"Deployment takes 45 minutes"
"Our change approval process takes 2 hours"
"So users will be blocked for ~3 hours while we fix a bug"
The Fix: Dynamic Configuration
import json
import time

import redis

class DynamicRateLimiter:
    """
    Rate limits loaded from Redis config store, refreshed every 30s.
    Limits can be changed in production without redeployment.
    """
    def __init__(self, r: redis.Redis, defaults: dict):
        self.r = r
        self.defaults = defaults
        self._config_cache = {}
        self._cache_ttl = 30  # refresh config every 30 seconds
        self._last_refresh = 0

    def _get_limit(self, tier: str, window: str) -> int:
        now = time.time()
        if now - self._last_refresh > self._cache_ttl:
            self._refresh_config()
        # The cache is nested by tier, matching the JSON stored in rl:global_config
        cached = self._config_cache.get(tier, {}).get(window)
        if cached is not None:
            return cached
        return self.defaults.get(tier, {}).get(window, 100)

    def _refresh_config(self):
        # Record the attempt first so a Redis outage does not trigger a refresh on every request
        self._last_refresh = time.time()
        try:
            # Config stored in Redis as JSON
            config = self.r.get("rl:global_config")
            if config:
                self._config_cache = json.loads(config)
        except redis.RedisError:
            pass  # Keep using cached config

# To change limits during an incident (no deployment needed):
# redis-cli SET rl:global_config '{"free":{"per_minute":600},"pro":{"per_minute":6000}}'
# Takes effect within 30 seconds on all instances.
# To revert: redis-cli SET rl:global_config '{"free":{"per_minute":60},...}'

AP-18: The Cascading Quota Drain
What it looks like:
One team's automated job consumes another team's API quota without anyone realizing it.
Team A (Data Science): Runs nightly ML training job at 2 AM.
  Training job calls the internal "Data API" 50,000 times per run.
Team B (Product): Their users call the same "Data API" during business hours.
  Users share a global 100,000/day quota with Team A.
Result:
  2 AM - 4 AM: Team A's job consumes 50,000 of 100,000 daily quota
  9 AM - 5 PM: Team B users only have 50,000 remaining
  4 PM: Team B users start getting rate limited
  Team B files incident: "Data API is broken"
  Root cause: Team A's job consumed the shared quota
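To make the arithmetic of the drain concrete, the sketch below replays the timeline against a single shared counter. It is a minimal in-memory illustration, not the Data API's real accounting: `replay_shared_quota` and the event list are hypothetical, and the 7,500 calls per hour for Team B's users is an assumed volume chosen only so that the quota runs out late in the day.

```python
from collections import Counter

DAILY_QUOTA = 100_000  # the shared global quota from the scenario


def replay_shared_quota(events):
    """Replay (caller, request_count) events against one shared daily counter.

    Returns a Counter of rejected requests per caller.
    """
    used = 0
    rejected = Counter()
    for caller, requested in events:
        allowed = min(requested, DAILY_QUOTA - used)
        used += allowed
        rejected[caller] += requested - allowed
    return rejected


# Assumed traffic: the 50,000-call training run first, then eight hourly
# blocks of 7,500 user calls during business hours.
events = [("team_a_training", 50_000)] + [("team_b_users", 7_500)] * 8

rejected = replay_shared_quota(events)
print(dict(rejected))  # {'team_a_training': 0, 'team_b_users': 10000}
```

The batch job is never rejected because it runs first. Every rejection lands on the interactive users, which is why the incident is reported against the API rather than against the job.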
The Fix: Tenant-Isolated Quotas
# CORRECT: Each team/service/use-case has its own quota bucket
QUOTA_BUCKETS = {
    "user_interactive": 100_000,    # per day - human users
    "ml_training_batch": 500_000,   # per day - batch jobs
    "api_team_a": 200_000,          # per day - Team A's services
    "api_team_b": 200_000,          # per day - Team B's services
    "background_jobs": 300_000,     # per day - all background processing
}

def get_quota_bucket(caller_context: dict) -> str:
    if caller_context.get("is_batch_job"):
        return "ml_training_batch"
    if caller_context.get("team") == "team_a":
        return "api_team_a"
    if caller_context.get("is_human_session"):
        return "user_interactive"
    return "background_jobs"
# Now Team A's training job can never affect Team B's users.
# Each bucket has its own Redis counter.

Client-Side Anti-Patterns
AP-19: The Aggressive Poller
What it looks like:
A client polls an endpoint at maximum rate regardless of whether data has changed.
# BROKEN: Poll every second and ignore the rate limit headers in the response
import time
import requests

while True:
    response = requests.get("https://api.example.com/events")
    events = response.json()
    process(events)
    time.sleep(1)  # Poll every second regardless of response
# This sends 3,600 requests/hour just for polling.
# 90% of responses are probably empty (no new events).
# Wastes 90% of the rate limit quota on empty polls.

The Fix: Smart Polling or WebSockets
# OPTION 1: Conditional requests with ETag/Last-Modified
import time
import requests

class SmartPoller:
    def __init__(self):
        self.etag = None
        self.last_modified = None
        self.poll_interval = 5  # seconds

    def poll(self):
        while True:
            headers = {}
            if self.etag:
                headers["If-None-Match"] = self.etag
            if self.last_modified:
                headers["If-Modified-Since"] = self.last_modified
            response = requests.get("https://api.example.com/events", headers=headers)
            if response.status_code == 304:
                # Not Modified - no new data, no rate limit cost (some APIs)
                self.poll_interval = min(60, self.poll_interval * 2)  # back off
            elif response.status_code == 200:
                self.etag = response.headers.get("ETag")
                self.last_modified = response.headers.get("Last-Modified")
                process(response.json())
                self.poll_interval = 5  # reset to base interval
            elif response.status_code == 429:
                # Assumes Retry-After is given in seconds (it can also be an HTTP date)
                retry_after = int(response.headers.get("Retry-After", 60))
                time.sleep(retry_after)
                continue
            time.sleep(self.poll_interval)
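The back-off rule inside `SmartPoller` is easier to reason about when pulled out as a pure function. The sketch below mirrors the same constants (5-second base, 60-second cap). `next_poll_interval` and `interval_schedule` are hypothetical helper names introduced here for illustration only.

```python
BASE_INTERVAL = 5    # seconds, same base as SmartPoller
MAX_INTERVAL = 60    # seconds, same cap as SmartPoller


def next_poll_interval(current: float, status_code: int) -> float:
    """Return the wait before the next poll, given the last response status."""
    if status_code == 304:
        # Nothing changed: double the wait, up to the cap.
        return min(MAX_INTERVAL, current * 2)
    if status_code == 200:
        # New data arrived: return to the base interval.
        return BASE_INTERVAL
    # Any other status leaves the interval unchanged.
    return current


def interval_schedule(status_codes):
    """Apply next_poll_interval to a sequence of statuses, starting at the base."""
    interval = BASE_INTERVAL
    schedule = []
    for code in status_codes:
        interval = next_poll_interval(interval, code)
        schedule.append(interval)
    return schedule


# Five quiet polls followed by one poll that returns new data.
schedule = interval_schedule([304, 304, 304, 304, 304, 200])
print(schedule)  # [10, 20, 40, 60, 60, 5]
```

Once the interval reaches the 60-second cap, an idle feed costs 60 requests per hour instead of the 3,600 sent by polling every second.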
# OPTION 2: Use webhooks instead of polling (zero rate limit cost for polling)
# OPTION 3: Long-polling (one request waits up to 30s for new data - much more efficient)
# OPTION 4: WebSocket (one persistent connection, server pushes data)

AP-20: The Fan-Out Bomb
What it looks like:
One user action triggers hundreds or thousands of downstream API calls,
each counted against the user's rate limit or the system's capacity.
# BROKEN: Processing 1000 orders one at a time in a loop
def process_order_batch(order_ids: list[str]):
    for order_id in order_ids:  # 1000 iterations
        order = inventory_api.get_order(order_id)  # 1 API call per order
        customer = customer_api.get_customer(order.customer_id)  # 1 API call per order
        shipping = shipping_api.calculate(order)  # 1 API call per order
# Total: 3000 API calls for 1000 orders
# If rate limit is 100 calls/minute, this takes 30 minutes
# Meanwhile, the user's rate limit is exhausted for interactive requests

The Fix: Batch and async
# CORRECT: Use bulk endpoints to collapse N calls into 1
def process_order_batch_smart(order_ids: list[str]):
    # Batch fetch: 1 API call for 1000 orders instead of 1000 calls
    orders = inventory_api.get_orders_bulk(order_ids)  # 1 call
    customer_ids = [o.customer_id for o in orders]
    customers = customer_api.get_customers_bulk(customer_ids)  # 1 call
    shippings = shipping_api.calculate_bulk(orders)  # 1 call
# Total: 3 API calls for 1000 orders. 1000x more efficient.
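Bulk endpoints often cap how many IDs a single call may carry, in which case the one call per endpoint becomes a handful of chunked calls. The sketch below assumes a hypothetical cap of 250 IDs per call. `chunked` and `bulk_call_count` are illustrative helpers, not part of any API named in this section.

```python
from typing import Iterator, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def bulk_call_count(n_items: int, max_batch_size: int, endpoints: int = 3) -> int:
    """Number of bulk calls needed when every endpoint is called once per chunk."""
    batches = -(-n_items // max_batch_size)  # ceiling division
    return batches * endpoints


order_ids = [f"order-{i}" for i in range(1000)]
batches = list(chunked(order_ids, 250))  # assumed cap: 250 IDs per bulk call
print(len(batches))                 # 4
print(bulk_call_count(1000, 250))   # 12
```

Even with the assumed cap, 12 calls replace 3,000, and each chunk can be passed to the bulk endpoint in turn.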
# CORRECT: Run in background with its own rate limit budget
def submit_batch_job(order_ids: list[str]):
    # Submit to a queue for async processing
    # Background job has its own rate limit bucket (not shared with user's interactive quota)
    background_queue.submit(
        job_type="process_orders",
        data={"order_ids": order_ids},
        rate_limit_bucket="batch_processing",  # separate quota
    )
    return {"job_id": "...", "status": "queued"}

AP-21: The Missing Cache Layer
What it looks like:
Client repeatedly calls the API for data that rarely changes, consuming rate limit quota
on unnecessarily repeated requests.
# BROKEN: Fetch user profile on every request to show in navbar
@app.route("/api/dashboard")
def dashboard():
    # This makes 1 API call per page load per user
    # A user loading the dashboard 10 times per hour = 10 API calls
    # Just to show their name and avatar (which never change)
    user_profile = external_api.get_user_profile(user_id)
    data = get_dashboard_data(user_id)
    return render(user_profile, data)

The Fix: Cache stable data
from cachetools import TTLCache

# In-process cache for data that changes rarely
_profile_cache = TTLCache(maxsize=10_000, ttl=300)  # 5-minute TTL

def get_user_profile_cached(user_id: str) -> dict:
    # Single lookup: avoids a KeyError if the entry expires between
    # a membership test and the read.
    profile = _profile_cache.get(user_id)
    if profile is None:
        profile = external_api.get_user_profile(user_id)
        _profile_cache[user_id] = profile
    return profile
# User loads dashboard 10 times in 5 minutes: 1 API call (first load), 9 cache hits
# Rate limit usage: 90% reduction
# Response time: also faster (no API round trip for cached requests)

AP-22: The Synchronous Bulk Stampede
What it looks like:
Processing a large batch synchronously, exhausting rate limits and blocking the calling thread.
# BROKEN: Synchronous bulk processor blocks for minutes
def migrate_10000_users(user_ids: list[str]):
    for user_id in user_ids:
        try:
            api.update_user(user_id, {"migrated": True})
        except RateLimitError as e:
            time.sleep(e.retry_after)  # Block the thread for minutes
            api.update_user(user_id, {"migrated": True})
# Problems:
# 1. Thread is blocked for potentially hours
# 2. If thread is killed, migration restarts from beginning (no checkpointing)
# 3. All rate limit tokens consumed, blocking other operations
# 4. No progress visibility for the user who submitted the job

The Fix: Async batch processing with checkpointing
import asyncio
import time
from dataclasses import dataclass, field

import aiohttp
import redis

API_BASE_URL = "https://api.example.com"  # placeholder base URL
MAX_RETRIES = 5
r = redis.Redis()  # assumes a reachable Redis instance for checkpoints

@dataclass
class BatchProgress:
    total: int
    checkpoint_key: str  # Redis key for resumable progress
    processed: int = 0
    failed: list[str] = field(default_factory=list)

async def migrate_users_async(user_ids: list[str], rate_limit: int = 10):
    """
    Async batch migration with:
    - Non-blocking waits (asyncio.sleep instead of time.sleep)
    - Checkpointing (progress count saved to Redis every 100 users)
    - Progress tracking
    - Exponential backoff on rate limit errors
    """
    # The semaphore caps concurrent requests; it is not a requests-per-second limit.
    semaphore = asyncio.Semaphore(rate_limit)
    progress = BatchProgress(
        total=len(user_ids),
        checkpoint_key=f"migration:progress:{int(time.time())}",
    )

    async def process_one(session: aiohttp.ClientSession, user_id: str):
        for attempt in range(MAX_RETRIES + 1):
            try:
                # Hold a slot only while the request is in flight. Retrying while
                # still holding it can deadlock once every slot is waiting to retry.
                async with semaphore:
                    async with session.patch(f"/api/users/{user_id}",
                                             json={"migrated": True}) as resp:
                        status = resp.status
                        retry_after = int(resp.headers.get("Retry-After", 1))
            except (aiohttp.ClientError, asyncio.TimeoutError, ValueError):
                break
            if status == 200:
                progress.processed += 1
                # Checkpoint: save progress to Redis. Store completed IDs
                # instead of a count if you need an exact resume point.
                if progress.processed % 100 == 0:
                    r.set(progress.checkpoint_key, progress.processed)
                return
            if status != 429 or attempt == MAX_RETRIES:
                break  # non-retryable response, or retries exhausted
            await asyncio.sleep(retry_after * (2 ** attempt))
        progress.failed.append(user_id)

    async with aiohttp.ClientSession(base_url=API_BASE_URL) as session:
        await asyncio.gather(*(process_one(session, uid) for uid in user_ids))
    return progress

Security Anti-Patterns
AP-23: The Predictable Window Attack Surface
What it looks like:
Fixed window rate limits with predictable reset times that attackers exploit.
Configuration: 100 requests/minute, window resets at :00 seconds each minute
Attacker knows: "At 12:00:00, the counter resets. I can send 100 requests."
Attack pattern:
  11:59:58 - 11:59:59: Send 100 requests (end of window 1, all allowed)
  12:00:00 - 12:00:01: Send 100 requests (start of window 2, all allowed)
Result: 200 requests in about 3 seconds, roughly 40x the 5 requests per 3 seconds
  that a 100/minute limit averages out to.
More sophisticated: Automate this. Spend each window's allowance in the last seconds
  of one window and the first seconds of the next. The long-run average stays at
  100 requests/minute, but at every second boundary 200 requests get through within
  a few seconds, double what the limit is meant to allow in any 60-second span.
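The boundary burst can be reproduced in a few lines. The sketch below replays 100 requests just before a window boundary and 100 just after it, first against a fixed-window counter and then, for comparison, against a sliding-window log. `FixedWindow`, `SlidingLog`, and `burst_times` are illustrative in-memory stand-ins written for this example, not the limiter implementations used elsewhere in this series.

```python
from collections import deque

LIMIT = 100   # requests allowed per window
WINDOW = 60   # window length in seconds


class FixedWindow:
    """Counter that resets whenever the clock enters a new aligned window."""

    def __init__(self):
        self.window_id = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // WINDOW)
        if window_id != self.window_id:
            self.window_id, self.count = window_id, 0
        self.count += 1
        return self.count <= LIMIT


class SlidingLog:
    """Keeps timestamps of accepted requests from the last WINDOW seconds."""

    def __init__(self):
        self.accepted = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the trailing window.
        while self.accepted and self.accepted[0] <= now - WINDOW:
            self.accepted.popleft()
        if len(self.accepted) < LIMIT:
            self.accepted.append(now)
            return True
        return False


def burst_times(boundary: float):
    """100 requests in the 2 s before `boundary`, then 100 in the 2 s after."""
    before = [boundary - 2 + i * 0.02 for i in range(100)]
    after = [boundary + i * 0.02 for i in range(100)]
    return before + after


def count_allowed(limiter, times) -> int:
    return sum(limiter.allow(t) for t in times)


times = burst_times(boundary=600.0)
fixed_allowed = count_allowed(FixedWindow(), times)
sliding_allowed = count_allowed(SlidingLog(), times)
print(fixed_allowed, sliding_allowed)  # 200 100
```

The fixed-window counter admits all 200 requests because the second half arrives in a fresh window. The sliding log still sees the first 100 inside its trailing 60 seconds and rejects the rest.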
The Fix: Rolling window + jitter
# CORRECT: True sliding window or user-specific window offsets
import hashlib
import time

class SecureRateLimiter:
    def __init__(self, r, limit: int, window: int):
        self.r = r
        self.limit = limit
        self.window = window

    def get_user_window_offset(self, user_id: str) -> int:
        """Deterministic offset: same user always gets same offset, but different per user."""
        h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
        return h % self.window

    def is_allowed(self, user_id: str) -> bool:
        now = int(time.time())
        offset = self.get_user_window_offset(user_id)
        # Shift the window start by the user's offset
        window_id = (now - offset) // self.window
        key = f"rl:secure:{user_id}:{window_id}"
        count = self.r.incr(key)
        if count == 1:
            # INCR and EXPIRE are separate commands here; wrap them in a Lua
            # script or pipeline if the key must never be left without a TTL.
            self.r.expire(key, self.window * 2)
        return count <= self.limit
# Example with window=60:
# User alice: window resets at :07 of each minute (if her offset is 7)
# User bob: window resets at :41 of each minute (if his offset is 41)
# Attacker targeting one user still hits window boundaries, but cannot exploit
# them system-wide because every user has different boundaries.

AP-24: The Shared API Key Bypass
What it looks like:
Multiple users or services share a single API key. One heavy consumer exhausts the key's
rate limit, blocking all others sharing it.
Scenario: A team of 10 developers all use the same API key for testing.
  Developer 1 runs a load test: 1000 requests in 1 minute.
  Rate limit: 100 requests/minute per key.
  Result: Developer 1 triggers rate limiting for all 10 developers.
  Developers 2-10 cannot use the API until next minute.
Worse scenario: A shared key is leaked. Attacker uses it.
  Rate limit blocks all legitimate users of the shared key.
  Security team must rotate the key - but that breaks all 10 developers.
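The blast radius of a shared key can be shown by running one minute of traffic through a per-key limit twice, changing only how developers map to keys. This is a simplified in-memory model. `run_minute`, the key names, and the 5 requests per minute assumed for developers 2-10 are all hypothetical.

```python
from collections import defaultdict

LIMIT_PER_MINUTE = 100  # per API key, as in the scenario


def run_minute(requests, key_for):
    """Serve one minute of (developer, request_count) traffic.

    `key_for` maps a developer to the API key their requests are charged to.
    Returns the number of requests actually served per developer.
    """
    used = defaultdict(int)
    served = defaultdict(int)
    for developer, count in requests:
        key = key_for(developer)
        allowed = min(count, LIMIT_PER_MINUTE - used[key])
        used[key] += allowed
        served[developer] += allowed
    return dict(served)


# Developer 1's load test arrives first; the other nine make 5 calls each.
traffic = [("dev1", 1000)] + [(f"dev{i}", 5) for i in range(2, 11)]

shared = run_minute(traffic, key_for=lambda dev: "team-shared-key")
per_dev = run_minute(traffic, key_for=lambda dev: f"key-{dev}")

print(shared["dev2"], per_dev["dev2"])  # 0 5
```

With one key per developer the load test is still capped at 100 requests, but the cap only affects the developer who ran it.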
The Fix: One key per user/service, enforced
# Key provisioning system that enforces one-key-per-principal
import hashlib
import secrets

class InvalidKeyError(Exception):
    pass

class APIKeyManager:
    def __init__(self, db, limiter):
        self.db = db            # key store; its methods are assumed by the code below
        self.limiter = limiter  # rate limiter exposing check(key, tier)

    def provision_key(
        self,
        user_id: str,
        purpose: str,
        tier: str
    ) -> str:
        # Check: does this user already have a key for this purpose?
        existing = self.db.get_key_by_user_and_purpose(user_id, purpose)
        if existing:
            raise ValueError(
                f"User {user_id} already has a key for {purpose}. "
                f"Use the existing key or revoke it first."
            )
        # Generate new key
        key = secrets.token_urlsafe(32)
        key_hash = hashlib.sha256(key.encode()).hexdigest()
        self.db.store_key(key_hash, user_id, purpose, tier)
        return key  # Return the raw key once; only its hash is stored

    def rate_limit_key(self, api_key: str) -> dict:
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        metadata = self.db.get_key_metadata(key_hash)
        if not metadata:
            raise InvalidKeyError("Unknown API key")
        # Rate limit by the KEY HASH (not user_id - each key has independent limits)
        return self.limiter.check(f"rl:apikey:{key_hash}", metadata["tier"])

AP-25: The Unvalidated Forwarded IP
What it looks like:
Blindly trusting X-Forwarded-For without verifying it comes from a trusted proxy.
# BROKEN: Trust any X-Forwarded-For header blindly
def get_client_ip(request) -> str:
    return request.headers.get("X-Forwarded-For", request.remote_addr).split(",")[0].strip()
# Attack:
# Normal request: X-Forwarded-For: 203.0.113.1 (attacker's real IP)
# After 100 requests, rate limited.
# Attacker now sends: X-Forwarded-For: 8.8.8.8 (any made-up address gets a fresh counter;
#   rotating the value on every request means the limit is never reached)
# Or: X-Forwarded-For: 127.0.0.1 (localhost - often whitelisted)
# Result: Attacker bypasses IP-based rate limiting completely by spoofing the header.

The Fix: Validate proxy chain
# CORRECT: Only trust X-Forwarded-For if the immediate sender is a known proxy
import ipaddress

TRUSTED_PROXY_RANGES = [
    "10.0.0.0/8",       # internal network
    "172.16.0.0/12",    # internal network
    "192.168.0.0/16",   # internal network
    # Add your CDN's published ranges here, copied from the provider's official list.
    # Do not list 100.64.0.0/10: it is RFC 6598 carrier-grade NAT space, not a CDN range.
]
# Parse once at import time instead of on every request
_TRUSTED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in TRUSTED_PROXY_RANGES]

def _is_trusted(ip_obj) -> bool:
    return any(ip_obj in network for network in _TRUSTED_NETWORKS)

def get_real_client_ip(request) -> str:
    """
    Get the real client IP.
    Only trust X-Forwarded-For if the connection came from a trusted proxy.
    Otherwise, use the actual connection IP.
    """
    connection_ip = request.remote_addr
    if _is_trusted(ipaddress.ip_address(connection_ip)):
        # Trust the header, but take the rightmost untrusted IP, not the first:
        # entries further left are client-supplied and can be spoofed.
        forwarded_for = request.headers.get("X-Forwarded-For", "")
        # Walk from right to left, find the first non-trusted IP
        for ip in reversed([part.strip() for part in forwarded_for.split(",")]):
            try:
                if not _is_trusted(ipaddress.ip_address(ip)):
                    return ip
            except ValueError:
                continue  # skip empty or malformed entries
    # Not from a trusted proxy (or no usable header entry): use the connection IP
    return connection_ip

Summary: Anti-Pattern Quick Reference
| Anti-Pattern | Risk Level | Root Cause | Primary Fix |
|---|---|---|---|
| In-Process Island | Critical | No shared state in clusters | Use Redis |
| Connection Pool Killer | Critical | New connections per request | Module-level pool |
| Oversized Lua Script | Critical | Complex Lua blocks Redis | Keep scripts simple |
| Missing Hash Tag | High | Redis Cluster routing | Use {user_id} hash tags |
| TTL Roulette | High | Inconsistent key expiry | Lua atomic + window*2 |
| Unbounded Key Space | High | User-controlled key parts | Normalize + hash |
| Shared Counter Spaghetti | High | No namespace strategy | Service-scoped namespaces |
| Zero-Context Limiter | High | Same limit for all users | Tier + trust-based limits |
| Quota Without Rate Limit | High | Only daily/monthly limits | Add per-second limit always |
| Grandfathered Exemption | Medium | Accumulating exceptions | Managed tiers instead |
| Asymmetric Read-Write | Medium | HTTP method bias | Limit by impact, not method |
| Rolling Reset Surprise | Medium | Predictable window resets | Per-user jitter / sliding window |
| Cost-Blind Limiter | Medium | Flat cost per request | Cost-proportional tokens |
| One Environment for All | Medium | No env-specific config | Per-environment limits |
| Unmonitored Limiter | Medium | No observability | Metrics + alerting |
| Stale Configuration | Medium | Set-and-forget policy | Quarterly review process |
| Incident Blackout | High | No dynamic config | Redis-backed dynamic limits |
| Cascading Quota Drain | High | Shared quota across teams | Tenant-isolated quotas |
| Aggressive Poller | Medium | Inefficient client design | ETags / webhooks |
| Fan-Out Bomb | High | Loop over individual calls | Bulk APIs + async processing |
| Missing Cache Layer | Medium | No client-side caching | TTL cache for stable data |
| Synchronous Bulk Stampede | Medium | Blocking batch processing | Async + checkpointing |
| Predictable Window Attack | High | Fixed window + no jitter | Sliding window + user offset |
| Shared API Key Bypass | High | Shared credentials | One key per principal |
| Unvalidated Forwarded IP | Critical (security) | Trusting user input | Validate proxy chain |
Next Supplement: Supplement 2 - Production Challenges