6/6/2026 | Admin Post


Rate Limiting - Supplement 1: Anti-Patterns Extended Deep Dive

Series Navigation:
Main Index |
Part 5 - Advanced (12 original anti-patterns) |
Supplement 2 - Production Challenges |
Supplement 3 - Trade-Offs and Decision Guide |
Supplement 4 - Architecture Patterns

This supplement extends Part 5 with 25 additional anti-patterns not covered earlier.
Each includes: the pattern name, how it manifests, why it is dangerous,
a broken code example, a fixed example, and production impact.


Table of Contents

Infrastructure Anti-Patterns

  1. The In-Process Island
  2. The Connection Pool Killer
  3. The Oversized Lua Script
  4. The Missing Hash Tag
  5. The TTL Roulette
  6. The Unbounded Key Space
  7. The Shared Counter Spaghetti

Business Logic Anti-Patterns

  8. The Zero-Context Limiter
  9. The Quota Without a Rate Limit
  10. The Grandfathered Exemption
  11. The Asymmetric Read-Write Limit
  12. The Rolling Reset Surprise
  13. The Cost-Blind Limiter

Operational Anti-Patterns

  14. The One Environment for All
  15. The Unmonitored Limiter
  16. The Stale Configuration
  17. The Incident Blackout
  18. The Cascading Quota Drain

Client-Side Anti-Patterns

  19. The Aggressive Poller
  20. The Fan-Out Bomb
  21. The Missing Cache Layer
  22. The Synchronous Bulk Stampede

Security Anti-Patterns

  23. The Predictable Window Attack Surface
  24. The Shared API Key Bypass
  25. The Unvalidated Forwarded IP


Infrastructure Anti-Patterns


AP-1: The In-Process Island

What it looks like:
Each application server maintains its own in-memory rate limit counters. No shared state.

# BROKEN: In-memory limiter on a clustered deployment
from collections import defaultdict
import time
 
# This lives in process memory. Each of your 10 servers has its OWN copy.
_counters = defaultdict(int)
_windows  = defaultdict(float)
 
def is_allowed(user_id: str, limit: int = 100) -> bool:
    now = time.time()
    if now - _windows[user_id] >= 60:
        _counters[user_id] = 0
        _windows[user_id] = now
    _counters[user_id] += 1
    return _counters[user_id] <= limit

Why it is dangerous:
With 10 application servers and round-robin load balancing, each server sees roughly 1/10
of traffic. A user can make 1,000 requests per minute (10x the limit) by spreading them
across servers and being within limit on each individual server.

User's actual requests: 1000/min
Server A sees: 100/min -> ALLOW (within limit)
Server B sees: 100/min -> ALLOW (within limit)
...
Server J sees: 100/min -> ALLOW (within limit)
Effective limit: 1,000/min, 10x the intended 100/min - the limit is bypassed
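
The arithmetic above can be reproduced with a small simulation. This is a minimal sketch, assuming perfect round-robin balancing and a single window; `InProcessLimiter` and `simulate` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from itertools import cycle


class InProcessLimiter:
    """Per-process counter, like the broken example (window reset omitted)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counters: dict[str, int] = defaultdict(int)

    def is_allowed(self, user_id: str) -> bool:
        self.counters[user_id] += 1
        return self.counters[user_id] <= self.limit


def simulate(servers: int, limit: int, requests: int) -> int:
    """Send `requests` calls from one user round-robin across `servers` limiters."""
    fleet = [InProcessLimiter(limit) for _ in range(servers)]
    allowed = 0
    for _, limiter in zip(range(requests), cycle(fleet)):
        if limiter.is_allowed("user123"):
            allowed += 1
    return allowed


if __name__ == "__main__":
    print("1 server  :", simulate(servers=1, limit=100, requests=1000))
    print("10 servers:", simulate(servers=10, limit=100, requests=1000))
```

With one server the user is capped at the configured limit; with ten servers, each holding its own counter, every one of the 1,000 requests gets through.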

The Fix:
Use Redis as the shared counter store. This is the most important requirement for any
rate limiter in a horizontally-scaled system.

import redis
import time
 
r = redis.Redis(host="redis-cluster", decode_responses=True)
 
def is_allowed(user_id: str, limit: int = 100, window: int = 60) -> bool:
    now = int(time.time())
    window_id = now // window
    key = f"rl:{user_id}:{window_id}"
 
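    # INCR and EXPIRE go out together in one MULTI/EXEC transaction, so the
    # key can never be left without a TTL if the process dies in between.
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window * 2)
    count, _ = pipe.execute()
    return count <= limit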
 
# Now all 10 servers increment the SAME Redis key.
# User sees exactly the intended limit regardless of which server handles them.

Production Impact: High. In-process rate limiters are one of the most common
rate limiting mistakes in production, and they are often discovered only during load
testing or after an incident.


AP-2: The Connection Pool Killer

What it looks like:
Rate limiter creates a new Redis connection per request, or uses a pool so small that
it exhausts under load.

# BROKEN: New connection per request
def is_allowed(user_id: str) -> bool:
    r = redis.Redis(host="redis")  # NEW CONNECTION EVERY REQUEST
    count = r.incr(f"rl:{user_id}")
    r.expire(f"rl:{user_id}", 60)
    r.close()
    return count <= 100
 
# BROKEN: Lambda-style code with same problem
# AWS Lambda can spawn 1000+ concurrent instances
# Each instance tries to create a new Redis connection
# Redis default max connections = 10,000
# 1000 Lambda instances x 5 connections each = 5,000 connections used
# Plus your other services = connection exhaustion

Why it is dangerous:

  • Each TCP connection has overhead: memory, file descriptor, TLS handshake
  • Connection establishment takes 1-10ms (defeats the purpose of a fast rate check)
  • Redis has a connection limit (default 10,000). Exhausting it makes Redis unavailable
    for ALL services, not just the rate limiter
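
The connection budget behind these numbers is simple division. A rough sketch, assuming you know your peak instance count and how many connections other services need; `max_pool_size_per_instance` is a hypothetical helper, not part of any Redis client.

```python
def max_pool_size_per_instance(
    maxclients: int,
    peak_instances: int,
    reserved_for_other_services: int,
) -> int:
    """Largest per-instance pool that keeps the total under Redis's maxclients."""
    available = maxclients - reserved_for_other_services
    if available <= 0 or peak_instances <= 0:
        return 0
    # Integer division: every instance may open up to this many connections.
    return available // peak_instances


if __name__ == "__main__":
    # 10,000 maxclients, 6,000 kept for other services, 1,000 Lambda containers
    print(max_pool_size_per_instance(10_000, 1_000, 6_000))
```

If the result is smaller than the pool size you planned, the fleet can exhaust Redis at peak concurrency even though each instance looks modest on its own.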

The Fix:

# CORRECT: Connection pool, created once at module load
import redis
 
# Created ONCE at startup, reused across all requests in this process
_pool = redis.ConnectionPool(
    host="redis",
    port=6379,
    db=1,            # separate DB for rate limiting
    max_connections=20,          # per-instance pool size
    socket_timeout=0.5,          # 500ms timeout - fail fast
    socket_connect_timeout=0.5,
    decode_responses=True,       # connection options belong on the pool
)
_client = redis.Redis(connection_pool=_pool)
 
 
def is_allowed(user_id: str) -> bool:
    # _client is reused - borrows from pool, returns when done
    count = _client.incr(f"rl:{user_id}")
    if count == 1:
        _client.expire(f"rl:{user_id}", 60)
    return count <= 100

For AWS Lambda / Serverless:

# Lambda: connection pool per Lambda container (not per invocation)
# Lambda containers are reused across invocations - module-level code runs once
import redis
import os
 
# This runs once per Lambda CONTAINER, not once per invocation
_r = redis.Redis(
    host=os.environ["REDIS_HOST"],
    max_connections=5,    # Keep small: Lambda can have many containers
    socket_timeout=0.3    # Very aggressive timeout for Lambda
)
 
def handler(event, context):
    # _r is reused from module-level initialization
    result = _r.incr("rl:key")
    ...

Production Impact: Critical in Lambda/containerized environments. Can silently
exhaust Redis connections for all services.


AP-3: The Oversized Lua Script

What it looks like:
A Lua script that does too much work, blocking Redis's single-threaded execution for
an excessive amount of time.

-- BROKEN: Lua script that iterates over ALL user keys to compute totals
-- This is pathological but variants appear in production
local pattern = "rl:user:*"
local cursor = "0"
local total = 0
 
repeat
    local result = redis.call("SCAN", cursor, "MATCH", pattern, "COUNT", 100)
    cursor = result[1]
    local keys = result[2]
    for _, key in ipairs(keys) do
        local val = tonumber(redis.call("GET", key) or "0")
        total = total + val
    end
until cursor == "0"
 
return total
-- This script runs for potentially SECONDS. Redis is blocked the entire time.
-- All other clients wait. p99 latency explodes.

Why it is dangerous:
Redis is single-threaded. A Lua script runs atomically - no other command can execute
while the script is running. A script that takes 100ms effectively freezes Redis for
all clients during that time.

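Redis has a lua-time-limit (default 5000ms). Once a script runs past it, Redis answers
every other client with BUSY errors until the script finishes. The script is not stopped
automatically: SCRIPT KILL can end it only if it has not written anything yet; otherwise
the only way out is SHUTDOWN NOSAVE.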

The Fix:

-- CORRECT: Lua script does only what is necessary for ONE rate limit check
-- Keep scripts under 1ms execution time
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local window_id = math.floor(now / window)
local k = key .. ':' .. window_id
local count = redis.call('INCR', k)
if count == 1 then
    redis.call('EXPIRE', k, window * 2)
end
if count <= limit then
    return {1, count, limit - count}
end
return {0, count, 0}
-- This runs in microseconds. No scanning. No iteration. Just 2-3 commands.

Rules for Lua Scripts in Rate Limiters:

  • No KEYS and no full SCAN loops (both walk the entire keyspace, O(n))
  • No loops over variable-length data
  • No external calls (no HTTP from Lua)
  • Test under load to verify < 1ms execution
  • Monitor slow log: CONFIG SET slowlog-log-slower-than 1000 (1ms threshold)

Production Impact: Can take down Redis for all services. One slow script = total
outage for all Redis-dependent systems.


AP-4: The Missing Hash Tag

What it looks like:
Multi-key Lua scripts in Redis Cluster where keys land on different shards.

-- BROKEN: Two keys that may be on different Redis Cluster shards
local curr_key = KEYS[1]   -- "rl:sw:user123:curr" -> hashes to one slot
local prev_key = KEYS[2]   -- "rl:sw:user123:prev" -> may hash to a different slot
 
-- ERROR: Redis Cluster rejects the call before the script body runs:
-- CROSSSLOT Keys in request don't hash to the same slot
local curr = tonumber(redis.call('GET', curr_key) or '0')
local prev = tonumber(redis.call('GET', prev_key) or '0')

Why it fails:
Redis Cluster assigns keys to hash slots: slot = CRC16(key) % 16384. Keys in different
slots can live on different shards, and Redis Cluster requires every key a script
touches to hash to the same slot.

(The CRC16 values below are illustrative, not computed from the real keys.)
"rl:sw:user123:curr"  -> CRC16 = 34891 -> slot 34891 % 16384 = 2123  -> shard A
"rl:sw:user123:prev"  -> CRC16 = 28143 -> slot 28143 % 16384 = 11759 -> shard B
Different slots! Script fails with CROSSSLOT error.

The Fix: Hash Tags

-- CORRECT: Use {} hash tags to force both keys to the same slot
-- Only the content within {} is used for hash slot calculation
local curr_key = KEYS[1]   -- "rl:sw:{user123}:curr" -> slot = CRC16("user123") % 16384
local prev_key = KEYS[2]   -- "rl:sw:{user123}:prev" -> slot = CRC16("user123") % 16384
-- Both use "user123" for slotting -> guaranteed same shard -> Lua script works!
 
local curr = tonumber(redis.call('GET', curr_key) or '0')
local prev = tonumber(redis.call('GET', prev_key) or '0')

# Python: generate keys with hash tags
def make_rate_limit_keys(user_id: str, window_seconds: int, now: int) -> tuple[str, str]:
    window_id = now // window_seconds
    # Curly braces around user_id = hash tag
    curr_key = f"rl:sw:{{{user_id}}}:{window_id}"
    prev_key = f"rl:sw:{{{user_id}}}:{window_id - 1}"
    return curr_key, prev_key
 
# "rl:sw:{user123}:28956"
# "rl:sw:{user123}:28955"
# Both hash on "user123" -> always same Redis Cluster shard
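
To check key layouts before deploying to a Cluster, the slot calculation can be reproduced locally. The sketch below follows the rule as this section describes it: CRC16 (XMODEM variant, polynomial 0x1021) modulo 16384, hashing only the text inside the first non-empty {...}. The function names `crc16_xmodem` and `key_hash_slot` are made up for this example.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot for a key, honoring a {hash tag} if one is present and non-empty."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # "{}" (empty tag) does not count: the whole key is hashed instead.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384


if __name__ == "__main__":
    for key in ("rl:sw:{user123}:28956", "rl:sw:{user123}:28955"):
        print(key, "->", key_hash_slot(key))
```

Both tagged keys print the same slot because only "user123" is hashed. A unit test along these lines catches a missing hash tag without needing a Cluster in CI.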

Production Impact: Invisible until you move to Redis Cluster. Scripts work in standalone
Redis (no sharding) but fail with CROSSSLOT errors as soon as they run against a Cluster,
so the problem is often discovered only on the first Cluster deployment.


AP-5: The TTL Roulette

What it looks like:
Inconsistent, missing, or incorrectly calculated TTLs on rate limit keys.

# BROKEN: Pattern 1 - TTL set only conditionally
count = redis.incr(key)
if count == 1:
    redis.expire(key, 60)
# PROBLEM: If the process crashes between INCR and EXPIRE,
# the key has no TTL and lives FOREVER. One crashed process = permanent key.
 
# BROKEN: Pattern 2 - TTL shorter than the window
count = redis.incr(key)
redis.expire(key, 30)  # Window is 60s, TTL is 30s!
# After 30s without requests the key disappears and the counter resets mid-window.
# Users can make up to 2x the limit: 100 requests, a 30s pause, then 100 more.
 
# BROKEN: Pattern 3 - TTL too long, wasting memory
count = redis.incr(key)
redis.expire(key, 86400)  # 1 day TTL for a 60-second window
# Key stays in Redis for 24 hours after user stops making requests.
# With 1M users: 1M stale keys consuming memory for 24 hours unnecessarily.
 
# BROKEN: Pattern 4 - No TTL at all (memory leak)
count = redis.incr(key)
# No expire call at all. Keys accumulate forever. Redis memory grows until OOM.

The Fix:

import time

# CORRECT: Atomic pipeline, TTL = window_size * 2 (buffer for previous window)
def is_allowed(r, key: str, limit: int, window: int) -> bool:
    now = int(time.time())
    window_id = now // window
    full_key = f"{key}:{window_id}"
    ttl = window * 2  # Keep key for 2 windows (needed by sliding window counter)
 
    pipe = r.pipeline()
    pipe.incr(full_key)
    pipe.expire(full_key, ttl)  # Set EVERY time, not just on count==1
    results = pipe.execute()
 
    count = results[0]
    return count <= limit
 
# Alternative: a Lua script runs INCR and EXPIRE as one atomic step on the server:
def is_allowed_atomic(r, key: str, limit: int, window: int) -> bool:
    now = int(time.time())
    window_id = now // window
    full_key = f"{key}:{window_id}"
    ttl = window * 2
 
    # Lua script: atomic increment-and-expire
    script = """
local count = redis.call('INCR', KEYS[1])
redis.call('EXPIRE', KEYS[1], tonumber(ARGV[1]))
return count
"""
    count = r.eval(script, 1, full_key, ttl)
    return int(count) <= limit

TTL Rules:

  • Sliding window counter: TTL = window_size * 2 (need previous window)
  • Fixed window: TTL = window_size + 10 (small buffer)
  • Token bucket: TTL = capacity / refill_rate * 2 + 60 (time to refill from empty)
  • Always set TTL on every request (not just on count==1) to handle process crashes
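
The TTL rules above translate directly into small helpers. This is only a sketch of the arithmetic; the function names are hypothetical and the constants (the 10-second buffer, the extra 60 seconds) are the ones from the list.

```python
import math


def ttl_sliding_window_counter(window_seconds: int) -> int:
    """Keep the key for two windows so the previous window is still readable."""
    return window_seconds * 2


def ttl_fixed_window(window_seconds: int, buffer_seconds: int = 10) -> int:
    """Window length plus a small buffer."""
    return window_seconds + buffer_seconds


def ttl_token_bucket(capacity: int, refill_rate_per_second: float) -> int:
    """Twice the time to refill from empty, plus 60 seconds, rounded up."""
    return math.ceil(capacity / refill_rate_per_second * 2 + 60)


if __name__ == "__main__":
    print(ttl_sliding_window_counter(60))   # 60s window
    print(ttl_fixed_window(60))             # 60s window
    print(ttl_token_bucket(100, 10))        # 100 tokens, 10 tokens/s
```

For a bucket of 100 tokens refilled at 10 tokens per second, a full refill takes 10 seconds, so the TTL works out to 10 * 2 + 60 = 80 seconds.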

Production Impact: High. Memory leaks accumulate silently. Incorrect TTLs cause
incorrect rate limiting that is extremely hard to debug.


AP-6: The Unbounded Key Space

What it looks like:
Rate limit keys are created based on arbitrary user input with no bounds on uniqueness.

# BROKEN: Key includes arbitrary user-supplied path
def rate_limit_by_path(ip: str, path: str) -> bool:
    # path comes from the URL - user controlled
    key = f"rl:{ip}:{path}"
    # Attacker sends: GET /api/xxxxxxxxxxxxxxxxxxxxxxxxxxx (random 500-char path)
    # Creates millions of unique keys, one per random path
    # Redis keyspace explodes. Memory fills. OOM kill.
    count = redis.incr(key)
    redis.expire(key, 60)
    return count <= 100
 
# BROKEN: Key includes User-Agent header
def rate_limit_by_ua(ip: str, user_agent: str) -> bool:
    key = f"rl:{ip}:{user_agent}"
    # User-Agent strings can be thousands of chars and completely arbitrary
    # Attacker rotates User-Agent: "Mozilla/5.0 ... [random 200 chars]"
    # Creates one new key per request. Keyspace explosion.
    ...
 
# BROKEN: Key includes arbitrary query parameters
key = f"rl:{user_id}:{request.query_string}"
# ?q=<random 1000 char string> -> unique key per request

Why it is dangerous:

  • Redis stores all keys in memory. Unbounded unique keys = unbounded memory consumption.
  • An attacker can craft requests to create millions of unique keys, exhausting Redis memory.
  • This is a form of resource exhaustion / DoS attack against your rate limiter.

The Fix:

import hashlib
import re
 
# CORRECT: Normalize and hash all variable components
def make_rate_limit_key(
    user_id: str,
    endpoint: str,
    max_key_length: int = 200
) -> str:
    # Strip the query string first, then normalize numeric path params
    # /api/users/12345?x=1 -> /api/users/{id}
    normalized_endpoint = endpoint.split('?')[0]
    normalized_endpoint = re.sub(r'/\d+(?=/|$)', '/{id}', normalized_endpoint)
 
    raw_key = f"rl:{user_id}:{normalized_endpoint}"
 
    # If still too long, hash it
    if len(raw_key) > max_key_length:
        key_hash = hashlib.sha256(raw_key.encode()).hexdigest()[:16]
        return f"rl:h:{key_hash}"
 
    return raw_key
 
# CORRECT: Use enum/fixed set for endpoint rate limiting keys
RATE_LIMITED_ENDPOINTS = {
    "/api/users",
    "/api/orders",
    "/api/search",
    "/api/export",
}
 
def get_endpoint_key(path: str) -> str:
    # Only rate limit known, fixed endpoints - no arbitrary keys
    for endpoint in RATE_LIMITED_ENDPOINTS:
        if path.startswith(endpoint):
            return endpoint
    return "/api/other"  # catch-all bucket

Additional protection:

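# Redis maxmemory and eviction policy
maxmemory 2gb
maxmemory-policy allkeys-lru
# allkeys-lru evicts the least-recently-used keys of ANY kind, so idle rate limit
# keys go first only if everything else on the instance was accessed more recently.
# Run the rate limiter on a dedicated instance, or use volatile-lru so that only
# keys with a TTL (such as rate limit counters) are eligible for eviction.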

Production Impact: Can cause Redis OOM (Out of Memory) killing the process, or
triggering aggressive eviction that removes valid rate limit counters.


AP-7: The Shared Counter Spaghetti

What it looks like:
Multiple unrelated services or teams share the same Redis rate limit counters without
clear ownership, leading to accidental cross-service interference.

# Service A (user API team):
def check_limit_service_a(user_id: str) -> bool:
    key = f"rl:{user_id}"       # Generic key
    count = redis.incr(key)
    redis.expire(key, 60)
    return count <= 100
 
# Service B (reporting team, different codebase, different team):
def check_limit_service_b(user_id: str) -> bool:
    key = f"rl:{user_id}"       # SAME generic key! Unintentional sharing!
    count = redis.incr(key)
    redis.expire(key, 60)
    return count <= 50
 
# Now user's requests to Service A AND Service B both increment the same counter.
# User makes 60 requests to Service A -> Service B starts rejecting them (counter=60 > 50)
# User has not made a single request to Service B but is still rate limited by it.

The Fix: Strict Namespacing

# Each service, team, and resource type gets its own namespace
class RateLimitKey:
    @staticmethod
    def user_api(user_id: str, window_id: int) -> str:
        return f"rl:svc:user-api:user:{user_id}:{window_id}"
 
    @staticmethod
    def reporting(user_id: str, window_id: int) -> str:
        return f"rl:svc:reporting:user:{user_id}:{window_id}"
 
    @staticmethod
    def auth_login(identifier: str, window_id: int) -> str:
        return f"rl:svc:auth:login:{identifier}:{window_id}"
 
# Key format: rl:svc:{service}:{entity_type}:{identifier}:{window_id}
# Service A: rl:svc:user-api:user:user123:28956
# Service B: rl:svc:reporting:user:user123:28956
# Completely independent counters. No cross-contamination.

Production Impact: Extremely difficult to debug. One service's traffic silently
reduces another service's effective rate limit. Users report being rate limited when
they "barely used the API."


Business Logic Anti-Patterns


AP-8: The Zero-Context Limiter

What it looks like:
The same rate limit is applied to all users regardless of their trust level, subscription
tier, account age, or usage history.

# BROKEN: One limit for all users, always
def is_allowed(user_id: str) -> bool:
    return redis.incr(f"rl:{user_id}") <= 100
 
# Problems:
# 1. A bot with a fresh account gets the same limit as a 3-year paying customer
# 2. An enterprise customer paying $10,000/month gets same limit as free user
# 3. A verified developer gets same limit as an anonymous scraper
# 4. A background batch job gets same limit as a user's interactive session

The Fix: Context-Aware Limits

// Load user context once per request (cached in Redis, not DB)
public record UserContext(
    String userId,
    String tier,          // "free", "pro", "enterprise"
    String trustLevel,    // "anonymous", "new", "verified", "premium"
    boolean isBot,
    boolean isMachineClient
) {}
 
public int getEffectiveLimit(UserContext ctx) {
    int baseLimit = switch (ctx.tier()) {
        case "enterprise" -> 10_000;
        case "pro"        -> 1_000;
        case "free"       -> 100;
        default           -> 30;   // anonymous
    };
 
    // Trust multiplier
    double trustMultiplier = switch (ctx.trustLevel()) {
        case "premium"   -> 2.0;
        case "verified"  -> 1.0;
        case "new"       -> 0.5;   // warm-up period
        case "anonymous" -> 0.3;
        default          -> 0.5;
    };
 
    // Bot/machine clients get different limits
    if (ctx.isBot()) return Math.max(1, (int)(baseLimit * 0.1));
    if (ctx.isMachineClient()) return baseLimit * 2; // service accounts need more
 
    return Math.max(1, (int)(baseLimit * trustMultiplier));
}

AP-9: The Quota Without a Rate Limit

What it looks like:
A daily/monthly quota is enforced but no per-second rate limit protects against bursts.

Configuration:
  Daily quota: 10,000 requests/day
  Per-minute limit: NONE

What happens:
  Attacker sends 10,000 requests in 5 seconds.
  Quota enforcement allows all 10,000 (quota not yet hit).
  Your database receives 2,000 RPS for 5 seconds.
  Connection pool exhausts. Database falls over.
  All users are impacted. Service down for 10 minutes.

The Fix: Always layer rate + quota

# Correct: Per-second burst protection + per-minute + per-day quota
class TieredLimitConfig:
    free_tier = {
        "per_second": 2,        # burst protection
        "per_minute": 30,       # sustained rate
        "per_hour": 500,        # hourly budget
        "per_day": 2_000,       # daily quota
    }
    pro_tier = {
        "per_second": 20,
        "per_minute": 300,
        "per_hour": 5_000,
        "per_day": 50_000,
    }
 
# ALL four must pass for the request to be allowed.
# The per-second limit is the most important for protecting infrastructure.
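
One way to express "all four must pass" is to check every layer before incrementing any of them, so a rejected request consumes no budget. The following is an in-memory sketch for illustration only (a production version would keep the counters in Redis); `LayeredLimiter` and its injectable `clock` are hypothetical.

```python
import time
from typing import Callable


class LayeredLimiter:
    """Fixed-window counters for several window sizes; a request must fit in all."""

    WINDOW_SECONDS = {"per_second": 1, "per_minute": 60, "per_hour": 3600, "per_day": 86400}

    def __init__(self, limits: dict[str, int], clock: Callable[[], float] = time.time) -> None:
        self.limits = limits
        self.clock = clock
        # (user, layer name, window id) -> count. Old windows are never pruned here.
        self.counters: dict[tuple[str, str, int], int] = {}

    def is_allowed(self, user_id: str) -> bool:
        now = int(self.clock())
        keys = [(user_id, name, now // self.WINDOW_SECONDS[name]) for name in self.limits]
        # Pass 1: check every layer without touching any counter.
        for key in keys:
            if self.counters.get(key, 0) >= self.limits[key[1]]:
                return False
        # Pass 2: all layers passed, so charge the request to each of them.
        for key in keys:
            self.counters[key] = self.counters.get(key, 0) + 1
        return True


if __name__ == "__main__":
    limiter = LayeredLimiter({"per_second": 2, "per_minute": 30, "per_hour": 500, "per_day": 2_000})
    print([limiter.is_allowed("user123") for _ in range(3)])
```

The two-pass structure matters: incrementing as you check would charge the daily quota for requests the per-second layer goes on to reject.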

Production Impact: Quota enforcement without rate limiting is a common cause of
self-inflicted database outages, often triggered by legitimate (but poorly written)
client code.


AP-10: The Grandfathered Exemption

What it looks like:
Legacy clients or "important" customers are permanently exempted from rate limits.
This exemption is never reviewed or revisited.

# BROKEN: Permanent exemption list, grown over years
EXEMPT_API_KEYS = {
    "sk_legacy_abc123",    # "Big Enterprise - added 2019"
    "sk_legacy_def456",    # "CEO's personal project - never limit"
    "sk_legacy_ghi789",    # "Why is this here? Unknown - afraid to remove"
    # ... 47 more entries, nobody knows what they do
}
 
def is_allowed(api_key: str) -> bool:
    if api_key in EXEMPT_API_KEYS:
        return True  # Skip all rate limiting
    return rate_limiter.check(api_key)

Why it is dangerous:

  • Exempt keys become attack targets. Leaked key = unlimited access.
  • Exempt customers put the most load on your system during outages.
  • Nobody knows why exemptions exist. Removing them breaks unknown things.
  • Exemptions accumulate silently over years.

The Fix: Managed High-Limit Tiers

# CORRECT: No exemptions. Legitimate high-volume users get their own tier.
CUSTOMER_TIERS = {
    # "Big Enterprise" now has an enterprise tier with appropriate limits
    "sk_enterprise_abc": {"tier": "enterprise", "rpm": 100_000},
    # "CEO's project" has its own dedicated API key with explicitly high limits
    "sk_ceo_project": {"tier": "vip", "rpm": 10_000},
}
 
def get_limit(api_key: str) -> int:
    config = CUSTOMER_TIERS.get(api_key)
    if config:
        return config["rpm"]
    # Default tier logic
    return DEFAULT_LIMITS[get_subscription_tier(api_key)]
 
# Every key has a limit. Limits are documented. High-volume users have high limits.
# No magic exempt list. No unknown exceptions. Full audit trail.

AP-11: The Asymmetric Read-Write Limit

What it looks like:
Read endpoints have very high limits while write endpoints are tightly controlled.
Users discover they can use reads to amplify writes through side effects.

Read limit:  10,000 reads/minute
Write limit: 100 writes/minute

Attack: User creates a public resource and makes 10,000 read requests to it.
Each read increments a view counter (a write!). The view counter gets
10,000 writes/minute but only 100 were rate limited on the write endpoint.

Expensive side effect: Each read triggers a push notification to subscribers.
User reads their own resource 10,000 times. Their 50,000 subscribers each
receive 10,000 push notifications. Your push notification service is overwhelmed.

The Fix: Rate limit by impact, not by HTTP method

# Identify high-impact reads and rate limit them separately
ENDPOINT_LIMITS = {
    "GET /api/products/{id}":           (10_000, 60),  # cheap, cached
    "GET /api/analytics/report":        (10, 3600),    # expensive query
    "GET /api/feed":                    (100, 60),     # triggers notifications
    "POST /api/comments":               (30, 60),      # write + notifications
    "POST /api/bulk-export":            (2, 3600),     # very expensive
}
 
# Rate limit by "resource impact" not just HTTP verb
# GET /api/feed triggers the same backend work as POST /api/feed
# Both should have similar rate limits

AP-12: The Rolling Reset Surprise

What it looks like:
Rate limit windows that reset at predictable times (e.g., top of every hour) cause
user confusion and poor experience.

Scenario: User is on a 1000 requests/hour plan.
11:59 PM: User sends 1000 requests. All allowed.
11:59:30 PM: User sends 1 more request. Rejected (429). "Retry at 12:00:00"
12:00:00 AM: User rushes to send requests. THUNDERING HERD.
12:00:00 AM: 100,000 users all try to send requests. Server slammed.

Also: Users feel cheated because they "only had 30 seconds to use 1000 requests"

The Fix: Per-user jittered windows + rolling window

import hashlib
import time
 
def get_user_window_offset(user_id: str, window_seconds: int) -> int:
    """
    Each user has a deterministic but different window offset.
    User alice: window resets at :07 seconds
    User bob:   window resets at :41 seconds
    This spreads resets across the full window duration.
    """
    user_hash = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return user_hash % window_seconds
 
def get_window_id(user_id: str, window_seconds: int) -> int:
    now = int(time.time())
    offset = get_user_window_offset(user_id, window_seconds)
    return (now - offset) // window_seconds
 
# Alternative: Use a TRUE sliding window (rolling window)
# "No more than 1000 requests in any 60-minute period"
# No reset points. Naturally prevents thundering herd. More fair.
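
The rolling-window alternative can be sketched as a sliding window log: keep the timestamp of each accepted request and count only those inside the last window. This in-memory version is illustrative only (a shared deployment would typically use a Redis sorted set); `SlidingWindowLog` and its `clock` parameter are assumptions of this example.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLog:
    """Allow at most `limit` requests in any rolling `window_seconds` period."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.events: dict[str, deque] = defaultdict(deque)

    def is_allowed(self, user_id: str) -> bool:
        now = self.clock()
        log = self.events[user_id]
        cutoff = now - self.window_seconds
        # Drop timestamps that have aged out of the rolling window.
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLog(limit=1000, window_seconds=3600)
    print(limiter.is_allowed("alice"))
```

Capacity comes back one request at a time as old timestamps expire, so there is no shared reset instant for clients to pile onto. The cost is memory: one timestamp per accepted request per user.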

AP-13: The Cost-Blind Limiter

What it looks like:
Every request consumes exactly 1 token, regardless of how expensive it is.

# BROKEN: Same cost for all requests
def is_allowed(user_id: str) -> bool:
    return rate_limiter.consume(user_id, cost=1)
 
# GET /api/users/{id}   -> cost 1 (2ms, hits cache, returns 100 bytes)
# POST /api/ml/predict  -> cost 1 (500ms, runs ML model, uses 4 GPUs)
# GET /api/reports/full -> cost 1 (15s query, scans 10M rows, returns 5MB)
 
# User does 100 requests/minute of ML predictions
# = 100 x 500ms = 50 CPU seconds/minute = 3000 CPU seconds/hour
# Same user doing 100 simple reads = 100 x 2ms = 0.2 CPU seconds/minute
# Both counted as "100 requests". Completely wrong.

The Fix: Cost-proportional token consumption

ENDPOINT_COSTS = {
    ("GET",  "/api/users/{id}"):         1,
    ("GET",  "/api/users"):              5,    # lists are more expensive
    ("POST", "/api/orders"):             3,
    ("GET",  "/api/reports/full"):       50,   # expensive query
    ("POST", "/api/ml/predict"):         100,  # GPU compute
    ("POST", "/api/bulk"):               None, # special: cost = len(items)
}
 
def get_request_cost(method: str, path: str, body: dict = None) -> int:
    normalized = normalize_path(path)  # /api/users/123 -> /api/users/{id}
    cost = ENDPOINT_COSTS.get((method, normalized), 1)
 
    if cost is None:  # bulk operation
        items = body.get("items", []) if body else []
        cost = max(1, len(items))
 
    return cost
 
def is_allowed(user_id: str, method: str, path: str, body: dict = None) -> bool:
    cost = get_request_cost(method, path, body)
    return token_bucket.consume(user_id, cost=cost)
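
The `token_bucket.consume(user_id, cost=...)` call above is not defined in this supplement. A minimal in-memory sketch of what such a bucket could look like follows; `CostAwareTokenBucket`, its parameters, and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Callable


class CostAwareTokenBucket:
    """Token bucket where each request removes `cost` tokens instead of one."""

    def __init__(
        self,
        capacity: float,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # user_id -> (tokens remaining, timestamp of last update)
        self.buckets: dict[str, tuple[float, float]] = {}

    def consume(self, user_id: str, cost: int = 1) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(user_id, (self.capacity, now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self.buckets[user_id] = (tokens, now)
        return allowed


if __name__ == "__main__":
    bucket = CostAwareTokenBucket(capacity=100, refill_per_second=10)
    print(bucket.consume("user123", cost=100))  # one ML prediction drains the bucket
    print(bucket.consume("user123", cost=1))    # even a cheap read must now wait
```

Note that a request whose cost exceeds `capacity` can never succeed in this sketch, so the most expensive entry in the cost table should stay at or below the bucket size.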

Operational Anti-Patterns


AP-14: The One Environment for All

What it looks like:
The same rate limit values are used in production, staging, and development.

# BROKEN: Single config used in all environments
rate_limit:
  per_minute: 30 # Production value - very restrictive
  per_hour: 500
 
# Development consequence:
# Developer runs integration tests: 50 requests in 2 seconds -> rate limited
# Developer can't run performance tests without hitting limits
# CI/CD pipeline tests start failing intermittently from rate limits
# Developer spends hours debugging "why is CI slow?" -> it's rate limited

The Fix:

# CORRECT: Per-environment configuration
environments:
  production:
    rate_limit:
      per_second: 10
      per_minute: 100
      per_hour: 2_000
      enforce: true
 
  staging:
    rate_limit:
      per_second: 100 # 10x higher for test automation
      per_minute: 1_000
      per_hour: 20_000
      enforce: true # Still enforce (so staging tests are realistic)
 
  development:
    rate_limit:
      per_second: 10_000 # Effectively unlimited for dev
      per_minute: 100_000
      enforce: false # Can disable entirely in dev
      dry_run: true # Log what would be limited
 
  test:
    rate_limit:
      enforce: false # Never rate limit in unit tests
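
How the `enforce` and `dry_run` flags combine is not spelled out in the config. One plausible reading, sketched below, is that a request is rejected only when enforcement is on and dry-run is off, and dry-run logs what would have been rejected. `apply_policy` is a hypothetical helper, and this precedence is an assumption.

```python
import logging

logger = logging.getLogger("rate_limit")


def apply_policy(over_limit: bool, enforce: bool, dry_run: bool = False) -> bool:
    """Return True if the request should proceed under the environment's flags."""
    if not over_limit:
        return True
    if enforce and not dry_run:
        return False  # production / staging: actually reject
    if dry_run:
        # Assumed semantics: dry-run never rejects, it only records the decision.
        logger.warning("dry-run: request would have been rate limited")
    return True


if __name__ == "__main__":
    print(apply_policy(over_limit=True, enforce=True))                  # production
    print(apply_policy(over_limit=True, enforce=False, dry_run=True))   # development
```

Running new limits in dry-run mode first lets you see from the logs how much traffic they would reject before turning enforcement on.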

AP-15: The Unmonitored Limiter

What it looks like:
Rate limiting is deployed with no metrics, no alerting, and no visibility into behavior.

What you cannot see:
- How many requests are being rate limited per second?
- Which users are hitting limits most frequently?
- What percentage of traffic is being rejected?
- Is the rate limiter contributing to latency?
- When Redis goes down (and rate limiting fails open), do you know?
- Are limits too tight (blocking legitimate users)?
- Are limits too loose (not actually protecting the system)?

Answer: If you have no metrics, you have no answer to any of these.

The Fix: Minimum viable rate limit observability

import redis  # for redis.RedisError in the except clause below
from prometheus_client import Counter, Histogram
 
# Metrics
rl_requests_total = Counter(
    "rate_limit_requests_total",
    "Total rate limit checks",
    ["endpoint_group", "result", "tier"]
)
rl_redis_latency = Histogram(
    "rate_limit_redis_duration_seconds",
    "Redis rate limit check latency",
    buckets=[0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25]
)
rl_redis_errors = Counter(
    "rate_limit_redis_errors_total",
    "Redis errors in rate limiter",
    ["error_type"]
)
 
 
def instrumented_rate_limit_check(identifier, endpoint, tier, limiter):
    import time
    start = time.time()
    try:
        result = limiter.is_allowed(identifier)
        duration = time.time() - start
 
        rl_redis_latency.observe(duration)
        rl_requests_total.labels(
            endpoint_group=endpoint,
            result="allowed" if result["allowed"] else "denied",
            tier=tier
        ).inc()
 
        return result
 
    except redis.RedisError as e:
        rl_redis_errors.labels(error_type=type(e).__name__).inc()
        # Fail open and return a metric-tracked result
        return {"allowed": True, "fallback": True}

Minimum alerting rules:

Alert: rl_denied_rate > 5% of total traffic for 5 minutes
  -> Limits may be too tight or there's an attack

Alert: rl_redis_errors_total increases > 0 for 30 seconds
  -> Rate limiter falling back to fail-open mode

Alert: rl_redis_latency p99 > 50ms for 2 minutes
  -> Redis is slow, affecting API response times

Alert: rl_denied_rate = 0 while traffic is high for 5 minutes
  -> Rate limiter may have silently stopped working

AP-16: The Stale Configuration

What it looks like:
Rate limits are set once during initial deployment and never revisited as the system grows.

Year 1: Service launched. 1,000 users. Limit set at 100 RPM (server can handle 50K RPM).
Year 2: 10,000 users. 100 RPM is still fine. Nobody thinks about it.
Year 3: 100,000 users. Server upgraded. Limit still 100 RPM.
Year 3.5: Competitor launches. Your users demand features that require more API calls.
Year 3.5: Users complain about rate limits. Churn increases. Developers switch platforms.
Year 4: Emergency: "Raise all limits 10x immediately."
        No impact analysis. No testing. Service degrades within hours.

The Fix: Rate Limit Review Process

# Rate limit configuration as code (not hardcoded, reviewed and versioned)
RATE_LIMITS_VERSION = "2026-Q2"
RATE_LIMITS_LAST_REVIEWED = "2026-04-01"
RATE_LIMITS_NEXT_REVIEW = "2026-07-01"
 
# Annotated with reasoning (reviewable in code review)
TIER_LIMITS = {
    "free": {
        "per_minute": 60,       # Set based on: p95 free user needs 30 RPM (2x buffer)
        "per_day": 1_000,       # Set based on: free tier conversion threshold research
        "rationale": "Lower limits encourage upgrade. 60 RPM supports typical use cases.",
        "last_load_test": "2026-03-15",
        "max_server_capacity_rpm": 500_000,
    }
}
 
# Quarterly review checklist:
# [ ] Are any users consistently hitting limits? (check: rate_limit_utilization > 90%)
# [ ] Did server capacity change?
# [ ] Did typical usage patterns change?
# [ ] What is the P95 usage per tier?
# [ ] Are competitors offering higher limits?
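
Two of the checklist items (users consistently hitting limits, and P95 usage per tier) are simple to compute once per-user peak usage is available. A minimal sketch, assuming a hypothetical `peak_rpm_by_user` mapping exported from your metrics store; both function names are illustrative.

```python
def users_near_limit(peak_rpm_by_user: dict[str, int], limit_rpm: int,
                     threshold: float = 0.9) -> list[str]:
    """Users whose peak usage exceeds `threshold` of the tier limit."""
    return sorted(
        user for user, peak in peak_rpm_by_user.items()
        if peak / limit_rpm > threshold
    )


def p95(values: list[int]) -> int:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    if not values:
        raise ValueError("p95 of an empty list is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]
```

If the P95 for a tier sits well below the limit while a handful of users appear in `users_near_limit`, that usually points at a few heavy users who need a higher tier rather than a limit that is wrong for everyone.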

AP-17: The Incident Blackout

What it looks like:
No mechanism exists to quickly disable, loosen, or modify rate limits during an incident.

Scenario: A critical bug in your mobile app causes ALL users to send 10x normal requests.
Rate limiter is now rejecting 90% of legitimate requests.
Users cannot use the app. Support queue overflowing.

The right response: "Temporarily raise limits 10x while we fix the bug."
The actual response:
  "The limit is hardcoded in application.properties"
  "We need to redeploy to change it"
  "Deployment takes 45 minutes"
  "Our change approval process takes 2 hours"
  "So users will be blocked for ~3 hours while we fix a bug"

The Fix: Dynamic Configuration

import json
import time

import redis


class DynamicRateLimiter:
    """
    Rate limits loaded from Redis config store, refreshed every 30s.
    Limits can be changed in production without redeployment.
    """
 
    def __init__(self, r: redis.Redis, defaults: dict):
        self.r = r
        self.defaults = defaults
        self._config_cache = {}
        self._cache_ttl = 30  # refresh config every 30 seconds
        self._last_refresh = 0
 
    def _get_limit(self, tier: str, window: str) -> int:
        now = time.time()
        if now - self._last_refresh > self._cache_ttl:
            self._refresh_config()
 
        # Config JSON is nested by tier, then window: {"free": {"per_minute": 600}}
        cached = self._config_cache.get(tier, {}).get(window)
        if cached is not None:
            return cached
        return self.defaults.get(tier, {}).get(window, 100)
 
    def _refresh_config(self):
        try:
            # Config stored in Redis as JSON
            config = self.r.get("rl:global_config")
            if config:
                parsed = json.loads(config)
                if isinstance(parsed, dict):
                    self._config_cache = parsed
        except (redis.RedisError, ValueError):
            pass  # Redis down or malformed JSON: keep using the last good config
        finally:
            # Always advance the timer, so a failing Redis is retried once per
            # cache_ttl instead of on every request
            self._last_refresh = time.time()
 
 
# To change limits during an incident (no deployment needed):
# redis-cli SET rl:global_config '{"free":{"per_minute":600},"pro":{"per_minute":6000}}'
# Takes effect within 30 seconds on all instances.
# To revert: redis-cli SET rl:global_config '{"free":{"per_minute":60},...}'
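
Because a new value of `rl:global_config` reaches every instance within 30 seconds, a typo in the JSON spreads just as quickly. A minimal sketch of a pre-flight check to run before the `SET`; `parse_limit_config` is a hypothetical helper (not part of redis-py) and assumes the nested tier/window shape shown in the `redis-cli` example.

```python
import json


def parse_limit_config(raw: str) -> dict[str, dict[str, int]]:
    """Parse and validate a limits document; raise ValueError if it is unusable."""
    config = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(config, dict):
        raise ValueError("top level must be an object keyed by tier")
    for tier, windows in config.items():
        if not isinstance(windows, dict):
            raise ValueError(f"tier {tier!r} must map window names to limits")
        for window, limit in windows.items():
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
                raise ValueError(f"{tier}.{window} must be a positive integer, got {limit!r}")
    return config
```

Running the same check inside the limiter's refresh path would also keep a bad document that was written by hand from replacing the last good one.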

AP-18: The Cascading Quota Drain

What it looks like:
One team's automated job consumes another team's API quota without anyone realizing it.

Team A (Data Science): Runs nightly ML training job at 2 AM.
  Training job calls the internal "Data API" 50,000 times per run.

Team B (Product): Their users call the same "Data API" during business hours.
  Users share a global 100,000/day quota with Team A.

Result:
  2 AM - 4 AM: Team A's job consumes 50,000 of 100,000 daily quota
  9 AM - 5 PM: Team B users only have 50,000 remaining
  4 PM: Team B users start getting rate limited
  Team B files incident: "Data API is broken"
  Root cause: Team A's job consumed the shared quota

The Fix: Tenant-Isolated Quotas

# CORRECT: Each team/service/use-case has its own quota bucket
QUOTA_BUCKETS = {
    "user_interactive":      100_000,   # per day - human users
    "ml_training_batch":     500_000,   # per day - batch jobs
    "api_team_a":            200_000,   # per day - Team A's services
    "api_team_b":            200_000,   # per day - Team B's services
    "background_jobs":       300_000,   # per day - all background processing
}
 
def get_quota_bucket(caller_context: dict) -> str:
    if caller_context.get("is_batch_job"):
        return "ml_training_batch"
    if caller_context.get("team") == "team_a":
        return "api_team_a"
    if caller_context.get("team") == "team_b":
        return "api_team_b"
    if caller_context.get("is_human_session"):
        return "user_interactive"
    return "background_jobs"
 
# Now Team A's training job can never affect Team B's users.
# Each bucket has its own Redis counter.
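
To make the isolation concrete, here is a minimal in-memory sketch of per-bucket counters. `BucketedQuota` is a hypothetical stand-in for the Redis counters; it ignores the daily reset and is not safe across processes, so treat it as an illustration of the accounting only.

```python
from collections import defaultdict


class BucketedQuota:
    """One independent counter per quota bucket (in-memory stand-in for Redis)."""

    def __init__(self, quotas: dict[str, int]):
        self.quotas = quotas
        self.used: dict[str, int] = defaultdict(int)

    def try_consume(self, bucket: str, cost: int = 1) -> bool:
        """Spend `cost` from `bucket` only; other buckets are never touched."""
        if self.used[bucket] + cost > self.quotas[bucket]:
            return False
        self.used[bucket] += cost
        return True
```

Exhausting `ml_training_batch` returns `False` for further batch calls while `user_interactive` still has its full allowance, which is the property the shared 100,000/day quota lacked.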

Client-Side Anti-Patterns


AP-19: The Aggressive Poller

What it looks like:
A client polls an endpoint at maximum rate regardless of whether data has changed.

# BROKEN: Poll every second and never look at rate limit headers or 429 responses
while True:
    response = requests.get("https://api.example.com/events")
    events = response.json()
    process(events)
    time.sleep(1)  # Poll every second regardless of response
 
# This sends 3,600 requests/hour just for polling.
# 90% of responses are probably empty (no new events).
# Wastes 90% of the rate limit quota on empty polls.

The Fix: Smart Polling or WebSockets

# OPTION 1: Conditional requests with ETag/Last-Modified
class SmartPoller:
    def __init__(self):
        self.etag = None
        self.last_modified = None
        self.poll_interval = 5  # seconds
 
    def poll(self):
        while True:
            headers = {}
            if self.etag:
                headers["If-None-Match"] = self.etag
            if self.last_modified:
                headers["If-Modified-Since"] = self.last_modified
 
            response = requests.get("https://api.example.com/events", headers=headers)
 
            if response.status_code == 304:
                # Not Modified - no new data, no rate limit cost (some APIs)
                self.poll_interval = min(60, self.poll_interval * 2)  # back off
            elif response.status_code == 200:
                self.etag = response.headers.get("ETag")
                self.last_modified = response.headers.get("Last-Modified")
                process(response.json())
                self.poll_interval = 5  # reset to base interval
            elif response.status_code == 429:
                retry_after = int(response.headers.get("Retry-After", 60))
                time.sleep(retry_after)
                continue
 
            time.sleep(self.poll_interval)
 
# OPTION 2: Use webhooks instead of polling (zero rate limit cost for polling)
# OPTION 3: Long-polling (one request waits up to 30s for new data - much more efficient)
# OPTION 4: WebSocket (one persistent connection, server pushes data)
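
One detail worth handling in any polling client: HTTP allows `Retry-After` to carry either a number of seconds or an HTTP date, so a bare `int(...)` conversion can raise. A minimal sketch of a tolerant parser using only the standard library; `retry_after_seconds` and its 60-second default are illustrative choices.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None,
                        default: float = 60.0) -> float:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():                      # delay-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):          # exception type varies by Python version
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A poller could then call `time.sleep(retry_after_seconds(response.headers.get("Retry-After")))` on a 429 without risking a crash on a date-formatted header.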

AP-20: The Fan-Out Bomb

What it looks like:
One user action triggers hundreds or thousands of downstream API calls,
each counted against the user's rate limit or the system's capacity.

# BROKEN: Processing 1000 orders one at a time in a loop
def process_order_batch(order_ids: list[str]):
    for order_id in order_ids:  # 1000 iterations
        order = inventory_api.get_order(order_id)         # 1 API call per order
        customer = customer_api.get_customer(order.customer_id)  # 1 API call per order
        shipping = shipping_api.calculate(order)           # 1 API call per order
        # Total: 3000 API calls for 1000 orders
        # If rate limit is 100 calls/minute, this takes 30 minutes
        # Meanwhile, the user's rate limit is exhausted for interactive requests

The Fix: Batch and async

# CORRECT: Use bulk endpoints to collapse N calls into 1
def process_order_batch_smart(order_ids: list[str]):
    # Batch fetch: 1 API call for 1000 orders instead of 1000 calls
    orders = inventory_api.get_orders_bulk(order_ids)               # 1 call
    customer_ids = [o.customer_id for o in orders]
    customers = customer_api.get_customers_bulk(customer_ids)        # 1 call
    shippings = shipping_api.calculate_bulk(orders)                  # 1 call
    # Total: 3 API calls for 1000 orders. 1000x more efficient.
 
# CORRECT: Run in background with its own rate limit budget
def submit_batch_job(order_ids: list[str]):
    # Submit to a queue for async processing
    # Background job has its own rate limit bucket (not shared with user's interactive quota)
    background_queue.submit(
        job_type="process_orders",
        data={"order_ids": order_ids},
        rate_limit_bucket="batch_processing"  # separate quota
    )
    return {"job_id": "...", "status": "queued"}
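
Bulk endpoints commonly cap how many IDs a single call may carry, so "1 call for 1000 orders" often becomes a handful of calls in practice. A minimal sketch of chunked bulk fetching, assuming a hypothetical cap of 100 IDs per call; `fetch_bulk` stands in for any of the `*_bulk` methods above.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_fetch(ids: Sequence[str],
               fetch_bulk: Callable[[Sequence[str]], list[R]],
               max_batch: int = 100) -> list[R]:
    """Fetch all ids using as few bulk calls as the batch cap allows."""
    results: list[R] = []
    for batch in chunked(ids, max_batch):
        results.extend(fetch_bulk(batch))  # one API call per batch
    return results
```

With a cap of 100, the 1000-order example costs 10 calls per API (30 in total) instead of 3000, which is still a 100x reduction.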

AP-21: The Missing Cache Layer

What it looks like:
Client repeatedly calls the API for data that rarely changes, consuming rate limit quota
on unnecessarily repeated requests.

# BROKEN: Fetch user profile on every request to show in navbar
@app.route("/api/dashboard")
def dashboard():
    # This makes 1 API call per page load per user
    # A user loading the dashboard 10 times per hour = 10 API calls
    # Just to show their name and avatar (which rarely change)
    user_profile = external_api.get_user_profile(user_id)
    data = get_dashboard_data(user_id)
    return render(user_profile, data)

The Fix: Cache stable data

from cachetools import TTLCache  # third-party package: pip install cachetools
 
# In-process cache for data that changes rarely
_profile_cache = TTLCache(maxsize=10_000, ttl=300)  # 5-minute TTL
 
def get_user_profile_cached(user_id: str) -> dict:
    if user_id in _profile_cache:
        return _profile_cache[user_id]
 
    profile = external_api.get_user_profile(user_id)
    _profile_cache[user_id] = profile
    return profile
 
# User loads dashboard 10 times in 5 minutes: 1 API call (first load), 9 cache hits
# Rate limit usage: 90% reduction
# Response time: also faster (no API round trip for cached requests)
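
For readers who want to see the mechanism without the dependency, here is a minimal standard-library sketch of the same idea. `TTLMemo` is a hypothetical helper with an injectable clock for testing; unlike `TTLCache` it has no size bound, so it only suits small key spaces.

```python
import time
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class TTLMemo(Generic[V]):
    """Cache the result of `fetch(key)` for `ttl` seconds per key."""

    def __init__(self, fetch: Callable[[Hashable], V], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]                 # still fresh: no upstream call
        value = self._fetch(key)            # expired or missing: one upstream call
        self._entries[key] = (now + self._ttl, value)
        return value
```

Wrapping `external_api.get_user_profile` in `TTLMemo(..., ttl=300)` gives the same one-call-per-five-minutes behavior described above.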

AP-22: The Synchronous Bulk Stampede

What it looks like:
Processing a large batch synchronously, exhausting rate limits and blocking the calling thread.

# BROKEN: Synchronous bulk processor blocks for minutes
def migrate_10000_users(user_ids: list[str]):
    for user_id in user_ids:
        try:
            api.update_user(user_id, {"migrated": True})
        except RateLimitError as e:
            time.sleep(e.retry_after)  # Block the thread for minutes
            api.update_user(user_id, {"migrated": True})
 
# Problems:
# 1. Thread is blocked for potentially hours
# 2. If thread is killed, migration restarts from beginning (no checkpointing)
# 3. All rate limit tokens consumed, blocking other operations
# 4. No progress visibility for the user who submitted the job

The Fix: Async batch processing with checkpointing

import asyncio
import time
from dataclasses import dataclass
 
import aiohttp
import redis
 
@dataclass
class BatchProgress:
    total: int
    processed: int
    failed: list[str]
    checkpoint_key: str  # Redis key for resumable progress
 
async def migrate_users_async(user_ids: list[str], r: redis.Redis,
                              base_url: str, max_concurrent: int = 10):
    """
    Async batch migration with:
    - Bounded concurrency (at most max_concurrent requests in flight)
    - Non-blocking backoff (asyncio.sleep() instead of time.sleep())
    - Checkpointing (progress count saved to Redis every 100 users)
    - Progress tracking
    - Exponential backoff on rate limit errors
    """
    semaphore = asyncio.Semaphore(max_concurrent)  # caps concurrency, not requests/second
    max_retries = 5
    progress = BatchProgress(
        total=len(user_ids),
        processed=0,
        failed=[],
        checkpoint_key=f"migration:progress:{int(time.time())}"
    )
 
    async def process_one(session: aiohttp.ClientSession, user_id: str):
        for attempt in range(max_retries + 1):
            try:
                # Hold a slot only while the request is in flight. Sleeping or
                # retrying while holding the semaphore can deadlock the batch.
                async with semaphore:
                    async with session.patch(f"/api/users/{user_id}",
                                             json={"migrated": True}) as resp:
                        status = resp.status
                        retry_after = int(resp.headers.get("Retry-After", 1))
            except Exception:
                break  # network error or unparseable header: record as failed
            if status == 200:
                progress.processed += 1
                # Checkpoint: save progress to Redis
                if progress.processed % 100 == 0:
                    # The sync redis-py client blocks, so run it off the event loop
                    await asyncio.to_thread(r.set, progress.checkpoint_key,
                                            progress.processed)
                return
            if status != 429 or attempt == max_retries:
                break  # non-retryable status, or retries exhausted
            await asyncio.sleep(retry_after * (2 ** attempt))
        progress.failed.append(user_id)
 
    # base_url lets the relative /api/users/... paths resolve (aiohttp 3.8+)
    async with aiohttp.ClientSession(base_url=base_url) as session:
        tasks = [process_one(session, uid) for uid in user_ids]
        await asyncio.gather(*tasks)
 
    return progress
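
A processed count alone does not say which users finished, because concurrent tasks complete out of order. For a migration that can truly resume, one option is to checkpoint the set of completed IDs and skip them on restart. A minimal sketch, with `InMemoryCheckpoint` as a hypothetical stand-in for a Redis set (with redis-py that would be `SADD` to record and `SMEMBERS` to read back); the sequential loop keeps the example short.

```python
from typing import Callable, Iterable


class InMemoryCheckpoint:
    """Stand-in for a Redis set of completed user IDs."""

    def __init__(self) -> None:
        self._done: set[str] = set()

    def mark_done(self, user_id: str) -> None:
        self._done.add(user_id)

    def completed(self) -> set[str]:
        return set(self._done)


def run_batch(user_ids: Iterable[str],
              migrate_one: Callable[[str], None],
              checkpoint: InMemoryCheckpoint) -> int:
    """Migrate every ID not yet checkpointed; return how many were attempted."""
    done = checkpoint.completed()
    attempted = 0
    for uid in user_ids:
        if uid in done:
            continue                # finished in an earlier run: skip
        attempted += 1
        migrate_one(uid)            # may raise; earlier checkpoints are preserved
        checkpoint.mark_done(uid)   # record only after the call succeeded
    return attempted
```

This only works if the underlying operation is idempotent, since a crash between `migrate_one` and `mark_done` causes that one user to be migrated again on the next run.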

Security Anti-Patterns


AP-23: The Predictable Window Attack Surface

What it looks like:
Fixed window rate limits with predictable reset times that attackers exploit.

Configuration: 100 requests/minute, window resets at :00 seconds each minute

Attacker knows: "At 12:00:00, the counter resets. I can send 100 requests."
Attack pattern:
  11:59:58 - 11:59:59: Send 100 requests (end of window 1, all allowed)
  12:00:00 - 12:00:01: Send 100 requests (start of window 2, all allowed)
  Result: 200 requests in about 3 seconds, 40x the intended average of 5 requests per 3 seconds.

More sophisticated: Automate this. Burst 100 requests in the last second of one window and
100 more in the first second of the next. The long-run average still cannot exceed
100 requests/minute, but the backend absorbs a 200-request spike (2x the configured limit)
at every other window boundary, indefinitely.

The Fix: Rolling window + jitter

# CORRECT: True sliding window or user-specific window offsets
import hashlib
import time
 
class SecureRateLimiter:
    def __init__(self, r, limit: int, window: int):
        self.r = r
        self.limit = limit
        self.window = window
 
    def get_user_window_offset(self, user_id: str) -> int:
        """Deterministic offset: same user always gets same offset, but different per user."""
        h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
        return h % self.window
 
    def is_allowed(self, user_id: str) -> bool:
        now = int(time.time())
        offset = self.get_user_window_offset(user_id)
        # Shift the window start by the user's offset
        window_id = (now - offset) // self.window
 
        key = f"rl:secure:{user_id}:{window_id}"
        count = self.r.incr(key)
        if count == 1:
            self.r.expire(key, self.window * 2)
        return count <= self.limit
 
    # Illustrative values for window=60 (real offsets depend on the hash):
    # User alice: window resets at :07 of each minute (offset=7)
    # User bob:   window resets at :41 of each minute (offset=41)
    # An attacker targeting one user can still burst across that user's boundary,
    # but cannot exploit a single boundary system-wide because every user's differs.
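
Per-user offsets move the boundary but do not remove it. A true sliding window does, because it counts requests in the trailing `window` seconds regardless of clock alignment. A minimal in-memory sketch comparing the two under the boundary burst described above; `FixedWindow`, `SlidingWindowLog`, and `boundary_burst` are illustrative names, and timestamps are passed in explicitly so the behavior is reproducible.

```python
from collections import defaultdict, deque


class FixedWindow:
    """Counter keyed by (user, window index); resets on aligned boundaries."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def is_allowed(self, user_id: str, now: float) -> bool:
        key = (user_id, int(now // self.window))
        self._counts[key] += 1
        return self._counts[key] <= self.limit


class SlidingWindowLog:
    """Keeps timestamps of accepted requests from the trailing `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self._hits: dict[str, deque] = defaultdict(deque)

    def is_allowed(self, user_id: str, now: float) -> bool:
        hits = self._hits[user_id]
        cutoff = now - self.window
        while hits and hits[0] <= cutoff:   # drop requests that aged out
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def boundary_burst(limiter) -> int:
    """Send 100 requests at t=59.5s and 100 at t=60.5s; return how many pass."""
    allowed = 0
    for t in (59.5, 60.5):
        for _ in range(100):
            allowed += limiter.is_allowed("attacker", t)
    return allowed
```

With a limit of 100 per 60 seconds, the fixed window admits all 200 requests of the burst while the sliding log admits 100. The log costs memory proportional to the limit per user, which is why production systems often use a sliding window counter approximation instead.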

AP-24: The Shared API Key Bypass

What it looks like:
Multiple users or services share a single API key. One heavy consumer exhausts the key's
rate limit, blocking all others sharing it.

Scenario: A team of 10 developers all use the same API key for testing.
Developer 1 runs a load test: 1000 requests in 1 minute.
Rate limit: 100 requests/minute per key.
Result: Developer 1 triggers rate limiting for all 10 developers.
Developers 2-10 cannot use the API until next minute.

Worse scenario: A shared key is leaked. Attacker uses it.
Rate limit blocks all legitimate users of the shared key.
Security team must rotate the key - but that breaks all 10 developers.

The Fix: One key per user/service, enforced

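# Key provisioning system that enforces one-key-per-principal
import hashlib
import secrets
 
class InvalidKeyError(Exception):
    pass
 
class APIKeyManager:
    def __init__(self, db, limiter):
        self.db = db            # key store (interface assumed by the methods below)
        self.limiter = limiter  # rate limiter exposing check(key, tier)
 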
    def provision_key(
        self,
        user_id: str,
        purpose: str,
        tier: str
    ) -> str:
        # Check: does this user already have a key for this purpose?
        existing = self.db.get_key_by_user_and_purpose(user_id, purpose)
        if existing:
            raise ValueError(
                f"User {user_id} already has a key for {purpose}. "
                f"Use the existing key or revoke it first."
            )
        # Generate new key
        key = secrets.token_urlsafe(32)
        key_hash = hashlib.sha256(key.encode()).hexdigest()
        self.db.store_key(key_hash, user_id, purpose, tier)
        return key  # Return raw key once, never store it
 
    def rate_limit_key(self, api_key: str) -> dict:
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        metadata = self.db.get_key_metadata(key_hash)
        if not metadata:
            raise InvalidKeyError("Unknown API key")
 
        # Rate limit by the KEY HASH (not user_id - each key has independent limits)
        return self.limiter.check(f"rl:apikey:{key_hash}", metadata["tier"])

AP-25: The Unvalidated Forwarded IP

What it looks like:
Blindly trusting X-Forwarded-For without verifying it comes from a trusted proxy.

# BROKEN: Trust any X-Forwarded-For header blindly
def get_client_ip(request) -> str:
    return request.headers.get("X-Forwarded-For", request.remote_addr).split(",")[0].strip()
 
# Attack:
# Normal request: X-Forwarded-For: 203.0.113.1 (attacker's real IP)
# After 100 requests, that IP is rate limited.
# Attacker now sets the header themselves: X-Forwarded-For: 8.8.8.8 (or any other value)
#   Every new spoofed value gets a fresh counter, so the limit never applies.
# Or: X-Forwarded-For: 127.0.0.1 (localhost, if it is allowlisted)
# Result: Attacker bypasses IP-based rate limiting completely by spoofing the header.

The Fix: Validate proxy chain

# CORRECT: Only trust X-Forwarded-For if the immediate sender is a known proxy
TRUSTED_PROXY_RANGES = [
    "10.0.0.0/8",       # internal network
    "172.16.0.0/12",    # internal network
    "192.168.0.0/16",   # internal network
    "173.245.48.0/20",  # one Cloudflare range; load the full, current list from Cloudflare's published IP ranges
]
 
import ipaddress
 
def get_real_client_ip(request) -> str:
    """
    Get the real client IP.
    Only trust X-Forwarded-For if the connection came from a trusted proxy.
    Otherwise, use the actual connection IP.
    """
    connection_ip = request.remote_addr
    connection_ipobj = ipaddress.ip_address(connection_ip)
 
    # Check if the connection is from a trusted proxy
    from_trusted_proxy = any(
        connection_ipobj in ipaddress.ip_network(range_)
        for range_ in TRUSTED_PROXY_RANGES
    )
 
    if from_trusted_proxy:
        # Trust the X-Forwarded-For header (but take the LAST untrusted IP, not the first)
        forwarded_for = request.headers.get("X-Forwarded-For", "")
        ips = [ip.strip() for ip in forwarded_for.split(",")]
        # Walk from right to left, find the first non-trusted IP
        for ip in reversed(ips):
            try:
                ip_obj = ipaddress.ip_address(ip)
                if not any(ip_obj in ipaddress.ip_network(r) for r in TRUSTED_PROXY_RANGES):
                    return ip
            except ValueError:
                continue
 
    # Not from trusted proxy: use connection IP directly
    return connection_ip
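
The difference between the two approaches is easiest to see side by side on spoofed input. A minimal, framework-free sketch, assuming for illustration that the only trusted proxies live in `10.0.0.0/8`; the function names are hypothetical, and the networks are parsed once at import time rather than on every request.

```python
import ipaddress

# Assumption for this sketch: the only trusted proxies are in 10.0.0.0/8.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def _parse(ip: str):
    """Return an ip_address object, or None if the text is not a valid IP."""
    try:
        return ipaddress.ip_address(ip.strip())
    except ValueError:
        return None


def _is_trusted(addr) -> bool:
    return any(addr in net for net in TRUSTED_NETWORKS)


def naive_client_ip(remote_addr: str, xff: str = "") -> str:
    """BROKEN: believes the leftmost X-Forwarded-For entry, whoever sent it."""
    return (xff or remote_addr).split(",")[0].strip()


def validated_client_ip(remote_addr: str, xff: str = "") -> str:
    """Use X-Forwarded-For only when the direct peer is a trusted proxy."""
    peer = _parse(remote_addr)
    if peer is None or not _is_trusted(peer):
        return remote_addr
    # Walk right to left: the rightmost untrusted hop is the client as seen
    # by our own proxies; anything further left is client-supplied.
    for hop in reversed(xff.split(",")):
        addr = _parse(hop)
        if addr is not None and not _is_trusted(addr):
            return str(addr)
    return remote_addr
```

For a direct connection from `203.0.113.1` carrying `X-Forwarded-For: 8.8.8.8`, the naive version returns the spoofed `8.8.8.8` while the validated version returns `203.0.113.1`. The same holds when the request passes through a trusted proxy that appends the real address after the spoofed one.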

Summary: Anti-Pattern Quick Reference

| Anti-Pattern | Risk Level | Root Cause | Primary Fix |
|---|---|---|---|
| In-Process Island | Critical | No shared state in clusters | Use Redis |
| Connection Pool Killer | Critical | New connections per request | Module-level pool |
| Oversized Lua Script | Critical | Complex Lua blocks Redis | Keep scripts simple |
| Missing Hash Tag | High | Redis Cluster routing | Use {user_id} hash tags |
| TTL Roulette | High | Inconsistent key expiry | Lua atomic + window*2 |
| Unbounded Key Space | High | User-controlled key parts | Normalize + hash |
| Shared Counter Spaghetti | High | No namespace strategy | Service-scoped namespaces |
| Zero-Context Limiter | High | Same limit for all users | Tier + trust-based limits |
| Quota Without Rate Limit | High | Only daily/monthly limits | Add per-second limit always |
| Grandfathered Exemption | Medium | Accumulating exceptions | Managed tiers instead |
| Asymmetric Read-Write | Medium | HTTP method bias | Limit by impact, not method |
| Rolling Reset Surprise | Medium | Predictable window resets | Per-user jitter / sliding window |
| Cost-Blind Limiter | Medium | Flat cost per request | Cost-proportional tokens |
| One Environment for All | Medium | No env-specific config | Per-environment limits |
| Unmonitored Limiter | Medium | No observability | Metrics + alerting |
| Stale Configuration | Medium | Set-and-forget policy | Quarterly review process |
| Incident Blackout | High | No dynamic config | Redis-backed dynamic limits |
| Cascading Quota Drain | High | Shared quota across teams | Tenant-isolated quotas |
| Aggressive Poller | Medium | Inefficient client design | ETags / webhooks |
| Fan-Out Bomb | High | Loop over individual calls | Bulk APIs + async processing |
| Missing Cache Layer | Medium | No client-side caching | TTL cache for stable data |
| Synchronous Bulk Stampede | Medium | Blocking batch processing | Async + checkpointing |
| Predictable Window Attack | High | Fixed window + no jitter | Sliding window + user offset |
| Shared API Key Bypass | High | Shared credentials | One key per principal |
| Unvalidated Forwarded IP | Critical (security) | Trusting user input | Validate proxy chain |

Next Supplement: Supplement 2 - Production Challenges