Rate Limiting Demystified - Part 3: Implementation Guide
Table of Contents
- Redis Fundamentals for Rate Limiting
- Redis: Fixed Window
- Redis: Sliding Window Log with Sorted Sets
- Redis: Token Bucket with Lua Script
- Java and Spring Boot
- Python Implementations
- Node.js Implementations
- Nginx Rate Limiting
- AWS API Gateway
- Kong API Gateway
- Client-Side Rate Limiting: Retry and Backoff
1. Redis Fundamentals for Rate Limiting
Redis is a widely used backend for distributed rate limiting. These are the key
Redis commands used in the implementations that follow.
Key Redis Commands
INCR key -- Atomically increment an integer value
INCRBY key amount -- Atomically increment by a specific amount
EXPIRE key seconds -- Set TTL on a key
TTL key -- Get remaining TTL
ZADD key score member -- Add to sorted set with a score (used for sliding window log)
ZREMRANGEBYSCORE key min max -- Remove sorted set members by score range
ZCARD key -- Count members in sorted set
ZCOUNT key min max -- Count members with score between min and max
HSET key field value -- Set hash field (used for token bucket state)
HMGET key f1 f2 -- Get multiple hash fields atomically
EVAL script numkeys key arg -- Execute Lua script
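To make the sorted-set commands above concrete, here is a minimal in-memory sketch of how ZREMRANGEBYSCORE, ZCARD and ZADD combine into a sliding window log. The class name `InMemorySlidingWindowLog` is hypothetical and a plain sorted list stands in for the Redis sorted set, so this illustrates the sequence of steps only, not a distributed limiter.

```python
import bisect


class InMemorySlidingWindowLog:
    """Hypothetical stand-in for one Redis sorted set scored by request time."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.scores = []  # kept sorted, like sorted-set scores

    def is_allowed(self, now):
        window_start = now - self.window_seconds
        # ZREMRANGEBYSCORE key 0 window_start: drop entries at or before the window start
        cut = bisect.bisect_right(self.scores, window_start)
        del self.scores[:cut]
        # ZCARD key: count what is left in the window
        if len(self.scores) >= self.limit:
            return False
        # ZADD key now member: record this request
        bisect.insort(self.scores, now)
        return True


log = InMemorySlidingWindowLog(limit=2, window_seconds=10)
decisions = [log.is_allowed(t) for t in (0, 1, 2, 10.5)]
print(decisions)
```

The third request is denied because two entries are already inside the 10-second window; by t=10.5 the entry from t=0 has aged out, so a slot is free again.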
Why Redis for Rate Limiting?
- Atomic operations: INCR is atomic. Even without Lua scripts, a single INCR
  will never produce a race condition. The issue is the conditional check (if count > limit),
  which is NOT atomic - hence Lua scripts.
- TTL support: Built-in expiry means keys self-clean. No need for a garbage collector.
- Data structures: Sorted Sets (ZADD/ZREMRANGEBYSCORE) make sliding window log
  trivial to implement.
- Pipelining: Send multiple commands in one round trip.
- Lua scripting: Execute multi-step logic atomically on the server.
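The difference between an atomic increment and a non-atomic check can be sketched without a Redis server. The example below is an illustration under stated assumptions: `FakeStore`, `racy_interleaving` and `incr_first` are hypothetical names, and the interleaving of two clients is written out by hand rather than produced by real concurrency.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a Redis string counter."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key, 0)

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key):
        # Mirrors INCR: the read-modify-write happens as a single step.
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]


def racy_interleaving(store, key, limit):
    """Two clients do GET, compare, SET, with both GETs landing first."""
    seen_a = store.get(key)          # client A reads the count
    seen_b = store.get(key)          # client B reads the same count
    allowed_a = seen_a < limit
    allowed_b = seen_b < limit
    if allowed_a:
        store.set(key, seen_a + 1)
    if allowed_b:
        store.set(key, seen_b + 1)   # overwrites A's update
    return allowed_a, allowed_b


def incr_first(store, key, limit):
    """Increment first, then compare the value that INCR returned."""
    return store.incr(key) <= limit


racy = FakeStore()
print(racy_interleaving(racy, "k", limit=1), racy.get("k"))

safe = FakeStore()
print(incr_first(safe, "k", limit=1), incr_first(safe, "k", limit=1))
```

With a limit of 1, the read-then-write version admits both clients and records only one request, while the increment-first version admits exactly one. Logic that needs more than a single counter cannot be reduced to one INCR, which is where a Lua script comes in.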
Redis Connection in Python
import redis
# Single instance
client = redis.Redis(
host='localhost',
port=6379,
db=0,
decode_responses=True,
socket_timeout=1, # 1 second timeout
socket_connect_timeout=1
)
# With connection pool (production)
pool = redis.ConnectionPool(
host='localhost',
port=6379,
db=0,
max_connections=50,
decode_responses=True
)
client = redis.Redis(connection_pool=pool)
# Redis Cluster
from redis.cluster import RedisCluster
cluster_client = RedisCluster(
startup_nodes=[{"host": "redis-1", "port": 6379}],
decode_responses=True
)
2. Redis: Fixed Window
Basic Implementation
import redis
import time
def is_allowed_fixed_window(
r: redis.Redis,
identifier: str,
limit: int,
window_seconds: int
) -> dict:
"""
Fixed window rate limiting using INCR + EXPIRE.
Returns dict with:
allowed: bool
limit: int
remaining: int
reset_at: int (unix timestamp)
"""
now = int(time.time())
current_window = now // window_seconds
reset_at = (current_window + 1) * window_seconds
key = f"rl:fw:{identifier}:{current_window}"
pipe = r.pipeline()
pipe.incr(key)
pipe.ttl(key)
count, ttl = pipe.execute()
# Only set TTL on first request (count == 1)
# This is safe because INCR is atomic
if count == 1:
r.expire(key, window_seconds + 10) # +10 seconds buffer
allowed = count <= limit
return {
"allowed": allowed,
"limit": limit,
"remaining": max(0, limit - count),
"reset_at": reset_at,
"current_count": count
}
Spring Boot Filter Using Fixed Window
// Dependency: spring-boot-starter-data-redis
@Component
@Order(1)
public class RateLimitFilter extends OncePerRequestFilter {
private final StringRedisTemplate redisTemplate;
// Limits: endpoint pattern -> (limit, windowSeconds)
private static final Map<String, int[]> ENDPOINT_LIMITS = Map.of(
"/api/auth/login", new int[]{5, 60},
"/api/export", new int[]{2, 3600},
"/api/", new int[]{100, 60}
);
public RateLimitFilter(StringRedisTemplate redisTemplate) {
this.redisTemplate = redisTemplate;
}
@Override
protected void doFilterInternal(
HttpServletRequest request,
HttpServletResponse response,
FilterChain chain
) throws ServletException, IOException {
String identifier = extractIdentifier(request);
int[] limitConfig = getLimitConfig(request.getRequestURI());
int limit = limitConfig[0];
int windowSeconds = limitConfig[1];
RateLimitResult result = checkRateLimit(identifier, limit, windowSeconds);
// Always set rate limit headers
response.setIntHeader("X-RateLimit-Limit", limit);
response.setIntHeader("X-RateLimit-Remaining", result.getRemaining());
response.setLongHeader("X-RateLimit-Reset", result.getResetAt());
if (!result.isAllowed()) {
response.setStatus(429);
response.setContentType("application/json");
response.setHeader("Retry-After",
String.valueOf(result.getResetAt() - Instant.now().getEpochSecond()));
response.getWriter().write("""
{
"error": "rate_limit_exceeded",
"message": "Too many requests. Please retry after %d seconds.",
"retry_after": %d
}
""".formatted(
result.getResetAt() - Instant.now().getEpochSecond(),
result.getResetAt() - Instant.now().getEpochSecond()
));
return;
}
chain.doFilter(request, response);
}
private RateLimitResult checkRateLimit(
String identifier, int limit, int windowSeconds
) {
long now = Instant.now().getEpochSecond();
long currentWindow = now / windowSeconds;
long resetAt = (currentWindow + 1) * windowSeconds;
String key = "rl:fw:" + identifier + ":" + currentWindow;
Long count = redisTemplate.opsForValue().increment(key);
if (count == null) count = 1L;
if (count == 1) {
redisTemplate.expire(key, Duration.ofSeconds(windowSeconds + 10));
}
return new RateLimitResult(
count <= limit,
limit,
(int) Math.max(0, limit - count),
resetAt
);
}
private String extractIdentifier(HttpServletRequest request) {
// Prefer API key, fall back to user ID, then IP
String apiKey = request.getHeader("X-API-Key");
if (apiKey != null) return "apikey:" + apiKey;
// If authenticated, use user ID from JWT
String userId = (String) request.getAttribute("userId");
if (userId != null) return "user:" + userId;
// Fall back to IP
String ip = request.getHeader("X-Forwarded-For");
if (ip != null) ip = ip.split(",")[0].trim();
else ip = request.getRemoteAddr();
return "ip:" + ip;
}
private int[] getLimitConfig(String uri) {
for (Map.Entry<String, int[]> entry : ENDPOINT_LIMITS.entrySet()) {
if (uri.startsWith(entry.getKey())) {
return entry.getValue();
}
}
return new int[]{1000, 60}; // default: 1000/minute
}
}
3. Redis: Sliding Window Log with Sorted Sets
import redis
import time
import uuid
# Lua script for atomic sliding window log
# This prevents race conditions between check and add
SLIDING_WINDOW_LOG_SCRIPT = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window_start = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local request_id = ARGV[4]
local ttl = tonumber(ARGV[5])
-- Remove old entries outside the window
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)
-- Count current entries
local count = redis.call('ZCARD', key)
if count < limit then
-- Add this request
redis.call('ZADD', key, now, request_id)
redis.call('EXPIRE', key, ttl)
return {1, count + 1, limit - count - 1} -- allowed, current, remaining
else
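-- Get the oldest entry to calculate retry time
local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
local retry_after = 0
if #oldest > 0 then
-- The oldest entry leaves the window at oldest_score + window length,
-- and window length = now - window_start, so the wait is oldest_score - window_start
retry_after = math.ceil(tonumber(oldest[2]) - window_start)
end
return {0, count, 0, retry_after} -- denied, current, 0 remaining, seconds until a slot frees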
end
"""
class SlidingWindowLogRedis:
def __init__(self, r: redis.Redis, limit: int, window_seconds: int):
self.r = r
self.limit = limit
self.window_seconds = window_seconds
self.script = r.register_script(SLIDING_WINDOW_LOG_SCRIPT)
def is_allowed(self, identifier: str) -> dict:
now = time.time()
window_start = now - self.window_seconds
request_id = str(uuid.uuid4())
ttl = self.window_seconds * 2
result = self.script(
keys=[f"rl:swl:{identifier}"],
args=[now, window_start, self.limit, request_id, ttl]
)
allowed = bool(result[0])
current = int(result[1])
remaining = int(result[2])
return {
"allowed": allowed,
"limit": self.limit,
"remaining": remaining,
"current_count": current
}4. Redis: Token Bucket with Lua Script
The token bucket MUST be implemented with a Lua script to be atomic. Without atomicity,
two concurrent requests can both read the same token count and both be allowed when only
one should be.
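The read, refill, compare and write steps that have to run as one unit can be sketched in plain Python. This is a single-process illustration only: the class name `InMemoryTokenBucket` is hypothetical, and nothing here is atomic across application instances, which is exactly the gap the Lua script fills.

```python
class InMemoryTokenBucket:
    """Hypothetical single-process token bucket, for illustration only."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens per second
        self.tokens = None               # unknown until first use
        self.last_refill = None

    def try_consume(self, now, requested=1.0):
        # Initialize on first use with a full bucket
        if self.tokens is None:
            self.tokens = self.capacity
            self.last_refill = now
        # Refill according to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        new_tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if new_tokens >= requested:
            self.tokens = new_tokens - requested
            return True, self.tokens, 0.0
        # Denied: keep the refilled amount and report how long to wait
        self.tokens = new_tokens
        return False, self.tokens, (requested - new_tokens) / self.refill_rate


bucket = InMemoryTokenBucket(capacity=2, refill_rate=1)
for t in (0.0, 0.0, 0.0, 0.5, 1.5):
    print(t, bucket.try_consume(t))
```

If two callers ran `try_consume` against shared state with their steps interleaved, both could read the same token count before either wrote back, which is the failure the paragraph above describes.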
Lua Script
-- token_bucket.lua
-- KEYS[1]: Redis key for this user's bucket
-- ARGV[1]: bucket capacity
-- ARGV[2]: refill rate (tokens per second)
-- ARGV[3]: tokens requested by this operation
-- ARGV[4]: current timestamp (seconds, float)
-- ARGV[5]: TTL for the key (seconds)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])
-- Load current state (tokens, last_refill_time)
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(data[1])
local last_refill = tonumber(data[2])
-- Initialize on first use
if current_tokens == nil then
current_tokens = capacity
last_refill = now
end
-- Calculate tokens to add based on elapsed time
local elapsed = now - last_refill
local tokens_to_add = elapsed * refill_rate
local new_tokens = math.min(capacity, current_tokens + tokens_to_add)
-- Check if enough tokens
if new_tokens >= requested then
-- Allow: deduct tokens
local remaining = new_tokens - requested
redis.call('HMSET', key,
'tokens', remaining,
'last_refill', now
)
redis.call('EXPIRE', key, ttl)
-- Return: allowed=1, tokens_remaining, retry_after=0
return {1, remaining, 0}
else
-- Deny: update tokens without deducting (just update refill time)
redis.call('HMSET', key,
'tokens', new_tokens,
'last_refill', now
)
redis.call('EXPIRE', key, ttl)
-- Calculate when enough tokens will be available
local tokens_needed = requested - new_tokens
local retry_after = tokens_needed / refill_rate
-- Return: allowed=0, tokens_remaining, retry_after
return {0, new_tokens, retry_after}
end
Python Client
import redis
import time
class TokenBucketRedis:
"""
Token Bucket Rate Limiter backed by Redis with Lua script for atomicity.
Thread-safe. Works across multiple application instances.
"""
LUA_SCRIPT = """
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(data[1])
local last_refill = tonumber(data[2])
if current_tokens == nil then
current_tokens = capacity
last_refill = now
end
local elapsed = now - last_refill
local new_tokens = math.min(capacity, current_tokens + elapsed * refill_rate)
if new_tokens >= requested then
redis.call('HMSET', key, 'tokens', new_tokens - requested, 'last_refill', now)
redis.call('EXPIRE', key, ttl)
return {1, new_tokens - requested, 0}
else
redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('EXPIRE', key, ttl)
local retry_after = (requested - new_tokens) / refill_rate
return {0, new_tokens, retry_after}
end
"""
def __init__(
self,
r: redis.Redis,
capacity: float,
refill_rate: float,
key_prefix: str = "rl:tb"
):
self.r = r
self.capacity = capacity
self.refill_rate = refill_rate
self.key_prefix = key_prefix
self.script = r.register_script(self.LUA_SCRIPT)
def is_allowed(self, identifier: str, cost: float = 1.0) -> dict:
key = f"{self.key_prefix}:{identifier}"
now = time.time()
ttl = int(self.capacity / self.refill_rate) * 2 + 60
result = self.script(
keys=[key],
args=[self.capacity, self.refill_rate, cost, now, ttl]
)
allowed = bool(int(result[0]))
tokens_remaining = float(result[1])
retry_after = float(result[2])
return {
"allowed": allowed,
"tokens_remaining": tokens_remaining,
"retry_after_seconds": retry_after,
"limit": self.capacity,
"refill_rate": self.refill_rate
}
# Usage
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
limiter = TokenBucketRedis(r, capacity=100, refill_rate=10)
# Standard API call (cost=1)
result = limiter.is_allowed("user:123")
print(f"Allowed: {result['allowed']}, Tokens: {result['tokens_remaining']:.1f}")
# Expensive operation (cost=10)
result = limiter.is_allowed("user:123", cost=10)
print(f"Expensive op - Allowed: {result['allowed']}")
5. Java and Spring Boot
5.1 Bucket4j (Local or Redis-backed)
Bucket4j is a widely used Java rate limiting library. It implements the token bucket
algorithm and is available in local (in-memory) and distributed (Redis, Hazelcast) modes.
<!-- pom.xml -->
<dependency>
<groupId>com.bucket4j</groupId>
<artifactId>bucket4j-core</artifactId>
<version>8.7.0</version>
</dependency>
<dependency>
<groupId>com.bucket4j</groupId>
<artifactId>bucket4j-redis</artifactId>
<version>8.7.0</version>
</dependency>
import io.github.bucket4j.*;
import io.github.bucket4j.distributed.proxy.ProxyManager;
import io.github.bucket4j.redis.lettuce.cas.LettuceBasedProxyManager;
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.codec.ByteArrayCodec;
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
@Configuration
public class RateLimitConfig {
@Bean
public ProxyManager<String> proxyManager(
StatefulRedisConnection<byte[], byte[]> connection
) {
return LettuceBasedProxyManager
.builderFor(connection)
.build();
}
@Bean
public StatefulRedisConnection<byte[], byte[]> redisConnection(
RedisClient redisClient
) {
return redisClient.connect(ByteArrayCodec.INSTANCE);
}
}
@Service
public class RateLimitService {
private final ProxyManager<String> proxyManager;
public RateLimitService(ProxyManager<String> proxyManager) {
this.proxyManager = proxyManager;
}
/**
* Build a bucket configuration based on user tier.
*/
private BucketConfiguration getBucketConfig(String tier) {
return switch (tier) {
case "free" -> BucketConfiguration.builder()
.addLimit(Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1))))
.addLimit(Bandwidth.classic(1000, Refill.greedy(1000, Duration.ofHours(1))))
.build();
case "pro" -> BucketConfiguration.builder()
.addLimit(Bandwidth.classic(1000, Refill.greedy(1000, Duration.ofMinutes(1))))
.build();
case "enterprise" -> BucketConfiguration.builder()
.addLimit(Bandwidth.classic(10000, Refill.greedy(10000, Duration.ofMinutes(1))))
.build();
default -> BucketConfiguration.builder()
.addLimit(Bandwidth.classic(50, Refill.greedy(50, Duration.ofMinutes(1))))
.build();
};
}
public ConsumptionProbe tryConsume(String userId, String tier, long cost) {
String bucketKey = "bucket:" + userId;
Bucket bucket = proxyManager.builder()
.build(bucketKey, () -> getBucketConfig(tier));
return bucket.tryConsumeAndReturnRemaining(cost);
}
}
@RestController
@RequestMapping("/api")
public class ApiController {
private final RateLimitService rateLimitService;
public ApiController(RateLimitService rateLimitService) {
this.rateLimitService = rateLimitService;
}
@GetMapping("/data")
public ResponseEntity<?> getData(
@RequestAttribute("userId") String userId,
@RequestAttribute("userTier") String tier,
HttpServletResponse response
) {
ConsumptionProbe probe = rateLimitService.tryConsume(userId, tier, 1);
response.setHeader("X-RateLimit-Remaining",
String.valueOf(probe.getRemainingTokens()));
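// X-RateLimit-Limit should carry the tier's configured capacity. ConsumptionProbe
// does not expose it, and adding remaining tokens to a nanosecond wait time does not
// yield a limit, so set this header from the tier configuration instead.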
if (!probe.isConsumed()) {
long retryAfterSeconds =
TimeUnit.NANOSECONDS.toSeconds(probe.getNanosToWaitForRefill());
response.setHeader("Retry-After", String.valueOf(retryAfterSeconds));
return ResponseEntity.status(429)
.body(Map.of(
"error", "rate_limit_exceeded",
"retry_after", retryAfterSeconds
));
}
return ResponseEntity.ok(Map.of("data", "your data here"));
}
}
5.2 Resilience4j Rate Limiter
Resilience4j is the recommended replacement for Hystrix and includes a rate limiter.
<dependency>
<groupId>io.github.resilience4j</groupId>
<artifactId>resilience4j-spring-boot3</artifactId>
<version>2.1.0</version>
</dependency>
# application.yml
resilience4j:
ratelimiter:
instances:
backendA:
limitForPeriod: 100
limitRefreshPeriod: 1s
timeoutDuration: 0s # Don't wait, fail immediately
backendB:
limitForPeriod: 20
limitRefreshPeriod: 500ms
timeoutDuration: 100ms # Wait up to 100ms for a slot
@Service
public class ExternalApiService {
// Annotation-based rate limiting
@RateLimiter(name = "backendA", fallbackMethod = "rateLimitFallback")
public String callExternalApi(String param) {
// This method is rate limited to 100 calls/second
return externalHttpClient.get(param);
}
// Fallback when rate limit is exceeded
public String rateLimitFallback(String param, RequestNotPermitted e) {
return "Rate limit exceeded. Please try again later.";
}
// Programmatic usage: create the limiter once and reuse it. A limiter built
// inside the method would start with fresh permits on every call.
private final RateLimiter customLimiter = RateLimiter.of("customLimiter",
RateLimiterConfig.custom()
.limitForPeriod(50)
.limitRefreshPeriod(Duration.ofSeconds(1))
.timeoutDuration(Duration.ZERO)
.build()
);
public String callWithProgrammaticRateLimit(String param) {
return RateLimiter.decorateSupplier(customLimiter,
() -> externalHttpClient.get(param)
).get();
}
}
6. Python Implementations
6.1 Flask-Limiter
from flask import Flask, jsonify, request
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from werkzeug.exceptions import TooManyRequests
app = Flask(__name__)
# Configure limiter with Redis backend
limiter = Limiter(
key_func=get_remote_address,
app=app,
default_limits=["200 per day", "50 per hour"],
storage_uri="redis://localhost:6379",
strategy="moving-window", # or "fixed-window"
headers_enabled=True # adds X-RateLimit-* headers automatically
)
def get_api_key():
"""Custom key function: limit by API key instead of IP."""
return request.headers.get("X-API-Key", get_remote_address())
# Default limits apply to all routes
@app.route("/api/public")
def public_endpoint():
return jsonify({"data": "public data"})
# Override with endpoint-specific limit
@app.route("/api/search")
@limiter.limit("20 per minute")
def search():
return jsonify({"results": []})
# Limit by API key instead of IP for this endpoint
@app.route("/api/export")
@limiter.limit("5 per hour", key_func=get_api_key)
def export():
return jsonify({"export": "data"})
# Exempt an endpoint completely
@app.route("/health")
@limiter.exempt
def health_check():
return jsonify({"status": "ok"})
# Custom error handler
@app.errorhandler(429)
def rate_limit_handler(e):
return jsonify({
"error": "rate_limit_exceeded",
"description": str(e.description),
"retry_after": e.retry_after if hasattr(e, "retry_after") else None
}), 429
6.2 FastAPI with slowapi
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
from slowapi.middleware import SlowAPIMiddleware
# Custom key function
def get_user_id(request: Request) -> str:
user_id = request.state.user_id if hasattr(request.state, "user_id") else None
if user_id:
return f"user:{user_id}"
api_key = request.headers.get("X-API-Key")
if api_key:
return f"apikey:{api_key}"
return get_remote_address(request)
limiter = Limiter(
key_func=get_user_id,
default_limits=["100/minute"],
storage_uri="redis://localhost:6379"
)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
app.add_middleware(SlowAPIMiddleware)
@app.get("/api/data")
@limiter.limit("50/minute")
async def get_data(request: Request):
return {"data": "some data"}
@app.post("/api/auth/login")
@limiter.limit("5/minute")
async def login(request: Request, credentials: dict):
return {"token": "jwt_token_here"}
# Conditional rate limiting
@app.get("/api/search")
@limiter.limit("100/minute", exempt_when=lambda req: req.headers.get("X-Bypass-Key") == "secret")
async def search(request: Request, q: str):
return {"results": [], "query": q}
6.3 Custom Middleware (Framework-Agnostic)
import time
import functools
import redis
from typing import Callable
class RedisRateLimiter:
"""
Production-grade rate limiter with Redis backend.
Supports multiple limit types simultaneously.
"""
def __init__(self, redis_client: redis.Redis):
self.r = redis_client
self._script = redis_client.register_script("""
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local window_id = math.floor(now / window)
local window_key = key .. ':' .. window_id
local count = redis.call('INCR', window_key)
if count == 1 then
redis.call('EXPIRE', window_key, window * 2)
end
return {count, limit - count, (window_id + 1) * window}
""")
def check(
self,
identifier: str,
limits: list[tuple[int, int]] # list of (limit, window_seconds)
) -> dict:
"""
Check multiple limits simultaneously.
All limits must pass for the request to be allowed.
"""
now = int(time.time())
results = []
for limit, window in limits:
key = f"rl:{identifier}:{window}"
result = self._script(keys=[key], args=[limit, window, now])
count, remaining, reset_at = int(result[0]), int(result[1]), int(result[2])
results.append({
"window": window,
"limit": limit,
"count": count,
"remaining": remaining,
"reset_at": reset_at,
"allowed": count <= limit
})
# All limits must pass
all_allowed = all(r["allowed"] for r in results)
# Return the most restrictive remaining
min_remaining = min(r["remaining"] for r in results)
nearest_reset = min(r["reset_at"] for r in results)
return {
"allowed": all_allowed,
"remaining": max(0, min_remaining),
"reset_at": nearest_reset,
"details": results
}
def rate_limit(identifier_fn: Callable, limits: list[tuple[int, int]]):
"""Decorator for rate limiting any function."""
# Create the client and limiter once per decorated function, not on every call
r = redis.Redis(host="localhost", decode_responses=True)
limiter = RedisRateLimiter(r)
def decorator(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
identifier = identifier_fn(*args, **kwargs)
result = limiter.check(identifier, limits)
if not result["allowed"]:
raise Exception(f"Rate limit exceeded. Retry at {result['reset_at']}")
return func(*args, **kwargs)
return wrapper
return decorator
7. Node.js Implementations
7.1 Express with express-rate-limit
const express = require("express");
const rateLimit = require("express-rate-limit");
const RedisStore = require("rate-limit-redis");
const { createClient } = require("redis");
const app = express();
const redisClient = createClient({
url: process.env.REDIS_URL || "redis://localhost:6379",
});
redisClient.connect();
// General API rate limiter
const apiLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 100,
standardHeaders: true, // Return RateLimit-* headers (RFC standard)
legacyHeaders: false, // Disable X-RateLimit-* legacy headers
store: new RedisStore({
sendCommand: (...args) => redisClient.sendCommand(args),
prefix: "rl:api:",
}),
keyGenerator: (req) => {
// Use API key if available, otherwise IP
return req.headers["x-api-key"] || req.ip;
},
handler: (req, res, next, options) => {
res.status(429).json({
error: "rate_limit_exceeded",
message: `Too many requests. Retry after ${Math.ceil(options.windowMs / 1000)} seconds.`,
retry_after: Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000),
});
},
});
// Stricter limiter for auth endpoints
const authLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 10,
standardHeaders: true,
legacyHeaders: false,
store: new RedisStore({
sendCommand: (...args) => redisClient.sendCommand(args),
prefix: "rl:auth:",
}),
message: {
error: "too_many_login_attempts",
message: "Too many login attempts. Please try again in 15 minutes.",
},
});
// Apply to all API routes
app.use("/api/", apiLimiter);
// Apply stricter limit to auth
app.use("/api/auth/", authLimiter);
app.get("/api/data", (req, res) => {
res.json({
data: "your data",
rateLimit: {
limit: req.rateLimit?.limit,
remaining: req.rateLimit?.remaining,
resetTime: req.rateLimit?.resetTime,
},
});
});
7.2 Custom Redis Token Bucket (Node.js)
const redis = require("redis");
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now
local elapsed = now - last_refill
local new_tokens = math.min(capacity, current_tokens + elapsed * refill_rate)
if new_tokens >= requested then
redis.call('HMSET', key, 'tokens', new_tokens - requested, 'last_refill', now)
redis.call('EXPIRE', key, ttl)
return {1, new_tokens - requested, 0}
else
redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('EXPIRE', key, ttl)
local retry_after = (requested - new_tokens) / refill_rate
return {0, new_tokens, retry_after}
end
`;
class TokenBucketRedis {
constructor(redisClient, { capacity, refillRate, keyPrefix = "rl:tb" }) {
this.client = redisClient;
this.capacity = capacity;
this.refillRate = refillRate;
this.keyPrefix = keyPrefix;
}
async isAllowed(identifier, cost = 1) {
const key = `${this.keyPrefix}:${identifier}`;
const now = Date.now() / 1000; // Unix timestamp in seconds
const ttl = Math.ceil(this.capacity / this.refillRate) * 2 + 60;
const result = await this.client.eval(TOKEN_BUCKET_SCRIPT, {
keys: [key],
arguments: [
String(this.capacity),
String(this.refillRate),
String(cost),
String(now),
String(ttl),
],
});
return {
allowed: result[0] === 1,
tokensRemaining: parseFloat(result[1]),
retryAfterSeconds: parseFloat(result[2]),
};
}
}
// Express middleware using TokenBucketRedis
function createTokenBucketMiddleware(limiter) {
return async (req, res, next) => {
const identifier = req.headers["x-api-key"] || req.ip;
try {
const result = await limiter.isAllowed(identifier);
res.set({
"X-RateLimit-Limit": limiter.capacity,
"X-RateLimit-Remaining": Math.floor(result.tokensRemaining),
"X-RateLimit-RefillRate": limiter.refillRate,
});
if (!result.allowed) {
res.set("Retry-After", Math.ceil(result.retryAfterSeconds));
return res.status(429).json({
error: "rate_limit_exceeded",
retry_after: Math.ceil(result.retryAfterSeconds),
});
}
next();
} catch (err) {
// Fail open: allow request if Redis is unavailable
console.error("Rate limiter error:", err);
next();
}
};
}
8. Nginx Rate Limiting
Nginx has built-in rate limiting via the ngx_http_limit_req_module.
Basic Configuration
http {
# Define rate limit zones
# $binary_remote_addr = client IP (4 bytes for IPv4, more compact than $remote_addr)
# zone=per_ip:10m = zone named "per_ip", 10MB shared memory
# rate=100r/m = 100 requests per minute
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=100r/m;
limit_req_zone $http_x_api_key zone=per_apikey:10m rate=1000r/m;
# Connection limit zone (limit_conn_zone is only valid in the http context)
limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;
# Status code for rate limited requests (default: 503)
limit_req_status 429;
# Log level for rate limited requests
limit_req_log_level warn;
server {
listen 80;
server_name api.example.com;
# Apply rate limiting to all /api/ routes
location /api/ {
# burst=50: allow queue of up to 50 requests
# nodelay: process burst requests immediately, not spread over time
limit_req zone=per_ip burst=50 nodelay;
# Custom 429 response
error_page 429 @rate_limit_error;
proxy_pass http://backend_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
# Stricter limit for auth endpoints
location /api/auth/ {
limit_req zone=per_ip burst=5 nodelay;
error_page 429 @rate_limit_error;
proxy_pass http://backend_servers;
}
# Rate limit by API key (if available)
location /api/v2/ {
limit_req zone=per_apikey burst=200 nodelay;
error_page 429 @rate_limit_error;
proxy_pass http://backend_servers;
}
# Named location for 429 error response
location @rate_limit_error {
default_type application/json;
return 429 '{"error":"rate_limit_exceeded","message":"Too many requests. Please slow down."}';
}
# Limit concurrent connections per IP (zone conn_per_ip is declared in the http block above)
location /downloads/ {
limit_conn conn_per_ip 5; # max 5 concurrent connections per IP
proxy_pass http://download_servers;
}
}
}
Advanced: Rate Limit by API Key with IP Fallback
# Choose the rate limit key based on presence of the API key header
map $http_x_api_key $rate_limit_key {
"" $binary_remote_addr; # No API key: limit by IP
default $http_x_api_key; # API key present: limit by key
}
limit_req_zone $rate_limit_key zone=dynamic_limit:10m rate=100r/m;
Nginx Burst Behavior Explained
rate=10r/s, burst=20, nodelay
Without nodelay: requests beyond 10r/s are delayed (queued), burst allows 20 to wait
With nodelay: the first 20 excess requests are processed immediately, then 429
Visual:
Time: 0s 1s 2s
Requests: 30 0 0
Without nodelay:
t=0s: 1 forwarded immediately, 20 queued, 9 rejected with 429
t=0-2s: queued requests drain at one per 100ms (10r/s), so the last one waits about 2s
With nodelay:
t=0s: 21 forwarded immediately (1 + 20 burst), 9 rejected with 429
t=1s: 10 burst slots have been freed, so 10 more requests can be accepted at once
Always use `nodelay` for APIs unless you want artificial latency injection.
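The burst accounting can be checked with a small simulation. The sketch below is a simplification modeled on the leaky-bucket bookkeeping of ngx_http_limit_req_module: it assumes float seconds where nginx uses integer milliseconds, and the function name `simulate_limit_req` is hypothetical. It only decides which requests are accepted or rejected; whether accepted excess requests are delayed is what `nodelay` controls.

```python
def simulate_limit_req(arrivals, rate, burst):
    """Return one bool per arrival time: True if accepted, False if rejected.

    arrivals: request times in seconds, in non-decreasing order
    rate:     allowed requests per second
    burst:    number of excess requests tolerated
    """
    excess = 0.0
    last = None
    decisions = []
    for t in arrivals:
        if last is None:
            new_excess = 0.0  # the first request sees an empty bucket
        else:
            # Drain at `rate` since the last accepted request, then add this one
            new_excess = max(0.0, excess - rate * (t - last) + 1.0)
        if new_excess > burst:
            decisions.append(False)  # over burst: rejected, state left unchanged
            continue
        excess, last = new_excess, t
        decisions.append(True)
    return decisions


# 30 requests at t=0, then 10 more at t=1s, with rate=10r/s and burst=20
decisions = simulate_limit_req([0.0] * 30 + [1.0] * 10, rate=10, burst=20)
first_wave, second_wave = decisions[:30], decisions[30:]
print(sum(first_wave), first_wave.count(False), sum(second_wave))
```

Under these assumptions the first wave splits into 21 accepted and 9 rejected, and one second later exactly 10 slots have drained, which matches the timeline above.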
9. AWS API Gateway
Usage Plans and API Keys
# Create a usage plan
aws apigateway create-usage-plan \
--name "FreeTier" \
--description "Free tier: 100 RPS, 10000/day" \
--throttle '{"rateLimit": 100, "burstLimit": 200}' \
--quota '{"limit": 10000, "period": "DAY"}'
# Attach an API key to the usage plan
aws apigateway create-usage-plan-key \
--usage-plan-id <plan-id> \
--key-id <api-key-id> \
--key-type "API_KEY"
CloudFormation / SAM Configuration
# template.yaml (SAM)
Resources:
MyApi:
Type: AWS::Serverless::Api
Properties:
StageName: prod
Auth:
ApiKeyRequired: true
UsagePlan:
CreateUsagePlan: PER_STAGE
Throttle:
RateLimit: 100 # requests/second
BurstLimit: 200 # max burst
Quota:
Limit: 10000
Period: DAY
# Per-method throttling override
MyApiMethodThrottle:
Type: AWS::ApiGateway::UsagePlan
Properties:
ApiStages:
- ApiId: !Ref MyApi
Stage: prod
Throttle:
/export/GET:
RateLimit: 1
BurstLimit: 2
AWS Response Headers
AWS API Gateway automatically adds these when rate limited:
x-amzn-RequestId: ...
x-amzn-errortype: ThrottlingException
Content-Type: application/json
{
"message": "Too Many Requests"
}
AWS returns 429 Too Many Requests for both per-key (usage plan) throttling and stage-level throttling.
10. Kong API Gateway
Rate Limiting Plugin
# Enable rate limiting on a service
curl -X POST http://localhost:8001/services/my-service/plugins \
--data "name=rate-limiting" \
--data "config.minute=100" \
--data "config.hour=10000" \
--data "config.day=100000" \
--data "config.policy=redis" \
--data "config.redis.host=redis" \
--data "config.redis.port=6379" \
--data "config.limit_by=consumer" # or ip, credential, service
# Enable on a specific route
curl -X POST http://localhost:8001/routes/my-route/plugins \
--data "name=rate-limiting" \
--data "config.second=10" \
--data "config.policy=local"
Kong declarative config (deck / Kong Manager)
# kong.yaml
plugins:
- name: rate-limiting
service: my-service
config:
minute: 100
hour: 10000
policy: redis
redis_host: redis
redis_port: 6379
limit_by: consumer
fault_tolerant: true # fail open if Redis is down
hide_client_headers: false
error_code: 429
error_message: "Rate limit exceeded"
Kong Response Headers
X-RateLimit-Limit-Minute: 100
X-RateLimit-Remaining-Minute: 73
X-RateLimit-Limit-Hour: 10000
X-RateLimit-Remaining-Hour: 9847
RateLimit-Limit: 100
RateLimit-Remaining: 73
RateLimit-Reset: 47
11. Client-Side Rate Limiting: Retry and Backoff
Any client calling a rate-limited API MUST implement proper retry logic.
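One detail a retry loop has to handle is that the Retry-After header can carry either a number of seconds or an HTTP-date. The helper below is a minimal sketch using only the standard library; the function name `retry_after_seconds` and the choice to return `None` for unparseable values are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns None when the header is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay in seconds, e.g. "120"
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("120"))
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))
```

A client can fall back to exponential backoff whenever this returns `None`, so a malformed header never crashes the retry loop.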
11.1 Exponential Backoff with Jitter
import time
import random
import requests
from typing import Any

class RateLimitedAPIClient:
    """
    HTTP client with built-in rate limit handling.
    Implements exponential backoff with full jitter.
    """
    def __init__(
        self,
        base_url: str,
        max_retries: int = 5,
        base_delay: float = 1.0,
        max_delay: float = 60.0
    ):
        self.base_url = base_url
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.max_delay = max_delay

    def request(self, method: str, path: str, **kwargs) -> Any:
        url = f"{self.base_url}{path}"
        last_exception = None
        for attempt in range(self.max_retries + 1):
            try:
                response = requests.request(method, url, **kwargs)
                if response.status_code == 429:
                    # Check Retry-After header first
                    retry_after = response.headers.get("Retry-After")
                    try:
                        # Honor the server's value as-is; jittering it downward
                        # would retry before the server is ready
                        wait = max(0.0, float(retry_after))
                    except (TypeError, ValueError):
                        # Header missing or not numeric (e.g. an HTTP-date):
                        # exponential backoff: 2^attempt * base_delay
                        wait = min(
                            self.max_delay,
                            (2 ** attempt) * self.base_delay
                        )
                        # Full jitter: randomize between 0 and wait
                        wait = random.uniform(0, wait)
                    if attempt < self.max_retries:
                        print(f"Rate limited. Retrying in {wait:.2f}s "
                              f"(attempt {attempt + 1}/{self.max_retries})")
                        time.sleep(wait)
                        continue
                    else:
                        response.raise_for_status()
                # Proactively slow down when remaining is low
                remaining = response.headers.get("X-RateLimit-Remaining")
                if remaining and remaining.isdigit() and int(remaining) < 10:
                    time.sleep(0.5)  # Voluntary slowdown
                return response
            except requests.exceptions.RequestException as e:
                # Covers network errors and the HTTPError raised above
                last_exception = e
                if attempt < self.max_retries:
                    wait = min(self.max_delay, (2 ** attempt) * self.base_delay)
                    wait = random.uniform(0, wait)
                    time.sleep(wait)
        raise last_exception or Exception("Max retries exceeded")

# Usage
client = RateLimitedAPIClient("https://api.example.com", max_retries=5)
response = client.request("GET", "/data", headers={"X-API-Key": "my_key"})
11.2 Jitter Strategies Explained
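Jitter matters because clients that were throttled together tend to retry together. The sketch below is an illustrative simulation, not a benchmark: it assumes 1,000 hypothetical clients that all receive a 429 at the same instant, and counts how many of their retries land in the busiest 100 ms slot. The function names and the slot size are choices made for this example.

```python
import random
from collections import Counter


def busiest_slot(waits, slot_seconds=0.1):
    """Return the largest number of retries that fall into one time slot."""
    slots = Counter(int(w / slot_seconds) for w in waits)
    return max(slots.values())


def simulate(clients=1000, attempt=3, base_delay=1.0, seed=42):
    """Compare retry clustering with no jitter and with full jitter."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    backoff = (2 ** attempt) * base_delay
    no_jitter = [backoff for _ in range(clients)]
    full_jitter = [rng.uniform(0, backoff) for _ in range(clients)]
    return busiest_slot(no_jitter), busiest_slot(full_jitter)


if __name__ == "__main__":
    fixed, jittered = simulate()
    print(f"no jitter:   {fixed} retries in the busiest slot")
    print(f"full jitter: {jittered} retries in the busiest slot")
```

Without jitter every client computes the same wait, so all 1,000 retries arrive in a single slot. With full jitter the same retries are spread across the whole backoff interval, which is the effect the strategies in this section are designed to produce.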
import random
base_delay = 1.0  # seconds
attempt = 3       # 3rd retry
# 1. No jitter (DO NOT USE - causes thundering herd)
wait = (2 ** attempt) * base_delay  # 8.0s - all clients retry at same time
# 2. Full jitter (RECOMMENDED for most cases)
wait = random.uniform(0, (2 ** attempt) * base_delay)  # 0-8s random
# 3. Equal jitter (good balance)
backoff = (2 ** attempt) * base_delay
wait = backoff / 2 + random.uniform(0, backoff / 2)  # 4-8s
# 4. Decorrelated jitter (described in the AWS Architecture Blog)
max_delay = 60.0  # upper bound on any single sleep
prev_sleep = 1.0  # previous sleep time
wait = min(max_delay, random.uniform(base_delay, prev_sleep * 3))  # depends on the last sleep, not the attempt count
11.3 Proactive Rate Limit Tracking
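The client in this section compares `X-RateLimit-Reset` against `time.time()`, which only works if the header carries a Unix timestamp. Servers disagree on the format: some send a timestamp, while others send the number of seconds remaining (the `RateLimit-Reset: 47` value in the Kong output earlier is of that kind). A hedged sketch of a normalizer follows. `reset_to_epoch` and its threshold heuristic are assumptions made for illustration, not a standard, so check the API's documentation before relying on them.

```python
import time
from typing import Optional

# Heuristic (assumption): values of at least 1e9 are Unix timestamps
# (September 2001 or later); smaller values are "seconds until reset".
_EPOCH_THRESHOLD = 1_000_000_000


def reset_to_epoch(value: Optional[str], now: Optional[float] = None) -> Optional[float]:
    """Convert a reset header to an absolute Unix time, or None if unusable."""
    if value is None:
        return None
    try:
        number = float(value)
    except ValueError:
        return None
    if number >= _EPOCH_THRESHOLD:
        return number  # already an absolute timestamp
    current = time.time() if now is None else now
    return current + max(0.0, number)  # relative: add to the current time
```

Storing the result in `reset_at` keeps the proactive wait calculation the same regardless of which form the server sends.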
import time
import requests

class ProactiveRateLimitClient:
    """
    Tracks rate limit headers and throttles itself BEFORE getting a 429.
    """
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key
        self.remaining = None
        self.reset_at = None  # assumed to be a Unix timestamp in seconds
        self.limit = None

    def request(self, path: str) -> dict:
        # If we know we are close to the limit, wait proactively
        if self.remaining is not None and self.remaining < 5:
            now = time.time()
            if self.reset_at and self.reset_at > now:
                wait = self.reset_at - now
                print(f"Proactively waiting {wait:.1f}s (only {self.remaining} remaining)")
                time.sleep(wait + 0.1)
        response = requests.get(
            f"{self.base_url}{path}",
            headers={"X-API-Key": self.api_key}
        )
        # Update our tracking from response headers; a missing header
        # leaves the value as None instead of a misleading 0
        limit = response.headers.get("X-RateLimit-Limit")
        remaining = response.headers.get("X-RateLimit-Remaining")
        reset = response.headers.get("X-RateLimit-Reset")
        self.limit = int(limit) if limit else None
        self.remaining = int(remaining) if remaining else None
        self.reset_at = int(reset) if reset else None
        return response.json()
Summary
| Technology | Best For | Algorithm | Notes |
|---|---|---|---|
| Redis INCR + EXPIRE | Simple, high-performance | Fixed Window | Most common pattern |
| Redis ZADD + ZREMRANGEBYSCORE | High accuracy | Sliding Window Log | Memory intensive |
| Redis + Lua | Any algorithm | Any | Use for atomicity |
| Bucket4j + Redis | Java production APIs | Token Bucket | Best Java library |
| Resilience4j | Java service-to-service | Fixed Window | Often combined with its circuit breaker |
| Flask-Limiter | Python Flask APIs | Configurable | Quick setup |
| slowapi | Python FastAPI | Configurable | FastAPI-native |
| express-rate-limit | Node.js APIs | Fixed Window | Most popular |
| Nginx limit_req | IP-based, edge | Leaky Bucket (burst) | No user-awareness |
| AWS API Gateway | Managed cloud APIs | Token Bucket | Tied to API keys |
| Kong | API Gateway layer | Fixed Window (rate-limiting plugin) | Plugin-based |
Next: Part 4 - Distributed Rate Limiting
Learn why distributed rate limiting is fundamentally harder, the exact problems you will face,
and how production systems solve them.