Rate Limiting Demystified - Part 3: Implementation Guide

Redis Fundamentals for Rate Limiting
Redis: Fixed Window
Redis: Sliding Window Log with Sorted Sets
Redis: Token Bucket with Lua Script
Java and Spring Boot
Python Implementations
Node.js Implementations
Nginx Rate Limiting
AWS API Gateway
Kong API Gateway
Client-Side Rate Limiting: Retry and Backoff

1. Redis Fundamentals for Rate Limiting

Redis is a common backend for distributed rate limiting. The commands below cover
most rate limiting implementations.

Key Redis Commands

INCR key              -- Atomically increment an integer value
INCRBY key amount     -- Atomically increment by a specific amount
EXPIRE key seconds    -- Set TTL on a key
TTL key               -- Get remaining TTL
ZADD key score member -- Add to sorted set with a score (used for sliding window log)
ZREMRANGEBYSCORE key min max  -- Remove sorted set members by score range
ZCARD key             -- Count members in sorted set
ZCOUNT key min max    -- Count members with score between min and max
HSET key field value  -- Set hash field (used for token bucket state)
HMGET key f1 f2       -- Get multiple hash fields atomically
EVAL script numkeys key arg -- Execute Lua script

Why Redis for Rate Limiting?

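Atomic operations: INCR is atomic, so concurrent increments never lose an update and
each caller gets back a distinct count. What is NOT atomic is a separate read followed
by a conditional write (GET, compare with the limit, then update), or any logic that
touches several fields. That kind of multi-step logic needs a Lua script.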
TTL support: Built-in expiry means keys self-clean. No need for a garbage collector.
Data structures: Sorted Sets (ZADD/ZREMRANGEBYSCORE) make sliding window log
trivial to implement.
Pipelining: Send multiple commands in one round trip.
Lua scripting: Execute multi-step logic atomically on the server.
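
The gap between reading a counter and acting on it can be shown without a Redis server. The sketch below is an in-memory stand-in (`FakeCounter` is a hypothetical class, not a Redis API) that replays one fixed interleaving of two clients. It is meant only to illustrate why the order of "increment" and "check" matters.

```python
class FakeCounter:
    """In-memory stand-in for a single Redis integer key (illustration only)."""

    def __init__(self, value: int = 0):
        self.value = value

    def get(self) -> int:
        return self.value

    def incr(self) -> int:
        # Mirrors INCR: the increment and the returned value are one step.
        self.value += 1
        return self.value


def check_then_incr_interleaved(counter: FakeCounter, limit: int) -> list[bool]:
    """Two clients each do GET, compare, then INCR, with both GETs landing first."""
    seen_a = counter.get()
    seen_b = counter.get()  # client B reads before client A has written
    decisions = []
    for seen in (seen_a, seen_b):
        allowed = seen < limit
        if allowed:
            counter.incr()
        decisions.append(allowed)
    return decisions


def incr_then_check(counter: FakeCounter, limit: int) -> list[bool]:
    """Two clients each INCR first and decide from the value INCR returned."""
    return [counter.incr() <= limit for _ in range(2)]


racy = check_then_incr_interleaved(FakeCounter(4), limit=5)
safe = incr_then_check(FakeCounter(4), limit=5)
print(racy)  # [True, True]: both admitted, counter ends at 6
print(safe)  # [True, False]: only the request that produced 5 is admitted
```

Increment-then-check works for a single counter. State that spans several fields, such as a token count plus a refill timestamp, has no single-command equivalent, which is where Lua scripts come in.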

Redis Connection in Python

import redis
 
# Single instance
client = redis.Redis(
    host='localhost',
    port=6379,
    db=0,
    decode_responses=True,
    socket_timeout=1,        # 1 second timeout
    socket_connect_timeout=1
)
 
# With connection pool (production)
pool = redis.ConnectionPool(
    host='localhost',
    port=6379,
    db=0,
    max_connections=50,
    decode_responses=True
)
client = redis.Redis(connection_pool=pool)
 
# Redis Cluster
from redis.cluster import RedisCluster
cluster_client = RedisCluster(
    startup_nodes=[{"host": "redis-1", "port": 6379}],
    decode_responses=True
)

2. Redis: Fixed Window

Basic Implementation

import redis
import time
 
 
def is_allowed_fixed_window(
    r: redis.Redis,
    identifier: str,
    limit: int,
    window_seconds: int
) -> dict:
    """
    Fixed window rate limiting using INCR + EXPIRE.
 
    Returns dict with:
        allowed: bool
        limit: int
        remaining: int
        reset_at: int (unix timestamp)
    """
    now = int(time.time())
    current_window = now // window_seconds
    reset_at = (current_window + 1) * window_seconds
    key = f"rl:fw:{identifier}:{current_window}"
 
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.ttl(key)
    count, ttl = pipe.execute()
 
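    # Only set TTL on first request (count == 1).
    # INCR is atomic, so exactly one caller sees count == 1. If that caller
    # dies before EXPIRE runs, the key never expires; the Lua script in
    # section 6.3 closes that gap by doing both steps server-side.
    if count == 1:
        r.expire(key, window_seconds + 10)  # +10 seconds buffer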
 
    allowed = count <= limit
    return {
        "allowed": allowed,
        "limit": limit,
        "remaining": max(0, limit - count),
        "reset_at": reset_at,
        "current_count": count
    }
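
For unit tests, or to look at the window arithmetic in isolation, the same bucketing can be modelled in memory. The sketch below assumes a caller-supplied `now` instead of `time.time()`; the class name `InMemoryFixedWindow` is hypothetical. It also makes the boundary behaviour of fixed windows visible.

```python
from collections import defaultdict


class InMemoryFixedWindow:
    """Same bucketing as the INCR-based version, with a dict instead of Redis."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (identifier, window_id) -> count

    def is_allowed(self, identifier: str, now: int) -> dict:
        window_id = now // self.window_seconds
        self.counts[(identifier, window_id)] += 1
        count = self.counts[(identifier, window_id)]
        return {
            "allowed": count <= self.limit,
            "remaining": max(0, self.limit - count),
            "reset_at": (window_id + 1) * self.window_seconds,
        }


limiter = InMemoryFixedWindow(limit=3, window_seconds=60)

# Three requests in the last second of one window, three in the first second of the next.
late = [limiter.is_allowed("user:1", now=119)["allowed"] for _ in range(3)]
early = [limiter.is_allowed("user:1", now=120)["allowed"] for _ in range(3)]
fourth = limiter.is_allowed("user:1", now=120)

print(late, early)         # all six admitted within about one second
print(fourth["allowed"])   # False
print(fourth["reset_at"])  # 180
```

Six requests pass in roughly one second even though the limit is 3 per 60 seconds. That is the known trade-off of fixed windows, and the reason the sliding window variants exist.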

Spring Boot Filter Using Fixed Window

// Dependency: spring-boot-starter-data-redis
 
@Component
@Order(1)
public class RateLimitFilter extends OncePerRequestFilter {
 
    private final StringRedisTemplate redisTemplate;
 
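    // Limits: endpoint prefix -> (limit, windowSeconds).
    // Map.of() has no defined iteration order, so "/api/" could match before
    // "/api/auth/login". A LinkedHashMap keeps insertion order: list the most
    // specific prefixes first.
    private static final Map<String, int[]> ENDPOINT_LIMITS = new LinkedHashMap<>();
    static {
        ENDPOINT_LIMITS.put("/api/auth/login", new int[]{5, 60});
        ENDPOINT_LIMITS.put("/api/export",     new int[]{2, 3600});
        ENDPOINT_LIMITS.put("/api/",           new int[]{100, 60});
    }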
 
    public RateLimitFilter(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }
 
    @Override
    protected void doFilterInternal(
        HttpServletRequest request,
        HttpServletResponse response,
        FilterChain chain
    ) throws ServletException, IOException {
 
        String identifier = extractIdentifier(request);
        int[] limitConfig = getLimitConfig(request.getRequestURI());
        int limit = limitConfig[0];
        int windowSeconds = limitConfig[1];
 
        RateLimitResult result = checkRateLimit(identifier, limit, windowSeconds);
 
        // Always set rate limit headers
        response.setIntHeader("X-RateLimit-Limit", limit);
        response.setIntHeader("X-RateLimit-Remaining", result.getRemaining());
        response.setLongHeader("X-RateLimit-Reset", result.getResetAt());
 
        if (!result.isAllowed()) {
            response.setStatus(429);
            response.setContentType("application/json");
            response.setHeader("Retry-After",
                String.valueOf(result.getResetAt() - Instant.now().getEpochSecond()));
            response.getWriter().write("""
                {
                    "error": "rate_limit_exceeded",
                    "message": "Too many requests. Please retry after %d seconds.",
                    "retry_after": %d
                }
                """.formatted(
                    result.getResetAt() - Instant.now().getEpochSecond(),
                    result.getResetAt() - Instant.now().getEpochSecond()
                ));
            return;
        }
        chain.doFilter(request, response);
    }
 
    private RateLimitResult checkRateLimit(
        String identifier, int limit, int windowSeconds
    ) {
        long now = Instant.now().getEpochSecond();
        long currentWindow = now / windowSeconds;
        long resetAt = (currentWindow + 1) * windowSeconds;
        String key = "rl:fw:" + identifier + ":" + currentWindow;
 
        Long count = redisTemplate.opsForValue().increment(key);
        if (count == null) count = 1L;
 
        if (count == 1) {
            redisTemplate.expire(key, Duration.ofSeconds(windowSeconds + 10));
        }
 
        return new RateLimitResult(
            count <= limit,
            limit,
            (int) Math.max(0, limit - count),
            resetAt
        );
    }
 
    private String extractIdentifier(HttpServletRequest request) {
        // Prefer API key, fall back to user ID, then IP
        String apiKey = request.getHeader("X-API-Key");
        if (apiKey != null) return "apikey:" + apiKey;
 
        // If authenticated, use user ID from JWT
        String userId = (String) request.getAttribute("userId");
        if (userId != null) return "user:" + userId;
 
        // Fall back to IP
        String ip = request.getHeader("X-Forwarded-For");
        if (ip != null) ip = ip.split(",")[0].trim();
        else ip = request.getRemoteAddr();
        return "ip:" + ip;
    }
 
    private int[] getLimitConfig(String uri) {
        for (Map.Entry<String, int[]> entry : ENDPOINT_LIMITS.entrySet()) {
            if (uri.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return new int[]{1000, 60}; // default: 1000/minute
    }
}
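
The lookup in `getLimitConfig` returns the first prefix that matches, so its result depends on iteration order. One order-independent alternative is to pick the longest matching prefix. The Python sketch below illustrates that idea with the same three patterns; `ENDPOINT_LIMITS`, `DEFAULT_LIMIT`, and `limit_for` are illustrative names here, not part of the filter above.

```python
ENDPOINT_LIMITS = {
    "/api/": (100, 60),
    "/api/auth/login": (5, 60),
    "/api/export": (2, 3600),
}
DEFAULT_LIMIT = (1000, 60)


def limit_for(uri: str) -> tuple[int, int]:
    """Return (limit, window_seconds) for the longest prefix that matches uri."""
    matches = [prefix for prefix in ENDPOINT_LIMITS if uri.startswith(prefix)]
    if not matches:
        return DEFAULT_LIMIT
    return ENDPOINT_LIMITS[max(matches, key=len)]


print(limit_for("/api/auth/login"))  # (5, 60)
print(limit_for("/api/users/42"))    # (100, 60)
print(limit_for("/health"))          # (1000, 60)
```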

3. Redis: Sliding Window Log with Sorted Sets

import redis
import time
import uuid
 
 
# Lua script for atomic sliding window log
# This prevents race conditions between check and add
SLIDING_WINDOW_LOG_SCRIPT = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window_start = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local request_id = ARGV[4]
local ttl = tonumber(ARGV[5])
 
-- Remove old entries outside the window
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)
 
-- Count current entries
local count = redis.call('ZCARD', key)
 
if count < limit then
    -- Add this request
    redis.call('ZADD', key, now, request_id)
    redis.call('EXPIRE', key, ttl)
    return {1, count + 1, limit - count - 1, 0}  -- allowed, current, remaining, retry_after
else
    -- The oldest entry leaves the window once its score falls behind window_start,
    -- so the wait is (oldest score - window_start), rounded up to whole seconds
    local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
    local retry_after = 0
    if #oldest > 0 then
        retry_after = math.ceil(tonumber(oldest[2]) - window_start)
    end
    return {0, count, 0, retry_after}  -- denied, current, 0 remaining, retry_after
end
"""
 
 
class SlidingWindowLogRedis:
 
    def __init__(self, r: redis.Redis, limit: int, window_seconds: int):
        self.r = r
        self.limit = limit
        self.window_seconds = window_seconds
        self.script = r.register_script(SLIDING_WINDOW_LOG_SCRIPT)
 
    def is_allowed(self, identifier: str) -> dict:
        now = time.time()
        window_start = now - self.window_seconds
        request_id = str(uuid.uuid4())
        ttl = self.window_seconds * 2
 
        result = self.script(
            keys=[f"rl:swl:{identifier}"],
            args=[now, window_start, self.limit, request_id, ttl]
        )
 
        allowed = bool(result[0])
        current = int(result[1])
        remaining = int(result[2])
 
        return {
            "allowed": allowed,
            "limit": self.limit,
            "remaining": remaining,
            "current_count": current
        }
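
The steps the Lua script performs (trim, count, then add or compute a wait) can be traced with a plain in-memory log. This is a minimal sketch, assuming a single identifier and caller-supplied timestamps; `InMemorySlidingWindowLog` is a hypothetical name, and a `deque` stands in for the sorted set.

```python
from collections import deque


class InMemorySlidingWindowLog:
    """Sliding window log for one identifier, timestamps kept oldest first."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()

    def is_allowed(self, now: float) -> dict:
        window_start = now - self.window_seconds
        # Equivalent of ZREMRANGEBYSCORE key 0 window_start (inclusive bound)
        while self.log and self.log[0] <= window_start:
            self.log.popleft()

        if len(self.log) < self.limit:
            self.log.append(now)  # equivalent of ZADD key now request_id
            return {"allowed": True,
                    "remaining": self.limit - len(self.log),
                    "retry_after": 0.0}

        # The oldest entry leaves the window at oldest + window_seconds.
        retry_after = self.log[0] + self.window_seconds - now
        return {"allowed": False, "remaining": 0, "retry_after": retry_after}


log = InMemorySlidingWindowLog(limit=2, window_seconds=10)
first = log.is_allowed(0.0)
second = log.is_allowed(1.0)
third = log.is_allowed(2.0)    # window is full
fourth = log.is_allowed(10.0)  # the entry at t=0 has aged out

print(third["allowed"], third["retry_after"])  # False 8.0
print(fourth["allowed"])                       # True
```

Memory grows with the limit, since every admitted request keeps one entry until it ages out. That is the cost of the exact window.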

4. Redis: Token Bucket with Lua Script

The token bucket MUST be implemented with a Lua script to be atomic. Without atomicity,
two concurrent requests can both read the same token count and both be allowed when only
one should be.

Lua Script

-- token_bucket.lua
-- KEYS[1]: Redis key for this user's bucket
-- ARGV[1]: bucket capacity
-- ARGV[2]: refill rate (tokens per second)
-- ARGV[3]: tokens requested by this operation
-- ARGV[4]: current timestamp (seconds, float)
-- ARGV[5]: TTL for the key (seconds)
 
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])
 
-- Load current state (tokens, last_refill_time)
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(data[1])
local last_refill = tonumber(data[2])
 
-- Initialize on first use
if current_tokens == nil then
    current_tokens = capacity
    last_refill = now
end
 
-- Calculate tokens to add based on elapsed time
local elapsed = now - last_refill
local tokens_to_add = elapsed * refill_rate
local new_tokens = math.min(capacity, current_tokens + tokens_to_add)
 
-- Check if enough tokens
if new_tokens >= requested then
    -- Allow: deduct tokens
    local remaining = new_tokens - requested
    redis.call('HMSET', key,
        'tokens', remaining,
        'last_refill', now
    )
    redis.call('EXPIRE', key, ttl)
    -- Return: allowed=1, tokens_remaining, retry_after=0
    -- Redis truncates Lua numbers to integers in replies, so fractional
    -- values are returned as strings
    return {1, tostring(remaining), '0'}
else
    -- Deny: update tokens without deducting (just update refill time)
    redis.call('HMSET', key,
        'tokens', new_tokens,
        'last_refill', now
    )
    redis.call('EXPIRE', key, ttl)
    -- Calculate when enough tokens will be available
    local tokens_needed = requested - new_tokens
    local retry_after = tokens_needed / refill_rate
    -- Return: allowed=0, tokens_remaining, retry_after (as strings, see above)
    return {0, tostring(new_tokens), tostring(retry_after)}
end
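
The refill arithmetic in the script can be checked in isolation. The function below is a pure-Python sketch of one evaluation of the bucket, with the stored state passed in as arguments instead of read from Redis; `token_bucket_step` is a hypothetical helper, and the numbers are chosen only to keep the floating-point arithmetic exact.

```python
def token_bucket_step(tokens, last_refill, capacity, refill_rate, requested, now):
    """One evaluation of the bucket; returns (allowed, tokens_left, retry_after)."""
    elapsed = now - last_refill
    # Refill for the elapsed time, but never above capacity.
    available = min(capacity, tokens + elapsed * refill_rate)
    if available >= requested:
        return True, available - requested, 0.0
    # Not enough tokens: report how long until the shortfall is refilled.
    return False, available, (requested - available) / refill_rate


# Bucket of 10 tokens refilling at 4 tokens/second, currently holding 3.
denied = token_bucket_step(tokens=3, last_refill=0.0, capacity=10,
                           refill_rate=4, requested=10, now=0.25)
print(denied)   # (False, 4.0, 1.5)

# Retrying exactly retry_after seconds later succeeds and empties the bucket.
allowed = token_bucket_step(tokens=4.0, last_refill=0.25, capacity=10,
                            refill_rate=4, requested=10, now=1.75)
print(allowed)  # (True, 0.0, 0.0)
```

In production this logic stays inside the Lua script, so that reading the state, refilling, and deducting happen as one atomic step.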

Python Client

import redis
import time
 
 
class TokenBucketRedis:
    """
    Token Bucket Rate Limiter backed by Redis with Lua script for atomicity.
    Thread-safe. Works across multiple application instances.
    """
 
    LUA_SCRIPT = """
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])
 
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(data[1])
local last_refill = tonumber(data[2])
 
if current_tokens == nil then
    current_tokens = capacity
    last_refill = now
end
 
local elapsed = now - last_refill
local new_tokens = math.min(capacity, current_tokens + elapsed * refill_rate)
 
-- Redis truncates Lua numbers to integers in replies, so fractional
-- values are returned as strings and parsed with float() on the client
if new_tokens >= requested then
    redis.call('HMSET', key, 'tokens', new_tokens - requested, 'last_refill', now)
    redis.call('EXPIRE', key, ttl)
    return {1, tostring(new_tokens - requested), '0'}
else
    redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
    redis.call('EXPIRE', key, ttl)
    local retry_after = (requested - new_tokens) / refill_rate
    return {0, tostring(new_tokens), tostring(retry_after)}
end
"""
 
    def __init__(
        self,
        r: redis.Redis,
        capacity: float,
        refill_rate: float,
        key_prefix: str = "rl:tb"
    ):
        self.r = r
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.key_prefix = key_prefix
        self.script = r.register_script(self.LUA_SCRIPT)
 
    def is_allowed(self, identifier: str, cost: float = 1.0) -> dict:
        key = f"{self.key_prefix}:{identifier}"
        now = time.time()
        ttl = int(self.capacity / self.refill_rate) * 2 + 60
 
        result = self.script(
            keys=[key],
            args=[self.capacity, self.refill_rate, cost, now, ttl]
        )
 
        allowed = bool(int(result[0]))
        tokens_remaining = float(result[1])
        retry_after = float(result[2])
 
        return {
            "allowed": allowed,
            "tokens_remaining": tokens_remaining,
            "retry_after_seconds": retry_after,
            "limit": self.capacity,
            "refill_rate": self.refill_rate
        }
 
 
# Usage
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
limiter = TokenBucketRedis(r, capacity=100, refill_rate=10)
 
# Standard API call (cost=1)
result = limiter.is_allowed("user:123")
print(f"Allowed: {result['allowed']}, Tokens: {result['tokens_remaining']:.1f}")
 
# Expensive operation (cost=10)
result = limiter.is_allowed("user:123", cost=10)
print(f"Expensive op - Allowed: {result['allowed']}")

5. Java and Spring Boot

5.1 Bucket4j (Local or Redis-backed)

Bucket4j is a widely used Java rate limiting library. It implements the token bucket
algorithm and runs either locally (in memory) or distributed (Redis, Hazelcast).

<!-- pom.xml -->
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>8.7.0</version>
</dependency>
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-redis</artifactId>
    <version>8.7.0</version>
</dependency>

import io.github.bucket4j.*;
import io.github.bucket4j.distributed.proxy.ProxyManager;
import io.github.bucket4j.redis.lettuce.cas.LettuceBasedProxyManager;
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.codec.ByteArrayCodec;
import io.lettuce.core.codec.RedisCodec;
import io.lettuce.core.codec.StringCodec;
 
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.TimeUnit;
 
@Configuration
public class RateLimitConfig {
 
    @Bean
    public ProxyManager<String> proxyManager(
        StatefulRedisConnection<String, byte[]> connection
    ) {
        // The key type of the proxy manager follows the key type of the connection
        return LettuceBasedProxyManager
            .builderFor(connection)
            .build();
    }
 
    @Bean
    public StatefulRedisConnection<String, byte[]> redisConnection(
        RedisClient redisClient
    ) {
        // String keys, byte[] values: bucket keys are strings, bucket state is binary
        return redisClient.connect(
            RedisCodec.of(StringCodec.UTF8, ByteArrayCodec.INSTANCE));
    }
}
 
 
@Service
public class RateLimitService {
 
    private final ProxyManager<String> proxyManager;
 
    public RateLimitService(ProxyManager<String> proxyManager) {
        this.proxyManager = proxyManager;
    }
 
    /**
     * Build a bucket configuration based on user tier.
     */
    private BucketConfiguration getBucketConfig(String tier) {
        return switch (tier) {
            case "free"       -> BucketConfiguration.builder()
                    .addLimit(Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1))))
                    .addLimit(Bandwidth.classic(1000, Refill.greedy(1000, Duration.ofHours(1))))
                    .build();
            case "pro"        -> BucketConfiguration.builder()
                    .addLimit(Bandwidth.classic(1000, Refill.greedy(1000, Duration.ofMinutes(1))))
                    .build();
            case "enterprise" -> BucketConfiguration.builder()
                    .addLimit(Bandwidth.classic(10000, Refill.greedy(10000, Duration.ofMinutes(1))))
                    .build();
            default           -> BucketConfiguration.builder()
                    .addLimit(Bandwidth.classic(50, Refill.greedy(50, Duration.ofMinutes(1))))
                    .build();
        };
    }
 
    public ConsumptionProbe tryConsume(String userId, String tier, long cost) {
        String bucketKey = "bucket:" + userId;
        Bucket bucket = proxyManager.builder()
            .build(bucketKey, () -> getBucketConfig(tier));
 
        return bucket.tryConsumeAndReturnRemaining(cost);
    }
}
 
 
@RestController
@RequestMapping("/api")
public class ApiController {
 
    private final RateLimitService rateLimitService;
 
    public ApiController(RateLimitService rateLimitService) {
        this.rateLimitService = rateLimitService;
    }
 
    @GetMapping("/data")
    public ResponseEntity<?> getData(
        @RequestAttribute("userId") String userId,
        @RequestAttribute("userTier") String tier,
        HttpServletResponse response
    ) {
        ConsumptionProbe probe = rateLimitService.tryConsume(userId, tier, 1);
 
        response.setHeader("X-RateLimit-Remaining",
            String.valueOf(probe.getRemainingTokens()));
        // ConsumptionProbe does not expose the bucket capacity, so
        // X-RateLimit-Limit has to come from the tier configuration,
        // not from the probe.
 
        if (!probe.isConsumed()) {
            // Round up so clients never retry before a token is available
            long retryAfterSeconds = TimeUnit.NANOSECONDS.toSeconds(
                probe.getNanosToWaitForRefill() + 999_999_999L);
            response.setHeader("Retry-After", String.valueOf(retryAfterSeconds));
            return ResponseEntity.status(429)
                .body(Map.of(
                    "error", "rate_limit_exceeded",
                    "retry_after", retryAfterSeconds
                ));
        }
 
        return ResponseEntity.ok(Map.of("data", "your data here"));
    }
}

5.2 Resilience4j Rate Limiter

Resilience4j is the recommended replacement for Hystrix and includes a rate limiter.
Its limiter is in-process: each application instance enforces its own limit.

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot3</artifactId>
    <version>2.1.0</version>
</dependency>

# application.yml
resilience4j:
  ratelimiter:
    instances:
      backendA:
        limitForPeriod: 100
        limitRefreshPeriod: 1s
        timeoutDuration: 0s # Don't wait, fail immediately
      backendB:
        limitForPeriod: 20
        limitRefreshPeriod: 500ms
        timeoutDuration: 100ms # Wait up to 100ms for a slot

@Service
public class ExternalApiService {
 
    // Annotation-based rate limiting
    @RateLimiter(name = "backendA", fallbackMethod = "rateLimitFallback")
    public String callExternalApi(String param) {
        // This method is rate limited to 100 calls/second
        return externalHttpClient.get(param);
    }
 
    // Fallback when rate limit is exceeded
    public String rateLimitFallback(String param, RequestNotPermitted e) {
        return "Rate limit exceeded. Please try again later.";
    }
 
    // Programmatic usage. The limiter is created once and reused: building a
    // new RateLimiter on every call would give each call a fresh set of permits.
    private final RateLimiter customLimiter = RateLimiter.of("customLimiter",
        RateLimiterConfig.custom()
            .limitForPeriod(50)
            .limitRefreshPeriod(Duration.ofSeconds(1))
            .timeoutDuration(Duration.ZERO)
            .build()
    );
 
    public String callWithProgrammaticRateLimit(String param) {
        return RateLimiter.decorateSupplier(customLimiter,
            () -> externalHttpClient.get(param)
        ).get();
    }
}

6. Python Implementations

6.1 Flask-Limiter

from flask import Flask, jsonify, request
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from werkzeug.exceptions import TooManyRequests
 
app = Flask(__name__)
 
# Configure limiter with Redis backend
limiter = Limiter(
    key_func=get_remote_address,
    app=app,
    default_limits=["200 per day", "50 per hour"],
    storage_uri="redis://localhost:6379",
    strategy="moving-window",  # or "fixed-window"
    headers_enabled=True       # adds X-RateLimit-* headers automatically
)
 
 
def get_api_key():
    """Custom key function: limit by API key instead of IP."""
    return request.headers.get("X-API-Key", get_remote_address())
 
 
# Default limits apply to all routes
@app.route("/api/public")
def public_endpoint():
    return jsonify({"data": "public data"})
 
 
# Override with endpoint-specific limit
@app.route("/api/search")
@limiter.limit("20 per minute")
def search():
    return jsonify({"results": []})
 
 
# Endpoint-specific limit keyed by API key instead of IP
@app.route("/api/export")
@limiter.limit("5 per hour", key_func=get_api_key)
def export():
    return jsonify({"export": "data"})
 
 
# Exempt an endpoint completely
@app.route("/health")
@limiter.exempt
def health_check():
    return jsonify({"status": "ok"})
 
 
# Custom error handler
@app.errorhandler(429)
def rate_limit_handler(e):
    return jsonify({
        "error": "rate_limit_exceeded",
        "description": str(e.description),
        "retry_after": e.retry_after if hasattr(e, "retry_after") else None
    }), 429

6.2 FastAPI with slowapi

from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
from slowapi.middleware import SlowAPIMiddleware
 
# Custom key function
def get_user_id(request: Request) -> str:
    user_id = request.state.user_id if hasattr(request.state, "user_id") else None
    if user_id:
        return f"user:{user_id}"
    api_key = request.headers.get("X-API-Key")
    if api_key:
        return f"apikey:{api_key}"
    return get_remote_address(request)
 
 
limiter = Limiter(
    key_func=get_user_id,
    default_limits=["100/minute"],
    storage_uri="redis://localhost:6379"
)
 
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
app.add_middleware(SlowAPIMiddleware)
 
 
@app.get("/api/data")
@limiter.limit("50/minute")
async def get_data(request: Request):
    return {"data": "some data"}
 
 
@app.post("/api/auth/login")
@limiter.limit("5/minute")
async def login(request: Request, credentials: dict):
    return {"token": "jwt_token_here"}
 
 
# Conditional rate limiting
@app.get("/api/search")
# slowapi passes the request to exempt_when only if its first parameter is named "request"
@limiter.limit("100/minute", exempt_when=lambda request: request.headers.get("X-Bypass-Key") == "secret")
async def search(request: Request, q: str):
    return {"results": [], "query": q}

6.3 Custom Middleware (Framework-Agnostic)

import time
import functools
import redis
from typing import Callable
 
 
class RedisRateLimiter:
    """
    Production-grade rate limiter with Redis backend.
    Supports multiple limit types simultaneously.
    """
 
    def __init__(self, redis_client: redis.Redis):
        self.r = redis_client
        self._script = redis_client.register_script("""
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local window_id = math.floor(now / window)
local window_key = key .. ':' .. window_id
local count = redis.call('INCR', window_key)
if count == 1 then
    redis.call('EXPIRE', window_key, window * 2)
end
return {count, limit - count, (window_id + 1) * window}
""")
 
    def check(
        self,
        identifier: str,
        limits: list[tuple[int, int]]  # list of (limit, window_seconds)
    ) -> dict:
        """
        Check multiple limits simultaneously.
        All limits must pass for the request to be allowed.
        """
        now = int(time.time())
        results = []
 
        for limit, window in limits:
            key = f"rl:{identifier}:{window}"
            result = self._script(keys=[key], args=[limit, window, now])
            count, remaining, reset_at = int(result[0]), int(result[1]), int(result[2])
            results.append({
                "window": window,
                "limit": limit,
                "count": count,
                "remaining": remaining,
                "reset_at": reset_at,
                "allowed": count <= limit
            })
 
        # All limits must pass
        all_allowed = all(r["allowed"] for r in results)
        # Return the most restrictive remaining
        min_remaining = min(r["remaining"] for r in results)
        nearest_reset = min(r["reset_at"] for r in results)
 
        return {
            "allowed": all_allowed,
            "remaining": max(0, min_remaining),
            "reset_at": nearest_reset,
            "details": results
        }
 
 
def rate_limit(identifier_fn: Callable, limits: list[tuple[int, int]]):
    """Decorator for rate limiting any function."""
    # Build the client and limiter once per decorated function, not per call:
    # redis.Redis keeps a connection pool, and the script only needs registering once.
    r = redis.Redis(host="localhost", decode_responses=True)
    limiter = RedisRateLimiter(r)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            identifier = identifier_fn(*args, **kwargs)
            result = limiter.check(identifier, limits)
            if not result["allowed"]:
                raise Exception(f"Rate limit exceeded. Retry at {result['reset_at']}")
            return func(*args, **kwargs)
        return wrapper
    return decorator
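
When several limits are checked at once, the response headers need a little care: the nearest reset time is not necessarily when a denied client can retry. The sketch below shows one way to turn the dict returned by `RedisRateLimiter.check()` into headers. `rate_limit_headers` is a hypothetical helper, and the `sample` dict is hand-written to match the shape of that return value.

```python
def rate_limit_headers(result: dict, now: int) -> dict:
    """Build response headers from the dict returned by RedisRateLimiter.check()."""
    # Advertise the limit that currently has the least headroom.
    tightest = min(result["details"], key=lambda d: d["remaining"])
    headers = {
        "X-RateLimit-Limit": str(tightest["limit"]),
        "X-RateLimit-Remaining": str(max(0, tightest["remaining"])),
        "X-RateLimit-Reset": str(tightest["reset_at"]),
    }
    if not result["allowed"]:
        # Every exhausted window must roll over before a retry can succeed.
        blocked_until = max(
            d["reset_at"] for d in result["details"] if not d["allowed"]
        )
        headers["Retry-After"] = str(max(1, blocked_until - now))
    return headers


# Hand-written sample: the per-minute limit is exhausted, the per-hour limit is not.
sample = {
    "allowed": False,
    "remaining": 0,
    "reset_at": 1260,
    "details": [
        {"window": 60, "limit": 10, "count": 11, "remaining": -1,
         "reset_at": 1260, "allowed": False},
        {"window": 3600, "limit": 100, "count": 50, "remaining": 50,
         "reset_at": 3600, "allowed": True},
    ],
}
headers = rate_limit_headers(sample, now=1230)
print(headers["Retry-After"])  # 30
```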

7. Node.js Implementations

7.1 Express with express-rate-limit

const express = require("express");
const rateLimit = require("express-rate-limit");
const RedisStore = require("rate-limit-redis");
const { createClient } = require("redis");
 
const app = express();
const redisClient = createClient({
  url: process.env.REDIS_URL || "redis://localhost:6379",
});
redisClient.connect();
 
// General API rate limiter
const apiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100,
  standardHeaders: true, // Return RateLimit-* headers (RFC standard)
  legacyHeaders: false, // Disable X-RateLimit-* legacy headers
  store: new RedisStore({
    sendCommand: (...args) => redisClient.sendCommand(args),
    prefix: "rl:api:",
  }),
  keyGenerator: (req) => {
    // Use API key if available, otherwise IP
    return req.headers["x-api-key"] || req.ip;
  },
  handler: (req, res, next, options) => {
    res.status(429).json({
      error: "rate_limit_exceeded",
      message: `Too many requests. Retry after ${Math.ceil(options.windowMs / 1000)} seconds.`,
      retry_after: Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000),
    });
  },
});
 
// Stricter limiter for auth endpoints
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 10,
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args) => redisClient.sendCommand(args),
    prefix: "rl:auth:",
  }),
  message: {
    error: "too_many_login_attempts",
    message: "Too many login attempts. Please try again in 15 minutes.",
  },
});
 
// Apply to all API routes
app.use("/api/", apiLimiter);
 
// Apply stricter limit to auth
app.use("/api/auth/", authLimiter);
 
app.get("/api/data", (req, res) => {
  res.json({
    data: "your data",
    rateLimit: {
      limit: req.rateLimit?.limit,
      remaining: req.rateLimit?.remaining,
      resetTime: req.rateLimit?.resetTime,
    },
  });
});

7.2 Custom Redis Token Bucket (Node.js)

const redis = require("redis");
 
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])
 
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local current_tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now
 
local elapsed = now - last_refill
local new_tokens = math.min(capacity, current_tokens + elapsed * refill_rate)
 
-- Redis truncates Lua numbers to integers in replies, so fractional
-- values are returned as strings and parsed with parseFloat on the client
if new_tokens >= requested then
    redis.call('HMSET', key, 'tokens', new_tokens - requested, 'last_refill', now)
    redis.call('EXPIRE', key, ttl)
    return {1, tostring(new_tokens - requested), '0'}
else
    redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
    redis.call('EXPIRE', key, ttl)
    local retry_after = (requested - new_tokens) / refill_rate
    return {0, tostring(new_tokens), tostring(retry_after)}
end
`;
 
class TokenBucketRedis {
  constructor(redisClient, { capacity, refillRate, keyPrefix = "rl:tb" }) {
    this.client = redisClient;
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.keyPrefix = keyPrefix;
  }
 
  async isAllowed(identifier, cost = 1) {
    const key = `${this.keyPrefix}:${identifier}`;
    const now = Date.now() / 1000; // Unix timestamp in seconds
    const ttl = Math.ceil(this.capacity / this.refillRate) * 2 + 60;
 
    const result = await this.client.eval(TOKEN_BUCKET_SCRIPT, {
      keys: [key],
      arguments: [
        String(this.capacity),
        String(this.refillRate),
        String(cost),
        String(now),
        String(ttl),
      ],
    });
 
    return {
      allowed: result[0] === 1,
      tokensRemaining: parseFloat(result[1]),
      retryAfterSeconds: parseFloat(result[2]),
    };
  }
}
 
// Express middleware using TokenBucketRedis
function createTokenBucketMiddleware(limiter) {
  return async (req, res, next) => {
    const identifier = req.headers["x-api-key"] || req.ip;
 
    try {
      const result = await limiter.isAllowed(identifier);
 
      res.set({
        "X-RateLimit-Limit": limiter.capacity,
        "X-RateLimit-Remaining": Math.floor(result.tokensRemaining),
        "X-RateLimit-RefillRate": limiter.refillRate,
      });
 
      if (!result.allowed) {
        res.set("Retry-After", Math.ceil(result.retryAfterSeconds));
        return res.status(429).json({
          error: "rate_limit_exceeded",
          retry_after: Math.ceil(result.retryAfterSeconds),
        });
      }
      next();
    } catch (err) {
      // Fail open: allow request if Redis is unavailable
      console.error("Rate limiter error:", err);
      next();
    }
  };
}
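
The middleware above fails open: if Redis is unreachable, requests go through. Whether to fail open or closed is a policy decision that may differ per endpoint (a login route might prefer to fail closed). The Python sketch below makes that choice an explicit parameter; `guarded_check` and `broken_backend` are hypothetical names, and the outage is simulated.

```python
import logging

logger = logging.getLogger("ratelimit")


def guarded_check(check, identifier: str, fail_open: bool = True) -> bool:
    """Run check(identifier); on a backend error apply the chosen failure policy."""
    try:
        return bool(check(identifier))
    except Exception:  # broad on purpose: any backend failure takes this path
        logger.exception("rate limiter backend error for %s", identifier)
        return fail_open


def broken_backend(identifier: str) -> bool:
    raise ConnectionError("redis unavailable")  # simulated outage


open_result = guarded_check(broken_backend, "ip:203.0.113.7")
closed_result = guarded_check(broken_backend, "ip:203.0.113.7", fail_open=False)
normal_deny = guarded_check(lambda _id: False, "ip:203.0.113.7")

print(open_result)    # True: request allowed despite the outage
print(closed_result)  # False: request rejected during the outage
print(normal_deny)    # False: a healthy backend's decision is passed through
```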

8. Nginx Rate Limiting

Nginx has built-in rate limiting via the ngx_http_limit_req_module.

Basic Configuration

http {
    # Define rate limit zone
    # $binary_remote_addr = client IP (4 bytes for IPv4, more compact than $remote_addr)
    # zone=api_limit:10m = zone named "api_limit", 10MB shared memory
    # rate=100r/m = 100 requests per minute
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=100r/m;
    limit_req_zone $http_x_api_key zone=per_apikey:10m rate=1000r/m;
 
    # Status code for rate limited requests (default: 503)
    limit_req_status 429;
 
    # Log level for rate limited requests
    limit_req_log_level warn;
 
    server {
        listen 80;
        server_name api.example.com;
 
        # Apply rate limiting to all /api/ routes
        location /api/ {
            # burst=50: allow queue of up to 50 requests
            # nodelay: process burst requests immediately, not spread over time
            limit_req zone=per_ip burst=50 nodelay;
 
            # Custom 429 response
            error_page 429 @rate_limit_error;
 
            proxy_pass http://backend_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
 
        # Stricter limit for auth endpoints
        location /api/auth/ {
            limit_req zone=per_ip burst=5 nodelay;
            error_page 429 @rate_limit_error;
            proxy_pass http://backend_servers;
        }
 
        # Rate limit by API key (if available)
        location /api/v2/ {
            limit_req zone=per_apikey burst=200 nodelay;
            error_page 429 @rate_limit_error;
            proxy_pass http://backend_servers;
        }
 
        # Named location for 429 error response
        location @rate_limit_error {
            default_type application/json;
            return 429 '{"error":"rate_limit_exceeded","message":"Too many requests. Please slow down."}';
        }
 
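        # Limit concurrent connections per IP (uses the conn_per_ip zone declared below)
        location /downloads/ {
            limit_conn conn_per_ip 5;  # max 5 concurrent connections per IP
            proxy_pass http://download_servers;
        }
    }
 
    # limit_conn_zone is only valid in the http context, so it is declared
    # here at the http level rather than inside the server block
    limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;
}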

Advanced: Rate Limit by API Key with IP Fallback

# Choose the rate limit key dynamically, based on the presence of the API key header
map $http_x_api_key $rate_limit_key {
    ""      $binary_remote_addr;  # No API key: limit by IP
    default $http_x_api_key;      # API key present: limit by key
}
 
limit_req_zone $rate_limit_key zone=dynamic_limit:10m rate=100r/m;
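
The key selection performed by the map block can be sketched in Python. This is only an illustration of the logic, not something Nginx runs; the function name `rate_limit_key` and its arguments are hypothetical.

```python
import ipaddress
from typing import Mapping


def rate_limit_key(headers: Mapping[str, str], remote_addr: str) -> bytes:
    """Mirror the map block: use the API key when present, else the client IP."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    api_key = normalized.get("x-api-key", "")
    if api_key:
        return api_key.encode("utf-8")
    # $binary_remote_addr is the packed address: 4 bytes for IPv4, 16 for IPv6.
    return ipaddress.ip_address(remote_addr).packed


print(rate_limit_key({"X-API-Key": "abc123"}, "203.0.113.7"))
print(rate_limit_key({}, "203.0.113.7"))
```

Because both kinds of key land in the same zone, keyed and anonymous clients share the same 100r/m rate in this setup.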

Nginx Burst Behavior Explained

rate=10r/s, burst=20, nodelay

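Without nodelay: requests arriving faster than 10r/s are queued and released at the configured rate; burst allows up to 20 to wait
With nodelay: up to 20 excess requests are forwarded immediately; anything beyond that is rejected (429 with the limit_req_status set above)

Nginx tracks the rate at millisecond granularity, so 10r/s means one request per 100ms, not ten at the top of each second.

Visual:
Time:     0s      1s      2s
Requests: 30      0       0

Without nodelay:
  t=0s: 1 forwarded immediately, 20 queued, 9 rejected
  t=0-2s: queued requests drip out one every 100ms (10r/s), so the last one waits 2s

With nodelay:
  t=0s: 21 forwarded immediately (1 + 20 burst), 9 rejected
  t=1s: 10 burst slots have freed up (one per 100ms), so 10 more requests would be accepted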

Always use `nodelay` for APIs unless you want artificial latency injection.
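
The burst accounting can be checked with a small simulation. The class below is a simplified model of how limit_req tracks one key, written for illustration under the assumption of rate=10r/s and burst=20; it is not Nginx's actual code, and the names `LimitReqModel` and `handle` are made up for this sketch.

```python
from typing import Optional


class LimitReqModel:
    """Simplified model of limit_req accounting for a single key."""

    def __init__(self, rate_per_sec: float, burst: int, nodelay: bool = True):
        self.rate = rate_per_sec
        self.burst = burst
        self.nodelay = nodelay
        self.excess = 0.0  # requests currently above the steady rate
        self.last: Optional[float] = None  # time of the last accepted request

    def handle(self, now: float) -> Optional[float]:
        """Return the delay in seconds for an accepted request, or None if rejected."""
        if self.last is None:
            excess = 0.0  # the first request for a key is always accepted
        else:
            # Drain at the configured rate since the last accepted request,
            # then count the current request.
            excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
            if excess > self.burst:
                return None  # over the burst allowance: rejected
        self.excess = excess
        self.last = now
        return 0.0 if self.nodelay else excess / self.rate


limiter = LimitReqModel(rate_per_sec=10, burst=20, nodelay=True)
at_t0 = [limiter.handle(0.0) for _ in range(30)]
at_t1 = [limiter.handle(1.0) for _ in range(30)]
print(sum(r is not None for r in at_t0))  # 21 accepted at t=0
print(sum(r is not None for r in at_t1))  # 10 accepted at t=1
```

Passing nodelay=False makes the model return a growing delay for each queued request instead of 0, up to 2 seconds for the last one admitted.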

9. AWS API Gateway

Usage Plans and API Keys

# Create a usage plan
aws apigateway create-usage-plan \
    --name "FreeTier" \
    --description "Free tier: 100 req/s steady state, burst 200, 10000/day" \
    --throttle '{"rateLimit": 100, "burstLimit": 200}' \
    --quota '{"limit": 10000, "period": "DAY"}'
 
# Associate an API key with the usage plan
aws apigateway create-usage-plan-key \
    --usage-plan-id <plan-id> \
    --key-id <api-key-id> \
    --key-type "API_KEY"

CloudFormation / SAM Configuration

# template.yaml (SAM)
Resources:
  MyApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Auth:
        ApiKeyRequired: true
        UsagePlan:
          CreateUsagePlan: PER_STAGE
          Throttle:
            RateLimit: 100 # requests/second
            BurstLimit: 200 # max burst
          Quota:
            Limit: 10000
            Period: DAY
 
  # Per-method throttling override
  MyApiMethodThrottle:
    Type: AWS::ApiGateway::UsagePlan
    Properties:
      ApiStages:
        - ApiId: !Ref MyApi
          Stage: prod
          Throttle:
            /export/GET:
              RateLimit: 1
              BurstLimit: 2
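
API Gateway documents its throttling as a token bucket, with RateLimit as the refill rate in requests per second and BurstLimit as the bucket capacity. The sketch below is a minimal in-memory model of that behavior using the 100/200 values from the usage plan above; the `TokenBucket` class is an illustration, not AWS code, and real enforcement is distributed and approximate.

```python
class TokenBucket:
    """Minimal token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=100, burst=200)
print(sum(bucket.allow(0.0) for _ in range(250)))  # 200 allowed from the full bucket
print(sum(bucket.allow(1.0) for _ in range(250)))  # 100 allowed after one second of refill
```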

AWS Response Headers

API Gateway returns a response like this when a request is throttled:

x-amzn-RequestId: ...
x-amzn-ErrorType: TooManyRequestsException
Content-Type: application/json

{
    "message": "Too Many Requests"
}

API Gateway returns 429 Too Many Requests for both per-key (usage plan) throttling and stage-level throttling.

10. Kong API Gateway

Rate Limiting Plugin

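# Enable rate limiting on a service
# Note: newer Kong releases use config.redis.host / config.redis.port (shown here);
# older releases use the flat config.redis_host / config.redis_port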
curl -X POST http://localhost:8001/services/my-service/plugins \
    --data "name=rate-limiting" \
    --data "config.minute=100" \
    --data "config.hour=10000" \
    --data "config.day=100000" \
    --data "config.policy=redis" \
    --data "config.redis.host=redis" \
    --data "config.redis.port=6379" \
    --data "config.limit_by=consumer"  # or ip, credential, service
 
# Enable on a specific route
curl -X POST http://localhost:8001/routes/my-route/plugins \
    --data "name=rate-limiting" \
    --data "config.second=10" \
    --data "config.policy=local"

Kong Declarative Config (decK / Kong Manager)

# kong.yaml
plugins:
  - name: rate-limiting
    service: my-service
    config:
      minute: 100
      hour: 10000
      policy: redis
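      # Flat redis_host/redis_port are the older field names; newer Kong releases
      # nest them under a redis: block (redis.host, redis.port)
      redis_host: redis
      redis_port: 6379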
      limit_by: consumer
      fault_tolerant: true # fail open if Redis is down
      hide_client_headers: false
      error_code: 429
      error_message: "Rate limit exceeded"

Kong Response Headers

X-RateLimit-Limit-Minute: 100
X-RateLimit-Remaining-Minute: 73
X-RateLimit-Limit-Hour: 10000
X-RateLimit-Remaining-Hour: 9847
RateLimit-Limit: 100
RateLimit-Remaining: 73
RateLimit-Reset: 47
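
A client can use these headers to decide whether to pause before its next call. The sketch below assumes the header names shown above and treats RateLimit-Reset as seconds until the window resets; the function name `seconds_until_allowed` is hypothetical.

```python
from typing import Mapping, Optional


def seconds_until_allowed(headers: Mapping[str, str]) -> Optional[float]:
    """Return how long to pause before the next call, or None if no pause is needed."""
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        remaining = int(lowered["ratelimit-remaining"])
        reset = float(lowered["ratelimit-reset"])
    except (KeyError, ValueError):
        return None  # headers absent or malformed: nothing to act on
    # Only pause once the budget for the current window is used up.
    return max(0.0, reset) if remaining <= 0 else None


print(seconds_until_allowed({"RateLimit-Remaining": "73", "RateLimit-Reset": "47"}))
print(seconds_until_allowed({"RateLimit-Remaining": "0", "RateLimit-Reset": "47"}))
```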

11. Client-Side Rate Limiting: Retry and Backoff

Any client calling a rate-limited API MUST implement proper retry logic.

11.1 Exponential Backoff with Jitter

import time
import random
import requests
from typing import Any
 
 
class RateLimitedAPIClient:
    """
    HTTP client with built-in rate limit handling.
    Implements exponential backoff with full jitter.
    """
 
    def __init__(
        self,
        base_url: str,
        max_retries: int = 5,
        base_delay: float = 1.0,
        max_delay: float = 60.0
    ):
        self.base_url = base_url
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.max_delay = max_delay
 
    def request(self, method: str, path: str, **kwargs) -> Any:
        url = f"{self.base_url}{path}"
        last_exception = None
 
        for attempt in range(self.max_retries + 1):
            try:
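                # requests has no default timeout; set one so a stalled server cannot hang the client
                kwargs.setdefault("timeout", 10)
                response = requests.request(method, url, **kwargs)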
 
                if response.status_code == 429:
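                    # Default: exponential backoff (2^attempt * base_delay, capped)
                    # with full jitter (randomize between 0 and the ceiling)
                    wait = random.uniform(
                        0, min(self.max_delay, (2 ** attempt) * self.base_delay)
                    )
                    # Prefer the server's Retry-After hint when it is given in seconds.
                    # Retry-After may also be an HTTP-date, which float() cannot parse;
                    # in that case keep the backoff value computed above.
                    retry_after = response.headers.get("Retry-After")
                    if retry_after:
                        try:
                            wait = max(0.0, float(retry_after))
                        except ValueError:
                            pass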
 
                    if attempt < self.max_retries:
                        print(f"Rate limited. Retrying in {wait:.2f}s "
                              f"(attempt {attempt + 1}/{self.max_retries})")
                        time.sleep(wait)
                        continue
                    else:
                        response.raise_for_status()
 
                # Proactively slow down when remaining is low
                remaining = response.headers.get("X-RateLimit-Remaining")
                if remaining and remaining.isdigit() and int(remaining) < 10:
                    time.sleep(0.5)  # Voluntary slowdown
 
                return response
 
            except requests.exceptions.RequestException as e:
                last_exception = e
                if attempt < self.max_retries:
                    wait = min(self.max_delay, (2 ** attempt) * self.base_delay)
                    wait = random.uniform(0, wait)
                    time.sleep(wait)
 
        raise last_exception or Exception("Max retries exceeded")
 
 
# Usage
client = RateLimitedAPIClient("https://api.example.com", max_retries=5)
response = client.request("GET", "/data", headers={"X-API-Key": "my_key"})
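
Retry-After can carry either a number of seconds or an HTTP-date. A helper along these lines could handle both forms; it is a sketch using only the standard library, and the name `parse_retry_after` is an assumption, not part of the client above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds, or None if the header cannot be parsed."""
    value = value.strip()
    # Form 1: delay-seconds, a non-negative integer such as "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: HTTP-date, such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())


reference = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(parse_retry_after("120"))
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT", now=reference))
```

Returning None for unparseable values lets the caller fall back to exponential backoff.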

11.2 Jitter Strategies Explained

import random
 
base_delay = 1.0  # seconds
attempt = 3       # 3rd retry
 
# 1. No jitter (DO NOT USE - causes thundering herd)
wait = (2 ** attempt) * base_delay          # 8.0s - all clients retry at same time
 
# 2. Full jitter (RECOMMENDED for most cases)
wait = random.uniform(0, (2 ** attempt) * base_delay)  # 0-8s random
 
# 3. Equal jitter (good balance)
cap = (2 ** attempt) * base_delay
wait = cap / 2 + random.uniform(0, cap / 2)  # 4-8s
 
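# 4. Decorrelated jitter (described in the AWS Architecture Blog)
max_delay = 60.0         # upper bound on any single sleep
prev_sleep = base_delay  # previous sleep time; starts at base_delay
wait = min(max_delay, random.uniform(base_delay, prev_sleep * 3))  # grows from the last sleep, capped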
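
The thundering herd effect behind the DO NOT USE warning can be made visible with a short simulation. It assumes 1000 clients that all received a 429 at the same moment and are on their third retry; the helper `peak_retries` and the 100ms bucket size are choices made for this sketch.

```python
import random
from collections import Counter
from typing import Iterable


def peak_retries(delays: Iterable[float], bucket_seconds: float = 0.1) -> int:
    """Largest number of retries that land in the same time bucket."""
    buckets = Counter(int(delay / bucket_seconds) for delay in delays)
    return max(buckets.values())


rng = random.Random(42)  # seeded so the run is repeatable
clients = 1000
ceiling = (2 ** 3) * 1.0  # attempt = 3, base_delay = 1.0

no_jitter = [ceiling for _ in range(clients)]
full_jitter = [rng.uniform(0, ceiling) for _ in range(clients)]

print("peak without jitter:", peak_retries(no_jitter))
print("peak with full jitter:", peak_retries(full_jitter))
```

Without jitter every client retries in the same 100ms slot, so the server sees the original spike again. With full jitter the same retries are spread across the 8 second range.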

11.3 Proactive Rate Limit Tracking

import time
import requests
 
 
class ProactiveRateLimitClient:
    """
    Tracks rate limit headers and throttles itself BEFORE getting a 429.
    """
 
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url
        self.api_key = api_key
        self.remaining = None
        self.reset_at = None
        self.limit = None
 
    def request(self, path: str) -> dict:
        # If we know we are close to the limit, wait proactively
        if self.remaining is not None and self.remaining < 5:
            now = time.time()
            if self.reset_at and self.reset_at > now:
                wait = self.reset_at - now
                print(f"Proactively waiting {wait:.1f}s (only {self.remaining} remaining)")
                time.sleep(wait + 0.1)
 
        response = requests.get(
            f"{self.base_url}{path}",
            headers={"X-API-Key": self.api_key}
        )
 
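        # Update our tracking from response headers. A missing header leaves the
        # value as None, so it is not mistaken for "0 remaining".
        limit = response.headers.get("X-RateLimit-Limit")
        remaining = response.headers.get("X-RateLimit-Remaining")
        reset = response.headers.get("X-RateLimit-Reset")
        self.limit = int(limit) if limit else None
        self.remaining = int(remaining) if remaining else None
        # Assumes X-RateLimit-Reset is a Unix timestamp; some APIs send seconds-until-reset instead
        self.reset_at = int(reset) if reset else None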
 
        return response.json()
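
Instead of running at full speed and then sleeping until the reset, a client can spread its remaining budget evenly over the rest of the window. The function below is one possible sketch of that idea; `pacing_delay` is a hypothetical helper, and it assumes `reset_at` and `now` are Unix timestamps in seconds.

```python
def pacing_delay(remaining: int, reset_at: float, now: float) -> float:
    """Seconds to wait before the next request so the budget lasts until reset."""
    window_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return window_left  # budget exhausted: wait for the window to reset
    # Spread the remaining requests evenly over the time left in the window.
    return window_left / remaining


print(pacing_delay(remaining=10, reset_at=130.0, now=100.0))  # 3.0
print(pacing_delay(remaining=0, reset_at=130.0, now=100.0))   # 30.0
```

A client would call this after each response and sleep for the returned value before sending the next request.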

Summary

Technology	Best For	Algorithm	Notes
Redis INCR + EXPIRE	Simple, high-performance	Fixed Window	Most common pattern
Redis ZADD/ZREMRANGEBYSCORE	High accuracy	Sliding Window Log	Memory intensive
Redis + Lua	Any algorithm	Any	Use for atomicity
Bucket4j + Redis	Java production APIs	Token Bucket	Best Java library
Resilience4j	Java service-to-service	Fixed Window	Combined with circuit breaker
Flask-Limiter	Python Flask APIs	Configurable	Quick setup
slowapi	Python FastAPI	Configurable	FastAPI-native
express-rate-limit	Node.js APIs	Fixed Window	Most popular
Nginx limit_req	IP-based, edge	Leaky Bucket (burst)	No user-awareness
AWS API Gateway	Managed cloud APIs	Token Bucket	Tied to API keys
Kong	API Gateway layer	Fixed Window (rate-limiting plugin)	Plugin-based

Next: Part 4 - Distributed Rate Limiting

Learn why distributed rate limiting is fundamentally harder, the exact problems you will face,
and how production systems solve them.

Series: Rate Limiting Demystified