Caching Demystified - Part 3: Technologies and Tools

In-Process Caches (Caffeine, Guava)
Redis - Deep Dive
Memcached - When Simplicity Wins
Redis vs Memcached - Decision Guide
HTTP Caching - Caching at the Protocol Level
CDN Caching - Caching at the Edge
Database-Level Caching
Spring Boot Caching - Comprehensive Guide

1. In-Process Caches

In-process caches live inside the application's own memory. They are the fastest possible cache because there is no network round trip. The trade-off is that each application instance has its own separate cache.

1.1 Caffeine Cache (Java)

Caffeine is a high-performance in-process cache for Java. It has replaced Guava Cache as the recommended choice and is one of the cache providers that Spring Boot auto-configures. Caffeine outperforms Guava Cache significantly in benchmarks (3-5x higher throughput is commonly cited, though the exact gain depends on the workload).

Why Caffeine is fast:

Uses the W-TinyLFU eviction algorithm (better hit rate than LRU)
Maintenance operations (eviction, expiry bookkeeping) are batched and run asynchronously on an executor (ForkJoinPool.commonPool() by default)
Lock-free read path using striped buffering
Optimized for modern multi-core processors

Basic Setup

<!-- pom.xml -->
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>

Size-Based Eviction

Cache<String, User> userCache = Caffeine.newBuilder()
    .maximumSize(10_000)            // Evict when size exceeds 10,000 entries
    .recordStats()                   // Enable metrics
    .build();
 
// Manual cache-aside usage
User user = userCache.get("user:123", key -> {
    // This lambda is the cache loader - called on miss
    return userRepository.findById(extractId(key)).orElse(null);
});
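
To make size-based eviction concrete without pulling in a library, here is a minimal sketch built on java.util.LinkedHashMap. It uses plain LRU, not Caffeine's W-TinyLFU policy, and the class name LruCache and its methods are hypothetical, chosen only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Minimal LRU cache: an illustration of size-based eviction, NOT Caffeine's W-TinyLFU. */
final class LruCache<K, V> {
    private final int maxSize;
    private final LinkedHashMap<K, V> map;

    LruCache(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder = true moves an entry to the end of the iteration order on every get/put
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each insertion: drop the least recently used entry when over capacity
                return size() > LruCache.this.maxSize;
            }
        };
    }

    /** Cache-aside read: return the cached value, or load, store, and return it on a miss. */
    synchronized V get(K key, Function<K, V> loader) {
        V value = map.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                map.put(key, value); // null results are not cached
            }
        }
        return value;
    }

    synchronized boolean contains(K key) {
        return map.containsKey(key); // containsKey does not count as an access
    }

    synchronized int size() {
        return map.size();
    }
}
```

With a capacity of 2, loading "a" and "b", reading "a" again, and then loading "c" evicts "b", because "b" is the least recently used entry at that point.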

Weight-Based Eviction (for variable-size objects)

Cache<String, byte[]> blobCache = Caffeine.newBuilder()
    .maximumWeight(100 * 1024 * 1024)  // Max 100 MB total weight
    .weigher((String key, byte[] value) -> value.length)  // Weight = byte size
    .build();

Time-Based Eviction

Cache<String, UserSession> sessionCache = Caffeine.newBuilder()
    .maximumSize(100_000)
    .expireAfterWrite(30, TimeUnit.MINUTES)   // Absolute TTL from write time
    .expireAfterAccess(10, TimeUnit.MINUTES)  // Sliding TTL from last access
    .build();
// Note: expireAfterWrite and expireAfterAccess can be combined.
// The entry expires at whichever comes first.
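
The "whichever comes first" rule is simple arithmetic. The sketch below only illustrates the rule under the assumption that a write also counts as an access; ExpiryMath and expiresAt are hypothetical names and not part of Caffeine's API.

```java
import java.time.Duration;
import java.time.Instant;

final class ExpiryMath {
    /** Returns the earlier of (writeTime + afterWrite) and (lastAccess + afterAccess). */
    static Instant expiresAt(Instant writeTime, Instant lastAccess,
                             Duration afterWrite, Duration afterAccess) {
        Instant byWrite = writeTime.plus(afterWrite);    // absolute TTL
        Instant byAccess = lastAccess.plus(afterAccess); // sliding TTL
        return byWrite.isBefore(byAccess) ? byWrite : byAccess;
    }
}
```

For the session cache above, an entry written at 12:00 and last read at 12:05 expires at 12:15 (idle timeout wins), while one last read at 12:25 expires at 12:30 (absolute TTL wins).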

LoadingCache (Read-Through Pattern)

LoadingCache<Long, Product> productCache = Caffeine.newBuilder()
    .maximumSize(50_000)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .refreshAfterWrite(4, TimeUnit.MINUTES)   // Refresh-Ahead: the first read after 4 minutes triggers an async reload
    .recordStats()
    .build(productId -> productRepository.findById(productId).orElseThrow());
 
// Usage - always call cache, never call DB directly
Product product = productCache.get(productId);  // Handles miss and refresh automatically
 
// Bulk loading
Map<Long, Product> products = productCache.getAll(List.of(1L, 2L, 3L));

AsyncLoadingCache (Non-blocking)

AsyncLoadingCache<Long, Product> asyncCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .buildAsync((productId, executor) ->
        // AsyncCacheLoader receives the key and the cache's executor
        CompletableFuture.supplyAsync(() -> productRepository.findById(productId).orElseThrow(), executor)
    );
 
// Non-blocking read
CompletableFuture<Product> future = asyncCache.get(productId);
future.thenAccept(product -> renderPage(product));

Monitoring Caffeine Metrics

Cache<String, User> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .recordStats()  // Must enable stats recording
    .build();
 
// Access stats
CacheStats stats = cache.stats();
System.out.println("Hit rate: " + stats.hitRate());           // e.g., 0.95
System.out.println("Miss rate: " + stats.missRate());         // e.g., 0.05
System.out.println("Evictions: " + stats.evictionCount());
System.out.println("Load count: " + stats.loadCount());
System.out.println("Avg load penalty: " + stats.averageLoadPenalty() + " ns");
 
// Integration with Micrometer for Prometheus/Grafana
CaffeineCacheMetrics.monitor(meterRegistry, cache, "users");

1.2 Guava Cache (Legacy Reference)

Guava Cache was the predecessor to Caffeine. If you see it in an older codebase, understand its API but prefer migrating to Caffeine.

// Guava - older API, similar concepts
Cache<String, User> guavaCache = CacheBuilder.newBuilder()
    .maximumSize(1000)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .build();

Migration: Guava Cache and Caffeine have nearly identical APIs. Replacing the import and changing CacheBuilder to Caffeine.newBuilder() covers most migrations.

2. Redis - Deep Dive

Redis (Remote Dictionary Server) is the most widely used distributed cache in production systems. It is an open-source, in-memory data structure store that can function as a cache, database, message broker, and streaming engine.

2.1 Redis Architecture

Single-threaded event loop (for command processing):

All read and write commands are processed sequentially in a single thread.
This eliminates race conditions without needing locks.
Very high throughput despite being single-threaded: 100,000 - 1,000,000+ operations/second (the upper end typically requires pipelining).

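Work that happens outside the main thread:

Background threads handle slow operations such as AOF fsync, closing files, and lazy memory freeing (UNLINK, FLUSHALL ASYNC)
RDB snapshots and AOF rewrites run in a forked child process rather than in a thread
Key expiration itself runs on the main thread, in small time-bounded batches (see section 2.3)
Since Redis 6.0: Multi-threaded I/O for network reading/writing (but command processing remains single-threaded)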

Client Connections
       |
       v
  [Network Layer]    <- Multi-threaded I/O since Redis 6.0
       |
       v
  [Event Loop]       <- Single-threaded command processing
       |
  +---------+--------+
  |         |        |
 [Memory] [Disk I/O] [Replication]

2.2 Redis Data Structures and Use Cases

Redis is not just a key-value store. It supports rich data structures, each optimized for specific use cases.

String

The simplest and most versatile type. Can store text, numbers, or serialized binary data.

# Basic operations
SET user:123 "Alice"               # Set a string value
GET user:123                       # Get: "Alice"
SET counter 0
INCR counter                       # Atomic increment: 1
INCRBY counter 5                   # Increment by 5: 6
DECRBY counter 2                   # Decrement by 2: 4
 
# With TTL
SET session:abc123 "data" EX 3600  # Expire in 3600 seconds
SET product:789 "data" PX 300000   # Expire in 300,000 milliseconds
SET key value NX                   # Set only if key does NOT exist (used for distributed locks)
SET key value XX                   # Set only if key EXISTS
SET key value GET                  # Return old value and set new one (Redis 6.2+)

Use cases:

Session storage: SET session:{id} {serialized_session} EX 1800
Rate limiting counters: INCR ratelimit:{user}:{minute} with EXPIRE
Caching any serialized object (JSON, Protobuf, Java serialized)
Feature flags: SET feature:dark_mode "true"
Distributed locks: SET lock:{resource} {token} NX EX 10

Hash

A map of field-value pairs stored under a single key. Ideal for representing objects.

# User profile as a Hash
HSET user:123 name "Alice" email "alice@example.com" role "admin" age "30"
HGET user:123 name             # "Alice"
HGETALL user:123               # { name: Alice, email: ..., role: ..., age: 30 }
HSET user:123 email "newemail@example.com"  # Update single field
HDEL user:123 age              # Remove field
HINCRBY user:123 login_count 1 # Increment a numeric field
HKEYS user:123                 # [name, email, role, login_count]
HLEN user:123                  # 4 (number of fields)

Why Hash instead of serialized String?

Update individual fields without deserializing/re-serializing the entire object.
More memory efficient for small-to-medium objects (Redis uses a compact listpack encoding, called ziplist before Redis 7.0, for Hashes with few fields).
HSET user:123 email "new" vs GET user:123 -> deserialize -> update -> serialize -> SET user:123

Use cases:

User profiles, product details, configuration objects
Shopping carts: HSET cart:{userId} product:456 3 (product ID -> quantity)
Rate limit tracking: HSET stats:{userId} requests 100 window 1717545600

List

An ordered collection of strings, implemented as a linked list (internally a quicklist of compact nodes). Supports operations at both head and tail.

# Queue (FIFO): push to tail, pop from head
RPUSH queue:emails "email1" "email2" "email3"  # Push to right (tail)
LPOP queue:emails                               # Pop from left (head): "email1"
 
# Stack (LIFO): push and pop from same end
LPUSH stack "item1"
LPUSH stack "item2"
LPOP stack                                      # "item2" (last pushed)
 
# Blocking pop (waits for data, used for message queues)
BLPOP queue:emails 30                           # Block up to 30 seconds waiting for data
 
# Recent items (timeline, activity feed)
LPUSH user:123:activity "logged_in"             # Add new item to front
LTRIM user:123:activity 0 49                    # Keep only 50 most recent
LRANGE user:123:activity 0 49                   # Get last 50 activities
LLEN user:123:activity                          # Length

Use cases:

Task queues and job queues
Activity feeds and timelines (keep last N items)
Recent search history
Simple producer/consumer messaging (with BLPOP)

Set

An unordered collection of unique strings. Set operations (union, intersection, difference) are very efficient.

# User tags
SADD user:123:interests "tech" "gaming" "cooking"
SISMEMBER user:123:interests "gaming"           # 1 (true)
SISMEMBER user:123:interests "sports"           # 0 (false)
SMEMBERS user:123:interests                     # { tech, gaming, cooking }
SCARD user:123:interests                        # 3 (cardinality)
 
# Mutual friends (Set intersection)
SADD user:123:friends "alice" "bob" "charlie"
SADD user:456:friends "bob" "charlie" "dave"
SINTER user:123:friends user:456:friends       # { bob, charlie } - mutual friends
 
# Followers / Following
SADD user:123:following "user:456" "user:789"
SADD user:456:followers "user:123"
SUNION user:123:following user:456:followers   # Everyone who appears in either set
 
# Random items
SRANDMEMBER user:123:interests 2               # Return 2 random interests
SPOP user:123:interests                        # Remove and return a random element

Use cases:

Unique tags or categories
Social graph (followers, friends, mutual connections)
Tracking unique visitors: SADD page:home:visitors:2026-06-05 "user:123"
Set membership testing
Blacklists and whitelists

Sorted Set (ZSet)

Like a Set, but each member has an associated score (a floating-point number). Members are sorted by score. Score ties are broken by lexicographic order of the member string.

# Leaderboard
ZADD leaderboard 9850 "alice" 8200 "bob" 7500 "charlie"
ZADD leaderboard 9900 "alice"                  # Update score
ZRANK leaderboard "bob"                        # 1 (0-based rank, low to high)
ZREVRANK leaderboard "alice"                   # 0 (0-based rank, high to low - #1)
ZSCORE leaderboard "alice"                     # 9900
ZRANGE leaderboard 0 2 WITHSCORES             # Bottom 3 with scores
ZREVRANGE leaderboard 0 9 WITHSCORES          # Top 10 with scores
ZINCRBY leaderboard 150 "bob"                  # Add 150 to bob's score
 
# Time-based data (timestamp as score)
ZADD events:stream 1717545600 "event:1"        # Unix timestamp as score
ZRANGEBYSCORE events:stream 1717545600 1717549200  # Events in 1-hour window
ZREMRANGEBYSCORE events:stream 0 1717459200    # Delete events older than 24 hours (cutoff = now - 86400, computed by the client; here now = 1717545600)

Use cases:

Leaderboards (games, rankings)
Priority queues (score = priority)
Rate limiting with sliding window (score = timestamp)
Autocomplete (sorted by relevance score)
Event time-series (score = timestamp)
Trending topics (score = engagement count)
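
The sliding-window rate limiter from the list above can be sketched in plain Java. This is an in-memory analogue, not a Redis client: the deque stands in for the sorted set, and each step is annotated with the Redis command it mirrors. SlidingWindowLimiter and tryAcquire are hypothetical names, and the sketch assumes timestamps never go backwards.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>(); // oldest first

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request at nowMillis is allowed, false if the limit is reached. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Like ZREMRANGEBYSCORE key 0 (now - window): drop entries that left the window
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        // Like ZCARD key: count the requests still inside the window
        if (timestamps.size() >= maxRequests) {
            return false;
        }
        // Like ZADD key now member: record this request
        timestamps.addLast(nowMillis);
        return true;
    }
}
```

In Redis the three steps would need to run inside a Lua script or MULTI/EXEC block to stay atomic across clients; here a synchronized method plays that role.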

Bitmap

Not a separate data type - implemented on top of Strings. Treats the string as a bit array, allowing operations on individual bits.

# Track daily user logins (user ID as bit offset)
SETBIT user:logins:2026-06-05 123 1    # User 123 logged in on June 5
SETBIT user:logins:2026-06-05 456 1    # User 456 logged in on June 5
GETBIT user:logins:2026-06-05 123      # 1 (logged in)
GETBIT user:logins:2026-06-05 789      # 0 (did not log in)
BITCOUNT user:logins:2026-06-05        # How many users logged in today
 
# Users logged in both days (bitwise AND)
BITOP AND active:both user:logins:2026-06-04 user:logins:2026-06-05
BITCOUNT active:both                   # Users active on both days
 
# Feature flags per user
SETBIT feature:new_ui:users 123 1      # Enable new_ui for user 123
GETBIT feature:new_ui:users 123        # 1 (enabled)
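
The same bit-level idea can be tried locally with java.util.BitSet, which is a reasonable mental model for what SETBIT, BITCOUNT, and BITOP AND do. This is an in-memory analogue, not a Redis client, and LoginBitmaps is a hypothetical name.

```java
import java.util.BitSet;

final class LoginBitmaps {
    /** Counts users whose bit is set on both days, without modifying either input. */
    static int activeOnBoth(BitSet dayA, BitSet dayB) {
        BitSet both = (BitSet) dayA.clone(); // work on a copy, like BITOP writing to a new key
        both.and(dayB);                      // BITOP AND
        return both.cardinality();           // BITCOUNT
    }

    public static void main(String[] args) {
        BitSet june4 = new BitSet();
        june4.set(123);                      // SETBIT ... 123 1
        june4.set(456);
        BitSet june5 = new BitSet();
        june5.set(123);
        june5.set(789);
        System.out.println("Active on both days: " + activeOnBoth(june4, june5));
    }
}
```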

Use cases:

Daily/monthly active user tracking
Feature flag rollouts (enable for user ID subsets)
User activity tracking (which days was user active this month)
Bloom filter approximation

HyperLogLog

A probabilistic data structure that estimates the cardinality (number of distinct elements) of a set using very little memory (~12 KB), with a standard error of 0.81%.

# Count unique visitors
PFADD page:home:visitors "user:123" "user:456" "user:789"
PFADD page:home:visitors "user:123"    # Duplicate - does not increase count
PFCOUNT page:home:visitors             # Approximately 3
 
# Merge multiple HyperLogLogs
PFADD day1:visitors "user:1" "user:2"
PFADD day2:visitors "user:2" "user:3"
PFMERGE week:visitors day1:visitors day2:visitors
PFCOUNT week:visitors                  # Approximately 3 (unique across both days)

Use cases:

Counting unique page views, unique API callers, unique items seen
Any cardinality estimation where approximate counts are acceptable
At most about 12 KB per HyperLogLog, whether it tracks 1 or 1 billion unique items

When NOT to use: When exact counts are required. Use a regular Set for exact counting (at higher memory cost).

Stream

An append-only log of messages. Similar to Kafka but built into Redis. Supports consumer groups for distributed processing.

# Append messages to stream
XADD events:orders * orderId "12345" userId "123" amount "99.99"
# * means Redis auto-generates a stream ID (timestamp-based: 1717545600000-0)
 
# Read from stream
XREAD COUNT 10 STREAMS events:orders 0             # Read up to 10 messages from the start
XREAD BLOCK 5000 COUNT 10 STREAMS events:orders $  # Wait up to 5 s for messages added after this call ($ only makes sense with BLOCK)
 
# Consumer groups (for distributed processing)
XGROUP CREATE events:orders order-processor $ MKSTREAM
XREADGROUP GROUP order-processor worker1 COUNT 10 STREAMS events:orders >
# '>' means: give me undelivered messages for this consumer group
 
# Acknowledge processing
XACK events:orders order-processor 1717545600000-0

Use cases:

Event sourcing and audit trails
Real-time analytics ingestion
Microservice messaging without a separate Kafka cluster
IoT sensor data ingestion

2.3 Key Expiration in Redis

Redis supports per-key TTL in seconds or milliseconds.

EXPIRE key 300           # Set TTL to 300 seconds
PEXPIRE key 300000       # Set TTL to 300,000 milliseconds
EXPIREAT key 1893456000  # Expire at specific Unix timestamp
PEXPIREAT key 1893456000000  # Expire at specific Unix timestamp in ms
 
TTL key                  # Remaining TTL in seconds (-1 = no expiry, -2 = key not found)
PTTL key                 # Remaining TTL in milliseconds
PERSIST key              # Remove expiry (make key permanent)

How Redis expires keys internally:

Redis uses two approaches:

Lazy expiration: Key is checked for expiry when it is accessed. If expired, it is deleted and a miss is returned. No CPU overhead until the key is touched.
Active expiration: Every 100ms, Redis samples a configurable number of keys with TTLs set, deletes any that have expired. If more than 25% of sampled keys are expired, it repeats immediately.

The combination ensures expired keys are eventually deleted without constantly scanning all keys.
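
The two strategies can be simulated in a few lines. The sketch below is a simplified model, not Redis's implementation: every key here has a TTL, time is passed in explicitly, and ExpiringStore with its methods is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class ExpiringStore {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();
    private final Random random = new Random(42);

    void set(String key, String value, long nowMillis, long ttlMillis) {
        data.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    /** Lazy expiration: the expiry check happens only when the key is read. */
    String get(String key, long nowMillis) {
        Entry entry = data.get(key);
        if (entry == null) return null;
        if (entry.expiresAtMillis <= nowMillis) {
            data.remove(key); // expired: delete and report a miss
            return null;
        }
        return entry.value;
    }

    /** Active expiration: sample keys, delete expired ones, repeat while more than 25% were expired. */
    int activeExpireCycle(long nowMillis, int sampleSize) {
        int removed = 0;
        while (!data.isEmpty() && sampleSize > 0) {
            List<String> keys = new ArrayList<>(data.keySet());
            Collections.shuffle(keys, random);          // random sample without repeats
            int n = Math.min(sampleSize, keys.size());
            int expired = 0;
            for (String key : keys.subList(0, n)) {
                if (data.get(key).expiresAtMillis <= nowMillis) {
                    data.remove(key);
                    expired++;
                }
            }
            removed += expired;
            if (expired * 4 <= n) break;                // 25% or fewer expired: wait for the next cycle
        }
        return removed;
    }

    int size() {
        return data.size();
    }
}
```

The repeat-while-above-25% loop is what lets a burst of simultaneously expiring keys be cleaned up quickly, while a mostly live keyspace costs only one small sample per cycle.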

2.4 Redis Persistence

Redis is an in-memory database, but it supports persistence to disk for durability.

RDB (Redis Database Backup) - Snapshots

Redis periodically takes a point-in-time snapshot of all data and writes it to a .rdb file.

# redis.conf - RDB configuration
save 3600 1      # Save if at least 1 key changed in the last 3600 seconds
save 300 100     # Save if at least 100 keys changed in the last 300 seconds
save 60 10000    # Save if at least 10,000 keys changed in the last 60 seconds
dbfilename dump.rdb
dir /var/lib/redis

How BGSAVE works:

Redis Process (parent)     Fork     Child Process
     |                       |            |
     |  Continues serving  <-+-> Writes snapshot to disk
     |  requests normally       (uses copy-on-write memory)
     |
     |  When new writes occur:
     |  - Parent: gets new memory page (copy-on-write)
     |  - Child: still references old page
     |
     |  Child completes snapshot write -> parent is notified
     |  Old .rdb is replaced with new one atomically

Pros: Fast restart (bulk loading), compact file, great for backups.

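Cons: Data loss between last snapshot and crash. With save 60 10000, a busy instance can lose up to 60 seconds of writes; under lighter write load only the slower rules fire, so the gap can stretch to 5 minutes (save 300 100) or an hour (save 3600 1).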

AOF (Append-Only File)

Every write command is appended to an AOF log file. On restart, Redis replays the AOF to rebuild state.

# redis.conf - AOF configuration
appendonly yes
appendfilename "appendonly.aof"

# fsync policy - critical performance/durability trade-off
appendfsync always    # fsync after every write. Slowest, safest. (1-2ms per write)
appendfsync everysec  # fsync every second. Fast, at most 1 second of data loss. (Default)
appendfsync no        # Let OS decide when to fsync. Fastest, potentially more data loss.

AOF Rewrite:
Over time, the AOF file grows large. Redis periodically rewrites it to a compact form (e.g., 100 INCR commands become a single SET command):

BGREWRITEAOF  # Trigger manual AOF rewrite

Auto-configured with: auto-aof-rewrite-percentage 100 and auto-aof-rewrite-min-size 64mb
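
The effect of a rewrite can be illustrated with a toy compactor. One important difference: real Redis builds the new AOF from the in-memory dataset rather than by replaying the old log, so the sketch below only demonstrates why the result is smaller. AofRewriteSketch is a hypothetical name, and only SET, INCR, and DEL with space-free arguments are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AofRewriteSketch {
    /** Replays a command log and emits one SET per surviving key. */
    static List<String> rewrite(List<String> log) {
        Map<String, String> state = new LinkedHashMap<>();
        for (String line : log) {
            String[] parts = line.split(" ");
            switch (parts[0]) {
                case "SET":
                    state.put(parts[1], parts[2]);
                    break;
                case "INCR":
                    // A missing key counts as 0, so the first INCR yields 1
                    state.merge(parts[1], "1",
                            (old, one) -> String.valueOf(Long.parseLong(old) + 1));
                    break;
                case "DEL":
                    state.remove(parts[1]);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported command: " + line);
            }
        }
        List<String> compact = new ArrayList<>();
        state.forEach((key, value) -> compact.add("SET " + key + " " + value));
        return compact;
    }
}
```

A log of 100 "INCR counter" lines compacts to the single line "SET counter 100", and keys that were deleted disappear from the rewritten log entirely.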

Pros: Much less data loss (potentially zero with appendfsync always). More granular.

Cons: AOF files are larger than RDB. Restart (replay) is slower than loading an RDB file.

Combining RDB + AOF (Recommended for production)

appendonly yes      # Use AOF for data safety
save 3600 1         # Keep RDB snapshots for fast restart and backups

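On restart with AOF enabled, Redis loads the AOF because it is usually the more complete record. It does not fall back to the RDB file automatically: if the AOF is damaged, repair it with redis-check-aof or restore manually from an RDB backup.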

2.5 Redis High Availability

Replication (Master-Replica)

Master Redis
    |
    |------ Replica 1 (read traffic, failover)
    |------ Replica 2 (read traffic, backup)
    |------ Replica 3 (cross-datacenter)

Replicas receive all write commands from the master asynchronously.
Replicas can serve READ requests, distributing read load.
Replication is asynchronous: there is a small lag between master write and replica visibility.
Configure: replicaof <master-ip> <master-port> in replica redis.conf

Redis Sentinel (Automatic Failover)

Sentinel is Redis's high availability solution for a single-shard deployment.

    +------------+   +------------+   +------------+
    | Sentinel 1 |   | Sentinel 2 |   | Sentinel 3 |
    +------------+   +------------+   +------------+
           |                |                |
           +----------------+----------------+
                            |
                     +------+------+
                     |             |
               [Master Redis]  [Replica 1]
                               [Replica 2]

Sentinel responsibilities:

Monitoring: Continuously checks master and replicas are running.
Notification: Alerts administrators/monitoring systems on failures.
Automatic failover: If master becomes unavailable (ODOWN - objectively down, confirmed by quorum), Sentinel promotes a replica to master.
Configuration provider: Clients ask Sentinel which is the current master address.

Quorum: The quorum is the configured number of Sentinels that must agree the master is unreachable before it is marked ODOWN. With 3 Sentinels, a quorum of 2 is typical. The failover itself must additionally be authorized by a majority of all Sentinels, which prevents false failovers during network partitions.
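
The voting arithmetic is small enough to write down. The sketch below models two separate checks: detecting ODOWN (quorum) and authorizing the failover (a majority of all Sentinels, and never fewer votes than the quorum). SentinelVotes and its methods are hypothetical names used only for this illustration.

```java
final class SentinelVotes {
    /** ODOWN: at least `quorum` Sentinels report the master as unreachable. */
    static boolean isObjectivelyDown(int downVotes, int quorum) {
        return downVotes >= quorum;
    }

    /** Failover needs a strict majority of all Sentinels, and at least `quorum` votes. */
    static boolean canAuthorizeFailover(int votes, int totalSentinels, int quorum) {
        int majority = totalSentinels / 2 + 1;
        return votes >= Math.max(majority, quorum);
    }
}
```

With 5 Sentinels and a quorum of 2, two Sentinels are enough to mark the master ODOWN, but the failover still waits for three votes. This is why an isolated minority of Sentinels cannot promote a replica on its own.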

Redis Cluster (Sharding + High Availability)

Redis Cluster provides both data sharding (partitioning) and high availability for large datasets.

Cluster with 6 nodes (3 masters + 3 replicas):

Master A (hash slots 0-5460)       <- Replica A
Master B (hash slots 5461-10922)   <- Replica B
Master C (hash slots 10923-16383)  <- Replica C

16,384 hash slots:

Redis Cluster partitions the hash space into 16,384 slots.
Each key maps to a slot: CRC16(key) % 16384
Each master owns a subset of slots.
Moving data = reassigning slot ownership between masters.

Key implications:

Keys in different hash slots can be on different nodes.
Multi-key operations (MGET, MSET, SUNION) only work if all keys map to the same hash slot; otherwise Redis returns a CROSSSLOT error.
Use hash tags to force related keys to the same slot: for user:{123}:profile and user:{123}:sessions, only the substring 123 inside the braces is hashed.
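
The slot calculation, including hash tags, can be reproduced client-side. The sketch below uses the CRC16 variant from the Redis Cluster specification (XMODEM: polynomial 0x1021, initial value 0); ClusterSlots and keySlot are hypothetical names, and keys are assumed to be UTF-8 strings.

```java
import java.nio.charset.StandardCharsets;

final class ClusterSlots {
    /** CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no bit reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** Slot for a key; if a non-empty {tag} is present, only the tag is hashed. */
    static int keySlot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) { // at least one character between the braces
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        System.out.println(keySlot("user:{123}:profile"));
        System.out.println(keySlot("user:{123}:sessions")); // same slot as the line above
    }
}
```

Because both keys reduce to the tag 123 before hashing, they always land in the same slot and can be used together in multi-key commands.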

# Cluster info
redis-cli CLUSTER INFO
redis-cli CLUSTER NODES
redis-cli CLUSTER SLOTS
 
# Hash slot for a key
redis-cli CLUSTER KEYSLOT "user:123"    # Returns slot number

2.6 Redis Memory Management

# redis.conf
maxmemory 4gb                         # Maximum memory Redis will use
maxmemory-policy allkeys-lru          # Eviction policy when maxmemory reached
maxmemory-samples 10                  # Sample size for LRU approximation (higher = more accurate)

Memory optimization tips:

Use appropriate data types (Hash is more efficient than multiple String keys for objects).
Use compression for large values (compress JSON before storing).
Set TTLs on all cache keys (no TTL = key lives forever, fills cache).
Monitor mem_fragmentation_ratio: if significantly > 1 (e.g., > 1.5), memory is fragmented.
Consider OBJECT ENCODING key to verify Redis is using efficient internal encodings.

2.7 Performance Features

Pipelining

Send multiple commands without waiting for individual responses. Reduces network round-trip overhead.

// Without pipeline: 3 round trips
redisTemplate.opsForValue().set("key1", "val1");  // Round trip 1
redisTemplate.opsForValue().set("key2", "val2");  // Round trip 2
redisTemplate.opsForValue().set("key3", "val3");  // Round trip 3
 
// With pipeline: 1 round trip for all 3 commands
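// The cast selects the RedisCallback overload; a bare lambda is ambiguous with SessionCallback
List<Object> results = redisTemplate.executePipelined((RedisCallback<Object>) connection -> {
    connection.set("key1".getBytes(), "val1".getBytes());
    connection.set("key2".getBytes(), "val2".getBytes());
    connection.set("key3".getBytes(), "val3".getBytes());
    return null;  // The callback must return null; results come back from executePipelined
});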

Pipelining is especially valuable for batch operations (loading cache data, bulk deletes, bulk updates).

Transactions (MULTI/EXEC)

Redis transactions group multiple commands to execute atomically (no other commands interleave, but NOT rolled back on error).

MULTI
SET account:123:balance 1000
DECRBY account:123:balance 100
EXEC

WATCH for optimistic locking:

// Optimistic locking: increment a counter only if nobody else changes it
redisTemplate.execute(new SessionCallback<Object>() {
    public Object execute(RedisOperations ops) {
        ops.watch("counter:123");            // Watch for changes
        ops.multi();                          // Start transaction
        ops.opsForValue().increment("counter:123");
        List<Object> result = ops.exec();    // Execute - null (or an empty list, depending on driver and version) if WATCH key changed
        if (result == null || result.isEmpty()) {
            // Transaction aborted because counter was changed by another client
            // Retry logic here
        }
        return result;
    }
});

Important: MULTI/EXEC in Redis is NOT a full ACID transaction. It:

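Guarantees isolation: queued commands run back to back, with no commands from other clients interleaved
Does NOT roll back on runtime command errors (the remaining commands still execute); a command rejected while queuing, such as a syntax error, aborts the whole transaction at EXEC
Does NOT support reading data within a transaction and acting on it (use WATCH for this)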

Lua Scripting (Atomic Complex Operations)

Lua scripts execute atomically in Redis (no other commands run during script execution). Use for operations that require read-then-write atomicity without MULTI/EXEC complexity.

// Rate limiter using Lua script (atomic check-and-increment)
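String script =
    "local current = redis.call('GET', KEYS[1])\n" +
    "if current and tonumber(current) >= tonumber(ARGV[1]) then\n" +
    "  return 0\n" +
    "end\n" +
    "local count = redis.call('INCR', KEYS[1])\n" +
    "if count == 1 then\n" +
    "  redis.call('EXPIRE', KEYS[1], ARGV[2])\n" +  // Start the window on the first request only, so later requests do not extend it
    "end\n" +
    "return 1";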
 
DefaultRedisScript<Long> rateLimitScript = new DefaultRedisScript<>(script, Long.class);
 
// Returns 1 if request allowed, 0 if rate limit exceeded
Long allowed = redisTemplate.execute(
    rateLimitScript,
    Collections.singletonList("ratelimit:user:123"),
    "100",   // ARGV[1]: max requests
    "60"     // ARGV[2]: window in seconds
);

3. Memcached

Memcached is an older, simpler distributed cache. It predates Redis and was designed with a single focus: ultra-fast key-value caching.

Key Characteristics

Pure cache only: No persistence, no replication, no data structures beyond key-value.
Multi-threaded: Utilizes multiple CPU cores for parallel request handling.
Simpler memory management: Uses slab allocation, which is predictable but can waste memory if object sizes vary widely.
Stateless: Each node is independent. Client handles routing via consistent hashing.
Very fast: Due to multi-threading and simplicity.

Memcached Architecture

Application
    |
    | (Client library handles consistent hashing)
    |
+--------+   +--------+   +--------+
|  Mem 1 |   |  Mem 2 |   |  Mem 3 |
+--------+   +--------+   +--------+
  (each node is independent, no replication)
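
The client-side routing shown above can be sketched with a sorted map as the hash ring. This is a minimal illustration under stated assumptions, not the algorithm of any particular Memcached client: CRC32 is used as the hash only because it ships with the JDK, 100 virtual nodes per server is an arbitrary choice, and HashRing is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class HashRing {
    private static final int VIRTUAL_NODES = 100; // points per server, smooths the distribution
    private final TreeMap<Long, String> ring = new TreeMap<>();

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(node + "#" + i), node); // remove only points this node owns
        }
    }

    /** Walks clockwise from the key's hash to the first server point, wrapping around at the end. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("No nodes in the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

The property that matters for a cache: when a node is removed, only the keys that lived on that node move elsewhere. Keys on the surviving nodes keep their placement, so most of the cache stays warm.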

Basic Operations

# Memcached operations are simpler than Redis
set key 0 300 5      # set key, flags=0, TTL=300s, value-length=5
> hello
get key              # Returns "hello"
delete key
incr counter 1       # Atomic increment
decr counter 1       # Atomic decrement
flush_all            # Clear all data (dangerous in production!)
stats                # Server statistics

Memcached Limitations

No data structures (only String key-value)
No persistence (data is lost on restart)
No replication (no failover)
Limited authentication (optional SASL, no per-user ACLs), so network-level security is still needed
Keys limited to 250 bytes
Values limited to 1 MB by default (configurable with the -I option)
No transactions or scripting

4. Redis vs Memcached - Decision Guide

Feature              Redis                            Memcached
-------------------  -------------------------------  ---------------------------
Data Structures      Rich (String, Hash, List,        String key-value only
                     Set, ZSet, Stream, etc.)
Persistence          Yes (RDB + AOF)                  No
Replication          Yes (Master-Replica)             No (client-side only)
High Availability    Yes (Sentinel, Cluster)          No (no built-in)
Multi-threading      Partially (I/O threads)          Yes (fully multi-threaded)
Pub/Sub              Yes                              No
Transactions         Yes (MULTI/EXEC)                 No
Scripting            Yes (Lua)                        No
Memory Efficiency    Good with listpack encoding      Good with slab allocation
Max Value Size       512 MB                           1 MB (default)
Max Key Size         512 MB                           250 bytes
Horizontal Scaling   Redis Cluster                    Client-side consistent hash

When to Choose Memcached

You need an ultra-simple, multi-threaded, pure string cache.
You are already running Memcached and do not need Redis features.
You have an extremely CPU-bound cache workload and need multi-core utilization.
Memory efficiency for a large number of small, uniform objects (Memcached's slab allocator excels here).

When to Choose Redis (the default choice for new projects)

You need any data structure beyond simple string caching.
You need persistence (cache survives restart).
You need replication/high availability.
You need pub/sub, streams, transactions, Lua scripting.
You want a single tool that can serve as both cache and simple database.
Most new projects: Redis.

5. HTTP Caching

HTTP caching is a standardized protocol-level caching mechanism built into the HTTP specification. It operates between clients, CDNs, reverse proxies, and origin servers without requiring application code.

5.1 The Cache-Control Header

The primary mechanism for controlling HTTP caching behavior. Set in the HTTP response by the origin server.

HTTP/1.1 200 OK
Cache-Control: max-age=3600, public
Content-Type: application/json

Cache-Control Directives

max-age=N:
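The response is considered fresh for N seconds, measured from when the origin generated it (time already spent in intermediary caches counts, as reported by the Age header). The most common directive.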

Cache-Control: max-age=86400          # Cache for 24 hours

no-cache:
The response CAN be stored in cache, but the cache MUST revalidate with the origin server before serving it. This does NOT mean "no caching" - it means "always validate before use."

Cache-Control: no-cache               # Cache it, but always check if it's still fresh

no-store:
The response MUST NOT be stored anywhere. Not in browser cache, not in CDN. Every request goes to origin. Use for sensitive data (banking, personal health info).

Cache-Control: no-store               # Never cache this response

public:
Response can be cached by any cache (browser, CDN, proxy), even when the request carried an Authorization header, which shared caches would otherwise not store.

Cache-Control: public, max-age=3600   # Cache everywhere for 1 hour

private:
Response is intended for a single user only. Only the browser cache may store it. Intermediary caches (CDN, proxy) MUST NOT cache it.

Cache-Control: private, max-age=3600  # Cache in browser only, not CDN

s-maxage=N:
Like max-age but applies only to shared caches (CDN, reverse proxy). Overrides max-age for CDN. Browser uses max-age. CDN uses s-maxage.

Cache-Control: max-age=3600, s-maxage=86400
# Browser caches for 1 hour, CDN caches for 24 hours

must-revalidate:
Once the entry expires, the cache MUST revalidate with the origin. If the origin is unreachable, it MUST return a 504 error (not stale data).

Cache-Control: max-age=3600, must-revalidate

stale-while-revalidate=N:
Serve stale content for up to N seconds while fetching fresh content in the background. The Refresh-Ahead pattern at the HTTP protocol level.

Cache-Control: max-age=3600, stale-while-revalidate=86400
# Serve the cached response for up to 1 hour fresh, then serve stale for up to 24 hours
# while simultaneously fetching a fresh version in the background.
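
The decision a cache makes for max-age plus stale-while-revalidate comes down to comparing the response's age against two thresholds. The sketch below is a simplified model that ignores other directives such as no-cache and must-revalidate; Freshness, Decision, and decide are hypothetical names.

```java
final class Freshness {
    enum Decision { SERVE_FRESH, SERVE_STALE_AND_REVALIDATE, REVALIDATE_FIRST }

    /** All arguments are in seconds; ageSeconds is the current age of the cached response. */
    static Decision decide(long ageSeconds, long maxAge, long staleWhileRevalidate) {
        if (ageSeconds < maxAge) {
            return Decision.SERVE_FRESH;                 // still within its freshness lifetime
        }
        if (ageSeconds < maxAge + staleWhileRevalidate) {
            return Decision.SERVE_STALE_AND_REVALIDATE;  // answer now, refresh in the background
        }
        return Decision.REVALIDATE_FIRST;                // too old: contact the origin before answering
    }
}
```

With max-age=3600 and stale-while-revalidate=86400, a response aged 4,000 seconds is served immediately while a background fetch runs, and one aged 100,000 seconds forces the client to wait for the origin.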

immutable:
The response will never change (e.g., content-addressed assets). Browser will not revalidate even on page reload.

Cache-Control: max-age=31536000, immutable
# Use for hashed static assets: style.abc123.css will never change

5.2 ETags - Validation-Based Caching

An ETag (Entity Tag) is an identifier for a specific version of a resource. The server generates an ETag, typically a hash of the response body.

Flow:

Initial Request:
Client --> GET /api/users/123
Server --> 200 OK
           ETag: "v1-abc123def456"
           Cache-Control: no-cache    (must validate, but can cache)
           { name: "Alice", ... }

Subsequent Request (client sends conditional request):
Client --> GET /api/users/123
           If-None-Match: "v1-abc123def456"

Server checks if data has changed:
  - If NOT changed: 304 Not Modified (no body sent, saves bandwidth)
  - If CHANGED: 200 OK with ETag: "v2-new-hash" and new body
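
The server-side check in this flow can be sketched with the JDK alone. The example derives a strong ETag from a SHA-256 hash of the body and compares it against the If-None-Match header; EtagSketch and its methods are hypothetical names, and the comma split assumes ETag values that contain no commas (true for hex digests).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class EtagSketch {
    /** Builds a quoted ETag from the SHA-256 digest of the response body. */
    static String strongEtag(String body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder etag = new StringBuilder("\"");
            for (byte b : digest) {
                etag.append(String.format("%02x", b & 0xFF));
            }
            return etag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Returns 304 when If-None-Match lists the current ETag (or "*"), otherwise 200. */
    static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return 200; // unconditional request
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.startsWith("W/")) {
                tag = tag.substring(2); // If-None-Match compares weakly, so ignore the W/ prefix
            }
            if (tag.equals("*") || tag.equals(currentEtag)) {
                return 304;
            }
        }
        return 200;
    }
}
```

Hashing the body means any byte-level change produces a new ETag, at the cost of rendering the full response before the server can answer with 304. A version column, where one exists, avoids that work.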

Spring Boot ETag support:

@GetMapping("/users/{id}")
public ResponseEntity<User> getUser(@PathVariable Long id,
                                     @RequestHeader(value="If-None-Match",
                                                    required=false) String ifNoneMatch) {
    User user = userService.getUser(id);
    String etag = "\"" + user.getVersion() + "\"";  // or hash of the response body
 
    if (etag.equals(ifNoneMatch)) {
        // 304 - repeat the ETag so the client keeps its validator
        return ResponseEntity.status(HttpStatus.NOT_MODIFIED).eTag(etag).build();
    }
 
    return ResponseEntity.ok()
            .eTag(etag)
            .cacheControl(CacheControl.noCache())
            .body(user);
}

Or use ShallowEtagHeaderFilter for automatic ETag generation:

@Bean
public FilterRegistrationBean<ShallowEtagHeaderFilter> shallowEtagHeaderFilter() {
    FilterRegistrationBean<ShallowEtagHeaderFilter> bean = new FilterRegistrationBean<>();
    bean.setFilter(new ShallowEtagHeaderFilter());
    bean.addUrlPatterns("/api/*");
    return bean;
}

5.3 Last-Modified

A simpler alternative to ETag. The server includes the last modification timestamp.

Initial Response:
Last-Modified: Thu, 04 Jun 2026 10:00:00 GMT

Subsequent Request:
If-Modified-Since: Thu, 04 Jun 2026 10:00:00 GMT

Server response:
  - Not modified: 304 Not Modified
  - Modified: 200 OK with new Last-Modified header

ETag vs Last-Modified:

ETags are more precise (detect byte-for-byte changes even at the same timestamp)
Last-Modified has 1-second granularity (misses changes within same second)
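ETags handle multi-server deployments better, provided they are derived from the content (every server then generates the same ETag for the same content, while file timestamps can differ between servers)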
ETags require computing a hash; Last-Modified just reads a timestamp
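
The one-second granularity is easy to demonstrate. The sketch below is an illustration only: the class and method names are made up for the example, and the formatter pattern is one way to produce the HTTP date format shown above. It formats two writes that happen within the same second and shows that a Last-Modified check cannot tell them apart.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

final class LastModifiedSketch {

    // HTTP dates carry whole seconds only and are always expressed in GMT
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
                    .withZone(ZoneOffset.UTC);

    static String httpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** True when the server may answer 304: the resource is not newer than the client's copy. */
    static boolean notModified(Instant lastModified, Instant ifModifiedSince) {
        // Compare at second precision, because that is all the header can express
        return !lastModified.truncatedTo(ChronoUnit.SECONDS)
                .isAfter(ifModifiedSince.truncatedTo(ChronoUnit.SECONDS));
    }

    public static void main(String[] args) {
        Instant firstWrite = Instant.parse("2026-06-04T10:00:00.100Z");
        Instant secondWrite = Instant.parse("2026-06-04T10:00:00.900Z");

        // Both writes produce the same header value
        System.out.println(httpDate(firstWrite).equals(httpDate(secondWrite))); // true

        // A client holding the first version is told "not modified" after the second write
        System.out.println(notModified(secondWrite, firstWrite)); // true
    }
}
```

A content-derived ETag would differ between the two writes, which is the precision advantage listed above.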

5.4 Vary Header

Tells caches that the response varies based on specific request headers. The cache stores separate entries for each distinct header value.

HTTP Response:
Vary: Accept-Encoding

Meaning: Cache separately for requests with:
  - Accept-Encoding: gzip
  - Accept-Encoding: br
  - Accept-Encoding: (missing)

Common Vary usage:

Vary: Accept-Encoding          # Cache separate versions for gzip/brotli/none
Vary: Accept-Language          # Cache separate versions per language
Vary: Authorization            # Cache separately per user (careful - can explode cache size)
Vary: X-Custom-Feature-Flag    # Cache separately per feature flag state

Warning: Vary: Authorization or Vary: Cookie effectively creates per-user cache entries, defeating the purpose of a shared cache.
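
One way to picture what a cache does with Vary is as a change to the cache key. The sketch below is a simplification under stated assumptions: real caches normalize header values more carefully, and the names VaryKeySketch and cacheKey are invented for this example. It builds a key from the method, the URL, and only the request headers named in Vary.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class VaryKeySketch {

    /**
     * Builds a cache key from the method, URL, and the request headers listed in Vary.
     * Headers that are not listed in Vary do not influence the key.
     */
    static String cacheKey(String method, String url,
                           List<String> varyHeaders, Map<String, String> requestHeaders) {
        // Header names are case-insensitive, so normalize them to lower case first
        Map<String, String> normalized = new TreeMap<>();
        requestHeaders.forEach((name, value) ->
                normalized.put(name.toLowerCase(Locale.ROOT), value.trim()));

        StringBuilder key = new StringBuilder(method).append(' ').append(url);
        varyHeaders.stream()
                .map(name -> name.toLowerCase(Locale.ROOT))
                .sorted()
                .forEach(name -> key.append('|').append(name).append('=')
                        .append(normalized.getOrDefault(name, "")));
        return key.toString();
    }

    public static void main(String[] args) {
        List<String> vary = List.of("Accept-Encoding");
        System.out.println(cacheKey("GET", "/app.js", vary, Map.of("Accept-Encoding", "gzip")));
        System.out.println(cacheKey("GET", "/app.js", vary, Map.of("Accept-Encoding", "br")));
        System.out.println(cacheKey("GET", "/app.js", vary, Map.of()));
    }
}
```

The three calls yield three distinct keys, one per cached variant. Listing Authorization or Cookie in Vary makes the key differ for nearly every user, which is the cache explosion the warning describes.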

5.5 HTTP Caching in Spring Boot

@GetMapping("/products/{id}")
public ResponseEntity<Product> getProduct(@PathVariable Long id) {
    Product product = productService.getProduct(id);
    return ResponseEntity.ok()
        .cacheControl(CacheControl.maxAge(1, TimeUnit.HOURS).cachePublic())
        .body(product);
}
 
@GetMapping("/users/{id}/profile")
public ResponseEntity<UserProfile> getProfile(@PathVariable Long id) {
    UserProfile profile = userService.getProfile(id);
    return ResponseEntity.ok()
        .cacheControl(CacheControl.maxAge(5, TimeUnit.MINUTES).cachePrivate())
        .eTag(String.valueOf(profile.getVersion()))
        .body(profile);
}
 
@GetMapping("/assets/logo.png")
public ResponseEntity<byte[]> getLogo() {
    byte[] logo = assetService.getLogo();
    return ResponseEntity.ok()
        // Immutable + 1 year max-age for content-addressed assets
        .cacheControl(CacheControl.maxAge(365, TimeUnit.DAYS)
                                   .cachePublic()
                                   .immutable())
        .body(logo);
}

6. CDN Caching

How CDNs Work

A CDN is a geographically distributed network of edge servers (Points of Presence - PoPs). When a user requests content, the CDN serves from the nearest edge server rather than the origin.

Without CDN:
User in Tokyo --> Origin Server in US West Coast --> 150ms latency

With CDN:
User in Tokyo --> CDN Edge in Tokyo --> 5ms latency (served from cache)
                                    OR --> CDN Edge in Tokyo --> Origin --> 150ms (cache miss)

CDN Cache Hierarchy

User Browser Cache (Level 1)
      |
      | Miss
      v
CDN Edge Node - POP (Level 2)    <- Geographically closest to user
      |
      | Miss
      v
CDN Origin Shield (Level 3)       <- Regional aggregation layer, protects origin
      |
      | Miss
      v
Origin Server (Level 4)           <- Your application
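
The hierarchy can be modeled as a chain of lookups where each miss falls through to the next level and the response fills the closer levels on the way back. The following is a toy model, not how any particular CDN is implemented: TieredCacheSketch and its methods are hypothetical names, and TTLs and eviction are left out entirely.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class TieredCacheSketch {

    // Insertion order is lookup order: the tier closest to the user comes first
    private final Map<String, Map<String, String>> tiers = new LinkedHashMap<>();
    private final Function<String, String> origin;
    private int originRequests = 0;

    TieredCacheSketch(Function<String, String> origin, String... tierNames) {
        this.origin = origin;
        for (String name : tierNames) {
            tiers.put(name, new HashMap<>());
        }
    }

    /** Fetches the URL and returns the name of the level that served it. */
    String fetch(String url) {
        String servedBy = "origin";
        String body = null;
        for (Map.Entry<String, Map<String, String>> tier : tiers.entrySet()) {
            body = tier.getValue().get(url);
            if (body != null) {
                servedBy = tier.getKey();
                break;
            }
        }
        if (body == null) {
            originRequests++;
            body = origin.apply(url);
        }
        // Fill every tier that sits closer to the user than the level that answered
        for (Map.Entry<String, Map<String, String>> tier : tiers.entrySet()) {
            if (tier.getKey().equals(servedBy)) {
                break;
            }
            tier.getValue().put(url, body);
        }
        return servedBy;
    }

    void clear(String tierName) {
        tiers.get(tierName).clear();
    }

    int originRequests() {
        return originRequests;
    }

    public static void main(String[] args) {
        TieredCacheSketch cdn =
                new TieredCacheSketch(url -> "body of " + url, "browser", "edge", "shield");
        System.out.println(cdn.fetch("/logo.png")); // origin
        System.out.println(cdn.fetch("/logo.png")); // browser
        cdn.clear("browser");
        System.out.println(cdn.fetch("/logo.png")); // edge
    }
}
```

Only the first request reaches the origin. Later misses in the browser cache are absorbed by the edge, and edge misses by the shield.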

What to Cache in a CDN

Cache at CDN (public, static or slow-changing):

Images, videos, files: Cache-Control: public, max-age=31536000, immutable (for hashed filenames)
CSS/JavaScript bundles: Same as above with content hashing
API responses that are the same for all users: Cache-Control: public, s-maxage=300
HTML pages for public content: Cache-Control: public, s-maxage=60
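
The difference between max-age and s-maxage is which cache they apply to. The sketch below is a deliberately simplified reading of the header: it ignores no-cache, heuristic freshness, and malformed values, and FreshnessSketch is a made-up name. It computes how long a response may be served without revalidation by a shared cache (a CDN) versus a private one (a browser).

```java
import java.util.Locale;

final class FreshnessSketch {

    /** Returns the freshness lifetime in seconds, or 0 when the cache must not reuse the response. */
    static long freshnessSeconds(String cacheControl, boolean sharedCache) {
        Long maxAge = null;
        Long sMaxAge = null;
        boolean isPrivate = false;
        boolean noStore = false;

        for (String part : cacheControl.split(",")) {
            String directive = part.trim().toLowerCase(Locale.ROOT);
            if (directive.equals("no-store")) {
                noStore = true;
            } else if (directive.equals("private")) {
                isPrivate = true;
            } else if (directive.startsWith("s-maxage=")) {
                sMaxAge = Long.parseLong(directive.substring("s-maxage=".length()));
            } else if (directive.startsWith("max-age=")) {
                maxAge = Long.parseLong(directive.substring("max-age=".length()));
            }
        }

        if (noStore) {
            return 0;
        }
        if (sharedCache) {
            if (isPrivate) {
                return 0; // shared caches must not store private responses
            }
            if (sMaxAge != null) {
                return sMaxAge; // s-maxage overrides max-age for shared caches only
            }
        }
        return maxAge != null ? maxAge : 0;
    }

    public static void main(String[] args) {
        System.out.println(freshnessSeconds("public, s-maxage=300", true));  // 300 (CDN)
        System.out.println(freshnessSeconds("public, s-maxage=300", false)); // 0 (browser)
    }
}
```

With public, s-maxage=300 the CDN may serve the response for five minutes, while a browser gets no explicit lifetime from this header because s-maxage applies to shared caches only.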

Do NOT cache at CDN (private or dynamic):

Authenticated user data: Cache-Control: private, no-store
Shopping cart, personalized recommendations
Responses that set session cookies
Payment pages

CDN Cache Invalidation

CDN cache invalidation (purging) is how you remove stale content from edge nodes before TTL expires.

# AWS CloudFront invalidation via CLI
aws cloudfront create-invalidation \
  --distribution-id EXXXXXXXXXX \
  --paths "/images/*" "/api/products/*"
 
# Cloudflare cache purge via API
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
     -H "Authorization: Bearer {token}" \
     -H "Content-Type: application/json" \
     -d '{"files":["https://example.com/api/products/123"]}'

Strategies to minimize invalidation need:

Content-addressed URLs: Include a hash of the content in the URL. style.abc123.css. When content changes, the URL changes, so old cache entries naturally become orphaned. No invalidation needed.
Short TTLs for frequently changing content: s-maxage=60 for API responses. Accept 60-second staleness instead of managing invalidation.
Versioned URLs: /api/v2/products/123 - deploy new version, old cache is irrelevant.
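
Build tools normally generate content-addressed names for you, but the idea fits in a few lines. The sketch below is illustrative only: HashedAssetName is a hypothetical helper, and the 8-character SHA-256 prefix is an arbitrary choice. It derives a file name such as style.<hash>.css from the file's bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashedAssetName {

    /** Inserts the first 8 hex characters of the content's SHA-256 before the file extension. */
    static String hashedName(String fileName, byte[] content) {
        String hash = sha256Hex(content).substring(0, 8);
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            return fileName + "." + hash; // no extension to preserve
        }
        return fileName.substring(0, dot) + "." + hash + fileName.substring(dot);
    }

    private static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        byte[] v1 = "body { color: black; }".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "body { color: navy; }".getBytes(StandardCharsets.UTF_8);
        System.out.println(hashedName("style.css", v1));
        System.out.println(hashedName("style.css", v2)); // different content, different URL
    }
}
```

Because the name changes whenever the bytes change, the old URL can keep its one-year max-age and never needs to be purged.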

7. Database-Level Caching

7.1 InnoDB Buffer Pool (MySQL)

The most important MySQL performance configuration. The buffer pool caches data pages and index pages in memory.

-- Check buffer pool size
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
 
-- Check buffer pool hit rate
SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests'; -- Total read requests
SHOW STATUS LIKE 'Innodb_buffer_pool_reads';          -- Reads from disk (misses)
 
-- Hit rate (%) = (1 - reads / read_requests) * 100
-- Target: > 99% hit rate for production
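
As a worked example of the hit rate calculation, the helper below takes the two counters as plain numbers. The class name and the sample values are made up for illustration. In practice you would read the counters from SHOW STATUS.

```java
final class BufferPoolHitRate {

    /**
     * Hit rate in percent, from Innodb_buffer_pool_read_requests (logical reads)
     * and Innodb_buffer_pool_reads (reads that had to go to disk).
     */
    static double hitRatePercent(long readRequests, long diskReads) {
        if (readRequests <= 0) {
            return 0.0; // no traffic yet, so no meaningful rate
        }
        return (1.0 - (double) diskReads / readRequests) * 100.0;
    }

    public static void main(String[] args) {
        // 5,000 disk reads out of 1,000,000 requests: about 99.5%, above the 99% target
        System.out.printf("%.2f%%%n", hitRatePercent(1_000_000L, 5_000L));
    }
}
```

Both counters are cumulative since server start, so comparing two samples taken some minutes apart gives a better picture of the current hit rate than a single reading.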

# my.cnf - Buffer pool configuration
[mysqld]
innodb_buffer_pool_size = 10G         # 70-80% of RAM on dedicated DB server
innodb_buffer_pool_instances = 8      # Split into 8 instances to reduce contention
innodb_buffer_pool_chunk_size = 128M  # Chunk size for dynamic resizing

7.2 Materialized Views

Pre-computed, physically stored query results. Ideal for expensive aggregation queries used in dashboards or reports.

-- PostgreSQL Materialized View
CREATE MATERIALIZED VIEW product_category_summary AS
    SELECT
        c.category_id,
        c.category_name,
        COUNT(p.product_id) AS product_count,
        AVG(p.price) AS avg_price,
        SUM(p.sales_count) AS total_sales
    FROM categories c
    JOIN products p ON c.category_id = p.category_id
    GROUP BY c.category_id, c.category_name;
 
-- Create a unique index on the materialized view
-- (REFRESH ... CONCURRENTLY fails unless the view has at least one unique index)
CREATE UNIQUE INDEX ON product_category_summary (category_id);
 
-- Refresh (re-compute the data)
REFRESH MATERIALIZED VIEW CONCURRENTLY product_category_summary;
-- CONCURRENTLY allows reads during refresh (requires unique index)
 
-- Schedule refresh (every hour via cron or pg_cron)
-- Query the view (fast! pre-computed)
SELECT * FROM product_category_summary ORDER BY total_sales DESC LIMIT 10;

8. Spring Boot Caching

Spring provides a declarative caching abstraction, auto-configured by Spring Boot, that works with multiple cache providers (Caffeine, Redis, Ehcache, etc.) through a consistent annotation-based API.

8.1 Setup

<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>
<!-- Or for Redis: spring-boot-starter-data-redis -->

@SpringBootApplication
@EnableCaching                    // Required: activates Spring's caching infrastructure
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

8.2 @Cacheable - Cache-Aside for Reads

@Service
public class ProductService {
 
    // Cache the result. On cache hit, method body is NOT executed.
    @Cacheable(value = "products", key = "#productId")
    public Product getProduct(Long productId) {
        // This code ONLY runs on cache miss
        return productRepository.findById(productId).orElseThrow();
    }
 
    // Conditional caching
    @Cacheable(value = "products",
               key = "#productId",
               condition = "#productId > 0",           // Cache only if condition is true
               unless = "#result == null")             // Do NOT cache if result is null
    public Product getProductConditional(Long productId) {
        return productRepository.findById(productId).orElse(null);
    }
 
    // Complex key generation
    @Cacheable(value = "productSearch",
               key = "#category + ':' + #page + ':' + #pageSize")
    public Page<Product> searchProducts(String category, int page, int pageSize) {
        return productRepository.findByCategory(category, PageRequest.of(page, pageSize));
    }
 
    // Sync mode (prevents Cache Stampede for this method)
    @Cacheable(value = "products", key = "#productId", sync = true)
    public Product getProductSynced(Long productId) {
        // sync=true: only one thread executes the method on cache miss.
        // Other threads wait for the first thread's result.
        return productRepository.findById(productId).orElseThrow();
    }
}
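
To see what "only one thread executes the method" means in practice, here is a plain-Java sketch of the same idea. It is an illustration of the technique, not Spring's implementation: SingleFlightCache is a hypothetical class that relies on ConcurrentHashMap.computeIfAbsent, which applies the loader at most once per key while concurrent callers for that key wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class SingleFlightCache<K, V> {

    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // The loader runs at most once per key; other threads asking for the
        // same key block until the first caller has stored the value.
        return store.computeIfAbsent(key, loader);
    }

    /** Fires the given number of concurrent reads for one key and returns how often the loader ran. */
    static int concurrentLoads(int threads) {
        AtomicInteger loads = new AtomicInteger();
        SingleFlightCache<Long, String> cache = new SingleFlightCache<>(id -> {
            loads.incrementAndGet();
            try {
                Thread.sleep(50); // simulate a slow database query
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "product-" + id;
        });

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            results.add(pool.submit(() -> {
                start.await(); // release all threads at the same moment
                return cache.get(42L);
            }));
        }
        start.countDown();
        try {
            for (Future<String> result : results) {
                result.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println("loader executions: " + concurrentLoads(8)); // 1
    }
}
```

Eight threads miss at the same time, yet the loader runs once. Without this kind of per-key coordination, each of the eight threads would have queried the database.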

8.3 @CachePut - Write-Through for Writes

@Service
public class ProductService {
 
    // ALWAYS executes the method AND updates the cache.
    // Use for write operations to keep cache in sync.
    @CachePut(value = "products", key = "#result.id")
    public Product createProduct(ProductCreateRequest request) {
        return productRepository.save(new Product(request));
        // After method executes, result is stored in cache with key = result.id
    }
 
    @CachePut(value = "products", key = "#product.id")
    public Product updateProduct(Product product) {
        return productRepository.save(product);
    }
}

8.4 @CacheEvict - Invalidation

@Service
public class ProductService {
 
    // Remove specific entry from cache
    @CacheEvict(value = "products", key = "#productId")
    public void deleteProduct(Long productId) {
        productRepository.deleteById(productId);
        // After method executes, cache entry is removed
    }
 
    // Clear entire cache (use with caution)
    @CacheEvict(value = "products", allEntries = true)
    public void clearProductCache() {
        // All entries in "products" cache are removed
    }
 
    // Evict before method execution (use when you want cache cleared even on exception)
    @CacheEvict(value = "products", key = "#productId", beforeInvocation = true)
    public void deleteProductBeforeInvocation(Long productId) {
        productRepository.deleteById(productId);
    }
 
    // Evict from multiple caches
    @Caching(evict = {
        @CacheEvict(value = "products", key = "#productId"),
        @CacheEvict(value = "productSearch", allEntries = true),
        @CacheEvict(value = "productCount", allEntries = true)
    })
    public void updateProductAndEvictAll(Long productId, ProductUpdateRequest request) {
        productRepository.updateProduct(productId, request);
    }
}
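
The effect of beforeInvocation is easiest to see when the method fails. The sketch below models the two orderings with a plain map. It is an illustration of the ordering only, with invented names, and not how Spring's interceptor is written.

```java
import java.util.HashMap;
import java.util.Map;

final class EvictTimingSketch {

    /** Default ordering: run the method, then evict. A failure skips the eviction. */
    static void evictAfter(Map<Long, String> cache, Long key, Runnable method) {
        method.run();
        cache.remove(key);
    }

    /** beforeInvocation = true: evict first, so the entry is gone even if the method throws. */
    static void evictBefore(Map<Long, String> cache, Long key, Runnable method) {
        cache.remove(key);
        method.run();
    }

    /** Runs a failing delete against a one-entry cache and reports whether the entry is still cached. */
    static boolean entrySurvivesFailure(boolean beforeInvocation) {
        Map<Long, String> cache = new HashMap<>();
        cache.put(1L, "product-1");
        Runnable failingDelete = () -> {
            throw new IllegalStateException("database unavailable");
        };
        try {
            if (beforeInvocation) {
                evictBefore(cache, 1L, failingDelete);
            } else {
                evictAfter(cache, 1L, failingDelete);
            }
        } catch (IllegalStateException expected) {
            // The caller would see this exception; here only the cache state matters
        }
        return cache.containsKey(1L);
    }

    public static void main(String[] args) {
        System.out.println(entrySurvivesFailure(false)); // true: entry stays cached
        System.out.println(entrySurvivesFailure(true));  // false: entry was evicted
    }
}
```

Which ordering is right depends on the failure you fear more: serving a stale entry after a half-completed write, or losing a still-valid entry after a write that never happened.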

8.5 @Caching - Multiple Annotations

@Service
public class UserService {
 
    // Combine multiple cache operations on one method
    @Caching(
        cacheable = {
            @Cacheable(value = "users", key = "#userId"),
            @Cacheable(value = "usersById", key = "#userId")
        }
    )
    public User getUserMultiCache(Long userId) {
        return userRepository.findById(userId).orElseThrow();
    }
 
    @Caching(
        put = {
            @CachePut(value = "users", key = "#result.id"),
            @CachePut(value = "usersByEmail", key = "#result.email")
        },
        evict = {
            @CacheEvict(value = "userList", allEntries = true)
        }
    )
    public User createUser(CreateUserRequest request) {
        return userRepository.save(new User(request));
    }
}

8.6 Configuring Caffeine with Spring Boot

# application.yml
spring:
  cache:
    type: caffeine
    caffeine:
      spec: maximumSize=500,expireAfterWrite=600s
    cache-names:
      - products
      - users
      - sessions

Or programmatic configuration for per-cache settings:

@Configuration
@EnableCaching
public class CacheConfig implements CachingConfigurer {
 
    @Override
    public CacheManager cacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager();
        manager.setCacheNames(List.of("products", "users", "sessions"));
        manager.setCaffeine(defaultCaffeineBuilder());
        return manager;
    }
 
    private Caffeine<Object, Object> defaultCaffeineBuilder() {
        return Caffeine.newBuilder()
                .maximumSize(10_000)
                .expireAfterWrite(5, TimeUnit.MINUTES)
                .recordStats();
    }
}
 
// Alternative: per-cache configuration with different TTLs.
// Define this bean instead of the CacheManager above, not in addition to it.
@Bean
public CacheManager cacheManager() {
    SimpleCacheManager manager = new SimpleCacheManager();
    List<CaffeineCache> caches = List.of(
        buildCache("products", 50_000, 60, TimeUnit.MINUTES),
        buildCache("users", 100_000, 10, TimeUnit.MINUTES),
        buildCache("sessions", 500_000, 30, TimeUnit.MINUTES)
    );
    manager.setCaches(caches);
    return manager;
}
 
private CaffeineCache buildCache(String name, long size, long duration, TimeUnit unit) {
    return new CaffeineCache(name,
        Caffeine.newBuilder()
            .maximumSize(size)
            .expireAfterWrite(duration, unit)
            .recordStats()
            .build());
}

8.7 Configuring Redis with Spring Boot

# application.yml
spring:
  data:
    redis:
      host: redis.prod.internal
      port: 6379
      password: ${REDIS_PASSWORD}
      timeout: 2000ms
      lettuce:
        pool:
          max-active: 20
          max-idle: 10
          min-idle: 5
          max-wait: 1000ms
  cache:
    type: redis
    redis:
      time-to-live: 300000 # 5 minutes default TTL in milliseconds
      cache-null-values: true # Cache null results to prevent penetration

@Configuration
@EnableCaching
public class RedisCacheConfig {
 
    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory factory) {
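        // Default configuration. Note: once you define your own CacheManager bean, Boot's
        // auto-configured one backs off, so spring.cache.redis.* properties are not applied here.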
        RedisCacheConfiguration defaultConfig = RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(5))
                .serializeKeysWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new StringRedisSerializer()))
                .serializeValuesWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new GenericJackson2JsonRedisSerializer()))
                .disableCachingNullValues();  // Remove this to allow null caching
 
        // Per-cache TTL configuration
        Map<String, RedisCacheConfiguration> cacheConfigs = Map.of(
            "products",  defaultConfig.entryTtl(Duration.ofHours(1)),
            "users",     defaultConfig.entryTtl(Duration.ofMinutes(10)),
            "sessions",  defaultConfig.entryTtl(Duration.ofMinutes(30))
        );
 
        return RedisCacheManager.builder(factory)
                .cacheDefaults(defaultConfig)
                .withInitialCacheConfigurations(cacheConfigs)
                .transactionAware()
                .build();
    }
}

8.8 Custom Key Generation

// Custom KeyGenerator bean
@Component("myKeyGenerator")
public class CustomKeyGenerator implements KeyGenerator {
 
    @Override
    public Object generate(Object target, Method method, Object... params) {
        StringBuilder sb = new StringBuilder();
        sb.append(target.getClass().getSimpleName()).append(":");
        sb.append(method.getName()).append(":");
        for (Object param : params) {
            // String.valueOf is null-safe; param.toString() would throw on a null argument
            sb.append(String.valueOf(param)).append(":");
        }
        return sb.toString();
    }
}
 
// Usage
@Cacheable(value = "products", keyGenerator = "myKeyGenerator")
public List<Product> searchProducts(String query, int page) {
    return productRepository.search(query, page);
}
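
Two details are worth checking in any custom key scheme: null arguments and array arguments, because an array's toString() is based on identity rather than content. The sketch below is a framework-free illustration of a key builder that handles both. KeySketch is a hypothetical helper, not part of Spring, and it does not escape the ":" separator, so parameters that contain a colon can still collide.

```java
import java.util.Arrays;
import java.util.StringJoiner;

final class KeySketch {

    /** Builds "Class:method:param1:param2", tolerating nulls and comparing arrays by content. */
    static String key(String className, String methodName, Object... params) {
        StringJoiner joiner = new StringJoiner(":");
        joiner.add(className).add(methodName);
        for (Object param : params) {
            if (param != null && param.getClass().isArray()) {
                // Wrapping in Object[] lets deepToString print primitive and object arrays alike
                joiner.add(Arrays.deepToString(new Object[] {param}));
            } else {
                joiner.add(String.valueOf(param)); // prints "null" for a null argument
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(key("ProductService", "searchProducts", "laptop", 2));
        System.out.println(key("ProductService", "findByIds", new long[] {1L, 2L}));
    }
}
```

The first call prints ProductService:searchProducts:laptop:2. The second prints ProductService:findByIds:[[1, 2]], and it prints the same key for any array with the same contents.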

Summary

The technology landscape for caching is rich and each tool has its place:

Caffeine: Default choice for in-process JVM caching. W-TinyLFU algorithm, rich configuration, excellent metrics.
Redis: Default choice for distributed caching. Rich data structures, persistence, clustering, and far more than just a cache.
Memcached: Legacy choice for pure high-throughput, multi-threaded key-value caching. No persistence, replication, or rich data structures.
HTTP Caching (Cache-Control, ETags): Essential for any web API or frontend. A response served from a browser or shared cache costs your servers nothing.
CDN: Geographic distribution reduces latency for global users. Scales effortlessly for static content.
InnoDB Buffer Pool / Database Caches: Already working for free in your database. Tune innodb_buffer_pool_size as a first step.
Spring @Cacheable: Clean declarative caching for Java services. Works with Caffeine, Redis, and any JSR-107 provider.

Previous: Part 2 - Strategies
Next: Part 4 - Pitfalls and Solutions

Series: Caching Demystified
