Caching Demystified - Part 1: Fundamentals

What is Caching?
The Memory Hierarchy - Speed vs Cost vs Capacity
Why Caching Matters - The Numbers Behind the Decisions
Core Terminology - The Language of Caching
Types of Caches - The Full Landscape
Cache Eviction Policies - What Gets Removed and Why
Cache Metrics - Measuring What Matters
Cache Invalidation - The Hardest Problem in Computer Science
Cache Sizing and Capacity Planning

1. What is Caching?

A cache is a high-speed, temporary storage layer that holds copies of data so that future requests for that same data can be served faster - without going back to the original, slower source.

The core principle behind caching is simple: avoid repeating expensive work.

Analogy 1 - The Kitchen Counter

Imagine a chef in a busy restaurant kitchen:

The grocery store is 10 minutes away. You go there when you need something rare.
The pantry stores hundreds of ingredients. Fetching from there takes 2 minutes.
The kitchen counter holds the 10 most-used ingredients right at hand. Fetching takes 2 seconds.

The kitchen counter is the cache. A smart chef places frequently used items on the counter to make cooking fast. When a new ingredient is needed, they fetch it from the pantry and may place it on the counter if they expect to use it again soon.

When the counter is full and a new item is needed, the chef removes the item they have not used in the longest time. That is an eviction.

Analogy 2 - The Human Brain

When someone asks you "What is 2 + 2?", you do not go through the mental process of counting. Your brain has cached that result. The answer 4 comes instantly.

But if someone asks "What is 7,463 x 9,872?", you do not have that cached. You have to compute it from scratch (a cache miss). That computation is expensive and takes time.

Analogy 3 - The Office Desk

Think of a developer's workspace:

Desk surface: Papers currently being worked on (fastest, very limited space)
Filing cabinet in the office: Documents accessed weekly (fast, limited space)
Central company archive: All documents ever created (slow, unlimited space)

You keep active documents on your desk. When you finish with one, you file it. When you need an archived document, you retrieve it and put it on your desk.

The Technical Picture

Without caching:

Every Request:
User --> Application --> Database (100ms) --> Application --> User

1,000 requests for "product:123":
1,000 x 100ms = 100,000ms total database time
Database receives 1,000 queries for the same data

With caching:

First Request (Cache Miss):
User --> Application --> Cache (miss, 1ms) --> Database (100ms) --> Cache (store) --> User

All Subsequent Requests (Cache Hit):
User --> Application --> Cache (hit, 1ms) --> User

1,000 requests for "product:123" with 99% hit rate:
- 10 database queries: 10 x 100ms = 1,000ms
- 990 cache hits:      990 x 1ms  = 990ms
- Total: ~2 seconds vs 100 seconds without cache

2. The Memory Hierarchy

Computers are built on a fundamental trade-off: faster storage is physically smaller and more expensive. This creates a natural hierarchy where data is stored closer to the CPU if it is accessed frequently.

+------------------------------------------------------------------+
|  SPEED    LAYER                     SIZE          LATENCY        |
+------------------------------------------------------------------+
| Fastest   CPU Registers             < 1 KB        < 0.3 ns       |
|           L1 Cache (per core)       32 - 128 KB   ~1 ns          |
|           L2 Cache (per core)       256 KB - 1 MB ~4 ns          |
|           L3 Cache (shared)         4 MB - 64 MB  ~10-30 ns      |
|           Main Memory (RAM)         GBs           ~100 ns        |
|           NVMe SSD                  TBs           ~100 us        |
|           SATA SSD                  TBs           ~500 us        |
|           Hard Disk Drive (HDD)     TBs           ~10 ms         |
| Slowest   Network (Remote DB)       Unlimited     ~100-1000 ms   |
+------------------------------------------------------------------+

Latency Numbers Every Engineer Must Know

These numbers, popularized by Jeff Dean at Google, are the foundation of every performance discussion:

Operation                                      Approximate Latency
-----------------------------------------------------------
L1 cache reference                             0.5 ns
Branch misprediction                           5 ns
L2 cache reference                             7 ns
Mutex lock / unlock                            25 ns
Main memory reference (RAM)                    100 ns
Compress 1 KB with fast compression            3,000 ns    (3 us)
Send 1 KB over 1 Gbps network                  10,000 ns   (10 us)
Read 4 KB randomly from NVMe SSD               150,000 ns  (150 us)
Read 1 MB sequentially from RAM                250,000 ns  (250 us)
Round trip within same datacenter              500,000 ns  (0.5 ms)
Read 1 MB sequentially from SSD               1,000,000 ns (1 ms)
Disk seek (HDD)                               10,000,000 ns (10 ms)
Read 1 MB sequentially from HDD              20,000,000 ns (20 ms)
Send packet from US West Coast to Europe     150,000,000 ns (150 ms)

What These Numbers Mean for Caching

If your application fetches from a remote database over a network:

Average latency: ~1-10 ms (1,000,000 to 10,000,000 ns)

If your application reads from a local in-process cache (RAM):

Average latency: ~100 ns to ~1 us

The cache is 10,000x to 100,000x faster than the database for a cache hit.

Application to Software Architecture

Every software system has its own "memory hierarchy":

Level       Storage                  Latency     Capacity      Cost
------      -------                  -------     --------      ----
L1          In-process memory cache  < 1 ms      100 MB        Low
L2          Distributed cache        1-5 ms      100 GB        Medium
L3          Database (with indexes)  5-50 ms     1 TB          High
L4          Cold storage / archive   100-1000ms  Unlimited     Very Low

Understanding this hierarchy tells you where to put your cache and why.

3. Why Caching Matters - The Numbers

Scenario: E-Commerce Product Page

A product detail page hits 5 database tables:

Products table: 20ms
Inventory table: 15ms
Pricing/promotions table: 25ms
Reviews aggregate: 40ms
Recommendations table: 30ms
Total uncached: 130ms per request

With caching, all 5 queries are replaced by a single cache lookup: 1ms.

At 10,000 requests per second:

Uncached: 10,000 x 130ms = requires massive database cluster
Cached (95% hit rate): 500 x 130ms + 9,500 x 1ms = handles with a much smaller database

The Effective Access Time Formula

EAT (Effective Access Time) quantifies the average time to access data given a specific hit rate:

EAT = (Hit Rate x Cache Latency) + (Miss Rate x (Cache Latency + DB Latency))

Example:

Cache latency: 1ms
Database latency: 100ms
Hit Rate: 90% (0.9), Miss Rate: 10% (0.1)

EAT = (0.9 x 1ms) + (0.1 x (1ms + 100ms))
EAT = 0.9ms + 10.1ms
EAT = 11ms   (vs 100ms without cache = 9x improvement)

With 99% hit rate:

EAT = (0.99 x 1ms) + (0.01 x 101ms)
EAT = 0.99ms + 1.01ms
EAT = 2ms    (vs 100ms without cache = 50x improvement)

Lesson: Pushing hit rate from 90% to 99% improves performance by another 5x. Every percentage point matters.

When Caching Is the Right Tool

Caching works best when:

Read-heavy workloads: Data is read far more often than it is written (10:1 or higher read/write ratio).
Repeated access patterns: The same data is requested by multiple users or requests.
Expensive origin fetches: Database queries involve joins, aggregations, or full table scans.
External API calls: Third-party services have rate limits, latency, or cost per call.
Computationally expensive results: Machine learning inference, complex calculations, report generation.
Tolerable staleness: The business can accept data being slightly behind (seconds to minutes).

Caching is the wrong tool when:

Every request accesses unique data (no temporal locality).
Data changes on every single write and cannot be stale at all.
The operation is write-heavy with very infrequent reads.
Compliance requirements prohibit storing data in additional systems.

4. Core Terminology - The Language of Caching

Understanding these terms precisely is essential for both building cache systems and discussing them in interviews.

Cache Hit

A cache hit occurs when the requested data key is found in the cache and the cached value is returned directly.

Request: GET user:123
Cache state: { "user:123": { name: "Alice", role: "admin" } }
Result: Cache HIT - return { name: "Alice", role: "admin" } immediately

Cache Miss

A cache miss occurs when the requested data key is NOT found in the cache. The system must fetch from the origin store.

Request: GET user:456
Cache state: { "user:123": { ... } }  -- user:456 not here
Result: Cache MISS - go to database, store in cache, return value

Types of Cache Misses

Cold Miss (Compulsory Miss):

The data is being accessed for the very first time ever.
No cache can prevent this. It is unavoidable.
Example: Application just started, cache is empty.

Capacity Miss:

The cache had the data at some point but evicted it because the cache was full.
Solution: Increase cache capacity or improve eviction policy.

Conflict Miss (CPU caches only):

In direct-mapped or N-way set-associative CPU caches, two memory addresses may hash to the same cache slot, forcing eviction even when the cache has logical space.
Not applicable to application-level caches (which are fully associative).

Cache Hit Ratio (Hit Rate)

The percentage of total cache requests that result in a hit:

Hit Ratio = Cache Hits / (Cache Hits + Cache Misses) x 100%

Example: 9,500 hits and 500 misses out of 10,000 requests:

Hit Ratio = 9,500 / 10,000 = 95%

What Constitutes a Good Hit Ratio?

< 70%     - Cache may not be helping much. Check access patterns and TTL.
70 - 85%  - Acceptable for variable/user-specific workloads.
85 - 95%  - Good. Typical for well-configured production systems.
95 - 99%  - Excellent. Target for read-heavy, repetitive workloads.
> 99%     - Outstanding. Common for static content and reference data.

Time to Live (TTL)

TTL is the maximum duration a cache entry is considered valid. After the TTL elapses, the entry expires and must be refreshed from the origin.

cache.set("product:789", productData, ttl=300)  // 300 seconds = 5 minutes

After 300 seconds, any request for "product:789" results in a cache miss.

Absolute TTL: Entry expires at a fixed time from when it was created, regardless of access.

Sliding TTL (Idle TTL): The TTL is reset every time the entry is accessed. Keeps hot data alive indefinitely. Can cause stale data to persist if the entry is accessed frequently but the source changes.

Cache Eviction

Eviction is the automatic removal of cache entries to make room for new ones when the cache reaches its capacity limit. Eviction is driven by capacity constraints and is controlled by the eviction policy.

Eviction is about space management, not about data correctness.

Cache Invalidation

Invalidation is the deliberate removal or marking as stale of a specific cache entry because the underlying source data has changed. Invalidation is about data correctness, not space.

User updates their profile photo:
1. Database is updated with new photo URL.
2. Cache entry for "user:123:profile" is INVALIDATED (deleted).
3. Next read for user:123 goes to DB and re-populates the cache.

The famous quote from Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things."

Stale Data

A cache entry is stale when the source data has been updated but the cached copy has not been refreshed. Stale data represents a form of cache-origin inconsistency.

Staleness is acceptable when:

The data does not change often (reference data, product catalog).
The business can tolerate reading slightly outdated values.
The TTL is short enough that the stale window is acceptable.

Staleness is NOT acceptable when:

Financial data (account balance, transaction records).
Inventory levels (stock availability).
Any data that drives critical business decisions requiring real-time accuracy.

Cache Warm vs Cache Cold

Warm Cache: The cache contains the anticipated working set. Cache hit rates are high. The system performs well.

Cold Cache: The cache is empty or contains irrelevant data. Cache miss rates are high. Every request goes to the database. This happens:

When the application first starts.
After a cache flush.
After a cache server restart.
After a rolling deployment.

The transition from cold to warm is called cache warming.

Working Set

The working set is the subset of data that is actively accessed during a given time window. In most real-world systems, 20% of the data accounts for 80% of the reads (Pareto principle / power-law distribution).

Caching only the working set provides most of the benefit at a fraction of the storage cost.

Cache Stampede (Thundering Herd)

When a popular cache entry expires, many concurrent requests all simultaneously experience a cache miss and all simultaneously query the database. This creates a sudden spike in database load. This problem is covered in depth in Part 4.

Cache Coherence

In systems with multiple cache instances (e.g., multiple app servers each with a local in-memory cache), coherence is the guarantee that all instances see consistent data.

If Server A updates data and only invalidates its own local cache, Server B still serves the old value. This is a coherence violation.

Cache Penetration

Requesting data that does not exist in either the cache or the database, causing every request to pass through to the database. Covered in depth in Part 4.

Cache Avalanche

When a large number of cache entries expire simultaneously, causing a flood of cache misses and a corresponding spike in database queries. Covered in depth in Part 4.

Write-Allocate vs No-Write-Allocate

Write-Allocate: On a write to a key that is NOT in the cache, the data is loaded into the cache before being updated. Keeps the cache in sync with writes.

No-Write-Allocate: On a write to a key that is NOT in the cache, the data is written directly to the backing store without loading it into cache first.

5. Types of Caches - The Full Landscape

Caches exist at every layer of the computing stack. Knowing the full landscape helps you choose the right cache for each problem.

5.1 Hardware Caches (CPU L1, L2, L3)

Built directly into the processor chip. Managed entirely by the CPU hardware and are invisible to application code.

+---------------------------+
|        CPU Chip           |
|  Core 0     Core 1        |
|  [L1 I$]   [L1 I$]        |   L1 Instruction Cache (per core)
|  [L1 D$]   [L1 D$]        |   L1 Data Cache (per core)
|  [L2    ]  [L2    ]       |   L2 Cache (per core, combined)
|    +------------------+   |
|    |   L3 Cache       |   |   L3 Cache (shared across all cores)
|    +------------------+   |
+---------------------------+

L1: ~32-128 KB per core, ~1 ns latency. Split into instruction cache and data cache.
L2: ~256 KB - 1 MB per core, ~4 ns latency. Combined instructions and data.
L3: 4 MB - 64 MB shared, ~10-30 ns latency. Shared across all cores on the chip.

CPU caches work on cache lines (typically 64 bytes). Accessing one byte loads an entire 64-byte cache line. This is why iterating over an array sequentially is orders of magnitude faster than random access through a linked list.

Cache-friendly programming:

Use arrays instead of linked lists for sequential access.
Access 2D arrays row-by-row, not column-by-column.
Keep hot data structures small (fits in L1/L2).

5.2 Operating System Page Cache

The OS kernel maintains a page cache (also called buffer cache) in RAM that caches disk pages that have been recently read or written. All file I/O goes through the page cache.

When your database reads a data file, the OS caches those disk pages. Subsequent reads of the same pages come from RAM, not disk. This is why database performance often improves over time after a cold start.

Tools to observe:

Linux: free -h shows buff/cache column
Linux: vmstat -s shows page cache statistics

5.3 Browser Cache

The web browser stores static assets (HTML, CSS, JavaScript, images, fonts) on the local filesystem. Subsequent visits to the same site serve assets from local disk instead of re-downloading.

Cache behavior is controlled by HTTP response headers:

Cache-Control: max-age=31536000 - cache for 1 year
ETag: "abc123" - version identifier for conditional requests
Last-Modified: Wed, 05 Jun 2026 10:00:00 GMT - timestamp for conditional requests

5.4 DNS Cache

Every DNS resolution (converting hostname to IP address) involves network round trips to DNS servers. DNS caches store resolved hostnames locally at multiple levels:

Browser DNS cache (typically 1 minute)
OS DNS resolver cache (follows record TTL)
Router DNS cache
ISP recursive resolver cache

When you update a DNS record, the change propagates slowly because old entries are cached across this hierarchy until their TTL expires. This is why "DNS propagation" can take up to 48 hours in the worst case.

5.5 Application-Level In-Process Cache

A cache that lives inside the application process itself, using the application's own memory (JVM heap for Java, process heap for Python, etc.).

Examples:

Java: Caffeine, Guava Cache, Ehcache (in-process mode)
Python: functools.lru_cache, cachetools
.NET: IMemoryCache

Characteristics:

Extremely fast: no network overhead, sub-millisecond access
Private: data is NOT shared between different application instances
Volatile: cleared when the application restarts
Bounded by application server memory
Risk: Too much data cached in-process causes JVM Garbage Collection pauses or OOM

Best use cases:

Reference data (country codes, currency codes, configuration constants)
User session data for a single-server application
Results of expensive computations that are the same for all users
Lookup tables, dictionaries, static mappings

5.6 Distributed / Remote Cache

A cache that runs as a separate service or cluster, accessible by all application instances over the network.

Examples: Redis, Memcached, Hazelcast, Apache Ignite, Couchbase

Characteristics:

Network overhead: typically 0.5ms - 2ms on a local network (much slower than in-process)
Shared: all application instances see the same data (solves coherence problem)
Durable: data survives application restarts (Redis with persistence)
Horizontally scalable: add more cache nodes as data grows
Independent: scales and deploys independently of the application

Best use cases:

User sessions in a multi-instance deployment
Shared application state (rate limiting counters, distributed locks)
Results of expensive database queries shared across all users
Inter-service caching in microservices

5.7 CDN (Content Delivery Network) Cache

A globally distributed network of edge servers that cache content physically close to end users, reducing latency by serving from a nearby geographic location.

User in Tokyo                     User in London
     |                                 |
     v                                 v
[CDN Edge - Asia]              [CDN Edge - Europe]
     |   Cache HIT                     |   Cache MISS
     |   (served locally)              |        |
     v                                 v        v
  User gets response fast    [Origin Server - US West]
  (~5ms)                       (100ms+ from London)

Cache hierarchy:

Browser cache (client-side)
CDN edge cache (geographically closest server)
CDN origin shield (intermediate CDN layer, regional)
Origin server (your application server)

CDN caches respect HTTP Cache-Control headers and support cache purging via API.

Examples: Cloudflare, AWS CloudFront, Akamai, Fastly, Google Cloud CDN

5.8 Database Internal Caches

Databases maintain their own internal caching mechanisms:

InnoDB Buffer Pool (MySQL):

The primary MySQL performance tuning parameter.
Caches frequently accessed data pages and index pages in RAM.
Typically configured to 70-80% of available RAM on a dedicated database server.
Most MySQL read queries are served from the buffer pool.
SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests' - total reads attempted
SHOW STATUS LIKE 'Innodb_buffer_pool_reads' - reads that required disk I/O
Buffer pool hit rate = 1 - (reads / read_requests), should be > 99%

PostgreSQL shared_buffers:

Similar to InnoDB Buffer Pool.
Recommended: 25% of available RAM.
Works in conjunction with the OS page cache.

MySQL Query Cache (deprecated in 8.0, removed in 8.0.3):

Cached the complete result set of SELECT queries by the query text.
Was a global mutex bottleneck under high concurrency.
Deprecated because it caused more problems than it solved at scale.
Lesson: even major databases make caching mistakes.

Materialized Views:

Pre-computed query results stored as actual database tables.
Must be refreshed periodically or on data change.
Extremely effective for heavy aggregation queries (dashboards, reports).

5.9 ORM-Level Cache (Hibernate / JPA)

First-Level Cache (Session Cache):

Attached to the Hibernate Session (scoped to a single transaction or unit of work).
Automatically deduplicates queries within a transaction.
Example: Calling session.get(User.class, 1L) twice in one transaction only hits the DB once.
Always on, cannot be disabled.

Second-Level Cache (SessionFactory Cache):

Shared across all sessions.
Requires explicit configuration.
Cache providers: Ehcache, Hazelcast, Infinispan, Redis via Redisson.
Configured per entity: @Cache(usage = CacheConcurrencyStrategy.READ_WRITE).

Query Cache:

Caches the list of entity IDs returned by a query.
Works in conjunction with the second-level cache.
Invalidated on any change to the queried entity type.

5.10 Web Server Cache

Reverse proxy servers like Nginx and Varnish can cache HTTP responses:

# Nginx proxy cache configuration
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=STATIC:10m inactive=24h max_size=1g;
 
server {
    location /api/ {
        proxy_cache STATIC;
        proxy_cache_valid 200 5m;
        proxy_cache_key "$scheme$request_method$host$request_uri";
    }
}

Varnish is purpose-built for HTTP caching and supports complex cache invalidation rules (cache tags, regex-based purging).

6. Cache Eviction Policies - What Gets Removed and Why

When a cache is full and a new entry must be stored, the eviction policy determines which existing entry is removed. The goal is to keep entries that are most likely to be requested again.

6.1 LRU - Least Recently Used

The most widely used eviction policy in production systems.

Core idea: Evict the entry that was last accessed (read or written) the longest time ago. The assumption: recently accessed data is likely to be accessed again soon (temporal locality).

Implementation: A doubly linked list combined with a hash map:

Hash map: O(1) lookup by key
Doubly linked list: maintains access order
- Most recently accessed entry: head of list
- Least recently accessed entry: tail of list
On cache hit: move the accessed node to the head
On eviction: remove the node at the tail

Cache capacity: 4 entries. Operations: A, B, C, D, E, access B again, F

After A:            [A]
After B:            [B, A]
After C:            [C, B, A]
After D:            [D, C, B, A]     <- Cache full
Request E (miss):   Evict A (tail), add E -> [E, D, C, B]
Access B:           Move B to head  -> [B, E, D, C]
Request F (miss):   Evict C (tail), add F -> [F, B, E, D]

Time complexity: O(1) for both get and put with the doubly linked list + hash map combination.

Drawback - Cache Pollution / Scan Resistance:
A batch job that scans every database record at 2 AM floods the LRU cache, evicting all the hot web app data. The next morning, users experience high miss rates until the cache warms back up. This is called cache pollution.

Solution: Use LRU-K (only promote entries to cache after K accesses), or use a segmented LRU.

6.2 LFU - Least Frequently Used

Core idea: Evict the entry with the lowest access count. Assumes that data accessed frequently will continue to be frequently accessed.

Implementation:

Maintain a frequency map: key -> access count
Maintain a frequency list: count -> list of keys with that count
Track the minimum frequency for O(1) eviction

Example:

Keys and access counts:
user:123  -> accessed 47 times
product:A -> accessed 3 times
config:v2 -> accessed 102 times
session:X -> accessed 1 time

Evict: session:X (lowest frequency = 1)

Advantage over LRU: Resistant to cache pollution from sequential scans. A batch job that touches each record once will not evict items with high frequency counts.

Drawbacks:

Frequency bias: Data that was popular months ago but is no longer relevant keeps its high count and blocks new hot data. This is the "cache aging" problem.
More complex implementation than LRU.

Solution to frequency bias: Use time-decayed frequency (TinyLFU algorithm) where old access counts are gradually reduced.

6.3 FIFO - First In, First Out

Core idea: Evict the entry that has been in the cache the longest, regardless of access patterns. A simple queue.

Example:

Cache capacity: 3
Add A, B, C: Queue = [A(front), B, C(back)]
Add D: Evict A -> Queue = [B(front), C, D(back)]
Access B: (no position change in FIFO)
Add E: Evict B -> Queue = [C(front), D, E(back)]

Advantage: Extremely simple to implement. O(1) operations.

Disadvantage: Does not consider access frequency or recency. Frequently accessed entries get evicted just because they were added early. Generally inferior to LRU for real-world workloads.

6.4 MRU - Most Recently Used

Core idea: The exact opposite of LRU. Evict the entry that was accessed most recently.

This seems counterintuitive, but it is optimal for specific access patterns:

Use case 1 - Video streaming: Users watch a video once and rarely re-watch it immediately. Keeping recently watched videos in cache wastes space; older favorites are more likely to be re-watched.

Use case 2 - Database scan operations: In certain join algorithms, the most recently accessed page is least likely to be needed again. MRU outperforms LRU for these sequential scan workloads.

6.5 Random Replacement

Core idea: Evict a randomly chosen cache entry.

Advantages:

Trivial to implement
Zero overhead (no need to track access time or frequency)
Avoids pathological worst-case behavior of deterministic algorithms
Surprisingly good performance for uniform access patterns

Disadvantage: No intelligence - may evict a very hot entry by chance.

6.6 ARC - Adaptive Replacement Cache

Developed by IBM Research. ARC is a sophisticated self-tuning algorithm that combines LRU and LFU.

How it works: ARC maintains four lists of equal total capacity:

T1: Recently accessed data (cache entries accessed exactly once recently)
T2: Frequently accessed data (cache entries accessed more than once)
B1: Ghost entries (metadata only, no data) of recently evicted T1 entries
B2: Ghost entries of recently evicted T2 entries

The adaptive behavior:

Cache hit in B1 (was recently evicted from T1): Increase T1's target size (recency was more valuable)
Cache hit in B2 (was recently evicted from T2): Increase T2's target size (frequency was more valuable)
ARC continuously adjusts the balance between T1 and T2 based on observed workload

Advantage: Self-tuning without any manual configuration. Adapts to both recency-biased and frequency-biased workloads automatically.

6.7 Redis Eviction Policies

Redis exposes 8 configurable eviction policies via the maxmemory-policy configuration:

Policy              Eviction Pool         Algorithm
noeviction          None                  Return error when memory is full (default)
allkeys-lru         All keys              Approximate LRU (5 random samples)
volatile-lru        Keys with TTL only    Approximate LRU (5 random samples)
allkeys-lfu         All keys              Approximate LFU
volatile-lfu        Keys with TTL only    Approximate LFU
allkeys-random      All keys              Random eviction
volatile-random     Keys with TTL only    Random eviction
volatile-ttl        Keys with TTL only    Evict key with smallest TTL first

Notes:

Redis uses an approximate LRU/LFU algorithm, not a perfect one. It samples 5 random keys (configurable via maxmemory-samples) and evicts the best candidate from the sample. This avoids maintaining a sorted data structure for all keys.
noeviction is the correct policy when Redis is used as a primary database (not a cache).
allkeys-lru is the most common choice for a pure caching deployment.
volatile-lru is appropriate when Redis serves as both cache and persistent store. Keys without TTL (persistent) are never evicted.

6.8 LRU Implementation in Java

import java.util.LinkedHashMap;
import java.util.Map;
 
public class LRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;
 
    public LRUCache(int capacity) {
        // accessOrder=true: iteration order is access order (LRU at tail)
        super(capacity, 0.75f, true);
        this.capacity = capacity;
    }
 
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Remove the eldest (least recently used) when over capacity
        return size() > capacity;
    }
}
 
// Usage
LRUCache<String, String> cache = new LRUCache<>(100);
cache.put("user:123", "Alice");
cache.get("user:123");  // marks user:123 as most recently used

6.9 Eviction Policy Selection Guide

Workload Pattern                    Recommended Policy
-----------------------------------------
General web app (mixed reads)       LRU (allkeys-lru in Redis)
Skewed access (hot keys stay hot)   LFU (allkeys-lfu in Redis)
Scan-resistant needed               LFU or ARC
Only expire keys with TTL           volatile-lru or volatile-lfu
Streaming / scan workloads          MRU
Need simplicity + no overhead       Random
Zero tolerance for data loss        noeviction (Redis as DB)

7. Cache Metrics - Measuring What Matters

You cannot optimize what you cannot measure. Monitor these metrics for any cache.

Primary Metrics

Hit Rate (most important):

Hit Rate = Hits / (Hits + Misses) x 100%

Monitor this continuously. A declining hit rate is an early warning signal.

Miss Rate:

Miss Rate = 1 - Hit Rate = Misses / (Hits + Misses) x 100%

Eviction Rate:
Number of entries evicted per second. High eviction rates mean the cache is too small for the working set. Increasing cache size will improve the hit rate.

Latency (p50, p95, p99):
Cache access should be very fast. Use percentile metrics, not averages:

p50 (median): Typical experience
p95: Most users' experience
p99: Worst experience affecting 1% of requests

If p99 latency is high, investigate serialization overhead, network issues, or lock contention.

Memory Utilization:

Memory Utilization = Used Memory / Total Allocated Memory x 100%

Near 100% utilization with high eviction rates = cache is undersized. Very low utilization = cache may be over-allocated.

Key Expiration Rate:
How many keys expire per second. A sudden spike in expirations can trigger a cache avalanche (see Part 4).

Redis-Specific Metrics to Monitor

redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses|evicted_keys|expired_keys"

keyspace_hits:    9500000
keyspace_misses:  500000
evicted_keys:     1234
expired_keys:     45678

Derived metrics:

Hit Rate: 9500000 / (9500000 + 500000) = 95%
Eviction rate per minute: evicted_keys / uptime_in_seconds x 60

redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human|mem_fragmentation_ratio"

used_memory_human:       3.50G
maxmemory_human:         4.00G   (87.5% utilized)
mem_fragmentation_ratio: 1.2     (acceptable, < 1.5 is fine)

8. Cache Invalidation - The Hardest Problem in Computer Science

Phil Karlton's famous quote exists because cache invalidation appears conceptually simple but has subtle, hard-to-avoid failure modes in real distributed systems.

Why It Is Hard

Consider this deceptively simple scenario:

Thread A:                        Thread B:
1. Read user:123 from DB
2.                               Update user:123 in DB
3.                               Delete cache key user:123
4. Write user:123 to cache       (stale value!)

Thread A reads the OLD value from the database, then Thread B deletes the cache entry (correctly), but Thread A then writes the old value into the cache. The cache now contains stale data.

This race condition between reads and writes is the fundamental challenge of cache invalidation.

Invalidation Strategies

Strategy 1: TTL-Based Invalidation (Passive Expiration)

Every cache entry carries a TTL. When the TTL expires, the entry is automatically removed. The next request fetches fresh data.

cache.set("user:123", userData, ttl=300)  // expires in 5 minutes

Pros:

Simple to implement
Self-healing: stale data automatically expires
No need to detect or react to data changes

Cons:

Data can be stale for up to the full TTL duration
Too-short TTL degrades hit rate; too-long TTL allows stale data

Best for: Data that changes infrequently and can tolerate staleness:

Product catalog
Configuration data
Country/city lookup tables
User profile (non-critical fields)

Strategy 2: Event-Driven Invalidation (Active Invalidation)

When data changes in the source system, an explicit invalidation event is triggered to remove or update the cache entry.

Sub-strategy 2a: Write-Invalidate (Delete on Write)

public void updateUser(User user) {
    userRepository.save(user);                    // 1. Update database
    cacheManager.evict("users", user.getId());    // 2. Delete cache entry
    // Next read will re-populate cache from DB
}

Sub-strategy 2b: Write-Update (Update on Write)

public void updateUser(User user) {
    userRepository.save(user);                    // 1. Update database
    cacheManager.put("users", user.getId(), user); // 2. Update cache with new value
}

Why Write-Invalidate is safer than Write-Update:

Write-Update has a race condition:

Time:   Thread A (update)         Thread B (concurrent update)
T1:     DB write (version 2)
T2:                               DB write (version 3)
T3:                               Cache update (version 3) <- correct
T4:     Cache update (version 2)  <- STALE! Thread A writes old value last

Write-Invalidate avoids this: deleting the entry forces any subsequent reader to fetch from DB, which will always return the latest value.

Cons of event-driven invalidation:

More complex to implement
Invalidation event might fail (cache entry remains stale)
Requires tight coupling between write path and cache layer

Strategy 3: Tag-Based Invalidation

Cache entries are associated with tags representing related entities. When an entity changes, all entries with that tag are invalidated atomically.

// Store product data with tags
cache.set("product:list:page:1", productList,
          tags=["category:electronics", "brand:apple"])

cache.set("product:789:detail", product,
          tags=["product:789", "category:electronics"])

// When Apple updates pricing, invalidate all Apple-tagged entries
cache.invalidateByTag("brand:apple")
// This removes all cache entries tagged with brand:apple

Implementations:

Varnish cache (Surrogate-Keys / xkey module)
Drupal cache tags
Custom implementation with Redis SETS (store keys per tag, delete all)

Strategy 4: Change Data Capture (CDC)

A database-level trigger mechanism that captures every change to the database and publishes it as an event stream. The cache layer subscribes to this stream and invalidates/updates entries.

Technology stack:

Debezium: captures MySQL/PostgreSQL/MongoDB changes as Kafka events
Kafka: event stream
Application/service: subscribes to Kafka, invalidates cache

This is the most robust but also most complex approach. It guarantees the cache is updated even for changes made directly to the database (bypassing the application).

The Double-Write Race Condition

Thread A (Reader):                   Thread B (Writer):
1. Cache miss for user:123
2.                                   Update user:123 in DB (new value)
3.                                   Delete cache key user:123
4. Read user:123 from DB (old value!) <-- reads BEFORE Thread B's DB update commits
5. Write OLD value to cache           <-- stale data persists until TTL expires

Best practice to minimize this race condition:

Always use Write-Invalidate (delete), never Write-Update.
Use a very short TTL as a safety net (even 5 seconds catches most races).
Use distributed locks around the cache population logic (see Cache Stampede solutions in Part 4).
For critical data: use database read-your-writes consistency for the invalidating transaction itself.

Distributed Invalidation in Microservices

When Service A owns the data and Service B caches it, invalidation across services is challenging:

Approach 1: Shared cache with event publication

Service A writes data --> publishes "user.updated" event to message bus
Service B subscribes to "user.updated" --> invalidates its cache entry

Approach 2: Short TTLs with eventual consistency
Accept that Service B's cache may be stale for up to N seconds (the TTL). After TTL, data self-heals. Simple, but data accuracy is bounded by TTL.

Approach 3: Versioned cache keys

// Version is stored in a central registry or passed in the request
String cacheKey = "user:123:v" + userVersion;
// When user is updated, version increments, old cache key is automatically "orphaned"
// Old entries eventually evict (but can waste memory until then)

9. Cache Sizing and Capacity Planning

The Working Set Principle

You do not need to cache everything. The working set is the subset of data that drives the vast majority of traffic. Cache the working set and you capture most of the value.

Pareto principle applied to caches:

80% of requests access 20% of data
Caching that 20% gives you 80% hit rate
For high-traffic systems, you may achieve 90%+ hit rates by caching 10-15% of total data

Estimating Required Cache Size

Cache Size = (Number of hot objects) x (Average object size in bytes)
           + (Overhead per key: ~64-100 bytes in Redis)
           + (25-30% buffer for metadata and fragmentation)

Worked example:

50 million registered users, 10% active (working set): 5 million user objects
Average serialized user object: 800 bytes
Base estimate: 5,000,000 x 800 bytes = 4 GB
Key overhead: 5,000,000 x 80 bytes = 400 MB
30% buffer: ~1.3 GB
Total recommended cache size: ~5.7 GB, allocate 6 GB

Monitoring and Adjusting Cache Size

Cache sizing is empirical. Monitor, measure, and adjust:

Observation                                  Action
-----------------------                      ------
High eviction rate + Low hit rate            Increase cache size
High eviction rate + Acceptable hit rate     Acceptable (working set > cache, but hot data stays)
Near-zero eviction + Very high memory usage  Consider reducing allocated memory
Sudden hit rate drop                         Check if working set has changed or TTL too short
Hit rate gradually declining                 Data access patterns changed, re-evaluate TTL and keys

Cost-Benefit Framework for Cache Sizing

Monthly DB Cost (queries/sec x DB compute cost)   vs   Monthly Cache Cost (RAM x price)

Example:
- Without cache: Database cluster handles 50,000 queries/sec
  Cost: $15,000/month

- With 10 GB Redis cache at 95% hit rate: Database handles 2,500 queries/sec
  Database cost drops to: $2,000/month
  Redis cost (10 GB): $300/month
  Total: $2,300/month

Net savings: $12,700/month

Key Design for Effective Caching

Cache keys should be:

Unique: No two different data items should share the same key
Descriptive: Clear what data the key refers to
Namespaced: Prefixed to avoid collisions across different services or data types
Predictable: Deterministic from the input parameters

Best practice patterns:

{service}:{entity}:{id}
{service}:{entity}:{id}:{version}
{service}:{entity}:{query_param1}:{query_param2}

Examples:
users-service:user:123
catalog-service:product:456
catalog-service:products:category:electronics:page:2
auth-service:session:abc123def456

Anti-patterns:

user_123                  // No namespace, collision risk
user:123:full:data:v2    // Too verbose
u123                     // Too cryptic
SELECT * FROM users...   // Full query text as key (dangerous with user input)

Summary

Caching is not a magic solution but a powerful engineering tool when applied thoughtfully:

Understand your data access patterns before adding a cache. Caching uniform or constantly changing data yields little benefit.
Know the memory hierarchy. The further from the CPU, the more expensive the access. Caching brings data closer to the access point.
Measure first. Use hit rate, eviction rate, and latency metrics to validate that caching is working and correctly sized.
Choose the right eviction policy. LRU is the default right choice for most web applications. Use LFU for highly skewed access patterns.
Cache invalidation is the hardest part. Use Write-Invalidate (not Write-Update). Use TTL as a safety net. Accept eventual consistency unless strong consistency is a hard requirement.
Size the cache to fit the working set, not the full dataset. The Pareto principle usually applies: caching 20% of data captures 80% of reads.

Next: Part 2 - Caching Strategies and Patterns

Series: Caching Demystified