Caching Demystified - Part 5: Interview Questions and Answers

Questions are ordered from most frequently asked to most advanced and tricky, and cover the caching topics that come up most often in interviews at every level.

Section 1: Core Fundamentals (Asked in virtually every interview)
Section 2: Caching Patterns and Strategy Design
Section 3: Redis Deep Dive (Asked when the job description mentions Redis)
Section 4: Distributed Systems and Scale
Section 5: Cache Pathologies and Problem Solving
Section 6: HTTP and CDN Caching
Section 7: System Design (Caching in Architecture)
Section 8: Tricky and Brain-Teaser Questions
Section 9: Spring Boot and Java Caching
Section 10: Must-Know Behavioral and Deep-Dive Questions

Section 1: Core Fundamentals

Q1. What is caching? Why do we use it?

Answer:

Caching is the process of storing copies of frequently accessed data in a fast, temporary storage layer (the cache) so that future requests for that data can be served faster, without going back to the original, slower data source.

We use caching for four primary reasons:

Latency reduction: A distributed cache read typically takes about 1ms; a database query often takes 10-100ms. Across millions of requests, that per-request saving adds up to a large difference in response times and server load.
Throughput improvement: If an application issues 1,000 reads/second and 95% of them are cache hits, the database only has to serve the 50 reads/second that miss, freeing up capacity.
Cost reduction: Fewer database queries = smaller database cluster = lower cloud infrastructure costs.
Resilience: A cache can keep serving reads of already-cached data even when the database is temporarily unavailable or slow.

When caching is appropriate: Read-heavy workloads (10:1 or higher read/write ratio), repetitive access patterns, expensive queries, external API calls with rate limits.

When caching is NOT appropriate: Write-heavy workloads, data that changes on every request, data that requires guaranteed real-time consistency (account balances during a transaction).

Q2. What is a cache hit and a cache miss? What is the cache hit ratio?

Answer:

Cache Hit: The requested data IS found in the cache. The result is returned directly without going to the database. Very fast (sub-millisecond for in-process, 1-2ms for distributed cache).

Cache Miss: The requested data is NOT found in the cache. The system must fetch from the origin (database, API), optionally store in cache, then return. Slower (database latency applies).

Types of cache misses:

Cold Miss (Compulsory): First-ever access. Unavoidable.
Capacity Miss: Data was in cache but was evicted because the cache was full.
Conflict Miss: (CPU caches only) Two addresses map to the same cache set.

Cache Hit Ratio:

Hit Ratio = Hits / (Hits + Misses) x 100%

Example: 9,500 hits out of 10,000 requests = 95% hit ratio.
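A minimal sketch of tracking this in application code (HitRatioTracker is a hypothetical name; cache libraries such as Caffeine expose their own hit/miss statistics). The originLoad method also reproduces the Q1 arithmetic: at a 95% hit ratio, 1,000 reads/second leave 50 reads/second for the database.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper: counts hits/misses and derives the hit ratio and origin load.
class HitRatioTracker {
    private final LongAdder hits = new LongAdder();   // LongAdder scales well under contention
    private final LongAdder misses = new LongAdder();

    void recordHit()  { hits.increment(); }
    void recordMiss() { misses.increment(); }

    // Hit Ratio = Hits / (Hits + Misses) x 100%; 0 before any request is recorded
    double hitRatioPercent() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : 100.0 * h / total;
    }

    // Requests per second that still reach the origin at the current miss rate
    double originLoad(double requestsPerSecond) {
        return requestsPerSecond * (100.0 - hitRatioPercent()) / 100.0;
    }

    public static void main(String[] args) {
        HitRatioTracker t = new HitRatioTracker();
        for (int i = 0; i < 9_500; i++) t.recordHit();
        for (int i = 0; i < 500; i++) t.recordMiss();
        System.out.println(t.hitRatioPercent() + "% hit ratio, "
                + t.originLoad(1_000) + " DB reads/s at 1,000 reads/s");
    }
}
```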

What is good (rough rules of thumb; the right target depends on the workload):

< 80%: Investigate - the cache may not be helping much
80-90%: Acceptable
90-99%: Good production performance
> 99%: Excellent (achievable for static/reference data)

The hit ratio is usually the first metric to check when evaluating cache effectiveness.

Q3. What is TTL in caching? What is the difference between absolute TTL and sliding TTL?

Answer:

TTL (Time to Live) is the duration for which a cache entry is considered valid. After the TTL elapses, the entry expires. The next request for that key results in a cache miss, and fresh data is fetched from the source.

Absolute TTL (Fixed TTL):
The entry expires at a fixed time from when it was CREATED, regardless of how many times it is accessed.

Entry created at: T=0
Absolute TTL: 300 seconds
Entry expires at: T=300s (even if it was accessed at T=299s)

Use case: Product catalog data, configuration data.

Sliding TTL (Idle TTL / Expire After Access):
The TTL is reset every time the entry is ACCESSED. The entry only expires if it has not been accessed for the full TTL duration.

Sliding TTL: 300 seconds
Entry accessed at T=0, T=100, T=200 (resets each time)
Entry finally expires at T=500 (300 seconds after last access at T=200)

Use case: User session data (active users stay logged in, inactive users are timed out).

Key difference: Sliding TTL can keep hot entries alive indefinitely (a double-edged sword). Absolute TTL guarantees an entry is eventually refreshed regardless of access frequency.
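The two timelines above can be replayed with a small sketch that takes the current time as an explicit argument (in seconds) instead of reading a real clock. TtlCache and TtlMode are hypothetical names, and expiry is checked lazily on read, which is enough to show the difference between the two modes.

```java
import java.util.HashMap;
import java.util.Map;

enum TtlMode { ABSOLUTE, SLIDING }

// Minimal TTL cache with an explicit clock (seconds) so the timelines above can be replayed.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlSeconds;
    private final TtlMode mode;

    TtlCache(long ttlSeconds, TtlMode mode) {
        this.ttlSeconds = ttlSeconds;
        this.mode = mode;
    }

    void put(K key, V value, long now) {
        entries.put(key, new Entry<>(value, now + ttlSeconds));
    }

    V get(K key, long now) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        if (now >= e.expiresAt) {           // deadline reached: drop it and report a miss
            entries.remove(key);
            return null;
        }
        if (mode == TtlMode.SLIDING) {
            e.expiresAt = now + ttlSeconds; // sliding: every access pushes the deadline out
        }
        return e.value;                     // absolute: the deadline never moves
    }

    public static void main(String[] args) {
        TtlCache<String, String> sliding = new TtlCache<>(300, TtlMode.SLIDING);
        sliding.put("session:42", "alice", 0);
        sliding.get("session:42", 100);
        sliding.get("session:42", 200);
        System.out.println(sliding.get("session:42", 500)); // expired 300s after the T=200 access
    }
}
```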

Q4. What are the common cache eviction policies? Explain LRU.

Answer:

Cache eviction policies determine which entry is removed when the cache is full and a new entry must be added.

LRU - Least Recently Used (most common):
Evict the entry that was accessed (read or written) least recently. Assumes recently accessed data is likely to be accessed again (temporal locality).

Implementation: Doubly linked list + hash map = O(1) get and put.
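A minimal sketch of that structure: the hash map finds a node in O(1), and the doubly linked list keeps nodes ordered from most to least recently used, so promoting or evicting an entry is an O(1) pointer change. The class name is illustrative; in production Java code, java.util.LinkedHashMap with accessOrder = true (plus removeEldestEntry), or a library such as Caffeine, gives you this without hand-written pointers.

```java
import java.util.HashMap;
import java.util.Map;

class LruCache<K, V> {
    private final class Node {
        K key;
        V value;
        Node prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<K, Node> index = new HashMap<>();
    // Sentinels: head.next is the most recently used node, tail.prev the least recently used
    private final Node head = new Node(null, null);
    private final Node tail = new Node(null, null);

    LruCache(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    V get(K key) {
        Node n = index.get(key);
        if (n == null) return null;    // miss
        moveToFront(n);                // a read counts as a use
        return n.value;
    }

    void put(K key, V value) {
        Node n = index.get(key);
        if (n != null) {               // update in place; a write also counts as a use
            n.value = value;
            moveToFront(n);
            return;
        }
        if (index.size() == capacity) { // full: evict the least recently used entry
            Node lru = tail.prev;
            unlink(lru);
            index.remove(lru.key);
        }
        n = new Node(key, value);
        index.put(key, n);
        addFirst(n);
    }

    int size() { return index.size(); }

    private void unlink(Node n) {
        n.prev.next = n.next;
        n.next.prev = n.prev;
    }

    private void addFirst(Node n) {
        n.next = head.next;
        n.prev = head;
        head.next.prev = n;
        head.next = n;
    }

    private void moveToFront(Node n) {
        unlink(n);
        addFirst(n);
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");                      // "a" becomes most recently used
        cache.put("c", 3);                   // full: evicts "b", the least recently used
        System.out.println(cache.get("b"));  // null
    }
}
```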

LFU - Least Frequently Used:
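Evict the entry with the lowest access count. Better than LRU for skewed workloads (power-law access distribution). Susceptible to "frequency bias": entries that were popular in the past keep high counts and resist eviction after they go cold (implementations often age or decay the counts).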

FIFO - First In, First Out:
Evict the oldest entry by insertion time. Simple but ignores access patterns. Poor performance for most workloads.

MRU - Most Recently Used:
Evict the most recently accessed entry. Useful for streaming or scan workloads where recent data is least likely to be re-accessed.

Random Replacement:
Evict a random entry. Simple, no overhead, surprisingly good for uniform access patterns.

ARC - Adaptive Replacement Cache:
Self-tuning combination of LRU and LFU. Maintains ghost lists to observe workload and adjust the balance dynamically. Used in ZFS.

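Redis eviction policies (configured via maxmemory-policy; the most commonly used ones):

allkeys-lru: Evict any key using LRU (use for pure cache deployments)
volatile-lru: Evict only keys with TTL set, using LRU
allkeys-lfu: Evict any key using LFU
noeviction: Return errors on writes once memory is full (the default; use when Redis is the primary DB)

Redis also offers volatile-lfu, allkeys-random, volatile-random, and volatile-ttl. Its LRU and LFU are approximations: Redis samples a few keys (maxmemory-samples) and evicts the best candidate rather than tracking an exact ordering.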

Q5. What is cache invalidation? Why is it considered hard?

Answer:

Cache invalidation is the process of removing or marking as stale a cache entry when the underlying source data changes. It is about correctness (the cache should not serve outdated data) rather than capacity management (eviction).

Why it is hard (Phil Karlton: "There are only two hard things in CS: cache invalidation and naming things"):

Race conditions between read and write paths:

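Thread A misses the cache and reads the old value from the DB -> Thread B writes the new value to the DB + deletes the cache entry ->
Thread A writes the old value to the cache -> Cache now has stale data (until TTL expiry or the next invalidation)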

Distributed system complexity: In microservices, multiple services may cache the same data. When data changes, all service caches must be invalidated. Doing this reliably across process boundaries usually requires a messaging system (pub/sub, Kafka).
Partial failures: Invalidation might succeed in deleting from Redis but fail to reach Service B's local cache. Now two caches have different views.
Consistency vs availability trade-off: Ensuring strong consistency (cache always matches DB) requires synchronous operations that reduce availability and increase latency.
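The first race above is easy to misread in prose, so here is a deterministic replay of it, with plain maps standing in for the database and the cache (the class and key names are illustrative). Running the steps in this order leaves the cache holding "old" while the database holds "new", and nothing corrects it until the entry expires or is invalidated again.

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of the read/write race above; maps stand in for the DB and the cache.
class InvalidationRace {
    static String replay() {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        db.put("user:1", "old");

        // Thread A: cache miss, so it reads the current value from the DB...
        String readByA = cache.get("user:1");
        if (readByA == null) readByA = db.get("user:1");

        // ...but before A can populate the cache, Thread B writes and invalidates.
        db.put("user:1", "new");
        cache.remove("user:1");

        // Thread A resumes and caches the value it read earlier.
        cache.put("user:1", readByA);

        return "db=" + db.get("user:1") + ", cache=" + cache.get("user:1");
    }

    public static void main(String[] args) {
        System.out.println(replay()); // db=new, cache=old
    }
}
```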

Invalidation strategies:

TTL-based (passive): Data self-heals after expiry. Simple, eventual consistency.
Write-invalidate (active): Delete cache entry on write. Next read re-populates.
Write-update (active): Update cache entry on write. Avoids the next read's miss, but concurrent writers can leave the cache holding the older value if their cache updates land out of order.
Event-driven (CDC): Database change events trigger cache invalidation via Kafka/message bus.

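Best practice: Prefer Write-Invalidate (delete on write) over Write-Update in most cases. Deletes are idempotent and avoid the out-of-order update problem between concurrent writers, but they do not remove the read/write race shown above; a TTL on every entry bounds how long such a stale value can survive.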

Q6. What is the difference between a cache and a database?

Answer:

Characteristic    Cache                         Database
--------------    -----                         --------
Purpose           Temporary storage for speed   Permanent storage, source of truth
Durability        Not guaranteed (volatile)     Guaranteed (ACID transactions)
Data lifetime     Short (TTL, eviction)         Permanent (until explicitly deleted)
Size              Limited (expensive RAM)       Large (cheap disk)
Access latency    Sub-millisecond to 2ms        10ms to 100ms+
Consistency       Eventual (by design)          Strong (ACID)
Failure impact    Performance degradation       Data loss
Query complexity  Simple key-value lookups      Complex SQL, joins, aggregations
Data completeness Partial (hot subset only)     Complete

Key insight: A cache is a performance optimization layer, not a data store. You should always be able to rebuild the cache entirely from the database. If the cache is lost, data is NOT lost.

Redis as both: Redis can serve as both a cache (volatile, TTL-based) and a database (persistent, with AOF/RDB). This dual role requires careful configuration:

Pure cache: maxmemory-policy allkeys-lru, no persistence required
Cache + DB: maxmemory-policy volatile-lru (only evict keys with TTL), enable persistence

Q7. What is the difference between an in-process cache and a distributed cache?

Answer:

In-Process Cache (Local Cache):

Lives inside the application process (same JVM heap or process memory)
Examples: Caffeine, Guava Cache, .NET MemoryCache
Access time: ~0.01ms to 0.1ms (no network)
Scope: Private to one application instance
Not shared between instances
Lost when application restarts
Risk: Large cache increases GC pressure (for JVM)

Distributed Cache (Remote Cache):

Runs as a separate service accessed over the network
Examples: Redis, Memcached, Hazelcast
Access time: 1-5ms over local network
Scope: Shared across all application instances
Survives application restarts (it is a separate process); with persistence enabled, data can also survive cache-server restarts
Scales independently

When to use each:

Use Case	In-Process	Distributed
Single application instance	Yes	Unnecessary
Multiple instances, shared state needed	No (incoherent)	Yes
Data must survive app restart	No	Yes
User session in clustered deployment	No	Yes
Reference data (country codes)	Yes	Either
Rate limiting counters	No (each instance counts separately)	Yes

Best practice for high-traffic systems: Use BOTH in a tiered (L1 + L2) architecture. L1 (in-process) catches the hottest traffic with sub-millisecond latency. L2 (Redis) provides shared state for L1 misses.

Q8. What are the main caching strategies?

Answer:

There are five primary caching strategies, plus Refresh-Ahead as a complementary technique. Each suits different read/write patterns.

1. Cache-Aside (Lazy Loading):

Application checks cache. On miss: app queries DB, stores in cache, returns data.
Write: App writes to DB, then deletes (invalidates) cache entry.
Most common strategy. Application manages both cache and DB.
Pros: Resilient (works without cache), only caches requested data.
Cons: Cache miss is always a slow path. Risk of stale data.

2. Read-Through:

Application only talks to cache. Cache queries DB on miss transparently.
Cleaner separation. Cache library handles DB fallback.
Cons: Tighter coupling between cache and DB.

3. Write-Through:

Every write goes to BOTH cache AND DB synchronously.
Cache is always in sync after writes. No stale data for written entries.
Cons: Higher write latency (both cache + DB must complete). Cache pollution (written data may never be read).

4. Write-Behind (Write-Back):

Write to cache only. Cache asynchronously writes to DB later.
Lowest write latency.
Cons: Risk of data loss if cache fails before async write.

5. Write-Around:

Writes bypass cache entirely, go directly to DB.
Reads use Cache-Aside on subsequent access.
Good for write-once, read-rarely data.

Refresh-Ahead (Prefetching):

Cache proactively refreshes entries before they expire.
No cache miss latency for hot keys.
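A minimal Refresh-Ahead sketch, assuming an injectable clock and executor so it can be exercised deterministically (RefreshAheadCache and its parameter names are hypothetical). Reads after refreshAfterMs but before ttlMs return the current value immediately and trigger one background reload; only entries that stop being read actually expire. Caffeine's refreshAfterWrite builder option provides this behavior for a LoadingCache.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

class RefreshAheadCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMs;
        Entry(V value, long loadedAtMs) { this.value = value; this.loadedAtMs = loadedAtMs; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet(); // at most one reload per key
    private final Function<K, V> loader;
    private final long ttlMs;
    private final long refreshAfterMs; // should be smaller than ttlMs
    private final LongSupplier clock;
    private final Executor executor;

    RefreshAheadCache(Function<K, V> loader, long ttlMs, long refreshAfterMs,
                      LongSupplier clock, Executor executor) {
        this.loader = loader;
        this.ttlMs = ttlMs;
        this.refreshAfterMs = refreshAfterMs;
        this.clock = clock;
        this.executor = executor;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAtMs >= ttlMs) {
            return load(key);                 // cold or fully expired: synchronous load (a real miss)
        }
        if (now - e.loadedAtMs >= refreshAfterMs && refreshing.add(key)) {
            executor.execute(() -> {          // close to expiry: reload in the background
                try {
                    load(key);
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return e.value;                       // serve the current (still valid) value right away
    }

    private V load(K key) {
        V value = loader.apply(key);
        entries.put(key, new Entry<>(value, clock.getAsLong()));
        return value;
    }
}
```

In production the executor would be a thread pool; the test-friendly Runnable::run executor simply runs the reload inline.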

Q9. What is a distributed cache? Give examples.

Answer:

A distributed cache is a cache system that runs as a service separate from the application and is shared across multiple application instances. The cache data is partitioned across multiple nodes for scale and replicated for fault tolerance.

Examples:

Redis: Most popular. Rich data structures, persistence, clustering, pub/sub.
Memcached: Simpler, pure key-value, multi-threaded. Older, less feature-rich.
Hazelcast: Java-native distributed computing platform with caching, near-cache.
Apache Ignite: In-memory computing platform with distributed SQL, cache, compute.
Couchbase: Distributed NoSQL database with built-in caching layer.
AWS ElastiCache: Managed service for Redis or Memcached.
Azure Cache for Redis: Managed Redis on Azure.

Key properties of a distributed cache:

Shared state across all application instances
Scales horizontally (add nodes to increase capacity)
High availability (replication, automatic failover)
Data partitioned across nodes (consistent hashing or hash slots)
Network access (1-5ms latency vs sub-millisecond for in-process)

Q10. How do you implement caching in Spring Boot?

Answer:

Spring Boot provides a declarative caching abstraction with four key annotations:

Step 1: Enable caching

@SpringBootApplication
@EnableCaching
public class Application { ... }

@Cacheable - Cache-Aside for reads (most used):

@Cacheable(value = "products", key = "#productId")
public Product getProduct(Long productId) {
    // Method only executes on cache miss. Result is stored in cache.
    return productRepository.findById(productId).orElseThrow();
}

@CachePut - Write-Through for writes:

@CachePut(value = "products", key = "#result.id")
public Product updateProduct(Product product) {
    // Always executes AND updates the cache with the returned value. Use for write-through.
    return productRepository.save(product);
}

@CacheEvict - Invalidation:

@CacheEvict(value = "products", key = "#productId")
public void deleteProduct(Long productId) {
    // After the method completes successfully, the cache entry is removed.
    productRepository.deleteById(productId);
}

Choose the cache provider in application.yml:

spring:
  cache:
    type: caffeine # or: redis, jcache (e.g. Ehcache 3), simple (in-memory map)
    caffeine:
      spec: maximumSize=10000,expireAfterWrite=600s

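Important nuance: @Cacheable(sync = true) mitigates Cache Stampede by ensuring only one thread executes the method on a miss for a given key; other threads wait for the result. This coordination happens within a single application instance (and depends on the cache provider supporting it); it does not stop several instances from loading the same key at the same time.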
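To make the idea behind sync = true concrete, here is a minimal single-flight sketch (not Spring's implementation; SingleFlightCache is a hypothetical name). Concurrent callers for the same key share one CompletableFuture, so the loader runs once while the others wait on its result. Like sync = true, it only coordinates threads inside one JVM, and it never expires entries, which a real cache would add.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical single-flight cache: one loader call per key, shared by all concurrent callers.
class SingleFlightCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = entries.putIfAbsent(key, mine); // atomic: only one caller wins
        if (existing != null) {
            return existing.join();        // hit, or another thread is loading: wait for its result
        }
        try {
            V value = loader.apply(key);   // only the winning thread reaches the loader
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            entries.remove(key, mine);     // let a later call retry instead of caching the failure
            mine.completeExceptionally(e); // waiting callers see the failure too
            throw e;
        }
    }

    public static void main(String[] args) {
        SingleFlightCache<Long, String> cache = new SingleFlightCache<>(id -> "product-" + id);
        System.out.println(cache.get(1L));
    }
}
```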

Q11. What is the difference between Redis and Memcached?

Answer:

Feature               Redis                      Memcached
-------------------   ---------                  ----------
Data Types            String, Hash, List, Set,   String only
                      ZSet, Bitmap, HLL, Stream
Persistence           Yes (RDB + AOF)            No
Replication           Yes (Master-Replica)       No
High Availability     Sentinel, Cluster          Client-side only
Multi-threading       Partially (I/O layer)      Fully multi-threaded
Transactions          Yes (MULTI/EXEC)           No
Pub/Sub               Yes                        No
Scripting             Lua scripts                No
Max Value Size        512 MB                     1 MB (default, configurable)
Max Key Size          512 MB                     250 bytes
Memory Management     Multiple encodings         Slab allocator
Cluster/Sharding      Built-in (hash slots)      Client-side

When to choose Memcached:

Pure string key-value caching with no other needs
Very high multi-core CPU utilization needed
Existing Memcached deployment already in place

When to choose Redis (default for new projects):

Any need for data structures beyond strings
Need persistence (cache survives restart)
Need replication and automatic failover
Need pub/sub, streams, transactions, Lua scripting
Any modern new project: Choose Redis

Q12. How does Redis persist data?

Answer:

Redis is an in-memory data store but supports two persistence mechanisms:

RDB (Redis Database Backup) - Snapshots:

Redis periodically takes a point-in-time snapshot of all data and writes it to a .rdb file on disk.
Uses BGSAVE: forks a child process that writes the snapshot while the parent continues serving requests.
Configured with: save 60 1000 (save every 60 seconds if at least 1,000 keys changed).
Pros: Fast startup (load from compact binary file), good for backups.
Cons: Data loss between snapshots (up to the configured interval).

AOF (Append-Only File):

Every write command is appended to an AOF file. On restart, Redis replays the file.
fsync policy: always (safest, slowest), everysec (default, at most 1 second loss), no (OS decides, fastest).
Auto-rewrite: Compacts the AOF by rewriting equivalent history (100 INCRs -> 1 SET).
Pros: Much less data loss. With fsync always, effectively no acknowledged writes are lost (at a large throughput cost).
Cons: Larger file, slower restart (replay vs binary load).

Combined (recommended for production):

# Use AOF for durability
appendonly yes
# Keep RDB snapshots for fast restart and backups
save 3600 1

On restart, if AOF is enabled, Redis loads the AOF (the more complete record); otherwise it loads the RDB snapshot.

Section 2: Caching Patterns and Strategy Design

Q13. Compare Cache-Aside and Read-Through. When would you use each?

Answer:

Cache-Aside (Lazy Loading):

The APPLICATION is responsible for checking the cache, fetching from DB on miss, and populating the cache.
Application interacts with BOTH cache and DB independently.

User user = cache.get("user:123");
if (user == null) {
    user = db.findUser(123);      // App calls DB
    cache.set("user:123", user);  // App populates cache
}
return user;

Read-Through:

The APPLICATION only interacts with the CACHE.
The CACHE is responsible for fetching from DB on miss (via a configured CacheLoader).

// App only calls cache - DB logic is in the CacheLoader
User user = userCache.get(123L);  // Cache calls DB on miss automatically

When to use Cache-Aside:

When different methods need different caching logic (some cached, some not)
When cache resilience is important (app still works without cache)
When using Spring @Cacheable (which implements Cache-Aside internally)
Default choice for most situations

When to use Read-Through:

When you want clean separation: application has no database awareness
When using Caffeine LoadingCache or Guava CacheLoader
When the same data is accessed identically from many code locations

Key difference: In Cache-Aside, if the cache is down, the application falls back to the database. In Read-Through, if the cache is down, the application may fail (unless the CacheLoader has its own fallback).
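A compact sketch of the two styles side by side, using plain maps and a Function as a hypothetical stand-in for the user repository (class names are illustrative). The Cache-Aside version treats cache errors as misses, which is what lets it degrade to the database when the cache is down; the Read-Through version delegates the miss path to its loader via ConcurrentHashMap.computeIfAbsent, roughly how a LoadingCache hides the database from the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-Aside: the service owns the miss path and talks to both cache and DB.
class CacheAsideUserService {
    private final Map<Long, String> cache;
    private final Function<Long, String> db; // hypothetical stand-in for db.findUser(id)

    CacheAsideUserService(Map<Long, String> cache, Function<Long, String> db) {
        this.cache = cache;
        this.db = db;
    }

    String getUser(long id) {
        String user = null;
        try {
            user = cache.get(id);
        } catch (RuntimeException cacheDown) {
            // Cache unavailable: treat it as a miss and fall through to the DB
        }
        if (user == null) {
            user = db.apply(id);
            try {
                if (user != null) cache.put(id, user); // best-effort populate
            } catch (RuntimeException cacheDown) {
                // Ignore: the caller still gets the DB result
            }
        }
        return user;
    }
}

// Read-Through: callers only see get(); the cache invokes the loader on a miss.
class ReadThroughCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        return store.computeIfAbsent(key, loader); // loader runs only when the key is absent
    }
}
```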

Q14. Compare Write-Through and Write-Behind caching.

Answer:

Write-Through:

Write goes to cache AND database SYNCHRONOUSLY.
Write is not acknowledged until BOTH cache and DB confirm.
Cache and database are always consistent after a write.
Higher write latency (must wait for DB).

Write-Behind (Write-Back):

Write goes to cache only. Acknowledgment is sent immediately.
Cache asynchronously writes to DB in the background (batched or delayed).
Lower write latency (no DB wait).
Risk of data loss: if cache crashes before async write, data is lost.

Write-Through:
App --> Cache --> DB --> ACK   (serial, both must succeed)
Latency: Cache latency + DB latency

Write-Behind:
App --> Cache --> ACK          (fast!)
              [background] --> DB  (async, could fail)
Latency: Cache latency only

Choose Write-Through when:

Write consistency is critical
Loss of writes is unacceptable
Write latency of a few extra milliseconds is acceptable
Financial data, user authentication

Choose Write-Behind when:

Write throughput is the bottleneck
Application can tolerate some data loss (analytics, counters, draft saves)
Database cannot keep up with write rate
Gaming, IoT sensor data ingestion
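A minimal write-behind sketch (WriteBehindCache is a hypothetical name, and a BiConsumer stands in for the real database write). Writes are acknowledged as soon as the cache is updated, repeated writes to the same key are coalesced, and nothing reaches the database until flush() runs, which is exactly the window in which a crash loses data.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Minimal write-behind sketch: writes are acknowledged from the cache and flushed to the DB later.
class WriteBehindCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Map<K, V> dirty = new LinkedHashMap<>(); // pending DB writes, one per key
    private final BiConsumer<K, V> dbWriter;               // hypothetical DB write function

    WriteBehindCache(BiConsumer<K, V> dbWriter) { this.dbWriter = dbWriter; }

    synchronized void put(K key, V value) {
        cache.put(key, value);
        dirty.put(key, value); // repeated writes to a key coalesce into one DB write
    }

    synchronized V get(K key) { return cache.get(key); }

    synchronized int pendingWrites() { return dirty.size(); }

    // In a real system a background thread calls this on a timer or when the batch is full.
    int flush() {
        Map<K, V> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(dirty);
            dirty.clear();
        }
        // If the process dies before or during this loop, these writes are lost:
        // the data-loss risk of write-behind in miniature.
        batch.forEach(dbWriter);
        return batch.size();
    }
}
```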

Q15. What is consistent hashing? Why is it needed for distributed caches?

Answer:

The problem with naive hashing:

With node = hash(key) % N (N = number of nodes):

When you add node 4 to a 3-node cluster, N changes from 3 to 4.
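Most keys now map to a different node: with uniformly distributed hashes, a key stays put only when hash % 3 == hash % 4, which holds for 1 key in 4.
So 75% of keys are now on the "wrong" node = massive cache miss storm during rebalancing.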

Consistent hashing solution:

Arrange nodes on a virtual ring (hash space 0 to 2^32 - 1, wrapping around).

Each node is placed at a hash-computed position on the ring.
To find the node for a key: hash the key, walk clockwise on the ring to find the next node.

When adding/removing a node:

Only the keys between the removed/added node and its predecessor are remapped.
All other keys are unaffected.
Impact: only ~1/N of the keys are remapped (vs 75% in the 3-to-4 example above, approaching 100% as N grows with naive hashing).

Virtual nodes (Vnodes):
With few physical nodes, the distribution may be uneven. Virtual nodes create multiple ring positions per physical node (typically 150-300 virtual nodes per node), ensuring statistically uniform distribution.
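A minimal ring sketch, using a TreeMap as the ring and the first 4 bytes of MD5 as a 32-bit position (MD5 is used only to spread positions evenly, not for security; the node names and the vnode count of 100 are illustrative). Lookup is "first virtual node clockwise", with a wrap-around to the start of the ring.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // position -> physical node
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("ring is empty");
        // First virtual node clockwise from the key's position; wrap to the start of the ring
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // First 4 bytes of MD5 as an unsigned 32-bit ring position (0 .. 2^32 - 1)
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                    | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        ring.addNode("cache-a");
        ring.addNode("cache-b");
        ring.addNode("cache-c");
        System.out.println("user:123 -> " + ring.nodeFor("user:123"));
    }
}
```

When "cache-d" joins, a key can only change owner if one of cache-d's virtual nodes now sits between the key and its old owner, so every remapped key moves to the new node and no key shuffles between existing nodes.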

Used in:

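Redis Cluster (a related but different scheme: keys map to 16,384 fixed hash slots via CRC16(key) mod 16384, and slots are assigned to nodes, so resharding moves whole slots rather than walking a ring)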
Cassandra (virtual nodes)
DynamoDB (consistent hashing with virtual nodes)
Memcached client libraries

Q16. What is multi-level caching? Give a real-world example.

Answer:

Multi-level caching uses a hierarchy of caches at different speeds and sizes to maximize performance while managing cost.

Architecture:

Request -> L1 (In-process, ~0.01ms) -> L2 (Redis, ~2ms) -> L3 (DB, ~20ms)

Illustrative example: an e-commerce product detail page (the numbers below are illustrative, not measured)

L1 - In-process cache (Caffeine): Stores the top 1,000 products in each application instance's heap. TTL: 30 seconds. Hit rate: ~70% of all requests.
L2 - Distributed cache (Redis): Stores the top 10 million products, shared across all instances. TTL: 10 minutes. Hit rate: ~95% of L1 misses.
L3 - Database (MySQL InnoDB): Source of truth. Only hit for items found in neither L1 nor L2 (~5% of L1 misses after cache warmup).

Benefits:

70% of requests served at ~0.01ms (L1 hit)
28.5% of requests served at ~2ms (L2 hit: 95% of the remaining 30%)
1.5% of requests go to the database (5% of 30%), so the database receives only 1.5% of total traffic

Java implementation sketch:

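// Assumes: localCache is a Caffeine Cache<Long, Product>,
// redisTemplate is a RedisTemplate<String, Product>
public Product getProduct(Long id) {
    String redisKey = "product:" + id;

    // L1: in-process (Caffeine)
    Product p = localCache.getIfPresent(id);
    if (p != null) return p;

    // L2: shared (Redis)
    p = redisTemplate.opsForValue().get(redisKey);
    if (p != null) {
        localCache.put(id, p); // promote to L1
        return p;
    }

    // L3: source of truth (DB), then populate both cache levels
    p = productRepository.findById(id).orElseThrow();
    redisTemplate.opsForValue().set(redisKey, p, Duration.ofMinutes(10));
    localCache.put(id, p);
    return p;
}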

Section 3: Redis Deep Dive

Q17. What Redis data structures would you use for the following use cases?

Leaderboard (top players by score):
Use Sorted Set (ZSet).

ZADD leaderboard 9850 "player:alice"
ZADD leaderboard 8200 "player:bob"
ZREVRANGE leaderboard 0 9 WITHSCORES   # Top 10
ZREVRANK leaderboard "player:alice"     # Alice's 0-based rank (highest score = rank 0)

Shopping cart:
Use Hash. One hash per user, product ID as field, quantity as value.

HSET cart:user:123 product:456 2    # User 123 has 2 of product 456
HSET cart:user:123 product:789 1
HGETALL cart:user:123               # Get entire cart

Rate limiting:
Use String with INCR and EXPIRE.

INCR ratelimit:user:123:2026-06-05-14:00   # Increment counter for current minute
EXPIRE ratelimit:user:123:2026-06-05-14:00 60

Unique page views:
Use HyperLogLog for approximate counts (memory efficient).

PFADD page:home:visitors:2026-06-05 "user:123" "user:456"
PFCOUNT page:home:visitors:2026-06-05   # Approx unique visitors

User online status (millions of users):
Use Bitmap. One bit per user ID (works best when user IDs are dense integers, since the ID is the bit offset).

SETBIT online:users 123 1    # User 123 is online
GETBIT online:users 123      # 1 (online)
BITCOUNT online:users        # Count of online users

Activity feed / timeline:
Use List. LPUSH new items, LTRIM to maintain last N.

LPUSH user:123:feed "post:789"
LTRIM user:123:feed 0 99    # Keep last 100 items
LRANGE user:123:feed 0 9    # Get 10 most recent

Q18. How does Redis expire keys?

Answer:

Redis uses a two-pronged approach to key expiration:

1. Lazy Expiration (Passive):
When a key is accessed, Redis checks if it has an expiry set and if it has passed. If expired, Redis deletes the key and returns a miss to the caller.

Pro: No overhead for unaccessed keys
Con: Expired keys that are never accessed again are not cleaned up immediately

2. Active Expiration (Background Task):
Redis runs a background task (every 100ms by default) that:

Randomly samples 20 keys from the set of keys with TTLs.
Deletes any that have expired.
If more than 25% of the sampled keys were expired, repeat immediately (indicating many expirations are happening).
Otherwise, wait for the next cycle.

This probabilistic approach ensures expired keys are eventually cleaned without requiring a full scan.
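A simplified, single-threaded simulation of the two mechanisms, for illustration only: it follows the sample-20 / repeat-above-25% rule described above, but real Redis also caps the time each cycle may spend, and its internals differ between versions. The class name and the explicit millisecond clock are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Simplified model of Redis lazy + active expiration (not Redis's actual code).
class ExpiringStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();   // keys that have a TTL
    private final List<String> ttlKeys = new ArrayList<>();        // same keys, indexable for sampling
    private final Map<String, Integer> position = new HashMap<>(); // key -> index in ttlKeys
    private final Random random;

    ExpiringStore(long seed) { this.random = new Random(seed); }

    void setWithTtl(String key, String value, long expireAtMs) {
        values.put(key, value);
        if (expiresAt.put(key, expireAtMs) == null) { // first TTL for this key: make it sampleable
            position.put(key, ttlKeys.size());
            ttlKeys.add(key);
        }
    }

    // Lazy (passive) expiration: checked only when the key is read.
    String get(String key, long nowMs) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowMs >= deadline) {
            delete(key);
            return null;
        }
        return values.get(key);
    }

    // Active expiration: sample up to 20 TTL keys, delete the expired ones,
    // and repeat while more than 25% of the sample had expired.
    int activeExpireCycle(long nowMs) {
        int deleted = 0;
        while (!ttlKeys.isEmpty()) {
            int sampleSize = Math.min(20, ttlKeys.size());
            int expired = 0;
            for (int i = 0; i < sampleSize; i++) {
                String key = ttlKeys.get(random.nextInt(ttlKeys.size()));
                if (nowMs >= expiresAt.get(key)) {
                    delete(key);
                    expired++;
                }
            }
            deleted += expired;
            if (expired * 4 <= sampleSize) break; // 25% or less expired: wait for the next cycle
        }
        return deleted;
    }

    int size() { return values.size(); }

    private void delete(String key) {
        values.remove(key);
        if (expiresAt.remove(key) == null) return;
        // O(1) removal from ttlKeys: move the last key into the freed slot
        int idx = position.remove(key);
        String last = ttlKeys.remove(ttlKeys.size() - 1);
        if (!last.equals(key)) {
            ttlKeys.set(idx, last);
            position.put(last, idx);
        }
    }

    public static void main(String[] args) {
        ExpiringStore store = new ExpiringStore(42);
        for (int i = 0; i < 1_000; i++) store.setWithTtl("old:" + i, "v", 100);
        for (int i = 0; i < 100; i++) store.setWithTtl("live:" + i, "v", 10_000);
        int deleted = store.activeExpireCycle(200);
        System.out.println(deleted + " expired keys reclaimed, " + store.size() + " keys still in memory");
    }
}
```

Because the loop stops as soon as a sample is mostly live, some expired keys can survive a cycle, which is the memory effect described in the interview point below.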

Commands:

EXPIRE key 300        # Set TTL in seconds
PEXPIRE key 300000    # Set TTL in milliseconds
EXPIREAT key 1893456000   # Set absolute Unix timestamp expiry
TTL key               # Remaining seconds (-1=no expiry, -2=key missing)
PERSIST key           # Remove TTL (make key permanent)

Important interview point: Keys can remain in memory for a short time after their TTL expires. Each active expiration cycle is time-bounded, so when many keys expire at once, cleanup can lag behind. This means Redis memory usage can temporarily exceed what you calculate from TTLs alone.

Q19. What is Redis Sentinel vs Redis Cluster?

Answer:

Both provide high availability but solve different problems:

Redis Sentinel:

High availability solution for a SINGLE Redis dataset (single shard).
Does NOT scale data capacity. All data is on one master.
Provides: Monitoring, automatic failover, and configuration provider to clients.
Setup: 3+ Sentinel processes watch the master and replicas.
Failover: When quorum of Sentinels agree master is down, a replica is promoted.
Use when: Dataset fits on one server. HA is the goal, not horizontal scaling.

Redis Cluster:

Provides BOTH sharding (horizontal scaling) AND high availability.
Data is split across multiple master nodes (16,384 hash slots distributed).
Each master can have replicas for HA.
Automatically handles failover for individual shards.
Use when: Dataset exceeds single server capacity OR write throughput exceeds single master capacity.

Sentinel:                      Cluster:
[Master]                       [Master A (slots 0-5460)] + [Replica A]
[Replica 1]       vs           [Master B (slots 5461-10922)] + [Replica B]
[Replica 2]                    [Master C (slots 10923-16383)] + [Replica C]
   [Sentinel 1/2/3]            (Built-in HA per shard)

Key differences:

Aspect	Sentinel	Cluster
Sharding	No	Yes (16,384 hash slots)
Scale	Single master capacity	Multi-master horizontal scale
Setup complexity	Medium	High
Client requirement	Sentinel-aware client	Cluster-aware client
Multi-key ops	Always work	Only if all keys hash to the same slot (use hash tags)
Use case	HA for small-medium datasets	Large datasets + HA
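Redis Cluster assigns each key to slot CRC16(key) mod 16384 (the XMODEM variant of CRC16), and a multi-key command only works when every key maps to the same slot. Hash tags force that: if a key contains {...} with a non-empty body, only the text inside the first pair of braces is hashed. A sketch of that rule follows (KeySlot is a hypothetical class name; against a live cluster, CLUSTER KEYSLOT <key> returns the slot from the server).

```java
import java.nio.charset.StandardCharsets;

class KeySlot {
    // CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {        // non-empty hash tag: hash only the tag
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // Both keys hash only "user:1000", so a multi-key command on them is allowed
        System.out.println(slot("{user:1000}.following") == slot("{user:1000}.followers"));
    }
}
```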

Q20. What is Redis pipelining? Why is it useful?

Answer:

Redis pipelining sends multiple commands to Redis in a single network round trip, without waiting for each reply before sending the next command. Redis executes the commands in order, and the client reads all the replies afterwards.

Without pipelining (3 round trips):

Client -> SET key1 value1 -> Redis -> OK  (1 round trip)
Client -> SET key2 value2 -> Redis -> OK  (1 round trip)
Client -> SET key3 value3 -> Redis -> OK  (1 round trip)
Total: 3 x 1ms = 3ms (not counting command execution time)

With pipelining (1 round trip):

Client -> [SET key1 value1, SET key2 value2, SET key3 value3] -> Redis -> [OK, OK, OK]
Total: 1 x 1ms = 1ms

Performance impact: For 100 commands over a 1ms network:

Without pipeline: 100 x 1ms = 100ms
With pipeline: 1ms + execution time

Spring Data Redis example:

List<Object> results = redisTemplate.executePipelined(connection -> {
    connection.set("key1".getBytes(), "val1".getBytes());
    connection.set("key2".getBytes(), "val2".getBytes());
    connection.hSet("user:123".getBytes(), "name".getBytes(), "Alice".getBytes());
    return null; // Must return null
});

Pipelining vs Transactions (MULTI/EXEC):

Pipelining: Performance optimization (batch network I/O). No atomicity guarantee.
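MULTI/EXEC: Atomicity guarantee: queued commands run together at EXEC with no other client's commands interleaved (there is no rollback if one command fails). Clients often pipeline MULTI...EXEC to also save round trips.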

Q21. What is a Redis distributed lock? How do you implement it?

Answer:

A distributed lock using Redis prevents multiple application instances from executing a critical section concurrently. Redis's atomic SET NX EX command is the foundation.

The basic pattern:

String lockKey = "lock:resource:order_processing";
String lockValue = UUID.randomUUID().toString(); // Unique token per lock holder
 
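// Acquire lock (atomic SET if Not eXists with EXpiry).
// Assumes redisTemplate is a StringRedisTemplate, so the stored value compares
// equal to lockValue inside the Lua release script below.
Boolean acquired = redisTemplate.opsForValue()
        .setIfAbsent(lockKey, lockValue, Duration.ofSeconds(10));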
 
if (Boolean.TRUE.equals(acquired)) {
    try {
        // Critical section - only one instance executes this
        processOrder(orderId);
    } finally {
        // Release lock ONLY if we own it (prevent releasing someone else's lock)
        // Use Lua script for atomic check-and-delete
        String releaseScript =
            "if redis.call('GET', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('DEL', KEYS[1]) " +
            "else " +
            "  return 0 " +
            "end";
        redisTemplate.execute(
            new DefaultRedisScript<>(releaseScript, Long.class),
            List.of(lockKey),
            lockValue
        );
    }
}

Critical safety details:

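Always set an expiry (EX): Prevents deadlock if the lock holder crashes. Pick a TTL longer than the critical section normally takes: if the holder runs past it, the lock silently expires and another instance can acquire it.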
Use a unique value per lock holder: Prevents one instance from releasing another's lock.
Lua script for release: Makes the check-and-delete atomic (no race condition).
Do NOT use SET key + EXPIRE key separately: Race condition exists between these two commands.
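The token rule is easiest to see in a scenario where the lock expires mid-work. Below is an in-memory stand-in for the two Redis operations (FakeLockStore and its methods are hypothetical; they mimic SET key token NX PX ttl and the GET-compare-DEL Lua script, not a real client API). Instance A stalls past its TTL, B acquires the lock, and A's late release is rejected instead of deleting B's lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the two Redis operations used by the lock above.
class FakeLockStore {
    private static final class Held {
        final String token;
        final long expiresAtMs;
        Held(String token, long expiresAtMs) { this.token = token; this.expiresAtMs = expiresAtMs; }
    }

    private final Map<String, Held> locks = new HashMap<>();

    // Mimics SET key token NX PX ttlMs: succeeds only if no unexpired lock exists.
    synchronized boolean tryAcquire(String key, String token, long ttlMs, long nowMs) {
        Held current = locks.get(key);
        if (current != null && nowMs < current.expiresAtMs) return false;
        locks.put(key, new Held(token, nowMs + ttlMs));
        return true;
    }

    // Mimics the Lua release: delete only if the key still exists and holds our token.
    synchronized boolean release(String key, String token, long nowMs) {
        Held current = locks.get(key);
        if (current == null || nowMs >= current.expiresAtMs || !current.token.equals(token)) {
            return false;
        }
        locks.remove(key);
        return true;
    }

    public static void main(String[] args) {
        FakeLockStore redis = new FakeLockStore();
        String key = "lock:resource:order_processing";
        redis.tryAcquire(key, "token-A", 10_000, 0);                       // A acquires, then stalls
        boolean bGotIt = redis.tryAcquire(key, "token-B", 10_000, 12_000); // A's TTL passed: B acquires
        boolean aReleased = redis.release(key, "token-A", 13_000);         // A wakes up and releases
        System.out.println("B acquired: " + bGotIt + ", A's release accepted: " + aReleased);
    }
}
```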

Production-ready: For Java, use a mature library such as Redisson, whose RLock adds reentrancy and automatic lease renewal (a watchdog that extends the TTL while the holder is alive) and handles many edge cases for you.

Section 4: Distributed Systems and Scale

Q22. How do you handle cache invalidation in a microservices architecture?

Answer:

Cache invalidation across microservices is one of the hardest distributed systems problems. There is no perfect solution, only trade-offs.

Problem setup:

Service A owns User data and writes to it.
Service B, C, D all cache User data from Service A.
When Service A updates a user, how do B, C, and D know to invalidate?

Approach 1: Event-Driven Invalidation via Message Bus (Most Robust)

Service A writes -> publishes "user.updated:{userId}" to Kafka/RabbitMQ
Service B, C, D subscribe to "user.*" events
On receipt: each service deletes "user:{userId}" from its cache

Pros: Decoupled, reliable (message bus persists events if consumer is down)
Cons: Asynchronous (small stale window between write and invalidation delivery)
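A minimal in-memory sketch of this flow (EventBus, UserOwnerService and UserCacheConsumer are hypothetical names; a real system would use a Kafka or RabbitMQ client, and delivery would be asynchronous). The owner commits to its database and then publishes the user ID; each consumer deletes its own copy so its next read reloads fresh data. Note the gap between the two steps in updateUser: if the process dies after the DB write but before publishing, no event is sent, which is one reason Approach 4 (CDC) reads changes from the database itself.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical stand-in for a "user.updated" topic; delivery here is synchronous.
class EventBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();
    void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }
    void publish(String userId) { subscribers.forEach(s -> s.accept(userId)); }
}

// Service A: owns User data.
class UserOwnerService {
    private final Map<String, String> db = new ConcurrentHashMap<>();
    private final EventBus bus;

    UserOwnerService(EventBus bus) { this.bus = bus; }

    void updateUser(String userId, String name) {
        db.put("user:" + userId, name); // 1. commit to the source of truth
        bus.publish(userId);            // 2. announce the change
    }

    String findUser(String userId) { return db.get("user:" + userId); }
}

// Services B, C, D: cache User data locally and invalidate on events.
class UserCacheConsumer {
    final Map<String, String> localCache = new ConcurrentHashMap<>();

    UserCacheConsumer(EventBus bus) {
        bus.subscribe(userId -> localCache.remove("user:" + userId)); // delete, don't update
    }
}
```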

Approach 2: Short TTL (Simplest)
All services use a very short TTL (30-60 seconds). Accept eventual consistency.

Pros: No complexity, self-healing
Cons: Data may be stale for up to TTL duration. Not suitable for strongly consistent data.

Approach 3: Cache Ownership via API (Simple)
Service B calls Service A's API to get user data. Service A controls the cache. Service B does not cache User data at all; it relies on Service A's response cache.

Pros: Single source of truth, no cross-service invalidation
Cons: Service A becomes a hotspot if many services call it frequently

Approach 4: Change Data Capture (CDC)
A CDC tool (Debezium) captures all database changes and publishes them as events. Services subscribe to database-level change events.

Pros: Catches ALL changes including direct DB writes (bypassing app layer)
Cons: Complex setup; typically requires Kafka (Debezium is most commonly deployed on Kafka Connect)

Q23. Explain the CAP theorem and how it applies to distributed caching.

Answer:

CAP Theorem: In a distributed system, you can guarantee at most 2 of these 3 properties simultaneously:

C - Consistency: All nodes see the same data at the same time.
A - Availability: Every request receives a response (not an error).
P - Partition Tolerance: System continues operating despite network partitions.

Since network partitions are unavoidable in real distributed systems, you must choose between C and A:

CP (Consistency + Partition Tolerance):
If a network partition occurs, the system refuses to serve stale data. Requests may fail or block until consistency is restored.

Example: ZooKeeper, HBase
Cache implication: If you cannot verify data freshness during a partition, return an error rather than stale data.

AP (Availability + Partition Tolerance):
During a network partition, the system continues serving data but it may be stale (inconsistent). Eventually, consistency is restored.

Example: Redis Sentinel/Cluster (by default), Cassandra and DynamoDB (with their default, eventually consistent read settings)
Cache implication: During partition, caches may serve stale data. After healing, data converges.

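Redis's position: Redis replication is asynchronous, so Redis behaves closer to AP. During a master-replica failover, the promoted replica may be missing the old master's most recent writes, and those acknowledged writes are lost. This is accepted in exchange for availability. (Redis Cluster is not purely AP either: a master isolated on the minority side of a partition stops accepting writes after the node timeout.)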

Practical cache design implication:

Most caching use cases should be AP. A stale cache response is almost always better than no response.
Use TTL as a consistency bound: "Data is guaranteed to be at most N seconds stale."
For strongly consistent requirements: Use write-through with synchronous replication and accept the latency cost.

Q24. How would you design a rate limiter using Redis?

Answer:

Rate limiting is one of the most classic Redis use cases.

Fixed Window Rate Limiter (Simple):

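public boolean isAllowed(String userId, int maxRequests, int windowSeconds) {
    // One counter per user per window: the key suffix is the current window number
    String key = "ratelimit:" + userId + ":" + (System.currentTimeMillis() / (windowSeconds * 1000L));
    Long count = redisTemplate.opsForValue().increment(key);   // INCR: atomic, creates the key at 1
    if (count != null && count == 1) {
        // First request in this window sets the expiry. INCR and EXPIRE are two round-trips:
        // if the app dies in between, this key never expires (a Lua script or MULTI avoids that)
        redisTemplate.expire(key, Duration.ofSeconds(windowSeconds));
    }
    return count != null && count <= maxRequests;
}
// Problem: At window boundary, user could send 2x maxRequests (end of one window + start of next)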

Sliding Window Rate Limiter using Sorted Set (More Accurate):

public boolean isAllowed(String userId, int maxRequests, long windowMs) {
    String key = "ratelimit:sliding:" + userId;
    long now = System.currentTimeMillis();
    long windowStart = now - windowMs;
    // Unique member: two requests in the same millisecond must not overwrite each other
    String member = now + ":" + UUID.randomUUID();
 
    // Lua script: Redis runs it atomically, so the check and the add cannot race
    String script =
        "redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])\n" +  // Remove entries outside the window
        "if redis.call('ZCARD', KEYS[1]) < tonumber(ARGV[4]) then\n" +   // Still under the limit?
        "  redis.call('ZADD', KEYS[1], ARGV[2], ARGV[3])\n" +            // Record this request
        "  redis.call('PEXPIRE', KEYS[1], ARGV[5])\n" +                  // Idle keys clean themselves up
        "  return 1\n" +
        "end\n" +
        "return 0";                                                      // Rejected: not recorded
 
    // Assumes a StringRedisTemplate, so ARGV values arrive in Lua as plain strings
    Long allowed = redisTemplate.execute(
        new DefaultRedisScript<>(script, Long.class),
        List.of(key),
        String.valueOf(windowStart),   // ARGV[1]: window start
        String.valueOf(now),           // ARGV[2]: score = current timestamp
        member,                        // ARGV[3]: unique member
        String.valueOf(maxRequests),   // ARGV[4]: limit
        String.valueOf(windowMs)       // ARGV[5]: key TTL in ms (= window length)
    );
 
    return allowed != null && allowed == 1L;
}
// More accurate: counts exact requests within the sliding window; rejected requests are not stored.
// Storage: O(maxRequests) per user (each request stored as a ZSet member)
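To make the boundary problem concrete, here is a small in-memory model of both limiters. It is a sketch, not the Redis code: `FixedWindowLimiter` and `SlidingLogLimiter` are hypothetical names, timestamps are passed in explicitly (and assumed non-decreasing) so the behavior is reproducible, and the model is single-threaded, whereas Redis provides the atomicity in the real versions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Fixed window: one counter per window number, like the "ratelimit:{user}:{window}" key
final class FixedWindowLimiter {
    private final int maxRequests;
    private final long windowMs;
    private final Map<Long, Integer> countsByWindow = new HashMap<>(); // old windows never cleaned up in this model

    FixedWindowLimiter(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    boolean isAllowed(long nowMs) {
        long window = nowMs / windowMs;                             // same bucketing as the Redis key suffix
        int count = countsByWindow.merge(window, 1, Integer::sum); // plays the role of INCR
        return count <= maxRequests;
    }
}

// Sliding log: keep timestamps of accepted requests, like the sorted set in the Lua version
final class SlidingLogLimiter {
    private final int maxRequests;
    private final long windowMs;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingLogLimiter(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    boolean isAllowed(long nowMs) {
        long windowStart = nowMs - windowMs;
        // Drop entries at or before the window start (ZREMRANGEBYSCORE -inf windowStart)
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= windowStart) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxRequests) {
            return false;                                           // over the limit; not recorded
        }
        timestamps.addLast(nowMs);                                  // ZADD
        return true;
    }
}
```

With a limit of 5 per 1,000 ms, sending 5 requests at t=999 and 5 more at t=1000 lets all 10 through the fixed window (they land in two different windows) but only 5 through the sliding log.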

Section 5: Cache Pathologies and Problem Solving

Q25. What is a cache stampede (thundering herd)? How do you prevent it?

Answer:

When a popular cache entry expires, many concurrent threads simultaneously experience a cache miss and all rush to the database to refresh the same key. This can overwhelm the database.

Prevention strategies:

1. Mutex Lock:
Only one thread fetches from the database. Others wait for the result.

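Boolean acquired = redisTemplate.opsForValue().setIfAbsent("lock:" + key, "1", Duration.ofSeconds(5)); // SET NX EX: lock expires if the holder dies
if (Boolean.TRUE.equals(acquired)) { /* fetch from DB, cache it, then delete the lock */ }
else { Thread.sleep(50); return cache.get(key); /* still null? retry, or fall back to the DB */ }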

2. Refresh-Ahead:
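Refresh entries before they expire, so frequently read keys never go through an expiry miss.

// Caffeine LoadingCache: refreshAfterWrite(4, MINUTES) + expireAfterWrite(5, MINUTES).
// The refresh is triggered by the first read after 4 minutes; the old value is served while it reloads.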

3. Stale-While-Revalidate:
Return stale data immediately. Refresh in background asynchronously.

if (entry.isExpired()) { triggerAsyncRefresh(key); }  // De-duplicate: at most one refresh per key in flight
return entry.getValue();  // Return stale immediately

4. Randomized TTL:
Spread expiry times over a window so they do not all expire simultaneously.

Duration ttl = Duration.ofMinutes(5).plusSeconds(ThreadLocalRandom.current().nextInt(60));  // 5:00 to 5:59

5. @Cacheable(sync=true) in Spring Boot:
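Spring serializes concurrent misses for the same key, so only one thread executes the method and the others wait for its result. The lock is local to each application instance (JVM): with N instances, up to N threads can still reach the database for the same key.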
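Strategies 1 and 5 both come down to "one loader per key." The sketch below shows the idea inside a single JVM, assuming a plain `Function` stands in for the database query; `SingleFlightCache` is a hypothetical name. It coordinates threads, not servers: across instances you still need the Redis lock from strategy 1.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// In-process sketch: concurrent misses on the same key share a single loader call.
// Values must be non-null (ConcurrentHashMap rejects nulls).
final class SingleFlightCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the database query

    SingleFlightCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // normal hit
        }
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.join(); // another thread holds the per-key "lock": wait for its result
        }
        try {
            V value = cache.get(key); // re-check: a previous loader may have just finished
            if (value == null) {
                value = loader.apply(key); // the only call that reaches the database
                cache.put(key, value);
            }
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            mine.completeExceptionally(e); // waiters fail the same way instead of hanging
            throw e;
        } finally {
            inFlight.remove(key, mine); // release the per-key "lock"
        }
    }
}
```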

Q26. What is cache penetration? How do you solve it?

Answer:

Cache penetration occurs when requests are made for keys that do not exist in either the cache OR the database. Every request bypasses the cache (nothing to cache) and hits the database, which also returns empty. An attacker can deliberately exploit this with invalid IDs.

Solution 1: Cache Null Values (Negative Caching):
When the database returns empty, cache a null/sentinel value with a SHORT TTL.

if (user == null) {
    cache.set("user:" + id, NULL_SENTINEL, Duration.ofSeconds(30));  // Short TTL
}

Con: If a user is created shortly after, the "not found" result may be served stale for up to the null TTL.

Solution 2: Bloom Filter:
A probabilistic data structure that can definitively say "this key DOES NOT EXIST."

Pre-populate with all valid IDs at startup, and add each new ID when it is created.
On request: if Bloom filter says "definitely not," return null immediately (skip cache and DB).
If "might exist": continue to cache/DB lookup.
False positives possible (some invalid IDs pass through), but false negatives impossible.

if (!bloomFilter.mightContain(userId)) return null;  // Definitely doesn't exist
// Otherwise proceed with cache/DB lookup
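For illustration, a minimal Bloom filter can be built on `java.util.BitSet`. `SimpleBloomFilter` is a hypothetical class, not a library API (Guava's `BloomFilter` or RedisBloom's `BF.ADD`/`BF.EXISTS` are the usual production choices), and the hashing here (FNV-1a plus a MurmurHash3-style finalizer, combined by double hashing) is an illustrative assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class SimpleBloomFilter {
    private final int numBits;
    private final int numHashes;
    private final BitSet bits;

    SimpleBloomFilter(int expectedItems, double falsePositiveRate) {
        // Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        this.numBits = (int) Math.ceil(-expectedItems * Math.log(falsePositiveRate) / (Math.log(2) * Math.log(2)));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedItems * Math.log(2)));
        this.bits = new BitSet(numBits);
    }

    void put(String key) {
        long[] h = hashes(key);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h, i));
        }
    }

    boolean mightContain(String key) {
        long[] h = hashes(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h, i))) {
                return false; // any clear bit proves the key was never added
            }
        }
        return true; // all bits set: present, or a false positive
    }

    // Double hashing: position_i = h1 + i * h2 (mod m)
    private int index(long[] h, int i) {
        return (int) Math.floorMod(h[0] + i * h[1], (long) numBits);
    }

    private static long[] hashes(String key) {
        long h1 = 0xcbf29ce484222325L; // FNV-1a 64-bit offset basis
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xff);
            h1 *= 0x100000001b3L;      // FNV-1a 64-bit prime
        }
        long h2 = h1;                  // MurmurHash3 fmix64 finalizer derives a second hash
        h2 ^= h2 >>> 33;
        h2 *= 0xff51afd7ed558ccdL;
        h2 ^= h2 >>> 33;
        h2 *= 0xc4ceb9fe1a85ec53L;
        h2 ^= h2 >>> 33;
        return new long[] {h1, h2 | 1}; // odd step so the k positions differ
    }
}
```

For 1,000 IDs at a 1% false-positive rate, the sizing formulas give about 9,600 bits and 7 hash functions.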

Solution 3: Input Validation:
Validate that the ID format is valid before any lookup. Reject IDs that violate format rules.

Q27. What is a cache avalanche? How is it different from cache stampede?

Answer:

Cache Stampede: One specific hot key expires. Many threads simultaneously miss for that ONE key.

Cache Avalanche: MANY keys expire simultaneously (or the cache becomes entirely unavailable). A flood of misses across ALL expiring keys hits the database.

Example:

System starts and loads 100,000 product keys, all with TTL = 3600 seconds.
After exactly 1 hour: all 100,000 keys expire at the same moment.
Database receives 100,000 queries within seconds.
Database overwhelmed. System outage.

Solutions:

Randomized TTL: Add jitter when setting keys: TTL = base + random(0, jitter_max). This spreads expiries over a window instead of clustering them.
Staggered cache loading: Load keys with varying TTLs from the start.
Circuit breaker: Protect the database from flooding using a circuit breaker (Resilience4j).
Multi-level caching (L1 + L2): Even if L2 (Redis) avalanches, L1 (in-process) absorbs the initial burst for the hottest keys.
Cache warmup with jitter: Pre-populate cache at startup with varied TTLs.
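Randomized TTL is a one-line change at write time. A minimal helper, with `JitteredTtl` as a hypothetical name:

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class JitteredTtl {
    private JitteredTtl() {}

    // TTL = base + random(0, jitterMax), inclusive on both ends, at millisecond resolution
    static Duration withJitter(Duration base, Duration jitterMax) {
        long jitterMillis = ThreadLocalRandom.current().nextLong(jitterMax.toMillis() + 1);
        return base.plusMillis(jitterMillis);
    }
}
```

With Spring Data Redis the result can be passed to `opsForValue().set(key, value, ttl)`. A one-hour base with up to 10 minutes of jitter spreads a bulk load's expiries over a 10-minute window instead of a single instant.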

Q28. What is the difference between cache stampede and cache breakdown?

Answer:

Both involve a cache miss under high concurrency, but the scope differs:

Cache Stampede / Thundering Herd:

General problem where any popular key expiry causes concurrent misses.
Can affect many different keys.
Typically a scalability issue affecting all high-traffic keys.

Cache Breakdown (Hot Key Expiry):

A SPECIFIC, extremely hot key expires under intense concurrent access.
"Hot" means this single key receives disproportionately high traffic (e.g., a celebrity's profile getting 10,000+ req/second).
Even a brief moment of 10,000 concurrent misses on one key can overwhelm the database.

Cache Breakdown-specific solutions:

Logical expiry (never expire the Redis key): Store expiry time inside the value. Return stale + refresh asynchronously. Redis key has no TTL.
Mutex lock per key: Only one of the 10,000 concurrent requests fetches from DB.
Manually controlled refresh: Operations team updates hot keys before expiry via a separate refresh job.
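Logical expiry (and the stale-while-revalidate strategy from Q25) can be sketched in memory as follows. `LogicalExpiryCache` is a hypothetical name; a `ConcurrentHashMap` stands in for Redis, the clock and refresh executor are injected so the behavior is testable, and cold keys (no entry yet) are loaded synchronously without protection, so they would still need the mutex approach.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class LogicalExpiryCache<K, V> {
    // The stored entry never expires physically; its logical expiry time travels with the value
    private record Entry<T>(T value, long logicalExpiryMillis) {}

    private final ConcurrentHashMap<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet(); // keys with a refresh in progress
    private final Function<K, V> loader;
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Executor refreshExecutor;

    LogicalExpiryCache(Function<K, V> loader, long ttlMillis, LongSupplier clock, Executor refreshExecutor) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.refreshExecutor = refreshExecutor;
    }

    V get(K key) {
        Entry<V> entry = store.get(key);
        if (entry == null) {
            return load(key); // cold key: nothing stale to serve yet
        }
        if (clock.getAsLong() >= entry.logicalExpiryMillis() && refreshing.add(key)) {
            // Logically expired: only the caller that wins refreshing.add() schedules the reload
            refreshExecutor.execute(() -> {
                try {
                    load(key);
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return entry.value(); // fresh or stale, the caller never waits for the database
    }

    private V load(K key) {
        V value = loader.apply(key);
        store.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
        return value;
    }
}
```

Only the first caller after the logical expiry schedules a refresh; every caller, including that one, gets the old value immediately.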

Q29. What is the hot key problem? How do you solve it?

Answer:

The hot key problem occurs when a single cache key receives a disproportionate share of total traffic. Even with a Redis cluster that distributes traffic across nodes, all requests for a hot key hit the single node that owns that hash slot.

Example: A viral video's metadata key gets 500,000 requests/second. All 500K requests go to the one Redis node holding that key. That node becomes a CPU bottleneck, slowing all other keys on the same node.

Solutions:

1. Local L1 cache replication:
Replicate the hot key to every application instance's local cache (very short TTL: 2-10 seconds). Most traffic is absorbed locally without touching Redis.

2. Key sharding (replica reads):
Store the same value under multiple keys (N shards). Reads are randomly distributed across shards.

Write: SET key:shard:0, SET key:shard:1, ..., SET key:shard:9
Read:  GET key:shard:{random(0,9)}   # Spread over whichever nodes own these 10 slots

3. Read from replicas:
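Configure the Redis client to route reads (at least for hot keys) to replicas, for example with Lettuce's ReadFrom.REPLICA_PREFERRED. Replication is asynchronous, so replica reads can be slightly stale.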

4. Identify hot keys (redis-cli --hotkeys):

redis-cli --hotkeys   # Requires an LFU maxmemory-policy (allkeys-lfu or volatile-lfu)
redis-cli OBJECT FREQ key  # Access frequency counter; also requires an LFU policy
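Solution 2 (key sharding) needs only a naming convention. A hypothetical helper, assuming the key:shard:N format shown above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

final class HotKeyShards {
    private final String baseKey;
    private final int shardCount;

    HotKeyShards(String baseKey, int shardCount) {
        this.baseKey = baseKey;
        this.shardCount = shardCount;
    }

    // Write path: every copy must be updated (SET key:shard:0 ... key:shard:N-1)
    List<String> allKeys() {
        List<String> keys = new ArrayList<>(shardCount);
        for (int i = 0; i < shardCount; i++) {
            keys.add(baseKey + ":shard:" + i);
        }
        return keys;
    }

    // Read path: pick one copy at random so reads spread across the slots these keys hash to
    String readKey() {
        return baseKey + ":shard:" + ThreadLocalRandom.current().nextInt(shardCount);
    }
}
```

Do not wrap part of these names in a Redis Cluster hash tag ({...}): every copy would then map to the same slot, and the load would stay on one node. Writes must update every copy, which is the price of this approach.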

Section 6: HTTP and CDN Caching

Q30. Explain HTTP Cache-Control headers. What does "no-cache" actually mean?

Answer:

Common Cache-Control directives:

max-age=N: Cache the response for N seconds. Most common.

no-cache: Counter-intuitively, this does NOT mean "do not cache." It means: cache the response, but you MUST revalidate with the server before serving it. The cached copy can be used only if the server confirms it is still fresh (304 Not Modified).

no-store: Truly "do not cache." The response must not be stored anywhere - not browser, not CDN, not proxy. Every request goes to origin. Use for sensitive data (health records, financial data).

public: Response can be cached by any cache (browser, CDN, proxy).

private: Response can only be stored by a private cache (typically the browser). CDNs and proxies must not cache it.

s-maxage=N: Like max-age but only for shared caches (CDN, proxy). Browser uses max-age. CDN uses s-maxage.

must-revalidate: Once expired, the cache MUST revalidate. Cannot serve stale data even in error conditions.

immutable: The response content will never change. While the response is fresh, the browser will not revalidate it, even on page reload. Use with hashed static assets.

stale-while-revalidate=N: After the response becomes stale, the cache may keep serving it for up to N more seconds while it revalidates in the background (the HTTP-level version of the Stale-While-Revalidate strategy from Q25).

Key interview point: no-cache vs no-store:

no-cache = "Cache it but always check if it is still fresh before using it"
no-store = "Never cache this response anywhere"
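These directives combine into a small decision procedure. The model below is a simplification for illustration (it ignores request-side directives, Vary, heuristic freshness, and must-revalidate error handling); `CacheControlModel` is a hypothetical name.

```java
final class CacheControlModel {
    enum Decision { NOT_STORED, SERVE_FRESH, SERVE_STALE_AND_REVALIDATE_IN_BACKGROUND, REVALIDATE_BEFORE_SERVING }

    static Decision decide(boolean noStore, boolean noCache, long maxAgeSeconds,
                           long staleWhileRevalidateSeconds, long ageSeconds) {
        if (noStore) {
            return Decision.NOT_STORED; // nothing was kept: every request goes to origin
        }
        if (noCache) {
            return Decision.REVALIDATE_BEFORE_SERVING; // stored, but each use needs origin approval (304)
        }
        if (ageSeconds < maxAgeSeconds) {
            return Decision.SERVE_FRESH; // fresh: no network round-trip
        }
        if (ageSeconds < maxAgeSeconds + staleWhileRevalidateSeconds) {
            return Decision.SERVE_STALE_AND_REVALIDATE_IN_BACKGROUND; // inside the SWR window
        }
        return Decision.REVALIDATE_BEFORE_SERVING; // stale beyond the SWR window
    }
}
```

For Cache-Control: max-age=60, stale-while-revalidate=30, a 10-second-old copy is served as fresh, a 70-second-old copy is served while a background revalidation runs, and a 100-second-old copy must be revalidated first.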

Q31. What is an ETag? How does it work?

Answer:

An ETag (Entity Tag) is a unique identifier for a specific version of a resource, generated by the server (typically a hash of the response body or a database version number).

Initial request flow:

Client: GET /products/123
Server: 200 OK
        ETag: "abc123def456"
        Cache-Control: no-cache
        Body: { product data... }
Client: Stores response + ETag in browser cache.

Subsequent request (conditional request):

Client: GET /products/123
        If-None-Match: "abc123def456"

Server checks: Has product 123 changed? -> No
Server: 304 Not Modified (no body, saves bandwidth)
Client: Use the cached copy.

Server checks: Has product 123 changed? -> Yes (new price)
Server: 200 OK
        ETag: "xyz789uvw012"   (new ETag)
        Body: { updated product data... }

Benefits:

Bandwidth savings: 304 responses have no body.
Precise versioning: Unlike Last-Modified (1-second granularity), ETags can detect sub-second changes.
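Works across multiple servers: Different server instances generate the same ETag for the same content, provided the ETag is derived from the content or a version number rather than from server-local details (such as a file's inode).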

ETags vs Last-Modified:
Both serve the same purpose (cache validation) but ETags are more reliable:

Last-Modified has 1-second resolution (misses fast writes)
Last-Modified can change even if content did not (e.g., re-save without changes)
ETags are content-based (hash) or version-based (DB version column)
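On the server side, a content-based ETag and the If-None-Match check can be sketched with the JDK alone (Java 17+ for `HexFormat`). `ETagSupport` is a hypothetical helper, and truncating the SHA-256 digest to 16 bytes is an illustrative choice. If-None-Match uses weak comparison, so a W/ prefix is ignored when matching.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class ETagSupport {
    // Strong ETag from a SHA-256 of the body: identical bodies get identical ETags on every server
    static String strongETag(String body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body.getBytes(StandardCharsets.UTF_8));
            return "\"" + HexFormat.of().formatHex(digest, 0, 16) + "\""; // first 16 bytes, quoted
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    // If-None-Match may list several ETags or "*"
    static boolean matches(String ifNoneMatch, String currentETag) {
        if (ifNoneMatch == null) {
            return false;
        }
        if (ifNoneMatch.trim().equals("*")) {
            return true;
        }
        String current = stripWeak(currentETag);
        for (String candidate : ifNoneMatch.split(",")) {
            if (stripWeak(candidate.trim()).equals(current)) {
                return true;
            }
        }
        return false;
    }

    // Status a server would send for a conditional GET
    static int statusFor(String ifNoneMatch, String body) {
        return matches(ifNoneMatch, strongETag(body)) ? 304 : 200;
    }

    private static String stripWeak(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```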

Section 7: System Design

Q32. How would you design a caching layer for a high-traffic e-commerce product catalog?

Answer:

Requirements:

50 million products
100,000+ requests/second during peak
Products change prices every few hours, inventory changes every minute
Must handle sales events (10x traffic spikes)

Design:

1. Identify data access patterns:

Product details (name, description, images): Changes rarely. Cache 1-4 hours.
Product price: Changes during sales. Cache 2-5 minutes with shorter window during promotions.
Inventory count: Changes frequently. Cache 30-60 seconds maximum.

2. Tiered caching architecture:

Browser -> CDN (edge cache) -> App L1 (Caffeine) -> L2 (Redis) -> MySQL

CDN layer:

Product images, static assets: Cache-Control: public, max-age=86400, immutable
Product detail HTML (for static SSG pages): s-maxage=60

L1 - Caffeine (per instance):

Top 10,000 products by view count
TTL: 2 minutes (short, so price changes reach users quickly; L2 keeps product details longer)
Acts as buffer during Redis traffic spikes

L2 - Redis Cluster:

Product details: product:{id}:details -> TTL: 2 hours (evicted earlier by change events)
Product price: product:{id}:price -> TTL: 2 minutes (randomized by +0 to +30 seconds)
Inventory: The displayed stock count is cached for 30-60 seconds at most; the authoritative check at purchase time always goes to the DB, never to the cache
Redis Cluster: 3 masters x 3 replicas, ~100 GB total
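The L1 -> L2 -> database read path can be sketched as below, assuming in-memory maps stand in for Caffeine (L1) and Redis (L2) and a `Function` stands in for MySQL. `TieredCache` is a hypothetical name; TTLs, size limits, and stampede protection are deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class TieredCache<K, V> {
    private final Map<K, V> l1 = new ConcurrentHashMap<>(); // per instance (Caffeine in the design)
    private final Map<K, V> l2;                             // shared by all instances (Redis in the design)
    private final Function<K, V> database;                  // MySQL in the design

    TieredCache(Map<K, V> sharedL2, Function<K, V> database) {
        this.l2 = sharedL2;
        this.database = database;
    }

    V get(K key) {
        V value = l1.get(key);
        if (value != null) {
            return value;                 // L1 hit: no network call
        }
        value = l2.get(key);
        if (value == null) {
            value = database.apply(key);  // miss in both tiers
            l2.put(key, value);           // fill the shared tier first
        }
        l1.put(key, value);               // then this instance's local tier
        return value;
    }

    // What an invalidation message (e.g. via Redis Pub/Sub) would trigger on each instance
    void evictLocal(K key) {
        l1.remove(key);
    }
}
```

Two instances sharing one L2 map show the effect: the first read goes to the database, the second instance finds the value in L2, and repeat reads on either instance are served from L1.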

3. Cache invalidation:

Price change: Write to DB -> publish "product.price.updated" event -> Kafka -> consumer evicts product:{id}:price from Redis + broadcasts L1 invalidation via Redis Pub/Sub.
Bulk update (sale starts): Schedule TTL refresh job to pre-update cache before sale event.

4. Handling traffic spikes (10x during flash sale):

Pre-warm cache: Load all sale items into cache 30 minutes before sale.
Increase L1 cache size on all instances before sale.
Auto-scaling triggers to add app instances (each with L1 cache).
Rate limiting at CDN for bots.

Q33. How would you design a session management system using caching?

Answer:

Requirements:

Millions of concurrent users
Sessions expire after 30 minutes of inactivity
Horizontally scaled application (many instances)
Session data includes: user ID, roles, preferences, cart ID (~2KB per session)

Design:

Storage: Redis with Sliding TTL

Key pattern: session:{session_id}
Value: JSON blob of session data
TTL: 1800 seconds (30 minutes), reset on every request (sliding TTL)
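The sliding TTL rule can be modeled with an injected clock. In Redis the equivalent is refreshing the TTL on each access, either with GET followed by EXPIRE, or atomically with GETEX session:{session_id} EX 1800 (Redis 6.2+). `SlidingSessionStore` is a hypothetical in-memory stand-in:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

final class SlidingSessionStore {
    private record Stored(String data, long expiresAtMillis) {}

    private final Map<String, Stored> sessions = new HashMap<>();
    private final long idleTimeoutMillis;
    private final LongSupplier clock;

    SlidingSessionStore(long idleTimeoutMillis, LongSupplier clock) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.clock = clock;
    }

    void create(String sessionId, String data) {
        sessions.put(sessionId, new Stored(data, clock.getAsLong() + idleTimeoutMillis));
    }

    // Like GETEX ... EX: return the session and push its expiry forward; null if missing or expired
    String touch(String sessionId) {
        long now = clock.getAsLong();
        Stored stored = sessions.get(sessionId);
        if (stored == null || now >= stored.expiresAtMillis()) {
            sessions.remove(sessionId); // Redis would already have dropped the key
            return null;
        }
        sessions.put(sessionId, new Stored(stored.data(), now + idleTimeoutMillis));
        return stored.data();
    }
}
```

A session created at minute 0 and used at minutes 20 and 45 is still alive at minute 45 (a fixed 30-minute TTL would have ended it at minute 30), and it disappears once it has been idle for more than 30 minutes.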

Implementation with Spring Session + Redis:

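spring:
  session:
    store-type: redis        # Spring Boot 2.x; Boot 3 removed this property and auto-detects the store
    timeout: 30m
    redis:
      flush-mode: on-save    # Write changes when the session is saved (end of request), not on every setAttribute
      namespace: spring:session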

@GetMapping("/dashboard")
public String dashboard(HttpSession session) {
    // Spring Session loads the session from Redis and saves changes at the end of the request;
    // each access updates lastAccessedTime, which pushes the 30-minute expiry forward (sliding TTL)
    Long userId = (Long) session.getAttribute("userId");
    session.setAttribute("lastVisit", LocalDateTime.now()); // Updates session in Redis
    return "dashboard";
}

Redis data structure choice:

Use Hash for session data: allows updating individual fields without reading the entire session.
Or use String with serialized JSON: simpler, but requires full read/write for any update.

Security considerations:

Session ID must be cryptographically random: SecureRandom with at least 128 bits, or a UUID v4 (which carries 122 random bits).
Use HTTPS always (session hijacking risk over HTTP).
Invalidate session on logout: session.invalidate() deletes from Redis.
Store session ID in HttpOnly, Secure, SameSite=Strict cookie.

Section 8: Tricky and Brain-Teaser Questions

Q34. Write-through ensures consistency, but what are its hidden performance trade-offs?

Answer:

Write-through guarantees that the cache and database are always in sync after a write. However, the hidden costs are:

Amplified write latency: Every write incurs BOTH cache write time AND database write time. Since they happen sequentially, total latency is approximately: cache_write_time + db_write_time.
Cache pollution: Write-through populates the cache with EVERY written value, even if it will never be read. A batch import of 1 million records (written once, read never) floods the cache, evicting hot read data.
Write amplification: Under high write load, cache capacity is consumed by writes while reads may be evicted. Cache can become less effective for read traffic despite the write-through guarantee.
No database write offloading (often misunderstood): With write-through, every write still goes to the database, so the database sees the full write load. Unlike write-behind, nothing is batched or deferred. Write-through only helps READ performance (data is in cache after write), not database write throughput.
Cold start after cache flush: If the cache is flushed, all write-through advantage is lost until every piece of data is re-written. Reads now go to the database until data is written again.

Bottom line: Write-through is excellent for systems where the same data is frequently written AND read, with no tolerance for stale reads. But for write-heavy systems where data is rarely re-read, write-through is wasteful.

Q35. You have a cache with LRU eviction. A nightly batch job accesses all database records. What problem does this cause and how do you fix it?

Answer:

The Problem - Cache Pollution / Scan Resistance:

The nightly batch job accesses every record in the database. Each access promotes that record to the head of the LRU list. As it scans through all records, it evicts all the hot web app data (frequently accessed records) from the cache, replacing them with rarely-accessed batch data.

After the batch job completes:

Cache is full of cold batch data (will never be re-accessed by the web app).
All hot web app data has been evicted.
The next morning, users experience very high cache miss rates until the cache slowly re-warms with hot data.
Database is hammered until cache re-warms (could take minutes to hours).

Solutions:

1. Use LFU instead of LRU:
LFU tracks access frequency, not just recency. The batch job touches each record once. Hot web app data has accumulated high frequency counts (hundreds or thousands of accesses). Batch data (frequency=1) is evicted first.

2. Configure Redis allkeys-lfu (available since Redis 4.0; the LFU counter is approximate and decays over time):

CONFIG SET maxmemory-policy allkeys-lfu

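3. Use a separate Redis instance for batch processing.
Batch reads go to a different Redis/cache instance so they cannot pollute the web app's cache. A separate logical database (SELECT n) on the same instance does not help: maxmemory and eviction are shared across all databases of an instance.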

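4. Use LRU-K / 2Q-style admission (promotion only after K accesses):
A key is only promoted into the main LRU list after being accessed K times (e.g., K=2). A scan that touches each item once does not promote anything. (Strictly, LRU-K evicts by the time of each key's K-th most recent access; "promote after K hits" is the simplified form used by 2Q and segmented LRU.)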

5. Separate cache pools:
Maintain a "hot pool" (LFU eviction, smaller) and a "cold pool" (LRU eviction, larger). Batch operations use the cold pool.
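Solutions 4 and 5 can be combined into a segmented LRU: new keys enter a small probation segment and move to a protected segment only on a second access, so a one-pass scan churns probation while the hot set stays put. `SegmentedLruCache` is a hypothetical sketch built on access-ordered `LinkedHashMap`s; it is not thread-safe and assumes non-null values.

```java
import java.util.LinkedHashMap;

final class SegmentedLruCache<K, V> {
    private final int probationCapacity;
    private final int protectedCapacity;
    // accessOrder = true: iteration runs from least to most recently used
    private final LinkedHashMap<K, V> probation = new LinkedHashMap<>(16, 0.75f, true);
    private final LinkedHashMap<K, V> protectedSegment = new LinkedHashMap<>(16, 0.75f, true);

    SegmentedLruCache(int probationCapacity, int protectedCapacity) {
        this.probationCapacity = probationCapacity;
        this.protectedCapacity = protectedCapacity;
    }

    V get(K key) {
        V value = protectedSegment.get(key); // get() also refreshes recency
        if (value != null) {
            return value;
        }
        value = probation.remove(key);
        if (value == null) {
            return null; // miss
        }
        // Second access: promote; if protected overflows, demote its LRU entry back to probation
        protectedSegment.put(key, value);
        if (protectedSegment.size() > protectedCapacity) {
            K demoted = eldest(protectedSegment);
            addToProbation(demoted, protectedSegment.remove(demoted));
        }
        return value;
    }

    void put(K key, V value) {
        if (protectedSegment.containsKey(key)) {
            protectedSegment.put(key, value);
            return;
        }
        probation.remove(key);
        addToProbation(key, value); // new keys always start in probation
    }

    boolean contains(K key) { // containsKey does not change access order
        return protectedSegment.containsKey(key) || probation.containsKey(key);
    }

    private void addToProbation(K key, V value) {
        probation.put(key, value);
        if (probation.size() > probationCapacity) {
            probation.remove(eldest(probation)); // a scan only ever evicts other probation entries
        }
    }

    private static <T> T eldest(LinkedHashMap<T, ?> map) {
        return map.keySet().iterator().next();
    }
}
```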

Q36. You are using Write-Behind caching and the cache server crashes before the async write completes. What happens?

Answer:

Data loss.

The write-behind cache acknowledged the write to the application as successful (the cache accepted the data). The application and the user believe the write is durable. But the database was never updated because the async write was still queued in memory when the crash occurred.

On restart:

The cache server is empty (in-memory data is lost).
The database does not have the write.
The application already received a success acknowledgement, so it has no reason to retry on its own.
Unless the client replays the operation, the data is permanently lost.

How to mitigate:

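AOF persistence in Redis: If Redis is used as the write-behind buffer, enable AOF. With appendfsync always, acknowledged writes survive a crash, at a large throughput cost; with appendfsync everysec, up to about one second of writes can still be lost in a disaster such as power loss. On restart, Redis replays the AOF to rebuild the pending writes.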
Use a durable message queue (Kafka) as the write-behind buffer: Writes are enqueued in Kafka (which persists to disk) before acknowledging. If the consumer (DB writer) crashes, it resumes from its last Kafka offset on restart.
Acknowledge only after durability guarantee: If you use Kafka, acknowledge the write only after Kafka persists it (not after just the cache stores it).
Use Write-Through instead if data loss is unacceptable. The extra write latency is the price of durability.
Design for idempotency: If writes are retried, they should be idempotent (same write applied twice = same result). This allows safe retry on crash recovery.
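The idempotency point can be made concrete by versioning each write. A minimal sketch, assuming each write-behind event carries a per-key version number; `IdempotentWriter` and `WriteEvent` are hypothetical names, and a map stands in for the database table. In SQL the same rule is a conditional update such as UPDATE ... SET value = ?, version = ? WHERE id = ? AND version < ?.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentWriter {
    // One queued write-behind event; the version increases with every change to the same key
    record WriteEvent(String key, long version, String value) {}

    private record Row(long version, String value) {}

    private final Map<String, Row> table = new HashMap<>(); // stands in for the database table

    // Returns true if the event changed the row; duplicates and out-of-date replays are no-ops
    boolean apply(WriteEvent event) {
        Row current = table.get(event.key());
        if (current != null && current.version() >= event.version()) {
            return false;
        }
        table.put(event.key(), new Row(event.version(), event.value()));
        return true;
    }

    String valueOf(String key) {
        Row row = table.get(key);
        return row == null ? null : row.value();
    }
}
```

Replaying the queue after a crash is then safe: events that were already applied, or that are older than the stored row, are skipped.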

Key interview insight: Write-Behind is fundamentally incompatible with "no data loss" requirements. The trade-off is: lower write latency vs. risk of data loss on crash. Never use write-behind for financial transactions, user-visible records, or anything where loss would be noticed.

Q37. If you cache null values to prevent cache penetration, what new problem does this introduce?

Answer:

New problem: False "not found" responses for valid data.

Scenario:

User A requests resource "user:999" which does not exist.
Application caches NULL with TTL = 30 seconds.
Within 30 seconds, an admin CREATES user 999 in the database.
User B requests "user:999" - gets the cached NULL response even though user 999 now exists.
User B gets an incorrect "not found" response for 30 seconds.

Additional problems:

Memory usage: If attackers probe millions of non-existent IDs, you cache millions of null entries. This consumes Redis memory and may evict real data.
Operational confusion: Engineers investigating "user not found" errors may not realize it is a cached null, not an actual missing user.

Mitigations:

Short null TTL: Keep the null TTL very short (15-60 seconds). Acceptable for most scenarios. New data will be visible after at most the null TTL.

Explicit invalidation on creation: When a resource is created, explicitly delete the null cache entry:

public User createUser(CreateUserRequest request) {
    User user = userRepository.save(new User(request));
    cache.delete("user:" + user.getId()); // Remove any cached null
    return user;
}

Use Bloom filter instead of null caching: Bloom filter does not have the "stale null" problem for newly created resources (just add the new ID to the Bloom filter on creation).
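Putting the short null TTL and explicit invalidation together, a cache-aside reader with negative caching might look like this. `NegativeCachingReader` is hypothetical; `Optional.empty()` plays the role of the null sentinel, a `HashMap` stands in for Redis, and the clock is injected (single-threaded sketch).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class NegativeCachingReader<V> {
    private record Entry<T>(Optional<T> value, long expiresAtMillis) {}

    private final Map<String, Entry<V>> cache = new HashMap<>();
    private final Function<String, V> database; // returns null when the row does not exist
    private final long ttlMillis;               // TTL for real values
    private final long nullTtlMillis;           // much shorter TTL for "not found"
    private final LongSupplier clock;

    NegativeCachingReader(Function<String, V> database, long ttlMillis, long nullTtlMillis, LongSupplier clock) {
        this.database = database;
        this.ttlMillis = ttlMillis;
        this.nullTtlMillis = nullTtlMillis;
        this.clock = clock;
    }

    Optional<V> get(String key) {
        long now = clock.getAsLong();
        Entry<V> entry = cache.get(key);
        if (entry != null && now < entry.expiresAtMillis()) {
            return entry.value(); // cache hit, possibly a cached "not found"
        }
        V row = database.apply(key);
        Optional<V> value = Optional.ofNullable(row);
        long ttl = (row == null) ? nullTtlMillis : ttlMillis;
        cache.put(key, new Entry<>(value, now + ttl));
        return value;
    }

    // Call after creating the resource so a cached "not found" cannot hide it
    void invalidate(String key) {
        cache.remove(key);
    }
}
```

With a 30-second null TTL, a user created right after a cached miss stays invisible for the rest of those 30 seconds unless invalidate is called, which is exactly the trade-off described above.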

Q38. Can you achieve strong consistency with a distributed cache? At what cost?

Answer:

Yes, but the cost is high. Strong consistency means every read reflects the most recent write. Achieving this in a distributed cache requires:

Option 1: Synchronous Write-Through + Synchronous Replication

Write to database AND all cache replicas synchronously before acknowledging the write.
All reads from any cache replica reflect the latest write.
Cost: Write latency = max(db_write_latency, slowest_replica_sync_latency). If one replica is slow or down, all writes stall. Availability is severely impacted. This violates "A" in CAP.

Option 2: Read-Your-Writes Consistency (Weaker but practical)

Not globally strong, but each user always sees their own writes.
Implementation: Route reads to the master (not replicas) for a brief window after a write, or use sticky sessions.
Cost: More reads hit the master (reduced read scaling benefit of replicas).

Option 3: Version-Based Consistency (Practical strong consistency for cache)

Each item has a version number. Cache entry is tagged with version.
On read: compare cache version with DB version. If mismatch: re-fetch.
Cost: One extra lightweight DB call per cache hit to check version. Often acceptable.

Option 4: Do not cache strongly consistent data

If strong consistency is required, bypass the cache entirely for that data.
Use the database as the single source of truth.
Example: Account balance during a financial transaction should never be served from cache.

Real-world recommendation:

Defaults: Eventual consistency with short TTLs. Suitable for the large majority of caching scenarios.
User-visible consistency (read-your-writes): Route writes and immediate subsequent reads to master.
Mission-critical: No cache, go directly to DB with serializable isolation.
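Option 3 (version-based consistency) can be sketched as follows, assuming a cheap version lookup (for example, a hypothetical SELECT version FROM product WHERE id = ?) and a separate full-row load. `VersionCheckedCache` is a hypothetical name, and a `HashMap` stands in for the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

final class VersionCheckedCache<K, V> {
    private record Versioned<T>(long version, T value) {}

    private final Map<K, Versioned<V>> cache = new HashMap<>();
    private final ToLongFunction<K> currentVersion; // cheap lookup, runs on every read
    private final Function<K, V> loadRow;           // full (expensive) read

    VersionCheckedCache(ToLongFunction<K> currentVersion, Function<K, V> loadRow) {
        this.currentVersion = currentVersion;
        this.loadRow = loadRow;
    }

    V get(K key) {
        long dbVersion = currentVersion.applyAsLong(key);
        Versioned<V> cached = cache.get(key);
        if (cached != null && cached.version() == dbVersion) {
            return cached.value(); // versions match: the cached copy is current
        }
        V fresh = loadRow.apply(key); // miss or mismatch: re-fetch the full row
        // Tagged with the version read before loading; if a write slips in between,
        // the next read sees a newer version and simply re-fetches
        cache.put(key, new Versioned<>(dbVersion, fresh));
        return fresh;
    }
}
```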

Q39. You have 10 application servers, each with a local in-memory cache. A user updates their profile. How do you ensure all 10 caches are invalidated?

Answer:

The challenge: Each of the 10 servers has its own private in-memory cache. Deleting from one server's cache does not affect the other 9. The user may hit a different server on their next request and see stale data.

Solutions:

1. Redis Pub/Sub Broadcast (Recommended for local cache invalidation):

// Publisher (the server that processed the update); assumes a StringRedisTemplate
redisTemplate.convertAndSend("cache:invalidate:users", userId.toString());
 
// Subscriber (on ALL 10 servers): register it with a RedisMessageListenerContainer
// for the "cache:invalidate:users" channel, otherwise it never receives messages
@Component
public class CacheInvalidationListener implements MessageListener {
 
    @Autowired
    private Cache<Long, User> localCache;   // Caffeine cache holding this server's L1 copies
 
    @Override
    public void onMessage(Message message, byte[] pattern) {
        Long userId = Long.parseLong(new String(message.getBody(), StandardCharsets.UTF_8));
        localCache.invalidate(userId);  // Remove from this server's local cache
    }
}
// Redis Pub/Sub delivers the message to every currently connected subscriber, including the publisher

2. Short TTLs (Simplest):
Set L1 cache TTL to 30-60 seconds. Each server's stale data expires quickly. No explicit invalidation needed. Accept eventual consistency.

3. Only use distributed cache (no local cache):
Remove L1 local caches entirely. Only use Redis (L2). All 10 servers share one cache. Invalidating Redis invalidates for everyone. Simpler, but slightly slower (network call for every read).

4. Versioned keys:
Keep a global version counter in Redis: user:123:version = 42. Local cache key includes the version. On update, increment version. Old keys become orphaned.

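// On update: bump the version; entries cached under older versions are simply never read again
Long version = redis.increment("user:" + userId + ":version");
// On read: fetch the current version from Redis first, then look up the local cache with this key
String cacheKey = "user:" + userId + ":v" + version;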

Trade-offs:

Pub/Sub: Near real-time but slightly complex. Redis Pub/Sub is at-most-once (message can be lost if subscriber is temporarily disconnected).
Short TTL: Simple, always works, but accepts staleness window.
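Distributed-only: Correct, slightly slower.
Versioned keys: No broadcast needed, but every read costs a Redis round-trip to fetch the current version, and orphaned entries linger until evicted.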

Q40. Two services have different TTLs for the same cached data. Service A uses TTL=5 minutes, Service B uses TTL=60 minutes. What problems arise?

Answer:

Problem 1: Inconsistent data windows

User updates their email. Both services' caches are invalidated (assuming proper invalidation is set up).
Service A shows the new email within 5 minutes (worst case, if invalidation failed).
Service B shows the old email for up to 60 minutes (if its invalidation also fails).
If invalidation works correctly: no problem. But TTL is the fallback safety net, and Service B's fallback is weak.

Problem 2: Inconsistent user experience

User calls API from Service A: sees new email.
Same user calls API from Service B: sees old email.
User is confused. "Is my update saved or not?"

Problem 3: Event-driven invalidation complexity

If Service A publishes a "user.updated" event and Service B subscribes, this works.
But if Service B misses the event (message loss, service restart), it relies entirely on its 60-minute TTL fallback. A lot can happen in 60 minutes.

Problem 4: Cache Avalanche timing mismatch

If Service A's and Service B's caches were loaded at the same time, their expiries fall at different times, which actually spreads the load on the database. But the spread is accidental: if someone later changes Service B's TTL to match Service A's, both services expire together and the avalanche risk increases.

Recommendations:

Agree on a TTL standard for shared data. Document it.
Shorter TTL is safer (Service B should use 5 minutes, not 60).
Invest in proper event-driven invalidation so TTL is a safety net, not the primary mechanism.
Use a shared cache (both services read/write the same Redis key) so TTL is consistent.

Q41. Your application caches user sessions in Redis. How do you handle a forced logout of all sessions for a specific user?

Answer:

The problem: User has multiple active sessions (mobile, desktop, tablet). Security incident requires immediate logout of all sessions. How do you find and delete all session keys for one user?

Approach 1: Key pattern with session scanning (risky)

Key pattern: session:{userId}:{sessionId}

// Find all sessions for user 123
Set<String> keys = redis.keys("session:123:*");  // NEVER use KEYS in production!
redis.delete(keys);

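KEYS is O(N) over the whole keyspace and blocks Redis while it runs. Do NOT use it in production. SCAN with MATCH avoids blocking, but still walks the entire keyspace.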

Approach 2: Maintain a user-sessions index (Recommended)

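// On session create:
redis.sadd("user:sessions:" + userId, sessionId);  // Track session IDs per user
redis.set("session:" + sessionId, sessionData, TTL_30_MIN);
// Also SREM the ID on normal logout, and give the index a TTL at least as long as a session,
// otherwise IDs of expired sessions accumulate in the set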
 
// On forced logout of all sessions for userId:
Set<String> sessionIds = redis.smembers("user:sessions:" + userId);
for (String sessionId : sessionIds) {
    redis.delete("session:" + sessionId);
}
redis.delete("user:sessions:" + userId);

Approach 3: Token version (simplest at scale)

// Store a user token version in Redis. If this key is ever lost (eviction, flush),
// every session fails the check below: safe, but it amounts to a mass logout
redis.set("user:token:version:" + userId, "1");
 
// At login, copy the current version into the new session.
// On session validation, check token version matches
String sessionVersion = session.getTokenVersion();
String currentVersion = redis.get("user:token:version:" + userId);
if (!Objects.equals(sessionVersion, currentVersion)) {
    throw new SessionExpiredException("All sessions invalidated");
}
 
// Forced logout: increment version
redis.increment("user:token:version:" + userId);
// All existing sessions now have stale version. They are effectively invalidated on next use.

This approach requires no key scanning and no index maintenance: one counter per user, at the cost of one Redis GET per validated request.

Section 9: Spring Boot and Java Caching

Q42. What is the difference between @Cacheable, @CachePut, and @CacheEvict?

Answer:

@Cacheable:

Used for READ operations (Cache-Aside pattern).
If data is in cache: return cached data. Method is NOT executed.
If data is NOT in cache: execute method, store result in cache, return result.
Use for: getUser(id), getProduct(id), searchProducts(query)

@CachePut:

Used for WRITE operations (Write-Through pattern).
ALWAYS executes the method.
After execution: updates the cache with the returned result.
Method always runs AND cache is always updated.
Use for: updateUser(user), createProduct(product)

@CacheEvict:

Used for DELETE/invalidation.
Removes the specified entry from cache.
allEntries=true: clears the entire cache (use with caution).
beforeInvocation=true: evicts BEFORE method runs (even if method throws).
Use for: deleteUser(id), clearProductCache()

@Service
public class UserService {
 
    @Cacheable(value="users", key="#id")              // Cache-Aside read
    public User getUser(Long id) { ... }
 
    @CachePut(value="users", key="#result.id")        // Write-Through update
    public User updateUser(User user) { ... }
 
    @CacheEvict(value="users", key="#id")             // Invalidate on delete
    public void deleteUser(Long id) { ... }
 
    @CacheEvict(value="users", allEntries=true)       // Clear entire cache
    public void reloadAllUsers() { ... }
}

Key difference summary:

@Cacheable: Read. May skip method.
@CachePut: Write. Always executes method.
@CacheEvict: Delete. Removes from cache.

Q43. What is the N+1 problem in JPA and how does caching help (and not help)?

Answer:

The N+1 Problem:
You execute 1 query to fetch N parent entities (e.g., N orders). Then for each order, you execute 1 more query to fetch the child entities (e.g., order items). Total: 1 + N queries.

List<Order> orders = orderRepository.findAll();  // Query 1: fetch 100 orders
for (Order order : orders) {
    // Query 2 through 101: each access triggers a separate query for items
    order.getItems().size();  // Lazy-loaded collection
}
// 101 database queries for what could be 1 JOIN query

How caching helps:
Hibernate's Second-Level Cache (L2C) caches entity state. If Order and OrderItem are L2C-cached (and the items collection has its own collection cache, e.g. @Cache on the association), subsequent accesses to the same orders within the configured expiry window hit the cache instead of the database.

How caching does NOT help N+1:
On the first load (cold cache), N+1 still occurs. Caching prevents the N+1 on REPEATED access of the same entities, not on first load.

Proper solution: Fix the query to use JOIN FETCH or EntityGraph:

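// DISTINCT removes duplicate Order rows produced by the collection join (Hibernate 6 deduplicates automatically)
@Query("SELECT DISTINCT o FROM Order o JOIN FETCH o.items WHERE o.userId = :userId")
List<Order> findOrdersWithItems(@Param("userId") Long userId);  // 1 query, not N+1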

When to cache to mitigate N+1:
When the query cannot be easily modified (legacy code, third-party framework), cache the aggregate result (the fully populated DTO) to avoid repeated N+1 hits:

@Cacheable("user-orders-with-items")  // Cache key defaults to the single parameter (userId)
public List<OrderDto> getUserOrdersWithItems(Long userId) {
    // Triggers N+1 once per cache miss; later calls return the cached result until the entry expires
    return orderRepository.findByUserId(userId)... // hypothetical query method; N+1 happens here
    // DTO with items already populated, serialized and cached
}

Section 10: Must-Know Deep-Dive Questions

Q44. What is "two-phase invalidation" and when do you need it?

Answer:

Two-phase invalidation is a pattern for safely invalidating cache entries in a distributed system where multiple application instances may be populating the same key simultaneously.

The problem it solves:

Instance 1:                         Instance 2:
T1: Cache miss; reads OLD value from DB
T2:                                 Updates DB (new value)
T3:                                 Deletes cache key
T4: Writes OLD value to cache       <- Stale data just got cached!
T5: Readers get the OLD value until the entry's TTL expires

Two-Phase Invalidation:

Phase 1 - Mark as invalid (do not delete yet):
When a write occurs, mark the cache entry as "dirty" rather than deleting it immediately.

Phase 2 - Delete after a safe delay:
After a brief window (a few hundred milliseconds, enough for in-flight reads to complete), delete the marked entry.

public void updateProduct(Long productId, Product newData) {
    // Write to database
    productRepository.save(newData);
 
    // Phase 1: Mark cache entry as invalid
    redisTemplate.opsForValue().set("dirty:product:" + productId, "1",
                                     Duration.ofSeconds(1));
 
    // Phase 2: Delete after delay (background task)
    scheduler.schedule(() -> {
        redisTemplate.delete("product:" + productId);
        redisTemplate.delete("dirty:product:" + productId);
    }, 500, TimeUnit.MILLISECONDS);  // 500ms delay
}
 
public Product getProduct(Long productId) {
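    // Check if marked dirty - if so, skip the cache and read from the DB (without populating the cache)
    // hasKey returns a boxed Boolean, so compare safely instead of unboxing
    if (Boolean.TRUE.equals(redisTemplate.hasKey("dirty:product:" + productId))) {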
        return productRepository.findById(productId).orElseThrow();
    }
    // Normal cache-aside logic
    ...
}
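Because timing is what matters here, the pattern is easier to reason about with a simulated clock than with a real scheduler. The sketch below is a single-threaded model, assuming in-memory maps in place of Redis and a hypothetical `tick` method in place of the background task; the 500 ms delay and 1 s marker lifetime mirror the example above. `TwoPhaseModel` and `lateCacheWrite` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Single-threaded model of two-phase invalidation with a manual clock (times in ms).
class TwoPhaseModel {
    static final long DELETE_DELAY = 500;   // phase 2 runs this long after the write
    static final long DIRTY_TTL = 1000;     // lifetime of the dirty marker

    final Map<String, String> db = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();
    final Map<String, Long> dirtyUntil = new HashMap<>();   // key -> marker expiry time
    final Map<String, Long> deleteAt = new HashMap<>();     // key -> scheduled phase-2 time

    // Writer path: update the DB, set the dirty marker (phase 1), schedule the delete (phase 2)
    void update(String key, String value, long now) {
        db.put(key, value);
        dirtyUntil.put(key, now + DIRTY_TTL);
        deleteAt.put(key, now + DELETE_DELAY);
    }

    // Background task: run any phase-2 deletes that are due
    void tick(long now) {
        deleteAt.entrySet().removeIf(e -> {
            if (e.getValue() <= now) {
                cache.remove(e.getKey());
                dirtyUntil.remove(e.getKey());
                return true;
            }
            return false;
        });
    }

    // Reader path: bypass the cache while dirty, otherwise normal cache-aside
    String get(String key, long now) {
        Long until = dirtyUntil.get(key);
        if (until != null && until > now) {
            return db.get(key);                     // dirty: read DB, do not populate
        }
        return cache.computeIfAbsent(key, db::get);
    }

    // Simulates an in-flight reader that loaded a value earlier and writes it to the cache late
    void lateCacheWrite(String key, String valueReadEarlier) {
        cache.put(key, valueReadEarlier);
    }
}
```

Replaying the race: a reader loads "old" at t=0, the writer updates at t=10, and the reader writes "old" to the cache at t=20. Reads between t=10 and t=510 bypass the cache and see "new", and the phase-2 delete at t=510 removes the stale entry. If the late write arrived after t=510 instead, it would survive until its TTL, which is why the delay must exceed the slowest in-flight read.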

When to use: High-concurrency systems where the race between the read and write paths is actually observed (stale data persisting after writes). It is usually only worth the extra complexity at request rates high enough that the race window is hit regularly.

Q45. What is the difference between cache coherence and cache consistency?

Answer:

These terms are used interchangeably in casual conversation but have precise technical meanings:

Cache Coherence:
Ensures that when multiple caches hold a copy of the same data, all copies agree with each other. It is about WHO has which version of the data across multiple cache instances.

"If I write to data X on Node A, will Node B's cache reflect that write?"

CPU caches: Hardware-enforced coherence protocols (MESI: Modified, Exclusive, Shared, Invalid).
Application distributed caches: Requires explicit invalidation. There is no automatic hardware coherence.

Cache Consistency (Data Consistency):
Ensures that the cache always accurately reflects the state of the authoritative data source (database). It is about whether the cached data matches the ground truth.

"Is my cached value the same as what's in the database right now?"

Practical difference:

Coherence problem:

Server A and Server B both cache user:123.
Server A updates user:123 (marks its cache invalid).
Server B still has the old user:123 in its local cache.
Servers A and B are INCOHERENT about user:123.

Consistency problem:

Cache has user:123 with email "old@example.com".
Database has user:123 with email "new@example.com".
Cache is INCONSISTENT with the database.

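In practice, the two problems are tackled with the same tools: invalidation events, short TTLs, distributed pub/sub. But coherence does not imply consistency: every node can agree on the same stale value (coherent) while all of them disagree with the database (inconsistent). Invalidating on every write to the source of truth addresses both.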
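As a minimal illustration of invalidation events over pub/sub, the sketch below models local caches on several servers subscribed to one in-process channel. `InvalidationBus` and `Node` are hypothetical stand-ins for Redis Pub/Sub or Kafka plus a per-server cache; delivery here is synchronous and never lost, which real pub/sub does not guarantee (hence the TTL safety net).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-process stand-in for a pub/sub channel that carries invalidated keys
class InvalidationBus {
    private final List<Node> subscribers = new ArrayList<>();

    void subscribe(Node node) { subscribers.add(node); }

    void publish(String key) {
        for (Node n : subscribers) n.localCache.remove(key);   // every node drops its copy
    }
}

// One application server with its own local cache, backed by a shared "database"
class Node {
    final Map<String, String> localCache = new HashMap<>();
    private final Map<String, String> db;
    private final InvalidationBus bus;

    Node(Map<String, String> db, InvalidationBus bus) {
        this.db = db;
        this.bus = bus;
        bus.subscribe(this);
    }

    // Cache-aside read against the local cache
    String read(String key) { return localCache.computeIfAbsent(key, db::get); }

    void write(String key, String value) {
        db.put(key, value);     // update the source of truth first
        bus.publish(key);       // then tell every node (including this one) to drop the key
    }
}
```

After Server A writes a new email for user:123, Server B's next read misses locally and reloads the new value from the database.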

Q46. How do you monitor a production cache to detect degradation?

Answer:

Effective cache monitoring requires metrics at multiple levels.

Tier 1 - Business Impact Metrics:

Application response time (p99): The first sign of cache degradation is response time increase. Monitor via APM (New Relic, Datadog).
Error rate: Cache-related errors (connection failures, timeouts) often manifest as 5xx errors.

Tier 2 - Cache Health Metrics:

Hit rate: Target > 95% for read-heavy workloads; alert if it drops below your threshold.
```
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"
```

Eviction rate: Increasing evictions = cache is too small.
```
redis-cli INFO stats | grep evicted_keys
```
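Memory utilization: Alert at roughly 75-80% of maxmemory to leave time to act before evictions or OOM errors. Compare used memory against the limit:
```
redis-cli INFO memory | grep -E "used_memory_human|maxmemory_human"
```
Key expiration rate (expired_keys in INFO stats): A sudden spike can signal an incoming cache avalanche (many keys expiring at once).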
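`INFO stats` reports raw counters, so the hit rate has to be computed as keyspace_hits / (keyspace_hits + keyspace_misses). A small helper for that, assuming the text output of `INFO stats` has already been fetched (it parses the `field:value` lines Redis prints and skips `#` section headers); `HitRate` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Computes the cache hit rate from the text of `redis-cli INFO stats`.
class HitRate {
    static double fromInfoStats(String info) {
        Map<String, Long> counters = new HashMap<>();
        for (String line : info.split("\\R")) {              // INFO uses \r\n line endings
            int colon = line.indexOf(':');
            if (line.startsWith("#") || colon < 0) continue;  // skip section headers and blanks
            String field = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (field.equals("keyspace_hits") || field.equals("keyspace_misses")) {
                counters.put(field, Long.parseLong(value));
            }
        }
        long hits = counters.getOrDefault("keyspace_hits", 0L);
        long misses = counters.getOrDefault("keyspace_misses", 0L);
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;     // 0.0 when there is no traffic yet
    }
}
```

These counters are cumulative since server start (or the last CONFIG RESETSTAT), so for alerting it is more useful to compute the ratio over the difference between two samples than over the lifetime totals.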

Tier 3 - Connection Health:

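Connected clients: Near the maxclients limit = new connections will soon be rejected (often a connection leak or oversized client pools).
Blocked clients (BLPOP, BRPOP): Clients parked on blocking commands. An unexpected rise can mean consumers are waiting on stalled producers, and each blocked client still holds a connection.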
Rejected connections: Redis rejecting due to maxclients.

Tier 4 - Latency:

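Command latency: redis-cli LATENCY LATEST and LATENCY HISTORY <event> (the latency monitor must first be enabled with latency-monitor-threshold); redis-cli --latency measures round-trip time from the client.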
Slow log: redis-cli SLOWLOG GET 10 shows commands taking > slowlog threshold (default: 10ms).

Alerting thresholds (example):

Metric                    Warning     Critical
------------------        -------     --------
Cache Hit Rate            < 90%       < 80%
Memory Utilization        > 75%       > 90%
Eviction Rate             > 100/min   > 1000/min
P99 Cache Latency         > 5ms       > 20ms
Connected Clients         > 80%       > 95% of maxclients
Replication Lag           > 1 second  > 5 seconds
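The table translates directly into alert rules. A sketch for two of its rows, using the example thresholds above (starting points, not universal values); `Level` and `AlertRules` are hypothetical names.

```java
// Example alert classification using the warning/critical thresholds from the table
enum Level { OK, WARNING, CRITICAL }

class AlertRules {
    // Hit rate in percent: lower is worse
    static Level hitRate(double percent) {
        if (percent < 80) return Level.CRITICAL;
        if (percent < 90) return Level.WARNING;
        return Level.OK;
    }

    // Memory used as a percentage of maxmemory: higher is worse
    static Level memory(double usedBytes, double maxBytes) {
        if (maxBytes <= 0) return Level.OK;   // maxmemory 0 means "no limit" in Redis
        double percent = 100.0 * usedBytes / maxBytes;
        if (percent > 90) return Level.CRITICAL;
        if (percent > 75) return Level.WARNING;
        return Level.OK;
    }
}
```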

Q47. Explain cache coherence in CPU multi-core architectures. What is the MESI protocol?

Answer:

In a multi-core processor, each core has its own L1 and L2 caches, all backed by shared L3 and main memory. When multiple cores cache the same memory address, they must stay coherent (consistent with each other).

The Problem (without a coherence protocol):

Core 1 reads variable X = 5 into its L1 cache.
Core 2 reads variable X = 5 into its L1 cache.
Core 1 writes X = 10 to its L1 cache.
Core 2 still has X = 5 in its cache.
Core 2 reads X = 5 (stale).

MESI Protocol (hardware solution):
Each cache line exists in one of four states:

M (Modified): This cache has the only up-to-date copy. Main memory is stale. Must write back to memory before another core can read it.
E (Exclusive): This cache has the only copy. Main memory is up-to-date. Can transition to M on write without notification.
S (Shared): This cache line is cached by multiple cores. All copies match main memory. Read only. Must inform other cores before writing.
I (Invalid): This cache line is stale. Must be re-fetched before use.

How it prevents stale reads:
When Core 1 writes to a Shared cache line, it sends an "Invalidate" message to all other cores. Those cores transition their copy to "Invalid." On their next access, they fetch the updated value from the bus (from Core 1's Modified copy or from memory).
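These transitions can be modeled for a single cache line in a few dozen lines. This is a deliberately simplified, hypothetical model (one line holding one int, a perfectly ordered bus, write-back folded into the read), meant only to show how the Invalidate broadcast prevents the stale read from the X = 5 / X = 10 scenario above; real protocols have many more transient states.

```java
import java.util.Arrays;

// Simplified MESI model for one cache line shared by several cores.
class Mesi {
    enum State { M, E, S, I }

    final State[] state;   // per-core state of the line
    final int[] value;     // per-core cached copy
    int memory;            // main-memory copy

    Mesi(int cores, int initial) {
        state = new State[cores];
        value = new int[cores];
        Arrays.fill(state, State.I);
        memory = initial;
    }

    int read(int core) {
        if (state[core] != State.I) return value[core];    // hit in M, E or S
        boolean othersHaveIt = false;
        for (int c = 0; c < state.length; c++) {
            if (c == core || state[c] == State.I) continue;
            if (state[c] == State.M) memory = value[c];      // owner writes back its dirty copy
            state[c] = State.S;                              // M/E/S holders drop to Shared
            othersHaveIt = true;
        }
        value[core] = memory;
        state[core] = othersHaveIt ? State.S : State.E;      // sole holder gets Exclusive
        return value[core];
    }

    void write(int core, int newValue) {
        // "Invalidate" broadcast: every other copy becomes Invalid. The whole line is one
        // int here, so an older dirty copy is fully superseded and need not be fetched.
        for (int c = 0; c < state.length; c++) {
            if (c != core) state[c] = State.I;
        }
        value[core] = newValue;
        state[core] = State.M;                               // memory is now stale
    }
}
```

Replaying the scenario: both cores read 5 (Core 0 goes E, then both S), Core 0 writes 10 (Core 0 M, Core 1 I), and Core 1's next read finds its line Invalid, forces the write-back, and gets 10 instead of the stale 5.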

Application-level analogy:
MESI is effectively a hardware-implemented version of what we manually implement in distributed application caches (publish invalidation events when data changes). The cache bus is like Redis Pub/Sub. The "Invalidate" message is like a cache eviction event.

Q48. A critical production incident: your cache hit rate dropped from 97% to 40% suddenly. Walk me through your investigation.

Answer:

This is a structured incident response question. A methodical approach:

Step 1: Establish baseline and impact

When did the drop start? (Check monitoring timeline)
What changed at that time? (Deployments, config changes, traffic pattern changes, scheduled jobs)
What is the business impact? (Response time spike, error rate, DB load)

Step 2: Rule out infrastructure issues

redis-cli ping                    # Is Redis responding?
redis-cli INFO replication        # Is there a failover? New master?
redis-cli INFO memory             # Is Redis OOM? Evicting excessively?
redis-cli INFO stats | grep evicted_keys  # High eviction?
redis-cli SLOWLOG GET 10          # Any slow commands blocking?

Step 3: Examine the keys

redis-cli DBSIZE                  # How many keys? Normal?
redis-cli INFO keyspace           # Key distribution by DB
redis-cli --latency               # Check response latency

Step 4: Check for specific root causes

Scenario A: Redis was restarted or flushed

All data gone. Cold cache. Hit rate starts from 0%.
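Check the Redis log and uptime_in_seconds (INFO server) for a recent restart, and the cmdstat_flushdb / cmdstat_flushall counters in INFO commandstats for flushes (Redis keeps no audit log by default).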
Resolution: Allow cache to warm. Investigate who/what triggered the flush.

Scenario B: Code deployment changed cache keys

New code uses product:v2:{id} but old cache has product:{id}. All misses.
Check: git diff for cache key format changes.
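Resolution: If the key change was intentional, pre-warm the new keys (or roll the change out gradually) so the hit rate recovers; if it was accidental, revert it.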

Scenario C: TTL changed (too short)

New config deployed with TTL = 30 seconds instead of 5 minutes. All entries expire rapidly.
Resolution: Revert TTL configuration.

Scenario D: Traffic pattern change

A new feature is generating requests for uncached data (new types of keys being requested).
Check: Are there new key patterns in Redis? What are the top misses?
Resolution: Cache the new data types.

Scenario E: Working set outgrew cache size

New data being added, cache too small. Evicting hot data constantly.
Check: High eviction rate in Redis INFO stats.
Resolution: Increase maxmemory, scale Redis cluster.

Step 5: Mitigation while investigating

If DB is under strain: Enable circuit breakers, rate limiting, shed non-critical traffic.
Add temporary fallback responses for non-critical data.

Quick Reference: Common Interview Anti-Patterns to Avoid

Anti-pattern 1: "Just add Redis"
Bad answer: "I would add Redis to cache everything."
Good answer: "I would profile the read/write pattern, identify hot queries, choose the right eviction policy and TTL strategy, implement Cache-Aside with Write-Invalidate, and add monitoring."

Anti-pattern 2: Forgetting cache failure mode
Always mention: "If the cache is down, the application falls back to the database (for Cache-Aside) or fails gracefully."

Anti-pattern 3: Ignoring consistency implications
Always ask: "What is the acceptable staleness window for this data?"

Anti-pattern 4: Cache for everything
Not every piece of data should be cached. Mention the framework: frequently read, rarely written, can tolerate some staleness, fetching is expensive.

Anti-pattern 5: Using Java serialization
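Default Java serialization ties the cache to Java clients and breaks when classes change incompatibly. In Java caching discussions, always mention: "I would use Jackson JSON serialization with FAIL_ON_UNKNOWN_PROPERTIES=false to handle schema evolution safely."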

Final Summary: The 10 Things You Must Know Cold

Cache Hit Ratio: Formula, good ranges, and what causes it to drop.
LRU vs LFU: When to use each, implementation complexity, drawbacks.
Cache-Aside vs Write-Through: Core trade-offs for read and write paths.
Cache Stampede: What it is, mutex lock solution, Refresh-Ahead solution.
Cache Penetration: Bloom filter and null caching solutions.
Cache Avalanche: Randomized TTL prevention, circuit breaker protection.
Redis Sentinel vs Cluster: Sentinel = HA for single shard. Cluster = sharding + HA.
Consistent Hashing: Why naive modulo hashing fails, virtual nodes, and why only about 1/N of keys remap when a node is added or removed.
HTTP Cache-Control: no-cache (revalidate) vs no-store (never cache). max-age vs s-maxage.
Distributed Cache Invalidation: Event-driven via Kafka/Pub/Sub is most robust. TTL is the safety net.

Previous: Part 4 - Pitfalls and Solutions
Start from the beginning: Part 1 - Fundamentals

Series: Caching Demystified
