6/6/2026 | Admin Post


Consistency Models - Part 2: All Models Deep Dive



Table of Contents

  1. Linearizability (Strong Consistency)
  2. Serializability
  3. Strict Serializability
  4. Sequential Consistency
  5. Causal Consistency
  6. Eventual Consistency
  7. Strong Eventual Consistency and CRDTs
  8. Read-Your-Writes Consistency
  9. Monotonic Reads
  10. Monotonic Writes
  11. Consistent Prefix Reads
  12. Bounded Staleness
  13. Session Consistency
  14. Transaction Isolation Levels (MySQL Deep Dive)
  15. Multi-Version Concurrency Control (MVCC)
  16. Quorum-Based Consistency
  17. Vector Clocks and Causal Ordering
  18. Conflict Resolution Strategies
  19. Consistency Model Comparison Matrix

1. Linearizability (Strong Consistency)

Definition

Linearizability is the strongest consistency model. It guarantees that:

  1. Every operation appears to execute instantaneously at some point between its invocation and its completion
  2. All operations are globally ordered -- every client in the system sees the same single, consistent view of history
  3. Operations respect real-time ordering -- if operation A completes before operation B begins, then A appears before B in the global order

The Analogy

Think of a bank teller at a physical branch. There is one teller (the single source of truth). Every transaction happens at a specific moment in real time. If you deposit $500 at 10:00 AM, anyone who checks the balance after 10:00 AM sees the $500. There is no ambiguity about ordering. All clients share the same reality.

Formal Intuition

In a linearizable system, you can draw a timeline where every operation has a single point (its "linearization point") where it takes effect. All operations are consistent with this timeline.

Client A: [WRITE x=5 |-----------| done]
Client B:                   [READ x -------> 5]    (sees the write)
Client C:              [READ x --> 5 or old]        (may see either, but consistent forever after)

Once Client C reads x=5, any subsequent read by any client must also return 5 (or a newer value). No "going back" in time.

When to Use

  • Distributed locks (a lock must be exclusive -- no two clients can hold it)
  • Leader election (only one node should be leader at a time)
  • Unique ID generation
  • Counter operations where correctness is critical
  • Configuration management (ZooKeeper, etcd)
  • Sequence numbers for ordered operations

Real-World Systems

System                              | Linearizable?
------------------------------------+----------------------------------------------
ZooKeeper                           | Yes -- all writes go through a single leader
etcd                                | Yes -- Raft consensus provides linearizability
MySQL (single node, SERIALIZABLE)   | Yes -- for a single node
MySQL with synchronous replication  | Yes -- if read from primary only
Google Spanner                      | Yes -- uses TrueTime
DynamoDB with strong reads          | Yes -- within a single partition
Redis (single master)               | Yes -- for a single master
Redis Cluster                       | No -- different slots on different masters

Cost of Linearizability

  • Latency: Every operation must coordinate with a leader or a quorum of replicas before it returns
  • Throughput: Serialization of operations limits parallelism
  • Availability: If the coordinator or a quorum is unavailable, operations block

2. Serializability

Definition

Serializability is a property of transactions (not individual operations). A concurrent execution of transactions is serializable if the result is equivalent to some serial (one-at-a-time) execution of those transactions.

Key difference from linearizability: Serializability does not require real-time ordering. The equivalent serial order may not match the real-time order in which transactions ran.

Example

Three transactions T1, T2, T3 ran concurrently. The actual execution was interleaved. The result is serializable if you can find some ordering T1 -> T2 -> T3 (or T2 -> T3 -> T1, etc.) that would produce the same final state.
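
To make the definition concrete, the sketch below brute-forces every serial order of a small set of transactions and checks whether an observed final state matches any of them. The class name SerializabilityCheck, the method isSerializable, and the choice to model a transaction as a function on a single integer are illustrative assumptions, not part of any database API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper: a transaction is modelled as a pure function
// on a one-integer "database".
class SerializabilityCheck {

    // Returns true if observedFinal equals the result of running the
    // transactions one at a time in at least one order.
    static boolean isSerializable(int initial,
                                  List<UnaryOperator<Integer>> txns,
                                  int observedFinal) {
        return search(initial, new ArrayList<>(txns), observedFinal);
    }

    // Depth-first search over all permutations of the remaining transactions.
    private static boolean search(int state,
                                  List<UnaryOperator<Integer>> remaining,
                                  int target) {
        if (remaining.isEmpty()) {
            return state == target;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<UnaryOperator<Integer>> rest = new ArrayList<>(remaining);
            UnaryOperator<Integer> next = rest.remove(i);
            if (search(next.apply(state), rest, target)) {
                return true;
            }
        }
        return false;
    }
}
```

Starting from 10 with T1 = "add 100" and T2 = "double", the serial orders give 220 (T1 then T2) and 120 (T2 then T1), so both are serializable outcomes. A final state of 110 matches neither order: it is what a lost update would produce, and the check rejects it.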

Why It Matters

Serializability prevents all concurrency anomalies:

  • Dirty reads
  • Non-repeatable reads
  • Phantom reads
  • Lost updates
  • Write skew

MySQL achieves serializability at ISOLATION LEVEL SERIALIZABLE.

In Spring Boot

@Transactional(isolation = Isolation.SERIALIZABLE)
public void criticalOperation() {
    // This transaction is fully serializable
    // Slowest, safest isolation level
    // Uses range locks to prevent phantom reads
}

3. Strict Serializability

Definition

Strict Serializability = Linearizability + Serializability

It requires both:

  1. Transactions are equivalent to some serial execution (serializability)
  2. That serial execution respects real-time ordering (linearizability)

This is the gold standard of consistency but extremely expensive to achieve.

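Used by: Google Spanner (which calls it external consistency). CockroachDB is often listed here too, but it documents its guarantee as serializable isolation plus per-key linearizability rather than full strict serializability.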


4. Sequential Consistency

Definition

Sequential consistency is weaker than linearizability but still quite strong:

  1. All operations appear to execute in some sequential order
  2. The operations from each individual client appear in the order they were issued by that client
  3. But the global order need not respect real-time -- the sequential order can differ from wall-clock time

The Analogy

Imagine a message board in an office. Each person posts messages in the order they write them. But you do not know the exact time each message was posted. The board shows a sequence of messages that is consistent with each person's posting order, but you cannot determine who posted first based on real time.

Difference from Linearizability

Linearizability: If A completed before B started, A must appear before B in global order.
Sequential:      Each client's operations are in order, but different clients' orders
                 can be interleaved arbitrarily -- real-time not required.

Example

Client 1: WRITE x=1, then WRITE x=2
Client 2: WRITE y=1, then WRITE y=2

Sequential consistent orderings (all valid):
  x=1, x=2, y=1, y=2
  y=1, y=2, x=1, x=2
  x=1, y=1, x=2, y=2
  x=1, y=1, y=2, x=2

NOT valid (violates Client 1's order):
  x=2, x=1, y=1, y=2  <-- Client 1's writes are reversed
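
The program-order rule in the example above can be checked mechanically. The following is a minimal sketch, assuming every operation has a unique label; ProgramOrderCheck and respectsProgramOrder are hypothetical names. It only verifies the per-client ordering condition, not that reads return values consistent with the proposed order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProgramOrderCheck {

    // Returns true if every client's operations appear in globalOrder
    // in the same relative order in which that client issued them.
    static boolean respectsProgramOrder(Map<String, List<String>> perClient,
                                        List<String> globalOrder) {
        // Position of each operation label in the proposed global order.
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < globalOrder.size(); i++) {
            position.put(globalOrder.get(i), i);
        }
        for (List<String> ops : perClient.values()) {
            int previous = -1;
            for (String op : ops) {
                Integer pos = position.get(op);
                if (pos == null || pos < previous) {
                    return false; // missing operation or reversed program order
                }
                previous = pos;
            }
        }
        return true;
    }
}
```

With Client 1 issuing x=1 then x=2 and Client 2 issuing y=1 then y=2, the order x=1, y=1, y=2, x=2 passes, while x=2, x=1, y=1, y=2 fails because Client 1's writes are reversed.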

Used By

CPU memory models (multiprocessor systems), some distributed systems research.


5. Causal Consistency

Definition

Causal consistency tracks causal relationships between operations. Operations that are causally related must be seen in causal order. Operations that are concurrent (not causally related) can be seen in any order.

Operation B is causally dependent on Operation A if:

  • B reads a value written by A
  • A and B are operations from the same client, and A was issued before B (program order)
  • B is causally dependent on some C that depends on A (transitivity)

The Analogy

Think of a conversation thread in Slack. If someone posts a message and someone replies to it, the reply must always appear after the original message -- because the reply is causally dependent on it. But two independent messages sent at the same time can appear in any order.

Why Causal Consistency Matters

Consider a social media example:

User A posts: "I'm going to delete my account"
User B replies: "Don't do it!"
User C reads the thread

Without causal consistency:
  User C might see: "Don't do it!" (reply)
                    ... no original post yet ...

With causal consistency:
  User C always sees the original post BEFORE the reply

Implementation: Vector Clocks

Causal consistency is typically implemented using vector clocks or hybrid logical clocks (HLC). See Section 17 for details.

Systems Using Causal Consistency

  • MongoDB (causal sessions)
  • Cassandra is not causally consistent out of the box; its lightweight transactions use Paxos to give linearizable compare-and-set within a partition instead
  • COPS (research system)
  • Some configurations of DynamoDB Streams

6. Eventual Consistency

Definition

Eventual consistency is the weakest useful consistency model:

If no new updates are made to a data item, all replicas will eventually converge to the same value.

No guarantee about:

  • When convergence happens (could be milliseconds, seconds, or longer)
  • What value is returned during the convergence window
  • Whether reads within the convergence window are "fresh"

The Analogy

Think of DNS propagation. When you update a DNS record (e.g., point your domain to a new IP), the change propagates across DNS servers worldwide. During propagation (which can take minutes to hours), different users resolve the domain to different IPs. Eventually, all DNS servers have the updated record and all users get the same IP.

What "Eventually" Means in Practice

In most well-engineered systems, "eventually" is:

  • Within the same data center: milliseconds to tens of milliseconds
  • Across availability zones: tens to hundreds of milliseconds
  • Across regions (global tables): hundreds of milliseconds to seconds

When Eventual Consistency Is Acceptable

  • Product catalogs (price showing $9.99 vs $10.00 for a second is fine)
  • Social media timelines (post appears 1 second later is fine)
  • Recommendation engines (stale recommendations are fine)
  • Analytics dashboards (near-real-time is acceptable)
  • Configuration flags with low urgency (feature flags propagating over seconds)
  • User preferences that are not security-critical

When Eventual Consistency Is NOT Acceptable

  • Financial balances (you cannot read $0 when the balance is $1000)
  • Inventory counts (cannot sell items you don't have)
  • Authentication tokens (must be valid or invalid immediately)
  • Distributed locks (two clients cannot both believe they hold the lock)
  • Unique constraint enforcement (two users cannot get the same username)

Conflict Resolution in Eventual Consistency

When replicas accept concurrent writes to the same key, conflicts arise. Resolution strategies:

Strategy               | How It Works                    | Risk
-----------------------+---------------------------------+-------------------------------
Last-Write-Wins (LWW)  | Higher timestamp wins           | Data loss if clocks are skewed
Multi-Value (siblings) | Return all conflicting versions | Application must resolve
Application-Level      | Application merges conflicts    | Most flexible, most work
CRDTs                  | Math-based auto-merge           | Limited to specific data types

7. Strong Eventual Consistency and CRDTs

Definition

Strong Eventual Consistency (SEC) adds one guarantee to eventual consistency:

All replicas that have received the same set of updates are in the same state, regardless of the order in which they received updates.

With SEC, there are no conflicts because the data structure is designed to commute -- the result is the same regardless of operation order.

Conflict-Free Replicated Data Types (CRDTs)

CRDTs are data structures mathematically designed to merge concurrent updates without conflict. They achieve SEC.

Types of CRDTs

G-Counter (Grow-only Counter)

Each node maintains its own counter. The global counter is the sum of all node counters. Increments are always safe -- no conflicts possible.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified G-Counter CRDT
public class GCounter {
    private final String nodeId;
    private final Map<String, Long> counts = new ConcurrentHashMap<>();
 
    public GCounter(String nodeId) {
        this.nodeId = nodeId;
        this.counts.put(nodeId, 0L);
    }
 
    public void increment() {
        counts.merge(nodeId, 1L, Long::sum);
    }
 
    public long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }
 
    // Merge another replica's state -- always safe, commutative, idempotent
    public void merge(GCounter other) {
        other.counts.forEach((node, count) ->
            counts.merge(node, count, Math::max)
        );
    }
}

PN-Counter (Positive-Negative Counter)

Two G-Counters: one for increments (P) and one for decrements (N). Value = P - N. Used for distributed counters that can go up and down.

public class PNCounter {
    private final GCounter positive;
    private final GCounter negative;
 
    public PNCounter(String nodeId) {
        this.positive = new GCounter(nodeId);
        this.negative = new GCounter(nodeId);
    }
 
    public void increment() { positive.increment(); }
    public void decrement() { negative.increment(); }
    public long value() { return positive.value() - negative.value(); }
 
    public void merge(PNCounter other) {
        positive.merge(other.positive);
        negative.merge(other.negative);
    }
}

LWW-Register (Last-Write-Wins Register)

Stores a single value with a timestamp. Merge always picks the value with the higher timestamp. Simple but risks data loss with clock skew.
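
A minimal sketch of an LWW-Register follows. The class name LwwRegister and the tie-break on writer ID are assumptions added for illustration (the description above does not say what happens when two timestamps are equal); the tie-break is there so that merge gives the same result on every replica.

```java
// Sketch of a Last-Write-Wins register. Timestamps are supplied by the caller,
// so any clock skew between writers directly affects which value survives.
class LwwRegister<T> {
    private T value;
    private long timestamp;
    private String writerId;

    LwwRegister(T value, long timestamp, String writerId) {
        this.value = value;
        this.timestamp = timestamp;
        this.writerId = writerId;
    }

    // Accepts the update only if it is newer than the current state.
    // Equal timestamps are resolved by comparing writer IDs (an assumption).
    void set(T newValue, long newTimestamp, String newWriterId) {
        boolean newer = newTimestamp > timestamp
            || (newTimestamp == timestamp && newWriterId.compareTo(writerId) > 0);
        if (newer) {
            value = newValue;
            timestamp = newTimestamp;
            writerId = newWriterId;
        }
    }

    // Merging another replica is the same as replaying its latest write.
    void merge(LwwRegister<T> other) {
        set(other.value, other.timestamp, other.writerId);
    }

    T get() {
        return value;
    }
}
```

Merging two replicas in either direction leaves both holding the value with the higher timestamp. A write that happens later in real time but carries a lower timestamp is silently dropped, which is the clock-skew risk mentioned above.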

OR-Set (Observed-Remove Set)

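A set that handles concurrent add/remove without conflicts. Each add gets a unique tag. A remove deletes only the tags it has observed for that element. If one replica removes an element while another concurrently adds it again, the new add carries a tag the remove never saw, so the element stays.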
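
The tagging idea can be sketched as a state-based OR-Set. This is a simplified illustration under stated assumptions: OrSet is a hypothetical class, tags are random UUIDs, and removed tags are kept forever as tombstones (production implementations usually compact them).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

class OrSet<E> {
    // Every add of an element is recorded under a fresh unique tag.
    private final Map<E, Set<String>> addTags = new HashMap<>();
    // Tags that some replica has removed (tombstones).
    private final Set<String> removedTags = new HashSet<>();

    void add(E element) {
        addTags.computeIfAbsent(element, k -> new HashSet<>())
               .add(UUID.randomUUID().toString());
    }

    // Removes only the tags this replica has observed for the element.
    void remove(E element) {
        Set<String> observed = addTags.get(element);
        if (observed != null) {
            removedTags.addAll(observed);
        }
    }

    // Present if at least one add tag has not been removed.
    boolean contains(E element) {
        Set<String> tags = addTags.get(element);
        if (tags == null) {
            return false;
        }
        for (String tag : tags) {
            if (!removedTags.contains(tag)) {
                return true;
            }
        }
        return false;
    }

    // Union of both tag sets: commutative, associative, and idempotent.
    void merge(OrSet<E> other) {
        other.addTags.forEach((element, tags) ->
            addTags.computeIfAbsent(element, k -> new HashSet<>()).addAll(tags));
        removedTags.addAll(other.removedTags);
    }
}
```

If replica B removes an element it learned from replica A while A concurrently adds the same element again, both replicas contain the element after they merge, because A's second add has a tag that B's remove never covered.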

CRDTs in Production

System                | CRDT Usage
----------------------+-------------------------------------------
Redis                 | HyperLogLog (approximate counting CRDT)
Riak                  | Full CRDT support
Cassandra             | Counters (PN-Counter like behavior)
SoundCloud            | Conflict resolution in distributed systems
Akka Distributed Data | Full CRDT library

8. Read-Your-Writes Consistency

Definition

After a client performs a write, any subsequent read by that same client will always see the written value. Other clients may or may not see it immediately.

This is a session-level guarantee, not a global one.

Why It Matters

Imagine a user changes their profile photo. They hit "save" and are immediately shown their profile. If the read goes to a stale replica, they see their old photo and think the save failed. They hit save again. Frustrating.

With read-your-writes:

  • The user always sees their own changes immediately
  • Other users might see the old photo for a second -- acceptable
  • The user experience is consistent

Implementation Strategies

Strategy 1: Route Writes and Reads to Primary

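Simplest solution: after a write, always read from the primary (not the replica). The example assumes a routing DataSource that sends transactions not marked read-only to the primary.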

@Service
public class UserService {
 
    @Transactional  // routes to primary (write datasource)
    public void updateProfile(Long userId, ProfileUpdateRequest request) {
        User user = userRepository.findById(userId).orElseThrow();
        user.updateFrom(request);
        userRepository.save(user);
    }
 
    @Transactional  // also routes to primary for read-your-writes
    public UserProfile getProfile(Long userId) {
        return userRepository.findById(userId)
            .map(UserProfile::from)
            .orElseThrow();
    }
}

Strategy 2: Track Last Write Timestamp

After a write, store the timestamp in the user's session. For subsequent reads, if the replica's replication position is behind the timestamp, route to primary.

@Service
public class ProfileService {
 
    @Autowired
    private SessionContext sessionContext;
 
    @Transactional
    public void updateProfile(Long userId, ProfileUpdateRequest request) {
        userRepository.save(/* ... */);
        // Record that this session has a pending read-your-writes guarantee
        sessionContext.setLastWriteTimestamp(Instant.now());
    }
 
    public UserProfile getProfile(Long userId) {
        // If we wrote recently, read from primary
        if (sessionContext.hasRecentWrite(Duration.ofSeconds(5))) {
            return readFromPrimary(userId);
        }
        return readFromReplica(userId);
    }
}

Strategy 3: Sticky Sessions

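Route all requests from the same user session to the same replica. Once that replica has applied the user's write, every later read in the session sees it. This is simpler but less precise: a read issued before the replica catches up can still be stale, and the guarantee is lost if the session is moved to another replica.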


9. Monotonic Reads

Definition

If a client reads a value of X, any subsequent read of X by that same client will return the same value or a more recent value. The client will never read an older value than what it has already seen.

The Problem Without Monotonic Reads

t=1: Replica A has: x = 5 (latest)
t=2: Replica B has: x = 3 (lagging)

Client reads x from Replica A --> gets 5
Client reads x from Replica B --> gets 3  (GOES BACKWARDS!)

This is deeply confusing to users. Imagine a bank balance going from $1000 (after deposit) back to $500 (before deposit). Users would think the deposit was reversed.

Implementation

The common approach is sticky reads -- route a client's reads to the same replica consistently. Since replicas are monotonically applying the replication log, once the client has seen a value, they will never see an older one from the same replica.

In AWS Aurora, this can be achieved by reading from a specific instance endpoint (the cluster reader endpoint load-balances connections across replicas) or by using the writer endpoint for critical reads.
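
An alternative to sticky routing is for the client session to remember the newest replication position it has read from and to skip any replica that is behind it. The sketch below illustrates that idea; ReplicaState, MonotonicReadSession, and the numeric appliedPosition are hypothetical stand-ins for whatever position marker the database exposes.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical view of a replica: how far it has applied the replication
// log, and the value it currently holds for x.
class ReplicaState {
    final String name;
    final long appliedPosition;
    final int value;

    ReplicaState(String name, long appliedPosition, int value) {
        this.name = name;
        this.appliedPosition = appliedPosition;
        this.value = value;
    }
}

class MonotonicReadSession {
    // Highest replication position this session has already read from.
    private long lastSeenPosition = -1;

    // Reads from the first candidate that is at least as fresh as anything
    // this session has seen. Returns empty if every candidate is behind.
    Optional<ReplicaState> read(List<ReplicaState> candidates) {
        for (ReplicaState replica : candidates) {
            if (replica.appliedPosition >= lastSeenPosition) {
                lastSeenPosition = replica.appliedPosition;
                return Optional.of(replica);
            }
        }
        return Optional.empty(); // caller may retry or fall back to the primary
    }
}
```

After the session reads x = 5 from a replica at position 5, a lagging replica at position 3 is skipped, so the session never observes x = 3 afterwards.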


10. Monotonic Writes

Definition

Writes from the same client are applied to all replicas in the order they were issued. If client writes W1 then W2, then W1 must be reflected on any replica before W2.

Why It Matters

Client writes:
  W1: INSERT INTO orders (id, status) VALUES (100, 'PENDING');
  W2: UPDATE orders SET status = 'APPROVED' WHERE id = 100;

Without monotonic writes:
  W2 might arrive at Replica before W1
  --> UPDATE on a non-existent row does nothing
  --> Row is eventually inserted with status 'PENDING' (wrong!)

With monotonic writes:
  W1 always arrives and applies before W2
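
One way a replica can enforce this ordering is to number each client's writes and buffer any write that arrives early. The following is a minimal sketch of that approach, not how MySQL implements it; OrderedApplier and the per-client sequence numbers starting at 1 are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrderedApplier {
    // Next sequence number expected from each client (starts at 1).
    private final Map<String, Long> nextExpected = new HashMap<>();
    // Writes that arrived ahead of their turn, sorted by sequence number.
    private final Map<String, TreeMap<Long, String>> pending = new HashMap<>();
    // Statements in the order they were actually applied.
    private final List<String> applied = new ArrayList<>();

    // Called whenever a write reaches this replica, in any order.
    void receive(String clientId, long seq, String statement) {
        pending.computeIfAbsent(clientId, k -> new TreeMap<>()).put(seq, statement);
        drain(clientId);
    }

    // Applies buffered writes for the client as long as there is no gap.
    private void drain(String clientId) {
        TreeMap<Long, String> queue = pending.get(clientId);
        long expected = nextExpected.getOrDefault(clientId, 1L);
        while (!queue.isEmpty() && queue.firstKey() == expected) {
            applied.add(queue.pollFirstEntry().getValue());
            expected++;
        }
        nextExpected.put(clientId, expected);
    }

    List<String> applied() {
        return applied;
    }
}
```

If W2 (the UPDATE) arrives first, it is held back. When W1 (the INSERT) arrives, both are applied in the order W1, W2.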

MySQL Replication and Monotonic Writes

MySQL replication preserves write order when the replica applies the binary log with a single applier thread, because the binary log is sequential. With multi-threaded replication (parallel apply), transactions can commit on the replica in a different order than on the source unless commit-order preservation is enabled.

Configuration:

-- Ensure the replica commits transactions in the same order as the source
SET GLOBAL slave_preserve_commit_order = ON;    -- MySQL 5.7 through 8.0.25
-- From MySQL 8.0.26 the variable is named:
SET GLOBAL replica_preserve_commit_order = ON;

11. Consistent Prefix Reads

Definition

If a sequence of writes happened in a specific order (W1, W2, W3), a reader will never see a state that violates the order -- they might see (W1), or (W1, W2), or (W1, W2, W3), but never (W2 without W1) or (W3 without W2).

The Analogy

Think of reading a comic strip. You might only see the first three panels, but you never see panel 5 before panel 4. The story always makes sense up to the point you've read.

Why It Matters

A user posts:  "I'm going to order pizza"
A user posts:  "Actually, I ordered sushi"

Consistent prefix read scenarios (valid):
  Reader sees: Nothing yet
  Reader sees: "I'm going to order pizza"
  Reader sees: Both posts in order

Inconsistent prefix (violates this model):
  Reader sees: "Actually, I ordered sushi" -- without the first post
  (This is confusing without the context)

In MySQL

Consistent prefix reads are guaranteed within a single transaction using REPEATABLE READ isolation (MVCC snapshot). Within the same transaction, you always see a consistent snapshot -- never a partial write.


12. Bounded Staleness

Definition

Reads are guaranteed to be no more than K versions behind or T time units behind the most recent write. A weaker form of strong consistency with a defined staleness window.

Practical Value

Instead of "reads may be arbitrarily stale" (eventual consistency), bounded staleness says "reads are at most 5 seconds stale." This is much more predictable and allows for SLA-based guarantees.

AWS DynamoDB and Bounded Staleness

DynamoDB does not natively support bounded staleness, but you can approximate it:

// Read with explicit consistency level
GetItemRequest request = GetItemRequest.builder()
    .tableName("Products")
    .key(Map.of("productId", AttributeValue.builder().s(productId).build()))
    .consistentRead(false)  // eventually consistent (cheaper)
    // For critical operations:
    // .consistentRead(true)  // strongly consistent (2x read cost)
    .build();

For Aurora MySQL read replicas, you can check replication lag before routing:

@Service
public class SmartReadRouter {
 
    private static final Duration MAX_ACCEPTABLE_LAG = Duration.ofSeconds(5);
 
    public Connection getReadConnection(boolean acceptStaleness) {
        if (!acceptStaleness) {
            return primaryConnection();
        }
        long replicaLagSeconds = getReplicaLag();
        if (replicaLagSeconds <= MAX_ACCEPTABLE_LAG.getSeconds()) {
            return replicaConnection();
        }
        // Replica is too stale, fallback to primary
        return primaryConnection();
    }
 
    private long getReplicaLag() {
        // Query replica: SHOW REPLICA STATUS (SHOW SLAVE STATUS before MySQL 8.0.22)
        // or use CloudWatch metric: AuroraReplicaLag (reported in milliseconds)
        return auroraMonitoringService.getReplicaLagSeconds();
    }
}

13. Session Consistency

Definition

Session consistency is a bundle of four guarantees, all scoped to a single user session:

  1. Read-Your-Writes
  2. Monotonic Reads
  3. Monotonic Writes
  4. Consistent Prefix Reads

This is the practical sweet spot for most user-facing applications. It gives users a consistent experience (their own actions are always visible and sensible) without requiring global strong consistency.

Real-World Systems

  • DynamoDB with application-side tracking of each session's recent writes (DynamoDB itself exposes no session token)
  • MongoDB sessions (causal consistency mode)
  • AWS ElastiCache with session-pinned reads

In Spring Boot with DynamoDB

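// DynamoDB SDK v2. DynamoDB exposes no session or causal token, so read-your-writes
// is tracked by the application (SessionTokenStore is a hypothetical helper).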
@Service
public class CartService {
 
    @Autowired
    private DynamoDbClient dynamoDbClient;
 
    @Autowired
    private SessionTokenStore sessionTokenStore;
 
    public void addToCart(String sessionId, String productId, int quantity) {
        // Write operation
        PutItemRequest request = PutItemRequest.builder()
            .tableName("Cart")
            .item(buildCartItem(sessionId, productId, quantity))
            .build();
 
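        dynamoDbClient.putItem(request);
        // The response carries no sequence token (consumedCapacity() is only capacity accounting).
        // Record the write time so later reads in this session can use consistentRead(true).
        // markWrite is a hypothetical SessionTokenStore method.
        sessionTokenStore.markWrite(sessionId, Instant.now());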
    }
}

14. Transaction Isolation Levels (MySQL Deep Dive)

MySQL InnoDB supports four isolation levels. Understanding them is critical for production Java applications.

READ UNCOMMITTED

What it allows: A transaction can read uncommitted (dirty) data from other transactions.

Anomalies possible: Dirty reads, non-repeatable reads, phantom reads

Real risk: You read data from a transaction that is later rolled back. You made a decision based on data that never officially existed.

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

In Production: Almost never use this. The only valid use case is approximate counts on very large tables where locking overhead is unacceptable and the approximate result is fine.

READ COMMITTED

What it allows: Only reads committed data. Each statement within a transaction sees the latest committed snapshot at the time of that statement.

Anomalies possible: Non-repeatable reads, phantom reads
Prevents: Dirty reads

Non-repeatable read example:

T1: SELECT balance FROM accounts WHERE id=1;  --> 1000
T2: UPDATE accounts SET balance=500 WHERE id=1; COMMIT;
T1: SELECT balance FROM accounts WHERE id=1;  --> 500 (changed!)

T1 read the same row twice and got different values. Non-repeatable.

@Transactional(isolation = Isolation.READ_COMMITTED)
public void processOrder(Long orderId) {
    // Each SELECT sees the latest committed state
    // Two reads of the same row may return different values
}

Good for: Most OLTP operations where you want to see the latest committed data but don't need strict consistency within a transaction. Used in many financial systems for non-critical reads.

REPEATABLE READ (MySQL Default)

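What it allows: Within a single transaction, every plain (non-locking) read of the same data returns the same value, taken from the snapshot established by the transaction's first consistent read.

Anomalies possible: Phantom reads in some edge cases (plain reads use the snapshot and locking reads take gap and next-key locks, which together prevent most)
Prevents: Dirty reads, non-repeatable reads

How it works: InnoDB uses MVCC. The first consistent read in the transaction creates a read view (snapshot); START TRANSACTION WITH CONSISTENT SNAPSHOT creates it immediately. All later plain reads use this snapshot. Writes committed by other transactions afterwards are invisible to them.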

@Transactional(isolation = Isolation.REPEATABLE_READ)  // MySQL's default; Spring's Isolation.DEFAULT defers to the database
public void auditTransaction(Long accountId) {
    // All reads within this transaction see the same snapshot
    BigDecimal balance1 = accountRepository.findBalance(accountId);
    // ... some processing ...
    BigDecimal balance2 = accountRepository.findBalance(accountId);
    // balance1.compareTo(balance2) == 0 is guaranteed (even if another transaction committed an update)
}

Write Skew under REPEATABLE READ:

T1: SELECT * FROM doctors WHERE on_call = true;  --> 2 doctors
T1: If count >= 2: UPDATE doctors SET on_call=false WHERE id=1;

T2: SELECT * FROM doctors WHERE on_call = true;  --> 2 doctors (same snapshot)
T2: If count >= 2: UPDATE doctors SET on_call=false WHERE id=2;

Result: Both doctors go off call. No doctors remain on call.
The constraint "at least 1 doctor on call" is violated.

This is write skew -- a concurrency anomaly that REPEATABLE READ does not prevent.
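
The anomaly can be reproduced without a database. The sketch below is an in-memory illustration under simplifying assumptions: two doctors are on call, the rule is that at least one must stay on call, and each "transaction" is modelled as a check against its own snapshot followed by a write to the shared map. WriteSkewDemo is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

class WriteSkewDemo {

    // Number of doctors currently marked on call.
    static long onCallCount(Map<Integer, Boolean> doctors) {
        return doctors.values().stream().filter(Boolean::booleanValue).count();
    }

    // Runs two interleaved "transactions" and returns how many doctors
    // remain on call once both have committed.
    static long run() {
        Map<Integer, Boolean> committed = new HashMap<>();
        committed.put(1, true);
        committed.put(2, true);

        // Both transactions take their snapshot before either one commits.
        Map<Integer, Boolean> snapshotT1 = new HashMap<>(committed);
        Map<Integer, Boolean> snapshotT2 = new HashMap<>(committed);

        // T1: the check passes against its snapshot, so doctor 1 goes off call.
        if (onCallCount(snapshotT1) >= 2) {
            committed.put(1, false);
        }
        // T2: the same check passes against its own, now stale, snapshot.
        if (onCallCount(snapshotT2) >= 2) {
            committed.put(2, false);
        }
        return onCallCount(committed);
    }
}
```

WriteSkewDemo.run() returns 0: each transaction's check was valid against its snapshot, the two writes touched different rows, and yet the combined result breaks the rule. In MySQL the usual remedies are a locking read (SELECT ... FOR UPDATE) on the rows that the decision depends on, or SERIALIZABLE isolation.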

SERIALIZABLE

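What it allows: Full isolation. Transactions behave as if they ran one at a time. InnoDB implicitly converts plain SELECT statements to SELECT ... FOR SHARE when autocommit is disabled, so reads take shared next-key locks (record plus gap) that prevent all anomalies.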

Anomalies possible: None
Prevents: All anomalies including write skew and phantom reads

@Transactional(isolation = Isolation.SERIALIZABLE)
public void criticalScheduleUpdate(Long departmentId) {
    // Full serialization -- no concurrent anomalies possible
    // Also the slowest -- use only when necessary
    List<Doctor> onCall = doctorRepository.findOnCall(departmentId);
    if (onCall.size() >= 2) {
        onCall.get(0).setOnCall(false);
        doctorRepository.save(onCall.get(0));
    }
}

Performance impact of SERIALIZABLE:

  • Acquires shared range locks on every read
  • Can cause deadlocks more frequently
  • Significantly reduces concurrency
  • Use only for critical, low-volume operations

Isolation Level Summary

Level               | Dirty | Non-Repeatable | Phantom | Write Skew
--------------------+-------+----------------+---------+-----------
READ UNCOMMITTED    |  Yes  |      Yes       |   Yes   |    Yes
READ COMMITTED      |  No   |      Yes       |   Yes   |    Yes
REPEATABLE READ     |  No   |      No        |   No*   |    Yes
SERIALIZABLE        |  No   |      No        |   No    |    No

* InnoDB prevents most phantom reads in REPEATABLE READ using gap locks

Setting Isolation Level in application.yml

spring:
  datasource:
    url: jdbc:mysql://your-rds-endpoint:3306/yourdb?useSSL=true&serverTimezone=UTC
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}
 
  jpa:
    properties:
      hibernate:
        connection:
          isolation: 2 # READ_COMMITTED = 2, REPEATABLE_READ = 4, SERIALIZABLE = 8

15. Multi-Version Concurrency Control (MVCC)

What Is MVCC?

MVCC is a concurrency control mechanism that allows readers and writers to coexist without blocking each other. Instead of locking data for reads, the database keeps multiple versions of the same data. Readers see a consistent snapshot; writers create new versions.

Core Principle: "Readers don't block writers. Writers don't block readers."

How MySQL InnoDB Implements MVCC

InnoDB adds hidden fields to every row. Two of them drive MVCC:

  • DB_TRX_ID: The transaction ID of the last transaction that modified this row
  • DB_ROLL_PTR: Pointer to the undo log for the previous version

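When a transaction performs its first consistent read, InnoDB creates a "read view" that records which transactions had committed at that moment. The read view determines which row versions are visible. The example below simplifies this to a single transaction ID threshold.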

Versions of row (id=1, balance):

Version 3: balance=700 | trx_id=500 | roll_ptr --> Version 2
Version 2: balance=1000| trx_id=400 | roll_ptr --> Version 1
Version 1: balance=800 | trx_id=300 | roll_ptr --> null

Transaction T1 (started at trx_id=450):
  - Can see Version 2 (trx_id=400 < 450) -- VISIBLE
  - Cannot see Version 3 (trx_id=500 > 450) -- TOO NEW
  - T1 reads balance = 1000
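
The version walk above can be sketched in a few lines. This is a deliberately simplified model, assuming a version is visible whenever its transaction ID is below the reader's snapshot ID; a real InnoDB read view also tracks transactions that were still active when the view was created. RowVersion and MvccReader are hypothetical names.

```java
import java.util.List;
import java.util.Optional;

// One entry in a row's version chain.
class RowVersion {
    final long trxId;
    final int balance;

    RowVersion(long trxId, int balance) {
        this.trxId = trxId;
        this.balance = balance;
    }
}

class MvccReader {

    // Walks the chain from newest to oldest (as if following roll pointers)
    // and returns the first version written before the reader's snapshot.
    static Optional<RowVersion> visibleVersion(List<RowVersion> newestFirst,
                                               long snapshotTrxId) {
        for (RowVersion version : newestFirst) {
            if (version.trxId < snapshotTrxId) {
                return Optional.of(version);
            }
        }
        return Optional.empty(); // the row did not exist for this snapshot
    }
}
```

For the chain shown above (trx 500, 400, 300), a reader with snapshot ID 450 gets balance 1000, a reader with snapshot ID 350 gets balance 800, and a reader with snapshot ID 200 sees no row at all.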

MVCC in Practice (Spring Boot)

@Service
public class ReportService {
 
    // This long-running transaction sees a consistent snapshot throughout
    // Even if other transactions modify data while this runs, the snapshot is unchanged
    @Transactional(readOnly = true)
    public FinancialReport generateReport(LocalDate date) {
        // All reads in this method see the SAME snapshot
        // No blocking of writes by other transactions
        List<Transaction> transactions = transactionRepository.findByDate(date);
        BigDecimal total = transactions.stream()
            .map(Transaction::getAmount)
            .reduce(BigDecimal.ZERO, BigDecimal::add);
        return new FinancialReport(date, transactions, total);
    }
}

MVCC Benefits and Costs

Benefits:

  • High concurrency: readers and writers don't block each other
  • Consistent snapshots for long-running queries/transactions
  • Efficient for read-heavy workloads

Costs:

  • Storage overhead: multiple versions of rows stored in undo logs
  • Cleanup overhead: old versions must be purged (VACUUM in PostgreSQL, InnoDB purge thread in MySQL)
  • Long-running transactions prevent cleanup of old versions (undo log bloat)

Production Warning: Long-running transactions in MySQL hold open their read view, preventing InnoDB from purging old row versions. This causes undo log bloat. A single transaction open for hours can cause the undo tablespace to grow to gigabytes. Always set transaction timeouts.

# In application.yml
spring:
  transaction:
    default-timeout: 30  # seconds -- fail transactions running longer than 30s

16. Quorum-Based Consistency

The Concept

In a leaderless distributed system with N replicas, you can guarantee consistency by requiring:

  • W replicas to acknowledge a write before it is considered successful
  • R replicas to be read before returning a value
  • When W + R > N, every read is guaranteed to overlap with at least one node that has the latest write

The Formula

W + R > N  ==>  Guaranteed to read latest write (Strong Consistency)
W + R <= N ==>  Potentially stale reads (Eventual Consistency)
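
The overlap argument is a pigeonhole count: a write set of size W and a read set of size R drawn from N replicas must share at least W + R - N members. A small sketch, with QuorumMath as a hypothetical helper name:

```java
class QuorumMath {

    // True when every read set is guaranteed to intersect every write set.
    static boolean readsOverlapWrites(int n, int w, int r) {
        return w + r > n;
    }

    // Smallest possible number of replicas shared by a read set of size r
    // and a write set of size w, out of n replicas in total.
    static int minOverlap(int n, int w, int r) {
        return Math.max(0, w + r - n);
    }
}
```

With N=3, W=2, R=2 the minimum overlap is 1, so at least one replica in every read holds the latest acknowledged write. With N=3, W=1, R=2 the minimum overlap is 0, so a read can miss the write entirely.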

Example with N=3

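R | W | W+R   | Consistency | Notes
--+---+-------+-------------+---------------------------------
3 | 3 | 6 > 3 | Strong      | All nodes agree -- very slow
2 | 2 | 4 > 3 | Strong      | Most common production setting
1 | 3 | 4 > 3 | Strong      | All writes confirmed, fast reads
3 | 1 | 4 > 3 | Strong      | Fast writes, slow reads
1 | 1 | 2 < 3 | Eventual    | Fastest, may return stale
2 | 1 | 3 = 3 | Borderline  | Not safe -- must be > N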

DynamoDB Quorum (Simplified)

DynamoDB replicates each partition across multiple replicas and acknowledges a write once a quorum of them has persisted it. A strongly consistent read is served by the partition's leader replica, which has every acknowledged write; an eventually consistent read can be served by any replica and may be stale.

// Strongly consistent read (served by the partition's leader replica)
GetItemRequest strongRead = GetItemRequest.builder()
    .tableName("Inventory")
    .key(Map.of("productId", AttributeValue.fromS(productId)))
    .consistentRead(true)  // leader read
    .build();
 
// Eventually consistent read (served by any replica, possibly stale)
GetItemRequest eventualRead = GetItemRequest.builder()
    .tableName("Inventory")
    .key(Map.of("productId", AttributeValue.fromS(productId)))
    .consistentRead(false)  // any-replica read -- costs 50% less
    .build();

17. Vector Clocks and Causal Ordering

The Problem with Physical Timestamps

Physical clocks across servers are never perfectly synchronized. Two events with the same millisecond timestamp could have happened in either order, or concurrently. You cannot reliably use physical time to order distributed events.

Lamport Timestamps (Logical Clocks)

A simple logical clock:

  • Each node maintains a counter
  • Increment counter before each event
  • When sending a message, include the counter
  • When receiving a message, set counter to max(local, received) + 1

Guarantee: If A happens before B (causally), then timestamp(A) < timestamp(B).

Limitation: If timestamp(A) < timestamp(B), you CANNOT conclude A happened before B. They might be concurrent.
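
The four rules above translate almost directly into code. The following is a minimal single-threaded sketch; LamportClock and its method names are illustrative, and a real implementation would need synchronization if shared between threads.

```java
class LamportClock {
    private long counter = 0;

    // Local event: increment the counter and return the new timestamp.
    long tick() {
        counter++;
        return counter;
    }

    // Sending is an event too; the returned timestamp travels with the message.
    long send() {
        return tick();
    }

    // On receipt, jump past both the local counter and the received timestamp.
    long receive(long received) {
        counter = Math.max(counter, received) + 1;
        return counter;
    }

    long current() {
        return counter;
    }
}
```

If node A performs one local event and then sends a message, the message carries timestamp 2, and node B's clock becomes 3 on receipt, so the send is ordered before the receive. A third node C that performs one unrelated local event gets timestamp 1, which is lower than A's send timestamp even though neither event happened before the other. That is the limitation stated above.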

Vector Clocks

Vector clocks solve the limitation. Each node maintains a vector of counters, one for each node in the system.

3-node system: [NodeA, NodeB, NodeC]

Initial state:   [0, 0, 0]

NodeA does event: [1, 0, 0]
NodeA sends to B: B receives [1, 0, 0]
  NodeB increments: [1, 1, 0]

NodeC does event: [0, 0, 1]  (concurrent with A and B's events)

Comparison rules:

  • [1,2,3] happens-before [2,2,3] if all components of first <= second AND at least one is strictly less
  • [1,2,3] and [2,1,3] are concurrent (neither dominates)

Java Implementation:

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VectorClock {
    private final String nodeId;
    private final Map<String, Long> clock;

    public VectorClock(String nodeId, List<String> allNodes) {
        this.nodeId = nodeId;
        this.clock = new HashMap<>();
        allNodes.forEach(node -> clock.put(node, 0L));
    }

    // Increment this node's counter before an event
    public void tick() {
        clock.merge(nodeId, 1L, Long::sum);
    }

    // Merge received vector clock (when receiving a message):
    // take the per-node maximum, then count the receive as a local event
    public void merge(Map<String, Long> received) {
        received.forEach((node, time) ->
            clock.merge(node, time, Math::max)
        );
        tick(); // also increment own counter
    }

    // Check if this clock happens-before other
    public boolean happensBefore(VectorClock other) {
        // Compare over the union of node IDs: merge() can add nodes that only
        // one of the two clocks knows about. A missing entry counts as 0.
        Set<String> nodes = new HashSet<>(clock.keySet());
        nodes.addAll(other.clock.keySet());
        // All entries in this clock must be <= other
        boolean allLeq = nodes.stream()
            .allMatch(n -> clock.getOrDefault(n, 0L) <= other.clock.getOrDefault(n, 0L));
        // At least one entry must be strictly less
        boolean oneStrict = nodes.stream()
            .anyMatch(n -> clock.getOrDefault(n, 0L) < other.clock.getOrDefault(n, 0L));
        return allLeq && oneStrict;
    }

    // Check if concurrent (neither happens-before the other).
    // Note: two identical clocks also pass this check.
    public boolean isConcurrentWith(VectorClock other) {
        return !this.happensBefore(other) && !other.happensBefore(this);
    }

    public Map<String, Long> getClock() {
        return Collections.unmodifiableMap(clock);
    }
}
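
To check the comparison rules against the 3-node trace shown earlier, here is a compact array-based sketch. It is separate from the class above: `VectorCompare` and its `Order` enum are names invented for this example, and it assumes every vector lists the same nodes in the same order.

```java
/** Array-based vector clock comparison; index i holds node i's counter. */
final class VectorCompare {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    /** Classifies a relative to b using the component-wise rules. */
    static Order compare(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must cover the same nodes");
        }
        boolean someLess = false;    // a[i] < b[i] for some i
        boolean someGreater = false; // a[i] > b[i] for some i
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) someLess = true;
            if (a[i] > b[i]) someGreater = true;
        }
        if (someLess && someGreater) return Order.CONCURRENT;
        if (someLess) return Order.BEFORE;
        if (someGreater) return Order.AFTER;
        return Order.EQUAL;
    }
}

class VectorCompareDemo {
    public static void main(String[] args) {
        long[] aEvent = {1, 0, 0};   // NodeA's event
        long[] bEvent = {1, 1, 0};   // NodeB after receiving from A
        long[] cEvent = {0, 0, 1};   // NodeC's independent event

        System.out.println(VectorCompare.compare(aEvent, bEvent)); // BEFORE
        System.out.println(VectorCompare.compare(bEvent, cEvent)); // CONCURRENT
        System.out.println(VectorCompare.compare(
            new long[]{1, 2, 3}, new long[]{2, 1, 3}));            // CONCURRENT
    }
}
```

Returning four distinct outcomes, rather than a boolean, keeps "equal" and "concurrent" apart, which matters when deciding whether two versions actually conflict.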

18. Conflict Resolution Strategies

When concurrent writes happen to the same data (multi-leader or leaderless replication), conflicts must be resolved.

Last-Write-Wins (LWW)

The write with the highest timestamp wins. All other concurrent writes are discarded.

Pro: Simple, automatic
Con: Data loss -- valid writes can be silently discarded. Clock skew can cause incorrect ordering.

Used by: Cassandra (default, per-cell write timestamps), DynamoDB global tables (concurrent writes to the same item in different Regions are reconciled last-writer-wins)

// DynamoDB conditional write (optimistic locking): an alternative to plain LWW.
// A stale write fails with ConditionalCheckFailedException instead of silently
// overwriting newer data, so the caller can re-read and retry.
// Assumes newPrice and currentVersion were obtained by the caller beforehand.
Map<String, AttributeValue> expressionValues = Map.of(
    ":price", AttributeValue.fromN(String.valueOf(newPrice)),
    ":currentVersion", AttributeValue.fromN(String.valueOf(currentVersion)),
    ":newVersion", AttributeValue.fromN(String.valueOf(currentVersion + 1))
);
 
UpdateItemRequest request = UpdateItemRequest.builder()
    .tableName("Products")
    .key(Map.of("productId", AttributeValue.fromS(productId)))
    .updateExpression("SET price = :price, version = :newVersion")
    .conditionExpression("version = :currentVersion")  // optimistic locking
    .expressionAttributeValues(expressionValues)
    .build();
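
To make the data-loss risk of plain LWW concrete, here is a small in-memory sketch. It is not how Cassandra or DynamoDB implement LWW internally; `LwwRegister` is a hypothetical class, timestamps are supplied by the caller, and ties are simply dropped, whereas real systems break ties deterministically.

```java
/** Last-write-wins register: keeps only the value with the highest timestamp. */
final class LwwRegister {
    private String value;
    private long timestamp = Long.MIN_VALUE;

    /** Applies a write; returns false if it lost to a newer timestamp and was dropped. */
    boolean write(String newValue, long writeTimestamp) {
        if (writeTimestamp <= timestamp) {
            return false; // discarded; a real store would not even tell the writer
        }
        value = newValue;
        timestamp = writeTimestamp;
        return true;
    }

    String read() {
        return value;
    }
}

class LwwRegisterDemo {
    public static void main(String[] args) {
        LwwRegister price = new LwwRegister();
        // Node 1's clock runs fast, so its earlier write carries a later timestamp.
        price.write("9.99", 1_050L);                     // happened first in real time
        boolean applied = price.write("12.49", 1_020L);  // happened second, but looks older
        System.out.println(applied + " " + price.read()); // false 9.99
    }
}
```

The second write is the more recent one in real time, but clock skew makes it look older, so it is lost. This is the failure mode that version-checked conditional writes avoid.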

Merge Functions

Application-specific merge logic. Given two conflicting versions, a merge function produces the correct result.

Example: Shopping cart -- if a user adds item A on device 1 and item B on device 2 concurrently, the merge result should contain both A and B (union semantics).

public ShoppingCart merge(ShoppingCart cart1, ShoppingCart cart2) {
    // Union: take all items from both carts
    Map<String, CartItem> merged = new HashMap<>(cart1.getItems());
    cart2.getItems().forEach((productId, item) ->
        merged.merge(productId, item, (a, b) ->
            a.getQuantity() > b.getQuantity() ? a : b  // take higher quantity
        )
    );
    return new ShoppingCart(merged);
}
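
The same merge can be exercised in a self-contained form. This sketch assumes Java 17 records and uses a hypothetical `CartItem` record and `CartMerge` helper, since the `ShoppingCart` and `CartItem` classes used above are not shown here. It also checks two properties a good merge function should have: the result does not depend on merge order, and merging the same state twice changes nothing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal cart item for illustration. */
record CartItem(String productId, int quantity) {}

final class CartMerge {
    /** Union of both carts; when both contain a product, keep the larger quantity. */
    static Map<String, CartItem> merge(Map<String, CartItem> cart1,
                                       Map<String, CartItem> cart2) {
        Map<String, CartItem> merged = new HashMap<>(cart1);
        cart2.forEach((productId, item) ->
            merged.merge(productId, item,
                (a, b) -> a.quantity() >= b.quantity() ? a : b));
        return merged;
    }
}

class CartMergeDemo {
    public static void main(String[] args) {
        Map<String, CartItem> device1 = Map.of(
            "A", new CartItem("A", 1), "C", new CartItem("C", 2));
        Map<String, CartItem> device2 = Map.of(
            "B", new CartItem("B", 1), "C", new CartItem("C", 5));

        Map<String, CartItem> ab = CartMerge.merge(device1, device2);
        Map<String, CartItem> ba = CartMerge.merge(device2, device1);

        System.out.println(ab.size());                           // 3
        System.out.println(ab.get("C").quantity());              // 5
        System.out.println(ab.equals(ba));                       // true
        System.out.println(CartMerge.merge(ab, ab).equals(ab));  // true
    }
}
```

One known trade-off of union-style merges: they cannot express removal. If one device deletes an item while another still holds it, the merge brings the item back unless deletions are tracked explicitly (for example with tombstones).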

CRDTs (See Section 7)

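Data structures whose merge operation is commutative, associative, and idempotent, so replicas converge to the same state regardless of the order in which updates arrive. No conflict ever needs manual resolution. The preferred solution for systems that need Strong Eventual Consistency (SEC).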
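
As a brief reminder of what "conflict-free" means in practice, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class name `GCounter` and the node IDs are illustrative, and the sketch ignores networking and persistence.

```java
import java.util.HashMap;
import java.util.Map;

/** Grow-only counter: each node increments only its own slot. */
final class GCounter {
    private final Map<String, Long> counts = new HashMap<>();

    void increment(String nodeId) {
        counts.merge(nodeId, 1L, Long::sum);
    }

    /** Merge takes the per-node maximum, so repeating or reordering merges is harmless. */
    void mergeFrom(GCounter other) {
        other.counts.forEach((node, n) -> counts.merge(node, n, Math::max));
    }

    /** The counter's value is the sum of all per-node slots. */
    long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }
}

class GCounterDemo {
    public static void main(String[] args) {
        GCounter replica1 = new GCounter();
        GCounter replica2 = new GCounter();
        replica1.increment("n1");
        replica1.increment("n1");
        replica2.increment("n2");

        replica1.mergeFrom(replica2);
        replica2.mergeFrom(replica1);
        replica2.mergeFrom(replica1); // duplicate delivery changes nothing

        System.out.println(replica1.value() + " " + replica2.value()); // 3 3
    }
}
```

Both replicas converge to 3 without coordination, and no increment is lost or counted twice.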


19. Consistency Model Comparison Matrix

| Model | Ordering | Scope | Real-Time | Cost | Use Case |
|---|---|---|---|---|---|
| Linearizability | Global | All clients | Yes | Very High | Locks, leader election, unique IDs |
| Strict Serializability | Global + Transactional | All clients | Yes | Very High | Banking, financial systems |
| Serializability | Transactional | All clients | No | High | Critical business transactions |
| Sequential | Global | All clients | No | High | Shared memory systems |
| Causal | Causal dependencies | All clients | No | Medium | Chat, social, collaborative editing |
| Eventual | None | All clients | No | Low | Catalog, analytics, social feeds |
| Strong Eventual (SEC) | Merge-based | All clients | No | Low-Medium | Collaborative docs, distributed counters |
| Read-Your-Writes | Client's own writes | Single client | Yes (self) | Low | User profiles, session data |
| Monotonic Reads | Monotonically forward | Single client | No | Low | Any user-facing read |
| Session | Multiple guarantees | Single session | Partial | Low-Medium | Most user-facing apps |
| Bounded Staleness | Within time window | All clients | Approximate | Medium | Inventory, near-real-time dashboards |

Next: Part 3: Java and Spring Boot Implementation -- Production-ready code patterns for every consistency challenge.


Part of the Consistency Models Demystified series
Stack: Java 17, Spring Boot 3.x, MySQL 8.0, AWS