Consistency Models - Part 2: All Models Deep Dive
Navigation: Index | Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6
Table of Contents
- Linearizability (Strong Consistency)
- Serializability
- Strict Serializability
- Sequential Consistency
- Causal Consistency
- Eventual Consistency
- Strong Eventual Consistency and CRDTs
- Read-Your-Writes Consistency
- Monotonic Reads
- Monotonic Writes
- Consistent Prefix Reads
- Bounded Staleness
- Session Consistency
- Transaction Isolation Levels (MySQL Deep Dive)
- Multi-Version Concurrency Control (MVCC)
- Quorum-Based Consistency
- Vector Clocks and Causal Ordering
- Conflict Resolution Strategies
- Consistency Model Comparison Matrix
1. Linearizability (Strong Consistency)
Definition
Linearizability is the strongest consistency model. It guarantees that:
- Every operation appears to execute instantaneously at some point between its invocation and its completion
- All operations are globally ordered -- every client in the system sees the same single, consistent view of history
- Operations respect real-time ordering -- if operation A completes before operation B begins, then A appears before B in the global order
The Analogy
Think of a bank teller at a physical branch. There is one teller (the single source of truth). Every transaction happens at a specific moment in real time. If you deposit 500. There is no ambiguity about ordering. All clients share the same reality.
Formal Intuition
In a linearizable system, you can draw a timeline where every operation has a single point (its "linearization point") where it takes effect. All operations are consistent with this timeline.
Client A: [WRITE x=5 |-----------| done]
Client B: [READ x -------> 5] (sees the write)
Client C: [READ x --> 5 or old] (may see either, but consistent forever after)
Once Client C reads x=5, any subsequent read by any client must also return 5 (or a newer value). No "going back" in time.
When to Use
- Distributed locks (a lock must be exclusive -- no two clients can hold it)
- Leader election (only one node should be leader at a time)
- Unique ID generation
- Counter operations where correctness is critical
- Configuration management (ZooKeeper, etcd)
- Sequence numbers for ordered operations
Real-World Systems
| System | Linearizable? |
|---|---|
| ZooKeeper | Yes -- all writes go through a single leader |
| etcd | Yes -- Raft consensus provides linearizability |
| MySQL (single node, SERIALIZABLE) | Yes -- for a single node |
| MySQL with synchronous replication | Yes -- if read from primary only |
| Google Spanner | Yes -- uses TrueTime |
| DynamoDB with strong reads | Yes -- within a single partition |
| Redis (single master) | Yes -- for a single master |
| Redis Cluster | No -- different slots on different masters |
Cost of Linearizability
- Latency: Every operation must coordinate with all replicas
- Throughput: Serialization of operations limits parallelism
- Availability: If the coordinator or a quorum is unavailable, operations block
2. Serializability
Definition
Serializability is a property of transactions (not individual operations). A concurrent execution of transactions is serializable if the result is equivalent to some serial (one-at-a-time) execution of those transactions.
Key difference from linearizability: Serializability does not require real-time ordering. The equivalent serial order may not match the real-time order in which transactions ran.
Example
Three transactions T1, T2, T3 ran concurrently. The actual execution was interleaved. The result is serializable if you can find some ordering T1 -> T2 -> T3 (or T2 -> T3 -> T1, etc.) that would produce the same final state.
Why It Matters
Serializability prevents all concurrency anomalies:
- Dirty reads
- Non-repeatable reads
- Phantom reads
- Lost updates
- Write skew
MySQL achieves serializability at ISOLATION LEVEL SERIALIZABLE.
In Spring Boot
@Transactional(isolation = Isolation.SERIALIZABLE)
public void criticalOperation() {
// This transaction is fully serializable
// Slowest, safest isolation level
// Uses range locks to prevent phantom reads
}3. Strict Serializability
Definition
Strict Serializability = Linearizability + Serializability
It requires both:
- Transactions are equivalent to some serial execution (serializability)
- That serial execution respects real-time ordering (linearizability)
This is the gold standard of consistency but extremely expensive to achieve.
Used by: Google Spanner, CockroachDB
4. Sequential Consistency
Definition
Sequential consistency is weaker than linearizability but still quite strong:
- All operations appear to execute in some sequential order
- The operations from each individual client appear in the order they were issued by that client
- But the global order need not respect real-time -- the sequential order can differ from wall-clock time
The Analogy
Imagine a message board in an office. Each person posts messages in the order they write them. But you do not know the exact time each message was posted. The board shows a sequence of messages that is consistent with each person's posting order, but you cannot determine who posted first based on real time.
Difference from Linearizability
Linearizability: If A completed before B started, A must appear before B in global order.
Sequential: Each client's operations are in order, but different clients' orders
can be interleaved arbitrarily -- real-time not required.
Example
Client 1: WRITE x=1, then WRITE x=2
Client 2: WRITE y=1, then WRITE y=2
Sequential consistent orderings (all valid):
x=1, x=2, y=1, y=2
y=1, y=2, x=1, x=2
x=1, y=1, x=2, y=2
x=1, y=1, y=2, x=2
NOT valid (violates Client 1's order):
x=2, x=1, y=1, y=2 <-- Client 1's writes are reversed
Used By
CPU memory models (multiprocessor systems), some distributed systems research.
5. Causal Consistency
Definition
Causal consistency tracks causal relationships between operations. Operations that are causally related must be seen in causal order. Operations that are concurrent (not causally related) can be seen in any order.
Operation B is causally dependent on Operation A if:
- B reads a value written by A
- B and A are operations from the same client (program order)
- B is causally dependent on some C that depends on A (transitivity)
The Analogy
Think of a conversation thread in Slack. If someone posts a message and someone replies to it, the reply must always appear after the original message -- because the reply is causally dependent on it. But two independent messages sent at the same time can appear in any order.
Why Causal Consistency Matters
Consider a social media example:
User A posts: "I'm going to delete my account"
User B replies: "Don't do it!"
User C reads the thread
Without causal consistency:
User C might see: "Don't do it!" (reply)
... no original post yet ...
With causal consistency:
User C always sees the original post BEFORE the reply
Implementation: Vector Clocks
Causal consistency is typically implemented using vector clocks or hybrid logical clocks (HLC). See Section 17 for details.
Systems Using Causal Consistency
- MongoDB (causal sessions)
- Cassandra (lightweight transactions with lightweight consistency)
- COPS (research system)
- Some configurations of DynamoDB Streams
6. Eventual Consistency
Definition
Eventual consistency is the weakest useful consistency model:
If no new updates are made to a data item, all replicas will eventually converge to the same value.
No guarantee about:
- When convergence happens (could be milliseconds, seconds, or longer)
- What value is returned during the convergence window
- Whether reads within the convergence window are "fresh"
The Analogy
Think of DNS propagation. When you update a DNS record (e.g., point your domain to a new IP), the change propagates across DNS servers worldwide. During propagation (which can take minutes to hours), different users resolve the domain to different IPs. Eventually, all DNS servers have the updated record and all users get the same IP.
What "Eventually" Means in Practice
In most well-engineered systems, "eventually" is:
- Within the same data center: milliseconds to tens of milliseconds
- Across availability zones: tens to hundreds of milliseconds
- Across regions (global tables): hundreds of milliseconds to seconds
When Eventual Consistency Is Acceptable
- Product catalogs (price showing 10.00 for a second is fine)
- Social media timelines (post appears 1 second later is fine)
- Recommendation engines (stale recommendations are fine)
- Analytics dashboards (near-real-time is acceptable)
- Configuration flags with low urgency (feature flags propagating over seconds)
- User preferences that are not security-critical
When Eventual Consistency Is NOT Acceptable
- Financial balances (you cannot read 1000)
- Inventory counts (cannot sell items you don't have)
- Authentication tokens (must be valid or invalid immediately)
- Distributed locks (two clients cannot both believe they hold the lock)
- Unique constraint enforcement (two users cannot get the same username)
Conflict Resolution in Eventual Consistency
When replicas accept concurrent writes to the same key, conflicts arise. Resolution strategies:
| Strategy | How It Works | Risk |
|---|---|---|
| Last-Write-Wins (LWW) | Higher timestamp wins | Data loss if clocks are skewed |
| Multi-Value (siblings) | Return all conflicting versions | Application must resolve |
| Application-Level | Application merges conflicts | Most flexible, most work |
| CRDTs | Math-based auto-merge | Limited to specific data types |
7. Strong Eventual Consistency and CRDTs
Definition
Strong Eventual Consistency (SEC) adds one guarantee to eventual consistency:
All replicas that have received the same set of updates are in the same state, regardless of the order in which they received updates.
With SEC, there are no conflicts because the data structure is designed to commute -- the result is the same regardless of operation order.
Conflict-Free Replicated Data Types (CRDTs)
CRDTs are data structures mathematically designed to merge concurrent updates without conflict. They achieve SEC.
Types of CRDTs
G-Counter (Grow-only Counter)
Each node maintains its own counter. The global counter is the sum of all node counters. Increments are always safe -- no conflicts possible.
// Simplified G-Counter CRDT
public class GCounter {
private final String nodeId;
private final Map<String, Long> counts = new ConcurrentHashMap<>();
public GCounter(String nodeId) {
this.nodeId = nodeId;
this.counts.put(nodeId, 0L);
}
public void increment() {
counts.merge(nodeId, 1L, Long::sum);
}
public long value() {
return counts.values().stream().mapToLong(Long::longValue).sum();
}
// Merge another replica's state -- always safe, commutative, idempotent
public void merge(GCounter other) {
other.counts.forEach((node, count) ->
counts.merge(node, count, Math::max)
);
}
}PN-Counter (Positive-Negative Counter)
Two G-Counters: one for increments (P) and one for decrements (N). Value = P - N. Used for distributed counters that can go up and down.
public class PNCounter {
private final GCounter positive;
private final GCounter negative;
public PNCounter(String nodeId) {
this.positive = new GCounter(nodeId);
this.negative = new GCounter(nodeId);
}
public void increment() { positive.increment(); }
public void decrement() { negative.increment(); }
public long value() { return positive.value() - negative.value(); }
public void merge(PNCounter other) {
positive.merge(other.positive);
negative.merge(other.negative);
}
}LWW-Register (Last-Write-Wins Register)
Stores a single value with a timestamp. Merge always picks the value with the higher timestamp. Simple but risks data loss with clock skew.
OR-Set (Observed-Remove Set)
A set that handles concurrent add/remove without conflicts. Each add gets a unique tag. Remove removes specific tagged elements. If you add then remove and add concurrently, the element stays.
CRDTs in Production
| System | CRDT Usage |
|---|---|
| Redis | HyperLogLog (approximate counting CRDT) |
| Riak | Full CRDT support |
| Cassandra | Counters (PN-Counter like behavior) |
| SoundCloud | Conflict resolution in distributed systems |
| Akka Distributed Data | Full CRDT library |
8. Read-Your-Writes Consistency
Definition
After a client performs a write, any subsequent read by that same client will always see the written value. Other clients may or may not see it immediately.
This is a session-level guarantee, not a global one.
Why It Matters
Imagine a user changes their profile photo. They hit "save" and are immediately shown their profile. If the read goes to a stale replica, they see their old photo and think the save failed. They hit save again. Frustrating.
With read-your-writes:
- The user always sees their own changes immediately
- Other users might see the old photo for a second -- acceptable
- The user experience is consistent
Implementation Strategies
Strategy 1: Route Writes and Reads to Primary
Simplest solution: after a write, always read from the primary (not the replica).
@Service
public class UserService {
@Transactional // routes to primary (write datasource)
public void updateProfile(Long userId, ProfileUpdateRequest request) {
User user = userRepository.findById(userId).orElseThrow();
user.updateFrom(request);
userRepository.save(user);
}
@Transactional // also routes to primary for read-your-writes
public UserProfile getProfile(Long userId) {
return userRepository.findById(userId)
.map(UserProfile::from)
.orElseThrow();
}
}Strategy 2: Track Last Write Timestamp
After a write, store the timestamp in the user's session. For subsequent reads, if the replica's replication position is behind the timestamp, route to primary.
@Service
public class ProfileService {
@Autowired
private SessionContext sessionContext;
@Transactional
public void updateProfile(Long userId, ProfileUpdateRequest request) {
userRepository.save(/* ... */);
// Record that this session has a pending read-your-writes guarantee
sessionContext.setLastWriteTimestamp(Instant.now());
}
public UserProfile getProfile(Long userId) {
// If we wrote recently, read from primary
if (sessionContext.hasRecentWrite(Duration.ofSeconds(5))) {
return readFromPrimary(userId);
}
return readFromReplica(userId);
}
}Strategy 3: Sticky Sessions
Route all requests from the same user session to the same replica. That replica will have the user's writes. This is a simpler but less precise approach.
9. Monotonic Reads
Definition
If a client reads a value of X, any subsequent read of X by that same client will return the same value or a more recent value. The client will never read an older value than what it has already seen.
The Problem Without Monotonic Reads
t=1: Replica A has: x = 5 (latest)
t=2: Replica B has: x = 3 (lagging)
Client reads x from Replica A --> gets 5
Client reads x from Replica B --> gets 3 (GOES BACKWARDS!)
This is deeply confusing to users. Imagine a bank balance going from 500 (before deposit). Users would think the deposit was reversed.
Implementation
The common approach is sticky reads -- route a client's reads to the same replica consistently. Since replicas are monotonically applying the replication log, once the client has seen a value, they will never see an older one from the same replica.
In AWS Aurora, this is achieved by reading from a specific reader endpoint or by using the writer endpoint for critical reads.
10. Monotonic Writes
Definition
Writes from the same client are applied to all replicas in the order they were issued. If client writes W1 then W2, then W1 must be reflected on any replica before W2.
Why It Matters
Client writes:
W1: INSERT INTO orders (id, status) VALUES (100, 'PENDING');
W2: UPDATE orders SET status = 'APPROVED' WHERE id = 100;
Without monotonic writes:
W2 might arrive at Replica before W1
--> UPDATE on a non-existent row does nothing
--> Row is eventually inserted with status 'PENDING' (wrong!)
With monotonic writes:
W1 always arrives and applies before W2
MySQL Replication and Monotonic Writes
MySQL replication is inherently monotonic writes within a single thread (single client). The binary log is sequential. However, with multi-threaded replication (parallel apply), writes from the same client can be applied out of order if not configured carefully.
Configuration:
-- Ensure writes from the same client are applied in order
SET GLOBAL slave_preserve_commit_order = ON; -- MySQL 5.7+
-- In MySQL 8.0:
SET GLOBAL replica_preserve_commit_order = ON;11. Consistent Prefix Reads
Definition
If a sequence of writes happened in a specific order (W1, W2, W3), a reader will never see a state that violates the order -- they might see (W1), or (W1, W2), or (W1, W2, W3), but never (W2 without W1) or (W3 without W2).
The Analogy
Think of reading a comic strip. You might only see the first three panels, but you never see panel 5 before panel 4. The story always makes sense up to the point you've read.
Why It Matters
A user posts: "I'm going to order pizza"
A user posts: "Actually, I ordered sushi"
Consistent prefix read scenarios (valid):
Reader sees: Nothing yet
Reader sees: "I'm going to order pizza"
Reader sees: Both posts in order
Inconsistent prefix (violates this model):
Reader sees: "Actually, I ordered sushi" -- without the first post
(This is confusing without the context)
In MySQL
Consistent prefix reads are guaranteed within a single transaction using REPEATABLE READ isolation (MVCC snapshot). Within the same transaction, you always see a consistent snapshot -- never a partial write.
12. Bounded Staleness
Definition
Reads are guaranteed to be no more than K versions behind or T time units behind the most recent write. A weaker form of strong consistency with a defined staleness window.
Practical Value
Instead of "reads may be arbitrarily stale" (eventual consistency), bounded staleness says "reads are at most 5 seconds stale." This is much more predictable and allows for SLA-based guarantees.
AWS DynamoDB and Bounded Staleness
DynamoDB does not natively support bounded staleness, but you can approximate it:
// Read with explicit consistency level
GetItemRequest request = GetItemRequest.builder()
.tableName("Products")
.key(Map.of("productId", AttributeValue.builder().s(productId).build()))
.consistentRead(false) // eventually consistent (cheaper)
// For critical operations:
// .consistentRead(true) // strongly consistent (2x read cost)
.build();For Aurora MySQL read replicas, you can check replication lag before routing:
@Service
public class SmartReadRouter {
private static final Duration MAX_ACCEPTABLE_LAG = Duration.ofSeconds(5);
public Connection getReadConnection(boolean acceptStaleness) {
if (!acceptStaleness) {
return primaryConnection();
}
long replicaLagSeconds = getReplicaLag();
if (replicaLagSeconds <= MAX_ACCEPTABLE_LAG.getSeconds()) {
return replicaConnection();
}
// Replica is too stale, fallback to primary
return primaryConnection();
}
private long getReplicaLag() {
// Query replica: SHOW SLAVE STATUS
// or use CloudWatch metric: AuroraReplicaLag
return auroraMontioringService.getReplicaLagSeconds();
}
}13. Session Consistency
Definition
Session consistency is a bundle of four guarantees, all scoped to a single user session:
- Read-Your-Writes
- Monotonic Reads
- Monotonic Writes
- Consistent Prefix Reads
This is the practical sweet spot for most user-facing applications. It gives users a consistent experience (their own actions are always visible and sensible) without requiring global strong consistency.
Real-World Systems
- DynamoDB with client-side session token tracking
- MongoDB sessions (causal consistency mode)
- AWS ElastiCache with session-pinned reads
In Spring Boot with DynamoDB
// DynamoDB SDK v2 with client token for causal consistency
@Service
public class CartService {
@Autowired
private DynamoDbClient dynamoDbClient;
@Autowired
private SessionTokenStore sessionTokenStore;
public void addToCart(String sessionId, String productId, int quantity) {
// Write operation
PutItemRequest request = PutItemRequest.builder()
.tableName("Cart")
.item(buildCartItem(sessionId, productId, quantity))
.build();
PutItemResponse response = dynamoDbClient.putItem(request);
// Store the sequence token to enable read-your-writes
sessionTokenStore.updateToken(sessionId, response.consumedCapacity());
}
}14. Transaction Isolation Levels (MySQL Deep Dive)
MySQL InnoDB supports four isolation levels. Understanding them is critical for production Java applications.
READ UNCOMMITTED
What it allows: A transaction can read uncommitted (dirty) data from other transactions.
Anomalies possible: Dirty reads, non-repeatable reads, phantom reads
Real risk: You read data from a transaction that is later rolled back. You made a decision based on data that never officially existed.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;In Production: Almost never use this. The only valid use case is approximate counts on very large tables where locking overhead is unacceptable and the approximate result is fine.
READ COMMITTED
What it allows: Only reads committed data. Each statement within a transaction sees the latest committed snapshot at the time of that statement.
Anomalies possible: Non-repeatable reads, phantom reads
Prevents: Dirty reads
Non-repeatable read example:
T1: SELECT balance FROM accounts WHERE id=1; --> 1000
T2: UPDATE accounts SET balance=500 WHERE id=1; COMMIT;
T1: SELECT balance FROM accounts WHERE id=1; --> 500 (changed!)
T1 read the same row twice and got different values. Non-repeatable.
@Transactional(isolation = Isolation.READ_COMMITTED)
public void processOrder(Long orderId) {
// Each SELECT sees the latest committed state
// Two reads of the same row may return different values
}Good for: Most OLTP operations where you want to see the latest committed data but don't need strict consistency within a transaction. Used in many financial systems for non-critical reads.
REPEATABLE READ (MySQL Default)
What it allows: Within a single transaction, every read of the same data returns the same value (the snapshot at transaction start).
Anomalies possible: Phantom reads (but InnoDB uses gap locks to prevent most)
Prevents: Dirty reads, non-repeatable reads
How it works: InnoDB uses MVCC. At transaction start, a consistent read view (snapshot) is created. All reads use this snapshot. Writes by other transactions are invisible.
@Transactional(isolation = Isolation.REPEATABLE_READ) // This is the DEFAULT
public void auditTransaction(Long accountId) {
// All reads within this transaction see the same snapshot
BigDecimal balance1 = accountRepository.findBalance(accountId);
// ... some processing ...
BigDecimal balance2 = accountRepository.findBalance(accountId);
// balance1 == balance2 guaranteed (even if another transaction updated it)
}Write Skew under REPEATABLE READ:
T1: SELECT * FROM doctors WHERE on_call = true; --> 5 doctors
T1: If count >= 2: UPDATE doctors SET on_call=false WHERE id=1;
T2: SELECT * FROM doctors WHERE on_call = true; --> 5 doctors (same snapshot)
T2: If count >= 2: UPDATE doctors SET on_call=false WHERE id=2;
Result: Both doctors go off call. Only 3 doctors remain on call.
The constraint "at least 2 doctors on call" is violated.
This is write skew -- a concurrency anomaly that REPEATABLE READ does not prevent.
SERIALIZABLE
What it allows: Full isolation. Transactions are serialized. Uses range locks (gap locks + next-key locks) to prevent all anomalies.
Anomalies possible: None
Prevents: All anomalies including write skew and phantom reads
@Transactional(isolation = Isolation.SERIALIZABLE)
public void criticalScheduleUpdate(Long departmentId) {
// Full serialization -- no concurrent anomalies possible
// Also the slowest -- use only when necessary
List<Doctor> onCall = doctorRepository.findOnCall(departmentId);
if (onCall.size() >= 2) {
onCall.get(0).setOnCall(false);
doctorRepository.save(onCall.get(0));
}
}Performance impact of SERIALIZABLE:
- Acquires shared range locks on every read
- Can cause deadlocks more frequently
- Significantly reduces concurrency
- Use only for critical, low-volume operations
Isolation Level Summary
Level | Dirty | Non-Repeatable | Phantom | Write Skew
--------------------+-------+----------------+---------+-----------
READ UNCOMMITTED | Yes | Yes | Yes | Yes
READ COMMITTED | No | Yes | Yes | Yes
REPEATABLE READ | No | No | No* | Yes
SERIALIZABLE | No | No | No | No
* InnoDB prevents most phantom reads in REPEATABLE READ using gap locks
Setting Isolation Level in application.yml
spring:
datasource:
url: jdbc:mysql://your-rds-endpoint:3306/yourdb?useSSL=true&serverTimezone=UTC
username: ${DB_USERNAME}
password: ${DB_PASSWORD}
jpa:
properties:
hibernate:
connection:
isolation: 2 # READ_COMMITTED = 2, REPEATABLE_READ = 4, SERIALIZABLE = 815. Multi-Version Concurrency Control (MVCC)
What Is MVCC?
MVCC is a concurrency control mechanism that allows readers and writers to coexist without blocking each other. Instead of locking data for reads, the database keeps multiple versions of the same data. Readers see a consistent snapshot; writers create new versions.
Core Principle: "Readers don't block writers. Writers don't block readers."
How MySQL InnoDB Implements MVCC
Every row in InnoDB has two hidden columns:
DB_TRX_ID: The transaction ID of the last transaction that modified this rowDB_ROLL_PTR: Pointer to the undo log for the previous version
When a transaction starts, InnoDB records the current transaction ID as the "read view." The read view determines which row versions are visible.
Versions of row (id=1, balance):
Version 3: balance=700 | trx_id=500 | roll_ptr --> Version 2
Version 2: balance=1000| trx_id=400 | roll_ptr --> Version 1
Version 1: balance=800 | trx_id=300 | roll_ptr --> null
Transaction T1 (started at trx_id=450):
- Can see Version 2 (trx_id=400 < 450) -- VISIBLE
- Cannot see Version 3 (trx_id=500 > 450) -- TOO NEW
- T1 reads balance = 1000
MVCC in Practice (Spring Boot)
@Service
public class ReportService {
// This long-running transaction sees a consistent snapshot throughout
// Even if other transactions modify data while this runs, the snapshot is unchanged
@Transactional(readOnly = true)
public FinancialReport generateReport(LocalDate date) {
// All reads in this method see the SAME snapshot
// No blocking of writes by other transactions
List<Transaction> transactions = transactionRepository.findByDate(date);
BigDecimal total = transactions.stream()
.map(Transaction::getAmount)
.reduce(BigDecimal.ZERO, BigDecimal::add);
return new FinancialReport(date, transactions, total);
}
}MVCC Benefits and Costs
Benefits:
- High concurrency: readers and writers don't block each other
- Consistent snapshots for long-running queries/transactions
- Efficient for read-heavy workloads
Costs:
- Storage overhead: multiple versions of rows stored in undo logs
- Cleanup overhead: old versions must be purged (VACUUM in PostgreSQL, InnoDB purge thread in MySQL)
- Long-running transactions prevent cleanup of old versions (undo log bloat)
Production Warning: Long-running transactions in MySQL hold open their read view, preventing InnoDB from purging old row versions. This causes undo log bloat. A single transaction open for hours can cause the undo tablespace to grow to gigabytes. Always set transaction timeouts.
# In application.yml
spring:
transaction:
default-timeout: 30 # seconds -- fail transactions running longer than 30s16. Quorum-Based Consistency
The Concept
In a leaderless distributed system with N replicas, you can guarantee consistency by requiring:
- W replicas to acknowledge a write before it is considered successful
- R replicas to be read before returning a value
- When W + R > N, every read is guaranteed to overlap with at least one node that has the latest write
The Formula
W + R > N ==> Guaranteed to read latest write (Strong Consistency)
W + R <= N ==> Potentially stale reads (Eventual Consistency)
Example with N=3
| R | W | W+R | Consistency | Notes |
|---|---|---|---|---|
| 3 | 3 | 6 > 3 | Strong | All nodes agree -- very slow |
| 2 | 2 | 4 > 3 | Strong | Most common production setting |
| 1 | 3 | 4 > 3 | Strong | All writes confirmed, fast reads |
| 3 | 1 | 4 > 3 | Strong | Fast writes, slow reads |
| 1 | 1 | 2 < 3 | Eventual | Fastest, may return stale |
| 2 | 1 | 3 = 3 | Borderline | Not safe -- must be > N |
DynamoDB Quorum (Simplified)
DynamoDB uses a quorum-based approach internally. When you request a strongly consistent read, DynamoDB reads from a majority of replicas (quorum) and returns the value agreed upon by the majority.
// Strongly consistent read (reads from quorum)
GetItemRequest strongRead = GetItemRequest.builder()
.tableName("Inventory")
.key(Map.of("productId", AttributeValue.fromS(productId)))
.consistentRead(true) // quorum read
.build();
// Eventually consistent read (reads from one replica, possibly stale)
GetItemRequest eventualRead = GetItemRequest.builder()
.tableName("Inventory")
.key(Map.of("productId", AttributeValue.fromS(productId)))
.consistentRead(false) // single replica read -- costs 50% less
.build();17. Vector Clocks and Causal Ordering
The Problem with Physical Timestamps
Physical clocks across servers are never perfectly synchronized. Two events with the same millisecond timestamp could have happened in either order, or concurrently. You cannot reliably use physical time to order distributed events.
Lamport Timestamps (Logical Clocks)
A simple logical clock:
- Each node maintains a counter
- Increment counter before each event
- When sending a message, include the counter
- When receiving a message, set counter to max(local, received) + 1
Guarantee: If A happens before B (causally), then timestamp(A) < timestamp(B).
Limitation: If timestamp(A) < timestamp(B), you CANNOT conclude A happened before B. They might be concurrent.
Vector Clocks
Vector clocks solve the limitation. Each node maintains a vector of counters, one for each node in the system.
3-node system: [NodeA, NodeB, NodeC]
Initial state: [0, 0, 0]
NodeA does event: [1, 0, 0]
NodeA sends to B: B receives [1, 0, 0]
NodeB increments: [1, 1, 0]
NodeC does event: [0, 0, 1] (concurrent with A and B's events)
Comparison rules:
- [1,2,3] happens-before [2,2,3] if all components of first <= second AND at least one is strictly less
- [1,2,3] and [2,1,3] are concurrent (neither dominates)
Java Implementation:
public class VectorClock {
private final String nodeId;
private final Map<String, Long> clock;
public VectorClock(String nodeId, List<String> allNodes) {
this.nodeId = nodeId;
this.clock = new HashMap<>();
allNodes.forEach(node -> clock.put(node, 0L));
}
// Increment this node's counter before an event
public void tick() {
clock.merge(nodeId, 1L, Long::sum);
}
// Merge received vector clock (when receiving a message)
public void merge(Map<String, Long> received) {
received.forEach((node, time) ->
clock.merge(node, time, Math::max)
);
tick(); // also increment own counter
}
// Check if this clock happens-before other
public boolean happensBefore(VectorClock other) {
// All entries in this clock must be <= other
// At least one entry must be strictly less
boolean allLeq = clock.entrySet().stream()
.allMatch(e -> e.getValue() <= other.clock.getOrDefault(e.getKey(), 0L));
boolean oneStrict = clock.entrySet().stream()
.anyMatch(e -> e.getValue() < other.clock.getOrDefault(e.getKey(), 0L));
return allLeq && oneStrict;
}
// Check if concurrent (neither happens-before the other)
public boolean isConcurrentWith(VectorClock other) {
return !this.happensBefore(other) && !other.happensBefore(this);
}
public Map<String, Long> getClock() {
return Collections.unmodifiableMap(clock);
}
}18. Conflict Resolution Strategies
When concurrent writes happen to the same data (multi-leader or leaderless replication), conflicts must be resolved.
Last-Write-Wins (LWW)
The write with the highest timestamp wins. All other concurrent writes are discarded.
Pro: Simple, automatic
Con: Data loss -- valid writes can be silently discarded. Clock skew can cause incorrect ordering.
Used by: Cassandra (default), DynamoDB (within a single partition, LWW with server-side timestamps)
// DynamoDB conditional write to implement optimistic LWW safely
Map<String, AttributeValue> expressionValues = Map.of(
":currentVersion", AttributeValue.fromN(String.valueOf(currentVersion)),
":newVersion", AttributeValue.fromN(String.valueOf(currentVersion + 1))
);
UpdateItemRequest request = UpdateItemRequest.builder()
.tableName("Products")
.key(Map.of("productId", AttributeValue.fromS(productId)))
.updateExpression("SET price = :price, version = :newVersion")
.conditionExpression("version = :currentVersion") // optimistic locking
.expressionAttributeValues(expressionValues)
.build();Merge Functions
Application-specific merge logic. Given two conflicting versions, a merge function produces the correct result.
Example: Shopping cart -- if a user adds item A on device 1 and item B on device 2 concurrently, the merge result should contain both A and B (union semantics).
public ShoppingCart merge(ShoppingCart cart1, ShoppingCart cart2) {
// Union: take all items from both carts
Map<String, CartItem> merged = new HashMap<>(cart1.getItems());
cart2.getItems().forEach((productId, item) ->
merged.merge(productId, item, (a, b) ->
a.getQuantity() > b.getQuantity() ? a : b // take higher quantity
)
);
return new ShoppingCart(merged);
}CRDTs (See Section 7)
Mathematical data structures where all merges are conflict-free. The preferred solution for systems that need SEC.
19. Consistency Model Comparison Matrix
| Model | Ordering | Scope | Real-Time | Cost | Use Case |
|---|---|---|---|---|---|
| Linearizability | Global | All clients | Yes | Very High | Locks, leader election, unique IDs |
| Strict Serializability | Global + Transactional | All clients | Yes | Very High | Banking, financial systems |
| Serializability | Transactional | All clients | No | High | Critical business transactions |
| Sequential | Global | All clients | No | High | Shared memory systems |
| Causal | Causal dependencies | All clients | No | Medium | Chat, social, collaborative editing |
| Eventual | None | All clients | No | Low | Catalog, analytics, social feeds |
| Strong Eventual (SEC) | Merge-based | All clients | No | Low-Medium | Collaborative docs, distributed counters |
| Read-Your-Writes | Client's own writes | Single client | Yes (self) | Low | User profiles, session data |
| Monotonic Reads | Monotonically forward | Single client | No | Low | Any user-facing read |
| Session | Multiple guarantees | Single session | Partial | Low-Medium | Most user-facing apps |
| Bounded Staleness | Within time window | All clients | Approximate | Medium | Inventory, near-real-time dashboards |
Next: Part 3: Java and Spring Boot Implementation -- Production-ready code patterns for every consistency challenge.
Part of the Consistency Models Demystified series
Stack: Java 17, Spring Boot 3.x, MySQL 8.0, AWS