← Back to Articles
6/6/2026Admin Post

consistency models part1 fundamentals

Consistency Models - Part 1: Fundamentals

Navigation: Index | Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6


Table of Contents

  1. What Is Consistency?
  2. Why Single-Node Systems Are Easy
  3. Why Distributed Systems Are Hard
  4. The 8 Fallacies of Distributed Computing
  5. ACID Properties Deep Dive
  6. BASE Properties Deep Dive
  7. ACID vs BASE - The Full Comparison
  8. CAP Theorem - The Deep Dive
  9. PACELC Theorem - Beyond CAP
  10. Consistency vs Isolation - Critical Distinction
  11. Replication and Why It Creates Consistency Challenges
  12. The Consistency Spectrum Overview
  13. Key Takeaways

1. What Is Consistency?

Informal Definition

Consistency is the guarantee that when you read data, you see a correct, agreed-upon, and up-to-date view of the world.

More Precisely

In the context of distributed systems, consistency is about the agreement between multiple copies (replicas) of data. If you update data on one server, consistency models define when and how other servers reflect that update.

Why Multiple Definitions Exist

The word "consistency" is overloaded in computer science:

ContextWhat "Consistency" Means
ACID (databases)Data integrity constraints are maintained after every transaction
CAP TheoremEvery read reflects the most recent write (no stale data)
Consistency ModelsA contract defining what a system guarantees about the order and visibility of operations
Eventual ConsistencyGiven no new updates, replicas will eventually converge

Critical Point: The "C" in ACID and the "C" in CAP refer to completely different things. This confusion trips up engineers in interviews constantly. See Section 10 for a detailed explanation.


2. Why Single-Node Systems Are Easy

Consider a single-server MySQL database with no replicas:

[Client] --> [MySQL Single Node] --> [Disk]

On a single node:

  • There is exactly one copy of the data
  • All reads and writes go to the same place
  • The database engine serializes concurrent access with locks
  • After a COMMIT, the data is immediately visible to all subsequent reads
  • Consistency is guaranteed by the database engine itself

This is the world most developers assume they live in. It is clean, simple, and largely correct. But it does not scale.

The moment you add a replica, a cache, a second service, or a second data center, the assumptions of the single-node world shatter.


3. Why Distributed Systems Are Hard

Problem 1: Network Partitions

A network partition occurs when nodes in a distributed system cannot communicate with each other. This is not theoretical -- it happens in production due to:

  • Router failures
  • Misconfigured firewall rules
  • Data center outages
  • Cable cuts between availability zones
  • Software bugs in network stack
  • Overloaded network interfaces dropping packets

When a partition occurs, nodes that cannot communicate must decide: do I stop accepting requests (stay consistent) or continue with potentially stale data (stay available)?

Problem 2: Partial Failures

In a distributed system, components can fail partially. A message might be delivered once, twice, or never. A write might succeed on 2 out of 3 replicas. A node might crash mid-transaction after writing to disk but before sending acknowledgment. These partial states are impossible in a single-node system.

Problem 3: Latency and Asynchrony

Distributed systems communicate over networks with non-zero, variable latency. The speed of light is a real constraint:

  • Within an AWS Availability Zone: ~0.5ms
  • Between Availability Zones in the same region: ~1-2ms
  • Between AWS regions (US to Europe): ~80-150ms

This means that if you want every read to see the latest write, you must synchronously wait for all replicas to confirm the write. This introduces latency proportional to the number of replicas and their distance.

Problem 4: Concurrent Operations

Multiple clients reading and writing the same data simultaneously creates race conditions. Without careful coordination:

  • Two users might both read a quantity of 1, both decide to purchase it, and both successfully decrement -- resulting in quantity of -1
  • A write might overwrite another write that happened slightly earlier
  • A read might see data halfway through a multi-step update

Problem 5: Clock Skew

Each machine has its own physical clock. Even with NTP synchronization, clocks across servers differ by milliseconds. You cannot reliably use timestamps to determine the order of events in a distributed system. Two events with the same millisecond timestamp could have happened in either order.

Problem 6: The Observability Gap

In a distributed system, you cannot observe the global state at a single point in time. By the time you observe all nodes, some of them have already changed. This makes reasoning about "current state" fundamentally difficult.


4. The 8 Fallacies of Distributed Computing

Peter Deutsch at Sun Microsystems identified 8 assumptions that distributed system newcomers incorrectly make:

FallacyWhat You AssumeThe Reality
1. The network is reliableMessages always arrivePackets drop, connections timeout
2. Latency is zeroCommunication is instantNetwork latency is real and variable
3. Bandwidth is infiniteYou can send unlimited dataBandwidth is finite and shared
4. The network is secureYour messages are safeAssume breach; encrypt in transit
5. Topology never changesNetwork layout is staticIPs change, nodes scale up/down
6. There is one administratorOne team manages everythingMulti-team ownership, conflicting configs
7. Transport cost is zeroSerialization is freeJSON/Protobuf has CPU and network cost
8. The network is homogeneousAll parts use same protocolsMixed versions, protocols, operating systems

Every fallacy above has direct implications for consistency. For example:

  • Fallacy 1 (reliable network) means your distributed lock might not be released if the network drops during release
  • Fallacy 2 (zero latency) is why synchronous replication is expensive
  • Fallacy 6 (one admin) is why distributed systems have conflicting writes from multiple leaders

5. ACID Properties Deep Dive

ACID is a set of properties that guarantee reliable processing of database transactions. MySQL with InnoDB engine is ACID-compliant.

A - Atomicity

Definition: A transaction is all-or-nothing. Either all operations in the transaction succeed and are committed, or none of them are.

In MySQL: Implemented via the Write-Ahead Log (undo log). If a transaction is rolled back (or the server crashes mid-transaction), the undo log is used to reverse any partial changes.

Example:

BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;  -- debit
UPDATE accounts SET balance = balance + 500 WHERE id = 2;  -- credit
COMMIT;  -- both happen, or neither

If the server crashes between the two UPDATEs, MySQL rolls back the debit. The account never ends up in a state where $500 was debited but not credited.

Java / Spring Boot:

@Service
public class TransferService {
 
    @Transactional  // atomicity guaranteed
    public void transfer(Long fromId, Long toId, BigDecimal amount) {
        Account from = accountRepository.findById(fromId).orElseThrow();
        Account to = accountRepository.findById(toId).orElseThrow();
        from.debit(amount);   // both happen atomically
        to.credit(amount);    // or neither happens
    }
}

C - Consistency (ACID Context)

Definition: A transaction brings the database from one valid state to another valid state. All data integrity constraints (foreign keys, unique constraints, check constraints) must be satisfied before and after the transaction.

IMPORTANT: This is completely different from the "C" in CAP theorem. ACID Consistency is about data integrity constraints. CAP Consistency is about replicas having the same data.

Example: If you have a constraint that balance >= 0, no transaction should leave the balance negative. The database enforces this at commit time.

ALTER TABLE accounts ADD CONSTRAINT chk_balance CHECK (balance >= 0);

I - Isolation

Definition: Concurrent transactions execute as if they were serial (one at a time). Intermediate states of a transaction are not visible to other transactions.

MySQL Isolation Levels (from weakest to strongest):

LevelDirty ReadNon-Repeatable ReadPhantom Read
READ UNCOMMITTEDPossiblePossiblePossible
READ COMMITTEDPreventedPossiblePossible
REPEATABLE READ (default)PreventedPreventedPossible*
SERIALIZABLEPreventedPreventedPrevented

*MySQL InnoDB uses gap locks to prevent phantom reads even in REPEATABLE READ.

Setting isolation level in Spring Boot:

@Transactional(isolation = Isolation.SERIALIZABLE)
public void criticalOperation() {
    // Full isolation - no concurrent anomalies possible
}
 
@Transactional(isolation = Isolation.READ_COMMITTED)
public void standardOperation() {
    // Good default for most non-financial operations
}

D - Durability

Definition: Once a transaction is committed, it remains committed even in the case of system failure. The data is persisted to non-volatile storage.

In MySQL InnoDB: Durability is achieved through:

  • innodb_flush_log_at_trx_commit = 1 -- flushes redo log to disk on every commit (safest)
  • sync_binlog = 1 -- syncs binary log to disk on every commit
  • Double-write buffer -- prevents partial page writes during a crash

AWS RDS Configuration for Durability:

# These are the production defaults in AWS RDS for MySQL
innodb_flush_log_at_trx_commit = 1  (cannot change in RDS)
sync_binlog = 1                      (cannot change in RDS)

Production Tip: AWS RDS sets innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1 by default and does not allow changing them. This is intentional -- it maximizes durability. If you see these set to 2 or 0 anywhere, it is a durability risk.


6. BASE Properties Deep Dive

BASE emerged as the alternative philosophy for distributed systems where ACID guarantees are too expensive to maintain at scale.

BA - Basically Available

The system guarantees availability (not 100%, but "basic" availability). During partial failures, the system continues to respond, possibly with degraded or stale data instead of errors.

Example: Amazon's shopping cart continues to function (you can add items) even if the inventory service is temporarily unreachable. You might add an item that turns out to be out of stock -- the system resolves this later (during checkout).

S - Soft State

The state of the system may change over time, even without any new input. Data is not guaranteed to be consistent at all times. Replicas might diverge temporarily.

Example: A DynamoDB table has 3 replicas. After a write to one replica, the other two are in a "soft state" -- they are being updated asynchronously. During this window, they may have older data.

E - Eventually Consistent

Given no new updates, all replicas will eventually converge to the same value. The system guarantees convergence, but not when.

Example: DNS is the canonical example. When you update a DNS record, the change propagates across DNS servers globally over minutes to hours. During propagation, different DNS servers return different values. Eventually, all return the new value.


7. ACID vs BASE - The Full Comparison

DimensionACIDBASE
Consistency GuaranteeStrong, immediateEventual
AvailabilityMay sacrifice availability for consistencyPrioritizes availability
PerformanceLower throughput due to coordinationHigher throughput
ScalabilityHarder to scale horizontallyDesigned for horizontal scale
ComplexityManaged by database engineApplication must handle conflicts
Use CasesFinancial, medical, legal dataSocial media, catalog, analytics
ExamplesMySQL, PostgreSQL, OracleCassandra, DynamoDB (eventual), Redis

The Spectrum in Practice

Real-world systems rarely pick pure ACID or pure BASE. They use different models for different data:

E-Commerce Platform:
  - Orders table:        ACID (MySQL) - money involved
  - Product catalog:     BASE (DynamoDB) - eventual is fine
  - User sessions:       BASE (Redis) - speed matters
  - Inventory counts:    ACID with atomic operations - prevent overselling
  - Product reviews:     BASE - seconds of lag is fine
  - Payment records:     ACID (MySQL SERIALIZABLE) - strictest

8. CAP Theorem - The Deep Dive

What CAP Actually Says

Eric Brewer's CAP theorem (proved formally by Gilbert and Lynch in 2002):

In any distributed data system, it is impossible to simultaneously guarantee all three of:

  • Consistency: Every read receives the most recent write or an error
  • Availability: Every request receives a non-error response
  • Partition Tolerance: The system continues operating despite network partitions

The Key Insight Everyone Gets Wrong

"P is not optional." Network partitions will happen in any real distributed system. You cannot design them away. Therefore:

The real choice is not "pick 2 of 3." The real choice is:

When a network partition occurs, what do you sacrifice: Consistency or Availability?

  • CP systems choose Consistency: stop serving requests (or return errors) during a partition to avoid returning stale/inconsistent data. ZooKeeper, etcd.
  • AP systems choose Availability: keep serving requests during a partition, potentially returning stale data. Cassandra, CouchDB, DynamoDB (default).

A Concrete Example

Imagine you have a distributed key-value store with two nodes, A and B, and a client:

              +--------+
              | Client |
              +--------+
              /         \
         writes          reads
            /               \
     +------+               +------+
     | Node A|               | Node B|
     +------+               +------+
         |                       |
         +--- Network Partition --+
             (nodes cannot talk)

Scenario: Client writes x = 5 to Node A. Network partition occurs. Client reads x from Node B.

  • CP choice: Node B detects the partition. It refuses to answer (returns error). Consistency preserved. Client frustrated.
  • AP choice: Node B answers with the old value of x. Client gets stale data. System keeps running.

Where Common Systems Fall

SystemChoiceReasoning
MySQL RDS (single node)Not applicableNo replication = no partition
MySQL with synchronous replicasCPWaits for replica acknowledgment
MySQL Aurora (writes)CPSingle writer enforces consistency
MySQL Aurora (reads from replicas)APReplicas may lag
DynamoDB (eventual reads)APReturns possibly stale data
DynamoDB (strong reads)CPWaits for quorum
Cassandra (quorum reads/writes)CPWith tuning
Cassandra (default)APPrioritizes availability
ZooKeeperCPStops serving if cannot reach majority
Redis (single master)CP (sort of)Single node = no partition
Redis ClusterAP (default)Can accept writes to minority partitions

CAP Misconceptions to Avoid

Misconception 1: "CA systems exist (without P)."

  • Reality: In a true distributed system (multiple nodes), partition tolerance is a physical reality. A "CA" system is just a single node -- which is not distributed.

Misconception 2: "CAP is a permanent choice."

  • Reality: Systems can be tuned per operation. DynamoDB lets you choose per-request. You can do strongly consistent reads for critical data and eventually consistent for others.

Misconception 3: "Consistency in CAP = Consistency in ACID."

  • Reality: They are completely different. See Section 10.

Misconception 4: "CAP means you can only have 2 properties."

  • Reality: You always have P (in a real distributed system). You always have A and C during normal operation (no partition). CAP only matters during a partition.

9. PACELC Theorem - Beyond CAP

Why CAP Is Not Enough

CAP only addresses what happens during a network partition. But network partitions are rare. What about the 99.9% of the time when everything is healthy?

Daniel Abadi (2012) extended CAP with PACELC:

If Partition (P):
    choose between Availability (A) and Consistency (C)
Else (E, normal operation):
    choose between Latency (L) and Consistency (C)

The Latency-Consistency Trade-Off

In normal operation (no partition), synchronous replication means:

  • Write must be acknowledged by all (or quorum) replicas before returning success
  • Every write has higher latency (proportional to replica count and network distance)
  • But reads are always consistent

Asynchronous replication means:

  • Write returns as soon as the primary writes
  • Replication happens in the background
  • Lower write latency
  • But reads from replicas may be stale

PACELC Classification of Common Systems

SystemDuring PartitionDuring Normal OperationClassification
MySQL RDS Multi-AZ (sync)CPEC (higher latency for consistency)PC/EC
Aurora (writer)CPECPC/EC
Aurora (reader endpoints)APELPA/EL
DynamoDB (strong reads)CPECPC/EC
DynamoDB (eventual reads)APELPA/EL
CassandraAPELPA/EL
ZooKeeperCPECPC/EC
MongoDB (primary reads)CPECPC/EC

Key Insight for Architects: Most systems that are AP during partitions are also EL during normal operation (they consistently prefer availability/low-latency over consistency). Most CP systems are also EC. The choice is usually consistent across both scenarios.


10. Consistency vs Isolation - Critical Distinction

This is one of the most commonly confused topics in interviews and system design discussions. Let us be absolutely clear.

ACID Consistency (The "C" in ACID)

What it is: Data integrity constraints are satisfied before and after every transaction. The database never enters a state that violates its defined rules.

Who enforces it: The database engine, based on constraints you define.

Examples:

  • UNIQUE constraint: No two rows can have the same email address
  • FOREIGN KEY constraint: An order must reference a valid customer
  • CHECK constraint: Balance cannot be negative
  • Application-level constraints enforced within a transaction

This is about data validity, not about replicas.

CAP Consistency (The "C" in CAP)

What it is: In a distributed system, every read returns the most recent write across all replicas, or an error. No stale reads.

Who enforces it: The distributed system's replication and coordination protocol.

This is about replica agreement, not data validity.

Isolation (The "I" in ACID)

What it is: Concurrent transactions are isolated from each other. Each transaction sees a consistent snapshot of the data, as if it were the only transaction running.

This is about concurrency anomalies within a single database, not about replicas.

Side-by-Side Comparison

PropertyQuestion It AnswersScopeEnforced By
ACID ConsistencyIs the data valid according to defined rules?Single node, data integrityDB constraints
CAP ConsistencyDo all replicas agree on the current value?Multiple nodes, replica stateReplication protocol
IsolationAre concurrent transactions separated?Single node, concurrencyLocking / MVCC

Why This Confusion Matters in Production

A common mistake: "We use MySQL, which is ACID, so we have consistency."

This is true for ACID Consistency (data constraints are enforced). But if you have read replicas with asynchronous replication, you do not have CAP Consistency -- your replicas may lag and return stale data.

// This might return stale data if routed to a replica
@Transactional(readOnly = true)
public UserProfile getProfile(Long userId) {
    // ACID Consistency: profile satisfies all constraints (valid)
    // CAP Consistency: may be seconds behind the primary (stale)
    return userRepository.findById(userId).orElseThrow();
}

11. Replication and Why It Creates Consistency Challenges

Why Replication Exists

Replication (having multiple copies of data) serves three goals:

  1. High Availability: If one node fails, others take over
  2. Read Scaling: Multiple replicas serve read requests, reducing load on primary
  3. Disaster Recovery: Replicas in different regions survive regional failures

Types of Replication

Synchronous Replication

Client
  |
  | Write request
  v
Primary -----(replicate)-----> Replica 1
  |                              |
  |<-------(acknowledge)---------|
  |
  | Respond to client (after replica confirms)
  • Consistency: Strong -- data is on all nodes before success
  • Latency: Higher -- must wait for replica
  • Availability: Lower -- if replica is slow or down, writes stall
  • Used by: Aurora (synchronously replicates across AZs), Synchronous MySQL replicas

Asynchronous Replication

Client
  |
  | Write request
  v
Primary -----> Respond to client (immediately)
  |
  | (background replication, may lag 10ms to 10 seconds)
  v
Replica 1, Replica 2, ...
  • Consistency: Eventual -- replicas may lag
  • Latency: Lower -- no wait for replicas
  • Availability: Higher -- replica failure does not block writes
  • Used by: MySQL RDS read replicas (default), Aurora read replicas, DynamoDB global tables

Leaderless Replication (Quorum-Based)

No designated primary. Client writes to multiple nodes and reads from multiple nodes. Uses quorum (majority) to determine consistency.

Client writes to 3 of 5 nodes (W = 3)
Client reads from 3 of 5 nodes (R = 3)
W + R > N (3 + 3 > 5) -- guaranteed to overlap with the write
  • Used by: Cassandra, DynamoDB (internally), Riak

The Replication Lag Problem

With asynchronous replication, the replica is always behind the primary. This lag creates consistency windows where:

t=0: Primary has: balance = 1000
t=1: Client writes balance = 500 to primary
t=2: Client reads from replica --> gets 1000  (STALE!)
t=3: Replication completes
t=4: Client reads from replica --> gets 500  (CURRENT)

The window between t=1 and t=3 is the replication lag window. During this window, reads from the replica are stale.

Typical replication lag in AWS Aurora: 10-100 milliseconds
In RDS MySQL read replicas under heavy load: Can be seconds or even minutes

MySQL Binary Log (Binlog) and Replication

MySQL replication works via the binary log:

  1. Primary records all changes to the binlog
  2. Replica's I/O thread reads the binlog from the primary
  3. Replica's SQL thread applies the changes
Primary:
  [Transaction] --> [InnoDB Commit] --> [Binlog Write]
                                              |
                                    (async transfer)
                                              v
Replica:
                    [Relay Log] --> [SQL Thread Apply]

The time between "Binlog Write" on primary and "SQL Thread Apply" on replica is the replication lag.


12. The Consistency Spectrum Overview

Consistency models form a spectrum from strongest to weakest. Each model is a trade-off:

STRONGER CONSISTENCY                          WEAKER CONSISTENCY
     |                                                |
     |  Linearizability (Atomic Consistency)          |
     |    "Real-time globally ordered"                |
     |                                                |
     |  Sequential Consistency                        |
     |    "Same order, not real-time"                 |
     |                                                |
     |  Causal Consistency                            |
     |    "Causally related ops in order"             |
     |                                                |
     |  FIFO Consistency (Monotonic Writes)           |
     |    "Per-client ordering preserved"             |
     |                                                |
     |  Eventual Consistency                          |
     |    "Will converge, no time guarantee"          |
     |                                                |
     v                                                v
STRONGER CONSISTENCY                          WEAKER CONSISTENCY

Session-level models (often layered on top):

  • Read-Your-Writes
  • Monotonic Reads
  • Consistent Prefix Reads
  • Bounded Staleness

Deep dive into each model is in Part 2.


13. Key Takeaways

  1. Consistency is not binary. It exists on a spectrum from linearizability (strongest) to eventual consistency (weakest).

  2. ACID Consistency != CAP Consistency != Isolation. These are three different concepts. Mixing them up is a serious mistake.

  3. CAP theorem means P is mandatory. The real choice is Consistency vs Availability during a partition.

  4. PACELC extends CAP. Even without partitions, you trade consistency for lower latency.

  5. Replication is the root cause. Consistency challenges arise because replication is asynchronous in most real systems.

  6. There is no free lunch. Stronger consistency = more coordination = higher latency = lower availability.

  7. Domain matters. Financial data needs strong consistency. Social media feeds work fine with eventual consistency.

  8. Most production systems use a mix. Different services, tables, and operations within the same application use different consistency models.


Next: Part 2: Consistency Models Deep Dive -- Every model explained with analogies, trade-offs, and real system examples.


Part of the Consistency Models Demystified series
Stack: Java 17, Spring Boot 3.x, MySQL 8.0, AWS