Consistency Models - Part 4: AWS Production Configurations

Amazon RDS for MySQL - Production Setup
Amazon Aurora MySQL - Architecture and Configuration
Aurora Read Replicas - Consistency Management
Aurora Global Database - Multi-Region Consistency
Amazon DynamoDB - Consistency Models
DynamoDB Transactions and Conditional Writes
Amazon ElastiCache Redis - Cache Consistency
Amazon SQS - Message Delivery Guarantees
Amazon S3 - Consistency Model
Multi-Region Architecture Patterns
AWS CDK Infrastructure as Code Examples
Monitoring Consistency on AWS

1. Amazon RDS for MySQL - Production Setup

Parameter Group for Consistency

Amazon RDS for MySQL uses parameter groups to configure engine behavior. The following parameters matter most for consistency and durability:

# RDS Parameter Group: production-mysql8-params
# These settings maximize durability and consistency
 
# Transaction logging -- CRITICAL for durability
innodb_flush_log_at_trx_commit = 1
# 0: Write and flush once per second (risk: up to 1s of committed transactions lost on a crash)
# 1: Write and flush on every commit (safest; this is the default)
# 2: Write to OS cache on every commit, flush once per second (loss possible on OS crash or power failure)
 
sync_binlog = 1
# 0: No sync (fastest but risky)
# 1: Sync on every commit (safest -- AWS default)
# N: Sync every N commits
 
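# Isolation level
transaction_isolation = READ-COMMITTED
# Default in AWS RDS is REPEATABLE-READ
# READ-COMMITTED suits many OLTP workloads
# (fewer gap locks, better concurrency, avoids long-running read views),
# but it permits non-repeatable reads, so check that the application tolerates them

# Online DDL: size of the temporary log that buffers concurrent DML during ALTER TABLE
# If the log overflows, the ALTER fails and is rolled back
innodb_online_alter_log_max_size = 134217728  # 128MB

# Row-based binary logging (RDS default is MIXED; ROW is the safest choice for replicas and CDC)
binlog_format = ROW

# Log complete before/after row images (needed by CDC tools that read the binlog)
binlog_row_image = FULL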
 
# Enable GTID for more reliable replication
gtid_mode = ON
enforce_gtid_consistency = ON
 
# Connection management
wait_timeout = 28800        # 8 hours idle connection timeout
interactive_timeout = 28800
max_connections = 500       # Tune based on instance class
 
# Deadlock detection
innodb_deadlock_detect = ON  # Default ON -- on deadlock, InnoDB rolls back the transaction that changed the fewest rows
innodb_lock_wait_timeout = 5  # 5 seconds before lock wait timeout error
 
# Buffer pool (about 75% of instance RAM; db.r6g.large has 16GB RAM)
innodb_buffer_pool_size = 12884901888  # 12GB
innodb_buffer_pool_instances = 8      # MySQL 8.0 default for pools of 1GB or more

RDS Multi-AZ Consistency Behavior

RDS Multi-AZ uses synchronous replication to the standby:

Primary (us-east-1a) ----[sync replication]----> Standby (us-east-1b)
      |                                                  |
   Write committed only when                    Standby confirms
   standby confirms write                       write to storage

Consistency: Strong (synchronous)
Failover time: 60-120 seconds (DNS update based)
Data loss on failover: ZERO (synchronous)
Performance impact: ~1-2ms added per write (AZ round-trip)

Key Distinction: In a Multi-AZ DB instance deployment, the standby is NOT a read replica. It does not serve reads and only takes over when the primary fails. This is pure HA, not read scaling.
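
Because a failover takes the endpoint away for a minute or two, application code usually needs to retry rather than fail on the first connection error. The following is a minimal sketch of retry with exponential backoff, not something from the original setup: the class name FailoverRetry, the 30-second delay cap, and the choice to retry on any RuntimeException are all assumptions. It should only wrap idempotent operations, since a write may have committed just before the connection dropped.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of any AWS or Spring library.
final class FailoverRetry {
    private FailoverRetry() {}

    /** Runs the operation, retrying with a doubling delay until it succeeds or attempts run out. */
    static <T> T callWithRetry(Supplier<T> operation, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts: surface the last failure
                }
                sleep(delayMs);
                delayMs = Math.min(delayMs * 2, 30_000L); // cap the backoff (assumed value)
            }
        }
        throw lastFailure;
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

With an initial delay of 1 second and 8 attempts, the waits add up to roughly two minutes, which covers the 60-120 second failover window quoted above.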

Spring Boot Configuration for RDS

spring:
  datasource:
    # Use the RDS instance endpoint (NOT read replica for writes)
    url: jdbc:mysql://myapp-prod.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:3306/myapp
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}
    hikari:
      maximum-pool-size: 10 # RDS instance max_connections / number of app pods
      minimum-idle: 2
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000 # Less than MySQL wait_timeout
      keepalive-time: 60000 # Prevent idle connection drops by AWS VPC
      # connection-test-query is omitted: Connector/J supports JDBC4 Connection.isValid(), which HikariCP prefers
      data-source-properties:
        useSSL: true
        requireSSL: true
        verifyServerCertificate: false # false skips certificate validation; in production set true and trust the RDS CA bundle
        characterEncoding: utf8mb4
        characterSetResults: utf8mb4
        connectionCollation: utf8mb4_unicode_ci
        useCompression: false
        rewriteBatchedStatements: true
        cachePrepStmts: true
        prepStmtCacheSize: 250
        prepStmtCacheSqlLimit: 2048
        autoReconnect: false # Let HikariCP manage reconnects
        failOverReadOnly: false # Important: do not make writes read-only on failover
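
The pool-size comment in the configuration above (max_connections divided by the number of pods) is simple arithmetic, but it is easy to forget headroom for admin and monitoring sessions. A minimal sketch follows; the helper name PoolSizing and the idea of reserving a fixed number of connections are assumptions, not HikariCP or RDS features.

```java
// Hypothetical helper: not part of HikariCP, Spring Boot, or RDS.
final class PoolSizing {
    private PoolSizing() {}

    /**
     * Largest per-pod pool size that keeps the whole fleet under max_connections.
     * reservedConnections are held back for admin sessions, monitoring and migrations.
     */
    static int perPodPoolSize(int maxConnections, int reservedConnections, int podCount) {
        if (podCount <= 0) {
            throw new IllegalArgumentException("podCount must be positive");
        }
        int usable = maxConnections - reservedConnections;
        if (usable < podCount) {
            throw new IllegalArgumentException("fewer usable connections than pods");
        }
        return usable / podCount; // integer division rounds down, which is the safe direction
    }

    public static void main(String[] args) {
        // max_connections = 500 comes from the parameter group; 50 reserved and 45 pods are assumed values
        System.out.println(perPodPoolSize(500, 50, 45));
    }
}
```

Remember to recompute this when autoscaling raises the pod count: the limit applies to the maximum number of pods, not the usual number.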

RDS Proxy for Connection Pooling

When running many Lambda functions or many ECS tasks, use RDS Proxy to prevent connection exhaustion:

# RDS Proxy configuration notes:
# - Maintains warm connections to RDS
# - Multiplexes many app connections onto fewer DB connections
# - Handles failover transparently (faster than DNS-based failover)
# - Supports IAM authentication (recommended over password in production)
 
# In Spring Boot, point to Proxy endpoint instead of RDS endpoint
spring.datasource.url: jdbc:mysql://myapp-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com:3306/myapp
 
# For IAM auth with RDS Proxy, use AWS IAM token generator
# (Spring Boot AWS IAM authentication library handles this)

2. Amazon Aurora MySQL - Architecture and Configuration

How Aurora Is Different from MySQL

Aurora MySQL is wire-compatible with MySQL (Aurora MySQL version 3 tracks MySQL 8.0) but fundamentally different in architecture:

Standard MySQL Replication:
  Primary ----[binlog transfer]----> Replica
  (Replica re-applies every logged change to its own full copy of the data)

Aurora Architecture:
  Writer Instance ----[redo log only]----> Reader Instances
                \
                 +--[6 copies across 3 AZs]----> Aurora Storage Cluster
                    (data exists in storage, not in instances)

Key benefits for consistency:

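Writer sends redo log records to 6 storage copies across 3 AZs and treats a write as durable once 4 of the 6 acknowledge it -- strong durability
Readers read from the same distributed storage -- much lower replica lag (typically 10-20ms vs 100ms+ for binlog-based MySQL replicas)
No per-replica copy of the data -- readers and the writer share one storage volume; a reader lags only because its cached pages are refreshed asynchronously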

Cluster Endpoints

Aurora provides multiple endpoint types. Choosing the right one is critical for consistency:

Cluster Endpoint (Writer):   myapp.cluster-xxxx.us-east-1.rds.amazonaws.com
  - Always points to the current writer instance
  - Use for ALL writes and reads requiring strong consistency
  - After failover, DNS updates (typically 30 seconds)

Reader Endpoint:             myapp.cluster-ro-xxxx.us-east-1.rds.amazonaws.com
  - Distributes new connections across reader instances (DNS-based, per connection, not per query)
  - Eventually consistent reads (10-20ms lag typical)
  - Use for read-heavy, latency-tolerant queries

Individual Instance Endpoints:
  - Point to specific reader instances
  - Useful for session-pinned reads (sticky sessions to same replica)
  - myapp-instance-1.xxxx.us-east-1.rds.amazonaws.com

Custom Endpoints:
  - Route specific workloads to specific replica subsets
  - E.g., analytics queries to high-memory replicas

Aurora Configuration for Optimal Consistency

# Aurora Cluster Parameter Group
 
# Aurora-specific: read consistency for write forwarding
# Only takes effect when write forwarding is enabled; it does not change replica lag
aurora_replica_read_consistency = EVENTUAL
# Options: EVENTUAL, SESSION (read-your-writes within the session), GLOBAL (waits for all committed changes)

# Compress the replication stream from the writer to readers (saves network bandwidth, costs CPU)
aurora_enable_replica_log_compression = 1

# Binary log for CDC (if using Debezium)
# On Aurora, setting binlog_format enables binary logging; log_bin itself is read-only
binlog_format = ROW
binlog_row_image = FULL

# GTID-based replication
gtid_mode = ON
enforce_gtid_consistency = ON

# Transaction isolation (set at application level preferably)
transaction_isolation = READ-COMMITTED

# Safety settings
aurora_lab_mode = 0             # Never enable in production
require_secure_transport = ON   # Reject unencrypted connections (rds.force_ssl is the PostgreSQL equivalent)

# Lock handling
innodb_lock_wait_timeout = 5
innodb_deadlock_detect = ON

Application YAML for Aurora

spring:
  datasource:
    # Aurora Writer -- for all writes
    write:
      url: jdbc:mysql://myapp.cluster-xxxx.us-east-1.rds.amazonaws.com:3306/myapp?useSSL=true&serverTimezone=UTC
      username: ${AURORA_WRITE_USER}
      password: ${AURORA_WRITE_PASSWORD}
      hikari:
        pool-name: AuroraWritePool
        maximum-pool-size: 20
        minimum-idle: 5
        max-lifetime: 1740000 # 29 minutes; must stay below the server's wait_timeout
        keepalive-time: 60000
 
    # Aurora Reader -- for eventually consistent reads
    read:
      url: jdbc:mysql://myapp.cluster-ro-xxxx.us-east-1.rds.amazonaws.com:3306/myapp?useSSL=true&serverTimezone=UTC
      username: ${AURORA_READ_USER}
      password: ${AURORA_READ_PASSWORD}
      hikari:
        pool-name: AuroraReadPool
        maximum-pool-size: 40
        minimum-idle: 10
        read-only: true

3. Aurora Read Replicas - Consistency Management

Understanding Replica Lag

Aurora reader instances are not perfectly in sync with the writer. There is a small replication lag:

Typical: 10-20ms (much better than standard MySQL replicas)
Under heavy write load: can be 100ms+
Useful CloudWatch metric: AuroraReplicaLag

Monitoring Replica Lag in Spring Boot

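@Component
@RequiredArgsConstructor
@Slf4j
public class ReplicaLagMonitor {

    private final NamedParameterJdbcTemplate readJdbcTemplate;
    private final MeterRegistry meterRegistry;

    // Last observed lag; backs both the gauge and AdaptiveReadRouter
    private final AtomicLong currentLagMs = new AtomicLong();

    @PostConstruct
    void registerGauge() {
        // Register once against a strongly referenced AtomicLong:
        // Micrometer keeps only a weak reference to the object it samples
        meterRegistry.gauge("aurora.replica.lag.ms",
            Tags.of("instance", "reader"), currentLagMs);
    }

    public long getCurrentLagMs() {
        return currentLagMs.get();
    }

    // Check replica lag every 30 seconds
    @Scheduled(fixedDelay = 30000)
    public void checkReplicaLag() {
        try {
            // Aurora-specific view with one row per instance. The writer's row is excluded
            // and the worst reader lag is reported. Verify the column names against your Aurora version.
            Long lagMs = readJdbcTemplate.queryForObject(
                "SELECT FLOOR(MAX(replica_lag_in_milliseconds)) " +
                "FROM information_schema.replica_host_status " +
                "WHERE session_id <> 'MASTER_SESSION_ID'",
                Map.of(),
                Long.class
            );

            if (lagMs != null) {
                currentLagMs.set(lagMs);

                if (lagMs > 5000) {  // 5 second lag is concerning
                    log.warn("Aurora replica lag is {}ms -- considering routing to writer", lagMs);
                }
                if (lagMs > 30000) {  // 30 second lag is critical
                    log.error("CRITICAL: Aurora replica lag is {}ms", lagMs);
                }
            }
        } catch (Exception e) {
            log.error("Failed to check replica lag", e);
        }
    }
}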

Dynamic Read Routing Based on Lag

@Service
@RequiredArgsConstructor
@Slf4j  // required for the log call in readWithFallback
public class AdaptiveReadRouter {
 
    private final ReplicaLagMonitor replicaLagMonitor;
    private static final long LAG_THRESHOLD_MS = 5000;  // 5 seconds
 
    public boolean shouldUseReplica() {
        long currentLagMs = replicaLagMonitor.getCurrentLagMs();
        return currentLagMs < LAG_THRESHOLD_MS;
    }
 
    public <T> T readWithFallback(Supplier<T> replicaRead, Supplier<T> primaryRead) {
        if (shouldUseReplica()) {
            return replicaRead.get();
        }
        log.info("Replica lag too high ({}ms), falling back to primary read",
            replicaLagMonitor.getCurrentLagMs());
        return primaryRead.get();
    }
}

4. Aurora Global Database - Multi-Region Consistency

Architecture

Aurora Global Database replicates an Aurora cluster to up to 5 secondary AWS regions. It uses storage-level replication (redo log shipping), not MySQL binlog.

Primary Region (us-east-1):
  Aurora Writer + 2 Readers
  All writes happen here

Secondary Region (eu-west-1):
  Read-only Readers (replicated from us-east-1)
  Typical replication lag: < 1 second (usually 100-300ms)
  Can be promoted to primary in < 1 minute during disaster

Secondary Region (ap-southeast-1):
  Same as above

Consistency Implications

Reads from secondary regions are eventually consistent (100-300ms behind primary)
For global users needing read-your-writes: route writes to primary, reads to nearest replica (with lag tolerance) or primary
RPO: typically < 1 second for secondary regions
RTO for failover: < 1 minute (with managed failover)
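
One way to get read-your-writes for global users is to pin a user's reads to the primary for a short window after each of their writes, then fall back to the nearest replica. The sketch below illustrates that idea only: the class ReadYourWritesRouter, the in-memory map, and the injected clock are assumptions made so the logic can be tested in isolation. A real deployment would need shared, bounded state (for example a cookie or a cache entry with a TTL).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical router: pins reads to the primary while a user's last write may not have replicated.
final class ReadYourWritesRouter {
    enum Target { PRIMARY, LOCAL_REPLICA }

    private final Map<String, Long> lastWriteAtMs = new ConcurrentHashMap<>();
    private final long pinWindowMs;   // should exceed the expected cross-region replication lag
    private final LongSupplier clock; // injected so tests can control time

    ReadYourWritesRouter(long pinWindowMs, LongSupplier clock) {
        this.pinWindowMs = pinWindowMs;
        this.clock = clock;
    }

    /** Call after every successful write made on behalf of the user. */
    void recordWrite(String userId) {
        lastWriteAtMs.put(userId, clock.getAsLong());
    }

    /** Reads go to the primary until pinWindowMs has elapsed since the user's last write. */
    Target targetForRead(String userId) {
        Long lastWrite = lastWriteAtMs.get(userId);
        if (lastWrite != null && clock.getAsLong() - lastWrite < pinWindowMs) {
            return Target.PRIMARY;
        }
        return Target.LOCAL_REPLICA;
    }
}
```

With the 100-300ms lag quoted above, a pin window of one or two seconds leaves a comfortable margin while still sending the vast majority of reads to the local region.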

Spring Boot Multi-Region Configuration

@Configuration
public class MultiRegionDataSourceConfig {
 
    @Value("${aws.region.current}")
    private String currentRegion;
 
    @Value("${aurora.global.primary.endpoint}")
    private String primaryEndpoint;
 
    @Value("${aurora.global.secondary.endpoint}")
    private String secondaryEndpoint;
 
    @Bean
    public DataSource dataSource() {
        // Always write to global primary (us-east-1)
        // Read from local secondary (if running in eu-west-1)
        // This provides low-latency reads with eventual consistency
 
        if ("us-east-1".equals(currentRegion)) {
            return buildDataSource(primaryEndpoint, false);  // can write and read
        } else {
            // In secondary region: use local reader for reads, primary for writes
            return buildRoutingDataSource(primaryEndpoint, secondaryEndpoint);
        }
    }
 
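    // RoutingDataSource is assumed to be an application class extending Spring's
    // AbstractRoutingDataSource; buildDataSource(endpoint, readOnly) is a helper not shown here.
    private DataSource buildRoutingDataSource(String writeEndpoint, String readEndpoint) {
        DataSource writeDataSource = buildDataSource(writeEndpoint, false);
        RoutingDataSource routing = new RoutingDataSource();
        routing.setTargetDataSources(Map.of(
            "WRITE", writeDataSource,
            "READ", buildDataSource(readEndpoint, true)
        ));
        // Reuse the write pool as the default instead of opening a second pool to the primary
        routing.setDefaultTargetDataSource(writeDataSource);
        return routing;
    }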
}

5. Amazon DynamoDB - Consistency Models

DynamoDB's Two Read Consistency Options

DynamoDB offers a per-request choice of consistency:

Option	Consistency	Latency	Cost	Use For
Eventually Consistent	Eventual (may return stale data)	Lower	0.5 read capacity unit per 4 KB	Reads tolerating <1s staleness
Strongly Consistent	Strong (served by the partition's leader replica)	Higher	1 read capacity unit per 4 KB	Critical reads requiring latest data
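
To make the cost column concrete, the following sketch computes the read capacity consumed by a single read. The class and enum names are hypothetical; the pricing rule it encodes is that reads are billed in 4 KB blocks, rounded up, at 1 unit per block for strongly consistent reads, half that for eventually consistent reads, and double for transactional reads.

```java
// Hypothetical calculator for illustration; DynamoDB reports real consumption via ReturnConsumedCapacity.
final class ReadCapacity {
    enum ReadMode { EVENTUAL, STRONG, TRANSACTIONAL }

    private ReadCapacity() {}

    /** Read capacity units consumed by reading one item of the given size. */
    static double readCapacityUnits(long itemSizeBytes, ReadMode mode) {
        if (itemSizeBytes < 0) {
            throw new IllegalArgumentException("item size cannot be negative");
        }
        // Billed in 4 KB blocks, rounded up, with a minimum of one block
        long blocks = Math.max(1L, (itemSizeBytes + 4095L) / 4096L);
        switch (mode) {
            case EVENTUAL:
                return blocks * 0.5;
            case STRONG:
                return blocks * 1.0;
            case TRANSACTIONAL:
                return blocks * 2.0;
            default:
                throw new IllegalArgumentException("unknown mode " + mode);
        }
    }
}
```

For example, a 9,000-byte item spans three 4 KB blocks, so a strongly consistent read costs 3 units and an eventually consistent read costs 1.5.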

Java SDK v2 - Consistency Configuration

@Configuration
public class DynamoDbConfig {
 
    @Bean
    public DynamoDbClient dynamoDbClient() {
        return DynamoDbClient.builder()
            .region(Region.US_EAST_1)
            .credentialsProvider(DefaultCredentialsProvider.create())
            .overrideConfiguration(ClientOverrideConfiguration.builder()
                .apiCallTimeout(Duration.ofSeconds(5))
                .apiCallAttemptTimeout(Duration.ofSeconds(2))
                .retryPolicy(RetryPolicy.builder()
                    .numRetries(3)
                    .build())
                .build())
            .build();
    }
 
    @Bean
    public DynamoDbEnhancedClient dynamoDbEnhancedClient(DynamoDbClient client) {
        return DynamoDbEnhancedClient.builder()
            .dynamoDbClient(client)
            .build();
    }
}
 
@Repository
@RequiredArgsConstructor
public class ProductDynamoRepository {
 
    private final DynamoDbEnhancedClient enhancedClient;
 
    // EVENTUALLY CONSISTENT read -- default, cheaper
    public Optional<Product> findById(String productId) {
        DynamoDbTable<Product> table = enhancedClient.table("Products", TableSchema.fromBean(Product.class));
        GetItemEnhancedRequest request = GetItemEnhancedRequest.builder()
            .key(Key.builder()
                .partitionValue(productId)
                .build())
            .consistentRead(false)  // Eventually consistent
            .build();
        return Optional.ofNullable(table.getItem(request));
    }
 
    // STRONGLY CONSISTENT read -- for critical operations
    public Optional<Product> findByIdStrong(String productId) {
        DynamoDbTable<Product> table = enhancedClient.table("Products", TableSchema.fromBean(Product.class));
        GetItemEnhancedRequest request = GetItemEnhancedRequest.builder()
            .key(Key.builder()
                .partitionValue(productId)
                .build())
            .consistentRead(true)  // Strongly consistent -- 2x cost
            .build();
        return Optional.ofNullable(table.getItem(request));
    }
 
    // A write is acknowledged only once it is durably stored; a strongly consistent
    // read issued afterwards in the same region is guaranteed to see it
    public void save(Product product) {
        DynamoDbTable<Product> table = enhancedClient.table("Products", TableSchema.fromBean(Product.class));
        table.putItem(product);
    }
}

DynamoDB Entity with Optimistic Locking

@DynamoDbBean
@Getter
@Setter
public class Product {
 
    private String productId;
    private String name;
    private BigDecimal price;
    private Integer stockQuantity;
    private Long version;         // Optimistic locking token
    private String status;
    private Instant updatedAt;
 
    @DynamoDbPartitionKey
    public String getProductId() { return productId; }
 
    @DynamoDbVersionAttribute
    public Long getVersion() { return version; }  // Auto-incremented by SDK, OCC enabled
}
 
// ProductDynamoRepository again, now with the low-level client it needs for hand-written expressions
@Repository
@RequiredArgsConstructor
public class ProductDynamoRepository {

    private final DynamoDbEnhancedClient enhancedClient;
    private final DynamoDbClient dynamoDbClient;

    // The enhanced client adds a condition on the version attribute automatically.
    // If the stored version differs (concurrent update), it throws ConditionalCheckFailedException.
    public void updateWithOptimisticLock(Product product) {
        DynamoDbTable<Product> table = enhancedClient.table("Products",
            TableSchema.fromBean(Product.class));
        try {
            table.updateItem(product);  // version check is automatic due to @DynamoDbVersionAttribute
        } catch (ConditionalCheckFailedException e) {
            // Catch the typed exception rather than matching on the message text
            throw new OptimisticLockException("Product was modified by another transaction");
        }
    }

    // Manual conditional write (without @DynamoDbVersionAttribute)
    public void atomicDecrementStock(String productId, int quantity, long expectedVersion) {
        // The DynamoDbEnhancedClient interface does not expose its underlying client, so inject it
        DynamoDbClient rawClient = dynamoDbClient;
 
        UpdateItemRequest request = UpdateItemRequest.builder()
            .tableName("Products")
            .key(Map.of("productId", AttributeValue.fromS(productId)))
            .updateExpression("SET stockQuantity = stockQuantity - :qty, " +
                              "version = :newVersion, " +
                              "updatedAt = :now")
            .conditionExpression("version = :expectedVersion AND stockQuantity >= :qty")
            .expressionAttributeValues(Map.of(
                ":qty", AttributeValue.fromN(String.valueOf(quantity)),
                ":expectedVersion", AttributeValue.fromN(String.valueOf(expectedVersion)),
                ":newVersion", AttributeValue.fromN(String.valueOf(expectedVersion + 1)),
                ":now", AttributeValue.fromS(Instant.now().toString())
            ))
            .build();
 
        try {
            rawClient.updateItem(request);
        } catch (ConditionalCheckFailedException e) {
            throw new StockConflictException(
                "Stock update failed: concurrent modification or insufficient stock");
        }
    }
}
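
The condition expression above (version must match, stock must suffice) is a compare-and-set, and callers normally wrap it in a read-check-retry loop. The sketch below models that loop against an in-memory map so the control flow can be run without DynamoDB. VersionedStockStore and its methods are hypothetical stand-ins, not SDK types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a table with a version attribute; for illustration only.
final class VersionedStockStore {
    static final class Item {
        final int stock;
        final long version;

        Item(int stock, long version) {
            this.stock = stock;
            this.version = version;
        }
    }

    private final Map<String, Item> items = new ConcurrentHashMap<>();

    void put(String productId, int stock) {
        items.put(productId, new Item(stock, 0L));
    }

    Item get(String productId) {
        return items.get(productId);
    }

    /** Mirrors "version = :expectedVersion AND stockQuantity >= :qty": fails if either check fails. */
    boolean conditionalDecrement(String productId, int qty, long expectedVersion) {
        Item current = items.get(productId);
        if (current == null || current.version != expectedVersion || current.stock < qty) {
            return false;
        }
        // replace() succeeds only if the entry is still the object we read (atomic compare-and-set)
        return items.replace(productId, current, new Item(current.stock - qty, expectedVersion + 1));
    }

    /** Re-reads the current version and retries when a concurrent writer got in first. */
    boolean decrementWithRetry(String productId, int qty, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Item current = items.get(productId);
            if (current == null || current.stock < qty) {
                return false; // retrying cannot fix a missing item or insufficient stock
            }
            if (conditionalDecrement(productId, qty, current.version)) {
                return true;
            }
        }
        return false; // still losing the race after maxAttempts
    }
}
```

Note the distinction the loop draws: a version conflict is worth retrying after a fresh read, while insufficient stock is a business failure that should be reported immediately.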

6. DynamoDB Transactions and Conditional Writes

DynamoDB ACID Transactions

DynamoDB supports transactions across multiple items (up to 100 items, 4MB total):

@Service
@RequiredArgsConstructor
public class OrderDynamoService {
 
    private final DynamoDbClient dynamoDbClient;
 
    // Atomic multi-item transaction
    public void createOrderWithInventoryReservation(Order order,
                                                     String productId,
                                                     int quantity) {
        // This entire operation is atomic -- all succeed or all fail
        TransactWriteItemsRequest request = TransactWriteItemsRequest.builder()
            .transactItems(
                // 1. Create the order
                TransactWriteItem.builder()
                    .put(Put.builder()
                        .tableName("Orders")
                        .item(convertToAttributeMap(order))
                        .conditionExpression("attribute_not_exists(orderId)")  // Prevent duplicates
                        .build())
                    .build(),
 
                // 2. Decrement inventory (with condition check)
                TransactWriteItem.builder()
                    .update(Update.builder()
                        .tableName("Products")
                        .key(Map.of("productId", AttributeValue.fromS(productId)))
                        .updateExpression("SET stockQuantity = stockQuantity - :qty, " +
                                         "version = version + :inc")
                        .conditionExpression("stockQuantity >= :qty AND attribute_exists(productId)")
                        .expressionAttributeValues(Map.of(
                            ":qty", AttributeValue.fromN(String.valueOf(quantity)),
                            ":inc", AttributeValue.fromN("1")
                        ))
                        .build())
                    .build()
            )
            .build();
 
        try {
            dynamoDbClient.transactWriteItems(request);
        } catch (TransactionCanceledException e) {
            // One or more conditions failed
            analyzeTransactionCancellation(e, productId, quantity);
        }
    }
 
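    // cancellationReasons() has one entry per transact item, in request order:
    // index 0 is the order Put, index 1 is the inventory Update.
    private void analyzeTransactionCancellation(TransactionCanceledException e,
                                                  String productId, int quantity) {
        List<CancellationReason> reasons = e.cancellationReasons();
        if (reasons.size() > 1 && "ConditionalCheckFailed".equals(reasons.get(1).code())) {
            throw new InsufficientStockException(productId, quantity);
        }
        // A failed condition at index 0 means the orderId already exists (duplicate request)
        throw new OrderCreationException("Order creation failed", e);
    }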
}

DynamoDB Global Tables - Multi-Region Consistency

@Configuration
public class GlobalDynamoDbConfig {
 
    // For global tables, writes can go to any region
    // DynamoDB replicates using LWW (Last-Write-Wins) with server-side timestamps
    // Typical cross-region replication lag: <1 second
 
    @Bean
    public DynamoDbClient dynamoDbClientPrimary() {
        return DynamoDbClient.builder()
            .region(Region.US_EAST_1)
            .credentialsProvider(DefaultCredentialsProvider.create())
            .build();
    }
 
    @Bean
    public DynamoDbClient dynamoDbClientSecondary() {
        return DynamoDbClient.builder()
            .region(Region.EU_WEST_1)  // European region
            .credentialsProvider(DefaultCredentialsProvider.create())
            .build();
    }
}
 
// When using Global Tables, to avoid conflicts:
// 1. Use strongly consistent reads for critical operations
//    (with the default replication mode they only reflect writes made in the same region)
// 2. Use condition expressions to prevent lost updates within a region
// 3. Route all writes for a user to a single region using Route53 geolocation
// 4. For critical sections, use DynamoDB transactions (atomic only within the region where they run)
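
The reason for routing a user's writes to one region is easiest to see by simulating last-writer-wins. In the sketch below, two regions update the same item at nearly the same time and the later timestamp silently replaces the earlier write. The class names are hypothetical, and the tie-break on region name is an assumption made only so that the example is deterministic; it is not a statement about DynamoDB's internal rule.

```java
// Toy model of last-writer-wins conflict resolution; for illustration only.
final class LastWriterWins {
    static final class Write {
        final String value;
        final long timestampMs;
        final String region;

        Write(String value, long timestampMs, String region) {
            this.value = value;
            this.timestampMs = timestampMs;
            this.region = region;
        }
    }

    private LastWriterWins() {}

    /** Returns the surviving write; the other one is discarded without any error being raised. */
    static Write resolve(Write a, Write b) {
        if (a.timestampMs != b.timestampMs) {
            return a.timestampMs > b.timestampMs ? a : b;
        }
        // Assumed tie-break: compare region names so every replica picks the same winner
        return a.region.compareTo(b.region) >= 0 ? a : b;
    }
}
```

If us-east-1 writes "stock=4" at t=1000 and eu-west-1 writes "stock=7" at t=1001, every replica converges on "stock=7" and the first decrement is lost, even though both writers received a success response.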

7. Amazon ElastiCache Redis - Cache Consistency

ElastiCache Redis Cluster Configuration

# CloudFormation / CDK resource properties for ElastiCache Redis
ElastiCacheReplicationGroup:
  Type: AWS::ElastiCache::ReplicationGroup
  Properties:
    ReplicationGroupDescription: "Production Redis Cluster"
    NumNodeGroups: 3 # 3 shards
    ReplicasPerNodeGroup: 2 # 2 replicas per shard (1 primary + 2 replicas)
    CacheNodeType: cache.r6g.large # Memory-optimized
    Engine: redis
    EngineVersion: "7.0"
    AtRestEncryptionEnabled: true
    TransitEncryptionEnabled: true
    AutomaticFailoverEnabled: true # Auto promote replica on primary failure
    MultiAZEnabled: true # Replicas in different AZs
    CacheParameterGroupName: default.redis7.cluster.on # the family (redis7) is set on the parameter group, not here

Redis Consistency Parameters

# ElastiCache Redis Parameter Group
# Key consistency-related parameters
 
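# Persistence (affects durability, not just consistency)
# ElastiCache does not support AOF on Redis 2.8.22 and later, so appendonly and
# appendfsync are not available on the 7.0 engine configured above.
# Durability comes from Multi-AZ replicas with automatic failover plus scheduled snapshots.

# Replication (these parameters were named min-slaves-* before Redis 5)
min-replicas-to-write 1   # Accept writes only while at least 1 replica is connected
min-replicas-max-lag 10   # ...and that replica was last heard from within 10 seconds

# If these conditions are not met, the primary REFUSES writes.
# Replication itself stays asynchronous: this bounds the window for lost or
# split-brain writes, it does not make replicas acknowledge each write.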
 
# Memory management
maxmemory-policy allkeys-lru  # Evict LRU keys when memory full
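
The rule enforced by the two replication parameters above can be stated in a few lines: the primary counts the replicas whose last acknowledgement is recent enough and refuses writes when too few remain. The sketch below models only that decision; the class WriteGate is hypothetical and is not how Redis is implemented internally.

```java
// Toy model of the "minimum healthy replicas" write gate; for illustration only.
final class WriteGate {
    private WriteGate() {}

    /**
     * @param replicaLagSeconds seconds since each replica was last heard from
     * @param minReplicas       required number of healthy replicas (0 disables the check)
     * @param maxLagSeconds     a replica counts as healthy when its lag is at most this value
     */
    static boolean acceptsWrites(long[] replicaLagSeconds, int minReplicas, long maxLagSeconds) {
        if (minReplicas <= 0) {
            return true; // feature disabled: always accept writes
        }
        int healthy = 0;
        for (long lag : replicaLagSeconds) {
            if (lag <= maxLagSeconds) {
                healthy++;
            }
        }
        return healthy >= minReplicas;
    }
}
```

With the values configured above (1 replica, 10 seconds), a primary that has been cut off from all of its replicas stops accepting writes about 10 seconds after the partition starts, which limits how much data a subsequent failover can discard.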

Spring Boot Redis Configuration for Consistency

@Configuration
public class RedisConfig {
 
    @Value("${spring.redis.cluster.nodes}")
    private List<String> clusterNodes;
 
    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
        RedisClusterConfiguration clusterConfig =
            new RedisClusterConfiguration(clusterNodes);
        clusterConfig.setMaxRedirects(3);
 
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .commandTimeout(Duration.ofSeconds(2))
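            .readFrom(ReadFrom.REPLICA_PREFERRED)  // Read from replica when possible
            // Options: UPSTREAM (primary only, strongest), UPSTREAM_PREFERRED, REPLICA_PREFERRED (eventual), REPLICA
            // UPSTREAM replaces the deprecated MASTER constant in Lettuce 6
            // With TransitEncryptionEnabled on the cluster, TLS must also be enabled on this client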
            .build();
 
        return new LettuceConnectionFactory(clusterConfig, clientConfig);
    }
 
    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(factory);
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        template.setHashKeySerializer(new StringRedisSerializer());
        template.setHashValueSerializer(new GenericJackson2JsonRedisSerializer());
        template.afterPropertiesSet();
        return template;
    }
 
    // For operations requiring strong consistency (read from primary only)
    @Bean("primaryOnlyTemplate")
    public RedisTemplate<String, Object> primaryOnlyRedisTemplate() {
        // Separate connection factory that always reads from the primary
        LettuceClientConfiguration primaryConfig = LettuceClientConfiguration.builder()
            .commandTimeout(Duration.ofSeconds(2))
            .readFrom(ReadFrom.UPSTREAM)  // Always read from primary
            .build();
 
        RedisClusterConfiguration clusterConfig =
            new RedisClusterConfiguration(clusterNodes);
        LettuceConnectionFactory primaryFactory =
            new LettuceConnectionFactory(clusterConfig, primaryConfig);
        primaryFactory.afterPropertiesSet();
 
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(primaryFactory);
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        template.afterPropertiesSet();
        return template;
    }
}

Cache-Aside Pattern with Proper Consistency

@Service
@RequiredArgsConstructor
@Slf4j
public class ProductCacheService {
 
    private final RedisTemplate<String, Object> redisTemplate;  // Object: values are a Product or the NULL_SENTINEL string
    private final ProductRepository productRepository;
 
    private static final String CACHE_PREFIX = "product:v2:";  // versioned key
    private static final Duration TTL = Duration.ofMinutes(30);
    private static final String NULL_SENTINEL = "NULL_PLACEHOLDER";
 
    // Cache-aside with negative caching (prevent DB hammering for missing items)
    public Optional<Product> getProduct(Long productId) {
        String key = CACHE_PREFIX + productId;
 
        // 1. Check cache
        Object cached = redisTemplate.opsForValue().get(key);
 
        if (NULL_SENTINEL.equals(cached)) {
            // Cached miss -- avoid DB lookup
            return Optional.empty();
        }
 
        if (cached instanceof Product product) {
            return Optional.of(product);
        }
 
        // 2. Cache miss -- load from DB
        Optional<Product> dbProduct = productRepository.findById(productId);
 
        if (dbProduct.isPresent()) {
            redisTemplate.opsForValue().set(key, dbProduct.get(), TTL);
        } else {
            // Cache the miss too (negative caching) -- stops repeated DB lookups for keys that do not exist
            redisTemplate.opsForValue().set(key, NULL_SENTINEL, Duration.ofMinutes(5));
        }
 
        return dbProduct;
    }
 
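    // Invalidate-on-write: delete the cached entry once the DB transaction has committed.
    // This narrows the stale window but does not close it: a reader that loaded the old row
    // just before the commit can still write it back to the cache afterwards.
    @Transactional
    public Product updateProduct(Long productId, UpdateProductRequest request) {
        Product updated = productRepository.findById(productId)
            .map(p -> { p.update(request); return productRepository.save(p); })
            .orElseThrow(() -> new ProductNotFoundException(productId));

        // Invalidate cache (delete, don't update -- avoids write-through complexity).
        // Deleting before the commit would let a concurrent reader re-cache the old value.
        TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
            @Override
            public void afterCommit() {
                redisTemplate.delete(CACHE_PREFIX + productId);
            }
        });

        // The TTL bounds how long a stale entry can survive; shorten it
        // (e.g., to 5 minutes) if 30 minutes of staleness is too long
        return updated;
    }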
 
    // Batch invalidation (after bulk update)
    public void invalidateBatch(List<Long> productIds) {
        List<String> keys = productIds.stream()
            .map(id -> CACHE_PREFIX + id)
            .collect(Collectors.toList());
        redisTemplate.delete(keys);
    }
}
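
The cache-aside flow with negative caching can be exercised without Redis by modelling the cache as a map with expiry times. The sketch below is such a model, under stated assumptions: NegativeCachingCache, its loader function, and the injected clock are hypothetical, and an Optional stands in for the NULL sentinel used above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// In-memory model of cache-aside with negative caching; for illustration only.
final class NegativeCachingCache<K, V> {
    private static final class Entry<V> {
        final Optional<V> value; // empty means "known to be missing"
        final long expiresAtMs;

        Entry(Optional<V> value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, Optional<V>> loader; // stands in for the database lookup
    private final long hitTtlMs;
    private final long missTtlMs;
    private final LongSupplier clock;

    NegativeCachingCache(Function<K, Optional<V>> loader, long hitTtlMs, long missTtlMs, LongSupplier clock) {
        this.loader = loader;
        this.hitTtlMs = hitTtlMs;
        this.missTtlMs = missTtlMs;
        this.clock = clock;
    }

    Optional<V> get(K key) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMs) {
            return cached.value; // cache hit, including a cached miss
        }
        Optional<V> loaded = loader.apply(key);
        long ttl = loaded.isPresent() ? hitTtlMs : missTtlMs; // misses expire sooner
        entries.put(key, new Entry<>(loaded, now + ttl));
        return loaded;
    }

    /** Delete rather than update, matching the invalidation approach above. */
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

The shorter TTL for misses is the important design choice: it keeps a product that is created later from staying invisible for the full 30 minutes.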

8. Amazon SQS - Message Delivery Guarantees

SQS Standard vs FIFO

Property	SQS Standard	SQS FIFO
Ordering	Best-effort (not guaranteed)	Strictly ordered within message group
Delivery	At-least-once (duplicates possible)	Exactly-once processing
Throughput	Nearly unlimited	300 API calls/sec per action (3,000 messages/sec with batches of 10); higher in high-throughput mode
Deduplication	Application must handle	Built-in (5-minute deduplication window)
Use Case	Decoupled processing, analytics	Workflows requiring strict ordering

Spring Boot FIFO Queue Configuration

@Configuration
public class SqsConfig {
 
    @Bean
    public SqsClient sqsClient() {
        return SqsClient.builder()
            .region(Region.US_EAST_1)
            .credentialsProvider(DefaultCredentialsProvider.create())
            .build();
    }
 
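    @Bean
    public SqsAsyncClient sqsAsyncClient() {
        return SqsAsyncClient.builder()
            .region(Region.US_EAST_1)
            .credentialsProvider(DefaultCredentialsProvider.create())
            .build();
    }

    // SqsTemplate (Spring Cloud AWS 3.x) is built on the async client
    @Bean
    public SqsTemplate sqsTemplate(SqsAsyncClient sqsAsyncClient) {
        return SqsTemplate.builder()
            .sqsAsyncClient(sqsAsyncClient)
            .build();
    }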
}
 
@Service
@RequiredArgsConstructor
@Slf4j
public class OrderEventPublisher {
 
    private final SqsClient sqsClient;
    private final ObjectMapper objectMapper;  // serializes events in publishOrderEvent
 
    private static final String FIFO_QUEUE_URL =
        "https://sqs.us-east-1.amazonaws.com/123456789/order-events.fifo";
 
    // Publish to FIFO queue with ordering guarantee per customer
    public void publishOrderEvent(String customerId, String orderId, Object event) {
        try {
            String messageBody = objectMapper.writeValueAsString(event);
 
            SendMessageRequest request = SendMessageRequest.builder()
                .queueUrl(FIFO_QUEUE_URL)
                .messageBody(messageBody)
                .messageGroupId(customerId)          // All events for a customer are ordered
                .messageDeduplicationId(orderId + ":" + event.getClass().getSimpleName())
                .build();
 
            sqsClient.sendMessage(request);
            log.debug("Published {} for order {} to SQS FIFO", event.getClass().getSimpleName(), orderId);
 
        } catch (JsonProcessingException e) {
            throw new EventPublishException("Failed to serialize event", e);
        }
    }
}
 
@Component
@RequiredArgsConstructor
@Slf4j
public class OrderEventConsumer {
 
    private final OrderService orderService;
 
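    // SQS listener; an unacknowledged message becomes visible again after the visibility timeout
    // Attribute names are those of Spring Cloud AWS 3.x (the version that provides SqsTemplate)
    @SqsListener(value = "order-events.fifo",
                 maxMessagesPerPoll = "10",
                 messageVisibilitySeconds = "30",  // 30 seconds to process before the message is redelivered
                 pollTimeoutSeconds = "20")        // Long polling -- reduces empty receive calls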
    public void handleOrderEvent(OrderEvent event, @Header("messageId") String messageId) {
        try {
            log.info("Processing order event: {} messageId: {}", event.getType(), messageId);
            orderService.processEvent(event);
        } catch (Exception e) {
            log.error("Failed to process order event {}: {}", messageId, e.getMessage(), e);
            // Rethrow instead of swallowing: the message is not acknowledged and becomes visible again
            // After maxReceiveCount is exceeded, the message goes to the Dead Letter Queue
            throw e;
        }
    }
}
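
Because a message can be redelivered whenever processing fails or outlives the visibility timeout, the handler itself should be idempotent. A minimal sketch of a deduplicating wrapper follows. IdempotentHandler is a hypothetical class, and the in-memory set is an assumption made for brevity; in production the processed IDs would live in a durable store such as a table with a unique key, ideally updated in the same transaction as the business change.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical wrapper that makes a message handler safe to call more than once per message.
final class IdempotentHandler<T> {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<T> delegate;

    IdempotentHandler(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    /** Returns true if the payload was processed, false if the message ID was seen before. */
    boolean handle(String messageId, T payload) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate delivery: skip the side effects
        }
        try {
            delegate.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedIds.remove(messageId); // forget the ID so a redelivery can retry
            throw e;
        }
    }
}
```

Choose the ID with care: it should identify the business event (for example the order ID plus event type used as the deduplication ID above), not the delivery attempt.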

Dead Letter Queue Setup

// DLQ allows inspecting failed messages
// Configure in CloudFormation / CDK:
// RedrivePolicy: maxReceiveCount: 3 -> after 3 failures, send to DLQ
 
@Component
@RequiredArgsConstructor
@Slf4j
public class DeadLetterQueueProcessor {
 
    private final SqsClient sqsClient;
    private final AlertService alertService;
 
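    // Poll the DLQ for analysis and alerting. Messages are not deleted here, so they reappear
    // after the visibility timeout and are alerted on again until they are redriven or removed.
    @Scheduled(fixedDelay = 60000)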
    public void monitorDeadLetterQueue() {
        ReceiveMessageRequest request = ReceiveMessageRequest.builder()
            .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789/order-events-dlq.fifo")
            .maxNumberOfMessages(10)
            .waitTimeSeconds(1)
            .build();
 
        List<Message> messages = sqsClient.receiveMessage(request).messages();
        if (!messages.isEmpty()) {
            log.error("CRITICAL: {} messages in Dead Letter Queue!", messages.size());
            messages.forEach(msg -> {
                alertService.sendDLQAlert(msg.messageId(), msg.body());
                log.error("DLQ Message: id={}, body={}", msg.messageId(), msg.body());
            });
        }
    }
}
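
The RedrivePolicy mentioned in the comment above is stored on the source queue as a small JSON document. As a sketch under assumptions (the class RedrivePolicySketch is hypothetical and the ARN in the usage note is a placeholder), this builds that attribute value:

```java
// Hypothetical helper that builds the value of the SQS "RedrivePolicy" queue attribute.
final class RedrivePolicySketch {

    private RedrivePolicySketch() {
    }

    static String redrivePolicyJson(String deadLetterQueueArn, int maxReceiveCount) {
        if (maxReceiveCount < 1) {
            throw new IllegalArgumentException("maxReceiveCount must be at least 1");
        }
        // Keys follow the SQS RedrivePolicy format: deadLetterTargetArn and maxReceiveCount
        return String.format(
            "{\"deadLetterTargetArn\":\"%s\",\"maxReceiveCount\":\"%d\"}",
            deadLetterQueueArn, maxReceiveCount);
    }
}
```

For example, redrivePolicyJson("arn:aws:sqs:us-east-1:123456789012:order-events-dlq.fifo", 3) produces the policy for the queue pair used in this section. With AWS SDK v2 the string would be set as the QueueAttributeName.REDRIVE_POLICY attribute of the source queue.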

9. Amazon S3 - Consistency Model

S3 Strong Read-After-Write Consistency (Since December 2020)

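Before December 2020, S3 was eventually consistent for overwrite PUTs, DELETEs, and LIST operations, so a read issued right after a write could return stale data. Since December 2020, S3 provides strong read-after-write consistency for GET, PUT, DELETE, and LIST operations:

After a successful PUT of a new object, the object is immediately visible in GET and LIST
After a successful DELETE, GET immediately returns 404 and the key no longer appears in LIST
After a successful overwrite PUT, GET immediately returns the new content

This means: if your application writes a file to S3 and immediately reads it back, you will always see the latest version, with no retry or delay workarounds. Two limits remain: concurrent writes to the same key are resolved as last-writer-wins with no locking, and cross-region replication is still asynchronous.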

@Service
@RequiredArgsConstructor
public class S3ConsistentService {
 
    private final S3Client s3Client;
    private static final String BUCKET = "myapp-documents-prod";
 
    // Write then immediately read -- guaranteed to see the written file
    public String uploadAndVerify(String key, byte[] content) {
        PutObjectRequest putRequest = PutObjectRequest.builder()
            .bucket(BUCKET)
            .key(key)
            .contentType("application/octet-stream")
            .build();
 
        s3Client.putObject(putRequest, RequestBody.fromBytes(content));
 
        // Strong consistency -- no need to wait or retry
        GetObjectRequest getRequest = GetObjectRequest.builder()
            .bucket(BUCKET)
            .key(key)
            .build();
 
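        byte[] retrieved = s3Client.getObjectAsBytes(getRequest).asByteArray();
        // With strong consistency, a mismatch here means another writer overwrote the same key
        if (!Arrays.equals(content, retrieved)) {
            throw new IllegalStateException("Content read back from S3 does not match upload for key " + key);
        }

        return key;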
    }
 
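    // S3 versioning + consistent listing
    // Returns one "key@versionId" entry per stored version; the paginator follows
    // continuation markers so results beyond the first page (1,000 entries) are included
    public List<String> listDocumentVersions(String prefix) {
        ListObjectVersionsRequest listRequest = ListObjectVersionsRequest.builder()
            .bucket(BUCKET)
            .prefix(prefix)
            .build();

        return s3Client.listObjectVersionsPaginator(listRequest)
            .stream()
            .flatMap(page -> page.versions().stream())
            .map(version -> version.key() + "@" + version.versionId())
            .collect(Collectors.toList());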
    }
}

10. Multi-Region Architecture Patterns

Active-Passive (Disaster Recovery)

Primary Region (us-east-1) -- ACTIVE
  - Aurora MySQL Writer
  - All writes go here
  - All critical reads go here
  - ElastiCache primary

Secondary Region (eu-west-1) -- PASSIVE (warm standby)
  - Aurora Global Database replica (read-only)
  - Receives replication from primary (typically under 1 second of lag)
  - Promoted to primary only during disaster

Route53 Health Check Failover:
  - Primary DNS: myapp.example.com --> us-east-1 load balancer
  - If health check fails: --> eu-west-1 load balancer (automatic DNS failover; the Aurora secondary must still be promoted before it accepts writes)
  - DNS TTL: 60 seconds (low for fast failover)
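
How quickly clients actually move depends on more than the TTL. As a rough sketch, the worst case for the DNS part is the time the health check needs to declare failure plus the time cached answers take to expire. The class FailoverTimeEstimate is hypothetical, and the 30-second interval with a failure threshold of 3 is an assumed health check configuration, not one stated in this article.

```java
import java.time.Duration;

// Hypothetical estimator for worst-case DNS failover time.
final class FailoverTimeEstimate {

    private FailoverTimeEstimate() {
    }

    // Detection time (interval x consecutive failures) plus DNS cache expiry (TTL)
    static Duration worstCase(Duration healthCheckInterval, int failureThreshold, Duration dnsTtl) {
        return healthCheckInterval.multipliedBy(failureThreshold).plus(dnsTtl);
    }

    public static void main(String[] args) {
        Duration estimate = worstCase(Duration.ofSeconds(30), 3, Duration.ofSeconds(60));
        System.out.println("Worst-case DNS failover: " + estimate.getSeconds() + "s");
    }
}
```

With these assumed values the estimate is 30 x 3 + 60 = 150 seconds. It excludes the time needed to promote the secondary database and any clients that ignore the TTL.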

Active-Active (Multi-Region Write)

The hardest consistency problem. Both regions accept writes.

Challenge: Concurrent writes to same data in different regions = conflicts.

Solutions:

Geo-partition: Route users to a single region based on their location. US users always write to us-east-1. EU users always write to eu-west-1. Never cross-region writes for the same user data.
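DynamoDB Global Tables: Built-in multi-region replication with last-writer-wins (LWW) conflict resolution. A concurrent write to the same item in another region can be silently overwritten.
Event sourcing + CRDT: Eventual consistency with automatic, deterministic merging of concurrent updates.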

@Service
public class GeoRoutingService {
 
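    // Route user writes to their "home" region to avoid conflicts
    public String getHomeRegion(String userId) {
        // Deterministic hash of userId: all writes for a userId always go to the same region.
        // This is plain modulo hashing, not consistent hashing, so changing the region list
        // reassigns most users. A location-based lookup can replace the hash for true geo-partitioning.
        String[] regions = {"us-east-1", "eu-west-1", "ap-southeast-1"};
        // floorMod keeps the index non-negative; Math.abs(Integer.MIN_VALUE) is still negative
        return regions[Math.floorMod(userId.hashCode(), regions.length)];
    }
 
    public boolean isLocalRegion(String userId) {
        // AWS_REGION may be unset outside AWS-managed runtimes; this comparison order avoids an NPE
        String awsCurrentRegion = System.getenv("AWS_REGION");
        return getHomeRegion(userId).equals(awsCurrentRegion);
    }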
}
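
The second and third solutions differ in what happens to concurrent updates. The sketch below contrasts them with two hypothetical types, LwwValue and GCounter. It illustrates the general techniques only and is not how DynamoDB Global Tables is implemented internally.

```java
import java.util.HashMap;
import java.util.Map;

// Last-writer-wins: the value with the newer timestamp survives, the other is discarded.
record LwwValue(String value, long timestampMillis, String region) {

    static LwwValue merge(LwwValue a, LwwValue b) {
        if (a.timestampMillis() != b.timestampMillis()) {
            return a.timestampMillis() > b.timestampMillis() ? a : b;
        }
        // Deterministic tie-break so every region picks the same winner
        return a.region().compareTo(b.region()) >= 0 ? a : b;
    }
}

// Grow-only counter CRDT: each region only increments its own slot,
// and merging takes the per-region maximum, so no increment is lost.
final class GCounter {

    private final Map<String, Long> countsByRegion = new HashMap<>();

    void increment(String region, long delta) {
        if (delta < 0) {
            throw new IllegalArgumentException("a grow-only counter cannot decrease");
        }
        countsByRegion.merge(region, delta, Long::sum);
    }

    void mergeFrom(GCounter other) {
        other.countsByRegion.forEach(
            (region, count) -> countsByRegion.merge(region, count, Long::max));
    }

    long value() {
        return countsByRegion.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

With LWW, two regions that update the same field concurrently end up with one of the two values and the other write is lost. With the counter, a region that added 3 and a region that added 2 both converge to 5 after exchanging state, regardless of merge order.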

11. AWS CDK Infrastructure as Code Examples

Aurora MySQL Cluster (Java CDK)

public class AuroraMySqlStack extends Stack {
 
    public AuroraMySqlStack(final Construct scope, final String id, final StackProps props) {
        super(scope, id, props);
 
        Vpc vpc = Vpc.Builder.create(this, "AppVpc")
            .maxAzs(3)
            .build();
 
        DatabaseCluster auroraCluster = DatabaseCluster.Builder.create(this, "AuroraCluster")
            .engine(DatabaseClusterEngine.auroraMysql(
                AuroraMysqlClusterEngineProps.builder()
                    .version(AuroraMysqlEngineVersion.VER_3_04_0)  // Aurora MySQL 3.04, compatible with MySQL 8.0.28
                    .build()))
            .credentials(Credentials.fromGeneratedSecret("admin",
                CredentialsFromGeneratedSecretOptions.builder()
                    .secretName("myapp/aurora/credentials")
                    .build()))
            .instanceProps(InstanceProps.builder()
                .instanceType(InstanceType.of(InstanceClass.R6G, InstanceSize.LARGE))
                .vpcSubnets(SubnetSelection.builder()
                    .subnetType(SubnetType.PRIVATE_WITH_EGRESS)
                    .build())
                .vpc(vpc)
                .build())
            .instances(3)               // 1 writer + 2 readers
            .storageEncrypted(true)
            .deletionProtection(true)
            .backup(BackupProps.builder()
                .retention(Duration.days(14))
                .preferredWindow("03:00-04:00")  // 3 AM UTC
                .build())
            .cloudwatchLogsExports(List.of("slowquery", "error", "general"))
            .cloudwatchLogsRetention(RetentionDays.THREE_MONTHS)
            .build();
 
        // Output cluster endpoints
        CfnOutput.Builder.create(this, "WriterEndpoint")
            .value(auroraCluster.getClusterEndpoint().getHostname())
            .build();
 
        CfnOutput.Builder.create(this, "ReaderEndpoint")
            .value(auroraCluster.getClusterReadEndpoint().getHostname())
            .build();
    }
}

12. Monitoring Consistency on AWS

CloudWatch Metrics to Track

@Component
@RequiredArgsConstructor
@Slf4j
public class ConsistencyMetricsCollector {
 
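    private final MeterRegistry meterRegistry;
    private final CloudWatchClient cloudWatchClient;

    // Micrometer references gauge state weakly and ignores repeated registrations of the
    // same gauge, so each gauge is registered once and backed by a holder that is updated.
    private final Map<String, AtomicReference<Double>> gaugeValues = new ConcurrentHashMap<>();

    @Scheduled(fixedRate = 60000)
    public void collectConsistencyMetrics() {

        // 1. Aurora Replica Lag (reported in milliseconds)
        double auroraReplicaLag = getCloudWatchMetric(
            "AWS/RDS", "AuroraReplicaLag",
            Map.of("DBClusterIdentifier", "myapp-cluster"));

        updateGauge("consistency.aurora.replica_lag_ms",
            Tags.of("cluster", "myapp"), auroraReplicaLag);

        if (auroraReplicaLag > 5000) {
            log.warn("Aurora replica lag is {}ms -- reads from replica may be stale", auroraReplicaLag);
        }

        // 2. DynamoDB Replication Latency (Global Tables, reported in milliseconds)
        // Published per TableName + ReceivingRegion; eu-west-1 is an example replica region
        double dynamoDbReplicationLatency = getCloudWatchMetric(
            "AWS/DynamoDB", "ReplicationLatency",
            Map.of("TableName", "Products", "ReceivingRegion", "eu-west-1"));

        updateGauge("consistency.dynamodb.replication_latency_ms",
            Tags.of("table", "Products"), dynamoDbReplicationLatency);

        // 3. SQS Queue Depth (messages waiting to be received; a backlog indicates processing delay)
        double sqsQueueDepth = getCloudWatchMetric(
            "AWS/SQS", "ApproximateNumberOfMessagesVisible",
            Map.of("QueueName", "order-events.fifo"));

        updateGauge("consistency.sqs.queue_depth",
            Tags.of("queue", "order-events"), sqsQueueDepth);

        // 4. Redis Replication Lag (reported in seconds, on replica nodes)
        double redisReplicaLag = getCloudWatchMetric(
            "AWS/ElastiCache", "ReplicationLag",
            Map.of("CacheClusterId", "myapp-redis"));

        updateGauge("consistency.redis.replication_lag",
            Tags.of("cluster", "myapp-redis"), redisReplicaLag);
    }

    // Registers the gauge on first use, then only updates the value it reads from
    private void updateGauge(String name, Tags tags, double value) {
        gaugeValues.computeIfAbsent(name, key -> {
            AtomicReference<Double> holder = new AtomicReference<>(value);
            meterRegistry.gauge(key, tags, holder, AtomicReference::get);
            return holder;
        }).set(value);
    }

    private double getCloudWatchMetric(String namespace, String metricName,
                                        Map<String, String> dimensions) {
        Instant now = Instant.now();
        GetMetricStatisticsRequest request = GetMetricStatisticsRequest.builder()
            .namespace(namespace)
            .metricName(metricName)
            .dimensions(dimensions.entrySet().stream()
                .map(d -> Dimension.builder().name(d.getKey()).value(d.getValue()).build())
                .collect(Collectors.toList()))
            .startTime(now.minus(Duration.ofMinutes(5)))
            .endTime(now)
            .period(60)
            .statistics(Statistic.AVERAGE)
            .build();

        // Datapoints are not returned in chronological order, so pick the newest by timestamp
        return cloudWatchClient.getMetricStatistics(request).datapoints().stream()
            .max(Comparator.comparing(Datapoint::timestamp))
            .map(Datapoint::average)
            .orElse(0.0);  // No datapoint in the window is reported as 0
    }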
}

CloudWatch Alarms for Consistency

# CloudFormation alarm definitions
 
AuroraReplicaLagAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: aurora-replica-lag-high
    AlarmDescription: "Aurora replica lag exceeds 5 seconds -- reads may be significantly stale"
    MetricName: AuroraReplicaLag
    Namespace: AWS/RDS
    Statistic: Average
    Period: 60
    EvaluationPeriods: 2
    Threshold: 5000 # 5 seconds in milliseconds
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
      - Name: DBClusterIdentifier
        Value: myapp-production-cluster
    AlarmActions:
      - !Ref OpsTeamSNSTopic
    TreatMissingData: notBreaching
 
SQSDeadLetterQueueAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: sqs-dlq-messages-detected
    AlarmDescription: "Messages found in Dead Letter Queue -- events failing processing"
    MetricName: ApproximateNumberOfMessagesVisible
    Namespace: AWS/SQS
    Statistic: Maximum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 1
    ComparisonOperator: GreaterThanOrEqualToThreshold
    Dimensions:
      - Name: QueueName
        Value: order-events-dlq.fifo
    AlarmActions:
      - !Ref CriticalAlertsSNSTopic

Next: Part 5: Pitfalls, Anti-Patterns, and Trade-Offs -- What goes wrong in production, how to prevent it, and how to make architectural decisions.

Part of the Consistency Models Demystified series
Stack: Java 17, Spring Boot 3.x, MySQL 8.0, AWS SDK v2
