Consistency Models - Part 4: AWS Production Configurations
Navigation: Index | Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6
Table of Contents
- Amazon RDS for MySQL - Production Setup
- Amazon Aurora MySQL - Architecture and Configuration
- Aurora Read Replicas - Consistency Management
- Aurora Global Database - Multi-Region Consistency
- Amazon DynamoDB - Consistency Models
- DynamoDB Transactions and Conditional Writes
- Amazon ElastiCache Redis - Cache Consistency
- Amazon SQS - Message Delivery Guarantees
- Amazon S3 - Consistency Model
- Multi-Region Architecture Patterns
- AWS CDK Infrastructure as Code Examples
- Monitoring Consistency on AWS
1. Amazon RDS for MySQL - Production Setup
Parameter Group for Consistency
AWS RDS for MySQL uses Parameter Groups to configure engine behavior. The following parameters are critical for consistency:
# RDS Parameter Group: production-mysql8-params
# These settings maximize durability and consistency
# Transaction logging -- CRITICAL for durability
innodb_flush_log_at_trx_commit = 1
# 0: Flush every second (risk: up to 1s data loss)
# 1: Flush on every commit (safest -- AWS default, cannot change)
# 2: Write to OS cache every commit, flush every second (some risk)
sync_binlog = 1
# 0: No sync (fastest but risky)
# 1: Sync on every commit (safest -- AWS default)
# N: Sync every N commits
# Isolation level
transaction_isolation = READ-COMMITTED
# Default in AWS RDS is REPEATABLE-READ
# READ-COMMITTED is recommended for most OLTP workloads
# (less locking, better concurrency, avoids long-running read views)
# Prevent table lock issues during DDL
innodb_online_alter_log_max_size = 134217728 # 128MB
# Row-based binary logging (required for read replicas)
binlog_format = ROW
# Full image logging for point-in-time recovery
binlog_row_image = FULL
# Enable GTID for more reliable replication
gtid_mode = ON
enforce_gtid_consistency = ON
# Connection management
wait_timeout = 28800 # 8 hours idle connection timeout
interactive_timeout = 28800
max_connections = 500 # Tune based on instance class
# Deadlock detection
innodb_deadlock_detect = ON # Default ON -- kill shorter transaction on deadlock
innodb_lock_wait_timeout = 5 # 5 seconds before lock wait timeout error
# Buffer pool (tune to 75% of instance RAM for db.r6g.large = 16GB RAM)
innodb_buffer_pool_size = 12884901888 # 12GB
innodb_buffer_pool_instances = 8 # 1 per GB for pools > 1GBRDS Multi-AZ Consistency Behavior
RDS Multi-AZ uses synchronous replication to the standby:
Primary (us-east-1a) ----[sync replication]----> Standby (us-east-1b)
| |
Write committed only when Standby confirms
standby confirms write write to storage
Consistency: Strong (synchronous)
Failover time: 60-120 seconds (DNS update based)
Data loss on failover: ZERO (synchronous)
Performance impact: ~1-2ms added per write (AZ round-trip)
Key Distinction: Multi-AZ standby is NOT a read replica. It does not serve reads. It only activates on primary failure. This is pure HA, not read scaling.
Spring Boot Configuration for RDS
spring:
datasource:
# Use the RDS instance endpoint (NOT read replica for writes)
url: jdbc:mysql://myapp-prod.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:3306/myapp
username: ${DB_USERNAME}
password: ${DB_PASSWORD}
hikari:
maximum-pool-size: 10 # RDS instance max_connections / number of app pods
minimum-idle: 2
connection-timeout: 30000
idle-timeout: 600000
max-lifetime: 1800000 # Less than MySQL wait_timeout
keepalive-time: 60000 # Prevent idle connection drops by AWS VPC
connection-test-query: SELECT 1
data-source-properties:
useSSL: true
requireSSL: true
verifyServerCertificate: false # true in strict environments
characterEncoding: utf8mb4
characterSetResults: utf8mb4
connectionCollation: utf8mb4_unicode_ci
useCompression: false
rewriteBatchedStatements: true
cachePrepStmts: true
prepStmtCacheSize: 250
prepStmtCacheSqlLimit: 2048
autoReconnect: false # Let HikariCP manage reconnects
failOverReadOnly: false # Important: do not make writes read-only on failoverRDS Proxy for Connection Pooling
When running many Lambda functions or many ECS tasks, use RDS Proxy to prevent connection exhaustion:
# RDS Proxy configuration notes:
# - Maintains warm connections to RDS
# - Multiplexes many app connections onto fewer DB connections
# - Handles failover transparently (faster than DNS-based failover)
# - Supports IAM authentication (recommended over password in production)
# In Spring Boot, point to Proxy endpoint instead of RDS endpoint
spring.datasource.url: jdbc:mysql://myapp-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com:3306/myapp
# For IAM auth with RDS Proxy, use AWS IAM token generator
# (Spring Boot AWS IAM authentication library handles this)2. Amazon Aurora MySQL - Architecture and Configuration
How Aurora Is Different from MySQL
Aurora MySQL is API-compatible with MySQL 8.0 but fundamentally different in architecture:
Standard MySQL Replication:
Primary ----[binlog transfer]----> Replica
(Full data pages copied)
Aurora Architecture:
Writer Instance ----[redo log only]----> Reader Instances
\
+--[6 copies across 3 AZs]----> Aurora Storage Cluster
(data exists in storage, not in instances)
Key benefits for consistency:
- Writer writes to storage (6 copies) synchronously -- strong durability
- Readers read from the same distributed storage -- much faster replica lag (typically 10-20ms vs 100ms+ for MySQL)
- No buffer pool divergence -- readers and writers see the same underlying storage
Cluster Endpoints
Aurora provides multiple endpoint types. Choosing the right one is critical for consistency:
Cluster Endpoint (Writer): myapp.cluster-xxxx.us-east-1.rds.amazonaws.com
- Always points to the current writer instance
- Use for ALL writes and reads requiring strong consistency
- After failover, DNS updates (typically 30 seconds)
Reader Endpoint: myapp.cluster-ro-xxxx.us-east-1.rds.amazonaws.com
- Load balances across all reader instances
- Eventually consistent reads (10-20ms lag typical)
- Use for read-heavy, latency-tolerant queries
Individual Instance Endpoints:
- Point to specific reader instances
- Useful for session-pinned reads (sticky sessions to same replica)
- myapp-instance-1.xxxx.us-east-1.rds.amazonaws.com
Custom Endpoints:
- Route specific workloads to specific replica subsets
- E.g., analytics queries to high-memory replicas
Aurora Configuration for Optimal Consistency
# Aurora Cluster Parameter Group
# Aurora-specific: faster replica lag
aurora_replica_read_consistency = EVENTUAL # default
# Options: EVENTUAL, SESSION (experimental), GLOBAL
# Commit latency optimization
aurora_enable_replica_log_compression = 1
# Binary log for CDC (if using Debezium)
binlog_format = ROW
binlog_row_image = FULL
log_bin = ON
# GTID-based replication
gtid_mode = ON
enforce_gtid_consistency = ON
# Transaction isolation (set at application level preferably)
transaction_isolation = READ-COMMITTED
# Write replication configuration
aurora_lab_mode = 0 # Never enable in production
rds.force_ssl = 1 # Require SSL
# Auto-restart wait (Aurora auto-restarts crashed instances)
innodb_lock_wait_timeout = 5
innodb_deadlock_detect = ONApplication YAML for Aurora
spring:
datasource:
# Aurora Writer -- for all writes
write:
url: jdbc:mysql://myapp.cluster-xxxx.us-east-1.rds.amazonaws.com:3306/myapp?useSSL=true&serverTimezone=UTC
username: ${AURORA_WRITE_USER}
password: ${AURORA_WRITE_PASSWORD}
hikari:
pool-name: AuroraWritePool
maximum-pool-size: 20
minimum-idle: 5
max-lifetime: 1740000 # 29 minutes (Aurora idle connection timeout is 30min)
keepalive-time: 60000
# Aurora Reader -- for eventually consistent reads
read:
url: jdbc:mysql://myapp.cluster-ro-xxxx.us-east-1.rds.amazonaws.com:3306/myapp?useSSL=true&serverTimezone=UTC
username: ${AURORA_READ_USER}
password: ${AURORA_READ_PASSWORD}
hikari:
pool-name: AuroraReadPool
maximum-pool-size: 40
minimum-idle: 10
read-only: true3. Aurora Read Replicas - Consistency Management
Understanding Replica Lag
Aurora reader instances are not perfectly in sync with the writer. There is a small replication lag:
- Typical: 10-20ms (much better than standard MySQL replicas)
- Under heavy write load: can be 100ms+
- Useful CloudWatch metric:
AuroraReplicaLag
Monitoring Replica Lag in Spring Boot
@Component
@RequiredArgsConstructor
@Slf4j
public class ReplicaLagMonitor {
private final NamedParameterJdbcTemplate readJdbcTemplate;
private final MeterRegistry meterRegistry;
// Check replica lag every 30 seconds
@Scheduled(fixedDelay = 30000)
public void checkReplicaLag() {
try {
// Aurora-specific: shows replica lag in milliseconds
Long lagMs = readJdbcTemplate.queryForObject(
"SELECT FLOOR(milliseconds_behind_master) FROM information_schema.replica_host_status LIMIT 1",
Map.of(),
Long.class
);
if (lagMs != null) {
meterRegistry.gauge("aurora.replica.lag.ms",
Tags.of("instance", "reader"), lagMs);
if (lagMs > 5000) { // 5 second lag is concerning
log.warn("Aurora replica lag is {}ms -- considering routing to writer", lagMs);
}
if (lagMs > 30000) { // 30 second lag is critical
log.error("CRITICAL: Aurora replica lag is {}ms", lagMs);
}
}
} catch (Exception e) {
log.error("Failed to check replica lag", e);
}
}
}Dynamic Read Routing Based on Lag
@Service
@RequiredArgsConstructor
public class AdaptiveReadRouter {
private final ReplicaLagMonitor replicaLagMonitor;
private static final long LAG_THRESHOLD_MS = 5000; // 5 seconds
public boolean shouldUseReplica() {
long currentLagMs = replicaLagMonitor.getCurrentLagMs();
return currentLagMs < LAG_THRESHOLD_MS;
}
public <T> T readWithFallback(Supplier<T> replicaRead, Supplier<T> primaryRead) {
if (shouldUseReplica()) {
return replicaRead.get();
}
log.info("Replica lag too high ({}ms), falling back to primary read",
replicaLagMonitor.getCurrentLagMs());
return primaryRead.get();
}
}4. Aurora Global Database - Multi-Region Consistency
Architecture
Aurora Global Database replicates an Aurora cluster to up to 5 secondary AWS regions. It uses storage-level replication (redo log shipping), not MySQL binlog.
Primary Region (us-east-1):
Aurora Writer + 2 Readers
All writes happen here
Secondary Region (eu-west-1):
Read-only Readers (replicated from us-east-1)
Typical replication lag: < 1 second (usually 100-300ms)
Can be promoted to primary in < 1 minute during disaster
Secondary Region (ap-southeast-1):
Same as above
Consistency Implications
- Reads from secondary regions are eventually consistent (100-300ms behind primary)
- For global users needing read-your-writes: route writes to primary, reads to nearest replica (with lag tolerance) or primary
- RPO: typically < 1 second for secondary regions
- RTO for failover: < 1 minute (with managed failover)
Spring Boot Multi-Region Configuration
@Configuration
public class MultiRegionDataSourceConfig {
@Value("${aws.region.current}")
private String currentRegion;
@Value("${aurora.global.primary.endpoint}")
private String primaryEndpoint;
@Value("${aurora.global.secondary.endpoint}")
private String secondaryEndpoint;
@Bean
public DataSource dataSource() {
// Always write to global primary (us-east-1)
// Read from local secondary (if running in eu-west-1)
// This provides low-latency reads with eventual consistency
if ("us-east-1".equals(currentRegion)) {
return buildDataSource(primaryEndpoint, false); // can write and read
} else {
// In secondary region: use local reader for reads, primary for writes
return buildRoutingDataSource(primaryEndpoint, secondaryEndpoint);
}
}
private DataSource buildRoutingDataSource(String writeEndpoint, String readEndpoint) {
RoutingDataSource routing = new RoutingDataSource();
routing.setTargetDataSources(Map.of(
"WRITE", buildDataSource(writeEndpoint, false),
"READ", buildDataSource(readEndpoint, true)
));
routing.setDefaultTargetDataSource(buildDataSource(writeEndpoint, false));
return routing;
}
}5. Amazon DynamoDB - Consistency Models
DynamoDB's Two Read Consistency Options
DynamoDB offers a per-request choice of consistency:
| Option | Consistent | Latency | Cost | Use For |
|---|---|---|---|---|
| Eventually Consistent | Eventually | Lower | 1 read unit | Reads tolerating <1s staleness |
| Strongly Consistent | Yes (quorum) | Higher | 2 read units | Critical reads requiring latest data |
Java SDK v2 - Consistency Configuration
@Configuration
public class DynamoDbConfig {
@Bean
public DynamoDbClient dynamoDbClient() {
return DynamoDbClient.builder()
.region(Region.US_EAST_1)
.credentialsProvider(DefaultCredentialsProvider.create())
.overrideConfiguration(ClientOverrideConfiguration.builder()
.apiCallTimeout(Duration.ofSeconds(5))
.apiCallAttemptTimeout(Duration.ofSeconds(2))
.retryPolicy(RetryPolicy.builder()
.numRetries(3)
.build())
.build())
.build();
}
@Bean
public DynamoDbEnhancedClient dynamoDbEnhancedClient(DynamoDbClient client) {
return DynamoDbEnhancedClient.builder()
.dynamoDbClient(client)
.build();
}
}
@Repository
@RequiredArgsConstructor
public class ProductDynamoRepository {
private final DynamoDbEnhancedClient enhancedClient;
// EVENTUALLY CONSISTENT read -- default, cheaper
public Optional<Product> findById(String productId) {
DynamoDbTable<Product> table = enhancedClient.table("Products", TableSchema.fromBean(Product.class));
GetItemEnhancedRequest request = GetItemEnhancedRequest.builder()
.key(Key.builder()
.partitionValue(productId)
.build())
.consistentRead(false) // Eventually consistent
.build();
return Optional.ofNullable(table.getItem(request));
}
// STRONGLY CONSISTENT read -- for critical operations
public Optional<Product> findByIdStrong(String productId) {
DynamoDbTable<Product> table = enhancedClient.table("Products", TableSchema.fromBean(Product.class));
GetItemEnhancedRequest request = GetItemEnhancedRequest.builder()
.key(Key.builder()
.partitionValue(productId)
.build())
.consistentRead(true) // Strongly consistent -- 2x cost
.build();
return Optional.ofNullable(table.getItem(request));
}
// For write operations, DynamoDB is always strongly consistent within a partition
public void save(Product product) {
DynamoDbTable<Product> table = enhancedClient.table("Products", TableSchema.fromBean(Product.class));
table.putItem(product);
}
}DynamoDB Entity with Optimistic Locking
@DynamoDbBean
@Getter
@Setter
public class Product {
private String productId;
private String name;
private BigDecimal price;
private Integer stockQuantity;
private Long version; // Optimistic locking token
private String status;
private Instant updatedAt;
@DynamoDbPartitionKey
public String getProductId() { return productId; }
@DynamoDbVersionAttribute
public Long getVersion() { return version; } // Auto-incremented by SDK, OCC enabled
}
@Repository
@RequiredArgsConstructor
public class ProductDynamoRepository {
private final DynamoDbEnhancedClient enhancedClient;
// This will automatically use version attribute for optimistic locking
// If version doesn't match (concurrent update), throws TransactionCanceledException
public void updateWithOptimisticLock(Product product) {
DynamoDbTable<Product> table = enhancedClient.table("Products",
TableSchema.fromBean(Product.class));
try {
table.updateItem(product); // version check is automatic due to @DynamoDbVersionAttribute
} catch (DynamoDbException e) {
if (e.getMessage().contains("ConditionalCheckFailed")) {
throw new OptimisticLockException("Product was modified by another transaction");
}
throw e;
}
}
// Manual conditional write (without @DynamoDbVersionAttribute)
public void atomicDecrementStock(String productId, int quantity, long expectedVersion) {
DynamoDbClient rawClient = enhancedClient.getDynamoDbClient();
UpdateItemRequest request = UpdateItemRequest.builder()
.tableName("Products")
.key(Map.of("productId", AttributeValue.fromS(productId)))
.updateExpression("SET stockQuantity = stockQuantity - :qty, " +
"version = :newVersion, " +
"updatedAt = :now")
.conditionExpression("version = :expectedVersion AND stockQuantity >= :qty")
.expressionAttributeValues(Map.of(
":qty", AttributeValue.fromN(String.valueOf(quantity)),
":expectedVersion", AttributeValue.fromN(String.valueOf(expectedVersion)),
":newVersion", AttributeValue.fromN(String.valueOf(expectedVersion + 1)),
":now", AttributeValue.fromS(Instant.now().toString())
))
.build();
try {
rawClient.updateItem(request);
} catch (ConditionalCheckFailedException e) {
throw new StockConflictException(
"Stock update failed: concurrent modification or insufficient stock");
}
}
}6. DynamoDB Transactions and Conditional Writes
DynamoDB ACID Transactions
DynamoDB supports transactions across multiple items (up to 100 items, 4MB total):
@Service
@RequiredArgsConstructor
public class OrderDynamoService {
private final DynamoDbClient dynamoDbClient;
// Atomic multi-item transaction
public void createOrderWithInventoryReservation(Order order,
String productId,
int quantity) {
// This entire operation is atomic -- all succeed or all fail
TransactWriteItemsRequest request = TransactWriteItemsRequest.builder()
.transactItems(
// 1. Create the order
TransactWriteItem.builder()
.put(Put.builder()
.tableName("Orders")
.item(convertToAttributeMap(order))
.conditionExpression("attribute_not_exists(orderId)") // Prevent duplicates
.build())
.build(),
// 2. Decrement inventory (with condition check)
TransactWriteItem.builder()
.update(Update.builder()
.tableName("Products")
.key(Map.of("productId", AttributeValue.fromS(productId)))
.updateExpression("SET stockQuantity = stockQuantity - :qty, " +
"version = version + :inc")
.conditionExpression("stockQuantity >= :qty AND attribute_exists(productId)")
.expressionAttributeValues(Map.of(
":qty", AttributeValue.fromN(String.valueOf(quantity)),
":inc", AttributeValue.fromN("1")
))
.build())
.build()
)
.build();
try {
dynamoDbClient.transactWriteItems(request);
} catch (TransactionCanceledException e) {
// One or more conditions failed
analyzeTransactionCancellation(e, productId, quantity);
}
}
private void analyzeTransactionCancellation(TransactionCanceledException e,
String productId, int quantity) {
e.cancellationReasons().forEach(reason -> {
if ("ConditionalCheckFailed".equals(reason.code())) {
throw new InsufficientStockException(productId, quantity);
}
});
throw new OrderCreationException("Order creation failed", e);
}
}DynamoDB Global Tables - Multi-Region Consistency
@Configuration
public class GlobalDynamoDbConfig {
// For global tables, writes can go to any region
// DynamoDB replicates using LWW (Last-Write-Wins) with server-side timestamps
// Typical cross-region replication lag: <1 second
@Bean
public DynamoDbClient dynamoDbClientPrimary() {
return DynamoDbClient.builder()
.region(Region.US_EAST_1)
.credentialsProvider(DefaultCredentialsProvider.create())
.build();
}
@Bean
public DynamoDbClient dynamoDbClientSecondary() {
return DynamoDbClient.builder()
.region(Region.EU_WEST_1) // European region
.credentialsProvider(DefaultCredentialsProvider.create())
.build();
}
}
// When using Global Tables, to avoid conflicts:
// 1. Use strongly consistent reads for critical operations
// 2. Use condition expressions to prevent lost updates
// 3. Route all writes for a user to a single region using Route53 geolocation
// 4. For critical sections, use DynamoDB transactions (within a region)7. Amazon ElastiCache Redis - Cache Consistency
ElastiCache Redis Cluster Configuration
# CloudFormation / CDK resource properties for ElastiCache Redis
ElastiCacheReplicationGroup:
Type: AWS::ElastiCache::ReplicationGroup
Properties:
ReplicationGroupDescription: "Production Redis Cluster"
NumNodeGroups: 3 # 3 shards
ReplicasPerNodeGroup: 2 # 2 replicas per shard (1 primary + 2 replicas)
CacheNodeType: cache.r6g.large # Memory-optimized
Engine: redis
EngineVersion: "7.0"
AtRestEncryptionEnabled: true
TransitEncryptionEnabled: true
AutomaticFailoverEnabled: true # Auto promote replica on primary failure
MultiAZEnabled: true # Replicas in different AZs
CacheParameterGroupFamily: redis7Redis Consistency Parameters
# ElastiCache Redis Parameter Group
# Key consistency-related parameters
# Persistence (affects durability, not just consistency)
appendonly yes # AOF persistence: survives restarts
appendfsync everysec # Sync AOF to disk every second (balance between perf and durability)
# Replication
min-slaves-to-write 1 # Require at least 1 replica to acknowledge write
min-slaves-max-lag 10 # Replica must be within 10 seconds of primary
# If min-slaves conditions not met, primary REFUSES writes
# This prevents split-brain writes
# Memory management
maxmemory-policy allkeys-lru # Evict LRU keys when memory fullSpring Boot Redis Configuration for Consistency
@Configuration
public class RedisConfig {
@Value("${spring.redis.cluster.nodes}")
private List<String> clusterNodes;
@Bean
public RedisConnectionFactory redisConnectionFactory() {
RedisClusterConfiguration clusterConfig =
new RedisClusterConfiguration(clusterNodes);
clusterConfig.setMaxRedirects(3);
LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
.commandTimeout(Duration.ofSeconds(2))
.readFrom(ReadFrom.REPLICA_PREFERRED) // Read from replica when possible
// Options: MASTER (strong), MASTER_PREFERRED, REPLICA_PREFERRED (eventual), REPLICA
.build();
return new LettuceConnectionFactory(clusterConfig, clientConfig);
}
@Bean
public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
RedisTemplate<String, Object> template = new RedisTemplate<>();
template.setConnectionFactory(factory);
template.setKeySerializer(new StringRedisSerializer());
template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
template.setHashKeySerializer(new StringRedisSerializer());
template.setHashValueSerializer(new GenericJackson2JsonRedisSerializer());
template.afterPropertiesSet();
return template;
}
// For operations requiring strong consistency (read from primary only)
@Bean("primaryOnlyTemplate")
public RedisTemplate<String, Object> primaryOnlyRedisTemplate(
LettuceConnectionFactory factory) {
// Override to always read from primary
LettuceClientConfiguration primaryConfig = LettuceClientConfiguration.builder()
.readFrom(ReadFrom.UPSTREAM) // Always read from primary
.build();
RedisClusterConfiguration clusterConfig =
new RedisClusterConfiguration(clusterNodes);
LettuceConnectionFactory primaryFactory =
new LettuceConnectionFactory(clusterConfig, primaryConfig);
primaryFactory.afterPropertiesSet();
RedisTemplate<String, Object> template = new RedisTemplate<>();
template.setConnectionFactory(primaryFactory);
template.setKeySerializer(new StringRedisSerializer());
template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
template.afterPropertiesSet();
return template;
}
}Cache-Aside Pattern with Proper Consistency
@Service
@RequiredArgsConstructor
@Slf4j
public class ProductCacheService {
private final RedisTemplate<String, Product> redisTemplate;
private final ProductRepository productRepository;
private static final String CACHE_PREFIX = "product:v2:"; // versioned key
private static final Duration TTL = Duration.ofMinutes(30);
private static final String NULL_SENTINEL = "NULL_PLACEHOLDER";
// Cache-aside with negative caching (prevent DB hammering for missing items)
public Optional<Product> getProduct(Long productId) {
String key = CACHE_PREFIX + productId;
// 1. Check cache
Object cached = redisTemplate.opsForValue().get(key);
if (NULL_SENTINEL.equals(cached)) {
// Cached miss -- avoid DB lookup
return Optional.empty();
}
if (cached instanceof Product product) {
return Optional.of(product);
}
// 2. Cache miss -- load from DB
Optional<Product> dbProduct = productRepository.findById(productId);
if (dbProduct.isPresent()) {
redisTemplate.opsForValue().set(key, dbProduct.get(), TTL);
} else {
// Cache the miss too (negative caching) -- prevent stampede on missing keys
redisTemplate.opsForValue().set(key, NULL_SENTINEL, Duration.ofMinutes(5));
}
return dbProduct;
}
// Invalidation with version-based key rotation (no stale data possible)
@Transactional
public Product updateProduct(Long productId, UpdateProductRequest request) {
Product updated = productRepository.findById(productId)
.map(p -> { p.update(request); return productRepository.save(p); })
.orElseThrow(() -> new ProductNotFoundException(productId));
// Invalidate cache (delete, don't update -- avoids write-through complexity)
redisTemplate.delete(CACHE_PREFIX + productId);
// If concerned about time-of-check/time-of-use between delete and next read:
// Use a short TTL (e.g., 5 minutes) so stale data naturally expires
return updated;
}
// Batch invalidation (after bulk update)
public void invalidateBatch(List<Long> productIds) {
List<String> keys = productIds.stream()
.map(id -> CACHE_PREFIX + id)
.collect(Collectors.toList());
redisTemplate.delete(keys);
}
}8. Amazon SQS - Message Delivery Guarantees
SQS Standard vs FIFO
| Property | SQS Standard | SQS FIFO |
|---|---|---|
| Ordering | Best-effort (not guaranteed) | Strictly ordered within message group |
| Delivery | At-least-once (duplicates possible) | Exactly-once processing |
| Throughput | Nearly unlimited | 300 messages/sec per queue (3000 with batching) |
| Deduplication | Application must handle | Built-in (5-minute deduplication window) |
| Use Case | Decoupled processing, analytics | Workflows requiring strict ordering |
Spring Boot FIFO Queue Configuration
@Configuration
public class SqsConfig {
@Bean
public SqsClient sqsClient() {
return SqsClient.builder()
.region(Region.US_EAST_1)
.credentialsProvider(DefaultCredentialsProvider.create())
.build();
}
@Bean
public SqsTemplate sqsTemplate(SqsClient sqsClient) {
return SqsTemplate.builder()
.sqsAsyncClient(SqsAsyncClient.builder()
.region(Region.US_EAST_1)
.build())
.build();
}
}
@Service
@RequiredArgsConstructor
@Slf4j
public class OrderEventPublisher {
private final SqsClient sqsClient;
private static final String FIFO_QUEUE_URL =
"https://sqs.us-east-1.amazonaws.com/123456789/order-events.fifo";
// Publish to FIFO queue with ordering guarantee per customer
public void publishOrderEvent(String customerId, String orderId, Object event) {
try {
String messageBody = objectMapper.writeValueAsString(event);
SendMessageRequest request = SendMessageRequest.builder()
.queueUrl(FIFO_QUEUE_URL)
.messageBody(messageBody)
.messageGroupId(customerId) // All events for a customer are ordered
.messageDeduplicationId(orderId + ":" + event.getClass().getSimpleName())
.build();
sqsClient.sendMessage(request);
log.debug("Published {} for order {} to SQS FIFO", event.getClass().getSimpleName(), orderId);
} catch (JsonProcessingException e) {
throw new EventPublishException("Failed to serialize event", e);
}
}
}
@Component
@RequiredArgsConstructor
@Slf4j
public class OrderEventConsumer {
private final OrderService orderService;
// SQS listener with visibility timeout for at-least-once processing
@SqsListener(value = "order-events.fifo",
maxNumberOfMessages = "10",
visibilityTimeout = "30", // 30 seconds to process before re-queuing
waitTimeSeconds = "20") // Long polling -- reduces empty receive calls
public void handleOrderEvent(OrderEvent event, @Header("messageId") String messageId) {
try {
log.info("Processing order event: {} messageId: {}", event.getType(), messageId);
orderService.processEvent(event);
} catch (Exception e) {
log.error("Failed to process order event {}: {}", messageId, e.getMessage(), e);
// Don't catch the exception -- SQS will make message visible again
// After maxReceiveCount exceeded, message goes to Dead Letter Queue
throw e;
}
}
}Dead Letter Queue Setup
// DLQ allows inspecting failed messages
// Configure in CloudFormation / CDK:
// RedrivePolicy: maxReceiveCount: 3 -> after 3 failures, send to DLQ
@Component
@RequiredArgsConstructor
@Slf4j
public class DeadLetterQueueProcessor {
private final SqsClient sqsClient;
private final AlertService alertService;
// Process DLQ messages for analysis and alerting
@Scheduled(fixedDelay = 60000)
public void monitorDeadLetterQueue() {
ReceiveMessageRequest request = ReceiveMessageRequest.builder()
.queueUrl("https://sqs.us-east-1.amazonaws.com/123456789/order-events-dlq.fifo")
.maxNumberOfMessages(10)
.waitTimeSeconds(1)
.build();
List<Message> messages = sqsClient.receiveMessage(request).messages();
if (!messages.isEmpty()) {
log.error("CRITICAL: {} messages in Dead Letter Queue!", messages.size());
messages.forEach(msg -> {
alertService.sendDLQAlert(msg.messageId(), msg.body());
log.error("DLQ Message: id={}, body={}", msg.messageId(), msg.body());
});
}
}
}9. Amazon S3 - Consistency Model
S3 Strong Read-After-Write Consistency (Since December 2020)
Before December 2020, S3 had eventually consistent reads for new objects in some scenarios. Since December 2020, S3 provides strong read-after-write consistency for all operations:
- After a successful
PUTof a new object, immediately visible inGETandLIST - After a successful
DELETE, immediately reflects inGET(returns 404) - After a successful
PUToverwrite, immediately visible inGET
This means: If your application writes a file to S3 and immediately reads it back, you will always see the latest version. No need for workarounds.
@Service
@RequiredArgsConstructor
public class S3ConsistentService {
private final S3Client s3Client;
private static final String BUCKET = "myapp-documents-prod";
// Write then immediately read -- guaranteed to see the written file
public String uploadAndVerify(String key, byte[] content) {
PutObjectRequest putRequest = PutObjectRequest.builder()
.bucket(BUCKET)
.key(key)
.contentType("application/octet-stream")
.build();
s3Client.putObject(putRequest, RequestBody.fromBytes(content));
// Strong consistency -- no need to wait or retry
GetObjectRequest getRequest = GetObjectRequest.builder()
.bucket(BUCKET)
.key(key)
.build();
byte[] retrieved = s3Client.getObjectAsBytes(getRequest).asByteArray();
// retrieved will always match content -- S3 guarantees this post-2020
return key;
}
// S3 versioning + consistent listing
public List<String> listDocumentVersions(String prefix) {
ListObjectVersionsRequest listRequest = ListObjectVersionsRequest.builder()
.bucket(BUCKET)
.prefix(prefix)
.build();
return s3Client.listObjectVersions(listRequest)
.versions()
.stream()
.map(ObjectVersion::key)
.collect(Collectors.toList());
}
}10. Multi-Region Architecture Patterns
Active-Passive (Disaster Recovery)
Primary Region (us-east-1) -- ACTIVE
- Aurora MySQL Writer
- All writes go here
- All critical reads go here
- ElastiCache primary
Secondary Region (eu-west-1) -- PASSIVE (warm standby)
- Aurora Global Database replica (read-only)
- Receives replication from primary (~100ms lag)
- Promoted to primary only during disaster
Route53 Health Check Failover:
- Primary DNS: myapp.example.com --> us-east-1 load balancer
- If health check fails: --> eu-west-1 load balancer (automatic failover)
- DNS TTL: 60 seconds (low for fast failover)
Active-Active (Multi-Region Write)
The hardest consistency problem. Both regions accept writes.
Challenge: Concurrent writes to same data in different regions = conflicts.
Solutions:
- Geo-partition: Route users to a single region based on their location. US users always write to us-east-1. EU users always write to eu-west-1. Never cross-region writes for the same user data.
- DynamoDB Global Tables: Built-in multi-region replication with LWW conflict resolution.
- Event sourcing + CRDT: Eventual consistency with auto-merge.
@Service
public class GeoRoutingService {
// Route user writes to their "home" region to avoid conflicts
public String getHomeRegion(String userId) {
// Consistent hashing based on userId
// All writes for userId always go to the same region
int hash = Math.abs(userId.hashCode());
String[] regions = {"us-east-1", "eu-west-1", "ap-southeast-1"};
return regions[hash % regions.length];
}
public boolean isLocalRegion(String userId) {
String awsCurrentRegion = System.getenv("AWS_REGION");
return awsCurrentRegion.equals(getHomeRegion(userId));
}
}11. AWS CDK Infrastructure as Code Examples
Aurora MySQL Cluster (Java CDK)
public class AuroraMySqlStack extends Stack {
public AuroraMySqlStack(final Construct scope, final String id, final StackProps props) {
super(scope, id, props);
Vpc vpc = Vpc.Builder.create(this, "AppVpc")
.maxAzs(3)
.build();
DatabaseCluster auroraCluster = DatabaseCluster.Builder.create(this, "AuroraCluster")
.engine(DatabaseClusterEngine.auroraMysql(
AuroraMysqlClusterEngineProps.builder()
.version(AuroraMysqlEngineVersion.VER_8_0_28)
.build()))
.credentials(Credentials.fromGeneratedSecret("admin",
CredentialsFromGeneratedSecretOptions.builder()
.secretName("myapp/aurora/credentials")
.build()))
.instanceProps(InstanceProps.builder()
.instanceType(InstanceType.of(InstanceClass.R6G, InstanceSize.LARGE))
.vpcSubnets(SubnetSelection.builder()
.subnetType(SubnetType.PRIVATE_WITH_EGRESS)
.build())
.vpc(vpc)
.build())
.instances(3) // 1 writer + 2 readers
.storageEncrypted(true)
.deletionProtection(true)
.backup(BackupProps.builder()
.retention(Duration.days(14))
.preferredWindow("03:00-04:00") // 3 AM UTC
.build())
.cloudwatchLogsExports(List.of("slowquery", "error", "general"))
.cloudwatchLogsRetention(RetentionDays.THREE_MONTHS)
.build();
// Output cluster endpoints
CfnOutput.Builder.create(this, "WriterEndpoint")
.value(auroraCluster.getClusterEndpoint().getHostname())
.build();
CfnOutput.Builder.create(this, "ReaderEndpoint")
.value(auroraCluster.getClusterReadEndpoint().getHostname())
.build();
}
}12. Monitoring Consistency on AWS
CloudWatch Metrics to Track
@Component
@RequiredArgsConstructor
@Slf4j
public class ConsistencyMetricsCollector {
private final MeterRegistry meterRegistry;
private final CloudWatchClient cloudWatchClient;
@Scheduled(fixedRate = 60000)
public void collectConsistencyMetrics() {
// 1. Aurora Replica Lag
double auroraReplicaLag = getCloudWatchMetric(
"AWS/RDS", "AuroraReplicaLag",
"DBClusterIdentifier", "myapp-cluster");
meterRegistry.gauge("consistency.aurora.replica_lag_ms",
Tags.of("cluster", "myapp"), auroraReplicaLag);
if (auroraReplicaLag > 5000) {
log.warn("Aurora replica lag is {}ms -- reads from replica may be stale", auroraReplicaLag);
}
// 2. DynamoDB Replication Lag (Global Tables)
double dynamoDbReplicaAge = getCloudWatchMetric(
"AWS/DynamoDB", "ReplicationLatency",
"TableName", "Products");
meterRegistry.gauge("consistency.dynamodb.replication_latency_ms",
Tags.of("table", "Products"), dynamoDbReplicaAge);
// 3. SQS Queue Depth (backlog indicates processing delay)
double sqsQueueDepth = getCloudWatchMetric(
"AWS/SQS", "ApproximateNumberOfMessagesNotVisible",
"QueueName", "order-events.fifo");
meterRegistry.gauge("consistency.sqs.queue_depth",
Tags.of("queue", "order-events"), sqsQueueDepth);
// 4. Redis Replication Lag
double redisReplicaLag = getCloudWatchMetric(
"AWS/ElastiCache", "ReplicationLag",
"CacheClusterId", "myapp-redis");
meterRegistry.gauge("consistency.redis.replication_lag",
Tags.of("cluster", "myapp-redis"), redisReplicaLag);
}
private double getCloudWatchMetric(String namespace, String metricName,
String dimensionName, String dimensionValue) {
GetMetricStatisticsRequest request = GetMetricStatisticsRequest.builder()
.namespace(namespace)
.metricName(metricName)
.dimensions(Dimension.builder()
.name(dimensionName)
.value(dimensionValue)
.build())
.startTime(Instant.now().minus(Duration.ofMinutes(5)))
.endTime(Instant.now())
.period(60)
.statistics(Statistic.AVERAGE)
.build();
List<Datapoint> datapoints = cloudWatchClient.getMetricStatistics(request).datapoints();
return datapoints.isEmpty() ? 0.0 :
datapoints.get(datapoints.size() - 1).average();
}
}CloudWatch Alarms for Consistency
# CloudFormation alarm definitions
AuroraReplicaLagAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: aurora-replica-lag-high
AlarmDescription: "Aurora replica lag exceeds 5 seconds -- reads may be significantly stale"
MetricName: AuroraReplicaLag
Namespace: AWS/RDS
Statistic: Average
Period: 60
EvaluationPeriods: 2
Threshold: 5000 # 5 seconds in milliseconds
ComparisonOperator: GreaterThanThreshold
Dimensions:
- Name: DBClusterIdentifier
Value: myapp-production-cluster
AlarmActions:
- !Ref OpsTeamSNSTopic
TreatMissingData: notBreaching
SQSDeadLetterQueueAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: sqs-dlq-messages-detected
AlarmDescription: "Messages found in Dead Letter Queue -- events failing processing"
MetricName: ApproximateNumberOfMessagesNotVisible
Namespace: AWS/SQS
Statistic: Sum
Period: 60
EvaluationPeriods: 1
Threshold: 1
ComparisonOperator: GreaterThanOrEqualToThreshold
Dimensions:
- Name: QueueName
Value: order-events-dlq.fifo
AlarmActions:
- !Ref CriticalAlertsSNSTopicNext: Part 5: Pitfalls, Anti-Patterns, and Trade-Offs -- What goes wrong in production, how to prevent it, and how to make architectural decisions.
Part of the Consistency Models Demystified series
Stack: Java 17, Spring Boot 3.x, MySQL 8.0, AWS SDK v2