SAGA Patterns in Microservices and Distributed Systems - Complete Guide
Series Overview: A comprehensive, production-focused deep dive into SAGA patterns for building
reliable distributed systems. Written for Java developers using Spring Boot 3.x, AWS, and MySQL.
Series Navigation
| Part | Title | Topics Covered |
|---|---|---|
| Part 1 | Fundamentals and Theory | Distributed transaction problems, ACID vs BASE, CAP, 2PC failures, SAGA origins |
| Part 2 | Choreography Pattern | Event-driven SAGA, Kafka, AWS MSK/SNS/SQS, full Spring Boot implementation |
| Part 3 | Orchestration Pattern | Central orchestrator, AWS Step Functions, Axon, custom state machine |
| Part 4 | Deep Dive Implementation | Outbox pattern, idempotency, retry logic, distributed tracing, MySQL schemas |
| Part 5 | Advanced Patterns | CQRS + SAGA, Event Sourcing, parallel steps, sub-sagas, large scale AWS |
| Part 6 | Pitfalls and Best Practices | Anti-patterns, isolation anomalies, production incidents and solutions |
| Part 7 | Interview Mastery | 60+ questions from entry-level to Principal Architect with full answers |
What You Will Master
After completing this series you will be able to:
- Explain WHY SAGA patterns exist and the exact problems they solve
- Implement both Choreography and Orchestration SAGAs from scratch in Spring Boot
- Design compensating transactions that are idempotent and safe
- Integrate SAGAs with AWS services: Step Functions, SQS FIFO, SNS, MSK, DynamoDB
- Design MySQL schemas for SAGA state persistence
- Handle failures, retries, dead-letter queues, and out-of-order events
- Apply the Transactional Outbox pattern to eliminate dual-write problems
- Combine SAGAs with CQRS and Event Sourcing
- Identify and fix every common SAGA anti-pattern
- Ace SAGA-related interview questions at any seniority level
Prerequisites
| Knowledge Area | Why It Is Needed |
|---|---|
| Java 17+ | All code uses modern Java features (records, sealed classes) |
| Spring Boot 3.x | Primary framework for every implementation |
| Microservices architecture | Understanding service boundaries and data ownership |
| Apache Kafka basics | Event-driven communication in choreography |
| MySQL and JPA | Persistence layer for domain state and saga state |
| AWS fundamentals | Cloud-native integration patterns |
| REST / HTTP basics | Synchronous service-to-service communication |
The Central Example: E-Commerce Order Processing
This entire series uses ONE consistent example to illustrate all concepts.
Customer Places Order
        |
        v
+---------------+     +-----------------+     +--------------------+     +------------------+
| Order Service |---->| Payment Service |---->| Inventory Service  |---->| Shipping Service |
+---------------+     +-----------------+     +--------------------+     +------------------+
| Create Order  |     | Charge Card     |     | Reserve Items      |     | Schedule Pickup  |
|               |     |                 |     |                    |     |                  |
| COMPENSATION: |     | COMPENSATION:   |     | COMPENSATION:      |     | COMPENSATION:    |
| Cancel Order  |     | Refund Card     |     | Release Items      |     | Cancel Pickup    |
+---------------+     +-----------------+     +--------------------+     +------------------+
Why this example?
- It is realistic and widely understood
- It has multiple services with different failure modes
- Payment reversal is expensive, which shows why compensation design matters
- Inventory has race conditions (concurrent reservations)
- Shipping involves external third-party calls (hardest to compensate)
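The four-step order flow and its compensations can be sketched in plain Java. This is a minimal in-memory illustration, not the implementation used later in the series: the class `OrderSagaSketch`, the `Step` record, and the `failAt` parameter are hypothetical names introduced here, and real compensations would be remote calls that can themselves fail and need retries.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: run steps in order; on failure, compensate completed steps in reverse. */
final class OrderSagaSketch {

    /** One saga step: a forward action plus the compensation that undoes it. */
    record Step(String name, Runnable action, Runnable compensation) {}

    /** Returns true if every step succeeded, false if the saga was rolled back. */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step); // most recently completed step sits on top
            } catch (RuntimeException failure) {
                // Undo already-completed steps, newest first.
                // Assumption: compensations always succeed in this sketch.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }

    /** Builds the order flow; failAt names the step that should fail, or null for none. */
    static List<Step> orderFlow(List<String> log, String failAt) {
        return List.of(
            step("Create Order", "Cancel Order", log, failAt),
            step("Charge Card", "Refund Card", log, failAt),
            step("Reserve Items", "Release Items", log, failAt),
            step("Schedule Pickup", "Cancel Pickup", log, failAt));
    }

    private static Step step(String action, String compensation, List<String> log, String failAt) {
        return new Step(
            action,
            () -> {
                if (action.equals(failAt)) {
                    throw new IllegalStateException(action + " failed");
                }
                log.add(action); // record the forward action
            },
            () -> log.add(compensation)); // record the compensating action
    }
}
```

Calling `OrderSagaSketch.run(OrderSagaSketch.orderFlow(log, "Reserve Items"))` with an empty mutable list records the two completed actions followed by their compensations in reverse order (refund first, then cancel). The failed step itself is not compensated because it never completed.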
Architecture at a Glance
SAGA pattern types:
| Aspect | Choreography | Orchestration |
|---|---|---|
| Summary | Services talk via events | Central coordinator drives flow |
| Infrastructure | Event bus (Kafka/SQS), SNS topics | Orchestrator service or AWS Step Functions |
| Who does what | Each service listens for events, does local work, emits result events, and handles compensations | The orchestrator sends commands to services, tracks state centrally, handles failures, and coordinates compensations |
| Pros | Loose coupling; independent scaling; no single point of failure | Full visibility of saga state; easy debugging and monitoring; explicit business flow |
| Cons | Hard to trace flows; risk of cyclic events; testing is complex | Orchestrator is a potential bottleneck; services are coupled to the orchestrator API; added infrastructure |
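To make the difference in control flow concrete, here is a rough sketch of the same happy path in both styles. Everything in it is an assumption for illustration: `SagaStylesSketch`, the in-memory `EventBus` (standing in for Kafka/SNS/SQS), and the string-based event and command names are hypothetical, and no real messaging is involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class SagaStylesSketch {

    /** Hypothetical in-memory event bus standing in for a real broker. */
    static final class EventBus {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        void subscribe(String eventType, Consumer<String> handler) {
            subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
        }

        void publish(String eventType, String orderId) {
            for (Consumer<String> handler : subscribers.getOrDefault(eventType, List.of())) {
                handler.accept(orderId);
            }
        }
    }

    /** Choreography: each service reacts to an event and emits the next; nobody owns the whole flow. */
    static List<String> choreography(String orderId) {
        List<String> trace = new ArrayList<>();
        EventBus bus = new EventBus();
        bus.subscribe("OrderCreated", id -> {
            trace.add("payment:charge " + id);
            bus.publish("PaymentProcessed", id);
        });
        bus.subscribe("PaymentProcessed", id -> {
            trace.add("inventory:reserve " + id);
            bus.publish("InventoryReserved", id);
        });
        bus.subscribe("InventoryReserved", id -> trace.add("shipping:schedule " + id));

        trace.add("order:create " + orderId);
        bus.publish("OrderCreated", orderId); // the rest of the flow is implicit in the subscriptions
        return trace;
    }

    /** Orchestration: one coordinator holds the explicit step list and issues commands in order. */
    static List<String> orchestration(String orderId) {
        List<String> trace = new ArrayList<>();
        List<String> commands =
            List.of("order:create", "payment:charge", "inventory:reserve", "shipping:schedule");
        for (String command : commands) {
            // A real orchestrator would send the command, persist saga state, and await a reply.
            trace.add(command + " " + orderId);
        }
        return trace;
    }
}
```

Both methods produce the same trace for a given order ID. The difference is where the sequence lives: spread across subscriptions in `choreography`, or in a single `commands` list in `orchestration`, which is why the orchestrated flow is easier to read and trace but adds a central component.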
Technology Stack Used in This Series
| Layer | Technology | Version |
|---|---|---|
| Language | Java | 17 |
| Framework | Spring Boot | 3.2.x |
| Build Tool | Maven | 3.9.x |
| Messaging | Apache Kafka (AWS MSK) | 3.6.x |
| Database | MySQL | 8.0 |
| Cloud | AWS | - |
| Workflow | AWS Step Functions | - |
| Tracing | AWS X-Ray + Micrometer Tracing (replaces Spring Cloud Sleuth in Spring Boot 3.x) | - |
| Monitoring | AWS CloudWatch + Micrometer | - |
| Testing | JUnit 5 + Testcontainers | - |
| Serialization | Jackson (JSON) | 2.16.x |
Quick Navigation Guide
New to Distributed Transactions?
- Begin at Part 1 - Fundamentals
- Read every section before looking at code
Familiar with Theory, Want Code?
- Part 2 - Choreography for the event-driven approach
- Part 3 - Orchestration for central coordination
Building Production Systems?
- Part 4 - Deep Dive Implementation for production patterns
- Part 6 - Pitfalls for what to avoid
Debugging a Production Incident?
- Part 6 - Pitfalls and Best Practices for production incidents and solutions
Preparing for an Interview?
- Part 7 - Interview Mastery
- Reference other parts for complete answers
Consistent Code Structure Throughout This Series
All code examples share the same domain model:
com.example.ordersaga
    order-service/
        domain/       -> Order, OrderItem, OrderStatus
        events/       -> OrderCreatedEvent, OrderCancelledEvent, ...
        commands/     -> CreateOrderCommand, CancelOrderCommand
        repository/   -> OrderRepository, SagaStateRepository
        service/      -> OrderService, OrderSagaService
        outbox/       -> OutboxEvent, OutboxPublisher
        config/       -> KafkaConfig, RetryConfig
    payment-service/
        domain/       -> Payment, PaymentStatus
        events/       -> PaymentProcessedEvent, PaymentFailedEvent, ...
        service/      -> PaymentService
    inventory-service/
        domain/       -> InventoryReservation, InventoryItem
        events/       -> InventoryReservedEvent, InventoryFailedEvent
        service/      -> InventoryService
    shipping-service/
        domain/       -> Shipment, ShipmentStatus
        events/       -> ShipmentCreatedEvent, ShipmentFailedEvent
        service/      -> ShippingService
    saga-orchestrator/   (Part 3 only)
        orchestrator/ -> OrderSagaOrchestrator
        statemachine/ -> SagaStateMachine
        state/        -> SagaState, SagaStep
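As a rough idea of what the order-service domain and event types might look like with Java 17 records and sealed interfaces, consider the sketch below. The type names come from the layout above, but every field, the enum constants, the `OrderEvent` interface, and the `statusAfter` helper are assumptions made for illustration. The types are nested in one hypothetical class only to keep the example in a single file.

```java
import java.math.BigDecimal;
import java.util.List;

final class OrderDomainSketch {

    // Assumed status values; the series may define a different set.
    enum OrderStatus { PENDING, CONFIRMED, CANCELLED }

    // Assumed fields for an order line.
    record OrderItem(String sku, int quantity, BigDecimal unitPrice) {}

    record Order(String orderId, List<OrderItem> items, OrderStatus status) {
        /** Sum of quantity * unitPrice over all items. */
        BigDecimal total() {
            return items.stream()
                .map(item -> item.unitPrice().multiply(BigDecimal.valueOf(item.quantity())))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        }
    }

    /** Sealed hierarchy: the compiler knows the complete set of order events. */
    sealed interface OrderEvent permits OrderCreatedEvent, OrderCancelledEvent {
        String orderId();
    }

    record OrderCreatedEvent(String orderId, BigDecimal total) implements OrderEvent {}

    record OrderCancelledEvent(String orderId, String reason) implements OrderEvent {}

    /** Hypothetical helper: the order status implied by an event. */
    static OrderStatus statusAfter(OrderEvent event) {
        // instanceof checks are used because pattern matching for switch is still a preview feature in Java 17.
        if (event instanceof OrderCreatedEvent) {
            return OrderStatus.PENDING;
        }
        if (event instanceof OrderCancelledEvent) {
            return OrderStatus.CANCELLED;
        }
        throw new IllegalStateException("Unhandled event type: " + event);
    }
}
```

Records give immutable value types with generated accessors, which suits events that are serialized and never modified after creation. The sealed interface documents the closed set of events a consumer must handle.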
How Each Part Is Structured
Every part follows this consistent format:
- Concept in Plain English - What is it and why does it exist
- Mental Model - How to visualize it
- Step-by-Step Breakdown - How it works mechanically
- Complete Code Example - Production-ready Spring Boot Java
- Configuration - application.yml and infrastructure config
- Best Practices - Tips from real production systems
- Common Mistakes - What to watch out for
- Summary Table - Key takeaways at a glance
Important Notes Before You Begin
Note on Eventual Consistency: SAGA patterns embrace eventual consistency. If your business
requirement absolutely demands strong consistency across services, SAGA may not be the right tool.
Part 1 explains exactly when to use and when to avoid SAGAs.
Note on Code: All code follows production patterns but is simplified for clarity. Real systems will need
additional security, monitoring, and resilience layers on top of what is shown here.
Note on AWS: AWS services are used throughout. Equivalent patterns exist for GCP and Azure.
The concepts are cloud-agnostic even when the implementation is AWS-specific.
Start Learning
Begin here: Part 1 - Fundamentals and Theory
Series created for engineers who want to truly master distributed transaction patterns,
not just know the definition.