6/6/2026 | Admin Post

SAGA Patterns in Microservices and Distributed Systems - Complete Guide

Series Overview: A comprehensive, production-focused deep dive into SAGA patterns for building
reliable distributed systems. Written for Java developers using Spring Boot 3.x, AWS, and MySQL.


Series Navigation

Part   | Title                       | Topics Covered
-------+-----------------------------+--------------------------------------------------------------------------------
Part 1 | Fundamentals and Theory     | Distributed transaction problems, ACID vs BASE, CAP, 2PC failures, SAGA origins
Part 2 | Choreography Pattern        | Event-driven SAGA, Kafka, AWS MSK/SNS/SQS, full Spring Boot implementation
Part 3 | Orchestration Pattern       | Central orchestrator, AWS Step Functions, Axon, custom state machine
Part 4 | Deep Dive Implementation    | Outbox pattern, idempotency, retry logic, distributed tracing, MySQL schemas
Part 5 | Advanced Patterns           | CQRS + SAGA, Event Sourcing, parallel steps, sub-sagas, large scale AWS
Part 6 | Pitfalls and Best Practices | Anti-patterns, isolation anomalies, production incidents and solutions
Part 7 | Interview Mastery           | 60+ questions from entry-level to Principal Architect with full answers

What You Will Master

After completing this series you will be able to:

  • Explain WHY SAGA patterns exist and the exact problems they solve
  • Implement both Choreography and Orchestration SAGAs from scratch in Spring Boot
  • Design compensating transactions that are idempotent and safe
  • Integrate SAGAs with AWS services: Step Functions, SQS FIFO, SNS, MSK, DynamoDB
  • Design MySQL schemas for SAGA state persistence
  • Handle failures, retries, dead-letter queues, and out-of-order events
  • Apply the Transactional Outbox pattern to eliminate dual-write problems
  • Combine SAGAs with CQRS and Event Sourcing
  • Identify and fix every common SAGA anti-pattern
  • Ace SAGA-related interview questions at any seniority level
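
As a small preview of the idempotency goal in the list above, here is a minimal sketch of a compensation that can be retried safely. The class `RefundCompensation`, the `sagaId` key, and the in-memory map are hypothetical names chosen for illustration; a real payment service would more likely persist this record in MySQL behind a unique key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical refund compensation that applies at most once per saga. */
final class RefundCompensation {
    // Assumption: an in-memory map stands in for a durable "processed refunds" table.
    private final Map<String, Long> refundsBySagaId = new HashMap<>();
    private long totalRefundedCents;

    /**
     * Applies the refund only if this saga has not been refunded yet.
     * Returns true when the refund was applied, false when the call was a duplicate.
     */
    synchronized boolean refund(String sagaId, long amountCents) {
        if (refundsBySagaId.containsKey(sagaId)) {
            return false; // duplicate delivery or retry: no second refund
        }
        refundsBySagaId.put(sagaId, amountCents);
        totalRefundedCents += amountCents;
        return true;
    }

    synchronized long totalRefundedCents() {
        return totalRefundedCents;
    }
}
```

Calling `refund("saga-1", 2500)` twice applies the refund once, so a redelivered message or a retry does not pay the customer back a second time.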

Prerequisites

Knowledge Area             | Why It Is Needed
---------------------------+---------------------------------------------------------------
Java 17+                   | All code uses modern Java features (records, sealed classes)
Spring Boot 3.x            | Primary framework for every implementation
Microservices architecture | Understanding service boundaries and data ownership
Apache Kafka basics        | Event-driven communication in choreography
MySQL and JPA              | Persistence layer for domain state and saga state
AWS fundamentals           | Cloud-native integration patterns
REST / HTTP basics         | Synchronous service-to-service communication

The Central Example: E-Commerce Order Processing

This entire series uses ONE consistent example to illustrate all concepts.

Customer Places Order
        |
        v
 +--------------+     +-----------------+     +--------------------+     +------------------+
 | Order Service|---->| Payment Service |---->| Inventory Service  |---->| Shipping Service |
 +--------------+     +-----------------+     +--------------------+     +------------------+
 | Create Order |     | Charge Card     |     | Reserve Items      |     | Schedule Pickup  |
 |              |     |                 |     |                    |     |                  |
 | COMPENSATION:|     | COMPENSATION:   |     | COMPENSATION:      |     | COMPENSATION:    |
 | Cancel Order |     | Refund Card     |     | Release Items      |     | Cancel Pickup    |
 +--------------+     +-----------------+     +--------------------+     +------------------+

Why this example?

  • It is realistic and widely understood
  • It has multiple services with different failure modes
  • Payment reversal is expensive, which shows why well-designed compensation matters
  • Inventory has race conditions (concurrent reservations)
  • Shipping involves external third-party calls (hardest to compensate)
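
The four steps and their compensations from the diagram above can be sketched as a small data model. This is a minimal illustration, not the series' implementation: `OrderSagaStep` and `CompensationPlanner` are hypothetical names, and the sketch assumes the failing step itself left no durable change, so only the steps that completed before it need compensating.

```java
import java.util.ArrayList;
import java.util.List;

/** The four saga steps from the diagram, each paired with its compensation. */
enum OrderSagaStep {
    CREATE_ORDER("Create Order", "Cancel Order"),
    CHARGE_CARD("Charge Card", "Refund Card"),
    RESERVE_ITEMS("Reserve Items", "Release Items"),
    SCHEDULE_PICKUP("Schedule Pickup", "Cancel Pickup");

    final String action;
    final String compensation;

    OrderSagaStep(String action, String compensation) {
        this.action = action;
        this.compensation = compensation;
    }
}

/** Hypothetical helper that works out which compensations to run after a failure. */
final class CompensationPlanner {
    private CompensationPlanner() {
    }

    /**
     * Returns the compensations to run when failedStep fails,
     * ordered from the most recently completed step back to the first.
     */
    static List<String> planFor(OrderSagaStep failedStep) {
        List<String> plan = new ArrayList<>();
        OrderSagaStep[] steps = OrderSagaStep.values();
        // Walk backwards over the steps that completed before the failure.
        for (int i = failedStep.ordinal() - 1; i >= 0; i--) {
            plan.add(steps[i].compensation);
        }
        return plan;
    }
}
```

For example, if Reserve Items fails, `CompensationPlanner.planFor(OrderSagaStep.RESERVE_ITEMS)` yields Refund Card followed by Cancel Order: the saga undoes completed work in reverse order.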

Architecture at a Glance

SAGA PATTERN TYPES
       |
       +-----------------------------+
       |                             |
  CHOREOGRAPHY                 ORCHESTRATION
  (Services talk via events)   (Central coordinator drives flow)
       |                             |
  Event Bus (Kafka/SQS)        Orchestrator service or
  SNS Topics                   AWS Step Functions
       |                             |
  Each service:                Orchestrator:
  - Listens for events         - Sends commands to services
  - Does local work            - Tracks state centrally
  - Emits result events        - Handles failures
  - Handles compensations      - Coordinates compensations
       |                             |
  PROS:                        PROS:
  + Loose coupling             + Full visibility of saga state
  + Independent scaling        + Easy debugging and monitoring
  + No single point of failure + Explicit business flow
       |                             |
  CONS:                        CONS:
  - Hard to trace flows        - Orchestrator = potential bottleneck
  - Risk of cyclic events      - Services coupled to orchestrator API
  - Testing is complex         - Added infrastructure
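
To make the choreography column concrete, here is a minimal in-memory sketch in which each service only listens for events, does its local work, and emits a result event. `InMemoryEventBus` and `ChoreographyDemo` are hypothetical stand-ins for Kafka or SNS topics and the real services; the event names follow the series' naming, except `PaymentRefunded`, which is an assumed name for the refund result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical synchronous event bus standing in for Kafka/SNS topics. */
final class InMemoryEventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    final List<String> log = new ArrayList<>();

    void subscribe(String eventType, Consumer<String> handler) {
        subscribers.computeIfAbsent(eventType, key -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        log.add(eventType); // record every event so the flow can be inspected
        for (Consumer<String> handler : subscribers.getOrDefault(eventType, List.of())) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    private ChoreographyDemo() {
    }

    /** Runs one order through the flow and returns the events in publish order. */
    static List<String> run(boolean inventoryAvailable) {
        InMemoryEventBus bus = new InMemoryEventBus();

        // Forward path: each service reacts to the previous service's event.
        bus.subscribe("OrderCreated", id -> bus.publish("PaymentProcessed", id));
        bus.subscribe("PaymentProcessed", id ->
                bus.publish(inventoryAvailable ? "InventoryReserved" : "InventoryFailed", id));
        bus.subscribe("InventoryReserved", id -> bus.publish("ShipmentCreated", id));

        // Compensation path: no coordinator, services react to failure events.
        bus.subscribe("InventoryFailed", id -> bus.publish("PaymentRefunded", id));
        bus.subscribe("PaymentRefunded", id -> bus.publish("OrderCancelled", id));

        bus.publish("OrderCreated", "order-1");
        return bus.log;
    }
}
```

Notice that no single class describes the whole flow: it only emerges from the subscriptions. That is the loose coupling listed under PROS, and also why tracing and testing appear under CONS. An orchestrator would instead hold this sequence in one place and send commands to each service.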

Technology Stack Used in This Series

Layer         | Technology                     | Version
--------------+--------------------------------+--------
Language      | Java                           | 17
Framework     | Spring Boot                    | 3.2.x
Build Tool    | Maven                          | 3.9.x
Messaging     | Apache Kafka (AWS MSK)         | 3.6.x
Database      | MySQL                          | 8.0
Cloud         | AWS                            | -
Workflow      | AWS Step Functions             | -
Tracing       | AWS X-Ray + Micrometer Tracing | -
Monitoring    | AWS CloudWatch + Micrometer    | -
Testing       | JUnit 5 + Testcontainers       | -
Serialization | Jackson (JSON)                 | 2.16.x

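Quick Navigation Guide

New to Distributed Transactions?
  Start with Part 1 - Fundamentals and Theory

Familiar with Theory, Want Code?
  Go to Part 2 - Choreography Pattern and Part 3 - Orchestration Pattern

Building Production Systems?
  Go to Part 4 - Deep Dive Implementation and Part 5 - Advanced Patterns

Debugging a Production Incident?
  Go to Part 6 - Pitfalls and Best Practices

Preparing for an Interview?
  Go to Part 7 - Interview Mastery
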

Consistent Code Structure Throughout This Series

All code examples share the same package layout and domain model:

com.example.ordersaga
  order-service/
    domain/           -> Order, OrderItem, OrderStatus
    events/           -> OrderCreatedEvent, OrderCancelledEvent, ...
    commands/         -> CreateOrderCommand, CancelOrderCommand
    repository/       -> OrderRepository, SagaStateRepository
    service/          -> OrderService, OrderSagaService
    outbox/           -> OutboxEvent, OutboxPublisher
    config/           -> KafkaConfig, RetryConfig

  payment-service/
    domain/           -> Payment, PaymentStatus
    events/           -> PaymentProcessedEvent, PaymentFailedEvent, ...
    service/          -> PaymentService

  inventory-service/
    domain/           -> InventoryReservation, InventoryItem
    events/           -> InventoryReservedEvent, InventoryFailedEvent
    service/          -> InventoryService

  shipping-service/
    domain/           -> Shipment, ShipmentStatus
    events/           -> ShipmentCreatedEvent, ShipmentFailedEvent
    service/          -> ShippingService

  saga-orchestrator/   -> (Part 3 only)
    orchestrator/     -> OrderSagaOrchestrator
    statemachine/     -> SagaStateMachine
    state/            -> SagaState, SagaStep
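
As a rough sketch of how the `events/` packages above might use the Java 17 features listed in the prerequisites, the following models three of the named events as records under a sealed interface. The event type names come from the layout above; the interface name `OrderSagaEvent`, the record fields, and the `EventDescriber` helper are assumptions made for illustration.

```java
import java.math.BigDecimal;

/** Closed set of events for this sketch; the compiler knows every implementation. */
sealed interface OrderSagaEvent
        permits OrderCreatedEvent, PaymentProcessedEvent, PaymentFailedEvent {
    String orderId();
}

// Field lists are assumed for illustration; only the type names come from the layout.
record OrderCreatedEvent(String orderId, BigDecimal total) implements OrderSagaEvent {
}

record PaymentProcessedEvent(String orderId, String paymentId) implements OrderSagaEvent {
}

record PaymentFailedEvent(String orderId, String reason) implements OrderSagaEvent {
}

/** Hypothetical helper showing type-based handling with instanceof patterns. */
final class EventDescriber {
    private EventDescriber() {
    }

    static String describe(OrderSagaEvent event) {
        if (event instanceof OrderCreatedEvent created) {
            return "order " + created.orderId() + " created, total " + created.total();
        }
        if (event instanceof PaymentProcessedEvent paid) {
            return "order " + paid.orderId() + " paid via " + paid.paymentId();
        }
        if (event instanceof PaymentFailedEvent failed) {
            return "order " + failed.orderId() + " payment failed: " + failed.reason();
        }
        // Unreachable while the permits list matches the branches above.
        throw new IllegalStateException("Unhandled event type: " + event);
    }
}
```

The sketch uses `instanceof` patterns rather than a pattern-matching `switch` because the latter is only a preview feature in Java 17 and became final in Java 21.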

How Each Part Is Structured

Every part follows this consistent format:

  1. Concept in Plain English - What is it and why does it exist
  2. Mental Model - How to visualize it
  3. Step-by-Step Breakdown - How it works mechanically
  4. Complete Code Example - Production-ready Spring Boot Java
  5. Configuration - application.yml and infrastructure config
  6. Best Practices - Tips from real production systems
  7. Common Mistakes - What to watch out for
  8. Summary Table - Key takeaways at a glance

Important Notes Before You Begin

Note on Eventual Consistency: SAGA patterns embrace eventual consistency. If your business
requirement absolutely demands strong consistency across services, SAGA may not be the right tool.
Part 1 explains exactly when to use and when to avoid SAGAs.

Note on Code: The code is written with production use in mind but is simplified for clarity.
Real systems will need additional security, monitoring, and resilience layers on top of what
is shown here.

Note on AWS: AWS services are used throughout. Equivalent patterns exist for GCP and Azure.
The concepts are cloud-agnostic even when the implementation is AWS-specific.


Start Learning

Begin here: Part 1 - Fundamentals and Theory


Series created for engineers who want to truly master distributed transaction patterns,
not just know the definition.