
Design Amazon Order Processing (Saga + Idempotency)

Saga pattern across cart → inventory → payment → fulfillment. Tests whether you can handle a distributed transaction without two-phase commit (2PC).

hard · 4h · general · kafka · sql · system-design

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

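A single order touches 4-6 services. Distributed two-phase commit (2PC) is impractical at scale: every participant holds its locks until the coordinator decides, so one slow or failed service blocks the rest. A saga instead runs a sequence of local transactions, each paired with a compensating action. If a step fails, the compensations for the steps that already completed run in reverse order to restore consistency. The trade-off is isolation: intermediate states (e.g. inventory reserved but not yet paid for) are visible to other transactions while the saga runs.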
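The reverse-order rule can be sketched as a minimal, in-process Python runner. This is an illustration under simplifying assumptions, not a production orchestrator: `Step`, `run_saga`, and the step names are hypothetical, each action and compensation is a plain callable standing in for a call to a remote service, and compensations are assumed to succeed (a real orchestrator would retry them until they do).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # local transaction on one service (hypothetical stand-in)
    compensate: Callable[[], None]  # semantic undo of that transaction


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; if one fails, compensate the completed ones in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            log.append(f"{step.name} failed: {exc}")
            # The failed step never committed, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()  # assumed to succeed; real systems retry until it does
                log.append(f"compensated {done.name}")
            return False
        completed.append(step)
        log.append(f"{step.name} ok")
    return True


def noop() -> None:
    pass


def fail_shipment() -> None:
    raise RuntimeError("carrier unavailable")


log: List[str] = []
ok = run_saga(
    [
        Step("reserve_inventory", noop, noop),
        Step("charge_payment", noop, noop),
        Step("create_shipment", fail_shipment, noop),
    ],
    log,
)
print(ok)  # False
for line in log:
    print(line)
```

With shipment failing, the log shows `charge_payment` compensated before `reserve_inventory`, i.e. the newest completed step is undone first.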

The saga orchestrator owns the order state machine: CREATED → INVENTORY_RESERVED → PAYMENT_CHARGED → SHIPMENT_CREATED → COMPLETED. Each step is a local transaction on one service. A failure triggers compensations for the steps already completed, newest first: SHIPMENT_FAILED → refund payment → release inventory → cancel order. Every endpoint takes an idempotency key (a UUID per order step), so a retried call cannot charge, reserve, or ship twice. The outbox pattern makes event emission atomic with the DB write: the event row is written in the same local transaction as the state change, so a committed change can never lose its event.
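One way to make the state machine explicit is a transition table that the orchestrator checks before persisting a new state, so out-of-order or redelivered events are rejected instead of silently applied. A sketch, assuming the happy-path states from the text; the failure-path names `PAYMENT_REFUNDED`, `INVENTORY_RELEASED`, and `CANCELLED`, plus `advance` itself, are hypothetical.

```python
from enum import Enum, auto
from typing import Dict, Set


class OrderState(Enum):
    CREATED = auto()
    INVENTORY_RESERVED = auto()
    PAYMENT_CHARGED = auto()
    SHIPMENT_CREATED = auto()
    COMPLETED = auto()
    # Failure path; the intermediate names below are hypothetical.
    SHIPMENT_FAILED = auto()
    PAYMENT_REFUNDED = auto()
    INVENTORY_RELEASED = auto()
    CANCELLED = auto()


S = OrderState
TRANSITIONS: Dict[OrderState, Set[OrderState]] = {
    S.CREATED: {S.INVENTORY_RESERVED, S.CANCELLED},                   # reserve failed -> cancel
    S.INVENTORY_RESERVED: {S.PAYMENT_CHARGED, S.INVENTORY_RELEASED},  # charge failed -> release
    S.PAYMENT_CHARGED: {S.SHIPMENT_CREATED, S.SHIPMENT_FAILED},
    S.SHIPMENT_CREATED: {S.COMPLETED},
    S.SHIPMENT_FAILED: {S.PAYMENT_REFUNDED},
    S.PAYMENT_REFUNDED: {S.INVENTORY_RELEASED},
    S.INVENTORY_RELEASED: {S.CANCELLED},
    S.COMPLETED: set(),
    S.CANCELLED: set(),
}


def advance(current: OrderState, target: OrderState) -> OrderState:
    """Return target if the move is legal; raise on out-of-order or duplicate events."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


# Walk the failure path from the text: shipment fails after payment.
state = S.CREATED
for nxt in (S.INVENTORY_RESERVED, S.PAYMENT_CHARGED, S.SHIPMENT_FAILED,
            S.PAYMENT_REFUNDED, S.INVENTORY_RELEASED, S.CANCELLED):
    state = advance(state, nxt)
print(state.name)  # CANCELLED

try:
    advance(S.PAYMENT_CHARGED, S.PAYMENT_CHARGED)  # a redelivered "paid" event
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True  # the caller can log it and treat it as a no-op
```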

When to use

Any multi-service business transaction: orders, refunds, account onboarding, ride dispatch.

When not to

Pure read flows. Single-DB transactions (use native ACID).

sequenceDiagram
  participant C as Client
  participant O as Order Saga
  participant I as Inventory
  participant P as Payment
  participant S as Shipment
  C->>O: POST /orders (idempotency-key)
  O->>I: reserve(order_id) [step 1]
  I-->>O: reserved · TTL 15min
  O->>P: charge(order_id, amount) [step 2]
  P-->>O: paid
  O->>S: create_shipment(order_id) [step 3]
  alt happy path
    S-->>O: shipped
    O-->>C: 200 confirmed
  else step 3 fails
    S-->>O: failed
    O->>P: refund(order_id) [compensate step 2]
    O->>I: release(order_id) [compensate step 1]
    O->>O: mark order CANCELLED
    O-->>C: order failed + reason
  end
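The `TTL 15min` on the reservation means a hold that is never confirmed or released frees itself, so a crashed or stalled saga cannot lock stock forever. A minimal in-memory sketch, assuming one unit per order and an injectable clock for testing; `Inventory`, `reserve`, `release`, and `confirm` are hypothetical names, and a real inventory service would keep holds in its own database.

```python
import time
from typing import Callable, Dict


class Inventory:
    """Hypothetical inventory with TTL'd reservations (one unit per order)."""

    def __init__(self, stock: int, ttl_s: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.stock = stock
        self.ttl_s = ttl_s
        self.clock = clock
        self.holds: Dict[str, float] = {}  # order_id -> expiry time

    def _expire(self) -> None:
        now = self.clock()
        for order_id, expiry in list(self.holds.items()):
            if expiry <= now:
                del self.holds[order_id]
                self.stock += 1  # expired hold goes back on the shelf

    def reserve(self, order_id: str) -> bool:
        self._expire()
        if order_id in self.holds:  # retried step: keep the existing hold
            return True
        if self.stock == 0:
            return False
        self.stock -= 1
        self.holds[order_id] = self.clock() + self.ttl_s
        return True

    def release(self, order_id: str) -> None:
        """Compensation; idempotent, so a duplicate release changes nothing."""
        self._expire()
        if self.holds.pop(order_id, None) is not None:
            self.stock += 1

    def confirm(self, order_id: str) -> bool:
        """Turn a hold into a sale. False means the hold already expired,
        which the orchestrator must treat as a failed step."""
        self._expire()
        return self.holds.pop(order_id, None) is not None


now = [0.0]
inv = Inventory(stock=1, ttl_s=15 * 60, clock=lambda: now[0])
first = inv.reserve("o-1")
retry = inv.reserve("o-1")         # same hold, no second unit taken
blocked = inv.reserve("o-2")       # out of stock while o-1 holds the unit
now[0] = 15 * 60 + 1               # o-1's saga stalled past the TTL
after_expiry = inv.reserve("o-2")  # the expired hold returned the unit
inv.release("o-2")
inv.release("o-2")                 # duplicate compensation is a no-op
```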

Key insights

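  • Orchestration (a central coordinator drives each step) vs choreography (each service reacts to the previous service's event): orchestration is easier to reason about because the whole flow lives in one state machine; choreography has lower coupling but spreads the flow across services. Pick based on team familiarity.
  • Compensation is not always a perfect inverse: a refund is a new, visible transaction, not an erasure of the charge. Document the semantic differences.
  • Idempotency keys must be persisted on the server (key → stored response), not just trusted from the client. A retry with a known key gets the stored response back instead of re-executing the step.
  • Outbox pattern: the same DB transaction inserts the business row and an outbox row; a relay reads the outbox and publishes to Kafka. This prevents lost events, but delivery is at-least-once (the relay can crash after publishing and before marking the row sent), so consumers must deduplicate.
  • Sagas can run for hours (e.g. the delay between payment authorization and capture), so saga state must be persisted after every step and survive a process restart; a restarted orchestrator resumes from the last recorded step.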
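A server-side idempotency store can be sketched with Python's built-in `sqlite3`, assuming the step's business write and the key row live in the same database. The table names, `handle_once`, `fingerprint`, and the `charge` handler are hypothetical; the key is a UUID generated once per order step and reused on every retry of that step.

```python
import hashlib
import json
import sqlite3
import uuid
from typing import Any, Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS idempotency ("
        " key TEXT PRIMARY KEY, request_hash TEXT NOT NULL, response TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS charges (order_id TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )


def fingerprint(request: dict) -> str:
    """Hash the request body so a key reused with a different payload is caught."""
    return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()


def handle_once(conn: sqlite3.Connection, key: str, request: dict,
                handler: Callable[[sqlite3.Connection, dict], Any]) -> Any:
    """Run handler at most once per key; a retry gets the stored response back."""
    req_hash = fingerprint(request)
    row = conn.execute(
        "SELECT request_hash, response FROM idempotency WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        if row[0] != req_hash:
            raise ValueError(f"idempotency key {key} reused with a different request")
        return json.loads(row[1])
    # Business write and key row share one transaction. If a concurrent request with
    # the same key commits first, the PRIMARY KEY insert raises sqlite3.IntegrityError
    # and this whole transaction, including the handler's write, rolls back.
    with conn:
        response = handler(conn, request)
        conn.execute("INSERT INTO idempotency VALUES (?, ?, ?)",
                     (key, req_hash, json.dumps(response)))
    return response


def charge(conn: sqlite3.Connection, request: dict) -> dict:
    # Hypothetical payment step; it writes through the caller's transaction.
    conn.execute("INSERT INTO charges VALUES (?, ?)",
                 (request["order_id"], request["amount_cents"]))
    return {"order_id": request["order_id"], "status": "paid"}


conn = sqlite3.connect(":memory:")
init_db(conn)
key = str(uuid.uuid4())  # one key per order step
req = {"order_id": "o-1", "amount_cents": 4999}
first = handle_once(conn, key, req, charge)
retry = handle_once(conn, key, req, charge)  # e.g. the caller timed out and retried
charge_rows = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]

try:
    handle_once(conn, key, {"order_id": "o-1", "amount_cents": 100}, charge)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Storing a hash of the request alongside the key is a design choice, not a requirement: it turns a buggy client that reuses a key for a different payload into a loud error rather than a silently replayed wrong response.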
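The outbox pattern can be sketched with `sqlite3` standing in for the service database. `publish` is a plain callable standing in for a Kafka producer send; no real Kafka client API is assumed. The `orders` and `outbox` tables, the `order-events` topic, and the function names are hypothetical. Polling, as here, is the simplest relay; tailing the database's change log is a common alternative.

```python
import json
import sqlite3
from typing import Callable, List, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT NOT NULL,"
        " payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
    )


def create_order(conn: sqlite3.Connection, order_id: str) -> None:
    event = {"type": "OrderCreated", "order_id": order_id}
    with conn:  # one local transaction: both rows commit, or neither does
        conn.execute("INSERT INTO orders VALUES (?, 'CREATED')", (order_id,))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("order-events", json.dumps(event)))


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, str], None]) -> int:
    """Publish unsent rows in insertion order. At-least-once: a crash between
    publish() and the UPDATE re-sends that row on the next pass."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays unpublished for retry
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent: List[Tuple[str, dict]] = []


def fake_publish(topic: str, payload: str) -> None:
    sent.append((topic, json.loads(payload)))


conn = sqlite3.connect(":memory:")
init_db(conn)
create_order(conn, "o-1")
try:
    create_order(conn, "o-1")  # duplicate order: the whole transaction rolls back
except sqlite3.IntegrityError:
    pass
published = relay_once(conn, fake_publish)
second_pass = relay_once(conn, fake_publish)
outbox_rows = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

The rolled-back duplicate leaves no orphan event behind, which is exactly the guarantee the outbox exists to provide.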
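Surviving a restart comes down to persisting the saga's progress after each step and resuming from it. A minimal sketch of the forward path only (compensation omitted), assuming a `saga` table holding the index of the next step; `STEPS`, `start`, and `resume` are hypothetical, and the "crash" is simulated by an exception followed by a second `resume` call against the same database.

```python
import sqlite3
from typing import Callable, Dict, List

STEPS: List[str] = ["reserve_inventory", "charge_payment", "create_shipment"]


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saga (order_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)"
    )


def start(conn: sqlite3.Connection, order_id: str) -> None:
    with conn:
        conn.execute("INSERT OR IGNORE INTO saga VALUES (?, 0)", (order_id,))


def resume(conn: sqlite3.Connection, order_id: str,
           handlers: Dict[str, Callable[[str], None]]) -> None:
    """Continue from the last persisted step; safe to call again after a crash."""
    (next_step,) = conn.execute(
        "SELECT next_step FROM saga WHERE order_id = ?", (order_id,)
    ).fetchone()
    for i in range(next_step, len(STEPS)):
        # A crash after the call but before the UPDATE re-runs this step on resume,
        # so every step handler must itself be idempotent.
        handlers[STEPS[i]](order_id)
        with conn:
            conn.execute("UPDATE saga SET next_step = ? WHERE order_id = ?",
                         (i + 1, order_id))


# Simulate a process that dies during create_shipment, then restarts.
calls: List[str] = []
crash_pending = [True]


def make_handler(name: str) -> Callable[[str], None]:
    def handler(order_id: str) -> None:
        if name == "create_shipment" and crash_pending[0]:
            crash_pending[0] = False
            raise RuntimeError("process died")
        calls.append(name)
    return handler


handlers = {name: make_handler(name) for name in STEPS}
conn = sqlite3.connect(":memory:")
init_db(conn)
start(conn, "o-1")
try:
    resume(conn, "o-1", handlers)
except RuntimeError:
    pass  # the "restart": same DB, fresh call
resume(conn, "o-1", handlers)
final_step = conn.execute("SELECT next_step FROM saga WHERE order_id = 'o-1'").fetchone()[0]
```

After the restart, inventory and payment are not repeated; only the step that was in flight runs again.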