system design · system-design

Design Amazon Customer-Order-Product DB

Relational + NoSQL hybrid, OLTP vs OLAP separation, global tables. Tests data-modeling decisions at multi-region scale.

hard4hawsgeneralsqlsystem-design
Ask GPTConfidence

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

No one DB does everything. Customer profile (low write, joined reads) fits relational. Orders (high write, sharded reads) fit DynamoDB. Search uses denormalized index. Analytics uses columnar warehouse. Glue together with CDC.

Polyglot persistence: Aurora MySQL for customer/account (ACID, joins). DynamoDB for orders (sharded by customer_id, secondary index by order_id, global table for multi-region). OpenSearch for product search (CDC from catalog). Redshift / S3 + Athena for analytics (loaded via Kinesis Firehose from CDC streams). Each store owns its access pattern; CDC keeps them in sync.

When to use

Multi-tenant SaaS, marketplaces, anything with mixed OLTP + OLAP + search workloads.

When not to

Small data, one access pattern, single Postgres beats polyglot complexity.

flowchart LR
  App[App Servers] --> Aur[(Aurora MySQL · Customers)]
  App --> Dy[(DynamoDB · Orders · sharded)]
  App --> OS[(OpenSearch · Catalog)]
  Aur -.CDC.-> Stream[[Kinesis Stream]]
  Dy -.CDC.-> Stream
  Stream --> FH[Kinesis Firehose]
  FH --> S3[(S3 Data Lake)]
  S3 --> RS[(Redshift)]
  S3 --> Ath[Athena]

Key insights

  • Choose DB per access pattern, not per "we standardized on X".
  • OLTP and OLAP are different beasts. Co-locating tanks analytics under prod load.
  • CDC is the spine, must be reliable, idempotent, and ordered per key.
  • Global tables for orders give regional read locality; multi-master writes require conflict resolution rules.
  • Schema-on-read for analytics (Parquet on S3 + Athena) is cheaper than rigid warehouse schemas during exploration.