
Design a Distributed Cache (Memcached / mcrouter)

Consistent hashing, mcrouter routing, gutter pool, lease tokens. A signature Meta infra design problem at the E6+ level.


Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

A single Memcached node gives O(1) GET/SET, but a cluster of them has to cope with many failure modes that Meta engineered around: hot keys, thundering herds, stale reads after backend writes, and regional failover. mcrouter is the proxy layer that hides all of this from clients.

Clients talk to a local mcrouter instance (deployed as a sidecar). mcrouter handles consistent hashing of keys to memcached nodes, a gutter pool that acts as the fallback for failed nodes, lease tokens that prevent a stale set after a DB write, intra-pool replication of hot keys, and cross-region invalidation via an asynchronous event log. When a node fails, the gutter pool accepts its requests and stores entries with a short TTL while the primary is repaired in the background.
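To make the consistent-hashing step concrete, here is a minimal sketch of a hash ring with virtual nodes. It illustrates the general technique only; mcrouter's own hash function may differ, and the class name `HashRing`, the node names, and the choice of MD5 with 100 virtual nodes are assumptions made for this example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _point(label):
        # Stable 64-bit ring position derived from the first 8 bytes of MD5.
        return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._point(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["mc1", "mc2", "mc3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("mc2")  # simulate losing one node
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The property that matters for a cache is visible in `moved`: only keys that used to live on the removed node are remapped, so the other nodes keep their warm entries.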

When to use

Read-heavy workloads at scale (social, e-commerce, ads).

When not to

When strong consistency is required, read from the DB instead. Workloads that need sub-key fan-out are a better fit for Redis.

flowchart LR
  App[App Server] --> MC[mcrouter sidecar]
  MC -->|consistent hash| Pool[Memcached Pool]
  Pool --> Node1[(Node 1)]
  Pool --> Node2[(Node 2)]
  Pool --> NodeN[(Node N)]
  MC -.on failure.-> Gutter[Gutter Pool · short TTL]
  App --> DB[(Backing DB)]
  DB -.invalidate.-> MC
  Region1[Region 1] -.async invalidate.-> Region2[Region 2]
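The failure path in the diagram can be sketched as a read-through helper that falls back to the gutter pool when the primary node is unreachable. This is a toy model under stated assumptions: `Node` is a hypothetical in-memory stand-in for a memcached server, `GUTTER_TTL = 10` seconds is an arbitrary value, and the explicit `now` argument exists only to make expiry easy to follow.

```python
import time


class Node:
    """In-memory stand-in for one memcached server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self._data = {}  # key -> (value, expires_at or None)

    def get(self, key, now=None):
        if not self.up:
            raise ConnectionError(self.name)
        now = time.monotonic() if now is None else now
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and now >= expires_at:
            del self._data[key]  # lazily drop the expired entry
            return None
        return value

    def set(self, key, value, ttl=None, now=None):
        if not self.up:
            raise ConnectionError(self.name)
        now = time.monotonic() if now is None else now
        self._data[key] = (value, None if ttl is None else now + ttl)


GUTTER_TTL = 10  # seconds; assumed value for illustration


def cached_read(key, primary, gutter, load_from_db, now=None):
    """Read through the primary; use the gutter (short TTL) if the primary is down."""
    try:
        value = primary.get(key, now)
        target, ttl = primary, None
    except ConnectionError:
        value = gutter.get(key, now)
        target, ttl = gutter, GUTTER_TTL
    if value is None:
        value = load_from_db(key)
        target.set(key, value, ttl, now)
    return value


db_calls = []


def load_from_db(key):
    db_calls.append(key)
    return f"row-for-{key}"


primary, gutter = Node("mc1"), Node("gutter1")
primary.up = False  # simulate a failed primary node
cached_read("user:1", primary, gutter, load_from_db, now=0)   # miss: DB, fills gutter
cached_read("user:1", primary, gutter, load_from_db, now=5)   # served from gutter
cached_read("user:1", primary, gutter, load_from_db, now=11)  # gutter entry expired: DB again
```

In this run the DB is queried twice for three reads. The short TTL limits how long gutter entries can stay stale, at the cost of more frequent refills.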

Key insights

  • Lease tokens prevent stale sets: on a miss, the client receives a token for that key; only the token holder may set the value, and an invalidation in the meantime voids the token. Concurrent clients that miss on the same key get no token and yield to the holder.
  • The gutter pool absorbs traffic during a partial outage, so the DB is not exposed to a miss storm.
  • A hot key is replicated to N nodes within the pool; the client picks a random replica for each read.
  • Cross-region invalidation goes through an async log, so regions are only eventually consistent, but reads from the regional cache stay fast.
  • mcrouter doubles as a feature-flag layer: it can route specific keys to different pools during migrations.
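The lease-token insight can be illustrated with a small single-node model. This is a sketch of the idea, not memcached's wire protocol: the class `LeaseCache` and its method names are hypothetical, and real deployments also bound how long a lease stays outstanding, which is omitted here.

```python
import itertools


class LeaseCache:
    """Single-node cache with lease tokens (illustrative sketch)."""

    def __init__(self):
        self._data = {}
        self._leases = {}  # key -> outstanding token
        self._tokens = itertools.count(1)

    def get(self, key):
        """Return (value, token); a token goes only to the first client that misses."""
        if key in self._data:
            return self._data[key], None
        if key in self._leases:
            return None, None  # another client holds the lease: back off and retry
        token = next(self._tokens)
        self._leases[key] = token
        return None, token

    def set_with_lease(self, key, value, token):
        """Store the value only if the token is still the valid lease for the key."""
        if token is None or self._leases.get(key) != token:
            return False
        del self._leases[key]
        self._data[key] = value
        return True

    def delete(self, key):
        """Invalidate after a DB write; this also voids any outstanding lease."""
        self._data.pop(key, None)
        self._leases.pop(key, None)


cache = LeaseCache()
_, token_a = cache.get("user:1")  # client A misses and receives the lease
_, token_b = cache.get("user:1")  # client B misses too but receives no token
cache.delete("user:1")            # a writer updates the DB and invalidates the key
stale_ok = cache.set_with_lease("user:1", "old", token_a)  # rejected: lease was voided
_, token_c = cache.get("user:1")  # client C receives a fresh lease
fresh_ok = cache.set_with_lease("user:1", "new", token_c)  # accepted
```

Client A read the old row before the write, so its set would have cached stale data. Voiding the lease on delete is what blocks it, and withholding a token from client B is what keeps concurrent misses from all hitting the DB.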