
Design Redis (Single-Threaded + Persistence + Cluster)

Single-threaded event loop, RDB+AOF persistence, replication, cluster mode. Meta E6+ infra round.


Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

Commands execute on a single thread, so the data path needs no locks and each command is atomic. Persistence is optional: RDB snapshots, an AOF append-only log, or both. Replication is asynchronous leader-follower. Cluster mode shards the keyspace across 16384 hash slots.
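
To make the atomicity and append-log ideas concrete, here is a minimal toy sketch, not Redis's actual implementation. `ToyStore` and `replay` are hypothetical names, the log is plain space-separated text rather than Redis's real AOF format, and values containing spaces are not handled. Commands run one at a time, so the read-modify-write inside INCR cannot interleave with anything else, and every write is appended to a log that can rebuild the same state.

```python
import io


class ToyStore:
    """Hypothetical in-memory store: one command at a time, writes logged."""

    def __init__(self, aof):
        self.data = {}
        self.aof = aof  # any text file-like object used as the append log

    def execute(self, *cmd, log=True):
        name, *args = cmd
        name = name.upper()
        if name == "GET":
            # Reads never touch the log.
            return self.data.get(args[0])
        if name == "SET":
            self.data[args[0]] = args[1]
            result = "OK"
        elif name == "INCR":
            # Read-modify-write is safe: no other command runs in between.
            value = int(self.data.get(args[0], "0")) + 1
            self.data[args[0]] = str(value)
            result = value
        else:
            raise ValueError(f"unknown command: {name}")
        if log:
            # Append the write to the log after applying it.
            self.aof.write(" ".join(cmd) + "\n")
        return result


def replay(aof_text):
    """Rebuild a store by re-executing every logged write in order."""
    store = ToyStore(io.StringIO())
    for line in aof_text.splitlines():
        store.execute(*line.split(" "), log=False)
    return store
```

Replaying the log text into a fresh store should yield the same `data` dict as the original, which is the property an append-only log provides after a restart.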

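The event loop processes commands serially. RDB: fork() creates a child that writes a point-in-time memory snapshot to disk, relying on copy-on-write, while the parent keeps serving clients. AOF: every write command is appended to a log, which is periodically rewritten to compact it. Replication: the master streams write commands to replicas asynchronously; replicas can serve reads, which may be stale. Cluster: slot = CRC16(key) % 16384; slots are distributed across N masters; clients cache the slot map and follow MOVED redirects when it changes; nodes use a gossip protocol for membership and failure detection.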
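
The slot formula can be sketched in a few lines. This is an illustrative version, assuming the CRC16 variant is XMODEM (polynomial 0x1021, initial value 0) and the hash-tag rule is "hash only the text between the first `{` and the next `}`, if non-empty". The function names are hypothetical, not a library API.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_tag_or_key(key: bytes) -> bytes:
    """Return the hash tag if the key has a non-empty one, else the whole key."""
    start = key.find(b"{")
    if start == -1:
        return key
    end = key.find(b"}", start + 1)
    if end == -1 or end == start + 1:
        # No closing brace, or empty tag "{}": hash the full key.
        return key
    return key[start + 1:end]


def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 cluster hash slots."""
    return crc16_xmodem(hash_tag_or_key(key)) % 16384
```

Under these assumptions, `{user1000}.following` and `{user1000}.followers` hash only `user1000` and therefore land in the same slot, which is what makes multi-key operations on them possible in cluster mode.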

When to use

In-memory cache, session store, leaderboards, pub/sub, distributed locks (carefully).

When not to

Datasets larger than available memory. Workloads that need strict durability (no acknowledged write ever lost) without a performance hit.

flowchart TB
  Client([Client]) --> EL[Single-Threaded Event Loop]
  EL --> Mem[(In-Memory Data)]
  Mem -.fork+save.-> RDB[(RDB Snapshot)]
  EL --> AOF[(AOF Append Log)]
  Master[Master] -.async stream.-> R1[Replica 1]
  Master -.async stream.-> R2[Replica 2]
  C1[Master · slots 0..5460] -.gossip.-> C2[Master · slots 5461..10922]
  C2 -.gossip.-> C3[Master · slots 10923..16383]

Key insights

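  • Single-threaded works because most commands are O(1) or O(log N) and operate on memory-resident data, so command execution never blocks on disk I/O.
  • A slow command (KEYS * on a big database) stalls every other client. Use SCAN to iterate incrementally.
  • AOF with appendfsync everysec is the durability/performance sweet spot: a crash loses at most about one second of writes.
  • Async replication means failover can lose seconds of writes. WAIT blocks until a given number of replicas acknowledge the write, which narrows the loss window but does not make Redis strongly consistent.
  • Cluster does not support transactions or multi-key commands across slots; pin related keys to one slot with hash tags, e.g. {tag}key.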
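
Two of the points above, SCAN instead of KEYS and WAIT after a write, can be sketched as small helpers. This assumes a client object shaped like redis-py's `Redis` class, where `scan(cursor=..., match=..., count=...)` returns a `(next_cursor, keys)` pair and `wait(num_replicas, timeout_ms)` returns the number of replicas that acknowledged. The helper names and default values are hypothetical.

```python
def scan_keys(client, pattern, batch_hint=1000):
    """Collect matching keys incrementally with SCAN instead of KEYS.

    Each SCAN call does a bounded amount of work, so other clients are
    served between calls. SCAN may return a key more than once, hence
    the set.
    """
    found = set()
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch_hint)
        found.update(keys)
        if cursor == 0:  # a cursor of 0 means the iteration is complete
            return found


def set_and_wait(client, key, value, replicas=1, timeout_ms=100):
    """Write, then block until `replicas` replicas acknowledge or time out.

    Returns True if enough replicas acknowledged. False does not mean the
    write was rolled back; the master has already applied it.
    """
    client.set(key, value)
    acked = client.wait(replicas, timeout_ms)
    return acked >= replicas
```

A False result from `set_and_wait` only signals that the write may be lost on failover; the caller decides whether to retry, alert, or accept the risk.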