Design Redis (Single-Threaded + Persistence + Cluster)
Single-threaded event loop, RDB+AOF persistence, replication, cluster mode. Meta E6+ infra round.
Theory
Explanation
Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.
Commands execute on a single thread, so the data structures need no locks and each command runs atomically. Persistence is optional: RDB snapshots, an AOF append-only log, or both. Replication is asynchronous leader-follower. Cluster mode shards the keyspace into 16384 hash slots.
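- Event loop: processes commands serially, one at a time.
- RDB: fork() creates a child process that writes a point-in-time memory snapshot to disk periodically; copy-on-write lets the parent keep serving commands.
- AOF: appends every write command to a log; the log is rewritten periodically to compact it.
- Replication: the master streams write commands to replicas asynchronously; replicas can serve (possibly stale) reads.
- Cluster: slot = CRC16(key) % 16384; slots are distributed across N masters; clients learn and cache the slot map; nodes use a gossip protocol for membership.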
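The AOF mechanism described above can be illustrated with a toy store. This is a minimal sketch, not how Redis implements it: `TinyStore`, its tab-separated log format, and the `rewrite` method are hypothetical names invented for the example, and it fsyncs on every write (closer in spirit to `appendfsync always` than to `everysec`). Keys and values are assumed to contain no tabs or newlines.

```python
import os
import tempfile
from typing import Optional


class TinyStore:
    """Toy key-value store: commands run serially and are appended to a log."""

    def __init__(self, aof_path: str) -> None:
        self.aof_path = aof_path
        self.data: dict = {}
        self._replay()  # rebuild in-memory state from the log, if one exists
        self.aof = open(aof_path, "a", encoding="utf-8")

    def _apply(self, op: str, key: str, value: Optional[str] = None) -> None:
        if op == "SET":
            self.data[key] = value
        elif op == "DEL":
            self.data.pop(key, None)

    def _replay(self) -> None:
        if not os.path.exists(self.aof_path):
            return
        with open(self.aof_path, encoding="utf-8") as f:
            for line in f:
                self._apply(*line.rstrip("\n").split("\t"))

    def execute(self, op: str, key: str, value: Optional[str] = None) -> None:
        """Apply one write in memory, then append it to the log and sync."""
        self._apply(op, key, value)
        record = "\t".join(p for p in (op, key, value) if p is not None)
        self.aof.write(record + "\n")
        self.aof.flush()
        os.fsync(self.aof.fileno())  # toy choice: sync every write

    def rewrite(self) -> None:
        """Compact the log: one SET per live key replaces the full history."""
        tmp_path = self.aof_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for key, value in self.data.items():
                f.write(f"SET\t{key}\t{value}\n")
            f.flush()
            os.fsync(f.fileno())
        self.aof.close()
        os.replace(tmp_path, self.aof_path)  # atomic swap of old log for new
        self.aof = open(self.aof_path, "a", encoding="utf-8")

    def close(self) -> None:
        self.aof.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.aof")
    store = TinyStore(path)
    store.execute("SET", "a", "1")
    store.execute("SET", "a", "2")
    store.close()
    recovered = TinyStore(path)  # simulates a restart: state comes from the log
    print(recovered.data)
    recovered.close()
```

Replaying the log on startup recovers the last written state, and `rewrite` shows why compaction helps: the history of overwrites collapses to one record per live key.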
When to use
In-memory cache, session store, leaderboards, pub/sub, distributed locks (carefully).
When not to
Datasets larger than memory. Workloads that need strict durability without a performance hit (appendfsync always syncs on every write and costs throughput).
flowchart TB
  Client([Client]) --> EL[Single-Threaded Event Loop]
  EL --> Mem[(In-Memory Data)]
  Mem -.fork+save.-> RDB[(RDB Snapshot)]
  EL --> AOF[(AOF Append Log)]
  Master[Master] -.async stream.-> R1[Replica 1]
  Master -.async stream.-> R2[Replica 2]
  C1[Master · slots 0..5460] -.gossip.-> C2[Master · slots 5461..10922]
  C2 -.gossip.-> C3[Master · slots 10923..16383]
Key insights
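- A single thread suffices because most commands are O(1) or O(log N) on memory-resident data, and socket I/O is non-blocking and multiplexed, so the thread never waits on a slow client.
- A slow command (KEYS * on a large database) stalls every other client. Use SCAN, which iterates incrementally with a cursor.
- AOF with appendfsync everysec is the usual durability/performance sweet spot: a crash loses at most about one second of writes.
- Async replication means a failover can lose seconds of writes that the replicas had not yet received. WAIT blocks until a given number of replicas acknowledge, which narrows the loss window but does not make Redis strongly consistent.
- Cluster does not support transactions or multi-key commands across slots. Pin related keys to one slot with hash tags ({tag}key): when a key contains a non-empty {...} section, only the text inside the braces is hashed.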
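The slot formula and the hash-tag rule can be sketched in a few lines. This assumes the CRC16 variant is CRC16-CCITT/XMODEM (polynomial 0x1021, initial value 0), as the Redis Cluster specification describes; the function names `crc16_xmodem` and `hash_slot` are made up for this example and are not part of any client library.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Map a key to one of 16384 slots, honoring a non-empty {hash tag}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only the first {...} pair counts, and only if it is non-empty.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


if __name__ == "__main__":
    # Both keys share the tag "user:42", so they land in the same slot
    # and can be used together in one transaction or multi-key command.
    print(hash_slot("{user:42}:cart"), hash_slot("{user:42}:profile"))
```

A real client would compute this once per command and route to whichever master owns the slot in its cached slot map; the bitwise loop here favors clarity over speed, whereas production implementations typically use a lookup table.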