
Design Facebook Live (Low-Latency Streaming + Live Comments)

Live video at scale + concurrent comments. Tests RTMP ingest + HLS/LL-HLS distribution + WebSocket fan-out.

hard · 4h · general · system-design

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

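Live streaming is a continuous flow from one broadcaster to millions of viewers. Serve video through a video CDN and run a separate WebSocket fan-out for comments. Classic HLS cannot deliver sub-second latency, because multi-second segments plus player buffering keep viewers several seconds behind live; LL-HLS brings glass-to-glass latency down to roughly 2s.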
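To see where a figure like ~2s can come from, the budget can be written as simple arithmetic. This is a rough sketch, not a measurement: `LatencyBudget`, the encode and network figures, and the choice of three buffered chunks are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """Rough glass-to-glass budget for LL-HLS; every default is an assumption."""
    part_ms: int               # LL-HLS chunk (partial segment) duration
    encode_ms: int = 300       # capture + encode + packaging (assumed)
    network_ms: int = 200      # ingest hop + CDN delivery (assumed)
    buffered_parts: int = 3    # chunks the player holds back before playing (assumed)

    def glass_to_glass_ms(self) -> int:
        # The player sits roughly `buffered_parts` chunks behind the live edge.
        return self.encode_ms + self.network_ms + self.buffered_parts * self.part_ms

    def part_requests_per_viewer_per_s(self) -> float:
        # One request per chunk per viewer, ignoring playlist reloads.
        return 1000 / self.part_ms


if __name__ == "__main__":
    for part_ms in (200, 400, 1000):
        budget = LatencyBudget(part_ms=part_ms)
        print(part_ms, budget.glass_to_glass_ms(), budget.part_requests_per_viewer_per_s())
```

Under these assumptions, halving the chunk duration cuts the buffered portion of the latency in half but doubles the per-viewer request rate the CDN has to absorb.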

The broadcaster pushes the stream via RTMP to the nearest POP. An edge transcoder splits it into a bitrate ladder. LL-HLS chunks (partial segments, 200-400ms) are published to the CDN, and the viewer's playlist updates with every chunk.

Comments run on a separate path: one WebSocket gateway per region, with each client subscribing to topic = live_id. A comment passes through a moderation classifier, is published to the topic, fans out via Redis pub/sub to the gateway workers, and is pushed to subscribers. Comments are rate-limited per user and per live.
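The comment write path (rate limit, inline moderation, publish to the topic) could be sketched as follows. This is a minimal in-process illustration: `SlidingWindowLimiter`, `CommentPipeline`, the `BLOCKLIST` check standing in for a fast classifier, and the `live:<id>` topic naming are hypothetical, and `publish` is an injected callback where a real deployment would call its pub/sub client.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_s` seconds for each key."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        events = self._events[key]
        # Discard timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_s:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


BLOCKLIST = {"spamword"}  # hypothetical stand-in for a fast inline classifier


def fast_moderation_ok(text: str) -> bool:
    return not any(word in BLOCKLIST for word in text.lower().split())


class CommentPipeline:
    def __init__(self, per_user: SlidingWindowLimiter, per_live: SlidingWindowLimiter,
                 publish: Callable[[str, dict], None]):
        self.per_user = per_user
        self.per_live = per_live
        self.publish = publish  # assumed wrapper around the pub/sub layer

    def submit(self, live_id: str, user_id: str, text: str) -> str:
        # Attempts count against the user's limit even if they are rejected later.
        if not self.per_user.allow(f"{live_id}:{user_id}"):
            return "rate_limited_user"
        if not self.per_live.allow(live_id):
            return "rate_limited_live"
        if not fast_moderation_ok(text):
            return "rejected"
        self.publish(f"live:{live_id}", {"user": user_id, "text": text})
        return "published"
```

Injecting the clock and the publish callback keeps the sketch testable without a running broker; the per-user key is scoped to the live so one user's activity in another stream does not count against them here.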

When to use

Live sports, concerts, gaming streams, live commerce.

When not to

VOD (a different optimization problem). Sub-second live (use WebRTC instead).

flowchart LR
  Broadcaster([Broadcaster]) -->|RTMP| Ingest[Ingest POP]
  Ingest --> Tcode[Live Transcoder]
  Tcode --> LLHLS[LL-HLS Packager]
  LLHLS --> CDN[Video CDN]
  Viewer([Viewer]) --> CDN
  Viewer --> WSGW[Comment WS Gateway]
  WSGW --> Pub[(Redis Pub/Sub · per live)]
  Pub --> WSGW
  Commenter([Commenter]) --> WSGW
  WSGW --> Mod{ML Moderation}

Key insights

LL-HLS chunk size trades off latency against request rate: smaller chunks lower latency, but every viewer fetches more chunks per second. 400ms chunks give ~1.5s latency.
Comment fan-out is N-to-millions per live. Use Redis pub/sub per topic with sharded gateway workers.
Moderation cannot block comment publish: run a fast classifier inline and a deep classifier asynchronously, deleting the comment after publish if it is flagged.
Concurrent viewers per stream are uneven: sports streams peak while most lives stay small. Auto-scale gateways.
Slow consumers must drop messages, never back-pressure the publisher.
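The last point, dropping for slow consumers instead of back-pressuring the publisher, could look like this inside a gateway worker. It is a minimal single-process sketch: `Subscriber`, `GatewayWorker`, and the drop-oldest policy are illustrative assumptions, and `on_message` stands in for whatever callback the pub/sub client invokes.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class Subscriber:
    """Per-connection outbound buffer that sheds load instead of blocking."""

    def __init__(self, max_pending: int):
        self.pending: Deque[str] = deque(maxlen=max_pending)
        self.dropped = 0

    def offer(self, message: str) -> None:
        # Never blocks: a full deque with maxlen discards its oldest entry on append.
        if len(self.pending) == self.pending.maxlen:
            self.dropped += 1
        self.pending.append(message)

    def drain(self) -> List[str]:
        # Called when the socket is writable; returns the messages to send.
        out = list(self.pending)
        self.pending.clear()
        return out


class GatewayWorker:
    """One worker's mapping from live_id to its locally connected subscribers."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, live_id: str, sub: Subscriber) -> None:
        self.topics[live_id].append(sub)

    def on_message(self, live_id: str, message: str) -> None:
        # Fan-out cost is bounded per subscriber, so one slow socket
        # cannot stall delivery to the others or to the publisher.
        for sub in self.topics.get(live_id, ()):
            sub.offer(message)
```

Dropping the oldest comments suits a live chat, where stale messages lose value quickly; a different product might prefer dropping the newest or disconnecting the client.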