Design Facebook Live (Low-Latency Streaming + Live Comments)
Live video at scale with concurrent comments. Tests RTMP ingest, HLS/LL-HLS distribution, and WebSocket fan-out.
Theory
Explanation
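Live streaming is a continuous flow from one broadcaster to millions of viewers. Serve video through a video CDN and run a separate WebSocket fan-out for comments. Classic HLS cannot deliver sub-second latency: multi-second segments, with several segments buffered by the player, typically leave viewers well over ten seconds behind live. LL-HLS brings glass-to-glass latency down to roughly 2s.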
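To make the latency gap concrete, here is a rough back-of-the-envelope model in Python. It is a sketch under stated assumptions, not a measurement: the 6s classic segment length, the 3-chunk player buffer, and the pipeline overheads are illustrative values, and `LatencyBudget` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    chunk_s: float        # duration of one media chunk (classic segment or LL-HLS chunk)
    buffered_chunks: int  # chunks the player holds before playback starts (assumed)
    pipeline_s: float     # encode + package + CDN + decode overhead (assumed)

    def glass_to_glass_s(self) -> float:
        # Latency is dominated by how much media the player buffers.
        return self.chunk_s * self.buffered_chunks + self.pipeline_s

    def media_requests_per_viewer_per_s(self) -> float:
        # One media request per chunk; playlist reloads come on top of this.
        return 1.0 / self.chunk_s


# Assumed, illustrative parameters.
classic = LatencyBudget(chunk_s=6.0, buffered_chunks=3, pipeline_s=1.0)
low_latency = LatencyBudget(chunk_s=0.4, buffered_chunks=3, pipeline_s=0.5)

for name, budget in (("classic HLS", classic), ("LL-HLS", low_latency)):
    print(
        f"{name}: ~{budget.glass_to_glass_s():.1f}s latency, "
        f"{budget.media_requests_per_viewer_per_s():.2f} media req/s per viewer"
    )
```

Under these assumptions classic HLS lands near 19s while 400ms chunks land under 2s, at the cost of about 2.5 media requests per second per viewer before counting playlist reloads.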
Video path: the broadcaster ingests via RTMP to the nearest POP. An edge transcoder produces the bitrate ladder, and the LL-HLS packager publishes chunks (200-400ms; "partial segments" in LL-HLS terms) to the CDN. The viewer's playlist updates with every chunk.
Comment path: each region runs a WebSocket gateway, and clients subscribe to a topic keyed by live_id. A comment passes through the moderation classifier, is published to the topic, fans out via Redis pub/sub to the gateway workers, and is pushed to subscribers. Comments are rate-limited per user and per live.
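The comment path can be sketched in a few lines of Python. This is a minimal in-memory stand-in, not a real deployment: `CommentBus` plays the role of Redis pub/sub with one topic per live_id, the blocklist stands in for the fast classifier, and the token-bucket rates are made-up values. Checking rate limits before moderation is also an assumption (it is the cheaper check); all names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class TokenBucket:
    """Allows `rate` events per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, then spend one token if available.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


BLOCKLIST = {"spam"}  # hypothetical stand-in for the fast inline classifier


def fast_classifier_ok(text: str) -> bool:
    return not any(word in BLOCKLIST for word in text.lower().split())


class CommentBus:
    """In-memory stand-in for Redis pub/sub, one topic per live_id."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)
        self.user_limits: Dict[str, TokenBucket] = {}
        self.live_limits: Dict[str, TokenBucket] = {}

    def subscribe(self, live_id: str, push: Callable[[str], None]) -> None:
        # `push` represents a gateway worker writing to a viewer's WebSocket.
        self.subscribers[live_id].append(push)

    def post(self, live_id: str, user_id: str, text: str, now: float) -> str:
        # Assumed limits: 1 comment/s per user, 100 comments/s per live.
        user = self.user_limits.setdefault(user_id, TokenBucket(1.0, 2.0))
        live = self.live_limits.setdefault(live_id, TokenBucket(100.0, 200.0))
        if not user.allow(now) or not live.allow(now):
            return "rate_limited"
        if not fast_classifier_ok(text):
            return "rejected"
        for push in self.subscribers[live_id]:  # fan-out to every subscriber
            push(text)
        return "published"


bus = CommentBus()
inbox: List[str] = []
bus.subscribe("live-1", inbox.append)
print(bus.post("live-1", "alice", "hello", now=0.0))
```

In a real system the `post` call would publish to Redis and each gateway worker would run the fan-out loop for its own connections; the sketch collapses both into one process to show the ordering of the steps.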
When to use
Live sports, concerts, gaming streams, live commerce.
When not to
VOD (it calls for different optimizations). Sub-second interactive live (use WebRTC instead).
flowchart LR
Broadcaster([Broadcaster]) -->|RTMP| Ingest[Ingest POP]
Ingest --> Tcode[Live Transcoder]
Tcode --> LLHLS[LL-HLS Packager]
LLHLS --> CDN[Video CDN]
Viewer([Viewer]) --> CDN
Viewer --> WSGW[Comment WS Gateway]
WSGW --> Pub[(Redis Pub/Sub · per live)]
Pub --> WSGW
Commenter([Commenter]) --> WSGW
WSGW --> Mod{ML Moderation}
Key insights
- LL-HLS chunk size trades latency against request rate: smaller chunks cut latency but multiply playlist and chunk requests. 400ms chunks give roughly 1.5s of latency.
- Comment fan-out is N-to-millions per live. Use Redis pub/sub per topic with sharded gateway workers.
- Moderation cannot block comment publish: run a fast classifier inline and a deep classifier asynchronously, deleting after publish when it flags a comment.
- Concurrent viewers per stream are uneven: sports streams peak, while most lives are small. Auto-scale gateways.
- Slow consumers must drop messages, never back-pressure the publisher.
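The last point, dropping for slow consumers instead of back-pressuring the publisher, can be illustrated with a bounded per-connection buffer. This is a minimal Python sketch: `ConnectionBuffer` and `fan_out` are hypothetical names, and drop-oldest is an assumed policy (the notes only say that messages must be dropped).

```python
from collections import deque
from typing import Deque, Iterable, List


class ConnectionBuffer:
    """Bounded outbound queue for one viewer connection."""

    def __init__(self, max_pending: int) -> None:
        # A deque with maxlen discards from the left when a full deque is appended to.
        self.pending: Deque[str] = deque(maxlen=max_pending)
        self.dropped = 0

    def offer(self, message: str) -> None:
        """Never blocks: when the buffer is full, the oldest message is dropped."""
        if len(self.pending) == self.pending.maxlen:
            self.dropped += 1
        self.pending.append(message)

    def drain(self, limit: int) -> List[str]:
        """Called when the socket is writable; returns up to `limit` messages."""
        out: List[str] = []
        while self.pending and len(out) < limit:
            out.append(self.pending.popleft())
        return out


def fan_out(message: str, connections: Iterable[ConnectionBuffer]) -> None:
    # The publisher's cost is O(connections) and independent of consumer speed.
    for conn in connections:
        conn.offer(message)


fast, slow = ConnectionBuffer(3), ConnectionBuffer(3)
for i in range(5):
    fan_out(f"comment-{i}", [fast, slow])
    fast.drain(10)  # the fast viewer keeps up; the slow one never drains

print(fast.dropped, slow.dropped, list(slow.pending))
```

The slow viewer ends up with only the most recent comments, which suits a live comment feed where stale messages have little value; the `dropped` counter gives the gateway a signal for metrics or for disconnecting chronically slow clients.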