Design Netflix Video Streaming End-to-End
Encoding pipeline → multi-bitrate manifest → Open Connect CDN → ABR playback → resume. Targets: sub-2s startup, 99.99% availability.
Theory
Explanation
A stack of layers, each optimized independently: encode once (offline), fan out to the edge (Open Connect), serve the manifest and segments adaptively, and let the client switch bitrates on the fly. Goals: under 2s startup, smooth playback, and 99.99% availability.
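One rough way to see why the 2s goal shapes the design is to add up the serial steps before the first frame. The sketch below is a simplified model: the step breakdown, the `decode_s` default, and all input figures are hypothetical. Real players overlap some of these steps (for example, decoding can start before a segment finishes downloading).

```python
def time_to_first_frame(
    manifest_rtt_s: float,   # control-plane round trip for auth + manifest; 0 if prefetched
    edge_connect_s: float,   # TCP/TLS setup to the edge cache
    segment_s: float,        # duration of the first segment, in seconds
    start_kbps: float,       # bitrate of the rung chosen for the first segment
    throughput_kbps: float,  # estimated client throughput
    decode_s: float = 0.1,   # assumed decode/render time for the first frame
) -> float:
    """Serial startup estimate: each step waits for the previous one to finish."""
    if throughput_kbps <= 0:
        raise ValueError("throughput must be positive")
    # Segment size in kilobits divided by throughput in kilobits/s gives download seconds.
    download_s = start_kbps * segment_s / throughput_kbps
    return manifest_rtt_s + edge_connect_s + download_s + decode_s


cold = time_to_first_frame(0.15, 0.10, 4.0, 3000, 20000)
aggressive = time_to_first_frame(0.15, 0.10, 4.0, 8000, 10000)
print(round(cold, 2), round(aggressive, 2))
```

With a cold manifest fetch, 4s segments, and a 3000 kbps first rung on a 20 Mbps link, the estimate is about 0.95s; starting at 8000 kbps on a 10 Mbps link pushes it to about 3.55s. Prefetching the manifest removes `manifest_rtt_s`, and starting on a conservative rung keeps the download term small.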
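Ingest: an encoder farm analyzes each title and produces a per-title bitrate ladder (per-title encoding optimization) rather than applying one fixed ladder to every title. Packaging: the renditions are segmented and described in HLS and DASH manifests. Distribution: segments are pushed to Open Connect Appliances (OCAs) installed inside ISP networks. Playback: the client requests the manifest from the control plane, which handles authentication, then fetches segments from the nearest OCA. The ABR algorithm picks a ladder rung based on measured throughput and switches rungs without rebuffering. Resume: the playback position is synced through a per-user state service.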
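The throughput-driven rung choice described above can be sketched as follows. This is a generic throughput-based rule, not Netflix's production ABR algorithm; the ladder values, the five-sample window, and the 0.8 safety factor are hypothetical choices for illustration.

```python
from statistics import harmonic_mean
from typing import Sequence

# Hypothetical bitrate ladder in kbps, lowest rung first.
LADDER_KBPS = [235, 560, 1050, 1750, 3000, 4300, 5800]


def estimate_throughput_kbps(samples_kbps: Sequence[float], window: int = 5) -> float:
    """Harmonic mean of the most recent segment-download throughputs.

    The harmonic mean is dominated by the slowest samples, so a brief
    burst of high throughput does not inflate the estimate much.
    """
    recent = list(samples_kbps)[-window:]
    if not recent:
        raise ValueError("need at least one throughput sample")
    return harmonic_mean(recent)


def pick_rung(ladder_kbps: Sequence[int], samples_kbps: Sequence[float], safety: float = 0.8) -> int:
    """Return the highest rung whose bitrate fits inside safety * estimated throughput."""
    budget = safety * estimate_throughput_kbps(samples_kbps)
    fitting = [rung for rung in ladder_kbps if rung <= budget]
    # If nothing fits, drop to the lowest rung rather than waiting for more bandwidth.
    return max(fitting) if fitting else min(ladder_kbps)
```

A player would call `pick_rung` before each segment request, so a new choice only takes effect at the next segment boundary and the segment already playing is never interrupted. For example, steady 4000 kbps samples give a 3200 kbps budget and select the 3000 kbps rung. Production players typically also weigh buffer occupancy, which this sketch ignores.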
When to use
Premium subscription video-on-demand (SVOD) platforms with pre-recorded catalogs.
When not to
Live streaming, which optimizes for end-to-end latency; the offline, encode-once pipeline above does not fit a feed that is still being produced.
flowchart LR
    Master[Master] --> Encode[Per-Title Encoder]
    Encode --> Ladder[Bitrate Ladder]
    Ladder --> Pkg[HLS+DASH Packager]
    Pkg --> Origin[(S3 Origin)]
    Origin --> OCA[Open Connect Appliances · in ISP]
    Client([Client]) --> Control[Control Plane · auth + manifest]
    Control --> Origin
    Client --> OCA
    Client -.ABR.-> OCA
    Resume[(Resume State)] -.cross-device.-> Client
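The Ladder → Pkg step in the diagram turns each rung into one entry in a manifest. A minimal sketch for the HLS side only, following the HLS master playlist format (RFC 8216): the ladder values and media-playlist paths are hypothetical, the DASH MPD is not shown, and the `CODECS` attribute, which the spec recommends, is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rung:
    bandwidth_bps: int  # peak bitrate in bits per second (HLS BANDWIDTH attribute)
    width: int
    height: int
    uri: str            # media playlist for this rung (hypothetical path)


def hls_master_playlist(ladder: Sequence[Rung]) -> str:
    """Render a minimal HLS master playlist with one variant per ladder rung."""
    lines = ["#EXTM3U"]
    # List variants from lowest to highest bitrate.
    for rung in sorted(ladder, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={rung.bandwidth_bps},"
            f"RESOLUTION={rung.width}x{rung.height}"
        )
        lines.append(rung.uri)
    return "\n".join(lines) + "\n"


ladder = [
    Rung(3_000_000, 1280, 720, "720p/index.m3u8"),
    Rung(560_000, 640, 360, "360p/index.m3u8"),
]
print(hls_master_playlist(ladder))
```

The client reads this playlist once, then requests segments for whichever variant its ABR logic selects; switching variants only changes which media playlist the next segment comes from.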
Key insights
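- Per-title encoding cuts bandwidth by roughly 20% on average at the same perceived quality.
- Sub-2s startup requires the manifest to be prefetched and the first segment to be cached at the edge.
- ABR switching happens segment by segment, so a bitrate change never blocks playback.
- Resume state must sync across devices; eventual consistency is acceptable as long as it converges within about 2s.
- 99.99% availability comes from multi-region redundancy plus chaos engineering that routinely injects failures to verify that failover actually works.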
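The 99.99% target and the multi-region point reduce to simple arithmetic. This sketch assumes a 365-day year and, for the multi-region case, that regions fail independently and failover is instantaneous; both are idealizations, and the per-region figures used below are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget implied by an availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR


def parallel_availability(per_region: float, regions: int) -> float:
    """Availability if the service is up whenever at least one region is up.

    Assumes independent region failures and instant failover, which is
    optimistic: correlated failures (bad deploys, shared dependencies) break it.
    """
    if regions < 1:
        raise ValueError("need at least one region")
    return 1 - (1 - per_region) ** regions


print(round(downtime_minutes_per_year(0.9999), 2))
print(round(parallel_availability(0.999, 2), 6))
```

At 99.99%, the yearly downtime budget is about 52.6 minutes. Two regions at a hypothetical 99.9% each reach 99.9999% on paper, but only if their failures really are independent; exercising failover continuously is one way to check that assumption before real users do.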