Design Netflix Encoding / Transcoding Pipeline
Per-title encoding, per-shot encoding, AV1, codec ladder, GPU-accelerated encoding, cost optimization.
Theory
Explanation
Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.
A static encoding ladder wastes bandwidth because content varies in complexity: an action movie needs more bits than a dialogue-heavy drama to reach the same perceived quality. Per-title encoding picks the ladder rungs based on each title's complexity. Per-shot encoding goes further and optimizes each shot independently. AV1 gives roughly 30% better compression, but costs at least 5x more compute to encode.
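The pipeline starts by ingesting the master. For per-title encoding, encode the title at many bitrate × resolution candidates, score each encode with VMAF, and keep the points on the convex hull of the rate-quality curve (the best quality at each bitrate). For per-shot encoding, split the video into shots with scene detection and make the same encoding decisions for each shot separately. Codec choice is a trade-off: H.264 plays everywhere, H.265 compresses better at the same quality, and AV1 compresses best but is the most expensive to encode. GPU encoders (NVENC, AMF) give roughly a 10x speedup. A job orchestrator distributes the encode tasks across thousands of workers.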
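The convex hull step can be sketched as follows. This is a minimal illustration, not Netflix's implementation: the `Encode` record, the `convex_hull_ladder` function, and the sample numbers in the usage note are all hypothetical, and the sketch assumes each candidate already carries a measured bitrate and VMAF score. It drops candidates that are dominated (higher bitrate but no better quality) and then keeps only the points on the upper convex hull of the rate-quality plane.

```python
from typing import List, NamedTuple


class Encode(NamedTuple):
    """One trial encode (hypothetical record for this sketch)."""
    bitrate_kbps: float
    height: int      # vertical resolution, e.g. 720
    vmaf: float      # measured VMAF score, 0-100


def _cross(o: Encode, a: Encode, b: Encode) -> float:
    """Cross product of vectors o->a and o->b in the (bitrate, VMAF) plane."""
    return ((a.bitrate_kbps - o.bitrate_kbps) * (b.vmaf - o.vmaf)
            - (a.vmaf - o.vmaf) * (b.bitrate_kbps - o.bitrate_kbps))


def convex_hull_ladder(candidates: List[Encode]) -> List[Encode]:
    """Return the candidates on the upper convex hull, ordered by bitrate."""
    # Sort by bitrate; at equal bitrate the higher-quality encode comes first.
    points = sorted(candidates, key=lambda e: (e.bitrate_kbps, -e.vmaf))
    hull: List[Encode] = []
    for p in points:
        # Dominated: costs at least as many bits without improving quality.
        if hull and p.vmaf <= hull[-1].vmaf:
            continue
        # Pop points that fall on or below the line from hull[-2] to p,
        # since they are not on the upper hull.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For example, given made-up candidates at (235 kbps, 240p, VMAF 40), (375, 360p, 55), (750, 480p, 70), (750, 360p, 62), (1050, 480p, 71), (1750, 720p, 80), (3000, 1080p, 90) and (3000, 720p, 88), the function keeps the 235, 375, 750, 1750 and 3000 kbps rungs. The 1050 kbps encode is dropped even though it is not dominated, because spending those bits buys less quality than the line between its hull neighbours.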
When to use
Large video catalogs where bandwidth cost far exceeds encoding cost.
When not to
Tiny catalogs, where a default static ladder is fine.
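The rule of thumb above comes down to a break-even calculation: extra encoding effort pays off once the bandwidth it saves, summed over all streamed hours, exceeds the one-time encode cost. A rough sketch follows; the function name, its parameters, and the prices in the usage note are assumptions for illustration, not figures from any real deployment.

```python
def break_even_hours(extra_encode_cost_usd: float,
                     baseline_bitrate_mbps: float,
                     bitrate_saving_fraction: float,
                     egress_usd_per_gb: float) -> float:
    """Streamed hours needed before bandwidth savings repay the extra encode cost.

    All inputs are hypothetical planning numbers supplied by the caller.
    """
    # Mbps -> GB per streamed hour: 3600 s/hour, 8 bits/byte, 1000 MB/GB.
    gb_per_hour = baseline_bitrate_mbps * 3600 / 8 / 1000
    saved_usd_per_hour = gb_per_hour * bitrate_saving_fraction * egress_usd_per_gb
    if saved_usd_per_hour <= 0:
        raise ValueError("no bandwidth savings, so the encode cost is never repaid")
    return extra_encode_cost_usd / saved_usd_per_hour
```

With made-up inputs of $500 extra encode cost, a 5 Mbps baseline, a 20% bitrate saving, and $0.01 per GB of egress, `break_even_hours(500, 5, 0.2, 0.01)` comes to about 111,000 streamed hours. A popular title crosses that quickly; a rarely watched title in a small catalog may never do so.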
flowchart LR
    Master[Master] --> Shot[Shot Detection]
    Shot --> Jobs[Encoding Jobs]
    Jobs --> Workers[Encoder Worker Farm · GPU]
    Workers --> Vmaf[VMAF Scorer]
    Vmaf --> Hull[Convex Hull Picker]
    Hull --> Ladder[Final Ladder]
    Ladder --> Pkg[Packager · HLS+DASH]
    Pkg --> Origin[Origin]
    Codec1[H.264] --> Workers
    Codec2[H.265] --> Workers
    Codec3[AV1] --> Workers
Key insights
- VMAF is the quality metric; it is perceptual, unlike PSNR.
- Per-title encoding saves ~20% bandwidth; per-shot encoding saves roughly another 10%.
- AV1 takes 5-50x more compute to encode than H.264; the cost is amortized over many viewers.
- GPU encoders sacrifice a small amount of quality (~1%) for ~10x speed; they are used mainly for live streams and less for VOD.
- Encoding budget: hundreds of GPU-hours per popular title across the ladder.
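To see where a budget of that size comes from, it helps to enumerate the work: every shot is encoded once per codec, resolution, and bitrate candidate. The sketch below is a simplified planning model. `EncodeJob`, `plan_jobs`, `estimate_compute_hours`, and the per-codec cost multipliers are hypothetical, and the model deliberately ignores the fact that higher resolutions cost more to encode.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class EncodeJob:
    """One unit of work for the orchestrator (hypothetical shape)."""
    shot_index: int
    codec: str
    height: int
    bitrate_kbps: int


def plan_jobs(num_shots: int,
              codecs: Sequence[str],
              heights: Sequence[int],
              bitrates_kbps: Sequence[int]) -> List[EncodeJob]:
    """Fan out one job per (shot, codec, resolution, bitrate) combination."""
    return [EncodeJob(s, c, h, b)
            for s, c, h, b in product(range(num_shots), codecs, heights, bitrates_kbps)]


def estimate_compute_hours(jobs: Sequence[EncodeJob],
                           shot_seconds: Sequence[float],
                           cost_multiplier: Dict[str, float],
                           baseline_speed: float) -> float:
    """Rough total compute hours.

    baseline_speed is seconds of compute per second of video for a codec
    whose multiplier is 1.0 (an assumed planning number, not a benchmark).
    """
    total_seconds = sum(
        shot_seconds[job.shot_index] * baseline_speed * cost_multiplier[job.codec]
        for job in jobs
    )
    return total_seconds / 3600
```

Even a toy input grows fast: 3 shots, 2 codecs, 2 resolutions and 3 bitrates already produce 36 jobs. A feature film with on the order of a thousand shots and a full candidate grid yields a very large number of independent jobs, which is why the work is spread across a worker farm.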