system design · system-design

Design Netflix Encoding / Transcoding Pipeline

Per-title encoding, per-shot encoding, AV1, codec ladder, GPU-accelerated encoding, cost optimization.

hard4hgeneralsystem-design
Ask GPTConfidence

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

Static encoding ladder wastes bandwidth (action movies need more bits than dialogues). Per-title encoding picks ladder rungs based on title complexity. Per-shot encoding goes further, each shot optimized independently. AV1 codec gives ~30% better compression at 5x encoding cost.

Ingest master. Per-title: encode at many bitrate × resolution candidates, score quality with VMAF, pick convex hull (best Q at each rate). Per-shot: split video into shots (scene detection), apply per-shot encoding decisions. Codec choice: H.264 universal, H.265 better quality, AV1 best compression but expensive. GPU encoders (NVENC, AMF) for 10x speedup. Job orchestrator distributes encode tasks across thousands of workers.

When to use

Large video catalogs where bandwidth >> encoding cost.

When not to

Tiny catalog (default ladder fine).

flowchart LR
  Master[Master] --> Shot[Shot Detection]
  Shot --> Jobs[Encoding Jobs]
  Jobs --> Workers[Encoder Worker Farm · GPU]
  Workers --> Vmaf[VMAF Scorer]
  Vmaf --> Hull[Convex Hull Picker]
  Hull --> Ladder[Final Ladder]
  Ladder --> Pkg[Packager · HLS+DASH]
  Pkg --> Origin[Origin]
  Codec1[H.264] --> Workers
  Codec2[H.265] --> Workers
  Codec3[AV1] --> Workers

Key insights

  • VMAF is the quality metric, perceptual, not PSNR.
  • Per-title saves ~20% bandwidth; per-shot another 10%.
  • AV1 takes 5-50x more compute than H.264, amortized over many viewers.
  • GPU encoders sacrifice ~1% quality for 10x speed, used for live, less for VOD.
  • Encoding budget: hundreds of GPU-hours per popular title across the ladder.