Design YouTube (Video Upload + Encoding + CDN)

A prototypical Google system design interview (SDI) question. It covers video upload, the encoding pipeline, CDN delivery, view counts, recommendations, and comments.

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

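This is the same content-pipeline-plus-CDN pattern as Netflix, but for user-generated content (UGC): anyone can upload, so the catalog is billions of videos, most of them in a low-traffic long tail. For that tail, optimizing storage takes priority over optimizing bandwidth. Distinctive features are near-real-time view counts, comments, and live streaming.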

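Upload: clients send chunked, resumable uploads (for example via the tus protocol) to an ingest service, which deduplicates by content hash. Transcoding: a transcode farm processes each upload asynchronously and emits a bitrate ladder, that is, the same video encoded at several resolutions and bitrates. Delivery: hot videos are pushed to edge POPs, while long-tail videos are served from a regional origin and fill the edge on demand. View counter: the client emits a beacon once playback passes a 30s threshold; beacons flow through Kafka to an aggregator, which updates counters in Bigtable. Comments: a NewSQL store sharded per video. Recommendation: two-tower retrieval followed by a ranker, with watch history feeding candidate generation.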
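
The upload step can be illustrated with a minimal in-memory sketch. It assumes the ingest service tracks a byte offset per upload (the same idea tus uses to resume after a dropped connection) and deduplicates finished uploads by SHA-256. `IngestService`, `upload`, and all method names are hypothetical and chosen only for illustration; a real service would stream chunks to object storage rather than hold them in memory.

```python
import hashlib


class IngestService:
    """Toy ingest service: resumable chunk appends plus content-hash dedup."""

    def __init__(self):
        self.sessions = {}  # upload_id -> bytearray of bytes received so far
        self.blobs = {}     # sha256 hex digest -> raw bytes (stored once)
        self.videos = {}    # video_id -> digest of its raw content

    def create_upload(self, upload_id):
        self.sessions[upload_id] = bytearray()

    def offset(self, upload_id):
        """How many bytes the server already holds; the client resumes from here."""
        return len(self.sessions[upload_id])

    def append_chunk(self, upload_id, offset, chunk):
        buf = self.sessions[upload_id]
        # Reject chunks that do not start exactly where the server left off.
        if offset != len(buf):
            raise ValueError(f"offset mismatch: expected {len(buf)}, got {offset}")
        buf.extend(chunk)
        return len(buf)

    def finalize(self, upload_id, video_id):
        """Hash the full upload; store the bytes only if the hash is unseen."""
        data = bytes(self.sessions.pop(upload_id))
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.videos[video_id] = digest
        return digest, is_new


def upload(service, upload_id, data, chunk_size=4):
    """Client loop: ask for the current offset, then send the remaining chunks."""
    pos = service.offset(upload_id)
    while pos < len(data):
        pos = service.append_chunk(upload_id, pos, data[pos:pos + chunk_size])
```

Calling `upload` again after a failure picks up from the server's offset instead of resending the whole file. When `finalize` returns `is_new` as `False`, the new `video_id` maps to the existing blob, so raw storage is not duplicated.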

When to use

UGC video at scale, education platforms, social video.

When not to

Premium SVOD (use Prime/Netflix model). Live-only (different pipeline).

flowchart LR
  Up[Uploader] --> Ingest[Chunked Upload]
  Ingest --> Raw[(Raw S3/GCS)]
  Raw --> Tcode[Transcoder Farm]
  Tcode --> Ladder[(Encoded Ladder)]
  Ladder --> CDN[Edge CDN]
  Viewer([Viewer]) --> CDN
  Viewer -.30s beacon.-> ViewQ[[View Beacons · Kafka]]
  ViewQ --> Aggr[Counter Aggregator]
  Aggr --> Counts[(View Counts · Bigtable)]
  Comments[(Comments DB)] --> CDN
  Reco{{Recommendation}} --> CDN
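
The beacon path in the diagram (viewer, queue, aggregator, counter store) can be sketched as a small consumer. This is a simplified illustration, not the production design: a plain Python list stands in for the Kafka topic, a dict stands in for Bigtable, and `ViewAggregator`, the beacon fields, and the `is_bot` hook are hypothetical names. Deduplicating on `(video_id, device_id)` forever is also an assumption; a real system would more likely dedup within a time window.

```python
from collections import defaultdict

WATCH_THRESHOLD_S = 30  # a beacon counts only after this much playback


class ViewAggregator:
    """Consumes view beacons and keeps per-video counts."""

    def __init__(self, is_bot=lambda beacon: False):
        self.is_bot = is_bot            # pluggable bot filter (assumed hook)
        self.seen = set()               # (video_id, device_id) already counted
        self.counts = defaultdict(int)  # video_id -> counted views

    def consume(self, beacon):
        """Return True if the beacon was counted as a view."""
        if beacon["watched_s"] < WATCH_THRESHOLD_S:
            return False
        if self.is_bot(beacon):
            return False
        key = (beacon["video_id"], beacon["device_id"])
        if key in self.seen:
            return False
        self.seen.add(key)
        self.counts[beacon["video_id"]] += 1
        return True


def drain(queue, aggregator):
    """Feed every queued beacon to the aggregator; return how many counted."""
    return sum(1 for beacon in queue if aggregator.consume(beacon))
```

Keeping the filter logic in the aggregator rather than the client means the rules for what counts as a view can change without shipping a new player.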

Key insights

  • View count is the most-falsified metric. Bot filter + dedup by viewer device + 30s threshold = "official" count.
  • Long-tail videos cannot all live on the edge; serve them through a pull-through cache backed by a colder regional tier.
  • Comments are write-heavy but per-video; shard by video_id, hot videos get more shards.
  • Recommendation candidate gen runs offline daily; online ranker scores top-1K candidates per request.
  • Live streaming reuses transcode farm but adds chunked HLS with 2-3s segments for low latency.
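
The pull-through idea for long-tail content can be sketched as a two-tier lookup: check the edge, then the regional tier, then the origin, filling the caches on the way back. This is a minimal sketch under stated assumptions: both tiers use LRU eviction, capacities are counted in segments rather than bytes, and `LRUCache`, `PullThroughCDN`, and `origin_fetch` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the oldest entry


class PullThroughCDN:
    """Edge cache in front of a regional cache in front of the origin."""

    def __init__(self, origin_fetch, edge_capacity, regional_capacity):
        self.origin_fetch = origin_fetch
        self.edge = LRUCache(edge_capacity)
        self.regional = LRUCache(regional_capacity)
        self.origin_hits = 0

    def get(self, segment_key):
        """Return (data, tier) where tier names the layer that served it."""
        data = self.edge.get(segment_key)
        if data is not None:
            return data, "edge"
        data = self.regional.get(segment_key)
        if data is not None:
            self.edge.put(segment_key, data)  # fill the edge on the way back
            return data, "regional"
        data = self.origin_fetch(segment_key)
        self.origin_hits += 1
        self.regional.put(segment_key, data)
        self.edge.put(segment_key, data)
        return data, "origin"
```

With a small edge and a larger regional tier, a video that falls out of the edge is still served without touching the origin, which is the property the long-tail bullet relies on.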