
Design Tesla Fleet Telemetry Pipeline

Vehicle → edge → cloud, millions of vehicles, sensor data compression, hot/cold tiering, anomaly detection. Tesla #1 SDI.

expert · 5h · cpp · general · kafka · system-design

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

5M vehicles, each emitting hundreds of signals per second, produce tens of trillions of data points per day or more. The edge compresses and aggregates; the cellular link ships compact events; the cloud stores time-series data plus raw blobs for replay.
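
A quick back-of-envelope check of that volume, as a sketch. The 100 signals/second figure takes the low end of "100s"; the 8 bytes per raw point and the 100x edge reduction factor are assumptions chosen for illustration, not measured values.

```python
# Back-of-envelope fleet volume. All inputs are illustrative assumptions.
VEHICLES = 5_000_000          # fleet size from the text
SIGNALS_PER_SEC = 100         # low end of "100s of signals/second"
SECONDS_PER_DAY = 86_400
BYTES_PER_RAW_POINT = 8       # assumed: one 64-bit value, no framing overhead
EDGE_REDUCTION = 100          # assumed compression + downsampling factor

points_per_day = VEHICLES * SIGNALS_PER_SEC * SECONDS_PER_DAY
raw_bytes_per_day = points_per_day * BYTES_PER_RAW_POINT
uplink_bytes_per_day = raw_bytes_per_day // EDGE_REDUCTION
per_vehicle_uplink_mb = uplink_bytes_per_day / VEHICLES / 1e6

print(f"{points_per_day:.2e} points/day")
print(f"{raw_bytes_per_day / 1e12:.1f} TB/day raw")
print(f"{uplink_bytes_per_day / 1e12:.2f} TB/day after edge reduction")
print(f"{per_vehicle_uplink_mb:.2f} MB/day per vehicle")
```

Under these assumptions the fleet produces about 4.3 × 10^13 points per day, roughly 346 TB raw, which falls to about 3.5 TB fleet-wide (under 1 MB per vehicle) after edge reduction. The calculation assumes every vehicle emits around the clock, so it is an upper bound for the chosen signal rate.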

A per-vehicle agent collects sensor channels, downsamples and delta-encodes them, and batches every 30 s. Batches upload over cellular through the vehicle's Tesla-provisioned SIM and enter the cloud through a Pulsar or Kafka ingest layer. Time-series points go to an InfluxDB-like time-series store; heavy raw blobs (camera clips and other raw sensor captures) go to an object store. An anomaly detector consumes the stream and alerts when patterns deviate. Engineers query across the fleet through a metadata search.
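
The edge-side step (downsample, delta-encode, batch) can be sketched as follows. This is a minimal illustration, not Tesla's agent: the `Batch` fields, the function names, the integer-scaled samples, and the window-average downsampling are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


def downsample(samples: List[int], factor: int) -> List[int]:
    """Keep one value per `factor` samples: the floor of the window mean."""
    return [
        sum(chunk) // len(chunk)
        for chunk in (samples[i:i + factor] for i in range(0, len(samples), factor))
    ]


def delta_encode(values: List[int]) -> List[int]:
    """First value as-is, then the difference from each previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Inverse of delta_encode: a running sum restores the values."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


@dataclass
class Batch:
    """Hypothetical upload unit for one channel of one vehicle."""
    vehicle_id: str
    channel: str
    start_ts: int      # epoch seconds of the first downsampled value
    step_s: float      # spacing between downsampled values, in seconds
    deltas: List[int]


def build_batch(vehicle_id: str, channel: str, start_ts: int,
                samples: List[int], sample_hz: int, factor: int) -> Batch:
    reduced = downsample(samples, factor)
    return Batch(vehicle_id, channel, start_ts, factor / sample_hz,
                 delta_encode(reduced))


# Speed in 0.1 km/h units, sampled at 10 Hz, reduced to 5 Hz.
batch = build_batch("veh-1", "speed_dkmh", 1_700_000_000,
                    [1000, 1002, 1004, 1006, 1010, 1010, 1012, 1014],
                    sample_hz=10, factor=2)
print(batch.deltas)                 # [1001, 4, 5, 3]
print(delta_decode(batch.deltas))   # [1001, 1005, 1010, 1013]
```

Downsampling here is lossy, while the delta step is lossless. The payoff of delta encoding is that slowly changing signals turn into small integers, which a variable-length integer encoding or a general-purpose compressor can then pack tightly.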

When to use

Connected-vehicle fleets, drone fleets, industrial IoT.

When not to

Fleets of fewer than about 100 devices, where a single process holding state in memory is enough.

flowchart LR
  Vehicle[Vehicle Compute · FSD Computer] --> Agent[Telemetry Agent]
  Agent --> Compress[Delta + Downsample]
  Compress -->|cellular| Ingest[Cloud Ingest · Pulsar]
  Ingest --> TS[(Time-Series Store)]
  Ingest --> Blob[(Raw Blob · S3 tier)]
  Ingest --> Anom[Anomaly Detector]
  Anom --> Alert[Alerts → Engineers]
  Eng([Engineer]) --> Query[Fleet Query API]
  Query --> TS
  Query --> Blob
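
The fan-out after ingest can be sketched as a small router. The message shape (`kind`, `vehicle_id`, `clip_id`, `payload`) and the in-memory containers standing in for the time-series store and the object store are hypothetical; a real consumer would read from a Pulsar or Kafka topic and write to actual storage services.

```python
from typing import Dict, List

time_series_store: List[Dict] = []   # stand-in for the time-series DB
blob_store: Dict[str, bytes] = {}    # stand-in for the object store


def route(message: Dict) -> str:
    """Send signal batches to the time-series store and clips to blob storage."""
    kind = message["kind"]
    if kind == "signal_batch":
        time_series_store.append(message)
        return "time-series"
    if kind == "raw_clip":
        # Object key layout is an assumption: one prefix per vehicle.
        key = f"{message['vehicle_id']}/{message['clip_id']}"
        blob_store[key] = message["payload"]
        return "blob"
    raise ValueError(f"unknown message kind: {kind!r}")


print(route({"kind": "signal_batch", "vehicle_id": "veh-1", "deltas": [1001, 4]}))
print(route({"kind": "raw_clip", "vehicle_id": "veh-1", "clip_id": "c42",
             "payload": b"\x00\x01"}))
```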

Key insights

  • Edge compression is the main bandwidth lever: sending deltas and downsampled values can cut upload volume by roughly 100x.
  • The time-series DB is optimized for heavy writes and tiered retention (1 year hot, 5 years cold).
  • Raw clips upload only on a trigger (anomaly, accident, manual flag), which is far cheaper than always-on upload.
  • Anomaly detection runs on the stream, combining explicit thresholds with ML scoring per signal.
  • Fleet queries need a metadata index (location, weather, firmware version) for filtering.
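
The streaming anomaly check and the trigger-gated clip upload from the bullets above can be sketched together. This is an illustration under stated assumptions: a rolling z-score stands in for the ML scoring, and the class name, window size, warm-up count, and limits are all hypothetical.

```python
import math
from collections import deque


class SignalDetector:
    """Per-signal streaming check: hard limits first, then a rolling z-score."""

    def __init__(self, low, high, window=60, z_limit=4.0, warmup=10):
        self.low, self.high = low, high
        self.z_limit = z_limit
        self.warmup = warmup                  # samples needed before scoring
        self.history = deque(maxlen=window)   # recent normal values only

    def check(self, value):
        """Return "threshold" or "zscore" if the value is anomalous, else None."""
        reason = None
        if not (self.low <= value <= self.high):
            reason = "threshold"
        elif len(self.history) >= self.warmup:
            mean = sum(self.history) / len(self.history)
            var = sum((x - mean) ** 2 for x in self.history) / len(self.history)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.z_limit:
                reason = "zscore"
        # Anomalous values are kept out of the baseline so they do not skew it.
        if reason is None:
            self.history.append(value)
        return reason


def should_upload_clip(anomaly_reason, accident=False, manual_flag=False):
    """Raw clips are uploaded only when one of the three triggers fires."""
    return anomaly_reason is not None or accident or manual_flag


# Battery temperature in °C with assumed hard limits of -20 and 60.
detector = SignalDetector(low=-20.0, high=60.0)
for i in range(20):
    detector.check(30.0 if i % 2 == 0 else 31.0)   # build a baseline

print(detector.check(35.0))   # zscore: far from the 30.5 baseline
print(detector.check(75.0))   # threshold: above the hard limit
print(detector.check(30.5))   # None: normal reading
```

Checking the hard limits first keeps the safety-critical rule simple and auditable, while the statistical score catches values that are inside the limits but unusual for that signal.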