Design Tesla Fleet Telemetry Pipeline
A vehicle → edge → cloud pipeline for millions of vehicles: sensor-data compression, hot/cold tiering, and anomaly detection. Tesla #1 SDI.
Theory
Explanation
Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.
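A fleet of 5M vehicles, each emitting hundreds of signals per second, produces tens of trillions of data points per day (5M × 100 × 86,400 s ≈ 4.3 × 10^13 at the low end, assuming every vehicle reports around the clock). The edge compresses and aggregates, the cellular link ships compact events, and the cloud stores time-series data plus raw blobs for replay.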
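The estimate above can be checked, and extended from points to bytes, with a few lines of arithmetic. This is a rough sketch: the 8-byte sample size is an assumption not stated in the text, the 100x ratio is borrowed from the edge-compression insight below, and treating every vehicle as reporting 24 hours a day makes the result an upper bound.

```python
# Back-of-envelope fleet throughput. Inputs marked "assumed" are illustrative only.
VEHICLES = 5_000_000        # fleet size from the text
SIGNALS_PER_SECOND = 100    # low end of "hundreds of signals per second"
SECONDS_PER_DAY = 86_400
BYTES_PER_POINT = 8         # assumed: one raw 8-byte sample, no timestamp or framing
COMPRESSION_RATIO = 100     # assumed: the ~100x edge reduction cited in the key insights


def points_per_day(vehicles: int, signals_per_second: int) -> int:
    """Upper bound: every vehicle reports around the clock."""
    return vehicles * signals_per_second * SECONDS_PER_DAY


points = points_per_day(VEHICLES, SIGNALS_PER_SECOND)
raw_bytes = points * BYTES_PER_POINT
shipped_bytes = raw_bytes // COMPRESSION_RATIO

print(f"{points:.2e} points/day")                    # 4.32e+13 points/day
print(f"{raw_bytes / 1e12:.1f} TB/day before edge")  # 345.6 TB/day before edge
print(f"{shipped_bytes / 1e12:.1f} TB/day shipped")  # 3.5 TB/day shipped
```

Even under these generous assumptions, the gap between hundreds of terabytes and a few terabytes per day is what makes edge-side reduction the first design decision rather than an optimization.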
A per-vehicle agent collects sensor channels, downsamples and delta-encodes them, and batches the result every 30 s. Batches upload over cellular via a Tesla-operated SIM and enter the cloud through Pulsar or Kafka. From there, time-series points go to an InfluxDB-like time-series store, while heavy raw blobs (camera/LiDAR clips) go to an object store. An anomaly detector consumes the stream and alerts when patterns deviate. Engineers query across the fleet via metadata search.
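The edge step (downsample, delta-encode, batch) can be sketched for a single channel as follows. The names `ChannelEncoder`, `keep_every`, and `decode` are hypothetical, the channel is assumed to carry integer values, and the 30 s timer is left to the caller, who would invoke `flush()` on each tick. It illustrates the idea only and is not a description of any real agent.

```python
from typing import List, Optional


class ChannelEncoder:
    """Downsample, then delta-encode, one integer-valued sensor channel."""

    def __init__(self, keep_every: int = 10) -> None:
        self.keep_every = keep_every      # assumed factor: keep 1 of every N samples
        self._seen = 0                    # samples observed so far
        self._last: Optional[int] = None  # last kept value in the current batch
        self._batch: List[int] = []

    def push(self, value: int) -> None:
        keep = self._seen % self.keep_every == 0
        self._seen += 1
        if not keep:
            return
        # First kept sample of a batch is stored in full; the rest as deltas.
        self._batch.append(value if self._last is None else value - self._last)
        self._last = value

    def flush(self) -> List[int]:
        """Return the current batch and start a new self-contained one."""
        batch, self._batch, self._last = self._batch, [], None
        return batch


def decode(batch: List[int]) -> List[int]:
    """Cloud-side inverse: rebuild the kept samples from a flushed batch."""
    values: List[int] = []
    for i, item in enumerate(batch):
        values.append(item if i == 0 else values[-1] + item)
    return values


encoder = ChannelEncoder(keep_every=2)
for sample in [1000, 1001, 1003, 1002, 1005, 1007, 1010, 1010]:
    encoder.push(sample)
payload = encoder.flush()
print(payload)          # [1000, 3, 2, 5]
print(decode(payload))  # [1000, 1003, 1005, 1010]
```

Each batch starts with an absolute value, so a lost upload does not corrupt later batches. The deltas are small integers, which a variable-length or entropy coder (not shown here) would pack far more tightly than the raw readings.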
When to use
Connected-vehicle fleets, drone fleets, industrial IoT.
When not to
Fleets of fewer than 100 devices, where in-memory processing is enough.
flowchart LR
  Vehicle[Vehicle Compute · NVIDIA Orin] --> Agent[Telemetry Agent]
  Agent --> Compress[Delta + Downsample]
  Compress -->|cellular| Ingest[Cloud Ingest · Pulsar]
  Ingest --> TS[(Time-Series Store)]
  Ingest --> Blob[(Raw Blob · S3 tier)]
  TS --> Anom[Anomaly Detector]
  Anom --> Alert[Alerts → Engineers]
  Eng([Engineer]) --> Query[Fleet Query API]
  Query --> TS
  Query --> Blob
Key insights
- Edge compression is the main bandwidth lever: sending deltas and downsampling can cut volume by roughly 100x.
- The time-series DB is optimized for write-heavy ingest and tiered retention (1 year hot, 5 years cold).
- Raw clips are uploaded only on a trigger (anomaly, accident, manual flag), which is far cheaper than always-on upload.
- Anomaly detection runs on the stream, combining explicit thresholds with ML scoring per signal.
- Fleet queries need a metadata index (location, weather, firmware version) for filtering.
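The per-signal combination of explicit thresholds and a learned score can be sketched as a small streaming check. A rolling z-score stands in for the ML scoring here, which is a substitution and not the method the notes describe. The class name, window size, warm-up length, and limits are all assumed values chosen for illustration.

```python
import math
from collections import deque
from typing import Deque, List


class SignalAnomalyDetector:
    """Per-signal streaming check: hard limits plus a rolling z-score."""

    def __init__(self, low: float, high: float,
                 window: int = 60, z_limit: float = 4.0, warmup: int = 10) -> None:
        self.low, self.high = low, high   # explicit thresholds (assumed values)
        self.z_limit = z_limit            # assumed cutoff for the statistical score
        self.warmup = warmup              # samples required before scoring starts
        self.history: Deque[float] = deque(maxlen=window)

    def check(self, value: float) -> List[str]:
        """Return the reasons this sample looks anomalous (empty if none)."""
        reasons: List[str] = []
        if not (self.low <= value <= self.high):
            reasons.append("threshold")
        if len(self.history) >= self.warmup:
            mean = sum(self.history) / len(self.history)
            var = sum((x - mean) ** 2 for x in self.history) / len(self.history)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.z_limit:
                reasons.append("zscore")
        self.history.append(value)        # score first, then update the window
        return reasons


detector = SignalAnomalyDetector(low=0, high=100, window=20)
baseline = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50] * 2
for reading in baseline:
    detector.check(reading)
print(detector.check(80))   # within limits, but far from the recent pattern
```

A reading of 80 sits inside the hard limits yet far outside the recent pattern, which is the case a fixed threshold alone would miss. In the design above, a non-empty result is also the kind of trigger that would request a raw clip upload.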