Design Amazon Recommendation System
Candidate generation → ranking → re-ranking pipeline. Feature stores, A/B testing, real-time inference. Amazon signature SDI.
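A minimal sketch of the three stages, assuming in-memory items with precomputed embeddings. Brute-force dot-product scoring stands in for an ANN index, the `ranker` callable stands in for a learned ranking model, and the per-category cap is one hypothetical re-ranking rule. All names are illustrative, not Amazon's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    embedding: tuple  # precomputed item vector (assumed)

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(user_vec, catalog, k):
    # Stage 1: cheap, high-recall retrieval. Brute force here; an ANN index in practice.
    return sorted(catalog, key=lambda it: dot(user_vec, it.embedding), reverse=True)[:k]

def rank(user_vec, candidates, ranker: Callable):
    # Stage 2: a heavier model scores only the shortlist.
    return sorted(candidates, key=lambda it: ranker(user_vec, it), reverse=True)

def rerank(ranked, n, max_per_category):
    # Stage 3: business rules; here, a hypothetical per-category diversity cap.
    out, counts = [], {}
    for it in ranked:
        if counts.get(it.category, 0) < max_per_category:
            out.append(it)
            counts[it.category] = counts.get(it.category, 0) + 1
        if len(out) == n:
            break
    return out

def recommend(user_vec, catalog, ranker, k=100, n=10, max_per_category=2):
    shortlist = generate_candidates(user_vec, catalog, k)
    return rerank(rank(user_vec, shortlist, ranker), n, max_per_category)
```

Each stage narrows the set, so the expensive ranker only scores `k` items instead of the full catalog.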
Transformers, LLM internals, RAG, MLOps, model serving, evaluation, agent architectures. NVIDIA-flavored loops add CUDA depth.
Primary categories: ml-ai · system-design
Vehicle → edge → cloud, sensor data compression, Kafka/Pulsar, hot/cold tiering, retraining loop.
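One way a hot/cold tiering rule might be expressed, assuming a hypothetical `flagged_for_training` bit set by the retraining loop and an age-based hot window. Real policies would also weigh access frequency and storage cost.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    segment_id: str
    age_days: float
    flagged_for_training: bool  # hypothetical flag set by the retraining loop

def assign_tier(seg: Segment, hot_window_days: float = 7.0) -> str:
    # Recent data and data the retraining loop still needs stay on fast storage.
    if seg.flagged_for_training or seg.age_days <= hot_window_days:
        return "hot"
    return "cold"  # e.g., compressed object storage

def partition(segments, hot_window_days: float = 7.0):
    tiers = {"hot": [], "cold": []}
    for seg in segments:
        tiers[assign_tier(seg, hot_window_days)].append(seg.segment_id)
    return tiers
```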
Triton Inference Server, dynamic batching, KV-cache management, tensor parallelism, paged attention.
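A minimal sketch of the block-table bookkeeping behind paged attention (the approach popularized by vLLM), assuming fixed-size blocks and one token appended per decode step. It only tracks slot allocation; the attention kernel that reads through the block table is out of scope, and the class name is hypothetical.

```python
class PagedKVCache:
    """Maps each sequence's logical token positions onto fixed-size physical blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> tuple:
        """Reserve a slot for one new token; returns (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # last block is full, or none allocated yet
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; scheduler must preempt a sequence")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: int) -> None:
        # Finished or preempted sequences return their blocks to the pool.
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because memory is allocated one block at a time, a sequence never reserves space for its maximum length up front, which is what lets more sequences share a batch.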
Most GPU kernels are bound by memory bandwidth rather than compute, which makes shared memory and coalesced global-memory access the two highest-leverage CUDA optimizations. Required depth at NVIDIA and Tesla Autopilot.
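A toy model of coalescing, under the simplifying assumption that each distinct 128-byte segment touched by a 32-thread warp costs one memory transaction (real hardware tracks 32-byte sectors, but the ratio between access patterns is the point). It contrasts unit-stride access with the strided access typical of reading a column of a row-major matrix; shared-memory tiling is the usual way kernels turn the latter into the former.

```python
WARP_SIZE = 32
SEGMENT_BYTES = 128  # simplifying assumption: one transaction per 128-byte segment touched

def warp_addresses(base: int, stride_elems: int, elem_bytes: int = 4) -> list:
    # Byte address read by each lane when consecutive lanes are stride_elems elements apart.
    return [base + lane * stride_elems * elem_bytes for lane in range(WARP_SIZE)]

def transactions(addresses) -> int:
    # Count distinct aligned segments the warp touches.
    return len({addr // SEGMENT_BYTES for addr in addresses})
```

Under this model, unit-stride float loads touch 1 segment, while a stride of 32 floats touches 32, a 32x difference in memory traffic for the same useful data.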
Auto-labeling, shadow mode, rare-event mining, training data curation, reprocessing.
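A sketch of two triggers for pulling data off the fleet, assuming frames arrive as dicts with hypothetical keys (`frame_id`, `features`, `driver_steering`, `scene_tag`). Shadow mode here means running a candidate model passively and flagging frames where its output diverges from the human driver; rare-event mining keeps frames whose scene tag is infrequent in the batch.

```python
from collections import Counter

def shadow_triggers(frames, shadow_model, disagree_threshold):
    # Flag frames where the passive model's steering diverges from the human's.
    return [f["frame_id"] for f in frames
            if abs(shadow_model(f["features"]) - f["driver_steering"]) > disagree_threshold]

def mine_rare(frames, max_tag_frequency):
    # Keep frames whose scene tag makes up at most max_tag_frequency of the batch.
    counts = Counter(f["scene_tag"] for f in frames)
    total = len(frames)
    return [f["frame_id"] for f in frames
            if counts[f["scene_tag"]] / total <= max_tag_frequency]
```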
Tensor + pipeline + data parallelism, NCCL all-reduce, FP8/BF16, ZeRO/FSDP, checkpointing. NVIDIA signature SDI.
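To make the all-reduce step concrete, here is a pure-Python simulation of ring all-reduce (reduce-scatter followed by all-gather), one of the algorithms NCCL implements. It assumes the buffer length divides evenly by the number of ranks and models each step's sends as simultaneous. Each rank sends 2(N-1)/N of its buffer in total, which is why the ring is bandwidth-optimal for large messages.

```python
def ring_allreduce(buffers):
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "sketch assumes evenly divisible chunks"
    c = size // n
    bufs = [list(b) for b in buffers]

    def chunk(rank, idx):
        return bufs[rank][idx * c:(idx + 1) * c]  # slice is a copy

    # Reduce-scatter: after n-1 steps, rank r owns the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        msgs = [((r + 1) % n, (r - step) % n) for r in range(n)]
        payloads = [chunk(r, idx) for r, (_, idx) in enumerate(msgs)]
        for (dst, idx), data in zip(msgs, payloads):
            for j, v in enumerate(data):
                bufs[dst][idx * c + j] += v

    # All-gather: circulate the reduced chunks so every rank has all of them.
    for step in range(n - 1):
        msgs = [((r + 1) % n, (r + 1 - step) % n) for r in range(n)]
        payloads = [chunk(r, idx) for r, (_, idx) in enumerate(msgs)]
        for (dst, idx), data in zip(msgs, payloads):
            bufs[dst][idx * c:(idx + 1) * c] = data
    return bufs
```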
Per-profile row construction, ranking, freshness, real-time signal incorporation. 2025 Netflix-reported prompt.
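A sketch of per-profile row assembly, assuming each row already has scored candidates as hypothetical `(title_id, relevance, age_days)` tuples. Freshness is modeled as exponential decay with a half-life, and a title shown in an earlier row is skipped in later ones; both choices are illustrative rather than Netflix's method.

```python
def freshness_weight(age_days: float, half_life_days: float) -> float:
    # Weight halves every half_life_days.
    return 0.5 ** (age_days / half_life_days)

def build_rows(row_candidates, row_size, half_life_days=30.0):
    """row_candidates: list of (row_name, [(title_id, relevance, age_days), ...]) in page order."""
    seen = set()
    page = []
    for row_name, candidates in row_candidates:
        scored = sorted(candidates,
                        key=lambda c: c[1] * freshness_weight(c[2], half_life_days),
                        reverse=True)
        row = []
        for title_id, _, _ in scored:
            if title_id not in seen:  # de-duplicate across rows on the same page
                row.append(title_id)
                seen.add(title_id)
            if len(row) == row_size:
                break
        page.append((row_name, row))
    return page
```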
Bin packing, MIG slicing, gang scheduling, K8s GPU operator, preemption, QoS. NVIDIA signature SDI.
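First-fit decreasing is a standard bin-packing heuristic. A minimal sketch applied to GPU placement, assuming single-node jobs that request whole GPUs; gang scheduling of multi-node jobs, MIG slices, and preemption are left out, and an unplaced job is simply marked pending.

```python
def first_fit_decreasing(jobs, num_nodes, gpus_per_node):
    """jobs: dict job_id -> whole GPUs requested (single-node jobs only)."""
    free = [gpus_per_node] * num_nodes
    placement = {}
    # Placing the largest requests first reduces fragmentation.
    for job_id, need in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        for node, avail in enumerate(free):
            if need <= avail:
                free[node] -= need
                placement[job_id] = node
                break
        else:
            placement[job_id] = None  # pending: would wait or trigger preemption
    return placement, free
```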
Checkpointing cadence, replica resurrection, NCCL recovery, elastic training. Keeping long-running training jobs alive through hardware and node failures.
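For checkpointing cadence, Young's first-order approximation gives the interval that balances checkpoint overhead against expected lost work: T ≈ sqrt(2 · C · M), with C the time to write a checkpoint and M the mean time between failures. A sketch, assuming independent, exponentially distributed node failures so that cluster MTBF scales as node MTBF divided by node count; the inputs below are illustrative.

```python
import math

def cluster_mtbf(node_mtbf_s: float, num_nodes: int) -> float:
    # With independent exponential failures, the any-node failure rate is N times the per-node rate.
    return node_mtbf_s / num_nodes

def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    # Young's approximation: T_opt ~= sqrt(2 * C * M).
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)
```

For example, a 10,000-hour node MTBF across 1,000 nodes gives a 10-hour cluster MTBF; with a 60-second checkpoint write, the suggested interval is about 2,078 seconds (roughly 35 minutes).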
Offline candidate generation + online ranking, "row-of-rows" homepage, A/B testing infra, personalization signals.
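A/B testing infra typically needs deterministic assignment so a user sees the same variant on every request. A minimal sketch using a salted SHA-256 hash, assuming string user IDs and a treatment share given in percent; the function name and the 10,000-bucket granularity are illustrative choices.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_pct: float) -> str:
    # Salting with the experiment name decorrelates assignments across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 10,000 buckets -> 0.01% granularity
    return "treatment" if bucket < treatment_pct * 100 else "control"
```

Because assignment is a pure function of user and experiment, no assignment table has to be stored or looked up at serving time.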
Structured story format (STAR: Situation, Task, Action, Result) used across Mag7 behavioral rounds. Google extends it to STAR-L (adding Learnings). Amazon expects one Leadership Principle (LP) per story. Netflix tunes stories to its culture pillars.