Adaptive Recommenders in the Real World: Inference, Evals, and System Design

Modern personalization systems are shifting from hand-tuned heuristics to AI-native architectures, but building an adaptive recommendation engine in production—one that continuously learns, evolves, and delivers measurable business value—requires far more than deploying a model. It requires coordinated design across candidate generation, ranking, inference, evaluation, and feedback loops, all operating under real-world latency, cost, and reliability constraints.

In this talk, I share how we built and scaled an adaptive, AI-native recommendation system powering high-visibility content discovery surfaces. We’ll walk through the system end-to-end, covering hybrid candidate generation using embeddings, multi-signal ranking models integrating behavioral and contextual features, and architectural decisions enabling system evolution without disrupting product surfaces.
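To make "hybrid candidate generation" concrete: one common pattern is to union embedding-based nearest neighbors with heuristic candidate sources (e.g. trending items) before ranking. The sketch below is illustrative only, not the talk's actual implementation; all function names and the brute-force cosine-similarity search are assumptions (production systems would use an approximate nearest-neighbor index instead).

```python
import numpy as np

def embedding_candidates(user_vec, item_vecs, k=5):
    """Top-k item indices by cosine similarity to the user embedding (brute force)."""
    norms = np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(user_vec)
    scores = item_vecs @ user_vec / np.maximum(norms, 1e-9)
    return np.argsort(-scores)[:k]

def hybrid_candidates(user_vec, item_vecs, heuristic_ids, k=5):
    """Union of semantic neighbors and heuristic picks, deduplicated for the ranker."""
    semantic = set(embedding_candidates(user_vec, item_vecs, k).tolist())
    return sorted(semantic | set(heuristic_ids))
```

The union-then-rank shape is what lets each candidate source evolve independently without disrupting the surfaces downstream.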

We’ll also discuss inference optimization for production—caching, quantization, asynchronous pipelines—and evaluation frameworks combining offline metrics, guardrails, simulation benches, and online A/B testing to accelerate iteration. Finally, we’ll show how grounding the system in measurable outcomes unlocked business value while maintaining resilience and transparency.
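Of the inference optimizations listed, caching is the simplest to illustrate. Below is a minimal time-to-live cache sketch for memoizing ranked results per user; the class and method names are hypothetical, and real systems would layer this with request coalescing and a distributed store.

```python
import time

class TTLCache:
    """In-process cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[key]  # lazily evict stale entries on read
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```

The TTL bounds staleness: a short window keeps recommendations fresh as behavioral signals arrive, while still absorbing repeated requests within it.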

Key Takeaways: 

  1. A Practical Architecture for AI-Native Recommendations – Combine candidate generation, embeddings, contextual features, and multi-stage ranking into a scalable end-to-end system. 
  2. Operating a Real-Time Inference Layer at Scale – Sub-10ms inference, asynchronous fanout, GPU/CPU blending, and traffic routing strategies to avoid overload. 
  3. Building Evaluation Loops That Don’t Lie – Offline metrics, online A/B testing, LLM-based judgment, and counterfactual analysis to measure real impact. 
  4. AI-Driven Query Understanding Inside Recommender Systems – Integrate semantic embeddings or LLM-based understanding to improve relevance and handle sparse signals. 
  5. Practical Migration Strategies from Classical to AI-Native Recommenders – Dual writes, shadow traffic, progressive rollout, embedding bootstraps, backfill strategies, and safe-degradation modes.
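The progressive-rollout strategy in takeaway 5 typically rests on deterministic traffic bucketing: hash each user id so the same user always lands in the same cohort as the rollout percentage ramps. A minimal sketch, with hypothetical names and assuming a percentage expressed as 0–100:

```python
import hashlib

def rollout_bucket(user_id: str, rollout_pct: float) -> str:
    """Deterministically assign a user to the 'new' or 'legacy' ranker.

    Hashing (rather than random sampling) keeps assignment stable across
    requests, so a user never flips between systems mid-session.
    """
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "new" if h < rollout_pct * 100 else "legacy"
```

Shadow traffic follows the same shape: serve the "legacy" result to the user while asynchronously invoking the new ranker and logging its output for offline comparison.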

Speaker

Mallika Rao

Engineering Leader @Netflix, Previously @Twitter and @Walmart

Mallika Rao has been an Engineering Leader at Twitter, Walmart, and Netflix, with deep expertise in building and operating large-scale distributed systems, including search, recommendations, and personalization infrastructure. She brings a systems-thinking mindset to infrastructure strategy and is passionate about integrating AI into product and engineering in ways that enhance resilience, transparency, and operational excellence. Her work focuses on enabling teams to innovate rapidly while maintaining the stability and rigor required in enterprise-scale environments. Beyond her technical leadership, Mallika mentors senior engineers and leaders, and draws inspiration from the elegance of mathematics and the improvisational creativity of music.
