Adaptive Recommenders in the Real World: Inference, Evals, and System Design

Modern personalization systems are shifting from hand-tuned heuristics to AI-native architectures, but building an adaptive recommendation engine in production—one that continuously learns, evolves, and delivers measurable business value—requires far more than deploying a model. It requires coordinated design across candidate generation, ranking, inference, evaluation, and feedback loops, all operating under real-world latency, cost, and reliability constraints.

In this talk, I share how we built and scaled an adaptive, AI-native recommendation system powering high-visibility content discovery surfaces. We’ll walk through the system end-to-end, covering hybrid candidate generation using embeddings, multi-signal ranking models integrating behavioral and contextual features, and architectural decisions enabling system evolution without disrupting product surfaces.
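To make "hybrid candidate generation" concrete: one common pattern is to union embedding-based nearest neighbors with heuristic candidate sources (e.g. trending items) before ranking. The sketch below is illustrative only, not the talk's actual implementation; all function names and the brute-force cosine-similarity search are assumptions (production systems would use an approximate nearest-neighbor index instead).

```python
import numpy as np

def embedding_candidates(user_vec, item_vecs, k=5):
    """Top-k item indices by cosine similarity to the user embedding (brute force)."""
    norms = np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(user_vec)
    scores = item_vecs @ user_vec / np.maximum(norms, 1e-9)
    return np.argsort(-scores)[:k]

def hybrid_candidates(user_vec, item_vecs, heuristic_ids, k=5):
    """Union of semantic neighbors and heuristic picks, deduplicated for the ranker."""
    semantic = set(embedding_candidates(user_vec, item_vecs, k).tolist())
    return sorted(semantic | set(heuristic_ids))
```

The union-then-rank shape is what lets each candidate source evolve independently without disrupting the surfaces downstream.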

We’ll also discuss inference optimization for production—caching, quantization, asynchronous pipelines—and evaluation frameworks combining offline metrics, guardrails, simulation benches, and online A/B testing to accelerate iteration. Finally, we’ll show how grounding the system in measurable outcomes unlocked business value while maintaining resilience and transparency.
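Of the inference optimizations listed, caching is the simplest to illustrate. Below is a minimal time-to-live cache sketch for memoizing ranked results per user; the class and method names are hypothetical, and real systems would layer this with request coalescing and a distributed store.

```python
import time

class TTLCache:
    """In-process cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[key]  # lazily evict stale entries on read
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```

The TTL bounds staleness: a short window keeps recommendations fresh as behavioral signals arrive, while still absorbing repeated requests within it.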

Key Takeaways: 

  1. A Practical Architecture for AI-Native Recommendations – Combine candidate generation, embeddings, contextual features, and multi-stage ranking into a scalable end-to-end system. 
  2. Operating a Real-Time Inference Layer at Scale – Sub-10ms inference, asynchronous fanout, GPU/CPU blending, and traffic routing strategies to avoid overload. 
  3. Building Evaluation Loops That Don’t Lie – Offline metrics, online A/B testing, LLM-based judgment, and counterfactual analysis to measure real impact. 
  4. AI-Driven Query Understanding Inside Recommender Systems – Integrate semantic embeddings or LLM-based understanding to improve relevance and handle sparse signals. 
  5. Practical Migration Strategies from Classical to AI-Native Recommenders – Dual writes, shadow traffic, progressive rollout, embedding bootstraps, backfill strategies, and safe-degradation modes.
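The progressive-rollout strategy in takeaway 5 typically rests on deterministic traffic bucketing: hash each user id so the same user always lands in the same cohort as the rollout percentage ramps. A minimal sketch, with hypothetical names and assuming a percentage expressed as 0–100:

```python
import hashlib

def rollout_bucket(user_id: str, rollout_pct: float) -> str:
    """Deterministically assign a user to the 'new' or 'legacy' ranker.

    Hashing (rather than random sampling) keeps assignment stable across
    requests, so a user never flips between systems mid-session.
    """
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "new" if h < rollout_pct * 100 else "legacy"
```

Shadow traffic follows the same shape: serve the "legacy" result to the user while asynchronously invoking the new ranker and logging its output for offline comparison.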

Speaker

Mallika Rao

Engineering Leader @Netflix, Previously @Twitter and @Walmart

Mallika Rao has been an Engineering Leader at Twitter, Walmart, and Netflix, with deep expertise in building and operating large-scale distributed systems, including search, recommendations, and personalization infrastructure. She brings a systems-thinking mindset to infrastructure strategy and is passionate about integrating AI into product and engineering in ways that enhance resilience, transparency, and operational excellence. Her work focuses on enabling teams to innovate rapidly while maintaining the stability and rigor required in enterprise-scale environments. Beyond her technical leadership, Mallika mentors senior engineers and leaders, and draws inspiration from the elegance of mathematics and the improvisational creativity of music.
