Building GenAI Platform at DoorDash

When we started adopting LLMs across DoorDash, every team was implementing the same infrastructure: retry logic, fallback mechanisms, cost tracking, prompt versioning, and batch processing pipelines. Engineering time was wasted on repetitive plumbing work instead of building features. Teams also made different decisions: some used OpenAI directly, others went through Bedrock, some built custom retry logic that didn't handle rate limits properly, and nobody had consistent observability.

We built a set of platform components to address this: an LLM Gateway for request routing, observability, and fallback handling; a Batch Inference platform for processing large-scale workloads; an Agentic Gateway for multi-step LLM workflows; and ADK (Agent Development Kit) templates to standardize common patterns.
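As a rough illustration of the kind of fallback handling a gateway centralizes, here is a minimal Python sketch. It is not DoorDash's implementation: `Provider`, `RateLimitError`, `ProviderError`, and `call_with_fallback` are hypothetical names, and the retry policy (exponential backoff on rate limits, then moving to the next provider) is an assumption chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class RateLimitError(Exception):
    """Hypothetical error raised when a provider's quota is exhausted."""


class ProviderError(Exception):
    """Hypothetical non-retryable provider failure."""


@dataclass
class Provider:
    name: str                       # label used for observability
    complete: Callable[[str], str]  # prompt -> completion text (assumed interface)


def call_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Try providers in order; retry rate limits with backoff, then fall back.

    Returns the completion plus a log of (provider, outcome) pairs that a
    gateway could emit as metrics.
    """
    attempts: List[Tuple[str, str]] = []
    for provider in providers:
        for attempt in range(max_retries + 1):
            try:
                text = provider.complete(prompt)
                attempts.append((provider.name, "ok"))
                return text, attempts
            except RateLimitError:
                attempts.append((provider.name, "rate_limited"))
                # Back off before retrying the same provider; after the
                # last retry, fall through to the next provider instead.
                if attempt < max_retries:
                    sleep(base_delay * 2 ** attempt)
            except ProviderError:
                attempts.append((provider.name, "error"))
                break  # not retryable: move on to the next provider
    raise RuntimeError(f"all providers failed: {attempts}")
```

A caller would pass providers in preference order, for example `call_with_fallback(prompt, [primary, secondary])`, and the returned attempt log gives every team the same view of which provider served each request.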

This talk covers the technical details and trade-offs. For the LLM Gateway, we'll discuss how we handle rate limiting across providers with different quota models, how we attribute cost, and how we implemented prompt caching to improve performance. For Batch Inference, we'll explain our job scheduler design, which balances cost optimization against SLA requirements. The Agentic Gateway section covers how we handled streaming protocols like MCP, authentication for internal and external users, state management, and the scaffolding that speeds up agent development at DoorDash. We'll also share our decision framework for when to build shared infrastructure versus letting teams own their solutions, and how we measured whether the platform was actually helping or just adding another layer teams had to learn.

This talk is relevant for platform engineers, ML infrastructure teams, and engineering leaders building or evaluating GenAI infrastructure.


Speaker

Siddharth Kodwani

Software Engineer, AI Infrastructure @DoorDash

Siddharth Kodwani is a Software Engineer on the GenAI Platform team at DoorDash, building infrastructure for AI agents that accelerates development velocity and improves production reliability. He has spent the last 10 years building AI/ML platforms at Amazon Prime Video, Zoox, and DoorDash, specializing in the infrastructure that enables teams to ship AI-powered features faster.


Speaker

Swaroop Chitlur

Staff Engineer / Engineering Manager, Machine Learning Platform @DoorDash

Swaroop Chitlur leads the Generative AI Platform at DoorDash, building infrastructure for LLM inference, fine-tuning, RAG, evals, and AI agents. He is an engineering leader with 20+ years of experience, including co-founding a hardware startup and being the first backend engineer at Automatic Labs (YC S11, acquired by Sirius XM). Swaroop holds a granted patent and has authored two books: "A Byte of Python" (10M+ downloads, translated into 10+ languages) and "A Byte of Vim".
