Building an AI agent is easier than ever. However, moving from a local notebook to a production-grade "Agent Engine" that serves large-scale web services poses complex problems. Developers often struggle with unpredictable execution times, framework lock-in, and the sheer overhead of managing long-running agentic loops.
In this session, we dive into how to leverage a unified deployment layer that remains agnostic to your choice of agent framework. Using Ray as our reference implementation, we will demonstrate how to bridge the gap between local development and production-grade reliability.
We will explore:
- Decoupling Orchestration from Execution: How to manage complex agent state across distributed nodes without sacrificing scalability.
- Operational Excellence: Utilizing native autoscaling and intelligent traffic management to handle the unpredictable, bursty nature of agentic workloads.
- The "Agent Engine" Pattern: A blueprint for building resilient, high-throughput agent deployments designed to evolve as the AI landscape shifts.
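The decoupling idea in the first bullet can be sketched in a few lines of plain Python. This is an illustrative, framework-agnostic sketch, not Ray's actual API: the `Orchestrator` class, `run_tool` function, and thread pool stand in for what would be Ray actors and remote tasks in the reference implementation.

```python
# Sketch: decoupling orchestration (stateful agent loop) from
# execution (stateless workers). Illustrative names only; in Ray,
# the workers would be remote tasks or actors on distributed nodes.
from concurrent.futures import ThreadPoolExecutor


def run_tool(step: str) -> str:
    """Stateless executor: safe to scale out, retry, or relocate."""
    return f"result-of-{step}"


class Orchestrator:
    """Owns the agent loop and its state; never executes tools itself."""

    def __init__(self, pool: ThreadPoolExecutor):
        self.pool = pool
        self.history: list[str] = []  # agent state lives in one place

    def run(self, steps: list[str]) -> list[str]:
        for step in steps:
            # Dispatch execution elsewhere; the orchestrator only
            # sequences steps and records outcomes.
            future = self.pool.submit(run_tool, step)
            self.history.append(future.result())
        return self.history


with ThreadPoolExecutor(max_workers=4) as pool:
    results = Orchestrator(pool).run(["plan", "search", "answer"])
```

Because the executors hold no agent state, they can be scaled or restarted independently of the orchestration loop, which is what makes the pattern resilient under the bursty workloads described above.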
Speaker
Deepak Chandramouli
Senior Machine Learning Engineer @Apple, 20+ Years in Distributed Systems and Scalable Data/Compute/ML Infrastructure
Deepak Chandramouli is a Senior Machine Learning Engineer at Apple with over 20 years of experience in distributed systems and scalable data/compute/ML infrastructure. At Apple, he specializes in building robust ML compute planes that bridge the gap between research and high-scale production environments.
Speaker
Bhumik Thakkar
Senior Software Engineer @Apple, Expert in Artificial Intelligence and Large-Scale Distributed Systems
Bhumik Thakkar is a senior engineering leader specializing in artificial intelligence and large-scale distributed systems, with extensive experience building enterprise-grade AI infrastructure. He has led high-impact initiatives at global technology companies including Microsoft, Meta, and Apple, delivering scalable systems that serve billions of users worldwide. His work focuses on large language models, optimized model inference platforms, and resilient distributed architectures that power mission-critical applications at global scale.