Batch Intelligence at Scale: Cost-Efficient Multi-Agent LLM Workflows with Built-In Resilience

Most teams building production LLM systems face a hard tradeoff: premium models with low latency or cost-efficient inference with unpredictable quality. This talk presents a third path, one validated at Walmart, the world's largest retailer.

This session walks through a real-world multi-agent LLM pipeline that generates inventory accuracy insights across millions of SKUs, running as a batched offline job during off-peak hours. By deliberately choosing pay-as-you-go foundation models over reserved capacity, the system achieves substantial cost savings; the higher latency is an acceptable tradeoff, since batch jobs tolerate it. The quality risk that comes with cost-optimized models is addressed head-on: structured outputs enforce schema compliance, MCP-based grounding validates responses against real inventory data, and automated retry logic ensures resilience without human intervention.
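As a minimal sketch of how two of those guardrails compose, consider schema validation wrapped in retry logic. The field names, `validate_insight`, and `generate_with_retry` are illustrative inventions for this sketch, not the actual Walmart pipeline; a production system would use a model provider's structured-output mode and add MCP grounding as a further validation layer.

```python
import json

# Illustrative schema for one inventory insight (field names are assumptions,
# not the actual production schema).
REQUIRED_FIELDS = {"sku": str, "predicted_discrepancy": float, "recommended_action": str}

class SchemaError(ValueError):
    """Raised when the model output violates the expected schema."""

def validate_insight(raw: str) -> dict:
    # Structured-output layer: parse the response and enforce field names/types.
    data = json.loads(raw)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise SchemaError(f"bad or missing field: {field}")
    return data

def generate_with_retry(call_model, max_attempts=3):
    # Retry layer: re-invoke the model on malformed output so the batch job
    # completes without human intervention.
    last_err = None
    for _ in range(max_attempts):
        try:
            return validate_insight(call_model())
        except (json.JSONDecodeError, SchemaError) as err:
            last_err = err
    raise last_err
```

The design point is that each layer catches a different failure mode: parsing catches non-JSON output, the type check catches schema drift, and the retry loop turns transient failures into eventual completions.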

The result: a multi-agent workflow using sequential agent coordination, powered by an orchestration framework, that generates actionable inventory recommendations (grounded, validated, and cost-efficient), saving millions of dollars in food waste annually.

Main Takeaways:

  1. The pay-as-you-go vs. reserved capacity decision: when latency is acceptable, foundation models on pay-as-you-go pricing can cut LLM inference costs significantly, and the quality gap is closable with the right guardrails
  2. How to architect multi-agent workflows for batch-scale inference: when to use sequential agents vs. loop agents, and how agent coordination fails under load
  3. Structured outputs + MCP grounding + retry logic as a resilience stack: three complementary layers that together control hallucinations, enforce schema compliance, and ensure pipeline completion without human oversight
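The sequential coordination pattern from the second takeaway can be sketched as a pipeline in which each agent reads the accumulated state and returns its own updates. This is a hedged illustration: `fetch_agent`, `analyze_agent`, and `recommend_agent` are invented stand-ins, and real orchestration frameworks such as Google ADK or LangChain provide richer coordination primitives.

```python
from typing import Callable, List

Agent = Callable[[dict], dict]

def run_sequential(agents: List[Agent], state: dict) -> dict:
    # Sequential coordination: each agent sees the merged state so far and
    # contributes its own keys. Order matters, which suits batch pipelines
    # where each step depends on the previous one's output.
    for agent in agents:
        state = {**state, **agent(state)}
    return state

# Invented stand-ins for real pipeline agents.
def fetch_agent(state):
    return {"raw": f"inventory:{state['sku']}"}

def analyze_agent(state):
    return {"discrepancy": len(state["raw"]) % 5}  # placeholder heuristic

def recommend_agent(state):
    return {"action": "recount" if state["discrepancy"] > 2 else "none"}
```

A loop agent would instead re-run a step until a condition is met (e.g. retrying until a grounding check passes), which is the right shape when output quality, not step order, drives control flow.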
     

Interview:

What is the focus of your work these days?

Building and scaling LLM infrastructure for high-throughput production environments, specifically multi-agent orchestration, cost-efficient inference strategies, and the observability and resilience patterns that only surface when you're running real workloads at scale. I work across Google ADK, LangChain and OpenAI SDK for orchestration, with MCP as a grounding and validation layer. I'm also actively contributing to open-source tooling in the AI orchestration ecosystem and to Harvard's cs249r open ML systems curriculum.

What is the motivation behind your talk?

Most conference content on LLM agents stops at the demo layer. The hard problems (batch orchestration at scale, grounding pipelines, agent failure recovery) only surface when you're running real workloads with real consequences. I wanted to bring a ground-up account of what it actually takes to deploy multi-agent AI in a production environment where errors cost millions, not just compute cycles. I wanted to share the engineering decisions we made, including the deliberate choice to trade latency for cost and the specific guardrails that made it viable in production. These are patterns the QCon AI audience can take back and apply immediately.

Who is your presentation for?

Senior engineers, ML platform teams and architects building or evaluating LLM-powered systems in production. Particularly relevant for:

  • Teams running or planning batch LLM workloads who are evaluating model cost vs. quality tradeoffs
  • Engineers dealing with hallucination, schema compliance or pipeline reliability in agentic systems
  • Platform teams moving from prototype to production-scale multi-agent workflows
     

Speaker

Aditya Mulik

Senior Software Engineer @Walmart Global Tech

Aditya Mulik is a Senior Software Engineer at Walmart Global Tech specializing in large-scale distributed systems and platform engineering, with deep expertise in designing and deploying scalable and resilient solutions. He has led the development of microservices architectures and production LLM systems, enabling high-throughput, low-latency applications serving millions of requests in demanding production environments. Aditya is also an active contributor to the open-source community.


Date

Tuesday Jun 2 / 03:40PM EDT ( 50 minutes )

Location

Metcalf Hall Small
