From AI Agent Demo to Production: Automated Testing and Evaluation

90% of enterprise agents are stuck in the proof-of-concept (POC) stage. Without a systematic testing and evaluation process, we cannot tell when an agent is good enough to deploy. This presentation shares best practices, our practical dev tools, and lessons from the field to help you break out of the demo loop and build AI agents that deliver real business impact. In the talk, we will showcase an example of a multi-turn agent simulation tool for automated agents

Main Takeaways:

Simulation is critical for agent evaluation: it reduces manual testing and improves agent quality, both before production and during iterations.
Multi-turn evaluation is necessary to guarantee agent quality.

Speaker

Zhou Yu

Founder @Arklex.ai and CS Professor at Columbia University - Co-Director of the DAPlab

Zhou (Jo) Yu is a CS Professor at Columbia University and Founder of Arklex.ai. She obtained her Ph.D. from Carnegie Mellon University. Dr. Yu is the co-director of the Columbia DAPlab. She has received several best paper awards at top NLP conferences (such as ACL 2024) and was named to the Forbes 30 Under 30 list in 2018. Dr. Yu has developed various AI systems that have had real impact, including one that won the Amazon Alexa Prize. She co-founded Arklex.ai, which aims to democratize AI agent development through an enterprise-grade automated testing tool.
