Inference

Session Inference

Inferencing for Enterprises

Monday Jun 1 / 01:20PM EDT

This presentation covers the areas that enterprises such as JPMC consider most important when running inference at scale.

Dio Rettori

Head of Product for AI Infrastructure Platforms @JPMorganChase & Co, previously @Solo.io, @Red Hat, and @Pivotal Software

Session Performance

Serving LLMs at Scale: The Hidden KV Cache Advantage

Monday Jun 1 / 11:30AM EDT

The KV cache is the hidden lever behind inference cost and performance. It directly affects GPU utilization, throughput, and Time to First Token (TTFT).

Khawaja Shams

Co-Founder & CEO @Momento, previously @NASA and @Amazon