Speaker

Sundara Raman Ramachandran

He / him / his

Lead Engineer @LinkedIn on LLM Inference Team, Previously Worked on Azure Identity & Authorization and Microsoft Office

Sundara Raman Ramachandran is a Lead Engineer on LinkedIn’s LLM Inference team, where he plays a key role in designing and scaling the infrastructure powering LLM-based ranking systems for search and recommendation. His work centers on building latency-critical, high-throughput LLM serving platforms that operate reliably at global scale.

He has driven production deployment of prefill-only LLM scoring systems and contributed extensively to the SGLang open-source ecosystem, including leading the development of the Prefill-Only Ranking API informed by real-world, high-QPS production constraints. Sundara has co-authored research accepted to MLSys 2026, with additional work currently under review at KDD 2026.

Prior to LinkedIn, he worked on Azure Identity & Authorization and Microsoft Office. He holds a Master’s degree from The University of Texas at Austin.

Session

Scaling LLM-Based Ranking Systems for Latency-Critical Search & Recommendation Workloads

Large Language Models are powerful, but deploying them in latency-critical ranking systems is a fundamentally different problem from building chat applications.
