Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

Li, Yueying; Dai, Jim; Peng, Tianyi

Statistics > Machine Learning

arXiv:2504.07347 (stat)

[Submitted on 10 Apr 2025]

Abstract: As demand for Large Language Models (LLMs) and AI agents grows rapidly, optimizing systems for efficient LLM inference becomes critical. While significant effort has targeted system-level engineering, little has been explored from a mathematical modeling and queuing perspective.
In this paper, we aim to develop the queuing fundamentals for LLM inference, bridging the gap between the queuing and LLM systems communities. In particular, we study the throughput of LLM inference systems. We prove that a large class of 'work-conserving' scheduling algorithms can achieve maximum throughput for both individual requests and AI agent workloads, highlighting 'work-conserving' as a key design principle in practice. Evaluations of real-world systems show that Orca and Sarathi-Serve are throughput-optimal, reassuring practitioners, while FasterTransformer and vanilla vLLM are not maximally stable (they can become unstable at arrival rates that a throughput-optimal scheduler would sustain) and should be used with caution.
Our results highlight the substantial benefits the queuing community can offer in improving LLM inference systems and call for more interdisciplinary developments.
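The throughput gap between work-conserving and non-work-conserving schedulers can be illustrated with a small discrete-time simulation. The sketch below is a hypothetical toy model, not the paper's model: the function `simulate`, the four decode slots, the alternating 1-token/9-token request lengths, and the deterministic arrival pattern are all illustrative assumptions. It contrasts iteration-level (continuous) batching, which refills a free slot whenever a request is waiting, with request-level (static) batching, which admits new requests only after the whole current batch has finished.

```python
from collections import deque

# Hypothetical toy model (an assumption for illustration, not the paper's model):
# - `slots` decode slots; each running request emits one token per step.
# - Requests alternate between SHORT and LONG decode lengths (mean 5 tokens).
# - Deterministic arrivals: `num` requests every `den` steps.
SHORT, LONG = 1, 9


def simulate(policy: str, steps: int, slots: int = 4, num: int = 3, den: int = 5) -> int:
    """Return how many requests are waiting or running after `steps` steps."""
    queue: deque = deque()  # remaining tokens of waiting requests (FIFO order)
    running: list = []      # remaining tokens of requests in the current batch
    arrived = 0
    for t in range(steps):
        # Integer arithmetic spreads `num` arrivals evenly over every `den` steps.
        for _ in range(num * (t + 1) // den - num * t // den):
            queue.append(SHORT if arrived % 2 == 0 else LONG)
            arrived += 1
        if policy == "continuous":
            # Work-conserving: refill any free slot as soon as work is waiting.
            admit = True
        elif policy == "static":
            # Request-level batching: admit only once the whole batch has drained,
            # so slots freed by short requests sit idle while long ones still run.
            admit = not running
        else:
            raise ValueError(f"unknown policy: {policy}")
        while admit and len(running) < slots and queue:
            running.append(queue.popleft())
        # One decode iteration: requests with one token left finish this step.
        running = [r - 1 for r in running if r > 1]
    return len(queue) + len(running)


if __name__ == "__main__":
    print("continuous backlog:", simulate("continuous", 2000))
    print("static backlog:", simulate("static", 2000))
```

Under these assumptions the offered load is 0.6 requests × 5 tokens = 3 tokens per step against a capacity of 4, so the continuous policy's backlog stays bounded. The static policy, by contrast, completes at most four requests per nine-step batch (about 0.44 requests per step, below the 0.6 arrival rate), so its backlog grows roughly linearly even though the hardware could keep up.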

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
Cite as:	arXiv:2504.07347 [stat.ML]
	(or arXiv:2504.07347v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2504.07347

Submission history

From: Yueying Li
[v1] Thu, 10 Apr 2025 00:12:12 UTC (681 KB)