Teola: Towards End-to-End Optimization of LLM-based Applications

Tan, Xin; Jiang, Yimin; Yang, Yitao; Xu, Hong

arXiv:2407.00326 (cs)

[Submitted on 29 Jun 2024]

Abstract: Large language model (LLM)-based applications consist of both LLM and non-LLM components, each contributing to the end-to-end latency. Despite great efforts to optimize LLM inference, end-to-end workflow optimization has been overlooked. Existing frameworks employ coarse-grained orchestration with task modules, which confines optimizations to within each module and yields suboptimal scheduling decisions. We propose fine-grained end-to-end orchestration, which utilizes task primitives as the basic units and represents each query's workflow as a primitive-level dataflow graph. This explicitly exposes a much larger design space, enables optimizations in parallelization and pipelining across primitives of different modules, and enhances scheduling to improve application-level performance. We build Teola, a novel orchestration framework for LLM-based applications that implements this scheme. Comprehensive experiments show that Teola can achieve up to 2.09x speedup over existing systems across various popular LLM applications.
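
To make the idea of a primitive-level dataflow graph concrete, the following is a minimal sketch and not Teola's actual API. The primitive names, their dependencies, and the latencies are all hypothetical, and the model assumes every primitive can start as soon as its inputs are ready (no resource contention). It compares running all primitives back to back against running them along the graph's critical path.

```python
from graphlib import TopologicalSorter

# Hypothetical primitives for one retrieval-augmented query:
# name -> (latency in ms, names of primitives it depends on).
# Names and numbers are illustrative assumptions, not taken from the paper.
PRIMITIVES = {
    "embed_query": (10, []),
    "vector_search": (30, ["embed_query"]),
    "rerank": (20, ["vector_search"]),
    "prefill_instruction": (40, []),  # assumed independent of retrieval
    "prefill_context": (50, ["rerank", "prefill_instruction"]),
    "decode": (100, ["prefill_context"]),
}


def sequential_latency(primitives):
    """Latency when every primitive runs one after another."""
    return sum(latency for latency, _ in primitives.values())


def dataflow_latency(primitives):
    """Critical-path latency when each primitive starts once its inputs finish."""
    # graphlib expects a mapping of node -> predecessors.
    graph = {name: deps for name, (_, deps) in primitives.items()}
    finish = {}
    for name in TopologicalSorter(graph).static_order():
        latency, deps = primitives[name]
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + latency
    return max(finish.values())


if __name__ == "__main__":
    print("sequential:", sequential_latency(PRIMITIVES), "ms")
    print("dataflow:  ", dataflow_latency(PRIMITIVES), "ms")
```

In this toy graph the hypothetical `prefill_instruction` primitive overlaps with retrieval, so the critical path is shorter than the sequential sum. A real scheduler would also have to account for limited engine capacity and batching, which this sketch ignores.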

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Cite as:	arXiv:2407.00326 [cs.DC]
	(or arXiv:2407.00326v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2407.00326

Submission history

From: Xin Tan
[v1] Sat, 29 Jun 2024 05:59:53 UTC (667 KB)
