Deploying Foundation Model Powered Agent Services: A Survey

Xu, Wenchao; Chen, Jinyu; Zheng, Peirong; Yi, Xiaoquan; Tian, Tianyi; Zhu, Wenhui; Wan, Quan; Wang, Haozhao; Fan, Yunfeng; Su, Qinliang; Shen, Xuemin

arXiv:2412.13437 (cs)

[Submitted on 18 Dec 2024]

Abstract: Foundation model (FM) powered agent services are regarded as a promising way to develop intelligent and personalized applications on the path toward Artificial General Intelligence (AGI). Deploying these agent services with high reliability and scalability requires jointly optimizing computational and communication resources, so that resources are allocated effectively and services are delivered seamlessly. In pursuit of this vision, this paper proposes a unified framework for surveying the deployment of FM-based agent services across heterogeneous devices, with an emphasis on integrating model and resource optimization to establish a robust infrastructure for these services. The paper begins by exploring various low-level optimization strategies during inference and studies approaches that enhance system scalability, such as parallelism techniques and resource scaling methods. It then discusses several prominent FMs and investigates research efforts focused on inference acceleration, including techniques such as model compression and token reduction. The paper also investigates critical components for constructing agent services and highlights notable intelligent applications. Finally, it presents potential research directions for developing real-time agent services with high Quality of Service (QoS).

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.13437 [cs.DC]
	(or arXiv:2412.13437v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2412.13437

Submission history

From: Jinyu Chen
[v1] Wed, 18 Dec 2024 02:15:31 UTC (2,504 KB)
