Streamlining the Collaborative Chain of Models into A Single Forward Pass in Generation-Based Tasks

Lyu, Yuanjie; Zhang, Chao; Chen, Yuhao; Chen, Yong; Xu, Tong

arXiv:2502.11083 (cs)

[Submitted on 16 Feb 2025]

Abstract: In Retrieval-Augmented Generation (RAG) and agent-based frameworks, the "Chain of Models" approach is widely used, where multiple specialized models work sequentially on distinct sub-tasks. This approach is effective but increases resource demands as each model must be deployed separately. Recent advancements attempt to address this by applying prompt tuning, which allows a shared base model to adapt to multiple tasks with minimal parameter changes. However, a key challenge remains: intermediate outputs, passed between models as plain text, require recomputation of hidden states (i.e., Key and Value (KV) states in Transformers) during inference. In this paper, we introduce FTHSS, a novel prompt-tuning method that enables models to share KV hidden states, eliminating redundant forward passes and reducing KV cache storage. By modifying input and attention masks during training, FTHSS allows models to effectively utilize KV hidden states from prior models in both single- and multi-round scenarios. Empirical results on four tasks show that FTHSS matches the performance of traditional model chains while improving inference efficiency.
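
The redundancy described in the abstract can be illustrated with a simplified token count. The sketch below is not the FTHSS method itself. It only compares how many token positions need a forward computation when each stage of a chain re-encodes its context from plain text, versus when KV states are computed once and reused. The function names, the example lengths, and the assumption that every stage attends to the full preceding context are hypothetical, and task-specific prompt tokens are ignored.

```python
def tokens_processed_text_passing(input_len, output_lens):
    """Count token positions computed when stages exchange plain text.

    Assumption: every stage sees the original input plus all earlier outputs
    and must recompute KV states for that whole context before decoding.
    """
    total = 0
    context = input_len
    for out_len in output_lens:
        total += context   # prefill: recompute KV states for everything seen so far
        total += out_len   # decode: one forward step per generated token
        context += out_len  # the next stage receives this output as text
    return total


def tokens_processed_kv_sharing(input_len, output_lens):
    """Count token positions computed when KV states are shared across stages.

    Assumption: each token's KV states are computed exactly once and reused.
    """
    return input_len + sum(output_lens)


if __name__ == "__main__":
    # Hypothetical chain: a 100-token input and three stages producing
    # 20, 30, and 50 tokens respectively.
    stages = [20, 30, 50]
    print(tokens_processed_text_passing(100, stages))  # 470
    print(tokens_processed_kv_sharing(100, stages))    # 200
```

Under these assumptions, the gap between the two counts grows with the number of stages, because each additional stage re-encodes everything produced before it.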

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.11083 [cs.CL]
	(or arXiv:2502.11083v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.11083

Submission history

From: Yuanjie Lyu
[v1] Sun, 16 Feb 2025 11:37:14 UTC (301 KB)
