OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

Zhang, Jintian; Peng, Cheng; Sun, Mengshu; Chen, Xiang; Liang, Lei; Zhang, Zhiqiang; Zhou, Jun; Chen, Huajun; Zhang, Ningyu

arXiv:2409.05152 (cs)

[Submitted on 8 Sep 2024 (v1), last revised 2 Oct 2024 (this version, v2)]

Abstract: Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness, and efficiency of OneGen in training and inference. Furthermore, our results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance. To the best of our knowledge, OneGen is the first to enable LLMs to conduct vector retrieval during the generation.

Comments:	EMNLP 2024 Findings; code is available at this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2409.05152 [cs.CL]
	(or arXiv:2409.05152v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2409.05152

Submission history

From: Ningyu Zhang
[v1] Sun, 8 Sep 2024 16:35:19 UTC (6,098 KB)
[v2] Wed, 2 Oct 2024 05:02:02 UTC (6,099 KB)
