Self-Retrieval: Building an Information Retrieval System with One Large Language Model

Tang, Qiaoyu; Chen, Jiawei; Yu, Bowen; Lu, Yaojie; Fu, Cheng; Yu, Haiyang; Lin, Hongyu; Huang, Fei; He, Ben; Han, Xianpei; Sun, Le; Li, Yongbin

arXiv:2403.00801 (cs)

[Submitted on 23 Feb 2024]

Abstract: The rise of large language models (LLMs) has transformed the role of information retrieval (IR) systems in the way humans access information. Because of their isolated architecture and limited interaction, existing IR systems cannot fully accommodate the shift from directly providing information to humans to indirectly serving large language models. In this paper, we propose Self-Retrieval, an end-to-end, LLM-driven information retrieval architecture that can fully internalize the required abilities of an IR system into a single LLM and deeply leverage the capabilities of LLMs during the IR process. Specifically, Self-Retrieval internalizes the corpus to be retrieved into an LLM via a natural language indexing architecture. The entire retrieval process is then redefined as a procedure of document generation and self-assessment, which can be executed end-to-end by a single large language model. Experimental results demonstrate that Self-Retrieval not only outperforms previous retrieval approaches by a large margin but also significantly boosts the performance of LLM-driven downstream applications such as retrieval-augmented generation.
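
The abstract frames retrieval as generating documents that must come from the indexed corpus, followed by the model assessing its own outputs. The toy Python sketch below illustrates that two-stage idea under loud assumptions. The corpus "index" is a word-level prefix trie (a generic constrained-decoding technique, not confirmed by this abstract). The LLM is replaced by two hypothetical scoring functions, `token_score` and `self_assess`. Ranking by self-assessment with ties broken by generation score is also an assumption. Nothing here is the authors' implementation or model.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One node of a word-level prefix tree built over the corpus."""
    children: dict = field(default_factory=dict)
    terminal: bool = False  # True if a complete passage ends at this node


def normalize(word):
    return word.lower().strip(".,?!")


def build_trie(corpus):
    # Indexing stand-in (assumption): every passage becomes a path from the root.
    root = TrieNode()
    for passage in corpus:
        node = root
        for word in passage.split():
            node = node.children.setdefault(word, TrieNode())
        node.terminal = True
    return root


def token_score(query, word):
    # HYPOTHETICAL stand-in for the LLM's next-token score: 1.0 if the word
    # appears in the query, else 0.0. A real model would supply log-probabilities.
    query_words = {normalize(w) for w in query.split()}
    return 1.0 if normalize(word) in query_words else 0.0


def constrained_generate(query, root, beam_width=2):
    # Beam search that may only follow existing trie edges, so every finished
    # output is guaranteed to be a passage that exists in the corpus.
    beams = [([], root, 0.0)]
    finished = []
    while beams:
        candidates = []
        for words, node, score in beams:
            if node.terminal:
                finished.append((" ".join(words), score))
            for word, child in node.children.items():
                candidates.append((words + [word], child, score + token_score(query, word)))
        candidates.sort(key=lambda c: c[2], reverse=True)
        beams = candidates[:beam_width]  # empty once every beam is complete
    return finished


def self_assess(query, passage):
    # HYPOTHETICAL stand-in for the LLM judging its own output: the fraction
    # of query words that the passage covers.
    query_words = {normalize(w) for w in query.split()}
    passage_words = {normalize(w) for w in passage.split()}
    return len(query_words & passage_words) / len(query_words)


def self_retrieve(query, root, beam_width=2):
    # Generate corpus-constrained candidates, then rank them by self-assessment,
    # breaking ties with the generation score (an illustrative assumption).
    generated = constrained_generate(query, root, beam_width)
    return sorted(generated, key=lambda g: (self_assess(query, g[0]), g[1]), reverse=True)


corpus = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "The Eiffel Tower is in Paris",
]
results = self_retrieve("What is the capital of France?", build_trie(corpus))
for passage, score in results:
    print(passage, score)
```

With a beam width of 2, the Berlin passage is pruned after the first word. The two surviving passages are ranked with "Paris is the capital of France" first. The trie is what keeps generation from producing text that is not in the corpus. In a real system that role would be played by whatever index the model has internalized.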

Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2403.00801 [cs.IR]
	(or arXiv:2403.00801v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2403.00801

Submission history

From: Qiaoyu Tang
[v1] Fri, 23 Feb 2024 18:45:35 UTC (516 KB)