Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Lee, Jinhyuk; Chen, Anthony; Dai, Zhuyun; Dua, Dheeru; Sachan, Devendra Singh; Boratko, Michael; Luan, Yi; Arnold, Sébastien M. R.; Perot, Vincent; Dalmia, Siddharth; Hu, Hexiang; Lin, Xudong; Pasupat, Panupong; Amini, Aida; Cole, Jeremy R.; Riedel, Sebastian; Naim, Iftekhar; Chang, Ming-Wei; Guu, Kelvin

arXiv:2406.13121 (cs)

[Submitted on 19 Jun 2024]

Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.

Comments:	29 pages. Dataset available at this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2406.13121 [cs.CL]
	(or arXiv:2406.13121v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.13121

Submission history

From: Jinhyuk Lee
[v1] Wed, 19 Jun 2024 00:28:58 UTC (2,124 KB)
