Corpus-Steered Query Expansion with Large Language Models

Lei, Yibin; Cao, Yu; Zhou, Tianyi; Shen, Tao; Yates, Andrew


arXiv:2402.18031 (cs)

[Submitted on 28 Feb 2024]

Abstract: Recent studies demonstrate that query expansions generated by large language models (LLMs) can considerably enhance information retrieval systems by generating hypothetical documents that answer the queries as expansions. However, challenges arise from misalignments between the expansions and the retrieval corpus, resulting in issues like hallucinations and outdated information due to the limited intrinsic knowledge of LLMs. Inspired by Pseudo Relevance Feedback (PRF), we introduce Corpus-Steered Query Expansion (CSQE) to promote the incorporation of knowledge embedded within the corpus. CSQE utilizes the relevance assessing capability of LLMs to systematically identify pivotal sentences in the initially retrieved documents. These corpus-originated texts are subsequently used to expand the query together with LLM-knowledge empowered expansions, improving the relevance prediction between the query and the target documents. Extensive experiments reveal that CSQE exhibits strong performance without necessitating any training, especially with queries for which LLMs lack knowledge.
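The retrieve, select, and expand loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Okapi BM25 (with k1=0.9 and b=0.4, the Anserini defaults) stands in for the initial retriever. The two LLM calls are replaced by hypothetical stubs: `stub_select_key_sentences` uses a lexical-overlap heuristic in place of the LLM's relevance assessment, and `stub_generate_expansion` uses a fixed template in place of a generated hypothetical document. Repeating the original query `query_repeats` times before appending the expansions is also an assumption, borrowed from common practice when feeding LLM expansions to a sparse retriever. The value of `k` (the number of initially retrieved documents) is illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Lowercased alphanumeric tokens; no stemming or stopword removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: List[str], k1: float = 0.9, b: float = 0.4) -> List[int]:
    """Return indices of `docs` sorted by Okapi BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Every occurrence of a term in the query contributes, so repeating
    # the query upweights its original terms relative to the expansions.
    q_tf = Counter(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term, qf in q_tf.items():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += qf * idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def stub_select_key_sentences(query: str, docs: List[str], min_overlap: int = 2) -> List[str]:
    # HYPOTHETICAL stand-in for the LLM relevance-assessment step: keep
    # sentences that share at least `min_overlap` terms with the query.
    q_terms = set(tokenize(query))
    picked = []
    for doc in docs:
        for sent in re.split(r"(?<=[.!?])\s+", doc):
            if len(q_terms & set(tokenize(sent))) >= min_overlap:
                picked.append(sent)
    return picked


def stub_generate_expansion(query: str) -> str:
    # HYPOTHETICAL stand-in for an LLM writing a passage from its own knowledge.
    return f"A passage answering: {query}"


def csqe_expand(
    query: str,
    corpus: List[str],
    select_fn: Callable[[str, List[str]], List[str]],
    generate_fn: Callable[[str], str],
    k: int = 3,
    query_repeats: int = 5,
) -> str:
    # 1. Initial retrieval over the corpus.
    top_k = [corpus[i] for i in bm25_rank(query, corpus)[:k]]
    # 2. Corpus-originated expansion: key sentences from the retrieved documents.
    key_sentences = select_fn(query, top_k)
    # 3. LLM-knowledge expansion.
    llm_passage = generate_fn(query)
    # 4. Concatenate; the repetition count is an assumption, not from the paper.
    return " ".join([query] * query_repeats + key_sentences + [llm_passage])


corpus = [
    "The Eiffel Tower opened in 1889. It is in Paris.",
    "Bananas are rich in potassium.",
    "Paris hosted the 1900 World's Fair. Many visitors came.",
    "Tokyo Tower is in Tokyo.",
]
query = "When did the Eiffel Tower open?"
expanded = csqe_expand(query, corpus, stub_select_key_sentences, stub_generate_expansion)
print(bm25_rank(expanded, corpus)[0])  # prints 0
```

Passing the selection and generation steps in as callables keeps the sketch honest about what is stubbed: in a real system both would be LLM prompts, and the lexical-overlap rule here only mimics the shape of the pipeline, not the quality of an LLM's relevance judgments.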

Comments:	EACL 2024 (Short)
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2402.18031 [cs.IR]
	(or arXiv:2402.18031v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2402.18031

Submission history

From: Yibin Lei
[v1] Wed, 28 Feb 2024 03:58:58 UTC (7,757 KB)