Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls

Li, Hang; Mourad, Ahmed; Zhuang, Shengyao; Koopman, Bevan; Zuccon, Guido

arXiv:2108.11044 (cs)

[Submitted on 25 Aug 2021 (v1), last revised 1 Jul 2022 (this version, v2)]

Abstract: Pseudo Relevance Feedback (PRF) is known to improve the effectiveness of bag-of-words retrievers. At the same time, deep language models have been shown to outperform traditional bag-of-words rerankers. However, it is unclear how to integrate PRF directly with emergent deep language models. In this article, we address this gap by investigating methods for integrating PRF signals into rerankers and dense retrievers based on deep language models. We consider text-based and vector-based PRF approaches, and investigate different ways of combining and scoring relevance signals. An extensive empirical evaluation was conducted across four different datasets and two task settings (retrieval and ranking). Text-based PRF results show that the use of PRF had a mixed effect on deep rerankers across different datasets. We found that the best effectiveness was achieved when (i) directly concatenating each PRF passage with the query, searching with the new set of queries, and then aggregating the scores; and (ii) using Borda to aggregate scores from PRF runs. Vector-based PRF results show that the use of PRF enhanced the effectiveness of deep rerankers and dense retrievers over several evaluation metrics. We found that higher effectiveness was achieved when (i) the query retains either the majority or the same weight within the PRF mechanism, and (ii) a shallower PRF signal (i.e., a smaller number of top-ranked passages) was employed, rather than a deeper signal. Our vector-based PRF method is computationally efficient; thus, it represents a general PRF method that others can use with deep rerankers and dense retrievers.
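
The abstract names two mechanisms without giving formulas: a vector-based PRF step in which the query keeps at least as much weight as the feedback passages, and Borda aggregation of several PRF runs. The sketch below illustrates both under stated assumptions. It uses a Rocchio-style weighted sum for the vector update and a plain Borda count for fusion; the function names, the `alpha` and `beta` parameters, and the exact weighting are hypothetical and may differ from what the paper evaluates.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def vector_prf(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> List[float]:
    """Assumed Rocchio-style update: alpha * query + beta * centroid.

    passage_vecs holds the embeddings of the top-k passages from a first
    retrieval pass. Keeping alpha >= beta mirrors the abstract's finding
    that the query should retain the majority or the same weight.
    """
    if not passage_vecs:
        # No feedback signal: fall back to the (scaled) original query.
        return [alpha * q for q in query_vec]
    k = len(passage_vecs)
    dim = len(query_vec)
    # Component-wise mean of the feedback passage embeddings.
    centroid = [sum(p[i] for p in passage_vecs) / k for i in range(dim)]
    return [alpha * q + beta * c for q, c in zip(query_vec, centroid)]


def borda_fuse(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Plain Borda count over ranked lists of passage ids.

    A passage at 0-based rank r in a list of length n earns n - r points;
    passages missing from a list earn nothing from it. Ties are broken
    by passage id so the output is deterministic.
    """
    points: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for rank, pid in enumerate(ranking):
            points[pid] += n - rank
    return sorted(points, key=lambda pid: (-points[pid], pid))


if __name__ == "__main__":
    # Shallow feedback (k = 2) with the query weighted above the passages.
    print(vector_prf([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]], alpha=1.0, beta=0.5))
    # One ranked list per "query + PRF passage" run, fused with Borda.
    print(borda_fuse([["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]))
```

In a dense-retrieval setting, the vector returned by `vector_prf` would be used as the query embedding for a second nearest-neighbour search, so the extra cost is one more index lookup rather than another pass through the language model.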

Comments:	Accepted to the Journal of ACM Transactions on Information Systems (TOIS)
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2108.11044 [cs.IR]
	(or arXiv:2108.11044v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2108.11044

Submission history

From: Hang Li
[v1] Wed, 25 Aug 2021 04:43:58 UTC (12,494 KB)
[v2] Fri, 1 Jul 2022 03:33:00 UTC (14,175 KB)
