An Approach for Weakly-Supervised Deep Information Retrieval

MacAvaney, Sean; Hui, Kai; Yates, Andrew

arXiv:1707.00189v2 (cs)

[Submitted on 1 Jul 2017 (v1), revised 24 Jul 2017 (this version, v2), latest version 5 Jul 2019 (v3)]

Abstract: Recent developments in neural information retrieval models have been promising, but a problem remains: human relevance judgments are expensive to produce, while neural models require a considerable amount of training data. In an attempt to fill this gap, we present an approach that, given a weak training set of pseudo-queries, documents, and relevance information, filters the data to produce effective positive and negative query-document pairs. This allows large corpora to be used as neural IR model training data, while eliminating training examples that do not transfer well to relevance scoring. The filters include unsupervised ranking heuristics and a novel measure of interaction similarity. We evaluate our approach using a news corpus with article headlines acting as pseudo-queries and article content as documents, with implicit relevance between an article's headline and its content. By using our approach to train state-of-the-art neural IR models and comparing to established baselines, we find that training data generated by our approach can lead to good results on a benchmark test collection.
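The ranking-based filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes BM25 as the unsupervised ranking heuristic, a whitespace tokenizer, and a small in-memory collection, and it does not cover the interaction similarity filter. The names `tokenize`, `bm25_scores`, `filter_pairs`, and `top_k` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with a basic BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def filter_pairs(articles, top_k=1):
    """Keep (headline, content) pairs whose own content scores above zero and
    ranks within `top_k` for the headline. The best-ranked other article is
    returned as a negative example (a hypothetical sampling choice)."""
    contents = [content for _, content in articles]
    kept = []
    for i, (headline, content) in enumerate(articles):
        scores = bm25_scores(headline, contents)
        ranking = sorted(range(len(contents)), key=lambda j: -scores[j])
        if scores[i] <= 0 or i not in ranking[:top_k]:
            continue  # headline does not behave like a query for its article
        others = [j for j in ranking if j != i]
        if not others:
            continue  # no other document available to serve as a negative
        kept.append((headline, content, contents[others[0]]))
    return kept
```

In this sketch, a headline that shares no terms with its own article (for example, a clickbait-style title) is dropped, while headlines that retrieve their own article at the top of the ranking are kept as positive pairs alongside one negative document.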

Comments:	Neu-IR 2017 SIGIR Workshop on Neural Information Retrieval
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:1707.00189 [cs.IR]
	(or arXiv:1707.00189v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1707.00189

Submission history

From: Andrew Yates
[v1] Sat, 1 Jul 2017 18:42:29 UTC (51 KB)
[v2] Mon, 24 Jul 2017 12:05:43 UTC (52 KB)
[v3] Fri, 5 Jul 2019 12:00:09 UTC (472 KB)