Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

Humeau, Samuel; Shuster, Kurt; Lachaux, Marie-Anne; Weston, Jason

Computer Science > Computation and Language

arXiv:1905.01969 (cs)

[Submitted on 22 Apr 2019 (v1), last revised 25 Mar 2020 (this version, v4)]

Abstract: The use of deep pre-trained bidirectional transformers has led to remarkable progress in a number of applications (Devlin et al., 2018). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding label, two approaches are common: Cross-encoders performing full self-attention over the pair and Bi-encoders encoding the pair separately. The former often performs better, but is too slow for practical use. In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token level self-attention features. We perform a detailed comparison of all three approaches, including what pre-training and fine-tuning strategies work best. We show our models achieve state-of-the-art results on three existing tasks; that Poly-encoders are faster than Cross-encoders and more accurate than Bi-encoders; and that the best results are obtained by pre-training on large datasets similar to the downstream tasks.
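The abstract describes the Poly-encoder only at a high level ("global rather than token level self-attention features"). The following is a minimal pure-Python sketch of the scoring step, contrasted with Bi-encoder scoring, assuming the transformer encoders have already produced their output vectors. It follows our reading of the full paper: m code vectors each attend over the context encoder's per-token outputs, the candidate embedding then attends over the resulting m global features, and the score is a final dot product. The function names, the toy vectors, and the use of fixed (untrained) codes are hypothetical illustrations; no transformer is included.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention where the keys also serve as the values."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]


def bi_encoder_score(context_emb: Vector, candidate_emb: Vector) -> float:
    """Bi-encoder: the two sides are encoded separately and compared by a dot product."""
    return dot(context_emb, candidate_emb)


def poly_encoder_score(
    context_outputs: List[Vector],  # h_1..h_N: per-token outputs of the context encoder
    codes: List[Vector],            # c_1..c_m: learned in the real model, fixed inputs here
    candidate_emb: Vector,          # y_cand: output of the separate candidate encoder
) -> float:
    # Each code attends over all context tokens, giving m global context features.
    global_features = [attend(code, context_outputs) for code in codes]
    # The candidate embedding attends over those m features to build y_ctxt.
    context_emb = attend(candidate_emb, global_features)
    # The final score is a dot product, exactly as in the Bi-encoder.
    return dot(context_emb, candidate_emb)


if __name__ == "__main__":
    # Toy values (hypothetical): N = 3 context tokens, m = 2 codes, d = 2.
    h = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    codes = [[1.0, 0.0], [0.0, 1.0]]
    candidates = [[1.0, 0.0], [0.0, 1.0]]
    # Candidate embeddings do not depend on the context, so they could be cached.
    for cand in candidates:
        print(poly_encoder_score(h, codes, cand))
```

Because `candidate_emb` never depends on the context, a system built this way can encode every candidate once and reuse the vectors, running only the cheap attention and dot products per query. A Cross-encoder, by contrast, must rerun full self-attention over every (context, candidate) pair, which is the source of the speed gap the abstract mentions.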

Comments:	ICLR 2020
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:1905.01969 [cs.CL]
	(or arXiv:1905.01969v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1905.01969

Submission history

From: Kurt Shuster
[v1] Mon, 22 Apr 2019 02:18:00 UTC (6,127 KB)
[v2] Mon, 19 Aug 2019 19:07:46 UTC (3,328 KB)
[v3] Wed, 12 Feb 2020 20:07:00 UTC (1,712 KB)
[v4] Wed, 25 Mar 2020 22:53:51 UTC (1,713 KB)
