Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

Vu, Tu; Iyyer, Mohit

arXiv:1906.03656 (cs)

[Submitted on 9 Jun 2019]

Abstract: While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method proposed by Zhang et al. (2017) and discover that it cannot reliably tell whether a given sentence occurs in the input paragraph or not. We formulate a sentence content task to probe for this basic linguistic property and find that even a much simpler bag-of-words method has no trouble solving it. This result motivates us to replace the reconstruction-based objective of Zhang et al. (2017) with our sentence content probe objective in a semi-supervised setting. Despite its simplicity, our objective improves over paragraph reconstruction in terms of (1) downstream classification accuracies on benchmark datasets, (2) faster training, and (3) better generalization ability.
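
To make the sentence content task concrete, the following is a minimal sketch of how probe examples might be built and how a bag-of-words signal could separate them. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`make_probe_examples`, `bow_overlap`), the whitespace tokenizer, the toy paragraphs, and the choice to draw negatives from a different paragraph are all hypothetical, and a token-overlap score stands in for a learned bag-of-words embedding.

```python
import random
from collections import Counter

# Toy data (assumption): each paragraph is a list of sentences.
PARAGRAPHS = [
    ["the movie was great", "i loved the acting"],
    ["the food was cold", "service took forever"],
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def make_probe_examples(paragraphs, seed=0):
    """Build (paragraph_index, sentence, label) triples.

    label 1: the sentence is drawn from the paragraph itself.
    label 0: the sentence is drawn from a different paragraph.
    """
    rng = random.Random(seed)
    examples = []
    for i, sentences in enumerate(paragraphs):
        examples.append((i, rng.choice(sentences), 1))
        other = rng.choice([k for k in range(len(paragraphs)) if k != i])
        examples.append((i, rng.choice(paragraphs[other]), 0))
    return examples


def bow_overlap(paragraph_sentences, sentence):
    """Fraction of the sentence's tokens covered by the paragraph's bag of words."""
    bag = Counter(tok for s in paragraph_sentences for tok in tokenize(s))
    need = Counter(tokenize(sentence))
    covered = sum(min(count, bag[tok]) for tok, count in need.items())
    return covered / max(1, sum(need.values()))


if __name__ == "__main__":
    for idx, sent, label in make_probe_examples(PARAGRAPHS):
        print(label, round(bow_overlap(PARAGRAPHS[idx], sent), 2), sent)
```

On this toy data, a sentence taken from the paragraph is fully covered by the paragraph's bag of words, while a sentence from the other paragraph is only partly covered. This mirrors the intuition that a bag-of-words representation retains enough lexical information for the task; real data would need a trained classifier rather than a fixed overlap score.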

Comments:	Accepted as a conference paper at ACL 2019
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1906.03656 [cs.CL]
	(or arXiv:1906.03656v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1906.03656

Submission history

From: Tu Vu
[v1] Sun, 9 Jun 2019 15:18:53 UTC (149 KB)
