Unnatural Language Inference

Sinha, Koustuv; Parthasarathi, Prasanna; Pineau, Joelle; Williams, Adina

arXiv:2101.00010v1 (cs)

[Submitted on 30 Dec 2020 (this version), latest version 11 Jun 2021 (v2)]

Abstract: Natural Language Understanding has witnessed a watershed moment with the introduction of large pre-trained Transformer networks. These models achieve state-of-the-art results on various tasks, notably including Natural Language Inference (NLI). Many studies have shown that the large representation space learned by these models encodes some syntactic and semantic information. However, to really "know syntax", a model must recognize when its input violates syntactic rules and calculate inferences accordingly. In this work, we find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words. With iterative search, we are able to construct randomized versions of NLI test sets, which contain permuted hypothesis-premise pairs with the same words as the original, yet are classified with perfect accuracy by large pre-trained models, as well as by pre-Transformer state-of-the-art encoders. We find the issue to be language- and model-invariant, and hence investigate the root cause. To partially alleviate this effect, we propose a simple training methodology. Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
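As an illustration of the kind of probe the abstract describes, the following is a minimal sketch of searching for a word-order permutation of a premise-hypothesis pair that a classifier still labels correctly. It is not the authors' implementation: the function names, the retry limits, and the toy word-overlap classifier are all assumptions introduced here, and a real experiment would substitute a trained NLI model for the stub.

```python
import random
from typing import Callable, Optional, Tuple


def permute_words(sentence: str, rng: random.Random, retries: int = 20) -> str:
    """Return the sentence with its whitespace-separated words shuffled.

    Best effort: retries a few times to avoid returning the original order.
    """
    words = sentence.split()
    if len(words) < 2:
        return sentence
    shuffled = words[:]
    for _ in range(retries):
        rng.shuffle(shuffled)
        if shuffled != words:
            break
    return " ".join(shuffled)


def search_accepted_permutation(
    premise: str,
    hypothesis: str,
    gold_label: str,
    classify: Callable[[str, str], str],
    max_tries: int = 100,
    seed: int = 0,
) -> Optional[Tuple[str, str]]:
    """Sample permuted pairs until one receives the gold label, or give up.

    `classify` is a hypothetical stand-in for any NLI model's predict function.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        permuted_premise = permute_words(premise, rng)
        permuted_hypothesis = permute_words(hypothesis, rng)
        if classify(permuted_premise, permuted_hypothesis) == gold_label:
            return permuted_premise, permuted_hypothesis
    return None


def toy_overlap_classifier(premise: str, hypothesis: str) -> str:
    """Hypothetical order-insensitive stub: 'entailment' if every hypothesis
    word also appears in the premise, otherwise 'neutral'."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    return "entailment" if hypothesis_words <= premise_words else "neutral"


if __name__ == "__main__":
    found = search_accepted_permutation(
        "a man is playing a guitar on stage",
        "a man is playing guitar",
        "entailment",
        toy_overlap_classifier,
    )
    print(found)
```

Because the stub ignores word order entirely, it accepts the first permutation it is shown; the abstract's claim is that large pre-trained models can behave in a comparable way on real NLI test sets.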

Comments:	10 pages + appendix
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2101.00010 [cs.CL]
	(or arXiv:2101.00010v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2101.00010

Submission history

From: Koustuv Sinha
[v1] Wed, 30 Dec 2020 20:40:48 UTC (5,752 KB)
[v2] Fri, 11 Jun 2021 03:44:22 UTC (8,743 KB)
