Precise Task Formalization Matters in Winograd Schema Evaluations

Liu, Haokun; Huang, William; Mungra, Dhara A.; Bowman, Samuel R.

arXiv:2010.04043 (cs)

[Submitted on 8 Oct 2020]

Abstract: Performance on the Winograd Schema Challenge (WSC), a respected English commonsense reasoning benchmark, recently rocketed from chance accuracy to 89% on the SuperGLUE leaderboard, with relatively little corroborating evidence of a correspondingly large improvement in reasoning ability. We hypothesize that much of this improvement comes from recent changes in task formalization (the combination of input specification, loss function, and reuse of pretrained parameters) by users of the dataset, rather than improvements in the pretrained model's reasoning ability. We perform an ablation on two Winograd Schema datasets that interpolates between the formalizations used before and after this surge, and find (i) framing the task as multiple choice improves performance by 2-6 points and (ii) several additional techniques, including the reuse of a pretrained language modeling head, can mitigate the model's extreme sensitivity to hyperparameters. We urge future benchmark creators to impose additional structure to minimize the impact of formalization decisions on reported results.

Comments:	Accepted to the EMNLP 2020 conference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.04043 [cs.CL]
	(or arXiv:2010.04043v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.04043

Submission history

From: Haokun Liu
[v1] Thu, 8 Oct 2020 15:10:47 UTC (266 KB)
