Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

Schick, Timo; Udupa, Sahana; Schütze, Hinrich

arXiv:2103.00453v1 (cs)

[Submitted on 28 Feb 2021 (this version), latest version 9 Sep 2021 (v2)]

Abstract: When trained on large, unfiltered crawls from the internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: they often generate racist, sexist, violent or otherwise toxic language. As large models often require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we investigate whether pretrained language models at least know when they exhibit some undesirable bias or produce toxic content. Based on our findings, we propose a decoding algorithm that reduces the probability of a model producing problematic text given only a textual description of the undesired behavior. This algorithm does not rely on manually curated word lists, nor does it require any training data or changes to the model's parameters. While our approach by no means eliminates the issue of language models generating biased text, we believe it to be an important step in this direction.
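
The abstract does not spell out the decoding rule, so the following is only a minimal sketch of one way such a description-driven decoding step could look. It assumes we can obtain two next-token distributions, one for the plain input and one for the same input combined with a textual description of the undesired behavior, and it down-weights tokens that the description makes more likely. The function name `debias_next_token`, the exponential rescaling, the `decay` parameter, and the toy probabilities are all illustrative assumptions, not taken from the abstract.

```python
import math


def debias_next_token(p_plain, p_prompted, decay=50.0):
    """Rescale a next-token distribution (illustrative sketch only).

    p_plain:    token -> probability given the ordinary input
    p_prompted: token -> probability given the input plus a textual
                description of the undesired behavior
    decay:      hypothetical strength parameter (an assumption)
    """
    scaled = {}
    for token, p in p_plain.items():
        # A negative delta means the description makes this token more
        # likely, which we treat as a hint that the token is problematic.
        delta = p - p_prompted.get(token, 0.0)
        weight = 1.0 if delta >= 0 else math.exp(decay * delta)
        scaled[token] = p * weight
    # Renormalize so the result is a probability distribution again.
    total = sum(scaled.values())
    return {token: p / total for token, p in scaled.items()}


# Made-up toy distributions, purely for illustration.
p_plain = {"kind": 0.5, "rude": 0.3, "tall": 0.2}
p_prompted = {"kind": 0.2, "rude": 0.7, "tall": 0.1}
debiased = debias_next_token(p_plain, p_prompted)
```

In this toy case, "rude" becomes more likely once the description is added, so its probability is pushed toward zero, while the model's parameters are left untouched and no word list is consulted, in line with the properties the abstract claims for the approach.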

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2103.00453 [cs.CL]
	(or arXiv:2103.00453v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2103.00453

Submission history

From: Timo Schick
[v1] Sun, 28 Feb 2021 11:07:37 UTC (35 KB)
[v2] Thu, 9 Sep 2021 14:45:48 UTC (51 KB)
