Controlling Bias Exposure for Fair Interpretable Predictions

He, Zexue; Wang, Yu; McAuley, Julian; Majumder, Bodhisattwa Prasad

arXiv:2210.07455 (cs)

[Submitted on 14 Oct 2022 (v1), last revised 22 Oct 2022 (this version, v2)]

Abstract: Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (like gender or race). However, when sensitive information is semantically entangled with the task information of the input, e.g., gender information is predictive of a profession, a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches perform this trade-off by eliminating bias information from the latent space, with no control over how much bias actually needs to be removed. We argue that a favorable debiasing method should use sensitive information 'fairly', rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019; Bogen et al., 2020). In this work, we provide a novel debiasing algorithm that adjusts the predictive model's belief to (1) ignore the sensitive information if it is not useful for the task; (2) use sensitive information only as minimally necessary for the prediction (while also incurring a penalty). Experimental results on two text classification tasks (influenced by gender) and an open-ended generation task (influenced by race) indicate that our model achieves a desirable trade-off between debiasing and task performance and produces debiased rationales as evidence.

Comments:	Accepted to EMNLP-2022 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2210.07455 [cs.CL]
	(or arXiv:2210.07455v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.07455

Submission history

From: Zexue He
[v1] Fri, 14 Oct 2022 01:49:01 UTC (743 KB)
[v2] Sat, 22 Oct 2022 06:43:53 UTC (745 KB)
