SABAF: Removing Strong Attribute Bias from Neural Networks with Adversarial Filtering

Li, Jiazhi; Khayatkhoei, Mahyar; Zhu, Jiageng; Xie, Hanchen; Hussein, Mohamed E.; AbdAlmageed, Wael


arXiv:2311.07141 (cs)

[Submitted on 13 Nov 2023 (v1), last revised 16 Nov 2023 (this version, v2)]


Abstract: Ensuring a neural network is not relying on protected attributes (e.g., race, sex, age) for prediction is crucial in advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. To that end, in this work, we mathematically and empirically reveal the limitation of existing attribute bias removal methods in the presence of strong bias and propose a new method that can mitigate this limitation. Specifically, we first derive a general non-vacuous information-theoretical upper bound on the performance of any attribute bias removal method in terms of the bias strength, revealing that they are effective only when the inherent bias in the dataset is relatively weak. Next, we derive a necessary condition for the existence of any method that can remove attribute bias regardless of the bias strength. Inspired by this condition, we then propose a new method using an adversarial objective that directly filters out protected attributes in the input space while maximally preserving all other attributes, without requiring any specific target label. The proposed method achieves state-of-the-art performance in both strong and moderate bias settings. We provide extensive experiments on synthetic, image, and census datasets to verify the derived theoretical bound and its consequences in practice, and evaluate the effectiveness of the proposed method in removing strong attribute bias.
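
The abstract refers to "bias strength" without defining it on this page. As an illustration only, the sketch below assumes bias strength can be read off the empirical conditional entropy H(Y | A) between a target label Y and a protected attribute A, where a value near zero means the attribute almost determines the label (strong bias). This choice of measure, the function name `conditional_entropy`, and the toy data are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from math import log2


def conditional_entropy(targets, attributes):
    """Estimate H(Y | A) in bits from paired samples (hypothetical helper).

    targets:    sequence of target labels Y
    attributes: sequence of protected-attribute values A, same length
    """
    if len(targets) != len(attributes):
        raise ValueError("targets and attributes must have the same length")
    n = len(targets)
    if n == 0:
        raise ValueError("at least one sample is required")

    joint = Counter(zip(attributes, targets))   # counts of (a, y) pairs
    marginal = Counter(attributes)              # counts of a

    entropy = 0.0
    for (a, _y), count in joint.items():
        p_joint = count / n                     # p(a, y)
        p_y_given_a = count / marginal[a]       # p(y | a)
        entropy -= p_joint * log2(p_y_given_a)
    return entropy


# Toy data (made up): the attribute fully determines the label.
strong_bias = conditional_entropy([0, 0, 1, 1], [0, 0, 1, 1])

# Toy data (made up): the label is independent of the attribute.
no_bias = conditional_entropy([0, 1, 0, 1], [0, 0, 1, 1])

print(strong_bias, no_bias)
```

Under this assumed measure, the first toy dataset sits at the strong-bias extreme (zero conditional entropy) and the second at the opposite extreme (one full bit for a balanced binary label).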

Comments:	35 pages, 18 figures, 32 tables. This work is an extended version of our paper (arXiv:2310.04955). Code will be released at this https URL
Subjects:	Machine Learning (cs.LG); Computers and Society (cs.CY)
Cite as:	arXiv:2311.07141 [cs.LG]
	(or arXiv:2311.07141v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.07141

Submission history

From: Jiazhi Li
[v1] Mon, 13 Nov 2023 08:13:55 UTC (3,441 KB)
[v2] Thu, 16 Nov 2023 07:23:17 UTC (3,441 KB)
