Who's the (Multi-)Fairest of Them \textsc{All}: Rethinking Interpolation-Based Data Augmentation Through the Lens of Multicalibration

Halevy, Karina; Hou, Karly; Badrinath, Charumathi

Computer Science > Machine Learning

arXiv:2412.10575 (cs)

[Submitted on 13 Dec 2024]

Title:Who's the (Multi-)Fairest of Them \textsc{All}: Rethinking Interpolation-Based Data Augmentation Through the Lens of Multicalibration

Authors:Karina Halevy, Karly Hou, Charumathi Badrinath

View PDF HTML (experimental)

Abstract:Data augmentation methods, especially SoTA interpolation-based methods such as Fair Mixup, have been widely shown to increase model fairness. However, this fairness is evaluated on metrics that do not capture model uncertainty and on datasets with only one, relatively large, minority group. As a remedy, multicalibration has been introduced to measure fairness while accommodating uncertainty and accounting for multiple minority groups. However, existing methods of improving multicalibration involve reducing initial training data to create a holdout set for post-processing, which is not ideal when minority training data is already sparse. This paper uses multicalibration to more rigorously examine data augmentation for classification fairness. We stress-test four versions of Fair Mixup on two structured data classification problems with up to 81 marginalized groups, evaluating multicalibration violations and balanced accuracy. We find that on nearly every experiment, Fair Mixup \textit{worsens} baseline performance and fairness, but the simple vanilla Mixup \textit{outperforms} both Fair Mixup and the baseline, especially when calibrating on small groups. \textit{Combining} vanilla Mixup with multicalibration post-processing, which enforces multicalibration through post-processing on a holdout set, further increases fairness.

Comments:	Expanded version of AAAI 2025 main track paper. 8 pages, 2 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2412.10575 [cs.LG]
	(or arXiv:2412.10575v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.10575

Submission history

From: Karina Halevy [view email]
[v1] Fri, 13 Dec 2024 21:36:12 UTC (553 KB)

Computer Science > Machine Learning

Title:Who's the (Multi-)Fairest of Them \textsc{All}: Rethinking Interpolation-Based Data Augmentation Through the Lens of Multicalibration

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Who's the (Multi-)Fairest of Them \textsc{All}: Rethinking Interpolation-Based Data Augmentation Through the Lens of Multicalibration

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators