Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback

Lerner, Emilia Agis; Dorner, Florian E.; Ash, Elliott; Goel, Naman

Computer Science > Machine Learning

arXiv:2406.05902 (cs)

[Submitted on 9 Jun 2024]

Title:Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback

Authors:Emilia Agis Lerner, Florian E. Dorner, Elliott Ash, Naman Goel

View PDF HTML (experimental)

Abstract:There is a growing body of work on learning from human feedback to align various aspects of machine learning systems with human values and preferences. We consider the setting of fairness in content moderation, in which human feedback is used to determine how two comments -- referencing different sensitive attribute groups -- should be treated in comparison to one another. With a novel dataset collected from Prolific and MTurk, we find significant gaps in fairness preferences depending on the race, age, political stance, educational level, and LGBTQ+ identity of annotators. We also demonstrate that demographics mentioned in text have a strong influence on how users perceive individual fairness in moderation. Further, we find that differences also exist in downstream classifiers trained to predict human preferences. Finally, we observe that an ensemble, giving equal weight to classifiers trained on annotations from different demographics, performs better for different demographic intersections; compared to a single classifier that gives equal weight to each annotation.

Comments:	To appear in the Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as:	arXiv:2406.05902 [cs.LG]
	(or arXiv:2406.05902v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.05902

Submission history

From: Naman Goel [view email]
[v1] Sun, 9 Jun 2024 19:42:25 UTC (7,335 KB)

Computer Science > Machine Learning

Title:Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators