Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?

Zhang, Zeliang; Liang, Xin; Feng, Mingqian; Liang, Susan; Xu, Chenliang

arXiv:2410.10160 (cs)

[Submitted on 14 Oct 2024]

Abstract: As the demand for high-quality training data escalates, researchers have increasingly turned to generative models to create synthetic data, addressing data scarcity and enabling continuous model improvement. However, reliance on self-generated data introduces a critical question: Will this practice amplify bias in future models? While most research has focused on overall performance, the impact on model bias, particularly subgroup bias, remains underexplored. In this work, we investigate the effects of generated data on image classification tasks, with a specific focus on bias. We develop a practical simulation environment that integrates a self-consuming loop, where the generative model and classification model are trained synergistically. Hundreds of experiments are conducted on Colorized MNIST, CIFAR-20/100, and Hard ImageNet datasets to reveal changes in fairness metrics across generations. In addition, we provide a conjecture to explain the bias dynamics when training models on continuously augmented datasets across generations. Our findings contribute to the ongoing debate on the implications of synthetic data for fairness in real-world applications.

Comments:	15 pages, 7 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2410.10160 [cs.CV]
	(or arXiv:2410.10160v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.10160

Submission history

From: Zeliang Zhang
[v1] Mon, 14 Oct 2024 05:07:06 UTC (13,238 KB)
