Understanding Bias in Large-Scale Visual Datasets

Zeng, Boya; Yin, Yida; Liu, Zhuang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2412.01876 (cs)

[Submitted on 2 Dec 2024]

Abstract: A recent study has shown that large-scale visual datasets are very biased: they can be easily classified by modern neural networks. However, the concrete forms of bias among these datasets remain unclear. In this study, we propose a framework to identify the unique visual attributes distinguishing these datasets. Our approach applies various transformations to extract semantic, structural, boundary, color, and frequency information from datasets, and assesses how much each type of information reflects their bias. We further decompose their semantic bias with object-level analysis, and leverage natural language methods to generate detailed, open-ended descriptions of each dataset's characteristics. Our work aims to help researchers understand the bias in existing large-scale pre-training datasets, and build more diverse and representative ones in the future. Our project page and code are available at this http URL .
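As an illustration of what a transformation that isolates "frequency information" can look like, the following is a minimal sketch, assuming 2-D grayscale images stored as NumPy arrays and a circular cutoff in the centered Fourier spectrum. The function names and the `radius` parameter are hypothetical and are not taken from the authors' code; this is only one generic way to split an image into low- and high-frequency components.

```python
import numpy as np


def low_pass_filter(image: np.ndarray, radius: int) -> np.ndarray:
    """Keep only spatial frequencies within `radius` of the spectrum center.

    `image` is a 2-D grayscale array; `radius` is a hypothetical cutoff in
    frequency-index units, not a value from the paper.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) component to index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Circular mask centered on the DC component.
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    # Zero out frequencies outside the mask, undo the shift, and invert.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    # The input is real, so any imaginary part is numerical noise.
    return filtered.real


def high_pass_filter(image: np.ndarray, radius: int) -> np.ndarray:
    """Residual left after removing the low-frequency component."""
    return image - low_pass_filter(image, radius)
```

Under this assumed setup, one could train a dataset-origin classifier separately on the low-pass and high-pass versions of the images and compare how accurately each version can still be attributed to its source dataset.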

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2412.01876 [cs.CV]
	(or arXiv:2412.01876v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.01876

Submission history

From: Yida Yin
[v1] Mon, 2 Dec 2024 18:56:52 UTC (9,836 KB)