LEMoN: Label Error Detection using Multimodal Neighbors

Zhang, Haoran; Balagopalan, Aparna; Oufattole, Nassim; Jeong, Hyewon; Wu, Yan; Zhu, Jiacheng; Ghassemi, Marzyeh

Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.18941 (cs)

[Submitted on 10 Jul 2024]

Abstract: Large repositories of image-caption pairs are essential for the development of vision-language models. However, these datasets are often extracted from noisy data scraped from the web, and contain many mislabeled examples. In order to improve the reliability of downstream models, it is important to identify and filter images with incorrect captions. However, beyond filtering based on image-caption embedding similarity, no prior works have proposed other methods to filter noisy multimodal data, or concretely assessed the impact of noisy captioning data on downstream training. In this work, we propose LEMoN, a method to automatically identify label errors in multimodal datasets. Our method leverages the multimodal neighborhood of image-caption pairs in the latent space of contrastively pretrained multimodal models. We find that our method outperforms the baselines in label error identification, and that training on datasets filtered using our method improves downstream classification and captioning performance.
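
The abstract contrasts two ideas: scoring a pair by the similarity of its own image and caption embeddings, and scoring it using its neighbors in the embedding space. The sketch below illustrates that contrast on toy 2-D vectors. It is not the scoring function from the paper, which the abstract does not specify; the function names `pair_distance` and `neighbor_distance`, the choice of cosine distance, the parameter `k`, and the toy embeddings are all assumptions made for illustration. In practice the embeddings would come from a contrastively pretrained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_distance(image_embs, text_embs):
    """Baseline: cosine distance between each image and its own caption."""
    return [1.0 - cosine(i, t) for i, t in zip(image_embs, text_embs)]


def neighbor_distance(image_embs, text_embs, k=2):
    """Illustrative neighborhood score (an assumption, not the paper's formula).

    For each pair, find the k nearest other pairs in image space and
    average the cosine distance between this caption and their captions.
    A caption that disagrees with the captions of similar images scores high.
    """
    n = len(image_embs)
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort the other pairs by image-space distance to pair i.
        others.sort(key=lambda j: 1.0 - cosine(image_embs[i], image_embs[j]))
        nearest = others[:k]
        total = sum(1.0 - cosine(text_embs[i], text_embs[j]) for j in nearest)
        scores.append(total / len(nearest))
    return scores


# Toy data (hypothetical): pairs 0-2 are images of one concept, pair 3 of another.
# Pair 2 carries a caption embedding that belongs with pair 3's concept.
image_embs = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
text_embs = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 1.0]]

baseline_scores = pair_distance(image_embs, text_embs)
neighbor_scores = neighbor_distance(image_embs, text_embs, k=2)

# Index of the pair each score considers most suspicious.
baseline_flag = max(range(len(baseline_scores)), key=lambda i: baseline_scores[i])
neighbor_flag = max(range(len(neighbor_scores)), key=lambda i: neighbor_scores[i])
print(baseline_flag, neighbor_flag)
```

On this toy input both scores rank the deliberately mismatched pair highest. With only four pairs, the lone example of the second concept also receives an elevated neighborhood score, which suggests why a neighbor-based signal needs a reasonably dense dataset to be informative.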

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2407.18941 [cs.CV]
	(or arXiv:2407.18941v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.18941

Submission history

From: Haoran Zhang
[v1] Wed, 10 Jul 2024 19:36:30 UTC (6,580 KB)
