From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models

Bhatia, Mehar; Ravi, Sahithya; Chinchure, Aditya; Hwang, Eunjeong; Shwartz, Vered

arXiv:2407.00263 (cs)

[Submitted on 28 Jun 2024]

Abstract: Despite recent advancements in vision-language models, their performance remains suboptimal on images from non-Western cultures due to underrepresentation in training datasets. Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures and do not adequately assess cultural diversity across universal as well as culture-specific local concepts. To address these limitations, we introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding. The former task entails retrieving culturally diverse images for universal concepts from 50 countries, while the latter aims at grounding culture-specific concepts within images from 15 countries. Our evaluation across a wide range of models reveals that the performance varies significantly across cultures, underscoring the necessity for enhancing multicultural understanding in vision-language models.

Comments:	Under peer review
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.00263 [cs.CL]
	(or arXiv:2407.00263v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.00263

Submission history

From: Mehar Bhatia
[v1] Fri, 28 Jun 2024 23:28:28 UTC (31,161 KB)
