Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Zhang, Yuhui; Su, Yuchang; Liu, Yiming; Wang, Xiaohan; Burgess, James; Sui, Elaine; Wang, Chenyu; Aklilu, Josiah; Lozano, Alejandro; Wei, Anjiang; Schmidt, Ludwig; Yeung-Levy, Serena

Computer Science > Computer Vision and Pattern Recognition

arXiv:2501.03225 (cs)

[Submitted on 6 Jan 2025]

Abstract: The rapid development of vision language models (VLMs) demands rigorous and reliable evaluation. However, current visual question answering (VQA) benchmarks often depend on open-ended questions, making accurate evaluation difficult due to the variability in natural language responses. To address this, we introduce AutoConverter, an agentic framework that automatically converts these open-ended questions into multiple-choice format, enabling objective evaluation while reducing the costly question creation process. Our experiments demonstrate that AutoConverter can generate correct and challenging multiple-choice questions, with VLMs demonstrating consistently similar or lower accuracy on these questions compared to human-created ones. Using AutoConverter, we construct VMCBench, a benchmark created by transforming 20 existing VQA datasets into a unified multiple-choice format, totaling 9,018 questions. We comprehensively evaluate 33 state-of-the-art VLMs on VMCBench, setting a new standard for scalable, consistent, and reproducible VLM evaluation.
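
To illustrate why a multiple-choice format makes scoring objective, the following is a minimal sketch and not AutoConverter's implementation. It assumes the distractors are already available (here they are written by hand, whereas the abstract describes generating them automatically), and the names `MultipleChoiceQuestion`, `to_multiple_choice`, and `accuracy` are hypothetical. Scoring reduces to an exact match between the predicted option letter and the answer key, with no judgment of free-form text.

```python
import random
from dataclasses import dataclass

LETTERS = "ABCDEFGH"  # supports up to eight options in this sketch


@dataclass
class MultipleChoiceQuestion:
    question: str
    options: list[str]  # answer choices in display order
    answer_index: int   # position of the correct choice in `options`


def to_multiple_choice(question, correct_answer, distractors, rng):
    """Mix the correct answer in among the (assumed, hand-written) distractors."""
    options = [correct_answer, *distractors]
    rng.shuffle(options)  # avoid always placing the answer first
    return MultipleChoiceQuestion(question, options, options.index(correct_answer))


def accuracy(questions, predicted_letters):
    """Fraction of questions whose predicted letter equals the answer key."""
    correct = sum(
        LETTERS[q.answer_index] == p.strip().upper()
        for q, p in zip(questions, predicted_letters)
    )
    return correct / len(questions)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the option order is reproducible
    q = to_multiple_choice(
        "What color is the car?", "red", ["blue", "green", "black"], rng
    )
    for letter, option in zip(LETTERS, q.options):
        print(f"{letter}. {option}")
    print("answer key:", LETTERS[q.answer_index])
```

Given a list of such questions and a model's predicted letters, `accuracy(questions, predictions)` returns a single number that does not depend on how the model phrases its response.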

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2501.03225 [cs.CV]
	(or arXiv:2501.03225v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.03225

Submission history

From: Yuhui Zhang
[v1] Mon, 6 Jan 2025 18:57:31 UTC (6,255 KB)
