VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

Choong, Wey Yeh; Guo, Yangyang; Kankanhalli, Mohan

arXiv:2411.16771 (cs)

[Submitted on 25 Nov 2024]

Abstract: Vision Large Language Models (VLLMs) are widely acknowledged to be prone to hallucination. Existing research addressing this problem has primarily been confined to image inputs, with limited exploration of video-based hallucinations. Furthermore, current evaluation methods fail to capture nuanced errors in generated responses, which are often exacerbated by the rich spatiotemporal dynamics of videos. To address this, we introduce VidHal, a benchmark specially designed to evaluate video-based hallucinations in VLLMs. VidHal is constructed by bootstrapping video instances across common temporal aspects. A defining feature of our benchmark lies in the careful creation of captions which represent varying levels of hallucination associated with each video. To enable fine-grained evaluation, we propose a novel caption ordering task requiring VLLMs to rank captions by hallucinatory extent. We conduct extensive experiments on VidHal and comprehensively evaluate a broad selection of models. Our results uncover significant limitations in existing VLLMs regarding hallucination generation. Through our benchmark, we aim to inspire further research on 1) holistic understanding of VLLM capabilities, particularly regarding hallucination, and 2) extensive development of advanced VLLMs to alleviate this problem.

Comments:	8 pages, 10 figures. Code available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.16771 [cs.CV]
	(or arXiv:2411.16771v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.16771

Submission history

From: Wey Yeh Choong
[v1] Mon, 25 Nov 2024 06:17:23 UTC (10,681 KB)
