MedHEval: Benchmarking Hallucinations and Mitigation Strategies in Medical Large Vision-Language Models

Chang, Aofei; Huang, Le; Bhatia, Parminder; Kass-Hout, Taha; Ma, Fenglong; Xiao, Cao

arXiv:2503.02157 (cs)

[Submitted on 4 Mar 2025]


Abstract: Large Vision-Language Models (LVLMs) are becoming increasingly important in the medical domain, yet Medical LVLMs (Med-LVLMs) frequently generate hallucinations due to limited domain expertise and the complexity of medical applications. Existing benchmarks neither evaluate hallucinations according to their underlying causes nor assess mitigation strategies. To address this gap, we introduce MedHEval, a novel benchmark that systematically evaluates hallucinations and mitigation strategies in Med-LVLMs by categorizing hallucinations into three underlying causes: visual misinterpretation, knowledge deficiency, and context misalignment. We construct a diverse set of closed- and open-ended medical VQA datasets with comprehensive evaluation metrics to assess these hallucination types. We conduct extensive experiments across 11 popular (Med)-LVLMs and evaluate 7 state-of-the-art hallucination mitigation techniques. Results reveal that Med-LVLMs struggle with hallucinations arising from different causes, while existing mitigation methods show limited effectiveness, especially for knowledge- and context-based errors. These findings underscore the need for improved alignment training and specialized mitigation strategies to enhance Med-LVLMs' reliability. MedHEval establishes a standardized framework for evaluating and mitigating medical hallucinations, guiding the development of more trustworthy Med-LVLMs.
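The abstract groups hallucinations by three causes and reports results per cause. As an illustration only, here is a minimal sketch of tallying closed-ended VQA accuracy per cause. It assumes a hypothetical record layout (`ClosedEndedItem`, with made-up category keys) and plain exact-match scoring after lowercasing; it is not MedHEval's released code, data format, or metric definition.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three hallucination causes named in the abstract (keys are hypothetical).
CATEGORIES = ("visual_misinterpretation", "knowledge_deficiency", "context_misalignment")


@dataclass
class ClosedEndedItem:
    # Hypothetical record layout, assumed for illustration only.
    category: str
    answer: str      # ground-truth option, e.g. "yes" or "B"
    prediction: str  # raw model output for the question


def accuracy_by_category(items):
    """Return accuracy per hallucination cause for closed-ended items."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        if item.category not in CATEGORIES:
            raise ValueError(f"unknown category: {item.category}")
        total[item.category] += 1
        # Assumed scoring: exact match after stripping whitespace and lowercasing.
        if item.prediction.strip().lower() == item.answer.strip().lower():
            correct[item.category] += 1
    # Only report causes that actually have items.
    return {c: correct[c] / total[c] for c in CATEGORIES if total[c]}


if __name__ == "__main__":
    demo = [
        ClosedEndedItem("visual_misinterpretation", "yes", "Yes"),
        ClosedEndedItem("visual_misinterpretation", "no", "yes"),
        ClosedEndedItem("knowledge_deficiency", "B", "b"),
    ]
    print(accuracy_by_category(demo))
```

Keeping scores separate per cause, rather than pooling them, is what lets a comparison show, for example, that a mitigation method helps with visual errors but not with knowledge- or context-based ones.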

Comments:	Preprint, under review
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.02157 [cs.CV]
	(or arXiv:2503.02157v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.02157

Submission history

From: Aofei Chang
[v1] Tue, 4 Mar 2025 00:40:09 UTC (5,023 KB)