Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering

Guo, Danfeng; Terzopoulos, Demetri

Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.21368 (cs)

[Submitted on 31 Jul 2024]

Title: Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering

Authors: Danfeng Guo, Demetri Terzopoulos


Abstract: Large Vision-Language Models (LVLMs) have achieved significant success in recent years, and they have been extended to the medical domain. Although they demonstrate satisfactory performance on medical Visual Question Answering (VQA) tasks, Medical LVLMs (MLVLMs) suffer from the hallucination problem, which causes them to fail to diagnose complex pathologies. Moreover, they readily fail to learn minority pathologies due to imbalanced training data. We propose two prompting strategies for MLVLMs that reduce hallucination and improve VQA performance. In the first strategy, we provide a detailed explanation of the queried pathology. In the second strategy, we fine-tune a cheap, weak learner to achieve high performance on a specific metric, and textually provide its judgment to the MLVLM. Tested on the MIMIC-CXR-JPG and CheXpert datasets, our methods significantly improve the diagnostic F1 score, with the highest increase being 0.27. We also demonstrate that our prompting strategies can be extended to general LVLM domains. Based on POPE metrics, they effectively suppress the false negative predictions of existing LVLMs and improve Recall by approximately 0.07.
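The two prompting strategies described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the prompt wording, the hypothetical `PATHOLOGY_NOTES` table, the function names, and the 0.5 decision threshold are all invented. The weak learner (whose fine-tuning is not shown) is represented only by the probability it outputs. The `binary_scores` helper computes precision, recall, and F1 for yes/no answers, which are the kinds of quantities behind the reported diagnostic F1 and POPE Recall figures.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical explanation table; the paper's actual pathology explanations are not reproduced here.
PATHOLOGY_NOTES = {
    "pleural effusion": (
        "an abnormal accumulation of fluid in the pleural space, which on a frontal "
        "chest X-ray often appears as blunting of the costophrenic angle"
    ),
}


def explanation_prompt(pathology: str, note: str) -> str:
    """Strategy 1 (sketch): prepend an explanation of the queried pathology to the VQA question."""
    return (
        f"{pathology.capitalize()} is {note}. "
        f"Does this chest X-ray show {pathology}? Answer yes or no."
    )


def weak_learner_prompt(pathology: str, prob: float, threshold: float = 0.5) -> str:
    """Strategy 2 (sketch): turn a weak classifier's probability into a textual judgment.

    `threshold` is a hypothetical operating point; in practice it would be chosen to
    favor whichever metric the weak learner is tuned for.
    """
    verdict = "present" if prob >= threshold else "absent"
    return (
        f"An auxiliary classifier judges that {pathology} is {verdict}. "
        f"Does this chest X-ray show {pathology}? Answer yes or no."
    )


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def binary_scores(preds: Sequence[bool], labels: Sequence[bool]) -> Scores:
    """Precision, recall, and F1 for yes/no answers, treating 'yes' as the positive class."""
    tp = sum(p and y for p, y in zip(preds, labels))          # true positives
    fp = sum(p and not y for p, y in zip(preds, labels))      # false positives
    fn = sum((not p) and y for p, y in zip(preds, labels))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    print(explanation_prompt("pleural effusion", PATHOLOGY_NOTES["pleural effusion"]))
    print(weak_learner_prompt("pleural effusion", prob=0.81))
    print(binary_scores([True, False, True, True], [True, True, False, True]))
```

Because fewer false negatives raise recall directly, a textual hint that nudges the model toward "yes" when the weak learner fires is one plausible way such prompting could improve Recall, as the abstract reports; this sketch does not reproduce that result.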

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2407.21368 [cs.CV]
	(or arXiv:2407.21368v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.21368

Submission history

From: Danfeng Guo
[v1] Wed, 31 Jul 2024 06:34:38 UTC (4,673 KB)
