MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More

Jiang, Yue; Chen, Jiawei; Yang, Dingkang; Li, Mingcheng; Wang, Shunli; Wu, Tong; Li, Ke; Zhang, Lihua

arXiv:2406.11451v2 (cs)

[Submitted on 17 Jun 2024 (v1), last revised 18 Jun 2024 (this version, v2)]

Abstract: When Large Vision Language Models (LVLMs) are applied to multimodal medical generative tasks, they suffer from significant hallucination. This severely impairs their generative accuracy and makes it challenging to deploy LVLMs in real-world medical scenarios to assist doctors in diagnosis. Enhancing the training data for downstream medical generative tasks is an effective way to address hallucination. However, the limited availability of training data in the medical field, together with privacy concerns, greatly hinders model accuracy and generalization. In this paper, we propose MedThink, a method that mimics human cognitive processes to construct fine-grained instruction pairs and carries the concept of chain-of-thought (CoT) over from inference scenarios to training scenarios. Our experiments on various LVLMs demonstrate that this data construction method, tailored for the medical domain, significantly improves model performance on medical image report generation tasks and substantially mitigates hallucinations. All resources of this work will be released soon.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2406.11451 [cs.CV]
	(or arXiv:2406.11451v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.11451

Submission history

From: Jiawei Chen
[v1] Mon, 17 Jun 2024 12:03:32 UTC (315 KB)
[v2] Tue, 18 Jun 2024 14:20:46 UTC (316 KB)
