Visual Prompt Engineering for Medical Vision Language Models in Radiology

Denner, Stefan; Bujotzek, Markus; Bounias, Dimitrios; Zimmerer, David; Stock, Raphael; Jäger, Paul F.; Maier-Hein, Klaus

Computer Science > Computer Vision and Pattern Recognition

arXiv:2408.15802 (cs)

[Submitted on 28 Aug 2024]



Abstract: Medical image classification in radiology faces significant challenges, particularly in generalizing to unseen pathologies. In contrast, CLIP offers a promising solution by leveraging multimodal learning to improve zero-shot classification performance. However, in the medical domain, lesions can be small and might not be well represented in the embedding space. Therefore, in this paper, we explore the potential of visual prompt engineering to enhance the capabilities of Vision Language Models (VLMs) in radiology. Leveraging BiomedCLIP, trained on extensive biomedical image-text pairs, we investigate the impact of embedding visual markers directly within radiological images to guide the model's attention to critical regions. Our evaluation on the JSRT dataset, focusing on lung nodule malignancy classification, demonstrates that incorporating visual prompts, such as arrows, circles, and contours, significantly improves classification metrics including AUROC, AUPRC, F1 score, and accuracy. Moreover, the study provides attention maps, showcasing enhanced model interpretability and focus on clinically relevant areas. These findings underscore the efficacy of visual prompt engineering as a straightforward yet powerful approach to advance VLM performance in medical image analysis.
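The general idea described in the abstract, drawing a visual marker onto the image and then scoring the marked image against text prompts in a CLIP-style zero-shot setup, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the image is a hypothetical grayscale grid stored as a list of rows, the embeddings are toy vectors standing in for the outputs of BiomedCLIP's image and text encoders (which are not called here), and the scale factor of 100 is an assumed CLIP-style temperature. Only a circle marker is shown; arrows and contours would be drawn analogously. AUROC, one of the reported metrics, is computed with the pairwise (Mann-Whitney) formulation.

```python
import math
from typing import List, Sequence

# Hypothetical representation: grayscale image as rows of intensities in [0, 1].
Image = List[List[float]]


def draw_circle(image: Image, cy: int, cx: int, radius: float,
                thickness: float = 1.0, value: float = 1.0) -> Image:
    """Return a copy of `image` with a ring (circle outline) around (cy, cx).

    Pixels whose distance from the centre lies within thickness/2 of
    `radius` are set to `value`; the input image is left unchanged.
    """
    out = [row[:] for row in image]
    for y, row in enumerate(out):
        for x in range(len(row)):
            if abs(math.hypot(y - cy, x - cx) - radius) <= thickness / 2:
                row[x] = value
    return out


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_probs(image_emb: Sequence[float],
                    text_embs: Sequence[Sequence[float]],
                    scale: float = 100.0) -> List[float]:
    """CLIP-style zero-shot scoring: softmax over scaled cosine similarities.

    `scale` is an assumed temperature, not a value taken from the paper.
    """
    logits = [scale * cosine(image_emb, t) for t in text_embs]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one positive
    (label 1) and one negative (label 0).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    blank = [[0.0] * 9 for _ in range(9)]
    marked = draw_circle(blank, cy=4, cx=4, radius=3)  # ring around a "nodule"
    # Toy embeddings, for illustration only (e.g. "benign" vs "malignant" prompts).
    probs = zero_shot_probs([0.9, 0.1, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(probs)
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

In a real setup, the marked image would be passed through the VLM's image encoder and each class description through its text encoder; the prompt wording and marker parameters above are assumptions chosen only to make the sketch concrete.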

Comments:	Accepted at ECCV 2024 Workshop on Emergent Visual Abilities and Limits of Foundation Models
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.15802 [cs.CV]
	(or arXiv:2408.15802v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.15802

Submission history

From: Stefan Denner
[v1] Wed, 28 Aug 2024 13:53:27 UTC (33,326 KB)