MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

Chen, Qiuhui; Hu, Xinyue; Wang, Zirui; Hong, Yi

arXiv:2305.10799 (cs)

[Submitted on 18 May 2023]

Abstract: Vision-language pre-training (VLP) models have been demonstrated to be effective in many computer vision applications. In this paper, we consider developing a VLP model in the medical domain for making computer-aided diagnoses (CAD) based on image scans and text descriptions in electronic health records, as done in practice. To achieve our goal, we present a lightweight CAD system, MedBLIP, a new paradigm for bootstrapping VLP from off-the-shelf frozen pre-trained image encoders and frozen large language models. We design a MedQFormer module to bridge the gap between 3D medical images and 2D pre-trained image encoders, as well as language models. To evaluate the effectiveness of MedBLIP, we collect more than 30,000 image volumes from five public Alzheimer's disease (AD) datasets, i.e., ADNI, NACC, OASIS, AIBL, and MIRIAD. On this dataset, the largest AD dataset we know of, our model achieves state-of-the-art (SOTA) performance on the zero-shot classification of healthy, mild cognitive impairment (MCI), and AD subjects, and shows its capability for medical visual question answering (VQA). The code and pre-trained models are available online: this https URL.
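
The abstract does not specify how MedQFormer is built, so the following is only an illustrative sketch of the general idea of condensing a 3D volume into a fixed number of feature vectors that a downstream language model could consume. It assumes a generic query-based cross-attention design: the volume is cut into non-overlapping cubic patches, each flattened into a token, and a small set of query vectors attends over those tokens. The names `volume_to_patch_tokens` and `cross_attend`, the patch size, the number of queries, and the random (untrained) queries are all hypothetical; there are no learned projections, and this is not the authors' implementation.

```python
import math
import random


def volume_to_patch_tokens(volume, patch):
    """Split a D x H x W nested-list volume into non-overlapping
    patch x patch x patch cubes and flatten each cube into one token."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    tokens = []
    for z in range(0, d, patch):
        for y in range(0, h, patch):
            for x in range(0, w, patch):
                tokens.append([
                    volume[z + dz][y + dy][x + dx]
                    for dz in range(patch)
                    for dy in range(patch)
                    for dx in range(patch)
                ])
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Scaled dot-product cross-attention without learned projections:
    each query returns a weighted average of the tokens."""
    dim = len(tokens[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(dim)
            for t in tokens
        ]
        weights = softmax(scores)
        outputs.append([
            sum(wt * t[i] for wt, t in zip(weights, tokens))
            for i in range(dim)
        ])
    return outputs


# Toy 8 x 8 x 8 "scan" with random intensities in [0, 1).
rng = random.Random(0)
volume = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(8)]

tokens = volume_to_patch_tokens(volume, patch=4)  # 2*2*2 patches, 64 values each
queries = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(4)]  # untrained
summary = cross_attend(queries, tokens)

print(len(tokens), len(summary), len(summary[0]))  # 8 4 64
```

The point of the sketch is that the output length is set by the number of queries rather than by the volume size, so volumes of different resolutions map to the same number of vectors. A trained module would add learned projections and would typically operate on features from a frozen image encoder rather than on raw voxels.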

Comments:	11 pages, 3 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.10799 [cs.CV]
	(or arXiv:2305.10799v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.10799

Submission history

From: Qiuhui Chen
[v1] Thu, 18 May 2023 08:19:33 UTC (1,673 KB)
