Vision Foundation Models for Computed Tomography

Pai, Suraj; Hadzic, Ibrahim; Bontempi, Dennis; Bressem, Keno; Kann, Benjamin H.; Fedorov, Andriy; Mak, Raymond H.; Aerts, Hugo J. W. L.

arXiv:2501.09001 (eess)

[Submitted on 15 Jan 2025]

Authors: Suraj Pai (1, 2, 3), Ibrahim Hadzic (1, 2, 3), Dennis Bontempi (1, 2, 3), Keno Bressem (4, 5), Benjamin H. Kann (1, 3), Andriy Fedorov (6), Raymond H. Mak (1, 3), Hugo J. W. L. Aerts (1, 2, 3, 6)

Affiliations:
(1) Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School
(2) Radiology and Nuclear Medicine, CARIM & GROW, Maastricht University
(3) Department of Radiation Oncology, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School
(4) Department of Diagnostic and Interventional Radiology, Technical University of Munich, School of Medicine and Health, Klinikum rechts der Isar, TUM University Hospital
(5) Department of Cardiovascular Radiology and Nuclear Medicine, Technical University of Munich, School of Medicine and Health, German Heart Center, TUM University Hospital
(6) Department of Radiology, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Harvard Medical School

Abstract: Foundation models (FMs) have shown transformative potential in radiology by performing diverse, complex tasks across imaging modalities. Here, we developed CT-FM, a large-scale 3D image-based pre-trained model designed explicitly for various radiological tasks. CT-FM was pre-trained using 148,000 computed tomography (CT) scans from the Imaging Data Commons through label-agnostic contrastive learning. We evaluated CT-FM across four categories of tasks, namely whole-body and tumor segmentation, head CT triage, medical image retrieval, and semantic understanding, showing superior performance against state-of-the-art models. Beyond quantitative success, CT-FM demonstrated the ability to cluster regions anatomically and identify similar anatomical and structural concepts across scans. Furthermore, it remained robust across test-retest settings and indicated reasonable salient regions attached to its embeddings. This study demonstrates the value of large-scale medical imaging foundation models and, by open-sourcing the model weights, code, and data, aims to support more adaptable, reliable, and interpretable AI solutions in radiology.
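
The abstract lists medical image retrieval among the evaluated tasks. As a rough illustration of what embedding-based retrieval generally looks like, the sketch below ranks stored scans by cosine similarity to a query embedding. It is only a minimal sketch under stated assumptions: the three-dimensional toy vectors, the scan identifiers, and the helper names `cosine_similarity` and `retrieve` are hypothetical, and nothing here reflects CT-FM's actual API, embedding size, or the authors' retrieval protocol.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=3):
    """Return the top_k (scan_id, similarity) pairs, most similar first."""
    scored = [
        (scan_id, cosine_similarity(query, embedding))
        for scan_id, embedding in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real model would produce much longer vectors.
database = {
    "scan_a": [1.0, 0.0, 0.0],
    "scan_b": [0.9, 0.1, 0.0],
    "scan_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]

results = retrieve(query, database, top_k=2)
for scan_id, score in results:
    print(f"{scan_id}: {score:.4f}")
```

In this toy setup the query matches `scan_a` exactly, so it ranks first, followed by the nearly parallel `scan_b`. Cosine similarity is used here because it ignores vector magnitude, which is a common choice for comparing learned embeddings, though the paper may use a different metric.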

Comments:	6 figures, followed by 9 Extended Data Figures and a Supplementary Information document
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.09001 [eess.IV]
	(or arXiv:2501.09001v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2501.09001

Submission history

From: Suraj Pai
[v1] Wed, 15 Jan 2025 18:30:58 UTC (14,304 KB)
