SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging

Shen, Lingdong; Shang, Fangxin; Huang, Xiaoshuang; Yang, Yehui; Huang, Haifeng; Xiang, Shiming


arXiv:2403.16578v3 (cs)

[Submitted on 25 Mar 2024 (v1), revised 29 May 2024 (this version, v3), latest version 30 May 2024 (v4)]

Abstract: In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models, which aim to generalize across the diverse modalities of medical images, offer one solution, yet their effectiveness often diminishes when they are applied to OOD data modalities and tasks, and they require intricate fine-tuning to reach optimal performance. Few-shot segmentation methods, on the other hand, are typically designed for a specific data modality and cannot be transferred directly to another modality. We therefore introduce SegICL, a novel approach that leverages In-Context Learning (ICL) for image segmentation. Unlike existing methods, SegICL can perform text-guided segmentation and conduct in-context learning from a small set of image-mask pairs, eliminating the need to train the model from scratch or fine-tune it for OOD tasks (including OOD modalities and datasets). Extensive experiments demonstrate a positive correlation between the number of shots and segmentation performance on OOD tasks: segmentation performance with three shots is approximately 1.5 times that of the zero-shot setting. This indicates that SegICL can effectively address new segmentation tasks based on contextual information. SegICL also achieves performance comparable to mainstream models on both OOD and in-distribution tasks. Our code will be released after paper review.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2403.16578 [cs.CV]
	(or arXiv:2403.16578v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.16578

Submission history

From: Fangxin Shang
[v1] Mon, 25 Mar 2024 09:43:56 UTC (3,411 KB)
[v2] Tue, 2 Apr 2024 09:55:02 UTC (3,411 KB)
[v3] Wed, 29 May 2024 07:00:22 UTC (4,936 KB)
[v4] Thu, 30 May 2024 03:35:06 UTC (4,936 KB)
