SU-SAM: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes

Song, Yiran; Zhou, Qianyu; Lu, Xuequan; Shao, Zhiwen; Ma, Lizhuang

Abstract:Segment anything model (SAM) has demonstrated excellent generalizability in common vision scenarios, yet falling short of the ability to understand specialized data. Recently, several methods have combined parameter-efficient techniques with task-specific designs to fine-tune SAM on particular tasks. However, these methods heavily rely on handcraft, complicated, and task-specific designs, and pre/post-processing to achieve acceptable performances on downstream tasks. As a result, this severely restricts generalizability to other downstream tasks. To address this issue, we present a simple and unified framework, namely SU-SAM, that can easily and efficiently fine-tune the SAM model with parameter-efficient techniques while maintaining excellent generalizability toward various downstream tasks. SU-SAM does not require any task-specific designs and aims to improve the adaptability of SAM-like models significantly toward underperformed scenes. Concretely, we abstract parameter-efficient modules of different methods into basic design elements in our framework. Besides, we propose four variants of SU-SAM, i.e., series, parallel, mixed, and LoRA structures. Comprehensive experiments on nine datasets and six downstream tasks to verify the effectiveness of SU-SAM, including medical image segmentation, camouflage object detection, salient object segmentation, surface defect segmentation, complex object shapes, and shadow masking. Our experimental results demonstrate that SU-SAM achieves competitive or superior accuracy compared to state-of-the-art methods. Furthermore, we provide in-depth analyses highlighting the effectiveness of different parameter-efficient designs within SU-SAM. In addition, we propose a generalized model and benchmark, showcasing SU-SAM's generalizability across all diverse datasets simultaneously.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2401.17803 [cs.CV]
	(or arXiv:2401.17803v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.17803

Computer Science > Computer Vision and Pattern Recognition

Title:SU-SAM: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators