SU-SAM: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes

Lizhuang Ma; Qianyu Zhou; Xuequan Lu; Yiran Song; Zhiwen Shao

arxiv: 2401.17803 · v2 · pith:VBXY7KYTnew · submitted 2024-01-31 · 💻 cs.CV

Yiran Song, Qianyu Zhou, Xuequan Lu, Zhiwen Shao, Lizhuang Ma

classification 💻 cs.CV

keywords su-sam, tasks, designs, downstream, generalizability, methods, model, parameter-efficient

The Segment Anything Model (SAM) has demonstrated excellent generalizability in common vision scenarios, yet it falls short in understanding specialized data. Recently, several methods have combined parameter-efficient techniques with task-specific designs to fine-tune SAM on particular tasks. However, these methods rely heavily on handcrafted, complicated, task-specific designs and on pre/post-processing to achieve acceptable performance on downstream tasks, which severely restricts their generalizability to other downstream tasks. To address this issue, we present a simple and unified framework, namely SU-SAM, that can easily and efficiently fine-tune the SAM model with parameter-efficient techniques while maintaining excellent generalizability toward various downstream tasks. SU-SAM does not require any task-specific designs and aims to significantly improve the adaptability of SAM-like models to underperformed scenes. Concretely, we abstract the parameter-efficient modules of different methods into basic design elements in our framework. In addition, we propose four variants of SU-SAM, i.e., series, parallel, mixed, and LoRA structures. We conduct comprehensive experiments on nine datasets and six downstream tasks, including medical image segmentation, camouflage object detection, salient object segmentation, surface defect segmentation, complex object shapes, and shadow masking, to verify the effectiveness of SU-SAM. Our experimental results demonstrate that SU-SAM achieves competitive or superior accuracy compared to state-of-the-art methods. Furthermore, we provide in-depth analyses highlighting the effectiveness of different parameter-efficient designs within SU-SAM. Finally, we propose a generalized model and benchmark, showcasing SU-SAM's generalizability across all of the diverse datasets simultaneously.
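The abstract names series, parallel, mixed, and LoRA structures but does not say how each is wired into SAM. As a rough illustration only, the NumPy sketch below follows the conventions commonly used in the parameter-efficient fine-tuning literature: a bottleneck adapter applied after a frozen layer (series), the same adapter applied alongside it (parallel), and a low-rank update to the frozen weight (LoRA). The names `frozen_layer`, `Adapter`, `series_block`, `parallel_block`, and `lora_block`, the sizes `d` and `r`, and the single linear layer standing in for a SAM sublayer are all assumptions made for this example, not the paper's implementation; the mixed variant is left out because the abstract gives no detail on it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden width and bottleneck size / rank (illustrative values)

# Frozen "pretrained" weight standing in for one SAM sublayer (assumption).
W_frozen = rng.normal(size=(d, d))


def frozen_layer(x):
    """Apply the frozen linear layer to a batch x of shape (n, d)."""
    return x @ W_frozen.T


class Adapter:
    """Bottleneck adapter: down-project to r, ReLU, up-project back to d."""

    def __init__(self, d, r, rng):
        self.down = rng.normal(scale=0.02, size=(r, d))
        # Zero-initialised up-projection, so the adapter starts as a no-op.
        self.up = np.zeros((d, r))

    def __call__(self, h):
        return np.maximum(h @ self.down.T, 0.0) @ self.up.T


def series_block(x, adapter):
    """Series: the adapter refines the output of the frozen layer."""
    h = frozen_layer(x)
    return h + adapter(h)


def parallel_block(x, adapter):
    """Parallel: the adapter sees the same input as the frozen layer."""
    return frozen_layer(x) + adapter(x)


def lora_block(x, A, B, scale=1.0):
    """LoRA: add a rank-r update B @ A to the frozen weight."""
    return x @ (W_frozen + scale * (B @ A)).T


def trainable_params(*arrays):
    """Count the entries in the arrays that would be trained."""
    return sum(a.size for a in arrays)
```

In this toy setting only the adapter or LoRA matrices would receive gradients, so each variant trains `2 * d * r` values while the `d * d` frozen weight is left untouched; the variants differ only in where the extra branch is attached.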

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CLIP-Guided SAM: Parameter-Efficient Semantic Conditioning for Promptable Segmentation
cs.CV 2026-05 unverdicted novelty 5.0

CLIP-Guided SAM injects CLIP-derived features into SAM via lightweight adapters for semantic conditioning, supporting text and spatial prompts while remaining parameter-efficient and achieving competitive performance.