In this paper, we introduce SAM3-UNet, a simplified variant of Segment Anything Model 3 (SAM3) designed to adapt SAM3 to downstream tasks at low cost. SAM3-UNet consists of three components: a SAM3 image encoder, a simple adapter for parameter-efficient fine-tuning, and a lightweight U-Net-style decoder. Preliminary experiments on multiple tasks, including mirror detection and salient object detection, show that SAM3-UNet outperforms the prior SAM2-UNet and other state-of-the-art methods while requiring less than 6 GB of GPU memory during training with a batch size of 12. The code is publicly available at https://github.com/WZH0120/SAM3-UNet.
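To make the idea of parameter-efficient fine-tuning concrete, the following is a minimal sketch of how one might estimate the trainable-parameter budget of an adapter. It rests on assumptions that the abstract does not state: that the adapter is a bottleneck design (a down-projection followed by an up-projection), that one adapter is attached per encoder block, and that the encoder weights stay frozen. The names `AdapterBudget` and `trainable_fraction`, and all dimensions, are hypothetical and are not taken from the SAM3-UNet code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterBudget:
    """Parameter count for a hypothetical bottleneck adapter (assumed design)."""

    embed_dim: int       # assumed encoder channel width
    bottleneck_dim: int  # assumed adapter hidden width
    num_blocks: int      # assumed number of encoder blocks that get an adapter

    def params_per_adapter(self) -> int:
        # Down-projection: embed_dim x bottleneck_dim weights plus bottleneck_dim biases.
        down = self.embed_dim * self.bottleneck_dim + self.bottleneck_dim
        # Up-projection: bottleneck_dim x embed_dim weights plus embed_dim biases.
        up = self.bottleneck_dim * self.embed_dim + self.embed_dim
        return down + up

    def total_adapter_params(self) -> int:
        # One adapter per block, all of the same size.
        return self.num_blocks * self.params_per_adapter()


def trainable_fraction(frozen_params: int, trainable_params: int) -> float:
    """Share of all parameters that receive gradients during fine-tuning."""
    return trainable_params / (frozen_params + trainable_params)


if __name__ == "__main__":
    # Toy dimensions chosen only so the arithmetic is easy to follow.
    budget = AdapterBudget(embed_dim=8, bottleneck_dim=2, num_blocks=3)
    adapters = budget.total_adapter_params()
    print("adapter parameters:", adapters)
    print("trainable fraction:", trainable_fraction(874, adapters))
```

Under these assumptions the adapter cost grows linearly with the bottleneck width, which is why a small bottleneck keeps the number of trainable parameters far below that of the frozen encoder. The parameters of the decoder would be added to the trainable side of the same calculation.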