OFL-SAM2: Prompt SAM2 with Online Few-shot Learner for Efficient Medical Image Segmentation

The Segment Anything Model 2 (SAM2) has demonstrated remarkable promptable visual segmentation capabilities on video data, suggesting it could be extended to medical image segmentation (MIS) tasks involving 3D volumes and temporally correlated 2D image sequences. However, adapting SAM2 to MIS presents several challenges: fine-tuning requires extensive annotated medical data, and inference requires high-quality manual prompts, both of which are labor-intensive and require intervention from medical experts. To address these challenges, we introduce OFL-SAM2, a prompt-free SAM2 framework for label-efficient MIS. Our core idea is to use a limited number of annotated samples to train a lightweight mapping network that captures medical knowledge and transforms generic image features into target features, thereby providing an additional discriminative target representation for each frame and eliminating the need for manual prompts. Crucially, the mapping network supports online parameter updates during inference, which improves the model's generalization across test sequences. Technically, we introduce two key components: (1) an online few-shot learner that trains the mapping network to generate target features from limited data, and (2) an adaptive fusion module that dynamically integrates these target features with the memory-attention features produced by the frozen SAM2, yielding an accurate and robust target representation. Extensive experiments on three diverse MIS datasets show that OFL-SAM2 achieves state-of-the-art performance with limited training data.
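
The abstract does not specify the mapping network's architecture, its training loss, the rule used for online updates, or how the fusion gate is computed. Purely as an illustration of the two ideas (a small learner fitted on a few labelled samples and then updated at test time, plus a gate that blends its output with SAM2's memory-attention features), the sketch below assumes: a single logistic layer on frozen per-pixel features that outputs a target probability rather than a full target feature vector; an online update driven by confident pseudo-labels with a hypothetical threshold `conf`; and a sigmoid gate over the concatenated features. `OnlineMapper`, `online_update`, and `adaptive_fuse` are hypothetical names, not the authors' code, and the mapper and the gate are kept independent here for simplicity.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineMapper:
    """Hypothetical stand-in for the lightweight mapping network: one logistic
    layer scoring how likely a generic feature vector belongs to the target."""

    def __init__(self, dim: int, lr: float = 0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def score(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def sgd_epoch(self, samples: Sequence[Tuple[Sequence[float], float]]) -> None:
        # One pass of per-sample gradient descent on binary cross-entropy.
        for x, y in samples:
            err = self.score(x) - y  # dBCE/dz for a sigmoid output
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err

    def fit(self, support: Sequence[Tuple[Sequence[float], float]], epochs: int = 50) -> None:
        # Offline few-shot training on the small annotated support set.
        for _ in range(epochs):
            self.sgd_epoch(support)

    def online_update(self, frame_feats: Sequence[Sequence[float]],
                      conf: float = 0.9, epochs: int = 5) -> int:
        # Assumed test-time rule: keep only confident predictions as
        # pseudo-labels, then take a few extra gradient steps on them.
        pseudo = []
        for x in frame_feats:
            p = self.score(x)
            if p >= conf:
                pseudo.append((x, 1.0))
            elif p <= 1.0 - conf:
                pseudo.append((x, 0.0))
        for _ in range(epochs):
            self.sgd_epoch(pseudo)
        return len(pseudo)


def adaptive_fuse(target: Sequence[float], memory: Sequence[float],
                  gate_w: Sequence[float], gate_b: float = 0.0) -> Vec:
    # Hypothetical gate: a scalar weight computed from the concatenated
    # features decides how much to trust the target feature versus the
    # memory-attention feature, then blends them elementwise.
    both = list(target) + list(memory)
    g = sigmoid(sum(w * v for w, v in zip(gate_w, both)) + gate_b)
    return [g * t + (1.0 - g) * m for t, m in zip(target, memory)]


if __name__ == "__main__":
    # Toy 2-D "features": first axis ~ foreground, second ~ background.
    support = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0),
               ([0.9, 0.1], 1.0), ([0.1, 0.9], 0.0)]
    mapper = OnlineMapper(dim=2)
    mapper.fit(support)
    used = mapper.online_update([[0.95, 0.05], [0.05, 0.95], [0.5, 0.5]])
    print("pseudo-labels used:", used)
    print("fg score:", mapper.score([1.0, 0.0]))
    print("fused:", adaptive_fuse([1.0, 0.0], [0.0, 1.0], [0.0] * 4))
```

Restricting the online update to confident pseudo-labels is one common way to limit drift when no ground truth is available at test time; the paper may use a different criterion. In a learned version, `gate_w` and `gate_b` would be trained rather than passed in by hand.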
