At the core of Camouflaged Object Detection (COD) lies the task of segmenting objects from highly similar surroundings. Previous efforts address this challenge primarily through image-level modeling or annotation-based optimization. Despite considerable progress, these approaches either make little use of valuable dataset-level contextual information or rely on laborious annotations. In this paper, we propose RISE, a RetrIeval SElf-augmented paradigm that exploits the entire training dataset to generate pseudo-labels for individual images, which can then be used to train COD models. RISE first constructs prototype libraries for environments and camouflaged objects from training images (without ground truth), and then applies K-Nearest Neighbor (KNN) retrieval against these libraries to generate a pseudo-mask for each image. Building high-quality prototype libraries from unannotated training images alone is a pronounced challenge. To address it, we introduce a Clustering-then-Retrieval (CR) strategy: coarse masks are first generated through clustering, which enables subsequent histogram-based image filtering and cross-category retrieval to produce high-confidence prototypes. In the KNN retrieval stage, to alleviate the effect of artifacts in feature maps, we propose Multi-View KNN Retrieval (MVKR), which integrates retrieval results from diverse views to produce more robust and precise pseudo-masks. Extensive experiments demonstrate that RISE outperforms state-of-the-art unsupervised and prompt-based methods. Code is available at https://github.com/xiaohainku/RISE.
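To make the retrieval step concrete, the following is a minimal sketch of prototype-based KNN labeling and a multi-view fusion, not the released implementation. Everything in it is an assumption made for illustration: the function names (`knn_mask`, `multi_view_mask`), the use of cosine similarity, the toy 2-D features, horizontal flipping as the extra view, and per-pixel majority voting as the fusion rule. The abstract does not specify any of these choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_label(feature, object_protos, env_protos, k=3):
    """Return 1 if most of the k nearest prototypes are object prototypes."""
    scored = [(cosine(feature, p), 1) for p in object_protos]
    scored += [(cosine(feature, p), 0) for p in env_protos]
    scored.sort(key=lambda s: s[0], reverse=True)
    top = scored[:k]
    return 1 if 2 * sum(label for _, label in top) > len(top) else 0


def knn_mask(feature_map, object_protos, env_protos, k=3):
    """Label every location of an H x W grid of feature vectors."""
    return [[knn_label(f, object_protos, env_protos, k) for f in row]
            for row in feature_map]


def hflip(grid):
    """Horizontally flip an H x W grid (of features or of labels)."""
    return [row[::-1] for row in grid]


def multi_view_mask(views, undo_fns, object_protos, env_protos, k=3):
    """Retrieve per view, map each mask back, then majority-vote per pixel."""
    masks = [undo(knn_mask(fm, object_protos, env_protos, k))
             for fm, undo in zip(views, undo_fns)]
    height, width = len(masks[0]), len(masks[0][0])
    return [[1 if 2 * sum(m[i][j] for m in masks) > len(masks) else 0
             for j in range(width)] for i in range(height)]


# Toy prototype libraries (hypothetical values, 2-D features).
object_protos = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
env_protos = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]

# A 2 x 2 feature map, plus a view in which pixel (0, 1) carries an artifact.
feature_map = [[[1.0, 0.1], [0.1, 1.0]],
               [[0.0, 1.0], [0.9, 0.0]]]
noisy_view = [[[1.0, 0.1], [1.0, 0.0]],
              [[0.0, 1.0], [0.9, 0.0]]]

identity = lambda grid: grid
fused = multi_view_mask(
    [feature_map, hflip(feature_map), noisy_view],
    [identity, hflip, identity],
    object_protos, env_protos,
)
print(fused)
```

In this toy setup the artifact flips one pixel of the mask retrieved from `noisy_view`, while the fused mask matches the mask from the clean view, which is the intuition behind combining several views.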