Salient object detection (SOD) has been well studied in recent years, especially with deep neural networks. However, SOD on RGB images and SOD on RGB-D images are usually treated as two different tasks, each requiring a specifically designed network structure. In this paper, we propose a unified structure with a cross-attention context extraction (CRACE) module that addresses both SOD tasks efficiently. The CRACE module receives and fuses two inputs (for RGB SOD) or three inputs (for RGB-D SOD). A simple, unified feature pyramid network (FPN)-like structure built from CRACE modules conveys and refines the results under multi-level supervision of saliency and boundaries. The structure is simple yet effective: it extracts and fuses the rich context information of RGB and depth efficiently. Experimental results show that our method outperforms other state-of-the-art methods on both RGB and RGB-D SOD tasks across various datasets and on most metrics.