Focusing on how to effectively capture and exploit cross-modality information in the RGB-D salient object detection (SOD) task, we present a convolutional neural network (CNN) model, named CIR-Net, built on novel cross-modality interaction and refinement. For cross-modality interaction, 1) a progressive attention guided integration unit is proposed to sufficiently integrate RGB-D feature representations in the encoder stage, and 2) a convergence aggregation structure is proposed, which flows the RGB and depth decoding features into the corresponding RGB-D decoding streams via an importance gated fusion unit in the decoder stage. For cross-modality refinement, we insert a refinement middleware structure between the encoder and the decoder, in which the RGB, depth, and RGB-D encoder features are further refined by successively applying a self-modality attention refinement unit and a cross-modality weighting refinement unit. Finally, with the gradually refined features, we predict the saliency map in the decoder stage. Extensive experiments on six popular RGB-D SOD benchmarks demonstrate that our network outperforms state-of-the-art saliency detectors both qualitatively and quantitatively.
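To make the importance-gated fusion idea concrete, the following is a minimal, framework-free sketch. It is not the paper's exact unit: the real model operates on learned convolutional feature maps, whereas here a hypothetical elementwise sigmoid gate simply blends toy 1-D RGB and depth "features", with all names (`gated_fusion`, `gate_logits`) being illustrative assumptions.

```python
import math


def sigmoid(x):
    """Standard logistic function, squashing a logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_feat, depth_feat, gate_logits):
    """Illustrative importance-gated fusion: a per-element gate decides
    how much the RGB feature vs. the depth feature contributes."""
    gates = [sigmoid(g) for g in gate_logits]
    return [g * r + (1.0 - g) * d
            for g, r, d in zip(gates, rgb_feat, depth_feat)]


# Toy 1-D "features" standing in for decoder feature maps.
rgb = [1.0, 0.5, -0.2]
depth = [0.0, 1.0, 0.4]
logits = [10.0, 0.0, -10.0]  # strongly favor RGB / balanced / strongly favor depth
fused = gated_fusion(rgb, depth, logits)
```

In the actual network the gate would be predicted from the features themselves (e.g. by a small convolution), so the fusion weights adapt spatially to the reliability of each modality.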