Depth Quality-Inspired Feature Manipulation for Efficient RGB-D and Video Salient Object Detection

CNN-based RGB-D salient object detection (SOD) has recently achieved significant gains in detection accuracy. However, existing models rarely deliver both efficiency and accuracy at once, which limits their use on mobile devices and in many real-world applications. To narrow the accuracy gap between lightweight and large RGB-D SOD models, this paper proposes an efficient module that greatly improves accuracy while adding little computation. Motivated by the observation that depth quality is a key factor influencing accuracy, we propose a depth quality-inspired feature manipulation (DQFM) process that dynamically filters depth features according to depth quality. DQFM relies on the alignment of low-level RGB and depth features, together with holistic attention from the depth stream, to explicitly control and enhance cross-modal fusion. We embed DQFM into an efficient lightweight RGB-D SOD model called DFM-Net, for which we additionally design a tailored depth backbone and a two-stage decoder as basic components. Extensive experiments on nine RGB-D datasets show that DFM-Net outperforms recent efficient models, running at about 20 FPS on CPU with a model size of only 8.5 MB, while being 2.9/2.4 times faster and 6.7/3.1 times smaller than the latest best models, A2dele and MobileSal, respectively. It also maintains state-of-the-art accuracy even when compared to non-efficient models. Interestingly, further statistics and analyses verify that DQFM can distinguish depth maps of various qualities without any quality labels. Finally, we apply DFM-Net to video SOD (VSOD), achieving performance comparable to recent efficient models while being 3/2.3 times faster/smaller than the prior best in this field. Our code is available at https://github.com/zwbx/DFM-Net.
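
To make the idea of filtering depth features by depth quality more concrete, the following is a minimal NumPy sketch and not the DFM-Net implementation. It assumes that quality can be approximated by how well edges in a low-level RGB map line up with edges in the depth map, and it uses that score to scale the depth contribution during fusion. The function names (`edge_map`, `alignment_score`, `gated_fusion`), the finite-difference edge operator, and the Dice-style score are all illustrative assumptions; the holistic attention branch mentioned in the abstract is omitted, and the repository linked above should be consulted for the actual module.

```python
import numpy as np


def edge_map(x):
    """Crude edge strength: absolute horizontal plus vertical differences."""
    gx = np.abs(np.diff(x, axis=1, prepend=x[:, :1]))
    gy = np.abs(np.diff(x, axis=0, prepend=x[:1, :]))
    return gx + gy


def alignment_score(rgb_low, depth_low, eps=1e-8):
    """Dice-style overlap of the two edge maps, in [0, 1] (assumed proxy for depth quality)."""
    e_rgb = edge_map(rgb_low)
    e_d = edge_map(depth_low)
    overlap = 2.0 * (e_rgb * e_d).sum()
    total = (e_rgb ** 2).sum() + (e_d ** 2).sum() + eps
    return float(overlap / total)


def gated_fusion(rgb_feat, depth_feat, score):
    """Add depth features to RGB features, scaled by the quality score."""
    return rgb_feat + score * depth_feat


# Toy 8x8 inputs: the RGB map has a vertical boundary at column 4.
rgb = np.zeros((8, 8))
rgb[:, 4:] = 1.0

depth_good = rgb.copy()          # depth boundary coincides with the RGB boundary
depth_bad = np.zeros((8, 8))     # depth boundary is horizontal, so it rarely overlaps
depth_bad[4:, :] = 1.0

score_good = alignment_score(rgb, depth_good)
score_bad = alignment_score(rgb, depth_bad)

fused_good = gated_fusion(rgb, depth_good, score_good)
fused_bad = gated_fusion(rgb, depth_bad, score_bad)
```

In this toy setup the well-aligned depth map receives a score near 1 and contributes almost fully to the fused features, while the misaligned one is largely suppressed. A single scalar gate is the simplest possible choice; a learned, spatially varying weight would be the natural extension.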
