RGB-D saliency detection aims to fuse multi-modal cues to accurately localize salient regions. Existing works often adopt attention modules for feature modeling, but few explicitly leverage fine-grained details and merge them with semantic cues. Thus, despite the auxiliary depth information, it remains challenging for existing models to distinguish objects with similar appearances but at distinct camera distances. In this paper, we propose, from a new perspective, a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection. Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with neural network hierarchies. To realize multi-modal and multi-level fusion, we first use a granularity-based attention scheme to strengthen the discriminative power of the RGB and depth features separately. Then we introduce a unified cross dual-attention module for multi-modal and multi-level fusion in a coarse-to-fine manner. The encoded multi-modal features are gradually aggregated into a shared decoder. Furthermore, we exploit a multi-scale loss to take full advantage of the hierarchical information. Extensive experiments on challenging benchmark datasets demonstrate that HiDAnet performs favorably against state-of-the-art methods by large margins.
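The abstract does not include implementation details, but the idea of a symmetric cross-attention between RGB and depth feature streams can be illustrated with a minimal NumPy sketch. Everything below (function names, the scaled dot-product form, and the additive merge) is an assumption for illustration only, not the authors' HiDAnet code:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feat, kv_feat):
    # tokens of one modality attend to keys/values of the other
    # (scaled dot-product attention; shapes: [N, C] @ [C, N] -> [N, N])
    scores = query_feat @ kv_feat.T / np.sqrt(kv_feat.shape[1])
    return softmax(scores, axis=-1) @ kv_feat

def cross_dual_attention_fusion(rgb, depth):
    # dual (symmetric) cross-attention: RGB attends to depth and
    # vice versa; each stream keeps a residual connection, and the
    # two enhanced streams are merged by element-wise addition
    rgb_enh = rgb + cross_attention(rgb, depth)
    depth_enh = depth + cross_attention(depth, rgb)
    return rgb_enh + depth_enh

# toy usage: 16 spatial tokens with 32 channels per modality
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 32))
depth = rng.standard_normal((16, 32))
fused = cross_dual_attention_fusion(rgb, depth)
print(fused.shape)  # (16, 32)
```

In HiDAnet this kind of fusion is applied across multiple encoder levels in a coarse-to-fine manner; the sketch shows only a single level on flattened features.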