Salient object detection is a fundamental topic in computer vision. Previous RGB-D-based methods often suffer from incompatible multi-modal feature fusion and insufficient multi-scale feature aggregation. To tackle these two dilemmas, we propose a novel multi-modal and multi-scale refined network (M2RNet). Three essential components are presented in this network. The nested dual attention module (NDAM) explicitly exploits the combined features of the RGB and depth streams. The adjacent interactive aggregation module (AIAM) gradually integrates neighboring features at the high, middle, and low levels. The joint hybrid optimization loss (JHOL) encourages predictions with well-defined outlines. Extensive experiments demonstrate that our method outperforms other state-of-the-art approaches.
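The two ideas named in the abstract, cross-modal fusion of RGB and depth features, and coarse-to-fine aggregation of adjacent scales, can be illustrated with a minimal sketch. The functions and gating scheme below are illustrative assumptions for exposition, not the paper's actual NDAM/AIAM definitions; a toy gated fusion and nearest-neighbour upsampling stand in for the real modules.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_fuse(rgb_feat, depth_feat):
    """Toy cross-modal fusion (hypothetical, not the paper's NDAM):
    gate each modality by a sigmoid of the other's channel mean,
    then sum the gated features."""
    gate_rgb = sigmoid(depth_feat.mean(axis=0, keepdims=True))
    gate_depth = sigmoid(rgb_feat.mean(axis=0, keepdims=True))
    return rgb_feat * gate_rgb + depth_feat * gate_depth

def aggregate_adjacent(low, mid, high):
    """Toy adjacent aggregation (hypothetical, not the paper's AIAM):
    merge high into mid, then the result into low, upsampling the
    coarser map by 2x nearest-neighbour repetition at each step."""
    def up2(x):  # 2x nearest-neighbour upsampling along H and W
        return x.repeat(2, axis=1).repeat(2, axis=2)
    mid_fused = (mid + up2(high)) / 2.0
    return (low + up2(mid_fused)) / 2.0

# Toy C x H x W feature maps at three scales.
rng = np.random.default_rng(0)
rgb = rng.random((8, 32, 32))
depth = rng.random((8, 32, 32))
fused = attention_fuse(rgb, depth)

low = rng.random((8, 32, 32))
mid = rng.random((8, 16, 16))
high = rng.random((8, 8, 8))
out = aggregate_adjacent(low, mid, high)
print(fused.shape, out.shape)  # (8, 32, 32) (8, 32, 32)
```

Both outputs keep the finest resolution, mirroring how multi-level aggregation propagates coarse context back to the full-resolution prediction.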