Multi-modal salient object detection models based on RGB-D information are more robust in real-world scenes. However, it remains nontrivial to adaptively balance the effective multi-modal information during the feature fusion phase. In this letter, we propose a novel gated recoding network (GRNet) to evaluate the information validity of the two modalities and balance their influence. Our framework is divided into three phases: the perception phase, the recoding mixing phase, and the feature integration phase. First, a perception encoder is adopted to extract multi-level single-modal features, which lays the foundation for multi-modal semantic comparative analysis. Then, a modal-adaptive gate unit (MGU) is proposed to suppress invalid information and transfer effective modal features to the recoding mixer and the hybrid branch decoder. The recoding mixer is responsible for recoding and mixing the balanced multi-modal information. Finally, the hybrid branch decoder completes multi-level feature integration under the guidance of an optional edge guidance stream (OEGS). Experiments and analysis on eight popular benchmarks verify that our framework performs favorably against nine state-of-the-art methods.
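The core idea of validity gating can be illustrated with a minimal sketch: a learned gate scores each modality's features and rescales them before fusion, so unreliable depth (or RGB) responses are suppressed. This is only an assumed, simplified illustration of the gating principle; the function name, the weight shapes, and the sigmoid gate design are hypothetical and do not reproduce the paper's exact MGU.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modal_adaptive_gate(f_rgb, f_depth, w_gate, b_gate):
    """Hypothetical sketch of modal-adaptive gating: compare the two
    modalities jointly, produce a per-modality validity score in (0, 1),
    and rescale each modality's features before they are fused."""
    # Joint view of both modalities for semantic comparison.
    joint = np.concatenate([f_rgb, f_depth], axis=-1)
    # One validity score per modality at each spatial position.
    g = sigmoid(joint @ w_gate + b_gate)
    g_rgb, g_depth = g[..., 0:1], g[..., 1:2]
    # Invalid information is attenuated; effective features pass through.
    return g_rgb * f_rgb, g_depth * f_depth

# Toy usage with random features (4 spatial positions, 8 channels each).
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((4, 8))
f_depth = rng.standard_normal((4, 8))
w_gate = rng.standard_normal((16, 2)) * 0.1
b_gate = np.zeros(2)
gated_rgb, gated_depth = modal_adaptive_gate(f_rgb, f_depth, w_gate, b_gate)
```

In a full network the gated features of both modalities would then be recoded and mixed before multi-level decoding; here the gate simply demonstrates how per-modality weighting lets fusion down-weight an unreliable input.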