Salient object detection (SOD) aims to identify the most conspicuous objects in a scene. However, most existing methods rely on RGB images alone, which miss a great deal of useful information. As thermal imaging technology has matured, RGB-T (RGB-Thermal) multi-modal tasks have attracted increasing attention. Thermal infrared images carry important information that can improve the accuracy of SOD prediction. To exploit it, methods for integrating multi-modal information and suppressing noise are critical. In this paper, we propose a novel network called Interactive Context-Aware Network (ICANet). It contains three modules that effectively perform cross-modal and cross-scale fusion. We design a Hybrid Feature Fusion (HFF) module, which uses two types of feature extraction, to integrate the features of the two modalities. The Multi-Scale Attention Reinforcement (MSAR) and Upper Fusion (UF) blocks are responsible for cross-scale fusion, which aggregates features from different levels and generates the prediction maps. We also introduce a novel Context-Aware Multi-Supervised Network (CAMSNet) to calculate the content loss between the prediction and the ground truth (GT). Experiments show that our network performs favorably against state-of-the-art RGB-T SOD methods.