Salient object detection requires a comprehensive and scalable receptive field to locate the visually significant objects in an image. Recently, the emergence of vision transformers and multi-branch modules has significantly enhanced the ability of neural networks to perceive objects at different scales. However, compared to traditional backbones, the computation of transformers is time-consuming. Moreover, the different branches of multi-branch modules can back-propagate the same errors in each training iteration, which is not conducive to extracting discriminative features. To solve these problems, we propose a bilateral network based on a transformer and a CNN to efficiently capture local details and global semantic information simultaneously. In addition, a Multi-Head Boosting (MHB) strategy is proposed to enhance the specificity of the different network branches. By calculating the errors of the different prediction heads, each branch can separately pay more attention to the pixels that the other branches predict incorrectly. Unlike multi-path parallel training, MHB randomly selects one branch at a time for gradient back-propagation in a boosting manner. Additionally, an Attention Feature Fusion Module (AF) is proposed to fuse the two types of features according to their respective characteristics. Comprehensive experiments on five benchmark datasets demonstrate that the proposed method achieves a significant performance improvement over state-of-the-art methods.
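The MHB strategy described above can be illustrated with a minimal sketch: each head's per-pixel loss is re-weighted toward pixels that the *other* heads mispredict, and only one randomly selected branch receives the gradient update per iteration. The function names and the particular weighting scheme below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def per_pixel_error(pred, gt):
    """Per-pixel binary cross-entropy (predictions are probabilities)."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return -(gt * np.log(pred) + (1 - gt) * np.log(1 - pred))

def mhb_weighted_losses(preds, gt):
    """For each head, weight its loss map by the errors of the other heads.

    This is a hypothetical weighting: pixels that the other branches get
    wrong receive up to twice the base weight, so each branch specialises
    on its peers' failure cases.
    """
    errs = [per_pixel_error(p, gt) for p in preds]
    losses = []
    for i, err_i in enumerate(errs):
        others = np.mean([errs[j] for j in range(len(errs)) if j != i], axis=0)
        weight = 1.0 + others / (others.max() + 1e-7)
        losses.append(float((weight * err_i).mean()))
    return losses

# Toy example: two prediction heads over a 4x4 saliency map.
gt = rng.integers(0, 2, size=(4, 4)).astype(float)
preds = [rng.random((4, 4)), rng.random((4, 4))]
losses = mhb_weighted_losses(preds, gt)

# Boosting-style update: pick ONE branch at random for back-propagation,
# rather than updating all branches in parallel.
chosen = int(rng.integers(0, len(preds)))
print(f"weighted losses: {losses}, branch chosen for backprop: {chosen}")
```

In a real training loop the chosen index would gate which head's loss is back-propagated that iteration; the other heads' parameters are left untouched, which is what keeps the branches from accumulating identical gradients.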