Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Existing salient object detection (SOD) methods mainly rely on U-shaped convolutional neural networks (CNNs) with skip connections to combine global contexts and local spatial details, which are crucial for locating salient objects and refining object details, respectively. Despite great successes, the ability of CNNs to learn global contexts is limited. Recently, the vision transformer has achieved revolutionary progress in computer vision owing to its powerful modeling of global dependencies. However, directly applying the transformer to SOD is suboptimal because the transformer lacks the ability to learn local spatial representations. To this end, this paper explores the combination of transformers and CNNs to learn both global and local representations for SOD. We propose a transformer-based Asymmetric Bilateral U-Net (ABiU-Net). The asymmetric bilateral encoder has a transformer path and a lightweight CNN path, where the two paths communicate at each encoder stage to learn complementary global contexts and local spatial details, respectively. The asymmetric bilateral decoder also consists of two paths to process features from the transformer and CNN encoder paths, with communication at each decoder stage for decoding coarse salient object locations and fine-grained object details, respectively. Such communication between the two encoder/decoder paths enables ABiU-Net to learn complementary global and local representations, taking advantage of the natural properties of transformers and CNNs, respectively. Hence, ABiU-Net provides a new perspective for transformer-based SOD. Extensive experiments demonstrate that ABiU-Net performs favorably against previous state-of-the-art SOD methods. The code is available at https://github.com/yuqiuyuqiu/ABiU-Net.
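To make the two-path idea concrete, the following is a minimal toy sketch in plain Python, not the authors' implementation. It operates on a 1D list of scalar features: a hypothetical `global_mix` stands in for the transformer path (every position attends to all positions), a hypothetical `local_mix` stands in for the lightweight CNN path (each position sees only its neighbours), and `bilateral_stage` lets the two paths exchange features at each stage. The abstract does not specify how the paths communicate, so the additive exchange and the `alpha` weight are assumptions, as are all function names. Downsampling and the decoder are omitted.

```python
import math


def global_mix(x):
    """Toy self-attention over scalar features (stand-in for the transformer path)."""
    out = []
    for q in x:
        scores = [q * k for k in x]          # dot-product scores against every position
        m = max(scores)                      # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def local_mix(x):
    """Toy 3-tap averaging filter with replicated borders (stand-in for the CNN path)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def bilateral_stage(glob, loc, alpha=0.5):
    """One stage: each path mixes its own features, then receives the other path's output.

    The additive exchange weighted by `alpha` is an assumption made for illustration.
    """
    g_out = global_mix(glob)
    l_out = local_mix(loc)
    g_new = [g + alpha * l for g, l in zip(g_out, l_out)]
    l_new = [l + alpha * g for g, l in zip(g_out, l_out)]
    return g_new, l_new


def bilateral_encoder(x, num_stages=2):
    """Run both paths from the same input and keep per-stage features for skip connections."""
    glob, loc = list(x), list(x)
    features = []
    for _ in range(num_stages):
        glob, loc = bilateral_stage(glob, loc)
        features.append((glob, loc))
    return features


if __name__ == "__main__":
    # A single "salient" spike in an otherwise flat signal.
    for stage, (g, l) in enumerate(bilateral_encoder([0.0, 0.0, 1.0, 0.0, 0.0]), start=1):
        print(f"stage {stage}: global path {g}")
        print(f"stage {stage}: local path  {l}")
```

In a real model, each path would operate on multi-channel 2D feature maps, and the per-stage outputs collected in `features` would feed the corresponding decoder stages through skip connections, as in a U-shaped network.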
