In this paper, we investigate the contribution of transformers to salient object detection, aiming for saliency predictions that are both accurate and reliable. We first study transformers for accurate salient object detection with deterministic neural networks, and explain that their effective structure modeling and global context modeling abilities lead to superior performance compared with CNN-based frameworks. We then design stochastic networks to evaluate the ability of transformers to produce reliable saliency predictions. We observe that both CNN- and transformer-based frameworks suffer greatly from over-confidence: the models tend to generate wrong predictions with high confidence, resulting in poorly calibrated models. To estimate the calibration degree of both CNN- and transformer-based frameworks for reliable saliency prediction, we introduce generative adversarial network (GAN) based models that identify over-confident regions by sampling from the latent space. Specifically, we present the inferential generative adversarial network (iGAN). Unlike the conventional GAN-based framework, which fixes the distribution of the latent variable to the standard normal distribution N(0,1), the proposed iGAN infers the latent variable by gradient-based Markov chain Monte Carlo (MCMC), namely Langevin dynamics. We apply iGAN to both fully and weakly supervised salient object detection, and explain that iGAN within the transformer framework leads to both accurate and reliable salient object detection. The source code and experimental results are publicly available via our project page: https://github.com/fupiao1998/TrasformerSOD.
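To illustrate what inferring a latent variable by Langevin dynamics can look like, the following is a minimal sketch on a toy problem, not the iGAN implementation. It assumes a scalar latent z with prior N(0,1) and a hypothetical linear "generator" y = w*z plus Gaussian noise, so that the gradient of the log-posterior is available in closed form; in a real network this gradient would be obtained by back-propagation. The function names, the step size, and the update form (drift scaled by step²/2, noise scaled by step) are assumptions chosen for illustration.

```python
import random


def grad_log_posterior(z, y, w, sigma):
    """Gradient of log p(z | y) for the toy model (an assumption, not iGAN).

    Prior: z ~ N(0, 1). Likelihood: y | z ~ N(w * z, sigma^2).
    """
    return w * (y - w * z) / sigma**2 - z


def langevin_infer(y, w=2.0, sigma=0.5, step=0.1, n_steps=1000, rng=None):
    """Run Langevin dynamics on z and return the full chain of latent samples."""
    rng = rng or random.Random(0)
    z = rng.gauss(0.0, 1.0)  # initialise from the prior
    chain = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step towards high posterior density plus injected noise.
        z = z + 0.5 * step**2 * grad_log_posterior(z, y, w, sigma) + step * noise
        chain.append(z)
    return chain


def posterior_mean(y, w=2.0, sigma=0.5):
    """Closed-form posterior mean of z for the toy model, used as a reference."""
    precision = w**2 / sigma**2 + 1.0
    return (w * y / sigma**2) / precision


if __name__ == "__main__":
    chain = langevin_infer(y=1.0, n_steps=5000)
    samples = chain[500:]  # discard burn-in
    print("Langevin estimate:", sum(samples) / len(samples))
    print("Analytic posterior mean:", posterior_mean(1.0))
```

In this toy setting the chain average can be checked against the analytic posterior mean. The point of the sketch is the contrast with a conventional GAN, where z would simply be drawn from the fixed prior rather than refined using the observation.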