The transformer, which originated in machine translation, is particularly powerful at modeling long-range dependencies. It is now driving rapid progress across vision tasks, with significant performance improvements over convolutional neural network (CNN) based frameworks. In this paper, we study how transformers can contribute to accurate and reliable salient object detection. For accuracy, we apply the transformer to a deterministic model and explain that its effective structure modeling and global context modeling abilities lead to superior performance compared with CNN-based frameworks. For reliability, we observe that both CNN- and transformer-based frameworks suffer greatly from over-confidence, where the models tend to generate wrong predictions with high confidence. To estimate the reliability of both CNN- and transformer-based frameworks, we further present a latent variable model, the inferential generative adversarial network (iGAN), built on the generative adversarial network (GAN). The stochastic nature of the latent variable makes it convenient to estimate predictive uncertainty, which serves as an auxiliary output for evaluating the reliability of the model prediction. Unlike the conventional GAN, which fixes the distribution of the latent variable as the standard normal distribution $\mathcal{N}(0,\mathbf{I})$, the proposed iGAN infers the latent variable by gradient-based Markov chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply iGAN to both fully and weakly supervised salient object detection, and explain that iGAN within the transformer framework leads to both accurate and reliable salient object detection.
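To make the idea of inferring a latent variable by Langevin dynamics concrete, the following is a minimal sketch on a toy problem, not the iGAN model itself. It assumes a hypothetical one-dimensional "generator" $y = wz + \epsilon$ with prior $z \sim \mathcal{N}(0, 1)$ and Gaussian noise of standard deviation $\sigma$; the names `grad_log_posterior`, `langevin_sample`, and `mean_and_variance`, and all parameter values, are illustrative assumptions. Each step moves $z$ along the gradient of the log-posterior and adds Gaussian noise, so the samples depend on the observed $y$ rather than being drawn from a fixed prior.

```python
import random


def grad_log_posterior(z, y, w, sigma):
    """Gradient of log p(z | y) for the toy model (an assumption, not iGAN):
    prior z ~ N(0, 1), likelihood y ~ N(w * z, sigma^2)."""
    return -z + w * (y - w * z) / sigma ** 2


def langevin_sample(y, w=2.0, sigma=0.5, step=0.05,
                    n_steps=20000, burn_in=1000, seed=0):
    """Draw approximate posterior samples of z with unadjusted Langevin dynamics.

    Update rule: z <- z + (step^2 / 2) * grad log p(z | y) + step * noise.
    """
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)  # initialise from the standard normal prior
    samples = []
    for t in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        z = z + 0.5 * step ** 2 * grad_log_posterior(z, y, w, sigma) + step * noise
        if t >= burn_in:  # discard early iterations before the chain settles
            samples.append(z)
    return samples


def mean_and_variance(xs):
    """Sample mean and (population) variance of a list of floats."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


if __name__ == "__main__":
    samples = langevin_sample(y=1.0)
    mean, var = mean_and_variance(samples)
    # For this linear-Gaussian toy model the exact posterior is Gaussian with
    # mean w*y / (w^2 + sigma^2) and variance sigma^2 / (w^2 + sigma^2).
    print("sample mean:", mean, "exact:", 2.0 * 1.0 / (4.0 + 0.25))
    print("sample var: ", var, "exact:", 0.25 / (4.0 + 0.25))
```

In this toy setting the spread of the samples plays the role of an uncertainty estimate: a wider posterior over $z$ means less certainty about the latent variable for that input. A linear-Gaussian model is used here only because its exact posterior is known, which makes the sampler easy to check.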