We propose a novel energy-based prior for generative saliency prediction, in which the latent variables follow an informative energy-based distribution. The saliency generator and the energy-based prior are trained jointly by Markov chain Monte Carlo-based maximum likelihood estimation, where sampling from the intractable posterior and prior distributions of the latent variables is performed by Langevin dynamics. Given an image, the generative saliency model yields a pixel-wise uncertainty map that indicates the model's confidence in its saliency prediction. Unlike existing generative models, which define the prior distribution of the latent variable as a simple isotropic Gaussian, our model uses an informative energy-based prior that can be more expressive in capturing the latent space of the data. This prior relaxes the Gaussian assumption of existing generative models and gives a more representative distribution over the latent space, leading to more reliable uncertainty estimation. We apply the proposed framework to both RGB and RGB-D salient object detection tasks, with both transformer and convolutional neural network backbones. Experimental results show that our generative saliency model with an energy-based prior achieves not only accurate saliency predictions but also reliable uncertainty maps that are consistent with human perception.
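To make the sampling step concrete, the following is a minimal sketch of unadjusted Langevin dynamics drawing latent samples from a density proportional to exp(-E(z)). It is an illustration under stated assumptions, not the training code of the proposed model: the energy here is a hypothetical closed-form toy (a standard Gaussian term plus a quadratic pull toward `MU`) rather than a learned network, and the names `langevin_sample`, `toy_grad_energy`, and `MU`, along with the step size and step count, are chosen only for this example.

```python
import numpy as np


def langevin_sample(grad_energy, z0, n_steps=300, step_size=0.01, rng=None):
    """Unadjusted Langevin dynamics targeting p(z) proportional to exp(-E(z)).

    Each step moves z down the energy gradient and adds Gaussian noise:
        z <- z - step_size * grad E(z) + sqrt(2 * step_size) * noise
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z - step_size * grad_energy(z) + np.sqrt(2.0 * step_size) * noise
    return z


# Hypothetical toy energy (an assumption for illustration only):
#   E(z) = 0.5 * ||z||^2 + 0.5 * ||z - MU||^2
# The first term plays the role of an isotropic Gaussian reference; the
# second stands in for a learned correction. The resulting density is a
# Gaussian with mean MU / 2 and variance 1 / 2 per dimension.
MU = np.array([2.0, -2.0])


def toy_grad_energy(z):
    # Analytic gradient of the toy energy above.
    return z + (z - MU)


rng = np.random.default_rng(0)
z_init = rng.standard_normal((2000, 2))  # chains start from N(0, I)
samples = langevin_sample(toy_grad_energy, z_init, rng=rng)
```

In a model of the kind described above, the analytic gradient would presumably be replaced by the gradient of a learned energy function obtained through automatic differentiation, and a second chain of the same form would target the posterior of the latent variables given an image and its saliency map.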