Are we using the right potential functions in the Conditional Random Field (CRF) models that are popular in the vision community? Semantic segmentation and other pixel-level labelling tasks have recently made significant progress thanks to the deep learning paradigm. However, most state-of-the-art structured prediction methods also include a random field model with a hand-crafted Gaussian potential to model spatial priors, label consistencies and feature-based image conditioning. In this paper, we challenge this view by developing a new inference and learning framework that can learn pairwise CRF potentials restricted only by their dependence on the image pixel values and the size of the support. Both standard spatial and high-dimensional bilateral kernels are considered. Our framework is based on the observation that CRF inference can be achieved via projected gradient descent and, consequently, can easily be integrated into deep neural networks to allow for end-to-end training. We demonstrate empirically that such learned potentials can improve segmentation accuracy and that certain label class interactions are indeed better modelled by a non-Gaussian potential. In addition, we compare our inference method to the commonly used mean-field algorithm. Our framework is evaluated on several public benchmarks for semantic segmentation, with improved performance compared to previous state-of-the-art CNN+CRF models.
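The observation that CRF inference can be carried out by projected gradient descent can be illustrated with a minimal sketch. It assumes a relaxed (continuous) pairwise energy over per-pixel label distributions, a 1-D chain of pixels instead of a 2-D image, and a free-form spatial kernel given as one weight per pixel offset. The names `pgd_inference`, `kernel` and `compat`, the step size and the iteration count are hypothetical illustrations, not the authors' implementation; bilateral kernels and learning are omitted. Each iteration takes a gradient step on the energy and projects every pixel's label vector back onto the probability simplex.

```python
# Minimal sketch: CRF inference as projected gradient descent (PGD) on a
# relaxed pairwise energy. Assumptions (not from the paper): a 1-D chain of
# pixels, a free-form spatial kernel given as one weight per offset, a fixed
# step size and a fixed number of iterations. No bilateral term, no learning.

def project_to_simplex(v):
    """Euclidean projection of vector v onto {x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # holds for a prefix of j; the last such j fixes theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def energy(q, unary, kernel, compat):
    """E(q) = sum_i <u_i, q_i> + sum_d k[d] * sum_i q_i^T M q_{i+d}."""
    n, num_labels = len(q), len(q[0])
    e = sum(unary[i][l] * q[i][l] for i in range(n) for l in range(num_labels))
    for d, w in enumerate(kernel, start=1):
        for i in range(n - d):
            j = i + d
            e += w * sum(q[i][l] * compat[l][m] * q[j][m]
                         for l in range(num_labels) for m in range(num_labels))
    return e


def gradient(q, unary, kernel, compat):
    """Gradient of energy() with respect to every q[i][l]."""
    n, num_labels = len(q), len(q[0])
    g = [row[:] for row in unary]  # the unary part is linear in q
    for d, w in enumerate(kernel, start=1):
        for i in range(n - d):
            j = i + d
            for l in range(num_labels):
                for m in range(num_labels):
                    c = w * compat[l][m]
                    g[i][l] += c * q[j][m]  # derivative w.r.t. q[i][l]
                    g[j][m] += c * q[i][l]  # derivative w.r.t. q[j][m]
    return g


def pgd_inference(unary, kernel, compat, steps=50, lr=0.1):
    """Gradient step on E, then project each pixel's label vector onto the simplex."""
    n, num_labels = len(unary), len(unary[0])
    q = [[1.0 / num_labels] * num_labels for _ in range(n)]  # uniform start
    for _ in range(steps):
        g = gradient(q, unary, kernel, compat)
        q = [project_to_simplex([q[i][l] - lr * g[i][l] for l in range(num_labels)])
             for i in range(n)]
    return q


# Usage: 5 pixels, 2 labels. Unaries are costs; pixel 2 weakly prefers label 1.
unary = [[0.0, 1.0], [0.0, 1.0], [0.6, 0.4], [0.0, 1.0], [0.0, 1.0]]
kernel = [1.0, 0.5]                # hypothetical non-Gaussian weights for offsets 1 and 2
compat = [[0.0, 1.0], [1.0, 0.0]]  # Potts compatibility: penalise disagreeing labels
q = pgd_inference(unary, kernel, compat)
labels = [max(range(2), key=lambda l: qi[l]) for qi in q]
print(labels)  # → [0, 0, 0, 0, 0]
```

Here the pairwise terms outweigh the weak unary preference of pixel 2, so inference relabels it to agree with its neighbours. Because the gradient step and the simplex projection are both piecewise differentiable, an autodiff framework could in principle backpropagate through an unrolled version of this loop, which is the kind of end-to-end integration the abstract refers to; that part is not shown here.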