Many deep learning architectures for semantic segmentation consist of a Fully Convolutional Network (FCN) followed by a Conditional Random Field (CRF) that carries out inference over the image. These models typically combine unary potentials, based on local appearance features computed by the FCN, with pairwise potentials based on the displacement between pixels. We show that while current methods succeed in segmenting whole objects, they perform poorly in settings involving a large number of object parts. We therefore propose adding to the inference algorithm higher-order potentials inspired by the way humans identify and localize parts. We incorporate two relations that have been shown to be useful for human object identification, containment and attachment, into the energy term of the CRF and evaluate their performance on the Pascal VOC Parts dataset. Our experimental results show that adding these two relations improves the segmentation of fine parts, and that fine-part segmentation can be further influenced by complex structural features.
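To make the structure of such an energy concrete, the following is a minimal Python sketch of a CRF energy over a toy label map with the three kinds of terms mentioned above: a unary term, a pairwise Potts term weighted by a Gaussian of pixel displacement (similar in spirit to the smoothness kernel of fully connected CRFs), and two higher-order terms for containment and attachment. The label ids, the 4-connectivity, the kernel parameters, and the exact form of the containment and attachment penalties are all illustrative assumptions, not the formulation used in this work. The pairwise term is computed naively over all pixel pairs, so it is only suitable for tiny grids.

```python
import math
from itertools import product

# Hypothetical label ids for a toy "person" label map (illustrative only).
BG, HEAD, EYE, TORSO = 0, 1, 2, 3

def unary_energy(labels, unary):
    """Sum of per-pixel costs; unary[y][x][l] is the cost of assigning label l at (y, x)."""
    h, w = len(labels), len(labels[0])
    return sum(unary[y][x][labels[y][x]] for y, x in product(range(h), range(w)))

def pairwise_energy(labels, weight=1.0, theta=1.0):
    """Potts penalty for every differently labeled pixel pair, weighted by a Gaussian of their displacement."""
    h, w = len(labels), len(labels[0])
    pixels = [(y, x) for y in range(h) for x in range(w)]
    total = 0.0
    for i, (yi, xi) in enumerate(pixels):
        for yj, xj in pixels[i + 1:]:
            if labels[yi][xi] != labels[yj][xj]:
                d2 = (yi - yj) ** 2 + (xi - xj) ** 2
                total += weight * math.exp(-d2 / (2 * theta ** 2))
    return total

def neighbours(y, x, h, w):
    """Yield the 4-connected neighbours of (y, x) that lie inside an h-by-w grid."""
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= y + dy < h and 0 <= x + dx < w:
            yield y + dy, x + dx

def containment_energy(labels, inner, outer, weight=1.0):
    """Hypothetical containment term: each `inner` pixel touching a label other than
    `inner` or `outer` costs `weight` (the inner part should lie inside the outer part)."""
    h, w = len(labels), len(labels[0])
    cost = 0.0
    for y, x in product(range(h), range(w)):
        if labels[y][x] == inner and any(
            labels[ny][nx] not in (inner, outer) for ny, nx in neighbours(y, x, h, w)
        ):
            cost += weight
    return cost

def attachment_energy(labels, a, b, weight=1.0):
    """Hypothetical attachment term: if parts `a` and `b` both appear but never touch, pay `weight` once."""
    h, w = len(labels), len(labels[0])
    present = {l for row in labels for l in row}
    if a not in present or b not in present:
        return 0.0
    touching = any(
        labels[y][x] == a and labels[ny][nx] == b
        for y, x in product(range(h), range(w))
        for ny, nx in neighbours(y, x, h, w)
    )
    return 0.0 if touching else weight

def total_energy(labels, unary):
    """Unary + pairwise + the two hypothetical higher-order relations (eye inside head, head attached to torso)."""
    return (unary_energy(labels, unary)
            + pairwise_energy(labels)
            + containment_energy(labels, EYE, HEAD)
            + attachment_energy(labels, HEAD, TORSO))
```

With these assumed terms, a labeling in which the eye lies inside the head and the head touches the torso pays no higher-order cost, while a labeling that scatters the parts apart is penalized on top of the usual unary and pairwise costs; inference would then seek the labeling of lowest total energy.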