Understanding the context of complex and cluttered scenes is a challenging problem in semantic segmentation. However, context is difficult to model without priors or additional supervision because scene factors, such as the scale, shape, and appearance of objects, vary considerably in these scenes. To address this, we propose to learn the structures of objects and the hierarchy among objects, because context is grounded in these intrinsic properties. In this study, we design novel hierarchical, contextual, and multiscale pyramidal representations to capture these properties from an input image. Our key idea is to recursively segment the image into regions at different hierarchical levels, based on a predefined number of regions, and to aggregate the context within these regions. The aggregated contexts are then used to predict the contextual relationships between the regions and to partition the regions at the next hierarchical level. Finally, constructing pyramid representations from the recursively aggregated context yields multiscale and hierarchical properties. In the experiments, we confirm that the proposed method achieves state-of-the-art performance on PASCAL Context.
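The recursive partition-and-aggregate idea described above can be illustrated with a small, loose sketch. It is not the authors' architecture: the random projection stands in for a learned region predictor, and the names `context_pyramid`, `aggregate_context`, and `regions_per_level` are hypothetical, as is the choice of soft assignments and of a separate region count per level. Each level softly assigns every pixel feature to one of a fixed number of regions, averages the features inside each region to form its context, broadcasts that context back to the pixels, and feeds the enriched features to the next level.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_context(features, assign):
    """Average the features of each region, weighted by soft assignment.

    features: N vectors of length C; assign: N rows of K region weights.
    Returns K context vectors of length C.
    """
    k, c = len(assign[0]), len(features[0])
    contexts = []
    for r in range(k):
        mass = sum(row[r] for row in assign) + 1e-8  # avoid division by zero
        contexts.append([
            sum(row[r] * f[d] for row, f in zip(assign, features)) / mass
            for d in range(c)
        ])
    return contexts


def context_pyramid(features, regions_per_level, seed=0):
    """Build per-pixel pyramid features from recursively aggregated context."""
    rng = random.Random(seed)
    c = len(features[0])
    current = features
    pyramid = [list(f) for f in features]  # level 0: the input features
    for k in regions_per_level:
        # Assumption: a random projection replaces the learned region predictor.
        w = [[rng.gauss(0.0, 1.0) / math.sqrt(c) for _ in range(k)]
             for _ in range(c)]
        assign = [
            softmax([sum(x[d] * w[d][r] for d in range(c)) for r in range(k)])
            for x in current
        ]
        contexts = aggregate_context(features, assign)
        # Broadcast each region's context back to the pixels assigned to it.
        per_pixel = [
            [sum(a[r] * contexts[r][d] for r in range(k)) for d in range(c)]
            for a in assign
        ]
        for row, ctx in zip(pyramid, per_pixel):
            row.extend(ctx)
        # The next level partitions features enriched with this level's context.
        current = [[x + y for x, y in zip(f, ctx)]
                   for f, ctx in zip(features, per_pixel)]
    return pyramid
```

Calling `context_pyramid(features, [2, 4])` on N feature vectors of length C returns N vectors of length 3C: the input features followed by one context block per level.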