Exploiting multi-scale features has shown great potential for semantic segmentation. The features are commonly aggregated by summation or concatenation (concat) followed by convolutional (conv) layers. However, this passes the high-level context down the hierarchy in full, without considering how the features at different levels relate to one another. In this work, we enable a low-level feature to aggregate complementary context from adjacent high-level feature maps through a cross-scale pixel-to-region relation operation. Cross-scale context propagation makes long-range dependencies capturable even by the high-resolution low-level features. To this end, we employ an efficient feature pyramid network to obtain multi-scale features, and we propose a Relational Semantics Extractor (RSE) and a Relational Semantics Propagator (RSP) for context extraction and propagation, respectively. We then stack several RSPs into an RSP head to distribute the context progressively from the top down. Experimental results on two challenging datasets, Cityscapes and COCO, demonstrate that the RSP head performs competitively on both semantic segmentation and panoptic segmentation with high efficiency. It outperforms DeepLabV3 [1] by 0.7% with 75% fewer FLOPs (multiply-adds) on the semantic segmentation task.
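To make the idea of a cross-scale pixel-to-region relation concrete, the following is a minimal NumPy sketch, not the RSE/RSP design itself, whose details the abstract does not give. It assumes that each position of the coarse high-level map stands for a region, that the relation is a scaled dot-product followed by a softmax, and that the aggregated context is added residually to the low-level feature. The function name `pixel_to_region_aggregate` and all shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pixel_to_region_aggregate(low, high):
    """Illustrative cross-scale aggregation (an assumption, not the paper's RSP).

    low:  (C, H, W) high-resolution, low-level feature map.
    high: (C, h, w) low-resolution, high-level feature map; each of its
          h*w positions is treated as one region.
    Returns the enriched low-level map (C, H, W) and the pixel-to-region
    weights (H*W, h*w).
    """
    C, H, W = low.shape
    Ch, h, w = high.shape
    assert C == Ch, "both maps are assumed to share the channel dimension"

    pixels = low.reshape(C, H * W).T    # (HW, C): one query per low-level pixel
    regions = high.reshape(C, h * w).T  # (hw, C): one key/value per region

    # Relation between every low-level pixel and every high-level region.
    scores = pixels @ regions.T / np.sqrt(C)  # (HW, hw)
    weights = softmax(scores, axis=1)         # each row sums to 1

    # Each pixel gathers context from the regions in proportion to its relation,
    # instead of receiving the upsampled high-level feature unconditionally.
    context = weights @ regions               # (HW, C)
    out = pixels + context                    # residual aggregation
    return out.T.reshape(C, H, W), weights


rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))
high = rng.standard_normal((8, 4, 4))
out, weights = pixel_to_region_aggregate(low, high)
print(out.shape, weights.shape)
```

Because every pixel attends to only h*w regions rather than to all H*W pixels, the relation matrix in this sketch has H*W × h*w entries. This is one plausible reading of how long-range context could reach high-resolution features at a modest cost.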