Feature fusion modules between encoder levels and self-attention modules have been widely adopted in semantic segmentation. However, these modules are computationally costly and face practical limitations in real-time environments. In addition, segmentation performance is limited in autonomous driving scenes, which contain abundant contextual information perpendicular to the road surface, such as people, buildings, and general objects. In this paper, we propose an efficient feature fusion method, Feature Fusion with Different Norms (FFDN), which exploits rich multi-level global context together with a vertical pooling module placed before self-attention; the module preserves most contextual information while reducing the complexity of global context encoding in the vertical direction. In this way, we capture the properties of the representation in the global space while reducing additional computational cost. We also analyze low performance in challenging cases, including small and vertically featured objects. We achieve a mean Intersection-over-Union (mIoU) of 73.1 and 191 Frames Per Second (FPS), results comparable with the state of the art on the Cityscapes test dataset.
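The vertical pooling idea described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `vertical_pool_attention` and the use of simple average pooling and single-head dot-product attention are assumptions for illustration only. The point it demonstrates is the complexity reduction: pooling a C×H×W feature map along the vertical axis leaves one token per column, so self-attention operates over W tokens with a W×W affinity matrix instead of an (H·W)×(H·W) one.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vertical_pool_attention(feat):
    """Sketch of vertical pooling before self-attention (hypothetical helper).

    feat: array of shape (C, H, W).
    Returns an array of shape (C, W).
    """
    C, H, W = feat.shape
    pooled = feat.mean(axis=1)                 # (C, W): average-pool the vertical axis
    tokens = pooled.T                          # (W, C): one token per image column
    scores = tokens @ tokens.T / np.sqrt(C)    # (W, W) affinity, not (H*W, H*W)
    attn = softmax(scores, axis=-1)            # row-normalized attention weights
    return (attn @ tokens).T                   # (C, W) re-encoded global context
```

For a 512×1024 Cityscapes-resolution feature map, the affinity matrix shrinks from (512·1024)² entries to 1024², which is the source of the speedup the abstract refers to; the pooling step is what "preserves most contextual information" along the vertical direction in compressed form.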