Semantic segmentation from aerial platforms is one of the fundamental scene understanding tasks in earth observation. Most semantic segmentation research has focused on scenes captured in nadir view, in which objects exhibit smaller scale variation than in scenes captured in oblique view. The large scale variation of objects in oblique images limits the performance of deep neural networks (DNNs) that process images at a single scale. To tackle this scale variation issue, we propose novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction. Experiments conducted on the UAVid2020 dataset demonstrate the effectiveness of our method: our model achieves a state-of-the-art (SOTA) mean intersection over union (mIoU) score of 70.80%.
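The core idea of attention-based multi-scale fusion can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`softmax`, `fuse_scales`) and the use of per-pixel softmax weights over scales are illustrative assumptions, and only a single fusion direction is shown; the proposed networks additionally fuse bidirectionally across the scale hierarchy.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_scales(features, attn_logits):
    """Fuse per-scale feature maps with per-pixel attention weights.

    features:    list of S arrays, each (C, H, W), already resampled
                 to a common spatial resolution H x W.
    attn_logits: array (S, H, W), one relevance score per scale and
                 pixel (in a real network, predicted by a small head).
    Returns a fused (C, H, W) feature map: at each pixel, scales that
    the attention deems more informative contribute more.
    """
    weights = softmax(attn_logits, axis=0)          # (S, H, W), sums to 1 over scales
    stacked = np.stack(features, axis=0)            # (S, C, H, W)
    return (weights[:, None, :, :] * stacked).sum(axis=0)

# Toy example: two scales, 2 channels, 4x4 spatial grid.
f_fine   = np.ones((2, 4, 4))
f_coarse = 3.0 * np.ones((2, 4, 4))
logits   = np.zeros((2, 4, 4))                      # uniform attention
fused = fuse_scales([f_fine, f_coarse], logits)
```

With uniform logits the fusion reduces to a plain average; a trained attention head would instead shift weight toward the coarse scale for large objects and the fine scale for small ones, which is how such fusion adapts to the large scale variation in oblique views.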