Building extraction from aerial images has several applications, including urban planning, change detection, and disaster management. With the increasing availability of data, Convolutional Neural Networks (CNNs) for semantic segmentation of remote sensing imagery have improved significantly in recent years. However, convolutions operate on local neighborhoods and therefore fail to capture the non-local features that are essential for the semantic understanding of aerial images. In this work, we propose to improve the segmentation of buildings of different sizes by capturing long-range dependencies with contextual pyramid attention (CPA). The CPA pathways process the input efficiently at multiple scales and combine the results in a weighted manner, similar to an ensemble model. The proposed method achieves state-of-the-art performance on the Inria Aerial Image Labelling Dataset at minimal computational cost. On the Intersection over Union (IoU) metric, our method improves on current state-of-the-art methods by 1.8 points and on existing baselines by 12.6 points, without any post-processing. Code and models will be made publicly available.
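The abstract describes two mechanics without detail: pathways that process the input at several scales and are combined with weights, like an ensemble, and evaluation by IoU. The following is a toy sketch of those two ideas, not the CPA architecture itself. The abstract does not say what each pathway computes or how the weights are produced, so this sketch assumes a placeholder pathway (average pooling followed by nearest-neighbour upsampling) and scalar weights normalized with a softmax. All function names (`downsample`, `upsample`, `multiscale_fuse`, `iou`) are hypothetical. Plain Python lists keep it self-contained.

```python
import math
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def downsample(img: Grid, s: int) -> Grid:
    """Average-pool a 2-D grid by integer factor s (sides assumed divisible by s)."""
    h, w = len(img), len(img[0])
    return [[sum(img[i * s + di][j * s + dj] for di in range(s) for dj in range(s)) / (s * s)
             for j in range(w // s)]
            for i in range(h // s)]


def upsample(img: Grid, s: int) -> Grid:
    """Nearest-neighbour upsampling by integer factor s."""
    return [[img[i // s][j // s] for j in range(len(img[0]) * s)]
            for i in range(len(img) * s)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax, so the pathway weights sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiscale_fuse(img: Grid, scales: List[int], logits: List[float]) -> Grid:
    """Run one placeholder pathway per scale and blend the outputs with softmax weights."""
    weights = softmax(logits)
    h, w = len(img), len(img[0])
    fused = [[0.0] * w for _ in range(h)]
    for s, wgt in zip(scales, weights):
        # Hypothetical pathway: pool to a coarser scale, then upsample back.
        # A real model would apply learned layers (e.g. attention) at this scale.
        out = upsample(downsample(img, s), s)
        for i in range(h):
            for j in range(w):
                fused[i][j] += wgt * out[i][j]
    return fused


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union of two binary masks (1.0 if both are empty)."""
    inter = sum(p and t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    union = sum(p or t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return inter / union if union else 1.0


if __name__ == "__main__":
    # 4x4 "building probability" map with a 2x2 building in the top-left corner.
    img = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
    fused = multiscale_fuse(img, scales=[1, 2, 4], logits=[0.0, 0.0, 0.0])
    pred = [[v > 0.5 for v in row] for row in fused]
    # Ground truth: the building is actually 2x3 pixels.
    target = [[True, True, True, False], [True, True, True, False], [False] * 4, [False] * 4]
    print(round(100 * iou(pred, target), 1))  # IoU in points; prints 66.7
```

Because the softmax weights always sum to 1, the fused map stays a convex combination of the pathway outputs, which is one simple way to read "combined in a weighted manner, similar to an ensemble model". The IoU is multiplied by 100 because the abstract reports its gains in IoU points.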