Accurate and reliable building footprint maps are vital to urban planning and monitoring, and most existing approaches to building footprint generation rely on convolutional neural networks (CNNs). However, these methods require strong supervision from large numbers of annotated samples. State-of-the-art semi-supervised semantic segmentation networks address this issue through consistency training, which leverages large amounts of unlabeled data by encouraging the model output to remain consistent under data perturbation. Considering that rich information is also encoded in feature maps, we propose to integrate the consistency of both features and outputs into the end-to-end training of the network on unlabeled samples, which imposes additional constraints. Prior semi-supervised semantic segmentation networks rely on the cluster assumption, under which the decision boundary should lie in regions of low sample density. In this work, we observe that for building footprint generation, the low-density regions are more apparent in the intermediate feature representations within the encoder than at the encoder's input or output. Therefore, we propose a guideline for assigning the perturbation to the intermediate feature representations within the encoder, which takes into account the spatial resolution of the input remote sensing imagery and the mean size of individual buildings in the study area. The proposed method is evaluated on three datasets with different resolutions: the Planet dataset (3 m/pixel), the Massachusetts dataset (1 m/pixel), and the Inria dataset (0.3 m/pixel). Experimental results show that the proposed approach extracts more complete building structures and alleviates omission errors.
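The idea of combining feature and output consistency under a perturbation applied inside the encoder can be illustrated with a toy sketch. Everything below is an assumption made for illustration only and is not taken from the paper: the tiny fully connected "encoder", the Gaussian noise perturbation, the mean squared error as the consistency measure, and the names `forward`, `consistency_loss`, `perturb_after`, and `feature_weight` are all hypothetical.

```python
import math
import random


def relu_layer(x, weights):
    """One toy encoder stage: matrix-vector product followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def forward(x, encoder, head, perturb_after=None, noise_std=0.0, rng=None):
    """Run the encoder stages and the output head.

    If perturb_after is a stage index, Gaussian noise (an assumed
    perturbation) is added to the features leaving that stage.
    Returns (final encoder features, per-element building probabilities).
    """
    h = x
    for i, weights in enumerate(encoder):
        h = relu_layer(h, weights)
        if i == perturb_after:
            h = [v + rng.gauss(0.0, noise_std) for v in h]
    logits = [sum(w * v for w, v in zip(row, h)) for row in head]
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return h, probs


def consistency_loss(x, encoder, head, perturb_after, noise_std, rng,
                     feature_weight=1.0):
    """Unsupervised loss on one unlabeled sample.

    Sums an output-consistency term and a feature-consistency term between
    a clean pass and a pass perturbed at an intermediate encoder stage.
    In real training the clean branch would usually act as a fixed target.
    """
    clean_feat, clean_out = forward(x, encoder, head)
    pert_feat, pert_out = forward(x, encoder, head, perturb_after,
                                  noise_std, rng)
    return mse(clean_out, pert_out) + feature_weight * mse(clean_feat, pert_feat)


# Toy usage: a three-stage encoder, perturbed after the middle stage (index 1).
encoder = [
    [[0.5, 0.2], [0.1, 0.4]],
    [[0.3, 0.6], [0.7, 0.2]],
    [[0.4, 0.4], [0.2, 0.5]],
]
head = [[0.8, -0.3]]
sample = [1.0, 2.0]
loss = consistency_loss(sample, encoder, head, perturb_after=1,
                        noise_std=0.1, rng=random.Random(0))
print(f"consistency loss: {loss:.6f}")
```

In this sketch the choice of `perturb_after` stands in for the guideline described above; how that stage would actually be selected from the image resolution and mean building size is not specified here.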