Semantic segmentation of aerial images is the basis for applications such as automatically creating and updating maps, tracking city growth, and monitoring deforestation. In true orthophotos, which are often used in these applications, many objects and regions can be approximated well by polygons. State-of-the-art semantic segmentation models rarely exploit this fact; instead, most allow arbitrary region shapes and thus unnecessary degrees of freedom in their predictions. We therefore present a refinement of our deep learning model that predicts binary space partitioning (BSP) trees, an efficient polygon representation. The refinements include a new feature decoder architecture and a new differentiable BSP tree renderer, both of which avoid vanishing gradients. Additionally, we introduce a novel loss function designed specifically to improve the spatial partitioning defined by the predicted trees. Furthermore, our expanded model can predict multiple trees at once and can thus produce class-specific segmentations. With all modifications combined, our model achieves state-of-the-art performance while using up to 60% fewer model parameters with a small backbone model and up to 20% fewer with a large backbone model.
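To make the BSP tree representation concrete, the following is a minimal sketch of how such a tree can be rendered into a per-pixel class map. It is not the renderer described above: the tuple-based tree encoding, the sigmoid blending, and the names `render_bsp`, `rasterize`, and `sharpness` are all assumptions chosen for illustration. Each inner node holds a line a*x + b*y + c = 0 that splits its region in two, and each leaf holds class probabilities for the resulting polygon.

```python
import math

def soft_step(t):
    # Numerically stable logistic sigmoid, written via tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * t))

def render_bsp(node, x, y, sharpness=50.0):
    """Return class probabilities at point (x, y).

    Hypothetical encoding (an assumption of this sketch):
      ("leaf", probs)                      -> class probabilities of a region
      ("split", (a, b, c), left, right)    -> line a*x + b*y + c = 0;
                                              `right` covers the positive side
    """
    if node[0] == "leaf":
        return list(node[1])
    _, (a, b, c), left, right = node
    # Soft membership of the point in the positive half-plane. A smooth
    # blend (instead of a hard test) keeps the output differentiable
    # with respect to the line parameters.
    w = soft_step(sharpness * (a * x + b * y + c))
    p_left = render_bsp(left, x, y, sharpness)
    p_right = render_bsp(right, x, y, sharpness)
    return [(1.0 - w) * pl + w * pr for pl, pr in zip(p_left, p_right)]

def rasterize(tree, size):
    """Evaluate the tree at pixel centers of a size x size grid over the
    unit square and return the most likely class label per pixel."""
    grid = []
    for row in range(size):
        y = (row + 0.5) / size
        labels = []
        for col in range(size):
            x = (col + 0.5) / size
            probs = render_bsp(tree, x, y)
            labels.append(max(range(len(probs)), key=probs.__getitem__))
        grid.append(labels)
    return grid

# Depth-2 example: a vertical line at x = 0.5, then a horizontal line at
# y = 0.5 that only subdivides the right half. Three leaves, three classes.
tree = (
    "split", (1.0, 0.0, -0.5),
    ("leaf", [1.0, 0.0, 0.0]),
    ("split", (0.0, 1.0, -0.5),
        ("leaf", [0.0, 1.0, 0.0]),
        ("leaf", [0.0, 0.0, 1.0])),
)

for labels in rasterize(tree, 4):
    print(labels)
```

In this sketch, two lines and three leaf vectors describe three polygonal regions regardless of image resolution, which illustrates why a tree of splits is a compact alternative to predicting a label for every pixel independently.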