Long-range contextual information is crucial for the semantic segmentation of High-Resolution (HR) Remote Sensing Images (RSIs). However, image cropping operations, commonly used for training neural networks, limit the perception of long-range contexts in large RSIs. To overcome this limitation, we propose a Wide-Context Network (WiCoNet) for the semantic segmentation of HR RSIs. Apart from extracting local features with a conventional CNN, the WiCoNet has an extra context branch to aggregate information from a larger image area. Moreover, we introduce a Context Transformer to embed contextual information from the context branch and selectively project it onto the local features. The Context Transformer extends the Vision Transformer, an emerging kind of neural network, to model the dual-branch semantic correlations. It overcomes the locality limitation of CNNs and enables the WiCoNet to see the bigger picture before segmenting the land-cover/land-use (LCLU) classes. Ablation studies and comparative experiments conducted on several benchmark datasets demonstrate the effectiveness of the proposed method. In addition, we present a new Beijing Land-Use (BLU) dataset. This is a large-scale HR satellite dataset with high-quality and fine-grained reference labels, which can facilitate future studies in this field.
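One way to picture how contextual information could be "selectively projected" onto local features is a single cross-attention step, in which tokens from the local branch act as queries and tokens from the context branch supply keys and values. The NumPy sketch below is illustrative only and is not the WiCoNet implementation. The function names, the token counts (16 local, 64 context), the feature width `d = 8`, the random projection matrices, and the residual connection are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum so the exponentials stay numerically stable.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(local_tokens, context_tokens, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention between two branches.

    local_tokens:   (n_local, d) features from the local branch (queries)
    context_tokens: (n_context, d) features from the wider context branch
    """
    q = local_tokens @ w_q        # queries from the local branch
    k = context_tokens @ w_k      # keys from the context branch
    v = context_tokens @ w_v      # values from the context branch
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product similarity
    weights = softmax(scores, axis=-1)        # each local token attends over all context tokens
    # Residual connection (an assumption): context is added to, not substituted for, local features.
    return local_tokens + weights @ v, weights

rng = np.random.default_rng(0)
d = 8
local_tokens = rng.normal(size=(16, d))     # e.g. a flattened 4x4 local feature map
context_tokens = rng.normal(size=(64, d))   # e.g. a flattened 8x8 context feature map
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

fused, weights = cross_attention(local_tokens, context_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of `weights` sums to one, so every local token receives a convex combination of context values. That is one plausible reading of "selective" projection: the local feature decides which parts of the wider area are relevant to it.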