Semantic segmentation requires methods that learn high-level features while handling large volumes of data. Convolutional neural networks (CNNs) can learn distinctive, adaptive features for this purpose. However, because remote sensing images are large and have high spatial resolution, these networks cannot analyze an entire scene efficiently. Recently, deep transformers have proven capable of capturing global interactions between different objects in an image. In this paper, we propose a new segmentation model that combines convolutional neural networks with transformers, and show that this mixture of local and global feature extraction provides significant advantages in remote sensing segmentation. In addition, the proposed model includes two fusion layers designed to represent the multi-modal inputs and outputs of the network efficiently. The input fusion layer extracts feature maps that summarize the relationship between image content and elevation maps given as a digital surface model (DSM). The output fusion layer uses a novel multi-task segmentation strategy in which class labels are identified using class-specific feature extraction layers and loss functions. Finally, a fast-marching method is used to convert all unidentified class labels to the labels of their closest known neighbors. Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
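The final post-processing step, in which unidentified pixels take the label of their closest known neighbor, can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces a true fast-marching solver with a Dijkstra-style front propagation on an 8-connected grid, which only approximates Euclidean distance. The `UNKNOWN` marker, the function name `fill_unknown_labels`, and the list-of-lists label map are assumptions made for illustration.

```python
import heapq
import math

UNKNOWN = -1  # hypothetical marker for pixels without a class label


def fill_unknown_labels(labels):
    """Give each UNKNOWN pixel the label of its nearest labeled pixel.

    Distances are shortest-path lengths on an 8-connected grid
    (Dijkstra-style propagation), an approximation of fast marching.
    """
    rows, cols = len(labels), len(labels[0])
    filled = [row[:] for row in labels]  # leave the input untouched
    dist = [[math.inf] * cols for _ in range(rows)]

    # Every labeled pixel is a source of the front, at distance zero.
    heap = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != UNKNOWN:
                dist[r][c] = 0.0
                heap.append((0.0, r, c))
    heapq.heapify(heap)

    # Eight neighbors: cost 1 for axial steps, sqrt(2) for diagonal steps.
    steps = [
        (dr, dc, math.hypot(dr, dc))
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]

    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry, a shorter path was already found
        for dr, dc, cost in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and d + cost < dist[nr][nc]:
                dist[nr][nc] = d + cost
                filled[nr][nc] = filled[r][c]  # inherit the front's label
                heapq.heappush(heap, (d + cost, nr, nc))
    return filled
```

For a single row `[1, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN, 2]`, the two unknown pixels nearest each end take that end's label. Pixels equidistant from two different labels are resolved by queue order in this sketch, so a real pipeline would need an explicit tie-breaking rule.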