The fully convolutional network (FCN) with an encoder-decoder architecture has been the standard paradigm for semantic segmentation. In this architecture, an encoder captures multilevel feature maps, which a decoder then incorporates into the final prediction. Because context is crucial for precise segmentation, tremendous effort has gone into extracting it more intelligently, for example by employing dilated (atrous) convolutions or inserting attention modules. However, these approaches are all built on the FCN architecture with ResNet or other backbones, which in theory cannot fully exploit the context. By contrast, we introduce the Swin Transformer as the backbone to extract context information and design a novel decoder, the densely connected feature aggregation module (DCFAM), to restore the resolution and produce the segmentation map. Experimental results on two remote sensing semantic segmentation datasets demonstrate the effectiveness of the proposed scheme. Code is available at https://github.com/WangLibo1995/GeoSeg