Semantic segmentation of very fine resolution (VFR) urban scene images plays a significant role in application scenarios such as autonomous driving, land cover classification, and urban planning. However, the tremendous amount of detail contained in VFR images severely limits the potential of existing deep learning approaches. More seriously, the considerable variations in the scale and appearance of objects further degrade the representational capacity of those semantic segmentation methods, causing adjacent objects to be confused. Addressing these issues is a promising research direction in the remote sensing community and paves the way for scene-level landscape pattern analysis and decision making. In this manuscript, we propose a bilateral awareness network (BANet), which contains a dependency path and a texture path to fully capture the long-range relationships and fine-grained details in VFR images. Specifically, the dependency path is built on ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on stacked convolution operations. In addition, a feature aggregation module (FAM) based on a linear attention mechanism is designed to effectively fuse the dependency features and texture features. Extensive experiments conducted on three large-scale urban scene image segmentation datasets, namely the ISPRS Vaihingen, ISPRS Potsdam, and UAVid datasets, demonstrate the effectiveness of our BANet. In particular, BANet achieves a 64.6% mIoU on the UAVid dataset.
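The mIoU figure quoted in the abstract is the mean of the per-class intersection-over-union scores. The following is a minimal sketch of that metric in plain Python, assuming predictions and ground truth are given as flat sequences of integer class labels. The function name `mean_iou` and the choice to skip classes absent from both sequences are illustrative assumptions, not the authors' evaluation code; benchmark protocols may differ, for example by accumulating counts over a whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for flat label sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where both agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both sequences
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and six pixels.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is approximately 0.5.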