Accurate urban maps provide essential information for sustainable urban development. Recent urban mapping methods use multi-modal deep neural networks to fuse Synthetic Aperture Radar (SAR) and optical data. However, multi-modal networks may rely on just one modality due to the greedy nature of learning, and this imbalanced utilization of modalities can harm a network's generalization ability. In this paper, we investigate the utilization of SAR and optical data for urban mapping. To that end, we use a dual-branch network architecture with intermediate fusion modules that share information between the uni-modal branches. A cut-off mechanism in the fusion modules can stop the information flow between the branches, which we use to estimate the network's dependence on SAR and optical data. While our experiments on the SEN12 Global Urban Mapping dataset show that good performance can be achieved with conventional SAR-optical data fusion (F1 score = 0.682 $\pm$ 0.014), we also observe a clear under-utilization of optical data. Therefore, future work is required to investigate whether a more balanced utilization of SAR and optical data can lead to performance improvements.
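The cut-off idea can be illustrated with a minimal, dependency-free sketch. It is not the network used in the paper: the averaging fusion rule, the accuracy-based gain metric, and the names `fuse`, `accuracy`, and `fusion_gain` are all assumptions made for illustration, and each "branch" is reduced to a list of per-pixel scores. The sketch only shows how comparing a branch's output with and without cross-branch information could yield a per-modality dependence estimate.

```python
from typing import List, Tuple

Features = List[float]  # one toy feature value per pixel


def fuse(sar: Features, opt: Features, cut_off: bool) -> Tuple[Features, Features]:
    """Hypothetical fusion module between the two uni-modal branches.

    Assumption: fusion is a plain average of both branches' features.
    With cut_off=True nothing crosses over and each branch keeps its own features.
    """
    if cut_off:
        return list(sar), list(opt)
    mixed = [0.5 * (s + o) for s, o in zip(sar, opt)]
    return list(mixed), list(mixed)


def accuracy(scores: Features, labels: List[int], threshold: float = 0.5) -> float:
    """Fraction of pixels whose thresholded score matches the urban label."""
    hits = sum(int(v >= threshold) == y for v, y in zip(scores, labels))
    return hits / len(labels)


def fusion_gain(sar: Features, opt: Features, labels: List[int]) -> Tuple[float, float]:
    """Accuracy each branch gains from the other modality (illustrative metric).

    Returns (gain of SAR branch, gain of optical branch), each computed as
    accuracy with fusion minus accuracy with the cut-off enabled.
    """
    sar_on, opt_on = fuse(sar, opt, cut_off=False)
    sar_off, opt_off = fuse(sar, opt, cut_off=True)
    return (accuracy(sar_on, labels) - accuracy(sar_off, labels),
            accuracy(opt_on, labels) - accuracy(opt_off, labels))


# Made-up toy data: four pixels, 1 = urban, 0 = non-urban.
labels = [1, 0, 1, 0]
sar = [0.9, 0.1, 0.2, 0.1]
opt = [0.8, 0.2, 0.9, 0.3]
sar_gain, opt_gain = fusion_gain(sar, opt, labels)
print(f"SAR branch gain: {sar_gain:.2f}, optical branch gain: {opt_gain:.2f}")
```

The toy values are invented and say nothing about the reported results. In a trained network the same comparison would be made on the branches' actual predictions, with the fusion modules switched off at inference time.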