Jointly harnessing the complementary features of multimodal input data in a common latent space has long been known to be beneficial. However, the influence of each modality on the model's decision remains a puzzle. This study proposes a deep learning framework for the modality-level interpretation of multimodal Earth observation data in an end-to-end fashion. Leveraging an explainable machine learning method, namely Occlusion Sensitivity, the proposed framework investigates the influence of modalities under an early-fusion scenario, in which the modalities are fused before the learning process. We show that the task of wilderness mapping benefits substantially from auxiliary data such as land cover and nighttime light data.
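To illustrate the idea of modality-level Occlusion Sensitivity under early fusion, the following is a minimal sketch and not the framework's actual implementation. It assumes the modalities are stacked along the channel axis of a single input array and that occlusion means replacing all channels of one modality with a constant baseline. The names `modality_occlusion_sensitivity`, `toy_predict`, the channel grouping, and the weights are hypothetical and chosen only for illustration; `toy_predict` stands in for a trained network.

```python
import numpy as np


def modality_occlusion_sensitivity(predict, x, modality_channels, baseline=0.0):
    """Score each modality by the drop in model output when it is occluded.

    predict: callable mapping an array of shape (C, H, W) to a scalar score
             (hypothetical stand-in for a trained model).
    x: early-fused input with modalities stacked along the channel axis.
    modality_channels: dict mapping modality name -> list of channel indices.
    baseline: constant used to overwrite the occluded channels (an assumption).
    """
    reference = float(predict(x))
    scores = {}
    for name, channels in modality_channels.items():
        occluded = x.copy()
        occluded[channels] = baseline  # occlude every channel of this modality
        scores[name] = reference - float(predict(occluded))
    return scores


# Toy model: a fixed weighted sum of per-channel means (illustrative only).
weights = np.array([0.1, 0.1, 0.1, 0.5, 0.2])


def toy_predict(x):
    return float((x.mean(axis=(1, 2)) * weights).sum())


# Five channels: three optical bands, one land cover layer, one nighttime light layer.
x = np.ones((5, 4, 4))
channels = {"optical": [0, 1, 2], "land_cover": [3], "night_lights": [4]}
scores = modality_occlusion_sensitivity(toy_predict, x, channels)
most_influential = max(scores, key=scores.get)
```

A larger score means the model output changes more when that modality is removed. In this toy setup the ranking simply mirrors the hand-picked weights; with a real network, the choice of baseline (zeros, dataset mean, or noise) can affect the scores and would need to be stated.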