Fusing satellite imagery acquired with different sensors has been a long-standing challenge in Earth observation, particularly across different modalities such as optical and Synthetic Aperture Radar (SAR) images. Here, we explore the joint analysis of imagery from different sensors from the perspective of representation learning: we propose to learn a joint embedding of multiple satellite sensors within a deep neural network. Our application problem is the monitoring of lake ice on Alpine lakes. To meet the temporal resolution requirement of the Swiss Global Climate Observing System (GCOS) office, we combine three image sources: Sentinel-1 SAR (S1-SAR), Terra MODIS, and Suomi-NPP VIIRS. The large gaps between the optical and SAR domains and between the sensor resolutions make this a challenging instance of the sensor fusion problem. Our approach can be classified as a late fusion that is learned in a data-driven manner. The proposed network architecture has separate encoding branches for each image sensor, which feed into a single latent embedding, i.e., a common feature representation shared by all inputs, such that subsequent processing steps deliver comparable output regardless of which type of input image is used. By fusing satellite data, we map lake ice at a temporal resolution of < 1.5 days. The network produces spatially explicit lake ice maps with pixel-wise accuracies > 91% (and mIoU scores > 60%) and generalises well across different lakes and winters. Moreover, it sets a new state of the art for determining the important ice-on and ice-off dates for the target lakes, in many cases meeting the GCOS requirement.
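The branch-per-sensor design described above can be illustrated with a minimal sketch. It is not the proposed network: the abstract describes a deep network producing spatially explicit maps, whereas this toy works on single pixel vectors with untrained random linear layers. All names (`SharedEmbeddingModel`, `LATENT_DIM`), the per-sensor channel counts, and the two-class output are hypothetical choices made only for illustration.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
LATENT_DIM = 8      # assumed size of the shared latent embedding
NUM_CLASSES = 2     # assumed labels, e.g. frozen vs. non-frozen
SENSOR_CHANNELS = {"s1_sar": 2, "modis": 12, "viirs": 5}  # assumed input sizes


def make_linear(in_dim, out_dim, rng):
    """Return a random weight matrix with out_dim rows and in_dim columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply_linear(weights, x):
    """Multiply the weight matrix by the input vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


class SharedEmbeddingModel:
    """One encoder per sensor, one classification head shared by all."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Separate encoding branch for each sensor; each maps that sensor's
        # channel count to the same latent dimension.
        self.encoders = {
            name: make_linear(channels, LATENT_DIM, rng)
            for name, channels in SENSOR_CHANNELS.items()
        }
        # Single head that only ever sees the shared embedding.
        self.head = make_linear(LATENT_DIM, NUM_CLASSES, rng)

    def embed(self, sensor, pixel):
        """Project a pixel vector from the given sensor into the latent space."""
        if len(pixel) != SENSOR_CHANNELS[sensor]:
            raise ValueError(f"{sensor} expects {SENSOR_CHANNELS[sensor]} values")
        return relu(apply_linear(self.encoders[sensor], pixel))

    def predict(self, sensor, pixel):
        """Return the index of the highest-scoring class for one pixel."""
        scores = apply_linear(self.head, self.embed(sensor, pixel))
        return scores.index(max(scores))


model = SharedEmbeddingModel()
sar_label = model.predict("s1_sar", [0.3, -0.1])
viirs_label = model.predict("viirs", [0.2, 0.4, 0.1, 0.0, 0.5])
```

Because the head consumes only the shared embedding, the same downstream step applies whichever sensor supplied the input. Training the encoders so that the embeddings of different sensors actually agree is the part this sketch leaves out.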